dabl.EasyPreprocessor
- class dabl.EasyPreprocessor(scale=True, force_imputation=True, verbose=0, types=None)
A simple preprocessor.
Detects variable types, encodes everything as floats for use with sklearn.
Applies one-hot encoding, missing value imputation and scaling.
- Parameters
- scale : bool, default=True
Whether to scale continuous data.
- force_imputation : bool, default=True
Whether to create imputers even if no training data is missing.
- verbose : int, default=0
Control output verbosity.
- Attributes
- ct_ : ColumnTransformer
Main container for all transformations.
- columns_ : pandas columns
Columns of training data.
- dtypes_ : Series of dtypes
Dtypes of training data columns.
- types_ : something
Inferred input types.
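The steps named above (imputation, scaling, one-hot encoding, all held in a `ColumnTransformer`) can be sketched by hand with scikit-learn. This is an illustration under stated assumptions, not dabl's actual implementation: the toy data, the column assignment, and the median imputation strategy are all hypothetical choices made for the example, whereas `EasyPreprocessor` detects the column types itself.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one continuous column (with a gap) and one categorical.
df = pd.DataFrame({
    "age": [22.0, 35.0, np.nan, 58.0],
    "city": ["Paris", "Oslo", "Paris", "Rome"],
})

# Continuous columns: fill missing values, then scale (assumed strategy: median).
continuous = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

# Categorical columns: one-hot encode.
categorical = OneHotEncoder(handle_unknown="ignore")

# Column types are assigned by hand here; sparse_threshold=0 forces dense output.
ct = ColumnTransformer(
    [
        ("continuous", continuous, ["age"]),
        ("categorical", categorical, ["city"]),
    ],
    sparse_threshold=0,
)

# Everything comes out as a float array: 1 scaled column + 3 one-hot columns.
X_new = np.asarray(ct.fit_transform(df), dtype=float)
```

The result has one row per input row, no missing values, and only float columns, which is the form scikit-learn estimators expect.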
- __init__(scale=True, force_imputation=True, verbose=0, types=None)
Initialize self. See help(type(self)) for accurate signature.
- fit(X, y=None)
Fit the preprocessor to the training data.
- Parameters
- X : array-like or sparse matrix of shape (n_samples, n_features)
The training input samples.
- y : None
Ignored. A transformer does not need a target, but the pipeline API requires this parameter.
- Returns
- self : object
Returns self.
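The `fit(X, y=None)` convention can be illustrated with a tiny standalone transformer. `MeanCenterer` below is a hypothetical class written only for this sketch; it is not part of dabl or scikit-learn. It shows why `y` is accepted but unused, and why returning `self` is convenient.

```python
class MeanCenterer:
    """Hypothetical toy transformer used only to illustrate the fit signature."""

    def fit(self, X, y=None):
        # y is accepted for pipeline compatibility and deliberately ignored.
        n_rows = len(X)
        n_cols = len(X[0])
        self.means_ = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, X):
        # Subtract the column means learned during fit.
        return [[value - mean for value, mean in zip(row, self.means_)] for row in X]


X = [[1.0, 10.0], [3.0, 30.0]]

centerer = MeanCenterer()
fitted = centerer.fit(X)                           # no target needed
centered = centerer.fit(X, y=[0, 1]).transform(X)  # a target is tolerated and ignored
```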
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X : numpy array of shape (n_samples, n_features)
Training set.
- y : numpy array of shape (n_samples,)
Target values.
- **fit_params : dict
Additional fit parameters.
- Returns
- X_new : numpy array of shape (n_samples, n_features_new)
Transformed array.
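As a rough illustration of what "fit, then transform" means, the sketch below compares the one-step and two-step forms. It uses scikit-learn's `StandardScaler` as a stand-in transformer, on the assumption that `EasyPreprocessor` follows the same transformer convention; the input array is made up for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy input with two features on different scales.
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])

# One call: fit on X and return the transformed X.
one_step = StandardScaler().fit_transform(X)

# Two calls: fit returns the fitted transformer, which then transforms X.
two_step = StandardScaler().fit(X).transform(X)

# For this transformer both routes give the same array.
same = np.allclose(one_step, two_step)
```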
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : mapping of string to any
Parameter names mapped to their values.
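The effect of `deep` is easiest to see on a nested estimator. The sketch below uses a small scikit-learn pipeline as a stand-in, on the assumption that `get_params` behaves the same way on any estimator that follows the scikit-learn API; the choice of steps is arbitrary.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# make_pipeline names each step after its lowercased class name.
pipe = make_pipeline(SimpleImputer(), StandardScaler())

# deep=False: only the pipeline's own constructor parameters.
shallow = pipe.get_params(deep=False)

# deep=True: also the parameters of the contained estimators,
# keyed as <component>__<parameter>.
deep = pipe.get_params(deep=True)

# Keys that only appear when nested estimators are included.
nested_only = sorted(set(deep) - set(shallow))
```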
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.
- Parameters
- **params : dict
Estimator parameters.
- Returns
- self : object
Estimator instance.
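A short sketch of both cases, a simple estimator and a nested one, may help. It uses scikit-learn's `StandardScaler` and a two-step pipeline as stand-ins, assuming `set_params` works the same way on any estimator following the scikit-learn API; the parameter values are arbitrary.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are set by their plain names.
scaler = StandardScaler().set_params(with_std=False)

# Nested object: address a component's parameter as <component>__<parameter>.
pipe = make_pipeline(SimpleImputer(), StandardScaler())
returned = pipe.set_params(
    simpleimputer__strategy="median",
    standardscaler__with_mean=False,
)

# The steps inside the pipeline now carry the updated values.
imputer_strategy = pipe.named_steps["simpleimputer"].strategy
scaler_with_mean = pipe.named_steps["standardscaler"].with_mean
```

Because `set_params` returns the estimator itself, calls can be chained or used inline when constructing a model.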