AI/ML

Combine transformers with imperio CombinatorTransformer

Sigmoid

Jul 13, 2021 • 2 min read

Very often we would like to combine different transformations on different columns of our feature matrix in a pipeline. That need motivated us to build a special module in imperio that does exactly this: CombinatorTransformer.

How CombinatorTransformer works

CombinatorTransformer allows you to apply a specific feature transformation to a specific set of columns. It was created mainly to apply one transformation to the numerical columns and another to the categorical columns.
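To make the idea concrete, here is a minimal pure-Python sketch of routing column subsets to separate transformers. It is not imperio's implementation: ToyCombinator, Log1pTransformer and FrequencyEncoder are hypothetical stand-ins written for this illustration, and writing the results back into the original column positions is an assumption of the sketch.

```python
import math
from collections import Counter


class Log1pTransformer:
    """Toy numerical transformer (hypothetical): log(1 + x) on every cell."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return [[math.log1p(v) for v in row] for row in X]


class FrequencyEncoder:
    """Toy categorical transformer (hypothetical): category -> training frequency."""

    def fit(self, X, y=None):
        n_rows = len(X)
        self.freqs_ = []
        for j in range(len(X[0])):
            counts = Counter(row[j] for row in X)
            self.freqs_.append({k: c / n_rows for k, c in counts.items()})
        return self

    def transform(self, X):
        return [[self.freqs_[j].get(v, 0.0) for j, v in enumerate(row)] for row in X]


class ToyCombinator:
    """Hypothetical stand-in that sends each column group to its own transformer."""

    def __init__(self, num_index, cat_index, num_transformer, cat_transformer):
        self.groups = [(num_index, num_transformer), (cat_index, cat_transformer)]

    @staticmethod
    def _select(X, index):
        # Keep only the columns listed in `index`, in that order.
        return [[row[j] for j in index] for row in X]

    def fit(self, X, y=None):
        for index, transformer in self.groups:
            transformer.fit(self._select(X, index), y)
        return self

    def transform(self, X):
        out = [list(row) for row in X]  # copy so the input is left untouched
        for index, transformer in self.groups:
            block = transformer.transform(self._select(X, index))
            for out_row, block_row in zip(out, block):
                for j, value in zip(index, block_row):
                    out_row[j] = value  # assumption: keep original column order
        return out


X = [
    [0.0, "a", 1.0],
    [1.0, "b", 3.0],
    [3.0, "a", 0.0],
    [7.0, "a", 1.0],
]
combinator = ToyCombinator(
    num_index=[0, 2],
    cat_index=[1],
    num_transformer=Log1pTransformer(),
    cat_transformer=FrequencyEncoder(),
)
X_new = combinator.fit(X).transform(X)
```

Each transformer only ever sees its own columns, so the numerical one never has to handle strings and the categorical one never has to handle floats.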

How to use CombinatorTransformer

All transformers from imperio follow the transformer API from scikit-learn, which makes them fully compatible with scikit-learn pipelines. If you haven't installed the library yet, you can do so with the following command:

pip install imperio

To create a CombinatorTransformer, you first need to import some transformers. For example, I will import FrequencyImputationTransformer for the categorical columns and BoxCoxTransformer for the numerical ones. Next, set up the CombinatorTransformer like this:

from imperio import FrequencyImputationTransformer, BoxCoxTransformer, CombinatorTransformer

combinator = CombinatorTransformer(
    # positions of the numerical columns in the feature matrix
    num_index = [0, 3, 4, 7, 9],
    # positions of the categorical columns in the feature matrix
    cat_index = [1, 2, 5, 6, 8, 10, 11, 12],
    num_transformer = BoxCoxTransformer(),
    cat_transformer = FrequencyImputationTransformer()
)

Following the scikit-learn API for transformers, you can fit the transformer and then transform the data:

combinator.fit(X_train, y_train)
X_transformed = combinator.transform(X_train)

You can also fit and transform the data in a single step with the fit_transform function:

X_transformed = combinator.fit_transform(X_train, y_train)
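The fit / transform / fit_transform convention can be sketched with a tiny transformer of our own. MeanCenterer below is a hypothetical toy class, not part of imperio or scikit-learn. It only illustrates that fit learns statistics from the training data, transform reuses them, and fit_transform is shorthand for calling both on the same data.

```python
class MeanCenterer:
    """Toy transformer (hypothetical): subtracts the column means learned in fit."""

    def fit(self, X, y=None):
        n_rows = len(X)
        # zip(*X) iterates over columns of a list-of-rows matrix
        self.means_ = [sum(col) / n_rows for col in zip(*X)]
        return self  # returning self allows chaining

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X_train = [[1.0, 10.0], [3.0, 30.0]]
X_test = [[2.0, 40.0]]

centerer = MeanCenterer()
X_train_t = centerer.fit_transform(X_train)  # learns means [2.0, 20.0]
X_test_t = centerer.transform(X_test)        # reuses the training means
```

Note that the test data is only passed to transform, never to fit, so no information from it leaks into the learned statistics.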

Besides the general scikit-learn API, you can use the apply function to apply the transformer to a pandas DataFrame:

new_df = combinator.apply(df,
                          target = 'target',
                          num_columns = ['num1', 'num2'],
                          cat_columns = ['cat1', 'cat2']
)

The CombinatorTransformer constructor has the following arguments:

cat_index (list, default = None): The list of indexes of the categorical columns to which the categorical transformer is applied.
num_index (list, default = None): The list of indexes of the numerical columns to which the numerical transformer is applied.
cat_transformer (default = None): The sklearn or imperio transformer to apply to the categorical columns.
num_transformer (default = None): The sklearn or imperio transformer to apply to the numerical columns.

The apply function has the following arguments:

df (pd.DataFrame): The pandas DataFrame on which the transformer should be applied.
target (str): The name of the target column.
cat_columns (list, default = None): The list with the names of categorical columns on which the categorical transformer should be applied.
num_columns (list, default = None): The list with the names of numerical columns on which the numerical transformer should be applied.
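The constructor works with column positions while apply works with column names, so it can help to see how the two relate. The helper below, names_to_indexes, is a hypothetical function written for this post. It assumes the indexes refer to positions in the feature matrix after the target column is removed, and imperio's internals may handle this differently.

```python
def names_to_indexes(all_columns, target, names):
    """Map column names to positions in the feature matrix (target removed)."""
    feature_columns = [c for c in all_columns if c != target]
    return [feature_columns.index(name) for name in names]


# Hypothetical DataFrame columns, reusing the names from the apply example.
columns = ['num1', 'cat1', 'target', 'num2', 'cat2']

num_index = names_to_indexes(columns, 'target', ['num1', 'num2'])
cat_index = names_to_indexes(columns, 'target', ['cat1', 'cat2'])
```

With a real DataFrame, list(df.columns) would play the role of the columns list here.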

Below is an example of a Pipeline built with CombinatorTransformer. Comparing the confusion matrices of a plain LogisticRegression and of this pipeline, the additional steps added 4% to the model's accuracy on the Heart Disease UCI data set.

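from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from imperio import BoxCoxTransformer, CombinatorTransformer, FrequencyImputationTransformer

pipe = Pipeline(
    [
        # column-wise preprocessing: Box-Cox on numerical, frequency imputation on categorical
        ('combinator', CombinatorTransformer(
            num_index = [0, 3, 4, 7, 9],
            cat_index = [1, 2, 5, 6, 8, 10, 11, 12],
            num_transformer = BoxCoxTransformer(),
            cat_transformer = FrequencyImputationTransformer()
        )),
        ('scaler', StandardScaler()),
        ('model', LogisticRegression())
    ]
)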
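For readers who want to run this kind of comparison themselves, here is a small pure-Python sketch of how a confusion matrix and accuracy can be computed from predictions. The labels below are made-up toy values, not the Heart Disease results, and the helper functions are written for this illustration. The matrix layout (rows are true classes, columns are predicted classes) follows the convention scikit-learn uses.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # row = true class, column = predicted class
    return matrix


def accuracy(matrix):
    """Share of predictions on the diagonal of the confusion matrix."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for illustration only.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_baseline = [0, 1, 1, 0, 1, 0, 0, 0]  # e.g. predictions of a plain model
y_pipeline = [0, 0, 1, 0, 1, 0, 1, 0]  # e.g. predictions of the full pipeline

cm_baseline = confusion_matrix(y_true, y_baseline)
cm_pipeline = confusion_matrix(y_true, y_pipeline)
gain = accuracy(cm_pipeline) - accuracy(cm_baseline)
```

In practice, y_baseline and y_pipeline would come from calling predict on the fitted LogisticRegression and on the fitted pipeline with the same held-out data.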

Made with ❤ by Sigmoid.
