clintk.utils.unfold module

Unfolds and merges dataframes into a single large feature matrix. Every feature is labeled with a date and two keys for identification.

Better explanations and schemas can be found on the repo wiki.

class clintk.utils.unfold.Unfolder(key1, key2, feature, value, date, group_date=True, n_jobs=1)

Bases: sklearn.base.BaseEstimator

Takes a dataframe with columns [key1, key2, feature, value, date] and builds a matrix of the parameters grouped by [key1, key2, date].

This object is meant to be used after a timeframe of the features has been built, to group them into a feature matrix. The idea is to facilitate data preparation for a sequential learning task.

Parameters:
  • key1 (str) – primary key
  • key2 (str) – secondary key
  • feature (str) – name of the feature
  • value (float) – value of the feature
  • date (datetime) – date at which feature was measured
  • group_date (bool, default=True) – set to True to use the date column as a key to group data
  • n_jobs (int, default=1) – number of CPUs to use for computation. If -1, all the available cores are used
fit(df)

saves dataframe for multiprocessing convenience

Parameters: df (pandas.DataFrame)
Returns: self
unfold()

performs the unfolding transformation

Returns: The dataframe that contains the added feature columns. Rows are ordered by [key1, key2, date] for convenience.
Return type: pandas.DataFrame
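
The grouping that unfold() describes can be illustrated with a minimal, dependency-free sketch. It is not clintk's implementation: plain Python dictionaries stand in for DataFrame rows, and the function name unfold_records and the column names ("patient", "lesion", ...) are hypothetical. It assumes one value per feature within each [key1, key2, date] group.

```python
from collections import defaultdict
from datetime import date

# Hypothetical long-format records: one row per measured feature.
records = [
    {"patient": "p1", "lesion": "a", "feature": "size", "value": 1.2, "date": date(2020, 1, 1)},
    {"patient": "p1", "lesion": "a", "feature": "density", "value": 0.8, "date": date(2020, 1, 1)},
    {"patient": "p1", "lesion": "a", "feature": "size", "value": 1.5, "date": date(2020, 2, 1)},
    {"patient": "p2", "lesion": "b", "feature": "size", "value": 2.0, "date": date(2020, 1, 15)},
    {"patient": "p2", "lesion": "b", "feature": "density", "value": 0.5, "date": date(2020, 1, 15)},
]


def unfold_records(rows, key1, key2, feature, value, date_col):
    """Pivot long-format rows into one wide row per (key1, key2, date)."""
    grouped = defaultdict(dict)
    for row in rows:
        group_key = (row[key1], row[key2], row[date_col])
        # Assumption: a later duplicate of the same feature overwrites the earlier one.
        grouped[group_key][row[feature]] = row[value]

    feature_names = sorted({row[feature] for row in rows})

    wide = []
    # Sorting the group keys orders rows by [key1, key2, date].
    for group_key in sorted(grouped):
        k1, k2, d = group_key
        wide_row = {key1: k1, key2: k2, date_col: d}
        for name in feature_names:
            # None marks a feature that was not measured for this group.
            wide_row[name] = grouped[group_key].get(name)
        wide.append(wide_row)
    return wide


wide_rows = unfold_records(records, "patient", "lesion", "feature", "value", "date")
for wide_row in wide_rows:
    print(wide_row)
```

Each output row carries the two keys, the date, and one column per feature name, which is the shape a sequential learning task would typically consume.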
clintk.utils.unfold.transform_and_label(df, key1, key2, date, feature, value, estimator, return_estimator=False, **kwargs)

Takes a dataframe as input, applies a transformation to the value column, and returns the dataframe with a new column for the transformed feature

The transformation returns a copy of the input dataframe

Only implements unsupervised transformation

@TODO keep sparse representation to unfold data

Parameters:
  • df (pandas.DataFrame) – should contain only one unique value in its feature column
  • feature (str) – features names column
  • value (str) – features values column
  • estimator (sklearn.base.BaseEstimator) – sklearn-compatible transformer that implements the .fit() and .transform() methods
  • return_estimator (bool) – if True, returns the trained estimator
  • **kwargs – additional keyword arguments for the estimator object
Returns: same as df with additional rows for the transformed feature
Return type: pandas.DataFrame
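
The fit-then-transform flow described above can be sketched without any dependencies. This is an illustration under stated assumptions, not the library's code: MinMaxSketch is a hypothetical stand-in for an sklearn-compatible transformer, the estimator is assumed to be passed as a class and instantiated with **kwargs, and the "_transformed" suffix is an invented labeling scheme. The documentation describes the result both as a new column and as additional rows; the sketch assumes additional rows, following the Returns field.

```python
import copy


class MinMaxSketch:
    """Hypothetical stand-in for an sklearn-style unsupervised transformer."""

    def __init__(self, low=0.0, high=1.0):
        self.low = low
        self.high = high

    def fit(self, values):
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        span = (self.max_ - self.min_) or 1.0  # avoid division by zero
        scale = self.high - self.low
        return [self.low + scale * (v - self.min_) / span for v in values]


def transform_and_label_sketch(rows, feature, value, estimator,
                               return_estimator=False, **kwargs):
    """Fit `estimator` on the value column and append relabeled rows."""
    names = {row[feature] for row in rows}
    if len(names) != 1:
        raise ValueError("expected exactly one unique feature name")
    name = names.pop()

    values = [row[value] for row in rows]
    fitted = estimator(**kwargs).fit(values)  # assumption: estimator is a class
    transformed = fitted.transform(values)

    result = copy.deepcopy(rows)  # work on a copy; the input is left untouched
    for row, new_value in zip(rows, transformed):
        new_row = dict(row)
        new_row[feature] = name + "_transformed"  # assumed labeling scheme
        new_row[value] = new_value
        result.append(new_row)
    return (result, fitted) if return_estimator else result


rows = [
    {"patient": "p1", "lesion": "a", "date": "2020-01-01", "feature": "size", "value": 2.0},
    {"patient": "p1", "lesion": "a", "date": "2020-02-01", "feature": "size", "value": 4.0},
    {"patient": "p1", "lesion": "a", "date": "2020-03-01", "feature": "size", "value": 6.0},
]
out, est = transform_and_label_sketch(
    rows, "feature", "value", MinMaxSketch, return_estimator=True, low=0.0, high=1.0
)
```

Passing return_estimator=True hands back the fitted transformer, which would allow the same scaling to be reused on data that was not seen during fitting.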