clintk.utils.unfold module

Unfolds and merges dataframes into one large feature matrix. Every feature is labeled with a date and two keys for identification.

More detailed explanations and schemas can be found on the repo wiki.

class clintk.utils.unfold.Unfolder(key1, key2, feature, value, date, group_date=True, n_jobs=1)

Bases: sklearn.base.BaseEstimator

Takes a dataframe with the columns [key1, key2, feature, value, date] and builds a matrix of the features grouped by [key1, key2, date].

This object is meant to be used once a timeframe of the features has been built, to group them into a feature matrix. The idea is to simplify data preparation for a sequential learning task.

Parameters:
	key1 (str) – primary key
	key2 (str) – secondary key
	feature (str) – name of the feature
	value (float) – value of the feature
	date (datetime) – date at which the feature was measured
	group_date (bool, default=True) – set to True to use the date column as a key to group data
	n_jobs (int, default=1) – number of CPUs to use for computation. If -1, all available cores are used
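To make the idea of unfolding concrete, the sketch below reshapes a long-format dataframe into a feature matrix using plain pandas. It is only an illustration of the concept and not the Unfolder implementation: the column names (patient_id, visit_id) and the values are made up, and pivot_table is an assumed stand-in for whatever the class does internally.

```python
import pandas as pd

# Long-format input: one row per (key1, key2, date, feature) measurement.
# Column names and values are invented for illustration.
long_df = pd.DataFrame({
    "patient_id": ["p1", "p1", "p1", "p2"],
    "visit_id": ["v1", "v1", "v1", "v9"],
    "feature": ["weight", "height", "weight", "weight"],
    "value": [70.0, 175.0, 72.0, 80.0],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-02-01", "2020-01-15"]
    ),
})

# Unfold: each distinct feature name becomes a column, and each row is
# identified by (key1, key2, date). Missing measurements become NaN.
wide_df = (
    long_df
    .pivot_table(index=["patient_id", "visit_id", "date"],
                 columns="feature", values="value", aggfunc="first")
    .reset_index()
    .sort_values(["patient_id", "visit_id", "date"])
    .reset_index(drop=True)
)
wide_df.columns.name = None  # drop the leftover "feature" axis label

print(wide_df)
```

Each row of wide_df now holds every feature measured for one (key1, key2, date) triple, which is the shape described above for group_date=True.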

fit(df)

Saves the dataframe on the estimator for multiprocessing convenience.

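Parameters:	df (pandas.DataFrame) – dataframe with the columns [key1, key2, feature, value, date]
Returns:	the estimator itself, with the dataframe saved
Return type:	self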

unfold()

Performs the unfolding transformation on the saved dataframe.

Returns:	The dataframe that contains the added feature columns. Rows are ordered by [key1, key2, date] for convenience.
Return type:	pandas.DataFrame
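The row ordering by [key1, key2, date] is what makes the output convenient for sequential learning. As a rough sketch of that downstream step, the code below turns an already unfolded matrix into one chronological array per (key1, key2) pair. The dataframe, its column names, and the dictionary layout are all assumptions made for illustration; none of this is part of clintk.

```python
import pandas as pd

# Hypothetical unfolded matrix: one row per (key1, key2, date),
# one column per feature. Rows are deliberately out of order.
wide_df = pd.DataFrame({
    "patient_id": ["p1", "p2", "p1"],
    "visit_id": ["v1", "v9", "v1"],
    "date": pd.to_datetime(["2020-02-01", "2020-01-15", "2020-01-01"]),
    "height": [176.0, 160.0, 175.0],
    "weight": [72.0, 80.0, 70.0],
})

keys = ["patient_id", "visit_id"]
feature_cols = ["height", "weight"]

# Order rows by [key1, key2, date] so that each entity's
# measurements appear chronologically.
ordered = wide_df.sort_values(keys + ["date"]).reset_index(drop=True)

# Build one (n_timesteps, n_features) array per (key1, key2) pair.
sequences = {
    key: group[feature_cols].to_numpy()
    for key, group in ordered.groupby(keys, sort=False)
}

for key, seq in sequences.items():
    print(key, seq.shape)
```

Because the rows are sorted before grouping, each array can be fed to a sequence model without further reordering.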

clintk.utils.unfold.transform_and_label(df, key1, key2, date, feature, value, estimator, return_estimator=False, **kwargs)

Takes a dataframe as input, applies a transformation to the value column, and returns the dataframe with new columns for the transformed feature.

The transformation returns a copy of the input dataframe.

Only unsupervised transformations are implemented.

@TODO keep sparse representation to unfold data

Parameters:
	df (pandas.DataFrame) – should contain only one unique value in its feature column
	feature (str) – feature names column
	value (str) – feature values column
	estimator (sklearn.BaseEstimator) – sklearn-compatible transformer that implements the .fit() and .transform() methods
	return_estimator (bool, default=False) – if True, returns the trained estimator
	**kwargs – additional keyword arguments for the estimator object
Returns:	Same as df, with additional rows for the transformed feature
Return type:	pandas.DataFrame
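The general pattern described here (fit a transformer on the value column without any target, then return a labeled copy of the data) can be sketched as follows. This is a simplified stand-in and not the clintk implementation: the helper transform_and_label_sketch, the MeanCenterer class, the "_transformed" suffix, and the choice to instantiate the estimator with **kwargs are all assumptions, and the key1, key2, and date arguments are left out for brevity.

```python
import pandas as pd


class MeanCenterer:
    """Hypothetical minimal transformer exposing fit() and transform()."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def fit(self, values):
        # Unsupervised: only the values themselves are used.
        self.mean_ = float(values.mean())
        return self

    def transform(self, values):
        return (values - self.mean_) * self.scale


def transform_and_label_sketch(df, feature, value, estimator, **kwargs):
    """Illustrative stand-in, not the clintk function."""
    # Work on a copy so the input dataframe is left untouched.
    out = df.copy()
    # Assumption: the estimator is a class instantiated with **kwargs.
    fitted = estimator(**kwargs).fit(out[value])
    transformed = out.copy()
    transformed[value] = fitted.transform(out[value])
    # Relabel the new rows so they can be told apart from the raw feature.
    transformed[feature] = transformed[feature] + "_transformed"
    return pd.concat([out, transformed], ignore_index=True)


df = pd.DataFrame({
    "patient_id": ["p1", "p1", "p2"],
    "visit_id": ["v1", "v1", "v9"],
    "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-01-15"]),
    "feature": ["weight", "weight", "weight"],
    "value": [70.0, 72.0, 80.0],
})

result = transform_and_label_sketch(
    df, "feature", "value", MeanCenterer, scale=1.0
)
print(result)
```

The input dataframe keeps its original values, and the result carries both the raw rows and the rows for the transformed feature, each identified by its own feature name.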