clintk.utils.unfold module
Unfolds and merges dataframes into one large feature matrix. Every feature is labeled with a date and two keys for identification.
Better explanations and schemas can be found on the repo wiki.
class clintk.utils.unfold.Unfolder(key1, key2, feature, value, date, group_date=True, n_jobs=1)
Bases: sklearn.base.BaseEstimator
Takes a dataframe[key1, key2, feature, value, date] to build a matrix of the parameters grouped by [key1, key2, date]
This object is to be used after a timeframe of the features has been built, to group them into a feature matrix. The idea is to ease data preparation for a sequential learning task.
Parameters:
 - key1 (str) – primary key
 - key2 (str) – secondary key
 - feature (str) – name of the feature
 - value (float) – value of the feature
 - date (datetime) – date at which feature was measured
 - group_date (bool, default=True) – set True to use date column as key to group data
 - n_jobs (int) – number of CPUs to use for computation. If -1, all the available cores are used
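The grouping described above can be illustrated with plain pandas. This is only a sketch of the long-to-wide idea, not the Unfolder implementation: the column names and data are hypothetical, and `pivot_table` is used here as a stand-in for whatever the class does internally.

```python
import pandas as pd

# Hypothetical long-format data: one row per measured feature.
# Column names are illustrative and not taken from clintk.
long_df = pd.DataFrame({
    "patient_id": [1, 1, 1, 2],
    "exam_id": ["a", "a", "a", "b"],
    "feature": ["weight", "height", "weight", "weight"],
    "value": [70.0, 1.80, 72.0, 65.0],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-02-01", "2020-01-01"]
    ),
})

# One row per (key1, key2, date), one column per feature name.
# Features that were not measured on a given date stay as NaN.
wide = long_df.pivot_table(
    index=["patient_id", "exam_id", "date"],
    columns="feature",
    values="value",
    aggfunc="first",
)
print(wide)
```

Dropping `"date"` from the index would presumably mirror `group_date=False`, but that mapping is an assumption rather than documented behavior.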
 
clintk.utils.unfold.transform_and_label(df, key1, key2, date, feature, value, estimator, return_estimator=False, **kwargs)
Takes a dataframe as input, applies a transformation to the value column, and returns the dataframe with new columns for the transformed feature.
The transformation returns a copy of the input dataframe.
Only unsupervised transformations are implemented.
@TODO keep sparse representation to unfold data
Parameters:
 - df (pandas.DataFrame) – should contain only one unique value in its feature column
 - feature (str) – features names column
 - value (str) – features values column
 - estimator (sklearn.BaseEstimator) – sklearn compatible transformer that implements .fit() and .transform() methods
 - return_estimator (bool) – if True, returns the trained estimator
 - **kwargs – additional keyword arguments for the estimator object
 
Returns: same as df with additional rows for the transformed feature
Return type: pandas.DataFrame
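A rough sketch of this kind of transformation, written with plain pandas rather than the clintk function itself. Several things here are assumptions: the helper `transform_and_label_sketch` and the `MeanCenterer` class are hypothetical, the key and date arguments of the real signature are omitted, and the transformed values are appended as additional rows under a new feature label (the description above mentions both new columns and additional rows, so the actual output layout may differ).

```python
import pandas as pd


class MeanCenterer:
    """Hypothetical stand-in for an sklearn-style unsupervised transformer."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        return X - self.mean_


def transform_and_label_sketch(df, feature, value, estimator):
    """Append transformed values as new rows; the input is left untouched."""
    out = df.copy()
    X = out[[value]].to_numpy()          # 2D array, as sklearn expects
    transformed = estimator.fit(X).transform(X)

    new_rows = out.copy()
    new_rows[value] = transformed[:, 0]
    # Assumed labeling scheme: suffix the original feature name.
    new_rows[feature] = new_rows[feature] + "_transformed"
    return pd.concat([out, new_rows], ignore_index=True)


# Hypothetical input with a single unique value in its feature column.
df = pd.DataFrame({
    "feature": ["weight", "weight", "weight"],
    "value": [60.0, 70.0, 80.0],
})
result = transform_and_label_sketch(df, "feature", "value", MeanCenterer())
print(result)
```

Any object exposing `.fit()` and `.transform()` could be passed in place of `MeanCenterer`, for example a scikit-learn scaler.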