clintk.utils.unfold module
Unfolds and merges dataframes into one large feature matrix. All features are labeled with a date and two keys for identification.
More detailed explanations and schemas can be found on the repo wiki.
class clintk.utils.unfold.Unfolder(key1, key2, feature, value, date, group_date=True, n_jobs=1)
Bases: sklearn.base.BaseEstimator
Takes a dataframe[key1, key2, feature, value, date] to build a matrix of the parameters grouped by [key1, key2, date]
This object is meant to be used once a timeframe of the features has been built, to group them into a feature matrix. The goal is to facilitate data preparation for a sequential learning task.
Parameters: - key1 (str) – primary key
- key2 (str) – secondary key
- feature (str) – name of the feature
- value (float) – value of the feature
- date (datetime) – date at which feature was measured
- group_date (bool, default=True) – set True to use date column as key to group data
- n_jobs (int) – number of CPUs to use for computation. If -1, all available cores are used
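The grouping described above can be illustrated with plain pandas. This is only a sketch of the general idea (a long-format table pivoted into one row per [key1, key2, date]), not clintk's implementation; the column names and sample values are made up for illustration.

```python
import pandas as pd

# Long-format input: one row per measurement.
# Column names and values are illustrative, not prescribed by clintk.
long_df = pd.DataFrame({
    "patient_id": ["p1", "p1", "p1", "p2"],
    "visit_id": ["v1", "v1", "v2", "v1"],
    "feature": ["weight", "height", "weight", "weight"],
    "value": [70.0, 1.75, 71.5, 82.0],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-02-01", "2020-01-15"]
    ),
})

# One row per (key1, key2, date), one column per feature name.
# Combinations with no measurement are left as NaN.
feature_matrix = long_df.pivot_table(
    index=["patient_id", "visit_id", "date"],
    columns="feature",
    values="value",
    aggfunc="first",
)

print(feature_matrix)
```

Each row of the result is one [key1, key2, date] group, which is the shape a sequential model typically consumes one time step at a time.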
clintk.utils.unfold.transform_and_label(df, key1, key2, date, feature, value, estimator, return_estimator=False, **kwargs)
Takes a dataframe as input, applies a transformation to the value column, and returns the dataframe extended with the transformed feature
The transformation returns a copy of the input dataframe
Only unsupervised transformations are implemented
@TODO keep sparse representation to unfold data
Parameters: - df (pandas.DataFrame) – should contain only one unique value in its feature column
- feature (str) – name of the column holding the feature names
- value (str) – name of the column holding the feature values
- estimator (sklearn.BaseEstimator) – sklearn-compatible transformer that implements the .fit() and .transform() methods
- return_estimator (bool) – if True, returns the trained estimator
- **kwargs – additional keyword arguments for the estimator object
Returns: same as df with additional rows for the transformed feature
Return type: pandas.DataFrame
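As a rough illustration of the behaviour described above, the sketch below fits an unsupervised scikit-learn transformer on the value column of a single-feature dataframe and appends the result as new, relabeled rows. The helper transform_and_label_sketch, its suffix argument, and the sample data are hypothetical; this is not clintk's code, and the real function may label or arrange its output differently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def transform_and_label_sketch(df, feature, value, estimator, suffix="_scaled"):
    """Hypothetical helper, not part of clintk.

    Fits `estimator` on the `value` column and returns a new dataframe
    holding the original rows plus relabeled rows with transformed values.
    """
    transformed = df.copy()
    # sklearn transformers expect 2D input, hence the double brackets.
    new_values = np.asarray(estimator.fit_transform(df[[value]])).ravel()
    transformed[value] = new_values
    # Relabel the feature so transformed rows can be told apart.
    transformed[feature] = transformed[feature] + suffix
    return pd.concat([df, transformed], ignore_index=True)


# Sample data with a single unique feature, as the docs require.
df = pd.DataFrame({
    "patient_id": ["p1", "p2", "p3"],
    "visit_id": ["v1", "v1", "v1"],
    "feature": ["weight", "weight", "weight"],
    "value": [60.0, 70.0, 80.0],
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
})

out = transform_and_label_sketch(df, "feature", "value", StandardScaler())
print(out)
```

The input dataframe is left untouched, in line with the note that the transformation returns a copy.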