clintk.utils.fold module
As data may come from different sources, it is best to gather all the databases into a single dataframe that makes it easy to fetch the features, as well as the dates at which the events or measurements occurred.
Doing so makes it possible to retrieve the full timeline of each patient and therefore to complete various tasks.
The objective of this module is to parse the available databases so that each one of them is organized as:
key1 | key2 | feature_name | value | date
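Once every source shares this layout, the tables can be stacked and sorted by date to obtain a patient timeline. The sketch below illustrates the idea with plain pandas; the two source tables (`labs`, `vitals`), their feature names, and their values are hypothetical and are not part of clintk.

```python
import pandas as pd

# Two hypothetical sources, already organized as
# key1 | key2 | feature_name | value | date
labs = pd.DataFrame({
    "key1": [1, 1],
    "key2": ["v1", "v2"],
    "feature_name": ["glucose", "glucose"],
    "value": [5.1, 6.0],
    "date": ["2012-12-12", "2013-01-15"],
})
vitals = pd.DataFrame({
    "key1": [1],
    "key2": ["v1"],
    "feature_name": ["heart_rate"],
    "value": [72.0],
    "date": ["2013-01-02"],
})

# Stack the sources, parse the dates, and order events per patient.
timeline = pd.concat([labs, vitals], ignore_index=True)
timeline["date"] = pd.to_datetime(timeline["date"])
timeline = timeline.sort_values(["key1", "date"]).reset_index(drop=True)

# Full timeline of the patient whose primary key is 1.
patient_1 = timeline[timeline["key1"] == 1]
print(patient_1)
```

Because all sources share the same five columns, adding a new database only requires folding it and appending it to the list passed to `pd.concat`.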
class clintk.utils.fold.Folder(key1, key2, features, date, n_jobs=1)
Bases: object
This object “unfolds” the features of a DataFrame: a df that has, for instance, 5 feature columns is turned into a df with only two feature columns, one holding the feature name and the other holding the feature value.
Apart from n_jobs, all the attributes are column names that indicate how to unfold the dataframe.
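The unfolding described above is the wide-to-long reshaping that pandas exposes as `DataFrame.melt`. The following is only a sketch of that idea on a hypothetical table; it is not the implementation of `Folder`, and its row order differs from the one shown in the `fold` examples.

```python
import pandas as pd

# Hypothetical wide table: one column per feature.
wide = pd.DataFrame({
    "id1": [1, 2],
    "id2": ["a", "b"],
    "feature_a": [0.0, 0.3],
    "feature_b": [-1.0, 1.0],
    "date": ["12122012", "12122012"],
})

# Keep the keys and the date as identifiers, and collapse the feature
# columns into a (feature, value) pair of columns.
long = wide.melt(
    id_vars=["id1", "id2", "date"],
    value_vars=["feature_a", "feature_b"],
    var_name="feature",
    value_name="value",
)

# Reorder the columns to key1 | key2 | feature | value | date.
long = long[["id1", "id2", "feature", "value", "date"]]
print(long)
```

Each input row yields one output row per feature, so a table with 2 rows and 2 feature columns becomes a table with 4 rows.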
Parameters:
- key1 (str) – name of the primary key column
- key2 (str, (optional?)) – name of the secondary key column
- features (list) – column names that contain the features
- date (str) – name of the date column
- n_jobs (int) – number of CPUs to use for computation. If -1, all the available cores are used.
fold(df_base)
Parameters: df_base (pandas DataFrame)
Returns: a DataFrame whose columns are [key1, key2, feature, value, date], where feature contains the feature names and value contains the corresponding values.
Return type: pandas.DataFrame
Examples
>>> import pandas as pd
>>> from clintk.utils import fold
>>> df = pd.DataFrame({'id1': [1, 2, 3], 'id2': ['id1', 'id2', 'id3'],
...                    'feature_a': [0, 0.3, 1.4],
...                    'date': ["12122012", "12122012", "12122012"]})
>>> folder = fold.Folder('id1', 'id2', ['feature_a'], 'date')
>>> folded = folder.fold(df)
>>> print(folded)
   id1  id2    feature  value      date
0    1  id1  feature_a    0.0  12122012
1    2  id2  feature_a    0.3  12122012
2    3  id3  feature_a    1.4  12122012

For several features:

>>> df['feature_b'] = [-1, 1, 0]
>>> folder = fold.Folder('id1', 'id2', ['feature_a', 'feature_b'],
...                      'date')
>>> folded = folder.fold(df)
>>> print(folded)
   id1  id2    feature  value      date
0    1  id1  feature_a    0.0  12122012
1    1  id1  feature_b   -1.0  12122012
2    2  id2  feature_a    0.3  12122012
3    2  id2  feature_b    1.0  12122012
4    3  id3  feature_a    1.4  12122012
5    3  id3  feature_b    0.0  12122012
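The examples above keep the dates as strings. To order events along a timeline, the date column of a folded frame would typically be parsed first. The sketch below assumes the strings follow a day-month-year layout (`%d%m%Y`); the documentation does not state the format, so this is an assumption, and the `folded` frame is rebuilt by hand here rather than produced by `Folder`.

```python
import pandas as pd

# Hand-built stand-in for a folded frame (not produced by clintk here).
folded = pd.DataFrame({
    "id1": [1, 2, 3],
    "id2": ["id1", "id2", "id3"],
    "feature": ["feature_a"] * 3,
    "value": [0.0, 0.3, 1.4],
    "date": ["12122012", "12122012", "12122012"],
})

# Assumption: dates are encoded as day, month, year without separators.
folded["date"] = pd.to_datetime(folded["date"], format="%d%m%Y")
print(folded.dtypes)
```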