clintk.cat2vec.lasso_gridsearch module
The objective of this script is to select the best categories of a high cardinality categorical feature using LASSO penalization.
For the moment, only logistic regression with a binary or continuous target is implemented.
clintk.cat2vec.lasso_gridsearch.lr_coefficients(path, features, targets, key, output_path, **kwargs)
Performs categorical variable selection using an L1-penalized logistic regression model.
It only supports binary or continuous targets for the moment.
Parameters:
- path (str) – input path or URL for the dataframe
- features (str) – name of the categorical column
- targets (str) – name of the target column in the dataframe
- key (str) – key used to group the categorical variables
- output_path (str) – path of the CSV file in which the coefficients are saved
- kwargs – keyword arguments for the hyperparameter grid
Returns:
- array – the coefficients of the L1-penalized logistic regression
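Examples
>>> import numpy as np
>>> # the key and output_path values below are placeholders
>>> lr_coefficients('input.csv', 'medication_name', 'target', key='patient_id', output_path='coefficients.csv', solver=['liblinear', 'saga'], C=np.logspace(-6, 2, 10))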
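The idea behind the selection can be illustrated with a small, dependency-free sketch. It is not the clintk implementation: the helper names (one_hot_by_key, fit_l1_logistic, select_categories), the toy data and the choice of proximal gradient descent are all assumptions made for illustration, as is the reading of key as "one indicator row per key value". The sketch shows how an L1 penalty drives the coefficients of uninformative categories to exactly zero, and how the set of retained categories depends on the penalty strength that a hyperparameter grid would explore.

```python
import math


def one_hot_by_key(rows, key, feature):
    """Build one indicator vector per key value (1.0 if that key has the category)."""
    categories = sorted({r[feature] for r in rows})
    index = {c: j for j, c in enumerate(categories)}
    grouped = {}
    for r in rows:
        vec = grouped.setdefault(r[key], [0.0] * len(categories))
        vec[index[r[feature]]] = 1.0
    return categories, grouped


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink toward zero, clip small values to 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, penalty, step=0.5, n_iter=2000):
    """Proximal gradient descent on mean log-loss + penalty * ||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            logit = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-logit))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - step * gj, step * penalty) for wj, gj in zip(w, grad)]
    return w


def select_categories(rows, targets, key, feature, penalty):
    """Return {category: coefficient} for the categories with a nonzero coefficient."""
    categories, grouped = one_hot_by_key(rows, key, feature)
    keys = sorted(grouped)
    X = [grouped[k] for k in keys]
    y = [targets[k] for k in keys]
    weights = fit_l1_logistic(X, y, penalty)
    return {c: w for c, w in zip(categories, weights) if w != 0.0}


# Hypothetical toy data: "statin" co-occurs with target 1, "aspirin" with target 0,
# and "placebo" is split evenly between the two outcomes.
rows = [
    {"patient_id": 1, "medication_name": "statin"},
    {"patient_id": 1, "medication_name": "placebo"},
    {"patient_id": 2, "medication_name": "statin"},
    {"patient_id": 3, "medication_name": "statin"},
    {"patient_id": 4, "medication_name": "aspirin"},
    {"patient_id": 4, "medication_name": "placebo"},
    {"patient_id": 5, "medication_name": "aspirin"},
    {"patient_id": 6, "medication_name": "aspirin"},
    {"patient_id": 7, "medication_name": "placebo"},
    {"patient_id": 8, "medication_name": "placebo"},
]
targets = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0, 7: 1, 8: 0}

# A two-point "grid" over the penalty strength.
strong = select_categories(rows, targets, "patient_id", "medication_name", penalty=0.5)
weak = select_categories(rows, targets, "patient_id", "medication_name", penalty=0.05)
print(sorted(strong))  # categories kept under the strong penalty
print(sorted(weak))    # categories kept under the weak penalty
```

In scikit-learn, the C parameter of LogisticRegression is the inverse of the regularization strength, so small values of C in the grid play the role of the strong penalty here and large values that of the weak one.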