clintk.cat2vec.tools module
sample script for categorical encoding
clintk.cat2vec.tools.normalize_cat(X, strat='tokens')
normalize categories in a
Parameters:
- X (iterable)
- strat (str, one of 'tokens' or 'strings', default='tokens') – if 'tokens', the words in a category are kept split (use this when embedding categories with an NLP approach); if 'strings', each category is treated as a single word
Returns: same size as input, each entry corresponding to the normalized category name
Return type: pandas.Series
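
The difference between the two `strat` values can be illustrated with a small stand-in. This is only a sketch and not the clintk implementation: the name `normalize_cat_sketch`, the lowercasing, the splitting on non-alphanumeric characters, and the underscore used to join words under 'strings' are all assumptions made for illustration. It also returns a plain list to stay dependency-free, whereas the documented function returns a pandas.Series.

```python
import re


def normalize_cat_sketch(X, strat="tokens"):
    """Hypothetical stand-in for normalize_cat; NOT the clintk implementation."""
    if strat not in ("tokens", "strings"):
        raise ValueError("strat must be 'tokens' or 'strings'")
    # Assumption: 'tokens' keeps the words of a category separated by spaces,
    # 'strings' joins them so that each category becomes a single word.
    sep = " " if strat == "tokens" else "_"
    normalized = []
    for cat in X:
        # Assumption: lowercase, then split on any run of non-alphanumerics.
        words = [w for w in re.split(r"[^0-9a-z]+", str(cat).lower()) if w]
        normalized.append(sep.join(words))
    # Same length as the input, one normalized name per entry.
    return normalized


categories = ["Chest X-Ray", "MRI  brain", "chest x-ray"]
as_tokens = normalize_cat_sketch(categories, strat="tokens")
as_strings = normalize_cat_sketch(categories, strat="strings")
```

Under these assumptions, 'tokens' leaves each category as a space-separated sequence of words that a tokenizer can split again, while 'strings' produces one atomic token per category.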