clintk.cat2vec.tools module

Sample script for categorical encoding.

clintk.cat2vec.tools.normalize_cat(X, strat='tokens')

Normalize the category names in X.

Parameters:
    X (iterable)
    strat (str, 'tokens' or 'strings', default='tokens') – if 'tokens', the words in a category are kept split (use this to embed categories with an NLP approach); if 'strings', each category is treated as a single word
Returns:	same size as input, each entry corresponding to the normalized category name
Return type:	pandas.Series
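
The two values of strat can be illustrated with a small stand-alone sketch. This is not clintk's implementation: the helper name normalize_cat_sketch is hypothetical, and the normalization rules (lowercasing, dropping non-alphanumeric characters, separating words with spaces for 'tokens' and joining them with underscores for 'strings') are assumptions made for illustration only. To stay dependency-free the sketch returns a plain list, whereas the documented function returns a pandas.Series.

```python
import re


def normalize_cat_sketch(X, strat="tokens"):
    """Hypothetical stand-in for normalize_cat, NOT the clintk code.

    Assumed rules: lowercase each category and keep only runs of
    ASCII letters and digits as words.
    """
    if strat not in ("tokens", "strings"):
        raise ValueError("strat must be 'tokens' or 'strings'")

    # 'tokens': words stay separate; 'strings': one word per category.
    # Both separators are assumptions of this sketch.
    sep = " " if strat == "tokens" else "_"

    normalized = []
    for cat in X:
        words = re.findall(r"[a-z0-9]+", str(cat).lower())
        normalized.append(sep.join(words))

    # Same length as the input, one normalized name per entry.
    return normalized


if __name__ == "__main__":
    cats = ["Lung Cancer", "  Breast-cancer "]
    print(normalize_cat_sketch(cats, strat="tokens"))
    print(normalize_cat_sketch(cats, strat="strings"))
```

Under these assumptions, 'tokens' leaves each category as several words that a tokenizer can split again later, while 'strings' collapses each category into a single vocabulary item.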