clintk.cat2vec.tools module

Sample script for categorical encoding.

clintk.cat2vec.tools.normalize_cat(X, strat='tokens')

Normalize the category names in X.

Parameters:
  • X (iterable)
  • strat (str, one of ‘tokens’ or ‘strings’, default ‘tokens’) – if ‘tokens’, the words in a category are kept split (use this when embedding categories with an NLP approach); if ‘strings’, each category is treated as a single word
Returns:

Same size as the input; each entry is the normalized category name.

Return type:

pandas.Series
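
The difference between the two strat values can be illustrated with a minimal sketch. This is not clintk's implementation: normalize_cat_sketch is a hypothetical stand-in, and both the normalization rules (lowercasing, dropping punctuation) and the separators (a space for ‘tokens’, an underscore for ‘strings’) are assumptions made for illustration. To stay dependency-free it returns a plain list, whereas the documented function returns a pandas.Series.

```python
import re


def normalize_cat_sketch(X, strat="tokens"):
    """Hypothetical stand-in for normalize_cat; NOT the clintk implementation."""
    if strat not in ("tokens", "strings"):
        raise ValueError("strat must be 'tokens' or 'strings'")

    # Assumption: 'tokens' keeps words separated by spaces, while 'strings'
    # glues them together so each category reads as a single word.
    sep = " " if strat == "tokens" else "_"

    normalized = []
    for cat in X:
        # Assumed normalization: lowercase, keep only alphanumeric runs.
        words = re.findall(r"[a-z0-9]+", str(cat).lower())
        normalized.append(sep.join(words))
    return normalized  # same length as the input


categories = ["Heart Attack", "heart-attack ", "Type 2 Diabetes"]
as_tokens = normalize_cat_sketch(categories, strat="tokens")
as_strings = normalize_cat_sketch(categories, strat="strings")
```

Under these assumptions, the first two entries collapse to the same normalized name in both modes, which is the point of normalizing before encoding.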