clintk.text2vec.transformers module
object classes for sklearn pipeline compatibility
class clintk.text2vec.transformers.AverageWords2Vector(n_components=128)
Bases: sklearn.base.BaseEstimator
Trains an unsupervised word2vec model, then folds text data according to it. This class is only a convenience for using word2vec in a pipeline.
Parameters: n_components (int, default=128) – dimension of the embedding vector
fit(parsed_reports, y=None, **kwargs)
Trains the word2vec model with the given corpus as input.
Parameters:
 - parsed_reports (iterable of iterables) – parsed, tokenized reports
 - y (None)
 - **kwargs – additional arguments passed to gensim.Word2Vec (see the gensim documentation for details)
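The input shape expected by fit, and the averaging that the class name suggests, can be sketched in plain Python. This is only an illustration under stated assumptions: the toy vectors, the average_vector helper and the choice to skip unknown tokens are hypothetical and are not taken from the clintk source.

```python
# Hypothetical stand-in for a trained word2vec model: token -> vector.
# Real vectors would have n_components dimensions (128 by default).
toy_vectors = {
    "no": [1.0, 0.0],
    "lesion": [0.0, 1.0],
    "stable": [1.0, 1.0],
}

# parsed_reports: an iterable of iterables, one list of tokens per report.
parsed_reports = [
    ["no", "lesion"],
    ["stable", "lesion", "unknown_token"],
]


def average_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; return zeros if none is known.

    Skipping unknown tokens is an assumption made for this sketch.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# One fixed-size vector per report, whatever the report length.
embedded = [average_vector(report, toy_vectors, dim=2) for report in parsed_reports]
```

Each report ends up as a single vector of fixed size, which is the kind of input a downstream sklearn estimator can consume.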
 
fit_pretrained(path, **kwargs)
Fits a pretrained model from https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
Parameters: path (str) – path to the model 
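The linked fastText page distributes vectors in, among others, a text format whose first line gives the vocabulary size and the dimension, followed by one line per token. Which format fit_pretrained actually expects is not stated here, so the following is only a sketch of reading the text format. The read_vec helper and the in-memory sample are hypothetical.

```python
import io


def read_vec(handle, limit=None):
    """Parse fastText text-format vectors into a dict token -> list of floats.

    Hypothetical helper for illustration; not part of clintk.
    """
    # Header line: "<vocabulary size> <dimension>"
    _count, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        token, values = parts[0], [float(v) for v in parts[1:]]
        if len(values) != dim:
            continue  # skip malformed lines
        vectors[token] = values
        if limit is not None and len(vectors) >= limit:
            break
    return vectors, dim


# Tiny in-memory sample standing in for a downloaded .vec file.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
vectors, dim = read_vec(sample)
```

With a real file, the handle would come from open(path, encoding="utf-8"), and limit can cap memory use on large vocabularies.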

class clintk.text2vec.transformers.Text2Vector(n_components=128, dm=1, window=3)
Bases: sklearn.base.BaseEstimator
Implementation of the Doc2Vec model adapted to sklearn for hyperparameter tuning.
fit(reports, y=None, **kwargs)
Tags reports (for consistency with gensim's model) and trains the Doc2Vec model on the corpus.
Parameters:
 - reports (iterable of iterables) – list of tokenized reports
 - y (not used, default=None)
 - **kwargs – additional arguments passed to gensim.Word2Vec (see the gensim documentation for details)
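The tagging step mentioned above can be sketched as follows. gensim's Doc2Vec trains on documents that carry both words and tags. To keep the sketch runnable without gensim, a namedtuple with the same two fields stands in for gensim's TaggedDocument. Tagging each report with its position in the corpus is an assumption, not something confirmed by the clintk source.

```python
from collections import namedtuple

# Stand-in with the same fields as gensim's TaggedDocument (words, tags),
# so that the sketch runs without gensim installed.
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])

# reports: list of tokenized reports, as expected by fit.
reports = [
    ["no", "lesion"],
    ["stable", "disease"],
]

# Assumption: each report is tagged with its position in the corpus.
tagged_reports = [
    TaggedDocument(words=tokens, tags=[index])
    for index, tokens in enumerate(reports)
]
```

A list built this way has the shape that a Doc2Vec model takes as its training corpus.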
 