Gensim
Installation
Testing whether the fast version is installed:
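One way to run this check is sketched below. It assumes the `FAST_VERSION` attribute of `gensim.models.word2vec`, where a value of -1 indicates the slow pure-Python code path and 0 or higher indicates that the compiled routines were loaded. The helper name `describe_fast_version` is made up for this note, and the function degrades gracefully if gensim is missing or the attribute is absent in a given release.

```python
def describe_fast_version():
    """Return a short message about gensim's word2vec training backend."""
    try:
        from gensim.models import word2vec
    except ImportError:
        return "gensim is not installed"

    # Assumption: FAST_VERSION is -1 when only the slow pure-Python path is
    # available, and 0 or higher when the compiled routines were loaded.
    fast_version = getattr(word2vec, "FAST_VERSION", None)
    if fast_version is None:
        return "this gensim release does not expose FAST_VERSION"
    if fast_version < 0:
        return "slow pure-Python version (FAST_VERSION = %d)" % fast_version
    return "fast compiled version (FAST_VERSION = %d)" % fast_version


print(describe_fast_version())
```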
Models
Phrases
This model detects multi-word phrases that can be grouped into a single token, such as new_york_times. It can be used as a preprocessing step for word2vec or doc2vec models.
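To make the idea concrete, here is a minimal pure-Python sketch of bigram phrase detection. It is not gensim's implementation: the function names `score_bigrams` and `merge_phrases` are hypothetical, and the scoring formula is an assumption modeled on the commonly described count-based score, (count(a, b) - min_count) * vocab_size / (count(a) * count(b)).

```python
from collections import Counter


def score_bigrams(sentences, min_count=1):
    """Score every adjacent word pair; higher means 'more phrase-like'."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))

    vocab_size = len(unigrams)
    scores = {}
    for (a, b), n_ab in bigrams.items():
        # Assumed scoring rule: discount rare pairs by min_count and
        # normalize by how common each word is on its own.
        scores[(a, b)] = (n_ab - min_count) * vocab_size / (unigrams[a] * unigrams[b])
    return scores


def merge_phrases(sentence, scores, threshold):
    """Join adjacent words whose pair score exceeds the threshold."""
    merged, i = [], 0
    while i < len(sentence):
        pair = tuple(sentence[i:i + 2])
        if len(pair) == 2 and scores.get(pair, 0.0) > threshold:
            merged.append("_".join(pair))
            i += 2
        else:
            merged.append(sentence[i])
            i += 1
    return merged


sentences = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
    ["the", "city", "is", "old"],
]
scores = score_bigrams(sentences, min_count=1)
print(merge_phrases(["i", "love", "new", "york"], scores, threshold=1.0))
# prints ['i', 'love', 'new_york']
```

A single pass only produces two-word phrases. Getting a three-word phrase such as new_york_times would take a second pass over the already merged sentences.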
word2vec
A Vocab object holds the properties of one word: its frequency (count) and others such as sample_int, which is used when subsampling frequent words.
Let V be the size of the vocabulary and N be the dimension of the hidden layer (the vector dimension).
model.syn0: a V x N matrix. model.syn0[word_index] returns the word vector for the word with that index.
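The relationship between the vocabulary and the weight matrix can be sketched with plain Python lists. Everything here is a stand-in rather than gensim code: the `Vocab` dataclass, the helper `build_vocab_and_syn0`, and the random initialization are assumptions made only to show how a word maps to an index and an index maps to a row of a V x N matrix.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Vocab:
    """Hypothetical stand-in for a per-word vocabulary entry."""
    count: int  # how often the word occurs in the corpus
    index: int  # row of this word in syn0


def build_vocab_and_syn0(sentences, n_dim, seed=0):
    counts = Counter(word for sentence in sentences for word in sentence)
    # Assumption: more frequent words receive lower indexes.
    vocab = {
        word: Vocab(count=count, index=i)
        for i, (word, count) in enumerate(counts.most_common())
    }
    rng = random.Random(seed)
    # syn0 has V rows (one per word) and N columns (the vector dimension),
    # filled here with small random values.
    syn0 = [[(rng.random() - 0.5) / n_dim for _ in range(n_dim)] for _ in vocab]
    return vocab, syn0


sentences = [["new", "york", "is", "big"], ["i", "love", "new", "york"]]
vocab, syn0 = build_vocab_and_syn0(sentences, n_dim=5)

word_index = vocab["york"].index  # word -> integer index
word_vector = syn0[word_index]    # index -> row of the V x N matrix
print(len(syn0), len(word_vector))
# prints 6 5
```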
doc2vec
Doc2Vec class
_do_train_job(self, job, alpha, inits): job is just a batch of sentences.
DocvecsArray
The document vectors are stored in this object.
indexed_doctags(self, doctag_tokens): given doctag_tokens (a list of document tags), returns the tuple (integer indexes, self.doctag_syn0, self.doctag_syn0_lockf, doctag_tokens).
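The shape of that return value is easier to see in a toy version. The class below, `TinyDocvecs`, is a hypothetical stand-in written for this note and not gensim's DocvecsArray. It assumes that tags are mapped to row indexes through a dictionary, that unknown tags are skipped, and that `doctag_syn0_lockf` holds one lock factor per document (1.0 meaning the vector may be updated).

```python
class TinyDocvecs:
    """Toy container for document vectors (not gensim's DocvecsArray)."""

    def __init__(self, tags, n_dim):
        # tag -> row index into doctag_syn0
        self.doctags = {tag: i for i, tag in enumerate(tags)}
        # one N-dimensional vector per document
        self.doctag_syn0 = [[0.0] * n_dim for _ in tags]
        # Assumption: one lock factor per document, 1.0 = trainable.
        self.doctag_syn0_lockf = [1.0] * len(tags)

    def indexed_doctags(self, doctag_tokens):
        # Assumption: tags that were never registered are skipped.
        indexes = [self.doctags[tag] for tag in doctag_tokens if tag in self.doctags]
        return indexes, self.doctag_syn0, self.doctag_syn0_lockf, doctag_tokens


docvecs = TinyDocvecs(["doc_a", "doc_b", "doc_c"], n_dim=4)
indexes, syn0, lockf, tokens = docvecs.indexed_doctags(["doc_c", "doc_a"])
print(indexes)
# prints [2, 0]
```

Returning the backing arrays together with the indexes means a training routine can read and update the right rows without going back through the container.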