Gensim
Installation
Testing whether the fast version is installed:
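A minimal sketch of such a check, assuming the installed gensim exposes a FAST_VERSION flag in gensim.models.word2vec (older releases set it to -1 when only the slow pure-Python path is available). The helper name fast_version_status is made up for illustration.

```python
def fast_version_status():
    """Return gensim's FAST_VERSION flag, or None if it cannot be read."""
    try:
        from gensim.models import word2vec
    except (ImportError, RuntimeError):
        # Assumption: a missing or broken install surfaces as one of these errors.
        return None
    return getattr(word2vec, "FAST_VERSION", None)


status = fast_version_status()
if status is None:
    print("gensim (or its FAST_VERSION flag) is not available")
elif status > -1:
    print("fast (compiled) version is installed, FAST_VERSION =", status)
else:
    print("only the slow pure-Python version is available")
```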
Models
Phrases
This model detects multi-word phrases that can be grouped into a single token, such as new_york_times. It can be used as a preprocessing step for word2vec or doc2vec models.
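To make the idea concrete, here is a simplified pure-Python sketch of count-based phrase detection. It is not gensim's implementation: the function merge_phrases, its parameters, and the scoring formula (bigram count minus min_count, scaled by vocabulary size and divided by the product of the unigram counts) are assumptions chosen for illustration.

```python
from collections import Counter


def merge_phrases(sentences, min_count=1, threshold=1.0, delimiter="_"):
    """Join adjacent word pairs whose co-occurrence score exceeds threshold."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
    vocab_size = len(unigrams)

    def score(a, b):
        # Assumed scoring rule: pairs seen often relative to their parts score high.
        return (bigrams[(a, b)] - min_count) * vocab_size / (unigrams[a] * unigrams[b])

    merged = []
    for s in sentences:
        out, i = [], 0
        while i < len(s):
            if i + 1 < len(s) and score(s[i], s[i + 1]) > threshold:
                out.append(s[i] + delimiter + s[i + 1])
                i += 2  # skip the second word of the merged pair
            else:
                out.append(s[i])
                i += 1
        merged.append(out)
    return merged


sentences = [
    ["i", "read", "new", "york"],
    ["new", "york", "is", "big"],
    ["i", "like", "new", "york"],
]
phrased = merge_phrases(sentences)
print(phrased)
```

Running a second pass over the merged output is how longer phrases such as new_york_times could be built up from bigrams.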
word2vec
A Vocab object holds a word's frequency (count) and other properties (e.g. sample_int, which is used for sampling).
Let V be the size of the vocabulary and N the dimension of the hidden layer (the vector dimension).
model.syn0: a V × N matrix. model.syn0[word_index] returns the word vector for the word at that index.
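The lookup can be pictured with a toy stand-in. This is a sketch only: the Vocab dataclass, the index field, and the hand-written syn0 values are assumptions made for illustration, not objects taken from a trained model.

```python
from dataclasses import dataclass


@dataclass
class Vocab:
    count: int  # how often the word occurred in the corpus
    index: int  # row of this word in the syn0 matrix (assumed field)


# Toy vocabulary with V = 3 words.
vocab = {
    "new": Vocab(count=3, index=0),
    "york": Vocab(count=3, index=1),
    "big": Vocab(count=1, index=2),
}

# Toy V x N weight matrix with N = 4; one row per word.
syn0 = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]


def word_vector(word):
    """Return the row of syn0 that belongs to the given word."""
    return syn0[vocab[word].index]


print(word_vector("york"))
```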
doc2vec
Doc2Vec class
_do_train_job(self, job, alpha, inits): job is just the sentences.
DocvecsArray
The document vectors are stored in this object.
indexed_doctags(self, doctag_tokens): given doctag_tokens (a list of document tags), returns the tuple (integer indexes, self.doctag_syn0, self.doctag_syn0_lockf, doctag_tokens).
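The shape of that return value can be sketched with a toy class. ToyDocvecs below is hypothetical and only mimics the behaviour described above; the constructor, the tag-to-index dict, and the meaning of the lock factors are assumptions, not gensim's code.

```python
class ToyDocvecs:
    """Toy container for document vectors, loosely modelled on the notes above."""

    def __init__(self, tags, vector_size):
        # Map each document tag to the row that holds its vector.
        self.doctags = {tag: i for i, tag in enumerate(tags)}
        # One vector per document, initialised to zeros for simplicity.
        self.doctag_syn0 = [[0.0] * vector_size for _ in tags]
        # Assumed meaning: 1.0 lets a vector be updated during training.
        self.doctag_syn0_lockf = [1.0] * len(tags)

    def indexed_doctags(self, doctag_tokens):
        """Return known tag indexes plus the backing arrays and the input tags."""
        indexes = [self.doctags[t] for t in doctag_tokens if t in self.doctags]
        return indexes, self.doctag_syn0, self.doctag_syn0_lockf, doctag_tokens


docvecs = ToyDocvecs(["doc_a", "doc_b", "doc_c"], vector_size=3)
tokens = ["doc_c", "doc_a", "missing"]
indexes, syn0, lockf, returned_tokens = docvecs.indexed_doctags(tokens)
print(indexes)
```

In this sketch, tags that are not known are skipped when building the index list, while the input list itself is passed back unchanged.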