How To Perform Efficient Queries With Gensim Doc2vec?
I’m working on a sentence similarity algorithm with the following use case: given a new sentence, I want to retrieve its n most similar sentences from a given set. I am using Gen
Solution 1:
Creating your own subset of vectors, as a KeyedVectors instance, isn't quite as easy as it could or should be. 
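But you should be able to use a WordEmbeddingsKeyedVectors (even though you're working with doc-vectors) that you load with just the vectors of interest. I haven't tested this, but assuming d2v_model is your Doc2Vec model and list_of_tags holds the tags you want in your subset, try something like the following. Note that this is the Gensim 3.x API; as far as I know, in Gensim 4.x the class is plain KeyedVectors, docvecs is renamed dv, and add() is renamed add_vectors().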
from gensim.models.keyedvectors import WordEmbeddingsKeyedVectors

subset_vectors = WordEmbeddingsKeyedVectors(d2v_model.vector_size)
subset_vectors.add(list_of_tags, d2v_model.docvecs[list_of_tags])
Then you can perform the usual operations, like most_similar() on subset_vectors.
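To make clear what a most_similar() call restricted to a subset boils down to, here is a minimal sketch in plain Python with no Gensim involved. It is not Gensim's implementation: the all_vectors dict, the tag names, and the most_similar_in_subset helper are all hypothetical stand-ins for the model's doc-vectors, and the ranking is an ordinary cosine-similarity scan over the chosen tags only.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def most_similar_in_subset(query_vec, all_vectors, subset_tags, topn=10):
    """Rank only the tags in subset_tags by cosine similarity to query_vec.

    all_vectors is a hypothetical dict mapping tag -> vector; it stands in
    for the doc-vectors you would pull out of the trained model.
    """
    scored = ((tag, cosine(query_vec, all_vectors[tag])) for tag in subset_tags)
    return heapq.nlargest(topn, scored, key=lambda pair: pair[1])


# Toy 2-d vectors, purely illustrative.
all_vectors = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.9, 0.1],
    "doc_c": [0.0, 1.0],
    "doc_d": [-1.0, 0.0],
}
subset_tags = ["doc_b", "doc_c", "doc_d"]  # doc_a is deliberately left out

results = most_similar_in_subset([1.0, 0.0], all_vectors, subset_tags, topn=2)
print(results)
```

Since doc_a is not in subset_tags it can never appear in the results, even though it matches the query exactly. A dedicated subset_vectors instance gives you the same guarantee, and it also avoids scoring vectors you don't care about.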