Word2Vec / GloVe / fastText - How to implement and use them

In this post we will cover the basic concepts of word2vec and see how to implement and use it.

Previously we have seen word representations like Count Vector/TF-IDF. While these models are useful, they are based simply on the frequency of words, so they lose most of the language characteristics and the meanings of the words. Word2Vec is a model that maps words into a vector space in which similar words have vectors that lie close together.
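To make the limitation of frequency-based models concrete, here is a minimal sketch in plain Python. The two sentences are made up for illustration. It builds count vectors by hand and shows that "good" and "great" end up in unrelated dimensions, so the representation knows nothing about their similar meaning.

```python
from collections import Counter

# Two made-up example sentences (hypothetical data, for illustration only).
docs = ["the movie was good", "the movie was great"]

# The vocabulary is every distinct word, in a fixed (sorted) order.
vocab = sorted({word for doc in docs for word in doc.split()})

def count_vector(text):
    """Return how often each vocabulary word occurs in the text."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]

doc_vectors = [count_vector(doc) for doc in docs]

# In a count model each word is just its own column (a one-hot vector).
good = [1 if word == "good" else 0 for word in vocab]
great = [1 if word == "great" else 0 for word in vocab]

# The dot product is 0: the model sees "good" and "great" as unrelated.
overlap = sum(a * b for a, b in zip(good, great))

print(vocab)
print(doc_vectors)
print(overlap)
```

A dense embedding model such as word2vec would instead place "good" and "great" near each other, because they appear in similar contexts.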

This means that word vectors carry actual meaning and are not just random numbers. Because the numbers now encode meaning, we can add or subtract word vectors to find new words!
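The well-known example of this arithmetic is king - man + woman, which lands near queen. The sketch below illustrates the idea with hand-made three-dimensional vectors. The numbers are invented for this example and do not come from a trained model (real embeddings usually have 50 to 300 dimensions). The helper names `cosine` and `analogy` are also just illustrative.

```python
import math

# Hand-made toy vectors (hypothetical values, NOT from a trained model).
# The dimensions loosely stand for: royalty, maleness, femaleness.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(positive, negative):
    """Add the positive words, subtract the negative ones, and return
    the closest remaining word by cosine similarity."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    # The input words themselves are excluded from the candidates.
    candidates = [w for w in vectors if w not in positive + negative]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

result = analogy(positive=["king", "woman"], negative=["man"])
print(result)
```

With trained vectors the same query is usually written with gensim's `most_similar(positive=[...], negative=[...])`, which follows the same add, subtract, and rank-by-cosine idea.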

This is very counterintuitive at first, but once you understand the model in detail it becomes very exciting to play around with.

Below are some must-read articles to understand word2vec in detail:

https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469

http://jalammar.github.io/illustrated-word2vec/

An Intuitive Understanding of Word Embeddings: From Count Vectors to Word2Vec

If you have read the above articles carefully, two things should be clear: we can either use pre-trained word vectors such as GloVe, or train our own word2vec model on our own data set using a library like gensim.

Let’s see how to implement both of them.

Word Embeddings in Python with Spacy and Gensim

https://www.machinelearningplus.com/nlp/gensim-tutorial/

fastText vectors are similar word vectors produced by Facebook. You can read about them here: https://fasttext.cc/docs/en/english-vectors.html

To use them in code, follow this: https://stackoverflow.com/questions/50828314/how-does-the-gensim-fasttext-pre-trained-model-get-vectors-for-out-of-vocabulary
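The reason fastText can produce a vector for a word it never saw in training is that it represents each word through its character n-grams. Below is a simplified sketch of that idea in plain Python. The helper names and the toy n-gram table are hypothetical. The real library additionally hashes n-grams into a fixed number of buckets, which is left out here.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Split a word into character n-grams, using '<' and '>' as
    word-boundary markers (3 to 6 characters is the fastText default)."""
    token = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams

def oov_vector(word, ngram_vectors, dim):
    """Approximate a vector for an unseen word by averaging the vectors
    of those of its n-grams that are present in the table."""
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        return [0.0] * dim  # nothing to build from
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Toy n-gram table with invented 2-d values (not from a trained model).
toy_ngram_vectors = {"<wh": [1.0, 0.0], "ere": [0.0, 1.0]}

trigrams = char_ngrams("where", min_n=3, max_n=3)
vector = oov_vector("where", toy_ngram_vectors, dim=2)

print(trigrams)
print(vector)
```

Because a misspelled or rare word still shares most of its n-grams with words seen in training, its vector ends up close to theirs. Plain word2vec and GloVe have no vector at all for such words.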

https://shuzhanfan.github.io/2018/08/understanding-word2vec-and-doc2vec/

General playing around

https://colab.research.google.com/drive/1W7_F0JaU6Xyhfyyi3Sq_QkTTfsrz7sCx
