Skip-gram negative sampling as (unshifted) PMI matrix factorization

In a previous post, we arrived at two formulas showing the equivalence between SGNS (skip-gram with negative sampling) and shifted PMI:

p(D|w,c) = \sigma(w \cdot c) = \frac{1}{1 + e^{-w \cdot c}}    (1)

p(D|w,c) = \frac{1}{1 + ke^{-\mathrm{PMI}(w,c)}}    (2)

Evidently, the reason for the “shift” is that in (1) there is no k, while in (2) there is. The “shift” is not just an ugly patch in the formula; it may also hurt the quality of the learned embeddings.
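To make the shift explicit, equate (1) and (2) and solve for w \cdot c:

\frac{1}{1 + e^{-w \cdot c}} = \frac{1}{1 + ke^{-\mathrm{PMI}(w,c)}} \;\Rightarrow\; e^{-w \cdot c} = ke^{-\mathrm{PMI}(w,c)} \;\Rightarrow\; w \cdot c = \mathrm{PMI}(w,c) - \log k

So at the optimum, SGNS factorizes the PMI matrix shifted down by \log k.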

The solution is easy: replace \sigma(x) with \sigma_k(x)=\frac{1}{1+ke^{-x}}, where k is the number of negative examples. This translates into modifying line #698 of word2vec.c in Google’s original source code to:

    expTable[i] = expTable[i] / (expTable[i] + negative);  // was: expTable[i] / (expTable[i] + 1)
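A minimal Python sketch of why this one-line change works: word2vec.c precomputes a table of e^x over [-MAX_EXP, MAX_EXP) and then normalizes each entry. Dividing e^x by (e^x + k) instead of (e^x + 1) turns the table from \sigma(x) into \sigma_k(x), since e^x/(e^x+k) = 1/(1+ke^{-x}). The constants below mirror the defaults in the original source.

```python
import math

def sigma_k(x, k):
    # Generalized sigmoid 1 / (1 + k * exp(-x)); k = 1 recovers the standard sigmoid.
    return 1.0 / (1.0 + k * math.exp(-x))

MAX_EXP = 6           # word2vec.c default
EXP_TABLE_SIZE = 1000 # word2vec.c default
negative = 5          # k, the number of negative samples

exp_table = []
for i in range(EXP_TABLE_SIZE):
    x = (i / EXP_TABLE_SIZE * 2 - 1) * MAX_EXP  # x spans [-MAX_EXP, MAX_EXP)
    e = math.exp(x)                             # table initially stores e^x
    exp_table.append(e / (e + negative))        # the modified normalization

# Each table entry now equals sigma_k(x) at the corresponding x.
for i in (0, 500, 999):
    x = (i / EXP_TABLE_SIZE * 2 - 1) * MAX_EXP
    assert abs(exp_table[i] - sigma_k(x, negative)) < 1e-12
```

With negative = 1 the table reduces to the ordinary logistic table used by unmodified word2vec.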

This small change yields a small but consistent improvement on similarity benchmarks such as SimLex-999 (see figure). Not too bad, right? The source code is available in a Bitbucket repository.

Figure: Comparing original and modified negative sampling.

