# Reproducing Chen & Manning (2014)

Neural dependency parsing is attractive for several reasons. First, distributed representations generalize better. Second, fast parsing unlocks new applications. Third, fast training means parsers can be co-trained with other NLP modules and integrated into a bigger system.

Chen & Manning (2014) from Stanford were the first to show that neural dependency parsing works, and Google folks were quick to adopt this paradigm to improve the state of the art (e.g. Weiss et al., 2015).

Though Stanford open-sourced their parser as part of CoreNLP, they didn’t release the code for their experiments. As anybody in academia probably knows, reproducing experiments is non-trivial and at times extremely difficult. Since I have painstakingly gone through the process, I think it’s worth sharing with you.

# Skip-gram negative sampling as (unshifted) PMI matrix factorization

In the previous post, we arrived at two formulas that show the equivalence between SGNS and shifted PMI:

$p(D|w,c) = \sigma(w \cdot c) = \frac{1}{1 + e^{-w \cdot c}}$    (1)

$p(D|w,c) = \frac{1}{1 + ke^{-\mathrm{PMI}(w,c)}}$    (2)

Evidently, the reason for the “shift” is that in (1) there is no $k$ while in (2) there is. The “shift” is not just an ugly patch in the formula; it might also have a negative effect on the quality of the learned embeddings.
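A quick numerical check makes the role of $k$ concrete: if the dot product $w \cdot c$ in (1) equals $\mathrm{PMI}(w,c) - \log k$, then (1) and (2) give the same probability, and the two coincide without any shift only when $k = 1$. This is a minimal Python sketch; the PMI values and values of $k$ are arbitrary and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used in equation (1)."""
    return 1.0 / (1.0 + math.exp(-x))


def p_eq1(dot: float) -> float:
    # Equation (1): p(D|w,c) = sigma(w . c)
    return sigmoid(dot)


def p_eq2(pmi: float, k: int) -> float:
    # Equation (2): p(D|w,c) = 1 / (1 + k * exp(-PMI(w,c)))
    return 1.0 / (1.0 + k * math.exp(-pmi))


# Illustrative PMI values and negative-sample counts (not from any experiment).
for pmi in (-1.0, 0.0, 2.5):
    for k in (1, 5, 15):
        shifted = pmi - math.log(k)  # w . c shifted by log k
        print(f"PMI={pmi:+.1f} k={k:2d}  "
              f"eq1={p_eq1(shifted):.6f}  eq2={p_eq2(pmi, k):.6f}")
```

Both columns agree on every row, because $\sigma(\mathrm{PMI} - \log k) = 1/(1 + e^{\log k}e^{-\mathrm{PMI}}) = 1/(1 + k e^{-\mathrm{PMI}})$. Plugging the unshifted PMI into (1) would only match (2) for $k = 1$.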

# A new proof of the equivalence of word2vec’s SGNS and Shifted PMI


At the heart of the argument was Levy and Goldberg’s proof that minimizing the loss of Skip-gram negative sampling (SGNS) effectively approximates a shifted PMI matrix. Starting with the log-likelihood, they worked their way to a local objective for each word–context pair and set its derivative to zero to arrive at a function of PMI. One might reasonably ask whether the loss function is essential to this proof, or whether there is a deeper link between the two formalizations.
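Written out with $x = w \cdot c$ and counts $\#(w,c)$, $\#(w)$, $\#(c)$ over a corpus of $|D|$ word–context pairs, the local objective is $\ell(x) = \#(w,c)\log\sigma(x) + k\,\frac{\#(w)\,\#(c)}{|D|}\log\sigma(-x)$, and setting $\ell'(x) = 0$ gives $x = \mathrm{PMI}(w,c) - \log k$. The sketch below checks this numerically by finding the zero of the derivative with bisection rather than solving it in closed form. The function names and the counts are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def local_objective_grad(x, n_wc, n_w, n_c, n_total, k):
    """d/dx of l(x) = #(w,c) log sigma(x) + k #(w) #(c) / |D| log sigma(-x)."""
    expected_neg = k * n_w * n_c / n_total  # expected number of negative samples
    # d/dx log sigma(x) = sigma(-x);  d/dx log sigma(-x) = -sigma(x)
    return n_wc * sigmoid(-x) - expected_neg * sigmoid(x)


def optimal_dot(n_wc, n_w, n_c, n_total, k, lo=-50.0, hi=50.0, iters=200):
    """Find the x = w . c that maximizes the local objective.

    The gradient is strictly decreasing in x (positive as x -> -inf,
    negative as x -> +inf), so bisection converges to its unique zero.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if local_objective_grad(mid, n_wc, n_w, n_c, n_total, k) > 0:
            lo = mid  # still climbing: the optimum lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pmi(n_wc, n_w, n_c, n_total):
    # PMI(w,c) = log( #(w,c) * |D| / (#(w) * #(c)) )
    return math.log(n_wc * n_total / (n_w * n_c))


if __name__ == "__main__":
    # Hypothetical counts for a single word-context pair.
    n_wc, n_w, n_c, n_total, k = 30, 1_000, 500, 1_000_000, 5
    x_star = optimal_dot(n_wc, n_w, n_c, n_total, k)
    print(f"optimal w.c = {x_star:.4f}")
    print(f"PMI - log k = {pmi(n_wc, n_w, n_c, n_total) - math.log(k):.4f}")
```

With these counts both numbers come out to $\log 12 \approx 2.4849$, while the unshifted PMI is $\log 60$. Note that the check only uses the SGNS loss through its derivative, which is exactly the step whose necessity the question above is about.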

# Similarity, co-occurrence, functional relation, part-whole relation, subcategorization, what else?

In word sense disambiguation and named-entity disambiguation, an important assumption is that a document consists of related concepts and entities.

There are millions of concepts and entities; what makes some of them related but not others? This question is difficult and I don’t have the definitive answer, but listing some classes of relatedness is a good start.

# Knowledge base completion 101

Knowledge base completion (KBC) is not a standard task in natural language processing or in machine learning. A search on Google Scholar returns only a little over 100 articles containing this phrase. Although it is similar to link prediction, “a long-standing challenge in modern information science” (Lü & Zhou, 2011), it has received much less attention.

However, KBC is potentially an important step towards natural language understanding, and recent advances in representation learning have enabled researchers to learn from larger datasets with improved precision. In fact, half of the KBC articles were published in or after 2010.

# Intuition, knowledge base completion and natural language understanding

Everybody has experienced the gut feeling that we “know” something but can’t explain why. It might be a sense that a new classmate will become your best friend, or that something is wrong in a situation. Human intuition plays an important role in our everyday lives as well as in business, science and technological innovation.

# Natural language understanding: Will we ever get there?

When I was in high school in Vietnam, my teachers insisted that the sole goal of our high school years was to get us into a good university. As in many other Asian countries, there was an extremely competitive national college entrance examination, and girls, boys and their parents would pay any price to succeed in it. For two years, my friends and I went to night classes five days a week and spent our summers in the classroom. I was accepted to a prestigious university but never used anything I had learnt again.