Two ideas of error control in natural language processing

While I was still active in NLP, one thing that struck me was how hard it is to interpret errors. When people make errors, for example while reading garden-path sentences, it’s easy to see why: maybe the verb is used in an uncommon way, or we are too eager to connect words together. But the errors of a statistical parser don’t make any sense. They appear totally random, at least to me.

The multi-layered approach to language analysis has a rich tradition but little real-world application. These days, people in industry use neural end-to-end models instead. One reason is that performance on many of the intermediate tasks is still lower than it needs to be: a syntactic parser with 92% per-token accuracy sounds great but, assuming each sentence has 20 tokens, it gets a sentence completely correct only 0.92^20 ≈ 19% of the time. Errors throw off further processing and we don’t know what to do with them; so far, we can’t even detect them.

Error control seems to be the most pressing neglected problem in NLP. If we could detect and ignore stupid errors, we could have much more confidence in the rest of the analysis (i.e. trading recall for precision), and more applications would open up. Thresholding, where we remove predictions whose scores fall below a certain bar, is an obvious first choice. That makes calibrating the scores or probabilities of models to match the empirical chance of success a worthy research direction (which is, sadly, also under-researched). I’d like to add two more ideas for error control, borrowed from telecommunications.
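To make the thresholding idea concrete, here is a minimal sketch in Python (my own toy illustration; the function and the scores are made up, and the scores are only trustworthy to the extent that the model is calibrated):

```python
import numpy as np

def filter_by_confidence(predictions, scores, threshold=0.9):
    """Keep only predictions whose confidence clears the bar.

    Returns the surviving predictions plus a boolean mask, so that
    downstream components know which positions were dropped.
    """
    keep = np.asarray(scores) >= threshold
    kept = [p for p, k in zip(predictions, keep) if k]
    return kept, keep

# Toy usage: three predicted dependency labels with model scores.
preds = ["nsubj", "dobj", "amod"]
scores = [0.97, 0.55, 0.91]
kept, mask = filter_by_confidence(preds, scores, threshold=0.9)
print(kept)  # ['nsubj', 'amod'] -- fewer answers, but more reliable ones
```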


Replicable by design

At EACL last year, I had a lengthy chat with a guy next to his poster about the (ir)replicability of some high-profile papers in information retrieval [1]. During my own five years of research, I often ran into reproducibility problems too. Probably many PhD students out there have had similar experiences.

Obviously, researchers should take full responsibility for producing replicable research. But we should also recognize the underlying systemic issue: researchers are not rewarded for making their work repeatable. Once a paper is accepted, you are already in the middle of a new one, so there’s no time to make your old code re-runnable (if that’s possible at all). Added to that, the likelihood (or threat) of your work being reproduced is terribly small: there are few reports of reproducibility problems in NLP, and retracted papers are non-existent. While big conferences are starting to address the problem (COLING 2018 has a track for reproduction, and LREC 2018 also mentions “replicability and reproducibility issues”), I suspect it will take years for the effect to be felt.

In the meantime, what we can do is align the effort with the incentive. Ideally, it should take no extra work to make your research replicable. The solution, I think, is to make experiments replicable by design.
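The concrete proposal comes after the cut, but to give a flavour of what “replicable by design” can mean, here is a generic, hypothetical sketch (mine, not necessarily the post’s method): drive every run from a single config, seed included, and store the config next to the result so the run can be repeated verbatim.

```python
import json
import random

def run_experiment(config):
    """Hypothetical runner: everything that affects the outcome
    comes from the config, including the random seed."""
    random.seed(config["seed"])
    # ... training and evaluation would go here; the metric below
    # is a placeholder, but it is deterministic given the seed.
    return {"accuracy": round(random.uniform(0.85, 0.95), 3)}

config = {"seed": 42, "model": "parser-v1", "train_data": "ptb-train"}
result = run_experiment(config)

# Store the config next to the result: the record *is* the experiment.
with open("run-42.json", "w") as f:
    json.dump({"config": config, "result": result}, f, indent=2)
```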

Phrase Detectives caught me by surprise

[Screenshot: my Phrase Detectives score]

I got 52% playing Phrase Detectives on Facebook. How am I ever going to get a PhD in Natural Language Processing?

Just kidding, I’m not worried at all about graduation, just a bit surprised by some features of the game. I’m studying the possibility of running a crowdsourcing task on coreference resolution, so I’m very much interested in how to do crowdsourcing properly. These are the things that I found surprising:

Notes on machine learning and exceptions

Statistical machine learning has been the de-facto standard in NLP research and practice. However, its very success might be hiding its problems. One such problem is exceptions.

Natural language is full of exceptions: idiomatic phrases that defy compositionality, irregular verbs and other exceptions to grammatical rules, or unexpected events that, though not linguistic phenomena themselves, happen to be communicated via language. So far, statistical NLP has treated them as inconvenient oddities and, in most cases, swept them under the rug, hoping that they wouldn’t reduce the F-score.
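As a toy illustration of what handling an exception can look like (my example, not the post’s): the classic words-and-rules pattern, where a small table of memorized exceptions overrides a productive rule.

```python
# Memorized exceptions override the productive rule.
IRREGULAR_PAST = {"go": "went", "eat": "ate", "run": "ran"}

def past_tense(verb):
    """Check the exception list first, then fall back to the
    regular "-ed" rule."""
    return IRREGULAR_PAST.get(verb, verb + "ed")

assert past_tense("walk") == "walked"  # regular rule
assert past_tense("go") == "went"      # memorized exception
```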

But a system doesn’t really understand language without handling exceptions, and I will argue that (not) handling exceptions has important consequences for machine learning.

Knowledge base completion 101

Knowledge base completion (KBC) is not a standard task in natural language processing, nor in machine learning. A search on Google Scholar turns up only a little over 100 articles containing the phrase. Although it is similar to link prediction, “a long-standing challenge in modern information science” (Lü & Zhou, 2011), it has received much less attention.

However, KBC is potentially an important step towards natural language understanding, and recent advances in representation learning have enabled researchers to learn from larger datasets with improved precision. In fact, half of all KBC articles were published in or after 2010.
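To give a flavour of these representation-learning approaches, here is a minimal sketch of the scoring function of one well-known KBC model, TransE (Bordes et al., 2013). The embeddings below are random stand-ins; a real system trains them so that head + relation ≈ tail for true triples:

```python
import numpy as np

def transe_score(head, relation, tail):
    """TransE plausibility: the closer head + relation is to tail
    in embedding space, the more plausible the triple."""
    return -np.linalg.norm(head + relation - tail)

# Random stand-in embeddings; a trained model would place
# (Paris, capital_of, France) closer together than corrupted triples.
rng = np.random.default_rng(0)
emb = {name: rng.normal(size=50)
       for name in ("Paris", "France", "Vietnam", "capital_of")}

true_triple = transe_score(emb["Paris"], emb["capital_of"], emb["France"])
corrupted = transe_score(emb["Paris"], emb["capital_of"], emb["Vietnam"])
print(true_triple, corrupted)  # indistinguishable here -- training is the point
```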

Intuition, knowledge base completion and natural language understanding

Everybody has experienced the gut feeling of “knowing” something without being able to explain it: that a new classmate will become your best friend, or that something is wrong in a situation. Human intuition plays an important role in our everyday lives as well as in business, science and technological innovation.


Natural language understanding: Will we ever get there?

When I was in high school in Vietnam, my teachers insisted that the sole goal of the high school years was to get us into a good university. As in many other Asian countries, there was an extremely competitive national college entrance examination, which girls, boys and their parents would pay any price to win. For two years, my friends and I went to night classes five days a week and spent our summers in the classroom. I was accepted to a prestigious university but never again used anything I had learnt.
