When I was in high school in Vietnam, my teachers insisted that the sole goal of our high school years was to get us into a good university. As in many other Asian countries, there was an extremely competitive national college entrance examination, and girls, boys and their parents would pay any price to win it. For two years, my friends and I went to night classes five days a week and spent our summers in classrooms. I was accepted to a prestigious university but never again used anything I had learnt.
What did I learn from this experience? It was all about choosing the right goal. Later on, when preparing for the TOEFL, I ignored most of the strategies, tips and tricks for improving the score. Instead, I put all my effort into improving my English. I scored just enough for a scholarship, and the skills I acquired back then are still with me today.
In my limited view of the literature, the NLP community is facing the same problem. There is a lot of evaluation data for specific subtasks, and researchers continue to create more of it. The effort goes into increasing some percentage of accuracy, F-score or whatever the measure might be. Unless the goal of NLP is to optimize for each ad hoc subtask, we can hardly say that we have made much progress over the last decade.
I am not going to say that NLP researchers are all idiots. They are brilliant, and the amount of knowledge they have created goes beyond what I can comprehend. But I can't help thinking that they are optimizing the wrong objective functions.
Easier said than done, you may think: what does that freaking correct objective function look like? What function can assign a greater number to a "more understanding" machine? The bad news is that there is no such function, just as no standardized test will ever properly measure a student's knowledge. People keep taking those tests not because they are correct, but because there are no better options.
How do we make progress without an objective function? I think it is time for NLP to come back to its science town after decades of wandering in the engineering jungle. That means creating falsifiable hypotheses about what a language understanding machine might be, challenging those hypotheses with real data and eliminating the ones that don't fit. Such an approach would play down dysfunctional objective functions and focus research effort on more promising areas.
What are those hypotheses? How do we test them? How much will it help? What answers (or questions) will they bring? Those are the questions I will need to address during my four years at VU Amsterdam. I will keep you updated 😉