I scored 52% playing Phrase Detective on Facebook. How could I get a PhD in Natural Language Processing?
Just kidding, I’m not worried about graduation at all, just a bit surprised by some features of the game. I’m studying the possibility of running a crowd-sourcing task on coreference resolution, so I’m very interested in how to do crowd-sourcing properly. Please tell me what you think in the comments section!
A quick note from EACL: some papers related to the LSDSem workshop (Bugert et al. 2017; Zhou et al. 2015) use McNemar’s test to establish statistical significance, and I find that very odd.
McNemar’s test examines “marginal (probability) homogeneity”, which in our case means whether two systems yield (statistically) the same performance. According to the source code I found on GitHub, it works like this:
- Obtain predictions of System 1 and System 2
- Compare them to the gold labels to fill a 2×2 table that counts how often each system is correct or incorrect
- Compute the test statistic and the p-value
- If the p-value is below a chosen significance level (e.g. the magical 0.05), we reject the null hypothesis, which is p(Sys1 correct) == p(Sys2 correct)
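The steps above can be sketched roughly as follows. This is only a minimal illustration, not the code from the repositories mentioned: it assumes the textbook chi-square form of McNemar’s statistic without continuity correction, and the function name `mcnemar_test` and the toy data are made up for this example. Note that only the examples on which the two systems disagree in correctness enter the statistic.

```python
import math


def mcnemar_test(gold, sys1, sys2):
    """Hypothetical helper: McNemar's test on paired system predictions.

    Assumes the chi-square approximation with 1 degree of freedom and
    no continuity correction; other implementations may differ.
    """
    # b: Sys1 correct, Sys2 wrong; c: Sys1 wrong, Sys2 correct.
    b = sum(1 for g, p1, p2 in zip(gold, sys1, sys2) if p1 == g and p2 != g)
    c = sum(1 for g, p1, p2 in zip(gold, sys1, sys2) if p1 != g and p2 == g)

    # If the systems never disagree in correctness, there is no evidence
    # against the null hypothesis.
    if b + c == 0:
        return 0.0, 1.0

    statistic = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Toy data (invented): Sys1 is always right, Sys2 is wrong on 10 of 20 items.
gold = [1] * 20
sys1 = [1] * 20
sys2 = [1] * 10 + [0] * 10
statistic, p_value = mcnemar_test(gold, sys1, sys2)
print(statistic, p_value < 0.05)
```

With these toy numbers the null hypothesis would be rejected at the 0.05 level, since all ten disagreements go in Sys1’s favour.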
As it happens, in the papers the difference is statistically significant and therefore the results are meaningful. Happy?
Not so fast.
Last week, I had a good time at CLIN 27. The city was pretty, with a cute morning market and tasty croissants. The snow was kind to me (sometimes I do like snow, if it is gentle). The poem presentation from Tim van de Cruys was funny, and I met some old friends. I brought my own side project to CLIN, in which I explore, explain and (slightly) improve word2vec: