I believe that every scientific experiment should be reproducible, so I do my best to make all of my papers fully reproducible (reporting all results, not only the best ones). Whenever I take the time to reproduce others’ experiments, I also share the results with the community.
Stanford Neural Dependency Parsing Experiment
A fast reimplementation of the Stanford neural parser in Lua/Torch7. Using a GPU, training takes only 1.5 hours instead of the 8 hours required by Stanford’s Java implementation. The nitty-gritty details of Chen and Manning (2014) are faithfully implemented. See this post for details.
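For readers unfamiliar with this parser family: Chen and Manning (2014) score the transitions of the arc-standard system with a feed-forward network. Below is a minimal sketch of that transition system only (no neural scoring); the class and method names are mine, not taken from either codebase.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Minimal unlabeled arc-standard transition system: the parsing backbone
// that Chen and Manning (2014) drive with a neural classifier.
public class ArcStandard {
    public final Deque<Integer> stack = new ArrayDeque<>();
    public final Deque<Integer> buffer = new ArrayDeque<>();
    public final List<int[]> arcs = new ArrayList<>(); // {head, dependent}

    public ArcStandard(int sentenceLength) {
        for (int i = sentenceLength; i >= 1; i--) buffer.push(i); // words 1..n
        stack.push(0); // 0 = artificial ROOT
    }

    // Move the next word from the buffer onto the stack.
    public void shift() { stack.push(buffer.pop()); }

    // Attach the second-topmost stack item as a dependent of the top.
    public void leftArc() {
        int dependent = popSecond();
        arcs.add(new int[]{stack.peek(), dependent});
    }

    // Attach the topmost stack item as a dependent of the item below it.
    public void rightArc() {
        int dependent = stack.pop();
        arcs.add(new int[]{stack.peek(), dependent});
    }

    private int popSecond() {
        int top = stack.pop();
        int second = stack.pop();
        stack.push(top);
        return second;
    }

    // Parsing is done when the buffer is empty and only ROOT remains.
    public boolean isTerminal() { return buffer.isEmpty() && stack.size() == 1; }
}
```

For a two-word sentence like “He sleeps”, the oracle sequence is shift, shift, left-arc (subject), right-arc (root attachment).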
Reinforcement Learning and Error Propagation (EACL 2017)
This codebase extends the Stanford neural parser (written in Java) by adding reinforcement learning and a measurement of error propagation. We found that reinforcement learning reduces error propagation and improves performance. The repo is hosted on Bitbucket.
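The general idea is to treat the parser as a stochastic policy over transitions and to reinforce actions that lead to good trees. The following toy REINFORCE sketch illustrates only that framing; the three-action setup, names, and learning rate are illustrative assumptions, not the repo’s actual code.

```java
import java.util.Random;

// Toy REINFORCE loop over a softmax policy with three parser actions
// (say shift / left-arc / right-arc). Purely illustrative: the real
// codebase plugs rewards into the Stanford parser's network instead.
public class PolicyGradientSketch {
    static double[] theta = new double[3];   // one logit per action
    static final double LEARNING_RATE = 0.1;

    static double[] softmax(double[] z) {
        double max = z[0];
        for (double v : z) max = Math.max(max, v);
        double sum = 0;
        double[] p = new double[z.length];
        for (int i = 0; i < z.length; i++) { p[i] = Math.exp(z[i] - max); sum += p[i]; }
        for (int i = 0; i < z.length; i++) p[i] /= sum;
        return p;
    }

    // Draw an action from the current policy.
    static int sample(Random rng) {
        double[] p = softmax(theta);
        double u = rng.nextDouble();
        return u < p[0] ? 0 : (u < p[0] + p[1] ? 1 : 2);
    }

    // REINFORCE update: theta_i += lr * reward * (1{i==a} - p_i),
    // i.e. the gradient of log pi(a | theta) scaled by the reward.
    static void update(int action, double reward) {
        double[] p = softmax(theta);
        for (int i = 0; i < theta.length; i++)
            theta[i] += LEARNING_RATE * reward * ((i == action ? 1 : 0) - p[i]);
    }
}
```

Rewarding an action whenever it produces a correct attachment (here simulated by rewarding one fixed action) shifts probability mass toward it over repeated episodes.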
BabelFy Reimplementation
BabelFy is state-of-the-art software for word sense disambiguation and entity linking (as of 2016), but its source code is proprietary. One can only use it via an API, either on a very limited number of documents or for a fee. We attempted to reimplement BabelFy but didn’t have enough resources (time, RAM, CPU hours). We open-sourced the code anyway in the hope that somebody will continue the work. CLIN 26 presentation – Github repo
Vietnamese Text Processing
- Tokenizer based on vnTokenizer
- POS tagger
- Rule-based named entity recognizer (NER) implemented in GATE
- Orthomatcher and Co-referencer for Vietnamese names
- Clause recognizer (incubating)
- Tool for annotating and linking in GATE
Lienkate
Lienkate is a Java library for parsing Vietnamese text using the link grammar formalism, but it can also parse any other language given a link grammar dictionary. In addition to the basic features of a link grammar parser, it provides some utilities:
- Link grammar expression to automaton converter
- Deterministic parser for link grammar (incubating)
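As a taste of the formalism: in link grammar, two words can be linked when a right-pointing connector on one matches a left-pointing connector on the other, with uppercase names agreeing exactly and lowercase subscripts compatible position by position. A sketch of that matching rule as I understand it (the method name is mine; Lienkate’s actual API may differ):

```java
// Link grammar connector matching: "S+" links to "S-"; a missing or '*'
// subscript character matches anything, so "Sp+" links to "S-" but not
// to "Ss-". Illustrative only, not Lienkate's real API.
public class ConnectorMatch {
    public static boolean matches(String right, String left) {
        if (!right.endsWith("+") || !left.endsWith("-")) return false;
        String a = right.substring(0, right.length() - 1);
        String b = left.substring(0, left.length() - 1);
        int i = 0;
        // Uppercase head must agree exactly, character by character.
        while (i < a.length() && i < b.length()
               && Character.isUpperCase(a.charAt(i))
               && Character.isUpperCase(b.charAt(i))) {
            if (a.charAt(i) != b.charAt(i)) return false;
            i++;
        }
        // Neither name may have leftover uppercase characters.
        if (i < a.length() && Character.isUpperCase(a.charAt(i))) return false;
        if (i < b.length() && Character.isUpperCase(b.charAt(i))) return false;
        // Lowercase subscripts: each position must be equal or a wildcard;
        // a subscript that runs out matches the rest.
        for (; i < a.length() && i < b.length(); i++) {
            char x = a.charAt(i), y = b.charAt(i);
            if (x != y && x != '*' && y != '*') return false;
        }
        return true;
    }
}
```

For example, a plural subject connector “Sp+” can link to a generic “S-” but not to the singular “Ss-”.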