Collect lexicon and build n-gram dataset for NLP in Chinese | github.com/tchaikov | 1 point | adulau | 15 years ago