My project experience focuses on traditional search engine techniques and basic NLP problems.
Search Engine
For search engines, I have implemented query operators and document ranking in Java on top of the Lucene API. This covers two unranked Boolean retrieval algorithms, two document ranking algorithms (BM25 and Indri), pseudo relevance feedback, features and testing for learning to rank (LeToR), and diversified ranking algorithms. As a next step, I plan to learn Elasticsearch, which is important in industry. Within the search engine domain, I am familiar with web crawling, text representation, search engine indexes, query structure, unsupervised ranking, feature-based ranking, federated and vertical search, page features, evaluation, search log analysis, diversity, personalization, enterprise search, etc. For the algorithm details, see my search engine cheat sheet.
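As an illustration of the BM25 ranking mentioned above, here is a minimal sketch in Python. It is not the Lucene/Java implementation from the project. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the `+ 1.0` inside the IDF logarithm (a common variant that keeps IDF non-negative) are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; k1 and b are common defaults.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF variant with +1.0 inside the log so it never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["search", "engine", "ranking"],
    ["cooking", "recipes"],
    ["search", "search", "engine"],
]
scores = bm25_scores(["search", "engine"], docs)
```

Documents that contain none of the query terms score zero, and repeated query terms raise the score with diminishing returns, which is the behavior BM25 is designed for.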
NLP
Natural language processing is about the automated analysis, generation, and acquisition of human language. I know the basic ideas of the field; for more details, see my NLP cheat sheet here. In projects, I have written an n-gram language model and a Gaussian Naive Bayes classifier to classify topic categories, and implemented a question answering system with co-reference resolution, NER, question reformulation, question classification, and candidate selection with tf-idf. This project is based on the NLTK toolkit.
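To make the tf-idf candidate selection step concrete, here is a minimal sketch in plain Python. It is not the project's NLTK-based code: the helper names (`tfidf_vectors`, `cosine`, `rank_candidates`), the smoothed IDF formula, and the use of cosine similarity are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) for tokenized documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Assumed IDF form: log(N / df) + 1, so shared terms keep some weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(question, candidates):
    """Return candidate indices ordered from most to least similar."""
    vectors, idf = tfidf_vectors(candidates)
    # Question terms unseen in the candidates get zero weight.
    q_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(question).items()}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


candidates = [
    ["the", "capital", "of", "france", "is", "paris"],
    ["bananas", "are", "yellow"],
    ["paris", "hosts", "the", "louvre"],
]
question = ["what", "is", "the", "capital", "of", "france"]
ranking = rank_candidates(question, candidates)
```

In a full question answering pipeline, the top-ranked candidates would then be passed to later stages such as NER or answer-type matching.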
