Research Interests: Information retrieval, computational finance
Office Phone: 86-10-6276 5815-8005
Yan, Hongfei is an associate professor at the Institute of Network Computing and Information Systems, Department of Computer Science, School of EECS, Peking University. He obtained B.Sc. and M.Sc. degrees in CS from Harbin Engineering University in 1996 and 1999, respectively, and received a Ph.D. degree in CS from Peking University in 2002. His research interests include information retrieval and computational finance.
Dr. Yan has published more than 60 research papers, most of them in top-tier conferences such as SIGIR, WSDM, KDD, EMNLP and ACL. He was awarded the second prize of Beijing Science and Technology Progress (2004) and the second prize of China Computer Federation Science and Technology (2016).
Dr. Yan has been involved in more than five research projects, including NSFC projects, the Core-High-Basic program (the "core electronic devices, high-end general chips and basic software products" National Science and Technology Major Project), and an 863 project. His research achievements are summarized as follows:
1) Scalable event detection: Mining retrospective events from text streams is an important research topic. The classic text representation model (i.e., the vector space model) cannot capture the temporal aspects of documents. To address this, he proposed a novel burst-based text representation model, denoted BurstVSM. BurstVSM maps dimensions to bursty features instead of terms, so it captures both semantic and temporal information, and it significantly reduces the number of non-zero entries in the representation. He tested it on scalable event detection, and experiments on a 10-year news archive show that his methods are both effective and efficient.
2) Event discovery and retrieval on multi-type historical data: He presents EventSearch, a system for event extraction and retrieval on four types of news-related historical data, i.e., Web news articles, newspapers, TV news programs, and micro-blog short messages. The system incorporates over 11 million web pages extracted from "Web InfoMall", the Chinese Web Archive since 2001. The newspapers and TV news video clips also span 2001 to 2011. Upon a user query, the system returns a list of event snippets from multiple data sources. A novel burst model is used to discover events from time-stamped texts. In addition to offline event extraction, his system provides online event extraction to further meet user needs. EventSearch provides meaningful analytics that synthesize an accurate description of events. Users interact with the system by ranking the identified events using different criteria (scale, recency and relevance) and by submitting their own information needs in different input fields.
3) Architectural design and evaluation of an efficient Web-crawling system: He presents the architectural design and evaluation results of an efficient Web-crawling system. The design involves a fully distributed architecture, a URL allocating algorithm, and a method to assure system scalability and dynamic reconfigurability. Simulation experiments show that load balance, scalability and efficiency can be achieved in the system. This distributed Web-crawling subsystem has been successfully integrated with WebGather, a well-known Chinese and English Web search engine that aims to collect all the Web pages in China and keep pace with the rapid growth of Chinese Web information. In addition, he believes that the design can also be useful in other contexts, such as digital libraries.
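
To illustrate what a URL allocating step in a distributed crawler might look like, here is a minimal sketch in Python. It is not taken from the system described above: it assumes one common approach, hashing a URL's host name and taking the result modulo the number of crawler nodes, so that all pages of a site go to the same node. The function names `assign_node` and `partition` and the node count are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url: str, num_nodes: int) -> int:
    """Return the index of the crawler node responsible for this URL.

    Assumption (illustrative only): allocation is by host name, so every
    URL on the same host is handled by the same node.
    """
    host = (urlsplit(url).hostname or "").lower()
    # Use a stable hash so that every node computes the same assignment.
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(urls, num_nodes: int):
    """Group URLs into one bucket per crawler node."""
    buckets = {i: [] for i in range(num_nodes)}
    for url in urls:
        buckets[assign_node(url, num_nodes)].append(url)
    return buckets


if __name__ == "__main__":
    seeds = [
        "http://www.pku.edu.cn/index.html",
        "http://www.pku.edu.cn/news/1.html",
        "http://www.hrbeu.edu.cn/",
    ]
    for node, assigned in partition(seeds, num_nodes=4).items():
        print(node, assigned)
```

With plain modulo hashing, changing the number of nodes reassigns most hosts, so a system that needs dynamic reconfigurability would likely use a more careful scheme; this sketch only shows the basic idea of a deterministic, decentralized allocation.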