Research Interests: IR, web search, NLP
Office Phone: 86-10-6276 5835-102
Sun, Bin is an associate professor at the Department of Computer Sci. and Tech., School of EECS. He obtained his Ph.D. in Computer Sci. in 2000 and his Master's degree in Theoretical Physics in 1993, both from Peking Univ. (PKU). He joined the Institute of Computational Linguistics, PKU in 2002 and served as the Administrative Director of the Institute from 2004 to 2010. He is the deputy director of the Joint Research Center of Fujitsu-PKU Information Tech. His research interests include Chinese language processing, information retrieval, Web search, network protocols and software methodologies.
Dr. Sun has published more than 60 research papers in related technical conferences and journals, including the Asia IR Symposiums, China Conf. IR, and WWW Conferences, as well as J. CIS, Chinese J. Computers, ACM SIGPLAN Notices, and Physical Review D. He has also been issued 5 invention patents. He was jointly awarded a Second Prize for National Science Progress of China (2011), a First Prize for Information Technology of the Chinese Institute of Electronics (2010), and a First Prize for Science and Technology of the Ministry of Education, China (2008).
He has served as a council member of the Chinese Information Processing Society of China, and is a committee member of its Professional IR Committee. He has also served on the technical program committees of various conferences, including AIRS and CCIR.
Dr. Sun has completed more than 10 independent research projects as PI, as well as other joint projects, funded by the NSFC, the 863 Program, etc.
The major results and achievements of his research are summarized as follows:
A novel approach and architecture for huge text data processing and search (FIOS): using memory mapping and pooling, in-file hashing, extremely dense key/value storage and search, streamlined disk and network I/O, etc., the approach supports a very high performance distributed processing system, with per-node capacity reaching 500 million web pages and indexing throughput reaching 3,000 documents per second.
A sense-matrix model for IR (SMM): a new IR model that represents every document as a word-by-sense matrix, where the word sense distributions are modeled by frequency statistics (e.g., part-of-speech statistics), and that explores techniques for matrix similarity measures and matrix transformations (discrete cosine transform, etc.). The model is applicable to a variety of text analysis applications beyond IR.
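To make the word-by-sense representation concrete, the following is a minimal sketch only, not the published SMM formulation. It assumes that part-of-speech tags stand in for senses, that a document arrives as already tagged (word, tag) pairs, and that similarity is a plain cosine over the flattened matrices. The function names `sense_matrix` and `matrix_cosine` are hypothetical, and the matrix transformations mentioned above (such as the discrete cosine transform) are not covered.

```python
from collections import Counter
from math import sqrt


def sense_matrix(tagged_tokens):
    """Build a sparse word-by-sense matrix as {(word, sense): count}.

    Assumption: each token is a (word, sense) pair, with the
    part-of-speech tag used as a stand-in for the word sense.
    """
    return Counter((word.lower(), sense) for word, sense in tagged_tokens)


def matrix_cosine(a, b):
    """Cosine similarity between two sparse matrices (Frobenius inner product)."""
    dot = sum(count * b[cell] for cell, count in a.items() if cell in b)
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents: "book" occurs as both a noun and a verb.
doc_travel = [("book", "VERB"), ("a", "DET"), ("flight", "NOUN")]
doc_library = [("book", "NOUN"), ("the", "DET"), ("book", "NOUN"), ("shelf", "NOUN")]
doc_mixed = [("book", "NOUN"), ("a", "DET"), ("book", "VERB"), ("flight", "NOUN")]

m_travel = sense_matrix(doc_travel)
m_library = sense_matrix(doc_library)
m_mixed = sense_matrix(doc_mixed)

print(matrix_cosine(m_travel, m_library))
print(matrix_cosine(m_travel, m_mixed))
```

In this toy setup the travel and library documents share the surface word "book" but no (word, sense) cell, so a bag-of-words comparison would see overlap where the matrix comparison sees none.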
A structured-description method for speeding up HTTP transactions (Structural HTTP): a simple and compatible extension to HTTP that transfers Web resources in a batch within a single request and response transaction, including new message headers for resource transmission control and for describing the structural information of Web contents. It achieves transmission speedups of roughly 70% to 400%, with packet data savings of the same magnitude.
A type-constrained methodology for OOP languages (XOOP): a framework for “generic typing” OOP methods that makes use of static requirements on types for direct compiler checking, with design principles for type constraint libraries, applicable to class library development with C++ templates and C# and/or Java generics.
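The general idea of stating requirements on a type parameter so that a tool can check them before the code runs can be illustrated as follows. This is a loose analogue in Python, not the XOOP framework itself, which targets C++ templates and C#/Java generics. The names `SupportsLessThan` and `smallest` are hypothetical, and the checking is done by an external static type checker such as mypy rather than by a compiler.

```python
from typing import Any, Protocol, Sequence, TypeVar


class SupportsLessThan(Protocol):
    """Type constraint: any type that defines the `<` operator."""

    def __lt__(self, other: Any, /) -> bool: ...


# The bound states the static requirement on the type parameter T.
T = TypeVar("T", bound=SupportsLessThan)


def smallest(items: Sequence[T]) -> T:
    """Return the smallest element; T must satisfy SupportsLessThan."""
    if not items:
        raise ValueError("smallest() requires a non-empty sequence")
    best = items[0]
    for item in items[1:]:
        if item < best:
            best = item
    return best


print(smallest([3, 1, 2]))
print(smallest(["pear", "apple"]))

# A static checker is expected to reject a call such as
# smallest([1j, 2j]), because complex does not define `<`.
# Python itself does not enforce the bound at run time.
```

The constraint lives in one reusable declaration rather than in the body of each generic function, which is the kind of separation a type constraint library is meant to provide.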
Additionally, Dr. Sun and his research team have issued (via the PKU Office of Technology Development) more than 100 licenses for the transfer of software and resource technology to a number of institutional users, including dozens of universities and institutes, as well as large companies such as Microsoft Research, IBM, Intel, PARC Incorporated, Fujitsu, NTT, NEC, and all the major Chinese internet companies and search engines. These licenses have returned over 5 million RMB in funding income to the university.