Zhou, Minghui




Research Interests: Empirical software engineering, software measurement, mining software repositories

Office Phone: 86-10-6275 7670-12

Email: zhmh@pku.edu.cn

Minghui Zhou is a professor in the Department of Computer Science and Technology, School of EECS. She received her B.S., M.S., and Ph.D. from the National University of Defense Technology in 1995, 1999, and 2002, respectively. Her research interests include empirical software engineering, software measurement, mining software repositories, and software digital archeology.

Dr. Zhou has published more than 50 research papers, many of them in top-tier conferences and journals such as ICSE, FSE, TSE, and TOSEM. She received an ACM SIGSOFT Distinguished Paper Award at FSE 2010 and the COMPSAC Best Paper Award in 2012, and one of her papers was featured on the front page of TOSEM in 2016. She has served on the Technical Program Committees of major conferences including ICSE, FSE, MSR, and ESEM, as Demo Co-Chair of FSE 2014, and as PC Co-Chair of Internetware 2014. She also serves as a reviewer for several journals, including TSE, JSS, and CSUR. In 2017 she co-organized a Shonan Meeting (the only Dagstuhl-like seminar in Asia). She has served on the OW2 Technical Committee and Board since 2009, and in 2013 she was chief consultant to the president of OW2 (an open source consortium). Since 2014 she has been an invited external expert reviewer for Siemens Research. She was named an MOE New Century Excellent Talent (2012), received the CVIC Software Talent Award (2015), and was twice awarded the Second Prize of the National Technology Invention Award (2008 and 2015).

Dr. Zhou has secured more than 10 million RMB in peer-reviewed research and industry funding. As PI, she was awarded the first key project of the National Natural Science Foundation of China that targets data-driven software engineering. As sub-PI, she participates in the first national (973 Program) project in China to investigate open source ecosystems. Her research achievements are summarized as follows:

1) Developer fluency and sustainability: To survive and succeed, software projects must bring newcomers up to speed and build project competence, while also retaining long-term developers who can accomplish critical project tasks. She proposed the concepts of developer fluency and long-term contributor, developed ways to quantify them, and investigated the factors that affect them. The findings provide an empirical basis for approaches that help communities recruit contributors for long-term participation and help participants contribute more effectively.

2) Micro-practices: Identifying best practices has long been a goal of practitioners and researchers striving to improve software productivity and quality. To address this challenge, she proposed studying micro-practices through a large-scale, evidence-based approach. The approach combines inductive generalization from in-depth studies of specific projects on one side with the categorization of micro-practices across the entire universe of projects on the other. She provided examples and roadmaps to guide this line of investigation.

3) Open source ecosystems: FLOSS ecosystems have had a tremendous impact on computing and society and have captured the attention of businesses, researchers, and policy makers. Despite a substantial body of research on FLOSS, it remains unclear how and why FLOSS ecosystems form, how they achieve their impact, and how they sustain themselves. She initiated a series of projects in this area: she led a team that developed an open source application server, conducted empirical studies to uncover models of commercial involvement, and has actively participated in open source activities to broaden their worldwide impact.

4) Data analysis methods: Data in software support systems are actively used by developers to learn, to share code, and for other key tasks, and are also heavily used in software engineering research. However, operational data often do not faithfully represent the intended aspects of software development and may therefore jeopardize the conclusions derived from them. To address this challenge, she proposed an approach that identifies and corrects problematic event data by exploiting individual capacity constraints and the redundancies present in operational data. She also proposed a method for building multi-extract, multi-level datasets for sharing, and, as an example, shared a Mozilla issue-tracking dataset covering 15 years of history.