Contact: Michel Wermelinger (M.A.Wermelinger 'at' open.ac.uk)
Software evolution is the phenomenon of continual software change. It is driven by feedback from users, in the form of new and changed requirements, and by other social, economic and technological changes in the real world. The alternative to evolutionary changes, which are often incremental, is revolutionary change, such as (re)development from scratch. Both evolution and revolution are present in the software world, but surveys suggest that most of the effort is spent on evolution.
In practical terms, this implies that many applications ‘stay around’ for many years, longer than initially foreseen, becoming legacy systems that are likely to suffer from software aging and code decay. It also means that huge resources go into software evolution, so there is a vast opportunity for research to help improve its efficiency and effectiveness.
As software evolves, code and data accumulate in software repositories. This offers the opportunity to apply a range of techniques to these large datasets, in search of interesting facts that may help managers and developers evolve software. Such research can contribute to an evidence-based scientific understanding of software evolution. Free/Libre/Open Source Software (FLOSS) repositories offer a great opportunity for researchers in this field: not only code, but also mailing lists and defect databases are likely to be available for study. Because FLOSS development has its own specific characteristics, results of FLOSS research may not apply to proprietary settings. Proprietary systems and their organisations can also be studied, but access should be agreed with the systems’ owners before the research starts, and confidentiality must be preserved, particularly when publishing any results. As applications co-evolve with the wider world (e.g. cultures, societies, business processes), there is also an interest in understanding and managing projects within a wider software ecosystem.
Some of the interesting questions are:
• What are the relevant human aspects of software evolution, e.g. what is the role and impact of people as drivers, implementers and, in general, stakeholders of software evolution?
• How is software evolution characterised in given domains, and what similarities and differences can be found across domains (e.g. FLOSS vs. proprietary) with respect to any relevant observable aspects?
• How can the quality of the software evolution product and process be systematically measured, evaluated and improved? For example, how can productivity and efficacy be modelled and measured from existing code repositories and other historical data?
Possible projects may involve one or more of these techniques:
• Software metrics are applied across successive releases or in real time, involving the systematic measurement of product and process properties (such as size, complexity and productivity) and their analysis, in order to identify trends and outliers.
• Data visualisation, using scatter plots, box plots or other diagrammatic techniques, helps reveal evolutionary patterns and helps developers and managers understand the evolution of their systems and plan the way forward.
• Data mining and other techniques detect regularities and patterns in the data, which can then be used to find ‘hot-spots’ of complexity, for example to guide refactoring, or to help evaluate existing hypotheses about the software evolution phenomenon and identify new ones (see Lehman’s laws).
• Simulation and other theory-based models embody a conceptual grasp of some aspect of the software evolution phenomenon or analyse a practical situation related to process improvement. These models should be testable against empirical data.
• Information recovery, e.g. of the software’s requirements and architecture, helps re-document applications, evaluate design integrity and compliance with software engineering principles, recover traceability links, and document assumptions.
• Ethnographic studies, and similar social-science inspired studies of actual organisations in charge of evolving software systems, help identify good practice, difficulties, and success factors in the context of long-term software evolution.
• Empirical research methods include controlled experiments to study the impact of particular techniques or approaches on software evolution, and replications of existing software evolution studies in different settings.
• Infrastructural tools include: new tools to help evolve software (e.g. version control systems better integrated with bug reporting systems); new e-Science technologies that help software evolution researchers use existing repositories more effectively and replicate studies; and high-performance approaches (such as parallel computing) that make the processing of software repositories more tractable.
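As a small illustration of the software metrics technique listed above, the following minimal Python sketch assumes that the size of each release (e.g. in lines of code) has already been extracted from a repository. The release data and the names `RELEASES`, `growth_trend` and `outlier_releases` are hypothetical. The trend is estimated as a least-squares slope of size against release number, and releases with unusual growth are flagged with the median-based modified z-score (threshold 3.5, a common rule of thumb); this is only one of several possible outlier rules.

```python
from statistics import median

# Hypothetical data: (release label, size in lines of code) for successive releases.
RELEASES = [
    ("1.0", 10000), ("1.1", 10800), ("1.2", 11500), ("1.3", 12300),
    ("1.4", 19800), ("1.5", 20500), ("1.6", 21200), ("1.7", 22000),
]


def growth_trend(sizes):
    """Least-squares slope of size against release index (average growth per release).

    Requires at least two releases, otherwise the denominator is zero.
    """
    n = len(sizes)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(sizes) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sizes))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def outlier_releases(releases, threshold=3.5):
    """Labels of releases whose size increment is an outlier by the modified z-score."""
    # Increment of each release relative to the previous one.
    increments = [
        (label, size - prev_size)
        for (_, prev_size), (label, size) in zip(releases, releases[1:])
    ]
    deltas = [d for _, d in increments]
    med = median(deltas)
    # Median absolute deviation: a spread measure not inflated by the outliers themselves.
    mad = median(abs(d - med) for d in deltas)
    if mad == 0:
        # Most increments are identical: flag anything that deviates from the median.
        return [label for label, d in increments if d != med]
    return [label for label, d in increments
            if abs(0.6745 * (d - med) / mad) > threshold]


if __name__ == "__main__":
    sizes = [size for _, size in RELEASES]
    print(f"average growth per release: {growth_trend(sizes):.1f} LOC")
    print("outlier releases:", outlier_releases(RELEASES))
```

With this made-up data, release 1.4, whose increment is roughly ten times that of the others, is the only one flagged. A median-based rule is used rather than a mean and standard deviation because, with the handful of releases typical of such studies, a single large jump inflates the standard deviation enough to hide itself.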
Some PhD topics, especially those under the themes 'Empirical Studies of Software Development' and 'Requirements Engineering and Design', are related to software evolution and can be adapted to smaller MPhil projects.
Some resources you may wish to explore are:
• A “roadmap” to research in this field.
• Books containing collections of papers on software evolution include Lehman and Belady (1985), Madhavji et al. (2006) and Mens and Demeyer (2008). Meir M. (Manny) Lehman was one of the pioneers in this field.
• The ERCIM working group on software evolution offers a variety of resources on its website and organises an annual workshop.
• Papers on software evolution are published in many venues, including the Journal of Software Maintenance and Evolution, the IEEE ICSM conferences, the IWPSE workshops and the MSR working conference, which runs a mining challenge based on open source projects.
• The European Laboratory on Software Evolution is a vision for a virtual lab that helps researchers co-operate in this field.
• Try a Google Scholar search on ‘software evolution’ or ‘software evolution laws’.