While the findings are based on the analysis of over 3,000 projects, we cannot claim that the findings can be generalised to other GitHub repositories that are not in the datasets considered, i.e. the top 1,000 most popular GitHub repositories and those owned by Google. We also cannot make claims of the generalisability of our findings for projects hosted on other version control platforms.
35
Chapter 6
Conclusions and Future Work
In this thesis, we proposed an approach that can automatically detect outdated refer- ences to code elements caused by removing all source code instances. We investigated the current state of documentation in software repositories, extended the approach to analyse the state of documentation throughout projects’ history, explored how out- dated documentation is resolved in open source projects, and with the results, we alerted Google developers of potentially outdated code element references in their projects. In addition, we created a publicly available tool that enables developers to scan for outdated references and used OCR to detect images containing outdated references in software documentation.
In detail, we found that the majority of the most popular projects on GitHub con- tained at least one outdated reference to a code element at some point during their history and these outdated references usually survived in the documentation for years before they were fixed. By analysing the full history of projects, we discovered that outdated references are more likely fixed by updating the source code or document than deleting the entire document. Moreover, our GitHub issues have led to instances of outdated documentation getting fixed in real-world projects. To assist developers with discovering outdated documentation in their projects, we created a GitHub Ac- tion tool that can automatically scan for outdated code element references whenever a pull request is submitted to a repository. Finally, we were able to use OCR to extract text from images to detect outdated references in real-world projects.
One of the potential directions for future work is to investigate how traceability links can be established for renamed code elements. Our approach currently only detects if a code element reference is outdated by performing an exact match to find source code instances, but it does not know what the code element has been renamed to. Being able to automatically establish links means that the tool can automatically suggest how to update the documentation. Other future work may come in the form of small improvements to the current approach. Currently, deleting the final code element instance that is part of a source code comment may lead to falsely flagged references. Applying customised sets of regular expressions for files written in different programming languages may be one such improvement to help with more accurate matches in the source code, e.g. avoiding matching code elements that are part of a source code comment. Another small useful extension to the tool could be a feature where the project developer can reply to the tool’s comment for code elements they do not want to keep track of. The tool will then automatically add the code elements to the project’s exclude list. Finally, more investigations on outdated references in images, e.g. extending the analysis to the top1000 dataset, will help the software engineering community better understand the state of images in documentation.
We hope that this research will be a step toward keeping documentation in software repositories up-to-date.
37
Bibliography
Aghajani, Emad, Csaba Nagy, Olga Lucero Vega-Márquez, Mario Linares-Vásquez, Laura Moreno, Gabriele Bavota, and Michele Lanza (2019). “Software documenta- tion issues unveiled”. In:Proceedings of the International Conference on Software Engineering, pp. 1199–1210.
Aldaeej, Abdullah (2021). “Towards Effective Technical Debt Decision Making in Soft- ware Startups: A Multiple Case Study of Web and Mobile App Startups”. PhD thesis. University of Maryland, Baltimore County.
Bacchelli, Alberto, Michele Lanza, and Romain Robbes (2010). “Linking e-mails and source code artifacts”. In:Proceedings of the 32nd ACM/IEEE International Con- ference on Software Engineering-Volume 1, pp. 375–384.
Dagenais, Barthélémy and Martin P Robillard (2012). “Recovering traceability links between an API and its learning resources”. In:2012 34th International Conference on Software Engineering (ICSE). IEEE, pp. 47–57.
Dagenais, Barthélémy and Martin P Robillard (2014). “Using traceability links to recommend adaptive changes for documentation evolution”. In:IEEE Transactions on Software Engineering 40.11, pp. 1126–1146.
Forward, Andrew and Timothy C Lethbridge (2002). “The relevance of software doc- umentation, tools and technologies: a survey”. In: Proceedings of the Symposium on Document Engineering, pp. 26–33.
Kajko-Mattsson, Mira (2005). “A survey of documentation practice within corrective maintenance”. In:Empirical Software Engineering 10.1, pp. 31–55.
Kruchten, Philippe, Robert L Nord, and Ipek Ozkaya (2012). “Technical debt: From metaphor to theory and practice”. In:IEEE Software 29.6, pp. 18–21.
Lee, Seonah, Rongxin Wu, Shing-Chi Cheung, and Sungwon Kang (2019). “Automatic detection and update suggestion for outdated API names in documentation”. In:
IEEE Transactions on Software Engineering.
Lethbridge, Timothy C, Janice Singer, and Andrew Forward (2003). “How software engineers use documentation: The state of the practice”. In:IEEE Software 20.6, pp. 35–39.
Liu, Jiakun, Qiao Huang, Xin Xia, Emad Shihab, David Lo, and Shanping Li (2021).
“An exploratory study on the introduction and removal of different types of tech- nical debt in deep learning frameworks”. In:Empirical Software Engineering 26.2, pp. 1–36.
Mendes, Thiago Souto, Mário André de F. Farias, Manoel Mendonça, Henrique Frota Soares, Marcos Kalinowski, and Rodrigo Oliveira Spínola (2016). “Impacts of agile requirements documentation debt on software projects: a retrospective study”. In:
Proceedings of the 31st Annual ACM Symposium on Applied Computing, pp. 1290–
1295.
Panthaplackel, Sheena, Pengyu Nie, Milos Gligoric, Junyi Jessy Li, and Raymond J Mooney (2020). “Learning to Update Natural Language Comments Based on Code Changes”. In:arXiv preprint arXiv:2004.12169.
Parnas, David Lorge (1994). “Software aging”. In:Proceedings of International Con- ference on Software Engineering, pp. 279–287.
38 Bibliography Petrosyan, Gayane, Martin P Robillard, and Renato De Mori (2015). “Discovering in- formation explaining API types using text classification”. In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering. Vol. 1. IEEE, pp. 869–879.
Prana, Gede Artha Azriadi, Christoph Treude, Ferdian Thung, Thushari Atapattu, and David Lo (2019). “Categorizing the content of github readme files”. In: Em- pirical Software Engineering 24.3, pp. 1296–1327.
Ratol, Inderjot Kaur and Martin P Robillard (2017). “Detecting fragile comments”. In:
Proceedings of the International Conference on Automated Software Engineering, pp. 112–122.
Rigby, Peter C and Martin P Robillard (2013). “Discovering essential code elements in informal documentation”. In: 2013 35th International Conference on Software Engineering (ICSE). IEEE, pp. 832–841.
Rios, Nicolli, Leonardo Mendes, Cristina Cerdeiral, Ana Patrícia F Magalhães, Boris Perez, Darío Correal, Hernán Astudillo, Carolyn Seaman, Clemente Izurieta, Glei- son Santos, et al. (2020). “Hearing the voice of software practitioners on causes, effects, and practices to deal with documentation debt”. In: International Work- ing Conference on Requirements Engineering: Foundation for Software Quality. Springer, pp. 55–70.
Robillard, Martin P, Andrian Marcus, Christoph Treude, Gabriele Bavota, Oscar Cha- parro, Neil Ernst, Marco Aurélio Gerosa, Michael Godfrey, Michele Lanza, Mario Linares-Vásquez, et al. (2017). “On-demand developer documentation”. In: 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, pp. 479–483.
Sholler, Dan, Igor Steinmacher, Denae Ford, Mara Averick, Mike Hoye, and Greg Wilson (2019). “Ten simple rules for helping newcomers become contributors to open projects”. In:PLoS computational biology 15.9, e1007296.
Souza, Sergio Cozzetti B de, Nicolas Anquetil, and Káthia M de Oliveira (2005). “A study of the documentation essential to software maintenance”. In: Proceedings of the International Conference on Design of Communication: Documenting &
Designing for Pervasive Information, pp. 68–75.
Steinmacher, Igor, Christoph Treude, and Marco Aurélio Gerosa (2018). “Let me in:
Guidelines for the successful onboarding of newcomers to open source projects”.
In: IEEE Software 36.4, pp. 41–49.
Tan, Shin Hwei, Darko Marinov, Lin Tan, and Gary T Leavens (2012). “@tcomment:
Testing Javadoc comments to detect comment-code inconsistencies”. In: Proceed- ings of the International Conference on Software Testing, Verification and Valida- tion, pp. 260–269.
Treude, Christoph, Martin P Robillard, and Barthélémy Dagenais (2014). “Extracting development tasks to navigate software documentation”. In:IEEE Transactions on Software Engineering 41.6, pp. 565–581.
Uddin, Gias and Martin P Robillard (2015). “How API documentation fails”. In:IEEE Software 32.4, pp. 68–75.
Wen, Fengcai, Csaba Nagy, Gabriele Bavota, and Michele Lanza (2019). “A large- scale empirical study on code-comment inconsistencies”. In: Proceedings of the International Conference on Program Comprehension, pp. 53–64.
Zhong, Hao and Zhendong Su (2013). “Detecting API documentation errors”. In:Pro- ceedings of the International Conference on Object Oriented Programming Systems Languages & Applications, pp. 803–816.