Tool developed at CERN makes software citation easier

Source code from the popular software-development site GitHub can now be preserved and cited through the CERN-hosted online repository Zenodo


Research output amounts to much more than just academic papers. It is important that the underlying datasets, and the software used to analyze them, are also properly cited, and that the researchers behind them are given credit for their work. Fortunately, as of today, software citation has become significantly easier. Researchers working at CERN have developed a tool that allows source code from the popular software-development site GitHub to be preserved and cited through the CERN-hosted online repository Zenodo.

Launched almost one year ago, Zenodo is powered by Invenio, CERN's open-source digital-library software, and was created through the European Commission's OpenAIREplus project. It facilitates the sharing of research outputs in a wide variety of formats across all fields of science. Now, people working on software on GitHub will be able to ensure that their code is not only preserved through Zenodo but is also assigned a unique digital object identifier (DOI), just like an academic paper.

"Open science is not only about open-access publications; it also means the publication of your data," says Tim Smith, group leader for collaboration and information services within the CERN IT department. "For data to be reusable, you need to have the software alongside it that was used to read and interpret it."

The citation tool was developed by Lars Holm Nielsen, a software engineer based at CERN, and Amit Kapadia, who now works at US-based company Mapbox. Kapadia previously worked at Zooniverse as part of the team behind Galaxy Zoo and other popular citizen science projects. He spent two weeks at CERN working with Nielsen to create the tool for Zenodo. "This is an exciting project," says Kapadia. "As a software developer, I’m looking forward to using these features myself."

"We want to give researchers the credit they deserve for creating great software, by making it citable and helping to preserve it," explains Nielsen. "However, it’s not only about preservation and citability, but reproducibility as well. It’s important that you can find the software that was used behind the results reported in a paper, so that you can reproduce them if you wish."

Nielsen and Kapadia also spent time working with the INSPIRE team to prototype dedicated services for the high-energy physics community. "We want scientists in our community to be able to publish their data and software, and link this tightly back to the INSPIRE records that describe the analysis," says Smith.

This article originally appeared in International Science Grid This Week.