GitHub is not just for software and code! But GitHub does make coding easier with version control, solid tools for collaboration, and real-time feedback and reviews. It can also tell you more about the interest in and the use of your software and code than simply posting it to your website can.
GitHub is an online tool that facilitates sharing and collaborating on software development. Though optimized for software, GitHub is also useful to anyone writing anything, not just software. Code is written as plain text, so any plain text document can be uploaded to GitHub and benefit from its version control and sharing capabilities. That being said, this chapter is written with software and coding in mind. Know that most instances of the words “code” and “software” in this chapter could just as easily be replaced with “document” or “manuscript.”
If you write code for your research or scholarship, you’re missing out if you’re not on GitHub. GitHub is a collaborative coding website that hosts over 1 million open source projects and is increasingly being used by academics who code.
GitHub makes coding easier with its excellent version control, solid tools for collaboration, and real-time feedback and reviews. Even better, GitHub can tell you much more about the interest in, use and adaptation of your open source software and code than simply posting it to your website can.
In this chapter, we’ll give you a very high-level overview of how GitHub works and some of the benefits you can expect to see if you share your code on GitHub. For a more in-depth, low-level guide to using GitHub and its underlying component, Git, consider following a Software Carpentry lesson on the subject.
GitHub is built on top of Git, a distributed version control system. Git allows multiple users to edit the same piece of code asynchronously. Simply put, it tracks each person’s edits and allows them to be merged without one unintentionally overwriting another.
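To make the idea of merging without overwriting concrete, here is a minimal Python sketch of a "three-way merge": two people start from the same base version, each edits a different line, and both edits survive. This is not Git's actual algorithm. It assumes edits only replace lines in place (no insertions or deletions), and the function name `merge_lines` and the sample file contents are hypothetical.

```python
def merge_lines(base, ours, theirs):
    """Naive three-way merge for edits that replace lines in place.

    Illustration only: real Git merges also handle inserted and deleted
    lines. Here all three versions must have the same number of lines.
    Returns the merged lines and the indexes of conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed the same line in different ways
            merged.append(b)
            conflicts.append(index)
    return merged, conflicts


# Two collaborators edit different lines of the same starting file.
base = ["import numpy", "x = 1", "print(x)"]
alice = ["import numpy as np", "x = 1", "print(x)"]
bob = ["import numpy", "x = 1", "print(x + 1)"]

merged, conflicts = merge_lines(base, alice, bob)
print(merged)     # both edits are kept
print(conflicts)  # empty: nobody touched the same line
```

When both people change the same line, the sketch keeps the base line and reports the line index as a conflict, which roughly mirrors how a version control system asks a human to decide rather than silently picking a winner.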
GitHub is a software-hosting platform that takes a lot of the pain out of using Git. Users create profiles on the site, download software, and start coding. GitHub Desktop can do most of the heavy lifting for you without requiring you to use the command-line tool, making it relatively easy to push your local code to the cloud and vice versa.
Individual software projects are hosted in GitHub “repositories.” Later on in this challenge, you’ll create repositories for your code (or for a manuscript on which you’re collaborating).
When you’re ready to collaborate, you can search others’ repositories, “fork” their code for your reuse, and suggest changes via “pull requests.” You can also invite others to collaborate on your code – more on that below.
Full-on Git & GitHub tutorials are beyond the scope of this chapter, but we encourage you to check out Lauren Orsini’s excellent GitHub primer (Part 1 & Part 2) to begin learning the basics of Git.
Once you’ve got your local software set up, it’s time to create a GitHub profile. This is the centralized place where all of your code and contributions will be collected.
Here are some tips for creating a profile that will make your academic code shine:
By following all of these tips, you’ll have a profile that’s much more searchable on GitHub. Plus, having a complete profile that showcases what makes you tick will make you more appealing to potential collaborators.
And good news if you’re from OU! OU is finalizing an institutional education membership with GitHub, which is planned to be active in 2019. One advantage of this membership will be the ability to create private repositories for sensitive work or work not quite yet ready to be shared outside your research group. And if you’re not from OU, GitHub recently started letting anyone create private repositories with up to three collaborators.
Once your profile is complete, it’s time to get your code online. Individual projects go into GitHub repositories. And repository-based reuse and interest metrics can help us learn about how our software is being used by others. Here are some tips for creating a great repository.
Choose a short but descriptive title for your repository: it will help with both memorability and findability. Naming your repository after the software itself is a good choice.
Create a killer Readme file: you want your code to be reusable, don’t you? Documentation is a huge boost to reusability, and a Readme file is the best place to keep your documentation. The original 30-Day Impact Challenge recommended including the following:
Consider also adding information about the grant that funded the development of your code and links to any related publications. To increase your code’s discoverability, also try to include keywords that others might use to find your software. You can find a template for Readme files on GitHub and additional best practices in Jesse Luoto’s GitHub repository.
Choose an open license: In a separate License.md file, include a license that clearly explains which rights you grant to others who want to reuse and adapt your code. There are strong feelings about which open licenses are most appropriate, and pros and cons for each that are worth looking into, but we prefer relatively permissive licenses like the MIT license. GitHub also explains why licensing is important and provides a handy license chooser.
Have something besides code that you want to license (remember, we said GitHub is great for your writing projects too)? Take a look at the OU Impact Challenge chapter Understand Open Licenses.
Add collaborators: Invite anyone who has contributed to developing the code to be a collaborator on your personal code. For code that’s not yours but instead is part of the work an organization or institution does, you can also create an “organization” for code repositories. In the example below, Matt Jones belongs to the rOpenSci and DataONE organizations on GitHub, as we see at the bottom of his profile.
You may want to create an Organization for faculty research groups to help track projects and code being worked on by students in the research group. This can help you keep track of code long after a student has graduated.
For more information on adding others to a GitHub organization, see this guide.
Some academics like using GitHub for storing and working with numerical data. It has the advantage of being stored in a repository alongside the code that’s used for analysis, making your research project into a single, neatly packaged reproducible object.
For some examples of how others use GitHub for data, check out Carl Boettiger’s R workflow, Caitlin Rivers’ Ebola data archive, and OKFN’s government data archive.
Some drawbacks to using GitHub to store your data include its lack of a solid preservation strategy and the fact that it doesn’t specialize in one kind of data, as some of the repositories we discussed in the previous chapter do, which makes it difficult to find data to reuse. Also, GitHub doesn’t work well for large data stores (often described as big data) where storage is in the hundreds of gigabytes.
Now that your code (and possibly also your data and a manuscript) is online, let’s make it easier to track its impacts.
A challenge for tracking the scholarly impact of software is the lack of persistent identifiers that are available for code. That’s why Mozilla Science, GitHub, Zenodo, and Figshare partnered to begin issuing DOIs for code repositories on GitHub, which are often included in citations in publications.
To create a DOI for your code using Zenodo, take a look at the previous Impact Challenge chapter about making your data discoverable. You’ll need to sign up for a Zenodo account and then connect it to a GitHub repository to mint a DOI.
Once you’ve minted DOIs for your repositories, put the DOIs into each of your repositories’ Readme files alongside a preferred citation. It’ll make it easier for others to cite your code in their papers and articles.
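As a rough illustration of what a preferred citation might look like, the Python sketch below assembles one from a few fields. The function name `format_citation`, the author names, the software title, and the DOI are all placeholders (the DOI uses Zenodo's `10.5281` prefix with a made-up suffix), and the layout is just one common style, not a requirement of GitHub or Zenodo.

```python
def format_citation(authors, year, title, version, doi):
    """Build a one-line software citation from its parts.

    The layout is one plausible convention; adapt it to whatever
    citation style your field prefers.
    """
    author_text = ", ".join(authors)
    return (
        f"{author_text} ({year}). {title} (Version {version}) "
        f"[Computer software]. https://doi.org/{doi}"
    )


# Placeholder values: replace them with your own project's details.
citation = format_citation(
    authors=["Doe, J.", "Roe, R."],
    year=2019,
    title="example-tool",
    version="1.0.0",
    doi="10.5281/zenodo.0000000",  # hypothetical DOI, not a real record
)
print(citation)
```

Pasting the resulting line near the top of the Readme gives readers something they can copy directly into a reference manager.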
Note that additional DOIs can be issued for code later in its lifecycle to account for updates and bug fixes. A DOI issued at the time of a paper submission lets others see exactly what state the code was in when the paper was written. This helps with reproducibility of results while still allowing for the continued development and improvement of your code.
Citations aren’t the only type of impact you can start to accrue when you make your code openly available on GitHub. GitHub has some good metrics that can tell you how your code is being reused and commented on – in real time. A few GitHub metrics to be aware of include:
Each of these metrics can tell a more nuanced story of the use of your code in your discipline than citations alone can.
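If you want to collect such numbers programmatically rather than reading them off the repository page, a sketch like the following may help. It assumes GitHub's public REST endpoint `GET /repos/{owner}/{repo}` and picks four counts from the response; which counts to track is an illustrative choice here, and the helper names `summarize_repo` and `fetch_repo_metrics` are hypothetical. Unauthenticated requests are rate-limited, so treat this as a starting point rather than a monitoring tool.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def summarize_repo(payload):
    """Pick a few reuse and interest counts out of a repository record.

    `payload` is the decoded JSON returned by GET /repos/{owner}/{repo}.
    Missing fields default to 0 so partial records do not raise errors.
    """
    return {
        "stars": payload.get("stargazers_count", 0),
        "forks": payload.get("forks_count", 0),
        # "subscribers_count" is the number of people watching the repo.
        "watchers": payload.get("subscribers_count", 0),
        # GitHub counts open pull requests among open issues here.
        "open_issues": payload.get("open_issues_count", 0),
    }


def fetch_repo_metrics(owner, repo):
    """Fetch one public repository's record and summarize it.

    Requires network access; not called automatically in this sketch.
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize_repo(json.load(response))


# Offline example with a hand-written record (the numbers are made up).
sample = {"stargazers_count": 12, "forks_count": 3,
          "subscribers_count": 4, "open_issues_count": 1}
print(summarize_repo(sample))
```

To query a real repository, you would call `fetch_repo_metrics("some-owner", "some-repo")` with the owner and repository name substituted in.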
Despite its popularity, GitHub has some limitations. GitHub can be frustrating, and learning Git can be too high a barrier to entry for some to overcome.
Git’s focus on plain text files and GitHub’s file size limitations are drawbacks for others. Moreover, the problems with GitHub’s search function make it difficult to search for code or rank by relevancy when searching code documentation. A good workaround for this is to just use a regular search engine like Google.
Finally, GitHub is a for-profit company owned by Microsoft. It reserves the right to delete your code and data at any time, for any reason, making the long-term storage of code a questionable proposition. Keep in mind, therefore, that Git is a version control system, not a backup system, though it is still an important part of any data management plan.
Learning the basics of Git will also allow you to use other cloud services, such as Bitbucket or GitLab.
First things first: read these excellent tutorials [1] [2] [3] [4] and practice using Git and GitHub. Once you’ve got your footing, it’s time to get your work online.
Deposit at least one of your best-known software projects, code snippets, or a piece of writing you want to share to a GitHub repository. Then (if it’s software) mint a DOI for it and add your preferred citation to the top of your Readme.md file.
Finally, get social! GitHub has social networking features, so try a few Google searches to see if you can find and follow others in your field. Bonus points for exploring their repositories to see if there’s any code you can borrow/fork for future projects.
This guide is based on the "30-Day Impact Challenge" by Stacy Konkiel and used here under a CC BY 4.0 International License and the OU Impact Challenge which is also licensed CC BY 4.0. Many thanks to those authors for creating and sharing these materials.