Continuing my quest to master R, I learnt about using GitHub as a way to manage all the files and their different drafts or ‘versions’. I will say that GitHub is only one of many options but it is very popular.
I used to think that version control and GitHub were something that fancy developers used to manage their coding projects, and I had no idea that you could link them to your projects in RStudio. In fact, you can manage website pages and upload any file – even a Word doc!
Here’s a quick rundown of some keywords:
Repository (also called a ‘repo’, if you want to be cool): basically think of this as a master hard drive or folder where you want everything in your project to be stored.
Master: within the repo, the master is where the main or current version of the files is kept. It’s the source of all the files in the repo. Think of it like the trunk of a tree and everything branches off from it. (New repositories on GitHub now call this branch ‘main’ by default, but the idea is the same.)
Branch: these are… well, branches or different versions of the master. It’s useful when you have a team working together on different sections of a file or if you want to be able to go back to past versions of a file. Creating your own branch lets you make the edits to the file without worrying about deleting or overwriting changes that have been made.
Merging: if you have multiple branches and have made changes, then it’s important to try and merge these back into the master to create the update. GitHub has a handy feature that compares the original file with the one being uploaded to show you the changes you’re making.
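To make the branch and merge idea a bit more concrete, here is a minimal sketch of how it might look from the command line. Everything in it is made up for illustration: the throwaway folder, the repo name demo_repo, the file notes.txt, the branch name my-edits and the placeholder username and email. It runs entirely on your own computer and doesn't touch GitHub.

```shell
# Work in a throwaway folder so nothing real is touched
cd "$(mktemp -d)"
git init -q demo_repo
cd demo_repo

# Placeholder identity, stored only inside this demo repo
git config user.name "Your Username"
git config user.email "firstname.lastname@example.org"

# First commit on the trunk
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
git branch -M main   # call the trunk "main" (older setups call it "master")

# Create a branch and make an edit there
git checkout -q -b my-edits
echo "a new idea" >> notes.txt
git commit -q -am "Add a new idea"

# Switch back to the trunk and merge the branch into it
git checkout -q main
git merge -q my-edits

# The trunk now contains both lines
cat notes.txt
```

Until the merge, the edit only exists on my-edits, which is why a branch is a safe place to experiment.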
Getting started with GitHub
Creating a new repository
First you need to create a GitHub account and log into it. You can create a new repo by clicking the + at the top right corner of any page or clicking the green ‘new’ button that has a little book logo (see figure below).
From there you’ll enter the details of your new repo (name, description, default settings and visibility) and create it. The page should look something like this:
Repositories for RStudio
Before creating a repo for RStudio, you need to install Git – the system on which GitHub is built (it’s free and open source!). Follow the installation instructions for your operating system with the default settings (for Windows, it might give you a security warning but click “Run” or “Allow” to continue).
After it’s installed, it needs to be connected to GitHub, so we need to tell Git your GitHub account details (email and username) so it knows who is uploading or making changes. Launch the Git command prompt and enter the following code:
git config --global user.name "Your Username"
git config --global user.email firstname.lastname@example.org
To confirm these changes have been made, type the following code and look for the user.name and user.email lines in the output (they’re usually near the end) to check the assigned username and email.
git config --list
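If the full list is too long to scan, you can also ask Git for just the two values. This is a small sketch rather than a required step. It points HOME at a throwaway folder (my assumption, so that trying it out won't overwrite your real settings) and sets the same placeholder username and email as above before reading them back.

```shell
# Use a throwaway HOME so this sketch does not touch your real Git settings.
# Leave this line out to check your actual configuration.
export HOME="$(mktemp -d)"

# Same placeholder details as above
git config --global user.name "Your Username"
git config --global user.email "firstname.lastname@example.org"

# Read back just the two values instead of the whole list
name="$(git config --global --get user.name)"
email="$(git config --global --get user.email)"
echo "Git will sign your changes as: $name <$email>"
```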
Link RStudio with Git
In RStudio, go to Tools > Global Options > Git/SVN and check that “Git executable” points to the correct folder where Git is located.
In this same window, click the “Create RSA Key” button and close it. Navigate back to the window and then click “View public key” – copy this key into your GitHub account by heading to your Account Settings in GitHub (click your profile in the top right corner then “Settings”). Go to the “SSH and GPG keys” tab > “New SSH key” > paste the copied public key and give it a name that you’d understand (e.g. RStudio) > Add.
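For the curious, a key pair like this can also be made from the command line with ssh-keygen, which comes with OpenSSH. I'm assuming the RStudio button does something along these lines, but I haven't checked what it runs behind the scenes. The throwaway folder and the “RStudio” label below are just for illustration, so the sketch won't overwrite any key you already have.

```shell
# Put the demo key in a throwaway folder, not in your real ~/.ssh
KEY_DIR="$(mktemp -d)"

# Create an RSA key pair with an empty passphrase and a label of "RStudio"
ssh-keygen -q -t rsa -N "" -C "RStudio" -f "$KEY_DIR/id_rsa"

# The .pub file is the public half: this is the text you would paste into GitHub
cat "$KEY_DIR/id_rsa.pub"
```

Only the public key (the .pub file) ever goes to GitHub; the other file is the private key and stays on your computer. Once a real key has been added to your account, typing ssh -T git@github.com in the terminal is a quick way to check that GitHub recognises you.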
Repositories and RStudio
There are a few ways to link RStudio with a GitHub repository.
New project and repo
- Create a new GitHub Repo (following the instructions above)
- Copy the URL of the new repo
- In RStudio, create a new project (File > New Project). Then in the window select “Version Control” > Git > paste the copied URL, set a name, and choose where to keep the project > “Create project”
Linking an existing project to a repo
- Go to Git Bash or the Terminal and go to the directory of your R project by typing the line below (the quotes are needed because of the space in the folder name)
cd ~/"main folder"/R_project_folder
- Then turn the folder into a Git repository with
git init
- Add all the files in the directory to it with
git add .
- Commit this change with
git commit -m "Initial commit"
- Go to GitHub and now create a new repository with the EXACT same name as the R project, making sure not to check any of the initialise options (because you’re importing an existing repo that you just made)
- Choose to “push an existing repository from the command line” and copy that code into Git Bash or the Terminal. Check it’s all linked by refreshing the page; you should see the dashboard of your new repository with your files in it
- Restart the project in RStudio and you should now see the “Git” tab in the upper right quadrant.
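Put together, the steps above might look like the sketch below. Since I can't show a real GitHub repo here, an empty “bare” repository in a temporary folder stands in for GitHub. With a real repo, REMOTE_URL would be the address GitHub shows you. The folder name, the file analysis.R and the username and email are placeholders I made up for the example.

```shell
# Stand-in for GitHub: an empty "bare" repository in a temporary folder
WORK="$(mktemp -d)"
REMOTE_URL="$WORK/R_project_folder.git"
git init -q --bare "$REMOTE_URL"

# Stand-in for an existing R project folder with one script in it
mkdir "$WORK/R_project_folder"
cd "$WORK/R_project_folder"
echo 'print("hello")' > analysis.R

# Turn the folder into a Git repository and make the first commit
git init -q
git config user.name "Your Username"                    # placeholder identity
git config user.email "firstname.lastname@example.org"  # placeholder identity
git add .
git commit -q -m "Initial commit"

# Connect the local repository to the remote one and push
git remote add origin "$REMOTE_URL"
git branch -M main
git push -q -u origin main

# List the branches that now exist on the remote
git ls-remote --heads origin
```

The last three commands before the check are the same shape as the ones GitHub offers under “push an existing repository from the command line”, just with the stand-in address.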
Forking a repository
Now we know how to create a new repo, but what if you wanted to use files that someone has shared on their account? This is where forking is really handy.
Forking lets you create a copy of someone else’s repository that you can edit and change under your own account. To create a fork, you’d need to go to the repo you want to copy and click “Fork” in the toolbar in the top right corner (see figure below). This will automatically create and direct you to a copy under your account.
Clone to RStudio
Clone the repository (creates a local copy so that you can edit files offline on your computer) by clicking the green “code” button on the top right and copying the link (see image below). You’ll then need to create a new RStudio project (New Project > Version Control > Git) and use the copied URL.
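Behind the scenes, the RStudio project wizard is doing a git clone. Here is a rough command-line sketch of the same thing. A small local repository (their_repo, invented for this example) stands in for the fork on GitHub. With a real fork you would paste the copied URL in place of the local path. The file name and identity are placeholders too.

```shell
WORK="$(mktemp -d)"

# Stand-in for the forked repository that would normally live on GitHub
git init -q "$WORK/their_repo"
cd "$WORK/their_repo"
git config user.name "Your Username"                    # placeholder identity
git config user.email "firstname.lastname@example.org"  # placeholder identity
echo "shared analysis" > README.md
git add README.md
git commit -q -m "Initial commit"

# Clone it into a new folder called my_copy
cd "$WORK"
git clone -q "$WORK/their_repo" my_copy
cd my_copy

# The clone remembers where it came from under the name "origin"
git remote -v
ls
```

That remembered origin address is what lets RStudio push your later changes back to the right place.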
Pushing and committing
After you’ve made your changes, you then need to update your repo by staging, committing and pushing them.
Create or change a file and, when you’re ready to upload, go to the “Git” tab in the environment quadrant (usually top right). Click the checkbox next to the file that you want to upload (the column above the checkbox says “Staged”) and press “Commit”. This will open a new window showing you all the changes that have been made compared to the previous version (green = added; red = removed). Add a commit message in the top right section to outline the changes, then click “Commit” and “Push” to push the changes through to GitHub.
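The buttons in the Git tab map onto a handful of Git commands, so here is a sketch of the same stage, commit and push cycle in the terminal. As before, a bare repository in a temporary folder stands in for GitHub, and the file analysis.R, the commit messages and the identity are invented for the example.

```shell
# --- Setup: a stand-in remote and a project that is already linked to it ---
WORK="$(mktemp -d)"
git init -q --bare "$WORK/remote.git"
git init -q "$WORK/project"
cd "$WORK/project"
git config user.name "Your Username"                    # placeholder identity
git config user.email "firstname.lastname@example.org"  # placeholder identity
git remote add origin "$WORK/remote.git"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "Initial commit"
git branch -M main
git push -q -u origin main

# --- The everyday cycle ---
echo "y <- 2" >> analysis.R         # change a file
git status --short                  # see which files changed (the list in the Git tab)
git add analysis.R                  # stage it (ticking the checkbox)
git diff --staged                   # review the change (the green and red lines)
git commit -q -m "Add y variable"   # commit with a message
git push -q                         # push the commit to the remote
```

After the push, the remote has both commits and git status reports nothing left to upload.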
The final commit (conclusion)
There are so many useful ways we can use GitHub and other version control systems to manage all our files.