Getting People Started
A large part of my work is teaching R: for private clients, at Columbia Business School, at conferences, and while facilitating public workshops for others.
A common theme is that getting everyone set up on their individual computers is very difficult. No matter how many instructions I provide, a good number of people always arrive without a proper environment. This can mean not using RStudio projects, not having the right packages installed, not downloading the data, and sometimes not even having R installed.
After many experiments I finally came upon a solution. For every class I teach I now create a skeleton project hosted on GitHub with instructions for setup.
The instructions (in the README) consist of three blocks of code.
- Package installation
- Copying the project structure from the repo (no git required)
- Downloading data
All the user has to do is copy and paste these three blocks of code into the R console and they have the exact same environment as the instructor and other students.
```r
packages <- c('coefplot', 'rprojroot', 'tidyverse', 'usethis')
install.packages(packages)
```
```r
newProject <- usethis::use_course('https://github.com/jaredlander/WorkshopExampleRepo/archive/master.zip')
```
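The data-download block from the README is not reproduced here. Purely as an illustration, here is a minimal sketch of what such a step could look like. It assumes the data are described by a data.frame with the Local, Remote and Mode columns described later in this post and saved into a `data` folder inside the project. The function name `downloadCourseData()` and the `data` folder are hypothetical and are not part of RepoGenerator or usethis. The sketch relies only on base R's `download.file()`, whose `mode` argument is the reason binary files need `'wb'`.

```r
# Hypothetical helper (not part of RepoGenerator or usethis): download every
# file listed in a data.frame with Local, Remote and Mode columns
downloadCourseData <- function(files, destDir='data') {
    # create the destination folder if it does not exist yet
    dir.create(destDir, showWarnings=FALSE, recursive=TRUE)
    # as.character() guards against factor columns
    destinations <- file.path(destDir, as.character(files$Local))
    for(i in seq_len(nrow(files))) {
        # mode='wb' keeps binary files such as Excel or rds intact on Windows
        download.file(
            url=as.character(files$Remote[i]),
            destfile=destinations[i],
            mode=as.character(files$Mode[i]),
            quiet=TRUE
        )
    }
    # return the paths of the downloaded files without printing them
    invisible(destinations)
}

# Example usage with a placeholder URL (replace with real data locations)
# exampleFiles <- data.frame(
#     Local='example.csv',
#     Remote='https://example.com/example.csv',
#     Mode='w',
#     stringsAsFactors=FALSE
# )
# downloadCourseData(exampleFiles)
```

Writing each file with its own mode means a single loop can handle a mix of CSV and rds/Excel files without corrupting the binary ones.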
Using this process, 95% of my students are prepared for class.
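For students who still run into trouble, a quick way to see what is missing is to test each package with base R's `requireNamespace()`, which returns `FALSE` instead of throwing an error when a package is not installed. The helper name `missingPackages()` is hypothetical; the `packages` vector is the one from the README block above.

```r
# Hypothetical helper: return the packages from a vector that cannot be loaded
missingPackages <- function(packages) {
    # requireNamespace() returns TRUE/FALSE rather than stopping with an error
    installed <- vapply(packages, requireNamespace, logical(1), quietly=TRUE)
    packages[!installed]
}

packages <- c('coefplot', 'rprojroot', 'tidyverse', 'usethis')
stillNeeded <- missingPackages(packages)
if(length(stillNeeded) > 0) {
    # report what is missing; install.packages(stillNeeded) would fix it
    message('Still missing: ', paste(stillNeeded, collapse=', '))
}
```

Running `install.packages(stillNeeded)` afterwards reinstalls only what is actually missing, which is quicker than repeating the full install.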
The inspiration for this idea came from a fun coffee with Hadley Wickham and Jenny Bryan during a conference in New Zealand, and the implementation is made possible thanks to the usethis package.
Automating the Setup
Now that I had found a good way to get students started, I wanted to make it easier for myself to set up the repo. So I created an R package called RepoGenerator and put it on CRAN.
The first step to using the package is to create a GitHub Personal Access Token (instructions are in the README). Then you build a data.frame listing the datasets you want the students to download. The data.frame needs at least the following three columns:
- Local: The name (not the path) the file should have on disk
- Remote: The URL where the data file is stored online
- Mode: The mode needed to write the file to disk, 'w' for regular text files and 'wb' for binary files such as Excel or rds files
An example data.frame, datafiles, is available in the RepoGenerator package.
```r
data(datafiles, package='RepoGenerator')
datafiles[1:6, c('Local', 'Remote', 'Mode')]
```
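Before passing your own data.frame to the package, it can help to check it against the column rules above. The sketch below is a hypothetical `checkDatafiles()` helper, not part of RepoGenerator. It only verifies that the three required columns exist, that Mode is `'w'` or `'wb'`, and that Local holds a bare file name rather than a path. The file names and `example.com` URLs are placeholders.

```r
# Hypothetical validator for a data.frame describing downloadable data files
checkDatafiles <- function(files) {
    required <- c('Local', 'Remote', 'Mode')
    missingCols <- setdiff(required, names(files))
    if(length(missingCols) > 0) {
        stop('Missing columns: ', paste(missingCols, collapse=', '))
    }
    # 'w' for text files, 'wb' for binary files such as Excel or rds
    badMode <- !(as.character(files$Mode) %in% c('w', 'wb'))
    if(any(badMode)) {
        stop('Mode must be "w" or "wb" in row(s): ', paste(which(badMode), collapse=', '))
    }
    # Local must be a name only, so reject forward or back slashes
    hasPath <- grepl('[/\\\\]', as.character(files$Local))
    if(any(hasPath)) {
        stop('Local must be a file name, not a path, in row(s): ', paste(which(hasPath), collapse=', '))
    }
    invisible(TRUE)
}

# Placeholder example: one text file and one binary file
exampleFiles <- data.frame(
    Local=c('flights.csv', 'model.rds'),
    Remote=c('https://example.com/flights.csv', 'https://example.com/model.rds'),
    Mode=c('w', 'wb'),
    stringsAsFactors=FALSE
)
checkDatafiles(exampleFiles)
```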
After that, you define the packages you want your students to use. There can be as few or as many as you want. In addition to any packages you list, a few packages, including usethis, are added so that the instructions in the new repo are certain to work.
```r
packages <- c(
    'caret', 'coefplot', 'DBI', 'dbplyr', 'doParallel', 'dygraphs',
    'foreach', 'ggthemes', 'glmnet', 'jsonlite', 'leaflet', 'odbc',
    'recipes', 'rmarkdown', 'rprojroot', 'RSQLite', 'rvest', 'tidyverse',
    'threejs', 'usethis', 'UsingR', 'xgboost', 'XML', 'xml2'
)
```
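To illustrate how a package list like this can end up as the first block of the generated README, here is a rough sketch that turns a character vector into the install code students paste. `renderInstallBlock()` is hypothetical and is not RepoGenerator's actual implementation. Its `required` default of `'usethis'` is an assumption based on the note above about packages that are always added.

```r
# Hypothetical: build the package-installation text for a README
renderInstallBlock <- function(packages, required=c('usethis')) {
    # always include the required packages, dropping duplicates
    allPackages <- unique(c(packages, required))
    quoted <- paste0("'", allPackages, "'", collapse=', ')
    paste0('packages <- c(', quoted, ')\n', 'install.packages(packages)')
}

cat(renderInstallBlock(c('coefplot', 'tidyverse')))
# packages <- c('coefplot', 'tidyverse', 'usethis')
# install.packages(packages)
```

Using `unique()` means listing usethis yourself does no harm, since it only appears once in the generated code.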
Now all you need to do is call the createRepo() function.
```r
library(RepoGenerator)

createRepo(
    # the name to use for the repo and project
    name='WorkshopExampleRepo',
    # the location on disk to build the project
    path='~/WorkshopExampleRepo',
    # the data.frame listing data files for the user to download
    data=datafiles,
    # vector of packages the user should install
    packages=packages,
    # the GitHub username to create the repo for
    user='jaredlander',
    # the new repo's README has the name of who is organizing the class
    organizer='Lander Analytics',
    # the name of the environment variable storing the GitHub Personal Access Token
    token='MyGitHubPATEnvVar'
)
```
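The token argument is the name of an environment variable, not the token itself. One common way to provide the variable (an assumption about your setup, not something RepoGenerator mandates) is a line such as `MyGitHubPATEnvVar=your_token` in your `~/.Renviron` file, which R reads at startup. The hypothetical `checkTokenVar()` below confirms the variable has a value before you call `createRepo()`. The dummy value set here is for demonstration in the current session only.

```r
# Hypothetical pre-flight check: make sure the named environment variable holds a value
checkTokenVar <- function(varName) {
    # Sys.getenv() returns "" when the variable is not set
    value <- Sys.getenv(varName)
    if(!nzchar(value)) {
        stop('Environment variable ', varName, ' is not set; add it to ~/.Renviron and restart R.')
    }
    invisible(TRUE)
}

# Demonstration only: set a dummy value for this session
Sys.setenv(MyGitHubPATEnvVar='not-a-real-token')
checkTokenVar('MyGitHubPATEnvVar')
```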
After this you will have a new repo set up for your users to copy, complete with instructions.
Reducing setup issues at the start of a training can really improve the experience for everyone and allow you to get straight into teaching.
Please check it out and let me know how it works for you.
Jared Lander is the Chief Data Scientist of Lander Analytics, a New York data science firm, Adjunct Professor at Columbia University, Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences, and author of R for Everyone.