Shortly after I learned LaTeX I used it to write my resume (or CV if you will), freeing me from the headache of using Microsoft Word and the associated formatting troubles. Even that wasn’t enough though because different audiences needed different information and job listings. I could have stored all the information in the file and commented out bullet points I did not want to use, but that seemed sloppy. So instead I wrote an R package called resumer.

The trick is to store all of the data in a CSV, one row per bullet point.1

JobName Company Location Title Start End Bullet BulletName Type Description
Tech Startup Pied Piper New York, NY CTO 2013 Present Set up company’s computing platform 1 Job NA
Tech Startup Pied Piper New York, NY CTO 2013 Present Designed data strategy overseeing many datasources 2 Job NA
Tech Startup Pied Piper New York, NY CTO 2013 Present Constructed statistical models for predictive analytics of big data 3 Job NA
Large Bank Goliath National Bank New York, NY Quant 2011 2013 Built quantitative models for derivatives trades 1 Job NA
Large Bank Goliath National Bank New York, NY Quant 2011 2013 Wrote algorithms using the R statistical programming language 2 Job NA
Bank Intern Goliath National Bank New York, NY Intern 2010 NA Got coffee for senior staff 1 Job NA

Each row represents a detail about a job. So a job may take multiple rows.

The columns are:

  • JobName: Name identifying this job. This is identifying information used when selecting which jobs to display.
  • Company: Name of company.
  • Location: Physical location of job.
  • Title: Title held at job.
  • Start: Start date of job, usually represented by a year.
  • End: End date of job. This would ordinarily by a year, ‘Present’ or blank.
  • Bullet: The detail about the job.
  • BulletName: Identifier for this detail, used when selecting which details to display.
  • Type: Should be either Job or Research.
  • Description: Used for a quick blurb about research roles.

There are many parts to using this package which are all explained in the README and mostly reproduced here.

The yaml header holds your name, address, the location of the jobs CSV file, education information and any highlights. Remember, proper indenting is required for yaml.

The name and address fields are self explanatory. output takes the form of package::function which for this package is resumer::resumer.

The location of the jobs CSV is specified in the JobFile slot of the params entry. This should be the absolute path to the CSV.

These would look like this.

---
name: "Generic Name"
address: "New York"
output: resumer::resumer
params:
    JobFile: "examples/jobs.csv"
---

Supplying education information is done as a list in the education entry, with each school containing slots for school, dates and optionally notes. Each slot of the list is started with a -. The notes slot starts with a | and each line (except the last line) must end with two spaces.

For example:

---
education:
-   school: "Hudson University"
    dates: "2007--2009"
    notes: |
        GPA 3.955  
        Master of Arts in Statistics
-   school: "Smallville College"
    dates: "2000--2004"
    notes: |
        Cumulative GPA 3.838 Summa Cum Laude, Honors in Mathematics  
        Bachelor of Science in Mathematics, Journalism Minor  
        The Wayne Award for Excellence in Mathematics  
        Member of Pi Mu Epsilon, a national honorary mathematics society
---

To provide a highlights section set doHighlights: yes and create a highlights tag.

Each bullet in the highlights entry should be a list slot started by -. For example.

---
doHighlights: yes
highlights:
-   bullet: Author of \emph{Pulitzer Prize} winning article
-   bullet: Organizer of \textbf{Glasses and Cowl} Meetup
-   bullet: Analyzed global survey by the \textbf{Surveyors Inc}
-   bullet: Professor of Journalism at \textbf{Hudson University}
-   bullet: Thesis on \textbf{Facial Recognition Errors}
-   bullet: Served as reporter in \textbf{Vientiane, Laos}
---

Jobs and details are selected for display by building a list of lists named jobList. Each inner list represents a job and should have three unnamed elements: – CompanyNameJobName – Vector of BulletNames

An example is:

jobList <- list(
    list("Pied Piper", "Tech Startup", c(1, 3)),
    list("Goliath National Bank", "Large Bank", 1:2),
    list("Goliath National Bank", "Bank Intern", 1:3),
    list("Surveyors Inc", "Survery Stats", 1:2),
    list("Daily Planet", "Reporting", 2:4),
    list("Hudson University", "Professor", c(1, 3:4)),
    list("Hooli", "Coding Intern", c(1:3))
)

Research is specified similarly in researchList.

# generate a list of lists of research that list the company name, job name and bullet
researchList <- list(
    list("Hudson University", "Oddie Research", 4:5),
    list("Daily Planet", "Winning Article", 2)
)

The job file is read into the jobs variable using read.csv2.

library(resumer)
jobs <- read.csv2(params$JobFile, header=TRUE, sep=',', stringsAsFactors=FALSE)

The jobs and details are written to LaTeX using a code chunk with results='asis'.

Same with research details.

Regular LaTeX code can be used, such as in specifying an athletics section. Note that this uses a special rSection environment.

\begin{rSection}{Athletics}
\textbf{Ice Hockey} \emph{Goaltender} | \textbf{Hudson University} | 2000--2004 \\
\textbf{Curling} \emph{Vice Skip} | \textbf{Hudson University} | 2000--2004
\end{rSection}

A complete template is available when creating a new file in RStudio.

Any suggestions or, even better, pull requests are welcome at the GitHub page.


  1. A helper function, createJobFile, creates a CSV with the correct headers.
Posted in R.

Snowstorm Stella impacted both our numbers and our location, but last night a smaller crew braved the cold weather and messy streets to celebrate Pi Day with pizza and Pi Cake at Ribalta.

We naturally ate a lot of round pies and even a rectangular pie to honor Hippocrates’ squaring the lune.

This year’s Pi Cake came from Empire Cakes for the third year in a row.  It was their Brooklyn Blackout cake with Chocolate frosting, a blue Pi symbol on top and blue circles with red radii around the sides.

Some pictures from last night:

IMG_20170314_224825_430 IMG_20170314_225301_523 IMG_1967 IMG_20170314_201119 IMG_20170314_205344

And all the years’ Pi Cakes:









I’m speaking in a few places over the next few weeks, so rather than just giving people a day’s notice I figured I should lay it out a bit. Right now I have three public talks lined up with a few more about to solidify. Soon I will update this map to have past talks too.


Talk Event City Date
Modeling and Machine Learning in R ODSC San Francisco 2017-03-01
Scraping and Analyzing NFL Data Sloan Sports Analytics Conference Boston 2017-03-03
Fun with R New York R Conference New York 2017-04-21


Highlights from the 2016 New York R Conference

Originally posted on www.work-bench.com.

image

You might be asking yourself, “How was the 2016 New York R Conference?”

Well, if we had to sum it up in one picture, it would look a lot like this (thank you to Drew Conway for the slide & delivering the battle cry for data science in NYC):

image

Our 2nd annual, sold-out New York R Conference was back this year on April 8th & 9th at Work-Bench. Co-hosted with our friends at Lander Analytics, this year’s conference was bigger and better than ever, with over 250 attendees, and speakers from Airbnb, AT&T, Columbia University, eBay, Etsy, RStudio, Socure, and Tamr. In case you missed the conference or want to relive the excitement, all of the talks and slides are now live on the R Conference website.

With 30 talks, each 20 minutes long and two forty-minute keynotes, the topics of the presentations were just as diverse as the speakers. Vivian Peng gave an emotional talk on data visualization using non-visual senses and “The Feels.” Bryan Lewis measured the shadows of audience members to demonstrate the pros and cons of projection methods, and Daniel Lee talked about life, love, Stan, and March Madness. But, even with 32 presentations from a diverse selection of speakers, two dominant themes emerged: 1) Community and 2) Writing better code.

Given the amazing caliber of speakers and attendees, community was on everyone’s mind from the start. Drew Conway emoted the past, present, and future of data science in NYC, and spoke to the dangers of tearing down the tent we built. Joe Rickert from Microsoft discussed the R Consortium and how to become involved. Wes McKinney talked about community efforts in improving interoperability between data science languages with the new Feather data frame file format under the Apache Arrow project. Elena Grewal discussed how Airbnb’s data science team made changes to the hiring process to increase the number of female hires, and Andrew Gelman even talked about how your political opinions are shaped by those around you in his talk about Social Penumbras.

Writing better code also proved to be a dominant theme throughout the two day conference. Dan Chen of Lander Analytics talked about implementing tests in R. Similarly, Neal Richardson and Mike Malecki of Crunch.io talked about how they learned to stop munging and love tests, and Ben Lerner discussed how to optimize Python code using profilers and Cython. The perfect intersection of themes came from Bas van Schaik of Semmle who discussed how to use data science to write better code by treating code as data. While everyone had some amazing insights, these were our top five highlights:

JJ Allaire Releases a New Preview of RStudio

image

JJ Allaire, the second speaker of the conference, got the crowd fired up by announcing new features of RStudio and new packages. Particularly exciting was bookdown for authoring large documents, R Notebooks for interactive Markdown files and shared sessions so multiple people can code together from separate computers.

Andrew Gelman Discusses the Political Impact of the Social Penumbra

image

As always, Dr. Andrew Gelman wowed the crowd with his breakdown of how political opinions are shaped by those around us. He utilized his trademark visualizations and wit to convey the findings of complex models.

Vivian Peng Helps Kick off the Second Day with a Punch to the Gut

image

On the morning of the second day of the conference, Vivian Peng gave a heartfelt talk on using data visualization and non-visual senses to drive emotional reaction and shape public opinion on everything from the Syrian civil war to drug resistance statistics.

Ivor Cribben Studies Brain Activity with Time Varying Networks

image

University of Alberta Professor Ivor Cribben demonstrated his techniques for analyzing fMRI data. His use of network graphs, time series and extremograms brought an academic rigor to the conference.

Elena Grewal Talks About Scaling Data Science at Airbnb

image

After a jam-packed 2 full days, Elena Grewal helped wind down the conference with a thoughtful introspection on how Airbnb has grown their data science team from 5 to 70 people, with a focus on increasing diversity and eliminating bias in the hiring process.

See the full conference videos & presentations below, and sign up for updates for the 2017 New York R Conference on www.rstats.nyc. To get your R fix in the meantime, follow @nyhackr, @Work_Bench, and @rstatsnyc on Twitter, and check out the New York Open Programming Statistical Meetup or one of Work-Bench’s upcoming events!

Ohio State Buckeyes defensive lineman Joey Bosa

Been a busy few weeks with the New York R Conference, speaking engagements, writing the second edition of R for Everyone and coding open source packages.  The most exciting news involves the news as the Wall Street Journal wrote an article about my NFL Draft work.

It is a great piece with some nice quotes from the Vikings General Manager Rick Spielman and ESPN’s legendary John Clayton that succinctly sums up the work I did and runs the numbers on a few select players.

So now I’ve been in the news for pizza, the lottery and football.  Fun mix.

MIT Sports Analytics Conference

Last year, as I embarked on my NFL sports statistics work, I attended the Sloan Sports Analytics Conference for the first time. A year later, after a very successful draft, I was invited to present an R workshop to the conference.

My time slot was up against Nate Silver so I didn’t expect many people to attend.    Much to my surprise when I entered the room every seat was taken, people were lining the walls and sitting in the aisles.

My presentation, which was unrelated to the work I did, analyzed the Giants’ probability of passing versus rushing and the probability of which receiver was targeted.  It is available at the talks section of my site.

After the talk I spent the rest of the day fielding questions and gave away copies of R for Everyone and an NYC Data Mafia shirt.

Last night we celebrated Rounded Pi Day by rounding at the 10,000’s digit to get 3.1416 which nicely works with the date 3/14/16.  This was great after Mega Pi Day worked out so perfectly last year.  And this all built upon previous years’ celebrations.

We ate a large quantity of pizza at Lombardi’s. and for the second year in a row we got the Pi Cake from Empire Cakes with peanut butter and chocolate flavors.  The base was inscribed with historic approximations of Pi:  25/8, 256/81, 339/108, 223/71, 377/120, 3927/1250, 355/113, 62832/20000, 22/7.

Some pictures from the fantastic night:

IMG_20160314_193523 IMG_20160314_203411 IMG_20160314_203443

Previous year’s Pi Cakes:

BDA3

Earlier this week, my company, Lander Analytics, organized our first public Bayesian short course, taught by Andrew Gelman, Bob Carpenter and Daniel Lee.  Needless to say the class sold out very quickly and left a long wait list.  So we will schedule another public training (exactly when tbd) and will make the same course available for private training.

This was the first time we utilized three instructors (as opposed to a main instructor and assistants which we often use for large classes) and it led to an amazing dynamic.  Bob laid the theoretical foundation for Markov chain Monte Carlo (MCMC), explaining both with math and geometry, and discussed the computational considerations of performing simulation draws.  Daniel led the participants through hands-on examples with Stan, covering everything from how to describe a model, to efficient computation to debugging.  Andrew gave his usual, crowd dazzling performance use previous work as case studies of when and how to use Bayesian methods.

It was an intensive three days of training with an incredible amount of information.  Everyone walked away knowing a lot more about Bayes, MCMC and Stan and eager to try out their new skills, and an autographed copy of Andrew’s book, BDA3.

A big help, as always was Daniel Chen who put in so much effort making the class run smoothly from securing the space, physically moving furniture and running all the technology.


I R NY
On April 24th and 25th Lander Analytics and Work-Bench coorganized the (sold-out) inaugural New York R Conference. It was an amazing weekend of nerding out over R and data, with a little Python and Julia mixed in for good measure. People from all across the R community gathered to see rockstars discuss their latest and greatest efforts.

Highlights include:


BryanLewis
Bryan Lewis wowing the crowd (there were literally gasps) with rthreejs implemented with htmlwidgets.


HilaryParker
Hilary Parker receiving spontaneous applause in the middle of her talk about reproducible research at Etsy for her explainr, catsplainr and mansplainr packages.


JamesPowell
James Powell speaking flawless Mandarin in a talk tangentially about Python.


VivianPeng
Vivian Peng also receiving spontaneous applause for her discussion of storytelling with data.


WesMcKinney
Wes McKinney showing love for data.frames in all languages and sporting an awesome R t-shirt.


DanChen
Dan Chen using Shiny to study Ebola data.


AndyGelman
Andrew Gelman blowing away everyone with his keynote about Bayesian methods with particular applications in politics.

Videos of the talks are available at http://www.rstats.nyc/#speakers with slides being added frequently.

A big thank you to sponsors RStudio, Revolution Analytics, DataKind, Pearson, Brewla Bars and Twillory.


Next year’s conference is already being planned for April. To inquire about sponsoring or speaking please get in touch.

Pi Cake 2015
This year we celebrated Mega Pi Day with the date (3/14/15) covering the first four digits of Pi. And of course, we unveiled the Pi Cake at 9:26 to get the next three digits.  This year the cake came from Empire Cakes and was peanut butter flavored.  We even had the bakery put as many digits as would fit around the cake.

A large group from the NYC Data Mafia came out and Scott Wiener of Scott’s Pizza Tours ensured we had the perfect assortment and quantity of pizza.

 

A look at Pi Cakes from previous years: