Less than a month ago, Drew Conway suggested that our R user group present an analysis of the WikiLeaks data.  In that short time he, Mike Dewar, John Myles White and Harlan Harris have put together a beautiful visualization of attacks in Afghanistan.  The static image you see here has since been animated which is a really nice touch.

Within a few hours of them posting their initial results the work spread across the internet, even getting written up in Wired’s Danger Room.  Today, they got picked up by the New York Times where you can see the animation.

The bulk of the work was, of course, done in R.  I remember talking with them about how they were going to scrape the data from the WikiLeaks documents, but I am not certain how they did it in the end.  As is natural for these guys they made their code available on GitHubso you can recreate their results, after you’ve downloaded the data yourself from WikiLeaks.

Briefly looking at their code I can see they used Hadley Wickham’s ggplot and plyr packages (which are almost standard for most R users) as well as R’s mapping packages.  If you want to learn more about how they did this fantastic job come to the next R Meetup where they will present their findings.

Related Posts



Jared Lander is the Chief Data Scientist of Lander Analytics a New York data science firm, Adjunct Professor at Columbia University, Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences and author of R for Everyone.

Today, Google announced two new services that are sure to be loved by data geeks.  First is their BigQuery which lets you analyze “Terabytes of data, trillions of records.”  This is great for people with large datasets.  I wonder if a program like R(my favorite statistical analysis package) can read it?  If so would R just pull down the data like it would from any other database?  That would most likely result in a data.frame that is far too large for a standard computer to handle.  Maybe R can be ran in a way that it hits the BigQuery service and leaves the data in there.  Maybe even the processing can be done on Google’s end, allowing for much better computation time.  This is something I’ve been dreaming of for a while now.

Further, can BigQuery produce graphics?  If so, this might be a real shot at Business Intelligence tools like QlikView or Cognosthat specialize in handling LARGE datasets. Continue reading

Related Posts



Jared Lander is the Chief Data Scientist of Lander Analytics a New York data science firm, Adjunct Professor at Columbia University, Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences and author of R for Everyone.

Thanks to Drew Conway for posting this video of me dicussing my thesis (pdf) on NYC pizza.  It was part of the New York R User Meetup on Applications of R.

Related Posts



Jared Lander is the Chief Data Scientist of Lander Analytics a New York data science firm, Adjunct Professor at Columbia University, Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences and author of R for Everyone.

Drew Conway has a piece on his Zero Intelligence Agents blog about how well informed Tea Party protesters are about tax policy.  His analysis is pretty technical and he even offers up the R code he used to analyze the data and build the graphs which were made with a package called ggplot2 by Hadley Wickham at Rice University.

More after the break. Continue reading

Related Posts



Jared Lander is the Chief Data Scientist of Lander Analytics a New York data science firm, Adjunct Professor at Columbia University, Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences and author of R for Everyone.

I don’t mean to shamelessly self-promote here, but I wanted to note that the Slice story on my pizza paper (pdf) has also been picked up by NBC New York’s food blog, Feast, and by Revolution Computing’s blog.  For people who don’t know, Revolution Computing optimizes R, the language used by a large number of statisticians for computations.

Related Posts



Jared Lander is the Chief Data Scientist of Lander Analytics a New York data science firm, Adjunct Professor at Columbia University, Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences and author of R for Everyone.