Today, Google announced two new services that are sure to be loved by data geeks. First is BigQuery, which lets you analyze “Terabytes of data, trillions of records.” This is great for people with large datasets. I wonder if a program like R (my favorite statistical analysis package) can read it? If so, would R just pull down the data like it would from any other database? That would most likely result in a data.frame that is far too large for a standard computer to handle. Maybe R can be run in a way that it hits the BigQuery service and leaves the data there. Maybe even the processing can be done on Google’s end, allowing for much better computation time. This is something I’ve been dreaming of for a while now.
The other day, I was working near Houston Street, teaching a class on QlikView (which itself could be a great post topic about data munging for statisticians). On the last day of the class we decided to head to Bleecker Street for a pizza feast.
I, and a lot of people, love his column. However, last week’s piece on probability was not received so well by the statistics community, particularly on Andy Gelman’s blog and Junk Charts’ sister blog, Numbers Rule Your World.
Pizza Girl, a pizza delivery girl who is a regular contributor on Slice, tallied up and analyzed the time she spends on various duties in her pizzeria. This is just the first part in a series, but so far she determined that she spends 67% of her shift driving.
According to her pay schedule, she makes less money while driving ($4.95/hr) than she does while in the pizzeria ($7.50/hr).
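To put those two rates together, here is a rough back-of-the-envelope sketch in Python. It assumes the 67% of her shift spent driving is paid at $4.95/hr and that the remaining 33% is all paid at the in-store rate of $7.50/hr; that split is my assumption, not something from her tally, and tips are left out entirely. The function name is made up for illustration.

```python
def blended_hourly_wage(driving_share, driving_rate, in_store_rate):
    """Time-weighted average of the two hourly rates.

    driving_share is the fraction of the shift spent driving (0 to 1).
    Assumes all non-driving time is paid at in_store_rate (an assumption).
    """
    if not 0.0 <= driving_share <= 1.0:
        raise ValueError("driving_share must be between 0 and 1")
    return driving_share * driving_rate + (1.0 - driving_share) * in_store_rate


# Figures from the post: 67% of the shift driving, $4.95/hr vs. $7.50/hr.
wage = blended_hourly_wage(0.67, 4.95, 7.50)
print(f"Blended base wage: ${wage:.2f}/hr")
```

Under those assumptions the blended base pay works out to roughly $5.79 an hour before tips, noticeably closer to the driving rate than to the in-store rate.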
The New York Times, in what seems like a continuing series on NYC transportation, has an article about a decline in subway ridership. The article points out declines that were to be expected, such as in the Financial District and Midtown, as well as expected increases, like along the J, which shares a route with the M and Z, which are facing service cuts. It will be interesting to see how these findings impact the expected service cuts.
Another area with expected results was a massive drop-off at the moribund Mets’ stop and a below-average drop at the World Champion Yankees’ stop. However, the Mets, unlike the Yankees, have a convenient commuter rail stop. Perhaps that explains the drop more than the team’s performance.
Slice recently reported that Fark user “Certainly You Jest” tabulated a list of the 25 most mentioned pizzerias. Naturally, I decided to play with the numbers. Rather than write up another formal paper, I did some quick ad hoc analysis for posting on this blog and I will skip some of the more technical aspects.
First, I augmented the data with the price of a typical plain pie that could feed two to four people and the pizzeria’s distance from New York City. Adding the distance meant I had to remove the multi-state chains, like Monical’s, from the data.
While the number of times a pizzeria is mentioned is count data, it doesn’t quite fit a Poisson distribution, and the Poisson regression didn’t seem to be a good fit. This makes sense since I have three predictors (distance from New York, price and their interaction). You can see this in the two histograms below.
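One quick sanity check for whether counts look Poisson-like is to compare the sample variance to the sample mean, since a Poisson distribution has the two equal. The sketch below is only an illustration of that idea: the counts are made-up placeholders, not the actual Fark tallies, and the function name is hypothetical. It also looks only at the raw counts, so it is not a substitute for checking the fit of the regression itself.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Ratio of sample variance to sample mean.

    A value near 1 is consistent with a Poisson distribution;
    a value well above 1 suggests overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean of counts is zero; index is undefined")
    return variance(counts) / m


# Hypothetical mention counts, invented purely for illustration.
hypothetical_mentions = [2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 11, 19]

index = dispersion_index(hypothetical_mentions)
print(f"Variance-to-mean ratio: {index:.2f}")
```

When the ratio is far above 1, as it is for these placeholder numbers, a plain Poisson model will tend to understate the spread, and something like a negative binomial model may be worth a look.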
This Thursday, April 8th, I’ll be giving two brief talks (5 to 10 minutes) about statistical methods at the New York R User Meetup. The first will be applying multilevel models to World Health Organization data to study noncommunicable diseases. The second, and probably more fun, will be a presentation of my pizza paper (pdf) that was featured on Slice.
I just filled out my Census form and I have to say it was fairly painless and simple. The short form (pdf) really only asks about age, ethnicity and other residences. If anyone has a long form (now called the American Community Survey), please let me know your experiences filling that out.
The question concerning residence can be a bit tricky these days with so many people having multiple residences, children who live on their own but visit home frequently and couples who live together but also maintain separate residences.
Drew Conway has a piece on his Zero Intelligence Agents blog about how well informed Tea Party protesters are about tax policy. His analysis is pretty technical, and he even offers up the R code he used to analyze the data and build the graphs, which were made with ggplot2, a package by Hadley Wickham at Rice University.