Both the Journal and the Times reported on a study about New York City traffic which has been called the “most statistically ambitious ever undertaken by a U.S. city.”  That just sounds awesome to me, both as a statistician and a pedestrian.  According to the report, New York is one of the safest cities in America to travel in but trails a number of major European and Asian cities.

One takeaway from the report is that, contrary to common belief, taxis are responsible for very few accidents.  This was always my feeling since cabbies are the experts of New York City streets and are under heavy scrutiny from the police and T&LC.  They have more incentive to be alert and cautious than private drivers.

It also found that Manhattan is more dangerous than the other boroughs.  I hope that doesn’t encourage congestion pricing though.  That’s an idea I still can’t get behind.

The Bloomberg administration is likely to use the report to further its (popular) street reforms.  As a biker, I like the dedicated bike lanes that use a column of parked cars–and sometimes a concrete median–to separate cyclists from moving traffic.  As a pedestrian, I like the countdown crossing signals that are already in place near Union Square and Greenwich Avenue.  Hopefully Union Square will also be getting its own pedestrian plaza.

Jared Lander is the Chief Data Scientist of Lander Analytics, a New York data science firm; Adjunct Professor at Columbia University; Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences; and author of R for Everyone.

Thanks to some early data from Pizza Girl, of Slice fame, I have some very preliminary findings.

There are a few different ways to tip: check (only one person did this), credit card at the door, pre-tipping with a credit card, and cash.  As seen in these boxplots, cash tips were the highest, on average.  Pre-tippers, who really are just tipping based on feeling, not performance, have the greatest variability.  There was even someone who only pre-tipped a dollar.  Pre-tipping a large amount might be a good idea–kind of like greasing a palm at a restaurant to get a table–but I don’t see how a small pre-tip is a good idea.

I wonder why people give bigger tips with cash than with credit cards.  I would have thought it would be the other way around.

This is just the beginning.  Pizza Girl is providing more data as the weeks go on.  And as I get more data the analysis will become more sophisticated, so stay tuned as we unravel the world of pizza delivery.  In the meantime, check out Pizza Girl’s third installment of her findings on Slice.

Temple professor John Allen Paulos has an article in the New York Times that got Slashdotted today suggesting people be wary of all the metrics that fill our daily lives.

His first concern is whether assumptions about categorization are correct.  This is certainly important, but hopefully qualified statisticians, social scientists, doctors, etc. are making these decisions and properly counting the results.

Next he discusses whether the numbers you are looking at have been aggregated properly and were arrived at by using the proper choices of criteria, protocols and weights.  He gives articles such as “The 10 Friendliest Colleges” and “The 20 Most Lovable Neighborhoods” as examples.  Having done a lot of work where variable selection and shrinkage are important, I can say that I, for one, allow the data to speak for itself and use various statistical methods to arrive at the correct decision.

Dr. Paulos makes more points, but I’ll let you read the article for yourself.  The important takeaway–at least to me–is that when looking at reported statistics and measurements, try to figure out what methods were used.  That’s why I am always disappointed when articles do not report their methods.  I realize that understanding the techniques might be beyond the average person, but that’s when you ask your statistician friend.

Today, Google announced two new services that are sure to be loved by data geeks.  First is their BigQuery, which lets you analyze “Terabytes of data, trillions of records.”  This is great for people with large datasets.  I wonder if a program like R (my favorite statistical analysis package) can read it?  If so, would R just pull down the data like it would from any other database?  That would most likely result in a data.frame that is far too large for a standard computer to handle.  Maybe R can be run in a way that it hits the BigQuery service and leaves the data in there.  Maybe even the processing can be done on Google’s end, allowing for much better computation time.  This is something I’ve been dreaming of for a while now.
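To make the "leave the data in there" idea concrete, here is a minimal sketch in Python.  It uses an in-memory SQLite database purely as a stand-in for a remote service; the table name `records`, its columns and the toy rows are all made up for illustration, and nothing here is BigQuery's actual interface.  The point is only the contrast between pulling every row to the client and asking the database to do the aggregation.

```python
import sqlite3

# In-memory SQLite stands in for a remote data service (an assumption for
# illustration only; this is not how BigQuery itself is accessed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (grp TEXT, value REAL)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("a", 10.0), ("a", 20.0), ("b", 30.0)],  # toy rows, not real data
)

# Approach 1: pull every row to the client, then summarize locally.
# With trillions of records this is the step that would not fit in memory.
all_rows = conn.execute("SELECT grp, value FROM records").fetchall()

# Approach 2: have the database aggregate and return only the summary,
# so the client receives one row per group instead of one row per record.
summary = conn.execute(
    "SELECT grp, COUNT(*), AVG(value) FROM records GROUP BY grp ORDER BY grp"
).fetchall()

print(f"rows pulled to client: {len(all_rows)}")
print(f"summary rows returned: {len(summary)}")
conn.close()
```

In the second approach the amount of data crossing the wire depends on the number of groups rather than the number of records, which is what would make a very large table workable from a standard computer.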

Further, can BigQuery produce graphics?  If so, this might be a real shot at Business Intelligence tools like QlikView or Cognos that specialize in handling LARGE datasets.

Steven Strogatz is writing a column for the New York Times where he discusses math, starting with basic concepts and working his way up to the complex and cerebral.

I, and a lot of people, love his column.  However, last week’s piece on probability was not received so well by the statistics community, particularly on Andy Gelman’s blog and Junk Charts’ sister blog, Numbers Rule Your World.

Pizza Girl, a pizza delivery girl who is a regular contributor on Slice, tallied up and analyzed the time she spends on various duties in her pizzeria.  This is just the first part in a series, but so far she determined that she spends 67% of her shift driving.

According to her pay schedule, she makes less money while driving ($4.95/hr) than she does while in the pizzeria ($7.50/hr).
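As a rough illustration of what those figures imply, the sketch below computes a time-weighted hourly wage.  It assumes the remaining 33% of the shift is all paid at the in-pizzeria rate and it ignores tips entirely; both are my simplifications, not something Pizza Girl reported.

```python
# Figures taken from the post.
DRIVING_SHARE = 0.67   # fraction of the shift spent driving
DRIVING_RATE = 4.95    # dollars per hour while driving
IN_STORE_RATE = 7.50   # dollars per hour while in the pizzeria


def blended_hourly_wage(driving_share, driving_rate, in_store_rate):
    """Time-weighted average of the two hourly rates, tips excluded.

    Assumes all non-driving time is paid at the in-store rate.
    """
    return driving_share * driving_rate + (1 - driving_share) * in_store_rate


wage = blended_hourly_wage(DRIVING_SHARE, DRIVING_RATE, IN_STORE_RATE)
print(f"Blended wage before tips: ${wage:.2f}/hr")
```

Under those assumptions the blended rate comes out to about $5.79 an hour before tips, noticeably closer to the driving rate than to the in-store rate.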

The New York Times, in what seems like a continuing series on NYC transportation, has an article about a decline in subway ridership.  The article points out declines that were to be expected, such as in the financial district or Midtown, as well as expected increases like along the J, which shares a route with the M and Z, which are facing service cuts.  It will be interesting to see how these findings impact the expected service cuts.

Another area with expected results was a massive drop-off at the moribund Mets’ stop and a below-average drop at the World Champion Yankees’ stop.  However, the Mets–unlike the Yankees–have a convenient commuter rail stop.  Perhaps that explains the drop more than the team’s performance.

Slice recently reported that Fark user “Certainly You Jest” tabulated a list of the 25 most mentioned pizzerias.  Naturally, I decided to play with the numbers.  Rather than write up another formal paper, I did some quick ad hoc analysis for posting on this blog and I will skip some of the more technical aspects.

First, I augmented the data with the price of a typical plain pie that could feed two to four people and the pizzeria’s distance from New York City.  Adding the distance meant I had to remove the multi-state chains, like Monical’s, from the data.

While the number of times a pizzeria is mentioned is count data, it doesn’t quite fit a Poisson distribution, and the Poisson regression didn’t seem to be a good fit.  This makes sense since I have three predictors (distance from New York, price and their interaction).  You can see this in the two histograms below.
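One quick, informal way to see whether counts look Poisson is to compare their variance to their mean, since a Poisson distribution has the two equal.  The sketch below does that with made-up counts; these are not the Fark numbers, and the check ignores the predictors, so it is only a first look at the raw counts rather than a test of the regression itself.

```python
from statistics import mean, variance

# Hypothetical mention counts for ten pizzerias (illustrative only,
# not the data from the Fark thread).
mention_counts = [2, 3, 3, 4, 5, 7, 9, 14, 22, 41]

m = mean(mention_counts)
v = variance(mention_counts)  # sample variance (n - 1 denominator)

# For Poisson data the ratio should be near 1; a ratio well above 1
# suggests overdispersion, where a plain Poisson model fits poorly.
dispersion = v / m

print(f"mean = {m:.2f}, variance = {v:.2f}, variance/mean = {dispersion:.2f}")
```

When the ratio is far above 1, as it is for these illustrative counts, alternatives such as a quasi-Poisson or negative binomial model are commonly tried instead.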

This Thursday, April 8th, I’ll be giving two brief talks (5 to 10 minutes) about statistical methods at the New York R User Meetup.  The first will be applying multilevel models to World Health Organization data to study noncommunicable diseases.  The second, and probably more fun, will be a presentation of my pizza paper (pdf) that was featured on Slice.

I just filled out my Census form and I have to say it was fairly painless and simple.  The short form (pdf) really only asks about age, ethnicity and other residences.  If anyone has a long form (now called the American Community Survey), please let me know your experiences filling that out.

The question concerning residence can be a bit tricky these days with so many people having multiple residences, children who live on their own but visit home frequently, and couples who live together but also maintain separate residences.
