A great way to visualize the results of a regression is to use a Coefficient Plot like the one to the right. I’ve seen people on Twitter asking how to build one, and an option has been available for some time: Andy Gelman’s coefplot() in the arm package. Not knowing this, I built my own (as seen in this post about taste testing tomatoes), and both versions suffered from the same two problems: long coefficient names often got cut off by the left margin of the graph, and the name of the variable was prepended to every level of a factor. One big difference between his and mine is that his does not include the Intercept by default. Mine includes the Intercept, with the option of excluding it.
I managed to solve the latter problem pretty quickly using some regular expressions. Now the levels of factors are displayed alone, without the factor name prepended. As for the former, I fixed that yesterday by taking advantage of ggplot by Hadley Wickham, which deals with the margins better than I do.
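To make the regular expression idea concrete, here is a minimal sketch. It is not the code from plotCoef(); the helper name stripFactorNames() is hypothetical, and the built-in iris data is used only for illustration. It assumes a model fit with lm() and handles main-effect factor terms only, leaving interaction terms untouched. The ggplot2 part at the end runs only if that package is installed.

```r
# Fit a small model with one numeric predictor and one factor (illustrative data)
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Hypothetical helper: drop the factor name that R prepends to each level
stripFactorNames <- function(model) {
  coefNames <- names(coef(model))
  # model$xlevels maps each factor used in the fit to its levels
  for (f in names(model$xlevels)) {
    # Only touch names that are exactly <factor name><level>
    isLevel <- coefNames %in% paste0(f, model$xlevels[[f]])
    # \Q...\E quotes the factor name so characters like "." are matched literally
    coefNames[isLevel] <- sub(paste0("^\\Q", f, "\\E"), "",
                              coefNames[isLevel], perl = TRUE)
  }
  coefNames
}

cleanNames <- stripFactorNames(fit)

# Collect estimates and 95% confidence intervals for plotting
ci <- confint(fit)
coefDF <- data.frame(Term = cleanNames,
                     Estimate = unname(coef(fit)),
                     Lower = ci[, 1],
                     Upper = ci[, 2],
                     row.names = NULL)

# Draw the plot with ggplot2 if it is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(coefDF) +
    geom_vline(xintercept = 0, linetype = "dashed") +
    geom_segment(aes(x = Lower, xend = Upper, y = Term, yend = Term)) +
    geom_point(aes(x = Estimate, y = Term))
  print(p)
}
```

Checking each name against the factor's known levels before stripping keeps a numeric variable whose name merely starts with a factor's name from being mangled.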
Both of these changes made for a vast improvement over what I had available before. Future improvements will address the sorting of the coefficients displayed and allow users to choose their own display names for the coefficients.
The function is in this file and is called plotCoef(). It is very customizable, down to the color and line thickness. I kept my old version, plotCoefBase(), in the file in case some people are averse to using ggplot, though no one should be. I sent the code to Dr. Gelman to hopefully be incorporated into his function, which I’m sure gets used by a lot more people than mine will. Examples of my old version and of Dr. Gelman’s are after the break.
| Using Base Graphics | Andy Gelman’s coefplot |
Jared Lander is the Chief Data Scientist of Lander Analytics, a New York data science firm; Adjunct Professor at Columbia University; Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences; and author of R for Everyone.