Teaching R in Asia

So far this year I have logged many miles in the air and on the rails. In between trips to Minneapolis and Boston I spent about a month traveling through India and Southeast Asia, mainly to conduct R courses in Singapore and Kuala Lumpur for companies such as Intel, Micron, Celcom, Maxis and DBS. The training courses were organized through Revolution Analytics’ Singapore office. Given the success of the classes, there will be more opportunities this spring or summer in Singapore, Kuala Lumpur and also in Australia.

Quite a lot of material was covered, based on the offerings of my company, Lander Analytics, and the content of my book, R for Everyone.

Day 1 – Basics

Getting and installing R
The RStudio Environment
The basics of R
- Variables
- Data Types
- Reading data
- Calling functions
- Missing Data
Basic Math
Advanced Data Structures
- data.frames
- lists
- matrices
- arrays
Reading Data into R
- read.table
- RODBC
- Binary data
Matrix Calculations
Data Munging
- Base R
- plyr
- reshape2
Writing functions
Conditionals
Loops
String manipulation and regular expressions
Visualization
- Base R
- ggplot2

Day 2 – Modeling

Basic Statistics
- Probability Distributions
- Averages, standard deviations and correlations
- t-test
Linear Models
- Simple linear regression
- Multiple Regression
Generalized Linear Models
- Logistic Regression
- Poisson Regression
Survival Analysis
Assessing Model Quality
- MSE
- AIC
- BIC
- Residual Analysis
Time Series
Variable Selection

Day 3 – Machine Learning

Variable selection for high dimensional data with glmnet
Reduce uncertainty with weakly informative priors and Bayesian regression
K-Means clustering
Hierarchical clustering
Multidimensional scaling
Decision Trees for classification
Random Forests for ensembling decision trees
Bootstrap for measuring uncertainty
Cross validation for model assessment
Support Vector Machines
Neural Networks

Day 4 – Data Presentation and Portability

Reproducible reports using knitr
Basic Introduction to Markdown
Using knitr to automatically generate reports with embedded analytics
Using Markdown and knitr to automatically generate websites with embedded analytics
Using Markdown and knitr to make HTML5 slideshows with embedded analytics
Advanced plotting
Building R Packages
Shiny Overview

Day 5 – High Performance Computing with R

Benchmarking code using microbenchmark
The different speeds of various aggregation functions
- aggregate
- tapply
- plyr
- data.table
Fast manipulation using dplyr
Running dplyr commands in a database
Parallel Code
- foreach
- doParallel
- plyr
Integrating C++
- Rcpp

Given my extensive time abroad I thought it would be good to look at it all on a map using the Leaflet package in R.

Using the Google Maps API we can look up the latitude and longitude of the visited cities.

library(XML)
library(plyr)

cities <- c('Hong Kong', 'Haripal, India', 'Kolkata, India', 'Jaipur, India', 'Agra, India', 'Delhi, India', 
            'Singapore', 'Kuala Lumpur, Malaysia', 'Geroge Town, Malaysia')
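# look up the latitude and longitude of a place with the Google Maps geocoding API
lat.long <- function(place)
{
    # URL-encode the place so spaces do not break the query string
    theURL <- sprintf('http://maps.google.com/maps/api/geocode/xml?sensor=false&address=%s', URLencode(place))
    doc <- xmlToList(theURL)
    data.frame(Place=place, 
               Latitude=as.numeric(doc$result$geometry$location$lat), 
               Longitude=as.numeric(doc$result$geometry$location$lng), 
               stringsAsFactors=FALSE)
}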

places <- adply(cities, 1, lat.long)

knitr::kable(places[, -1], digits=3, row.names=FALSE)

Place	Latitude	Longitude
Hong Kong	22.396	114.109
Haripal, India	22.817	88.105
Kolkata, India	22.573	88.364
Jaipur, India	26.912	75.787
Agra, India	27.177	78.008
Delhi, India	28.614	77.209
Singapore	1.352	103.820
Kuala Lumpur, Malaysia	3.139	101.687
Geroge Town, Malaysia	5.415	100.330
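
The adply call above applies lat.long to each city and stacks the one-row results into a single data.frame. The same apply-then-stack pattern can be sketched in base R. To keep the sketch runnable without a network connection, it uses a hypothetical stand-in, fake.lat.long, that returns coordinates copied from the table above instead of calling a geocoding service.

```r
# Hypothetical stand-in for lat.long(): returns fixed coordinates copied
# from the table above rather than calling a geocoding service.
known <- data.frame(
    Place=c('Hong Kong', 'Singapore', 'Kuala Lumpur, Malaysia'),
    Latitude=c(22.396, 1.352, 3.139),
    Longitude=c(114.109, 103.820, 101.687),
    stringsAsFactors=FALSE
)
fake.lat.long <- function(place)
{
    known[known$Place == place, ]
}

someCities <- c('Hong Kong', 'Singapore', 'Kuala Lumpur, Malaysia')

# apply the function to each city, then stack the one-row data.frames
somePlaces <- do.call(rbind, lapply(someCities, fake.lat.long))
rownames(somePlaces) <- NULL
```

Unlike adply, this version does not add an index column in front, so there is nothing to drop before printing.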

Now that we have the coordinates we use Leaflet to plot them.

library(leaflet)
leaflet(data=places) %>% 
    addTiles() %>% 
    setView(90, 15, zoom=4) %>% 
    addPopups(lng=~Longitude, lat=~Latitude, popup=~Place) %>% 
    addPolylines(~Longitude, ~Latitude, data=places[c(1, 3, 2:9, 1), ]) %>% 
    addMarkers(lng=~Longitude, lat=~Latitude, popup=~Place, 
               icon=JS("L.icon({iconUrl: 'https://www.jaredlander.com/images/jaredlanderfavicon.png', iconSize: [20, 20]})"))
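
The map above passes the marker icon as a JS() expression. Later releases of the leaflet package also provide makeIcon() for describing an icon in R, so a variation along the following lines may be easier to maintain. This is only a sketch: it assumes a leaflet version that includes makeIcon(), and the stops data.frame is a small hypothetical subset with coordinates copied from the table above.

```r
library(leaflet)

# A few of the stops, with coordinates copied from the table above
stops <- data.frame(
    Place=c('Hong Kong', 'Singapore', 'Kuala Lumpur, Malaysia'),
    Latitude=c(22.396, 1.352, 3.139),
    Longitude=c(114.109, 103.820, 101.687),
    stringsAsFactors=FALSE
)

# makeIcon() describes the marker image in R rather than in a JS() string
smallIcon <- makeIcon(
    iconUrl='https://www.jaredlander.com/images/jaredlanderfavicon.png',
    iconWidth=20, iconHeight=20
)

# build the map: tiles, a line connecting the stops, then the markers
stopMap <- leaflet(data=stops) %>% 
    addTiles() %>% 
    setView(90, 15, zoom=4) %>% 
    addPolylines(lng=~Longitude, lat=~Latitude) %>% 
    addMarkers(lng=~Longitude, lat=~Latitude, popup=~Place, icon=smallIcon)
```

Printing stopMap in an interactive session or a knitr document renders the widget.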

Calculating all the miles traveled could be as simple as looking it up on TripIt, or we could do some quick Haversine distance calculations with the geosphere package.

First, we get the coordinates for New York, Minneapolis and Boston to have a complete picture of the distance.

newCities <- adply(c('New York, NY', 'Minneapolis, MN', 'Boston, MA'), 1, lat.long)
allPlaces <- rbind(newCities[c(1, 2, 1), ], places[c(1, 3, 2:9, 1), ], newCities[c(1, 3, 1), ])

Then, in order to use distHaversine, we need to set up a from-and-to relationship between consecutive places. The easiest way is to shift the columns so that each row also holds the next place on the route.

library(useful)

## Loading required package: ggplot2

shiftedPlaces <- shift.column(data=allPlaces, columns=names(places)[-1], newNames=c('To', 'Lat2', 'Long2'))
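
To make the shifting idea concrete, here is a rough base R sketch of the same from-and-to pairing on a toy route. It is not how the useful package implements shift.column, and the route data.frame with its values is made up for illustration.

```r
# Toy route with made-up coordinates, visited in row order
route <- data.frame(
    Place=c('A', 'B', 'C'),
    Latitude=c(1, 2, 3),
    Longitude=c(10, 20, 30),
    stringsAsFactors=FALSE
)

n <- nrow(route)

# Pair every row except the last with the row that follows it,
# so each row of legs describes one leg of the trip
legs <- data.frame(
    route[-n, ],
    To=route$Place[-1],
    Lat2=route$Latitude[-1],
    Long2=route$Longitude[-1],
    stringsAsFactors=FALSE,
    row.names=NULL
)
```

A route with n stops yields n - 1 legs, which is why the distance table has one row fewer than allPlaces.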

Now we can calculate the distance. This assumes that all trips followed a great circle, which might not be the case, especially for the car and rail portions of the trip.

library(geosphere)

## Loading required package: sp

# r is the Earth's radius in miles, so the distances come back in miles
shiftedPlaces$Distance <- distHaversine(shiftedPlaces[, c("Longitude", "Latitude")], 
                                        shiftedPlaces[, c("Long2", "Lat2")], r=3959)
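
For readers curious about what the Haversine calculation involves, the formula is short enough to write out in base R. The function below is a hand-rolled sketch for illustration, not the geosphere implementation. The name haversine and its argument order are my own, and the coordinates are the rounded values this post reports for New York and Boston.

```r
# Haversine distance between two points given in degrees.
# r is the sphere's radius; 3959 is roughly the Earth's radius in miles.
haversine <- function(lon1, lat1, lon2, lat2, r=3959)
{
    toRad <- pi / 180
    dLat <- (lat2 - lat1) * toRad
    dLon <- (lon2 - lon1) * toRad
    a <- sin(dLat / 2)^2 + cos(lat1 * toRad) * cos(lat2 * toRad) * sin(dLon / 2)^2
    # pmin guards against rounding pushing sqrt(a) slightly above 1
    2 * r * asin(pmin(1, sqrt(a)))
}

# New York to Boston, using the rounded coordinates from this post
nyToBoston <- haversine(-74.006, 40.713, -71.059, 42.360)
```

The result comes out close to 190 miles, in line with the New York to Boston leg reported by distHaversine.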

In total this led to 25,727 miles traveled.

knitr::kable(shiftedPlaces[, -1], digits=c(1, 3, 3, 1, 3, 3, 0), row.names=FALSE)

Place	Latitude	Longitude	To	Lat2	Long2	Distance
New York, NY	40.713	-74.006	Minneapolis, MN	44.978	-93.265	1016
Minneapolis, MN	44.978	-93.265	New York, NY	40.713	-74.006	1016
New York, NY	40.713	-74.006	Hong Kong	22.396	114.109	8046
Hong Kong	22.396	114.109	Kolkata, India	22.573	88.364	1642
Kolkata, India	22.573	88.364	Haripal, India	22.817	88.105	24
Haripal, India	22.817	88.105	Kolkata, India	22.573	88.364	24
Kolkata, India	22.573	88.364	Jaipur, India	26.912	75.787	844
Jaipur, India	26.912	75.787	Agra, India	27.177	78.008	138
Agra, India	27.177	78.008	Delhi, India	28.614	77.209	111
Delhi, India	28.614	77.209	Singapore	1.352	103.820	2574
Singapore	1.352	103.820	Kuala Lumpur, Malaysia	3.139	101.687	192
Kuala Lumpur, Malaysia	3.139	101.687	Geroge Town, Malaysia	5.415	100.330	183
Geroge Town, Malaysia	5.415	100.330	Hong Kong	22.396	114.109	1491
Hong Kong	22.396	114.109	New York, NY	40.713	-74.006	8046
New York, NY	40.713	-74.006	Boston, MA	42.360	-71.059	190
Boston, MA	42.360	-71.059	New York, NY	40.713	-74.006	190
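
As a quick check on the total, the rounded leg distances from the table above can be summed directly. The vector legMiles is typed in by hand from the Distance column, so this is a sanity check on the printed figures rather than a rerun of the analysis.

```r
# Leg distances in miles, copied from the Distance column above (already rounded)
legMiles <- c(1016, 1016, 8046, 1642, 24, 24, 844, 138, 
              111, 2574, 192, 183, 1491, 8046, 190, 190)

totalMiles <- sum(legMiles)
totalMiles
# → 25727
```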

leaflet(data=allPlaces) %>% 
    addTiles() %>% 
    setView(80, 20, zoom=3) %>% 
    addPolylines(~Longitude, ~Latitude) %>% 
    addMarkers(lng=~Longitude, lat=~Latitude, popup=~Place, 
               icon=JS("L.icon({iconUrl: 'https://www.jaredlander.com/images/jaredlanderfavicon.png', iconSize: [20, 20]})"))

Jared Lander is the Chief Data Scientist of Lander Analytics, a New York data science firm; Adjunct Professor at Columbia University; Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences; and author of R for Everyone.
