While playing Words with Friends my randomly chosen opponent played “radiale” as her first word. Since that used up all of her tiles, she received a bonus on top of all the points the word itself got, resulting in a one-move score of 53 points! Rather than being impressed I was upset at the large deficit I would have to overcome.
To combat this I did what comes naturally: Write an R script to find the perfect word!
Needing to combine my seven letters with one of her letters there were two routes I could take. The first would be for each combination of my seven letters and one of hers, find all 40,320 (8!) permutations then hit dictionary.com to see if it is a real word for a total of 282,240 (8!*7) http calls. That seemed a bit excessive and impractical so I moved on to the next idea.
So, first thing I did was pull a list of common eight-letter words. Then for each combination of my letters and one of hers (only 7 iterations) I checked if those letters (in any order) matched the letters in any of the possible words. Once a match was found there was a check for the counts of the letters and if that passed then the word was recorded as a true match.
The algorithm took about 17 seconds to run and found me one possible word for my letters combined with one of hers: “headrace”, for 63 points! Perhaps I should have been able to figure that out on my own, but where would be the fun in that. Find the code after the break.
require(RCurl)
require(XML)
require(plyr)
myLetters <- "ercaehd"
theirLetters <- "radiale"
wordSite <- getURL("http://www.poslarchive.com/math/scrabble/lists/common-8.html") # get page with words
# parse out the page
wordsParsed <- htmlTreeParse(wordSite, handlers=list("pre"=function(x, ...){ x }, "p"=function(x, ...){ NULL },
"head"=function(x, ...){ NULL }), asTree=TRUE)
# get just the list of words
words <- as.character(wordsParsed$children$html[["body"]][["pre"]][["text"]])
words <-gsub("\n", " ", words) # sub out carriage return characters for spaces
## Turn the block of words into a nice list, each element of which is a vector of characters
words <- strsplit(words, split=" ")[[6]] # split the long string into a vector seperating by spaces
words <- as.list(words) # change from vector to list
words <- lapply(words, strsplit, split="") # split each element of the list into a vector of it's characters
findAllWords <- function(baseLetters, theirWord, listOfWords)
{
extraLetters <- strsplit(theirWord, split="")[[1]] # split their word into letters, probably should have removed redundant letters first for a slight efficiency gain
wordSuccess <- vector("list", length(extraLetters))
for(a in 1:length(extraLetters))
{
seedLetters <- paste(baseLetters, extraLetters[a], sep="")
#seedLetters <- strsplit(seedLetters, split="")[[1]]
wordSuccess[[a]] <- checkForWord(words=listOfWords, letters=seedLetters)
}
return(wordSuccess)
}
checkForWord <- function(words, letters)
{
holder <- vector("list", length(words)) # to hold results
letters <- strsplit(letters, split="")[[1]] # split up letters
letterFrame <- data.frame(Letters=letters, stringsAsFactors=FALSE) # to facilitate counting
letterCount <- ddply(letterFrame, .(Letters), nrow) # counts for each letter
rm(letterFrame); gc()
names(letterCount)[2] <- "LetterCount"
## loop through every word in the list and see if our letters are found in there
for(i in 1:length(words))
{
if(all(letters %in% words[[i]][[1]])) # if all the letters are found in the word
{
# check counts
wordFrame <- data.frame(Letters=words[[i]][[1]], stringsAsFactors=FALSE) # put letters from word into DF
wordCount <- ddply(wordFrame, .(Letters), nrow) # get counts
rm(wordFrame); gc()
names(wordCount)[2] <- "WordLetterCount"
joinedLetters <- join(letterCount, wordCount, by="Letters") # join the tables for a comparison
holder[[i]] <- all(with(joinedLetters, LetterCount == WordLetterCount)) # holder gets the result of the comparison
}else
{
holder[[i]] <- FALSE
}
}
holder <- laply(holder, function(x) x) # convert to vector
if(sum(holder) == 0)
{
return(sum(holder))
}else
{
return(which(holder == TRUE))
}
}
system.time(result <- findAllWords(baseLetters=myLetters, theirWord=theirLetters, listOfWords=words))
Jared Lander is the Chief Data Scientist of Lander Analytics a New York data science firm, Adjunct Professor at Columbia University, Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences and author of R for Everyone.
Leave a Reply