While playing Words with Friends my randomly chosen opponent played “radiale” as her first word.  Since that used up all of her tiles, she received a bonus on top of all the points the word itself got, resulting in a one-move score of 53 points!  Rather than being impressed I was upset at the large deficit I would have to overcome.

To combat this I did what comes naturally:  Write an R script to find the perfect word!

Needing to combine my seven letters with one of her letters there were two routes I could take.  The first would be for each combination of my seven letters and one of hers, find all 40,320 (8!) permutations then hit dictionary.com to see if it is a real word for a total of 282,240 (8!*7) http calls.  That seemed a bit excessive and impractical so I moved on to the next idea.

So, first thing I did was pull a list of common eight-letter words. Then for each combination of my letters and one of hers (only 7 iterations) I checked if those letters (in any order) matched the letters in any of the possible words.  Once a match was found there was a check for the counts of the letters and if that passed then the word was recorded as a true match.

The algorithm took about 17 seconds to run and found me one possible word for my letters combined with one of hers:  “headrace”, for 63 points!  Perhaps I should have been able to figure that out on my own, but where would be the fun in that.  Find the code after the break.

require(RCurl)
require(XML)
require(plyr)

myLetters <- "ercaehd"
theirLetters <- "radiale"

wordSite <- getURL("http://www.poslarchive.com/math/scrabble/lists/common-8.html")  # get page with words

# parse out the page
wordsParsed <- htmlTreeParse(wordSite, handlers=list("pre"=function(x, ...){ x }, "p"=function(x, ...){ NULL },
                                                    "head"=function(x, ...){ NULL }), asTree=TRUE)

# get just the list of words
words <- as.character(wordsParsed$children$html[["body"]][["pre"]][["text"]])
words <-gsub("\n", " ", words)  # sub out carriage return characters for spaces

## Turn the block of words into a nice list, each element of which is a vector of characters
words <- strsplit(words, split=" ")[[6]]    # split the long string into a vector seperating by spaces
words <- as.list(words)     # change from vector to list
words <- lapply(words, strsplit, split="")  # split each element of the list into a vector of it's characters

findAllWords <- function(baseLetters, theirWord, listOfWords)
{
    extraLetters <- strsplit(theirWord, split="")[[1]]   # split their word into letters, probably should have removed redundant letters first for a slight efficiency gain

    wordSuccess <- vector("list", length(extraLetters))

    for(a in 1:length(extraLetters))
    {
        seedLetters <- paste(baseLetters, extraLetters[a], sep="")
        #seedLetters <- strsplit(seedLetters, split="")[[1]]

        wordSuccess[[a]] <- checkForWord(words=listOfWords, letters=seedLetters)
    }

    return(wordSuccess)
}

checkForWord <- function(words, letters)
{
    holder <- vector("list", length(words)) # to hold results
    letters <- strsplit(letters, split="")[[1]]  # split up letters
    letterFrame <- data.frame(Letters=letters, stringsAsFactors=FALSE)  # to facilitate counting
    letterCount <- ddply(letterFrame, .(Letters), nrow)   # counts for each letter
    rm(letterFrame); gc()
    names(letterCount)[2] <- "LetterCount"

    ## loop through every word in the list and see if our letters are found in there
    for(i in 1:length(words))
    {
        if(all(letters %in% words[[i]][[1]]))   # if all the letters are found in the word
        {
            # check counts
            wordFrame <- data.frame(Letters=words[[i]][[1]], stringsAsFactors=FALSE)    # put letters from word into DF
            wordCount <- ddply(wordFrame, .(Letters), nrow) # get counts
            rm(wordFrame); gc()
            names(wordCount)[2] <- "WordLetterCount"

            joinedLetters <- join(letterCount, wordCount, by="Letters") # join the tables for a comparison

            holder[[i]] <- all(with(joinedLetters, LetterCount == WordLetterCount)) # holder gets the result of the comparison
        }else
        {
            holder[[i]] <- FALSE
        }
    }

    holder <- laply(holder, function(x) x) # convert to vector

    if(sum(holder) == 0)
    {
        return(sum(holder))
    }else
    {
        return(which(holder == TRUE))
    }
}

system.time(result <- findAllWords(baseLetters=myLetters, theirWord=theirLetters, listOfWords=words))

Related Posts



Jared Lander is the Chief Data Scientist of Lander Analytics a New York data science firm, Adjunct Professor at Columbia University, Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences and author of R for Everyone.

Leave a Reply