Big Data versus Game of Thrones: who will perish next?
A group of self-professed machine learning nerds performs some algorithmic voodoo to predict the future in the popular HBO series
While it’s simply an educated guess, the tool suggests that boy-king Tommen Baratheon should watch his step.
The students extracted information from a popular web cache of fan-developed data covering the more than 2,000 characters in the saga, and built a data set that describes each character with 24 different features.
Then the group applied machine learning techniques to statistically compare the features of dead and living characters and to select the features most relevant for distinguishing between them.
For feature selection, the students used the RELIEF function of the WEKA workbench with its default parameters. The features measured include a character’s appearances in the books; the house to which the character belongs; the character’s social group; when the character appears in the books; the character’s nobility; whether the character is male or female; and many more.
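To give a feel for how Relief-style feature selection works, here is a minimal sketch in Python. It is not WEKA's implementation and not the students' code: the function names, the toy characters, and the two features are all made up for illustration, and it uses the basic Relief idea (reward a feature that differs on the nearest character with the opposite fate, penalize one that differs on the nearest character with the same fate) on categorical values only.

```python
def diff(a, b):
    """Difference between two categorical values: 0 if equal, otherwise 1."""
    return 0.0 if a == b else 1.0


def distance(x, y):
    """Number of features on which two characters differ."""
    return sum(diff(a, b) for a, b in zip(x, y))


def relief_weights(rows, labels):
    """Return one relevance weight per feature (higher means more relevant).

    Simplified Relief: every row is used once as the reference instance,
    with a single nearest hit and a single nearest miss.
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    m = len(rows)
    for i, (x, label) in enumerate(zip(rows, labels)):
        hits = [r for j, r in enumerate(rows) if j != i and labels[j] == label]
        misses = [r for j, r in enumerate(rows) if labels[j] != label]
        if not hits or not misses:
            continue  # cannot compare without both a hit and a miss
        near_hit = min(hits, key=lambda r: distance(x, r))
        near_miss = min(misses, key=lambda r: distance(x, r))
        for f in range(n_features):
            weights[f] += (diff(x[f], near_miss[f]) - diff(x[f], near_hit[f])) / m
    return weights


# Hypothetical toy data: (nobility, gender) for four invented characters.
rows = [
    ("noble", "male"),
    ("noble", "female"),
    ("common", "male"),
    ("common", "female"),
]
labels = ["dead", "dead", "alive", "alive"]

weights = relief_weights(rows, labels)
print(weights)
```

In this invented data, nobility lines up perfectly with who is dead, while gender does not, so nobility receives the higher weight. A real run would use all 24 features and the full character list.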
They also used John Platt’s sequential minimal optimization algorithm, which is provided in WEKA, to train a support vector machine with a polynomial kernel. To evaluate the model, they used a procedure called 10-fold cross-validation: they split the data set into 10 equally sized subsets, trained the model on nine of them and tested it on the remaining one, rotating the held-out subset so that each is used for testing once.
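The cross-validation procedure itself is easy to sketch. The Python below is an illustration only, written under stated assumptions: the helper names and the 100 invented labels are hypothetical, and a trivial "predict the most common fate" baseline stands in for the support vector machine, which is far too long to reproduce here.

```python
from collections import Counter


def k_fold_indices(n_items, k=10):
    """Split indices 0..n_items-1 into k folds whose sizes differ by at most one."""
    if n_items < k:
        raise ValueError("need at least as many items as folds")
    return [list(range(start, n_items, k)) for start in range(k)]


def train_majority(train_rows, train_labels):
    """Stand-in model (not an SVM): remember the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, row):
    """The stand-in model predicts the same label for every character."""
    return model


def cross_validate(rows, labels, train, predict, k=10):
    """Train on k-1 folds, test on the held-out fold, and average the accuracy."""
    accuracies = []
    for test_idx in k_fold_indices(len(rows), k):
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [lab for i, lab in enumerate(labels) if i not in held_out]
        model = train(train_rows, train_labels)
        correct = sum(1 for i in test_idx if predict(model, rows[i]) == labels[i])
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Hypothetical data: 100 invented characters, 70 alive and 30 dead.
cv_labels = ["alive"] * 70 + ["dead"] * 30
cv_rows = [(i,) for i in range(100)]  # placeholder features, unused by the baseline

mean_accuracy = cross_validate(cv_rows, cv_labels, train_majority, predict_majority)
print(round(mean_accuracy, 3))
```

Because every character lands in the test fold exactly once, the averaged accuracy reflects performance on data the model did not see during training. Swapping in a real classifier only requires different `train` and `predict` functions.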