Analyst uses Twitter to predict outcome of football games

Claire Gianakas Sep 29, 2013

Twitter, created in 2006, is best known as a worldwide social networking phenomenon, but it is also a useful resource for predicting the outcomes of events such as elections or movie premieres. Recent findings by researchers at Carnegie Mellon show that Twitter analysis may also be able to help gamblers beat the spread on NFL games.

Christopher Dyer, assistant professor in Carnegie Mellon’s Language Technologies Institute, led a research team that gathered tweets every 15 minutes during football season, then picked out the tweets that included hashtags associated with NFL teams. The goal of the research was to determine whether tweets about a game correlate with its outcome: specifically, which team will win and which team will beat the spread.

They found that, while Twitter data did not help determine which team would win a game, it was about 56 percent accurate with regard to beating the spread. They also discovered that an increase in a team’s tweet volume over its baseline number of tweets corresponded with a loss. “It looks like people are tweeting when they are anxious,” Dyer said. “There is some kind of emotional thing going on.”
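The article does not say how the researchers computed a team’s baseline, so the following is only a rough sketch of the idea, assuming the baseline is the average tweet count from earlier weeks. The function name `volume_ratio` and the counts are made up for illustration and are not taken from the study.

```python
from statistics import mean

def volume_ratio(previous_counts, current_count):
    """Compare this week's tweet count to a team's baseline.

    Assumption: the baseline is the mean of earlier weekly counts.
    The article does not describe the researchers' actual definition.
    """
    baseline = mean(previous_counts)
    return current_count / baseline

# Hypothetical weekly tweet counts for one team (not real data).
ratio = volume_ratio([1000, 1200, 800], 1500)
print(ratio)  # a value above 1 means more tweets than usual
```

Under this assumption, a ratio well above 1 would be the kind of spike that, according to the researchers, tended to accompany a loss.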

According to Dyer, sports betting works well with machine learning. In sports betting, bookies give a handicap to the team expected to lose, which is added onto that team’s score after the game’s completion. The bookies generally try to set the handicap so that each team has about a 50 percent chance of coming out ahead, ensuring that they receive approximately the same number of bets on each team. If the team a bettor picked has the higher score after the handicap has been added, that team has beaten the spread.
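The arithmetic behind beating the spread can be sketched in a few lines. This is an illustration only: the function name `beats_spread` and the scores are hypothetical, and treating a bet on the favorite as a negative handicap is a convention assumed here, not something stated in the research.

```python
def beats_spread(bet_score, other_score, handicap):
    """Return True if the team bet on is ahead once the handicap is applied.

    handicap: points added to the bet-on team's final score.
    Assumption: positive for the team expected to lose, negative
    when the bet is on the favorite.
    """
    return bet_score + handicap > other_score

# Hypothetical game: the underdog loses 20-24 but has a 6.5-point handicap.
print(beats_spread(20, 24, 6.5))   # True: 26.5 beats 24
# The same game from the favorite's side.
print(beats_spread(24, 20, -6.5))  # False: 17.5 does not beat 20
```

In this example the underdog loses the game but still beats the spread, which is why predicting the winner and predicting the spread are separate problems.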

Dyer explained that the researchers assumed many people were betting illogically based on teams they liked, rather than on which team was actually more likely to win. Optimistic versus pessimistic fan bases can also influence betting.

“The bookies know the dynamics of the two teams, so they are going to be setting the line in a slightly more favorable way to make money. If we can detect this kind of irrationality from a signal in social media, then maybe we can make money,” Dyer said. “Football is a great sport for this; the teams have national followings and the games are broadcast in national markets.” Football’s large sphere of influence was a major reason the team chose this sport for the research. It was also beneficial that teams only play once a week — this made it easy to determine which games the tweets were referring to.

Dyer also commented on the benefits of using football from a scientific perspective.

“You have a lot of work on Twitter saying we can forecast elections, but elections come along every four years,” he explained. “In football we have 140 games during the course of a season, so we get replicated experiments.”

Although Twitter is constantly developing, the main aspects of each game stayed the same from week to week, which made the results easier to compare and more scientifically relevant.

Despite promising conclusions, the research had its limitations. “People are talking about games using all kinds of different words. It’s Twitter; people don’t even use typical, nice, edited English. Figuring out how to get a signal out of that was a big challenge in the research,” Dyer said. “We do a really naive categorization of the tweets.” The research didn’t separate words into positive and negative, and tweets weren’t interpreted in the context of how people actually speak. Each word in a tweet was treated individually, which caused some of the meaning to be lost. Dyer explained that the tweet “the Steelers beat the Eagles” was exactly the same as “the Eagles beat the Steelers” as far as the model was concerned.
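Dyer’s example can be reproduced with a simple word count. The sketch below is not the team’s model, whose details the article does not give. It only shows, with a hypothetical helper named `bag_of_words`, how counting words one at a time throws away word order.

```python
from collections import Counter

def bag_of_words(tweet):
    """Count each word in a tweet, ignoring case and word order.

    Illustrative helper only; the researchers' actual features are
    not described in the article.
    """
    return Counter(tweet.lower().split())

a = bag_of_words("the Steelers beat the Eagles")
b = bag_of_words("the Eagles beat the Steelers")

# Both tweets contain the same words the same number of times,
# so a model that sees only these counts cannot tell them apart.
print(a == b)  # True
```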

“We haven’t found a way to beat Vegas,” Dyer said. “We still make a lot of mistakes.”

Despite the challenges, this research provides valuable information for future analysis of Twitter data and has the potential to increase the accuracy with which we predict future events.