Tool developed to help with information overload

Daniel Tkacik Jan 24, 2011

The U.S. mortgage crisis, the O.J. Simpson trial, the Florida recount of the 2000 presidential election, and the Monica Lewinsky controversy: these stories were major issues in their time, each generating hundreds, sometimes thousands, of news articles. In the age of the Internet, with so many news sources and types of media available, this so-called “information overload” can cause us to miss the big picture on important issues. So how can we keep up? How can we sift through so many articles to retrieve the most useful information? Researchers at Carnegie Mellon are developing a tool to help us understand the big picture.

Graduate student Dafna Shahaf and her faculty adviser Carlos Guestrin in the computer science department have developed a model that draws connections between news stories in order to convey the big-picture meaning of an issue. Guestrin compares his research to issue maps, which graphically show the deep structure of an issue and the connections between its subparts. “The goal here is to mathematically construct an issue map for any story,” Guestrin explained. By building this virtual issue map, the model can distill the massive amount of information about an issue or subject so that users can retrieve the most useful parts.

The model, put simply, works as follows. The user supplies two news articles, and the model forms connections between them based on the words and phrases used in the articles. These connections take the form of other articles. For example, the model can form a chain of articles starting with the mortgage crisis and ending with the ongoing debate over health care. If the user finds the resulting chain unsatisfactory, he or she may refine it by removing an article that does not seem to fit or by adding an article that makes the connection smoother or more coherent. Users may also change what the connections focus on. For example, instead of focusing on DNA evidence during the O.J. Simpson trial, users may choose to focus on the racial aspects of the trial.
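To make the idea of a chain concrete, here is a minimal sketch in Python. It is not Shahaf and Guestrin's actual method, which this article does not spell out. It assumes a much simpler stand-in: each article is reduced to a set of words, the strength of a link is the word overlap between two articles, and the best chain is the one whose weakest link is as strong as possible. The function names and the five toy articles are invented purely for illustration.

```python
import heapq


def overlap(a, b):
    """Share of words two articles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def connect(articles, start, end):
    """Return the chain from start to end whose weakest link is strongest.

    articles maps a title to the set of words in that article.
    Returns an empty list if no chain of overlapping articles exists.
    """
    best = {start: 1.0}   # strength of the weakest link on the best chain so far
    prev = {}             # previous article on that best chain
    heap = [(-1.0, start)]
    done = set()
    while heap:
        neg, cur = heapq.heappop(heap)
        if cur in done:
            continue
        done.add(cur)
        if cur == end:
            break
        for other in articles:
            if other in done:
                continue
            # A chain is only as strong as its weakest link.
            strength = min(-neg, overlap(articles[cur], articles[other]))
            if strength > best.get(other, 0.0):
                best[other] = strength
                prev[other] = cur
                heapq.heappush(heap, (-strength, other))
    if end not in best:
        return []
    chain = [end]
    while chain[-1] != start:
        chain.append(prev[chain[-1]])
    return chain[::-1]


# Made-up toy data: titles and word sets are hypothetical.
articles = {
    "mortgage crisis": {"mortgage", "banks", "foreclosure", "crisis"},
    "bank bailout": {"banks", "bailout", "crisis", "congress"},
    "stimulus debate": {"congress", "bailout", "stimulus", "spending"},
    "health care debate": {"congress", "spending", "health", "insurance"},
    "sports": {"football", "playoffs"},
}

chain = connect(articles, "mortgage crisis", "health care debate")
print(" -> ".join(chain))
```

A user refinement such as removing an article that does not fit can be imitated by deleting that entry from `articles` and calling `connect` again. With the toy data above, dropping "stimulus debate" yields a shorter chain that runs through "bank bailout" alone.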

The results of Shahaf and Guestrin’s study suggested that users understood big-picture issues better after reading the chain of articles than before. Users were also presented with article chains produced in several different ways and asked to grade the chains on relevance, coherence, and redundancy. As expected, the chains that users graded highest also turned out to be the most effective at increasing users’ familiarity with the subjects of the news articles. The paper reporting these results received “Best Research Paper” honors at the 16th Association for Computing Machinery Conference on Knowledge Discovery and Data Mining in Washington, D.C., last July.

Guestrin explained that “connecting the dots” is the first step in understanding basic information. The tool is not yet available to the public, though making it available is a goal for the future. Shahaf said that her model has even helped her understand big-picture issues. “I finally understand the Greek credit crisis in Europe,” she said.

This technique of connecting the dots, though applied here to news articles, can be extended to many areas of life, helping us understand the direct and indirect effects of our behavior and the decisions we make. “I think if we’re better able to find information that we trust [and] understand the big picture, we can make better personal choices,” Guestrin explained. “I would like to empower the individual to really understand information, and I think we can do that.”