CMU researchers develop web tool to analyze big data

Credit: Saman Amirpour Amraii

Two researchers in Carnegie Mellon’s Robotics Institute have developed Explorable Visual Analytics (EVA) to change the way people visualize and find patterns in large data sets, as well as share their findings.

Saman Amirpour Amraii, a senior system/software engineer, and Amir Yahyavi, a post-doctoral researcher, both of the CREATE Lab in Carnegie Mellon’s Robotics Institute, created EVA, an online platform for “visualizing and analyzing large and high-dimensional data that allows users to intuitively navigate terabytes of data consisting of hundreds of dimensions,” Amraii said. “Users can build simple geographical representations or complex, abstract, 5-dimensional projections of the data. They can also bookmark their favorite views and share the entire exploration and discovery process via a simple link.”

Our world is facing a big data challenge. We collect data from a multitude of sources, from cell phone screens to aerospace monitors, from demography to the retail industry. In a world crowded with information, people need a way to organize and analyze this data quickly.

Currently, the most widely used software for data organization and analysis is Excel, which was designed at a time when datasets were small and easily stored on personal computers. Today, even modern personal computers cannot store large datasets such as those from the U.S. Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) program, which spans millions of rows and hundreds of gigabytes.

To process data of this sort in a traditional manner, users would need a cluster or supercomputer. They would also need the programming skills to handle these datasets and a large amount of time to process everything, which is unrealistic for most people. EVA attempts to bridge these gaps with an online platform for visualizing and analyzing large datasets without these traditional obstacles.

Amraii describes EVA as a powerful tool that makes the analysis of huge datasets as easy as the analysis of small sets using Excel. The main benefit of EVA is that it is web-based: the bulk of the data can be stored remotely, and users download only the portion of the data being analyzed.

In this way, EVA “give[s] users the illusion they are working with all of a massive dataset while actually sending only a small proportion of the data to the client,” Yahyavi said in a university press release. If data is displayed on a map, for instance, users can zoom in on an area of interest and EVA will only process the data relevant to that particular portion of the map, effectively reducing the large dataset into a smaller, more manipulable set of data. This freedom allows users to “play” with data by quickly adjusting parameters and graphic displays. The researchers noted that speed was the main focus for EVA.
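The map example above can be illustrated with a minimal sketch. It is not EVA's actual code; the `Point` and `Viewport` types, the `points_in_view` function, and the sample coordinates are all hypothetical, and a real system would likely use a spatial index rather than scanning every record. The idea is only that the server holds the full dataset and returns just the records inside the area the user is looking at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """One record with a location and a measured value (hypothetical schema)."""
    lat: float
    lon: float
    value: float


@dataclass(frozen=True)
class Viewport:
    """The rectangular map area currently visible to the user."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, p: Point) -> bool:
        return (self.south <= p.lat <= self.north
                and self.west <= p.lon <= self.east)


def points_in_view(dataset, viewport):
    """Return only the records inside the viewport.

    Assumption: a linear scan stands in for whatever indexing
    a production system would use.
    """
    return [p for p in dataset if viewport.contains(p)]


# Illustrative data: the full set stays on the server.
DATASET = [
    Point(40.44, -79.99, 1.0),   # near Pittsburgh
    Point(40.71, -74.01, 2.0),   # near New York
    Point(34.05, -118.24, 3.0),  # near Los Angeles
]

# The user zooms in on the Pittsburgh area; only matching records are sent.
PITTSBURGH_AREA = Viewport(south=40.0, west=-81.0, north=41.0, east=-79.0)
visible = points_in_view(DATASET, PITTSBURGH_AREA)
```

Here the client receives one record out of three, yet the user can pan or zoom to any other region and get the matching subset on demand.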

“The system has to be fast,” Amraii said in a university press release. “If it takes a half hour to get an answer to your query, you may forget why you asked in the first place.”

Another benefit of EVA is that it provides more possibilities for data exploration. Users can look for underlying correlations by incorporating additional parameters such as yearly income or ethnic group. By setting a time range from several years ago to the present, users can visualize growing or declining trends.

EVA also promotes collaboration by enabling self-explanatory storytelling. Users can record their findings by taking a screenshot of the high-resolution visual map of the data they produced. Additionally, the bookmark feature allows a user to share a set of findings with a collaborator via a URL. The collaborator can then retrace every step the original user took to reach the final result.
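One common way to make an exploration shareable through a link is to serialize the list of steps into the URL itself. The sketch below shows that general technique; it is an assumption for illustration, not a description of how EVA stores bookmarks. The base URL, the `state` query parameter, and the step format are all hypothetical.

```python
import base64
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical address; the real service URL is not given in the article.
BASE_URL = "https://example.org/eva"


def encode_bookmark(steps):
    """Pack a list of exploration steps into a shareable URL."""
    payload = json.dumps(steps, separators=(",", ":")).encode("utf-8")
    token = base64.urlsafe_b64encode(payload).decode("ascii")
    return f"{BASE_URL}?{urlencode({'state': token})}"


def decode_bookmark(url):
    """Recover the list of steps from a URL made by encode_bookmark."""
    query = parse_qs(urlparse(url).query)
    token = query["state"][0]
    return json.loads(base64.urlsafe_b64decode(token))


# Illustrative steps: zoom to a region, then filter on a field.
steps = [
    {"action": "zoom", "bounds": [40.0, -81.0, 41.0, -79.0]},
    {"action": "filter", "field": "income", "min": 50000},
]
link = encode_bookmark(steps)
replayed = decode_bookmark(link)
```

A collaborator who opens the link gets the same ordered steps back and can replay them one by one. For long explorations a service might instead store the steps on the server and put only a short identifier in the URL.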

Amraii and Yahyavi built EVA over nearly three years with staff members in the CREATE Lab. The online platform has a promising future in many fields, such as business, where it can help analyze massive amounts of sales information. EVA will allow users to upload numeric big data sets, analyze them, and tell their unique story to the world. By making analysis easy for everyone, with or without computer science and statistics knowledge, it aims to expand our collective ability to use big data.