SciTech

CMU does research on cloud computing

The era of sifting through encyclopedia-sized books for pieces of information is long gone. Now that search engines like Google and Yahoo! have become highly popular, vast stores of information are at everyone’s fingertips. This concept of accessing data stored in different locations from the comfort of one’s home forms the basis of a field of research known as “cloud computing.” The concept, simple as it seems, has a plethora of uses, some of which are being explored here at Carnegie Mellon University.

Randal Bryant, dean of the School of Computer Science, explained that from the research perspective, the large amount of data that becomes available through cloud computing is very important. “We want to use computers that are remote from us but it’s because we need computers that are so much bigger to handle much larger sets of data than we could do with our own machines. We call this data-intensive computing,” Bryant said. He further explained that companies like Google and Yahoo! have already made use of this concept with their search engines; the search engines are capable of scanning large amounts of data and searching for keywords within that data. Carnegie Mellon is undertaking similar projects, but on a smaller scale.

“We’ve been given access to a very large cluster that’s owned and operated by Yahoo! The computer is in Sunnyvale, California, but we’re making use of it remotely. So it’s sort of a cloud,” Bryant said. One of the projects that Carnegie Mellon is running on this cluster involves image processing. The group has downloaded nearly 6 million images from www.flickr.com, the photo sharing service, and is using them to identify image characteristics that can be used to develop a variety of applications. In another project, scientists in the human-computer interaction department are studying how different people collaborate on projects through www.wikipedia.org.

Cloud computing techniques are also being used in the field of language translation. “The way that it’s [language translation] done nowadays is [by developing] a statistical model of the two languages and the relation between them by just scanning through millions of documents, especially ones that you have in both languages,” Bryant said.

By comparing the same texts in two languages, researchers can gather sentences that have the same meaning in both, and this information can then be used to translate new documents. In fact, this method of translation has already been applied by Stephan Vogel, a research scientist at the Language Technologies Institute, and two graduate students in the institute, Qin Gao and Roger Hsiao.
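
The idea of learning a statistical model from the same text in two languages can be illustrated with a small sketch. The article does not say which model Vogel's team uses; the code below swaps in IBM Model 1, a classic textbook word-alignment method, purely as an illustration. The three-sentence English-German corpus and the names `train_ibm_model1` and `best_translation` are hypothetical.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate P(target_word | source_word) from aligned sentence pairs.

    Illustrative IBM Model 1 training with expectation-maximization;
    not the method used by the team described in the article.
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # start with no preference at all
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for tw in tgt:
                # Share one unit of credit for tw among the source words
                # in this sentence, in proportion to the current estimates.
                norm = sum(t[(sw, tw)] for sw in src)
                for sw in src:
                    frac = t[(sw, tw)] / norm
                    count[(sw, tw)] += frac
                    total[sw] += frac
        # Re-estimate the probabilities from the shared-out credit.
        t = defaultdict(float, {k: c / total[k[0]] for k, c in count.items()})
    return t

def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {tw: p for (sw, tw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)

# Hypothetical toy corpus: the same sentences in English and German.
pairs = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]
table = train_ibm_model1(pairs)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(table, word))
```

Even on this tiny corpus the sketch pairs "house" with "haus" and "book" with "buch", because "das" keeps appearing with "the" and stops being a plausible match for the other words. A real system scans millions of sentence pairs and models far more than single words.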

Vogel’s team has developed a small handheld device that translates between English and Arabic for American soldiers to use in Iraq. Gao explained how the device works: “[When] the soldiers speak to the device, the device recognizes the speech and then translates it to Iraqi, after that it is outputted using speech synthesis.”

Perhaps the most interesting facet of the device is that it does not require an Internet connection to work. “The algorithm does not need the Internet connection,” Gao said. “It just needs a very compact knowledge base inside the device’s memory.”

Gao compared the technology to a student going to a library and reading different books to learn about a topic. After reading the books, the student retains the information and no longer needs the original books. The device works in the same way. Initially, to build the knowledge base stored in the device, thousands of documents in both languages have to be scanned. The information about the two languages obtained from these documents is then stored in the device. However, Gao explained that scanning so many documents using just one server may take months. This is where cloud computing comes into play. Cloud computing techniques, in which many connected computers analyze large amounts of data together, speed up this initial process tremendously.
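
How splitting the scanning work across connected computers helps can be sketched as follows. The article does not describe how the team actually divides the work, so this is only an assumed map-and-merge pattern: the documents are split into shards, each shard is counted independently, and the partial counts are added together. Threads stand in for the separate machines of a real cluster, and the corpus and the names `count_pairs`, `split_corpus` and `parallel_counts` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_pairs(shard):
    """Map step: count how often each word pair shares a sentence pair."""
    counts = Counter()
    for src, tgt in shard:
        for sw in src:
            for tw in tgt:
                counts[(sw, tw)] += 1
    return counts

def split_corpus(corpus, n):
    """Deal the sentence pairs out into n roughly equal shards."""
    return [corpus[i::n] for i in range(n)]

def parallel_counts(corpus, workers=3):
    """Count each shard independently, then merge the partial results."""
    shards = split_corpus(corpus, workers)
    # Threads are a stand-in here; on a cluster each shard would be
    # sent to a different machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_pairs, shards))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the counts together
    return merged

# Hypothetical toy corpus of aligned English-German sentences.
corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]
merged_counts = parallel_counts(corpus, workers=3)
print(merged_counts[("book", "buch")])
```

Because each shard is counted without reference to the others, adding machines shortens the scan roughly in proportion, as long as merging the partial counts stays cheap compared with the counting itself.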

“Cloud computing helped us build lots of models quickly; currently we can build a model in two hours,” Gao said. The team is also building similar translators for different languages like Chinese and Spanish.

These are just a couple of the projects underway at Carnegie Mellon, and the number of projects in this field at the university is still on the rise. According to the Pittsburgh Tribune-Review, Carnegie Mellon earned a $350,000 grant in October to create a cloud of computers. The field’s popularity is also evident in a computer science class in which students run a number of projects on a cluster facility operated by Google. That popularity stems from the fact that cloud computing techniques affect a variety of fields, not just computer science. As Bryant said, “It [applications of cloud computing] really stretches from [the] pure business world to social networks, to scientific research, to everything.”