SciTech

Search engine thrives in Carnegie Mellon's backyard

The next time you need to do an Internet search, clusty.com may become your first choice, thanks to a new feature that other search engines don’t have.

Like its competitor search engines, Clusty searches the Internet using the terms a user supplies.
But unlike its competitors, it returns a set of “clusters” on the left side of the search results. The clusters act as a set of topics associated with the search words.

Now, bloggers and webmasters can bring the convenience of clusters to their own sites via the Clusty Cloud, a tool that can be placed on a web page. The cloud is produced from a query on a relevant subject. For instance, if a blogger writes about robotics, he or she can place a cloud on the blogspace that groups results into categories like “Automation,” “Research,” and “Robotics Society.”

A user can click on these topics for an in-depth look at the results. A search for “Carnegie Mellon,” for example, returns “Science,” “Pittsburgh,” and “Carnegie Mellon School” as the first three clusters.

Users can click on each cluster to choose from even more search categories and narrow the search criteria even further.

The idea is to help users narrow their search results to relevant categories so that browsing is easier. A search for “Carnegie Mellon” on Google could return 13,300,000 results, but few people would look through all of them.

“One of the main problems of user search is that they get too much stuff and only look at the first handful of results,” said Raul Valdes-Perez, CEO and co-founder of Vivísimo, Clusty’s parent company.

Based in Squirrel Hill, Vivísimo was one of the few local Internet firms in Pittsburgh. That has changed with Google’s decision to open offices at Carnegie Mellon. But how different is Clusty from its new neighbor?

“Search engine user experience has been pretty static for a while,” Valdes-Perez said.
By sorting results into themes, Clusty saves time and “lets users know what is important at the moment. [Otherwise,] they’re really missing other themes. They are missing a lot of stuff that could be of value to them.”

The clusters come from an artificial intelligence-based algorithm that discerns the main themes in the search results. The algorithm tries to emulate human judgment by constructing a tree of major themes.
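The article does not describe how Vivísimo's algorithm works internally, so the following is only a toy illustration of the general idea of grouping results by theme. It assumes each result is just a title string, picks the most frequent non-query words as cluster labels, and produces a single flat level of clusters rather than a tree. The function name `cluster_results`, the stopword list, and the sample titles are all invented for this sketch.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "the", "to"}


def cluster_results(query, titles, max_clusters=2):
    """Group result titles under their most common non-query words."""
    query_terms = set(query.lower().split())

    def candidate_terms(title):
        # Words that could serve as a theme label for this title.
        words = {w.strip(".,:;!?").lower() for w in title.split()}
        return {w for w in words
                if w and w not in STOPWORDS and w not in query_terms}

    # Count in how many titles each candidate word appears.
    counts = Counter()
    for title in titles:
        counts.update(candidate_terms(title))

    # Most frequent words become labels; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [term for term, _ in ordered[:max_clusters]]

    # Assign each title to the first label it contains, else to "other".
    clusters = defaultdict(list)
    for title in titles:
        terms = candidate_terms(title)
        for label in labels:
            if label in terms:
                clusters[label].append(title)
                break
        else:
            clusters["other"].append(title)
    return dict(clusters)


titles = [
    "Carnegie Mellon robotics research lab",
    "Robotics at Carnegie Mellon",
    "Carnegie Mellon in Pittsburgh",
    "Pittsburgh campus map for Carnegie Mellon",
    "Carnegie Mellon admissions",
]
clusters = cluster_results("Carnegie Mellon", titles)
for label, members in clusters.items():
    print(label, "->", members)
```

A real clustering engine would work from full snippets, handle phrases rather than single words, and nest clusters inside one another; this sketch only shows why shared vocabulary is a natural starting signal for themes.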

A Google search also relies on an algorithm, but one that ranks results by the number of websites linking to a given page and cross-references that count with text matching on the search terms to surface the most relevant results.
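To make the link-based idea concrete, here is a minimal sketch that combines an inbound-link count with a simple text match. It is not Google's actual ranking algorithm, which is far more elaborate; the scoring formula, the function name `rank_pages`, and the example pages and links are assumptions made up for illustration.

```python
def rank_pages(query, pages, links):
    """Rank pages by text match weighted by inbound-link count.

    pages: dict mapping a URL to its page text.
    links: list of (source_url, target_url) pairs.
    """
    query_terms = set(query.lower().split())

    # Count how many other known pages link to each page.
    inbound = {url: 0 for url in pages}
    for source, target in links:
        if target in inbound and source != target:
            inbound[target] += 1

    scored = []
    for url, text in pages.items():
        words = set(text.lower().split())
        # Fraction of the query terms that appear in the page text.
        match = len(query_terms & words) / len(query_terms) if query_terms else 0.0
        if match > 0:
            # Assumed formula: text match scaled by (1 + inbound links).
            scored.append((match * (1 + inbound[url]), url))

    # Highest score first; ties are broken by URL for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


pages = {
    "a.example": "robotics research",
    "b.example": "robotics club news",
    "c.example": "cooking recipes",
}
links = [
    ("b.example", "a.example"),
    ("c.example", "a.example"),
    ("a.example", "b.example"),
]
ranking = rank_pages("robotics research", pages, links)
print(ranking)
```

In this toy data the page with two inbound links and a full text match comes first, while the page that matches none of the query terms is dropped entirely.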

Google’s current search development is personalized search, a method that ranks search results based on a user’s search history.

Valdes-Perez contends that Clusty is just as accurate as its competitors but offers a better experience. “The same content and same pages you would find on Yahoo! or Google you would find on Clusty,” he said. “But we wanted to show the world there’s a different user experience for searching the Web.”

Will Clusty provide the next mainstream advance in searching and become “the approach you will see everywhere”? Valdes-Perez thinks it has a good chance, even though the search engine is only two years old. He regards personalized search as “a dead end.”

As a meta-search engine, Clusty uses its own search engines to crawl the Internet but also queries search engines such as Ask, MSN, Gigablast, and Wisenut to get the best results.

“The four engines give the same results, but the order is different,” said Valdes-Perez. “A meta-search engine dampens out the noise.”
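One common way to combine differently ordered result lists is rank aggregation. The sketch below uses a Borda-style count, in which each engine awards more points to results it ranks higher and the points are summed. This is a stand-in technique chosen for illustration, not necessarily how Clusty merges results, and the function name `merge_rankings` and the sample lists are invented.

```python
from collections import defaultdict


def merge_rankings(rankings):
    """Merge several ranked lists (best first) with a Borda-style count."""
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, url in enumerate(ranking):
            # Top result earns n points, the next n - 1, and so on.
            scores[url] += n - position

    # Highest total first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Three hypothetical engines return the same pages in different orders.
engine_results = [
    ["x.example", "y.example", "z.example"],
    ["y.example", "x.example", "z.example"],
    ["x.example", "z.example", "y.example"],
]
merged = merge_rankings(engine_results)
print(merged)
```

Because a page must rank well across several engines to score highly, one engine's odd placement has limited effect on the merged order, which is one way to read the remark about dampening noise.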

The idea was first developed when Valdes-Perez was still a graduate student at Carnegie Mellon.
Along with Jerome Pesenti and Christopher Palmer, Valdes-Perez created the site cluster.cs.cmu.edu, an earlier version of Clusty.

In 2000, the team co-founded Vivísimo to sell search software to governments and large companies for their websites and for internal use.

In fact, Vivísimo’s biggest customer base is the government, which uses the Vivísimo Velocity Search Platform to develop sites such as militaryhomefront.dod.mil.

Vivísimo software is currently used to help power the directories that military service members and their families use to obtain information about programs and services.

Although Valdes-Perez says Vivísimo’s main focus is business search while Google often deals with advertising on the Web, he admits the two companies do compete.

“Google does have a program they sell to businesses,” he said. However, he was comfortable with the idea of Google being in Pittsburgh.

“The more operations in technology there are [in Pittsburgh], the more that benefits us.”

As a relatively small company, Vivísimo is looking for people who like working at a company in its growing stages, especially those who want to stay in Pittsburgh.

“It’s interesting working in a company like this instead of researching,” said Valdes-Perez. “As a researcher you write papers and you impact 10, 20, maybe hundreds of people. But this, you reach millions of people. It’s thrilling and a lot more satisfying than writing a paper for dozens of people.”

For those who are looking for an alternative to Carnegie Mellon’s current website search, Valdes-Perez said the company “would be thrilled to work with Carnegie Mellon University and give it the best search engine of all universities.”