How Things Work: Search engines
Every day, millions of individuals and organizations worldwide browse the Internet to carry out all sorts of research and analyses.
One of the great features of researching online is the sheer amount of information available to users. From recipes to TV listings to items for sale, virtually anything can be found on the Internet. However, without knowing the URL, finding what you need can be cumbersome. This is where search engines come in.
Search engines are interesting pieces of technology, as they are radically different from regular websites.
Instead of storing the information itself, a search engine catalogs information by words, linking each word to the websites that relate to it. This technology improves every year, allowing big search engine names such as Google, Yahoo, and MSN to pull up hundreds of millions of web pages in just a fraction of a second.
A search engine collects large amounts of data through a multi-faceted technology called web crawling. Web crawling relies on an automated piece of software known as a spider. The spider goes to the root of a site and crawls through the site's pages, indexing the keywords, images, or phrases it collects along the way. It then creates a link that the engine can use to relay the site address to users when they search.
This is a very simplified description of how the software works; in practice it is incredibly complicated and often involves several different approaches.
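The crawling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular search engine does it: the PAGES dictionary is a made-up, in-memory stand-in for a website (a real spider would fetch pages over the network), and the names LinkExtractor and crawl are hypothetical. The sketch starts at the root of the site, follows every link it finds, and never visits the same page twice.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "website": URL -> HTML.
# A real spider would download each page over HTTP instead.
PAGES = {
    "/": '<a href="/recipes">Recipes</a> <a href="/about">About</a>',
    "/recipes": '<a href="/recipes/pumpkin-pie">Pumpkin pie</a> <a href="/">Home</a>',
    "/recipes/pumpkin-pie": "<p>A delicious pumpkin pie recipe.</p>",
    "/about": '<a href="/">Home</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(root, pages):
    """Visit pages breadth-first from root; return URLs in visit order."""
    seen = {root}
    queue = deque([root])
    order = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # dead link: nothing to read
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return order


print(crawl("/", PAGES))
```

Starting from "/", the sketch visits the root first, then the two pages it links to, and finally the pumpkin pie page one level deeper. The "seen" set is what keeps the spider from looping forever when pages link back to each other.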
Two ways that companies use spiders can be seen in the Google and AltaVista search engines. When Google uses a spider to find information, it looks at the source of each web page. This allows Google not only to extract information from the site, but also to store a copy of the page for its users.
AltaVista takes a much more generic approach, simply scanning all of the words on a given site and indexing them as quickly and as appropriately as possible.
To “index” is to take the information gathered from a website (generally by the spider) and store it so that it can be accessed efficiently when a user searches. This is probably the most important part of the process, because it determines how quickly the user can pull data out of the server that stores it.
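One common way to store words so they can be looked up quickly is a structure often called an inverted index: a table that maps each word to the pages containing it. The Python sketch below is a minimal illustration of that idea, not the method of any specific engine. The DOCS dictionary and the function names tokenize and build_index are assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for text a spider has already collected.
DOCS = {
    "/pumpkin-pie": "How to bake a delicious pumpkin pie",
    "/apple-pie": "A classic apple pie recipe",
    "/tv-listings": "Tonight's TV listings",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


index = build_index(DOCS)
print(sorted(index["pie"]))
```

Looking up "pie" returns both pie pages without rereading any page text, which is why a lookup stays fast even when the collection of pages is very large.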
Generally, when a site is indexed, the spider picks out some of the main words or phrases of a given website and places them first on the priority list, making them more likely to be found when users search for specific items.
For example, if a spider is indexing a website that gives instructions on making a “delicious pumpkin pie,” it is more likely that the phrase “pumpkin pie” will show up, rather than “delicious” or “sugar.”
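How a spider decides which words are the "main" ones varies from engine to engine and is far more involved than anything shown here. As a rough sketch, assume the simplest possible rule: ignore very common words and rank the rest by how often they appear. The STOP_WORDS list, the sample PAGE text, and the function name top_keywords are all invented for this illustration.

```python
import re
from collections import Counter

# Assumed stop-word list; a real engine would use a much longer one.
STOP_WORDS = {"a", "and", "in", "of", "the", "then", "to"}

# Made-up page text for the pumpkin pie example.
PAGE = (
    "Delicious pumpkin pie. To make pumpkin pie, mix pumpkin, "
    "sugar and spices, then bake the pie."
)


def top_keywords(text, n=2):
    """Return the n most frequent words that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


print(top_keywords(PAGE))
```

On this sample text, "pumpkin" and "pie" each appear three times, while "delicious" and "sugar" appear only once, so the first two rise to the top of the list.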
During this part of the process a search engine looks inside the code of the website itself for items called meta tags. These allow web developers to add keywords to their websites that search engines will pick up.
This speeds up the indexing process and makes it more efficient, as the authors of the website are more likely to put relevant keywords in the meta tags to attract users to the site through the search engines.
Also, the meta tags make it easier for the search engine to identify which definition of a specific word is being looked up, depending on the contexts of the various web pages. This is especially helpful when a particular word has more than one meaning, and thus, allows the search engine to provide more accurate results.
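To make the idea concrete, the sketch below reads the keywords meta tag out of a page using the HTML parser in Python's standard library. The sample HTML and the class name MetaKeywordParser are assumptions for illustration, and the sketch says nothing about how much weight a given search engine actually gives these keywords.

```python
from html.parser import HTMLParser

# Made-up page source with a keywords meta tag in its <head>.
HTML = """
<html><head>
<title>Grandma's Kitchen</title>
<meta name="keywords" content="pumpkin pie, dessert, baking">
</head><body><p>Recipe goes here.</p></body></html>
"""


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated keywords from <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "keywords":
            return
        content = attributes.get("content") or ""
        for keyword in content.split(","):
            keyword = keyword.strip().lower()
            if keyword:  # skip empty entries such as a trailing comma
                self.keywords.append(keyword)


parser = MetaKeywordParser()
parser.feed(HTML)
print(parser.keywords)
```

Because the author wrote "pumpkin pie" as a single keyword, the phrase arrives intact instead of as two unrelated words, which is the kind of context the paragraph above describes.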
The last and most obvious step in the process is the search itself. When you type in something specific and hit “search,” the engine quickly pulls the information out of its index by taking the keywords you entered and matching them against the keywords stored in the index.
Generally, a search engine is only as good as the relevance of the results it returns. In the “pumpkin pie” example, if the user were to search for the words “pumpkin” and “pie,” the search engine would begin by retrieving information on how to bake apple and cherry pies, in addition to pumpkin pie. However, before the results go to the user, the search engine filters them so that only the pumpkin-related pages remain.
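The filtering step in the pumpkin pie example can be pictured as keeping only the pages that match every word in the query. The sketch below assumes an index has already been built; the INDEX contents and the function name search are hypothetical. Real engines also rank the surviving pages by relevance, which this sketch leaves out.

```python
# Hypothetical, already-built index: word -> pages containing that word.
INDEX = {
    "pumpkin": {"/pumpkin-pie", "/pumpkin-soup"},
    "pie": {"/pumpkin-pie", "/apple-pie", "/cherry-pie"},
    "apple": {"/apple-pie"},
}


def search(query, index):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages matching the first word...
    results = set(index.get(words[0], set()))
    # ...then keep only those that also match each remaining word.
    for word in words[1:]:
        results &= index.get(word, set())
    return results


print(search("pumpkin pie", INDEX))
```

The word "pie" alone matches the apple and cherry pages too, but intersecting with the pages for "pumpkin" leaves only the pumpkin pie page.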
Search engines are powerful instruments of technology, and through web crawling, indexing, and searching they allow users to find almost any information they want very quickly.
Some of the biggest companies in the world began as search engine companies and still are primarily, simply because of the revenue they can generate by placing ads on their websites, given the number of people who visit them. They are a focal point of how information is exchanged on the Internet, not to mention one of its most popular features.