Knowledge Handler: January 2005

Metasearch

Metasearch resources such as www.metacrawler.com, www.ixquick.com and www.dogpile.com present a user's query to a number of search engines simultaneously and provide the user with the compiled results. Metacrawler at this time searches Google, Yahoo, Ask Jeeves, About, LookSmart, Overture and FindWhat to provide its results, and lists which searched tool has web pages containing the sought word or phrase.

The weakness of metasearch is that each search engine has unique commands that the metasearch tool does not use, so the results cannot be winnowed as much as they could be by querying each search engine directly. However, metasearch tools let the searcher quickly see which search tools have significant occurrences of a word or phrase. Therefore, use metasearch tools after using directories, or after a couple of search engine searches have been unsatisfactory.
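
To make the idea concrete, here is a minimal sketch of what a metasearch tool does with one query. The three "engines" are hypothetical stand-ins that return canned results (a real tool would query each service over the network, and in parallel rather than one after another), and the names metasearch, engine_a, engine_b and engine_c are my own, not taken from any actual product.

```python
from collections import defaultdict

# Hypothetical stand-ins for real search engines: each takes a query and
# returns a ranked list of URLs. The canned results are for illustration only.
def engine_a(query):
    return ["http://example.com/a", "http://example.com/b"]

def engine_b(query):
    return ["http://example.com/b", "http://example.com/c"]

def engine_c(query):
    return []

def metasearch(query, engines):
    """Send the same query to every engine and compile the results.

    Returns (hits, counts): hits maps each URL to the sorted names of the
    engines that returned it; counts maps each engine name to its number
    of results.
    """
    hits = defaultdict(list)
    counts = {}
    for name, search in engines.items():
        results = search(query)
        counts[name] = len(results)
        for url in results:
            hits[url].append(name)
    return {url: sorted(names) for url, names in hits.items()}, counts

ENGINES = {"A": engine_a, "B": engine_b, "C": engine_c}
hits, counts = metasearch("forgotten Indiana", ENGINES)
```

The counts dictionary is the "which tools have significant occurrences" view described above, while hits shows which pages more than one engine agrees on. Note that the query string is passed through unchanged, which is exactly why engine-specific commands are lost.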

-DD

Search Engine and Directory Examples

The University Libraries, University at Albany (New York) have created a nifty web page, http://library.albany.edu/internet/choose.html, listing the search strategies supported by common search engines and directories. Other good sources of information on search engines and directories include http://searchenginewatch.com/facts/ and http://www.searchengineshowdown.com/reviews/.

None of these sites has comparison information that is updated regularly - the information I found was one to four years old! Consequently, one has to check the article index on searchenginewatch.com for specifications of new tools, or scan that source for news articles.

For example, today I stumbled upon a Chinese-funded search engine http://www.accoona.com/ that started December 7, 2004.
The only independent review I found of this new tool was a Searchenginewatch article, which panned it for having a small set of results. However, when I sought data on companies (especially Chinese firms), I found that its "Business" search results were unique for a free tool, providing contact information, sales volume, and the number of employees.
-DD

Scout Project - A Directory

The Scout Archives http://scout.wisc.edu/Archives/index.php is a directory of websites that have been reviewed by subject experts and classified by subject, using the Library of Congress subject headings. The compilation is organized by the University of Wisconsin, with funding from the Andrew Mellon Foundation and the National Science Foundation.

The quality of the indexed resources appears to be top-notch, and using it I found websites such as http://www.lostindiana.net which I had unsuccessfully sought with Google (I was looking for "forgotten Indiana" and could not find the relevant site).

The weakness of having humans compile subject directories is that coverage is very selective: this tool today has fewer than 17,000 webpages indexed, compared with Google's claimed 4 billion. And though the project maintains a current blog listing newly added resources, the humans on the project are apparently not taking the time to run a "link checker" to verify that websites have remained at the same URL over time. When I searched for entries on the subject of "Indiana", over half of the results, when followed, returned a "404 error", meaning that the website could not be found at the cited address.
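
For the curious, a "link checker" need not be elaborate. The following is a minimal sketch using only Python's standard library, not the tool any particular directory uses. The function names http_status and find_dead_links are my own, and I am assuming that a HEAD request is enough to tell whether a page is still there (some servers refuse HEAD, so a real checker would fall back to GET).

```python
import urllib.error
import urllib.request

def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code   # e.g. 404 when the page has moved or vanished
    except (urllib.error.URLError, OSError):
        return None       # DNS failure, refused connection, timeout

def find_dead_links(urls, get_status=http_status):
    """Return (url, status) pairs for entries that fail or answer with a 4xx/5xx code."""
    dead = []
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            dead.append((url, status))
    return dead
```

Run against a directory's list of URLs, find_dead_links returns the entries an editor would need to fix or retire. The get_status parameter exists only so the check can be tried out with a stand-in function instead of live network requests.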

-DD

Search Engines

Search engines are automated systems that record and index text (keywords) from each webpage they scan. The search engine indexing software typically notes the links in a page, checks whether those links lead to pages it has not yet indexed, and moves on to index those pages, continuing the process automatically. Because search engines are automated, they can index billions of web pages (as www.google.com does), as compared with manually indexed directories, which rarely reach a million indexed pages.
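
The index-then-follow-links loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the "web" here is a small hypothetical dictionary held in memory rather than pages fetched over the network, and the names WEB and crawl_and_index are mine, not those of any real search engine.

```python
import re
from collections import defaultdict, deque

# A tiny hypothetical "web": URL -> (page text, outgoing links).
WEB = {
    "http://example.org/": ("indiana history archive", ["http://example.org/towns"]),
    "http://example.org/towns": ("forgotten indiana towns",
                                 ["http://example.org/", "http://example.org/maps"]),
    "http://example.org/maps": ("old county maps", []),
}

def crawl_and_index(start_url, web):
    """Index every page reachable from start_url; return keyword -> set of URLs."""
    index = defaultdict(set)
    seen = {start_url}             # pages already queued or indexed
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        # Record each keyword on the page against the page's URL.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
        # Follow links only to pages that have not been seen yet.
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl_and_index("http://example.org/", WEB)
```

Looking up index["indiana"] then gives the two pages that contain the word. The seen set is what keeps the crawler from looping forever when pages link back to each other.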

While search engines provide the most comprehensive indexing of the Web, the keyword searching they offer matches words rather than subjects, which often makes it difficult to retrieve relevant information.

Another weakness is that many search engines only index the title, URL, perhaps any metadata coded by the web page developer, and a set number of words from a document. This limit on the number of words indexed per page often results in long documents being incompletely indexed.
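
A small sketch shows why a word limit leaves long documents incompletely indexed. The cap of 20 words and the names index_page and max_words are assumptions chosen for illustration; actual engines do not necessarily publish their limits.

```python
def index_page(url, text, index, max_words=None):
    """Add the page's words to index (keyword -> set of URLs).

    If max_words is given, only the first max_words words are indexed,
    imitating an engine that stops after a set number of words.
    """
    words = text.lower().split()
    if max_words is not None:
        words = words[:max_words]
    for word in words:
        index.setdefault(word, set()).add(url)

# A long hypothetical document whose key terms appear only near the end.
long_doc = "annual report " + "filler " * 50 + "crossbow regulations"

full, capped = {}, {}
index_page("http://example.org/report", long_doc, full)
index_page("http://example.org/report", long_doc, capped, max_words=20)
```

A search for "crossbow" finds the page in the full index but not in the capped one, even though the word is plainly in the document.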

-DD

Directories

The earliest search tools were human-created indexes, called directories. Each website or resource in a directory has been chosen by a person for inclusion, and the resources are arranged by topic, like the entries in the yellow pages of the phone book.

Examples of online directories range from simple listings like Indiana Wesleyan University's www.indwes.edu/ocls/reference to the database-driven Open Directory Project http://dmoz.org. Some directories give credit to the person organizing the material on a subject, so that the user can judge their authority - these include www.about.com and JoeAnt www.joeant.com.

Directories are most useful for researching topics of such general interest that the subject is used as a category. For example, when using the Yellow Pages, once one finds the topic "Automobile", one can choose from a number of dealers and repair centers, all of which are highly relevant to automobile owners.

Directory usage can be frustrating if one is searching the directory for an obscure term that is not used as a category (imagine searching the Yellow Pages for "crossbows"), or if one cannot think of the term the directory makers used for the subject (for example, there is no entry in most Yellow Pages for the term "Cars"). Well-done directories offer "see" references to guide the searcher: for example, in the Yellow Pages, if one searches for "Car Dealers" one finds a note that says "Car Dealers see Automobile Dealers - New Cars; also Automobile Dealers - Used Cars".
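
In a database-driven directory, a "see" reference amounts to a second lookup table that maps the terms searchers try onto the headings the directory makers actually used. Here is a minimal sketch. The headings come from the Yellow Pages example above, but the listings and the names DIRECTORY, SEE and look_up are invented for illustration.

```python
# Headings the directory makers chose, each with its (hypothetical) listings.
DIRECTORY = {
    "Automobile Dealers - New Cars": ["Example Motors"],
    "Automobile Dealers - Used Cars": ["Sample Auto Sales"],
}

# "See" references: a term searchers might try -> the headings actually used.
SEE = {
    "Car Dealers": ["Automobile Dealers - New Cars", "Automobile Dealers - Used Cars"],
}

def look_up(term):
    """Return {heading: listings} for term, following "see" references if needed."""
    if term in DIRECTORY:
        return {term: DIRECTORY[term]}
    return {heading: DIRECTORY[heading] for heading in SEE.get(term, [])}
```

Calling look_up("Car Dealers") returns both automobile headings, while look_up("crossbows") returns an empty result, which is the frustrating case: the term is neither a category nor covered by a "see" reference.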

-DD

Tuesday, January 18, 2005

Metasearch

Sunday, January 09, 2005

Search Engine and Directory Examples

Friday, January 07, 2005

Scout Project - A Directory

Wednesday, January 05, 2005

Search Engines

Saturday, January 01, 2005

Directories