Web Searching

Laura E. Ray, MA, MLS
Educational Programming Librarian
Cleveland-Marshall College of Law

September 2004

In research, the World Wide Web is one electronic information medium to be considered alongside print, audiovisual, and other electronic information media.  As with other media, it can be difficult to find the information you want.  This "Web Searching" guide will help you become a better Web researcher and better evaluate the Web sites you find in your research.

Web Site Evaluation
Web Searching - General Issues
Web Searching Principles and Guidelines
Web Search Directories
Web Search Engines
Web Metasearch Engines
Invisible Web
Additional Resources
 
 

Web Site Evaluation

A World Wide Web "site" is composed of many "pages."  Current estimates put the size of the Web at 3 billion pages on 20 million sites.  [For statistics, news, and research information on the Web, consult ClickZ.]

Maybe you found a Web site that seems to have information you want.  How can you know if the Web site is reliable?  You can probably trust sites from established organizations, but what about Web sites of organizations you know nothing about?
 

Web Site Home Page and Site Index

When evaluating a Web site, go to its "home page," or opening page of the site, and look for these key pieces of information:

If the home page doesn't address this information, look for a link to the "site index" or "site map."  As its name implies, the site index is like the index of a book.  Examining the site index should help you decide if the Web site has the information you want.
 

Web Site Evaluation Criteria

Beyond the home page and site index, consider the following criteria when evaluating a Web site:


Web Site Reviews

Internet Scout Project
    Produces "The Scout Report," which announces and reviews Web sites and mailing lists.
    "Table of Contents" for current Report.
    "Archives" contain approximately 17,000 reports.
             At "Archives" page, also note ability to browse for reviews by Library of Congress Subject Headings.
                    [eg, "Law - United States - cases"]
    "Archives" are searchable; "Advanced Search" also available.

 

 

Web Searching - General Issues

The Web offers a variety of good search services.  Many provide "value-added" services, such as customized display of search results.  However, it is important to remember that economic forces are quite active on the Web, and search services do not reach all Web-based information.  [Web search services index an estimated 20-50% of the Web.  See Invisible Web as well as page 53 of The Invisible Web: Uncovering Information Sources Search Engines Can't See, available from the Cleveland-Marshall College of Law Library.]
 

Web Search Service Trends

Over the last decade, several trends have emerged in Web search service operations:


Web Search Service Problems

Despite their remarkable ability to find information, Web search services can return information that has problems.  Web Search Directories cover less of the Web than Engines, but tend to have fewer problems because human beings compile verified information about Web sites.  Web Search Engines cover more of the Web than Directories, but they electronically compile information that may or may not be verified.  All Web search services can be subject to the following problems:


Web Search Service Evaluation

Search Engine Showdown
    Search service review Web site produced by Greg R. Notess.
    Provides reviews, analyses, and comparative information.

Search Engine Watch
    Search service review Web site created by Danny Sullivan, who continues to edit the site for Jupitermedia Corporation.
    Provides statistics, reviews, and comparative testing.
    Includes free daily "SearchDay" and monthly "Search Engine Report" newsletters.

In addition to consulting the above review sites, consider the following criteria when deciding whether a Web search service meets your needs:

 
 
 

Web Searching Principles and Guidelines
 

Key Principles


General Guidelines

Web Search Directories

Web Search Directories are created by human beings who identify Web sites and list them in a subject classification.
You can browse or search a Web Search Directory.
Usually the main page of a Web site, rather than multiple pages of that site, is listed in search results.
Annotations and evaluations of Web sites are often included in search results.

Open Directory Project
    Developed and maintained by 30,000 volunteer editors.
    Netscape owns the copyright to ODP's compilations, but freely grants license to them.
    Thus, ODP is used by many search engines as their subject directories.

    Check "Law" sub-category under "Society" category.

Yahoo!
    Founded by David Filo and Jerry Yang in 1994; now a corporation based in Sunnyvale, CA.
    Charges "commercial" site annual fee to be listed.
    Recently acquired search engines "All the Web" and "AltaVista" and started "Yahoo!Search."  [See Web Search Engines]

    Check "Government" category.

WWW Virtual Library
    The first search directory, founded by Tim Berners-Lee, creator of HTML and the Web.
    Maintained by volunteers.

    Check "Law" category.
 
 

Web Search Engines

To establish its information bank, a Search Engine's search software sends out "robots" (or "spiders" or "crawlers") that use HTTP to request data from Gopher, FTP, and HTTP servers.  [HTTP - HyperText Transfer Protocol; FTP - File Transfer Protocol.]  The data is indexed and stored; both the main page and additional pages of a site are indexed.

Note:  When you use a Search Engine, you are only searching information indexed and stored by that Engine.
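
The indexing step described above can be illustrated with a minimal Python sketch.  It assumes a handful of hypothetical pages (the URLs and texts below are invented for illustration) and builds a simple word-to-page index; it is not how any particular Search Engine is implemented.  It also shows why a search can only find what the Engine has indexed.

```python
from collections import defaultdict

# Hypothetical pages standing in for what a crawler might have fetched.
PAGES = {
    "http://example.org/": "ohio legal research guide",
    "http://example.org/courts": "ohio courts and case law",
    "http://example.com/": "legal research tips",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "ohio legal")))     # pages containing both words
print(sorted(search(index, "ohio statutes")))  # "statutes" was never indexed
```

A word that appears on no indexed page (here, "statutes") returns nothing, even if pages about it exist elsewhere on the Web.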

Google and Yahoo!Search are two leading search engines.  Both have the following features:

See additional descriptive information on these two search engines below.  Also included are results for sample searches seeking information on "Ohio legal research."

Google
    Also searches Usenet newsgroups.

    Supports over 100 interface languages.
    Does not support truncation, but automatically searches for variant forms of word.
    Can use * (ie, asterisk) as a "wildcard" within a phrase search.
    Supports ~ (ie, tilde) in front of word to search for its synonyms.
    Can search within search results.
    Divides search results into "Web," "Images," "Groups," "News," and "Froogle" (ie, products) categories.

"ohio legal research"
        About 130 Web pages will be retrieved.
"legal research" +ohio
        About 135,000 Web pages will be retrieved.
"legal research" AND ohio
        About 134,000 Web pages will be retrieved.


Yahoo!Search
    Also searches Yahoo! directory, as well as other Yahoo! portal databases (eg, Yahoo!News).
    Supports search nesting within parentheses.
    Can use stop word (eg, "a") as a "wildcard" within a phrase search.
    Also translates text or Web page from English to Chinese, Dutch, Greek, Japanese, Korean, and Russian; Chinese, Dutch, Greek, Japanese, Korean, and Russian to English; Dutch to French; as well as French to Dutch, Greek, Italian, Portuguese, and Spanish.
    Divides search results into "Web," "Images," "Directory," "Yellow Pages," "News," and "Products" categories.

"ohio legal research"
        About 60 Web pages will be retrieved.
"legal research" +ohio
        About 111,000 Web pages will be retrieved.
"legal research" AND ohio
        About 113,000 Web pages will be retrieved.

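In the sample searches above, the exact phrase search retrieves far fewer pages than the searches that combine the phrase "legal research" with the word "ohio."  The following Python sketch suggests why, using a few hypothetical page texts; the matching functions are simplified assumptions, not the actual methods used by Google or Yahoo!Search.

```python
# Hypothetical page texts; real engines match against their own indexes.
PAGES = [
    "Ohio legal research guide for law students",
    "Legal research in Ohio courts",
    "Ohio tax forms",
    "Legal research methods",
]


def words(text):
    """Split text into lowercase words."""
    return text.lower().split()


def contains_phrase(text, phrase):
    """True if the words of the phrase appear consecutively, in order."""
    t, p = words(text), words(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


def phrase_and_word(text, phrase, word):
    """True if the text contains the phrase and, anywhere, the extra word."""
    return contains_phrase(text, phrase) and word.lower() in words(text)


exact = [p for p in PAGES if contains_phrase(p, "ohio legal research")]
broader = [p for p in PAGES if phrase_and_word(p, "legal research", "ohio")]
print(len(exact), len(broader))
```

The exact phrase matches only pages where the three words appear together in that order, while the combined search also matches pages where "ohio" appears elsewhere in the text.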
 
 

Web Metasearch Engines

A Web Metasearch Engine sends your search statement to several search services, receives results, deletes duplicates, and displays results in a single list.  This type of Web search service can save aggravation and time, because you only need to learn one search interface and don't need to search multiple search services.  In addition, you can often select which services your search will go to.  However, since a Metasearch Engine sends a search statement to several search services, and search services have different search methods, a complex search statement may not run effectively.
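
The send, receive, and delete-duplicates steps described above can be sketched in Python as follows.  The two "engines" are hypothetical stand-in functions that return fixed lists of invented URLs; a real Metasearch Engine would contact actual search services and would likely also rank the merged results.

```python
def fake_engine_a(query):
    # Hypothetical stand-in for a real search service.
    return ["http://example.org/ohio", "http://example.org/law"]


def fake_engine_b(query):
    # A second hypothetical service; one of its results overlaps with the first.
    return ["http://example.org/law", "http://example.net/research"]


def metasearch(query, engines):
    """Send the query to each engine and merge results, dropping duplicates."""
    seen = set()
    merged = []
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


results = metasearch("ohio legal research", [fake_engine_a, fake_engine_b])
print(results)
```

Because the same search statement is passed unchanged to every service, any syntax that one service does not support may be misread, which is one reason complex search statements may not run effectively.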

Query Server and Vivisimo are two leading metasearch engines.  Both have the following features:

See additional descriptive information on these two metasearch engines below.  Also included are results for sample searches seeking information on "Ohio legal research."
 

Query Server
    Opt to search 8 Web search engines, 9 Media sites, 11 Health sites, or 12 US federal government sites.
    After selecting Web, Media, Health, or US Federal Government search group, can click on "Customize" below search box to select particular engines/sites within group of search engines/sites.
    "Customize" also enables you to select whether your search results "cluster" by content, site, or both content and site.
    Can use * (ie, asterisk) to truncate search words.

"ohio legal research"
       About 60 Web pages, in 8 clusters, will be retrieved.  [About 30 Web pages identified as duplicates.]
"legal research" NEAR ohio
       About 80 Web pages, in 6 clusters, will be retrieved.  [About 30 Web pages identified as duplicates.]
"legal research" AND ohio
       About 80 Web pages, in 6 clusters, will be retrieved.  [About 30 Web pages identified as duplicates.]
 

Vivisimo
    Searches over 12 search engines, Open Directory, Yahoo, as well as numerous media, government, and business sites.
    Use "advanced search" to run search on selected search engines, news, etc.
    "Clusters" (ie, arranges by subject) results and ranks clusters. Cluster classification not predetermined (as in Query Server); this allows for maximum classification flexibility.
    Supports field searching.

"ohio legal research"
       About 60 Web pages, in 15 clusters (some with sub-clusters), will be retrieved.
"legal research" NEAR ohio
       About 160 Web pages, in 23 clusters (some with sub-clusters), will be retrieved.
"legal research" AND ohio
       About 150 Web pages, in 31 clusters (some with sub-clusters), will be retrieved.


 

Invisible Web

The Web includes a lot of "Dynamic Content."  This "Invisible Web" or "Deep Web" is information transmitted via the Web, rather than stored on the Web in "static" form, and thus is not available for indexing by search engines.  [Robots Exclusion Protocol - Prevents search engine's robot crawler from accessing/indexing portions of Web site.]
The Invisible Web includes information in databases, password-restricted information, text within graphics, etc., and is estimated to comprise 400,000 Web sites.
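
The Robots Exclusion Protocol mentioned above can be explored with the urllib.robotparser module in Python's standard library.  The sketch below assumes a hypothetical robots.txt file and a hypothetical crawler name ("ExampleBot"); it only illustrates how a well-behaved robot would decide which pages it may access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an imaginary Web site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is an invented crawler name used only for illustration.
allowed = parser.can_fetch("ExampleBot", "http://example.org/index.html")
blocked = parser.can_fetch("ExampleBot", "http://example.org/private/report.html")
print(allowed, blocked)
```

Pages under the disallowed path are skipped by compliant robots, so they do not appear in that Search Engine's index even though they are on the Web.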

If you cannot find information on a topic via a Web Search Directory or Search Engine, try an Invisible Web search service.  You may be led to a fee-based Web site, but at least you'll find out if any information is available on your topic.

CompletePlanet
    Over 70,000 searchable databases and search services.
    Can browse sites via its 34 top-level subject "browse tree" (eg, "Government").  ["Browse tree" extends to five sub-levels.]
    Can search within "browse tree" subject directory.
    "Advanced" searching available.

Librarians' Index to the Internet
    Searchable annotated subject directory of over 14,000 Internet resources; librarians select resources based on their "usefulness to users of public libraries."
    Can browse subject directory (eg, "Government & Law").
    "Advanced Search" available.


 

Additional Resources


Web-Based Articles

Deep Web Research
Marcus P. Zillman
Law Library Resource Xchange
2/23/03

Evaluating the Quality of Information on the Internet
The Virtual Chase, created 9/14/01, revised 9/16/04

Books

Links are provided for items held at the Cleveland-Marshall College of Law Library.

The Extreme Searcher's Guide To Web Search Engines: A Handbook For the Serious Searcher
Randolph Hock, foreword by Reva Basch. - 2nd ed. - Medford, NJ: CyberAge Books, c2001.
[Available in Cleveland-Marshall College of Law Library - Reference ZA4226 .H63 2001]

Government Information On The Internet
Peggy Garvin, ed. - 6th ed. - Lanham, MD: Bernan, a division of Kraus Organization Ltd., c2003.
[Available in Cleveland-Marshall College of Law Library - Reference ZA5075 .G68]

Internet Blue Pages: The Guide To Federal Government Web Sites
Laurie Andriot, comp. - Medford, NJ: Information Today, Inc., c2000, 2001-2002.
[Available in Cleveland-Marshall College of Law Library - Reference ZA5075 .A53 2001-2002]

Internet Power Searching: The Advanced Manual
Phil Bradley. - 2nd ed. - New York, NY: Neal-Schuman Publishers, c2002.

The Invisible Web: Uncovering Information Sources Search Engines Can't See
Chris Sherman and Gary Price. - Medford, NJ: CyberAge Books, c2001.
[Available in Cleveland-Marshall College of Law Library - ZA4450 .S54 2001]

The Librarian's Internet Survival Guide: Strategies For the High-Tech Reference Desk
Irene E. McDermott; edited by Barbara Quint. - Medford, NJ: Information Today, Inc., c2002.
[Available in Cleveland-Marshall College of Law Library - Reference ZA4201 .M36 2002]

Search Engines For The World Wide Web
Alfred and Emily Glossbrenner. - 3rd ed. - Berkeley, CA: Peachpit Press, c2001.

Toward a Cyberlegal Culture
Mirela Roznovschi. - 2nd ed. - Ardsley, NY: Transnational Publishers, c2002.

The United States Government Internet Manual
Peggy Garvin, ed. - Lanham, MD: Bernan Press, c2004.
[Available in Cleveland-Marshall College of Law Library - Reference ZA5075 .G68 2003-2004]

Web of Deception: Misinformation on the Internet
Anne P. Mintz, ed. - Medford, NJ: CyberAge Books, c2002.
[Available in Cleveland-Marshall College of Law Library - Reference ZA4201 .W43 2002]

Annuals and Periodicals

The Internet Lawyer. - Baltimore, MD: Daily Record Company. Monthly.
[Available from Cleveland-Marshall College of Law Library; current issue on Reserve.]

Internet Law & Strategy. - Philadelphia, PA: Law Journal Newsletters, a division of American Lawyer Media. Monthly.
[Available from Cleveland-Marshall College of Law Library; current issue on Reserve.]

The Legal List: Research On The Internet
St. Paul, MN: West Group, Inc. Annual.
[Available from Cleveland-Marshall College of Law Library - KF242 .A1 L375; current edition in Reference]