Web Searching
Laura E. Ray, MA, MLS
Educational Programming Librarian
Cleveland-Marshall College of Law
September 2004
In research, the WorldWideWeb may be viewed as one electronic information medium to be considered alongside print, audiovisual, and other electronic information media. As with those other media, it can be difficult to find the information you want. This "Web Searching" guide will help you become a better Web researcher and better evaluate the Web sites you find in your research.
Web Site Evaluation
Web Searching - General Issues
Web Searching Principles and Guidelines
Web Search Directories
Web Search Engines
Web Metasearch Engines
Invisible Web
Additional Resources
A WorldWideWeb "site" is composed of many "pages." Current size estimates of the Web put it at 3 billion pages on 20 million sites. [For statistics, news, and research information on the Web, consult ClickZ]
Maybe you found a Web site that seems to have information you want. How
can you know if the Web site is reliable? You can probably trust sites
from established organizations, but what about Web sites of organizations you
know nothing about?
Web Site Home Page and Site Index
When evaluating a Web site, go to its "home page," or opening page of the site, and look for these key pieces of information:
Web Site Evaluation Criteria
Beyond the home page and site index, consider the following criteria when evaluating a Web site:
Web Site Reviews
Internet Scout Project
Produces "The Scout Report," which announces and reviews
Web sites and mailing lists.
"Table of Contents" for current Report.
"Archives" contain approximately 17,000 reports.
At the "Archives" page, also note the ability to browse for reviews by Library of Congress Subject Headings [eg, "Law - United States - cases"].
"Archives" are searchable; "Advanced Search" also available.
Web Searching - General Issues
The Web offers a variety of good search services. Many provide "value-added" services, such as customized display of search results. However, it is important to remember that economic forces are quite active on the Web, and search services do not reach all Web-based information. [Web search services index an estimated 20-50% of the Web. See Invisible Web as well as page 53 of The Invisible Web: Uncovering Information Sources Search Engines Can't See, available from the Cleveland-Marshall College of Law Library.]
Web Search Service Trends
Over the last decade, several trends have emerged in Web search service operations:
Web Search Service Problems
Despite their wonderful capability of finding information, there can be problems with the information from Web search services. Web Search Directories cover less of the Web than Engines, but tend to have fewer problems because human beings compile verified information about Web sites. Web Search Engines cover more of the Web than Directories, but they electronically compile information that may or may not be verified. All Web search services can be subject to the following problems:
Web Search Service Evaluation
Search Engine Showdown
Search service review Web site produced by Greg R. Notess.
Provides reviews, analyses, and comparative information.
Search Engine Watch
Search service review Web site created by Danny Sullivan,
who continues to edit the site for Jupitermedia Corporation.
Provides statistics, reviews, and comparative testing.
Includes free daily "SearchDay" and monthly "Search Engine
Report" newsletters.
In addition to consulting the above review sites, consider the following criteria when deciding whether a Web search service meets your needs:
Web Searching Principles and Guidelines
Key Principles
General Guidelines
Narrow search - will have fewer items retrieved, but they will likely have high relevance.
Broad search - will have many items retrieved, but most will likely have low relevance.
Use upper case letters.
Check help information for default connector (often defaults to "and").
Often use + in front of word for "and" (ie, includes term).
Often use - in front of word for "not" (ie, excludes term).
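The + and - conventions above can be illustrated with a short Python sketch. It is not the syntax or behavior of any particular search service: the function names `parse_query` and `matches` are hypothetical, un-prefixed terms are assumed to default to "and," and matching is a simple case-insensitive substring test.

```python
import shlex


def parse_query(query):
    """Split a query into included and excluded terms.

    Quoted phrases stay together. A leading - excludes a term; a leading +
    (or no prefix, assuming the default connector is "and") includes it.
    """
    included, excluded = [], []
    for token in shlex.split(query):
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        else:
            term = token.lstrip("+").lower()
            if term:
                included.append(term)
    return included, excluded


def matches(text, query):
    """Return True if text has every included term and no excluded term."""
    included, excluded = parse_query(query)
    text = text.lower()
    if any(term in text for term in excluded):
        return False
    return all(term in text for term in included)


print(parse_query('"legal research" +ohio -michigan'))
print(matches("Ohio legal research guide", '"legal research" +ohio'))
```

Each added term narrows the search: a page must satisfy every included term and avoid every excluded one, which is why a narrow search retrieves fewer items.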
Web Search Directories are created by human beings who identify Web sites and list them in a subject classification.
You can browse or search a Web Search Directory.
Usually the main page of a Web site, rather than multiple pages of that site, is listed in search results.
Annotations and evaluations of Web sites are often included in search results.
Open Directory Project
Developed and maintained by 30,000 volunteer editors.
Netscape owns the copyright to ODP's compilations, but freely
grants license to them.
Thus, ODP is used by many search engines as their subject
directories.
Check "Law" sub-category under "Society" category.
Yahoo!
Founded by David Filo and Jerry Yang in 1994; now a corporation
based in Sunnyvale, CA.
Charges "commercial" site annual fee to be listed.
Recently acquired search engines "All the Web" and "AltaVista"
and started "Yahoo!Search." [See Web Search Engines ]
Check "Government" category.
WWW Virtual Library
First search directory founded by Tim Berners-Lee, creator
of HTML and the Web.
Maintained by volunteers.
Check "Law" category.
To establish its information bank, a Search Engine's search software sends out "robots" (or "spiders" or "crawlers") that use HTTP to request data from Gopher, FTP, and HTTP servers. [HTTP - HyperText Transfer Protocol; FTP - File Transfer Protocol.] Data is indexed and stored; both the main page and additional pages of a site are indexed.
Note: When you use a Search Engine, you are only searching information indexed and stored by that Engine.
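A minimal Python sketch may make the note above concrete. It assumes a toy "inverted index" that maps each word to the pages containing it. The page URLs, the sample text, and the function names `build_index` and `search` are all invented for illustration, and real engines are far more elaborate.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages that the robot has already fetched.
crawled = {
    "http://example.org/a": "Ohio legal research guide",
    "http://example.org/b": "Legal research in federal courts",
}
index = build_index(crawled)

print(search(index, "ohio legal"))
print(search(index, "kentucky"))
```

A search consults only the index. A page the robot never fetched cannot appear in the results, however relevant it is.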
Google and Yahoo!Search are two leading search engines. Both have the following features:
Google
Also searches Usenet newsgroups.
Supports over 100 interface languages.
Does not support truncation, but automatically searches for
variant forms of word.
Can use * (ie, asterisk) as a "wildcard" within a phrase
search.
Supports ~ (ie, tilde) in front of word to search for its
synonyms.
Can search within search results.
Divides search results into "Web," "Images," "Groups," "News,"
and "Froogle" (ie, products) categories.
"ohio legal research"
About 130 Web pages will be retrieved.
"legal research" +ohio
About 135,000 Web pages will be retrieved.
"legal research" AND ohio
About 134,000 Web pages will be retrieved.
Yahoo!Search
Also searches Yahoo! directory, as well as other Yahoo! portal
databases (eg, Yahoo!News).
Supports search nesting within parentheses.
Can use stop word (eg, "a") as a "wildcard" within a phrase
search.
Also translates text or Web page from English to Chinese,
Dutch, Greek, Japanese, Korean, and Russian; Chinese, Dutch, Greek, Japanese,
Korean, and Russian to English; Dutch to French; as well as French to Dutch,
Greek, Italian, Portuguese, and Spanish.
Divides search results into "Web," "Images," "Directory,"
"Yellow Pages," "News," and "Products" categories.
"ohio legal research"
About 60 Web pages will be retrieved.
"legal research" +ohio
About 111,000 Web pages will be retrieved.
"legal research" AND ohio
About 113,000 Web pages will be retrieved.
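The sample searches above retrieve far fewer pages for the full phrase than for the phrase combined with a separate term. The Python sketch below suggests why, using three invented page titles. The helper names are hypothetical, and this is not how either engine actually matches pages.

```python
import re


def words(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_terms(text, query):
    """True if every query word appears somewhere in the text."""
    page = set(words(text))
    return all(term in page for term in words(query))


def matches_phrase(text, phrase):
    """True if the phrase's words appear in the text, adjacent and in order."""
    page, target = words(text), words(phrase)
    n = len(target)
    return n > 0 and any(
        page[i:i + n] == target for i in range(len(page) - n + 1)
    )


pages = [
    "Ohio legal research guide",
    "Legal research resources for Ohio attorneys",
    "Research on legal aid in Ohio",
]

# "ohio legal research" as one phrase
phrase_hits = [p for p in pages if matches_phrase(p, "ohio legal research")]

# "legal research" AND ohio
and_hits = [
    p for p in pages
    if matches_phrase(p, "legal research") and matches_all_terms(p, "ohio")
]

print(len(phrase_hits), len(and_hits))
```

The phrase search requires all three words side by side, so only the first title qualifies. The combined search lets "ohio" appear anywhere on the page, so the second title qualifies as well.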
A Web Metasearch Engine sends your search statement to several search services, receives the results, deletes duplicates, and displays the results in a single list. This type of Web search service can save aggravation and time, because you only need to learn one interface and do not need to run the same search on multiple search services. In addition, you can often select which services your search will go to. However, since a Metasearch Engine sends a search statement to several search services, and search services have different search methods, a complex search statement may not run effectively.
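The merge-and-deduplicate step described above can be sketched in Python as follows. The result lists and the functions `normalize` and `merge_results` are hypothetical. The URL clean-up is deliberately crude (it lower-cases the whole address, which a real service would not do for paths), and interleaving by rank is only one of several possible ways to order the combined list.

```python
def normalize(url):
    """Reduce trivial URL differences so duplicates can be spotted."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(result_lists):
    """Merge ranked lists from several services into one list, dropping duplicates.

    Results are interleaved by rank, so each service's top hit comes first.
    """
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical results returned by two search services for the same query.
engine_a = ["http://www.example.org/ohio", "http://example.org/research"]
engine_b = ["http://example.org/ohio/", "http://example.net/law"]

merged = merge_results([engine_a, engine_b])
print(merged)
```

Here the two services return the same page under slightly different addresses, so the merged list holds three entries rather than four.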
Query Server and Vivisimo are two leading metasearch engines. Both have the following features:
Query Server
Opt to search 8 Web search engines, 9 Media sites, 11 Health
sites, or 12 US federal government sites.
After selecting Web, Media, Health, or US Federal Government
search group, can click on "Customize" below search box to select particular
engines/sites within group of search engines/sites.
"Customize" also enables you to select whether your search
results "cluster" by content, site, or both content and site.
Can use * (ie, asterisk) to truncate search words.
"ohio legal research"
About 60 Web pages, in 8 clusters, will
be retrieved. [About 30 Web pages identified as duplicates.]
"legal research" NEAR ohio
About 80 Web pages, in 6 clusters, will
be retrieved. [About 30 Web pages identified as duplicates.]
"legal research" AND ohio
About 80 Web pages, in 6 clusters, will be retrieved.
[About 30 Web pages identified as duplicates.]
Vivisimo
Over 12 search engines, Open Directory, Yahoo, as well as
numerous media, government, and business sites.
Use "advanced search" to run search on selected search engines,
news, etc.
"Clusters" (ie, arranges by subject) results and ranks clusters.
Cluster classification not predetermined (as in Query Server); this allows for
maximum classification flexibility.
Supports field searching.
"ohio legal research"
About 60 Web pages, in 15 clusters (some
with sub-clusters), will be retrieved.
"legal research" NEAR ohio
About 160 Web pages, in 23 clusters (some
with sub-clusters), will be retrieved.
"legal research" AND ohio
About 150 Web pages, in 31 clusters (some with
sub-clusters), will be retrieved.
The Web includes a lot of "Dynamic Content." This "Invisible Web" or "Deep Web" is information transmitted via the Web, rather than stored on the Web in "static" form, and thus is not available for indexing by search engines. [Robots Exclusion Protocol - Prevents a search engine's robot crawler from accessing/indexing portions of a Web site.]
The Invisible Web includes information in databases, password restricted information,
text within graphics, etc., and is estimated to be 400,000 Web sites.
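The Robots Exclusion Protocol mentioned above works through a "robots.txt" file that a site publishes for crawlers to read. The Python sketch below uses the standard library's `urllib.robotparser` module to show how a well-behaved robot might check such a file. The file contents, the robot name "ExampleBot," and the example.org addresses are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that closes one directory to all robots.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

public_ok = parser.can_fetch("ExampleBot", "http://example.org/index.html")
private_ok = parser.can_fetch("ExampleBot", "http://example.org/private/report.html")

print(public_ok, private_ok)
```

Pages under the disallowed directory are never requested by a compliant robot, so they never reach the search engine's index.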
If you cannot find information on a topic via a Web Search Directory or Search Engine, try an Invisible Web search service. You may be led to a fee-based Web site, but at least you'll find out if any information is available on your topic.
CompletePlanet
Over 70,000 searchable databases and search services.
Can browse sites via its 34 top-level subject "browse tree"
(eg, "Government"). ["Browse tree" extends to five sub-levels.]
Can search within "browse tree" subject directory.
"Advanced" searching available.
Librarians' Index to the Internet
Searchable annotated subject directory of over 14,000 Internet
resources; librarians select resources based on their "usefulness to users of
public libraries."
Can browse subject directory (eg, "Government & Law").
"Advanced Search" available.
Web-Based Articles
Deep Web Research
Marcus P. Zillman
Law Library Resource Xchange
2/23/03
Evaluating the Quality of Information on the Internet
The Virtual Chase, created 9/14/01, revised 9/16/04
Books
Links are provided for items held at the Cleveland-Marshall College of Law Library.
The Extreme Searcher's Guide To Web Search Engines: A Handbook For the Serious Searcher
Randolph Hock, foreword by Reva Basch. - 2nd ed. - Medford, NJ: CyberAge Books,
c2001.
[Available in Cleveland-Marshall College of Law Library - Reference ZA4226 .H63
2001]
Government Information On The Internet
Peggy Garvin, ed. - 6th ed. - Lanham, MD: Bernan, a division of Kraus Organization
Ltd., c2003.
[Available in Cleveland-Marshall College of Law Library - Reference ZA5075 .G68]
Internet Blue Pages: The Guide To Federal Government Web Sites
Laurie Andriot, comp. - Medford, NJ: Information Today, Inc., c2000, 2001-2002.
[Available in Cleveland-Marshall College of Law Library - Reference ZA5075 .A53
2001-2002]
Internet Power Searching: the Advanced Manual / Phil Bradley. - 2nd ed. - New York, NY: Neal-Schuman Publishers, c2002.
The Invisible Web: Uncovering Information Sources Search Engines Can't See
Chris Sherman and Gary Price. - Medford, NJ: CyberAge Books, c2001.
[Available in Cleveland-Marshall College of Law Library - ZA4450 .S54 2001]
The Librarian's Internet Survival Guide: Strategies For the High-Tech Reference Desk
Irene E. McDermott; edited by Barbara Quint. - Medford, NJ: Information Today,
Inc., c2002.
[Available in Cleveland-Marshall College of Law Library - Reference ZA4201 .M36
2002]
Search Engines For The World Wide Web / Alfred and Emily Glossbrenner. - 3rd ed. - Berkeley, CA: Peachpit Press, c2001.
Toward a Cyberlegal Culture / Mirela Roznovschi. - 2nd ed. - Ardsley, NY: Transnational Publishers, c2002.
The United States Government Internet Manual / Peggy Garvin, ed. - Lanham, MD: Bernan Press, c2004.
[Available in Cleveland-Marshall College of Law Library - Reference ZA5075 .G68
2003-2004]
Web of Deception: Misinformation on the Internet / Anne P. Mintz, ed. - Medford, NJ: CyberAge Books, c2002.
[Available in Cleveland-Marshall College of Law Library - Reference ZA4201 .W43
2002]
Annuals and Periodicals
The Internet Lawyer. - Baltimore, MD: Daily Record Company. Monthly.
[Available from Cleveland-Marshall College of Law Library; current issue on
Reserve.]
Internet Law & Strategy. - Philadelphia, PA: Law Journal Newsletters, a division of American Lawyer Media. Monthly.
[Available from Cleveland-Marshall College of Law Library; current issue on
Reserve.]
The Legal List: Research On The Internet
St. Paul, MN: West Group, Inc. Annual.
[Available from Cleveland-Marshall College of Law Library - KF242 .A1 L375;
current edition in Reference]