Web Searching
When using research methodologies, the WorldWideWeb may be viewed as one electronic
information medium to be considered with print, audiovisual, and other
electronic information media. In addition, as is the case with all
information media, one must understand how to effectively find valid and
reliable information on the Web. This "Web Searching" guide will
help you to be a better Web researcher, as well as help you to better
evaluate Web sites you find in your research.
Web Site Evaluation
Web Searching - General Issues
Web Searching Principles and Guidelines
Web Search Directories
Web Search Engines
Web Metasearch Engines
Invisible Web
Additional Resources
Web Site Evaluation
A WorldWideWeb "site" is composed of many "pages." Current size estimates
of the indexable Web put it at 11.5 billion pages, and the Invisible
Web may be as large as 500 billion pages. [For statistics,
news, and research information on the Web, consult ClickZ.] With all these
Web pages, how can you find the
information you want? In addition, even if you find a Web site that
appears to have your desired information, how can you know if the Web
site is reliable? You can probably trust sites from established
organizations, but what about Web sites of organizations you know nothing
about?
Web Site Home Page and Site Index
When evaluating a Web site, go to its "home page," or opening page of the site,
and look for these key pieces of information:
- Purpose
- Scope of Services and Information
- Navigation Methods
- Help Information
If the home page doesn't address this information, look for a link to the
"site index" or "site map." As its name implies, the site index
is like the index of a book. Examining the site index should help
you decide if the Web site has the information you want.
Web Site Evaluation Criteria
Beyond the home page and site index, consider the following criteria when evaluating
a Web site:
- Credibility / History - who created the site; are they a reliable authority; who are their affiliates, partners, or sponsors?
- Purpose / Philosophy - why was the site created?
- Audience / Relevance - who was the site designed for?
- Content - what are the site's topics or subjects?
- Scope / Context / Coverage - how much information on a topic does the site cover; does the site report its size and growth rate?
- Selection Criteria / Critical Thinking / Objectivity / Censorship - how is information selected for the site?
- Accuracy / Documentation - does the site document its sources?
- Currency / Updating - how often is the site's information updated and what is the scope of that updating?
- Writing Quality - can you cognitively understand the site's information?
- Design / Presentation Format - can you visually or aurally understand the site's information?
- Stability / Continuity / Maintenance - can you consistently connect to the site?
- Accessibility - is the site accessible via different browsers; does the site have "text only" or "no frames" versions?
- Interface - can you understand the site's graphical and textual controls; how quickly do these controls operate?
- Navigation / Site Index / Searchability - can you understand how to find information on the site?
- Connectivity - does the site link to other helpful sites?
- Help Information / Frequently Asked Questions / Customer Support
- Usefulness / Value-to-Cost Ratio - how does the site compare to other analogous sites; if it's a fee-based site, is its information worth the cost?
Web Site Reviews
Internet Scout Project
Produces The
Scout Report, which announces and reviews Web sites and mailing
lists. The searchable Scout Report Archives
contain approximately 23,000 reports, and include the ability to browse
for reviews by Library of Congress Subject Headings (eg, "Law - United
States - cases").
Web Searching - General Issues
The Web offers a variety of good search services. Many provide "value-added"
services, such as customized display of search results. However,
it is important to remember that economic forces are quite active on the
Web. In addition, search services do not reach all Web-based information.
[See the Invisible Web section of this guide.]
Web Search Service Trends
Over the last decade, several trends have emerged in Web search service operations:
- Partnerships - Partnering with organizations, such as "amazon.com," to offer easy material purchase and other customer services.
- Advertisements - Using pop-up ads to cover maintenance expenses or make additional income.
- Fee Services - Offering fee-based companion services, such as a full-text database.
- Charges for preferential listings - Selling high positions in search results lists. [On 27 June 2002, responding to a complaint filed by Commercial Alert, the US Federal Trade Commission sent a letter to the leading search engine companies warning them to adequately disclose and distinguish paid placements from unpaid ones. See Commercial Alert Complaint Letter.]
- Customized search form and results display - Offering capability of personalizing how you search and see search results; "cookies" (ie, information stored on your computer by a Web site) make this possible.
Web Search Service Problems
Despite their wonderful capability of finding information, Web search services can
return problematic information. Web Search Directories cover less of the Web than
Engines, but tend to have fewer problems because human beings compile
verified information about Web sites. Web Search Engines cover more of the Web than Directories,
but they electronically compile information that may or may not be verified.
All Web search services can be subject to the following problems:
- Inadequate updating and relocating leads to retrieval of many invalid and duplicate URLs. [The URL - uniform resource locator - is the "address" of a Web site or page.]
- Because each search service covers different parts of the Web, as well as indexes overlapping parts in different ways, searchers need to use multiple search services to be really thorough.
- "Popular," rather than the best, Web sites tend to get indexed more. [A "popular" site is one that other sites link to; the more sites that link to a site, the more "popular" that site is.]
- It can take months for a new site to get indexed.
- Information in frames or image maps is often not indexed.
- Text in "alt" attributes describing graphics is indexed, but not the words in the graphics themselves.
- Information in the "Invisible Web" (eg, password sites) is inaccessible. [More on this in the Invisible Web section of this guide.]
- Partnership constraints may affect information provided.
- Annoying, distracting advertisements.
- Slow, congested traffic! [WorldWideWait]
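The idea of "popularity" described in the list above (the more sites that link to a site, the more "popular" it is) can be sketched in a few lines of Python. This is only an illustration: the site names and link data are hypothetical, and real search services use far more elaborate ranking methods.

```python
from collections import Counter

# Hypothetical link data: each site maps to the sites it links to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "d.example": ["c.example", "b.example"],
}

def inbound_link_counts(link_map):
    """Count how many distinct sites link to each target site."""
    counts = Counter()
    for source, targets in link_map.items():
        for target in set(targets):   # count each linking site once per target
            if target != source:      # ignore links from a site to itself
                counts[target] += 1
    return counts

popularity = inbound_link_counts(links)
# Most-linked sites first; ties are broken alphabetically.
ranked = sorted(popularity, key=lambda site: (-popularity[site], site))
print(ranked)
```

Note that a site nobody links to never appears in the ranking at all, which mirrors the problem that new or obscure sites may go unindexed.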
Web Search Service Evaluation
Search Engine Watch
Search service review Web site created by Danny Sullivan, who continues
to edit the site for Incisive Media. Provides statistics, reviews,
and comparative testing. Includes free daily SearchDay and monthly Search Engine Report
newsletters, as well as the Search Engine Blog and podcasts.
In addition to consulting the above review site, consider the following criteria
when deciding whether a Web search service meets your needs:
- Selection criteria or human involvement in indexing, reviewing, and screening information.
- "Refresh" frequency - how often is the directory information updated; how often does the engine robot "recrawl" to refresh information?
- Interface - can you understand the search service's graphical and textual controls; how quickly do these controls operate; is there an advanced search capability?
- Search Tips / Help Information.
- Boolean Logic Connectors (eg, "and," "or," "not") - can you use these to construct a search statement?
- Proximity Searching (eg, "near," "adj") - this connector is especially desired for full-text searching.
- Phrase Searching.
- Case Sensitivity - does the service distinguish upper and lower case?
- Punctuation Sensitivity - does the service understand punctuation?
- Truncation Capability - can the service retrieve varying forms of a word (eg, where "child*" retrieves "child," "children," etc.)?
- Field Searching - can you limit a search to a particular portion of a Web site?
- "Stop Words" (eg, to, be) - does the service skip certain common words?
- Results Display - does the service limit search results in any way; how does the service rank search results?
- Cost / Fee.
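Three of the criteria above (truncation, case sensitivity, and stop words) can be illustrated with a short Python sketch. The stop word list and the sample sentence are assumptions made for the example; each search service has its own rules, so check its help information.

```python
import re

# Illustrative stop word list only; not taken from any real search service.
STOP_WORDS = {"to", "be", "the", "a", "of"}

def truncation_to_regex(term):
    """Turn a term such as 'child*' into a case-insensitive whole-word pattern."""
    stem = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(r"\b" + stem + suffix + r"\b", re.IGNORECASE)

def find_matches(term, text):
    """Return the distinct lower-cased words in text that match the term."""
    pattern = truncation_to_regex(term)
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})

def drop_stop_words(query):
    """Remove stop words from a query, as many services do silently."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]

text = "Child welfare law protects children; a childless couple may adopt."
print(find_matches("child*", text))   # truncated search
print(find_matches("child", text))    # exact word only
print(drop_stop_words("To be or not to be"))
```

The truncated search also picks up "childless," a reminder that truncation broadens a search and can retrieve unintended word forms.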
Web Searching Principles and Guidelines
Key Principles
- If your first thirty hits are not on point, change your search statement and/or use a different search service.
- Evaluate retrieved sites based on criteria outlined in the Web Site Evaluation section of this guide.
- Bookmark appropriate and valuable Web sites.
General Guidelines
- Write down what you are seeking - combine keywords into a search statement - before going online.
- Consider browsing a Search Directory before using a Search Engine.
- Check the search service's tips and help information.
- Keep the search simple - complex searches often retrieve much irrelevant information, because the searcher could not effectively combine keywords.
- Use several synonyms of keywords.
- Start specific; only use general terms if necessary.
  - Narrow search - will have fewer items retrieved, but they will likely have high relevance.
  - Broad search - will have many items retrieved, but most will likely have low relevance.
- Enter the most important concept first.
- Use phrases - phrases are usually enclosed in quotation marks.
- Boolean Connectors -
  - Use UPPER CASE letters.
  - Check help information for the default connector (often defaults to "and").
  - Often use + in front of a word for "and" (ie, includes term).
  - Often use - in front of a word for "not" (ie, excludes term).
- Proximity Connector (eg, "near," "adj") - this connector is especially desired for full-text searching.
- The truncation symbol is usually an asterisk ( * ).
- Use parentheses to combine keywords, similarly to data combinations in algebra or deductive logic.
- If available, use "field" searching - Web site title and URL are particularly helpful.
- If available, use "limit/refine" capability.
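Several of the guidelines above (phrases in quotation marks, + and - prefixes, upper-case connectors, parentheses, and the asterisk for truncation) can be combined into one search statement. The Python sketch below assembles such a statement; the function name and its parameters are hypothetical, and not every search service accepts every piece of this syntax.

```python
def build_query(phrases=(), required=(), any_of=(), excluded=()):
    """Assemble a search statement from common conventions:
    quoted phrases, + for required terms, an upper-case OR group
    in parentheses, and - for excluded terms."""
    parts = []
    parts += ['"%s"' % phrase for phrase in phrases]
    parts += ["+" + term for term in required]
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    parts += ["-" + term for term in excluded]
    return " ".join(parts)

query = build_query(
    phrases=["legal research"],
    required=["ohio"],
    any_of=["statute*", "code"],
    excluded=["michigan"],
)
print(query)
```

Writing the statement down in this form before going online makes it easier to adjust one piece at a time if the first results are not on point.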
Web Search Directories
Web Search Directories are created by human beings who identify Web sites
and list them according to a subject classification. You can browse
or search a Web Search Directory. When searching within a Directory,
usually the main page of a Web site, rather than multiple pages of that
site, is listed in your search results. This feature helps to reduce duplicates
in search results. Because Directories are indexed by human beings,
their search results often include annotations and evaluations of Web sites,
which is particularly helpful.
Open Directory Project
Developed and maintained by 74,000 volunteer editors.
Netscape owns the copyright to ODP's compilations, but freely grants license
to them. Thus, ODP is used by many search engines as their subject
directories.
Note the Law sub-category
under the Society category.
Yahoo!Directory
Note the Law
sub-category under the Government
category.
The original Yahoo! was founded by David Filo and Jerry Yang in 1994; now a corporation based in Sunnyvale, CA.
Yahoo! charges "commercial" sites annual fees to be listed.
Yahoo! started Yahoo!Search in 2004 after acquiring the "All the Web" and "AltaVista" search engines.
[See the Web Search Engines section of this guide.]
WWW Virtual Library
The first search directory founded by Tim Berners-Lee, creator of HTML
and the Web.
Maintained by volunteers.
Note the Law category.
Web Search Engines
To establish its information bank, a Search Engine's search software sends
out "robots" (or "spiders" or "crawlers") that use HTTP to request data
from Gopher, FTP, and HTTP servers. [HTTP - HyperText Transfer
Protocol; FTP - File Transfer Protocol.] Data is indexed and stored;
the main page and additional pages of a site are indexed.
Note: When you use a Search Engine, you are only searching information indexed
and stored by that Engine.
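The note above can be made concrete with a toy index. In this minimal Python sketch, the URLs and page text are hypothetical stand-ins for what a robot might have fetched, and the index simply maps each word to the pages that contain it. A search looks only in that index, never on the live Web.

```python
from collections import defaultdict

# Hypothetical pages standing in for what a crawler has already fetched.
pages = {
    "http://example.org/": "Ohio legal research guide",
    "http://example.org/cases": "Ohio cases and court opinions",
    "http://example.net/": "Legal research tips for students",
}

def build_index(docs):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every query word (an implicit 'and')."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(pages)
print(sorted(search(index, "Ohio legal")))
print(sorted(search(index, "montana")))   # never crawled, so nothing is found
```

A page that was never crawled, or a word that appears only on such a page, cannot be retrieved no matter how the search statement is phrased.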
Google and Yahoo!Search are two leading search engines. Both have the following features:
- Regular and "advanced" searching
- Searches Web pages as well as image, PDF, and other file types (eg, Rich Text Format, PowerPoint).
- When using Boolean operators, defaults to "and"; also supports Boolean operators "or" and "not"
- "+" for "must include" and "-" for "exclude"
- Phrase searching
- Not case sensitive
- Field searching
- File format (eg, pdf) searching
- Supports 3-month, 6-month, and year date restrictions
- Can search in at least 35 languages
- Translates text or Web page from at least English to French, German, Italian, Portuguese, and Spanish; these five languages into English; as well as German to French and French to German
- "Mature" or "adult" filter option
See additional descriptive
information on these two search engines below. Also included
are results for sample searches seeking information on "Ohio legal research."
Google
Also searches Usenet newsgroups.
Supports over 100 interface languages.
Does not support truncation, but automatically searches for variant forms
of a word.
Can use * (ie, asterisk) as a "wildcard" within a phrase search.
Supports ~ (ie, tilde) in front of word to search for its synonyms.
Can search within search results.
Divides search results into "Web," "Images," "Groups," "News," and "Froogle"
(ie, products) categories.
Includes "Google Scholar" feature that searches articles, books, theses,
and other scholarly materials.
"ohio legal research"
About 13,200 Web pages will be retrieved.
ohio "legal research"
About 1,070,000 Web pages will be retrieved. [Google assumes the "and"
Boolean connector between words unless otherwise specified.]
"legal research" ohio
About 1,080,000 Web pages will be retrieved. [Google assumes the
"and" Boolean connector between words unless otherwise specified.]
Yahoo!Search
Also searches Yahoo! directory, as well as other Yahoo! portal databases
(eg, Yahoo!News).
Supports search nesting within parentheses.
Can use stop word (eg, "a") as a "wildcard" within a phrase search.
Also translates text or Web page from English to Chinese, Dutch, Greek,
Japanese, Korean, and Russian; Chinese, Dutch, Greek, Japanese, Korean,
and Russian to English; Dutch to French; as well as French to Dutch, Greek,
Italian, Portuguese, and Spanish.
Divides search results into "Web," "Images," "Video," "Audio,"
"Directory," "Local [ie, Businesses]," "News," and "Shopping" categories.
"ohio legal research"
About 22,500 Web pages will be retrieved.
ohio "legal research"
About 991,000 Web pages will be retrieved.
"legal research" ohio
About 998,000 Web pages will be retrieved.
Web Metasearch Engines
A Web Metasearch Engine sends your search statement to several search services,
receives results, deletes duplicates, and displays results in a single list.
This type of Web search service can save aggravation and time, because
you only need to know one interface to search and don't need to search
multiple search services. In addition, you can often select which
services your search will go to. However, since a Metasearch Engine
sends a search statement to several search services, and search services
have different search methods, a complex search statement may not run
effectively.
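The metasearch process described above (send one search statement to several services, collect the results, delete duplicates, and show a single list) can be sketched as follows. The two "engines" here are hypothetical functions returning canned results, and the duplicate check is a simplifying assumption: it compares host and path only and ignores the URL scheme.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Build a comparison key: lower-case host plus path without a trailing
    slash, plus any query string. The scheme and fragment are ignored."""
    parts = urlsplit(url)
    key = parts.netloc.lower() + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def metasearch(query, services):
    """Send one query to every service and merge results, dropping duplicates."""
    seen = set()
    merged = []
    for name, search_fn in services.items():
        for url in search_fn(query):
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append((url, name))   # remember which service found it
    return merged

# Hypothetical stand-ins for real search services.
def engine_a(query):
    return ["http://example.org/ohio/", "http://example.com/law"]

def engine_b(query):
    return ["http://EXAMPLE.org/ohio", "http://example.net/research"]

results = metasearch('ohio "legal research"', {"A": engine_a, "B": engine_b})
for url, source in results:
    print(source, url)
```

Because the same statement goes unchanged to every service, any operator that one service does not understand may be ignored or misread there, which is why complex statements may not run effectively.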
Dogpile and Vivisimo are two leading metasearch engines. Both have the following features:
- Boolean operators "and," "or," and "not" (Vivisimo uses "-" for "not")
- Proximity searching - "near" [Note: No current search engines support "near" and this type of searching may no longer work in the metasearch engines.]
- If a search engine doesn't support an operator (eg, "and," "near"), the metasearch engine uses the next general operator
- Phrase searching
- Not case sensitive
- Search nesting within parentheses
See additional descriptive
information on these two metasearch engines below. Also included
are results for sample searches seeking information on "Ohio legal research."
Dogpile
Allows one to search at least 6 Web search engines/directories (ie, About,
AskJeeves, Google, LookSmart, MIVA, MSN, and Yahoo!Search), as well as
several audio, video, image, and news services.
Divides search results into "Web," "Images," "Audio,"
"Video," "News," "Yellow Pages," and "White Pages" categories.
Can
view search results highlighted by the search service that retrieved them.
Also
provides some links to categories related to your search; uses Vivisimo
technology (see Vivisimo section
of this guide).
Note that Dogpile's default filter setting is "moderate."
"ohio legal research"
About 90 Web pages will be retrieved.
ohio NEAR "legal research"
About 110 Web pages will be retrieved.
ohio AND "legal research"
About 60 Web pages will be retrieved.
Vivisimo
Allows one to search 8 Web search engines/directories (ie, BBC, GigaBlast,
LII, LookSmart, Lycos, MSN, Open Directory, and WiseNut) as well as numerous
news, business, and government sites.
Presents
highly ranked search results first, with an option to continue browsing
additional search results.
Vivisimo's hallmark feature is that search results are arranged in ranked
"Clusters" (ie, subject areas). This cluster classification is not predetermined
(as in Query Server), allowing for maximum classification flexibility.
Note that Vivisimo supports field searching.
"ohio legal research"
About 170 Web pages, in 24 clusters (some with sub-clusters), will be
retrieved. An additional 33,950 Web pages will also be retrieved.
ohio NEAR "legal research"
About 130 Web pages, in 31 clusters (some with sub-clusters), will be
retrieved. An additional 28,260 Web pages will also be retrieved.
ohio AND "legal research"
About 130 Web pages, in 22 clusters (some with sub-clusters), will be
retrieved. An additional 216,860 Web pages will also be retrieved.
Invisible Web
The Web includes a lot of "Dynamic Content." This "Invisible Web" or
"Deep Web" is information transmitted via the Web, rather than stored
on the Web in "static" form, and thus is not available for indexing by
search engines. [In addition, a site can use the Robots Exclusion Protocol to ask
a search engine's robot crawler not to access or index portions of the
Web site.] The Invisible Web includes information in databases,
password restricted information, text within graphics, etc., and is estimated
to be 400,000 Web sites. Web search services only index an estimated
20% - 50% of the Web.
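The Robots Exclusion Protocol mentioned above works through a plain-text file, conventionally named robots.txt, that a site publishes for crawlers to read. The Python sketch below uses the standard library's urllib.robotparser module to check a few paths against a sample file. The file contents, the site address, and the crawler name "ExampleBot" are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt file; real sites publish theirs at /robots.txt.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask whether a crawler named "ExampleBot" may fetch each path.
paths = ["/index.html", "/private/report.html", "/search/results"]
allowed = {
    path: parser.can_fetch("ExampleBot", "http://example.org" + path)
    for path in paths
}
for path, ok in allowed.items():
    print(path, "may be crawled" if ok else "is excluded")
```

Pages under an excluded path never reach a compliant search engine's index, so they are part of the Invisible Web even though anyone with the address can open them in a browser.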
If
you cannot find information on a topic via a Web Search Directory or Search
Engine, try an Invisible Web search service. You may be led to a
fee-based Web site, but at least you'll find out if any information is
available on your topic.
CompletePlanet
Covers over 70,000 searchable databases and search services.
Browse resources via 34 top-level subject "browse tree" (eg, Government).
The "Browse tree" extends to five sub-levels.
Can also search within "browse tree" subject directory, and "Advanced"
searching is available.
Librarians' Internet Index
Searchable annotated subject directory of over 14,000 Internet resources;
librarians select the resources based on their "usefulness to users of
public libraries."
Browse resources within subject directory (eg, Government,
with sub-category Law).
Can also search within directory, and "Advanced Search" is available.
Additional
Resources
Web-Based
Tutorials
Guide
to Effective Searching of the Internet
Available
on the BrightPlanet Web site.
The
Pandia Goalgetter: a Short and Easy Internet Search Tutorial
Reviews
Web directories, search engines, metasearch engines, as well as search
principles and guidelines.
Web-Based Articles
Deep Web Research 2006 / Marcus P. Zillman. Law Library
Resource Xchange, 1/15/06.
Evaluating the Quality of Information
on the Internet / The Virtual Chase, created 9/14/01, revised
9/16/04.
Deep Web Research
/ Marcus P. Zillman. Law Library Resource Xchange, 2/23/03.
Books
Ambient
Findability / Peter Morville. O'Reilly, c2005.
[Electronic
resource available from Cleveland State University - QA76.9 .D26 M673
2005eb.]
The
Extreme Searcher's Guide To Web Search Engines: A Handbook For the Serious
Searcher / Randolph Hock, foreword by Reva Basch. CyberAge
Books, c2001.
[Available from Cleveland-Marshall College of Law Library - Reference
ZA4226 .H63 2001]
Google
and Other Search Engines / Diane Poremsky. Peachpit Press,
c2004.
[Available
from Cleveland State University Library - TK5105.884 .P67 2004]
Internet
Blue Pages: The Guide To Federal Government Web Sites / Laurie
Andriot, comp. Information Today, Inc., c2000, 2001-2002.
[Available from Cleveland-Marshall College of Law Library - Reference
ZA5075 .A53]
Internet
Power Searching: the Advanced Manual / Phil Bradley. Neal-Schuman
Publishers, c2002.
[Available
from Cleveland State University Library - ZA4201 .B69 2002]
The
Invisible Web: Uncovering Information Sources Search Engines Can't See
/ Chris Sherman and Gary Price. CyberAge Books, c2001.
[Available from Cleveland-Marshall College of Law Library - ZA4450 .S54
2001]
IssueWeb:
a Guide and Sourcebook for Researching Controversial Issues on the Web
/ Karen R. Diaz and Nancy O'Hanlon. Libraries Unlimited, c2004.
[Available
from Cleveland State University Library - Curr Mats ZA4228 .D53 2004]
The Librarian's
Internet Survival Guide: Strategies For the High-Tech Reference Desk
/ Irene E. McDermott; edited by Barbara Quint. Information Today,
Inc., c2002.
[Available from Cleveland-Marshall College of Law Library - Reference
ZA4201 .M36 2002]
The
Professional's Guide To Mining the Internet: Information Gathering and
Research on the Net / Brian Clegg. Kogan Page; Stylus Pub.,
c2001.
[Available
from Cleveland State University Library - ZA4230 .C56 2001]
Search
Engine Visibility / Shari Thurow. New Riders, c2003.
[Electronic
resource available from Cleveland State University - ZA4201 .T48 2003eb.]
Search
Engines For The World Wide Web / Alfred and Emily Glossbrenner.
Peachpit Press, c2001.
[Electronic
resource available from Cleveland State University - ZA4230 .G57 2001eb.]
Sorting
Out the Web: Approaches To Subject Access / Candy Schwartz.
Ablex Pub., c2001.
[Available
from Cleveland-Marshall College of Law Library - ZA4232 .S39 2001]
Toward
a Cyberlegal Culture / Mirela Roznovschi. Transnational
Publishers, c2002.
[Available
from Cleveland-Marshall College of Law Library - K87 .R69 2002]
Using
the Internet as a Reference Tool: a How-To-Do-It Manual for Librarians
/ Michael P. Sauers, with contributions by Denice Adkins. Neal-Schuman
Publishers, c2001.
[Available
from Cleveland State University Library - Z711.45 .S28 2001]
Web
of Deception: Misinformation on the Internet / Anne P. Mintz,
ed. CyberAge Books, c2002.
[Available from Cleveland-Marshall College of Law Library - Reference
ZA4201 .W43 2002]
Web
Research: Selecting, Evaluating, and Citing / Marie L. Radford,
Susan B. Barnes, and Linda R. Barr. Allyn and Bacon, c2002.
[Available
from Cleveland State University Library - ZA4201 .R33 2002]
The
World At Your Fingertips: Learning Research and Internet Skills
/ Heidi Kay and Karen DelVecchio. UpstartBooks, c2002.
[Electronic
resource available from Cleveland State University.]
Annuals
and Periodicals
The
Internet Lawyer
Baltimore,
MD: Daily Record Company. Monthly.
[Available from Cleveland-Marshall College of Law Library; current issue
on Reserve - KF242 .A1 I57 & Electronic]
Internet
Law & Strategy
Philadelphia,
PA: Law Journal Newsletters, a division of American Lawyer Media. Monthly.
[Available from Cleveland-Marshall College of Law Library; current issue
on Reserve - KF390.5 .C6 I559]
The Legal
List: Research On The Internet
St. Paul, MN: West Group, Inc. Annual.
[Available from Cleveland-Marshall College of Law Library - KF242 .A1
L375; current edition in Reference]
The
United States Government Internet Manual / Peggy Garvin, ed.
Bernan Press, c2004.
[Available
from Cleveland-Marshall College of Law Library - ZA5075 .G68; current
edition in Reference]
Laura
E. Ray, MA, MLS
Educational Programming Librarian
March 2007