Searching For Information On The Net

URL: http://ls.berkeley.edu/lscr/support/faq/tips

The internet is a wonderful source of information. Locating and researching useful information, though, has become time-consuming and a lot of work. Search engines were created to help you and me sift through the vast collections of data dispersed throughout the world on the internet. Here is a limited guide, condensed from information on the internet and applicable to PCs and Macintosh computers, to help you with your web searches.

If you are new to searching, be sure to look up the "help" and "search tips" associated with the particular search engine(s) you are going to use. They are each a little different. When you find one you like, get to know it really well.

Some popular search engines, listed in no particular order, are:

1. Infoseek; 2. HotBot; 3. AltaVista; 4. Excite!; 5. WebCrawler; 6. Yahoo!; 7. Lycos; 8. Disinfo; 9. Snap!; 10. Open Text

The Infoseek search engine made its debut in 1994. Infoseek's indexes are generated by a web crawler. Infoseek supports standard Boolean operators and nested logic. Proximity searching is permitted through the entry of bound phrases in quotes. Proper names are indicated to the engine through capitalization of terms. Advanced search features allow terms to be required or excluded.

HotBot was developed by Inktomi Corp., formerly part of the Network of Workstations (NOW) Project at the University of California, Berkeley. A web crawler visits sites and indexes their content for the search engine. The web crawler uses artificial intelligence features to record geographic information, URLs and domain names, file names and file types, and page features such as JavaScript, VRML and embeds. This information is accessible through HotBot's search control panel. HotBot's help pages provide a good basic understanding of many advanced Net searching features.

The AltaVista search engine was developed by Digital Equipment Corp. and is updated daily. Database entries are gathered by a web crawler. AltaVista claims the largest Web index, with over 31 million entries to WWW pages on over 620,000 servers worldwide. It also indexes Usenet newsgroups daily and maintains them for several weeks.

AltaVista permits the use of proximity operators as well as Boolean operators. Term truncation and nesting are permitted. AltaVista also allows the user to designate terms on which to sort search output.
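
As a rough illustration of what a proximity operator does, the Python sketch below checks whether two words occur within a given number of words of each other. The function name near and the default window of 10 words are assumptions made for this example; they are not taken from any engine's documentation.

```python
import re

def near(text, word_a, word_b, window=10):
    """Return True if word_a and word_b occur within `window` words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(a - b) <= window for a in pos_a for b in pos_b)

# Hypothetical page text used only for this example.
page = "The Louvre is the best known museum in Paris"
print(near(page, "louvre", "museum"))           # the two words are 5 positions apart
print(near(page, "louvre", "paris", window=3))  # the two words are 7 positions apart
```

A smaller window makes the match stricter, which is why proximity searching is useful for narrowing results.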

The Excite! search engine was developed by Excite Inc. The search engine handles entered phrases and finds the closest matches using fuzzy logic. This makes Excite! particularly easy for novices to use, since it partially compensates for poorly formed queries. Excite! allows the user to find sites similar to any on the output listing, using pattern matching techniques, with its "More Like This" option. Search results can be sorted by site, which makes reviewing results easier.

The Excite! search engine supports standard Boolean operators, nesting and proximity searching. Advanced search options allow searchers to require or disallow terms using + and - prefixes.

WebCrawler was begun in 1994 at the Department of Computer Science and Engineering at the University of Washington. It was the very first full text search engine available on the Net. WebCrawler indexes are built both by user submissions and by a web crawler program. Boolean operators, nested logic, proximity operators and bound phrases are all supported in search queries. A formless search option is also available.

Yahoo!'s general search engine is powered by the OpenText search engine. Search indexes are built primarily by user submissions and supplemented by a web crawler. Yahoo! presents a highly structured, hierarchical subject directory to "thousands" of WWW sites. Yahoo!'s directory is an outgrowth of the early attempts at categorizing information found on the Internet at Stanford University. Searches can be performed on the Yahoo! directory or the whole Internet, including Usenet newsgroups and e-mail addresses.

Yahoo!'s search engine (Yahoo! also uses the Inktomi search engine) supports standard Boolean operators, pipes and nested logic. Proximity searches can be done on bound phrases. Prefixes allow search terms to be specifically required or excluded from the result.

The Lycos search engine was originally developed at Carnegie Mellon University but is now independent. Lycos also provides topical guides accessible from its search form pages. These are produced and updated by an editorial team which also provides reviews of the Top 5% Sites. Lycos provides two directory services: A2Z and Point. The distinction between these services is not clear. The engine does not permit the use of Boolean operators or nested search logic and has no mechanism for direct proximity searching using bound phrases or proximity operators.

Some general search tips:

In the search box, type in what you want to find. You will undoubtedly get many, many hits for your search, so narrow down your results by refining your search parameters. As you home in on your target, the results, or hits, will be more relevant to what you are looking for, if you know what that is.

Capitalize names, and put double quotation marks (") around groups of words, or hyphens (-) between words, that should appear together and be treated as a single name or title. (Note: double quotes make words case-sensitive.) E.g. Rock Hudson will search on the movie star, whereas rock hudson will search on rocks, rock climbing, rock music, Henry Hudson, the Hudson River and the movie star. Of course, if you are interested in rocks and the Hudson River, you could use a comma to separate a list of names, e.g. rock, hudson.
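
The difference between searching on loose words and searching on a bound phrase can be sketched in a few lines of Python. This is only an illustration: the function names and the three sample pages are made up for the example, and the sketch ignores case throughout, unlike the engines described above.

```python
import re

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())

def matches_any_word(page, query):
    """Loose search: the page contains at least one of the query words."""
    page_words = set(tokenize(page))
    return any(word in page_words for word in tokenize(query))

def matches_phrase(page, phrase):
    """Bound phrase: the words appear next to each other, in order."""
    words, target = tokenize(page), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Hypothetical pages used only for this example.
pages = [
    "Rock Hudson starred in Giant",
    "Rock climbing along the Hudson River",
    "A history of rock music",
]
loose = [p for p in pages if matches_any_word(p, "rock hudson")]
exact = [p for p in pages if matches_phrase(p, "rock hudson")]
print(len(loose), "loose hits;", len(exact), "phrase hits")
```

The loose search returns every page that mentions either word, while the phrase search keeps only the page about the movie star.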

Plus sign (+) should be put in front of a word that must appear in your results. This increases the precision of your search. e.g. city guides +San Francisco.
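
Here is a minimal Python sketch of how required (+) and excluded (-) prefixes might filter a set of pages. The search function and the sample pages are hypothetical, and the sketch only handles single-word terms; real engines differ in the details.

```python
def search(pages, query):
    """Keep pages that contain every +term and none of the -terms."""
    terms = query.lower().split()
    required = {t[1:] for t in terms if t.startswith("+")}
    excluded = {t[1:] for t in terms if t.startswith("-")}
    optional = {t for t in terms if not t.startswith(("+", "-"))}
    hits = []
    for page in pages:
        words = set(page.lower().split())
        has_required = required <= words
        has_excluded = bool(excluded & words)
        # With no +terms, at least one ordinary term must match.
        has_some_term = bool(required) or bool(optional & words)
        if has_required and not has_excluded and has_some_term:
            hits.append(page)
    return hits

# Hypothetical pages used only for this example.
pages = [
    "city guides for san francisco",
    "city guides for new york",
    "san francisco weather",
]
print(search(pages, "city guides +francisco"))
print(search(pages, "city guides +francisco -weather"))
```

The first query drops the New York page because the required word is missing; the second also drops the weather page because it contains an excluded word.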

Pipes (|) can be used to really narrow down a search, if supported. Pipes search for one word, and then, within that set of results, for another. e.g. dogs | Komondor (a breed of dog).
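
The idea of searching within a previous set of results can be sketched in Python as follows. The pipe_search function and the sample pages are assumptions made for illustration, not a description of how any particular engine implements pipes.

```python
def pipe_search(pages, query):
    """Apply each pipe-separated term in turn to the previous results."""
    results = pages
    for term in query.split("|"):
        term = term.strip().lower()
        results = [p for p in results if term in p.lower().split()]
    return results

# Hypothetical pages used only for this example.
pages = [
    "dogs of hungary include the komondor",
    "dogs make good pets",
    "the komondor coat",
]
print(pipe_search(pages, "dogs | komondor"))
```

Only the first page survives both steps: the second page is dropped by the second term, and the third page never made it into the first set of results.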

A Natural Language Query is a search phrased as an ordinary question. The user types in a question such as: Where is New York City? or, Who is Bill Clinton?

Advanced searching (Infoseek) can use field searches, which restrict a search to certain portions of Web documents by using a specific field syntax. You are able to search Web pages' titles, URLs, and embedded hypertext links. Here's how. The field name must be lowercase, followed immediately by a colon and then immediately by the search terms. No spaces! e.g.

link:netscape.com finds other sites pointing to Netscape.
url:science finds pages with the word science in the page's URL.
title:"The New York Times" finds pages with the phrase The New York Times in the title portion of the document.
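
To show how a field:term query might be interpreted, here is a small Python sketch. The field_search function, the page records and the example.com / example.org addresses are all hypothetical; only the three field names (link, url, title) come from the examples above.

```python
def field_search(pages, query):
    """Return the pages whose named field contains the search term."""
    field, _, term = query.partition(":")
    term = term.strip('"').lower()
    if field == "title":
        return [p for p in pages if term in p["title"].lower()]
    if field == "url":
        return [p for p in pages if term in p["url"].lower()]
    if field == "link":
        return [p for p in pages if any(term in link.lower() for link in p["links"])]
    raise ValueError("unknown field: " + field)

# Hypothetical page records; a real engine would build these with its crawler.
pages = [
    {"title": "The New York Times on the Web",
     "url": "http://news.example.com/",
     "links": []},
    {"title": "Browser downloads",
     "url": "http://www.example.org/science/browsers.html",
     "links": ["http://home.netscape.com/"]},
]

for query in ['link:netscape.com', 'url:science', 'title:"The New York Times"']:
    print(query, "->", [p["title"] for p in field_search(pages, query)])
```

Because the field name is split off at the first colon, a space before or after the colon would change the field name or the term, which is one way to see why the syntax allows no spaces.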

Boolean logic is used to construct logical search statements using the logical operators AND, OR, and NOT. Frogs AND toads would search for frogs and toads together and exclude results that did not include both frogs and toads. Useful for narrowing, focusing and coordinating a search.

Frogs OR toads would search for frogs or toads independently. It would also include sites about both frogs and toads together. Useful for broadening and expanding a search.

Frogs NOT toads would search for frogs only. Any reference to toads or frogs and toads will be excluded. Useful for narrowing down and placing limits on a search.
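
The three Boolean operators correspond closely to set operations, as the Python sketch below illustrates. The page names and their text are made up for the example, and the tiny index built here is only a stand-in for the much larger indexes that real search engines maintain.

```python
from collections import defaultdict

# Hypothetical pages used only for this example.
pages = {
    "pond.html": "frogs and toads share the pond",
    "frogs.html": "tree frogs of the rainforest",
    "toads.html": "common toads in the garden",
}

# Build a simple index: each word maps to the set of pages containing it.
index = defaultdict(set)
for name, text in pages.items():
    for word in text.split():
        index[word].add(name)

frogs, toads = index["frogs"], index["toads"]
both = frogs & toads         # Frogs AND toads
either = frogs | toads       # Frogs OR toads
only_frogs = frogs - toads   # Frogs NOT toads

print("AND:", sorted(both))
print("OR: ", sorted(either))
print("NOT:", sorted(only_frogs))
```

AND gives the smallest set of results and OR the largest, which matches the advice above about narrowing and broadening a search.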

It is possible to get very involved with Boolean statements, so one must be careful with Boolean constructions. Can you imagine what Paris AND Louvre AND museum would turn up? Try it with Infoseek, Yahoo! and AltaVista.



Search Engines and Features Compared

Search Engine  Simple Form  Boolean  Proximity  Nesting  Fuzzy Logic  Term Weighting  Sorted Output  Ranked Output  Find Like
AltaVista      Y            Y        Y          Y        -            Y               Y              Y              -
Excite!        Y            Y        Y          Y        Y            Y               Y              Y              Y
HotBot         -            Y        Y          Y        Y            Y               -              Y              -
Infoseek       Y            Y        Y          Y        Y            Y               Y              Y              Y
Lycos          Y            -        -          -        -            Y               Y              Y              -
WebCrawler     Y            Y        Y          Y        Y            Y               Y              Y              -
Yahoo!         Y            Y        Y          Y        -            Y               -              Y              -
(Y = feature supported; - = not indicated)

Simple Form - a simple interface containing a single entry box and/or up to two simple lists or check boxes.
Boolean - supports standard Boolean logic.
Proximity - supports some form of word proximity searching, exclusive of the ability to search on phrases.
Nesting - permits nested logic in search queries.
Truncation - allows search terms to be simplified by shortening them (entering only the start of a word).
Fuzzy Logic - a sort of AI used to interpret a search query, either to perform a search or to rank the output.
Term Weighting - the use of a prefix to weight (+) or exclude (-) search terms.
Sorted Output - output sorted by the search engine or the user.
Ranked Output - output is ranked by relevance.
Find Like - the search engine displays a link on the results that allows finding similar sites.
Web crawler - a program that visits Web sites and indexes their content for a search engine.
JavaScript - Netscape's cross-platform, object-based scripting language for client and server applications.
VRML - Virtual Reality Modeling Language or Virtual Reality Markup Language; a language used to create the illusion of three-dimensional objects for onscreen virtual reality environments.
Bound Phrases - a group of words or a phrase bound by quotation marks.
Advanced Net - the option to use more detailed search engine features.

See earlier Tip Of The Week for additional information.

NOTE! Search engines and their features are constantly changing. Check "Search Tips" and "Help" for current features. Good Luck!