Metric Navigation

This advanced feature of the agent search control panel is accessed via the Nav Settings button on the Agent Search panel. In many cases it is not needed, but it becomes valuable when you want to prune more carefully where the search will lead. For example, if a certain metric count you are watching is greater than another metric count, you may wish to treat the current page as True (or not) for the purpose of gathering more links from it. The panel provides a set of radio-button settings for pruning search navigation using metrics. It also shows a diagram describing how the search path, as it follows links from page to page, is pruned by the Nav and Link test controls in conjunction with the Boolean test evaluation.

This section covers the details of the patent-pending Surf3D link capture rules, their order of evaluation, and the Nav operations.

Operations #1

All links on the target web page are captured into a holding buffer. As they are being 'captured', the Link Text test is performed on each link separately.

If the Link Text test is not enabled (the check box is not checked), no test is performed and all links are entered into the buffer.

In the Link Text test, each term is evaluated in turn (left to right) to discover whether there is a match within both the URL text itself AND the display text associated with the URL.

Each term may optionally be preceded by the ! character (the symbol for exclusion).

If a term has spaces embedded in it, it must be enclosed in quotation marks (for example, "find me").

Case sensitivity is controlled by the "Case Sensitive" check box.
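The term syntax above (whitespace-separated terms, quoted phrases, an optional leading !) can be sketched in Python. This is a minimal sketch that assumes Python's `shlex.split` is a close enough model of the quoting rule; the `Term` class and `parse_terms` function are hypothetical names, not part of Surf3D.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str      # the literal text to search for
    negated: bool  # True if the term was prefixed with !


def parse_terms(line: str) -> list[Term]:
    """Split a term line on whitespace, keeping double-quoted phrases whole,
    and strip an optional leading ! exclusion marker from each term."""
    terms = []
    for token in shlex.split(line):  # '"find me"' -> 'find me'
        negated = token.startswith("!")
        text = token[1:] if negated else token
        if text:  # ignore a bare "!" with nothing after it
            terms.append(Term(text, negated))
    return terms


print(parse_terms('news !ads "find me"'))
```

Note that `shlex` also treats single quotes and backslashes specially, so a term such as `don't` would need different handling than this sketch provides.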

The effect of a match depends on whether the term carries the exclusion character !:

If it is not present --> any match of the term ends the test with success (i.e. the link is saved rather than thrown out).

If the ! is present --> any match of the term ends the test with failure (i.e. the link is thrown out).

In both cases, if the term does not produce its result (success or failure, respectively), the test moves on to the next term, going left to right. If all terms have been tested without a result, the test is considered a failure and the link is thrown out.

Link Text Test NOTE: If the Link Text check box is checked, ONLY links that pass this test will survive!
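The Link Text rules can be summarized as "the first matching term decides." The sketch below is a minimal illustration, not Surf3D's implementation; `link_text_test` is a hypothetical name, matching is plain substring search, and it assumes a term counts as a match if it appears in either the URL or the display text (both are searched).

```python
import shlex


def link_text_test(url: str, display_text: str, term_line: str,
                   case_sensitive: bool = False, enabled: bool = True) -> bool:
    """Return True if the link is saved, False if it is thrown out."""
    if not enabled:
        return True  # check box off: no test, every link is kept
    # Assumption: a term matches if it occurs in the URL or the display text.
    haystacks = [url, display_text]
    if not case_sensitive:
        haystacks = [h.lower() for h in haystacks]
    for token in shlex.split(term_line):  # left to right
        excluded = token.startswith("!")
        term = token[1:] if excluded else token
        if not term:
            continue  # ignore a bare "!"
        if not case_sensitive:
            term = term.lower()
        if any(term in h for h in haystacks):
            return not excluded  # plain match -> save, !match -> throw out
    return False  # no term produced a result: link is thrown out


print(link_text_test("http://example.com/ads/1", "Buy now", "!ads news"))
```

Read literally, the rule means a term line made up only of ! terms throws out every link, because only a plain term can end the test with success.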

Operations #2

The entire web page is then tested for a Boolean True or False result, based on the AND and OR terms and the NAV settings. ONLY if the NAV check box is enabled AND the result is False are ALL of the links in the holding buffer thrown out, ending the process for this particular web page.

Conversely, if NAV is checked and the Boolean result is True, they are all saved. NAV has several modes that affect the result; various combinations of AND OR, metric values, and a user input X are available for selection. If NAV is not checked, all links in the holding buffer are retained.
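Operation #2 acts on the holding buffer as a whole rather than link by link. A minimal sketch, assuming the page's True/False result has already been computed; `apply_nav_gate` is a hypothetical name, and the NAV modes that combine AND OR, metric values, and X are not modelled because their rules are not given here.

```python
def apply_nav_gate(holding_buffer: list[str], nav_enabled: bool,
                   page_is_true: bool) -> list[str]:
    """Keep or discard every link captured from one page."""
    if nav_enabled and not page_is_true:
        return []  # NAV on and page False: all links thrown out, page done
    return list(holding_buffer)  # NAV off, or page True: all links retained


print(apply_nav_gate(["http://a.example/1", "http://a.example/2"], True, False))
```

An empty result means the process ends early for this page, so none of the later tests run.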

Boolean Test:

Terms for both AND and OR may be:

Preceded with a ! to signify NOT.

Preceded with a @ to signify "skip tag contents" (see below).

If a term has spaces embedded in it, it must be enclosed in quotation marks (for example, "find me").

If the AND term line is empty (blank), only the OR terms are considered.

If the OR term line is empty (blank), it is ignored and only AND terms are considered.

If both AND and OR terms are empty, the result is always True. Case sensitivity is set by the check box "Case Sensitive".

The AND result and the OR result are logically OR’d to produce the final result. The AND line must contain at least two terms, or be empty. Example: AND = one two three, OR = four five.

A page that has "three", "two", and "one" on it is TRUE.

A page that has "four" on it is TRUE.

A page that has "five" on it is TRUE.

The ! (NOT) prefix reverses the result of the term it is attached to. Example: AND = one !two three, OR = !five.

A page that has "three" and "one" and does not contain "two" is TRUE.

A page that does not contain "five" is TRUE.

A page that has "three", "one", and "five" and does not contain "two" is TRUE.
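The AND/OR evaluation and the two worked examples above can be checked with a short sketch. It assumes plain substring matching against the page text (so "one" would also match "someone") and assumes a one-term AND line is rejected; how Surf3D actually handles that case is not stated. `boolean_test` and `_term_true` are hypothetical names.

```python
import shlex


def _term_true(token: str, page: str, case_sensitive: bool) -> bool:
    """A term is true if its text appears on the page; a leading ! inverts it."""
    negated = token.startswith("!")
    text = token[1:] if negated else token
    if not case_sensitive:
        text, page = text.lower(), page.lower()
    found = text in page
    return not found if negated else found


def boolean_test(page: str, and_line: str, or_line: str,
                 case_sensitive: bool = False) -> bool:
    """Evaluate the Boolean test for one page's text."""
    and_terms = shlex.split(and_line)
    or_terms = shlex.split(or_line)
    if not and_terms and not or_terms:
        return True  # both lines blank: always True
    if len(and_terms) == 1:
        # Assumption: reject a one-term AND line outright.
        raise ValueError("AND line must have at least 2 terms or be empty")
    # A blank AND line contributes False, so only the OR terms matter.
    and_result = bool(and_terms) and all(
        _term_true(t, page, case_sensitive) for t in and_terms)
    # A blank OR line contributes False, so only the AND terms matter.
    or_result = any(_term_true(t, page, case_sensitive) for t in or_terms)
    return and_result or or_result  # the two groups are OR'd together


print(boolean_test("two five", "one !two three", "!five"))
```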

Normally, when text matching is performed on the web page, only the 'text' portion of the page is examined for matches; anything contained inside HTML tags is excluded from testing. If a term is preceded by the @ character, the entire page (tags and all) is searched for matches.
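The difference between a plain term and an @ term comes down to which version of the page is searched. This sketch is a simplification, not Surf3D's parser: it treats anything between < and > as a tag (a real HTML parser would also handle comments, scripts, and entities), and it does not model combining @ with ! because the order of the two prefixes is not specified. `term_found` is a hypothetical name.

```python
import re

# Simplification: anything between < and > is treated as a tag.
_TAG_RE = re.compile(r"<[^>]*>")


def term_found(term: str, html: str, case_sensitive: bool = False) -> bool:
    """Search the visible text only, unless the term starts with @."""
    if term.startswith("@"):
        term, haystack = term[1:], html  # @term: tags and all
    else:
        haystack = _TAG_RE.sub(" ", html)  # plain term: text portion only
    if not case_sensitive:
        term, haystack = term.lower(), haystack.lower()
    return term in haystack


page = '<a href="video.mp4">Watch</a>'
print(term_found("mp4", page), term_found("@mp4", page))
```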

Operations #3 Exclusion Test

Any links remaining in the buffer are then tested for "Exclusion" ONLY if the check box is checked. If it isn't checked, this test is skipped.

Each term is evaluated to see if a match exists within the text of the URL itself. If any match is found, the link is thrown out. If no match is found, the link is saved. The test is case sensitive.
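The Exclusion test looks only at the URL and is case sensitive. A minimal sketch, assuming the terms have already been split into a list; `exclusion_test` is a hypothetical name.

```python
def exclusion_test(url: str, terms: list[str], enabled: bool = True) -> bool:
    """Return True if the link is saved (case-sensitive, URL text only)."""
    if not enabled:
        return True  # check box off: test skipped, link kept
    return not any(term in url for term in terms)  # any match throws it out


print(exclusion_test("http://x.example/ads/1", ["ads", "track"]))
```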

Operations #4 Inclusion Test

All of the links remaining in the holding buffer are then tested for "Inclusion" ONLY if the check box is checked. If it isn't checked, this test is skipped.

Inclusion Test: Each term is evaluated to see if a match exists within the text of the URL itself. If a match is found, the link is saved. If no match is found, the link is thrown out. The test is case sensitive.

Inclusion Test NOTE: If the Inclusion check box is checked, ONLY links that pass this test will survive!
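The Inclusion test is the mirror image of the Exclusion test: a URL match is required to survive. A minimal sketch, assuming pre-split terms; `inclusion_test` is a hypothetical name, and the example filters a small holding buffer.

```python
def inclusion_test(url: str, terms: list[str], enabled: bool = True) -> bool:
    """Return True if the link is saved (case-sensitive, URL text only)."""
    if not enabled:
        return True  # check box off: test skipped, link kept
    return any(term in url for term in terms)  # no match throws it out


buffer = ["http://x.example/docs/a", "http://x.example/blog/b",
          "http://x.example/docs/c"]
kept = [url for url in buffer if inclusion_test(url, ["/docs/"])]
print(kept)
```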

Operations #5

Any links remaining in the buffer are then scrutinized for various other things, such as duplication and invalid types. The media file links are then separated out into their own holding buffer. Surviving links are then declared 'good' and saved in the internal database. The TOTAL indicator is adjusted accordingly.
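The final clean-up step can be sketched as a single pass over the buffer. Several details here are assumptions, not documented Surf3D behavior: the media extension list, treating only http/https as valid, and recording media links in the same `known` set that stands in for the internal database (so its size plays the role of TOTAL). `finalize_links`, `MEDIA_EXTENSIONS`, and `VALID_SCHEMES` are hypothetical names.

```python
from urllib.parse import urlsplit

# Hypothetical: the help text does not list which file types count as media.
MEDIA_EXTENSIONS = (".mp3", ".mp4", ".wav", ".avi", ".jpg", ".png", ".gif")
# Hypothetical: which link types count as "valid" is also an assumption.
VALID_SCHEMES = ("http", "https")


def finalize_links(buffer: list[str],
                   known: set[str]) -> tuple[list[str], list[str]]:
    """Drop duplicates and invalid links, then split out media links.

    `known` stands in for the internal database and is updated in place.
    """
    page_links, media_links = [], []
    for url in buffer:
        parts = urlsplit(url)
        if parts.scheme not in VALID_SCHEMES or url in known:
            continue  # invalid type or duplicate: discard
        known.add(url)
        if parts.path.lower().endswith(MEDIA_EXTENSIONS):
            media_links.append(url)  # media goes to its own buffer
        else:
            page_links.append(url)
    return page_links, media_links


known = {"http://a.example/old"}
pages, media = finalize_links(
    ["http://a.example/old", "http://a.example/new", "http://a.example/new",
     "mailto:someone@a.example", "http://a.example/clip.MP4"], known)
print(pages, media, len(known))
```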

This represents the end of the 'link' processing cycle for any given web page (unless it ended early in step #2).

Copyright © 2001-2008 Navagent, Inc.