Relevance Search

Abstract: This text describes yawk's relevance search expressions.

yawk uses a relevance search for document search. Usually you don't need to know anything about search expressions: just enter the words (or word fragments) you want to find and let yawk do the search. That's all.

yawk's relevance search is essentially a non-indexed full-text search over all wiki files. For the search expression "relevance + search", searching works as follows:

  1. Read the (unformatted) wikifile.

    1. Count the occurrences of each of the search words "relevance" and "search",

    2. apply the arithmetic expression to them (in this example: add the numbers),

    3. divide the final result by the file's size, and

    4. store that value as file relevance score.

  2. Repeat the above for each file.

  3. Sort all documents with a non-zero relevance and compute each file's relevance as a percentage. The file with the maximum relevance defines 100%.

  4. Output the resulting list.
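
The steps above can be sketched in a few lines of Python. This is only an illustration of the described procedure, not yawk's actual code: the function names, the in-memory `files` dictionary, the case-sensitive substring counting and the use of the UTF-8 byte length as file size are all assumptions made for the example.

```python
def relevance_scores(files, words, combine=sum):
    """Score each file: combine the word counts, then divide by the file size.

    files   -- dict mapping a file name to its unformatted text (hypothetical input)
    words   -- the search words, e.g. ["relevance", "search"]
    combine -- stands in for the arithmetic expression; sum models "a + b"
    """
    scores = {}
    for name, text in files.items():
        size = len(text.encode("utf-8"))  # assumed: file size in bytes
        if size == 0:
            continue  # an empty file cannot be divided by its size
        # Assumed: case-sensitive, non-overlapping substring counts,
        # so word fragments match as well.
        counts = [text.count(word) for word in words]
        score = combine(counts) / size
        if score != 0:
            scores[name] = score  # keep only non-zero relevance
    return scores


def ranked_percentages(scores):
    """Sort by relevance; the file with the maximum relevance defines 100%."""
    if not scores:
        return []
    top = max(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, round(100 * score / top)) for name, score in ranked]


files = {
    "a.txt": "relevance search search",            # 3 hits in 23 bytes
    "b.txt": "nothing to see here at all",         # no hits, dropped
    "c.txt": "a search engine, roughly speaking",  # 1 hit in 33 bytes
}
result = ranked_percentages(relevance_scores(files, ["relevance", "search"]))
print(result)
```

Dividing by the file size means that a short file with a few hits can outrank a long file with more hits, which is why "a.txt" leads the list here.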

Search expressions

Search expressions are made up of operands (search words) and operators. The following operators are defined, listed in order of increasing precedence:

Table 1 - Search Expression Operators
Operator                      Description
||                            logical or
&&                            logical and
!= > < >= <= >> << >>= <<=    comparison operators
+ -                           plus, minus
* / , ,,                      multiplication, division and "product+sum"
+ - !                         unary plus, minus and not
() numbers words strings      expressions can be grouped with parentheses; numbers, words and strings are literals

Due to the nature of relevance search, not all operators behave as usual.

Literals

The following literals are recognized:

File related literals

The only file-related literal is %size, which can be used like any other literal. %size returns the file's size in bytes.

Note that when a file literal appears as the first word of a search expression, the expression must contain at least one blank to prevent it from being interpreted as yawk's file search.

Default operator

Whenever two consecutive literals appear in the search expression, a double-comma operator is inserted between them.
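
As an illustration of this rule, the following Python sketch splits an expression into tokens and inserts the default operator wherever two literals are adjacent. The token pattern and the helper names `is_literal` and `insert_default_operator` are assumptions made for this example; yawk's real parser may tokenize differently.

```python
import re

# Assumed token shapes (not taken from yawk's source): quoted strings,
# %-prefixed file literals, numbers/words, and the operators of Table 1.
TOKEN = re.compile(r'''
    "[^"]*"                                           # quoted string
  | %\w+                                              # file literal, e.g. %size
  | \w+                                               # number or word
  | >>=|<<=|>>|<<|>=|<=|!=|&&|\|\||,,|[-+*/,!<>()]    # operators, longest first
''', re.VERBOSE)


def is_literal(token):
    """A token is a literal if it is a string, a file literal, a number or a word."""
    first = token[0]
    return first in '"%' or first == "_" or first.isalnum()


def insert_default_operator(expression, default=",,"):
    """Return the token list with `default` placed between adjacent literals."""
    result = []
    for token in TOKEN.findall(expression):
        if result and is_literal(result[-1]) and is_literal(token):
            result.append(default)
        result.append(token)
    return result


print(insert_default_operator("relevance search"))
print(insert_default_operator("wiki >= 5"))
```

Expressions that already have an operator between their literals, such as "wiki >= 5", pass through unchanged.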

Sample expressions

The following table gives some sample expressions:

Table 2 - Sample Search Expressions
Search term           Description
relevance * search    lists all documents that contain the words "relevance" and "search".
relevance + search    lists the documents that contain either word, or both.
relevance , search    same as above, but documents containing both words usually get a higher ranking.
relevance search      exactly as above.
wiki >= 5             lists all documents that contain "wiki" at least five times.
wiki >>= 5            same as above but with document ranking.
%size >> 1000         lists all files with more than 1000 characters.
%size > 0             lists all files.

Since relevance search uses the comma operator as its default, you can usually just enter the words you're looking for. The comma operator tries to mimic common search engines: it lists all documents containing at least one of the search words but gives a higher ranking to those containing all of them.
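
The difference between the three ways of combining two word counts can be made concrete with a small Python sketch. It assumes that "product+sum" literally means a*b + (a + b); that reading fits the behaviour described above but is a guess, not something confirmed by yawk's source. The function name and the sample counts are made up for the example.

```python
def product_plus_sum(a, b):
    # Assumed reading of "product+sum": the product of both counts plus their sum.
    return a * b + (a + b)


# Hypothetical occurrence counts (first word, second word) for three documents.
documents = {
    "both words": (2, 3),
    "first word only": (5, 0),
    "neither word": (0, 0),
}

for name, (a, b) in documents.items():
    print(f"{name:16}  a*b={a * b:2}  a+b={a + b:2}  a,b={product_plus_sum(a, b):2}")
```

With multiplication, the document containing only one word scores zero and drops out. With addition, it ties with the document containing both words. Under the assumed product+sum reading, it is still listed but ranks below the document that contains both.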

< dag | at | awk-scripting.de >