Keywords therefore have a special status in IR and as part of the FOA
process. Not only must they be exhaustive enough to capture the entire
topical scope reflected by the corpora's domain of discourse, but they
must also be expressive enough to characterize any information needs the
users might have.
Of course we need not restrict our users to only one of
these keywords. It seems quite natural for queries to be composed of two
or three, perhaps even dozens, of keywords. Recent empirical evidence
suggests that many typical queries have only two or three keywords (cf.
Section 8.1), but even this number
provides a great combinatorial extension to the basic vocabulary of
single keywords. Other applications, such as using a document itself
as a query (that is, as an example: "Give me more like this"),
can generate queries with hundreds of keywords. Regardless of size,
queries defined only as sets of keywords will be called SIMPLE
QUERIES. Many Web search engines support only simple queries. Often,
however, search engines also provide more advanced interfaces
that include OPERATORS in the query language. Perhaps
because you have previously been warped by exposure to computer
science :), you may be thinking that sets of keywords might be especially
useful if joined by Boolean operators. For example, if we have one set
of documents about NEURAL NETWORKS and another set of
documents about SPEECH RECOGNITION, we can expect the query:

    NEURAL NETWORKS AND SPEECH RECOGNITION

to correspond to the intersection of these two sets, while

    NEURAL NETWORKS OR SPEECH RECOGNITION

would correspond to their union.
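
As a minimal sketch of this correspondence, assume each keyword is indexed by the set of documents it describes. The document identifiers, the INDEX table, and the function names below are hypothetical and chosen only for illustration; AND then becomes set intersection and OR becomes set union.

```python
# Toy index: keyword -> set of document ids (all values are made up).
INDEX = {
    "NEURAL NETWORKS": {1, 2, 3, 5},
    "SPEECH RECOGNITION": {3, 4, 5, 8},
}


def docs_about(keyword):
    """Return the set of documents indexed by a keyword (empty if unknown)."""
    return INDEX.get(keyword, set())


def query_and(left, right):
    """AND: documents about both keywords (set intersection)."""
    return docs_about(left) & docs_about(right)


def query_or(left, right):
    """OR: documents about either keyword (set union)."""
    return docs_about(left) | docs_about(right)


if __name__ == "__main__":
    print(sorted(query_and("NEURAL NETWORKS", "SPEECH RECOGNITION")))  # [3, 5]
    print(sorted(query_or("NEURAL NETWORKS", "SPEECH RECOGNITION")))   # [1, 2, 3, 4, 5, 8]
```

Note that the AND result can never be larger than either keyword's own set, while the OR result can never be smaller than either one.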
The Boolean NOT operator is a bit more of a
problem. If users say they want things that are not about
NEURAL NETWORKS, they are in fact referring to the vast majority
of the corpus. That is, NOT is more appropriately considered a
binary subtraction operator, one that removes the documents matching
one keyword from those matching another. To make this distinction
explicit we will call it BUT_NOT.
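
The difference between the two readings of negation can be sketched with ordinary set operations. The corpus size, the document identifiers, and the function names here are assumptions made up for the example: a unary NOT is a complement against the whole corpus, while the binary form is a set difference between two retrieved sets.

```python
# Hypothetical corpus of 1000 documents, identified by the integers 1..1000.
CORPUS = set(range(1, 1001))

# Made-up retrieval sets for two keywords.
NEURAL_NETWORKS = {1, 2, 3, 5}
SPEECH_RECOGNITION = {3, 4, 5, 8}


def unary_not(corpus, docs):
    """Unary NOT: everything in the corpus that is not in docs."""
    return corpus - docs


def but_not(left, right):
    """Binary subtraction: documents in left that are not in right."""
    return left - right


if __name__ == "__main__":
    # The unary reading returns nearly the whole corpus.
    print(len(unary_not(CORPUS, NEURAL_NETWORKS)))               # 996
    # The binary reading stays within the first keyword's documents.
    print(sorted(but_not(SPEECH_RECOGNITION, NEURAL_NETWORKS)))  # [4, 8]
```

Because the binary form always starts from a set the user has asked for, its result is bounded by that set rather than by the size of the corpus.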
There are other syntactic operators that are
often included in a search engine's query language, but these will be
put off until later. Even with these simple Boolean connectives and a
keyword vocabulary of reasonable size, users are capable of constructing
a vast number of potential queries in attempting to express their
information need.
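
To get a rough feel for how quickly the space of queries grows, the following sketch counts queries over a vocabulary of a given size. The vocabulary size of 10,000 and both function names are hypothetical, and the counting rules are a simplification: a simple query is treated as a non-empty set of distinct keywords, and a Boolean query as exactly two distinct keywords joined by a single operator.

```python
from math import comb


def count_simple_queries(vocab_size, max_keywords):
    """Count non-empty keyword sets containing at most max_keywords keywords."""
    return sum(comb(vocab_size, k) for k in range(1, max_keywords + 1))


def count_two_keyword_boolean_queries(vocab_size):
    """Count queries of the form 'A op B' with distinct keywords A and B.

    AND and OR are symmetric, so each unordered pair counts once per
    operator; the subtraction operator is not symmetric, so each ordered
    pair counts once.
    """
    unordered_pairs = comb(vocab_size, 2)
    ordered_pairs = vocab_size * (vocab_size - 1)
    return 2 * unordered_pairs + ordered_pairs


if __name__ == "__main__":
    vocab_size = 10_000  # assumed vocabulary size, for illustration only
    print(count_simple_queries(vocab_size, 3))
    print(count_two_keyword_boolean_queries(vocab_size))
```

Under these assumptions, queries of at most three keywords already number more than 166 billion, which is the sense in which even short queries greatly extend the basic vocabulary.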