FOA Home
The that critical mapping between documents and descriptive keywords,
has dominated our approach to FOA in all the preceeding chapters. But
there is of course a larger context of available information: FOA can be
accomplished by showing a user relations among keywords, by
acquainting him or her with important authors, by pointing to important
journals where relevant documents are often published, etc. Retrieval of
all these information resources, especially when structured in
meaningful interface, can tell a user much more than simply listing
relevant documents.
This chapter is concerned with exploiting a variety
of other clues we might have about documents (and keywords, authors,
etc.), above and beyond the statistical, word-frequency information that
has been at the heart of the Index relation. In all cases,
these techniques identify some new source of data, represent it
efficiently, and then perform some kind of inference over the
representation.
AI is a subdiscipline of computer science centrally
concerned with questions of knowledge representation and inference over
those representations, especially when these algorithms arguably lead to
``intelligent'' behaviors. (In many ways the best characterization of
the AI domain is extensional one provided by the corpus of Ph.D.
dissertations so classified by U.M.I.) We could expect, therefore, that
there would be a great deal of cross-fertilization between AI methods
and IR methods, both growing up within computer science during the same
period. But for complicated reasons, until recently there has been very
little interaction.
By and large, AI has defined its notions of
inference in logical terms, based originally on automatic
theorem proving results. The last chapter has discussed IR's
probabilistic foundations and so one immediate axis of difference
between AI and IR is the distinction between primarily logical and
primarily probabilistic modes of inference. Nevertheless, some in IR
perceived early on the advantages that AI's KNOWLEDGE
REPRESENTATIONS [REF328] and
expert systems [Fox87] [McCune85] [Fidel86] techniques offered.
Today, the
fields of AI and IR align much more closely. For example, both machine
learning and natural language processing have always been considered
central issues within AI. The next chapter discusses at length machine
learning techniques as they hve been applied to document corpora.
Section §8.2 can only sketch another
large intersection, corpus-based linguistics, where natural language
issues and IR techniques also merge.
The advantages of applying AI
knowledge representation techniques become especially obvious when
additional structured attributes are associated with documents,
keywords, and authors. Early on, Kochen [REF320] considered a broad range of these
forms of information as shown in Figure (figure) Even more
inclusive lists have been proposed [Katzer82] [REF436]
This shows the primary
Index relation in the larger context of other information we
might also have available. What all of these additional forms of
information have in common is their ability to shed new light on the
semantic questions of what the documents are about. Information about
the publication details of documents, for example the journal date, even
page numbers of documents, can help provide a context within which
individual documents can be better understood. Much of this
data-about-data document is now referred to as META-DATA . This
additional modeling of document structure in languages like XML and
codified in standards like the Dublin
Core are one of the most important ways in which database and IR
technologies now interact. This constructively blurs many of the
database/search engine differences mentioned before (cf. Section §1.6 ). Techniques for performing FACT
EXTRACTION -- building database relations from analysis of textual
WWW pages [Craven98] -- suggest a
broad range of new ways that structured attributes may enter into the
retrieval task.
Section §6.1 discusses
one of the most important ways in which documents can be understood
independent of their keywords. In science, in the common law tradition,
and much more recently in email newsgroups and now with HTML hyperlinks,
the ability to link one document to another can provide vital
information about how the arguments of one document relate to those
contained in another.
Section §6.3 will
discuss some of the special representation techniques which have been
used to organize keywords in the vocabulary. It is also possible to
reason about authors of documents. Section §6.4.1 discusses experiments exploiting
Ph.D. ``genealogies'' in which dissertation authors are related to one
another by shared advisors. Co-authorship and membership in the same
research institution have also been proposed as ways to provide context
on a particular author's words. In some cases characterizations of
expertise of the authors, and independent of the documents themselves,
are available.
The chapter concludes with several suggestions of just how
these varied information sources can become integrated as part of
next-generation FOA tools. Section §6.5
considers several ``modes of inference'' by which new conclusion about
keywords and documents can be reached from elementary facts. Section §6.6 suggests a few of the new interface
techniques that become available as richer data streams are provided by
and presented back to a user. Sections §6.7 and §6.8 look at two domains of discourse in
particular -- the law and science surrounding molecular genetics -- as
examples of how such techniques can be marshalled towards particular FOA
purposes. After considering all these ways that the methods of AI can be
used to help with FOA, Section §6.9
concludes with a speculation of how the problem of intelligence itself
might be changed as we take seriously the prospect of basing it on
textual representations.
Top of Page
Inference beyond the \Index
Subsections