Strategies for Seeking and Publishing Biomedical Literature on the WWW
Richard K. Belew
University of California, San Diego
Mark Craven
University of Wisconsin
19 August 2000
Part of the ISMB-2000
Tutorial Program
Talk slides
Synopsis
It is hard to imagine bioinformatics having grown up without
the WWW. Biological scientists now search through and post data,
scientific publications, and curricula in a variety of formats as a
natural part of their work. This tutorial will provide an
introduction to both well-established and state-of-the-art methods
for finding, publishing, and extracting information from on-line,
text-based sources.
Topics included:
- Information retrieval basics
- The basic methods and
principles that underlie all information retrieval systems, including
general-purpose WWW search engines as well as biomedically-focused
resources like PubMed.
- Bibliometric search techniques
- Systems such as
Entrez and Swanson's ARROWSMITH that (like Google)
exploit bibliographic citations to uncover important
relationships within the biomedical
literature.
- Document similarity
- Methods for identifying
"similar" documents, based on both keywords and
citations associated with the documents.
- Portals to biomedical sources
- Special-purpose
"portals" for accessing the biomedical
literature. These will include more conventional text
search engines like PubMed and Entrez, as well as
other sequence-based databases, such as SwissProt,
that provide entry points to the biomedical
literature.
-
- Resources such as the Medical Subject Headings (MeSH)
controlled vocabulary, which is used to index articles in
MEDLINE, and the Unified Medical Language System (UMLS),
intended to help programs and people better regularize content
descriptors.
- Morphological analysis
- Methods that exploit
the special morphology (surface structure) of many
biomedical terms in order to improve retrieval
accuracy.
- Extraction methods
- Emerging techniques that
automatically extract targeted classes of keywords and
relationships among them (e.g. protein-protein
interactions) from text sources.
- Quality assurance mechanisms
- Proposed
mechanisms (e.g., publication and annotation
standards) designed to increase the fidelity of, and
confidence in, biological data resources.
- Publishing tricks for the Web
- Finding relevant
information requires intelligent search by browsing
users, but authors can also increase the chances their
resources are found using techniques (e.g., HTML META
tags) that better describe WWW pages to search
engines.
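
As a concrete illustration of the last topic, the sketch below generates the two META tags most commonly used to describe a page to search engines, "description" and "keywords". It is a minimal example under stated assumptions, not material from the tutorial itself: the function name meta_tags and the sample description and keyword values are hypothetical.

```python
from html import escape


def meta_tags(description, keywords):
    """Render description and keywords META tags for an HTML <head>.

    Hypothetical helper for illustration only.
    """
    # escape(..., quote=True) protects quotes and angle brackets that
    # would otherwise break the attribute values.
    tags = [
        '<meta name="description" content="%s">' % escape(description, quote=True),
        '<meta name="keywords" content="%s">' % escape(", ".join(keywords), quote=True),
    ]
    return "\n".join(tags)


if __name__ == "__main__":
    # Sample values are made up for this example.
    print(meta_tags(
        "Curated list of protein-protein interaction resources",
        ["bioinformatics", "protein-protein interactions", "MEDLINE"],
    ))
```

The generated lines would be placed inside the page's HEAD element. How much weight a given search engine gives to these tags varies, so they are best treated as one aid among several.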