BioText - Infrastructure for Mining of Biological Text

Gaurav Bhalotia and Ariel Schwartz
(Professor Marti A. Hearst --SIMS)
GAANN Fellowship and Genentech

The BioText project's main goal is to provide an intelligent information extraction and retrieval system for use in biomedical and genomics research. The system would give biological scientists fast and flexible access to the text-based information they need, and would also provide an efficient, modular infrastructure for NLP scientists developing text-mining and text-analysis algorithms [1-3].

We are working on the design and implementation of the system's infrastructure. Our main interest is in extending object-relational databases to support the special requirements of information extraction from biomedical text. Current plans are for the system to include:

[1] M. Hearst, "Untangling Text Data Mining," Proc. ACL Mtg. Assoc. Computational Linguistics, University of Maryland, June 1999.
[2] B. Rosario, M. Hearst, and C. Fillmore, "The Descent of Hierarchy, and Selection in Relational Semantics," Proc. Assoc. Computational Linguistics, July 2002.
[3] A. Schwartz and M. Hearst, "A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text," Proc. Pacific Symp. Biocomputing, Kauai, HI, January 2003.
