To facilitate the interpretation of large data sets generated by DNA microarray studies, we are 1) developing a text mining system to extract keywords from MEDLINE abstracts associated with individual gene names and 2) investigating several clustering algorithms to determine relationships between genes based on shared keywords. The basic mechanisms of our keyword extraction algorithm were described previously (Soc Neurosci Abstr 2001, 557.4). Recent progress in evaluating the performance of this algorithm through Precision-Recall calculations, and in using extracted keywords to accurately cluster predefined groups of genes, is reported here.
Evaluating Text-Mining Strategies for Interpreting DNA Microarray Expression Profiles
by Brian Ciliax, Ying Liu, Jorge Civera, Ashwin Ram, Sham Navathe, Ray Dingledine
Annual Meeting of the Society for Neuroscience (Soc Neurosci Abstr), Orlando, FL, September 2002
www.cc.gatech.edu/faculty/ashwin/papers/er-02-01.pdf
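
The Precision-Recall evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the extracted keywords for a gene are compared against a hand-curated reference set. The function name, the inputs, and the example keywords are hypothetical and are not taken from the study, whose actual evaluation procedure is not described here.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for extracted keywords versus a reference set.

    Both arguments are hypothetical inputs: any iterables of keyword strings.
    """
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    # Precision: fraction of extracted keywords that are in the reference set.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of reference keywords that were extracted.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up keywords for illustration only; not data from the study.
extracted = {"glutamate", "receptor", "synapse", "patient"}
relevant = {"glutamate", "receptor", "synapse", "excitatory", "nmda"}
precision, recall = precision_recall(extracted, relevant)
```

In this toy case three of the four extracted keywords appear in the reference set, so precision is 3/4, and three of the five reference keywords were recovered, so recall is 3/5.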