Posts Tagged ‘information retrieval’

Discovering Semantic Biomedical Relations Utilizing The Web

To realize the vision of a Semantic Web for Life Sciences, discovering relations between resources is essential. It is very difficult to automatically extract relations from Web pages written in natural language. At the same time, the explosive growth of information makes it impractical to extract the relations manually. In this paper we present techniques to automatically discover relations between biomedical resources from the Web. For this purpose we retrieve relevant information from Web search engines and the PubMed database using various lexico-syntactic patterns as queries over SOAP web services. The patterns are initially handcrafted but can be progressively learnt. The extracted relations can be used to construct and augment ontologies and knowledge bases. Experiments are presented for general biomedical relation discovery and domain-specific search to show the usefulness of our technique.
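As a rough illustration of pattern-based relation discovery, the sketch below applies a few handcrafted lexico-syntactic patterns to text snippets of the kind a search engine or PubMed query might return. The patterns, relation labels, and example snippets are assumptions made for illustration and are not taken from the paper; no web service is actually called.

```python
import re

# Hypothetical handcrafted patterns: each pairs a surface form with a relation label.
PATTERNS = [
    (re.compile(r"(?P<a>[\w-]+) (?:is|are) (?:a|an) (?:kind|type) of (?P<b>[\w-]+)", re.I), "is_a"),
    (re.compile(r"(?P<a>[\w-]+) (?:causes|cause) (?P<b>[\w-]+)", re.I), "causes"),
    (re.compile(r"(?P<a>[\w-]+) (?:treats|is used to treat) (?P<b>[\w-]+)", re.I), "treats"),
]


def extract_relations(snippets):
    """Return (subject, relation, object) triples found in the snippets."""
    relations = []
    for text in snippets:
        for pattern, label in PATTERNS:
            for match in pattern.finditer(text):
                relations.append(
                    (match.group("a").lower(), label, match.group("b").lower())
                )
    return relations


# Made-up snippets standing in for retrieved search results.
snippets = [
    "Studies show that aspirin treats headache in most adults.",
    "Influenza is a kind of infection.",
]
relations = extract_relations(snippets)
for triple in relations:
    print(triple)
```

Triples extracted this way could then be added to an ontology or knowledge base; learning new patterns from the extracted triples is not covered by this sketch.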

Read the paper:

Discovering Semantic Biomedical Relations utilizing the Web

by Saurav Sahay, Sougata Mukherjea, Eugene Agichtein, Ernie Garcia, Sham Navathe, Ashwin Ram

ACM Transactions on Knowledge Discovery from Data, 2(1):3, 2008

Adapting Associative Classification to Text Categorization

Associative classification, which originates from numerical data mining, has recently been applied to text data. Text data is first digitized into a database of transactions, and training and prediction are then conducted on the derived numerical dataset. This intuitive strategy has demonstrated quite good performance. However, it does not fully exploit the inherent characteristics of text data, even though it must handle text-specific problems such as lemmatization and stemming during digitization. In this paper, we propose a bottom-up strategy to adapt associative classification to text categorization, in which we take into account the structure information of text. Experiments on the Reuters-21578 dataset show that the proposed strategy can make use of text structure information and achieve better performance.
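The baseline strategy described above (documents turned into transactions, then class association rules mined and applied) can be sketched as follows. This is a deliberately simplified illustration that mines only single-term rules and ignores text structure, so it is not the bottom-up strategy the paper proposes. The thresholds, function names, and toy documents are assumptions.

```python
from collections import Counter, defaultdict


def to_transaction(text):
    """Turn a document into a set of lowercase tokens (no stemming here)."""
    return frozenset(text.lower().split())


def mine_rules(docs, min_support=2, min_confidence=0.6):
    """Mine single-term rules `term -> label` meeting both thresholds."""
    term_count = Counter()
    term_label_count = Counter()
    for text, label in docs:
        for term in to_transaction(text):
            term_count[term] += 1
            term_label_count[(term, label)] += 1
    rules = {}
    for (term, label), n in term_label_count.items():
        confidence = n / term_count[term]
        if n >= min_support and confidence >= min_confidence:
            # Keep only the most confident rule per term.
            if term not in rules or confidence > rules[term][1]:
                rules[term] = (label, confidence)
    return rules


def classify(text, rules, default=None):
    """Let every matching rule vote for its label, weighted by confidence."""
    votes = defaultdict(float)
    for term in to_transaction(text):
        if term in rules:
            label, confidence = rules[term]
            votes[label] += confidence
    return max(votes, key=votes.get) if votes else default


# Toy training set (made up, not drawn from Reuters-21578).
docs = [
    ("wheat harvest prices rise", "grain"),
    ("wheat exports fall", "grain"),
    ("oil prices rise", "crude"),
    ("oil output cut", "crude"),
]
rules = mine_rules(docs)
print(classify("wheat prices steady", rules))
print(classify("oil exports rise", rules))
```

Terms such as "prices" appear in both classes and fail the support and confidence thresholds, so only discriminative terms end up as rules.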

Read the paper:

Adapting Associative Classification to Text Categorization

by Baoli Li, Neha Sugandh, Ernie Garcia, Ashwin Ram

ACM Conference on Document Engineering (ACM-DocEng-07), Winnipeg, Canada, August 2007

Text Mining Biomedical Literature for Discovering Gene-to-Gene Relationships

Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as a basis for clustering in a new approach called BEA-PARTITION. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations.
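For readers unfamiliar with the Bond Energy Algorithm, here is a minimal sketch of its classic form as used in database design: items are placed one at a time at the position that maximizes the bond with their neighbors, so that strongly associated items end up adjacent. It is not the modified version used in BEA-PARTITION. The affinity matrix is a made-up example in which each entry stands for the number of functional keywords two genes share.

```python
def bond(aff, x, y):
    """Bond between columns x and y; a missing neighbor (None) contributes 0."""
    if x is None or y is None:
        return 0
    return sum(aff[z][x] * aff[z][y] for z in range(len(aff)))


def contribution(aff, left, new, right):
    """Net gain from placing column `new` between `left` and `right`."""
    return (
        2 * bond(aff, left, new)
        + 2 * bond(aff, new, right)
        - 2 * bond(aff, left, right)
    )


def bea_order(aff):
    """Return a column ordering that keeps high-affinity items adjacent."""
    n = len(aff)
    order = list(range(min(n, 2)))  # start with the first two columns
    for new in range(2, n):
        best_pos, best_gain = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            gain = contribution(aff, left, new, right)
            if best_gain is None or gain > best_gain:
                best_pos, best_gain = pos, gain
        order.insert(best_pos, new)
    return order


# Hypothetical keyword-sharing counts: genes 0 and 2 share keywords, as do 1 and 3.
affinity = [
    [3, 0, 2, 0],
    [0, 3, 0, 2],
    [2, 0, 3, 0],
    [0, 2, 0, 3],
]
order = bea_order(affinity)
print(order)
```

Once related items sit next to each other in the ordering, cluster boundaries can be drawn where the bond between neighbors is weak; that partitioning step is left out here.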

The results showed that BEA-PARTITION and the hierarchical clustering algorithm outperformed k-means clustering and the self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and the self-organizing map. Whereas BEA-PARTITION and hierarchical clustering produced clusters of similar quality, BEA-PARTITION provides clearer cluster boundaries than hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations.
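Two of the cluster-quality measures mentioned above, purity and entropy, can be computed as in the following sketch. It uses the common textbook definitions, which may differ in detail from those used in the paper; the gene names and group labels are invented.

```python
import math
from collections import Counter


def purity(clusters, labels):
    """Fraction of items that belong to the majority class of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(
        max(Counter(labels[item] for item in c).values()) for c in clusters if c
    )
    return majority / total


def entropy(clusters, labels):
    """Size-weighted average class entropy (in bits) over all clusters."""
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        if not c:
            continue
        counts = Counter(labels[item] for item in c)
        h = -sum((n / len(c)) * math.log2(n / len(c)) for n in counts.values())
        result += (len(c) / total) * h
    return result


# Hypothetical genes with known functional groups A and B.
labels = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
mixed = [["g1", "g2", "g3"], ["g4"]]
perfect = [["g1", "g2"], ["g3", "g4"]]

print(purity(mixed, labels), entropy(mixed, labels))
print(purity(perfect, labels), entropy(perfect, labels))
```

A perfect clustering has purity 1 and entropy 0, which is why higher purity and lower entropy both indicate better agreement with the known groups.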

Read the paper:

Text Mining Biomedical Literature for Discovering Gene-to-Gene Relationships

by Ying Liu, Sham Navathe, Jorge Civera, Venu Dasigi, Ashwin Ram, Brian Ciliax, Ray Dingledine

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(4):380-384, Oct-Dec 2005

Interactive Case-Based Reasoning for Precise Information Retrieval

The knowledge explosion has continued to outpace technological innovation in search engines and knowledge management systems. It is increasingly difficult to find relevant information, not just on the World Wide Web at large but even in domain-specific medium-sized knowledge bases: online helpdesks, maintenance records, technical repositories, travel databases, e-commerce sites, and many others. Despite advances in search and database technology, the average user still spends inordinate amounts of time looking for specific information needed for a given task.

This paper describes an adaptive system for the precise, rapid retrieval and synthesis of information from medium-sized knowledge bases in response to problem-solving queries from a diverse user population. We advocate a shift in perspective from “search” to “answers.” Instead of returning dozens or hundreds of hits to a user, the system should attempt to find answers that may or may not match the query directly but are relevant to the user’s problem or task.

This problem has been largely overlooked as research has tended to concentrate on techniques for broad searches of large databases over the Internet (as exemplified by Google) and structured queries of well-defined databases (as exemplified by SQL). However, the problem discussed in this chapter is sufficiently different from these extremes both to present a novel set of challenges and to provide a unique opportunity to apply techniques not traditionally found in the information retrieval literature. Specifically, we discuss an innovative combination of techniques, case-based reasoning coupled with text analytics, to solve the problem in a practical, real-world context.

We are interested in applications in which users must quickly retrieve answers to specific questions or problems from a complex information database with a minimum of effort and interaction. Examples include internal helpdesk support, web-based self-help for consumer products, decision-aiding systems for support personnel, and repositories for specialized documents such as patents, technical documents, or scientific literature. These applications are characterized by the fact that a diverse user population accesses highly focused knowledge bases in order to find precise answers to specific questions or problems. Despite the growing popularity of online service and support facilities for internal use by employees and external use by customers, most such sites rely on traditional search engine technologies and are not very effective in reducing the time, expertise, and effort required on the user’s part.

Read the paper:

Interactive Case-Based Reasoning for Precise Information Retrieval

by Ashwin Ram, Mark Devaney

In Case-Based Reasoning in Knowledge Discovery and Data Mining, David Aha and Sankar Pal (editors).

Scaling Spreading Activation for Information Retrieval

The Information Retrieval Intelligent Assistant (IRIA) project applies principles of memory retrieval from cognitive science to the problem of information retrieval from large heterogeneous databases. IRIA uses spreading activation over a semantic network for information retrieval, a technique which has proven effective in a variety of tasks. However, some of the very features which motivated the choice of spreading activation for information retrieval (such as the use of fanout to automatically compute term weights, or the use of thresholds to automatically limit computation spent on irrelevant items) can introduce new problems as systems are scaled to larger sizes.
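To make the roles of fanout and thresholds concrete, here is a minimal sketch of spreading activation over a small graph. It is a generic textbook-style version, not the IRIA implementation; the decay factor, threshold value, and example network are assumptions.

```python
from collections import defaultdict


def spread_activation(graph, sources, decay=0.5, threshold=0.1, max_steps=3):
    """Propagate activation outward from source nodes.

    Each node splits its (decayed) energy evenly among its neighbors, so a
    node with high fanout passes little energy to each one. Shares below
    `threshold` are dropped, which limits work spent on weakly related items.
    """
    activation = defaultdict(float)
    frontier = dict(sources)
    for node, energy in frontier.items():
        activation[node] += energy
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbors = graph.get(node, [])
            if not neighbors:
                continue
            share = decay * energy / len(neighbors)  # fanout-based weight
            if share < threshold:
                continue  # prune: too weak to be worth propagating
            for neighbor in neighbors:
                next_frontier[neighbor] += share
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


# Made-up semantic network: "cat" has high fanout, "car" has low fanout.
graph = {
    "jaguar": ["car", "cat"],
    "car": ["engine"],
    "cat": ["animal", "pet", "zoo", "fur"],
}
activation = spread_activation(graph, {"jaguar": 1.0})
print(activation)
```

In this toy run the high-fanout node "cat" spreads its energy so thinly that the threshold cuts it off, while "car" still activates "engine". The same interaction between fanout and thresholds is one place where behavior can shift as a network grows.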

This paper discusses the use of semantic networks and spreading activation for information retrieval in the context of the IRIA approach, reviews some of the problems that arise as these technologies are scaled up to production systems, presents some preliminary results that illustrate these problems in practice, and discusses potential solutions.

Read the paper:

Scaling Spreading Activation for Information Retrieval

by Anthony Francis, Mark Devaney, Juan Santamaria, Ashwin Ram

International Conference on Artificial Intelligence (ICAI-01), Las Vegas, Nevada, March 2001

IRIA: The Information Research Intelligent Assistant

The explosion of information in the modern environment demands the ability to collect, organize, manage, and search large amounts of information across a wide variety of real-world applications. The primary tools available for such tasks are large-scale database systems and keyword-based document search techniques. However, such tools are rapidly proving inadequate: traditional database systems do not enable ready access to relevant knowledge, prompting a market of add-ons, and existing search techniques are insufficiently precise or selective to support such tasks, leading to consumer exasperation. In the end, users are left unsatisfied, confronted with a sea of unorganized and unhelpful data. A new approach is needed.

The Information Research Intelligent Assistant (IRIA) is an integrated information retrieval architecture that addresses this problem. IRIA enables a user or workgroup to build a personalized map of the relevant information available in a database, intranet, or internet, and to find, add, and use information quickly and easily. An IRIA-based intelligent information management system acts as an autonomous assistant to a user working on a task, working unobtrusively in the background to learn both the user’s interests and the resources available to satisfy those interests. This approach enables “reminding engines” which monitor a user’s work to proactively find and recommend useful information as well as “workgroup memories” which learn from a user’s behavior to build a comprehensive knowledge map of a particular area of interest.

In empirical tests, IRIA has demonstrated the ability to monitor a user’s progress on a task (specifically, web search) and proactively find and recommend information relevant to that task based on the context and history of the user’s interactions with the system. IRIA further demonstrated that it could provide collaborative facilities to the workgroup and that it could learn and improve its knowledge map over time.

Read the paper:

IRIA: The Information Research Intelligent Assistant

by Anthony Francis, Mark Devaney, Ashwin Ram

International Conference on Artificial Intelligence (ICAI-00), Las Vegas, Nevada

Context-Sensitive Asynchronous Memory

Retrieving useful answers from large knowledge bases given under-specified questions is an important problem in the construction of general intelligent agents. The core of this problem is how to get the information an agent needs when it doesn’t know how to ask the right question and doesn’t have the time to exhaustively search all available information.

Context-sensitive asynchronous memory is a model of memory retrieval that solves this problem. The context-sensitive asynchronous memory approach exploits feedback from the task and environment to guide and constrain memory search by interleaving memory retrieval and problem solving. To achieve this behavior, a context-sensitive asynchronous memory uses an asynchronous retrieval system to manage a context-sensitive search process operating over a content-addressable knowledge base. Solutions based on this approach provide useful answers to vague questions efficiently, based on information naturally available during the performance of a task.

The core claims of this approach are:
•  Claim 1: An efficient, domain-independent solution to the problem of retrieving useful answers from large knowledge bases given under-specified queries is to interleave memory retrieval with task performance and use feedback from the task or environment to guide the search of memory.
•  Claim 2: Interleaving memory retrieval with task performance, and exploiting feedback from that performance, can be achieved in a domain-independent way using a context-sensitive, asynchronous memory retrieval process.
•  Claim 3: A rich, reified, grounded semantic network representation enables context-sensitive memory retrieval processes to retrieve useful information in a domain-independent way for a wide variety of tasks.
•  Claim 4: To effectively use a context-sensitive asynchronous memory to retrieve useful answers, a task must be able to work in parallel with a memory process, communicate with it, provide feedback to it, and must possess integration mechanisms to incorporate asynchronous retrievals provided by the memory.
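The idea of interleaving retrieval with task performance can be illustrated with a small coroutine. This is only a toy sketch: a single-threaded Python generator stands in for the parallel memory process, simple feature overlap stands in for context-sensitive search, and the knowledge base and cue names are invented. It is not how Nicole is implemented.

```python
def memory_process(knowledge_base):
    """Coroutine: receives context cues from the task, yields one retrieval.

    Cues accumulate over time, so later retrievals reflect everything the
    task has reported so far. Items already retrieved are not returned again.
    """
    context, seen, retrieved = set(), set(), None
    while True:
        cues = yield retrieved
        context |= set(cues)
        candidates = [
            (len(context & features), name)
            for name, features in knowledge_base.items()
            if name not in seen
        ]
        score, name = max(candidates, default=(0, None))
        if score > 0:
            seen.add(name)
            retrieved = name
        else:
            retrieved = None


# Hypothetical case base keyed by descriptive features.
knowledge_base = {
    "case_printer_jam": {"printer", "paper", "jam"},
    "case_network_down": {"network", "router"},
    "case_toner_low": {"printer", "toner"},
}

memory = memory_process(knowledge_base)
next(memory)  # prime the coroutine

# The task proceeds step by step, feeding back whatever it has learned.
first = memory.send({"network"})
second = memory.send({"printer", "jam"})
third = memory.send(set())   # no new cues; memory keeps searching
fourth = memory.send(set())  # nothing relevant is left
print(first, second, third, fourth)
```

The task never formulates a complete query. It only reports cues as they arise, and the memory returns whatever currently matches best, which is the spirit of the feedback loop described in the claims above.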

The context-sensitive asynchronous memory approach is applicable to tasks and domains which meet the following criteria: problems are difficult to solve, questions are difficult to formulate, a large knowledge base is available yet contains only a small selection of relevant information, and, most importantly, the environment is regular, in that solutions in the knowledge base occur in patterns and relationships similar to those found in situations in which the solutions are likely to be applicable in the future. This approach is domain independent: it is applicable to a wide variety of tasks and problems from simple search applications to complex cognitive agents.

To exploit context-sensitive asynchronous memory, reasoners need certain properties. Experience-based agency is an agent architecture which provides an outline of how to construct complete intelligent agents which use a context-sensitive asynchronous memory to support a reasoning system performing a real task. The experience-based agent architecture combines a context-sensitive asynchronous memory retrieval process with a global store of experience used by all agent processes, a global working memory to provide a uniform way to collect feedback, and a global task controller which orchestrates reasoning and memory. The experience-based agent architecture also provides principles for constructing integration mechanisms that enable reasoning tasks to work with the context-sensitive asynchronous memory.

Furthermore, to help determine when these approaches should be used, this research also contributes theoretical analyses that predict the classes of tasks and situations in which the context-sensitive asynchronous memory and experience-based agent approaches will provide the greatest benefit.

To evaluate the approach, the experience-based agent architecture has been implemented in the Nicole system. Nicole is a large Common Lisp program providing global long-term and working memory stores represented as a rich, reified, grounded semantic network, a context-sensitive asynchronous memory process based on a novel model of context-directed spreading activation, a control system for orchestrating reasoning and memory, and a task language to implement reasoning tasks. Nicole enables the context-sensitive asynchronous memory approach to be applied to real problems, including information retrieval in Nicole-IRIA, an information management application that uses context to recommend useful information (Francis et al. 2000); planning in Nicole-MPA, a case-based least-commitment planner that adapts multiple plans (Ram & Francis 1995); and language understanding in ISAAC (Moorman 1997), a story understanding system which uses Nicole’s retrieval system as part of its creative understanding process. Nicole and her children thus provide a testbed to evaluate the context-sensitive asynchronous memory approach.

Experiments with Nicole support the claims of the approach. Experiments with Nicole-IRIA demonstrate that a context-sensitive asynchronous memory can use feedback from browsing to improve the quality of memory retrieval, while experiments with Nicole-MPA demonstrate how information derived from reasoning can improve the quantity of retrieval. The use of Nicole’s memory in the ISAAC system demonstrates the generality of the context-sensitive asynchronous memory approach. Other experiments with Nicole-MPA demonstrate the importance of representation as a source of power for context-sensitive asynchronous memory, and further demonstrate that the core features of the experience-based agent architecture are crucial sources of power necessary to enable a reasoning task to work with and exploit a context-sensitive asynchronous memory.

In sum, these evaluations demonstrate that the context-sensitive asynchronous memory approach is a general approach to memory retrieval which can provide concrete benefits to real problems.

Read the thesis:

Context-Sensitive Asynchronous Memory

by Anthony Francis

PhD Thesis, College of Computing, Georgia Institute of Technology, Atlanta, GA, 2000

Structuring On-The-Job Troubleshooting Performance to Aid Learning

This paper describes a methodology for aiding the learning of troubleshooting tasks in the course of an engineer’s work. The approach supports learning in the context of actual, on-the-job troubleshooting and, in addition, supports performance of the troubleshooting task in tandem. This approach has been implemented in a computer tool called WALTS (Workspace for Aiding and Learning Troubleshooting).

This method aids learning by helping the learner structure his or her task into the conceptual components necessary for troubleshooting, giving advice about how to proceed, suggesting candidate hypotheses and solutions, and automatically retrieving cognitively relevant media. WALTS includes three major components: a structured dynamic workspace for representing knowledge about the troubleshooting process and the device being diagnosed; an intelligent agent that facilitates the troubleshooting process by offering advice; and an intelligent media retrieval tool that automatically presents candidate hypotheses and solutions, relevant cases, and various other media. WALTS creates resources for future learning and aiding of troubleshooting by storing completed troubleshooting instances in a self-populating database of troubleshooting cases.

The methodology described in this paper is partly based on research in problem-based learning, learning by doing, case-based reasoning, intelligent tutoring systems, and the transition from novice to expert. The tool is currently implemented in the domain of remote computer troubleshooting.

Read the paper:

Structuring On-The-Job Troubleshooting Performance to Aid Learning

by Brian Minsk, Hari Balakrishnan, Ashwin Ram

World Conference on Engineering Education, Minneapolis, MN, October 1995

Interest-based Information Filtering and Extraction in Natural Language Understanding Systems

Given the vast amount of information available to the average person, there is a growing need for mechanisms that can select relevant or useful information based on some specification of the interests of a user. Furthermore, experience with natural language understanding and reasoning programs in artificial intelligence has demonstrated that the combinatorial explosion of possible conclusions that can be drawn from any input is a serious computational bottleneck in the design of computer programs that process information automatically.

This paper presents a theory of interestingness that serves as the basis for two story understanding programs, one that can filter and extract information likely to be relevant or interesting to a user, and another that can formulate and pursue its own interests based on an analysis of the information necessary to carry out the tasks it is pursuing. We discuss the basis for our theory of interestingness, heuristics for interest-based processing of information, and the process used to filter and extract relevant information from the input.

Read the paper:

Interest-based Information Filtering and Extraction in Natural Language Understanding Systems

by Ashwin Ram

Bellcore Workshop on High-Performance Information Filtering, Morristown, NJ, November 1991

A Goal-based Approach to Intelligent Information Retrieval

Intelligent information retrieval (IIR) requires inference. The number of inferences that can be drawn by even a simple reasoner is very large, and the inferential resources available to any practical computer system are limited. AI researchers have long faced this problem. In this paper, we present a method for control of inference, used by two recent machine learning programs, that is relevant to the design of IIR systems.

The key feature of the approach is the use of explicit representations of desired knowledge, which we call knowledge goals. Our theory addresses the representation of knowledge goals, methods for generating and transforming these goals, and heuristics for selecting among potential inferences in order to feasibly satisfy such goals. In this view, IIR becomes a kind of planning: decisions about what to infer, how to infer and when to infer are based on representations of desired knowledge, as well as internal representations of the system’s inferential abilities and current state.

The theory is illustrated using two case studies, a natural language understanding program that learns by reading novel newspaper stories, and a differential diagnosis program that improves its accuracy with experience. We conclude by making several suggestions on how this machine learning framework can be integrated with existing information retrieval methods.

Read the paper:

A Goal-based Approach to Intelligent Information Retrieval

by Ashwin Ram, Larry Hunter

Eighth International Workshop on Machine Learning (ICML-91), Chicago, IL, June 1991