Archive for the ‘Web / Web 2.0’ Category

NLP: Not (Just) Language, People

As consumers become producers and, now, participants in online social communities, there are new opportunities and challenges in the increasing amounts of textual information and interactions on the web, within enterprises, in government, and in new types of social media and virtual worlds.

Natural Language Processing (NLP) researchers have traditionally regarded language as the object of study. In this talk, I argue that NLP is as much a study of people as of language per se. Doing NLP well requires us to model and reason about Content (domain knowledge), Context (goals and tasks), and Community (social context). I discuss why modeling the three C’s is difficult, and illustrate some approaches to these problems using examples from my recent academic and commercial projects.

Invited talk at PARC (Palo Alto Research Center), Palo Alto, CA, January 2009

iReMedI – Intelligent Retrieval from Medical Information

Effective encoding of information is one of the keys to qualitative problem solving. Our aim is to explore Knowledge Representation techniques that capture meaningful word associations occurring in documents. We have developed iReMedI, a TCBR-based problem-solving system, as a prototype to demonstrate our idea. For representation, we use a combination of NLP and graph-based techniques, which we call Shallow Syntactic Triples, Dependency Parses, and Semantic Word Chains. To test their effectiveness, we have developed retrieval techniques based on PageRank, Shortest Distance, and Spreading Activation methods. The various algorithms discussed in the paper and the comparative analysis of their results provide useful insight for creating an effective problem-solving and reasoning system.

Read the paper:

iReMedI – Intelligent Retrieval from Medical Information

by Saurav Sahay, Bharat Ravisekar, Anu Venkatesh, Sundaresan Venkatasubramanian, Priyanka Prabhu, Ashwin Ram

9th European Conference on Case-Based Reasoning (ECCBR-08), Trier, Germany
www.cc.gatech.edu/faculty/ashwin/papers/er-08-05.pdf

Discovering Semantic Biomedical Relations Utilizing The Web

To realize the vision of a Semantic Web for Life Sciences, discovering relations between resources is essential. It is very difficult to automatically extract relations from Web pages expressed in natural language formats. On the other hand, because of the explosive growth of information, it is difficult to manually extract the relations. In this paper we present techniques to automatically discover relations between biomedical resources from the Web. For this purpose we retrieve relevant information from Web search engines and the PubMed database, using various lexico-syntactic patterns as queries over SOAP web services. The patterns are initially handcrafted but can be progressively learnt. The extracted relations can be used to construct and augment ontologies and knowledge bases. Experiments are presented for general biomedical relation discovery and domain-specific search to show the usefulness of our technique.
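As a rough illustration of what a handcrafted lexico-syntactic pattern can look like, the sketch below applies two regular-expression patterns to a snippet of text and returns (entity, relation, entity) triples. It covers only the extraction step and makes no web service calls. The pattern list, the relation names, and the function name are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import re

# Hypothetical handcrafted patterns: (relation name, regular expression).
# Each pattern captures a single-word left entity "x" and right entity "y".
PATTERNS = [
    ("treats", r"(?P<x>[A-Za-z][\w-]*) (?:is used to treat|treats) (?P<y>[A-Za-z][\w-]*)"),
    ("causes", r"(?P<x>[A-Za-z][\w-]*) (?:causes|leads to) (?P<y>[A-Za-z][\w-]*)"),
]


def extract_relations(text):
    """Return (x, relation, y) triples for every pattern match in text."""
    relations = []
    for name, pattern in PATTERNS:
        for match in re.finditer(pattern, text):
            relations.append((match.group("x"), name, match.group("y")))
    return relations


snippet = "Aspirin treats headache. Smoking leads to cancer."
triples = extract_relations(snippet)
```

Patterns this simple match only single-word entities and miss most real phrasing, which is one reason a handcrafted set would need to be extended or learnt over time.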

Read the paper:

Discovering Semantic Biomedical Relations utilizing the Web

by Saurav Sahay, Sougata Mukherjea, Eugene Agichtein, Ernie Garcia, Sham Navathe, Ashwin Ram

ACM Transactions on Knowledge Discovery from Data, 2(1):3, 2008
www.cc.gatech.edu/faculty/ashwin/papers/er-08-01.pdf

A Cognitive Model of Problem-Based Learning and its Application to Educational Software Design

Problem-based learning (PBL) is a constructivist pedagogy in which students learn in small groups by working on real-world problems. Despite its many benefits, however, this pedagogy is still not widely used in K-16 classrooms, especially with large numbers of students. Traditional human-facilitated PBL places intense demands on faculty to facilitate problem-solving sessions with small groups of students; on the other hand, most educational technologies do not provide PBL’s collaborative problem-solving experience.

We propose a cognitive model of the problem-based learning process. We present a software environment called CaseBook that allows instructors to author and share problems and provides students with a pedagogically-sound PBL experience based on the cognitive model. CaseBook has been used in high school and undergraduate courses; we report results from two studies in actual classrooms.

Read the paper:

A Cognitive Model of Problem-Based Learning and its Application to Educational Software Design

by Ashwin Ram, Preetha Ram, Jennifer Holzmann, Chris Sprague

International Conference on e-Learning (eLearn-07), Lisbon, Portugal, July 2007. Also presented at Eleventh International Conference on Human-Computer Interaction (INTERACT-07), Panel on Human-Centric e-Learning, Rio de Janeiro, Brazil, September 2007.

www.cc.gatech.edu/faculty/ashwin/papers/er-07-05.pdf

Domain Ontology Construction from Biomedical Text

NLM’s Unified Medical Language System (UMLS) is a very large ontology of biomedical and health data. In order to be used effectively for knowledge processing, it needs to be customized to a specific domain. In this paper, we present techniques to automatically discover domain-specific concepts, discover relationships between these concepts, build a context map from these relationships, link these domain concepts with the best-matching concept identifiers in UMLS using our context map and UMLS concept trees, and finally assign categories to the discovered relationships. This specific domain ontology of terms and relationships using evidential information can serve as a basis for applications in analysis, reasoning and discovery of new relationships. We have automatically built an ontology for the Nuclear Cardiology domain as a testbed for our techniques.

Read the paper:

Domain Ontology Construction from Biomedical Text

by Saurav Sahay, Baoli Li, Ernie Garcia, Eugene Agichtein, Ashwin Ram

International Conference on Artificial Intelligence (ICAI-07), Las Vegas, NV, June 2007
www.cc.gatech.edu/faculty/ashwin/papers/er-07-10.pdf

CaseBook: A Problem-Based Learning Online Environment For High School Microbiology

Problem-based learning (PBL) is an educational approach that allows students to improve problem solving and critical thinking skills while learning science. However, PBL requires significant teacher time and expertise to develop problems and facilitate small-group problem-solving sessions. With advances in technology, PBL can be used in today’s classrooms in an effective and scalable manner.

CaseBook is an interactive computer system that allows for easy integration of PBL into the K-16 curriculum. Through a simple web-based interface, teachers enter and edit their case materials. As students work through cases, CaseBook guides them through a 3-stage process in which they analyze, learn and reflect. Students may work independently, or a small group of students may work together and share a Team Notebook, which is used to record facts, ideas, and issues about the case as they progress. Students assess their progress through self and group reflection and through teacher feedback.

We report on the use of CaseBook for a microbiology case in a high school classroom. The results suggest that CaseBook is effective for both advanced and remedial students. As the technological capacity of students and classrooms increases, it is only appropriate to use this technology to implement novel methods of teaching that will provide students the skills they need post-graduation.

Read the paper:

CaseBook: A Problem-Based Learning Online Environment For High School Microbiology

by JL Holzman, G Louizi, SC Fowler, E Lindsey, JJ Harrigan, P Ram, A Ram

12th American Society for Microbiology (ASM) Conference for Undergraduate Educators, Atlanta, GA, May 2006
www.cc.gatech.edu/faculty/ashwin/papers/er-05-06.doc.pdf
www.cc.gatech.edu/faculty/ashwin/papers/er-05-06.pdf

Interactive Case-Based Reasoning for Precise Information Retrieval

The knowledge explosion has continued to outpace technological innovation in search engines and knowledge management systems. It is increasingly difficult to find relevant information, not just on the World Wide Web at large but even in domain-specific medium-sized knowledge bases: online helpdesks, maintenance records, technical repositories, travel databases, e-commerce sites, and many others. Despite advances in search and database technology, the average user still spends inordinate amounts of time looking for specific information needed for a given task.

This paper describes an adaptive system for the precise, rapid retrieval and synthesis of information from medium-sized knowledge bases in response to problem-solving queries from a diverse user population. We advocate a shift in perspective from “search” to “answers.” Instead of returning dozens or hundreds of hits to a user, the system should attempt to find answers that may or may not match the query directly but are relevant to the user’s problem or task.

This problem has been largely overlooked as research has tended to concentrate on techniques for broad searches of large databases over the Internet (as exemplified by Google) and structured queries of well-defined databases (as exemplified by SQL). However, the problem discussed in this chapter is sufficiently different from these extremes both to present a novel set of challenges and to provide a unique opportunity to apply techniques not traditionally found in the information retrieval literature. Specifically, we discuss an innovative combination of techniques (case-based reasoning coupled with text analytics) to solve the problem in a practical, real-world context.

We are interested in applications in which users must quickly retrieve answers to specific questions or problems from a complex information database with a minimum of effort and interaction. Examples include internal helpdesk support, web-based self-help for consumer products, decision-aiding systems for support personnel, and repositories for specialized documents such as patents, technical documents, or scientific literature. These applications are characterized by the fact that a diverse user population accesses highly focused knowledge bases in order to find precise answers to specific questions or problems. Despite the growing popularity of on-line service and support facilities for internal use by employees and for external use for customers, most such sites rely on traditional search engine technologies and are not very effective in reducing the time, expertise, and complexity required on the user’s part.

Read the paper:

Interactive Case-Based Reasoning for Precise Information Retrieval

by Ashwin Ram, Mark Devaney

In Case-Based Reasoning in Knowledge Discovery and Data Mining, David Aha and Sankar Pal (editors).
www.cc.gatech.edu/faculty/ashwin/papers/er-05-02.pdf

Scaling Spreading Activation for Information Retrieval

The Information Retrieval Intelligent Assistant (IRIA) project applies principles of memory retrieval from cognitive science to the problem of information retrieval from large heterogeneous databases. IRIA uses spreading activation over a semantic network for information retrieval, a technique which has proven effective in a variety of tasks. However, some of the very features which motivated the choice of spreading activation for information retrieval, such as the use of fanout to automatically compute term weights or the use of thresholds to automatically limit computation spent on irrelevant items, can introduce new problems as systems are scaled to larger sizes.

This paper discusses the use of semantic networks and spreading activation for information retrieval in the context of the IRIA approach, reviews some of the problems that arise as these technologies are scaled up to production systems, presents some preliminary results that illustrate these problems in practice, and discusses potential solutions.
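For readers unfamiliar with the technique, the following is a minimal sketch of spreading activation with fanout and a threshold, assuming a plain adjacency-list network. It is a generic textbook-style formulation, not IRIA's implementation; the function name, the `decay` and `threshold` parameters, and the toy network are all assumptions made for illustration.

```python
from collections import defaultdict


def spread_activation(graph, sources, decay=0.8, threshold=0.05, max_steps=10):
    """Spread activation from source nodes through a directed network.

    graph:   dict mapping each node to a list of neighbouring nodes
    sources: dict mapping source nodes to their initial activation
    """
    activation = defaultdict(float)
    for node, value in sources.items():
        activation[node] += value

    frontier = dict(sources)  # activation still to be propagated
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            # Fanout: outgoing activation is divided among all neighbours,
            # so heavily connected nodes pass on less to each one.
            share = value * decay / len(neighbours)
            # Threshold: activation too weak to matter is not propagated.
            if share < threshold:
                continue
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        if not next_frontier:
            break
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


# Toy network (hypothetical terms).
network = {
    "heart": ["cardiology", "muscle"],
    "cardiology": ["imaging"],
}
scores = spread_activation(network, {"heart": 1.0})

# A node with 100 neighbours: each share is 0.008, below the threshold,
# so nothing spreads beyond the source in this toy version.
hub_network = {"hub": [f"n{i}" for i in range(100)]}
hub_scores = spread_activation(hub_network, {"hub": 1.0})
```

In this toy version the two mechanisms interact: the larger a node's fanout, the smaller each share, and the sooner the threshold cuts propagation off entirely.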

Read the paper:

Scaling Spreading Activation for Information Retrieval

by Anthony Francis, Mark Devaney, Juan Santamaria, Ashwin Ram

International Conference on Artificial Intelligence (ICAI-01), Las Vegas, Nevada, March 2001
www.cc.gatech.edu/faculty/ashwin/papers/er-01-01.pdf

IRIA: The Information Research Intelligent Assistant

The explosion of information in the modern environment demands the ability to collect, organize, manage, and search large amounts of information across a wide variety of real-world applications. The primary tools available for such tasks are large-scale database systems and keyword-based document search techniques. However, such tools are rapidly proving inadequate: traditional database systems do not enable ready access to relevant knowledge, prompting a market of add-ons, and existing search techniques are insufficiently precise or selective to support such tasks, leading to consumer exasperation. In the end, users are left unsatisfied, confronted with a sea of unorganized and unhelpful data. A new approach is needed.

The Information Research Intelligent Assistant (IRIA) is an integrated information retrieval architecture that addresses this problem. IRIA enables a user or workgroup to build a personalized map of the relevant information available in a database, intranet, or internet, and gives them the ability to find, add, and use information quickly and easily. An IRIA-based intelligent information management system acts as an autonomous assistant to a user working on a task, working unobtrusively in the background to learn both the user’s interests and the resources available to satisfy those interests. This approach enables “reminding engines” which monitor a user’s work to proactively find and recommend useful information as well as “workgroup memories” which learn from a user’s behavior to build a comprehensive knowledge map of a particular area of interest.

In empirical tests, IRIA has demonstrated the ability to monitor a user’s progress on a task (specifically, web search) and proactively find and recommend information relevant to that task based on the context and history of the user’s interactions with the system. IRIA further demonstrated that it could provide collaborative facilities to the workgroup and that it could learn and improve its knowledge map over time.

Read the paper:

IRIA: The Information Research Intelligent Assistant

by Anthony Francis, Mark Devaney, Ashwin Ram

International Conference on Artificial Intelligence (ICAI-00), Las Vegas, Nevada
www.dresan.com/research/publications/icai-2000.html

PML: Representing Procedural Domains for Multimedia Presentations

A central issue in the development of multimedia systems is the presentation of the information to the user of the system and how to best represent that information to the designer of the system. Typically, the designers create a system in which content and presentation are inseparably linked; specific presentations and navigational aids are chosen for each piece of content and hard-coded into the system.

We argue that the representation of content should be decoupled from the design of the presentation and navigational structure, both to facilitate modular system design and to permit the construction of dynamic multimedia systems that can determine appropriate presentations in a given situation on the fly. We propose a new markup language called PML (Procedural Markup Language) which allows the content to be represented in a flexible manner by specifying the knowledge structures, the underlying physical media, and the relationships between them using cognitive media roles. The PML description can then be translated into different presentations depending on such factors as the context, goals, presentation preferences, and expertise of the user.

Read the paper:

PML: Representing Procedural Domains for Multimedia Presentations

by Ashwin Ram, Rich Catrambone, Mark Guzdial, Colleen Kehoe, Scott McCrickard, John Stasko

IEEE Multimedia, 6(2):40-52, 1999
www.cc.gatech.edu/faculty/ashwin/papers/git-gvu-98-20.pdf