
Searching for an idea, not a keyword

Researchers at the IBM Research Laboratory in Haifa are helping to develop systems that search by knowledge or ideas instead of keywords

Hayadan


Search engines have long been part of our daily routine. Spoken English has even gained a new verb - to google - meaning "to search on Google". Time and again we try to pick the right keyword and type it in, hoping to locate the content we need. But when we are looking for knowledge and new ideas, why settle for keywords alone?

This question has given rise to extensive research in text analysis, a powerful technology that lets users dig deep into unstructured information and search it for ideas instead of keywords. Unstructured information is found in a wide variety of sources - text documents, image, sound or video files, blogs and e-mail. All of them share a format that has no predefined key, which poses a challenge to both the searcher and the search system.

Scientists at the IBM research laboratory in Haifa are now playing a central part in developing the Unstructured Information Management Architecture (UIMA), which is already available as open source and can process unstructured information in order to understand the meanings, contexts and relevant facts contained in the analyzed content. UIMA allows software to search the various forms of information and give them meaning, and to offer the user a search at the concept level rather than the keyword level.

To simplify the construction of text analysis applications, IBM has integrated UIMA into the information integration product line of WebSphere, its application server. The WebSphere Information Integrator OmniFind Edition system is the first software product to process information based on the UIMA standard. OmniFind also incorporates an information retrieval algorithm and additional capabilities developed at the IBM research laboratory in Haifa. These capabilities extend the UIMA platform and automatically build an index that allows information to be retrieved quickly from the analyzed text.

Roni Lampel, manager of the information retrieval group at IBM Laboratories in Haifa, explains that the system takes text analysis one step further, making it quick and easy to develop applications that identify, search and retrieve knowledge from stored texts.

Searches of document repositories are usually carried out with a special query language or by combining keywords. Text analysis gives unstructured content a structure by identifying key terms, such as the names of people, organizations and events, and the relationships between them that are hidden in the text. It can also identify new concepts or unfamiliar facts and understand them in the context in which they appear in the unstructured document. So, for example, when a user searches for "world leaders", the system will retrieve information about presidents, prime ministers and religious leaders, even if the user did not include these terms in the query.
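The "world leaders" example can be illustrated with a minimal Python sketch. It is not UIMA code and does not reflect how OmniFind works internally: the concept dictionary, the sample documents and the function names (annotate, build_concept_index, search) are all hypothetical, and a real analysis engine would use far richer annotators than a term list. The sketch only shows the general idea of tagging text with concepts and then indexing documents by concept instead of by keyword.

```python
import re
from collections import defaultdict

# Hypothetical concept dictionary: surface term -> concept label.
# "pope" stands in for "religious leaders"; it is an assumption for illustration.
CONCEPTS = {
    "president": "world leaders",
    "prime minister": "world leaders",
    "pope": "world leaders",
}


def annotate(text):
    """Return the set of concept labels whose terms occur in the text."""
    lowered = text.lower()
    found = set()
    for term, concept in CONCEPTS.items():
        # Match whole words, allowing a simple plural "s".
        if re.search(r"\b" + re.escape(term) + r"s?\b", lowered):
            found.add(concept)
    return found


def build_concept_index(documents):
    """Map each concept label to the sorted ids of documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for concept in annotate(text):
            index[concept].add(doc_id)
    return {concept: sorted(ids) for concept, ids in index.items()}


def search(index, concept):
    """Look up documents by concept label rather than by literal keyword."""
    return index.get(concept.lower(), [])


# Hypothetical sample documents; none contains the words "world leaders".
DOCUMENTS = {
    "d1": "The president signed the trade agreement on Monday.",
    "d2": "The prime minister met the pope in Rome.",
    "d3": "The football club signed a new striker.",
}

INDEX = build_concept_index(DOCUMENTS)
RESULTS = search(INDEX, "World Leaders")
print(RESULTS)
```

In this sketch the query matches d1 and d2 although neither document contains the phrase "world leaders", because the match happens on the concept label assigned during analysis and not on the words of the query.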

Text analysis is already proving itself in early warning systems, customer service centers and medical applications. In these fields, text analysis solutions are used to uncover the connections between different types of information and facts that are hidden at different points in different documents and files. In one case, a company used the UIMA platform to develop a text mining solution that allows car manufacturers to process the unstructured information contained in warranty repair claims, maintenance records, repair requests and customer service call logs. The aggregated information is used to give early warning of problems in products going to market.

Another company has developed a series of text analysis components that help reveal and identify criminal or terrorist activity. The system analyzes information such as field reports, bills of lading and wiretap transcripts, and cross-checks it against news articles, publications and data on international and local money transfers.

Roni Lampel adds that the work in Haifa focuses on semantic search, which uses knowledge or ideas and is the next stage set to replace keyword-based search. The semantic search technology developed in Haifa has proven itself in a series of international competitions, such as INEX, where it is used to search and extract semi-structured information from documents written in XML format.

The development of UIMA was accelerated by joint work with DARPA, the central research and development organization of the US Department of Defense. A number of leading universities and research institutes, as well as research and development organizations, contributed to the effort. Some of the participating universities, such as Carnegie Mellon, Columbia, Stanford and the University of Massachusetts, are already using UIMA in courses and research projects.

More than 15 software manufacturers have already announced that they will adopt UIMA on a commercial basis. These companies are expected to provide software applications that comply with the standard, along with solutions and services that address the special needs of different industries.

