Globalize your On Demand Business

LanguageWare is IBM's new-generation linguistic platform. It was designed from the ground up to address the demands posed by today's global applications.

Challenges

Extracting knowledge from unstructured information requires a high level of language understanding. The following examples illustrate the complexities that must be addressed:

  • Categorization organizes unstructured data into an understandable structure. This process determines the content and the context of documents so they can be grouped with similar documents. For example, an investment analyst following Virgin Express Holdings might want to review all news articles that quote the chairman of Virgin Express Holdings, Richard Branson. A search for both 'Richard Branson' and 'Virgin Express' might return results that include an article describing a Branson press conference regarding Virgin Express Holdings' strategy, an article about Branson's hot air ballooning endeavours, and the document you're now reading. Although each document refers to Richard Branson, the documents differ dramatically in context and relevance. To be effective, categorization technology needs to identify contextual differences and group the articles accordingly.
  • Search defines the content and context of requested information. For example, a keyword search on 'equity' on the Google search engine yields 8,460,000 matches, and the first 20 results cover a variety of topics, including gender equity, equity for actors, equity analytics, a real estate company, home loans, and venture capital. Further refining these search results with a sub-search on 'analytics' still produces 31,800 matches, with most of the first 20 results linking to information about companies with 'equity analytics' in their name. However, none of the entries were about the process of equity analytics. Keyword searches do not discriminate between word senses or take account of the context in which a word is used, so they cannot effectively find documents with context-specific information.
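
The limitation described in the search example can be reproduced on a small scale. The following Python sketch is illustrative only: the four-document corpus and the `keyword_search` function are hypothetical and are not part of LanguageWare or of any search engine. It shows that a plain keyword match returns every document containing 'equity', whatever sense the word carries.

```python
# Hypothetical mini-corpus; the ids and titles are invented for illustration.
DOCS = {
    "gender": "Gender equity in the workplace",
    "actors": "Equity union rules for actors",
    "loans": "Home equity loans explained",
    "analytics": "Equity analytics for investment portfolios",
}


def keyword_search(query, docs):
    """Return the ids of documents that contain every query term as a whole word."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        # A document matches purely on the presence of the words;
        # nothing here looks at word sense or surrounding context.
        if all(term in words for term in terms):
            hits.append(doc_id)
    return hits


# All four documents match, although they are about four different things.
print(keyword_search("equity", DOCS))
```

Adding a second keyword narrows the list, but the match is still made on the words alone, so a document that merely carries 'equity analytics' in a company name would be returned alongside one that explains the process.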

This search example is familiar to anyone who has used Web search engines, and it illustrates the limitations of simple keyword searches. For this reason, search vendors are working to improve precision through human language technology (HLT) techniques. These technologies allow the use of free-form questions, spelling assistance, and the identification of word variations (morphological, typographical, orthographical and derivational).
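
As a rough illustration of identifying word variations, the Python sketch below combines two simple techniques: crude suffix stripping for morphological variants and spelling similarity (from the standard `difflib` module) for typographical variants. The suffix list, the `crude_stem` and `match_variants` functions, and the sample vocabulary are assumptions made for this example. Production HLT components rely on far richer linguistic resources, and this is not a description of how LanguageWare works.

```python
import difflib

# Crude, illustrative suffix list; real morphological analysis uses lexicons and rules.
SUFFIXES = ("ing", "es", "s", "ed")


def crude_stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def match_variants(term, vocabulary, cutoff=0.8):
    """Return vocabulary words that share a crude stem with `term` or are spelled similarly."""
    stem = crude_stem(term)
    same_stem = [w for w in vocabulary if crude_stem(w) == stem]
    close = difflib.get_close_matches(term.lower(), vocabulary, n=5, cutoff=cutoff)
    # Merge both lists, dropping duplicates while keeping first-seen order.
    return list(dict.fromkeys(same_stem + close))


vocabulary = ["search", "searches", "searching", "equity", "request"]
print(match_variants("searched", vocabulary))  # morphological variants
print(match_variants("equty", vocabulary))     # typographical variant
```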

In the same way that categorization engines differentiate context from text, a natural language search attempts to discern context based on the user's request, which can then be matched to the context of the appropriately categorized documents.

Ideally, the search results should also include an explanation of why each result was returned. This provides value in two ways. First, it raises the user's confidence that the result set is relevant to their needs. Second, if the result set is not targeted enough, the explanation might suggest changes to the query that would yield better results.
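
One simple way to attach an explanation to each result is to record which query terms a document matched and which it did not. The Python sketch below assumes a plain keyword matcher. The `ExplainedHit` class, the `search_with_explanations` function, and the sample documents are hypothetical names and data introduced for illustration, not part of any IBM product.

```python
from dataclasses import dataclass


@dataclass
class ExplainedHit:
    doc_id: str
    matched: list  # query terms found in the document
    missing: list  # query terms not found in the document

    def explanation(self):
        text = f"{self.doc_id}: matched {', '.join(self.matched)}"
        if self.missing:
            text += f"; not found: {', '.join(self.missing)}"
        return text


def search_with_explanations(query, docs):
    """Return hits that match at least one query term, best matches first."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        matched = [t for t in terms if t in words]
        if matched:
            missing = [t for t in terms if t not in words]
            hits.append(ExplainedHit(doc_id, matched, missing))
    # Documents matching more terms come first; ties keep corpus order.
    hits.sort(key=lambda hit: len(hit.matched), reverse=True)
    return hits


docs = {
    "loans": "Home equity loans explained",
    "analytics": "Equity analytics for investment portfolios",
}
for hit in search_with_explanations("equity analytics", docs):
    print(hit.explanation())
```

The "not found" part of each explanation is what can prompt the user to refine the query, for example by dropping or replacing a term that none of the top results contain.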


