The future of natural-language processing

March 29, 2001, 06:13 PM —  Unix Insider — 


Ontology: A formal, explicit specification of how to represent the objects, concepts, and other entities in a particular system, as well as the relationships between them
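As a rough illustration of that definition, an ontology can be pictured as a set of subject-predicate-object statements, which is also the general shape RDF uses. The sketch below is only a toy: the entity names, the `is_a` and `has_part` relations, and the helper functions are assumptions made up for this example, not part of any real ontology or RDF library.

```python
# Toy ontology: entities and their relationships stored as
# (subject, predicate, object) triples. All names are illustrative.
TRIPLES = {
    ("dog", "is_a", "mammal"),
    ("mammal", "is_a", "animal"),
    ("dog", "has_part", "tail"),
}


def objects_of(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def is_a(entity, category):
    """Follow is_a links transitively to test category membership."""
    seen = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        if current == category:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(objects_of(current, "is_a"))
    return False


if __name__ == "__main__":
    print(objects_of("dog", "has_part"))
    print(is_a("dog", "animal"))
```

Even this small example shows why explicit relationships matter: the fact that a dog is an animal is never stated directly, but it can be derived by following the links.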


Natural-language processing (NLP) is an area of artificial intelligence research that attempts to reproduce the human interpretation of language. NLP methodologies and techniques assume that the patterns in grammar and the conceptual relationships between words in language can be articulated scientifically. The ultimate goal of NLP is to determine a system of symbols, relations, and conceptual information that can be used by computer logic to implement artificial language interpretation.


Natural-language processing has its roots in semiotics, the study of signs. Semiotics was developed by Charles Sanders Peirce (a logician and philosopher) and Ferdinand de Saussure (a linguist). Semiotics is broken up into three branches: syntax, semantics, and pragmatics.


A complete natural-language processor extracts meaning from language on at least seven levels. Here we'll focus on the four main ones:


Morphological: A morpheme is the smallest part of a word that can carry a discrete meaning. Morphological analysis works with words at this level. Typically, a natural-language processor knows how to understand multiple forms of a word: its plural and singular, for example.


Syntactic: At this level, natural-language processors focus on structural information and relationships.


Semantic: Natural-language processors derive an absolute (dictionary definition) meaning from context.


Pragmatic: Natural-language processors derive knowledge from external commonsense information.
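To make the first of these levels concrete, here is a minimal sketch of morphological analysis that reduces English plurals to their singular form. It is a deliberately naive, assumption-laden example: the `IRREGULAR` table and the suffix rules are hypothetical and cover only a handful of cases, whereas a real morphological analyzer relies on far larger lexicons and rule sets.

```python
# Irregular plurals are handled by lookup; regular ones by suffix rules.
# Both the table and the rules are small illustrative samples.
IRREGULAR = {"children": "child", "mice": "mouse", "feet": "foot"}


def singular(word):
    """Return a best-guess singular form of an English noun."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    # "ontologies" -> "ontology"
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    # "boxes" -> "box", "classes" -> "class"
    if w.endswith(("ches", "shes", "sses", "xes")):
        return w[:-2]
    # "parsers" -> "parser", but leave words such as "grass" alone
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


if __name__ == "__main__":
    for noun in ["parsers", "ontologies", "boxes", "children", "grass"]:
        print(noun, "->", singular(noun))
```

The higher levels are much harder to reduce to a few rules like these, which is part of why the syntactic, semantic, and pragmatic levels remain the difficult ones.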


A practical reality?



The realization of a fully communicating artificial intelligence was long considered a science fiction fantasy. However, with the advent of the World Wide Web, XML, and the World Wide Web Consortium's (W3C) RDF, NLP could become a pervasive reality. With powerful Web crawlers needing to index an exponentially growing collection of resources, it's no surprise that information management and data querying are areas that might benefit immensely from NLP.


So, why hasn't NLP escaped a backdrop of impractical artificial intelligence software implementations? How does XML technology fit into all this?


Natural-language limitations



One of the major limitations of modern NLP is that most linguists approach NLP at the pragmatic level by gathering huge amounts of information into large knowledge bases that describe the world in its entirety. These academic knowledge repositories are defined in ontologies that take on a life of their own and never end up in practical, widespread use. There are various knowledge bases, some commercial and some academic. The largest and most ambitious is the Cyc Project. The Cyc Knowledge Server is a monstrous inference engine and knowledge base. Even natural-language modules that perform specific, limited, linguistic services aren't financially feasible for use by the average developer.


In general, NLP faces the following challenges:

  • Physical limitations: The greatest challenge to NLP is representing a sentence or group of concepts with absolute precision. The realities of computer software and hardware limitations make this challenge nearly insurmountable. The realistic amount of data necessary to perform NLP at the human level requires a memory space and processing