The future of natural-language processing

By Chimezie Thomas-Ogbuji, Unix Insider

Ontology: A formal, explicit specification of how to represent the objects, concepts, and other entities in a particular system, as well as the relationships between them


Natural-language processing (NLP) is an area of artificial intelligence research that attempts to reproduce the human interpretation of language. NLP methodologies and techniques assume that the patterns in grammar and the conceptual relationships between words in language can be articulated scientifically. The ultimate goal of NLP is to determine a system of symbols, relations, and conceptual information that can be used by computer logic to implement artificial language interpretation.


Natural-language processing has its roots in semiotics, the study of signs. Semiotics was developed by Charles Sanders Peirce (a logician and philosopher) and Ferdinand de Saussure (a linguist). Semiotics is broken up into three branches: syntax, semantics, and pragmatics.


A complete natural-language processor extracts meaning from language on at least seven levels. However, we'll focus on the four main levels.


Morphological: A morpheme is the smallest part of a word that can carry a discrete meaning. Morphological analysis works with words at this level. Typically, a natural-language processor knows how to understand multiple forms of a word: its plural and singular, for example.


Syntactic: At this level, natural-language processors focus on the structure of a sentence: how words group into phrases and how those phrases relate to one another grammatically.


Semantic: Natural-language processors use context to settle on the absolute (dictionary-definition) meaning of each word.


Pragmatic: Natural-language processors derive knowledge from external commonsense information.
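
The four levels above can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the lexicon, the sense table, the commonsense facts, and the function names are hypothetical stand-ins, and a real processor would need far richer resources at every level.

```python
# Toy walk-through of the four levels. The lexicon, sense table, and
# commonsense facts are hypothetical stand-ins, not real linguistic data.

LEXICON = {
    "the": "DET", "fisherman": "NOUN", "bank": "NOUN", "river": "NOUN",
    "sit": "VERB", "by": "PREP",
}

# Each sense of an ambiguous word lists cue words that suggest it.
SENSES = {
    "bank": {
        "financial institution": {"money", "loan", "deposit"},
        "edge of a river": {"river", "water", "fisherman"},
    }
}

COMMONSENSE = {"fisherman": "a fisherman is usually found near water"}


def morphological(word):
    """Return (root, was_inflected) using naive suffix stripping."""
    word = word.lower()
    if word not in LEXICON and word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1], True
    return word, False


def syntactic(roots):
    """Tag each root with a part of speech from the lexicon."""
    return [LEXICON.get(root, "UNKNOWN") for root in roots]


def semantic(word, roots):
    """Pick the sense whose cue words overlap most with the context."""
    context = set(roots)
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))


def pragmatic(roots):
    """Look up background knowledge that the sentence itself never states."""
    return [COMMONSENSE[root] for root in roots if root in COMMONSENSE]


def analyze(sentence):
    roots = [morphological(word)[0] for word in sentence.split()]
    return {
        "roots": roots,
        "tags": syntactic(roots),
        "senses": {w: semantic(w, roots) for w in roots if w in SENSES},
        "background": pragmatic(roots),
    }


result = analyze("The fisherman sits by the river bank")
```

Here `result["senses"]` resolves "bank" to the river sense only because "river" and "fisherman" appear among the hand-picked cue words, which hints at how much hand-built knowledge even a single sentence requires.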


A practical reality?



The realization of a fully communicating artificial intelligence was long considered a science fiction fantasy. However, with the advent of the World Wide Web, XML, and the World Wide Web Consortium's (W3C) Resource Description Framework (RDF), NLP could become a pervasive reality. With powerful Web crawlers needing to index an exponentially growing collection of resources, it's no surprise that information management and data querying are areas that might benefit immensely from NLP.


So why hasn't NLP moved beyond impractical artificial intelligence implementations? And how does XML technology fit into all this?


Natural-language limitations



One of the major limitations of modern NLP is that most linguists approach it at the pragmatic level, gathering huge amounts of information into large knowledge bases that try to describe the world in its entirety. These academic knowledge repositories are defined in ontologies that take on a life of their own and never end up in practical, widespread use. There are various knowledge bases, some commercial and some academic. The largest and most ambitious is the Cyc Project. The Cyc Knowledge Server is a monstrous inference engine and knowledge base. Even natural-language modules that perform specific, limited linguistic services aren't financially feasible for use by the average developer.
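
To make the idea of a knowledge base paired with an inference engine concrete, here is a minimal Python sketch. It is not how Cyc or any other product is implemented; the triple format, the facts, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical facts stored as (subject, relation, object) triples.
FACTS = {
    ("canary", "is_a", "bird"),
    ("bird", "is_a", "animal"),
    ("bird", "can", "fly"),
    ("animal", "can", "breathe"),
}


def ancestors(concept, facts):
    """Follow is_a links transitively and return every broader concept."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in facts:
            if subject == current and relation == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def abilities(concept, facts):
    """Infer abilities stated for the concept or inherited from ancestors."""
    kinds = {concept} | ancestors(concept, facts)
    return {obj for subject, relation, obj in facts
            if relation == "can" and subject in kinds}


canary_abilities = abilities("canary", FACTS)
```

No fact says outright that a canary can breathe; the sketch infers it by walking the `is_a` chain. Scaling that approach from four facts to a description of the whole world is what makes such projects so large.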


In general, NLP faces the following challenges:

  • Physical limitations: The greatest challenge to NLP is representing a sentence or group of concepts with absolute precision. The realities of computer software and hardware limitations make this challenge nearly insurmountable. The amount of data realistically needed to perform NLP at the human level requires memory space and processing capacity beyond even the most powerful computer processors.

  • No unifying ontology: NLP suffers from the lack of a unifying ontology that addresses semantic as well as syntactic representation. The various competing ontologies serve only to slow the advancement of knowledge management.

  • No unifying semantic repository: NLP lacks an accessible and complete knowledge base that describes the world in the detail necessary for practical use. The most successful commercial knowledge bases are limited to licensed use and have little chance of wide adoption. Even those with the most academic intentions develop at an unacceptable pace.

  • Current information retrieval systems: Most current information retrieval systems suffer from semantic overload. Web crawlers, limited by their method of indexing, more often than not return incorrect matches as a result of ambiguous interpretation.
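
The ambiguity problem in the last item can be seen in a minimal Python sketch of keyword indexing. The pages, their IDs, and the function names are invented for illustration, and real crawlers are far more elaborate; the sketch only assumes that matching is done on word forms rather than meanings.

```python
from collections import defaultdict

# Hypothetical pages; the IDs and text are invented for this sketch.
PAGES = {
    "zoo": "the jaguar is a large cat native to the americas",
    "cars": "the jaguar dealership sells luxury cars",
    "team": "the jaguar mascot leads the football team onto the field",
}


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the pages containing every query word (pure keyword match)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
hits = search(index, "jaguar")
```

A search for "jaguar" returns all three pages, because the index records only that the word occurs and not which sense is meant. Adding another keyword such as "cat" narrows the match, but the burden of disambiguation still falls on the person typing the query.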

1 comment

    Anonymous 3 years ago
    A good tool is the Cypher transcoder, an NLP Semantic Web application that produces SPARQL and RDF from plain language.
