The Development and Testing of a Natural Language Processor that Identifies Various Parts of Speech



The purpose of this experiment is to design and test a natural language processor that breaks a sentence apart and identifies the part of speech of each word with at least 50% accuracy on first-grade-level sentences and at least 30% accuracy on second-grade-level sentences. A natural language processor is a computer program that can interpret a natural language such as English, Spanish, or French.

Hypothesis 1 states that first-grade-level sentences will be tagged with an accuracy of 50% or higher. Hypothesis 2 states that second-grade-level sentences will be tagged with an accuracy higher than 30%. Hypothesis 3 states that first-grade-level sentences will be tagged with a higher accuracy than second-grade-level sentences.
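The abstract does not say how accuracy was scored. One plausible reading, assumed here, is the percentage of words in a sentence whose part of speech was identified correctly. The function name accuracyPercent and the tag strings are illustrative and do not come from the project.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed scoring rule: percentage of words whose predicted part of speech
// matches the expected one. Returns 0.0 if the two lists differ in length
// or are empty, since no meaningful score exists in that case.
double accuracyPercent(const std::vector<std::string>& predicted,
                       const std::vector<std::string>& expected) {
    if (predicted.size() != expected.size() || expected.empty()) {
        return 0.0;
    }
    std::size_t correct = 0;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (predicted[i] == expected[i]) {
            ++correct;
        }
    }
    return 100.0 * static_cast<double>(correct) /
           static_cast<double>(expected.size());
}
```

Under this rule, a four-word sentence with three correctly identified words scores 75%, which would satisfy the 50% threshold in Hypothesis 1.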

The natural language processor was written in C++ and uses strings, maps, and lists. In each test, the user typed a sentence into the program's dialog; the program then broke the sentence apart, identified the part of speech of each word, and displayed a report on the sentence, including the type of sentence, the number of words, the length of the sentence, and the parts of speech.
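The approach described above can be sketched with the same building blocks (strings, a map, and a list). This is a minimal illustration under stated assumptions, not the project's actual code: the tiny lexicon, the function names (normalize, splitWords, tagWords, sentenceType), and the rule that classifies a sentence by its final punctuation mark are all hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <list>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical lexicon mapping a lowercase word to its part of speech.
// A real processor would need a far larger table or additional rules.
const std::map<std::string, std::string>& lexicon() {
    static const std::map<std::string, std::string> table = {
        {"the", "article"},   {"a", "article"},
        {"dog", "noun"},      {"cat", "noun"},      {"ball", "noun"},
        {"runs", "verb"},     {"sees", "verb"},
        {"big", "adjective"}, {"red", "adjective"},
    };
    return table;
}

// Lowercase a word and drop non-letter characters such as punctuation.
std::string normalize(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (std::isalpha(c)) {
            out += static_cast<char>(std::tolower(c));
        }
    }
    return out;
}

// Break a sentence into normalized words, splitting on whitespace.
std::list<std::string> splitWords(const std::string& sentence) {
    std::istringstream in(sentence);
    std::list<std::string> words;
    std::string raw;
    while (in >> raw) {
        std::string word = normalize(raw);
        if (!word.empty()) {
            words.push_back(word);
        }
    }
    return words;
}

// Pair each word with its part of speech, or "unknown" if it is not listed.
std::list<std::pair<std::string, std::string>> tagWords(const std::string& sentence) {
    std::list<std::pair<std::string, std::string>> tagged;
    for (const std::string& word : splitWords(sentence)) {
        auto it = lexicon().find(word);
        tagged.emplace_back(word, it != lexicon().end() ? it->second : "unknown");
    }
    return tagged;
}

// Assumed rule: classify the sentence by its last non-whitespace character.
std::string sentenceType(const std::string& sentence) {
    std::size_t last = sentence.find_last_not_of(" \t\r\n");
    if (last == std::string::npos) return "statement";
    if (sentence[last] == '?') return "question";
    if (sentence[last] == '!') return "exclamation";
    return "statement";
}
```

Calling tagWords("The big dog runs.") yields four word and tag pairs, and the size of that list gives the word count for the report. A pure lookup table like this cannot resolve words that serve as more than one part of speech, which is one reason accuracy might fall on harder sentences.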

The data did not support Hypotheses 1 and 2, and both were rejected. The data did support Hypothesis 3: first-grade-level sentences had a higher accuracy than second-grade-level sentences, so it was accepted.