The Development and Testing of a Natural Language Processor that Identifies Various Parts of Speech

Contents

Procedure

The PC used in this project is a Dell with a 200 MHz Pentium processor, 64 MB of RAM, and a 4 GB hard drive, running Windows 84 with SP 1.
  1. Install Visual C++ v.6.0 on the PC
  2. Create the project so you can add the necessary files
  3. In global.h (or the main C++ header) write the code to initialize the sentence structure:
    1. Use:
      enum sentence_types{unknown_type, statement, question};
      enum PartsOfSpeech{unknown_pos, adjective, pronoun, noun, adverb, verb, interjection, conjunction, preposition};
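      // word_template is defined first so that sentence_template can hold a vector of it
      struct word_template {
       int length;
       map mine;
       string word;
      };
      struct sentence_template {
       int num_words;
       int length;
       vector<word_template> words;  // one entry per word, appended in parse()
       sentence_types mine;
       string sentence;
      };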
  4. In main.cpp (or the main C++ file) write the code to get user input and separate words:
    1. Use:
      #include
      #include
      #include
      using namespace std;
    2. Create a temporary word:
      word_template tempword;
    3. In main(), use:
      cin.getline(data,255);
    4. In parse(), write the code that splits the input into individual words. As each word is separated, store it in tempword.word and then append it to the sentence:
      sentence.words.push_back(tempword);
  5. Write the code to identify the words as different parts of speech:
    1. In identify(bool unpunc), write the code to remove the last character of a word when it is a punctuation mark.
    2. Write the code to determine each word's part of speech from a fixed set of rules and, if no rule applies, to look the word up in a dictionary.
  6. Write the code to report the structure of the sentence.
  7. Compile the program as Win32 Release.
  8. Run the program.
  9. Enter randomly chosen sentences from first- and second-level books into the processor.
  10. Record the accuracy of the processor.
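
Steps 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not the project's actual parse() or identify() code: the function names, the single "-ly" suffix rule, and the lowercase dictionary keys are all assumptions made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical enum for this sketch; it mirrors the parts of speech in step 3.
enum PartOfSpeech { pos_unknown, adjective, pronoun, noun, adverb, verb,
                    interjection, conjunction, preposition };

// Split one line of input into whitespace-separated words.
std::vector<std::string> split_words(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string w;
    while (in >> w) words.push_back(w);
    return words;
}

// Drop one trailing punctuation mark if the word ends with one.
std::string strip_punctuation(const std::string& word) {
    if (!word.empty() &&
        std::ispunct(static_cast<unsigned char>(word[word.size() - 1])))
        return word.substr(0, word.size() - 1);
    return word;
}

// Rule step. A single suffix rule is shown purely as an example (assumption).
PartOfSpeech apply_rules(const std::string& word) {
    if (word.size() > 3 && word.compare(word.size() - 2, 2, "ly") == 0)
        return adverb;
    return pos_unknown;
}

// Try the rules first; if none applies, fall back to the dictionary.
// Dictionary keys are assumed to be lowercase.
PartOfSpeech identify_word(const std::string& raw,
                           const std::map<std::string, PartOfSpeech>& dictionary) {
    std::string word = strip_punctuation(raw);
    for (std::size_t i = 0; i < word.size(); ++i)
        word[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(word[i])));

    PartOfSpeech guess = apply_rules(word);
    if (guess != pos_unknown) return guess;

    std::map<std::string, PartOfSpeech>::const_iterator it = dictionary.find(word);
    return it == dictionary.end() ? pos_unknown : it->second;
}
```

A caller would pass each word returned by split_words() to identify_word() together with a dictionary it has filled in. Trying the rules before the dictionary keeps the dictionary small, at the cost of occasional wrong guesses when a rule is too broad.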

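The accuracy recorded in step 10 can be computed as the percentage of words whose identified part of speech matches a hand-checked answer. The sketch below assumes that definition; the struct and function names are hypothetical, and the integers stand in for PartsOfSpeech values.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record for one tested word: what the processor reported
// versus the hand-checked answer.
struct TagResult {
    int predicted;
    int expected;
};

// Percentage of words tagged correctly; returns 0 when nothing was tested.
double accuracy_percent(const std::vector<TagResult>& results) {
    if (results.empty()) return 0.0;
    std::size_t correct = 0;
    for (std::size_t i = 0; i < results.size(); ++i)
        if (results[i].predicted == results[i].expected) ++correct;
    return 100.0 * correct / results.size();
}
```

For example, three correct tags out of four tested words give an accuracy of 75 percent.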

    Abstract