Reducing the Complexity of Parsing by a Method of Decomposition.
The complexity of parsing English sentences can be reduced by decomposing the problem into three subtasks. Declarative sentences can almost always be segmented into three concatenated sections: pre-subject, subject and predicate. Other constituents, such as clauses, phrases and noun groups, are contained within these sections and do not normally cross the boundaries between them. A constituent in one section may have dependency links to elements in other sections, such as agreement between the head of the subject and the main verb. Nevertheless, once the three sections have been located, they can be partially processed separately and in parallel. An information-theoretic analysis is used to support this approach. If sentences are represented as sequences of part-of-speech tags, then modelling them with the tripartite segmentation reduces the entropy, which indicates that some of the structure of the sentence has been captured. The tripartite segmentation can be produced automatically by the ALPINE parser, which is then described. This is a hybrid processor in which neural networks operate within a rule-based framework. It has been developed using corpora from technical manuals. Performance on unseen data from the manuals on which the processor was trained is over 90%. On data from other technical manuals, performance is over 85%.
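
The entropy argument can be illustrated with a minimal sketch. It is not the analysis used in the paper: it assumes a simple unigram measure over part-of-speech tags, and the toy sentences, tag names, segment labels and function names are all invented for illustration. It compares the entropy of tags pooled over whole sentences with the entropy of tags conditioned on the section (pre-subject, subject, predicate) in which they occur.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def tag_entropy(sentences):
    """H(tag): entropy of tags pooled over all positions, ignoring sections."""
    return entropy(Counter(tag for sent in sentences for _, tag in sent))


def tag_entropy_given_segment(sentences):
    """H(tag | section): per-section entropies weighted by section frequency."""
    by_segment = {}
    for sent in sentences:
        for segment, tag in sent:
            by_segment.setdefault(segment, Counter())[tag] += 1
    total = sum(sum(c.values()) for c in by_segment.values())
    return sum(sum(c.values()) / total * entropy(c) for c in by_segment.values())


# Invented toy data: each token is (section label, part-of-speech tag).
TOY = [
    [("pre", "ADV"), ("subj", "DET"), ("subj", "NOUN"),
     ("pred", "VERB"), ("pred", "DET"), ("pred", "NOUN")],
    [("subj", "DET"), ("subj", "ADJ"), ("subj", "NOUN"),
     ("pred", "VERB"), ("pred", "ADV")],
    [("pre", "PREP"), ("pre", "NOUN"), ("subj", "PRON"),
     ("pred", "VERB"), ("pred", "NOUN")],
]

h_flat = tag_entropy(TOY)
h_seg = tag_entropy_given_segment(TOY)
print(f"H(tag)           = {h_flat:.3f} bits")
print(f"H(tag | section) = {h_seg:.3f} bits")
```

Conditioning on the section can never increase the entropy, and it lowers it whenever the tag distribution differs between sections (here, for example, verbs occur only in the predicate). The size of the reduction on real corpora depends on the model and data, which this sketch does not attempt to reproduce.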