File senten1.icn

Summary

###########################################################################

	File:     senten1.icn

	Subject:  Procedure to generate sentences

	Author:   Peter A. Bigot

	Date:     August 14, 1996

###########################################################################

   This file is in the public domain.

###########################################################################

 sentence(f) generates the English sentences encountered in a file.

###########################################################################

 The following rules describe what a 'sentence' is.
 
 * A sentence begins with a capital letter.
 
 * A sentence ends with a run of one or more of the characters '.!?',
   subject to the constraints below.
 
 * If a period is immediately followed by:
   - a digit
   - a letter
   - one of ',;:'
   it is not a sentence end.
 
 * If a period is followed (after intervening spaces) by a lowercase
   letter, it is not a sentence end; the period is assumed to be part of
   an abbreviation.

 * The sequence '...' does not end a sentence.  The sequence '....' does.
 
 * If a sentence end character appears after more opening parens than
   closing parens in a given sequence, it is not the end of that
   particular sentence. (I.e., full sentences in a parenthetical remark
   in an enclosing sentence are considered part of the enclosing
   sentence.  Their grammaticality is in question, anyway.) (It also
   helps with attributions and abbreviations that would fail outside
   the parens.)

 * No attempt is made to ensure balancing of double-quoted (") material.
 
 * When scanning for a sentence start, material which does not conform is
   discarded.
 
 * Corollary: Quotes or parentheses which enclose a sentence are not
   considered part of it.
 
 * An end-of-line on input is replaced by a space, unless the line ends
   with 'a-' (where 'a' is any letter), in which case the hyphen is
   deleted and the two lines are joined directly.

 * Leading and trailing whitespace characters (tab, space, newline) are
   removed from each line of the input.

 * If a blank line is encountered on input while scanning a sentence,
   the scan is aborted and the search for a new sentence begins
   (rationale: ignore section and chapter headers separated from the
   text by blank lines).

 * Most titles before names would fail the above constraints.  They are
   special-cased.

 * This does NOT handle names written with a middle initial.  Doing so
   would rule out sentences such as 'It was I.'  Six of one, half a dozen
   of the other; I made my choice.

 * Note that ':' does not end a sentence.  This is a stylistic choice,
   and can be changed by adding ':' to sentend in the source code.

###########################################################################
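The end-of-line rules above (strip each line, join hyphenated words, turn other line ends into spaces, never continue across a blank line) can be sketched as a small preprocessing step. This is a minimal Python sketch under stated assumptions, not the Icon code in senten1.icn: the helper name `joined_blocks` is hypothetical, and it models "a blank line aborts the scan" by yielding one string per run of non-blank lines, so no sentence can span a blank line.

```python
from typing import Iterable, Iterator


def joined_blocks(lines: Iterable[str]) -> Iterator[str]:
    """Join input lines following the end-of-line rules (hypothetical helper).

    Not part of senten1.icn.  Yields one string per run of non-blank lines,
    because a blank line aborts any sentence being scanned.
    """
    buf = ""
    for raw in lines:
        line = raw.strip(" \t\n")  # drop leading/trailing tab, space, newline
        if not line:
            # Blank line: nothing may continue across it.
            if buf:
                yield buf.rstrip(" ")
            buf = ""
            continue
        buf += line
        if len(line) >= 2 and line[-1] == "-" and line[-2].isalpha():
            buf = buf[:-1]  # letter + '-' at end of line: delete the hyphen
        else:
            buf += " "      # any other end-of-line becomes a single space
    if buf:
        yield buf.rstrip(" ")
```

For example, the lines `"The quick brown fox jum-"`, `"ped over"`, `"the dog."`, a blank line, and `"Chapter 2"` produce two blocks: `"The quick brown fox jumped over the dog."` and `"Chapter 2"`. A hyphen preceded by a space (`"well -"`) is kept, since the rule only applies to a letter followed by '-'.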
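The rules for deciding whether a run of '.!?' ends a sentence (period followed by a digit, letter, or ',;:'; period followed by a lowercase word; the '...' exception; unbalanced parentheses) can be illustrated with the sketch below. It is a hypothetical Python rendering, not the procedure's actual logic: the names `SENTEND`, `is_sentence_end`, and `sentence_end_positions` are illustrative, the order in which the checks are applied is an assumption, and the capital-letter start rule and the special-casing of titles are not modeled.

```python
from typing import List

SENTEND = ".!?"  # terminator characters; adding ":" here would let ':' end sentences


def is_sentence_end(text: str, start: int, end: int, depth: int) -> bool:
    """Decide whether the terminator run text[start:end] ends a sentence.

    depth is the number of '(' not yet closed at this point in the text.
    """
    if depth > 0:
        return False  # inside a parenthetical: part of the enclosing sentence
    run = text[start:end]
    if run == "...":
        return False  # an ellipsis never ends a sentence; '....' may
    if run.endswith("."):
        nxt = text[end:end + 1]
        # Period immediately followed by a digit, a letter, or one of ',;:'.
        if nxt and (nxt.isalnum() or nxt in ",;:"):
            return False
        # Period followed, after spaces, by a lowercase letter: abbreviation.
        if text[end:].lstrip(" \t")[:1].islower():
            return False
    return True


def sentence_end_positions(text: str) -> List[int]:
    """Return the index just past each terminator run that ends a sentence."""
    ends: List[int] = []
    depth = 0  # simplification: counted across the whole string, not per sentence
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        elif ch in SENTEND:
            j = i
            while j < len(text) and text[j] in SENTEND:
                j += 1  # extend over the whole run, e.g. '?!' or '....'
            if is_sentence_end(text, i, j, depth):
                ends.append(j)
            i = j
            continue
        i += 1
    return ends
```

With this sketch, `"He left (It rained. Hard.) early. Ok."` yields ends only after `early.` and `Ok.`, and `"See e.g. the list. Done."` does not split at `e.g.`. Because the period checks apply to any run ending in '.', a '....' followed by a lowercase word would not end a sentence here; that is one possible reading of how the rules interact.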
Procedures:
sentence

This file is part of the (main) package.


Details
Procedures:

sentence(infile)



This page produced by UniDoc on 2021/04/15 @ 23:59:54.