File idxGen.icn

Summary

Procedures:
Alt, Log, Timer, Warn, cores, deSpace, entireFile, extendedSearch, filtered, getIndexTerms, main, multiSearch, nextParagraph, noPath, nonplus, postIndex, usage, waitFor

Records:
multiIndex, occurrence, paragraph

Global variables:
floatIdxHits, hits, idx, idxCandidates, idxDir, idxTerms, idx_ex, logging, maxPlus, noComplaints, perFileidx, perFileidx_ex, stats, suppressed, whitespace, writeNewFile

Imports:
threads, xml

Links:
options.icn, ximage.icn

This file is part of the (main) package.

Source code.

Details
Procedures:

Alt(s)

--------------------------------------------------------------------------------
 Generate alternatives from s
   A(B|b)cd            ->   ABcd, Abcd
   word(|s)            ->   word, words
   (E|e)at(en||ing)    ->   Eaten, Eat, Eating, eaten, eat, eating
   science( |-)fiction ->   science fiction, science-fiction
 etc.
   A malformed line results in "???"
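
 The expansion rule above can be illustrated with a small Python sketch. It is
 not the Unicon source of Alt: the name alt, the use of a regular expression,
 and the assumption that groups are never nested are all illustrative choices.
 It reproduces the orderings shown in the examples and returns "???" for
 unbalanced or nested parentheses.

```python
import itertools
import re

def alt(s):
    """Expand each (a|b|...) group in s; return ["???"] for malformed input."""
    # Splitting on a capturing group yields literal text at even indexes
    # and the body of each parenthesised group at odd indexes.
    parts = re.split(r"\(([^()]*)\)", s)
    # Any parenthesis left in the literal text means the input was
    # unbalanced or nested (assumption: nesting is not supported).
    if any("(" in p or ")" in p for p in parts[0::2]):
        return ["???"]
    choices = [p.split("|") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]

print(alt("(E|e)at(en||ing)"))
```

 The first group varies slowest, which matches the order given for
 (E|e)at(en||ing) above.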


Log(p)

--------------------------------------------------------------------------------


Timer(t)

--------------------------------------------------------------------------------


Warn(m)

--------------------------------------------------------------------------------


cores()

--------------------------------------------------------------------------------
   Return the number of cores reported by &features


deSpace(s)

--------------------------------------------------------------------------------
 Replace each run of whitespace with a single plus sign
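
 A minimal Python sketch of this behaviour, assuming that "whitespace" means
 any run of blanks, tabs or newlines and that leading and trailing runs are
 treated like any other (the page does not say). The name de_space is
 illustrative only.

```python
import re

def de_space(s):
    """Collapse every run of whitespace in s into one plus sign."""
    return re.sub(r"\s+", "+", s)

print(de_space("science   fiction"))
```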


entireFile(name)

--------------------------------------------------------------------------------
   Return a string containing the entire contents of a file


extendedSearch(s)

--------------------------------------------------------------------------------
 Succeed if s is an extended search command: that is, ignoring any leading plus
 signs, it contains a plus sign before the end of the string or before a colon.
 Also succeed if it contains a space (which will be turned into a plus sign).
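
 One possible reading of this test, sketched in Python. It assumes that only
 the text before the first colon is examined, for spaces as well as for plus
 signs; the page does not state this for spaces. The name extended_search is
 illustrative and this is not the Unicon implementation.

```python
def extended_search(s):
    """Return True if s looks like an extended (multi-word) search."""
    body = s.lstrip("+")             # leading plus signs do not count
    head = body.split(":", 1)[0]     # assumption: stop at the first colon
    return "+" in head or " " in head

print(extended_search("Bill+Ben+3+Flowerpot"))
```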


filtered(line:string)

--------------------------------------------------------------------------------
 Succeed if the line should not be analysed


getIndexTerms(filename:string)

--------------------------------------------------------------------------------
 analyse the XML index configuration file


main(args)

--------------------------------------------------------------------------------


multiSearch(s)

--------------------------------------------------------------------------------
 Analyse "aaa+bbb+ccc ..."    or  "a+4+bbb ..." and produce a list
  ["aaa",1,"bbb",1,"ccc" ...] or  ["a", 4, "bbb" ...]
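
 The parsing described above can be sketched in Python as follows. The sketch
 assumes that an all-digit token between two words is a distance, that the
 default distance is 1, and that empty tokens and a trailing number are
 ignored; none of these edge cases is specified on this page. The name
 multi_search is illustrative.

```python
def multi_search(s):
    """Turn "aaa+bbb+3+ccc" into ["aaa", 1, "bbb", 3, "ccc"]."""
    result = []
    distance = None                  # explicit distance waiting for the next word
    for token in s.split("+"):
        if not token:
            continue                 # assumption: skip empty tokens
        if token.isdigit() and result:
            distance = int(token)    # a number between two words
        else:
            if result:
                result.append(1 if distance is None else distance)
            result.append(token)
            distance = None
    return result

print(multi_search("Bill+Ben+3+Flowerpot"))
```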


nextParagraph(p:paragraph)

--------------------------------------------------------------------------------
 Gobble text up to the next paragraph break; split it into lines and words.
 For each word, store the line of occurrence and its ordinal position.
 The consumed text is removed from the source string.
 The data is returned to the caller inside the paragraph structure.
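
 A Python sketch of the steps above, under stated assumptions: a paragraph
 break is taken to be a blank line, line numbers and ordinal positions are
 1-based, and the ordinal position counts words across the whole paragraph
 rather than within a line. The class and function names are illustrative
 stand-ins for the paragraph and occurrence records, not the Unicon code.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Occurrence:
    line: int    # 1-based line number within the paragraph
    wpos: int    # 1-based ordinal position within the paragraph (assumption)

@dataclass
class Paragraph:
    source_text: str
    lines: list = field(default_factory=list)
    words: dict = field(default_factory=dict)
    in_progress: set = field(default_factory=set)

def next_paragraph(p):
    """Consume one paragraph from p.source_text; return p, or None at the end."""
    text = p.source_text.lstrip("\n")
    if not text:
        p.source_text = ""
        return None
    # Assumption: a blank line (possibly holding blanks or tabs) ends a paragraph.
    m = re.search(r"\n[ \t]*\n", text)
    block, rest = (text[:m.start()], text[m.end():]) if m else (text, "")
    p.source_text = rest             # the consumed text is removed
    p.lines = block.splitlines()
    p.words = {}
    wpos = 0
    for line_no, line in enumerate(p.lines, start=1):
        for word in line.split():
            wpos += 1
            p.words.setdefault(word, []).append(Occurrence(line_no, wpos))
    return p

p = Paragraph("the cat\nsat on the mat\n\nnext para\n")
next_paragraph(p)
print(p.lines, p.words["the"])
```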


noPath(path)

--------------------------------------------------------------------------------


nonplus(s)

--------------------------------------------------------------------------------
 return s stripped of leading "+" chars
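
 In Python terms this is a one-line operation. The helper leading_plus_count
 is hypothetical; it is included only to show how a count such as maxPlus
 could be derived from the same prefix.

```python
def nonplus(s):
    """Return s without its leading "+" characters."""
    return s.lstrip("+")

def leading_plus_count(s):
    """Hypothetical helper: number of leading "+" characters in s."""
    return len(s) - len(nonplus(s))

print(nonplus("++Bill+Ben"), leading_plus_count("++Bill+Ben"))
```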


postIndex(filename:string, waiter)

--------------------------------------------------------------------------------
 index a file by looking for index terms and placing \index commands beforehand


usage(s)

--------------------------------------------------------------------------------


waitFor(nMess, secs)


Records:

multiIndex(indexTerm, searchList)

 A multiple word search record contains the index term to be used plus the
 search (a list of words interspersed with the distances between them).
 So "Bill+Ben+3+Flowerpot" results in searchList being
     ["Bill", 1, "Ben", 3, "Flowerpot"]


occurrence(line, wpos)


paragraph(sourceText, lines, words, inProgress)

 A paragraph record holds the remaining text of the whole file, together with
 a list of lines of the current paragraph and a map from each word to a list of
 occurrences in the paragraph. An occurrence is a (line number, ordinal position) pair.
 inProgress is a set of index terms that are being defined by
 \PrimaryIndexBegin{term} ... \PrimaryIndexEnd{term}. Normal index hits for
 term are suppressed in between the PrimaryIndexBegin ... End lines.


Global variables:
floatIdxHits -- Float index hits to outside iconline etc.

hits -- total number of index insertions

idx -- map from search string -> index term

idxCandidates -- All possible index terms

idxDir -- output directory

idxTerms -- index terms that were placed in the index

idx_ex -- map from extended search string -> (index term, search list)

logging -- enable progress/debug information on &errout

maxPlus -- The highest number of leading "+" chars seen in a search term

noComplaints -- Suppress "not found" for these index terms

perFileidx -- map from filename -> (map from search string -> index command)

perFileidx_ex -- map from filename -> (map from extended search string -> (index term, search list))

stats -- index statistics

suppressed -- map from filename to set of terms (which are suppressed for that file)

whitespace

writeNewFile -- write a new file (to &output if idxDir is null)
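
The nested maps described above can be pictured with Python dictionaries. Every
key and value below is hypothetical (the file name, the index terms and the
exact form of the index command are not taken from idxGen.icn); only the shapes
follow the descriptions on this page.

```python
from collections import namedtuple

# Stand-in for the multiIndex record (indexTerm, searchList).
MultiIndex = namedtuple("MultiIndex", ["indexTerm", "searchList"])

# idx: search string -> index term
idx = {"flowerpot": "Flowerpot"}

# idx_ex: extended search string -> (index term, search list)
idx_ex = {
    "Bill+Ben+3+Flowerpot": MultiIndex(
        "Flowerpot Men", ["Bill", 1, "Ben", 3, "Flowerpot"]
    )
}

# perFileidx: filename -> (search string -> index command)
per_file_idx = {"chapter1.tex": {"flowerpot": r"\index{Flowerpot}"}}

# perFileidx_ex: filename -> (extended search string -> (index term, search list))
per_file_idx_ex = {"chapter1.tex": dict(idx_ex)}

# suppressed: filename -> set of terms suppressed for that file
suppressed = {"chapter1.tex": {"Ben"}}

print(per_file_idx_ex["chapter1.tex"]["Bill+Ben+3+Flowerpot"].searchList)
```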


This page produced by UniDoc on 2021/04/15 @ 23:59:54.