Class BlockRead
Provide an encapsulation class for reading an ASCII text file in large chunks, while guaranteeing that each chunk holds an integral number of lines (i.e., each chunk ends with a newline).
This is a specialized class for reading large text files efficiently. For example, compare the performance of the following two programs, run on the same hardware and input file:

    procedure main(args)
        i := 0
        every i +:= (!&input, 1)
        write(&errout, i)
    end

and

    import util

    procedure main(args)
        bRead := BlockRead()
        i := 0
        while s := bRead.readBlock() do {
            every i +:= (upto('\n', s), 1)
        }
        write(&errout, i)
    end
On sufficiently large files, the second program runs three times faster. (Note that this example is contrived: using the function reads() would work just as well and would be slightly faster.)
The following program cannot easily be replaced by one using reads():
    import util

    procedure main(args)
        wFreq := table(0)
        every wFreq[genWords(BlockRead().genBlocks())] +:= 1
        every wPair := !reverse(sort(wFreq, 2)) \ 20 do {
            write(right(wPair[2], 10), ": ", wPair[1])
        }
    end

can be almost twice as fast (see caveat one, below) as:

    import util

    procedure main(args)
        wFreq := table(0)
        every wFreq[genWords(!&input)] +:= 1
        every wPair := !reverse(sort(wFreq, 2)) \ 20 do {
            write(right(wPair[2], 10), ": ", wPair[1])
        }
    end
Caveat one: Reading in large chunks of lines is not always better. The operations you perform on those chunks strongly influence overall program efficiency, and in some cases large chunks may slow your program down. Choosing a good block size for your application is an art.
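Since the right block size depends on the work done per block, one practical approach is simply to measure. The following is a minimal sketch, not part of the util package: the procedure name timeBlockSizes is hypothetical, the newline count stands in for your application's real per-block processing, and the timings come from the &time keyword (elapsed CPU milliseconds).

```unicon
import util

# Hypothetical helper, not part of the util package: time a simple
# newline count over fname once for each block size in the list sizes.
# Returns a table mapping each block size to elapsed CPU milliseconds.
procedure timeBlockSizes(fname, sizes)
    local size, t0, br, block, count, results
    results := table()
    every size := !sizes do {
        t0 := &time
        br := BlockRead(fname, size)
        count := 0
        # Stand-in for real per-block work: count the newlines.
        every block := br.genBlocks() do
            every upto('\n', block) do count +:= 1
        results[size] := &time - t0
    }
    return results
end
```

For example, r := timeBlockSizes("big.txt", [4096, 65536, 1048576]) followed by every p := !sort(r) do write(p[1], ": ", p[2], " ms") prints one timing per candidate size; the file name and sizes here are only illustrative.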
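Caveat two: You cannot easily mix reads made through this class with reads of the same file made by other functions, because the characters that follow the last newline of a block are held in the class's read-ahead buffer instead of remaining in the file.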
Details
Constructor
BlockRead(fileOrName, blockSize)
fileOrName: file or filename to read from. Defaults to &input.
Provide an instance of BlockRead for reading from fileOrName in chunks of at most blockSize characters at a time.
The first argument may be an already opened file.
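A minimal usage sketch of the constructor, relying only on the interface documented on this page: countNewlines is a hypothetical helper that, like BlockRead itself, accepts either a filename or an already opened file, and it depends on genBlocks() defaulting its size limit to blockSize.

```unicon
import util

# Hypothetical helper: count newline characters in fileOrName
# (a filename or an open file), reading it in blocks of at most
# blockSize characters through BlockRead.
procedure countNewlines(fileOrName, blockSize)
    local br, block, count
    br := BlockRead(fileOrName, blockSize)
    count := 0
    # genBlocks() with no argument uses blockSize as its limit.
    every block := br.genBlocks() do
        every upto('\n', block) do count +:= 1
    return count
end
```

For example, countNewlines("data.txt", 65536) and countNewlines(&input, 65536) should both work, since the first argument may be either a filename or an open file.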
Methods:

genBlocks(n)
n: maximum amount to read. Defaults to blockSize.
Generates: blocks from the file as strings.
Generate blocks of at most n characters from the file, using the same criteria for defining a block as readBlock.

Returns: the read-ahead buffer.
Produce the current read-ahead buffer. This buffer contains the input characters that were read during the previous block read but followed the last newline in that block.
This is a convenience method to help when block input needs to be mixed with non-block reads.

readBlock(n)
n: maximum amount to read. Defaults to blockSize.
Returns: a string of up to n characters from the file, ending with a newline.
Fails if unable to read any characters.
Read at most n characters from the file, but always terminate the read at the last newline before n characters are reached.
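The splitting rule readBlock follows (keep everything through the last newline and hold back the rest) can be illustrated with plain string operations. This is a hypothetical sketch of the idea, not the library's code; in particular, failing when a chunk contains no newline is an assumed policy, since this documentation does not say how lines longer than n are handled.

```unicon
# Hypothetical sketch, not the util package's implementation:
# split a raw chunk into [block, remainder], where block runs through
# the last newline and remainder is what a read-ahead buffer would hold.
# Fails if chunk contains no newline (an assumed policy).
procedure splitBlock(chunk)
    local cut
    # After this loop, cut is the position of the last newline, if any.
    every cut := upto('\n', chunk)
    if /cut then fail
    return [chunk[1:cut + 1], chunk[cut + 1:0]]
end
```

For example, splitBlock("ab\ncd\nef") produces the list ["ab\ncd\n", "ef"], and splitBlock("xy\n") produces ["xy\n", ""].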

nBuf: new contents for the read-ahead buffer.
Set the current read-ahead buffer; its previous value is lost. It is difficult to imagine a use for this method other than emptying the read-ahead buffer (the "preread") when mixing block reads with normal input operations.
Fields: