Class BlockRead

Summary

Provides an encapsulation class for reading an ASCII text file in large chunks while guaranteeing that each chunk holds an integral number of lines (i.e., each chunk ends in a newline).

This is a specialized class for reading large text files efficiently. For example, compare the performance of the following two programs on the same hardware and input file:

     procedure main(args)
         i := 0
         every i +:= (!&input, 1)
         write(&errout, i)
     end

and

     import util

     procedure main(args)
         bRead := BlockRead()
         i := 0
         while s := bRead.readBlock() do {
             every i +:= (upto('\n',s),1)
             }
         write(&errout, i)
     end

On sufficiently large files, the second program runs three times faster. (Note that this example is contrived: the function reads() would work just as well and would be slightly faster.)

The following program cannot easily be replaced by one using reads():

     import util

     procedure main(args)
         wFreq := table(0)
         every wFreq[genWords(BlockRead().genBlocks())] +:= 1
         every wPair := !reverse(sort(wFreq,2)) \ 20 do {
             write(right(wPair[2],10),": ",wPair[1])
             }
     end

and can be almost twice as fast (see caveat one, below) as:

     import util

     procedure main(args)
         wFreq := table(0)
         every wFreq[genWords(!&input)] +:= 1
         every wPair := !reverse(sort(wFreq,2)) \ 20 do {
             write(right(wPair[2],10),": ",wPair[1])
             }
     end

Caveat one: It is not always better to read in large chunks of lines - the actions you perform on those chunks have a large influence on overall program efficiency and using large chunks may, in some cases, slow your program down! Choosing a good block size for use in your application is an art.
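
One way to look for a suitable block size is simply to time the same job at several sizes. The following is a minimal sketch, not part of the class: the three sizes are arbitrary, the program name in the usage message is hypothetical, and the line-counting loop stands in for whatever work your application does on each block. It uses the &time keyword, which reports elapsed CPU time in milliseconds.

     import util

     procedure main(args)
         local fname, size, f, bRead, s, n, start
         fname := args[1] | stop("usage: blocktime filename")
         # Arbitrary candidate sizes; adjust to suit the application.
         every size := 4096 | 65536 | 1048576 do {
             f := open(fname) | stop("cannot open ", fname)
             bRead := BlockRead(f, size)
             n := 0
             start := &time
             # Stand-in workload: count the newlines in every block.
             while s := bRead.readBlock() do
                 every upto('\n', s) do n +:= 1
             write(&errout, right(size, 8), " chars/block: ", n,
                   " lines in ", &time - start, " ms")
             close(f)
             }
     end

Timings from a run like this only apply to the workload measured, so repeat the comparison with the real per-block processing in place.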

Caveat two: You cannot easily mix reads using this class with reads of the same file using other functions.

Superclasses:
Object

Package:
util
File:
blockread.icn
Methods:
genBlocks, getReadahead, readBlock, setReadahead

Methods inherited from Object:
Type, className, clone, equals, fieldNames, genMethods, getField, get_class, get_class_name, get_id, hasField, hasMethod, hash_code, instanceOf, invoke, is_instance, setField, to_string

Fields:
bSize, buffer, f


Details
Constructor

BlockRead(fileOrName, blockSize)

Parameter:
fileOrName
file or filename to read from. Defaults to &input.

Provides an instance of BlockRead for reading chunks of up to blockSize characters at a time from fileOrName.

The first argument may be an already opened file.
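
As an illustration, the sketch below constructs a reader from a file name and an explicit block size, then totals the characters delivered by genBlocks(). The block size of 65536 is an arbitrary value chosen for the example, and the sketch assumes that passing a null first argument selects the documented default, &input.

     import util

     procedure main(args)
         local fname, bRead, total
         # If no argument is given this assignment fails and fname stays
         # null, which (by assumption) lets fileOrName default to &input.
         fname := args[1]
         bRead := BlockRead(fname, 65536)     # 65536 is an arbitrary block size
         total := 0
         every total +:= *bRead.genBlocks()   # add the length of each block
         write(&errout, total, " characters read")
     end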

Methods:

genBlocks(n)

Parameter:
n
maximum amount to read. Defaults to blockSize
Generates:
blocks from the file as strings

Generate blocks of at most n characters from the file, using the same criteria for defining a block as in readBlock.


getReadahead()

Returns:
read-ahead buffer

Produce the current read-ahead buffer. This buffer contains input characters that were read on the previous block read, but followed the last newline in the block.

This is a convenience method to help when input needs to be mixed with non-block reads.


readBlock(n)

Parameter:
n
maximum amount to read. Defaults to blockSize
Returns:
string of up to n characters from the file, terminating with a newline
Fails:
if unable to read any characters

Read in at most n characters from the file. The read always ends at the last newline found before the n-character limit is reached.
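
The general technique can be sketched without the class at all. The following stand-alone program is only an illustration of the idea and is not the BlockRead source: readChunk and carry are hypothetical names, and returning the whole chunk when it contains no newline is a choice made for this sketch, not documented behavior of readBlock.

     # Stand-alone illustration only; not the BlockRead implementation.
     global carry     # text left over after the last newline of the previous chunk

     procedure readChunk(f, n)
         local s, last
         /carry := ""
         # Prepend the leftover, then top up to at most n characters in total.
         s := carry || (reads(f, n - *carry) | "")
         if *s = 0 then fail                  # nothing left to return
         every last := upto('\n', s)          # ends holding the last newline position
         if /last then {                      # no newline at all: hand back everything
             carry := ""
             return s
             }
         carry := s[last+1:0]                 # keep the partial line for the next call
         return s[1:last+1]                   # chunk ends exactly at that newline
     end

     procedure main(args)
         local f, chunk, count
         f := open(args[1]) | &input
         count := 0
         while chunk := readChunk(f, 8192) do count +:= 1
         write(&errout, count, " chunks")
     end

The leftover text held in carry plays the same role as the read-ahead buffer described under getReadahead().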


setReadahead(nBuf)

Parameter:
nBuf
new contents for the read-ahead buffer

Set the current read-ahead buffer. The previous value is lost. It is difficult to imagine a use for this method except to empty the read-ahead buffer when mixing block reads with normal input operations.
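
A hedged sketch of that one use: read a single block, then continue line by line with read() on the same file. It assumes that the class reads directly from the file passed to the constructor, that the buffer may be null before it holds any data, and that setting it to the empty string clears it. The program name in the usage message is hypothetical.

     import util

     procedure main(args)
         local f, bRead, block, pending, line
         f := open(args[1]) | stop("usage: mixread filename")
         bRead := BlockRead(f)
         block := bRead.readBlock() | stop("empty input")
         writes(block)                        # the block already ends in a newline

         # Characters already taken from f that follow the block's last newline.
         pending := \bRead.getReadahead() | ""
         bRead.setReadahead("")               # assumption: "" empties the buffer

         # read(f) resumes where the block read stopped, so the next full
         # line is the pending prefix followed by what read() returns.
         if line := read(f) then
             write(pending, line)
         else if *pending > 0 then
             write(pending)                   # final line had no newline
     end

Forgetting to prepend the pending text would silently drop the start of the next line, which is why caveat two warns against mixing the two styles of input.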


Fields:
bSize

buffer

f


This page produced by UniDoc on 2021/04/15 @ 23:59:53.