File vhttp.icn

Summary

###########################################################################

	File:     vhttp.icn

	Subject:  Procedure for validating an HTTP URL

	Author:   Gregg M. Townsend

	Date:     October 17, 1997

###########################################################################

   This file is in the public domain.

###########################################################################

	vhttp(url) validates a URL (a World Wide Web link) of HTTP: form
	by sending a request to the specified Web server.  It returns a
	string containing a status code and message.  If the URL is not
	in the proper form, or if it does not specify the HTTP: protocol,
	vhttp() fails.
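
	The form check can be illustrated with a minimal sketch.  It is
	written in Python rather than Icon and is not the library's own
	parsing code; the name split_http_url, the default port of 80, and
	the exact acceptance rules are assumptions made for illustration.

```python
from urllib.parse import urlsplit

def split_http_url(url):
    """Return (host, port, path) for an http: URL, or None if it is not one.

    Hypothetical helper: the rules vhttp() really applies may differ.
    """
    try:
        parts = urlsplit(url)
        port = parts.port or 80          # .port raises ValueError on a bad port
    except ValueError:
        return None
    if parts.scheme.lower() != "http" or not parts.hostname:
        return None                      # wrong protocol or no host: reject
    path = parts.path or "/"             # an empty path means the server root
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path
```

	Returning None here plays the role that failure plays in Icon: the
	caller gets no result when the URL is not a usable HTTP: link.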

###########################################################################

	vhttp(url) makes a TCP connection to the Web server specified by the
	URL and sends a HEAD request for the specified file.  A HEAD request
	asks the server to check the validity of a request without sending
	the file itself.

	The response code from the remote server is returned.  This is
	a line containing a status code followed by a message.  Here are
	some typical responses:

		200 OK
		200 Document follows
		301 Moved Permanently
		404 File Not Found

	See the HTTP protocol spec for more details.  If a response cannot
	be obtained, vhttp() returns one of these invented codes:

		551 Connection Failed
		558 No Response
		559 Empty Response
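
	A rough sketch of how a reply might be reduced to one of these
	strings follows, in Python rather than Icon.  The helper names are
	hypothetical.  Which condition produces 558 and which produces 559
	is an assumption (no data at all versus a blank first line), as is
	stripping a leading "HTTP/1.x" token from the status line.

```python
import socket

def status_from_reply(reply):
    """Turn the raw bytes of a server reply into a 'code message' string."""
    if not reply:                        # assumption: nothing arrived at all
        return "558 No Response"
    lines = reply.decode("latin-1").splitlines()
    first = lines[0].strip() if lines else ""
    if not first:                        # assumption: first line was blank
        return "559 Empty Response"
    # Drop a leading "HTTP/1.x " token so only text like "200 OK" remains.
    if first.upper().startswith("HTTP/"):
        first = first.partition(" ")[2].strip()
    return first or "559 Empty Response"

def contact(host, port, request, timeout=10.0):
    """Send request bytes to host:port and return a status string."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:                      # refused, unreachable, timed out, ...
        return "551 Connection Failed"
    with sock:
        try:
            sock.sendall(request)
            reply = sock.recv(4096)      # the status line is short; one read
        except OSError:
            reply = b""
    return status_from_reply(reply)
```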

###########################################################################

	The request sent to the Web server can be parameterized by setting
	two global variables.

	The global variable vhttp_agent is passed to the Web server as the
	"User-agent:" field of the HEAD request; the default value is
	"vhttp.icn".

	The global variable vhttp_from is passed as the "From:" field of the
	HEAD request, if set; there is no default value.
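
	The shape of such a request can be sketched as below, in Python
	rather than Icon.  The function build_head_request is hypothetical;
	the HTTP/1.0 version string and the "Host:" line are assumptions
	and are not stated in this documentation.  Only the "User-agent:"
	and "From:" fields, and the "vhttp.icn" default, come from the
	description above.

```python
def build_head_request(host, path, agent="vhttp.icn", from_addr=None):
    """Build the bytes of a HEAD request for path on host."""
    lines = [
        f"HEAD {path} HTTP/1.0",         # assumption: protocol version
        f"Host: {host}",                 # assumption: not mentioned in the source
        f"User-agent: {agent}",          # corresponds to vhttp_agent
    ]
    if from_addr is not None:            # corresponds to vhttp_from, if set
        lines.append(f"From: {from_addr}")
    # A blank line terminates the request headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

	Leaving from_addr unset omits the "From:" line entirely, which
	mirrors the absence of a default for vhttp_from.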

###########################################################################

	vhttp() contains deliberate bottlenecks to prevent a naive program
	from causing annoyance or disruption to Web servers.  No remote
	host is contacted more than once a second, and no individual file
	is actually requested more than once a day.

	The request rate is limited to one per second by keeping a table
	of contacted hosts and delaying if necessary so that no host is
	contacted more than once in any particular wall-clock second.
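
	The per-host delay described above can be sketched as follows, in
	Python rather than Icon.  This is an illustration of the idea, not
	the code of vhttp_waitclock(); the names wait_for_turn and
	_last_contact and the 50 ms polling interval are assumptions.

```python
import time

_last_contact = {}   # host -> wall-clock second of the most recent contact

def wait_for_turn(host, clock=time.time, sleep=time.sleep):
    """Block until host has not yet been contacted in the current second."""
    now = int(clock())
    while _last_contact.get(host) == now:
        sleep(0.05)                      # poll until the clock second changes
        now = int(clock())
    _last_contact[host] = now            # record this contact for next time
```

	Because the table is keyed by host, requests to different servers
	are not delayed by one another; only repeated contact with the
	same host within one clock second waits.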

	Duplicate requests are prevented by using a very simple cache.
	The file $HOME/.urlhist is used to record responses, and these
	responses are reused throughout a single calendar day.  When the
	date changes, the cache is invalidated.
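
	A one-day cache of this kind can be sketched as below, in Python
	rather than Icon.  The class DayCache is hypothetical, and the file
	layout (a date on the first line, then one tab-separated key and
	response per line) is an assumption; the real format of
	$HOME/.urlhist is not described here.

```python
import datetime

class DayCache:
    """Responses keyed by URL, valid only for the calendar day they were stored."""

    def __init__(self, path, today=None):
        self.path = path
        self.today = today or datetime.date.today().isoformat()
        self.entries = {}
        self._load()

    def _load(self):
        try:
            with open(self.path, encoding="utf-8") as f:
                lines = f.read().splitlines()
        except FileNotFoundError:
            lines = []
        if lines and lines[0] == self.today:
            for line in lines[1:]:       # same day: reuse recorded responses
                key, sep, val = line.partition("\t")
                if sep:
                    self.entries[key] = val
        else:
            # Missing file or stale date: start a fresh history for today.
            with open(self.path, "w", encoding="utf-8") as f:
                f.write(self.today + "\n")

    def get(self, key):
        return self.entries.get(key)     # None when the key is not cached

    def add(self, key, val):
        self.entries[key] = val
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{key}\t{val}\n")
```

	A caller would consult get() before contacting a server and call
	add() after each real request, so a second run on the same day
	finds every response already recorded.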

	These mechanisms are crude, but they are good enough to avoid
	overloading remote Web servers.  In particular, a program that
	uses vhttp() can be run repeatedly with the same data, and after
	the first run on a given day it places no further load on the
	Web servers referenced.

	The cache file, of course, can be defeated by deleting or editing
	it.  The most likely reason for doing so would be to retry
	connections that failed to complete on the first attempt.

###########################################################################

  Links:  cfunc

###########################################################################

  Requires: Unix, dynamic loading

###########################################################################
Procedures:
vhttp, vhttp_addhist, vhttp_contact, vhttp_histval, vhttp_inithist, vhttp_waitclock

Global variables:
vhttp_agent, vhttp_from, vhttp_hfile, vhttp_htable

Links:
cfunc.icn

This file is part of the (main) package.


Details
Procedures:

vhttp(url) -- validate HTTP: URL


vhttp_addhist(key, val)


vhttp_contact(host, port, path)


vhttp_histval(key)


vhttp_inithist()


vhttp_waitclock(host)


Global variables:
vhttp_agent -- User-agent:

vhttp_from -- From:

vhttp_hfile

vhttp_htable


This page produced by UniDoc on 2021/04/15 @ 23:59:54.