The Problem

Have you taken a look at the pop3client script? Although it does its job, it is not fit for unattended production use.

I noticed the problem when I was fetching mail from my POP3 server and the modem disconnected for some reason. Since my Linux box reaches the Internet through a separate modem router, it didn't notice the disconnection, and pop3client waited for more input from the server ... and waited and waited. There is no timeout handling inside gawk. It was never built in because gawk usually gets its input from other programs, and if the writing side of a pipe disappears, the operating system makes sure the pipe reader sees end of file.

But before we drop gawk and look for something better: gawk already offers what we need for a solution, namely co-processes.

What are co-processes?

Normal UNIX pipes have a clear reader/writer relationship: one process (the pipe writer) writes what the other process (hence called the pipe reader) reads. Information flows in one direction only, from the writer to the reader:

# ls -l | wc

is an example of such a pipe: the ls command sends its output to wc, which counts its lines, words and characters.

Co-processes, in contrast, have no such one-way structure: both processes talk to each other. TCP/IP services are a good example of a co-process structure; for instance, a POP3 client sends commands to a POP3 server, which sends replies back to the client. Most shells have no operator for co-processes comparable to the pipe operator (the Korn shell's |& is a notable exception).

Pipes were available in the first release of awk (as far as I know). The following awk code is an example of a pipe in awk:

for (i in names)
	print i | "sort";

close ("sort");

It sends the indexes of the associative array names to the sort command, which sorts them and prints the result as soon as it sees the end of its input, that is, when close() is called.

Co-processes have a different notation. Notice the "|&" operator instead of the pipe bar "|" in the following example.

cmd = "sort";
for (i in names)
	print i |& cmd;

close (cmd, "to");
while ((cmd |& getline) > 0)
	print $0;

close (cmd);

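This time the awk script also reads the output of sort and prints it itself (it could just as well do something else with the sorted list). The call close(cmd, "to") is what makes this work: it closes only the writing side of the two-way pipe, so sort sees end of file and emits the sorted list, while the awk script can still read from cmd.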

Coming back to pop3client, co-processes could be used like this:

pop3 = "connect pop3.server 110";
pop3 |& getline;                  # read the server's greeting
print "user " myusername |& pop3;
pop3 |& getline;                  # read the reply to the USER command

Here gawk no longer talks directly to the TCP/IP server but only to its co-process, connect, which does the network communication, including watching for timeouts. Of course, this requires connect as an additional helper program.

Networking by co-processes

So instead of doing timeout-unaware networking from within gawk, we let gawk talk only to the connect program, which does the networking for us, with timeout handling. If the client/server communication now times out, connect terminates, which signals end of file to the calling gawk script, and the script can act on it.

With this in mind, let's go back to pop3client and change

pop3 = "/inet/tcp/0/" server "/110";

to

pop3 = "connect " server ":110";

to make pop3client timeout-aware. That's all. connect adds more TCP/IP features to gawk; see its manpage. If you're not interested in these and are only looking for timeout control, the netcat program nc might also work for you.

We will take a closer look at connect's other features later. If you want a preview, equip yourself with the FTP specification, RFC 959 (available from www.ietf.org), and take a look at the (as yet undocumented) ftpclient script, especially its data transmission functions.

< dag | at | awk-scripting.de >