Search:

chl - colour HTTP logfiles

Once you've got the basic idea it is easy to port this to other logfiles. But before this get's to boring we will add an easy to select the logfile lines we want to display.

Webserver logfiles are another logfile format. There is something like the "common logfile format" which is the original format of NCSA's webserver (from more than 10 years ago).

Common logfile format

The common logfile format is

1 - Common logfile format
Pos Data Comment
1 client the client's IP number.
2 ident-user the username from an ident lookup (this is usually not used).
3 username if the request was authenticated, the username is shown here.
4 date and time these values are given in angle bracket.
5 request method the HTTP request method.
6 request the client's request.
7 HTTP protocol the HTTP protocol for the request.
8 HTTP status the HTTP status code sent from the server to the client
9 response size number of bytes the server sent to the client

and that's it. Sounds simple but is actually a little bit difficult. There are angle bracket around the date, but a space inside, the triple request method, request string and protocol is enclosed in double quotes.

But before we go into the details there are some more fields in the logfile I use. The logfile format is configurable under apache and the format I use is known as "combined" format. Now the additional fields are

2 - Additional fields in the common logfile format
Pos Data Comment
10 referer the client's referer URL, enclosed in double quotes.
11 user-agent the client's user agent, enclosed in double quotes.

A short extract from my server's logfile is

  1 192.168.1.xx - - [01/Mar/2007:19:55:57 +0100] "GET /cgi-bin/wiki.cgi/syslog/00-WikiIndex?op=info HTTP/1.0" 200 11304 "http://192.168.1.12/cgi-bin/wiki.cgi/syslog/00-WikiIndex?op=edit&target=left" "Opera9.xx (Windows NT 2.7; U; de)"
  2 192.168.1.xx - - [01/Mar/2007:19:56:18 +0100] "GET /cgi-bin/wiki.cgi/?op=open&ph=1&pa=syslog%2F&s=ColouringWeblog&q1=Open&fs=-&m=- HTTP/1.0" 302 123 "http://192.168.1.12/cgi-bin/wiki.cgi/syslog/00-WikiIndex?op=info" "Opera9.xx (Windows NT 2.7; U; de)"
  3 192.168.1.xx - - [01/Mar/2007:19:56:18 +0100] "GET /cgi-bin/wiki.cgi/syslog/ColouringWeblog HTTP/1.0" - 175 "http://192.168.1.12/cgi-bin/wiki.cgi/syslog/00-WikiIndex?op=info" "Opera9.xx (Windows NT 2.7; U; de)"
  4 192.168.1.xx - - [01/Mar/2007:19:56:29 +0100] "GET /cgi-bin/wiki.cgi/?op=open&ph=1&pa=syslog%2F&s=ColouringWeblog&q1=Open&edit=yes&fs=-&m=- HTTP/1.0" 302 139 "http://192.168.1.12/cgi-bin/wiki.cgi/syslog/00-WikiIndex?op=info" "Opera9.xx (Windows NT 2.7; U; de)"
  5 192.168.1.xx - - [01/Mar/2007:19:56:29 +0100] "GET /cgi-bin/wiki.cgi/syslog/ColouringWeblog?op=edit HTTP/1.0" 200 7209 "http://192.168.1.12/cgi-bin/wiki.cgi/syslog/00-WikiIndex?op=info" "Opera9.xx (Windows NT 2.7; U; de)"
  6 192.168.1.xx - - [01/Mar/2007:19:56:36 +0100] "GET /cgi-bin/wiki.cgi/syslog/csl HTTP/1.0" 200 3000 "http://192.168.1.12/cgi-bin/wiki.cgi/syslog/00-WikiIndex" "Opera9.xx (Windows NT 2.7; U; de)"
  7 192.168.1.xx - - [01/Mar/2007:20:07:13 +0100] "POST /cgi-bin/wiki.cgi/syslog/ColouringWeblog HTTP/1.0" 401 479 "http://192.168.1.12/cgi-bin/wiki.cgi/syslog/ColouringWeblog?op=edit" "Opera9.xx (Windows NT 2.7; U; de)"
  8 192.168.1.xx - dag [01/Mar/2007:20:07:13 +0100] "POST /cgi-bin/wiki.cgi/syslog/ColouringWeblog HTTP/1.0" 302 123 "http://192.168.1.12/cgi-bin/wiki.cgi/syslog/ColouringWeblog?op=edit" "Opera9.xx (Windows NT 2.7; U; de)"
  9 192.168.1.xx - - [01/Mar/2007:20:07:13 +0100] "GET /cgi-bin/wiki.cgi/syslog/ColouringWeblog HTTP/1.0" 200 7689 "http://192.168.1.12/cgi-bin/wiki.cgi/syslog/ColouringWeblog?op=edit" "Opera9.xx (Windows NT 2.7; U; de)"
 10 192.168.1.xx - - [01/Mar/2007:20:08:11 +0100] "GET /cgi-bin/wiki.cgi/syslog/ColouringWeblog?op=edit&target=left HTTP/1.0" 200 7669 "http://192.168.1.12/cgi-bin/wiki.cgi/syslog/ColouringWeblog" "Opera9.xx (Windows NT 2.7; U; de)"

As you can see, request #7 and #9 are unauthenticated, but #8 is. It is also a POST request. Since POST is used to modify the server's content (the wiki) it requires authentication. Request #7, the first attempt to POST some data, was rejected by the sevrer with a 401 response. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for the definition of the various HTTP status codes.

The script's main part

Let's look at the chl script. As csl it has one block that processes each logfile line.

  1 
  2 /./ {
  3 	# Read the data fields ...
  4 
  5 	client  = $1;
  6 	ident   = $2;
  7 	user    = $3;
  8 	date    = $4 " " $5;
  9 
 10 	method  = substr($6, 2);
 11 	request = $7;
 12 	proto   = substr($8, 1, length($8) - 1);
 13 	status  = $9;
 14 	bytes   = $10
 15 	referer = $11;
 16 
 17 	gsub(/(^"|"$)/, "", referer);
 18 	if (match($0, /"([^"]*)"$/, x) == 0)
 19 		agent = "";
 20 	else
 21 		agent = x[1];
 22 
 23 
 24 	refhost = (match(referer, /^[a-z]+:\/\/([^/]+)\//, x) > 0)? x[1]: "";
 25 
 26 
 27 	# ... color the interesting parts ...
 28 
 29 	if (status ~ /^5/)
 30 		cstatus = __red status __normal;
 31 	else if (status ~ /^4/)
 32 		cstatus = __red __bold status __normal;
 33 	else
 34 		cstatus = status;
 35 
 36 	cref = match(refhost, refmarker)? __c6 bold referer __normal: referer;
 37 	cuser = (user != "-")? __blue user __normal: user;
 38 
 39 	crequest = parserequest(request, q);
 40 
 41 	# ... and print the line.
 42 	if (issub == 0  &&  expr == "")
 43 		printline();
 44 	}
 45 
1 - The script's main block

Since the scheme is basically the same as before in short:

The parserequest() and printline() functions are tailored for the HTTP logfile format of course.

  1 
  2 function parserequest(req, arr,   p, r, var, val, x) {
  3 	delete arr;
  4 	if (match(req, /^(.*)\?(.*)$/, x) == 0) {
  5 		p = __yellow req __normal;
  6 		return (p);	# no QUERY_STRING
  7 		}
  8 
  9 	p = __yellow x[1] __normal "?";
 10 	req = x[2];
 11 	while (match(req, /^([^=&]*)=([^&]*)(&)?/, x) > 0) {
 12 		p = p __grey x[1] __normal "=" __yellow __bold x[2]__normal x[3];
 13 		arr[x[1]] = x[2];
 14 		req = substr(req, RSTART+RLENGTH);
 15 		}
 16 
 17 	p = p req;
 18 	return (p);
 19 	}
 20 
 21 function printline() {
 22 	print client, ident, cuser, date, "\"" method, crequest, proto "\"", cstatus, bytes, \
 23 			"\"" cref "\"", "\"" agent "\"" __normal;
 24 	}
 25 

When you have verified the explanation, you'll notice the conditional printline() call. Before we look at this call chl:

# chl </path/to/http/logfile | head -10

Make sure that you feed the logfile with IO redirection, not as command line argument.

You should see the first 10 lines from your webserver logfile, coloured depending on their contents. Now when you tail -f the webserver logfile to watch it and you have a lot of icons and pictures on you server, the tail will just rush across the screen. Every request creates one entry in the logfile.

Now try

# tail -f /path/to/http/logfile | chl 'status >= 400'

this should reduce the output lines to only those that have a HTTP status code above 400 (request errors and server errors).

Notice: If you get the error message "sh: line 1: chl: command not found", call chl (for some reason we'll find below) the following way

# tail -f /path/to/http/logfile |
    PATH=.:$PATH chl 'status >= 400'

So obviously chl is able to take an expression from the command line. How is this working?

The BEGIN block

When chl is called the BEGIN block is executed first. It begins with the usual stuff and colour value initialisation (lines 3 - 16). Line 18 sets the refmarker pattern we already met above in the script's main block, line 36. Hosts in the referer field that match the refmarker pattern give the referer field a special colour in the output.

  1 
  2 BEGIN {
  3 	program = "chl";
  4 
  5 	STDERR = "/dev/stderr";
  6 	IGNORECASE = 1;
  7 
  8 	__bold   = "\033[01m";
  9 	__red    = "\033[31m";
 10 	__green  = "\033[32m";
 11 	__yellow = "\033[33m";
 12 	__blue   = "\033[34m";
 13 	__c5     = "\033[35m";
 14 	__c6     = "\033[36m";
 15 	__grey   = "\033[37m";
 16 	__normal = "\033[0m";
 17 
 18 	refmarker = "(freshmeat.net|\\.google\\.[a-z]+)$";
 19 	argi = 1;
 20 	issub = debug = 0;
 21 	while (argi < ARGC  &&  substr(ARGV[argi], 1, 1) == "-") {
 22 		options = nextarg("option");
 23 		if (options == "--")
 24 			break;
 25 
 26 		for (i = 2; i<=length(options); i++) {
 27 			c = substr(options, i, 1);
 28 			if (c == "0")
 29 				issub = 1;
 30 			else if (c == "d")
 31 				debug = 1;
 32 			else if (c == "q")
 33 				noprint = 1;
 34 			else {
 35 				printf ("%s: unkown option: -%s\n", program, c)
 36 				exit (1);
 37 				}
 38 			}
 39 		}
 40 
 41 	if (issub == 0) {
 42 		if (argi < ARGC)
 43 			expr = nextarg("expression");
 44 
 45 		if (expr == ".")
 46 			expr = "";
 47 		}
 48 
 49 	if (issub == 0  &&  expr != "") {
 50 		cmd = program;
 51 
 52 		p = namesubst(expr, "q");
 53 		if (noprint == 0)
 54 			p = p " { printline(); }";
 55 
 56 		cmd = cmd " --source " quote(p);
 57 
 58 		# Append flag that to prevent endless recursion.
 59 		cmd = cmd " -- -0";
 60 
 61 		# Append any command line arguments we have.
 62 		while (argi < ARGC)
 63 			cmd = cmd " " nextarg("command line argument");
 64 
 65 		if (debug != 0) {
 66 			printf ("cmd= %s\n", cmd);
 67 			exit (0);
 68 			}
 69 
 70 		rc = system(cmd);
 71 		exit (rc);
 72 		}
 73 	}
 74 
2 - The BEGIN block

Lines 19 - 39 process any command line options, lines 41 - 47 read (if "issub == 0") the logline select expression.

In line 49 the interesting part begins. If a select expression is given, the script construct a shell command line in lines 50 - 56:

Line 59 is a safety option to prevent that the chl script we are going to invoke, starts another script, lines 62 and 63 append all arguments from the current command line to the new and line 70 finally calls the script. This script reinvocation is also the reason for the PATH=.:$PATH command prefix above. If you have chl in a directory which is not in your PATH, the script can not be found when it is called the second time.

If you want to see the constructed command without actually running it, put the -d option on the command line:

# tail -f /path/to/http/logfile |
    PATH=.:$PATH chl -d 'status >= 400'

The interesting (and yet missing) functions are

  1 
  2 function quote(string) {
  3 	gsub(/'/, "'\\''", string);
  4 	string = "'" string "'";
  5 	return (string);
  6 	}
  7 
  8 function namesubst(string, arr,   p, x) {
  9 	while (match(string, /\$([a-z]+)/, x) > 0) {
 10 		p = p substr(string, 1, RSTART-1) arr "[\"" x[1] "\"]";
 11 		string = substr(string, RSTART + RLENGTH);
 12 		}
 13 
 14 	p = p string;
 15 	return (p);
 16 	}
 17 

Invocation

chl [options] expression [logfile ...]

colour HTTP server logfile lines in apache's combined logfile format. expression can be any valid awk expression to select the desired lines.

Command line options

-d
debug option: print the constructed chl command but do not execute.

-q
quiet option: do not append the call to printline() automatically. A print statement is then expected to be part of the expression.

Notice that if expression is empty, chl will not invoke a second script but process the line direct. Since expression is mandantory you may put a single dot as expression which is taken as empty expression.

< dag | at | awk-scripting.de >