Calling Shell Commands

Running shell commands from within gawk is common and simple: there is the system() function, the "|" pipe operator and the "|&" co-process operator. So why should anyone care? The reason is the command parameters.

Whenever gawk calls a program from the operating system, it starts a complete shell interpreter, which does the real work: parsing the command line, calling the desired program and passing the right arguments. The following example illustrates this.

cmd = "ls /etc/*"
while (cmd | getline > 0)
	...

close(cmd);

In the shell command "ls /etc/*" it is the shell interpreter that replaces `/etc/*' with all files and directories found in the /etc directory. The ls program itself does no wildcard expansion.
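The effect can be made visible with a scratch directory. The following is only a sketch, assuming a POSIX shell and some awk on the PATH (replace awk with gawk if you like); the directory and the file names a.txt and b.txt are made up for the demonstration. awk merely builds the string, and the shell started for the pipe expands the asterisk.

```shell
# Scratch directory with two known files (hypothetical names).
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"

# awk passes "echo <dir>/*" to a shell; that shell expands the wildcard.
expanded=$(awk -v d="$dir" 'BEGIN {
	cmd = "echo " d "/*"
	while ((cmd | getline line) > 0)
		print line
	close(cmd)
}')
echo "$expanded"
rm -r "$dir"
```

The output lists both file names although echo never looks at the file system.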

So far there is nothing really interesting. But things change when scripts pass user input as parameters to shell commands, as in the following simple example.

while (getline dir > 0) {
	cmd = "ls " dir;
	system(cmd);
	}

The code reads a directory name from its input and lists the directory's content by calling the ls command. At least this is what it should do. Consider a user who enters "/etc; echo 123". The shell interpreter will happily break the command line "ls /etc; echo 123" into two commands and execute them one after the other. That is, after the file listing you'll see "123" in your output. Be happy that the input was not "/etc; rm *" or something comparable. You might think that your users will not enter such directory names, but sometimes users make errors and sometimes they are hacking around. If your script is a CGI, the input doesn't even come from "your users".
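The problem is easy to reproduce with a harmless payload. This is a sketch only, assuming a POSIX shell and some awk on the PATH; the input line and the word INJECTED are invented for the demonstration, and "ls -d /" is used so that the listing is a single predictable line.

```shell
# The input line carries a second command after the semicolon.
out=$(printf '%s\n' '/ ; echo INJECTED' | awk '{
	cmd = "ls -d " $0	# unquoted user input, as in the example above
	system(cmd)
}')
echo "$out"
```

The captured output contains the listing of / followed by the word INJECTED, which was printed by the smuggled-in echo command.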

The solution to this problem is quoting. If you don't want any special character interpretation (read your shell's manpage), quote your shell arguments with single quotes. Inside single quotes there are no special characters, no backslash quoting etc. You can use any character inside single quotes except the single quote itself. Having one in an argument requires a little workaround.

#!/usr/bin/gawk -f
#

function quote(string) {
	gsub(/'/, "'\\''", string);
	string = "'" string "'";
	return (string);
	}

BEGIN {
	while (getline dir > 0) {
		cmd = "ls " quote(dir);
		printf ("cmd= %s\n", cmd) >>"/dev/stderr";
		print "*", cmd;
		}

	exit (0);
	}

The quote() function above does the trick. Whenever it finds a single quote in the string parameter, it terminates the quoted argument with a single quote, follows it immediately with a backslash-escaped single quote, and then opens the quoting again with another single quote. Finally the whole string is wrapped in single quotes. In other words, the quote() function removes all magic-ness from strings that are passed as shell command parameters.
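To see what quote() produces, the sketch below feeds it one awkward line and then lets a shell parse the result again. It assumes a POSIX shell, mktemp and some awk on the PATH; the sample input is made up, and the awk program is written to a temporary file only so that its single quotes do not collide with the shell's own quoting.

```shell
prog=$(mktemp)
cat > "$prog" <<'EOF'
function quote(string) {
	gsub(/'/, "'\\''", string)
	return "'" string "'"
}
{ print quote($0) }
EOF

# Quote a line that contains a single quote and a command separator.
quoted=$(printf '%s\n' "it's; echo 123" | awk -f "$prog")
echo "$quoted"

# A shell that parses the quoted text gets the original string back as one argument.
back=$(sh -c "printf '%s\n' $quoted")
echo "$back"
rm -f "$prog"
```

The second echo shows the unchanged input line; the "echo 123" part is never executed.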

Setting environment variables

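gawk can access environment variables by reading their values from the ENVIRON array. Unfortunately, in POSIX awk and in older gawk versions, modifying array values modifies only the array but not the corresponding environment variables. (gawk 4.2 and later, when not in POSIX mode, do pass changes made to ENVIRON on to the programs they start.)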

But there's a trick. Setting environment variables only makes sense when an external program is called later. So this program is not called directly but through the env(1) system command. env expects environment variable assignments as additional parameters appearing in front of the command that should be run:

# env var='Hello World!' /bin/sh -c 'echo "$var"'
Hello World!

The following awk function takes the values from the env array and puts them into a shell-compatible command line using the quote() function from above.

function mkcmd(cmd, env,   a, s, v) {
	for (v in env) {
		a = quote(v) "=" quote(env[v]);
		s = s " " a;
		}

	cmd = "env" s " " cmd;
#	printf (">> cmd= %s\n", cmd) >>"/dev/stderr";
	return (cmd);
	}

If cmd holds a shell command suitable for execution (that is, its command parameters are already quoted) and env is an array holding the additional environment assignments, the function above constructs the required env command line. You can check this by giving env without any parameters as the command to run, since env then prints the current environment.

#!/usr/bin/gawk -f
#

function quote(string) {
	gsub(/'/, "'\\''", string);
	string = "'" string "'";
	return (string);
	}

function mkcmd(cmd, env,   a, s, v) {
	for (v in env) {
		a = quote(v) "=" quote(env[v]);
		s = s " " a;
		}

	cmd = "env" s " " cmd;
	return (cmd);
	}

BEGIN {
	n = split("A B C D E F G H I J K", x, " ");
	while (getline > 0) {
		delete var;
		# Use at most n fields; there are only n variable names in x.
		for (i=1; i<=NF && i<=n; i++)
			var[x[i]] = $i;

		cmd = mkcmd("env", var);
		printf (">> cmd= %s\n", cmd);
		system(cmd);
		printf ("\n");
		}

	exit (0);
	}

The example above puts it all together.
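A shorter check of the same idea, as a sketch only: it assumes a POSIX shell, mktemp, env and some awk on the PATH, and the variable name MSG and its value are invented for the demonstration. The child process prints the variable from its own environment, so the value arrives untouched even though it contains a single quote and a semicolon.

```shell
prog=$(mktemp)
cat > "$prog" <<'EOF'
function quote(string) {
	gsub(/'/, "'\\''", string)
	return "'" string "'"
}

function mkcmd(cmd, env,   a, s, v) {
	for (v in env) {
		a = quote(v) "=" quote(env[v])
		s = s " " a
	}
	return "env" s " " cmd
}

BEGIN {
	var["MSG"] = "it's; echo 123"	# hypothetical name and value
	# The child shell prints MSG from the environment set up by env.
	system(mkcmd("sh -c 'printf \"%s\\n\" \"$MSG\"'", var))
}
EOF

result=$(awk -f "$prog")
echo "$result"
rm -f "$prog"
```

The command passed to mkcmd() is already quoted for the shell, as the function expects; only the environment assignments are quoted by mkcmd() itself.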

< dag | at | awk-scripting.de >