Implementation Notes
3q started as an "is this possible?" project. The task was to put awk script code into an HTML page. The "natural" (??) way would have been to extend gawk with C functions to do this, write my own awk interpreter, or ...
But this wouldn't have been what I call "the awk way". Instead of "fancy" libraries for all kinds of things, awk programmers are left with system() and (in gawk) bi-directional pipes. A lot of things that other languages do with library functions resembling system calls can be done with the shell programs that are around under UNIX. One just has to know how to use them. Of course there's the "cost" of running additional programs in terms of CPU cycles, but a more real problem is "parameter quoting".
The Quoting Problem
A lot of people don't even recognize this as a problem when they hear about it. So here's a brief explanation.
Suppose you are writing an awk script that gets a filename as input from the user and stores it in a variable fn. Then your script wants to get the file's size using the wc command. It is not important what exactly you want to do with the given file; the only thing that really counts is that you put the user argument into a shell command.
The awk code to count the bytes in a file is
cmd = sprintf ("wc -c %s", fn);
cmd | getline size;
close (cmd);
and if the filename is "/etc/passwd" (without the quotes) then the shell command is "wc -c /etc/passwd".
This is the intended use.
Now consider "/etc/passwd /etc/shadow" as input. In this case the constructed command is
"wc -c /etc/passwd /etc/shadow" which counts the bytes in two files. This is not bad, but not what was wanted. You get another example of "not intended" if you give "/dev/null; rm -rf /*" as input. In this case the shell command is
"wc -c /dev/null; rm -rf /*" which counts bytes and then tries to erase your operating system.
So first of all it is important to know that all kinds of funny, interesting, surprising and exciting stuff can happen if user input is passed as-is as a shell command line argument.
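To see the effect without risking anything, here is a minimal sketch that swaps the dangerous "rm -rf /*" for a harmless "echo INJECTED". The helper name run_lines() is made up for this illustration and is not part of 3q; the sketch assumes a POSIX sh with wc and echo available.

```awk
# run_lines() is a hypothetical helper: it hands cmd to the shell and
# returns everything the command wrote to stdout.
function run_lines(cmd,    out, line) {
    out = "";
    while ((cmd | getline line) > 0)
        out = out line "\n";
    close (cmd);
    return out;
}

BEGIN {
    # Harmless stand-in for "/dev/null; rm -rf /*".
    fn = "/dev/null; echo INJECTED";
    cmd = sprintf ("wc -c %s", fn);
    print "command: " cmd;
    # The shell sees two commands: wc, then the injected echo.
    printf "%s", run_lines(cmd);
}
```

When this is run with awk -f, the last line of output is INJECTED, which shows that the shell executed the part after the semicolon as a command of its own.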
Parameter Quoting
The good news is that it is possible to quote shell command line arguments. In this case shell special characters (white space, semicolon, star etc.) lose their meaning and are taken "as-is":
cmd = sprintf ("wc -c '%s'", fn);
cmd | getline size;
close (cmd);
This puts single quotes around the filename parameter.
The examples from above then change into
- wc -c '/etc/passwd'
  works; the single quotes are stripped off after the argument is recognized.
- wc -c '/etc/passwd /etc/shadow'
  fails because a file named "/etc/passwd /etc/shadow" does not exist.
- wc -c '/dev/null; rm -rf /*'
  fails the same way; the file does not exist.
Things look better now, until someone gives "/dev/null;' rm -rf /*; '" (watch out for the two additional single quotes) as user input. In this case we get "wc -c '/dev/null;' rm -rf /*; ''" as the command handed to the shell. It seems that we are back at square one.
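The same kind of harmless experiment works here. This is only a sketch: the input is a slightly adapted variant of the one above, with "echo INJECTED" instead of "rm -rf /*" and a trailing ": '" so that the leftover single quote ends up as an argument to the shell's no-op command. run_lines() is again a hypothetical helper, not part of 3q.

```awk
# Hypothetical helper: run cmd through the shell and return its stdout.
function run_lines(cmd,    out, line) {
    out = "";
    while ((cmd | getline line) > 0)
        out = out line "\n";
    close (cmd);
    return out;
}

BEGIN {
    # The first single quote in fn closes the quote opened by the format string.
    fn = "/dev/null' ; echo INJECTED; : '";
    cmd = sprintf ("wc -c '%s'", fn);
    # cmd is now: wc -c '/dev/null' ; echo INJECTED; : ''
    print "command: " cmd;
    printf "%s", run_lines(cmd);
}
```

The last output line is again INJECTED: wrapping the parameter in single quotes does not help as long as the parameter itself may contain a single quote.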
The shellquote() Function
The bash manpage says: "Enclosing characters in single quotes preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash."
In other words: inside single quotes you can have any character (none of them has a special meaning to the shell) except the single quote itself, which ends the quoted string. So how is it then possible to have a single quote in a command line argument that is itself single quoted? The answer is simple: at the position where the single quote appears,
- end the quoted string,
- add a literal single quote escaped with a backslash character, and
- put another single quote to continue.
Applying this to "x'y" turns it into "x'\''y". The awk function for this is
function shellquote(string) {
    gsub(/'/, "'\\''", string);
    string = "'" string "'";
    return (string);
}
shellquote() already encloses the processed string in single quotes, so our example from above becomes
cmd = sprintf ("wc -c %s", shellquote(fn));
cmd | getline size;
close (cmd);
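A quick way to convince yourself that the quoting holds is a round trip: quote a hostile string, hand it to the shell as one argument of printf, and compare what comes back. The following is a minimal sketch under the assumption that awk's pipes go through a POSIX-style sh; run_lines() and the test input are hypothetical and only serve the illustration.

```awk
function shellquote(string) {
    gsub(/'/, "'\\''", string);
    return "'" string "'";
}

# Hypothetical helper: run cmd through the shell and return its stdout.
function run_lines(cmd,    out, line) {
    out = "";
    while ((cmd | getline line) > 0)
        out = out line "\n";
    close (cmd);
    return out;
}

BEGIN {
    # The small example from the text.
    print shellquote("x'y");

    # Hostile input with embedded single quotes and a harmless payload.
    fn = "/dev/null' ; echo INJECTED; : '";
    cmd = "printf '%s\\n' " shellquote(fn);
    print "command: " cmd;

    # The shell's printf receives fn as one single argument and echoes it back.
    if (run_lines(cmd) == fn "\n")
        print "round trip ok";
    else
        print "round trip FAILED";
}
```

The first line printed is 'x'\''y', which is the "x'\''y" from above wrapped in the enclosing single quotes. The round trip succeeds because the shell never gets to see the semicolons as command separators; they stay inside the quoted argument.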
There may still be details requiring attention. In contrast to bash, the zsh manpage states under "5.8 Quoting": "All characters enclosed between a pair of single quotes ' that is not preceded by a $ are quoted. A single quote cannot appear within single quotes unless the option RC_QUOTES is set, in which case a pair of single quotes are turned into a single quote."
So zsh is only almost compatible with the "normal" bash, and other shell interpreters may vary as well. If parameter quoting is not working as expected, a quick look into the shell's manpage is a very good idea.