Introduction
This series of articles has no relationship with social networks and never will. Period.
Meta-Bash is meant to indicate that I want to go “past” or “beyond” normal Bash scripting. Rivers of ink and pixels have already been spent on Bash and its “//normal//” scripting features, so I will abstain from that. We want to go beyond that.
This in turn means that the keen reader either is proficient with Bash features like functions, or has the documentation at hand and is ready to read the friendly manuals.
A lot of modern (and simpler) shells have copied features from Bash so maybe this wiki can be used or adapted to work with those ones too. But this is not an intended goal.
The main aim is to let the programmer (yes, I intentionally wrote “programmer”) go past her normal Bash scripting flows and think more abstractly while still running workloads with Bash.
This is it, more or less. Let’s start.
Function arguments by name
Bash has functions. They are all variadic functions, just like any program called by it. The user can call functions and programs with whatever number of arguments she wants. It is the responsibility of the function body or the program to make sense of them (and complain if any is missing or plain wrong).
Normally this can be accomplished quite easily as the programmer defines the meaning of each argument. In case of variadic functions, then, there is also an extra burden to define the meaning (and the function behavior) of the “optional” arguments.
The point here is that if the programmer needs more than one optional argument, making sense of the optional argument list can be tricky. Exactly as it happens, for example, in C.
In Bash functions all arguments are positional: the function body can access them by numbered position as $1 for the first one, $2 for the second and so on, while in C they have a name (which is actually an alias of their position).
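As a minimal sketch of positional access, the snippet below prints what a function actually receives. The function name show_args is hypothetical and used only for illustration.

```bash
#!/usr/bin/env bash
# show_args is a hypothetical helper used only for illustration.
function show_args () {
    # $# holds the number of arguments actually passed by the caller.
    echo "count=$#"
    # $1 and $2 are the first two positional arguments (empty if missing).
    echo "first='$1' second='$2'"
    # "$@" expands to all arguments, each one kept as a separate word.
    local arg
    for arg in "$@"; do
        echo "arg='$arg'"
    done
}

show_args one "two words" three
```

Calling it with fewer or more arguments never fails by itself: it is up to the body to decide what a missing or extra argument means.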
In a recent project I had to cope with such a feature and this is how I implemented it.
Let’s say we have a function like this:
function copy_file () {
    local src=$1 dst=$2
    cp -a "$src" "$dst"
}
This function is not used as a variadic one, even though technically it is. As you can see, I am using local variables to name the positional arguments and to make it simpler for me to refer to them properly inside the function body.
We can call that function like this:
copy_file /etc/hosts ${HOME}/hosts.backup ${HOME}/hosts.txt /tmp/someting /etc/somethingelse
Simply put, all arguments past the second one (${HOME}/hosts.txt, /tmp/someting and /etc/somethingelse) will be ignored.
Then I decided I needed an optional behavior: clean up the containing directory, removing all its files, before copying the new stuff. So the previous function became:
function copy_file () {
    local src=$1 dst=$2 cleanup=$3 d
    # New stuff is from here ...
    if [[ $cleanup ]]; then
        if [[ -d "$dst" ]]; then
            rm -fr "$dst"
            mkdir -p "$dst"
        else
            d=$(dirname "$dst")
            rm -fr "$d"
            mkdir -p "$d"
        fi
    fi
    # ... to here
    cp -a "$src" "$dst"
}
(I am intentionally keeping things simple here for the sake of explanation: things should be more complex than that, I know!)
The third argument ($3) can be anything: its presence (that is, its being non-empty) is enough, and any (string) value like true, yes, 1 or even 0 triggers the optional behavior. Maybe a little bit confusing, but rather effective.
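A small sketch of that non-empty test may help. The helper name is_set is hypothetical; the point is only that [[ $1 ]] looks at emptiness, not at the meaning of the string.

```bash
#!/usr/bin/env bash
# is_set is a hypothetical helper used only for illustration.
function is_set () {
    # [[ $1 ]] is true for any non-empty string, whatever it contains.
    if [[ $1 ]]; then echo "on"; else echo "off"; fi
}

is_set yes      # on
is_set 0        # on: "0" is a non-empty string
is_set false    # on: so is "false"
is_set ''       # off: empty string
is_set          # off: missing argument
```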
The situation becomes non-trivial when there is more than one optional argument.
What if I need to add an extra optional feature, like updating the timestamps of the destination copy while keeping the other metadata (copied over by the -a option)?
Updating the function body (by the programmer) with an extra argument ($4) is rather easy, but calling it (by the user) can be tricky, if not error-prone.
function copy_file () {
    local src=$1 dst=$2 cleanup=$3 touch=$4 d t
    # Work out the directory to clean (d) and the copied file to touch (t).
    if [[ -d "$dst" ]]; then
        d="$dst"
        t="$dst/$(basename "$src")"
    else
        d=$(dirname "$dst")
        t="$dst"
    fi
    if [[ $cleanup ]]; then
        rm -fr "$d"
        mkdir -p "$d"
    fi
    cp -a "$src" "$dst"
    # New stuff from here ...
    if [[ $touch ]]; then
        touch -c "$t"
    fi
    # ... to here
}
Can you see the issues for the user? Please, take a few minutes to get it before reading further.
The first optional argument ($3) is not optional anymore, as we need to call the function with “something” in the third position (an empty string argument, '') like this:
copy_file /etc/hosts ${HOME}/hosts.backup '' yes
We could further modify the function body to recognize a special value (like null or -) for an optional argument, signalling the function body to skip that optional behavior, with all the subsequent complexities.
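To make the “special value” idea concrete, here is a minimal sketch that treats - as a placeholder meaning “skip this option”. The names opt_enabled and demo are hypothetical, and demo only reports what it would do.

```bash
#!/usr/bin/env bash
# opt_enabled is a hypothetical helper: it succeeds only when the
# optional argument is non-empty and is not the "-" placeholder.
function opt_enabled () {
    [[ $1 && $1 != - ]]
}

# demo is a hypothetical stand-in for copy_file that only prints.
function demo () {
    local cleanup=$3 touch=$4
    opt_enabled "$cleanup" && echo "cleanup requested"
    opt_enabled "$touch" && echo "touch requested"
    return 0
}

demo src dst - yes   # prints only "touch requested"
```

The caller still has to remember which position means what, and now also what - stands for.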
This won’t make things any easier for the user or even for a code reviewer. She needs to know the exact meaning of that third argument and/or the meaning of that “special” value. So what?
Named arguments to the rescue
One thing I was really missing in Bash (and still miss in C) is the so-called “named arguments”.
Named arguments are a function call scheme in which the order of the arguments in the function call is irrelevant, as they are passed to the function body by means of (or “with”) their own names.
If I can pass function arguments along with their names, all of the above points and limitations just fade away. So it’s worth a try!
A first implementation can be done with plain, simple variables, thanks to the way Bash uses and calls functions.
I won’t dive deep into this topic (please, RTFM for Bash), but it boils down to this: a function call is treated like an external program call and can thus be prefixed with a set of variable assignments that define (and override) the environment only for the time the function or program is called (and run).
An example may be worth more than one thousand pictures…
If I define a function like this:
function test_fun () {
    echo "PATH='$PATH'"
    echo "USER='$USER'"
}
I can call it like this:
$ test_fun
PATH='/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/bin'
USER='root'
but I can also call it like this:
$ PATH=nothing USER=none test_fun; echo "PATH='$PATH'"; echo "USER='$USER'"
PATH='nothing'
USER='none'
PATH='/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/bin'
USER='root'
In the second case, two variables temporarily override (technically shadow) any other variable with the same name only for the call of that function. Once the function body execution is over, any pre-existing variable will get its “original” value (those will actually be un-shadowed).
The function call we were trying to perform earlier could become:
touch=yes copy_file /etc/hosts ${HOME}/hosts.backup
where optional arguments are passed by name and the rest are positional, or even all optional:
src=/etc/hosts touch=yes dst=${HOME}/hosts.backup copy_file
where all arguments are passed by name, no matter whether they are optional or not.
In both cases the changes to the implementation of the function body are really limited and simple, and are left to the keen reader. But there is a cost!
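For the impatient reader, one possible shape of those changes is sketched here. This is only an assumption about how the body could look, not the implementation from my project: copy_file_env is a hypothetical variant that just reports what it would do, so it is safe to run anywhere.

```bash
#!/usr/bin/env bash
# copy_file_env is a hypothetical, print-only variant of copy_file.
function copy_file_env () {
    # Prefer a variable set by the caller (src=... dst=...) and fall back
    # to the positional argument when that variable is unset or empty.
    # The expansions happen before "local" creates the new variables,
    # so ${src:-$1} still sees the caller's value.
    local src=${src:-$1} dst=${dst:-$2}
    echo "src=$src dst=$dst cleanup=${cleanup:-no} touch=${touch:-no}"
}

# Optional argument by name, mandatory ones by position:
touch=yes copy_file_env /etc/hosts /tmp/hosts.backup
# Everything by name, in any order:
src=/etc/hosts touch=yes dst=/tmp/hosts.backup copy_file_env
```

Both calls are expected to print the same line, as long as no variable named src, dst, cleanup or touch is already set in the calling shell.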
“Normal readability” is gone, at least partly, as most programming languages, Bash included, put function call arguments after the function name, not before!
We really would like to call that function like this:
copy_file src=/etc/hosts touch=yes dst=${HOME}/hosts.backup
But how to implement it?
Enter eval, the mother of all the Meta-Bash features
The last function call from the previous chapter looks nice, but doesn’t really work, as those variable assignments are not performed by the function body code. They are not assignments at all, just function arguments. Or maybe not?
We can somehow implement the same in the function body, but we have to fix a problem. In Bash, a variable name needs to be a known name, not something variable itself. Or maybe not?
Bash has a very powerful feature, implemented as a built-in command. It is, as you know by now, eval. This apparently humble built-in command hides an entire universe in its mouth. Let’s read the (small) section from the man page (take your time and read it twice before going on):
eval [arg ...]
    The args are read and concatenated together into a single command.
    This command is then read and executed by the shell, and its exit
    status is returned as the value of eval. If there are no args, or
    only null arguments, eval returns 0.
You could ask yourself: how is this different from just executing the commands? There’s a big difference: the entire command is built at run time, when the shell executes the eval, and is not “set in stone” when that line is written!
Let’s make a few enlightening tests:
$ eval echo $PATH # Nothing new here!
/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/bin
$ var=PATH
$ eval echo \$$var # This is it!
/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/bin
In the second command above we have echo-ed the value of a variable whose name (PATH) is defined inside another variable (var). We need to add an extra \$ in front of $var so that we get a $ in front of whatever $var contains when the eval echo is executed. It is not really trivial, but it is not rocket science either.
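For comparison only, Bash also offers indirect expansion, ${!var}, for this specific read-by-name case. It is a different technique from eval and is not the one this article builds on; the sketch below just puts the two side by side, using a made-up variable named greeting.

```bash
#!/usr/bin/env bash
# greeting is a made-up variable used only for illustration.
greeting="hello"
var=greeting

# eval form: the shell first turns \$$var into $greeting,
# then eval parses and runs "echo $greeting".
eval echo \$$var          # prints: hello

# Bash-only alternative: ${!var} expands the variable named by var.
echo "${!var}"            # prints: hello
```

Indirect expansion only reads a value; eval builds and runs whole commands, which is what we need next.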
We can go even deeper into this rabbit hole. Just read this:
$ for n in 1 2 4; do eval eval echo "PS$n='\$PS$n'"; done
PS1=$
PS2=>
PS4=+
(Note: the PS* variables are used by Bash to control the look of its prompts. RTFM.)
We are not just putting a variable name into another variable, we are constructing a variable name from a (counter) variable!
In this sense, eval is not just a command: it is (also) a meta-command, a command to dynamically build and execute other commands, even from variables.
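The same trick works for assignments, not just for reading. A minimal sketch, with the made-up variable prefix slot_:

```bash
#!/usr/bin/env bash
# The slot_ prefix is a made-up name used only for illustration.
for n in 1 2 3; do
    # eval receives the string  slot_1='value 1'  (then 2, then 3)
    # and runs it as an ordinary assignment.
    eval "slot_$n='value $n'"
done

echo "$slot_2"                            # prints: value 2

for n in 1 2 3; do
    # Here eval receives:  echo "slot_1=$slot_1"
    eval "echo \"slot_$n=\$slot_$n\""
done
```

The escaped \$ is what postpones the expansion until eval runs the command that was built.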
We can now modify our function body to match the desired syntax we have been talking about with a simple “one-liner”:
function copy_file () {
    while [[ $1 ]]; do eval "local $1"; shift; done # Look, ma'!
    ...
We have put that while loop onto a single line as an editor-friendly prologue. What happens there, then? Let’s analyze it.
First, there is a while loop that scans all arguments as long as (while) they are not empty. The [[ ... ]] construct is a Bash compound command that evaluates conditional expressions. [[ $1 ]] actually tests whether $1 (as a string) is non-empty. The loop ends as soon as it finds an empty argument. More on this later.
Inside the loop a command is built with eval by prepending the local built-in to whatever is in the current first argument ($1). Whatever meta-command results from that is then executed. Of course we expect that argument to look like name=value, so the meta-command will evaluate to local name=value, which creates a local (to the function body) variable and assigns it a value. Note that a bare name is also valid (it expands to local name, which declares the variable without a value), although it is not really useful. Anything else is likely to trigger a syntax error in the function body.
Finally, the shift built-in command “just” removes the first argument from the argument list and shifts all the remaining ones by one position to the left, ready for another possible loop run.
So, when we pass src=/etc/hosts as an argument to the function, the first line in the body builds and executes this command: local src=/etc/hosts. This is nothing more than the declaration of a variable local to the function body (that will possibly shadow any similarly named variable existing outside of the function body) with an assigned value. This variable will effectively work just like a function argument and can be referred to … by name.
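One caveat worth a sketch: because eval parses its argument again, a value containing spaces (or shell syntax) would be split or even executed. The variant below is only a defensive assumption of mine, not the prologue used in this article: it checks that each argument looks like name or name=value and then hands it, quoted, straight to local instead of eval. The function demo_named is hypothetical and only prints.

```bash
#!/usr/bin/env bash
# demo_named is a hypothetical function used only for illustration.
function demo_named () {
    while [[ $1 ]]; do
        # Accept only NAME or NAME=VALUE, where NAME is a valid identifier.
        if [[ ! $1 =~ ^[A-Za-z_][A-Za-z0-9_]*(=.*)?$ ]]; then
            echo "demo_named: bad argument '$1'" >&2
            return 2
        fi
        # Quoting the argument keeps values with spaces in one piece.
        local "$1"
        shift
    done
    echo "src='$src' dst='$dst'"
}

demo_named src=/etc/hosts 'dst=/tmp/my file'
demo_named 'x; echo hacked' || echo "rejected"
```

The price is a longer prologue; the eval one-liner stays the simplest form when the caller is trusted.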
So, our second iteration of the original function then becomes:
function copy_file () {
    while [[ $1 ]]; do eval "local $1"; shift; done
    local d t
    # Work out the directory to clean (d) and the copied file to touch (t).
    if [[ -d "$dst" ]]; then
        d="$dst"
        t="$dst/$(basename "$src")"
    else
        d=$(dirname "$dst")
        t="$dst"
    fi
    if [[ $cleanup ]]; then
        rm -fr "$d"
        mkdir -p "$d"
    fi
    cp -a "$src" "$dst"
    if [[ $touch ]]; then
        touch -c "$t"
    fi
}
So, in the end, we get a solution similar to the original //trick//, where we give names to positional arguments thanks to local variables. With these very big differences:
- The name is to be explicitly used by the caller
- The name can shadow an external variable
- The list of arguments can be set up with any order
- Extra parameters can be passed so nested called functions can see them
- Default argument values can be defined once and forever as global variables and overridden at will
- Extra arguments can be skipped if preceded by a '' or a "" (empty string) in the function call argument list.
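A few of the points in this list can be seen at work in a short sketch. The function report and the global default touch=no are hypothetical; report uses the same one-line prologue but only prints its arguments.

```bash
#!/usr/bin/env bash
# Hypothetical global default, overridden per call when passed by name.
touch=no

# report is a hypothetical function that only prints its named arguments.
function report () {
    while [[ $1 ]]; do eval "local $1"; shift; done
    echo "src=$src dst=$dst touch=$touch"
}

report dst=/tmp/b src=/tmp/a               # src=/tmp/a dst=/tmp/b touch=no
report src=/tmp/a touch=yes dst=/tmp/b     # src=/tmp/a dst=/tmp/b touch=yes
report src=/tmp/a dst=/tmp/b '' touch=yes  # src=/tmp/a dst=/tmp/b touch=no
echo "touch=$touch"                        # touch=no
```

In the third call the empty string stops the loop, so touch=yes is never turned into a local variable and the global default wins. The last line shows that the local override from the second call did not leak out of the function.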
Isn’t it nice? Of course, this is just a starting point.