Over the past two years at my job at OkCupid, I’ve collected a few useful nuggets of bash know-how. These days, my coworkers frequently come to me for help solving various problems with this tool, so I wanted to share some of the pieces here. This post shows how to use bash arrays to make managing lists easier.
Our project is going to be a program that reads numbers from standard input and then writes the trailing average of those numbers onto standard output. It should take one integer parameter for the number of trail points it should average together. If it hasn’t read enough to average all of the trail points, then it should just average everything. That should be enough to get started.
The basic structure of the program will be the main input loop, so let’s get that going:
#!/usr/bin/env bash
# How many numbers to average at a time
traillength=$1;
# Main input loop
while read number; do
    # TODO: Put useful code here.
done
We’ll also need the trail length parameter, as well as a queue of numbers that we want to average. We’ll use a bash array for that part.
# How many numbers to average at a time
traillength=$1;
# A bash array
queue=();
The parentheses tell bash to let us access $queue with some special array syntax.
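As a quick illustration of that array syntax, here is a minimal sketch. The sample values and the variable names first and all are made up for the example.

```bash
#!/usr/bin/env bash
# Sample values, chosen only for illustration
queue=(3 5 8)

first=${queue[0]}       # a single element; indexes start at zero
all="${queue[*]}"       # every element, joined by spaces under the default IFS

echo "first=$first all=$all"
# prints: first=3 all=3 5 8
```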
Next we’re going to want to append $number onto $queue at the start of every loop iteration.
# Append the next number onto the array
queue=(${queue[*]} $number);
The same parentheses notation tells bash to treat the contents of the parentheses as members of an array. ${queue[*]} tells bash to give us the entire contents of the array separated by spaces (or, inside double quotes, by the first character of your $IFS variable, which we’ll make use of in just a second).
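To see the append step in isolation, here is a small sketch with made-up input values. It also shows the += operator, which is an alternative (available in bash 3.1 and later) rather than the form this post uses.

```bash
#!/usr/bin/env bash
queue=()

# Made-up input values standing in for lines read from standard input
for number in 4 8 15; do
    queue=(${queue[*]} $number)   # rebuild the array with one more element
done
echo "${queue[*]}"
# prints: 4 8 15

# Alternative: += appends in place without re-expanding the old contents
queue+=(16)
echo "${queue[*]}"
# prints: 4 8 15 16
```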
Anyway, the next step is to add up all of the things in our list. Since bash can’t do floating-point arithmetic, we’ll need to make use of bc, the arbitrary precision calculator for Unix. We’re going to accomplish this by echoing the expression we want evaluated into bc and then saving its answer. The general form for such a maneuver looks like this:
answer=$(echo "1+1" | bc);
So that’s simple enough. But we want something a little fancier. In order to tell bc that we want to do floating-point arithmetic, we need to set its scale so that it knows how many digits to keep after the decimal point.
answer=$(echo "scale=2; 12.5 + 3.14" | bc);
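The effect of scale is easiest to see with division, where the default scale of zero truncates the result. A minimal sketch, with variable names invented for the example:

```bash
#!/usr/bin/env bash
# With the default scale of 0, division is truncated to an integer
without_scale=$(echo "7/2" | bc)

# With scale=2, bc keeps two digits after the decimal point
with_scale=$(echo "scale=2; 7/2" | bc)

echo "$without_scale $with_scale"
# prints: 3 3.50
```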
Finally, we need to actually get the numbers in our queue in there. This is the tricky part, because we need to get + signs in between all of our numbers, and it is where the $IFS variable comes in. $IFS (Internal Field Separator) tells bash which characters separate words within data, and bash uses its first character to join the elements when "${queue[*]}" is expanded inside double quotes. The result is:
# Add up all of the numbers in the array
_IFS="$IFS"; # Save old IFS value
IFS="+";
total=$(echo "scale=4; ${queue[*]}" | bc);
IFS="$_IFS"; # Restore old IFS value
NOTE: The @ special index can be used in many of the places where the * index can be used. THIS IS NOT ONE OF THEM! Inside double quotes, the @ form expands to one word per element and ignores $IFS, so the + signs would never appear. In fact, @ and * differ only in how they behave inside of quotes and how they respect the $IFS variable.
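The difference between the two indexes can be checked directly. In this sketch, count_args is a hypothetical helper that only reports how many arguments it received.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments received
count_args() { echo "$#"; }

queue=(1 2 3)

_IFS="$IFS"   # Save old IFS value
IFS="+"
star="${queue[*]}"                         # one word, joined with the first character of IFS
star_count=$(count_args "${queue[*]}")     # the whole array arrives as a single argument
at_count=$(count_args "${queue[@]}")       # each element arrives as its own argument
IFS="$_IFS"   # Restore old IFS value

echo "star=$star star_count=$star_count at_count=$at_count"
# prints: star=1+2+3 star_count=1 at_count=3
```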
So that will sum up all of the things in our queue. Now we have half of the formula. The other half is the total number of things in the queue. In bash, we can get this value with the following syntax:
size=${#queue[*]};
Now we can use bc to get the average.
echo "scale=4; $total/$size" | bc;
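One detail worth knowing when reading the output: bc leaves off the leading zero for results between -1 and 1. If that matters, printf can pad it back. A small sketch with made-up totals; the padded variable is introduced here only for illustration.

```bash
#!/usr/bin/env bash
# Made-up values: a total of 1 spread over 4 numbers
total=1
size=4

average=$(echo "scale=4; $total/$size" | bc)
echo "$average"
# prints: .2500

# printf reformats the value with a leading zero; LC_ALL=C keeps "." as the decimal point
padded=$(LC_ALL=C printf '%.4f' "$average")
echo "$padded"
# prints: 0.2500
```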
The last step is to actually limit the queue size. On each loop iteration, we’re going to check the queue size. If it’s bigger than $traillength, then we’re going to chop off the first element. That step looks like this:
# If we've gone over our specified array size...
if [ "$size" -gt "$traillength" ]; then
    # Lop off the first (oldest) element of the array
    queue=(${queue[*]:1});
fi
We’ve made use of one final piece of tricky bash-array syntax here. The colon parameter expansion operator lets us specify an index offset for the array. The part inside the parentheses, ${queue[*]:1}, expands to the entire array, except starting at index 1. The parentheses behave just like they have above.
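Here is the offset syntax on its own, as a sketch with made-up values. The optional second number, a length, is not needed for the trailing average but shows that the same operator can take a slice from the middle.

```bash
#!/usr/bin/env bash
queue=(10 20 30 40)

rest=(${queue[*]:1})        # everything from index 1 onward
middle=(${queue[*]:1:2})    # two elements, starting at index 1

echo "${rest[*]}"
# prints: 20 30 40
echo "${middle[*]}"
# prints: 20 30
```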
And voilà! We’re done! The full script looks something like this:
#!/usr/bin/env bash
# Watch a stream of numbers from standard input and print out the N-point
# trailing average of those numbers
# How many numbers to average at a time
traillength=$1;
# A bash array
queue=();
while read number;
do
    # Append the next number onto the array
    queue=(${queue[*]} $number);
    # Count up all the things in the array
    size=${#queue[*]};
    # If we've gone over our specified array size...
    if [ "$size" -gt "$traillength" ]; then
        # Lop off the first (oldest) element of the array
        queue=(${queue[*]:1});
    fi
    # Add up all of the numbers in the array
    _IFS="$IFS";
    IFS="+";
    total=$(echo "scale=4; ${queue[*]}" | bc);
    IFS="$_IFS";
    # Count them up again
    size=${#queue[*]};
    echo "scale=4; $total/$size" | bc;
done
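For comparison, here is a slightly more defensive variant of the same idea, written as a sketch rather than a replacement. The function name trailing_average, the argument check, and the blank-line handling are all assumptions added for this example. It also sets IFS inside the command substitution, which runs in a subshell, so there is nothing to save and restore, and it folds the sum and the division into a single bc call.

```bash
#!/usr/bin/env bash
# trailing_average is a hypothetical name used only for this sketch.
trailing_average() {
    local traillength=$1 number
    local queue=()

    # Assumption: the trail length must be a positive integer
    if ! [[ $traillength =~ ^[1-9][0-9]*$ ]]; then
        echo "usage: trailing_average N" >&2
        return 1
    fi

    while read -r number; do
        # Assumption: blank lines are skipped rather than passed to bc
        [ -z "$number" ] && continue

        # Append the next number, then drop the oldest if we are over the limit
        queue+=("$number")
        if [ "${#queue[@]}" -gt "$traillength" ]; then
            queue=("${queue[@]:1}")
        fi

        # IFS is changed only inside the subshell created by $(...)
        (IFS="+"; echo "scale=4; (${queue[*]})/${#queue[@]}" | bc)
    done
}

# Usage: the 3-point trailing average of the numbers 1 through 5
printf '%s\n' 1 2 3 4 5 | trailing_average 3
# prints:
# 1.0000
# 1.5000
# 2.0000
# 3.0000
# 4.0000
```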
Final Thoughts
Most people who know me well know that I hate Perl for two main reasons. The first is that it’s capable of some VERY scary looking syntax. The second and arguably more important reason is that, in my experience, the most widely-held best practices for writing Perl involve using these exact same unintelligible pieces of syntax. I don’t write Perl because I have a choice and have chosen not to.
When it comes to bash, my choice is much harder to make. As you can plainly see above, bash is capable of some seriously busted syntax. I don’t wish these hateful strings of punctuation on anyone any more than is necessary, but there’s just so much less to choose from in the world of shells. At the end of the day, your options are sh, bash, csh, tcsh, ksh, and zsh. Many of those aren’t going to be installed on all of the machines that you need to use in your geeky existence. One of my coworkers uses zsh and constantly needs to bug our sysadmin to make sure it’s installed everywhere.
If you want a portable command-line experience, you’re relegated to the feature-sparse sh, or bash. This is why I think it’s worth making friends with bash and its otherwise unsightly syntax. Knowing some bash-fu moves can save you from doing lots of tedious text processing by hand, and can make you feel at home and much more comfortable with such a ubiquitous tool.