This post is inspired by another blog post called “How do I modify a variable in Haskell?” by Michael Burge.
It got me thinking about different methods of printing Identity Matrices in Bash (because I am not smart enough to do it in Haskell), and because Rosetta Code was missing a Bash solution. (Update from 9/17/2017: they accepted my solution)
But first off, what is an Identity Matrix?
A square matrix in which all the main diagonal elements are 1’s and all the remaining elements are 0’s is called an Identity Matrix. Source
And here is an example (Source):
Alright, let’s get to generating them, or you can jump to my final solution
I usually reach for bc
if I need to do anything somewhat complex arithmetically in Bash. So let’s try that first.
With this script, I am going to try to do things “right” and not code-golf yet. It will also default to 10 if no argument is passed.
You can try it online or here it is:
#!/bin/bash
max="${*:-10}";
for i in $(seq "$max"); do
echo "obase=2; $(( 1 << max - i))"\
| bc \
| xargs printf "%0"$max"d\\n";
done
This isn’t super complicated, but I will still break it down:
max="${*:-10}";
Pulls in all arguments, and defaults to 10 if none are provided.
for i in $(seq "$max"); do
Just generate the seq
uence of 1 to the target number.
echo "obase=2; $(( 1 << max - i))" | bc
This is where it gets fun. The obase=2
tells bc
that we want base 2 instead of base 10 since we are doing bitwise changes next.
Then we do a left bitwise shift of the first argument minus the current position which gives us the numbers we need:
1000000000
100000000
10000000
1000000
100000
10000
1000
100
10
1
But we are missing the padding to make it a true Identity Matrix.
xargs printf "%0"$max"d\\n";
Pipes the output of bc
to printf
who uses its %Wd
syntax to pad the output to W
characters long, in this case the first argument, and then an explicitly escaped newline.
Which finally gives us what we want:
1000000000
0100000000
0010000000
0001000000
0000100000
0000010000
0000001000
0000000100
0000000010
0000000001
This is kind of wordy though and has a lot of drawbacks. For example:
$ ./bcIdentityMatrix 20
### Hundreds of Lines ###
printf: ‘10000000000000000000’: Numerical result out of range
Ouch. Looks like we are running into our system’s LONG_MAX
of 9223372036854775807 which only gives us a maximum of a 19 digit diagonal matrix. Not very versatile at all.
I probably won’t be able to escape that limit while still actually “calculating” the matrix. But I can get rid of our reliance on bc
, so let’s try a new method without it.
I am getting bored, so let’s code golf this a little, and let’s switch to using the powers of 10 to find the next number instead of bitwise shifting since that gave me headaches:
for i in `seq $(($1-1))|tac`; do printf "%0$1d\n" "$((10**i))";done
Not as clear as the last one, but actually simpler in a lot of ways:
for i in `seq $(($1-1))|tac`; do
Generate the seq
uence of 1 to the first argument minus 1. Then reverse it with it tac
.
A couple notes:
$()
because it saves me a character. Please don’t do that.rev
, but tac
is actually much better at interpreting the input from seq
as a list. The more “standard” solution to this problem is probably seq $1 -1 0
to count down.-1
part because otherwise the matrix will be off by one at the top like so:
10000000000
1000000000
0100000000
0010000000
0001000000
0000100000
0000010000
0000001000
0000000100
0000000010
It would probably be faster to have sed
delete the first line if I had already used it somewhere else. Moving on:
printf "%0$1d\n" "$((10**i))"
Same general idea as the printf
of my first script. Pad the output of 10, to the power of the current integer ($((10**i))
), to the width of the first argument (%0$1d\n
).
Alas, this suffers the same fate of my last script:
$ ./exponentSolution 20
-8446744073709551616
01000000000000000000
00100000000000000000
00010000000000000000
00001000000000000000
00000100000000000000
00000010000000000000
00000001000000000000
00000000100000000000
00000000010000000000
00000000001000000000
00000000000100000000
00000000000010000000
00000000000001000000
00000000000000100000
00000000000000010000
00000000000000001000
00000000000000000100
00000000000000000010
At least we got rid of bc
though. Let’s try faking the calculation.
We will be stuck on a 19 digit matrix unless we can trick Bash into treating the output as a string, and “calculating” it using a preset number to fill in the 1’s as needed:
for i in `seq $1`;do printf '%*s\n' $1|tr ' ' '0'|sed "s/0/1/$i";done
So let’s break this down:
for i in `seq $1`;do
Pretty simple. Generate the seq
uence of numbers 1, to the value of our first argument.
printf '%*s\n' $1|tr ' ' '0'
Tell printf
to generate a string that is as long as my first argument. Then tr
replaces it with a 0
. (Some might know this from another blog post). This is what the output looks like right now:
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
Missing some vital 1
characters. Let’s fix that:
sed "s/0/1/$i"
Switches a 0
with a 1
in the position that the current loop $i
is in. And then it generates our matrix as expected:
$ ./dirtysed 10
1000000000
0100000000
0010000000
0001000000
0000100000
0000010000
0000001000
0000000100
0000000010
0000000001
As always, /u/galaktos proposed a much cleaner and effective version of my code. I should probably just start sending them my blog before I post it :)
Here is their reproduced solution:
I wouldn’t even think of doing all that bit shifting or multiplication… here’s my take:
for i in {1..20}; do for j in {1..20}; do printf $((i==j)); done; printf '\n'; done
Or, in more lines:
for i in {1..20}; do for j in {1..20}; do printf $((i==j)) done printf '\n' done
If you don’t like the implicit boolean-to-int conversion, you can also use
$((i==j?1:0))
. Or$((i==j ? 1 : 0))
if you’re not golfing.Unfortunately, brace expansion happens before parameter and variable expansion, so just substituting 20 for
$n
doesn’t work if you want to make the size variable. But we can use Bash’s alternative for syntax for that:n=${1:-10} for ((i=0;i<n;i++)); do for ((j=0;j<n;j++)); do printf $((i==j)) done printf '\n' done
Oh, and shellcheck would prefer
printf '%s' $((i==j))
instead ofprintf $((i==j))
. Fair enough. (Interestingly, it doesn’t complain about the unquoted$(())
– it seems to know that arithmetic expansion can never produce multiple words. But unless I’m missing something, it can’t produce a%
sign that would be interpreted in a format string either, can it?)
The above was already a huge improvement on my loops, but they bested themselves again with a faster version:
Oh, I missed the part where we’re timing this. Here’s a slightly faster version:
#!/bin/bash n=${1:-10} for ((i=0;i<n;i++)); do line= for ((j=0;j<n;j++)); do line+=$((i==j)) done printf '%s\n' "$line" done
But it’s still fairly slow. But hey, it’s pure bash ¯\_(ツ)_/¯
I was also beaten on the bc
front with /u/ray_gun’s inventive bc
and dc
methods:
Another way of getting repeated 0s is multiplying a power of 10 by a number and removing that number with
sed
later (here, by 2):time BC_LINE_LENGTH=0 bc <<< 'r=10000; for ( a = 0; a < r; a++ ) { print 2 * 10^a, 1, 2 * 10^(r - a - 1), "\n" }' | sed 's/2//g' real 0m6.236s user 0m6.415s sys 0m0.476s
6 seconds instead of the author’s 1 minute and 28 seconds.
Here’s a faster one (ab)using
bc'
sscale
and usingsed
to remove the leading dot:time BC_LINE_LENGTH=10050 bc <<< 'scale=10000; 10^-1; for ( a = 0; a < scale-1; a++ ) last/10' | sed 's/^\.//' real 0m1.855s user 0m1.903s sys 0m0.382s
bc
anddc
can also check if the number they just calculated became 0 because internally it won’t be more precise thanscale
, e.g. withdc
:time DC_LINE_LENGTH=10050 dc <<< '10000k [10/ d0=q p ldx]Sd [q]Sq 1 ldx' | sed 's/^\.//' real 0m1.719s user 0m1.716s sys 0m0.404s
Edit:
bc
equivalent of the above, it needssed
to remove the last line (0):time BC_LINE_LENGTH=0 bc <<< 'scale=10000; 10^-1; while ( last != 0 ) last/10' | sed '$d;s/^\.//' >/dev/null real 0m1.504s user 0m1.523s sys 0m0.066s
Using a very unscientific method, let’s benchmark all the solutions to 10,000 digits on my machine. The methods that can’t go above 20 are obviously disqualified, so let’s test the others:
$ time for i in `seq 10000`;do printf '%*s\n' 10000|tr ' ' '0'|sed "s/0/1/$i";done
real 1m31.974s
user 0m29.209s
sys 0m4.848s
Ouch. Not looking great so far.
$ time ./purebash 10000
real 15m50.447s
user 7m20.544s
sys 0m0.421s
Slower, but it is pure bash
so it gets extra points for that. I may have also needed to test it in a different way to be fair.
$ time BC_LINE_LENGTH=0 bc <<< 'scale=10000; 10^-1; while ( last != 0 ) last/10' | sed '$d;s/^\.//'
real 0m17.071s
user 0m2.009s
sys 0m0.340s
Much faster than any other method! So use this one if you need to quickly generate huge matrices.
This was an interesting look at just how many methods there are for generating matrices. I especially enjoyed all the community feedback (thanks /r/bash and #bash), and I learned a ton from it.