We love you Bash

We love you Bash… but sometimes you are so damn slow

I'm sure this has happened to you before… I had to create an LDIF file containing 70,000 entries like this one:

dn: user=XXXXXXXXXX,rootdn=com
changetype: modify
replace: attr1
attr1: qwerty1
-
replace: attr2
attr2: {"reporting": [{"name":"Total","reportingLevel":"total","subscription":"A", "time":500,"reset":{"main":"30 ","time":"30 "}}}]}

(the real data was a little bit larger, this is just an example)

It was supposed to be something really quick and simple, one of those one-liners you write in 3 seconds… I said to myself: bash + loop + echo + redirection to file = WIN

Well, I was proved wrong:

The quick and dirty code to be executed in the shell:

for i in `seq 1 70000`;
do
echo -e "dn: user=XXXXXXXXXX,rootdn=com
changetype: modify
replace: attr1
attr1: qwerty1
-
replace: attr2
attr2: {\"reporting\": [{\"name\":\"Total\",\"reportingLevel\":\"total\",\"subscription\":\"A\", \"time\":500,\"reset\":{\"main\":\"30 \",\"time\":\"30 \"}}}]}
"  >> modify.ldif
done

So I issued that command and went out for a bite… 30 minutes later, the fraking thing was still running!!! How was that even possible?

I realized I was opening the file every time I wrote a line, which makes no sense… so I stopped the script and tweaked it a little bit… Result: pretty much the same… Here are some figures with only 5000 iterations per loop and different redirections:

Initial redirection:

for i in `seq 1 5000`; do echo -e "<entry as above>" >> modify.ldif; done
real    4m17.606s
user    1m4.028s
sys     1m35.850s

Only one redirection outside of the loop:

for i in `seq 1 5000`; do echo -e "<entry as above>"; done >> modify.ldif
real    4m30.218s
user    1m8.496s
sys     1m41.390s

I also tried using a file descriptor, just for the sake of it, but it had no impact:

exec 4>> modify.ldif
for i in `seq 1 5000`; do echo -e "<entry as above>" >&4; done
real    4m32.197s
user    1m10.248s
sys     1m39.294s

And using the FD outside of the loop:

for i in `seq 1 5000`; do echo -e "<entry as above>"; done >&4
real    4m31.478s
user    1m11.740s
sys     1m38.678s

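Where the redirection goes makes no real difference: all four variants need between 4m17s and 4m32s for 5000 entries. So all in all, as you can see, Bash I/O redirection performance basically sucks.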

A Python script of under 20 lines (with no optimization whatsoever) generates the whole LDIF file (70,000 entries to modify) in less than 1 second:

#!/usr/bin/env python

# Template for one LDIF entry; XXXXXXXXXX is replaced by the loop counter.
# The trailing blank line separates entries, as in the Bash version.
data = """dn: user=XXXXXXXXXX,rootdn=com
changetype: modify
replace: attr1
attr1: qwerty1
-
replace: attr2
attr2: {"reporting": [{"name":"Total","reportingLevel":"total","subscription":"A", "time":500,"reset":{"main":"30 ","time":"30 "}}}]}

"""

# Open the output file once and write all 70,000 entries through the same handle.
with open("modify.ldif", "w") as outputF:
    for i in range(70000):
        outputF.write(data.replace("XXXXXXXXXX", str(i)))

Executing it:

# time python gen_ldif.py
real    0m0.506s
user    0m0.108s
sys     0m0.052s

How about that? Sometimes it's just better to use the right tool for the right job...
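
For what it's worth, the "am I reopening the file on every write?" question from the Bash experiments can be asked in Python too. What follows is only a minimal sketch, not the script used above: the function names, the shortened ENTRY template and the temporary files are all made up for illustration, and I'm not claiming any timings here since they depend entirely on the machine.

```python
import os
import tempfile
import time

# Shortened stand-in for the LDIF entry used in the post (hypothetical sample data).
ENTRY = (
    "dn: user=XXXXXXXXXX,rootdn=com\n"
    "changetype: modify\n"
    "replace: attr1\n"
    "attr1: qwerty1\n"
    "\n"
)


def write_reopening(path, count):
    """Open the file in append mode for every entry, like '>>' inside the loop."""
    for i in range(count):
        with open(path, "a") as out:
            out.write(ENTRY.replace("XXXXXXXXXX", str(i)))


def write_single_handle(path, count):
    """Open the file once and reuse the handle, like redirecting the whole loop."""
    with open(path, "a") as out:
        for i in range(count):
            out.write(ENTRY.replace("XXXXXXXXXX", str(i)))


def timed(func, count):
    """Run func against a fresh temporary file; return (seconds, bytes written)."""
    fd, path = tempfile.mkstemp(suffix=".ldif")
    os.close(fd)
    try:
        start = time.perf_counter()
        func(path, count)
        elapsed = time.perf_counter() - start
        return elapsed, os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    for func in (write_reopening, write_single_handle):
        seconds, size = timed(func, 5000)
        print("%-20s %.3fs  %d bytes" % (func.__name__, seconds, size))
```

Run as a script, it prints one line per strategy with the elapsed time and the number of bytes written; both strategies should write exactly the same bytes, so any difference is down to how the file is opened.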

So yeah, we love you Bash… Even though you are really slow sometimes.

\\psgonza