NEF batch processing in MacOS (from CLI)

Hi there

In case someone has to scratch the same itch as me (converting and resizing big NEF files to make them easier to view over the network), this is (by far, I'd say) the easiest way to generate (and resize) a jpg from a big fat Nikon RAW file (NEF):

psgonza/Nikon$ mkdir -p jpg   # make sure the output directory exists
psgonza/Nikon$ for i in *.NEF;
do
sips -s format jpeg -s formatOptions 50 -Z 3000 "${i}" --out "jpg/${i%NEF}jpg";
done

The above command will:

  • Go through all the NEF files in the current directory
  • Generate a jpg from each original NEF file
  • Set the jpg quality to 50%
  • Resize the image to max 3000px on its longest side (width or height, whichever is bigger)

It took A WHILE to process all the files (in my case, 1003 pictures), but I'd say it was worth it... I ended up dealing with (approx.) 1MB jpg files instead of the original 25MB NEF files I had.
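
Since each sips call is independent, nothing stops you from running a few of them in parallel. A rough, untested sketch in Python (the pool of 4 workers is arbitrary, and I haven't timed it against the plain loop):

import pathlib
import subprocess
from concurrent.futures import ThreadPoolExecutor

SRC = pathlib.Path(".")          # directory with the .NEF files
DST = SRC / "jpg"
DST.mkdir(exist_ok=True)

def convert(nef):
    # same sips options as the one-liner above: jpeg, 50% quality, max 3000px
    out = DST / (nef.stem + ".jpg")
    subprocess.run(["sips", "-s", "format", "jpeg",
                    "-s", "formatOptions", "50",
                    "-Z", "3000", str(nef), "--out", str(out)],
                   check=True, capture_output=True)

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(convert, SRC.glob("*.NEF")))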

I am fairly new to the Mac OS world, so I didn't know about sips until 30 minutes ago, even though it has been included by default in Mac OS since 2003...

Pretty handy

PSA: MiniKeePass and key files

In case you plan to use MiniKeePass on your iPhone with key files (which I guess is highly recommended if you plan to share your kdb via Dropbox/Drive/you_name_it), pay special attention to the last line of their help page:

Using Key Files

MiniKeePass can open KeePass 1.x/2.x files that use a key file instead of or in
addition to a password.

Steps:

Load your KeePass database and key file in MiniKeePass using iTunes, Dropbox, etc.
Open your KeePass file in MiniKeePass
When prompted for your password you can enter a password and/or select a key file

Note: MiniKeePass will automatically select your key file if it has the same
filename as your KeePass file but with a .key extension.

Basically, you have to upload your database and the key file with the same name (i.e. MyDB.kdb and MyDB.key) to your_cloud_storage_service_here, and then open both of them with MiniKeePass. Otherwise, all you will see in the key file screen is "None".

I learnt it the hard way and I lost 15 minutes of my life trying to import a key file with a different name than the KeePass database.

The devil is in the detail, as usual.... ;)

///psgonza

KeePass is great, you should totally use it. MiniKeePass is just fine, it does the job too.

Bye bye 2017, hello 2018

We have a couple of hours left of 2017... Or maybe you are already in 2018, or half a day away if you are in some parts of the US...

Anyways, happy new year!

There will be time in the coming days to analyze what was good, what was bad and what needs to be improved... Meanwhile, enjoy New Year's Eve!

\\psgonza

String manipulation exercise: Perl, Python, Awk

Here comes a small comparison of the performance of Perl, Awk and Python while parsing and splitting lines in a BIG ldif file with thousands (or millions) of subscriber profiles like this one (I created this ldif file as an example, just for the sake of clarity).

As in the example, one of the attributes in my ldif files was a huge base64 string (more than 10k bytes long) on a single line, which is not supported by slapadd/slapd (I checked the LDIF RFC and I don't see any mention of a 4096-byte limitation, so it could be our own implementation; not sure about this). So the idea was to split this line into 76-character lines (as the RFC recommends)... Something like this:

Original line (short version):

...
Service: SSBhbSBoYXBweSB0byBqb2luIHdpdGggeW91IHRvZGF5IGluIHdoYXQgd2lsbCBnbyBkb3duIGluIGhpc3RvcnkgYXMgdGhlIGdyZWF0ZXN0IGRlbW9uc3RyYXRpIGV2ZXJ5IHN0YXRlIGZa
...

Replaced by:

...
Service: SSBhbSBoYXBweSB0byBqb2luIHdpdGggeW91IHRvZGF5IGluIHdoYXQgd2lsbCBnbyBkb3
 duIGluIGhpc3RvcnkgYXMgdGhlIGdyZWF0ZXN0IGRlbW9uc3RyYXRpIGV2ZXJ5IHN0YXRlIGZa
...

(Notice the blank at the beginning of the second line)

My first approach was to use the Python standard library module textwrap:

import sys
import textwrap

try:
    ldiffile = sys.argv[1]
except IndexError:
    print("Input file missing... Exit")
    sys.exit(1)

with open(ldiffile, "r") as f:
    for line in f:
        if line.startswith("Service:"):
            # wrap at 76 columns, continuation lines start with a blank
            print(textwrap.fill(line, width=76, subsequent_indent=' '))
        else:
            print(line.strip())

It was really simple and produced the expected result, but it was ridiculously slow. RIDICULOUSLY. Really difficult to believe...
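
If you are curious about where the time goes, the wrapping step alone can be micro-benchmarked; a quick sketch with timeit (the long line here is synthetic, not one of my real profiles):

import textwrap
import timeit

# one synthetic long line, similar in size to the real attribute
line = "Service: " + "A" * 10000

def with_textwrap():
    return textwrap.fill(line, width=76, subsequent_indent=" ")

def with_slicing(n=76):
    # plain slicing, same idea as the fixlen() function below
    return "\n ".join(line[i:i + n] for i in range(0, len(line), n))

print("textwrap:", timeit.timeit(with_textwrap, number=100))
print("slicing: ", timeit.timeit(with_slicing, number=100))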

So I decided to do something similar, but this time I dealt with the line split myself:

import sys

try:
    ldiffile = sys.argv[1]
except IndexError:
    print("Input file missing... Exit")
    sys.exit(1)

def fixlen(s, n):
    # First line, no blank at the beginning of the line
    print(s[:n])
    s = s[n:]
    # Now, starting with a blank
    while s:
        print(" " + s[:n])
        s = s[n:]

with open(ldiffile, "r") as f:
    for line in f:
        if line.startswith("Service:"):
            fixlen(line.rstrip("\n"), 76)
        else:
            print(line.strip())

The results I got were way better, but still, it took longer than I expected... Time to give it a try with other tools: Awk and Perl.

Basically, I "translated" the Python script into Awk and Perl, almost to the letter...

Perl version:

#!/usr/bin/perl
use strict;
use warnings;

my $noindent = 0;
while (my $line = <>) {
  chomp($line);
  if ( $line =~ /^Service:/ )
  {
    # unpack "(A76)*" splits the line into 76-character chunks
    for (unpack("(A76)*", $line)) {
        if ($noindent == 0) {
            print "$_\n";
            $noindent = 1; }
        else { print " $_\n"; }
    }
    $noindent = 0;
  }
  else
  {
    $noindent = 0;
    print "$line\n";
  }
}

Awk version:

awk '
  {
  if ( $0 ~ /^Service:/ ) {
    # wrap at 76 characters; continuation lines start with a blank
    i = 1; indent = 0;
    while (i <= length($0)) {
        if ( indent == 0 ) { printf "%s\n", substr($0,i,76); i+=76; indent=1; }
        else { printf " %s\n", substr($0,i,76); i+=76; }
    }
  }
  else {
        print $0;
  }
  }' "$1"

I "faked" several input files with thousands of lines (discarding the output), executed all the scripts in Cygwin64, and then checked the time it took.. Something like this:

time python3 textwrap.py XXXX.ldif &> /dev/null
time python3 pythonv1.py XXXX.ldif &> /dev/null
time python3 pythonv2.py XXXX.ldif &> /dev/null
time perl split.pl  XXXX.ldif &> /dev/null
time ./split.awk XXXX.ldif &> /dev/null
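
As for the fake input files, a few lines of Python do the trick. A minimal sketch; the attribute layout here (everything except the long Service: line) is made up for illustration, not my real subscriber profiles:

import base64
import os
import sys

# number of fake subscriber profiles to generate (default 100000)
N = int(sys.argv[1]) if len(sys.argv) > 1 else 100000

# one long, single-line base64 attribute, roughly 10k characters
blob = base64.b64encode(os.urandom(8000)).decode()

with open("fake.ldif", "w") as f:
    for i in range(N):
        f.write("dn: uid=user%d,ou=subscribers,dc=example,dc=com\n" % i)
        f.write("uid: user%d\n" % i)
        f.write("Service: %s\n" % blob)
        f.write("\n")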

All of them generated the exact same output (except for the textwrap version, which handles the first line in a different way), but the execution times differ:

{% img center https://raw.githubusercontent.com/psgonza/bynario/master/results_table.JPG 'results_table' %}

It is easier to see in a chart:

{% img center https://raw.githubusercontent.com/psgonza/bynario/master/results_chart.JPG 'results_chart' %}

My takeaways after this small exercise:

  • Awk rocks.
  • Perl's black magic is almost as fast as Awk.
  • Python is not really fast at file processing. Yes, there are ways to improve this, by splitting into chunks, parallel processing and whatnot (see the sketch below)... But it takes more than 15 lines.
  • Stay away from the textwrap module for heavy usage.
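
For the record, this is roughly the kind of change I mean: a minimal, untimed sketch that batches the output through sys.stdout.write instead of calling print() once per line (the batch size of 10000 is arbitrary):

import sys

def fixlen(s, n=76):
    # return the wrapped block as one string instead of printing chunk by chunk
    return "\n ".join(s[i:i + n] for i in range(0, len(s), n))

def main(path):
    out = sys.stdout
    buf = []
    with open(path, "r") as f:
        for line in f:
            line = line.rstrip("\n")
            buf.append(fixlen(line) if line.startswith("Service:") else line)
            if len(buf) >= 10000:
                # flush in batches to cut down on write calls
                out.write("\n".join(buf) + "\n")
                buf = []
    if buf:
        out.write("\n".join(buf) + "\n")

if __name__ == "__main__":
    main(sys.argv[1])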

It was fun... Bye!

PS: As you can see in the results, there is a "Python v2" column which produced better results... It is a modification of the original script where I used a generator expression instead of slicing in a while loop:

import sys

try:
    ldiffile = sys.argv[1]
except IndexError:
    print("Input file missing... Exit")
    sys.exit(1)

def fixlen(s, n):
    first = True
    # generator expression yielding n-character chunks of the line
    tmp = (s[i:i + n] for i in range(0, len(s), n))
    for x in tmp:
        # First line, no blank at the beginning of the line
        if first:
            print(x)
            first = False
        else:
            # Now, starting with a blank
            print(" " + x)

with open(ldiffile, "r") as f:
    for line in f:
        if line.startswith("Service:"):
            fixlen(line.rstrip("\n"), 76)
        else:
            print(line.strip())

Issues running gpg in a container

In case it helps...

I am giving a try to this docker compose example, and for some strange reason, docker was stuck on this block of code in the Dockerfile for one of the components:

# grab gosu for easy step-down from root
ENV GOSU_VERSION 1.7
RUN set -x \
        && apt-get update && apt-get install -y --no-install-recommends ca-certificates wget && rm -rf /var/lib/apt/lists/* \
        && wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$(dpkg --print-architecture)" \
        && wget -O /usr/local/bin/gosu.asc "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$(dpkg --print-architecture).asc" \
        && export GNUPGHOME="$(mktemp -d)" \
        && gpg --keyserver ha.pool.sks-keyservers.net --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4 \
        && gpg --batch --verify /usr/local/bin/gosu.asc /usr/local/bin/gosu \
        && rm -r "$GNUPGHOME" /usr/local/bin/gosu.asc \
        && chmod +x /usr/local/bin/gosu \
        && gosu nobody true

Specifically, in this line:

gpg --keyserver ha.pool.sks-keyservers.net --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4 

It is a really simple command... I could ping the keyserver, I checked the website and everything looked OK, but it wasn't working, neither on the host machine nor inside a container... The response was the same: a timeout while getting the keys.

So... Here comes netstat to the rescue:

# netstat -natop | grep gpg
tcp        0      1 192.168.1.10:34340      104.236.209.43:11371    SYN_SENT    8653/gpg2keys_hkp    on (7,10/3/0)

SYN_SENT? So it was trying to establish the connection, but there wasn't any response from the remote host... on that port.

It turns out I recently upgraded my internet connection, and the new router allows you to customize the firewall security level (you know: low, medium, high and paranoid... yeah, I am using the last one). I noticed that port 11371 was not defined as a "known service", so I wasn't able to reach it from within my home network.

As soon as I allowed the connection to that port:

# gpg --keyserver ha.pool.sks-keyservers.net --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4
gpg: requesting key BF357DD4 from hkp server ha.pool.sks-keyservers.net
gpg: /root/.gnupg/trustdb.gpg: trustdb created
gpg: key BF357DD4: public key "Tianon Gravi <tianon@tianon.xyz>" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)

So make sure all your IP flows are open, even for such a small thing as this one...
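
For the next time: a quick TCP check against the HKP port would have confirmed the blocked flow without the netstat detective work. A minimal sketch, reusing the keyserver from this post and an arbitrary 5-second timeout:

import socket

HOST, PORT = "ha.pool.sks-keyservers.net", 11371   # standard HKP port

try:
    # if the firewall drops the SYN, this times out just like gpg did
    with socket.create_connection((HOST, PORT), timeout=5):
        print("TCP connection to %s:%d works" % (HOST, PORT))
except OSError as exc:
    print("Cannot reach %s:%d -> %s" % (HOST, PORT, exc))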

Take care out there!