Getting the images in order

Every project is different, so having some scripting skills comes in handy. This book is over 600 pages long, so it was definitely worthwhile to create a tool to automate renaming the image files into something that approximated page order.

I worked on one signature at a time and renamed the files after I had completed scanning both sides of each leaf. That means I was left with all the odd pages in one file name range, followed by the corresponding even pages in the next range. Time for Perl!

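#!/usr/bin/perl

# Renumber a run of sequentially named scans and move them into renamed/.
# Arguments: first file, last file, first output number, increment.

use strict;
use warnings;
use File::Copy;

die "Usage: $0 firstfile lastfile startnum increment\n" unless @ARGV == 4;
my ($firstfile, $lastfile, $startnum, $increment) = @ARGV;

# Split a file name into prefix, number part, and suffix (extension).
my $name_re = qr/^(.*?)([0-9]+)(\..*)$/;
my ($pre, $input_fnumpart, $suff) = $firstfile =~ $name_re
	or die "No number found in '$firstfile'\n";
my (undef, $last_fnumpart) = $lastfile =~ $name_re
	or die "No number found in '$lastfile'\n";

print "First file number part is '$input_fnumpart'\n";
print "And is " . length($input_fnumpart) . " long\n";

# Zero-pad output numbers to the same width as the input numbers.
my $l = length($input_fnumpart);
my $sprin = "%0$l" . "d";

# List every input name from the first file to the last, inclusive.
# Looping over the numbers (not comparing names) cannot run forever.
my @inputfilenames;
for my $number ($input_fnumpart + 0 .. $last_fnumpart + 0) {
	push(@inputfilenames, $pre . sprintf($sprin, $number) . $suff);
}

print "@inputfilenames\n";

# Make sure the destination directory exists before moving anything.
-d "renamed" or mkdir "renamed" or die "Cannot create renamed/: $!\n";

for my $inputfilename (@inputfilenames) {
	# Build the new name from its parts so that only the number changes.
	my $outputfilename = $pre . sprintf($sprin, $startnum) . $suff;
	$startnum += $increment;
	print "mv $inputfilename renamed/$outputfilename\n";
	move($inputfilename, "renamed/$outputfilename")
		or warn "Could not move $inputfilename: $!\n";
}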
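
To show how two runs of the script interleave odd and even pages, here is a minimal dry-run sketch. It only prints the planned renames and touches no files. The helper name plan_renames and the scan0001.tif-style file names are made up for illustration, and it assumes the even pages were scanned in the same order as the odd ones. If they came out in reverse order, a negative increment with a higher starting number should cover that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Dry run: return [old name, new name] pairs without touching any files.
# plan_renames() is a hypothetical helper, not part of the script above.
sub plan_renames {
    my ($firstfile, $lastfile, $startnum, $increment) = @_;
    my $name_re = qr/^(.*?)([0-9]+)(\..*)$/;
    my ($pre, $first_num, $suff) = $firstfile =~ $name_re
        or die "No number found in '$firstfile'\n";
    my (undef, $last_num) = $lastfile =~ $name_re
        or die "No number found in '$lastfile'\n";
    my $format = '%0' . length($first_num) . 'd';

    my @plan;
    for my $n ($first_num + 0 .. $last_num + 0) {
        push @plan, [ $pre . sprintf($format, $n) . $suff,
                      $pre . sprintf($format, $startnum) . $suff ];
        $startnum += $increment;
    }
    return @plan;
}

# Hypothetical signature of four leaves: odd pages scanned first, then even.
my @odd  = plan_renames('scan0001.tif', 'scan0004.tif', 1, 2);
my @even = plan_renames('scan0005.tif', 'scan0008.tif', 2, 2);

print "$_->[0] -> $_->[1]\n" for @odd, @even;
```

The odd run maps its four files to pages 1, 3, 5 and 7, and the even run fills in 2, 4, 6 and 8. Several new names (scan0003.tif, for instance) already belong to files that have not been renamed yet, so renaming in place would overwrite them. Moving the results into a separate renamed/ directory, as the script does, avoids that.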

Does the script detect the page numbers in the image? No, you have to do that yourself. Good question, though. Some images are well-suited to such a process (with tesseract-ocr and imagemagick), but I didn't try it here.

The script was written for this project but would be easy enough to extend for other situations.

IrfanView makes short work of scrolling through a directory of images to check that they've been renamed correctly. Mistakes were made, but each one usually affected no more than about 32 images.