
Use Automator for manipulating Scanned PDFs (Scanners)
If you have a scanner with an automatic document feeder, it is convenient to scan double-sided pages by scanning the front sides (odds) and then the back sides (evens). I have written an Automator workflow that takes those scanned pages as input and outputs them as a single, correctly ordered PDF document.

The workflow queries for the source pages, reverses the order of the even pages using PDFtk and then combines and outputs the pages in the correct order.
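The reordering step can be sketched in Python, independently of PDFtk. This is only an illustration of the page order, not the workflow itself: the function name `interleave_scans` and the page labels are hypothetical, and it assumes the back sides were scanned last page first, so that they arrive in reverse order.

```python
def interleave_scans(odd_pages, even_pages_reversed):
    """Merge front sides with back sides that were scanned last page first."""
    # Put the back sides into reading order (2, 4, 6, ...).
    even_pages = list(reversed(even_pages_reversed))
    # A blank final back side may have been left out of the second pass.
    if len(odd_pages) - len(even_pages) not in (0, 1):
        raise ValueError("expected as many even pages as odd pages, or one fewer")
    merged = []
    for index, odd in enumerate(odd_pages):
        merged.append(odd)
        if index < len(even_pages):
            merged.append(even_pages[index])
    return merged


if __name__ == "__main__":
    fronts = ["p1", "p3", "p5"]   # first pass through the feeder
    backs = ["p6", "p4", "p2"]    # second pass, last page first
    print(interleave_scans(fronts, backs))
```

The same function also covers the case where the last sheet has a blank back side that was not scanned, since it accepts one fewer even page than odd pages.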

Requirements: PDFtk is required, although this step could potentially be replaced with a Python script. Unfortunately, building pdftk from either MacPorts or Fink is currently broken, so there is a precompiled version here. I have only tested it on OS X 10.6.2, so if you use it on other versions, please let me know in the comments if it breaks. I don't believe it will work in 10.5.x.

The Automator workflow can be downloaded from here, and I have put up a screenshot of the workflow in Automator. Please feel free to modify, copy, and redistribute the workflow, but please share your improvements here.

[crarko adds: I haven't tested this one. The Automator workflow is mirrored here.]

Authored by: robsilve on Jun 25, '10 10:29:52AM

For anyone who is interested, I wrote an AppleScript a while back that performs a similar function.
Paste everything below into the Script Editor:

-- This script is for those of us who have scanners with a document feeder, but the document feeder can only scan 1 side of a double sided page at a time. This script will set the names of scanned files to the correct order after scanning side one of a packet, then side two of a packet. Script by Bob Silverstein - silverst@orthonj.com. This is provided as donation-ware. If you find it useful, please donate what you think it is worth to my paypal account using the above email address.
(*
1- Put the document packet to be scanned in the document feeder and scan it (first page on top).
2- Remove the packet from the output bin and put it back into the document feeder with the last page on top. (You will now be scanning the last page first, and the first page of the document will be face down on the bottom of the pile. If the last page is blank or you don't want whatever is on the last page to be part of the document, remove this page from the top of the pile and don't scan it.)
3- After scanning is complete, move the files to be renumbered into an empty folder (nothing else is in that folder besides the files to be renumbered).
4- Open the window of this folder and make sure it is the front-most window. Make sure the list is sorted by name in ascending order.
5- Launch this script.

The list of files can now be dragged and dropped onto Acrobat's icon (note that this is the application that can create PDFs, not the reader) and made into a single document in the correct page order. Note that if you want Acrobat to combine all of the pages into a single document and the documents are all PDFs, this drag and drop method will not work; Acrobat will just open all of the documents rather than combining them. If you want Acrobat to be able to combine them, set your scanner software to save the files as TIFFs or JPEGs.
*)

property the_extension : ".tif"
--change the extension in quotes to the original extension of the scanned files (if your scanner software generates files with the .jpg extension, use .jpg above)
property the_filename : "file"
--the base filename can be whatever you want, as long as it is different from the original base filename of the scans (if your scanner software names the pages "page001", "page002", etc., don't set the base filename above to "page").
property num_scans : "1000"
--this is the maximum number of pages that you would scan per document. You can increase this if you need to scan more pages at once.
try
	tell application "Finder"
		set the source_folder to (folder of the front window) as alias
	end tell
end try

set the item_list to list folder source_folder without invisibles
set source_folder to source_folder as string
set num_items to number of items in the item_list

--even number of files
if num_items / 2 mod 1 is 0 then
	set first_half to (num_items / 2 as integer)
	set second_half to first_half
else
	--odd number of files
	set first_half to (num_items / 2 div 1) + 1
	set second_half to first_half - 1
end if

--first half
set increment to -1
repeat with i from 1 to first_half
	set increment to increment + 1
	set this_item to item i of the item_list
	set this_item to (source_folder & this_item) as alias
	set this_info to info for this_item
	set the current_name to the name of this_info
	set the text_item_list to every text item of the current_name
	set the new_item_name to the_filename & (num_scans + i + increment) & the_extension
	my set_item_name(this_item, new_item_name)
end repeat

--second half
set increment to 0
repeat with i from 1 to second_half
	set increment to increment + 1
	set this_item to item (num_items + 1 - i) of the item_list
	set this_item to (source_folder & this_item) as alias
	set this_info to info for this_item
	set the current_name to the name of this_info
	set the text_item_list to every text item of the current_name
	set the new_item_name to the_filename & (num_scans + i + increment) & the_extension
	my set_item_name(this_item, new_item_name)
end repeat

on set_item_name(this_item, new_item_name)
	tell application "Finder"
		activate
		try
			set the name of this_item to new_item_name
		end try
	end tell
end set_item_name
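
The numbering that the script above produces can be sketched in Python as a dry run that only computes the new names and renames nothing. The function name `renumbered_names` and its parameter names are hypothetical; `offset` plays the role of `num_scans`, and the defaults mirror the script's properties.

```python
def renumbered_names(sorted_names, base="file", offset=1000, extension=".tif"):
    """Map each scanned file (sorted by name) to its name in page order."""
    count = len(sorted_names)
    # With an odd number of files, the extra one belongs to the front sides.
    first_half = (count + 1) // 2
    plan = {}
    # Front sides were scanned in order: they become pages 1, 3, 5, ...
    for i, name in enumerate(sorted_names[:first_half]):
        plan[name] = f"{base}{offset + 2 * i + 1}{extension}"
    # Back sides were scanned last page first: walking them from the end
    # gives pages 2, 4, 6, ...
    for i, name in enumerate(reversed(sorted_names[first_half:])):
        plan[name] = f"{base}{offset + 2 * i + 2}{extension}"
    return plan


if __name__ == "__main__":
    scans = ["scan1.tif", "scan2.tif", "scan3.tif", "scan4.tif", "scan5.tif"]
    for old, new in renumbered_names(scans).items():
        print(old, "->", new)
```

Starting the numbers at 1001 keeps every name the same width, so sorting by name also sorts by page number.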



Authored by: zed on Jun 27, '10 01:15:11AM

While I'm sure the above works, and works well, I solved the problem by buying a ScanSnap S510M, which scans both sides of a page at the same time. It's supposed to OCR the document as well for PDF searching, but I haven't tried that yet.

---
www.rho.cc



Authored by: tedw on Jul 13, '10 09:42:15AM
Actually, this can be done entirely with Apple-provided technology (i.e., without using PDFtk). Make a workflow with the following actions:
  • Ask for Finder items - to select the 'odd' page scans
  • Ask for Finder items - to select the 'even' page scans
  • Run AppleScript - to sort the files into the correct list order. Use the following script:
    on run {input, parameters}
    	-- interlace files
    	set output to {}
    	set midpoint to round ((count of input) / 2) rounding up
    	repeat with i from 1 to midpoint
    
    		-- work from beginning of list for odd pages
    		set end of output to item i of input
    
    		-- work from end of list for even pages
    		set j to (count of input) - i + 1
    		if j > midpoint then
    			set end of output to item j of input
    		end if
    	end repeat
    	
    	return output
    end run
  • New PDF from images - takes the sorted file list and creates a PDF from it.
Make sure that multiple selection is enabled on each of the Ask actions, and that all the actions accept input from the previous actions.
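
For readers who want to check the ordering outside Automator, here is a Python sketch of the same interlacing idea. The function name `interlace` and the page labels are hypothetical; it assumes a single combined list with the odd-page scans first, followed by the even-page scans in reverse order, as the two Ask actions would deliver them.

```python
import math


def interlace(files):
    """Alternate items taken from the front and from the back of one list."""
    midpoint = math.ceil(len(files) / 2)
    output = []
    for i in range(midpoint):
        # Work from the beginning of the list for odd pages.
        output.append(files[i])
        # Work from the end of the list for even pages.
        j = len(files) - 1 - i
        if j >= midpoint:
            output.append(files[j])
    return output


if __name__ == "__main__":
    combined = ["p1", "p3", "p5", "p4", "p2"]  # odds, then evens reversed
    print(interlace(combined))
```

Because the midpoint is rounded up, a list with an odd number of items is treated as having one more odd page than even pages.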
Edited on Jul 13, '10 09:45:20AM by tedw


Authored by: borgo1971 on Nov 18, '10 12:44:39AM

Nice solution... But it seems to work only if I have a file for every page. Is it possible to modify the script to use two PDFs as input, one with the odd pages and one with the even pages in reverse order?



Authored by: promo1 on May 18, '12 09:49:11PM

I had a similar problem, but had the scanner make sets of PDFs, each with multiple pages in it.
The solution posted here didn't work, so I eventually ended up writing an AppleScript that *uses PDFtk* (you need to download that) and does the following:
1- asks for files containing odd pages
2- asks for files containing even pages
3- asks for output filename (defaults to a randomly named file in the same folder as the first odd page file)
4- interleaves the 1st odd pages file with the *reverse of* the 1st even pages file
5- repeats step 4 for each pair of files provided (must be the same number of odds and evens) and combines the results into one output file
6- deletes temporary files and opens the new PDF

Obviously you can use PDFtk via the command line to do the same thing; this is just a lot easier...
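
As a rough sketch of that command-line route, the PDFtk call for one pair of files can be assembled in Python like this. The helper name `shuffle_command` and the example file names are hypothetical, and it assumes a `pdftk` build that supports the `shuffle` operation; the `A Bend-1` arguments are the same ones the script below passes.

```python
import shlex


def shuffle_command(odd_pdf, even_pdf, output_pdf, pdftk="pdftk"):
    """Build the pdftk arguments that interleave A with B read backwards."""
    return [
        pdftk,
        f"A={odd_pdf}",    # handle A: the odd pages, in order
        f"B={even_pdf}",   # handle B: the even pages, scanned in reverse
        "shuffle", "A", "Bend-1",
        "output", output_pdf,
    ]


if __name__ == "__main__":
    command = shuffle_command("odd pages.pdf", "even pages.pdf", "combined.pdf")
    # Show a shell-safe version of the command; to run it instead, use
    # subprocess.run(command, check=True) with pdftk installed.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.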
---
Here's the script. It works well, but it is not very robust in terms of error checking and is kind of a hack.
----------cut and paste everything below here into AppleScript Editor and save----------
-- courtesy of the Beanlander
-- requires download of pdftk, available from http://www.pdflabs.com/docs/install-pdftk/ on 18 May 2012

on run {}
	set PDFtk to "/usr/local/bin/pdftk"
	set PDFext to ".pdf"
	
	-- get odd files
	set files_OddPages to choose file of type {"PDF ", missing value, "PDF Document"} with prompt "Choose files containing odd numbered pages" with multiple selections allowed
	
	-- get even files
	set files_EvenPages to choose file of type {"PDF ", missing value, "PDF Document"} with prompt "Choose files containing even numbered pages (to be reversed)" with multiple selections allowed
	
	-- check the same number of odd and even files
	if (count of files_OddPages) ≠ (count of files_EvenPages) then
		display dialog "Error - different number of odd and even files"
		return
	end if
	
	-- create a temporary folder for files
	tell application "Finder" to set tmpFolderLocation to container of item 1 of files_OddPages
	set tmpFolder to createTempFileDirectory(tmpFolderLocation, createRandomUniqueFilename())
	
	-- set output filename / location
	set outputFile to getOutputFile(tmpFolderLocation as alias, createRandomUniqueFilename() & PDFext)
	
	set tmpFiles to {}
	set alphabet to "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	repeat with i from 1 to count of files_OddPages
		
		-- pick the target: the final output if there is only one pair, otherwise a temporary file
		if (count of files_OddPages) = 1 then
			set pdftkTarget to (POSIX path of outputFile)
		else
			set newTempFile to createTempFile(tmpFolder, createRandomUniqueFilename(), PDFext)
			set end of tmpFiles to newTempFile
			set pdftkTarget to newTempFile
		end if
		
		-- quote every path so that file and folder names containing spaces survive the shell
		set pdftkScript to PDFtk & " A=" & (quoted form of (POSIX path of item i of files_OddPages)) & " B=" & (quoted form of (POSIX path of item i of files_EvenPages)) & " shuffle A Bend-1 output " & (quoted form of pdftkTarget)
		
		do shell script pdftkScript
	end repeat
	
	if (count of tmpFiles) > 0 then
		-- combine the temporary files into one (the alphabet limits this to 26 pairs)
		set fileHandles to ""
		set operations to "cat "
		repeat with i from 1 to count of tmpFiles
			set fileHandles to fileHandles & (item i of alphabet) & "=" & (quoted form of (item i of tmpFiles)) & " "
			set operations to operations & (item i of alphabet) & " "
		end repeat
		
		set pdftkScript to PDFtk & " " & fileHandles & " " & operations & " output " & (quoted form of (POSIX path of outputFile))
		
		do shell script pdftkScript
	end if
	
	-- erase the temporary files
	eraseTempFileDirectory(tmpFolder)
	
	-- open new PDF
	tell application "Finder" to open outputFile
	
end run

on createTempFile(theFolder, filebaseName, fileExt)
	
	set newFilename to (POSIX path of theFolder as text) & filebaseName & fileExt
	
	return newFilename
end createTempFile

on getOutputFile(defaultLocation, defaultFilename)
	
	tell application "Finder"
		set myPrompt to "Enter output filename"
		--set defaultLocation to path to desktop
		set outputFile to choose file name with prompt myPrompt default name defaultFilename default location defaultLocation
	end tell
	
	return outputFile
end getOutputFile

on eraseTempFileDirectory(tmpFolder)
	tell application "Finder"
		try
			delete tmpFolder
			do shell script "rm -rf ~/.Trash/\"" & name of tmpFolder & "\""
			
		end try
	end tell
	return
end eraseTempFileDirectory

on createTempFileDirectory(baseFolder, folderName)
	tell application "Finder"
		if not (exists folder folderName of baseFolder) then
			make new folder at baseFolder with properties {name:folderName}
		end if
		set newFolder to folder folderName of baseFolder as alias
	end tell
	return newFolder
end createTempFileDirectory


on createRandomUniqueFilename()
	set randomCharacterList to {}
	repeat with i from 1 to 8
		set end of randomCharacterList to (some item of (every character of "qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM1234567890"))
	end repeat
	
	return (randomCharacterList as text) & getDateString(current date)
	
end createRandomUniqueFilename

on getDateString(theDate)
	
	set {year:y, month:m, day:d, time string:t} to theDate
	
	set date_format to (y * 10000 + m * 100 + d) as text
	set time_format to (t) as text
	
	-- a statement split across lines needs the continuation character (¬) at each line end
	set reformated_date to (text items 3 thru 4 of date_format) ¬
		& (text items 5 thru 6 of date_format) ¬
		& (text items 7 thru 8 of date_format) ¬
		& getTimeString(time_format)
	
	return reformated_date
	
end getDateString

-- returns time in format HHMMSS in 24 hour format
-- (expects a 12-hour time string such as "9:49:11 PM")
on getTimeString(theTime)
	
	set currentDelimiter to AppleScript's text item delimiters
	
	set AppleScript's text item delimiters to ":"
	set timeComponents to (text items 1 thru 2 of theTime)
	set secondsAMPM to (text item 3 of theTime)
	
	set AppleScript's text item delimiters to " "
	set end of timeComponents to (text item 1 of secondsAMPM)
	set end of timeComponents to (text item 2 of secondsAMPM)
	
	if (last item of timeComponents) is equal to "PM" then
		set formattedString to ((item 1 of timeComponents as number) + 12) as text
	else
		set formattedString to item 1 of timeComponents
	end if
	set formattedString to formattedString & (item 2 of timeComponents) & (item 3 of timeComponents)
	
	set AppleScript's text item delimiters to currentDelimiter
	
	return formattedString
	
end getTimeString









