PDF Pen Does OCR and Bates Stamping

A couple of months ago, Smile On My Mac updated their PDF Pen and PDF Pen Pro applications from version 3 to version 4.  Our pal, Peter Summerill at MacLitigator, broke the news of that release here. With PDF Pen’s update to 4.0, the application gained Optical Character Recognition (OCR) functionality.

Our firm decided to purchase the PDF Pen Pro app. We opted for the Pro version because it allows you to create fillable forms – a nice feature that Apple’s Preview does not provide.

OCR

I have used the OCR functionality of PDF Pen several times since we purchased the application. It comes in very handy when answering Interrogatories, where, in Maryland, you must retype the questions when drafting your answers. Overall, I’d say it works better than I expected. It is not perfect: when copying and pasting the text, I find that I have to correct a lot of hard returns. Then again, I have yet to run into OCR software that is anywhere close to perfect, and this may well be as good as anything presently on the market. Aside from some formatting issues (like the hard returns), I must say that I have been impressed with its accuracy at recognizing characters. Based on statistics that I just made up, I put its accuracy at about 99.95%.
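For anyone who would rather not fix those hard returns by hand, here is a minimal sketch in Python of one way to clean up pasted OCR text. It is not part of PDF Pen; the function name unwrap_hard_returns is my own, and it assumes that a blank line marks a real paragraph break while a single line break inside a paragraph is just a leftover from the scan.

```python
import re


def unwrap_hard_returns(text: str) -> str:
    """Join lines split by hard returns, keeping blank-line paragraph breaks.

    Assumption: a blank line separates paragraphs; any other line break
    inside a paragraph is an artifact of the OCR'd page layout.
    """
    # Split into paragraphs on one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for paragraph in paragraphs:
        # Drop empty fragments and join the remaining lines with single spaces.
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        cleaned.append(" ".join(lines))
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    pasted = "State your full\nname and address.\n\nIdentify all\nwitnesses."
    print(unwrap_hard_returns(pasted))
```

Words that the scan hyphenated across a line break would still need a manual look; this sketch only deals with the line breaks themselves.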

The OCR engine used by PDF Pen is Tesseract, an older OCR engine that was originally developed at HP, later released as open source, and in the last year or so picked up by Google and turned into a Google Code project. According to Google:

The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available.

(As a side note, I came across Tesseract a couple of years ago while searching for a free OCR solution for the Mac. I was not successful in finding a good end-user solution, but I did find Tesseract. I was intrigued by its existence, but by far lacked the know-how to download the source code and do anything useful with it – not that I didn’t try. I am happy to see that it has picked up some steam and is being put to good use by some folks who know what they’re doing.)

Bates Stamping

Two to three weeks after we purchased the application, we got an email notifying us that there was a free update available for PDF Pen. Their email said:

Hi ,

Just wanted to let you know that PDFpen 4.0.2 and PDFpenPro 4.0.2 have just been released. The update adds Bates numbering (popular for legal documents) to the AppleScript menu.

Aside from what is included in Adobe Acrobat Professional for the Mac, Bates stamping options have been limited for the Mac. I wrote about Bates stamping on a Mac several months ago, and concluded that the best option for a Mac user was actually to use a free Windows program called A-PDF Number.

I have used the Bates stamping functionality of PDF Pen several times since we purchased it.

As their email stated, the Bates stamping functionality is implemented via the AppleScript menu.

When you select the Bates numbering script, a window appears asking for your prefix.

Because the script places the stamp in the same spot on every page, there may be times when the Bates stamp obscures some important part of the document. Fortunately, it is possible to drag the stamp to a different location on any given page. Unfortunately, there is no way to relocate the Bates stamps en masse, which could mean a lot of manual dragging if the stamp obscures something on many pages.

This is certainly a welcome enhancement to the functionality of the program, and we had decided to purchase it before it was even added. That said, while I’m grateful for its inclusion, it could be improved.

What could be improved?

Placement. It might be helpful to be able to specify from the outset a few parameters for where the stamp should go (e.g., bottom-left, bottom-middle, bottom-right; 0.25 inches from edge; etc.).

Specify Starting Number. In addition to specifying a prefix, it would be nice to be able to resume a series from where the last one left off. I know that you could simply specify a different prefix, but “I’m just sayin’ is all.”
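To make those two wishes concrete, here is a rough sketch in Python of the options I have in mind. To be clear, this is not how PDF Pen’s AppleScript works, and every name in it (BatesOptions, bates_labels, stamp_origin) is made up for illustration. It assumes a stamp is described by a corner name such as "bottom-right" and a margin in inches, and that positions are measured in PDF points (72 to the inch) from the bottom-left corner of the page.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # default PDF unit: 1/72 of an inch


@dataclass
class BatesOptions:
    """Hypothetical settings a Bates numbering dialog could ask for."""
    prefix: str
    start: int = 1                 # first number in the series
    digits: int = 6                # zero-padded width of the number
    corner: str = "bottom-right"   # "<top|bottom>-<left|middle|right>"
    margin_in: float = 0.25        # distance from the page edge, in inches


def bates_labels(opts: BatesOptions, page_count: int) -> list[str]:
    """Return one label per page, resuming the series at opts.start."""
    return [
        f"{opts.prefix}{number:0{opts.digits}d}"
        for number in range(opts.start, opts.start + page_count)
    ]


def stamp_origin(opts: BatesOptions, page_w: float, page_h: float,
                 stamp_w: float, stamp_h: float) -> tuple[float, float]:
    """Return the (x, y) of the stamp's lower-left corner, in points."""
    margin = opts.margin_in * POINTS_PER_INCH
    vertical, horizontal = opts.corner.split("-")
    x = {
        "left": margin,
        "middle": (page_w - stamp_w) / 2,
        "right": page_w - margin - stamp_w,
    }[horizontal]
    y = {
        "bottom": margin,
        "top": page_h - margin - stamp_h,
    }[vertical]
    return x, y


if __name__ == "__main__":
    opts = BatesOptions(prefix="SMITH", start=101)
    print(bates_labels(opts, 3))
    # US Letter page is 612 x 792 points; the stamp size here is a guess.
    print(stamp_origin(opts, 612, 792, 100, 12))
```

With something like this, picking up a series where the last one left off is just a matter of setting start to the last number used plus one, and moving every stamp at once is a matter of changing corner or margin_in before stamping.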

All in all, I’m happy to have the application, and if I really get ambitious, maybe I’ll make it a part of my workflow to OCR every document that gets scanned into our office. Better yet, maybe I’ll train one of our wonderful assistants to do it for me!