Monday, December 13, 2010

Extract images from PDF files using 'pdfimages'

Once in a while, I need to extract an image (or a portion thereof) from a PDF file. Usually, I open the PDF file in 'Preview' or Adobe 'Acrobat Reader' and take a screenshot, which I then crop to the desired region. This manual method works, but it is a bit tedious.

Recently, I came across the handy command-line utility program 'pdfimages' which allows for automatic extraction of all images from a PDF file. The basic usage is simply:
pdfimages   yourfile.pdf   prefix
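This will extract all images contained in yourfile.pdf to numbered files that start with the given prefix, such as prefix-000.ppm, prefix-001.ppm etc. (the exact numbering may differ between versions of the tool). Monochrome images are written as PBM files rather than PPM. With the option "-j", images stored in DCT format are saved as JPEG files instead.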
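
If you do this often, the command can be wrapped in a small shell function. The following is only a sketch: the function name extract_pdf_images and the default output directory "images" are my own assumptions, not part of pdfimages. It uses the PDF's base name as the prefix and passes "-j" so that DCT images come out as JPEG files.

```shell
# Sketch: extract the images of one PDF into a separate directory.
# extract_pdf_images is a hypothetical helper name, not part of pdfimages.
extract_pdf_images() {
    pdf=$1
    outdir=${2:-images}                 # assumed default output directory
    prefix=$(basename "$pdf" .pdf)      # "report" for "docs/report.pdf"

    mkdir -p "$outdir" || return 1
    # -j saves DCT-encoded images as JPEG; other images stay PPM/PBM
    pdfimages -j "$pdf" "$outdir/$prefix"
}
```

Calling extract_pdf_images yourfile.pdf would then put the extracted files into the "images" directory, each named with the prefix "yourfile".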