Splitting Pages of a PDF in Python

For the past month, I’ve programmed in nothing but Matlab. Kinda sad, because I don’t even care for Matlab. I prefer Python.

I was browsing a store called Tuesday Morning to look at the stuff for sale. It’s a bit of a junk store. They had a DVD for sale containing every Fantastic Four comic book from 1961 to 2004 as PDFs for only $15. Being a comic fan, I had to get it.

Each issue is scanned into PDF format, with every 2 pages of the paper issue combined into 1 page of the PDF. I didn’t like this, so I decided to write a Python script to cut each page of the PDF down the middle into two pages and stitch the document back together. After the split, 1 page of the comic book equals 1 page of the PDF.
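To make "cut down the middle" concrete, here is a minimal sketch of the arithmetic on a page's bounding box. The function name split_box and the 1224 x 792 spread size are made up for illustration, and it does not touch any PDF library. It cuts at the midpoint between the left and right edges, which is the same as halving the right edge whenever the lower-left corner sits at (0, 0).

```python
def split_box(lower_left, upper_right):
    """Return (left_box, right_box) for a page cut down the middle.

    Each box is a (lower_left, upper_right) pair of (x, y) points.
    """
    x0, y0 = lower_left
    x1, y1 = upper_right
    mid_x = (x0 + x1) / 2.0  # x coordinate of the vertical cut line
    left_box = ((x0, y0), (mid_x, y1))
    right_box = ((mid_x, y0), (x1, y1))
    return left_box, right_box


# A hypothetical two-page spread, 1224 x 792 points.
left_box, right_box = split_box((0, 0), (1224, 792))
print(left_box)   # ((0, 0), (612.0, 792))
print(right_box)  # ((612.0, 0), (1224, 792))
```

The left half keeps the original left edge and gets a new right edge at the cut. The right half gets a new left edge at the cut and keeps the original right edge.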

I use the handy pyPdf library. It’s doing all of the magic in this script. To execute this script:

$ python splitPages.py InputDocument.pdf OutputDocument.pdf

(Since the comics are intellectual property owned by Marvel, I’m not going to post screenshots.)

from pyPdf import PdfFileWriter, PdfFileReader
import sys

print "Reading", sys.argv[1]

output = PdfFileWriter()

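# Open the input twice so the left and right halves come from
# separate page objects.
left = PdfFileReader(file(sys.argv[1], "rb"))
right = PdfFileReader(file(sys.argv[1], "rb"))

# Decrypt with an empty password, but only if the file is encrypted.
if left.isEncrypted:
    left.decrypt('')
    right.decrypt('')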

pages = left.getNumPages()

for i in range(0, pages):
    # Grab the left page: pull the right edge in to the middle.
    p = left.getPage(i)
    p.mediaBox.upperRight = (
        p.mediaBox.getUpperRight_x() / 2,
        p.mediaBox.getUpperRight_y()
    )
    output.addPage(p)

    # Grab the right page: push the left edge out to the middle.
    p = right.getPage(i)
    p.mediaBox.upperLeft = (
        p.mediaBox.getUpperRight_x() / 2,
        p.mediaBox.getUpperRight_y()
    )
    output.addPage(p)

print "Writing", sys.argv[2]

outputStream = file(sys.argv[2], "wb")
output.write(outputStream)
outputStream.close()
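A note on why the script opens the input file twice. This sketch rests on an assumption: asking one reader for page i twice hands back the same page object, so cropping it a second time would also change the copy already added to the output. The FakePage class and both helper functions are hypothetical stand-ins, not part of pyPdf. They only imitate that sharing with a plain Python list.

```python
class FakePage(object):
    """Hypothetical stand-in for a PDF page: a mutable box [x0, y0, x1, y1]."""

    def __init__(self, box):
        self.box = list(box)


def halves_from_one_object(page):
    """Crop the same object twice; both output entries end up identical."""
    out = []
    mid = (page.box[0] + page.box[2]) / 2.0
    page.box[2] = mid   # crop to the left half
    out.append(page)
    page.box[0] = mid   # crop the SAME object again for the right half
    out.append(page)
    return out


def halves_from_two_objects(left_page, right_page):
    """Crop two independent objects, one per half."""
    out = []
    mid = (left_page.box[0] + left_page.box[2]) / 2.0
    left_page.box[2] = mid    # left half keeps x0, new right edge
    out.append(left_page)
    right_page.box[0] = mid   # right half keeps x1, new left edge
    out.append(right_page)
    return out


spread = [0, 0, 1224, 792]
buggy = halves_from_one_object(FakePage(spread))
good = halves_from_two_objects(FakePage(spread), FakePage(spread))
print([p.box for p in buggy])  # [[612.0, 0, 612.0, 792], [612.0, 0, 612.0, 792]]
print([p.box for p in good])   # [[0, 0, 612.0, 792], [612.0, 0, 1224, 792]]
```

With a single shared object, both entries collapse to the same zero-width box. With two objects, each half keeps its own box, which is what the two readers in the script buy you.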
Posted by jcchurch