PDF conversion with WebKit

It had been quite some years since I last looked at the options for PDF generation in PHP, so when I needed to add PDF support to Brandfeed I did a bit of research. I ended up on this Stack Overflow thread, which overall seems to recommend TCPDF, with some fairly strong support for other libraries, including mPDF.

 

I wasn’t looking forward to trying them all out to decide which library to use, but as it turns out I didn’t have to. When I discovered wkhtmltopdf, my decision was made. (I know that sounds like a cheesy marketing testimonial, but bear with me.)

I only needed to convert HTML documents to PDF; I didn’t need to programmatically draw vectors or anything like that. That’s a pretty important prerequisite. When you consider how good HTML and CSS are for laying out a document, it makes you wonder why you’d do it any other way. The catch is that your PDF generator needs to render your HTML as well as a browser does.

wkhtmltopdf is built on WebKit which means it does exactly that – it renders the PDF as well as a browser, because it basically is a browser. It even runs the JavaScript. The fact that other libraries have their own [possibly bespoke] rendering engines with huge limitations seems pretty crazy once you’ve seen wkhtmltopdf in action.

Anyway, I thought I’d share a few things that I encountered while getting this integrated into my PHP application.

PHP Bindings

Despite my initial excitement upon discovering php-wkhtmltox (a PHP extension for wkhtmltopdf and wkhtmltoimage), I ended up not using it. My main reasons were as follows:

  • Rather opaque configuration options
  • Clumsy when using with stdin and stdout

I decided to execute the wkhtmltopdf binary via the shell and use PHP’s proc_open and related functions to pipe in my HTML and grab my PDF without faffing around with temporary files. I’m unsure of the comparative overheads in using the shell versus using a PHP extension, but it works very well regardless.
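For illustration, the piping approach could look roughly like the sketch below. It is not the code used in the application: runWithStdin() and htmlToPdf() are hypothetical helper names, and it assumes a wkhtmltopdf binary on the PATH that accepts "-" for both input and output to mean stdin and stdout.

```php
<?php
// A minimal sketch under the assumptions above, not a definitive implementation.

/**
 * Run a shell command, feed $input to its stdin and return its stdout.
 */
function runWithStdin(string $command, string $input): string
{
    $descriptors = [
        0 => ['pipe', 'r'], // the child reads its stdin from us
        1 => ['pipe', 'w'], // the child writes its stdout to us
        2 => ['pipe', 'w'], // the child writes its stderr to us
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if ($process === false) {
        throw new RuntimeException("Could not start: $command");
    }

    // Send all of the input, then close stdin so the child sees end-of-file.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);

    // Read stdout, then stderr. This sequential approach is fine for modest
    // documents; very large inputs or very chatty error output would need
    // stream_select() to avoid blocking on a full pipe.
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[2]);

    $exitCode = proc_close($process);
    if ($exitCode !== 0) {
        throw new RuntimeException("Command exited with code $exitCode: " . trim($errors));
    }

    return $output;
}

/**
 * Convert an HTML string to PDF bytes by piping it through wkhtmltopdf.
 */
function htmlToPdf(string $html, array $options = [], string $binary = 'wkhtmltopdf'): string
{
    $parts = [escapeshellarg($binary)];
    foreach ($options as $option) {
        $parts[] = escapeshellarg($option);
    }
    $parts[] = '-'; // input: read the HTML from stdin
    $parts[] = '-'; // output: write the PDF to stdout

    return runWithStdin(implode(' ', $parts), $html);
}
```

Usage would then be something like $pdf = htmlToPdf($html, ['--quiet']); followed by sending $pdf to the browser with an application/pdf content type. Keeping the generic piping in its own function means it can be tried out with any command that reads stdin, such as cat, before wkhtmltopdf is installed.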

Compiling from source

Don’t.

I’m not an expert sysadmin, but I’ve compiled plenty of software on the Linux and Mac command lines, and this did not go well. It requires first compiling the open source version of Qt, which is almost 1GB of source code and was building for over an hour before I hit Ctrl-C on my Mac. It failed first time on my Linux machines… basically I gave up.

Static binaries are available for both Linux and Mac, and they worked fine for me. I recommend saving yourself the pain and just using them, even if they’re not available for the latest version of wkhtmltopdf. (The Mac binary is a couple of versions old).

Fonts

On my Linux servers, after installing a whole bunch of X11 stuff I probably didn’t need and definitely didn’t understand, I discovered via the comments here that I needed to install the urw-fonts package. If you don’t have the fonts installed correctly you get black squares instead of glyphs.

JavaScript errors

wkhtmltopdf is WebKit, which means it executes JavaScript too. I was using the same HTML template to render PDFs as for our ‘print’ page. This page calls window.print() to invoke the print dialogue. As it turns out (and it took me some head-scratching), wkhtmltopdf dies when you call this function, and it does so without any decent explanation of the error.

Rather than muck around with my PHP template, I just passed --disable-javascript to wkhtmltopdf, and everything was fine.
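As a rough sketch of how that flag might be added when building the shell command in PHP: buildWkhtmltopdfCommand() is a hypothetical helper name, and the trailing "- -" assumes the HTML is piped in on stdin and the PDF read back from stdout.

```php
<?php
// Hypothetical helper: builds the wkhtmltopdf shell command with every
// option escaped, reading HTML from stdin and writing the PDF to stdout.
function buildWkhtmltopdfCommand(array $options): string
{
    $escaped = array_map('escapeshellarg', $options);

    return trim('wkhtmltopdf ' . implode(' ', $escaped)) . ' - -';
}

// Turn off script execution so a window.print() call in the template
// never runs during conversion.
$command = buildWkhtmltopdfCommand(['--disable-javascript']);
```

The trade-off is that nothing on the page that depends on JavaScript will render either, so this only suits templates whose scripts are not needed for layout.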