The challenge
Getting an HTML document out of a printer, for book production for example, is quite a challenge: getting the page breaks right, getting the CSS right, and so on.
CSS standards
Luckily, CSS 2.1 defines print-specific properties for this (for example: http://www.westciv.com/style_master/academy/css_tutorial/advanced/printing.html). Sadly, most HTML clients available at the moment support them incompletely or incorrectly. (See http://www.webdevout.net/browser-support-css#css2propsprint for a nice overview of support per browser. It has not been updated for Firefox 4 yet, which DOES support some of them.)
First, an example of what to include in your CSS for printing:
<style type="text/css">
@media print {
  .alwaysbreak { page-break-before: always; }
  .nobreak { page-break-inside: avoid; }
  @page :left { margin: 1cm; margin-right: 2cm; }
  @page :right { margin: 1cm; margin-left: 2cm; }
}
</style>
As you can see, I gave each left page of the book a wider right margin and each right page a wider left margin, so the wider margin always sits on the binding side. Ideal for double-sided prints for books.
I also declared a class ‘nobreak’ that makes sure the contents of any <table> or <div> element using it will not be split by a page break, but moved to the next page as a whole. Then there is ‘alwaysbreak’, which does almost the opposite: it makes sure the element's contents always start on a new page. Both could be used like this:
<div class='alwaysbreak'>
  <h1>Always at a new page</h1>
</div>
<div class='nobreak'>
  <img src='mypicture.jpg' />
  <p>Caption, should never be broken from the image by a page-break</p>
</div>
Client support
You can imagine other classes too, like ‘neverbreakafter’, which would ask the client not to put a page break directly after an element, so your contents are not broken up in an ugly manner.
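As a rough sketch of such a class: the CSS 2.1 property page-break-after: avoid asks the client not to break right after an element, and widows / orphans set the minimum number of lines that must stay together on either side of a break. The class name ‘neverbreakafter’ and the value 3 are only illustrations, and a client is free to ignore any of these properties.

```html
<style type="text/css">
@media print {
  /* Hypothetical helper class: ask the client not to break right after this element */
  .neverbreakafter { page-break-after: avoid; }

  /* Keep headings attached to the text that follows them */
  h1, h2, h3 { page-break-after: avoid; }

  /* Keep at least 3 lines of a paragraph together before and after a page break */
  p { orphans: 3; widows: 3; }
}
</style>
```

Because ‘avoid’ is a request rather than a guarantee, it is worth checking the printed result in every client you care about.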
While preparing my children’s ‘first years’ book, I tried several clients to render my HTML to a proper PDF (also see this outdated article: http://stackoverflow.com/questions/117772/which-browsers-support-page-break-manipulation-using-css-and-the-page-break-insid):
- Internet Explorer: Only respects ‘page-break-after’ and ‘page-break-before’, and not very reliably.
- Chrome: Older versions accepted some of the standard, newer versions have broken it. In other words: not reliable.
- Firefox: Since version 4, which was JUST released, it finally supports all page-break-* properties in a nice way, but not ‘widows’/‘orphans’ or the @page :left / :right margins, and rendering is flaky, to say the least.
- Prince 2: http://www.princexml.com/ It promised to be the best and *does* support everything, but the latest beta 8 renders my image-filled tables, the most important part of my books, completely wrong. Even worse: it renders the last column in the outer margin, even bleeding off the page! Also, Prince is VERY restrictive about the HTML it accepts: the input has to be completely XHTML compliant. In practice, almost no webpage is *completely* XHTML compliant. For example, http://www.google.nl is not even accepted, and it keeps complaining about ‘&nbsp;’ entities not being accepted. Of course, this last error is fixable, but c’mon! It is just not practical and too much of a ‘let the user sort it out’ approach. And then they ask money to NOT have a logo on your first page. 😦 On a side note, JavaScript support is ‘beta’, not supporting ‘innerHTML’, so jQuery
- Opera: Tested with version 11, the latest, and it supports everything! One big problem with Opera, though: when rendering images to a printer, it uses the resolution of the on-screen version of the images. Say you have an image of 1000×1000 pixels and you include it as an IMG element with width=’250px’ height=’250px’; Opera will then send it to the printer as a 250×250 pixel image too, whereas IE, Chrome and Firefox just render it as a higher-density image. So Opera is a no-go too. (And no, Print To File did not solve this either.)
- wkhtmltopdf: http://code.google.com/p/wkhtmltopdf/ IT DOES EVERYTHING! WOW! After one day of tweaking my HTML for almost every browser out there, THIS command-line tool just does it all. Be sure to get the ‘statically linked’ version; on my Windows test system it worked flawlessly. It supports higher-density images for print, page-break-inside, page-break-after etc., correct headers and footers, a title page, and generation of a TOC. Wow!
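To compare clients like this yourself, a small test page that exercises each feature makes the differences easy to spot. The following is only a sketch: it reuses the classes and the file name mypicture.jpg from the examples above, and it assumes the image file is much larger (say 1000×1000 pixels) than the 250×250 size it is displayed at. The markup is kept well-formed so that stricter, XHTML-oriented tools have a chance of accepting it too.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>Print test page</title>
<style type="text/css">
@media print {
  .alwaysbreak { page-break-before: always; }
  .nobreak { page-break-inside: avoid; }
  /* Wider margin on the binding side of each page */
  @page :left { margin: 1cm; margin-right: 2cm; }
  @page :right { margin: 1cm; margin-left: 2cm; }
}
</style>
</head>
<body>
<h1>First page</h1>
<p>Some text on the first page.</p>

<!-- Check 1: this heading should start on a new page -->
<div class="alwaysbreak">
  <h1>Always at a new page</h1>
</div>

<!-- Check 2: image and caption should stay together on one page -->
<!-- Check 3: the print should use the full resolution of the image file, not 250x250 pixels -->
<div class="nobreak">
  <img src="mypicture.jpg" width="250" height="250" alt="Density test" />
  <p>Caption, should never be broken from the image by a page-break</p>
</div>
</body>
</html>
```

Print this page (or convert it to PDF) in each client and check the three points marked in the comments, plus whether the left and right page margins alternate.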