General
- PDF is a completely different technology from, for example, HTML. To produce quality PDF documents, use LaTeX or another dedicated application; HTML is not well suited to PDF creation.
- app-text/mbtpdfasm can extract parts of a PDF document. For example, to copy pages 7 to 10 of origin-file.pdf into destination-file.pdf:
mbtPdfAsm -morigin-file.pdf -p"7;8;9;10" -ddestination-file.pdf
- app-text/pdftk (PDF Toolkit) also looks interesting for these kinds of manipulations.
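When the same extraction has to be repeated for many files, it can help to build the command lines in a script. The following is a minimal sketch, not a tested workflow: it only assembles the argument lists and does not run anything. The mbtPdfAsm flags (-m, -p, -d) are copied from the example above; the pdftk form assumes its `cat <range> output <file>` syntax. The function names `mbtpdfasm_cmd` and `pdftk_cmd` are hypothetical helpers introduced here for illustration.

```python
import shlex


def mbtpdfasm_cmd(src, pages, dest):
    """Build the mbtPdfAsm command from the example above (flags -m, -p, -d)."""
    # mbtPdfAsm takes the page numbers as a semicolon-separated list.
    page_list = ";".join(str(p) for p in pages)
    return ["mbtPdfAsm", f"-m{src}", f"-p{page_list}", f"-d{dest}"]


def pdftk_cmd(src, first, last, dest):
    """Build a pdftk command that copies pages first..last of src into dest.

    Assumes pdftk's `cat <range> output <file>` form.
    """
    return ["pdftk", src, "cat", f"{first}-{last}", "output", dest]


if __name__ == "__main__":
    # Print shell-quoted commands for inspection; nothing is executed here.
    print(shlex.join(mbtpdfasm_cmd("origin-file.pdf", range(7, 11), "destination-file.pdf")))
    print(shlex.join(pdftk_cmd("origin-file.pdf", 7, 10, "destination-file.pdf")))
```

To actually run one of these, the returned list can be passed to `subprocess.run()`. Passing a list rather than a single string avoids having to quote the semicolons in the page list for the shell.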
Converting from (X)HTML + CSS
- Converting an HTML document to PDF is not easy. Do not choose HTML as a view technology if you need to output PDF or print your document. Nevertheless, several libraries exist that convert HTML documents to PDF.
- Be sure to choose a large font size in your HTML document. I set a base font-size of 20px on the body. Use relative font sizes in the rest of your XHTML document (e.g., font-size: 0.85em;).
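Because an em value for font-size is relative to the parent element's font size, relative sizes compound when elements are nested. The sketch below works through that arithmetic for the 20px base and the 0.85em example mentioned above: one level gives 17px, and a 0.85em element nested inside another 0.85em element gives about 14.45px. The helper name `effective_px` is hypothetical and exists only for this illustration.

```python
def effective_px(base_px, *em_factors):
    """Resolve nested relative font sizes to pixels.

    Each em factor multiplies the font size inherited from the parent.
    """
    size = base_px
    for factor in em_factors:
        size *= factor
    return size


if __name__ == "__main__":
    base = 20  # body font-size in px, as in the note above
    # One level: an element set to 0.85em directly under the body.
    print(round(effective_px(base, 0.85), 2))
    # Two levels: a 0.85em element nested inside another 0.85em element.
    print(round(effective_px(base, 0.85, 0.85), 2))
```

This is a quick way to check that deeply nested text does not end up too small in the converted document.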
HTML2PDF
- This PHP library (originating from a Perl port) is excellent. It converts XHTML documents with CSS properties remarkably well. However, it has the following caveats:
- Make sure your CSS is perfectly valid. Sometimes you will need to specify a clear property explicitly. With a bit of effort, your output document will be very close to the original Firefox rendering.
- Be careful with images. html2pdf does not support transparency. You should also use high-quality (very large) images in your HTML document and set the actual size of each one via CSS properties.
- You can use either the FPDF or the PDFLIB library to output to PDF. I tried both and saw no noticeable difference (although the pdflib package must be installed separately in Gentoo).
- Home page for this project. Several other PHP projects exist, but I don't know whether they handle CSS 2.1 correctly.