Latest revision as of 11:38, 26 July 2024

General

PDF is a completely different technology from, for example, HTML. To produce quality PDF documents, use LaTeX or a similar application; HTML is not well suited to PDF creation.

Tools for working with PDF files

Current tools

Xournal++ (xournalpp in Portage) is an excellent tool for filling in a PDF document or adding text and images to it.

pdfarranger (in Portage) is a good tool to extract or rearrange pages of a PDF. It has a GUI.

qpdf can be used to extract pages and to concatenate files. For instance:

qpdf --pages input-1.pdf 1-3 -- input-1.pdf output.pdf
qpdf --pages input-1.pdf 1-3 input-2.pdf 6-12 -- input-1.pdf output.pdf
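
When no single input should act as the primary file, qpdf's --empty option can take its place, so the output starts from an empty document and only receives the listed pages. A minimal sketch, assuming qpdf is installed; the function name merge_pdfs is hypothetical:

```shell
# merge_pdfs OUTPUT INPUT [RANGE] [INPUT [RANGE]]...
# Hypothetical helper: builds a new PDF from the listed inputs.
# --empty starts from an empty document instead of a primary input file;
# everything between --pages and -- is a list of files with optional page ranges.
merge_pdfs() {
    out=$1
    shift
    qpdf --empty --pages "$@" -- "$out"
}
```

It could be called the same way as the examples above:

merge_pdfs output.pdf input-1.pdf 1-3 input-2.pdf 6-12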

pdfimages, used with the -i option, extracts the images from a PDF file, so you can check whether their sizes seem reasonable. pdfimages is part of the poppler package.
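
To check image sizes without writing any files, poppler's pdfimages also has a -list option that prints one row per embedded image (page, type, width, height and so on), and a -png option that writes the images as PNG files. A small sketch, assuming a reasonably recent poppler; both function names are hypothetical:

```shell
# list_pdf_images FILE
# Print a table describing the images embedded in FILE (nothing is written to disk).
list_pdf_images() {
    pdfimages -list "$1"
}

# extract_pdf_images FILE PREFIX
# Write the embedded images as PNG files named PREFIX-000.png, PREFIX-001.png, ...
extract_pdf_images() {
    pdfimages -png "$1" "$2"
}
```

For example:

list_pdf_images sample.pdf
extract_pdf_images sample.pdf img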

You can open a (single-page) PDF with GIMP and easily convert it to a PNG. To convert several images (PNGs, JPEGs) to a PDF, use ImageMagick:

convert pages-*.jpg output.pdf
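
ImageMagick 7 provides a magick command and treats convert as a legacy name, so a script may want to accept either. A sketch under that assumption; images_to_pdf is a hypothetical name:

```shell
# images_to_pdf OUTPUT IMAGE...
# Hypothetical helper: combine the given images into one PDF, one image per page.
# Prefers the ImageMagick 7 "magick" command and falls back to the older "convert".
images_to_pdf() {
    out=$1
    shift
    if command -v magick >/dev/null 2>&1; then
        magick "$@" "$out"
    else
        convert "$@" "$out"
    fi
}
```

For example:

images_to_pdf output.pdf pages-*.jpg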

Obsolete packages

app-text/pdftk (PDF Toolkit) is great for extraction, concatenation and other manipulations. For instance, to concatenate two PDFs:

pdftk in1.pdf in2.pdf cat output out1.pdf

To extract a single page:

pdftk A=sample.pdf cat A18 output extract.pdf

To extract several pages:

pdftk A=sample.pdf cat A3-8 output extract.pdf
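
Since pdftk is listed here as obsolete, the same page extractions can be sketched with qpdf from the current tools. This assumes qpdf is installed; extract_pages is a hypothetical helper name:

```shell
# extract_pages INPUT RANGE OUTPUT
# Hypothetical helper: copy the pages in RANGE from INPUT into a new file OUTPUT.
# RANGE uses qpdf's page-range syntax, for example 18 or 3-8.
extract_pages() {
    qpdf --empty --pages "$1" "$2" -- "$3"
}
```

The two pdftk extractions above would then read:

extract_pages sample.pdf 18 extract.pdf
extract_pages sample.pdf 3-8 extract.pdf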

app-text/mbtpdfasm allows you to extract parts of a PDF document:

mbtPdfAsm -morigin-file.pdf -p"7;8;9;10" -ddestination-file.pdf

However, no stable version is available in the main Portage tree.

Converting from (X)HTML + CSS

Converting an HTML document to a PDF one is not easy. HTML should not be chosen as a view technology if you need to output to a PDF or print your document. However, various libraries still exist to convert an HTML document to PDF.

Be sure to choose a large font size in your HTML document. I set a base font-size of 20px on the body. Use relative font sizes elsewhere in your XHTML document (e.g. font-size: 0.85em;).

HTML2PDF

This PHP library (coming from a Perl port) is quite good, although development seems to have stopped. It converts XHTML documents with CSS properties remarkably well. However, it has the following caveats:
- Make sure your CSS is perfectly valid. Sometimes you will need to specify a clear property explicitly. With a bit of effort, your output document will be very close to the original Firefox rendering.
- You can use either the FPDF or PDFLIB library to output to PDF. I tried both but did not see any noticeable difference (although the pdflib package must be installed separately in Gentoo).
Home page for this project. Several other projects exist in PHP but I don't know if they handle CSS 2.1 correctly.

Current problems

html2pdf does not yet support (PNG) transparency. You should also use high-quality (very large) images in your HTML document, giving their actual display size via CSS properties.
CSS properties with !important set do not seem to work.
position: absolute does not work well (top: 0px and bottom: 0px are not supported).

Additional configuration

In html2ps.config, you can define additional media (and then use this media type via the media parameter). This lets you define a custom height and width for the generated PDF.

Pisa

Pisa is a Python library. At this stage, it seems to lack a lot of CSS 2.1 properties. Worth keeping an eye on, though.

Flying Saucer

Flying Saucer is an open-source Java XHTML + CSS to PDF renderer. It already seems to be very good (it supports transparent PNG images), and development is active. This is probably one of the best free solutions out there, and its progress should be watched regularly.
However, it currently (R8) has problems with some CSS properties (top: 0px and bottom: 0px are not supported).

Prince

The best converter available today. Really impressive: I have not found a single problem so far. However, it is not open source.
