author | Ken Sharp <ken.sharp@artifex.com> | 2022-05-08 15:13:16 +0100
committer | Ken Sharp <ken.sharp@artifex.com> | 2022-05-10 11:27:36 +0100
commit | 398bfc844bde6e2b2a4f6552ce326ad619471316
tree | 88a4a82e5f1b00ac5654186c16d7c0afe1058d3f /doc
parent | bdc105a686f0c8fa1e29312302091685d27a9464
GhostPDF - revamp PDF information extraction
A customer requested that we make pdf_info.ps work with the new
PDF interpreter, and generate the same information.
This commit changes the way we extract information on a
page-by-page basis so that it can also include the names of spot
inks and information about the fonts used on the page.
This information is now returned to the PostScript environment in a
PDF dictionary instead of a C structure. The pdf_info.ps program has
been updated so that it uses the new information in broadly the
same way as the information from the old PDF interpreter.
There are differences. With the old interpreter, pdf_info.ps extracted
font information itself rather than having the interpreter do it. That
is not possible with the new interpreter, which is why we have the
PDF interpreter do it for us. In addition, the pdf_info.ps program
only descends to the page level, whereas the new PDF interpreter
evaluates all objects on the page, so more fonts (and, technically,
spot inks) may be detected.
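To illustrate why evaluating every object on the page can find more fonts than a page-level scan, here is a minimal Python sketch. It assumes PDF objects are modelled as plain Python dicts; the helpers `fonts_page_level` and `fonts_deep` are hypothetical and are not part of Ghostscript, GhostPDF or pdf_info.ps. The first roughly mirrors stopping at the page's own /Font resources; the second follows /Resources into Form XObjects, Patterns and Type 3 fonts.

```python
"""Toy model: page-level font scan versus a recursive one.

Assumption: PDF objects are modelled as plain Python dicts. This is not
Ghostscript's internal representation, only an illustration of why a
deeper walk can report more fonts.
"""


def font_name(font):
    # Type 3 fonts need not have /BaseFont, so fall back to a placeholder.
    return font.get("BaseFont", "(unnamed Type 3)")


def fonts_page_level(page):
    """Look only at the page's own /Font resources."""
    fonts = page.get("Resources", {}).get("Font", {})
    return sorted({font_name(f) for f in fonts.values()})


def fonts_deep(page):
    """Also descend into Forms, Patterns and Type 3 fonts with /Resources."""
    found, seen = set(), set()

    def walk(resources):
        if id(resources) in seen:  # shared or cyclic resource dictionaries
            return
        seen.add(id(resources))
        for font in resources.get("Font", {}).values():
            found.add(font_name(font))
            if "Resources" in font:  # Type 3 glyph procedures
                walk(font["Resources"])
        for category in ("XObject", "Pattern"):
            for obj in resources.get(category, {}).values():
                if "Resources" in obj:  # Form XObjects, tiling Patterns
                    walk(obj["Resources"])

    walk(page.get("Resources", {}))
    return sorted(found)


# Hypothetical page: one font used directly, one only inside a Form XObject.
page = {
    "Resources": {
        "Font": {"F1": {"Subtype": "Type1", "BaseFont": "Helvetica"}},
        "XObject": {
            "Fm1": {"Subtype": "Form",
                    "Resources": {"Font": {"F2": {"Subtype": "Type1",
                                                  "BaseFont": "Times-Roman"}}}},
            "Im1": {"Subtype": "Image"},
        },
    }
}

print(fonts_page_level(page))  # ['Helvetica']
print(fonts_deep(page))        # ['Helvetica', 'Times-Roman']
```

The `seen` set is a design choice for the sketch: resource dictionaries can be shared between objects, so tracking what has already been visited avoids repeated work and infinite recursion.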
We now have an additional PostScript operator '.PDFPageInfoExt'
which returns 'extended' information about a page. This is the
same as .PDFPageInfo but includes the font and spot ink
information.
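As a rough illustration of consuming the extended information, the sketch below summarises the font entries. It assumes the PostScript dictionary has already been converted into a Python dict using the keys documented for .PDFPageInfoExt in doc/Language.htm (/Spots, /Fonts, and per font /BaseFont, /Subtype, /ObjectNum, /Embedded, plus /Descendants for Type 0 fonts). That conversion step, the `describe_fonts` helper and the sample values are hypothetical; this is not how pdf_info.ps is implemented.

```python
"""Sketch: summarising the font entries of a .PDFPageInfoExt-style result.

Assumption: the PostScript dictionary has already been converted into a
Python dict with the documented keys; the conversion is not shown and is
not a Ghostscript API.
"""


def describe_fonts(page_info):
    """Return one line per font; Type 0 descendants are indented."""
    lines = []

    def emit(font, depth):
        # /ObjectNum is absent for inline fonts, which have no object number.
        obj = font.get("ObjectNum")
        parts = [font["Subtype"], f"object {obj}" if obj is not None else "inline"]
        if "Embedded" in font:
            parts.append("embedded" if font["Embedded"] else "not embedded")
        lines.append("  " * depth + f"{font['BaseFont']} ({', '.join(parts)})")
        # Type 0 fonts carry a one-element /Descendants array.
        for child in font.get("Descendants", []):
            emit(child, depth + 1)

    for font in page_info.get("Fonts", []):
        emit(font, 0)
    return lines


# Hypothetical values, for illustration only.
info = {
    "Spots": ["PANTONE 185 C"],
    "Fonts": [
        {"BaseFont": "Helvetica", "Subtype": "Type1", "ObjectNum": 12, "Embedded": False},
        {"BaseFont": "ABCDEF+NotoSans", "Subtype": "Type0", "ObjectNum": 20, "Embedded": False,
         "Descendants": [{"BaseFont": "ABCDEF+NotoSans", "Subtype": "CIDFontType2",
                          "ObjectNum": 21, "Embedded": True}]},
    ],
}

for line in describe_fonts(info):
    print(line)
# Helvetica (Type1, object 12, not embedded)
# ABCDEF+NotoSans (Type0, object 20, not embedded)
#   ABCDEF+NotoSans (CIDFontType2, object 21, embedded)
```

Because a Type 0 font has no FontDescriptor of its own, its embedding status is only meaningful on the descendant, which is why the sketch prints descendants on their own indented lines.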
Running with -dPDFINFO using either Ghostscript or GhostPDF will
print more information than before: the spot inks, and considerably
more information about fonts than the pdf_info.ps program emits,
including embedding status, descendant fonts (and their embedding
status) and the presence of ToUnicode CMaps.
Updated documentation for all of the above.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/Language.htm | 22
-rw-r--r-- | doc/Use.htm | 21
2 files changed, 39 insertions, 4 deletions
diff --git a/doc/Language.htm b/doc/Language.htm
index 6ed12d3a3..44c569d96 100644
--- a/doc/Language.htm
+++ b/doc/Language.htm
@@ -2219,14 +2219,13 @@ This function needs to write any required output intents, load and send Outlines and Keywords from the Info dict to the output device, copy Optional Content Properties (OCProperties) to the output device. If an AcroForm is present send all its fields and link widget annotations to fields, and finally copy the PageLabels. If we add support for anything else, it will be here too..
 </dd><dt><code>PDFcontext int .PDFPageInfo -</code></dt>
-<dd> The integer argument is the page number to retrieve information for.
+<dd> The integer argument is the page number to retrieve information for. This value starts from zero for the first page.
 Returns a dictionary with the following key/value pairs:
 <blockquote>
  <code>/UsesTransparency</code> true|false<br>
- <code>/SpotColours</code> array of names, may be empty|<br>
+ <code>/NumSpots</code> integer containing the number of spot inks on this page<br>
  <code>/MediaBox</code> [llx lly urx ury]<br>
  <code>/HasAnnots</code> true|false<br>
- <code>/FontsUsed</code> array of names, may be empty.<br>
 </blockquote>
 May also contain (if they are present in the Page dictionary)
 <blockquote>
@@ -2235,6 +2234,23 @@ May also contain (if they are present in the Page dictionary)
  <code>/BleedBox</code> [llx lly urx ury]<br>
  <code>/TrimBox</code> [llx lly urx ury]<br>
  <code>/UserUnit</code> int<br>
+ <code>/Rotate</code> number<br>
+</blockquote>
+</dd>
+</dd><dt><code>PDFcontext int .PDFPageInfoExt -</code></dt>
+<dd> As per .PDFPageInfo above but returns 'Extended' information. This consists of two additional arrays in the returned dictionary:
+<blockquote>
+ <code>/Spots</code> array of names, may be empty<br>
+ <code>/Fonts</code> array of dictionaries, one dictionary per font used on the page.
+</blockquote>
+Each font dictionary contains
+<blockquote>
+ <code>/BaseFont</code> string containing the name of the font.<br>
+ <code>/Subtype</code> string containing the type of the font, as per the PDF Reference.<br>
+ <code>/ObjectNum</code> If present, the object number of the font in the file (fonts may be defined inline and have no object number).<br>
+ <code>/Embedded</code> boolean indicating if the font's FontDescriptor includes a FontFile and is therefore embedded.<br>
+ Type 0 fonts also contain <br>
+ <code>/Descendants</code> An array containing a single font dictionary, contents as above.<br>
 </blockquote>
 </dd>
 <dt><code>PDFcontext int .PDFDrawPage -</code></dt>
diff --git a/doc/Use.htm b/doc/Use.htm
index 263bc5cef..33e4f679b 100644
--- a/doc/Use.htm
+++ b/doc/Use.htm
@@ -630,7 +630,26 @@ when true.</p>
 </dd>
 </dl>
- <dl>
+ <d1>
+<dt><code>-dPDFINFO</code></dt>
+<dd>Starting with release 9.56.0 this new switch will work with the PDF interpreter (GhostPDF) and with
+the PDF interpreter integrated into Ghostscript. When this switch is set the interpreter will emit
+information regarding the file, similar to that produced by the old pdf_info.ps program in the 'lib'
+folder.
+<p>
+The format is not entirely the same, and the search for fonts and spot colours is 'deeper' than the
+old program; pdf_info.ps stops at the page level whereas the PDFINFO switch will descend into objects
+such as Forms, Images, type 3 fonts and Patterns. In addition different instances of fonts with the
+same name are now enumerated.
+</p>
+<p>
+Unlike the pdf_info.ps program there is no need to add the input file to the list of permitted files
+for reading (using --permit-file-read).
+</p>
+</dd>
+</d1>
+
+<dl>
 <dt><code>-dPDFFitPage</code></dt>
 <dd>Rather than selecting a PageSize given by the PDF MediaBox, BleedBox (see -dUseBleedBox), TrimBox (see -dUseTrimBox), ArtBox (see -dUseArtBox), or CropBox (see -dUseCropBox),