Begin documenting the new PDF interpreter

Document its existence! Add notes about NEWPDF. Document those PostScript functions that we (or I) have promised to maintain. Still to do: document the new PostScript operators for the new interpreter, and come up with some example programs.
author: Ken Sharp <ken.sharp@artifex.com> 2021-08-17 15:51:33 +0100
committer: Ken Sharp <ken.sharp@artifex.com> 2021-08-18 08:12:26 +0100
commit: 319a890c0c7f3d79d1ed59970702e7fd9b44cf5e (patch)
tree: ab736357480c2e64f890db027f1a993266cac594 /doc
parent: 698ad8098d5aaafb65d52f14ea887aa4644b1348 (diff)
download: ghostpdl-319a890c0c7f3d79d1ed59970702e7fd9b44cf5e.tar.gz
3 files changed, 187 insertions, 12 deletions
diff --git a/doc/Language.htm b/doc/Language.htm
index 1922893e5..ce60d84b9 100644
--- a/doc/Language.htm
+++ b/doc/Language.htm
@@ -102,6 +102,7 @@
 <li><a href="#GlyphNames2Unicode">GlyphNames2Unicode</a></li>
 <li><a href="#MultipleResourceDirectories">Multiple Resource directories</a></li>
 </ul>
+<li><a href="#PDF_scripting">Scripting the PDF interpreter</a></li>
 </ul></blockquote>
 
 <!-- [1.2 end table of contents] =========================================== -->
@@ -2044,6 +2045,159 @@ specifying an absolute path. The default value for
 a default invocation with a PostScript installer
 will install resource files into <code>/gs/Resource</code>.</p>
 
+<h2><a name="PDF_scripting"></a>Scripting the PDF interpreter</h2>
+
+<p>We have not previously documented the internals of the Ghostscript PDF interpreter, but we have, on
+occasion, provided solutions that rely upon scripting the interpreter from PostScript. This was
+possible because the interpreter was written in PostScript.</p>
+
+<p>From release 9.55.0 Ghostscript comes supplied with two PDF interpreters, the original written in PostScript
+and a brand-new interpreter written in C. While the new interpreter can be run as part of the GhostPDL family
+it has also been integrated into Ghostscript, and can be run from the PostScript environment in a similar fashion
+to the old interpreter. Eventually we plan to drop the old interpreter and carry on with the new one.</p>
+
+<p>Because we have supplied solutions in the past based on the old interpreter, we have had to implement
+the same capabilities in the integration of the new interpreter. Since this has meant discovering which internal
+portions were being used, working out how those function, and duplicating them anew, it seemed a good time to
+document these officially, so that in future the functionality would be available to all.</p>
+
+<p>The following functions existed in the original PDF interpreter and have been replicated for the new
+interpreter. It should be possible to use these for the foreseeable future.</p>
+
+<dt><code>&lt;file&gt; runpdf -</code></dt>
+<dd>     Called from the modified PostScript run operator (which copies stdin to a temp
+     file if required). Checks for PDF collections, processes all requested pages.</dd>
+
+<p><dt><code>&lt;file&gt; runpdfbegin -</code></dt>
+<dd>     This must be called before performing any further operations. Its exact action depends on which
+interpreter is being used, but it essentially sets up the environment to process the file as a PDF.</dd></p>
+
+<p><dt><code>&lt;int&gt; pdfgetpage &lt;pagedict&gt; | &lt;null&gt;</code></dt>
+<dd>     int is a number from 1 to N indicating the desired page number from
+     the PDF file. Returns a dictionary containing various informational key/value pairs.
+     If this fails, returns a null object.</dd></p>
+
+<p><dt><code> - pdfshowpage_init -</code></dt>
+<dd>     In the PostScript PDF interpreter this simply adds 1 to the /DSCPageCount value in a dictionary.
+It has no effect in the new PDF interpreter but is maintained for backwards compatibility.</dd></p>
+
+<p><dt><code>&lt;pagedict&gt; pdfshowpage_setpage &lt;pagedict&gt;</code></dt>
+<dd>     Takes a dictionary as returned from pdfgetpage, extracts various
+     parameters from it, and sets the media size for the page, taking into
+     account the boxes, and requested Box, Rotate value and PDFFitPage.</dd></p>
+
+<p><dt><code>&lt;pagedict&gt; pdfshowpage_finish -</code></dt>
+<dd>     Takes a dictionary as returned from pdfgetpage, renders the page content
+     and executes showpage to transfer the rendered content to the device.</dd></p>
+
+<p><dt><code>- runpdfend        -</code></dt>
+<dd>     Terminates the PDF processing, executes restore and various cleanup activities.</dd></p>
+
+<p><dt><code>&lt;file&gt; pdfopen &lt;dict&gt;</code></dt>
+<dd>     Open a PDF file and read the header, trailer
+     and cross-reference.</dd></p>
+
+<p><dt><code>&lt;dict&gt; pdfclose -</code></dt>
+<dd>     Terminates processing the original PDF file object. The dictionary parameter
+ should be the one returned from pdfopen.</dd></p>
+
+<p><dt><code>&lt;pagedict&gt; pdfshowpage -</code></dt>
+<dd>     Takes a dictionary returned from pdfgetpage and calls the pdfshowpage_init,
+     pdfshowpage_setpage, pdfshowpage_finish trio to start the page, set up the
+     media and render the page.</dd></p>
+
+<p><dt><code>&lt;int&gt; &lt;int&gt; dopdfpages -</code></dt>
+<dd>     The integers are the first and last pages to be run from the file. Runs a loop from
+     the first integer to the last. NOTE! If the current dictionary contains a PDFPageList
+     array then we 'get' the entry from the array corresponding to the current loop
+     index, and use that to determine whether we should draw the page. Otherwise we
+     simply draw the page. Uses pdfshowpage to actually render the page.</dd></p>
+
+<p><dt><code>- runpdfpagerange &lt;int&gt; &lt;int&gt;</code></dt>
+<dd>     Processes the PostScript /FirstPage, /LastPage and /PageList parameters to build an array
+     of page numbers to run (if PageList is present) and a FirstPage and LastPage value.</dd></p>
+
+<p>Normal operation simply calls runpdf with an opened-for-read PostScript file object. The table below shows the normal
+calling sequence.</p>
+
+<blockquote><table>
+<tr valign="bottom">
+    <th align="left">Function</th>
+    <th>&nbsp;&nbsp;</th>
+    <th align="left">Calls</th>
+    <th>&nbsp;&nbsp;</th>
+    <th align="left">Calls</th>
+    <th>&nbsp;&nbsp;</th>
+    <th align="left">Calls</th></tr>
+<tr valign="top">
+    <td>runpdf</td>
+    <td>&nbsp;</td>
+    <td>runpdfbegin</td>
+    <td>&nbsp;</td>
+    <td>pdfopen</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td></tr>
+<tr valign="top">
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>process_trailer_attrs</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td></tr>
+<tr valign="top">
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>runpdfpagerange</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td></tr>
+<tr valign="top">
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>dopdfpages</td>
+    <td>&nbsp;</td>
+    <td>pdfgetpage</td>
+    <td>&nbsp;</td></tr>
+<tr valign="top">
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>pdfshowpage</td>
+    <td>&nbsp;</td>
+    <td>pdfshowpage_init</td></tr>
+<tr valign="top">
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>pdfshowpage_setpage</td></tr>
+<tr valign="top">
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>pdfshowpage_finish</td></tr>
+<tr valign="top">
+    <td>&nbsp;</td>
+    <td>&nbsp;</td>
+    <td>runpdfend</td>
+    <td>&nbsp;</td>
+    <td>pdfclose</td>
+    <td>&nbsp;</td></tr>
+</table></blockquote>
+
+<p>It is important to get the number of spots and the presence of transparency correct when
+rendering. Failure to do so will lead to odd output, and potentially crashes. This can be important in situations
+such as N-up ordering.</p>
+<p>As an example, if we have 2 A4 pages and want to render them side-by-side on A3 media, we might set up
+the media size to A3, draw the first page contents, translate the origin, draw the second page contents
+and then render the final content. If the first PDF page did not contain transparency, but the second did, it
+would be necessary to set /PageHasTransparency before drawing the first PDF page.</p>
 <!-- [2.0 end contents] ==================================================== -->
 
 <!-- [3.0 begin visible trailer] =========================================== -->
diff --git a/doc/Use.htm b/doc/Use.htm
index 869575b91..777610bf9 100644
--- a/doc/Use.htm
+++ b/doc/Use.htm
@@ -619,6 +619,20 @@ Ghostscript is normally built to interpret both PostScript and PDF files, examin
 <p>Here are some command line options specific to PDF</p>
 
  <dl>
+<dt><code>-dNEWPDF</code></dt>
+<dd>From release 9.55.0 Ghostscript incorporates two complete PDF interpreters; the original
+long-standing interpreter is written in PostScript but there is now a new interpreter written
+in C.
+<p>At present the old PostScript-based interpreter remains the default; in future releases the
+new C-based interpreter will become the default. In the meantime we would encourage people to experiment
+with the new interpreter and send us feedback. While there are two interpreters the command-line
+switch NEWPDF selects the existing interpreter when false and the new interpreter
+when true.</p>
+
+</dd>
+</dl>
+
+ <dl>
 <dt><code>-dPDFFitPage</code></dt>
 <dd>Rather than selecting a PageSize given by the PDF MediaBox, BleedBox (see -dUseBleedBox),
 TrimBox (see -dUseTrimBox), ArtBox (see -dUseArtBox), or CropBox (see -dUseCropBox),
diff --git a/doc/WhatIsGS.htm b/doc/WhatIsGS.htm
index ed897ad2a..046ca51c1 100644
--- a/doc/WhatIsGS.htm
+++ b/doc/WhatIsGS.htm
@@ -77,27 +77,34 @@ There are various products in the Ghostscript family; this document describes wh
 
 <!-- [2.0 begin contents] ================================================== -->
 
-<h2><a name="Ghostscript/GhostPDF"></a>Ghostscript/GhostPDF</h2>
+<h2><a name="Ghostscript"></a>Ghostscript</h2>
 
 <p>Ghostscript is an interpreter for PostScript<a href="#foot1">&#174;</a> and Portable Document Format (PDF) files.</p>
 
 <p>Ghostscript consists of a PostScript interpreter layer, and a graphics library. The graphics library is shared with all the other products in the Ghostscript family, so all of these technologies are sometimes referred to as Ghostscript, rather than the more correct GhostPDL.</p>
 
-<p>GhostPDF is an interpreter built on top of Ghostscript to handle PDF files.
-Currently GhostPDF relies on extensions to the PostScript language/imaging model,
-and so cannot be used independently of the Ghostscript PostScript interpreter
-component. As such GhostPDF is an umbrella term used to refer to both these
-extensions and the interpreter code.</p>
-
-<p>Many people (including the authors) frequently just refer to Ghostscript
-as supporting PDF and only specifically mention GhostPDF when wanting to make
-the distinction between the PostScript and PDF support.</p>
-
-<p>Binaries for Ghostscript and GhostPDF (included in the Ghostscript binaries) for various systems can be downloaded from <a href="http://www.ghostscript.com/download">here</a>.
+<p>Binaries for Ghostscript and (see below) GhostPDF (included in the Ghostscript binaries) for various systems can be downloaded from <a href="http://www.ghostscript.com/download">here</a>.
 The source can be found in both the Ghostscript and GhostPDL downloads from
 the same site.</p>
 <hr>
 
+<h2><a name="GhostPDF"></a>GhostPDF</h2>
+
+<p>Prior to release 9.55.0 GhostPDF was an interpreter for the PDF page description language
+built on top of Ghostscript, and written in the PostScript programming language. From 9.55.0
+onwards there is a new GhostPDF executable, separate from Ghostscript and written in C
+rather than PostScript.</p>
+
+<p>This new interpreter has also been integrated into Ghostscript itself, in order to
+preserve the PDF functionality of that interpreter. For now, the old PostScript-based
+interpreter remains the default, but the new interpreter is built-in alongside it.</p>
+
+<p>The intention is that the new interpreter will replace the old one, which will be withdrawn.</p>
+
+<p>It is possible to control which interpreter is used with the NEWPDF command-line switch. When
+this is false (the current default) the old PostScript-based interpreter is used; when NEWPDF
+is true then the new C-based interpreter is used.</p>
+
 <h2><a name="GhostPDL"></a>GhostPDL</h2>
 
 <p>Historically, we’ve used GhostPDL as an umbrella term to encompass our entire line of products. We've now brought all these disparate products together into a single package, called, appropriately enough, GhostPDL.</p>
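
Not part of this commit, which lists example programs as still to do: the sketch below shows how the procedures documented in the Language.htm hunk might be chained by hand to render a single page, following the calling sequence in the table there. It uses only runpdfbegin, pdfgetpage, pdfshowpage and runpdfend as described above. The file name input.pdf and the choice of page 1 are placeholders, and the error message is purely illustrative.

```postscript
%!PS
% Minimal sketch, assuming the procedures behave as documented above.
% (input.pdf) is a hypothetical file name; substitute a real PDF.

(input.pdf) (r) file        % open the PDF for reading as a PostScript file object
runpdfbegin                 % set up the environment to process the file as a PDF

1 pdfgetpage                % request page 1: leaves a page dictionary, or null on failure
dup null eq {
  pop                       % discard the null
  (Could not get page 1\n) print
} {
  pdfshowpage               % pdfshowpage_init, pdfshowpage_setpage, pdfshowpage_finish
} ifelse

runpdfend                   % restore and clean up
```

A script like this would be passed to Ghostscript in place of the PDF file itself, adding -dNEWPDF to exercise the new C-based interpreter. Depending on the file access controls in force, the script may also need to be granted permission to read the PDF file it opens.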