author | Ken Sharp <ken.sharp@artifex.com> | 2022-03-29 09:39:43 +0100 |
---|---|---|
committer | Ken Sharp <ken.sharp@artifex.com> | 2022-04-02 16:02:55 +0100 |
commit | 984b6bcba88a35a1a705480ad6b7cd14f60f9661 (patch) | |
tree | c0907e81ac16dc5b85a4c7641d6079607de70aa6 /pdf/pdf_pattern.c | |
parent | 626d5d3d7c7f7380af6fb3c7b6ee240ccdc1a213 (diff) | |
download | ghostpdl-984b6bcba88a35a1a705480ad6b7cd14f60f9661.tar.gz |
GhostPDF - improve pattern performance
The graphics library and the old PDF interpreter are written, in effect,
so that patterns cannot be reused. This is because in PostScript it is
usually impossible to determine that a pattern has been reused, since it
has no unique identifier (XUID is rare and not supposed to be used in
general PostScript).
For PDF files, however, we know from the object ID if a Pattern is
the same as a previous invocation. In general there is little to be
gained from this but some files do make extensive use of Patterns, and
do so by switching between Patterns (or sometimes even the **same**
Pattern) frequently.
A case in point is the file attached to bug #704236. Uncompressed, this
file is some 66MB and almost all of that consists of 'define a rectangle,
define a Pattern, fill the rectangle with the Pattern'. The upshot is
that we end up running the Pattern PaintProc every time the Pattern is
defined; in this file that is well in excess of 10,000 instances.
The graphics library does have a cache for pattern 'tiles', which are
the rendered result of a pattern. This is because in PostScript it is
possible to define a pattern space, draw something, do a gsave, change
colour, draw something, then do a grestore back to the Pattern. In that
case we can consult the tile cache and avoid having to rerun the
pattern PaintProc (which may not even be possible).
For GhostPDF if we can identify that a cached tile matches a pattern
invocation, we can use that tile instead of rerunning the PaintProc,
because we *know* it's the same pattern, which should improve the
performance.
This commit does that by, basically, some hackery. Having created the
pattern instance, we overwrite the assigned ID with the object number.
Later, when the pattern is used, we consult the tile cache using the
pattern instance ID to see if we already have a cached tile. If we have
already rendered this pattern then we will find the tile and can reuse it.
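The lookup-before-render idea can be sketched as below. This is a minimal illustration only: the `sketch_*` names, the fixed-size table and the counter are all hypothetical and are not the graphics library's pattern cache or its API. The stand-in paint function simply counts how often it runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from Ghostscript. */
#define SKETCH_CACHE_SLOTS 64

typedef struct {
    unsigned long id;   /* pattern instance ID (here: the PDF object number) */
    int in_use;         /* non-zero once a tile has been stored */
    int tile;           /* placeholder for the rendered bitmap */
} sketch_tile;

typedef struct {
    sketch_tile slots[SKETCH_CACHE_SLOTS];
    unsigned long paint_calls;  /* how often the "PaintProc" really ran */
} sketch_cache;

/* Stand-in for the expensive PaintProc: it only counts calls. */
static int sketch_paint(sketch_cache *cache, unsigned long object_num)
{
    cache->paint_calls++;
    return (int)(object_num * 2u);
}

/* Return the tile for a pattern, rendering it only on a cache miss. */
static int sketch_get_tile(sketch_cache *cache, unsigned long object_num)
{
    sketch_tile *slot;

    assert(cache != NULL);
    slot = &cache->slots[object_num % SKETCH_CACHE_SLOTS];
    if (slot->in_use && slot->id == object_num)
        return slot->tile;              /* hit: reuse, no PaintProc */

    slot->id = object_num;              /* miss: render and remember */
    slot->tile = sketch_paint(cache, object_num);
    slot->in_use = 1;
    return slot->tile;
}
```

Asking for the same object number twice on a zero-initialised `sketch_cache` runs the paint stand-in once; a different object number triggers a second run.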
There is a small tweak. It is possible to reuse the same Pattern, but
with a different CTM. Because a cached tile is a bitmap resulting from
rendering the pattern, it will be different if the CTM differs. So we
need to account not only for the object ID, but also the CTM at the
time the pattern was rendered. So we use a quick and dirty hash of the
CTM as part of the ID. If we reuse the same Pattern with a different
CTM it will get a different ID.
This means we don't use an incorrect tile, and can cache the same
pattern with different CTMs applied to it.
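The ID construction can be sketched in isolation as follows. The struct and function names are hypothetical; like the change in this commit, the sketch hashes only the first four matrix coefficients, so translation does not affect the ID. It reads the bytes as `unsigned char`, whereas the patch uses plain `char`, so the values are not guaranteed to match the patch's on every platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the first four CTM coefficients. */
typedef struct {
    float xx, xy, yx, yy;
} sketch_ctm;

/* djb2-style hash (hash * 33 + c) over the raw bytes of the matrix. */
static unsigned long sketch_hash_ctm(const sketch_ctm *ctm)
{
    const unsigned char *bytes = (const unsigned char *)ctm;
    unsigned long hash = 5381;
    size_t i;

    assert(ctm != NULL);
    for (i = 0; i < sizeof(*ctm); i++)
        hash = ((hash << 5) + hash) + bytes[i];
    return hash;
}

/* Combine the CTM hash with the PDF object number to form the tile ID. */
static unsigned long sketch_pattern_id(const sketch_ctm *ctm,
                                       unsigned long object_num)
{
    return sketch_hash_ctm(ctm) + object_num;
}
```

The same matrix and object number always give the same ID, and scaling the matrix changes it. Because the two values are simply added, distinct (CTM, object) pairs can in principle collide, which fits the "quick and dirty" description above.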
Further changes required when running under Ghostscript:
When running under Ghostscript we can't free the pattern cache at the
end of the job, as the graphics state will be pointing to the PDF
interpreter gstate, not the PostScript one.
In addition there are implications for the garbage collector if we
return to the PostScript interpreter with any pattern tiles in the
cache pointing at non-GC'ed objects.
The pattern tile can end up with a clist pattern accumulator device
stored in it, and that device can include a pointer to the pattern
instance. Note that the device does not increment the reference count
when it takes a reference. This appears to be a bug, but altering it
causes crashes.
In addition the pattern instance holds a reference to 'saved' graphics
states which do not appear to be true copies, nor the result of
gsave and grestore. Attempting to fix the clist device caused
additional problems with crashes due to the saved gstates not being
valid any more. Trying to fix the gstates caused different crashes...
We can avoid that happening by flushing the pattern tile cache at the
end of every page. This may cost us a little of the benefit from not
re-running Pattern definitions but it shouldn't be much.
This means we won't have to worry about the garbage collector running
while any object might be pointing at something which is not in GC'ed
memory.
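The end-of-page flush can be pictured with the sketch below. Everything here is hypothetical (the `sketch_*` names, the fixed-size array, the end-of-page hook) and it does not use the real pattern cache; it only shows the invariant that no cache entry, and so no pointer into interpreter-owned memory, survives past the end of a page.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SLOTS 8

typedef struct {
    unsigned long id;
    void *owner;   /* pointer into interpreter-owned (non-GC) memory */
} sketch_entry;

typedef struct {
    sketch_entry slots[SKETCH_SLOTS];
    size_t used;
} sketch_tile_cache;

/* Remember a tile; returns 0 on success, -1 if the cache is full. */
static int sketch_cache_add(sketch_tile_cache *cache, unsigned long id,
                            void *owner)
{
    assert(cache != NULL);
    if (cache->used == SKETCH_SLOTS)
        return -1;
    cache->slots[cache->used].id = id;
    cache->slots[cache->used].owner = owner;
    cache->used++;
    return 0;
}

/* Drop every entry so nothing in the cache outlives the page. */
static void sketch_cache_flush(sketch_tile_cache *cache)
{
    size_t i;

    assert(cache != NULL);
    for (i = 0; i < cache->used; i++) {
        cache->slots[i].id = 0;
        cache->slots[i].owner = NULL;
    }
    cache->used = 0;
}

/* Hypothetical end-of-page hook: flush before control returns to the caller. */
static void sketch_end_page(sketch_tile_cache *cache)
{
    sketch_cache_flush(cache);
}
```

The cost is that a pattern reused on the next page has to be rendered again, which is the small loss of benefit mentioned above.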
As a (pathological) example, the file from bug 704236, with the new
PDF interpreter originally took 36 minutes 27 seconds to render to ppm
at 200 dpi. With this code it renders in 1 minute 45 seconds.
By using a huge MaxBitmap (3g) the speed of the old code improves to
2 minutes 11 seconds. The new code with the same MaxBitmap improves to
8 seconds.
For comparison, the 9.55.0 release takes ~35 minutes with the
simple invocation and 2 minutes 14 seconds with a MaxBitmap of 3g.
The original complaint was with pdfwrite, using
-sColorConversionStrategy=Gray and that goes from 21 minutes 24
seconds with the 9.55.0 release to 29 minutes 52 seconds with the
new PDF interpreter and 16 seconds with this code.
Diffstat (limited to 'pdf/pdf_pattern.c')
-rw-r--r-- | pdf/pdf_pattern.c | 40 |
1 files changed, 15 insertions, 25 deletions
diff --git a/pdf/pdf_pattern.c b/pdf/pdf_pattern.c
index 57a164493..bffe6fd25 100644
--- a/pdf/pdf_pattern.c
+++ b/pdf/pdf_pattern.c
@@ -114,35 +114,12 @@ static void pdfi_free_pattern_context(pdf_pattern_context_t *context)
     gs_free_object(context->ctx->memory, context, "Free pattern context");
 }
 
-static bool
-pdfi_pattern_purge_proc(gx_color_tile * ctile, void *proc_data)
-{
-    if (ctile->id == *((gx_bitmap_id *)proc_data))
-        return true;
-    return false;
-}
-
 void pdfi_pattern_cleanup(gs_memory_t * mem, void *p)
 {
     gs_pattern1_instance_t *pinst = (gs_pattern1_instance_t *)p;
-    pdf_pattern_context_t *context;
-    gx_color_tile *pctile = NULL;
-
-    context = (pdf_pattern_context_t *)pinst->client_data;
-
-    /* If are being called from Ghostscript, the clist pattern accumulator device (in
-       the tile cache) *can* outlast outlast our pattern instance, so if the pattern
-       instance is being freed, also remove the entry from the cache
-     */
-    if (context != NULL && context->ctx != NULL && context->ctx->pgs != NULL &&
-        context->shading == NULL && context->ctx->pgs->pattern_cache != NULL
-        && gx_pattern_cache_get_entry(context->ctx->pgs, pinst->id, &pctile) == 0
-        && gx_pattern_tile_is_clist(pctile)) {
-        gx_pattern_cache_winnow(gstate_pattern_cache(context->ctx->pgs), pdfi_pattern_purge_proc, (void *)(&pctile->id));
-    }
 
-    if (context != NULL) {
-        pdfi_free_pattern_context(context);
+    if (pinst->client_data != NULL) {
+        pdfi_free_pattern_context((pdf_pattern_context_t *)pinst->client_data);
         pinst->client_data = NULL;
         pinst->notify_free = NULL;
     }
@@ -525,6 +502,19 @@ pdfi_setpattern_type1(pdf_context *ctx, pdf_dict *stream_dict, pdf_dict *page_di
     cc->pattern->client_data = context;
     cc->pattern->notify_free = pdfi_pattern_cleanup;
 
+    {
+        unsigned long hash = 5381;
+        unsigned int i;
+        const char *str = (const char *)&ctx->pgs->ctm;
+
+        gs_pattern1_instance_t *pinst = (gs_pattern1_instance_t *)cc->pattern;
+
+
+        for (i = 0; i < 4 * sizeof(float); i++)
+            hash = ((hash << 5) + hash) + str[i]; /* hash * 33 + c */
+
+        pinst->id = hash + pdict->object_num;
+    }
     context = NULL;
 
     code = pdfi_grestore(ctx);