author | Yves Orton <demerphq@gmail.com> | 2022-04-17 15:00:08 +0200 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2022-04-19 05:41:19 -0600 |
commit | eda35008b17e739922da4577bba648b73b8fbefc (patch) | |
tree | 78f8b57ed7d4761f1c41f09d6598de0824cedce3 /regen/mk_invlists.pl | |
parent | 44a605b000708fc84ba34c075bc6ba3bb6a3d36d (diff) | |
download | perl-eda35008b17e739922da4577bba648b73b8fbefc.tar.gz |
regen/mph.pl & mk_invlists.pl - add the "_squeeze" algorithm to produce smaller blobs
The "_squeeze" algorithm produces blobs that are 10-20% smaller, depending
on how it is used. With the "randomize_squeeze" option enabled it is slower
but produces blobs about 20% smaller than the "_simple" strategy we used to
use. With the "randomize_squeeze" option disabled it is about as fast as
"_simple" but produces blobs about 10% smaller. Regardless, "_squeeze"
uses more memory than "_simple"; quite a bit more currently, although that
is not a hard requirement and could be changed if required.
-blob length: 10548
+blob length: 8635
...
-data size: 69908 (%67.07)
+data size: 67995 (%65.23)
So it saves 1913 bytes when run with this seed. I happened to get lucky
with this seed; with other seeds the blob ended up at about 8650
bytes.
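As a quick check of the figures above, the following sketch recomputes the saving from the quoted before/after numbers. The `savings` helper is hypothetical and only illustrates the arithmetic; it is not part of regen/mph.pl or mk_invlists.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: bytes saved, and the saving as a percentage of the old size.
sub savings {
    my ($old, $new) = @_;
    my $saved = $old - $new;
    return ($saved, 100 * $saved / $old);
}

# Figures copied from the before/after output quoted in the commit message.
my %sizes = (
    'blob length' => [10548, 8635],
    'data size'   => [69908, 67995],
);

for my $label (sort keys %sizes) {
    my ($saved, $pct) = savings(@{ $sizes{$label} });
    printf "%-11s saved %d bytes (%.1f%%)\n", $label, $saved, $pct;
}
```

Both rows shrink by the same 1913 bytes, which is about 18.1% of the blob but only about 2.7% of the total data size, since the blob is one part of the generated data.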
This algorithm is originally by Ilya Sashcheka, so I have added him to
the AUTHORS file, but unfortunately I no longer have his email address
as we lost touch. The version here contains many modifications by me.
Diffstat (limited to 'regen/mk_invlists.pl')
-rw-r--r-- | regen/mk_invlists.pl | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/regen/mk_invlists.pl b/regen/mk_invlists.pl
index c2e7535ceb..12b9ce2344 100644
--- a/regen/mk_invlists.pl
+++ b/regen/mk_invlists.pl
@@ -3370,6 +3370,12 @@ print $keywords_fh "\n#if defined(PERL_CORE) || defined(PERL_EXT_RE_BUILD)\n\n";
 my $mph= MinimalPerfectHash->new(
     source_hash => \%keywords,
     match_name => "match_uniprop",
+    simple_split => $ENV{SIMPLE_SPLIT} // 0,
+    randomize_squeeze => $ENV{RANDOMIZE_SQUEEZE} // 1,
+    max_same_in_squeeze => $ENV{MAX_SAME} // 5,
+    srand_seed => (lc($ENV{SRAND_SEED}//"") eq "auto")
+                  ? undef
+                  : $ENV{SRAND_SEED} // 1785235451, # I let perl pick a number
 );
 $mph->make_mph_with_split_keys();
 print $keywords_fh $mph->make_algo();
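The new options are all driven by environment variables, with defaults supplied by the defined-or operator (`//`). The sketch below shows how those expressions resolve for a given environment. The `resolve_mph_options` helper and its `%env` argument are hypothetical stand-ins for `%ENV`, used only so the behaviour can be tried in isolation; the expressions themselves are copied from the diff.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of regen/mk_invlists.pl): resolves the four
# options from an environment-like hash using the expressions from the diff.
sub resolve_mph_options {
    my (%env) = @_;
    return (
        simple_split        => $env{SIMPLE_SPLIT}      // 0,
        randomize_squeeze   => $env{RANDOMIZE_SQUEEZE} // 1,
        max_same_in_squeeze => $env{MAX_SAME}          // 5,
        # "auto" (any case) yields undef; otherwise the given seed is used,
        # falling back to the fixed default when SRAND_SEED is unset.
        srand_seed          => (lc($env{SRAND_SEED} // "") eq "auto")
                               ? undef
                               : $env{SRAND_SEED} // 1785235451,
    );
}

my %defaults = resolve_mph_options();
my %tuned    = resolve_mph_options(SRAND_SEED => "AUTO", RANDOMIZE_SQUEEZE => 0);

printf "default seed: %s\n", $defaults{srand_seed};
printf "tuned seed:   %s\n", defined $tuned{srand_seed} ? $tuned{srand_seed} : "undef";
printf "tuned randomize_squeeze: %s\n", $tuned{randomize_squeeze};
```

Because `//` tests definedness rather than truth, only unset variables fall back to the defaults: setting `RANDOMIZE_SQUEEZE=0` really does pass 0 through. Since `//` binds more tightly than `?:`, the fixed seed applies only in the non-"auto" branch.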