author Yves Orton <demerphq@gmail.com> 2022-04-17 15:00:08 +0200
committer Karl Williamson <khw@cpan.org> 2022-04-19 05:41:19 -0600
commit eda35008b17e739922da4577bba648b73b8fbefc
tree 78f8b57ed7d4761f1c41f09d6598de0824cedce3 /regen/mk_invlists.pl
parent 44a605b000708fc84ba34c075bc6ba3bb6a3d36d
regen/mph.pl & mk_invlists.pl - add the "_squeeze" algorithm to produce smaller blobs
The squeeze algorithm produces blobs that are 10-20% smaller, depending on how it is used. With the "randomize_squeeze" option enabled it is slower, but it produces blobs about 20% smaller than those of the "_simple" strategy we used previously. With "randomize_squeeze" disabled it is about as fast as "_simple" and produces blobs about 10% smaller. Either way, "_squeeze" uses more memory than "_simple"; currently quite a bit more, although this is not inherent to the algorithm and could be changed if required.

    -blob length: 10548
    +blob length: 8635
    ...
    -data size: 69908 (%67.07)
    +data size: 67995 (%65.23)

So with this seed it saves 1913 bytes. I happened to get lucky with this seed; with other seeds the blob ended up at about 8650 bytes.

This algorithm was originally written by Ilya Sashcheka, so I have added him to the AUTHORS file; unfortunately, I no longer have his email address, as we lost touch. It contains many modifications by me.
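The size figures quoted in the commit message can be turned into relative savings with a little arithmetic. This short Perl sketch does that; the `reduction` helper is a hypothetical name used only for illustration and is not part of the Perl source tree.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the Perl source tree): given a size before
# and after a change, return the bytes saved and the percentage reduction.
sub reduction {
    my ($before, $after) = @_;
    my $saved = $before - $after;
    return ($saved, 100 * $saved / $before);
}

# Before ("_simple") and after ("_squeeze") figures quoted in the commit message.
my @rows = (
    [ 'blob length', 10548, 8635  ],
    [ 'data size',   69908, 67995 ],
);

for my $row (@rows) {
    my ($label, $before, $after) = @$row;
    my ($saved, $pct) = reduction($before, $after);
    printf "%-11s %5d -> %5d: saved %d bytes (%.1f%% smaller)\n",
        $label, $before, $after, $saved, $pct;
}
```

Both rows show the same 1913 bytes saved. That is about 18.1% of the blob, close to the roughly 20% figure given for "randomize_squeeze", but only about 2.7% of the total data size.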
Diffstat (limited to 'regen/mk_invlists.pl')
-rw-r--r-- regen/mk_invlists.pl 6
1 files changed, 6 insertions, 0 deletions
diff --git a/regen/mk_invlists.pl b/regen/mk_invlists.pl
index c2e7535ceb..12b9ce2344 100644
--- a/regen/mk_invlists.pl
+++ b/regen/mk_invlists.pl
@@ -3370,6 +3370,12 @@ print $keywords_fh "\n#if defined(PERL_CORE) || defined(PERL_EXT_RE_BUILD)\n\n";
my $mph= MinimalPerfectHash->new(
source_hash => \%keywords,
match_name => "match_uniprop",
+ simple_split => $ENV{SIMPLE_SPLIT} // 0,
+ randomize_squeeze => $ENV{RANDOMIZE_SQUEEZE} // 1,
+ max_same_in_squeeze => $ENV{MAX_SAME} // 5,
+ srand_seed => (lc($ENV{SRAND_SEED}//"") eq "auto")
+ ? undef
+ : $ENV{SRAND_SEED} // 1785235451, # I let perl pick a number
);
$mph->make_mph_with_split_keys();
print $keywords_fh $mph->make_algo();
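The diff reads four tuning settings from the environment. The following is a minimal sketch of how those lookups resolve. The hypothetical `resolve_mph_options` helper copies the expressions from the diff, but takes a hashref instead of reading `%ENV`, so the defaults can be inspected without running the regen script or loading `MinimalPerfectHash`. What `srand_seed => undef` does inside regen/mph.pl is not shown in this diff. Presumably it makes the module choose its own seed, but that is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of regen/mk_invlists.pl): reproduces the
# option defaults from the diff, reading from an env hashref instead of %ENV.
sub resolve_mph_options {
    my ($env) = @_;
    return (
        simple_split        => $env->{SIMPLE_SPLIT}      // 0,
        randomize_squeeze   => $env->{RANDOMIZE_SQUEEZE} // 1,
        max_same_in_squeeze => $env->{MAX_SAME}          // 5,
        # '//' binds tighter than '?:', so the else branch is
        # ($env->{SRAND_SEED} // 1785235451). Only "auto" (any letter case)
        # yields undef; its meaning inside MinimalPerfectHash is assumed.
        srand_seed => (lc($env->{SRAND_SEED} // "") eq "auto")
            ? undef
            : $env->{SRAND_SEED} // 1785235451,
    );
}

my %defaults = resolve_mph_options({});
my %auto     = resolve_mph_options({ SRAND_SEED => "AUTO" });
my %fast     = resolve_mph_options({ RANDOMIZE_SQUEEZE => 0, SRAND_SEED => 42 });

printf "default seed: %d\n", $defaults{srand_seed};
printf "auto seed:    %s\n", $auto{srand_seed} // "undef";
printf "randomize_squeeze=%d, seed=%d\n",
    $fast{randomize_squeeze}, $fast{srand_seed};
```

Because of that precedence, leaving SRAND_SEED unset falls back to the fixed seed 1785235451, which keeps the generated blob reproducible between runs. Setting RANDOMIZE_SQUEEZE=0 selects the faster, slightly larger output described in the commit message.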