author | Nick Vatamaniuc <vatamane@gmail.com> | 2023-05-16 18:42:30 -0400 |
---|---|---|
committer | Nick Vatamaniuc <nickva@users.noreply.github.com> | 2023-05-17 17:30:41 -0400 |
commit | ba54c635ba3b5a9e4650245f5b4b8df47f39d78c (patch) | |
tree | 86f156a0b115735bfc9ce0144fb2776e04b751a6 /INSTALL.Unix.md | |
parent | 6b4cbaa72ab766e0a08c592971fb86be99a7d182 (diff) | |
Optimize mem3:dbname/1 function
`mem3:dbname/1` is called quite often with a `<<"shards/...">>` binary, as
seen when profiling with fprof:
https://gist.github.com/nickva/38760462c1545bf55d98f4898ae1983d
In that case `mem3:dbname/1` removes the timestamp suffix. However, because
it uses `filename:rootname/1`, which also handles general file system path
cases, it ends up being more expensive than necessary.
To optimize it, assume the name has a timestamp suffix and try to split it off
first, then verify that the suffix can be parsed into an integer. If that
fails, fall back to using `filename:rootname/1`.
To lower the chance of the timestamp suffix format changing without us
noticing, move the shard suffix generation function from fabric to mem3, so
the generating and the stripping functions sit right next to each other.
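The approach described above could be sketched roughly as follows. This is an illustration only, not the code from the patch: the module name `dbname_sketch`, the helper names, and the assumption that the suffix is always a dot followed by a 10-digit timestamp (as in the `.1234567890` example) are all hypothetical.

```erlang
-module(dbname_sketch).
-export([dbname/1, shard_suffix/0]).

%% Assumption: the suffix is "." plus a 10-digit Unix timestamp in seconds.
-define(TS_DIGITS, 10).

%% Hypothetical suffix generator, kept next to the stripping code so a
%% format change in one is hard to miss in the other.
shard_suffix() ->
    Secs = erlang:system_time(second),
    <<".", (integer_to_binary(Secs))/binary>>.

%% Assumption: shard names look like "shards/XXXXXXXX-XXXXXXXX/Name.Suffix".
dbname(<<"shards/", _:8/binary, "-", _:8/binary, "/", Rest/binary>>) ->
    strip_suffix(Rest);
dbname(Other) when is_binary(Other) ->
    Other.

%% Fast path: split off a fixed-size suffix and check that it is an integer.
%% Anything that does not fit goes to the slower filename:rootname/1.
strip_suffix(Name) ->
    case byte_size(Name) - (?TS_DIGITS + 1) of
        PrefixLen when PrefixLen > 0 ->
            case Name of
                <<Prefix:PrefixLen/binary, ".", Ts:?TS_DIGITS/binary>> ->
                    case is_integer_binary(Ts) of
                        true -> Prefix;
                        false -> filename:rootname(Name)
                    end;
                _ ->
                    filename:rootname(Name)
            end;
        _ ->
            filename:rootname(Name)
    end.

is_integer_binary(Bin) ->
    try binary_to_integer(Bin) of
        _ -> true
    catch
        error:badarg -> false
    end.
```

The fast path only does a fixed-offset binary match and one integer parse, which is why it can be cheaper than the generic path handling in `filename:rootname/1`; names that do not match the assumed format still produce the same result as before through the fallback.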
A quick speed comparison test shows roughly a 6.7x speedup (about 1.31 vs.
0.20 microseconds per call):
```
%% Times 10000 calls to dbname/1 on a typical shard name.
shard_speed_test() ->
    Shard = <<"shards/80000000-9fffffff/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.1234567890">>,
    shard_speed_check(Shard, 10000).

%% Returns the average time per call in microseconds.
shard_speed_check(Shard, N) ->
    T0 = erlang:monotonic_time(),
    do_dbname(Shard, N),
    Dt = erlang:monotonic_time() - T0,
    DtUsec = erlang:convert_time_unit(Dt, native, microsecond),
    DtUsec / N.

%% Calls dbname/1 N times and discards the results.
do_dbname(_, 0) ->
    ok;
do_dbname(Shard, N) ->
    _ = dbname(Shard),
    do_dbname(Shard, N - 1).
```
On main:
```
(node1@127.0.0.1)1> mem3:shard_speed_test().
1.3099
```
With PR:
```
(node1@127.0.0.1)1> mem3:shard_speed_test().
0.1959
```