From 7b64d42d22206d9995a8f0cb3b515e623cac4702 Mon Sep 17 00:00:00 2001 From: Karsten Blees Date: Thu, 3 Jul 2014 00:22:54 +0200 Subject: hashmap: add string interning API Interning short strings with high probability of duplicates can reduce the memory footprint and speed up comparisons. Add strintern() and memintern() APIs that use a hashmap to manage the pool of unique, interned strings. Note: strintern(getenv()) could be used to sanitize git's use of getenv(), in case we ever encounter a platform where a call to getenv() invalidates previous getenv() results (which is allowed by POSIX). Signed-off-by: Karsten Blees Signed-off-by: Junio C Hamano --- Documentation/technical/api-hashmap.txt | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'Documentation') diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt index 8ed92a9f85..ad7a5bddd2 100644 --- a/Documentation/technical/api-hashmap.txt +++ b/Documentation/technical/api-hashmap.txt @@ -193,6 +193,21 @@ more entries. `hashmap_iter_first` is a combination of both (i.e. initializes the iterator and returns the first entry, if any). +`const char *strintern(const char *string)`:: +`const void *memintern(const void *data, size_t len)`:: + + Returns the unique, interned version of the specified string or data, + similar to the `String.intern` API in Java and .NET, respectively. + Interned strings remain valid for the entire lifetime of the process. ++ +Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned +strings / data must not be modified or freed. ++ +Interned strings are best used for short strings with high probability of +duplicates. ++ +Uses a hashmap to store the pool of interned strings. + Usage example ------------- -- cgit v1.2.1