| Commit message (Collapse) | Author | Age | Files | Lines |
| |
The 0x20 character should also be escaped as per the SPARQL reference,
and it correctly is when setting a TrackerResource IRI. However, the
fast path check for the presence of characters that should be escaped
is missing it, so IRIs whose only invalid character is a space would
be let through as valid.
Since 0x20 (whitespace) is possibly the most ubiquitous character that
should be escaped, this is a bit of an oversight.
Fixes: 33031007c ("libtracker-sparql: Escape illegal characters in IRIREF...")
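A minimal sketch of what such a fast-path check might look like, in plain C. The function name is hypothetical and the forbidden set here is a simplified reading of the IRIREF production (control characters and space, plus the `<>"{}|^`\` punctuation); it is not Tracker's actual implementation. Note the `c <= 0x20` comparison, which covers the space character the original fast path missed:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: returns true if the IRI contains a character
 * that the SPARQL IRIREF production forbids and that must be escaped.
 * 0x20 (space) is covered by the <= 0x20 check, which was the case
 * missing from the original fast path. */
static bool
iri_needs_escaping (const char *iri)
{
  for (const char *p = iri; *p; p++)
    {
      unsigned char c = (unsigned char) *p;

      /* Control characters and space are all forbidden */
      if (c <= 0x20)
        return true;

      if (strchr ("<>\"{}|^`\\", c) != NULL)
        return true;
    }

  return false;
}
```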
|
| |
|
| |
|
| |
|
|\
| |
| |
| |
| | |
libtracker-sparql: Escape illegal characters in IRIREF from TrackerResource
See merge request GNOME/tracker!536
|
| |
| | |
Currently, all IRIREFs going through SPARQL updates are validated to check that
their characters are in the expected set (https://www.w3.org/TR/sparql11-query/#rIRIREF),
while TrackerResource is pretty liberal about the characters used in a
TrackerResource identifier or IRI reference.
This disagreement has two possible outcomes:
- If a resource containing illegal characters is inserted via print_sparql_update(),
print_rdf() or similar, errors will surface when handling the SPARQL update.
- If the resource is inserted directly via TrackerBatch or update_resource(), the
validation step will be bypassed, ending up with an IRI that contains illegal
characters as per the SPARQL grammar.
In order to make TrackerResource friendly to e.g. sloppy IRI composition and avoid
these ugly situations when an illegal character sneaks in, make it escape IRIs as
defined by IRIREF in the SPARQL grammar. This way, every method of insertion
will succeed and produce the most correct output for the given input.
Also, add tests for this behavior, to ensure we escape what should be escaped.
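As an illustration, the escaping step could look roughly like the following sketch, which replaces characters outside a simplified IRIREF set with %XX escapes. The function name and the exact character set are assumptions for illustration, not the actual TrackerResource code:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the escaping step: characters outside the
 * IRIREF set are replaced with %XX escapes, so any insertion path
 * (print_sparql_update(), TrackerBatch, update_resource()) receives
 * a grammatically valid IRI. Returns a newly allocated string. */
static char *
escape_iriref (const char *iri)
{
  size_t len = strlen (iri);
  /* Worst case: every character expands to 3 bytes */
  char *out = malloc (len * 3 + 1);
  char *dst = out;

  for (const char *p = iri; *p; p++)
    {
      unsigned char c = (unsigned char) *p;

      if (c <= 0x20 || strchr ("<>\"{}|^`\\", c) != NULL)
        dst += sprintf (dst, "%%%02X", c);
      else
        *dst++ = *p;
    }

  *dst = '\0';
  return out;
}
```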
|
| | |
|
| | |
|
|/ |
|
|\
| |
| |
| |
| | |
Fix build/compiler warnings
See merge request GNOME/tracker!534
|
| |
| |
| |
| |
| |
| | |
Even though the TrackerResource helper functions don't use this type internally,
it may be set directly through tracker_resource_set_gvalue(). Handle this
additional type, since it's used in some Tracker Miners extractors.
|
| |
| |
| |
| |
| | |
For some reason this check fails, but the dependency is also unused in
this project tree. We can stop requiring its presence.
|
| |
| |
| |
| |
| | |
This generic method is available since meson 0.51 (which we already require),
while pkg.get_pkgconfig_variable is deprecated in 0.56.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Separate ontology parsing so that we can provide distinct locations for
.ontology and .description files, avoiding parsing the base ontology twice
just for the sake of parsing the description files.
This avoids redefinition warnings from the ontology docgen tool while
generating the docs for the base ontology.
|
| |
| |
| |
| | |
Fixes a compiler warning.
|
| | |
|
| |
| |
| |
| |
| | |
We are using TrackerSparqlCursor API here while passing a TrackerDBCursor.
We should cast this subclass to the correct parent type.
|
| |
| |
| |
| |
| |
| | |
There is new API scheduled for 3.2.0 to pause/unpause messages for deferred
processing. Follow these API updates and handle them, without bumping the
dependency to that newer version yet.
|
| |
| |
| |
| |
| |
| |
| | |
Or the right way. We still use tracker_namespace_manager_get_default()
in some places to preserve backwards-compatible behavior, so these
deprecation warnings should be silenced with
G_GNUC_BEGIN/END_IGNORE_DEPRECATIONS.
|
|/ |
|
|\
| |
| |
| |
| | |
Improve performance of database updates
See merge request GNOME/tracker!532
|
| |
| |
| |
| |
| |
| |
| |
| | |
Sometimes, if the other end closes prematurely, the cursor gets cancelled
and ends up producing an "Interrupted" error. Make that error more consistent
by using G_IO_ERROR_CANCELLED, and avoid issuing a warning in those situations.
Fixes some sporadic warnings seen in the serialize test.
|
| |
| | |
In the slow paths of deleting a class, we must recursively delete
all subclasses prior to the deletion of this class. We figure out
the existing subclasses of a class for a given resource through a
query, but we have this information right there.
Use the types array, and recursively handle direct subclasses of
the class being handled for deletion, so that things cascade properly
and it mostly consists of a couple of array lookups as opposed to
a database query.
This should make these slow paths faster.
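The cascading idea can be sketched like this; the `Class` struct, its fields, and the `delete_class_cascade` name are hypothetical stand-ins for Tracker's internal types, illustrating only that the recursion walks an in-memory array rather than issuing a query per level:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: cascade deletion over an in-memory class
 * hierarchy instead of querying the database for subclasses. */
typedef struct Class Class;

struct Class
{
  const char  *name;
  Class      **subclasses;  /* NULL-terminated array, or NULL */
  bool         deleted;
};

static void
delete_class_cascade (Class *klass)
{
  /* Recursively delete direct subclasses first, so the whole
   * cascade consists of array lookups, not database queries */
  if (klass->subclasses != NULL)
    {
      for (size_t i = 0; klass->subclasses[i] != NULL; i++)
        delete_class_cascade (klass->subclasses[i]);
    }

  klass->deleted = true;
}
```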
|
| |
| |
| |
| |
| | |
The graph management update statements are rarely run, so we can avoid
caching them.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When ensuring that a resource IRI is known in the database, we optimize
for newly added resources and try to insert them without further checks,
only resorting to a query to fetch the resource ID if that insertion failed.
But we currently detect the failure by receiving a GError and clearing it.
Instead, just check the return value, so that we don't create/free errors
for every resource where that assumption fails.
Also, we failed to cache the resource ID when the second, querying step was
taken; fix that to get another nice speed improvement.
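A rough sketch of the insert-first strategy, with a toy in-memory table standing in for the database and the GError path replaced by a plain boolean return; all names here are illustrative, not Tracker's:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy in-memory stand-in for the resource table */
#define MAX_RES 16
static const char *table[MAX_RES];
static int n_rows;

/* Insert succeeds only if the IRI is not yet present */
static bool
db_try_insert_resource (const char *iri, long long *id)
{
  for (int i = 0; i < n_rows; i++)
    if (strcmp (table[i], iri) == 0)
      return false;

  table[n_rows] = iri;
  *id = ++n_rows;
  return true;
}

static long long
db_query_resource_id (const char *iri)
{
  for (int i = 0; i < n_rows; i++)
    if (strcmp (table[i], iri) == 0)
      return i + 1;
  return 0;
}

/* Hypothetical sketch: try the insert first and check the boolean
 * return value, instead of allocating and clearing a GError; fall
 * back to a query only when the resource already existed. The ID
 * would be cached on BOTH paths (caching omitted in this sketch). */
static long long
ensure_resource_id (const char *iri)
{
  long long id;

  if (!db_try_insert_resource (iri, &id))
    id = db_query_resource_id (iri);

  return id;
}
```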
|
| |
| |
| |
| |
| | |
Instead of having to pass a GError and inspect it to check for
errors.
|
| |
| |
| |
| |
| | |
We initialize nrl:modified once, there is no need to cache this statement
for anything later.
|
| |
| |
| |
| |
| |
| | |
This is quite a hot path during updates of already existing resources,
so it makes sense to keep the statement around instead of resorting to
cache lookups.
|
| |
| |
| |
| |
| |
| |
| | |
This is only used in the update machinery, so move it there and keep
the TrackerDBStatement around. This is a very hot path during updates
of already existing resources, so it makes sense to avoid the DB
interface internal caches and SQL query strings for this.
|
| |
| |
| |
| |
| |
| | |
Instead of relying on the internal TrackerDBInterface cache, use a
distinct one in the update machinery, so TrackerProperty objects
can be looked up directly without creating a SQL string.
|
| |
| |
| |
| |
| | |
This is only necessary for properties with rdfs:Resource range,
so only perform this operation for those.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently the code is exceedingly thorough and issues individual deletes
for every property related to the class being deleted. We can take some
shortcuts here and avoid querying existing values in most cases; only 2
cases remain where it is necessary to fetch the previous content:
- Properties that have TRACKER_PROPERTY_TYPE_RESOURCE, in order to correctly
change the refcounting of the resource that the property points to.
- Properties that are domain indexes in other classes, since we have to
chain up to those tables with the right value being deleted.
All other situations can do without fetching the previous values for the
property, and for single-valued properties the deletes can even be delegated
to the deletion of the row in the table representing the TrackerClass.
The result is a speedup when deleting entire resources.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Right now, FTS updates iterate the set of previously looked up
properties in order to find the FTS ones and issue the FTS update.
We want to optimize the fetching of old property values, and FTS is
orthogonal to that, so decouple the two machineries. As a replacement,
keep a list of FTS properties that were modified, so that they are
updated when flushing.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Refcount changes are very frequent, and each of them requires a
hashtable lookup+update (the refcount is the hashtable value, stored
with GINT_TO_POINTER), while each replacement also makes a new copy of
the TrackerRowid used as the hashtable key. Overall, this maintenance
is somewhat expensive to perform.
Since the buffers for triple updates are small (64 items), a hash table
does not bring much benefit in lookups or updates. Switching to a
simple unordered array for this accounting looks faster in profiling.
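The array-based accounting could look roughly like this; struct names and the fixed 64-entry size follow the commit message, but the layout is a hypothetical sketch, not Tracker's code:

```c
#include <assert.h>
#include <stddef.h>

typedef long long TrackerRowid;

typedef struct
{
  TrackerRowid id;
  int          refcount;
} RefcountEntry;

/* Hypothetical sketch: with small update buffers (64 items), a
 * linear scan over an unordered array beats a hashtable, and avoids
 * copying the rowid key on every refcount update. */
typedef struct
{
  RefcountEntry entries[64];
  size_t        n_entries;
} RefcountBuffer;

static void
refcount_buffer_add (RefcountBuffer *buf, TrackerRowid id, int delta)
{
  for (size_t i = 0; i < buf->n_entries; i++)
    {
      if (buf->entries[i].id == id)
        {
          buf->entries[i].refcount += delta;
          return;
        }
    }

  /* Not found: append a new entry, no key copy needed */
  buf->entries[buf->n_entries].id = id;
  buf->entries[buf->n_entries].refcount = delta;
  buf->n_entries++;
}
```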
|
| |
| |
| |
| |
| |
| |
| |
| | |
This small internal helper has the nice added value that it does not require
allocating new memory to iterate across triples (as opposed to
tracker_resource_get_properties and tracker_resource_get_values). Use that
so we can avoid these for the most part (the exception being rdf:type, since
we want it handled before all values).
|
| |
| |
| |
| |
| | |
We can avoid this small piece of busywork until it's needed.
May help with insertion of simple TrackerResources.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since the function it is being expanded from can be called
recursively, deeper iterations would pointlessly try to expand
the URI again.
Do it from the toplevel function, so it's done once for the
whole TrackerResource update.
|
| |
| |
| |
| |
| | |
We can now look up properties in either short or long URI form, so
the URI expansion can be avoided here.
|
| |
| |
| |
| |
| | |
In addition to the expanded URIs, make it possible to look up properties
by short URI.
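One way to picture this: register each property under both its expanded and its prefixed form, so either spelling resolves without URI expansion. The toy linear table and the example URIs below are hypothetical stand-ins for the real hashtable and ontology:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: each property is stored under two keys,
 * its long (expanded) URI and its short (prefixed) URI, both
 * pointing at the same property object. */
typedef struct
{
  const char *key;
  const void *property;
} Entry;

static Entry table[32];
static int   n_entries;

static void
insert_property (const char *long_uri, const char *short_uri,
                 const void *property)
{
  table[n_entries].key = long_uri;
  table[n_entries++].property = property;
  table[n_entries].key = short_uri;
  table[n_entries++].property = property;
}

static const void *
lookup_property (const char *uri)
{
  for (int i = 0; i < n_entries; i++)
    if (strcmp (table[i].key, uri) == 0)
      return table[i].property;
  return NULL;
}
```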
|
| |
| |
| |
| |
| |
| | |
We already have the data in memory, so use that instead of querying the
database for these. If the URI does not turn out to refer to a class or
property, the database lookup is still performed.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The improvement is twofold: if the same TrackerResource is referenced by
different elements in a TrackerBatch, we will skip the subsequent additions
altogether. On the other hand, the hashtable is longer-lived and not
created/freed tens of thousands of times per second.
Since the resources might be re-used in different graphs within the
same batch (this happens in tracker-miner-fs-3 for file-related content
graph data), we must be careful not to optimize those away. In that
case we simply blow away the visited resources cache on graph changes
during the processing of a batch.
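The invalidation rule can be sketched as follows; the function name, the fixed-size set, and the string-keyed graph comparison are illustrative assumptions, not Tracker's implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: a long-lived "visited" set skips resources
 * already processed in this batch, and is cleared whenever the
 * target graph changes, since the same resource may legitimately
 * need inserting into several graphs. */
#define MAX_VISITED 64

static const void *visited[MAX_VISITED];
static int         n_visited;
static const char *current_graph;

/* Returns true if the caller should process the resource */
static bool
batch_visit_resource (const char *graph, const void *resource)
{
  if (current_graph == NULL || strcmp (graph, current_graph) != 0)
    {
      /* Graph changed: blow away the visited cache */
      n_visited = 0;
      current_graph = graph;
    }

  for (int i = 0; i < n_visited; i++)
    if (visited[i] == resource)
      return false;  /* already handled in this graph */

  visited[n_visited++] = resource;
  return true;
}
```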
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When preparing update statements, we can possibly end up with multiple
coalesced single-valued property updates for the same property. In that
case we attempt to push only the last update, as it is the one that will
prevail.
But the number of properties grouped for a given graph/resource/class
tuple is not usually high. As this is a very hot path during updates,
using a hashtable to track and avoid repeatedly updating the same
property has an intrinsic cost that shows up in profiles.
Since the number of elements to iterate is small, the hashtable can be
replaced with a GList, which essentially makes these lookups disappear
from profiles.
|
| |
| | |
Currently the code relies on the TrackerDBInterface builtin caching, which
consists of lookups on the SQL string prior to statement compiling/binding.
This is a very late layer of caching, and relies on us creating the full
SQL string (i.e. basically one step away from a sqlite3_stmt) to reuse
a cached statement.
The caching strategy here can be significantly improved: we basically
need a graph|{class|property} tuple to look up a statement, and that is
something the TrackerDataLogEntry structs in the event log already
have. So we may go directly from a log entry to a statement without
intermediate string stages.
And that's precisely what this commit does. There is a new set of
hash/equal functions in order to match these log entries in the
statement MRU, so that lookups are fast, based on pointer hashes and
comparisons.
Since we need to preserve a copy of these TrackerDataLogEntry structs as
keys in the MRU cache, add copy/free functions that handle copying the
necessary data (a copy of the TrackerDataLogEntry, plus a partial copy
of the array containing property changes for this update).
This makes lookups noticeably faster, compared to the old laborious
string building.
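The hash/equal pair over log entries could look roughly like this; the struct layout and field names are illustrative guesses, not the actual TrackerDataLogEntry, and the sketch only shows why no SQL string needs building for a cache lookup:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of hashing a log entry directly, instead of
 * building the full SQL string as a cache key. */
typedef struct
{
  const void *graph;             /* interned, so pointer identity works */
  const void *class_or_property; /* likewise interned */
  int         event_type;
} LogEntry;

static size_t
log_entry_hash (const LogEntry *entry)
{
  /* Cheap pointer-based hash: no string building involved */
  size_t h = (size_t) entry->graph;
  h = h * 31 + (size_t) entry->class_or_property;
  h = h * 31 + (size_t) entry->event_type;
  return h;
}

static int
log_entry_equal (const LogEntry *a, const LogEntry *b)
{
  return a->graph == b->graph &&
         a->class_or_property == b->class_or_property &&
         a->event_type == b->event_type;
}
```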
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This API was so far internal to TrackerDBInterface, used to cache
statements for selects and updates. Since we want to add other caching
layers elsewhere, make this API public and more generic.
It is now possible to define hash/equal/destroy functions, so the cache
is not limited to SQL query strings as keys; this will be useful in
future commits. In the meantime, reimplement the select/update caches
on top of this API.
|
| |
| |
| |
| |
| |
| | |
Since those are very frequently used properties, it makes sense to have
a fast path lookup like we have for rdf:type, so that we don't have to
perform a hashtable lookup every time these properties are used.
|
| |
| | |
Currently, our caching of triples involves a number of nested structures:
- In the buffer there is a struct for each graph
- In the graph struct there is a set of changed resources
- In the resource struct there is a set of modified tables
- In the table struct there is a set of modified properties
- In the property struct there is a list of values
This incurs a maintenance cost that is higher than desired; adding and
removing elements here becomes a fair chunk of the time spent in updates,
since a number of allocations and list/hashtable updates are performed
for batches that deal with a fair amount of different resources
(i.e. most of them).
In order to improve this, use two arrays to buffer this data:
- A "properties" array that keeps individual predicate/object pairs. This
is used to store the values of properties being inserted or deleted, for
both single-valued and multi-valued properties. Each struct is "linked"
with (i.e. references) other elements in the array, so that e.g. class
updates may reference multiple properties/values being updated.
- An "update log" array, containing structs that are an event_type/graph/
subject tuple, plus optionally a link to one of the properties in the
previous array; all other properties are fetched by iterating over the
linked properties. These log entries are valid for class table updates
(i.e. single-valued properties) or multi-valued property tables.
These arrays make allocating the buffer a one-time operation (the buffer
size is fixed, and the arrays are reused during the processing of a
TrackerBatch) and insertions into the log largely O(1), as opposed to a
number of array/hashtable lookups and inserts.
But we still want to coalesce updates to the same class table (e.g. changes
to several single-valued properties in the same table); for that there is
an additional hashtable set that uses these log entries themselves as keys,
with special hash/equal functions, so lookups for prior events modifying
the same TrackerClass are also quite fast.
Overall, this makes the maintenance of this buffer less expensive in the
big picture, even though there are still some remnants of the previous
caching for graphs and resources that play less of a role.
Since this changes the ordering of updates, some tests that rely on implicit
ordering (the DESCRIBE ones) had to be adapted.
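The flattened two-array layout described above can be sketched like this; struct and field names are illustrative, the fixed sizes are arbitrary, and the append handles only a single-property event to keep the sketch short:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the flattened two-array layout replacing
 * the nested graph/resource/table/property structs. */
typedef struct
{
  const void *predicate;
  const void *object;
  int         next;  /* index of the next property in the same update,
                      * or -1: entries link to each other instead of
                      * living in per-table lists */
} PropertyEntry;

typedef struct
{
  int         event_type;      /* insert or delete */
  const void *graph;
  long long   subject;
  int         first_property;  /* head of the linked property chain */
} UpdateLogEntry;

typedef struct
{
  PropertyEntry  properties[256];  /* fixed size, reused per batch */
  size_t         n_properties;
  UpdateLogEntry log[256];
  size_t         n_log;
} UpdateBuffer;

/* Appending is O(1): bump two counters and link the property entry;
 * no per-resource or per-table allocations are involved. */
static void
update_buffer_append (UpdateBuffer *buf, int event_type,
                      const void *graph, long long subject,
                      const void *predicate, const void *object)
{
  int prop_idx = (int) buf->n_properties++;

  buf->properties[prop_idx].predicate = predicate;
  buf->properties[prop_idx].object = object;
  buf->properties[prop_idx].next = -1;

  buf->log[buf->n_log].event_type = event_type;
  buf->log[buf->n_log].graph = graph;
  buf->log[buf->n_log].subject = subject;
  buf->log[buf->n_log].first_property = prop_idx;
  buf->n_log++;
}
```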
|
| |
| |
| |
| |
| |
| | |
Instead of passing a gint pointer to the binding functions so the
parameter counter is increased there, increase the counter in the
calling code.
|
| |
| |
| |
| |
| |
| | |
The SQL update construction of both is mixed, which hinders readability.
Untangle these so the generated INSERT/UPDATE can be followed more
easily.
|
| |
| |
| |
| |
| |
| |
| |
| | |
Since graphs don't change often, we can preserve the cached information for
these, most importantly the prepared statements to update refcounts.
We now instead clear the contents of the TrackerDataUpdateBufferGraph,
preserving the things we want to preserve.
|