Title: SPARQL Tutorial
Slug: sparql-tutorial
...
{{ style.xml }}

This document introduces RDF and SPARQL from the ground up, to the point where SPARQL queries become familiar and approachable to reason about.

Different RDF triple stores may have different data layouts. All examples in this tutorial come from the Nepomuk ontology, and even though the tutorial aims to be generic, it mentions some things specific to Tracker. Those are clearly spelled out.

If you are reading this tutorial, you might also have Tracker installed on your system. If that is the case, you can for example start a fresh, empty SPARQL service for local testing:

```bash
$ tracker3 endpoint --dbus-service org.example.Endpoint --ontology nepomuk
```

Queries can then be run against this specific service with:

```bash
$ tracker3 sparql --dbus-service org.example.Endpoint --query "$SPARQL_QUERY"
```

## RDF Triples

RDF data defines a graph, composed of vertices and edges. This graph is directed, because edges point from one vertex to another, and it is labeled, as those edges have a name.

The unit of data in RDF is a triple of the form:

    subject  predicate  object

Or expressed visually:

Subject and object are two graph vertices and the predicate is the edge between them. The accumulation of those triples forms the full graph. For example, the following triples:

```turtle
a nfo:FileDataObject .
a nmm:MusicPiece .
nie:title "Images" .
nmm:musicAlbum .
nmm:albumArtist .
nmm:albumArtist .
nmm:performer .

a nmm:MusicAlbum .
nie:title "Go Off!" .

a nmm:Artist .
nmm:artistName "Jason Becker" .

a nmm:Artist .
nmm:artistName "Marty Friedman" .

a nmm:Artist .
nmm:artistName "Cacophony" .
```

Would visually generate the following graph:

The dot after each triple is not (just) there for legibility, but is part of the syntax.

RDF triples written in full are quite repetitive and cumbersome to write. Luckily, they can be shortened by providing multiple objects for the same subject and predicate (with the `,` separator) or multiple predicate/object pairs for the same subject (with the `;` separator). The previous RDF could be shortened into:

```turtle
a nfo:FileDataObject, nmm:MusicPiece .
nie:title "Images" .
nmm:musicAlbum .
nmm:albumArtist , .
nmm:performer .

a nmm:MusicAlbum .
nie:title "Go Off!" .

a nmm:Artist .
nmm:artistName "Jason Becker" .

a nmm:Artist .
nmm:artistName "Marty Friedman" .

a nmm:Artist .
nmm:artistName "Cacophony" .
```

And further into:

```turtle
a nfo:FileDataObject, nmm:MusicPiece ;
    nie:title "Images" ;
    nmm:musicAlbum ;
    nmm:albumArtist , ;
    nmm:performer .

a nmm:MusicAlbum ;
    nie:title "Go Off!" .

a nmm:Artist ;
    nmm:artistName "Jason Becker" .

a nmm:Artist ;
    nmm:artistName "Marty Friedman" .

a nmm:Artist ;
    nmm:artistName "Cacophony" .
```

## SPARQL

SPARQL is the definition of a query language for RDF data. How does a query language for graphs work? Naturally, by providing a graph to be matched against the stored data. This graph is conveniently called the "graph pattern".

SPARQL builds upon the RDF concepts and syntax. Once you are familiar with RDF, the basic data insertion syntax should be fairly self-explanatory:

```SPARQL
INSERT DATA {
    a nfo:FileDataObject, nmm:MusicPiece ;
        nie:title "Images" ;
        nmm:musicAlbum ;
        nmm:albumArtist , ;
        nmm:performer .

    a nmm:MusicAlbum ;
        nie:title "Go Off!" .

    a nmm:Artist ;
        nmm:artistName "Jason Becker" .

    a nmm:Artist ;
        nmm:artistName "Marty Friedman" .

    a nmm:Artist ;
        nmm:artistName "Cacophony" .
}
```

The same goes for simple data deletion:

```SPARQL
DELETE DATA {
    a rdfs:Resource ;
}
```

And simple graph testing:

```SPARQL
# Tell me whether this RDF data exists in the store
ASK {
    nie:title "Images" ;
        nmm:albumArtist ;
        nmm:musicAlbum .
    nie:title "Go Off!" .
    nmm:artistName "Jason Becker"
}
```

This would result in `true`, as these triples do exist in the store.
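For contrast, here is a minimal sketch of an `ASK` query that would be expected to result in `false`. The IRI `<http://example.com/hypothetical-resource>` is made up for illustration and is not taken from the data above; the expected result assumes that nothing about this resource was ever inserted into the store:

```SPARQL
# Hypothetical IRI, assumed not to be present in the store
ASK {
    <http://example.com/hypothetical-resource> nie:title "Images"
}
```

The graph pattern has to match as a whole, so a single triple that is missing from the store is enough to make the answer `false`.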
An `ASK` query results in a single boolean row/column stating whether the provided graph exists in the store or not.

## Queries and variables

Of course, the point of a query language is being able to retrieve the stored data, not just testing whether data exists. The `SELECT` query syntax is used for that. Variables are denoted with a `?` prefix (or `$`, although that is less widely used) and act as "placeholders": any data will match in their place, and the matched value is available in the result set, or elsewhere within the query, under that variable name.

These variables can be placed anywhere, as the subject, predicate or object of a triple. For example, the following query could be considered the opposite of the simple boolean testing that `ASK` provides:

```SPARQL
# Give me every known triple
SELECT * { ?subject ?predicate ?object }
```

What does this query do? It provides a triple pattern with 3 variables, which every known triple in the database will match. The `*` is a shortcut for all queried variables, so the query could also be expressed as:

```SPARQL
SELECT ?subject ?predicate ?object { ?subject ?predicate ?object }
```

However, querying for all known data is rarely useful, and the result gets unwieldy quickly. Luckily, queries can be narrowed down: variables may be used anywhere in the triple definition, with the other triple elements being the fixed values (IRIs or literals) you want to match, e.g.:

```SPARQL
# Give me the title of the song (Result: "Images")
SELECT ?songName { nie:title ?songName }
```

```SPARQL
# What is this text to the album? (Result: the nie:title)
SELECT ?predicate { ?predicate "Go Off!" }
```

```SPARQL
# What is the resource URI of this fine musician? (Result: )
SELECT ?subject { ?subject nmm:artistName "Marty Friedman" }
```

```SPARQL
# Give me all resources that are a music piece (Result: )
SELECT ?song { ?song a nmm:MusicPiece }
```

And also combinations of them, for example:

```SPARQL
# Give me all predicate/object pairs for the given resource
SELECT ?pred ?obj { ?pred ?obj }
```

```SPARQL
# The Answer to the Ultimate Question of Life, the Universe, and Everything
SELECT ?subj ?pred { ?subj ?pred 42 }
```

```SPARQL
# Give me all resources that have a title, and their title.
SELECT ?subj ?obj { ?subj nie:title ?obj }
```

And of course, the graph pattern can hold more complex triple definitions, which will be matched as a whole across the stored data. For example:

```SPARQL
# Give me all songs from this fine album
SELECT ?song {
    ?album nie:title "Go Off!" .
    ?song nmm:musicAlbum ?album
}
```

```SPARQL
# Give me all song resources, their title, and their album title
SELECT ?song ?songTitle ?albumTitle {
    ?song a nmm:MusicPiece ;
        nmm:musicAlbum ?album ;
        nie:title ?songTitle .
    ?album nie:title ?albumTitle
}
```

Take a moment to think about the graph pattern expressed in the last query: