From e83a89bd6206ac6c3f2249f0c629171f328d8c35 Mon Sep 17 00:00:00 2001 From: Thomas Leonard Date: Fri, 5 Jul 2002 12:30:42 +0000 Subject: New system using update-mime-database. --- shared-mime-info-spec.xml | 321 +++++++++++++++++++++++----------------------- 1 file changed, 160 insertions(+), 161 deletions(-) (limited to 'shared-mime-info-spec.xml') diff --git a/shared-mime-info-spec.xml b/shared-mime-info-spec.xml index f26bef97..2436ed60 100644 --- a/shared-mime-info-spec.xml +++ b/shared-mime-info-spec.xml @@ -34,15 +34,15 @@ Shared MIME-info Database - 23 May 2002 + 04 Jul 2002 Introduction Version -This is version 0.7 of the Shared MIME-info Database spec, last updated 23 May 2002. - +This is version 0.8-preview of the Shared MIME-info Database spec, last updated 05 +Jul 2002. What is this spec? @@ -53,17 +53,20 @@ correct MIME type for a file. This is generally done by examining the file's name or contents, and looking up the correct MIME type in a database. +It is also useful to store information about each type, such as a textual +description of it, or a list of applications that can be used to view or edit +files of that type. + + For interoperability, it is useful for different programs to use the same -database so that different programs agree on the type of a file and new -rules for determining the type apply to all programs. +database so that different programs agree on the type of a file and +information is not duplicated. It is also helpful for application authors to +only have to install new information in one place. -This specification attempts to unify the type-guessing systems currently in +This specification attempts to unify the MIME database systems currently in use by GNOMEGNOME, KDEKDE and -ROXROX. 
Only the name-to-type and contents-to-type mappings -are covered by this spec; other MIME type information, such as the default -handler for a particular type, or the icon to use to display it in a file -manager, are not covered since these are a matter of style. +ROXROX, and provide room for future extensibility. @@ -279,107 +282,175 @@ This spec proposes: -A standard format for these files. +A standard way for applications to install new MIME related information. + + +A standard way of getting the MIME type for a file. + + +A standard way of getting information about a MIME type. -Standard locations for them. +Standard locations for all the files, and methods of resolving conflicts. Further, the existing databases have been merged into a single package SharedMIME. - File format + Directory layout -The new format is very similar to that described in the Desktop Entries -SpecificationDesktopEntries. However, only the tags used -in this example are valid: - +There are two important requirements for the way the MIME database is stored: + + +Applications must be able to extend the database in any way when they are installed, +both to add new rules for determining type, and new information about specific types. + + +The user must be able to override the defaults set by the system administrator, who +must, in turn, be able to override the defaults set by the distribution. + + -All KDE-specific tags have been removed, as well as the Icon field. Although -all desktops need a way to determine the icon for a particular type, the icon -used will depend on desktop, and not only on the file type. The Encoding tag -is not present; all .mimeinfo files are in the UTF-8 encoding. +The directories to be used to store the files in the database are: + + +/usr/share/mime/ + + +/usr/local/share/mime/ + + +~/.mime/ + + +In the rest of this document, paths shown with the prefix +<MIME> indicate the files should be loaded from +all the directries listed above. 
For example, Load all the +<MIME>/text/html.xml files means to load +/usr/share/mime/text/html.xml, +/usr/local/share/mime/text/html.xml, and +~/.mime/text/html.xml (if they exist). -The type should be a standard MIME type where possible. If a special media type -is required for non-file objects (directories, pipes, etc), then the media -type 'inode' may be used. +Where the information from these files is conflicting, information from directories +lower in the list take precedence. -The entries in Patterns are separated by semicolons. There is no trailing -semicolon. PreferredExtension is the suggested extension to use when creating -files of this type. +Any file named User.xml takes precedence over all other files in +the same packages directory. Tools which let the user edit the +database should edit the file ~/mime/packages/User.xml. -Although not part of the name-to-type mapping, the Comment field is left in -for the sake of not having too many files. +Each application that wishes to contribute to the MIME database will install a +single XML file, named after the application, into one of the three +<MIME>/packages/ directories (depending on where the user requested +the application be installed). After installing, uninstalling or modifying this +file, the application MUST run the update-mime-database command, +which is provided by the freedesktop.org shared databaseSharedMIME. -The Hidden field is usually not present. It is used to indicate that this entry -replaces all information for this MIME type read so far, instead of being -merged with other records for the same type. The intent is to let users -entirely replace existing types. +update-mime-database is passed the mime +directory containing the packages subdirectory which was +modified as its only argument. 
It scans all the XML files in the packages +subdirectory, combines the information in them, and creates a number of output files: + + +<MIME>/globs (contains a mapping from extension to MIME type) + + +<MIME>/magic (contains a mapping from file contents to MIME type) + + +<MIME>/MEDIA/SUBTYPE.xml (one file for each MIME +type, giving details about the type) + + +The format of these generated files and the source files in packages +are explained in the following sections. - Directory layout - -Unlike the KDE system, the files are not arranged in the filesystem by type. -This approach is only possible for a tightly coordinated system. Consider, -for example, that ROX-Filer adds a mapping from -.DirIcon to 'image/png'. This cannot be specified in -a file called image/png.desktop without conflicting -with existing definitions for the type. - - -Since files are not named by type, each file may contain multiple types. The -files should instead be named by the package that they come from to avoid -conflicts and reduce loading times. - + The source XML files -The directories to be used to load these files are: - +Each application provides only a single XML source file, which is installed in the +packages directory as described above. This file is an XML file +whose document element is named mime-info and whose namespace URI +is . All elements +described in this specification MUST have this namespace too. + +The document element may contain zero or more mime-type child nodes, +in any order, each describing a single MIME type. Each element has a type +attribute giving the MIME type that it describes. + +Each mime-type node may contain any combination of the following elements, +and in any order: -/usr/share/mime/mime-info +glob elements have a pattern attribute. Any file +whose name matches this pattern will be given this MIME type (subject to conflicting rules in +other files, of course). 
-/usr/local/share/mime/mime-info +magic elements have offset, +type, value and, optionally, +mask attributes. Each magic element corresponds to one +line of file +1's magic.mime file. +They can be nested in the same way to provide the equivalent of continuation +lines. -~/.mime/mime-info +comment elements give a human-readable textual description of the MIME +type. There may be many of these elements with different xml:lang attributes +to provide the text in multiple languages. -Each of these directories contains a number of files with the '.mimeinfo' -extension. Applications MUST NOT try to load other files. This is to allow for -future extensions. + +Here is an example source file, named mozilla.xml: + + + + + + + HTML page + html bladsy + + +]]> +In practice, common types such as text/html are provided by the freedesktop.org shared +database. Also, only new information needs to be provided, since this information will be merged +with other information about the same type. + + + The MEDIA/SUBTYPE.xml files -Programs modifying any of these files MUST update the modification time on -the parent (mime-info) directory so that applications can -easily detect the change. The rules from the directories in this list take -precedence over conflicting rules from earlier directories. If a directory -contains a file called user.mimeinfo then it should be -read after all other files in that directory. This is to allow the user's -settings to take precedence over all others. GUI tools for editing the MIME -types will edit ~/.mime/mime-info/user.mimeinfo. +These files have a mime-type element as the root node. The format is +as described above. They are created by merging all the mime-type +elements from the source files and creating one output file per MIME type. Each file may contain +information from multiple source files. The magic and +glob elements will have been removed. 
- Pattern matching + The glob files -KDE's Patterns field replaces GNOME's and ROX's ext/regex fields, since it +This is a simple list of lines containing a glob pattern, whitespace, and a MIME type. For example: + + + +KDE's glob system replaces GNOME's and ROX's ext/regex fields, since it is trivial to detect a pattern in the form '*.ext' and store it in an extension hash table internally. The full power of regular expressions was not being used by either desktop, and glob patterns are more suitable for @@ -390,9 +461,6 @@ Applications MUST first try a case-sensitive match, then a case-insensitive one. This is so that main.C will be seen as a C++ file, but IMAGE.GIF will still use the *.gif pattern. - - - Dealing with conflicts If several patterns match then the longest pattern SHOULD be used. In particular, files with multiple extensions (such as @@ -403,18 +471,9 @@ be matched before all others. It is acceptable to match patterns of the form using a hash table). +There may be several rules mapping to the same type. They should all be merged. If the same pattern is defined twice, then they MUST be ordered by the -directory the rule came from (this is to allow users to override the system -defaults if, for example, they are using a common extension to mean something -else). Patterns in ~/.mime/mime-info override those -in /usr/local/share/mime/mime-info, which in turn take -precedence over those from /usr/mime/mime-info. -If a pattern is defined twice within same directory, either can be used. - - -If the same type is defined in several places, the Patterns and Comments -MUST be merged. If two different comments are provided for the same -MIME type in the same language, they should be ordered by directory as before. +directory the rule came from, as described above. Common types (such as MS Word Documents) will be provided in the X Desktop @@ -424,82 +483,22 @@ about its own types, conflicts should be rare. 
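Reviewer's note: the glob-matching rules in this hunk (case-sensitive match first, then case-insensitive, with the longest matching pattern winning) can be sketched as below. This is a minimal illustration, not part of the patch. The function name `guess_type` and the MIME type strings in the usage lines are hypothetical, and the patterns are assumed to be already loaded as (pattern, type) pairs and ordered by directory precedence.

```python
from fnmatch import fnmatchcase


def guess_type(filename, globs):
    """Return the MIME type for filename, or None if no pattern matches.

    globs is a list of (pattern, mime_type) pairs (hypothetical in-memory
    form of the globs file).
    """
    def best(name, fold):
        # Collect every matching pattern together with its length.
        matches = [
            (len(pattern), mime)
            for pattern, mime in globs
            if fnmatchcase(name, pattern.lower() if fold else pattern)
        ]
        # The longest pattern wins; on a tie the earliest entry is kept.
        return max(matches, key=lambda m: m[0])[1] if matches else None

    # First a case-sensitive pass, then a case-insensitive fallback.
    return best(filename, False) or best(filename.lower(), True)


# Example data; the type names are illustrative assumptions.
globs = [
    ("*.c", "text/x-csrc"),
    ("*.C", "text/x-c++src"),
    ("*.gif", "image/gif"),
    ("*.gz", "application/x-gzip"),
    ("*.tar.gz", "application/x-compressed-tar"),
]
```

With this data, `main.C` resolves through the case-sensitive pass, `IMAGE.GIF` only through the case-insensitive fallback, and `backup.tar.gz` picks the longer `*.tar.gz` pattern over `*.gz`.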
- Contents matching + The magic files -The value of the Contents attribute contains a priority and an expression. -If several expressions match for one file, the one with the highest priority is used. -As a guide, priorities should be between 1 and 100, with 50 being the normal case. -Generic types (such as XML or GZip-compressed files) should have lower priorities. - -Since scanning a file's contents can be very slow, applications may choose to -do pattern matching first and only fall back to content matching, or not -perform it at all. - - -The basic building blocks of expressions are bracketed lists containing a type, -an offset (or range of offsets), the data to match and, optionally, a mask. For -example: +These files have the same format as +file +1's magic.mime file, except that +the offset may be a range in the form START:END. The rule is considered to match if there is +a match at either of these offsets, or at any offset in-between. -The first element of the list is the type of the data (see the table below), the -second is the range of offsets to check, the third is the value to match and -the last, if present, is the mask. - - -Integers have the usual C-style prefixes (0 for octal numbers, 0x for hexadecimal). -Strings have C-style escaping. This string contains the sequence of bytes -<0, 8, 9, 10>: "\0\010\t\xa". - - -A range gives the range of valid starting offsets. If the end of the range is omitted then -it is assumed to be the same as the start (that is, the match is only checked at one point -in the file). 
- - -The possible types of match are listed below: - - - - - -TypeDescription - - - -stringString of bytes -byteSingle byte -big1616-bit big-endian integer -big3232-bit big-endian integer -little1616-bit little-endian integer -little3232-bit little-endian integer -host1616-bit integer in host-order -host3232-bit integer in host-order - - - - -These basic expressions may be combined using the and and -or syntax, eg: - -The and keyword corresponds to a more-deeply indented continuation -line in the original file -1 syntax, while or corresponds -to elements at the same indentation. They may be nested in the obvious (scheme-like) -fashion. - - -Since many formats have sub-formats (for example, KOffice stores its files in -GZip format, with a generic KOffice marker and a specific application marker), -it may be a useful optimisation to spot the same subexpression (eg -(string 10 "KOffice")) being used in several types and -only check it once. - + Security implications -- cgit v1.2.1
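Reviewer's note: the START:END offset range that this patch adds to the magic files ("a match at either of these offsets, or at any offset in-between") can be sketched as below. This is a minimal illustration for plain byte-string values only; the function name `matches_in_range` is hypothetical, and masks and the integer match types are not handled.

```python
from typing import Optional


def matches_in_range(data: bytes, value: bytes, start: int,
                     end: Optional[int] = None) -> bool:
    """Return True if value occurs in data at any offset in start..end.

    When end is omitted, only the single offset start is checked
    (an assumption carried over from the single-offset form).
    """
    if end is None:
        end = start
    # The range is inclusive at both ends.
    return any(data.startswith(value, offset)
               for offset in range(start, end + 1))
```

For example, `matches_in_range(b"xxPK", b"PK", 0, 4)` checks offsets 0 through 4 and finds the value at offset 2, while the same call with the range 0 to 1 does not match.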