summaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
authorWilliam A. Rowe Jr <wrowe@apache.org>2000-11-06 18:04:24 +0000
committerWilliam A. Rowe Jr <wrowe@apache.org>2000-11-06 18:04:24 +0000
commit851c7f08e4d5c9a4e2c7a81b508a473f3ad38e4d (patch)
tree5c587ef847c05fbad5b51b0c3c3f8194607d66fd /docs
parent6c1e24643c66cb7e916cfb27353cc9895b0a796b (diff)
downloadapr-851c7f08e4d5c9a4e2c7a81b508a473f3ad38e4d.tar.gz
I've committed this document so that others (esp. Mac OS X, NetWare, OS2, etc)
can refine the observations about their platforms' path processing behavior. Why html? Next revision from me will define the Win32/OS2 pseudo-code in an alternate style, to highlight differences and discrepancies. git-svn-id: https://svn.apache.org/repos/asf/apr/apr/trunk@60638 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'docs')
-rw-r--r--docs/canonical_filenames.html151
1 files changed, 151 insertions, 0 deletions
diff --git a/docs/canonical_filenames.html b/docs/canonical_filenames.html
new file mode 100644
index 000000000..395151986
--- /dev/null
+++ b/docs/canonical_filenames.html
@@ -0,0 +1,151 @@
+<h1>Canonical Filename</h1>
+
+<h2>Requirements</h2>
+
+<p>APR porters need to address the underlying discrepancies between
+file systems. To achieve a reasonable degree of security, the
+program depending upon APR needs to know that two paths may be
+compared, and that a mismatch is guarenteed to reflect that the
+two paths do not return the same resource</p>.
+
+<p>The first discrepancy is in volume roots. Unix and pure deriviates
+have only one root path, "/". Win32 and OS2 share root paths of
+the form "D:/", D: is the volume designation. However, this can
+be specified as "//./D:/" as well, indicating D: volume of the
+'this' machine. Win32 and OS2 also may employ a UNC root path,
+of the form "//server/share/" where share is a share-point of the
+specified network server. Finally, NetWare root paths are of the
+form "server/volume:/", or the simpler "volume:/" syntax for 'this'
+machine. All these non-Unix file systems accept volume:path,
+without a slash following the colon, as a path relative to the
+current working directory, which APR will treat as ambigious, that
+is, neither an absolute nor a relative path per se.</p>
+
+<p>The second discrepancy is in the meaning of the 'this' directory.
+In general, 'this' must be eliminated from the path where it occurs.
+The syntax "path/./" and "path/" are both aliases to path. However,
+this isn't file system independent, since the double slash "//" has
+a special meaning on OS2 and Win32 at the start of the path name,
+and is invalid on those platforms before the "//server/share/" UNC
+root path is completed. Finally, as noted above, "//./volume/" is
+legal root syntax on WinNT, and perhaps others.</p>
+
+<p>The third discrepancy is in the context of the 'parent' directory.
+When "parent/path/.." occurs, the path must be unwound to "parent".
+It's also critical to simply truncate leading "/../" paths to "/",
+since the parent of the root is root. This gets tricky on the
+Win32 and OS2 platforms, since the ".." element is invalid before
+the "//server/share/" is complete, and the "//server/share/../"
+seqence is the complete UNC root "//server/share/". In relative
+paths, leading ".." elements are significant, until they are merged
+with an absolute path. The relative form must only retain the ".."
+segments as leading segments, to be resolved once merged to another
+relative or an absolute path.</p>
+
+<p>The fourth discrepancy occurs with acceptance of alternate character
+codes for the same element. Path seperators are not retained within
+the APR canonical forms. The OS filesystem and APR (slashed) forms
+can both be returned as strings, to be used in the proper context.
+Unix, Win32 and Netware all accept slashes and backslashes as the
+same path seperator symbol, although unix strictly accepts slashes.
+While the APR form of the name strictly uses slashes, always consider
+that there could be a platform that actually accepts slashes as a
+character within a segment name.</p>
+
+<p>The fifth and worst discrepancy plauges Win32, OS2, Netware, and some
+filesystems mounted in Unix. Case insensitivity can permit the same
+file to slip through in both it's proper case and alternate cases.
+Simply changing the case is insufficient for any character set beyond
+ASCII, since various dilectic forms of characters suffer from one to
+many or many to one translations. An example would be u-umlaut, which
+might be accepted as a single character u-umlaut, a two character
+sequence u and the zero-width umlaut, the upper case form of the same,
+or perhaps even a captial U alone. This can be handled in different
+ways depending on the purposes of the APR based program, but the one
+requirement is that the path must be absolute in order to resolve these
+ambiguities. Methods employed include comparison of device and inode
+file uniqifiers, which is a fairly fast operation, or quering the OS
+for the true form of the name, which can be much slower. Only the
+acknowledgement of the file names by the OS can validate the equality
+of two different cases of the same filename.</p>
+
+<p>The sixth discrepancy, illegal or insignificant characters, is especially
+significant in non-unix file systems. Trailing periods are accepted
+but never stored, therefore trailing periods must be ignored for any
+form of comparison. And all OS's have certain expectations of what
+characters are illegal (or undesireable due to confusion.)</p>
+
+<p>A final warning, canonical functions don't transform or resolve case
+or character ambiguity issues until they are resolved into an absolute
+path. The relative canonical path, while useful, while useful for URL
+or similar identifiers, cannot be used for testing or comparison of file
+system objects.</p>
+
+<hr>
+
+<h2>Canonical API</h2>
+
+Functions to manipulate the apr_canon_file_t (an opaque type) include:
+
+<ul>
+<li>Create canon_file_t (from char* path and canon_file_t parent path)
+<li>Merged canon_file_t (from path and parent, both canon_file_t)
+<li>Get char* path of all or some segments
+<li>Get path flags of IsRelative, IsVirtualRoot, and IsAbsolute
+<li>Compare two canon_file_t structures for file equality
+</ul>
+
+<p>The path is corrected to the file system case only if is in absolute
+form. The apr_canon_file_t should be preserved as long as possible and
+used as the parent to create child entries to reduce the number of expensive
+stat and case canonicalization calls to the OS.</p>
+
+<p>The comparison operation provides that the APR can postpone correction
+of case by simply relying upon the device and inode for equivilance. The
+stat implementation provides that two files are the same, while their
+strings are not equivilant, and eliminates the need for the operating
+system to return the proper form of the name.</p>
+
+<p>In any case, returning the char* path, with a flag to request the proper
+case, forces the OS calls to resolve the true names of each segment. Where
+there is a penality for this operation and the stat device and inode test
+is faster, case correction is postponed until the char* result is requested.
+On platforms that identify the inode, device, or proper name interchangably
+with no penalities, this may occur when the name is initially processed.</p>
+
+<hr>
+
+<h2>Unix Example</h2>
+
+<p>First the simplest case:</p>
+
+<pre>
+Parse Canonical Name
+accepts parent path as canonical_t
+ this path as string
+
+Split this path Segments on '/'
+
+For each of this path Segments
+ If first Segment
+ If this Segment is Empty ([nothing]/)
+ Append this Root Segment (don't merge)
+ Continue to next Segment
+ Else is relative
+ Append parent Segments (to merge)
+ Continue with this Segment
+ If Segment is '.' or empty (2 slashes)
+ Discard this Segment
+ Continue with next Segment
+ If Segment is '..'
+ If no previous Segment or previous Segment is '..'
+ Append this Segment
+ Continue with next Segment
+ If previous Segment and previous is not Root Segment
+ Discard previous Segment
+ Discard this Segment
+ Continue with next Segment
+ Append this Relative Segment
+ Continue with next Segment
+</pre>
+