Refactor and document Directory

The previous implementation of directory listings was inefficient for large directories. First it built a bunch of file entry strings, then it joined them into a large string containing all entries, then that was interpolated into the header and footer creating another large string. Then each_line was called on the large string, creating one string per line. This switches to a streaming approach for directory listings. We yield the header, then one string for each listing, then the footer. Testing with puma with a directory with 10,000 files with 100 byte file names, directory listing before took 3.5 seconds, with 900ms before first byte. After this commit, it takes 2.1 seconds, with 10ms before the first byte. Note that total time is not hugely different, because most of the time is probably taken in stat(2) calls. Time to first byte is greatly improved, though. Remove Directory#path. path is used as a local variable, but the instance variable is never set, so the value would always be nil. The previous approach used a glob for getting files. That resulted in broken behavior when the path being served includes glob metacharacters. Switch to a `Dir.foreach` approach to avoid that issue. Because the glob skips hidden files, continue to skip files starting with a period. Note that Directory will still serve hidden files and display directory listings for hidden directories, so this should not be considered a security feature. As a small behavior change, do not show a Parent directory link for the root directory, since it takes you to the same page. Switch list_path to not use exceptions for flow control. Remove all exception handling, and switch to using Directory#stat, which already handles the same exceptions. This has the advantage so that if a non-file, non-directory is requested, you get a valid 404 rack response instead of nil.
author: Jeremy Evans <code@jeremyevans.net> 2020-02-05 15:18:25 -0800
committer: Jeremy Evans <code@jeremyevans.net> 2020-02-06 10:36:16 -0800
commit: a0ccb0912c2dc3187fe477dbdb408e23bad96041 (patch)
tree: 04fdf71c89206f949a3b4ec88883e798e226f769
parent: 5cdea4f6ded1e77a8077945ccee0950aedf25279 (diff)
download: rack-a0ccb0912c2dc3187fe477dbdb408e23bad96041.tar.gz
2 files changed, 79 insertions, 55 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7ce12452..749a7608 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -24,7 +24,9 @@ All notable changes to this project will be documented in this file. For info on
 
 ### Changed
 
-- `QueryParser#parse_nested_query` uses original backtrace when reraising exception with new class ([@jeremyevans](https://github.com/jeremyevans))
+- `Directory` uses a streaming approach, significantly improving time to first byte for large directories. ([@jeremyevans](https://github.com/jeremyevans))
+- `Directory` no longer include a Parent directory link in the root directory index. ([@jeremyevans](https://github.com/jeremyevans))
+- `QueryParser#parse_nested_query` uses original backtrace when reraising exception with new class. ([@jeremyevans](https://github.com/jeremyevans))
 - `ConditionalGet` follows RFC 7232 precedence if both If-None-Match and If-Modified-Since headers are provided. ([@jeremyevans](https://github.com/jeremyevans))
 - `.ru` files supports the `frozen-string-literal` magic comment. ([@eregon](https://github.com/eregon))
 - Rely on autoload to load constants instead of requiring internal files, make sure to require 'rack' and not just 'rack/...'. ([@jeremyevans](https://github.com/jeremyevans))
@@ -40,6 +42,7 @@ All notable changes to this project will be documented in this file. For info on
 
 ### Removed
 
+- `Directory#path` as it was not used and always returned nil. ([@jeremyevans](https://github.com/jeremyevans))
 - `BodyProxy#each` as it was only needed to work around a bug in Ruby <1.9.3. ([@jeremyevans](https://github.com/jeremyevans))
 - `Session::Abstract::SessionHash#transform_keys`, no longer needed. (pavel)
 - `URLMap::INFINITY` and `URLMap::NEGATIVE_INFINITY`, in favor of `Float::INFINITY`. ([@ch1c0t](https://github.com/ch1c0t))
@@ -49,6 +52,7 @@ All notable changes to this project will be documented in this file. For info on
 
 ### Fixed
 
+- `Directory` correctly handles root paths containing glob metacharacters. ([@jeremyevans](https://github.com/jeremyevans))
 - `Cascade` uses a new response object for each call if initialized with no apps. ([@jeremyevans](https://github.com/jeremyevans))
 - `BodyProxy` correctly delegates keyword arguments to the body object on Ruby 2.7+. ([@jeremyevans](https://github.com/jeremyevans))
 - `BodyProxy#method` correctly handles methods delegated to the body object. ([@jeremyevans](https://github.com/jeremyevans))
diff --git a/lib/rack/directory.rb b/lib/rack/directory.rb
index 34c76676..be72be01 100644
--- a/lib/rack/directory.rb
+++ b/lib/rack/directory.rb
@@ -11,8 +11,8 @@ module Rack
   # If +app+ is not specified, a Rack::Files of the same +root+ will be used.
 
   class Directory
-    DIR_FILE = "<tr><td class='name'><a href='%s'>%s</a></td><td class='size'>%s</td><td class='type'>%s</td><td class='mtime'>%s</td></tr>"
-    DIR_PAGE = <<-PAGE
+    DIR_FILE = "<tr><td class='name'><a href='%s'>%s</a></td><td class='size'>%s</td><td class='type'>%s</td><td class='mtime'>%s</td></tr>\n"
+    DIR_PAGE_HEADER = <<-PAGE
 <html><head>
   <title>%s</title>
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />
@@ -33,32 +33,51 @@ table { width:100%%; }
     <th class='type'>Type</th>
     <th class='mtime'>Last Modified</th>
   </tr>
-%s
+    PAGE
+    DIR_PAGE_FOOTER = <<-PAGE
 </table>
 <hr />
 </body></html>
     PAGE
 
+    # Body class for directory entries, showing an index page with links
+    # to each file.
     class DirectoryBody < Struct.new(:root, :path, :files)
+      # Yield strings for each part of the directory entry
       def each
-        show_path = Rack::Utils.escape_html(path.sub(/^#{root}/, ''))
-        listings = files.map{|f| DIR_FILE % DIR_FILE_escape(f) } * "\n"
-        page = DIR_PAGE % [ show_path, show_path, listings ]
-        page.each_line{|l| yield l }
+        show_path = Utils.escape_html(path.sub(/^#{root}/, ''))
+        yield(DIR_PAGE_HEADER % [ show_path, show_path ])
+
+        unless path.chomp('/') == root
+          yield(DIR_FILE % DIR_FILE_escape(files.call('..')))
+        end
+
+        Dir.foreach(path) do |basename|
+          next if basename.start_with?('.')
+          next unless f = files.call(basename)
+          yield(DIR_FILE % DIR_FILE_escape(f))
+        end
+
+        yield(DIR_PAGE_FOOTER)
       end
 
       private
+
+      # Escape each element in the array of html strings.
       def DIR_FILE_escape(htmls)
         htmls.map { |e| Utils.escape_html(e) }
       end
     end
 
-    attr_reader :root, :path
+    # The root of the directory hierarchy.  Only requests for files and
+    # directories inside of the root directory are supported.
+    attr_reader :root
 
+    # Set the root directory and application for serving files.
     def initialize(root, app = nil)
       @root = ::File.expand_path(root)
-      @app = app || Rack::Files.new(@root)
-      @head = Rack::Head.new(lambda { |env| get env })
+      @app = app || Files.new(@root)
+      @head = Head.new(method(:get))
     end
 
     def call(env)
@@ -66,101 +85,101 @@ table { width:100%%; }
       @head.call env
     end
 
+    # Internals of request handling.  Similar to call but does
+    # not remove body for HEAD requests.
     def get(env)
       script_name = env[SCRIPT_NAME]
       path_info = Utils.unescape_path(env[PATH_INFO])
 
-      if bad_request = check_bad_request(path_info)
-        bad_request
-      elsif forbidden = check_forbidden(path_info)
-        forbidden
+      if client_error_response = check_bad_request(path_info) || check_forbidden(path_info)
+        client_error_response
       else
         path = ::File.join(@root, path_info)
         list_path(env, path, path_info, script_name)
       end
     end
 
+    # Rack response to use for requests with invalid paths, or nil if path is valid.
     def check_bad_request(path_info)
       return if Utils.valid_path?(path_info)
 
       body = "Bad Request\n"
-      size = body.bytesize
-      return [400, { CONTENT_TYPE => "text/plain",
-        CONTENT_LENGTH => size.to_s,
+      [400, { CONTENT_TYPE => "text/plain",
+        CONTENT_LENGTH => body.bytesize.to_s,
         "X-Cascade" => "pass" }, [body]]
     end
 
+    # Rack response to use for requests with paths outside the root, or nil if path is inside the root.
     def check_forbidden(path_info)
       return unless path_info.include? ".."
       return if ::File.expand_path(::File.join(@root, path_info)).start_with?(@root)
 
       body = "Forbidden\n"
-      size = body.bytesize
-      return [403, { CONTENT_TYPE => "text/plain",
-        CONTENT_LENGTH => size.to_s,
+      [403, { CONTENT_TYPE => "text/plain",
+        CONTENT_LENGTH => body.bytesize.to_s,
         "X-Cascade" => "pass" }, [body]]
     end
 
+    # Rack response to use for directories under the root.
     def list_directory(path_info, path, script_name)
-      files = [['../', 'Parent Directory', '', '', '']]
-      glob = ::File.join(path, '*')
-
       url_head = (script_name.split('/') + path_info.split('/')).map do |part|
-        Rack::Utils.escape_path part
+        Utils.escape_path part
       end
 
-      Dir[glob].sort.each do |node|
-        stat = stat(node)
+      # Globbing not safe as path could contain glob metacharacters
+      body = DirectoryBody.new(@root, path, ->(basename) do
+        stat = stat(::File.join(path, basename))
         next unless stat
-        basename = ::File.basename(node)
-        ext = ::File.extname(node)
 
-        url = ::File.join(*url_head + [Rack::Utils.escape_path(basename)])
-        size = stat.size
-        type = stat.directory? ? 'directory' : Mime.mime_type(ext)
-        size = stat.directory? ? '-' : filesize_format(size)
+        url = ::File.join(*url_head + [Utils.escape_path(basename)])
         mtime = stat.mtime.httpdate
-        url << '/'  if stat.directory?
-        basename << '/'  if stat.directory?
-
-        files << [ url, basename, size, type, mtime ]
-      end
-
-      return [ 200, { CONTENT_TYPE => 'text/html; charset=utf-8' }, DirectoryBody.new(@root, path, files) ]
+        if stat.directory?
+          type = 'directory'
+          size = '-'
+          url << '/'
+          if basename == '..'
+            basename = 'Parent Directory'
+          else
+            basename << '/'
+          end
+        else
+          type = Mime.mime_type(::File.extname(basename))
+          size = filesize_format(stat.size)
+        end
+
+        [ url, basename, size, type, mtime ]
+      end)
+
+      [ 200, { CONTENT_TYPE => 'text/html; charset=utf-8' }, body ]
     end
 
-    def stat(node)
-      ::File.stat(node)
+    # File::Stat for the given path, but return nil for missing/bad entries.
+    def stat(path)
+      ::File.stat(path)
     rescue Errno::ENOENT, Errno::ELOOP
       return nil
     end
 
-    # TODO: add correct response if not readable, not sure if 404 is the best
-    #       option
+    # Rack response to use for files and directories under the root.
+    # Unreadable and non-file, non-directory entries will get a 404 response.
     def list_path(env, path, path_info, script_name)
-      stat = ::File.stat(path)
-
-      if stat.readable?
+      if (stat = stat(path)) && stat.readable?
         return @app.call(env) if stat.file?
         return list_directory(path_info, path, script_name) if stat.directory?
-      else
-        raise Errno::ENOENT, 'No such file or directory'
       end
 
-    rescue Errno::ENOENT, Errno::ELOOP
-      return entity_not_found(path_info)
+      entity_not_found(path_info)
     end
 
+    # Rack response to use for unreadable and non-file, non-directory entries.
     def entity_not_found(path_info)
       body = "Entity not found: #{path_info}\n"
-      size = body.bytesize
-      return [404, { CONTENT_TYPE => "text/plain",
-        CONTENT_LENGTH => size.to_s,
+      [404, { CONTENT_TYPE => "text/plain",
+        CONTENT_LENGTH => body.bytesize.to_s,
         "X-Cascade" => "pass" }, [body]]
     end
 
     # Stolen from Ramaze
-
     FILESIZE_FORMAT = [
       ['%.1fT', 1 << 40],
       ['%.1fG', 1 << 30],
@@ -168,6 +187,7 @@ table { width:100%%; }
       ['%.1fK', 1 << 10],
     ]
 
+    # Provide human readable file sizes
     def filesize_format(int)
       FILESIZE_FORMAT.each do |format, size|
         return format % (int.to_f / size) if int >= size
author	Jeremy Evans <code@jeremyevans.net>	2020-02-05 15:18:25 -0800
committer	Jeremy Evans <code@jeremyevans.net>	2020-02-06 10:36:16 -0800
commit	a0ccb0912c2dc3187fe477dbdb408e23bad96041 (patch)
tree	04fdf71c89206f949a3b4ec88883e798e226f769
parent	5cdea4f6ded1e77a8077945ccee0950aedf25279 (diff)
download	rack-a0ccb0912c2dc3187fe477dbdb408e23bad96041.tar.gz