diff options
author | Jeremy Evans <code@jeremyevans.net> | 2020-02-05 15:18:25 -0800 |
---|---|---|
committer | Jeremy Evans <code@jeremyevans.net> | 2020-02-06 10:36:16 -0800 |
commit | a0ccb0912c2dc3187fe477dbdb408e23bad96041 (patch) | |
tree | 04fdf71c89206f949a3b4ec88883e798e226f769 | |
parent | 5cdea4f6ded1e77a8077945ccee0950aedf25279 (diff) | |
download | rack-a0ccb0912c2dc3187fe477dbdb408e23bad96041.tar.gz |
Refactor and document Directory
The previous implementation of directory listings was inefficient for large
directories. First it built a bunch of file entry strings, then it joined
them into a large string containing all entries, then that was interpolated
into the header and footer creating another large string. Then each_line was
called on the large string, creating one string per line.
This switches to a streaming approach for directory listings. We yield the
header, then one string for each listing, then the footer.
Testing with puma with a directory with 10,000 files with 100 byte file
names, directory listing before took 3.5 seconds, with 900ms before
first byte. After this commit, it takes 2.1 seconds, with 10ms before
the first byte. Note that total time is not hugely different, because
most of the time is probably taken in stat(2) calls. Time to first byte is
greatly improved, though.
Remove Directory#path. path is used as a local variable, but the instance
variable is never set, so the value would always be nil.
The previous approach used a glob for getting files. That resulted in
broken behavior when the path being served includes glob metacharacters.
Switch to a `Dir.foreach` approach to avoid that issue. Because the glob
skips hidden files, continue to skip files starting with a period. Note
that Directory will still serve hidden files and display directory listings
for hidden directories, so this should not be considered a security feature.
As a small behavior change, do not show a Parent directory link for the
root directory, since it takes you to the same page.
Switch list_path to not use exceptions for flow control. Remove all
exception handling, and switch to using Directory#stat, which already handles
the same exceptions. This has the advantage so that if a non-file,
non-directory is requested, you get a valid 404 rack response instead of
nil.
-rw-r--r-- | CHANGELOG.md | 6 | ||||
-rw-r--r-- | lib/rack/directory.rb | 128 |
2 files changed, 79 insertions, 55 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md index 7ce12452..749a7608 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -24,7 +24,9 @@ All notable changes to this project will be documented in this file. For info on ### Changed -- `QueryParser#parse_nested_query` uses original backtrace when reraising exception with new class ([@jeremyevans](https://github.com/jeremyevans)) +- `Directory` uses a streaming approach, significantly improving time to first byte for large directories. ([@jeremyevans](https://github.com/jeremyevans)) +- `Directory` no longer include a Parent directory link in the root directory index. ([@jeremyevans](https://github.com/jeremyevans)) +- `QueryParser#parse_nested_query` uses original backtrace when reraising exception with new class. ([@jeremyevans](https://github.com/jeremyevans)) - `ConditionalGet` follows RFC 7232 precedence if both If-None-Match and If-Modified-Since headers are provided. ([@jeremyevans](https://github.com/jeremyevans)) - `.ru` files supports the `frozen-string-literal` magic comment. ([@eregon](https://github.com/eregon)) - Rely on autoload to load constants instead of requiring internal files, make sure to require 'rack' and not just 'rack/...'. ([@jeremyevans](https://github.com/jeremyevans)) @@ -40,6 +42,7 @@ All notable changes to this project will be documented in this file. For info on ### Removed +- `Directory#path` as it was not used and always returned nil. ([@jeremyevans](https://github.com/jeremyevans)) - `BodyProxy#each` as it was only needed to work around a bug in Ruby <1.9.3. ([@jeremyevans](https://github.com/jeremyevans)) - `Session::Abstract::SessionHash#transform_keys`, no longer needed. (pavel) - `URLMap::INFINITY` and `URLMap::NEGATIVE_INFINITY`, in favor of `Float::INFINITY`. ([@ch1c0t](https://github.com/ch1c0t)) @@ -49,6 +52,7 @@ All notable changes to this project will be documented in this file. For info on ### Fixed +- `Directory` correctly handles root paths containing glob metacharacters. ([@jeremyevans](https://github.com/jeremyevans)) - `Cascade` uses a new response object for each call if initialized with no apps. ([@jeremyevans](https://github.com/jeremyevans)) - `BodyProxy` correctly delegates keyword arguments to the body object on Ruby 2.7+. ([@jeremyevans](https://github.com/jeremyevans)) - `BodyProxy#method` correctly handles methods delegated to the body object. ([@jeremyevans](https://github.com/jeremyevans)) diff --git a/lib/rack/directory.rb b/lib/rack/directory.rb index 34c76676..be72be01 100644 --- a/lib/rack/directory.rb +++ b/lib/rack/directory.rb @@ -11,8 +11,8 @@ module Rack # If +app+ is not specified, a Rack::Files of the same +root+ will be used. class Directory - DIR_FILE = "<tr><td class='name'><a href='%s'>%s</a></td><td class='size'>%s</td><td class='type'>%s</td><td class='mtime'>%s</td></tr>" - DIR_PAGE = <<-PAGE + DIR_FILE = "<tr><td class='name'><a href='%s'>%s</a></td><td class='size'>%s</td><td class='type'>%s</td><td class='mtime'>%s</td></tr>\n" + DIR_PAGE_HEADER = <<-PAGE <html><head> <title>%s</title> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> @@ -33,32 +33,51 @@ table { width:100%%; } <th class='type'>Type</th> <th class='mtime'>Last Modified</th> </tr> -%s + PAGE + DIR_PAGE_FOOTER = <<-PAGE </table> <hr /> </body></html> PAGE + # Body class for directory entries, showing an index page with links + # to each file. class DirectoryBody < Struct.new(:root, :path, :files) + # Yield strings for each part of the directory entry def each - show_path = Rack::Utils.escape_html(path.sub(/^#{root}/, '')) - listings = files.map{|f| DIR_FILE % DIR_FILE_escape(f) } * "\n" - page = DIR_PAGE % [ show_path, show_path, listings ] - page.each_line{|l| yield l } + show_path = Utils.escape_html(path.sub(/^#{root}/, '')) + yield(DIR_PAGE_HEADER % [ show_path, show_path ]) + + unless path.chomp('/') == root + yield(DIR_FILE % DIR_FILE_escape(files.call('..'))) + end + + Dir.foreach(path) do |basename| + next if basename.start_with?('.') + next unless f = files.call(basename) + yield(DIR_FILE % DIR_FILE_escape(f)) + end + + yield(DIR_PAGE_FOOTER) end private + + # Escape each element in the array of html strings. def DIR_FILE_escape(htmls) htmls.map { |e| Utils.escape_html(e) } end end - attr_reader :root, :path + # The root of the directory hierarchy. Only requests for files and + # directories inside of the root directory are supported. + attr_reader :root + # Set the root directory and application for serving files. def initialize(root, app = nil) @root = ::File.expand_path(root) - @app = app || Rack::Files.new(@root) - @head = Rack::Head.new(lambda { |env| get env }) + @app = app || Files.new(@root) + @head = Head.new(method(:get)) end def call(env) @@ -66,101 +85,101 @@ table { width:100%%; } @head.call env end + # Internals of request handling. Similar to call but does + # not remove body for HEAD requests. def get(env) script_name = env[SCRIPT_NAME] path_info = Utils.unescape_path(env[PATH_INFO]) - if bad_request = check_bad_request(path_info) - bad_request - elsif forbidden = check_forbidden(path_info) - forbidden + if client_error_response = check_bad_request(path_info) || check_forbidden(path_info) + client_error_response else path = ::File.join(@root, path_info) list_path(env, path, path_info, script_name) end end + # Rack response to use for requests with invalid paths, or nil if path is valid. def check_bad_request(path_info) return if Utils.valid_path?(path_info) body = "Bad Request\n" - size = body.bytesize - return [400, { CONTENT_TYPE => "text/plain", - CONTENT_LENGTH => size.to_s, + [400, { CONTENT_TYPE => "text/plain", + CONTENT_LENGTH => body.bytesize.to_s, "X-Cascade" => "pass" }, [body]] end + # Rack response to use for requests with paths outside the root, or nil if path is inside the root. def check_forbidden(path_info) return unless path_info.include? ".." return if ::File.expand_path(::File.join(@root, path_info)).start_with?(@root) body = "Forbidden\n" - size = body.bytesize - return [403, { CONTENT_TYPE => "text/plain", - CONTENT_LENGTH => size.to_s, + [403, { CONTENT_TYPE => "text/plain", + CONTENT_LENGTH => body.bytesize.to_s, "X-Cascade" => "pass" }, [body]] end + # Rack response to use for directories under the root. def list_directory(path_info, path, script_name) - files = [['../', 'Parent Directory', '', '', '']] - glob = ::File.join(path, '*') - url_head = (script_name.split('/') + path_info.split('/')).map do |part| - Rack::Utils.escape_path part + Utils.escape_path part end - Dir[glob].sort.each do |node| - stat = stat(node) + # Globbing not safe as path could contain glob metacharacters + body = DirectoryBody.new(@root, path, ->(basename) do + stat = stat(::File.join(path, basename)) next unless stat - basename = ::File.basename(node) - ext = ::File.extname(node) - url = ::File.join(*url_head + [Rack::Utils.escape_path(basename)]) - size = stat.size - type = stat.directory? ? 'directory' : Mime.mime_type(ext) - size = stat.directory? ? '-' : filesize_format(size) + url = ::File.join(*url_head + [Utils.escape_path(basename)]) mtime = stat.mtime.httpdate - url << '/' if stat.directory? - basename << '/' if stat.directory? - - files << [ url, basename, size, type, mtime ] - end - - return [ 200, { CONTENT_TYPE => 'text/html; charset=utf-8' }, DirectoryBody.new(@root, path, files) ] + if stat.directory? + type = 'directory' + size = '-' + url << '/' + if basename == '..' + basename = 'Parent Directory' + else + basename << '/' + end + else + type = Mime.mime_type(::File.extname(basename)) + size = filesize_format(stat.size) + end + + [ url, basename, size, type, mtime ] + end) + + [ 200, { CONTENT_TYPE => 'text/html; charset=utf-8' }, body ] end - def stat(node) - ::File.stat(node) + # File::Stat for the given path, but return nil for missing/bad entries. + def stat(path) + ::File.stat(path) rescue Errno::ENOENT, Errno::ELOOP return nil end - # TODO: add correct response if not readable, not sure if 404 is the best - # option + # Rack response to use for files and directories under the root. + # Unreadable and non-file, non-directory entries will get a 404 response. def list_path(env, path, path_info, script_name) - stat = ::File.stat(path) - - if stat.readable? + if (stat = stat(path)) && stat.readable? return @app.call(env) if stat.file? return list_directory(path_info, path, script_name) if stat.directory? - else - raise Errno::ENOENT, 'No such file or directory' end - rescue Errno::ENOENT, Errno::ELOOP - return entity_not_found(path_info) + entity_not_found(path_info) end + # Rack response to use for unreadable and non-file, non-directory entries. def entity_not_found(path_info) body = "Entity not found: #{path_info}\n" - size = body.bytesize - return [404, { CONTENT_TYPE => "text/plain", - CONTENT_LENGTH => size.to_s, + [404, { CONTENT_TYPE => "text/plain", + CONTENT_LENGTH => body.bytesize.to_s, "X-Cascade" => "pass" }, [body]] end # Stolen from Ramaze - FILESIZE_FORMAT = [ ['%.1fT', 1 << 40], ['%.1fG', 1 << 30], @@ -168,6 +187,7 @@ table { width:100%%; } ['%.1fK', 1 << 10], ] + # Provide human readable file sizes def filesize_format(int) FILESIZE_FORMAT.each do |format, size| return format % (int.to_f / size) if int >= size |