summaryrefslogtreecommitdiff
path: root/lib/gitlab
diff options
context:
space:
mode:
authorKerri Miller <kerrizor@kerrizor.com>2019-07-26 13:41:11 +0000
committerGitLab Release Tools Bot <robert+release-tools@gitlab.com>2019-07-26 13:41:11 +0000
commitacc694ead6f8a7428efab126aa6b8a29b132db43 (patch)
treeec4289d68ded3f7524f2be1a20da74799f784773 /lib/gitlab
parent7501d64912be0e7792c89908975b8dcd7e58c4fe (diff)
downloadgitlab-ce-acc694ead6f8a7428efab126aa6b8a29b132db43.tar.gz
Extract SanitizeNodeLink and apply to WikiLinkFilter
The SanitizationFilter was running before the WikiFilter. Since WikiFilter can modify links, we could see links that _should_ be stopped by SanatizationFilter being rendered on the page. I (kerrizor) had previously addressed the bug in: https://gitlab.com/gitlab-org/gitlab-ee/commit/7bc971915bbeadb950bb0e1f13510bf3038229a4 However, an additional exploit was discovered after that was merged. Working through the issue, we couldn't simply shuffle the order of filters, due to some implicit assumptions about the order of filters, so instead we've extracted the logic that sanitizes a Nokogiri-generated Node object, and applied it to the WikiLinkFilter as well. On moving filters around: Once we start moving around filters, we get cascading failures; fix one, another one crops up. Many of the existing filters in the WikiPipeline chain seem to assume that other filters have already done their work, and thus operate on a "transform anything that's left" basis; WikiFilter, for instance, assumes any link it finds in the markdown should be prepended with the wiki_base_path.. but if it does that, it also turns `href="@user"` into `href="/path/to/wiki/@user"`, which the UserReferenceFilter doesn't see as a user reference it needs to transform into a user profile link. This is true for all the reference filters in the WikiPipeline.
Diffstat (limited to 'lib/gitlab')
-rw-r--r--lib/gitlab/utils/sanitize_node_link.rb61
1 files changed, 61 insertions, 0 deletions
diff --git a/lib/gitlab/utils/sanitize_node_link.rb b/lib/gitlab/utils/sanitize_node_link.rb
new file mode 100644
index 00000000000..620d71a7814
--- /dev/null
+++ b/lib/gitlab/utils/sanitize_node_link.rb
@@ -0,0 +1,61 @@
+# frozen_string_literal: true
+
+require_dependency 'gitlab/utils'
+
+module Gitlab
+ module Utils
+ module SanitizeNodeLink
+ UNSAFE_PROTOCOLS = %w(data javascript vbscript).freeze
+ ATTRS_TO_SANITIZE = %w(href src data-src).freeze
+
+ def remove_unsafe_links(env, remove_invalid_links: true)
+ node = env[:node]
+
+ sanitize_node(node: node, remove_invalid_links: remove_invalid_links)
+
+ # HTML entities such as <video></video> have scannable attrs in
+ # children elements, which also need to be sanitized.
+ #
+ node.children.each do |child_node|
+ sanitize_node(node: child_node, remove_invalid_links: remove_invalid_links)
+ end
+ end
+
+ # Remove all invalid scheme characters before checking against the
+ # list of unsafe protocols.
+ #
+ # See https://tools.ietf.org/html/rfc3986#section-3.1
+ #
+ def safe_protocol?(scheme)
+ return false unless scheme
+
+ scheme = scheme
+ .strip
+ .downcase
+ .gsub(/[^A-Za-z\+\.\-]+/, '')
+
+ UNSAFE_PROTOCOLS.none?(scheme)
+ end
+
+ private
+
+ def sanitize_node(node:, remove_invalid_links: true)
+ ATTRS_TO_SANITIZE.each do |attr|
+ next unless node.has_attribute?(attr)
+
+ begin
+ node[attr] = node[attr].strip
+ uri = Addressable::URI.parse(node[attr])
+
+ next unless uri.scheme
+ next if safe_protocol?(uri.scheme)
+
+ node.remove_attribute(attr)
+ rescue Addressable::URI::InvalidURIError
+ node.remove_attribute(attr) if remove_invalid_links
+ end
+ end
+ end
+ end
+ end
+end