author Gabriel Mazetto <brodock@gmail.com> 2012-05-26 20:15:06 -0300
committer Gabriel Mazetto <brodock@gmail.com> 2012-05-26 20:15:06 -0300
commit 50c2c16a4d8ca52c4abcbef638f5105a9b0d1ee0
tree eb41fd8825f5ba1e13ba76aac1a10e0ba3ec9945 /lib
parent 48a36851e60249565e0869f88a05b36252c7e893
Better algorithm to deal with encodings. Moved fallback rescue message from view to encode library.
This helps fix cases where UTF-8 is wrongly identified as ISO-8859-1. We only try to convert strings when the detector is 100% confident about the charset; otherwise we fall back to UTF-8.
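The rule in the message above (convert only at 100% confidence, otherwise treat the bytes as UTF-8) can be sketched as follows. This is an illustration under stated assumptions, not the code from this commit: `to_utf8`, the `detector:` keyword and `StubDetector` are hypothetical names, and Ruby's built-in `String#encode` stands in for `CharlockHolmes::Converter.convert` so the sketch runs without the gem. Any object with a `detect(message)` method returning a hash with `:encoding` and `:confidence` can be passed as the detector, which is the shape the diff below relies on for `CharlockHolmes::EncodingDetector`.

```ruby
# Hypothetical helper illustrating the confidence-gated conversion.
# `to_utf8` and `detector:` are illustrative names, not part of the commit.
def to_utf8(message, detector:)
  return nil unless message

  # A detector that raises is treated as "nothing detected".
  detect = begin
    detector.detect(message) || {}
  rescue StandardError
    {}
  end

  if detect[:encoding] && detect[:confidence] == 100
    # Assumption: String#encode replaces CharlockHolmes::Converter.convert
    # here so that the sketch has no gem dependency.
    message.dup.force_encoding(detect[:encoding]).encode("UTF-8")
  else
    # Not certain about the charset: keep the bytes and label them UTF-8.
    message.dup.force_encoding("UTF-8")
  end
rescue StandardError
  # The charset name is read from the detection hash, which is in scope here.
  "--broken encoding: #{detect && detect[:encoding]}"
end

# Hypothetical stand-in for a real detector; always returns a fixed result.
StubDetector = Struct.new(:result) do
  def detect(_message)
    result
  end
end

# UTF-8 bytes misdetected as ISO-8859-1 with low confidence are left alone.
utf8_bytes = "caf\u00E9".b
unsure = StubDetector.new({ encoding: "ISO-8859-1", confidence: 40 })
puts to_utf8(utf8_bytes, detector: unsure)

# Latin-1 bytes detected with full confidence are converted.
latin1_bytes = "caf\xE9".b
certain = StubDetector.new({ encoding: "ISO-8859-1", confidence: 100 })
puts to_utf8(latin1_bytes, detector: certain)
```

Passing the detector in as an argument keeps the decision rule testable on its own; in an application that has the charlock_holmes gem loaded, the detector class used in the diff below could presumably be passed directly.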
Diffstat (limited to 'lib')
-rw-r--r-- lib/gitlabhq/encode.rb 9
1 files changed, 6 insertions, 3 deletions
diff --git a/lib/gitlabhq/encode.rb b/lib/gitlabhq/encode.rb
index e0e52f0a2a7..780d839f420 100644
--- a/lib/gitlabhq/encode.rb
+++ b/lib/gitlabhq/encode.rb
@@ -8,16 +8,19 @@ module Gitlabhq
def utf8 message
return nil unless message
- encoding = detect_encoding(message)
- if encoding
+ detect = CharlockHolmes::EncodingDetector.detect(message) rescue {}
+
+ # It's better to default to UTF-8 as sometimes it's wrongly detected as another charset
+ if detect[:encoding] && detect[:confidence] == 100
CharlockHolmes::Converter.convert(message, encoding, 'UTF-8')
else
message
end.force_encoding("utf-8")
+
# Prevent app from crash cause of
# encoding errors
rescue
- ""
+ "--broken encoding: #{encoding}"
end
def detect_encoding message