author | Gabriel Mazetto <brodock@gmail.com> | 2012-05-26 20:15:06 -0300 |
---|---|---|
committer | Gabriel Mazetto <brodock@gmail.com> | 2012-05-26 20:15:06 -0300 |
commit | 50c2c16a4d8ca52c4abcbef638f5105a9b0d1ee0 | |
tree | eb41fd8825f5ba1e13ba76aac1a10e0ba3ec9945 /lib | |
parent | 48a36851e60249565e0869f88a05b36252c7e893 | |
Better algorithm to deal with encodings. Moved fallback rescue message from view to encode library.
This helps fix cases where UTF-8 is wrongly identified as ISO-8859-1. We only try to convert a string when the detector is 100% sure about its charset; otherwise, we fall back to UTF-8.
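The failure mode described above can be reproduced with Ruby's built-in `String#force_encoding` and `String#encode` (used here instead of CharlockHolmes so the snippet runs without the gem). Treating the UTF-8 bytes of `"café"` as ISO-8859-1 and transcoding them to UTF-8 produces double-encoded text, while simply labelling the same bytes as UTF-8 keeps them intact:

```ruby
# UTF-8 bytes for "café": the "é" is stored as the two bytes C3 A9.
utf8_bytes = "café".b

# Wrong guess: read each byte as one ISO-8859-1 character, then transcode.
misread = utf8_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Fallback used by this commit: keep the bytes and label them UTF-8.
kept = utf8_bytes.dup.force_encoding("UTF-8")

puts misread # cafÃ©
puts kept    # café
```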
Diffstat (limited to 'lib')
-rw-r--r-- | lib/gitlabhq/encode.rb | 9 |
1 files changed, 6 insertions, 3 deletions

    diff --git a/lib/gitlabhq/encode.rb b/lib/gitlabhq/encode.rb
    index e0e52f0a2a7..780d839f420 100644
    --- a/lib/gitlabhq/encode.rb
    +++ b/lib/gitlabhq/encode.rb
    @@ -8,16 +8,19 @@ module Gitlabhq
         def utf8 message
           return nil unless message
    -      encoding = detect_encoding(message)
    -      if encoding
    +      detect = CharlockHolmes::EncodingDetector.detect(message) rescue {}
    +
    +      # It's better to default to UTF-8 as sometimes it's wrongly detected as another charset
    +      if detect[:encoding] && detect[:confidence] == 100
             CharlockHolmes::Converter.convert(message, encoding, 'UTF-8')
           else
             message
           end.force_encoding("utf-8")
    +      # Prevent app from crash cause of
           # encoding errors
         rescue
    -      ""
    +      "--broken encoding: #{encoding}"
         end
         def detect_encoding message
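The patched `utf8` logic can be sketched as a self-contained Ruby method. This is a minimal sketch under stated assumptions, not GitLab's code: the hypothetical `detector` argument stands in for `CharlockHolmes::EncodingDetector.detect` (which returns a hash with `:encoding` and `:confidence` keys, as the patch relies on), and Ruby's built-in `String#force_encoding`/`String#encode` replace `CharlockHolmes::Converter.convert`, so it runs without the gem. One difference is deliberate: as far as this hunk shows, the patched method no longer assigns the local `encoding` that the `convert` call and the rescue message still reference, so the sketch passes the detected name explicitly.

```ruby
# Minimal sketch of the "convert only when certain" strategy; not GitLab's code.
# Assumption: `detector` is a hypothetical stand-in for
# CharlockHolmes::EncodingDetector.detect, returning e.g.
# { encoding: "ISO-8859-1", confidence: 100 }.
# Assumption: String#encode replaces CharlockHolmes::Converter.convert.
def utf8_sketch(message, detector)
  return nil unless message

  # A failing detector degrades to "no information", like `rescue {}` in the patch.
  detected = (detector.call(message) rescue nil) || {}
  name = detected[:encoding]

  if name && detected[:confidence] == 100
    # Fully confident: reinterpret the bytes as `name`, then transcode to UTF-8.
    message.dup.force_encoding(name).encode("UTF-8")
  else
    # Less than 100% confidence: keep the bytes and label them UTF-8.
    message.dup.force_encoding("UTF-8")
  end
rescue ArgumentError, EncodingError
  # Unknown charset names or untranscodable bytes yield the patch's fallback text.
  "--broken encoding: #{name}"
end

unsure = ->(_) { { encoding: "ISO-8859-1", confidence: 40 } }
sure   = ->(_) { { encoding: "ISO-8859-1", confidence: 100 } }
bogus  = ->(_) { { encoding: "NOT-A-CHARSET", confidence: 100 } }

puts utf8_sketch("café".b, unsure)   # café (UTF-8 bytes kept)
puts utf8_sketch("caf\xE9".b, sure)  # café (converted from ISO-8859-1)
puts utf8_sketch("café".b, bogus)    # --broken encoding: NOT-A-CHARSET
```

Only a detection reported at confidence 100 triggers a conversion; any other report, including a detector that raises, leaves the bytes unchanged and labels them UTF-8. That is the trade-off the commit message describes: a correctly encoded UTF-8 string is never double-encoded because of a low-confidence ISO-8859-1 guess.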