author | Iristyle <Iristyle@github> | 2016-07-15 11:06:01 -0700 |
---|---|---|
committer | Miklos Fazekas <mfazekas@szemafor.com> | 2016-07-23 16:13:33 +0200 |
commit | 5e474340774ce7b9d091071b0e75a5a478544b9a (patch) | |
tree | 494d7f05446772dfaca3336467bfc4b5c17e19ba /CHANGES.txt | |
parent | 070264cd9f92ef0ff932c21acf8e2cf4f44b145a (diff) | |
download | net-ssh-5e474340774ce7b9d091071b0e75a5a478544b9a.tar.gz |
Prevent encoding issues building UTF8 buffers
- Prior to this change, attempting to send UTF8 commands through
SSH, or attempting to copy files with UTF8 filenames could fail.
This was particularly easy to trigger by attempting to execute
commands that were 128 bytes or longer.
- monkey patch net-ssh gem to allow UTF-8 strings >= 128 bytes
The buffer @content is often built as a UTF-8 string, until the
point at which it appends data that cannot be encoded as a UTF-8
sequence.
One case occurs when the call to write_string is made to append a
string that exceeds 127 bytes in length. The SSH2 format says
that strings must be length prefixed, and when the value [128]
has pack("N*") called against it, the resultant 4 byte network
order representation does not have a valid UTF-8 equivalent,
resulting in an ASCII-8BIT / BINARY string.
[127].pack('N*').encode('utf-8')
=> "\u0000\u0000\u0000\u007F"
[128].pack('N*').encode('utf-8')
Encoding::UndefinedConversionError: "\x80" from ASCII-8BIT to UTF-8
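As a rough illustration of the threshold described above, the sketch
below checks whether a length prefix happens to be a valid UTF-8 byte
sequence. The helper names length_prefix and utf8_representable? are
hypothetical and are not part of net-ssh.

```ruby
# Hypothetical helper (not part of net-ssh): the 4-byte big-endian
# length prefix that precedes an SSH2 string of the given byte length.
def length_prefix(bytesize)
  [bytesize].pack('N')
end

# Hypothetical helper: true when the bytes, reinterpreted as UTF-8,
# form a valid sequence. Works on a copy so the argument is untouched.
def utf8_representable?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

p length_prefix(127).encoding                    # the binary (ASCII-8BIT) encoding
p utf8_representable?(length_prefix(127))        # true: every byte is below 0x80
p utf8_representable?(length_prefix(128))        # false: a lone 0x80 byte is not valid UTF-8
```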
Ruby has a subtle behavior where appending a BINARY string to
an existing UTF-8 string is allowed and the resultant string
changes encoding to BINARY. However, once this has happened,
the string can no longer have non-ASCII UTF-8 strings appended, as
Ruby will raise an Encoding::CompatibilityError.
Appending BINARY to UTF-8 always creates BINARY:
"foo".encode('utf-8') << [128].pack('N*')
=> "foo\x00\x00\x00\x80"
Appending UTF-8 representable strings to existing strings:
Ruby 2.1.7 keeps the string as its default UTF-8
"foo" << [127].pack('N*')
=> "foo\u0000\u0000\u0000\u007F"
Ruby 1.9.3 keeps UTF-8 strings as UTF-8
"foo".encode('utf-8') << [127].pack('N*')
=> "foo\u0000\u0000\u0000\u007F"
Ruby 1.9.3 defaults to US-ASCII which changes it to BINARY
"foo" << [127].pack('N*')
=> "foo\x00\x00\x00\x7F"
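The sequence described above can be reproduced end to end. This is a
minimal sketch on a plain String rather than the real buffer class;
the variable names content and command are illustrative, and the
128-byte command is constructed purely for the example.

```ruby
# A buffer that starts life as an ASCII-only UTF-8 string.
content = String.new('foo', encoding: Encoding::UTF_8)

# A 128-byte command containing one non-ASCII character
# (U+16A0 takes 3 bytes in UTF-8, plus 125 ASCII bytes).
command = "\u16A0" + ('a' * 125)

# Appending the binary length prefix switches the buffer to ASCII-8BIT,
# because the prefix for 128 contains the byte 0x80.
content << [command.bytesize].pack('N')
encoding_after_prefix = content.encoding

# Appending the UTF-8 payload to the now-binary buffer fails.
raised = false
begin
  content << command
rescue Encoding::CompatibilityError
  raised = true
end

p encoding_after_prefix == Encoding::ASCII_8BIT  # true
p raised                                         # true
```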
The simple solution is to call force_encoding on UTF-8 strings
prior to appending them to @content, given it's always OK to
append ASCII-8BIT / BINARY strings to existing strings, but
appending UTF-8 to BINARY raises errors.
"\x80".force_encoding('ASCII-8BIT') << "\u16A0"
Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
force_encoding, in this case, simply relabels a valid UTF-8
string as BINARY, leaving its bytes unchanged
"\u16A0".force_encoding('BINARY')
=> "\xE1\x9A\xA0"
Correct conversion per http://www.fileformat.info/info/unicode/char/16a0/index.htm
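A minimal sketch of the approach, assuming a buffer that keeps its
data in @content. SketchBuffer is a hypothetical stand-in written for
this example and is not the net-ssh buffer class; only the idea of
calling force_encoding before appending comes from the description
above.

```ruby
# Hypothetical stand-in for a packet buffer; not net-ssh's actual class.
class SketchBuffer
  attr_reader :content

  def initialize
    # String.new with no arguments returns an empty ASCII-8BIT string.
    @content = String.new
  end

  # Append an SSH2-style string: 4-byte big-endian length, then the bytes.
  def write_string(str)
    # Relabel a copy as BINARY; the bytes themselves are unchanged.
    bytes = str.dup.force_encoding(Encoding::BINARY)
    @content << [bytes.bytesize].pack('N') << bytes
    self
  end
end

buffer = SketchBuffer.new
buffer.write_string("\u16A0" * 50)   # 150 bytes, above the 127-byte threshold
buffer.write_string('plain ascii')   # a later append still succeeds
p buffer.content.encoding == Encoding::BINARY  # true
```

Working on a copy (dup) keeps the caller's string in its original
encoding; whether that matters depends on how callers reuse their
strings.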
Diffstat (limited to 'CHANGES.txt')
-rw-r--r-- | CHANGES.txt | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/CHANGES.txt b/CHANGES.txt
index e71cf9a..a88a745 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,3 +1,5 @@
+* Fix UTF-8 encoding issues [Ethan J. Brown]
+
 === 4.0.0.alpha4
 
 * Experimental event loop abstraction [Miklos Fazekas]