Prevent encoding issues when building UTF-8 buffers

- Prior to this change, attempting to send UTF-8 commands through SSH,
  or attempting to copy files with UTF-8 filenames, could fail. This was
  particularly easy to trigger by executing commands that were 128 bytes
  or longer.

- Monkey patch the net-ssh gem to allow UTF-8 strings >= 128 bytes

  The buffer @content is often built as a UTF-8 string, until the point
  at which it appends data that cannot be encoded as a UTF-8 sequence.

  One case occurs when write_string is called to append a string that
  exceeds 127 bytes in length. The SSH2 format says that strings must be
  length prefixed (a 4-byte uint32 in network byte order). When pack('N*')
  is called on the value [128], the resulting 4-byte network-order
  representation has no valid UTF-8 equivalent, so the result is an
  ASCII-8BIT / BINARY string:

    [127].pack('N*').encode('utf-8')
    => "\u0000\u0000\u0000\u007F"

    [128].pack('N*').encode('utf-8')
    Encoding::UndefinedConversionError: "\x80" from ASCII-8BIT to UTF-8

  Ruby has a subtle behavior: appending a BINARY string to an existing
  ASCII-only UTF-8 string is allowed, and the resulting string changes
  encoding to BINARY. However, once this has happened, strings containing
  non-ASCII UTF-8 characters can no longer be appended, as Ruby will raise
  an Encoding::CompatibilityError.

  Appending BINARY data with non-ASCII bytes to UTF-8 creates BINARY:

    "foo".encode('utf-8') << [128].pack('N*')
    => "foo\x00\x00\x00\x80"

  Appending UTF-8 representable strings to existing strings:

    Ruby 2.1.7 keeps the string as its default UTF-8
    "foo" << [127].pack('N*')
    => "foo\u0000\u0000\u0000\u007F"

    Ruby 1.9.3 keeps UTF-8 strings as UTF-8
    "foo".encode('utf-8') << [127].pack('N*')
    => "foo\u0000\u0000\u0000\u007F"

    Ruby 1.9.3 defaults to US-ASCII, which changes it to BINARY
    pry(main)> "foo" << [127].pack('N*')
    => "foo\x00\x00\x00\x7F"

  The simple solution is to call force_encoding on UTF-8 strings before
  appending them to @content. Appending ASCII-8BIT / BINARY strings to
  such a buffer is always OK, but appending non-ASCII UTF-8 to BINARY
  raises errors:

    "\x80".force_encoding('ASCII-8BIT') << "\u16A0"
    Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8

  In this case force_encoding does not convert anything; it simply
  relabels the bytes of a valid UTF-8 string as BINARY, so the bytes
  written to @content are unchanged:

    "\u16A0".force_encoding('BINARY')
    => "\xE1\x9A\xA0"

  These are the correct UTF-8 bytes for U+16A0, per
  http://www.fileformat.info/info/unicode/char/16a0/index.htm
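To make the failure and the fix concrete, here is a minimal Ruby sketch. SketchBuffer and its force_binary: option are hypothetical and are not net-ssh's actual Net::SSH::Buffer code; the sketch only assumes what the message above describes: @content starts out as a UTF-8 string, and write_string prepends the string's byte length packed with pack('N'). With force_binary: false, writing a 128-byte string and then a non-ASCII UTF-8 string raises Encoding::CompatibilityError; with force_binary: true, both writes succeed.

```ruby
# Hypothetical buffer illustrating the behavior described in the commit
# message. This is NOT net-ssh's Net::SSH::Buffer implementation.
class SketchBuffer
  attr_reader :content

  def initialize(force_binary:)
    @force_binary = force_binary
    # Start as a (mutable) UTF-8 string, as @content often is.
    @content = String.new.force_encoding(Encoding::UTF_8)
  end

  # SSH2 strings are prefixed with their length in BYTES as a 4-byte
  # big-endian uint32, which is what pack('N') produces.
  def write_string(str)
    if @force_binary
      # dup so the caller's string keeps its UTF-8 label; force_encoding
      # only relabels the same bytes as BINARY, it does not transcode.
      str = str.dup.force_encoding(Encoding::BINARY)
    end
    @content << [str.bytesize].pack('N') << str
    self
  end
end

# Naive buffer: the 128-byte prefix "\x00\x00\x00\x80" turns @content into
# BINARY, so appending non-ASCII UTF-8 afterwards raises.
begin
  SketchBuffer.new(force_binary: false).write_string("a" * 128).write_string("\u16A0")
rescue Encoding::CompatibilityError => e
  puts "naive buffer failed: #{e.class}"
end

# Fixed buffer: every appended string is BINARY, so nothing is incompatible.
fixed = SketchBuffer.new(force_binary: true).write_string("a" * 128).write_string("\u16A0")
puts fixed.content.encoding == Encoding::BINARY # prints true
```

Note that the prefix uses bytesize rather than length: "\u16A0" is one character but three bytes, and the SSH2 length field counts bytes.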
author: Iristyle <Iristyle@github> 2016-07-15 11:06:01 -0700
committer: Miklos Fazekas <mfazekas@szemafor.com> 2016-07-23 16:13:33 +0200
commit: 5e474340774ce7b9d091071b0e75a5a478544b9a (patch)
tree: 494d7f05446772dfaca3336467bfc4b5c17e19ba /CHANGES.txt
parent: 070264cd9f92ef0ff932c21acf8e2cf4f44b145a (diff)
download: net-ssh-5e474340774ce7b9d091071b0e75a5a478544b9a.tar.gz
1 files changed, 2 insertions, 0 deletions
diff --git a/CHANGES.txt b/CHANGES.txt
index e71cf9a..a88a745 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,3 +1,5 @@
+* Fix UTF-8 encoding issues [Ethan J. Brown]
+
 === 4.0.0.alpha4
 
 * Experimental event loop abstraction [Miklos Fazekas]