fix another boundary case and hopefully improve performance

The fix: if we found ourselves at a charstart with only one character left to read, readsize would be zero; handle that correctly.

Performance: originally I read just the first byte of the next character, which meant as many extra read calls as there are characters left to read after the initial read, i.e. O(Nleft) reads, where Nleft is the number of characters left to read after the initial read. Now we read as many bytes as there are characters left to read, which should bring the number of reads down to O(log(Nleft**2)), I think (but don't ask me to justify that).
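To get a feel for the read-count estimate in the message, here is a rough model rather than Perl code. It assumes every character is exactly `width` bytes long and that every read returns all the bytes requested; neither holds in general, and `count_reads_new` is a hypothetical helper name. Note that O(log(Nleft**2)) is the same as O(log Nleft), since log(N**2) = 2 log N.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not Perl code: count the follow-up reads the patched
   sizing rule needs to collect `need` more characters, assuming every
   character is exactly `width` bytes and every read is satisfied in full. */
static size_t
count_reads_new(size_t need, size_t width)
{
    size_t partial = 0;   /* bytes already held of an incomplete character */
    size_t reads = 0;

    assert(width >= 1);
    while (need > 0) {
        /* Patched rule: finish the current character (if any) and add one
           byte per character still wanted. */
        size_t readsize = partial == 0
            ? need
            : (width - partial) + need - 1;
        size_t total = partial + readsize;
        size_t complete = total / width;  /* whole characters now buffered */

        reads++;
        need -= complete;
        partial = total % width;
    }
    return reads;
}
```

Under these assumptions each read completes roughly 1/width of the characters still wanted, so the remaining count shrinks geometrically instead of by one character per read. With single-byte characters a single follow-up read is enough.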
author: Tony Cook <tony@develop-help.com> 2012-03-17 12:54:17 +1100
committer: Tony Cook <tony@develop-help.com> 2012-12-09 09:32:45 +1100
commit: a1aea1fe12c11cc8f3650979df95e88a810f3238 (patch)
tree: f778adf8468e0aefabfec86fb2d3aca4deaeb143 /sv.c
parent: 90f6536b6e4fefdbe12a64f4201c9f73580aab88 (diff)
download: perl-a1aea1fe12c11cc8f3650979df95e88a810f3238.tar.gz
1 files changed, 17 insertions, 4 deletions
diff --git a/sv.c b/sv.c
index e2bed05753..34dc534503 100644
--- a/sv.c
+++ b/sv.c
@@ -7720,13 +7720,26 @@ S_sv_gets_read_record(pTHX_ SV *const sv, PerlIO *const fp, I32 append)
 		}
 
 		if (charcount < recsize) {
-		    /* read the rest of the current character, and maybe the
-		       beginning of the next, if we need it */
-		    STRLEN readsize = (charstart ? 0 : skip - (bend - bufp))
-			+ (charcount + 1 < recsize);
+		    STRLEN readsize;
 		    STRLEN bufp_offset = bufp - buffer;
 		    SSize_t morebytesread;
 
+		    /* originally I read enough to fill any incomplete
+		       character and the first byte of the next
+		       character if needed, but if there's many
+		       multi-byte encoded characters we're going to be
+		       making a read call for every character beyond
+		       the original read size.
+
+		       So instead, read the rest of the character if
+		       any, and enough bytes to match at least the
+		       start bytes for each character we're going to
+		       read.
+		    */
+		    if (charstart)
+			readsize = recsize - charcount;
+		    else 
+			readsize = skip - (bend - bufp) + recsize - charcount - 1;
 		    buffer = SvGROW(sv, append + bytesread + readsize + 1) + append;
 		    bend = buffer + bytesread;
 		    morebytesread = PerlIO_read(fp, bend, readsize);
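
The two sizing rules in the hunk above can be compared in isolation with a small sketch. The function names and the `avail` parameter (standing in for `bend - bufp`) are assumptions made for illustration; `size_t` replaces Perl's `STRLEN`, and none of this is the code in sv.c.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the expressions in the diff:
     charstart  non-zero when the buffer ends on a character boundary
     skip       byte length of the current (incomplete) character
     avail      bytes of that character already buffered (bend - bufp)
     charcount  characters read so far
     recsize    characters wanted in total */

/* Before the patch: the rest of the current character, plus one byte for
   the start of the next character if more than one is still needed. */
static size_t
old_readsize(int charstart, size_t skip, size_t avail,
             size_t charcount, size_t recsize)
{
    assert(charcount < recsize);
    assert(charstart || skip > avail);
    return (charstart ? 0 : skip - avail)
        + (charcount + 1 < recsize ? 1 : 0);
}

/* After the patch: the rest of the current character, plus one byte for
   every further character still needed. */
static size_t
new_readsize(int charstart, size_t skip, size_t avail,
             size_t charcount, size_t recsize)
{
    assert(charcount < recsize);
    assert(charstart || skip > avail);
    if (charstart)
        return recsize - charcount;
    return skip - avail + recsize - charcount - 1;
}
```

For the boundary case from the commit message, `old_readsize(1, 0, 0, 9, 10)` is 0 while `new_readsize(1, 0, 0, 9, 10)` is 1. Mid-character with `skip = 3`, `avail = 1`, `charcount = 4` and `recsize = 10`, the old rule asks for 3 bytes and the new one for 7.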