author Larry Wall <lwall@jpl-devvax.jpl.nasa.gov> 1990-08-08 17:02:14 +0000
committer Larry Wall <lwall@jpl-devvax.jpl.nasa.gov> 1990-08-08 17:02:14 +0000
commit 79220ce3ebd9c9ac4a99caf508dadef88c26a4e6
tree e2b23ba8ca5fda09a2471ab538c3d90d4e791c93 /h2pl
parent 0f85fab05fafa513bd55a9e1ab280aac5567e27a
perl 3.0 patch #19 (combined patch)
You now have the capability of linking C subroutines into a special version of perl. See the files in usub/ for an example.
There is now an operator to include library modules with duplicate suppression and error checking, called "require". (makelib has been renamed to h2ph, and Tom Christiansen's h2pl stuff has been included too. Perl .h files are now called .ph files to avoid confusion.)
It's now possible to truncate files if your machine supports any of ftruncate(fd, size), chsize(fd, size) or fcntl(fd, F_FREESP, size).
Added -c switch to do compilation only, that is, to suppress execution. Useful in combination with -D1024.
There's now a -x switch to extract a script from the input stream so you can pipe articles containing Perl scripts directly into perl.
Previously, the only places you could use bare words in Perl were as filehandles or labels. You can now put bare words (identifiers) anywhere. If they have no interpretation as filehandles or labels, they will be treated as if they had single quotes around them. This works together nicely with the fact that you can use a symbol name indirectly as a filehandle or to assign to *name. It basically means you can write subroutines and pass filehandles without quoting or *-ing them. (It also means the grammar is even more ambiguous now--59 reduce/reduce conflicts!!! But it seems to do the Right Thing.)
Added __LINE__ and __FILE__ tokens to let you interpolate the current line number or filename, such as in a call to an error routine, or to help you translate eval linenumbers to real linenumbers.
Added __END__ token to let you mark the end of the program in the input stream. (^D and ^Z are allowed synonyms.) Program text and data can now both come from STDIN.
`command` in array context now returns array of lines. Previously it would return a single element array holding all the lines.
An empty %array now returns 0 in scalar context so that you can use it profitably in a conditional: &blurfl if %seen;
The include search path (@INC) now includes . explicitly at the end, so you can change it if you wish. Library routines now have precedence by default.
Several pattern matching optimizations: I sped up /x+y/ patterns greatly by not retrying on every x, and disabled backoff on patterns anchored to the end like /\s+$/. This made /\s+$/ run 100 times faster on a string containing 70 spaces followed by an X. Actual improvements will generally be less than that. I also sped up {m,n} on simple items by making it a variant of *. And /.*whatever/ is now optimized to /^.*whatever/ to avoid retrying at every position in the event of failure.
I fixed character classes to allow backslashing hyphen, by popular request.
In the past, $ in a pattern would sometimes match in the middle of the string and sometimes not, if $* == 0. Now it will never match except at the end of the string, or just before a terminating newline. When $* == 1 behavior is as before.
In the README file, I've expanded on just how I think the GNU General Public License applies to Perl and to things you might want to do with Perl.
The interpreter used to set the global variable "line" to be the current line number. Instead, it now sets a global pointer to the current Perl statement, which is no more overhead, but now we will have access to the file name and package name associated with that statement, so that the debugger can soon be upgraded to allow debugging of evals and packages.
In the past, a conditional construct in an array context passed the array context on to the conditional expression, causing general consternation and confusion. Conditionals now always supply a scalar context to the expression, and if that expression turns out to be the one whose value is returned, the value is coerced to an array value of one element.
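Two of the behavior changes above, the empty %array test and the end-of-string meaning of $, can be checked with a short script. This is only a sketch, and it assumes a modern perl (with $* left unset) rather than the 3.0 interpreter this patch targets; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# An empty hash is false in a conditional; a populated one is true.
my %seen;
my $empty_is_false = %seen ? 0 : 1;
$seen{perl} = 1;
my $filled_is_true = %seen ? 1 : 0;

# $ matches at the very end of the string or just before a
# terminating newline, but not before a newline in the middle.
my $before_final_newline = ("abc\n" =~ /c$/)      ? 1 : 0;
my $before_inner_newline = ("abc\ndef" =~ /c$/)   ? 1 : 0;

print "empty hash false:        $empty_is_false\n";
print "filled hash true:        $filled_is_true\n";
print "c\$ before final newline: $before_final_newline\n";
print "c\$ before inner newline: $before_inner_newline\n";
```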
The switch optimizer was confused by negative fractional values, and was truncating them in the wrong direction.
Configure now checks for chsize, select and truncate functions, and now asks if you want to put scripts into some separate directory from your binaries. More and more people are establishing a common directory across architectures for scripts, so this is getting important.
It used to be that a numeric literal ended up being stored both as a string and as a double. This could make for lots of wasted storage if you said things like "$seen{$key} = 1;". So numeric literals are now stored only in floating point format, which saves space, and generates at most one extra conversion per literal over the life of the script.
The % operator had an off-by-one error if the left argument was negative.
The pack and unpack functions have been upgraded. You can now convert native float and double fields using f and d. You can specify negative relative positions with X<n>, and absolute positions in the record with @<n>. You can have a length of * on the final field to indicate that it is to gobble all the rest of the available fields. In unpack, if you precede a field spec with %<n>, it does an n-bit checksum on it instead of the value itself. (Thus "%16C*" will checksum just like the Sys V sum program.) One totally wacked out evening I hacked a u format in to pack and unpack uudecode-style strings.
A couple of bugs were fixed in unpack--it couldn't unpack an A or a format field in a scalar context, which is just supposed to return the first field. The c and C formats were also calling bcopy to copy each character. Yuck.
Machines without the setreuid() system call couldn't manipulate $< and $> easily. Now, if you've got setuid(), you can say $< = $> or $> = $< or even ($<, $>) = ($uid, $uid), as long as it's something that can be done with setuid(). Similarly for setgid().
I've included various MSDOS and OS/2 patches that people have sent. There's still more in the hopper...
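The pack/unpack additions described above (%<n> checksums, @<n>, X<n>, a * length, and the u format) can be tried out as follows. This is a minimal sketch assuming a modern perl, where these template characters still exist; the sample strings and variable names are illustrative only.

```perl
use strict;
use warnings;

# %16C* : 16-bit checksum of the unsigned byte values.
my $sum = unpack("%16C*", "abc");            # 97 + 98 + 99

# @<n> jumps to an absolute offset; * gobbles the rest.
my ($tail) = unpack('@2a*', "hello");        # everything from offset 2

# X<n> backs up n bytes, so the same byte is read twice.
my ($first, $again) = unpack("a1X1a1", "hi");

# u : uuencode with pack, decode with unpack.
my $text    = "Perl 3.0 patch #19";
my $encoded = pack("u", $text);
my ($round) = unpack("u", $encoded);

print "checksum:  $sum\n";
print "tail:      $tail\n";
print "re-read:   $first$again\n";
print "roundtrip: ", ($round eq $text ? "ok" : "mismatch"), "\n";
```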
An open on a pipe for output such as 'open(STDOUT,"|command")' left STDOUT attached to the wrong file descriptor. This didn't matter within Perl, but it made subprocesses expecting stdout to be on fd 1 rather irate.
The print command could fail to detect errors such as running out of room on the disk. Now it checks a little better.
Saying "print @foo" might only print out some of the elements if there were undefined elements in the middle of the array, due to a reversed bit of logic in the print routine.
On machines with vfork the child process would allocate memory in the parent without the parent knowing about it, or having any way to free the memory so allocated. The parent now calls a cleanup routine that knows whether that's what happened.
If the getsockname or getpeername functions returned a normal Unix error, perl -w would report that you tried I/O on an unopened socket, even though it was open.
MACH doesn't have seekdir or telldir. Who ever uses them anyway?
Under certain circumstances, an optimized pattern match could pass a hint into the standard pattern matching routine which the standard routine would then ignore. The next pattern match after that would then get a "panic: hint in do_match" because the hint didn't point into the current string of interest.
The $' variable returned a short string if it contained an embedded null.
Two common split cases are now special-cased to avoid the regular expression code. One is /\s+/ (and its cousin ' ', which also trims leading whitespace). The other is /^/, which is very useful for splitting a "here-is" quote into lines:
    @lines = split(/^/, <<END);
    Element 0
    Element 1
    Element 2
    END
You couldn't split on a single case-insensitive letter because the single character split optimization ignored the case folding flag.
Sort now handles undefined strings right, and sorts lists a little more efficiently because it weeds them out before sorting so it doesn't have to check for them on every comparison.
The each() and keys() functions were returning garbage on null keys in DBM files because the DBM iterator merely returns a pointer into the buffer to a string that's not necessarily null terminated. Internally, Perl keeps a null at the end of every string (though allowing embedded nulls) and some routines make use of this to avoid checking for the end of buffer on every comparison. So this just needed to be treated as a special case.
The &, | and ^ operators will do bitwise operations on two strings, but for some reason I hadn't implemented ~ to do a complement.
Using an associative array name with a % in dbmopen(%name...) didn't work right, not because it didn't parse, but because the dbm opening routine internally did the wrong thing with it.
You can now say dbmopen(name, 'filename', undef) to prevent it from opening the dbm file if it doesn't exist.
The die operator simply exited if you didn't give an argument, because that made sense before eval existed. But now it will be equivalent to "die 'Died';".
Using the return function outside a subroutine returned a cryptic message about not being able to pop a magical label off the stack. It's now more informative.
On systems without the rename() system call, it's emulated with unlink()/link()/unlink(), which could clobber a file if it happened to unlink it before it linked it. Perl now checks to make sure the source and destination filenames aren't in fact the same directory entry.
The -s file test now returns size of file. Why not?
If you tried to write a general subroutine to open files, passing in the filehandle as *filehandle, it didn't work because nobody took responsibility to allocate the filehandle structure internally. Now, passing *name to a subroutine forces filehandle and array creation on that symbol if they're not already created.
Reading input via <HANDLE> is now a little more efficient--it does one less string copy.
The dumpvar.pl routine now fixes weird chars to be printable, and allows you to specify a list of variables to display. The debugger takes advantage of this. The debugger also now allows \ continuation lines, and has an = command to let you make aliases easily.
Line numbers should now be correct even after lines containing only a semicolon.
The action code for parsing split; with no arguments didn't pass a correct value of bufend to the scanpat it was using to establish the /\s+/ pattern.
The $] variable returned the rcsid string and patchlevel. It still returns that in a string context, but in a numeric context it returns the version number (as in 4.0) + patchlevel / 1000. So these patches are being applied to 3.018.
The variables $0, %ENV, @ARGV were retaining incorrect information from the previous incarnation in dumped/undumped scripts.
The %ENV array is supposed to be global even inside packages, but an off-by-one error kept it from being so.
The $| variable couldn't be set on a filehandle before the file was opened. Now you can.
If errno == 0, the $! variable returned "Error 0" in a string context, which is, unfortunately, a true string. It now returns "" in string context if errno == 0, so you can use it reasonably in a conditional without comparing it to 0: &cleanup if $!;
On some machines, conversion of a number to a string caused a malloc string to be overrun by 1 character. More memory is now allocated for such a string.
The tainting mechanism didn't work right on scripts that were setgid but not setuid.
If you had a reference to an array such as @name in a program, but didn't invoke any of the usual array operations, the array never got initialized.
The FPS compiler doesn't do default in a switch very well if the value can be interpreted as a signed character. There's now a #ifdef BADSWITCH for such machines.
Certain combinations of backslashed backslashes weren't correctly parsed inside double-quoted strings.
"Here" strings caused warnings about uninitialized variables because the string used internally to accumulate the lines wasn't initialized according to the standards of the -w switch.
The a2p translator couldn't parse {foo = (bar == 123)} due to a hangover from the old awk syntax. It also needed to put a chop into a program if the program referenced NF so that the field count would come out right when the split was done. There was a missing semicolon when local($_) was emitted. I also didn't realize that an explicit awk split on ' ' trims leading whitespace just like the implicit split at the beginning of the loop.
The awk for..in loop has to be translated in one of two ways in a2p, depending on whether the array was produced by a split or by subscripting. If the array was a normal array, a2p put out code that iterated over the array values rather than the numeric indexes, which was wrong.
The s2p translator didn't translate \n correctly, stripping the backslash.
Diffstat (limited to 'h2pl')
-rw-r--r-- h2pl/README 71
1 files changed, 71 insertions, 0 deletions
diff --git a/h2pl/README b/h2pl/README
new file mode 100644
index 0000000000..5fe8ae7aa3
--- /dev/null
+++ b/h2pl/README
@@ -0,0 +1,71 @@
+[This file of Tom Christiansen's has been edited to change makelib to h2ph
+and .h to .ph where appropriate--law.]
+
+This directory contains files to help you convert the *.ph files generated by
+h2ph out of the perl source directory into *.pl files with all the
+indirection of the subroutine calls removed. The .ph version will be more
+safely portable, because if something isn't defined on the new system, like
+&TIOCGETP, then you'll get a fatal run-time error on the system lacking that
+function. Using the .pl version means that the subsequent scripts will give
+you a 0 $TIOCGETP and God only knows what may then happen. Still, I like the
+.pl stuff because they're faster to load.
+
+First, you need to run h2ph on things like sys/ioctl.h to get stuff
+into the perl library directory, often /usr/local/lib/perl. For example,
+ # h2ph sys/ioctl.h
+takes /usr/include/sys/ioctl.h as input and writes (without i/o redirection)
+the file /usr/local/lib/perl/sys/ioctl.ph, which looks like this:
+
+ eval 'sub TIOCM_RTS {0004;}';
+ eval 'sub TIOCM_ST {0010;}';
+ eval 'sub TIOCM_SR {0020;}';
+ eval 'sub TIOCM_CTS {0040;}';
+ eval 'sub TIOCM_CAR {0100;}';
+
+and much worse, rather than what Larry's ioctl.pl from the perl source dir has,
+which is:
+
+ $TIOCM_RTS = 0004;
+ $TIOCM_ST = 0010;
+ $TIOCM_SR = 0020;
+ $TIOCM_CTS = 0040;
+ $TIOCM_CAR = 0100;
+
+[Workaround for fixed bug in makedir/h2ph deleted--law.]
+
+The more complicated ioctl subs look like this:
+
+ eval 'sub TIOCGSIZE {&TIOCGWINSZ;}';
+ eval 'sub TIOCGWINSZ {&_IOR("t", 104, \'struct winsize\');}';
+ eval 'sub TIOCSETD {&_IOW("t", 1, \'int\');}';
+ eval 'sub TIOCGETP {&_IOR("t", 8,\'struct sgttyb\');}';
+
+The _IO[RW] routines use a %sizeof array, which (presumably)
+is keyed on the type name with the value being the size in bytes.
+
+To build %sizeof, try running this in this directory:
+
+ % ./getioctlsizes
+
+Which will tell you which things the %sizeof array needs
+to hold. You can try to build a sizeof.ph file with:
+
+ % ./getioctlsizes | ./mksizes > sizeof.ph
+
+Note that mksizes hardcodes the #include files for all the types, so it will
+probably require customization. Once you have sizeof.ph, install it in the
+perl library directory. Run my tcbreak script to see whether you can do
+ioctls in perl now. You'll get some kind of fatal run-time error if you
+can't. That script should be included in this directory.
+
+If this works well, now you can try to convert the *.ph files into
+*.pl files. Try this:
+
+ foreach file ( sysexits.ph sys/{errno.ph,ioctl.ph} )
+ ./mkvars $file > t/$file:r.pl
+ end
+
+The last one will be the hardest. If it works, you should be able to
+run tcbreak2 and have it work the same as tcbreak.
+
+Good luck.
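The trade-off the README describes between .ph files (constants as subroutines) and .pl files (constants as scalars) can be illustrated with a short script. This is a sketch under stated assumptions: it targets a modern perl, TIOCGETP is deliberately left undefined to stand in for a symbol missing on the new system, and ph_line_to_pl is a hypothetical helper that only handles literal numeric constants. It is not the mkvars script from this directory.

```perl
use strict;
use warnings;

# .ph style: each constant is a subroutine defined through eval.
eval 'sub TIOCM_RTS {0004;}';
eval 'sub TIOCM_ST {0010;}';

# .pl style: each constant is a plain scalar.
our ($TIOCM_RTS, $TIOCM_ST) = (0004, 0010);

# A missing .ph constant is a fatal run-time error when called.
# The block eval traps it here so the script can report it.
my $ok = eval { &TIOCGETP(); 1 };
my $ph_error = $ok ? '' : $@;

# A missing .pl constant is just an unset scalar that acts as 0.
our $TIOCGETP;    # declared but never assigned
my $pl_value = do { no warnings 'uninitialized'; $TIOCGETP + 0 };

# Hypothetical converter for the simplest .ph lines only, i.e.
# those whose sub body is a literal number. Returns an empty
# list for anything more complicated, such as _IOR() calls.
sub ph_line_to_pl {
    my ($line) = @_;
    return "\$$1 = $2;" if $line =~ /^\s*eval 'sub (\w+) \{(\d+);\}';/;
    return;
}

my $converted = ph_line_to_pl(q{eval 'sub TIOCM_RTS {0004;}';});
my $skipped   = ph_line_to_pl(q{eval 'sub TIOCGSIZE {&TIOCGWINSZ;}';});

print "ph value:  ", TIOCM_RTS(), "\n";
print "ph error:  $ph_error";
print "pl value:  $pl_value\n";
print "converted: $converted\n";
print "skipped:   ", (defined $skipped ? $skipped : "(not a literal)"), "\n";
```

The subroutine form fails loudly at the point of use, while the scalar form carries on with 0, which is the portability concern the README raises.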