Following is a merge of two letters I sent to php4beta@lists.php.net, describing the changes in API between PHP 3.0 and PHP 4.0 (Zend). This file is by no means thorough documentation of the PHP API; it is intended for developers who are familiar with the PHP 3.0 API and want to port their code to PHP 4.0, or take advantage of its new features. For highlights about the PHP 3.0 API, consult apidoc.txt.

Zeev

--------------------------------------------------------------------------

I'm going to try to list the important changes in API and programming techniques that are involved in developing modules for PHP4/Zend, as opposed to PHP3. Listing the whole PHP4 API is way beyond my scope here; this is mostly a 'diff' from apidoc.txt, which you're all pretty familiar with.

An important note that I neglected to mention yesterday - the php4 tree is based on the php 3.0.5 tree, plus all 3.0.6 patches hand-patched into it. Notably, it does NOT include any 3.0.7 patches. All of those have to be reapplied, with extreme care - modules should be safe to patch (mostly), but anything that touches the core or main.c will almost definitely require changes in order to work properly.


[1] Symbol Tables

One of the major changes in Zend involves changing the way symbol tables work. Zend enforces reference counting on all values and resources. This required changes in the semantics of the hash tables that implement symbol tables. Instead of storing pval in the hashes, we now store pval *. All of the API functions in Zend were changed in a way that makes this change completely transparent. However, if you've used 'low level' hash functions to access or update elements in symbol tables, your code will require changes. Following are two simple examples: one demonstrates the difference between PHP3 and Zend when reading a symbol's value, and the other demonstrates the difference when writing a value.

    php3_read()
    {
        pval *foo;

        _php3_hash_find(ht, "foo", sizeof("foo"), &foo);
        /* foo->type is the type and foo->value is the value */
    }

    php4_read()
    {
        pval **foo;    /* the hash now stores pval *, so we get a pointer to that */

        _php3_hash_find(ht, "foo", sizeof("foo"), &foo);
        /* (*foo)->type is the type and (*foo)->value is the value */
    }

    ---

    php3_write()
    {
        pval newval;

        newval.type = ...;
        newval.value = ...;
        _php3_hash_update(ht, "bar", sizeof("bar"), &newval, sizeof(pval), NULL);
    }

    php4_write()
    {
        pval *newval = ALLOC_ZVAL();

        newval->refcount=1;    /* the symbol table holds the only reference */
        newval->is_ref=0;
        newval->type = ...;
        newval->value = ...;
        _php3_hash_update(ht, "bar", sizeof("bar"), &newval, sizeof(pval *), NULL);
    }


[2] Resources

One of the 'cute' things about the reference counting support is that it completely eliminates the problem of resource leaking. A simple loop that included '$result = mysql_query(...)' in PHP leaked unless the user remembered to run mysql_free_result($result) at the end of the loop body, and nobody really did.

In order to take advantage of the automatic resource deallocation upon destruction, there's virtually only one small change you need to make. Change the result type of a resource that you want to destroy itself as soon as it's no longer referenced (just about any resource I can think of) to IS_RESOURCE, instead of IS_LONG. The rest is magic.

Special treatment is required for SQL modules that follow MySQL's approach of having the link handle as an optional argument. Modules that follow the MySQL module model store the last opened link in a global variable, which they use in case the user neglects to explicitly specify a link handle. Due to the way reference counting works, this global reference is just like any other reference, and must increase that SQL link resource's reference count (otherwise, it will be closed prematurely). Simply, when you set the default link to a certain link, increase that link's reference count by calling zend_list_addref().

As always, the MySQL module is the one used to demonstrate 'new technology'.
You can look around it and look for IS_RESOURCE, as well as zend_list_addref(), to see a clear example of how the new API should be used.


[3] Thread safety issues

I'm not going to say that Zend was designed with thread safety in mind, but at some point we decided upon several guidelines that would make the move to thread safety much, much easier. Generally, we've followed the PHP 3.1 approach of moving global variables to a structure, and encapsulating all global variable references within macros. There are three main differences:

1. We grouped related globals in a single structure, instead of grouping all globals in one structure.
2. We've used much, much shorter macro names to increase the readability of the source code.
3. Regardless of whether we're compiling in thread safe mode or not, all global variables are *always* stored in a structure. For example, you would never have a global variable 'foo'; instead, it'll be a property of a global structure, for example, compiler_globals.foo. That makes development much, much easier, since your code will simply not compile unless you remember to put the necessary macro around foo.

To write code that'll be thread safe in the future (when we release our thread safe memory manager and work on integrating it), you can take a look at zend_globals.h. Essentially, two sets of macros are defined, one for thread safe mode, and one for thread unsafe mode. All global references are encapsulated within ???G(varname), where ??? is the appropriate prefix for your structure (for example, so far we have CG(), EG() and AG(), which stand for the compiler, executor and memory allocator, respectively).

When compiling with thread safety enabled, each function that makes use of a ???G() macro must obtain the pointer to its copy of the structure. It can do so in one of two forms:

1. It can receive it as an argument.
2. It can fetch it.

Obviously, the first method is preferable since it's much quicker.
However, it's not always possible to send the structure all the way to a particular function, or it may simply bloat the code too much in some cases.

Functions that receive the globals as an argument should look like this:

    rettype functionname(???LS_D)                             <-- a function with no arguments
    rettype functionname(type arg1, ..., type argn ???LS_DC)  <-- a function with arguments

Calls to such functions should look like this:

    functionname(???LS_C)                   <-- a function with no arguments
    functionname(arg1, ..., argn ???LS_CC)  <-- a function with arguments

LS stands for 'Local Storage', _C stands for Call and _CC stands for Call Comma, _D stands for Declaration and _DC stands for Declaration Comma. Note that there's NO comma between the last argument and ???LS_DC or ???LS_CC.

In general, every module that makes use of globals should use this approach if it plans to be thread safe.


[4] Generalized INI support

The code is meant to solve several issues:

a. The ugly long block of code in main.c that reads values from the cfg_hash into php3_ini.
b. Get rid of php3_ini. The performance penalty of copying it around all the time in the Apache module probably wasn't too high, but psychologically, it annoyed me :)
c. Get rid of the ugly code in mod_php4.c, that also reads values from Apache directives and puts them into the php3_ini structure.
d. Generalize all the code so that you only have to add an entry in one single place and get it automatically supported in php3.ini, Apache, the Win32 registry, the runtime functions ini_get() and ini_alter(), and any future method we might have.
e. Allow users to easily override *ANY* php3.ini value, except for ones they're not supposed to, of course.

I'm happy to say that I think I pretty much reached all goals. php_ini.c implements a mechanism that lets you add your INI entry in a single place, with a default value in case there's no php3.ini value. What you get by using this mechanism:

1. Automatic initialization from php3.ini if available, or from the default value if not.
2. Automatic support in ini_alter(). That means a user can change the value for this INI entry at runtime, without you having to add a single line of code, and definitely no additional function (for example, in PHP3, we had to add special dedicated functions, like set_magic_quotes_runtime() or the like - no need for that anymore).
3. Automatic support in Apache .conf files.
4. No need for a global php3_ini-like variable that'll store all that info. You can directly access each INI entry by name, at runtime. 'Sure, that's not revolutionary, it's just slow' is probably what some of you think - which is true, but you can also register a callback function that is called each time your INI entry is changed, if you wish to store it in a cached location for intensive use.
5. Ability to access the current active value for a given INI entry, and the 'master' value.

Of course, (2) and (3) are only applicable in some cases. Some entries shouldn't be overridden by users at runtime or through Apache .conf files - you can, of course, mark them as such.

So, enough hype, how does it work? Essentially:

    static PHP_INI_MH(OnChangeBar);    /* declare a message handler for a change in "bar" */

    PHP_INI_BEGIN()
        PHP_INI_ENTRY("foo", "1", PHP_INI_ALL, NULL, NULL)
        PHP_INI_ENTRY("bar", "bah", PHP_INI_SYSTEM, OnChangeBar, NULL)
    PHP_INI_END()

    static PHP_INI_MH(OnChangeBar)
    {
        a_global_var_for_bar = new_value;
        return SUCCESS;
    }

    int whatever_minit(INIT_FUNC_ARGS)
    {
        ...
        REGISTER_INI_ENTRIES();
        ...
    }

    int whatever_mshutdown(SHUTDOWN_FUNC_ARGS)
    {
        ...
        UNREGISTER_INI_ENTRIES();
        ...
    }

and that's it. Here's what it does. As you can probably guess, this code registers two INI entries - "foo" and "bar". They're given defaults "1" and "bah" respectively - note that all defaults are always given as strings. That doesn't reduce your ability to use integer values; simply specify them as strings.
"foo" is marked so that it can be changed by anyone at any time (PHP_INI_ALL), whereas "bar" is marked so that it can be changed only at startup, in php3.ini, presumably by the system administrator (PHP_INI_SYSTEM).

When "foo" changes, no function is called. Access to it is done using the macros INI_INT("foo"), INI_FLT("foo") or INI_STR("foo"), which return a long, double or char * respectively (strings that are returned aren't duplicated - if they're manipulated, you must duplicate them first). You can also access the original value (the 'master' value, in case one of them was overridden by a user) using another set of macros: INI_ORIG_INT("foo"), INI_ORIG_FLT("foo") and INI_ORIG_STR("foo").

When "bar" changes, a special message handler is called, OnChangeBar(). Always declare those message handlers using PHP_INI_MH(), as they might change in the future. Message handlers are called as soon as an INI entry initializes or changes, and allow you to cache a certain INI value in a quick C structure. In this example, whenever "bar" changes, the new value is stored in a_global_var_for_bar, which is a global char * pointer, quickly accessible from other functions. Things get a bit more complicated when you want to implement a thread-safe module, but it's doable as well.

Message handlers may return SUCCESS to acknowledge the new value, or FAILURE to reject it. That enables you to reject invalid values for some INI entries if you want. Finally, you can have a pointer passed to your message handler - that's the fifth argument to PHP_INI_ENTRY(). It is passed as mh_arg to the message handler.

Remember that for certain values, there's really no reason to mess with a callback function. A perfect example of this is the syntax highlight colors, which no longer have a dedicated global C slot that stores them, but instead are fetched from the php_ini hash on demand.

"As always", for a perfect working example of this mechanism, consult functions/mysql.c.
This module uses the new INI entry mechanism, and was also converted to be thread safe in general, and in its php_ini support in particular. Converting your modules to look like this for thread safety isn't a bad idea (not necessarily now, but in the long run).
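
For those who want a feel for what such a conversion involves, here is a rough, self-contained sketch of the ???G() / ???LS_* convention from section [3]. It is only an illustration under stated assumptions: the DEMO prefix, the demo_globals structure, its num_links field, the DEMO_THREAD_SAFE switch and the three functions are all hypothetical names made up for this example, and the real definitions in zend_globals.h may well differ in detail.

```c
#include <assert.h>

/* Hypothetical module globals, grouped in one structure (see section [3]). */
typedef struct {
    long num_links;    /* number of links opened so far */
} demo_globals;

#ifdef DEMO_THREAD_SAFE
/* Thread safe mode: each function works on a pointer to its own copy. */
#define DEMOLS_D   demo_globals *demo_globals_ptr
#define DEMOLS_DC  , DEMOLS_D
#define DEMOLS_C   demo_globals_ptr
#define DEMOLS_CC  , DEMOLS_C
#define DEMOG(v)   (demo_globals_ptr->v)
#else
/* Thread unsafe mode: one static structure; the call macros expand to nothing. */
static demo_globals demo_globals_instance;
#define DEMOLS_D   void
#define DEMOLS_DC
#define DEMOLS_C
#define DEMOLS_CC
#define DEMOG(v)   (demo_globals_instance.v)
#endif

/* A function with no arguments of its own: declared with ???LS_D. */
static void demo_reset(DEMOLS_D)
{
    DEMOG(num_links) = 0;
}

/* A function with arguments: note there is NO comma before DEMOLS_DC. */
static long demo_open_links(long count DEMOLS_DC)
{
    assert(count >= 0);
    DEMOG(num_links) += count;
    return DEMOG(num_links);
}

/* Passing the globals along to a callee: again no comma before DEMOLS_CC. */
static long demo_open_two(DEMOLS_D)
{
    return demo_open_links(2 DEMOLS_CC);
}
```

The comma lives inside DEMOLS_DC and DEMOLS_CC, which is why none is written by hand: in thread unsafe mode the macros vanish and demo_open_links(2 DEMOLS_CC) is simply demo_open_links(2), whereas in thread safe mode the same line passes the structure pointer as an extra argument. Either way, a bare reference to num_links outside DEMOG() fails to compile, which is the safety net described in section [3].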