diff options
Diffstat (limited to 'docs/user-guides/processes-and-threading.rst')
-rw-r--r-- | docs/user-guides/processes-and-threading.rst | 495 |
1 files changed, 495 insertions, 0 deletions
diff --git a/docs/user-guides/processes-and-threading.rst b/docs/user-guides/processes-and-threading.rst new file mode 100644 index 0000000..05813af --- /dev/null +++ b/docs/user-guides/processes-and-threading.rst @@ -0,0 +1,495 @@ +======================= +Processes And Threading +======================= + +Apache can operate in a number of different modes dependent on the platform +being used and the way in which it is configured. This ranges from multiple +processes being used, with only one request being handled at a time within +each process, to one or more processes being used, with concurrent requests +being handled in distinct threads executing within those processes. + +The combinations possible are further increased by mod_wsgi through its +ability to create groups of daemon processes to which WSGI applications can +be delegated. As with Apache itself, each process group can consist of one +or more processes and optionally make use of multithreading. Unlike Apache, +where some combinations are only possible based on how Apache was compiled, +the mod_wsgi daemon processes can operate in any mode based only on runtime +configuration settings. + +This article provides background information on how Apache and mod_wsgi +makes use of processes and threads to handle requests, and how Python +sub interpreters are used to isolate WSGI applications. The implications +of the various modes of operation on data sharing is also discussed. + +WSGI Process/Thread Flags +------------------------- + +Although Apache can make use of a combination of processes and/or threads +to handle requests, this is not unique to the Apache web server and the +WSGI specification acknowledges this fact. This acknowledgement is in the +form of specific key/value pairs which must be supplied as part of the WSGI +environment to a WSGI application. The purpose of these key/value pairs is +to indicate whether the underlying web server does or does not make use of +multiple processes and/or multiple threads to handle requests. + +These key/value pairs are defined as follows in the WSGI specification. + +*wsgi.multithread* + This value should evaluate true if the application object may be + simultaneously invoked by another thread in the same process, and + should evaluate false otherwise. + +*wsgi.multiprocess* + This value should evaluate true if an equivalent application object may + be simultaneously invoked by another process, and should evaluate false + otherwise. + +A WSGI application which is not written to take into consideration the +different combinations of process and threading models may not be portable +and potentially may not be robust when deployed to an alternate hosting +platform or configuration. + +Although you may not need an application or application component to work +under all possible combinations for these values initially, it is highly +recommended that any application component still be designed to work under +any of the different operating modes. If for some reason this cannot be +done due to the very nature of what functionality the component provides, +the component should validate if it is being run within a compatible +configuration and return a HTTP 500 internal server error response if it +isn't. + +An example of a component for which restrictions would apply is one +providing an interactive browser based debugger session in response to an +internal failure of a WSGI application. In this scenario, for the component +to work correctly, subsequent HTTP requests must be processed by the same +process. As such, the component can only be used with a web server that +uses a single process. In other words, the value of 'wsgi.multiprocess' +would have to evaluate to be false. + +Multi-Processing Modules +------------------------ + +The main factor which determines how Apache operates is which +multi-processing module (MPM) is built into Apache at compile time. +Although runtime configuration can customise the behaviour of the MPM, the +choice of MPM will dictate whether or not multithreading is available. + +On UNIX based systems, Apache defaults to being built with the 'prefork' +MPM. If Apache 1.3 is being used this is actually the only choice, but for +later versions of Apache, this can be overridden at build time by supplying +an appropriate value in conjunction with the '--with-mpm' option when +running the 'configure' script for Apache. The main alternative to the +'prefork' MPM which can be used on UNIX systems is the 'worker' MPM. + +If you are unsure which MPM is built into Apache, it can be determined +by running the Apache web server executable with the '-V' option. The +output from running the web server executable with this option will be +information about how it was configured when built:: + + Server version: Apache/2.2.1 + Server built: Mar 4 2007 20:48:15 + Server's Module Magic Number: 20051115:1 + Server loaded: APR 1.2.6, APR-Util 1.2.6 + Compiled using: APR 1.2.6, APR-Util 1.2.6 + Architecture: 32-bit + Server MPM: Worker + threaded: yes (fixed thread count) + forked: yes (variable process count) + Server compiled with.... + -D APACHE_MPM_DIR="server/mpm/worker" + -D APR_HAS_MMAP + -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled) + -D APR_USE_SYSVSEM_SERIALIZE + -D APR_USE_PTHREAD_SERIALIZE + -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT + -D APR_HAS_OTHER_CHILD + -D AP_HAVE_RELIABLE_PIPED_LOGS + -D DYNAMIC_MODULE_LIMIT=128 + -D HTTPD_ROOT="/usr/local/apache-2.2" + -D SUEXEC_BIN="/usr/local/apache-2.2/bin/suexec" + -D DEFAULT_SCOREBOARD="logs/apache_runtime_status" + -D DEFAULT_ERRORLOG="logs/error_log" + -D AP_TYPES_CONFIG_FILE="conf/mime.types" + -D SERVER_CONFIG_FILE="conf/httpd.conf" + +Which MPM is being used can be determined from the 'Server MPM' field. + +On the Windows platform the only available MPM is 'winnt'. + +The UNIX 'prefork' MPM +---------------------- + +This MPM is the most commonly used. It was the only mode of operation +available in Apache 1.3 and is still the default mode on UNIX systems in +later versions of Apache. In this configuration, the main Apache process +will at startup create multiple child processes. When a request is received +by the parent process, it will be processed by which ever of the child +processes is ready. + +Each child process will only handle one request at a time. If another +request arrives at the same time, it will be handled by the next available +child process. When it is detected that the number of available processes +is running out, additional child processes will be created as necessary. If +a limit is specified as to the number of child processes which may be +created and the limit is reached, plus there are sufficient requests +arriving to fill up the listener socket queue, the client may instead +receive an error resulting from not being able to establish a connection +with the web server. + +Where additional child processes have to be created due to a peak in the +number of current requests arriving and where the number of requests has +subsequently dropped off, the excess child processes may be shutdown and +killed off. Child processes may also be shutdown and killed off after they +have handled some set number of requests. + +Although threads are not used to service individual requests, this does not +preclude an application from creating separate threads to perform some +specific task. + +For the typical 'prefork' configuration where multiple processes are used, +the WSGI environment key/value pairs indicating how processes and threads +are being used will be as follows. + +*wsgi.multithread* + False + +*wsgi.multiprocess* + True + +Because multiple processes are being used, a WSGI middleware component such +as the interactive browser based debugger described would not be able to be +used. If during development and testing of a WSGI application, use of such a +debugger was required, the only option which would exist would be to limit +the number of processes being used. This could be achieved using the Apache +configuration:: + + StartServers 1 + ServerLimit 1 + +With this configuration, only one process will be started, with no +additional processes ever being created. The WSGI environment key/value +pairs indicating how processes and threads are being used will for this +configuration be as follows. + +*wsgi.multithread* + False + +*wsgi.multiprocess* + False + +In effect, this configuration has the result of serialising all requests +through a single process. This will allow an interactive browser based +debugger to be used, but may prevent more complex WSGI applications which +make use of AJAX techniques from working. This could occur where a web page +initiates a sequence of AJAX requests and expects later requests to be able +to complete while a response for an initial request is still pending. In +other words, problems may occur where requests overlap, as subsequent +requests will not be able to be executed until the initial request has +completed. + +The UNIX 'worker' MPM +--------------------- + +The 'worker' MPM is similar to 'prefork' mode except that within each child +process there will exist a number of worker threads. Instead of a request +only being able to be processed by the next available idle child process +and with the handling of the request being the only thing the child process +is then doing, the request may be processed by a worker thread within a +child process which already has other worker threads handling other +requests at the same time. + +It is possible that a WSGI application could be executed at the same time +from multiple worker threads within the one child process. This means that +multiple worker threads may want to access common shared data at the same +time. As a consequence, such common shared data must be protected in a way +that will allow access and modification in a thread safe manner. Normally +this would necessitate the use of some form of synchronisation mechanism to +ensure that only one thread at a time accesses and or modifies the common +shared data. + +If all worker threads within a child process were busy when a new request +arrives the request would be processed by an idle worker thread in another +child process. Apache may still create new child processes on demand if +necessary. Apache may also still shutdown and kill off excess child +processes, or child processes that have handled more than a set number of +requests. + +Overall, use of 'worker' MPM will result in less child processes needing to +be created, but resource usage of individual child processes will be +greater. On modern computer systems, the 'worker' MPM would in general be +the prefered MPM to use and should if possible be used in preference to the +'prefork' MPM. + +Although contention for the global interpreter lock (GIL) in Python can +causes issues for pure Python programs, it is not generally as big an issue +when using Python within Apache. This is because all the underlying +infrastructure for accepting requests and mapping the URL to a WSGI +application, as well as the handling of requests against static files are +all performed by Apache in C code. While this code is being executed the +thread will not be holding the Python GIL, thus allowing a greater level of +overlapping execution where a system has multiple CPUs or CPUs with +multiple cores. + +This ability to make good use of more than processor, even when using +multithreading, is further enchanced by the fact that Apache uses multiple +processes for handling requests and not just a single process. Thus, even +when there is some contention for the GIL within a specific process, it +doesn't stop other processes from being able to run as the GIL is only +local to a process and does not extend across processes. + +For the typical 'worker' configuration where multiple processes and +multiple threads are used, the WSGI environment key/value pairs indicating +how processes and threads are being used will be as follows. + +*wsgi.multithread* + True + +*wsgi.multiprocess* + True + +Similar to the 'prefork' MPM, the number of processes can be restricted +to just one if required using the configuration:: + + StartServers 1 + ServerLimit 1 + +With this configuration, only one process will be started, with no +additional processes ever being created, but that one process would still +make use of multiple threads. + +The WSGI environment key/value pairs indicating how processes and threads +are being used will for this configuration be as follows. + +*wsgi.multithread* + True + +*wsgi.multiprocess* + False + +Because multiple threads are being used, there would be no problem with +overlapping requests generated by an AJAX based web page. + +The Windows 'winnt' MPM +----------------------- + +On the Windows platform the 'winnt' MPM is the only option available. With +this MPM, multiple worker threads within a child process are used to handle +all requests. The 'winnt' MPM is different to the 'worker' mode however in +that there is only one child process. At no time are additional child +processes created, or that one child process shutdown and killed off, +except where Apache as a whole is being stopped or restarted. Because there +is only one child process, the maximum number of threads used is much +greater. + +The WSGI environment key/value pairs indicating how processes and threads +are being used will for this configuration be as follows. + +*wsgi.multithread* + True + +*wsgi.multiprocess* + False + +The mod_wsgi Daemon Processes +----------------------------- + +When using 'daemon' mode of mod_wsgi, each process group can be +individually configured so as to run in a manner similar to either +'prefork', 'worker' or 'winnt' MPMs for Apache. This is achieved by +controlling the number of processes and threads within each process +using the 'processes' and 'threads' options of the WSGIDaemonProcess +directive. + +To emulate the same process/thread model as the 'winnt' MPM, that is, +a single process with multiple threads, the following configuration would +be used:: + + WSGIDaemonProcess example threads=25 + +The WSGI environment key/value pairs indicating how processes and threads +are being used will for this configuration be as follows. + +*wsgi.multithread* + True + +*wsgi.multiprocess* + False + +Note that by not specifying the 'processes' option only a single process is +created within the process group. Although providing 'processes=1' as an +option would also result in a single process being created, this has a +slightly different meaning and so you should only do this if necessary. + +The difference between not specifying the 'processes' option and defining +'processes=1' will be that WSGI environment attribute called +'wsgi.multiprocess' will be set to be True when the 'processes' option +is defined, whereas not providing the option at all will result in the +attribute being set to be False. This distinction is to allow for where +some form of mapping mechanism might be used to distribute requests across +multiple process groups and thus in effect it is still a multiprocess +application. + +In other words, if you use the configuration:: + + WSGIDaemonProcess example processes=1 threads=25 + +the WSGI environment key/value pairs indicating how processes and threads +are being used will instead be: + +*wsgi.multithread* + True + +*wsgi.multiprocess* + True + +If you need to ensure that 'wsgi.multiprocess' is False so that interactive +debuggers do not complain about an incompatible configuration, simply do +not specify the 'processes' option and allow the default behaviour of a +single daemon process to apply. + +To emulate the same process/thread model as the 'worker' MPM, that is, +multiple processes with multiple threads, the following configuration would +be used:: + + WSGIDaemonProcess example processes=2 threads=25 + +The WSGI environment key/value pairs indicating how processes and threads +are being used will for this configuration be as follows. + +*wsgi.multithread* + True + +*wsgi.multiprocess* + True + +To emulate the same process/thread model as the 'prefork' MPM, that is, +multiple processes with only a single thread running in each, the following +configuration would be used:: + + WSGIDaemonProcess example processes=5 threads=1 + +The WSGI environment key/value pairs indicating how processes and threads +are being used will for this configuration be as follows. + +*wsgi.multithread* + False + +*wsgi.multiprocess* + True + +Note that when using mod_wsgi daemon processes, the processes are only used +to execute the Python based WSGI application. The processes are not in any +way used to serve static files, or host applications implemented in other +languages. + +Unlike the normal Apache child processes when 'embedded' mode of mod_wsgi +is used, the configuration as to the number of daemon processes within a +process group is fixed. That is, when the server experiences additional +load, no more daemon processes are created than what is defined. You should +therefore always plan ahead and make sure the number of processes and +threads defined is adequate to cope with the expected load. + +Sharing Of Global Data +---------------------- + +When the 'winnt' MPM is being used, or the 'prefork' or 'worker' MPM are +forced to run with only a single process, all request handlers within a +specific WSGI application will always be accessing the same global data. +This global data will persist in memory until Apache is shutdown or +restarted, or in the case of the 'prefork' or 'worker' MPM until the child +process is recycled due to reaching a predefined request limit. + +This ability to access the same global data and for that data to persist +for the lifetime of the child process is not present when either of the +'prefork' or 'worker' MPM are used in multiprocess mode. In other words, +where the WSGI environment key/value pair indicating how processes are used +is set to: + +*wsgi.multiprocess* + True + +This is because request handlers can execute within the context of distinct +child processes, each with their own set of global data unique to that +child process. + +The consequences of this are that you cannot assume that separate +invocations of a request handler will have access to the same global data +if that data only resides within the memory of the child process. If some +set of global data must be accessible by all invocations of a handler, that +data will need to be stored in a way that it can be accessed from multiple +child processes. Such sharing could be achieved by storing the global data +within an external database, the filesystem or in shared memory accessible +by all child processes. + +Since the global data will be accessible from multiple child processes at +the same time, there must be adequate locking mechanisms in place to +prevent distinct child processes from trying to modify the same data at the +same time. The locking mechanisms need to also be able to deal with the +case of multiple threads within one child process accessing the global data +at the same time, as will be the case for the 'worker' and 'winnt' MPM. + +Python Sub Interpreters +----------------------- + +The default behaviour of mod_wsgi is to create a distinct Python sub +interpreter for each WSGI application. Thus, where Apache is being used to +host multiple WSGI applications a process will contain multiple sub +interpreters. When Apache is run in a mode whereby there are multiple child +processes, each child process will contain sub interpreters for each WSGI +application. + +When a sub interpreter is created for a WSGI application, it would then +normally persist for the life of the process. The only exception to this +would be where interpreter reloading is enabled, in which case the sub +interpreter would be destroyed and recreated when the WSGI application +script file has been changed. + +For the sub interpreter created for each WSGI application, they will each +have their own set of Python modules. In other words, a change to the +global data within the context of one sub interpreter will not be seen from +the sub interpreter corresponding to a different WSGI application. This +will be the case whether or not the sub interpreters are in the same +process. + +This behaviour can be modified and multiple applications grouped together +using the WSGIApplicationGroup directive. Specifically, the directive +indicates that the marked WSGI applications should be run within the +context of a common sub interpreter rather than being run in their own sub +interpreters. By doing this, each WSGI application will then have access +to the same global data. Do note though that this doesn't change the fact +that global data will not be shared between processes. + +The only other way of sharing data between sub interpreters within the one +child process would be to use an external data store, or a third party +C extension module for Python which allows communication or sharing of +data between multiple interpreters within the same process. + +Building A Portable Application +------------------------------- + +Taking into consideration the different process models used by Apache and the +manner in which interpreters are used by mod_wsgi, to build a portable and +robust application requires the following therefore be satisified. + +1. Where shared data needs to be visible to all application instances, +regardless of which child process they execute in, and changes made to the +data by one application are immediately available to another, including any +executing in another child process, an external data store such as a +database or shared memory must be used. Global variables in normal Python +modules cannot be used for this purpose. + +2. Access to and modification of shared data in an external data store must +be protected so as to prevent multiple threads in the same or different +processes from interfering with each other. This would normally be achieved +through a locking mechanism visible to all child processes. + +3. An application must be re-entrant, or simply put, be able to be called +concurrently by multiple threads at the same time. Data which needs to +exist for the life of the request, would need to be stored as stack based +data, thread local data, or cached in the WSGI application environment. +Global variables within the actual application module cannot be used for +this purpose. + +4. Where global data in a module local to a child process is still used, +for example as a cache, access to and modification of the global data must +be protected by local thread locking mechanisms. |