diff options
Diffstat (limited to 'docs/user-guides/reloading-source-code.rst')
-rw-r--r-- | docs/user-guides/reloading-source-code.rst | 483 |
1 files changed, 483 insertions, 0 deletions
diff --git a/docs/user-guides/reloading-source-code.rst b/docs/user-guides/reloading-source-code.rst new file mode 100644 index 0000000..199cd30 --- /dev/null +++ b/docs/user-guides/reloading-source-code.rst @@ -0,0 +1,483 @@ +===================== +Reloading Source Code +===================== + +This document contains information about mechanisms available in mod_wsgi +for automatic reloading of source code when an application is changed and +any issues related to those mechanisms. + +Embedded Mode Vs Daemon Mode +---------------------------- + +What is achievable in the way of automatic source code reloading depends on +which mode your WSGI application is running. + +If your WSGI application is running in embedded mode then what happens when +you make code changes is largely dictated by how Apache works, as it +controls the processes handling requests. In general, if using embedded +mode you will have no choice but to manually restart Apache in order for code +changes to be used. + +If using daemon mode, because mod_wsgi manages directly the processes +handling requests and in which your WSGI application runs, there is more +avenue for performing automatic source code reloading. + +As a consequence, it is important to understand what mode your WSGI +application is running in. + +If you are running on Windows, are using Apache 1.3, or have not used +WSGIDaemonProcess/WSGIProcessGroup directives to delegate your WSGI +application to a mod_wsgi daemon mode process, then you will be using +embedded mode. + +If you are not sure whether you are using embedded mode or daemon mode, +then substitute your WSGI application entry point with:: + + def application(environ, start_response): + status = '200 OK' + + if not environ['mod_wsgi.process_group']: + output = 'EMBEDDED MODE' + else: + output = 'DAEMON MODE' + + response_headers = [('Content-Type', 'text/plain'), + ('Content-Length', str(len(output)))] + + start_response(status, response_headers) + + return [output] + +If your WSGI application is running in embedded mode, this will output to +the browser 'EMBEDDED MODE'. If your WSGI application is running in daemon +mode, this will output to the browser 'DAEMON MODE'. + +Reloading In Embedded Mode +-------------------------- + +However you have configured Apache to mount your WSGI application, you will +have a script file which contains the entry point for the WSGI application. +This script file is not treated exactly like a normal Python module and +need not even use a '.py' extension. It is even preferred that a '.py' +extension not be used for reasons described below. + +For embedded mode, one of the properties of the script file is that by +default it will be reloaded whenever the file is changed. The primary +intent with the file being reloaded is to provide a second chance at +getting any configuration in it and the mapping to the application correct. +If the script weren't reloaded in this way, you would need to restart +Apache even for a trivial change to the script file. + +Do note though that this script reloading mechanism is not intended as a +general purpose code reloading mechanism. Only the script file itself is +reloaded, no other Python modules are reloaded. This means that if modifying +normal Python code files which are used by your WSGI application, you will +need to trigger a restart of Apache. For example, if you are using Django +in embedded mode and needed to change your 'settings.py' file, you would +still need to restart Apache. + +That only the script file and not the whole process is reloaded also has a +number of implications and imposes certain restrictions on what code in the +script file can do or how it should be implemented. + +The first issue is that when the script file is imported, if the code makes +modifications to ``sys.path`` or other global data structures and the +changes are additive, checks should first be made to ensure that the change +has not already been made, else duplicate data will be added every time the +script file is reloaded. + +This means that when updating ``sys.path``, instead of using:: + + import sys + sys.path.append('/usr/local/wsgi/modules') + +the more correct way would be to use:: + + import sys + path = '/usr/local/wsgi/modules' + if path not in sys.path: + sys.path.append(path) + +This will ensure that the path doesn't get added multiple times. + +Even where the script file is named so as to have a '.py' extension, that +the script file is not treated like a normal module means that you should +never try to import the file from another code file using the 'import' +statement or any other import mechanism. The easiest way to avoid this is +not use the '.py' extension on script files or never place script files in +a directory which is located on the standard module search path, nor add +the directory containing the script into ``sys.path`` explicitly. + +If an attempt is made to import the script file as a module the result will +be that it will be loaded a second time as an independent module. This is +because script files are loaded under a module name which is keyed to the +full absolute path for the script file and not just the basename of the +file. Importing the script file directly and accessing it will therefore +not result in the same data being accessed as exists in the script file +when loaded. + +Because the script file is not treated like a normal Python module also has +implications when it comes to using the "pickle" module in conjunction +with objects contained within the script file. + +In practice what this means is that neither function objects, class objects +or instances of classes which are defined in the script file should be +stored using the "pickle" module. + +The technical reasons for the limitations on the use of the "pickle" module +in conjunction with objects defined in the script file are further +discussed in the document :doc:`../user-guides/issues-with-pickle-module`. + +The act of reloading script files also means that any data previously held +by the module corresponding to the script file will be deleted. If such +data constituted handles to database connections, and the connections are +not able to clean up themselves when deleted, it may result in resource +leakage. + +One should therefore be cautious of what data is kept in a script file. +Preferably the script file should only act as a bridge to code and data +residing in a normal Python module imported from an entirely different +directory. + +Restarting Apache Processes +--------------------------- + +As explained above, the only facility that mod_wsgi provides for reloading +source code files in embedded mode, is the reloading of just the script +file providing the entry point for your WSGI application. + +If you don't have a choice but to use embedded mode and still desire some +measure of automatic source code reloading, one option available which +works for both Windows and UNIX systems is to force Apache to recycle the +Apache server child process that handles the request automatically after +the request has completed. + +To enable this, you need to modify the value of the MaxRequestsPerChild +directive in the Apache configuration. Normally this would be set to a +value of '0', indicating that the process should never be restarted as a +result of the number of requests processed. To have it restart a process +after every request, set it to the value '1' instead:: + + MaxRequestsPerChild 1 + +Do note however that this will cause the process to be restarted after any +request. That is, the process will even be restarted if the request was for +a static file or a PHP application and wasn't even handled by your WSGI +application. The restart will also occur even if you have made no changes +to your code. + +Because a restart happens regardless of the request type, using this method +is not recommended. + +Because of how the Apache server child processes are monitored and restarts +handled, it is technically possible that this method will yield performance +which is worse than CGI scripts. For that reason you may even be better off +using a CGI/WSGI bridge to host your WSGI application. At least that way +the handling of other types of requests, such as for static files and PHP +applications will not be affected. + +Reloading In Daemon Mode +------------------------ + +If using mod_wsgi daemon mode, what happens when the script file is changed +is different to what happens in embedded mode. In daemon mode, if the +script file changed, rather than just the script file being reloaded, the +daemon process which contains the application will be shutdown and +restarted automatically. + +Detection of the change in the script file will occur at the time of the +first request to arrive after the change has been made. The way that the +restart is performed does not affect the handling of the request, with it +still being processed once the daemon process has been restarted. + +In the case of there being multiple daemon processes in the process group, +then a cascade effect will occur, with successive processes being restarted +until the request is again routed to one of the newly restarted processes. + +In this way, restarting of a WSGI application when a change has been made +to the code is a simple matter of touching the script file if daemon mode +is being used. Any daemon processes will then automatically restart without +the need to restart the whole of Apache. + +So, if you are using Django in daemon mode and needed to change your +'settings.py' file, once you have made the required change, also touch the +script file containing the WSGI application entry point. Having done that, +on the next request the process will be restarted and your Django +application reloaded. + +Restarting Daemon Processes +--------------------------- + +If you are using daemon mode of mod_wsgi, restarting of processes can to a +degree also be controlled by a user, or by the WSGI application itself, +without restarting the whole of Apache. + +To force a daemon process to be restarted, if you are using a single daemon +process with many threads for the application, then you can embed a page in +your application (password protected hopefully), that sends an appropriate +signal to itself. + +This should only be done for daemon processes and not within the Apache +child processes, as sending such a signal within a child process may +interfere with the operation of Apache. That the code is executing within a +daemon process can be determined by checking the 'mod_wsgi.process_group' +variable in the WSGI environment passed to the application. The value will +be non empty if a daemon process:: + + if environ['mod_wsgi.process_group'] != '': + import signal, os + os.kill(os.getpid(), signal.SIGINT) + +This will cause the daemon process your application is in to shutdown. The +Apache process supervisor will then automatically restart your process +ready for subsequent requests. On the restart it will pick up your new +code. This way you can control a reload from your application through some +special web page specifically for that purpose. + +You can also send this signal from an external application, but a problem +there may be identifying which process to send the signal to. If you are +running the daemon process(es) as a distinct user/group to Apache and each +application is running as a different user then you could just look for the +Apache (httpd) processes owned by the user the application is running as, +as opposed to the Apache user, and send them all signals. + +If the daemon process is running as the same user as Apache or there are +distinct applications running in different daemon processes but as the same +user, knowing which daemon processes to send the signal may be harder to +determine. + +Either way, to make it easier to identify which processes belong to a +daemon process group, you can use the 'display-name' option to the +WSGIDaemonProcess to name the process. On many platforms, when this option +is used, that name will then appear in the output from the 'ps' command +and not the name of the actual Apache server binary. + +Monitoring For Code Changes +--------------------------- + +The use of signals to restart a daemon process could also be employed in a +mechanism which automatically detects changes to any Python modules or +dependent files. This could be achieved by creating a thread at startup +which periodically looks to see if file timestamps have changed and trigger +a restart if they have. + +Example code for such an automatic restart mechanism which is compatible +with how mod_wsgi works is shown below:: + + import os + import sys + import time + import signal + import threading + import atexit + import Queue + + _interval = 1.0 + _times = {} + _files = [] + + _running = False + _queue = Queue.Queue() + _lock = threading.Lock() + + def _restart(path): + _queue.put(True) + prefix = 'monitor (pid=%d):' % os.getpid() + print >> sys.stderr, '%s Change detected to \'%s\'.' % (prefix, path) + print >> sys.stderr, '%s Triggering process restart.' % prefix + os.kill(os.getpid(), signal.SIGINT) + + def _modified(path): + try: + # If path doesn't denote a file and were previously + # tracking it, then it has been removed or the file type + # has changed so force a restart. If not previously + # tracking the file then we can ignore it as probably + # pseudo reference such as when file extracted from a + # collection of modules contained in a zip file. + + if not os.path.isfile(path): + return path in _times + + # Check for when file last modified. + + mtime = os.stat(path).st_mtime + if path not in _times: + _times[path] = mtime + + # Force restart when modification time has changed, even + # if time now older, as that could indicate older file + # has been restored. + + if mtime != _times[path]: + return True + except: + # If any exception occured, likely that file has been + # been removed just before stat(), so force a restart. + + return True + + return False + + def _monitor(): + while 1: + # Check modification times on all files in sys.modules. + + for module in sys.modules.values(): + if not hasattr(module, '__file__'): + continue + path = getattr(module, '__file__') + if not path: + continue + if os.path.splitext(path)[1] in ['.pyc', '.pyo', '.pyd']: + path = path[:-1] + if _modified(path): + return _restart(path) + + # Check modification times on files which have + # specifically been registered for monitoring. + + for path in _files: + if _modified(path): + return _restart(path) + + # Go to sleep for specified interval. + + try: + return _queue.get(timeout=_interval) + except: + pass + + _thread = threading.Thread(target=_monitor) + _thread.setDaemon(True) + + def _exiting(): + try: + _queue.put(True) + except: + pass + _thread.join() + + atexit.register(_exiting) + + def track(path): + if not path in _files: + _files.append(path) + + def start(interval=1.0): + global _interval + if interval < _interval: + _interval = interval + + global _running + _lock.acquire() + if not _running: + prefix = 'monitor (pid=%d):' % os.getpid() + print >> sys.stderr, '%s Starting change monitor.' % prefix + _running = True + _thread.start() + _lock.release() + +This would be used by importing into the script file the Python module +containing the above code, starting the monitoring system and adding any +additional non Python files which should be tracked:: + + import os + + import monitor + monitor.start(interval=1.0) + monitor.track(os.path.join(os.path.dirname(__file__), 'site.cf')) + + def application(environ, start_response): + ... + +Where needing to add many non Python files in a directory hierarchy, such +as template files which would otherwise be cached within the running +process, the ``os.path.walk()`` function could be used to traverse +all files and add required files based on extension or other criteria +using the 'track()' function. + +This mechanism would generally work adequately where a single daemon +process is used within a process group. You would need to be careful +however when multiple daemon processes are used. This is because it may not +be possible to synchronise the checks exactly across all of the daemon +processes. As a result you may end up with the daemon processes running a +mixture of old and new code until they all synchronise with the new code +base. This problem can be minimised by defining a short interval time +between scans, however that will increase the overhead of the checks. + +Using such an approach may in some cases be useful if using mod_wsgi as a +development platform. It certainly would not be recommended you use this +mechanism for a production system. + +The reasons for not using it on a production system is due to the +additional overhead and chance that daemon processes are restarted when you +are not expecting them to be. For example, in a production environment +where requests are coming in all the time, you do not want a restart +triggered when you are part way through making a set of changes which cover +multiple files as likely then that an inconsistent set of code will be +loaded and the application will fail. + +Note that you should also not use this mechanism on a system where you have +configured mod_wsgi to preload your WSGI application as soon as the daemon +process has started. If you do that, then the monitor thread will be recreated +immediately and so for every single code change on a preloaded file you +make, the daemon process will be restarted, even if there is no intervening +request. + +If preloading was really required, the example code would need to be +modified so as to not use signals to restart the daemon process, but reset +to zero the variable saved away in the WSGI script file that records the +modification time of the script file. This will have the affect of delaying +the restart until the next request has arrived. Because that variable holding +the modification time is an internal implementation detail of mod_wsgi and +not strictly part of its published API or behaviour, you should only use +that approach if it is warranted. + +Restarting Windows Apache +------------------------- + +On the Windows platform there is no daemon mode only embedded mode. The MPM +used on Apache is the 'winnt' MPM. This MPM is like the worker MPM on UNIX +systems except that there is only one process. + +Being embedded mode, modifying the WSGI script file only results in the WSGI +script file itself being reloaded, the process as a whole is not reloaded. +Thus there is no way normally through modifying the WSGI script file or any +other Python code file used by the application, of having the whole +application reloaded automatically. + +The recipe in the previous section can be used with daemon mode on UNIX +systems to implement an automated scheme for restarting the daemon +processes when any code change is made, but because Windows lacks the +'fork()' system call daemon mode isn't supported in the first place. + +Thus, the only way one can have code changes picked up on Windows is to +restart Apache as a whole. Although a full restart is required, Apache on +Windows only uses a single child server process and so the impact isn't as +significant as on UNIX platforms, where many processes may need to be +shutdown and restarted. + +With that in mind, it is actually possible to modify the prior recipe for +restarting a daemon process to restart Apache itself. To achieve this slight +of hand, it is necessary to use the Python 'ctypes' module to get access to +a special internal Apache function which is available in the Windows version +of Apache called 'ap_signal_parent()'. + +The required change to get this to work is to replace the restart +function in the previous code with the following:: + + def _restart(path): + _queue.put(True) + prefix = 'monitor (pid=%d):' % os.getpid() + print >> sys.stderr, '%s Change detected to \'%s\'.' % (prefix, path) + print >> sys.stderr, '%s Triggering Apache restart.' % prefix + import ctypes + ctypes.windll.libhttpd.ap_signal_parent(1) + +Other than that, the prior code would be used exactly as before. Now when +any change is made to Python code used by the application or any other +monitored files, Apache will be restarted automatically for you. + +As before, probably recommended that this only be used during development +and not on a production system. |