From 9a41180de3b597a30dc740effbf3c4c3ff4b0be1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alex=20Gr=C3=B6nholm?= Date: Mon, 15 Aug 2022 22:47:05 +0300 Subject: Updated the user guide for v4.0 --- docs/userguide.rst | 672 ++++++++++++++++++++++++++--------------------------- 1 file changed, 326 insertions(+), 346 deletions(-) (limited to 'docs') diff --git a/docs/userguide.rst b/docs/userguide.rst index 36395a1..c547391 100644 --- a/docs/userguide.rst +++ b/docs/userguide.rst @@ -3,8 +3,8 @@ User guide ########## -Installing APScheduler ----------------------- +Installation +============ The preferred installation method is by using `pip `_:: @@ -15,453 +15,432 @@ If you don't have pip installed, you need to Code examples -------------- +============= The source distribution contains the :file:`examples` directory where you can find many working examples for using APScheduler in different ways. The examples can also be `browsed online `_. -Basic concepts --------------- - -APScheduler has four kinds of components: - -* triggers -* job stores -* executors -* schedulers - -*Triggers* contain the scheduling logic. Each job has its own trigger which determines when the job -should be run next. Beyond their initial configuration, triggers are completely stateless. - -*Job stores* house the scheduled jobs. The default job store simply keeps the jobs in memory, but -others store them in various kinds of databases. A job's data is serialized when it is saved to a -persistent job store, and deserialized when it's loaded back from it. Job stores (other than the -default one) don't keep the job data in memory, but act as middlemen for saving, loading, updating -and searching jobs in the backend. Job stores must never be shared between schedulers. - -*Executors* are what handle the running of the jobs. They do this typically by submitting the -designated callable in a job to a thread or process pool. 
When the job is done, the executor -notifies the scheduler which then emits an appropriate event. - -*Schedulers* are what bind the rest together. You typically have only one scheduler running in your -application. The application developer doesn't normally deal with the job stores, executors or -triggers directly. Instead, the scheduler provides the proper interface to handle all those. -Configuring the job stores and executors is done through the scheduler, as is adding, modifying and -removing jobs. - - -Choosing the right scheduler, job store(s), executor(s) and trigger(s) ----------------------------------------------------------------------- - -Your choice of scheduler depends mostly on your programming environment and what you'll be using -APScheduler for. Here's a quick guide for choosing a scheduler: - -* :class:`~apscheduler.schedulers.blocking.BlockingScheduler`: - use when the scheduler is the only thing running in your process -* :class:`~apscheduler.schedulers.background.BackgroundScheduler`: - use when you're not using any of the frameworks below, and want the scheduler to run in the - background inside your application -* :class:`~apscheduler.schedulers.asyncio.AsyncIOScheduler`: - use if your application uses the asyncio module -* :class:`~apscheduler.schedulers.gevent.GeventScheduler`: - use if your application uses gevent -* :class:`~apscheduler.schedulers.tornado.TornadoScheduler`: - use if you're building a Tornado application -* :class:`~apscheduler.schedulers.twisted.TwistedScheduler`: - use if you're building a Twisted application -* :class:`~apscheduler.schedulers.qt.QtScheduler`: - use if you're building a Qt application - -Simple enough, yes? - -To pick the appropriate job store, you need to determine whether you need job persistence or not. -If you always recreate your jobs at the start of your application, then you can probably go with -the default (:class:`~apscheduler.jobstores.memory.MemoryJobStore`). 
But if you need your jobs to -persist over scheduler restarts or application crashes, then your choice usually boils down to what -tools are used in your programming environment. If, however, you are in the position to choose -freely, then :class:`~apscheduler.jobstores.sqlalchemy.SQLAlchemyJobStore` on a -`PostgreSQL `_ backend is the recommended choice due to its strong data -integrity protection. - -Likewise, the choice of executors is usually made for you if you use one of the frameworks above. -Otherwise, the default :class:`~apscheduler.executors.pool.ThreadPoolExecutor` should be good -enough for most purposes. If your workload involves CPU intensive operations, you should consider -using :class:`~apscheduler.executors.pool.ProcessPoolExecutor` instead to make use of multiple CPU -cores. You could even use both at once, adding the process pool executor as a secondary executor. - -When you schedule a job, you need to choose a *trigger* for it. The trigger determines the logic by -which the dates/times are calculated when the job will be run. APScheduler comes with three -built-in trigger types: - -* :mod:`~apscheduler.triggers.date`: - use when you want to run the job just once at a certain point of time -* :mod:`~apscheduler.triggers.interval`: - use when you want to run the job at fixed intervals of time -* :mod:`~apscheduler.triggers.cron`: - use when you want to run the job periodically at certain time(s) of day -* :mod:`~apscheduler.triggers.calendarinterval`: - use when you want to run the job on calendar-based intervals, at a specific time of day - -It is also possible to combine multiple triggers into one which fires either on times agreed on by -all the participating triggers, or when any of the triggers would fire. For more information, see -the documentation for :mod:`combining triggers `. - -You can find the plugin names of each job store, executor and trigger type on their respective API -documentation pages. - - -.. 
_scheduler-config: - -Configuring the scheduler -------------------------- +Introduction +============ -APScheduler provides many different ways to configure the scheduler. You can use a configuration -dictionary or you can pass in the options as keyword arguments. You can also instantiate the -scheduler first, add jobs and configure the scheduler afterwards. This way you get maximum -flexibility for any environment. - -The full list of scheduler level configuration options can be found on the API reference of the -:class:`~apscheduler.schedulers.base.BaseScheduler` class. Scheduler subclasses may also have -additional options which are documented on their respective API references. Configuration options -for individual job stores and executors can likewise be found on their API reference pages. - -Let's say you want to run BackgroundScheduler in your application with the default job store and -the default executor:: - - from apscheduler.schedulers.background import BackgroundScheduler - - - scheduler = BackgroundScheduler() +The core concept of APScheduler is to give the user the ability to queue Python code to +be executed, either as soon as possible, later at a given time, or on a recurring +schedule. To make this happen, APScheduler has two types of components: *schedulers* and +*workers*. - # Initialize the rest of the application here, or before the scheduler initialization - -This will get you a BackgroundScheduler with a MemoryJobStore named "default" and a -ThreadPoolExecutor named "default" with a default maximum thread count of 10. - -Now, suppose you want more. You want to have *two* job stores using *two* executors and you also -want to tweak the default values for new jobs and set a different timezone. 
-The following three examples are completely equivalent, and will get you: - -* a MongoDBJobStore named "mongo" -* an SQLAlchemyJobStore named "default" (using SQLite) -* a ThreadPoolExecutor named "default", with a worker count of 20 -* a ProcessPoolExecutor named "processpool", with a worker count of 5 -* UTC as the scheduler's timezone -* coalescing turned off for new jobs by default -* a default maximum instance limit of 3 for new jobs +A scheduler is the user-facing interface of the system. When running, it asks its +associated *data store* for *schedules* due to be run. For each such schedule, it then +uses the schedule's associated *trigger* to calculate run times up to the present. For +each run time, the scheduler creates a *job* in the data store, containing the +designated run time and the identifier of the schedule it was derived from. -Method 1:: +A worker asks the data store for jobs, and then starts running those jobs. If the data +store signals that it has new jobs, the worker will try to acquire those jobs if it is +capable of accommodating more jobs. When a worker completes a job, it will then also ask +the data store for as many more jobs as it can handle. - from pytz import utc +By default, each scheduler starts an internal worker to simplify use, but in more +complex use cases you may wish to run them in separate processes, or even on separate +nodes. For this, you'll need both a persistent data store and an *event broker*, shared +by both the scheduler(s) and worker(s). For more information, see the section below on +running schedulers and workers separately. 
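The scheduler's job-creation step described above can be pictured with a small, self-contained sketch. It does not use APScheduler: the trigger is modelled as a plain Python iterator of datetimes, and ``Job``, ``every()`` and ``process_due_schedule()`` are hypothetical names chosen for illustration, not part of the library's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from uuid import UUID, uuid4


@dataclass
class Job:
    """Hypothetical job record: the schedule it came from and its designated run time."""

    schedule_id: str
    run_time: datetime
    id: UUID = field(default_factory=uuid4)


def every(start, step):
    """Hypothetical interval trigger: yields start, start + step, start + 2 * step, ..."""
    run_time = start
    while True:
        yield run_time
        run_time += step


def process_due_schedule(schedule_id, run_times, now):
    """Create one job per run time up to ``now``.

    Returns the created jobs and the schedule's next run time (None if the
    trigger has been exhausted).
    """
    jobs = []
    for run_time in run_times:
        if run_time > now:
            # The first future run time becomes the schedule's next run time
            return jobs, run_time
        jobs.append(Job(schedule_id, run_time))
    # The trigger produced no more run times, so the schedule is finished
    return jobs, None


trigger = every(datetime(2022, 6, 7, 9, 0), timedelta(minutes=10))
jobs, next_run_time = process_due_schedule("report", trigger, now=datetime(2022, 6, 7, 9, 25))
for job in jobs:
    print(job.schedule_id, job.run_time)
print("next run time:", next_run_time)
```

Here the schedule is processed at 09:25, so jobs are created for 09:00, 09:10 and 09:20, and 09:30 becomes the schedule's next run time. This follows the one-job-per-run-time behaviour described above; the ``coalesce`` option of ``add_schedule()`` can collapse such a backlog into a single job.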
- from apscheduler.schedulers.background import BackgroundScheduler - from apscheduler.jobstores.mongodb import MongoDBJobStore - from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore - from apscheduler.executors.pool import ThreadPoolExecutor, ProcessPoolExecutor +Basic concepts / glossary +========================= +These are the basic components and concepts of APScheduler which will be referenced +later in this guide. - jobstores = { - 'mongo': MongoDBJobStore(), - 'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite') - } - executors = { - 'default': ThreadPoolExecutor(20), - 'processpool': ProcessPoolExecutor(5) - } - job_defaults = { - 'coalesce': False, - 'max_instances': 3 - } - scheduler = BackgroundScheduler(jobstores=jobstores, executors=executors, job_defaults=job_defaults, timezone=utc) +A *task* encapsulates a Python function and a number of configuration parameters. They +are often implicitly defined as a side effect of the user creating a new schedule +against a function, but can also be explicitly defined beforehand (**TODO**: implement +this!). -Method 2:: +A *trigger* contains the logic and state used to calculate when a scheduled task should +be run. - from apscheduler.schedulers.background import BackgroundScheduler +A *schedule* combines a task with a trigger, plus a number of configuration parameters. +A *job* is a request for a task to be run. It can be created automatically from a schedule +when a scheduler processes it, or it can be created directly by the user when they +request a task to be run. - # The "apscheduler.
prefix is hard coded - scheduler = BackgroundScheduler({ - 'apscheduler.jobstores.mongo': { - 'type': 'mongodb' - }, - 'apscheduler.jobstores.default': { - 'type': 'sqlalchemy', - 'url': 'sqlite:///jobs.sqlite' - }, - 'apscheduler.executors.default': { - 'class': 'apscheduler.executors.pool:ThreadPoolExecutor', - 'max_workers': '20' - }, - 'apscheduler.executors.processpool': { - 'type': 'processpool', - 'max_workers': '5' - }, - 'apscheduler.job_defaults.coalesce': 'false', - 'apscheduler.job_defaults.max_instances': '3', - 'apscheduler.timezone': 'UTC', - }) +A *data store* is used to store *schedules* and *jobs*, and to keep track of tasks. -Method 3:: +A *scheduler* fetches schedules due for their next runs from its associated data store +and then creates new jobs accordingly. - from pytz import utc +A *worker* fetches jobs from its data store, runs them and pushes the results back to +the data store. - from apscheduler.schedulers.background import BackgroundScheduler - from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore - from apscheduler.executors.pool import ProcessPoolExecutor +An *event broker* delivers published events to all interested parties. It facilitates +the cooperation between schedulers and workers by notifying them of new or updated +schedules or jobs. +Scheduling tasks +================ - jobstores = { - 'mongo': {'type': 'mongodb'}, - 'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite') - } - executors = { - 'default': {'type': 'threadpool', 'max_workers': 20}, - 'processpool': ProcessPoolExecutor(max_workers=5) - } - job_defaults = { - 'coalesce': False, - 'max_instances': 3 - } - scheduler = BackgroundScheduler() +To create a schedule for running a task, you need, at the minimum: - # .. do something else here, maybe add jobs etc. +* A *callable* to be run +* A *trigger* - scheduler.configure(jobstores=jobstores, executors=executors, job_defaults=job_defaults, timezone=utc) +.. 
note:: Scheduling lambdas or nested functions is currently not possible. This will be + fixed before the final release. +The callable can be a function or method, lambda or even an instance of a class that +contains the ``__call__()`` method. With the default (memory-based) data store, any +callable can be used as a task callable. Persistent data stores (more on those below) +place some restrictions on the kinds of callables that can be used because they cannot store +the callable directly but instead need to be able to locate it with a *reference*. -Starting the scheduler ----------------------- +The trigger determines the scheduling logic for your schedule. In other words, it is +used to calculate the datetimes on which the task will be run. APScheduler comes with a +number of built-in trigger classes: -Starting the scheduler is done by simply calling -:meth:`~apscheduler.schedulers.base.BaseScheduler.start` on the scheduler. For schedulers other -than :class:`~apscheduler.schedulers.blocking.BlockingScheduler`, this call will return immediately and -you can continue the initialization process of your application, possibly adding jobs to the -scheduler. +* :class:`~apscheduler.triggers.date.DateTrigger`: + use when you want to run the task just once at a certain point in time +* :class:`~apscheduler.triggers.interval.IntervalTrigger`: + use when you want to run the task at fixed intervals of time +* :class:`~apscheduler.triggers.cron.CronTrigger`: + use when you want to run the task periodically at certain time(s) of day +* :class:`~apscheduler.triggers.calendarinterval.CalendarIntervalTrigger`: + use when you want to run the task on calendar-based intervals, at a specific time of + day -For BlockingScheduler, you will only want to call -:meth:`~apscheduler.schedulers.base.BaseScheduler.start` after you're done with any initialization -steps. +Combining multiple triggers +--------------------------- -..
note:: After the scheduler has been started, you can no longer alter its settings. +Occasionally, you may find yourself in a situation where your scheduling needs are too +complex to be handled with any of the built-in triggers directly. +One example of such a need would be when you want the task to run at 10:00 from Monday +to Friday, but also at 11:00 from Saturday to Sunday. +A single :class:`~apscheduler.triggers.cron.CronTrigger` would not be able to handle +this case, but an :class:`~apscheduler.triggers.combining.OrTrigger` containing two cron +triggers can:: -Adding jobs ------------ + from apscheduler.triggers.combining import OrTrigger + from apscheduler.triggers.cron import CronTrigger -There are two ways to add jobs to a scheduler: + trigger = OrTrigger( + CronTrigger(day_of_week="mon-fri", hour=10), + CronTrigger(day_of_week="sat-sun", hour=11), + ) -#. by calling :meth:`~apscheduler.schedulers.base.BaseScheduler.add_job` -#. by decorating a function with :meth:`~apscheduler.schedulers.base.BaseScheduler.scheduled_job` +On the first run, :class:`~apscheduler.triggers.combining.OrTrigger` generates the next +run times from both cron triggers and saves them internally. It then returns the +earliest one. On the next run, it generates a new run time from the trigger that +produced the earliest run time on the previous run, and then again returns the earliest +of the two run times. This goes on until all the triggers have been exhausted, if ever. -The first way is the most common way to do it. The second way is mostly a convenience to declare -jobs that don't change during the application's run time. The -:meth:`~apscheduler.schedulers.base.BaseScheduler.add_job` method returns a -:class:`apscheduler.job.Job` instance that you can use to modify or remove the job later.
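The merging behaviour described for ``OrTrigger`` can be illustrated with a stand-alone sketch. It is not APScheduler's implementation: each trigger is modelled as a plain iterator of datetimes, ``daily_at()`` is a hypothetical stand-in for the two cron triggers, and ``or_merge()`` keeps one pending run time per trigger, returns the earliest one and advances only the trigger that produced it.

```python
from datetime import datetime, timedelta
from itertools import islice


def daily_at(start, hour, weekdays):
    """Yield datetimes at hour:00 on the given weekdays (0 = Monday), from start onwards.

    A hypothetical stand-in for a cron trigger, not the APScheduler API.
    """
    day = start.replace(hour=hour, minute=0, second=0, microsecond=0)
    if day < start:
        day += timedelta(days=1)
    while True:
        if day.weekday() in weekdays:
            yield day
        day += timedelta(days=1)


def or_merge(*triggers):
    """Yield the earliest pending run time among all triggers, OrTrigger-style."""
    iterators = [iter(t) for t in triggers]
    # One pending run time per trigger; None once that trigger is exhausted
    pending = [next(it, None) for it in iterators]
    while any(p is not None for p in pending):
        # Pick the trigger whose pending run time is the earliest
        index = min(
            (i for i, p in enumerate(pending) if p is not None),
            key=lambda i: pending[i],
        )
        yield pending[index]
        # Only the trigger that produced the returned run time is advanced
        pending[index] = next(iterators[index], None)


start = datetime(2022, 6, 7, 9, 0)  # a Tuesday
trigger = or_merge(
    daily_at(start, 10, {0, 1, 2, 3, 4}),  # Monday to Friday at 10:00
    daily_at(start, 11, {5, 6}),  # Saturday and Sunday at 11:00
)
for run_time in islice(trigger, 7):
    print(run_time)
```

Starting from Tuesday 2022-06-07 at 09:00, this prints the weekday runs at 10:00, then Saturday and Sunday at 11:00, then Monday 2022-06-13 at 10:00. Coinciding run times from different triggers are not deduplicated in this sketch.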
+Another example would be a case where you want the task to be run every 2 months at +10:00, but not on weekends (Saturday or Sunday):: -You can schedule jobs on the scheduler **at any time**. If the scheduler is not yet running when -the job is added, the job will be scheduled *tentatively* and its first run time will only be -computed when the scheduler starts. + from apscheduler.triggers.calendarinterval import CalendarIntervalTrigger + from apscheduler.triggers.combining import AndTrigger + from apscheduler.triggers.cron import CronTrigger -It is important to note that if you use an executor or job store that serializes the job, it will -add a couple requirements on your job: + trigger = AndTrigger( + CalendarIntervalTrigger(months=2, hour=10), + CronTrigger(day_of_week="mon-fri", hour=10), + ) -#. The target callable must be globally accessible -#. Any arguments to the callable must be serializable +On the first run, :class:`~apscheduler.triggers.combining.AndTrigger` generates the next +run times from both the +:class:`~apscheduler.triggers.calendarinterval.CalendarIntervalTrigger` and +:class:`~apscheduler.triggers.cron.CronTrigger`. If the run times coincide, it will +return that run time. Otherwise, it will calculate a new run time from the trigger that +produced the earliest run time. It will keep doing this until a match is found, one of +the triggers has been exhausted or the maximum number of iterations (1000 by default) is +reached. -Of the builtin job stores, only MemoryJobStore doesn't serialize jobs. -Of the builtin executors, only ProcessPoolExecutor will serialize jobs. +If this trigger is created on 2022-06-07 at 09:00:00, its first run times would be: -.. important:: If you schedule jobs in a persistent job store during your application's - initialization, you **MUST** define an explicit ID for the job and use ``replace_existing=True`` - or you will get a new copy of the job every time your application restarts! 
+* 2022-06-07 10:00:00 +* 2022-10-07 10:00:00 +* 2022-12-07 10:00:00 -.. tip:: To run a job immediately, omit ``trigger`` argument when adding the job. +Notably, 2022-08-07 is skipped because it falls on a Sunday. +Running tasks without scheduling +-------------------------------- -Removing jobs -------------- +In some cases, you may want to run tasks directly, without involving schedules: -When you remove a job from the scheduler, it is removed from its associated job store and will not -be executed anymore. There are two ways to make this happen: +* You're only interested in using the scheduler system as a job queue +* You're interested in the job's return value -#. by calling :meth:`~apscheduler.schedulers.base.BaseScheduler.remove_job` with the job's ID and - job store alias -#. by calling :meth:`~apscheduler.job.Job.remove` on the Job instance you got from - :meth:`~apscheduler.schedulers.base.BaseScheduler.add_job` +To queue a job, wait for its completion and get its result, the easiest way is to +use :meth:`~apscheduler.schedulers.sync.Scheduler.run_job`. If you prefer to just launch +a job and not wait for its result, use +:meth:`~apscheduler.schedulers.sync.Scheduler.add_job` instead. If you want to get the +result later, you can then call +:meth:`~apscheduler.schedulers.sync.Scheduler.get_job_result` with the job ID you got +from :meth:`~apscheduler.schedulers.sync.Scheduler.add_job`. -The latter method is probably more convenient, but it requires that you store somewhere the -:class:`~apscheduler.job.Job` instance you received when adding the job. For jobs scheduled via the -:meth:`~apscheduler.schedulers.base.BaseScheduler.scheduled_job`, the first way is the only way. +Removing schedules +------------------ -If the job's schedule ends (i.e. its trigger doesn't produce any further run times), it is -automatically removed. +To remove a previously added schedule, call +:meth:`~apscheduler.schedulers.sync.Scheduler.remove_schedule`.
Pass the identifier of +the schedule you want to remove as an argument. This is the ID you got from +:meth:`~apscheduler.schedulers.sync.Scheduler.add_schedule`. -Example:: +Note that removing a schedule does not cancel any jobs derived from it, but does prevent +further jobs from being created from that schedule. - job = scheduler.add_job(myfunc, 'interval', minutes=2) - job.remove() +Limiting the number of concurrently executing instances of a job +---------------------------------------------------------------- -Same, using an explicit job ID:: +It is possible to control the maximum number of concurrently running jobs for a +particular task. By default, only one job per task is allowed to run at a time. +This means that if a job is about to be run while another job for the same task is +still running, the later job is terminated with the outcome of +:data:`~apscheduler.JobOutcome.missed_start_deadline`. - scheduler.add_job(myfunc, 'interval', minutes=2, id='my_job_id') - scheduler.remove_job('my_job_id') +To allow more jobs to run concurrently for a task, pass the desired maximum +number as the ``max_instances`` keyword argument to +:meth:`~apscheduler.schedulers.sync.Scheduler.add_schedule`. +Controlling how late a job can be started +----------------------------------------- -Pausing and resuming jobs -------------------------- +Some tasks are time sensitive and should not be run at all if they fail to start on +time (for example, if the worker(s) were down when the scheduled jobs were supposed to +run). You can control this time limit with the +``misfire_grace_time`` option passed to +:meth:`~apscheduler.schedulers.sync.Scheduler.add_schedule`.
A worker that acquires the +job then checks if the current time is later than the deadline +(run time + misfire grace time) and if it is, it skips the execution of the job and +releases it with the outcome of :data:`~apscheduler.JobOutcome.` -You can easily pause and resume jobs through either the :class:`~apscheduler.job.Job` instance or -the scheduler itself. When a job is paused, its next run time is cleared and no further run times -will be calculated for it until the job is resumed. To pause a job, use either method: +Controlling how jobs are queued from schedules +---------------------------------------------- -* :meth:`apscheduler.job.Job.pause` -* :meth:`apscheduler.schedulers.base.BaseScheduler.pause_job` +In most cases, when a scheduler processes a schedule, it queues a new job using the +run time currently marked for the schedule. Then it updates the next run time using the +schedule's trigger and releases the schedule back to the data store. But sometimes a +situation occurs where the schedule did not get processed often or quickly enough, and +one or more next run times produced by the trigger are actually in the past. -To resume: +In a situation like that, the scheduler needs to decide what to do: to queue a job for +every run time produced, or to *coalesce* them all into a single job, effectively just +kicking off a single job. To control this, pass the ``coalesce`` argument to +:meth:`~apscheduler.schedulers.sync.Scheduler.add_schedule`. 
-* :meth:`apscheduler.job.Job.resume` -* :meth:`apscheduler.schedulers.base.BaseScheduler.resume_job` +The possible values are: +* :data:`~apscheduler.CoalescePolicy.latest`: queue exactly one job, using the + **latest** run time as the designated run time +* :data:`~apscheduler.CoalescePolicy.earliest`: queue exactly one job, using the + **earliest** run time as the designated run time +* :data:`~apscheduler.CoalescePolicy.all`: queue one job for **each** of the calculated + run times -Getting a list of scheduled jobs --------------------------------- +The biggest difference between the first two options is how the designated run time, and, +by extension, the starting deadline for the job, is selected. With the first option, +the job is less likely to be skipped due to being started late, since the latest of all +the collected run times is used for the deadline calculation. -To get a machine processable list of the scheduled jobs, you can use the -:meth:`~apscheduler.schedulers.base.BaseScheduler.get_jobs` method. It will return a list of -:class:`~apscheduler.job.Job` instances. If you're only interested in the jobs contained in a -particular job store, then give a job store alias as the second argument. +As explained in the previous section, the starting deadline is the designated run time +plus the *misfire grace time*, so the choice of coalescing policy also determines how +late the newly queued job can be started. -As a convenience, you can use the :meth:`~apscheduler.schedulers.base.BaseScheduler.print_jobs` -method which will print out a formatted list of jobs, their triggers and next run times.
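The effect of the three coalescing policies on the starting deadline can be sketched without APScheduler. The ``Coalesce`` enum below is a hypothetical stand-in for ``CoalescePolicy``, and ``missed_deadline()`` applies the rule from the previous section: a job is skipped if the current time is later than its run time plus the misfire grace time.

```python
from datetime import datetime, timedelta
from enum import Enum


class Coalesce(Enum):
    """Hypothetical stand-in for apscheduler.CoalescePolicy, for illustration only."""

    earliest = "earliest"
    latest = "latest"
    all = "all"


def designated_run_times(due_run_times, policy):
    """Return the run times that jobs would be queued for under the given policy."""
    if not due_run_times:
        return []
    if policy is Coalesce.earliest:
        return [min(due_run_times)]
    if policy is Coalesce.latest:
        return [max(due_run_times)]
    # Coalesce.all: one job per calculated run time
    return sorted(due_run_times)


def missed_deadline(run_time, misfire_grace_time, now):
    """True if a worker acquiring the job at ``now`` would skip it."""
    # Deadline = designated run time + misfire grace time
    return now > run_time + misfire_grace_time


now = datetime(2022, 6, 7, 10, 30)
grace = timedelta(minutes=15)
due = [datetime(2022, 6, 7, 10, 0), datetime(2022, 6, 7, 10, 10), datetime(2022, 6, 7, 10, 20)]
for policy in Coalesce:
    for run_time in designated_run_times(due, policy):
        outcome = "skipped" if missed_deadline(run_time, grace, now) else "runs"
        print(f"{policy.name}: job for {run_time:%H:%M} {outcome}")
```

With a 15-minute grace period checked at 10:30, the ``earliest`` policy produces a job whose deadline (10:15) has already passed, while ``latest`` produces one that can still run (deadline 10:35). The ``all`` policy queues three jobs, of which only the 10:20 one runs.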
+Context variables +================= +Schedulers and workers make certain `context variables`_ available to the tasks being +run: -Modifying jobs -------------- +* The current scheduler: :data:`~apscheduler.current_scheduler` +* The current worker: :data:`~apscheduler.current_worker` +* Information about the job currently being run: :data:`~apscheduler.current_job` -You can modify any job attributes by calling either :meth:`apscheduler.job.Job.modify` or -:meth:`~apscheduler.schedulers.base.BaseScheduler.modify_job`. You can modify any Job attributes -except for ``id``. +Here's an example:: -Example:: + from apscheduler import current_job - job.modify(max_instances=6, name='Alternate name') + def my_task_function(): + job_info = current_job.get() + print( + f"This is job {job_info.id} and was spawned from schedule " + f"{job_info.schedule_id}" + ) -If you want to reschedule the job -- that is, change its trigger, you can use either -:meth:`apscheduler.job.Job.reschedule` or -:meth:`~apscheduler.schedulers.base.BaseScheduler.reschedule_job`. -These methods construct a new trigger for the job and recalculate its next run time based on the -new trigger. +.. _context variables: https://docs.python.org/3/library/contextvars.html -Example:: +.. _scheduler-events: - scheduler.reschedule_job('my_job_id', trigger='cron', minute='*/5') +Subscribing to events +===================== +Schedulers and workers have the ability to notify listeners when some event occurs in +the scheduler system. Examples of such events would be schedulers or workers starting up +or shutting down, or schedules or jobs being added to or removed from the data store. -Shutting down the scheduler ---------------------------- +To listen to events, you need a callable that takes a single positional argument which +is the event object. Then, you need to decide which events you're interested in: -To shut down the scheduler:: +.. tabs:: - scheduler.shutdown() + ..
code-tab:: python Synchronous -By default, the scheduler shuts down its job stores and executors and waits until all currently -executing jobs are finished. If you don't want to wait, you can do:: + from apscheduler import Event, JobAcquired, JobReleased - scheduler.shutdown(wait=False) + def listener(event: Event) -> None: + print(f"Received {event.__class__.__name__}") -This will still shut down the job stores and executors but does not wait for any running -tasks to complete. + scheduler.events.subscribe(listener, {JobAcquired, JobReleased}) + .. code-tab:: python Asynchronous -Pausing/resuming job processing -------------------------------- + from apscheduler import Event, JobAcquired, JobReleased -It is possible to pause the processing of scheduled jobs:: + async def listener(event: Event) -> None: + print(f"Received {event.__class__.__name__}") - scheduler.pause() + scheduler.events.subscribe(listener, {JobAcquired, JobReleased}) -This will cause the scheduler to not wake up until processing is resumed:: +This example subscribes to the :class:`~apscheduler.JobAcquired` and +:class:`~apscheduler.JobReleased` event types. The listener receives events of +either type and prints the class name of each received event. - scheduler.resume() +Asynchronous schedulers and workers support both synchronous and asynchronous callbacks, +but their synchronous counterparts only support synchronous callbacks. -It is also possible to start the scheduler in paused state, that is, without the first wakeup -call:: +When **distributed** event brokers (that is, other than the default one) are being used, +events other than the ones relating to the life cycles of schedulers and workers will +be sent to all schedulers and workers connected to that event broker. - scheduler.start(paused=True) +Deployment +========== -This is useful when you need to prune unwanted jobs before they have a chance to run.
+Using persistent data stores +---------------------------- +The default data store, :class:`~apscheduler.datastores.memory.MemoryDataStore`, stores +data only in memory, so all the schedules and jobs that were added to it will be lost +when the process exits or crashes. -Limiting the number of concurrently executing instances of a job ----------------------------------------------------------------- +When you need your schedules and jobs to survive the application shutting down, you need +to use a *persistent data store*. Such data stores come with additional considerations +compared to the memory data store: -By default, only one instance of each job is allowed to be run at the same time. -This means that if the job is about to be run but the previous run hasn't finished yet, then the -latest run is considered a misfire. It is possible to set the maximum number of instances for a -particular job that the scheduler will let run concurrently, by using the ``max_instances`` keyword -argument when adding the job. +* The task callable cannot be a lambda or a nested function +* Task arguments must be *serializable* +* You must either trust the data store, or use an alternate *serializer* +* A *conflict policy* and an *explicit identifier* must be defined for schedules that + are added at application startup +These requirements warrant some explanation. The second point exists because persisting +data means saving it externally, either in a file or by sending it to a database server, so +all the objects involved must be converted to bytestrings. This process is called +*serialization*. By default, this is done using :mod:`pickle`, which offers the best +compatibility but can execute arbitrary code when loading maliciously crafted data. This +brings us to the third point. If you cannot be sure that nobody can maliciously alter +the externally stored serialized data, it would be best to use another serializer. The +built-in alternatives are: -.. 
_missed-job-executions: +* :class:`~apscheduler.serializers.cbor.CBORSerializer` +* :class:`~apscheduler.serializers.json.JSONSerializer` -Missed job executions and coalescing ------------------------------------- +The former requires the cbor2_ library, but supports a wider variety of types natively. +The latter has no dependencies but supports only a very limited set of types. -Sometimes the scheduler may be unable to execute a scheduled job at the time it was scheduled to -run. The most common case is when a job is scheduled in a persistent job store and the scheduler -is shut down and restarted after the job was supposed to execute. When this happens, the job is -considered to have "misfired". The scheduler will then check each missed execution time against the -job's ``misfire_grace_time`` option (which can be set on per-job basis or globally in the -scheduler) to see if the execution should still be triggered. This can lead into the job being -executed several times in succession. +The fourth point relates to situations where you're essentially adding the same schedule +to the data store over and over again. If you don't specify a static identifier for +the schedules added at the start of the application, you will end up with an increasing +number of redundant schedules doing the same thing, which is probably not what you want. +To that end, you need to give each such schedule a fixed identifier, which ensures that +the same schedule is not added over and over again (data stores are required to +enforce the uniqueness of schedule identifiers). You'll also need to decide what to do +if the schedule already exists in the data store (that is, when the application is +started the second time) by passing the ``conflict_policy`` argument. Usually you want +the :data:`~apscheduler.ConflictPolicy.replace` option, which replaces the existing +schedule with the new one.
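The first restriction in the list above follows from how persistent data stores remember the task callable: as noted earlier, they need to be able to locate it with a *reference*. The sketch below shows one plausible way such a reference can work, using the ``module:qualname`` format that also appears in the removed configuration example (``apscheduler.executors.pool:ThreadPoolExecutor``). The function names ``callable_to_ref()`` and ``ref_to_callable()`` are hypothetical, and this is not APScheduler's actual implementation.

```python
import importlib
import json


def callable_to_ref(func):
    """Return a "module:qualname" reference for a module-level callable."""
    qualname = func.__qualname__
    # Lambdas and nested functions have "<lambda>" or "<locals>" in their
    # qualified names, so they cannot be looked up again by name later
    if "<lambda>" in qualname or "<locals>" in qualname:
        raise ValueError(f"cannot create a reference to {qualname!r}")
    return f"{func.__module__}:{qualname}"


def ref_to_callable(ref):
    """Import the module named in the reference and walk the qualified name."""
    module_name, qualname = ref.split(":", 1)
    obj = importlib.import_module(module_name)
    for attribute in qualname.split("."):
        obj = getattr(obj, attribute)
    return obj


ref = callable_to_ref(json.dumps)
print(ref)  # json:dumps
print(ref_to_callable(ref) is json.dumps)  # True

try:
    callable_to_ref(lambda: None)
except ValueError as exc:
    print(exc)
```

Only the short reference string needs to be stored; the callable is re-imported by name when the job runs, which is exactly what a lambda or a nested function cannot support.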
-If this behavior is undesirable for your particular use case, it is possible to use ``coalescing`` -to roll all these missed executions into one. In other words, if coalescing is enabled for the job -and the scheduler sees one or more queued executions for the job, it will only trigger it once. No -misfire events will be sent for the "bypassed" runs. +.. seealso:: You can find practical examples of persistent data stores in the + :file:`examples/standalone` directory (``async_postgres.py`` and + ``async_mysql.py``). -.. note:: - If the execution of a job is delayed due to no threads or processes being available in the - pool, the executor may skip it due to it being run too late (compared to its originally - designated run time). If this is likely to happen in your application, you may want to either - increase the number of threads/processes in the executor, or adjust the ``misfire_grace_time`` - setting to a higher value. +.. _cbor2: https://pypi.org/project/cbor2/ +Using multiple schedulers +------------------------- -.. _scheduler-events: +There are several situations in which you would want to run multiple schedulers against +the same data store at once: + +* Running a server application (usually a web app) with multiple worker processes +* Providing fault tolerance (scheduling continues even if a node or process running + a scheduler goes down) + +When you have multiple schedulers (or workers; see the next section) running at once, +they need to coordinate their efforts so that schedules don't get processed more than +once, and so that each scheduler knows when to wake up even if another scheduler added +the next due schedule to the data store. To this end, a shared *event broker* must be +configured. + +.. seealso:: You can find practical examples of data store sharing in the + :file:`examples/web` directory.
+ +Running schedulers and workers separately +----------------------------------------- + +Some deployment scenarios may warrant running workers separately from the schedulers. +For example, if you want to set up a scalable worker pool, you can run just the workers +in that pool and the schedulers elsewhere without the internal workers. To prevent the +scheduler from starting an internal worker, you need to pass it the +``start_worker=False`` option. + +Starting a worker without a scheduler looks very similar to the procedure to start a +scheduler: + +.. tabs:: + + .. code-tab:: python Synchronous + + from apscheduler.workers.sync import Worker + + + data_store = ... + event_broker = ... + worker = Worker(data_store, event_broker) + worker.run_until_stopped() + + .. code-tab:: python asyncio + + import asyncio -Scheduler events ----------------- + from apscheduler.workers.async_ import AsyncWorker -It is possible to attach event listeners to the scheduler. Scheduler events are fired on certain -occasions, and may carry additional information in them concerning the details of that particular -event. It is possible to listen to only particular types of events by giving the appropriate -``mask`` argument to :meth:`~apscheduler.schedulers.base.BaseScheduler.add_listener`, OR'ing the -different constants together. The listener callable is called with one argument, the event object. -See the documentation for the :mod:`~apscheduler.events` module for specifics on the available -events and their attributes. + async def main(): + data_store = ... + event_broker = ... + async with AsyncWorker(data_store, event_broker) as worker: + await worker.wait_until_stopped() -Example:: + asyncio.run(main()) - def my_listener(event): - if event.exception: - print('The job crashed :(') - else: - print('The job worked :)') +There is one significant caveat if you do this.
The scheduler +object, usually available from :data:`~apscheduler.current_scheduler`, will not be set +since there is no scheduler running in the current thread/task. - scheduler.add_listener(my_listener, EVENT_JOB_EXECUTED | EVENT_JOB_ERROR) +.. seealso:: A practical example of separate schedulers and workers can be found in the + :file:`examples/separate_worker` directory. .. _troubleshooting: Troubleshooting ---------------- +=============== -If the scheduler isn't working as expected, it will be helpful to increase the logging level of the -``apscheduler`` logger to the ``DEBUG`` level. +If something isn't working as expected, it will be helpful to increase the logging level +of the ``apscheduler`` logger to the ``DEBUG`` level. If you do not yet have logging enabled in the first place, you can do this:: @@ -470,13 +449,14 @@ If you do not yet have logging enabled in the first place, you can do this:: logging.basicConfig() logging.getLogger('apscheduler').setLevel(logging.DEBUG) -This should provide lots of useful information about what's going on inside the scheduler. +This should provide lots of useful information about what's going on inside the +scheduler and/or worker. -Also make sure that you check the :doc:`faq` section to see if your problem already has a solution. +Also make sure that you check the :doc:`faq` section to see if your problem already has +a solution. Reporting bugs --------------- +============== .. include:: ../README.rst :start-after: Reporting bugs - -------------- -- cgit v1.2.1
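The first two persistent data store requirements in the patch above can be illustrated with Python's standard :mod:`pickle` module, the default serializer the guide mentions. The sketch below uses only the standard library and does not touch APScheduler itself (how APScheduler internally stores task callables is not shown here and may differ). The helper ``is_picklable()`` and the function ``make_nested()`` are hypothetical names used only for illustration. Module-level functions are pickled by reference and round-trip cleanly, while lambdas, nested functions and objects such as locks cannot be serialized at all.

```python
import math
import pickle
import threading


def is_picklable(obj) -> bool:
    """Return True if pickle can serialize obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Lambdas and nested functions raise PicklingError or AttributeError
        # (depending on the Python version); objects like locks raise TypeError.
        return False
    return True


def make_nested():
    # Returns a function whose qualified name contains "<locals>",
    # so pickle cannot look it up again by name.
    def nested():
        return "hello"

    return nested


# Module-level functions are pickled by reference (module + qualified name),
# so unpickling yields the very same function object.
restored = pickle.loads(pickle.dumps(math.floor))
print(restored is math.floor)           # True

print(is_picklable(math.floor))         # True
print(is_picklable(lambda: None))       # False: a lambda has no importable name
print(is_picklable(make_nested()))      # False: nested functions live in <locals>
print(is_picklable({"retries": 3}))     # True: plain data is fine
print(is_picklable(threading.Lock()))   # False: locks cannot be pickled
```

In practice this means task callables should be plain module-level functions, and task arguments should be plain data rather than live resources such as locks, connections or open files.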
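The reasoning behind the explicit identifier and conflict policy requirement can be shown with a toy simulation in plain Python, with no APScheduler involved. Everything here is hypothetical: ``ToyDataStore``, its ``add_schedule()`` method, the ``"nightly-cleanup"`` identifier and the string-valued conflict policies merely mimic the uniqueness rule and the ``replace`` behavior the guide describes. They are not APScheduler's API.

```python
import uuid


class ToyDataStore:
    """Hypothetical stand-in for a persistent data store (not APScheduler's API).

    Like the real data stores described in the guide, it enforces unique
    schedule identifiers.
    """

    def __init__(self) -> None:
        self.schedules: dict[str, str] = {}

    def add_schedule(self, schedule_id: str, spec: str, conflict_policy: str = "error") -> None:
        if schedule_id in self.schedules:
            if conflict_policy == "replace":
                # Overwrite the copy left behind by a previous application start.
                self.schedules[schedule_id] = spec
                return
            raise ValueError(f"schedule {schedule_id!r} already exists")
        self.schedules[schedule_id] = spec


def start_app(store: ToyDataStore, use_static_id: bool) -> None:
    """Simulate startup code that runs every time the application starts."""
    schedule_id = "nightly-cleanup" if use_static_id else str(uuid.uuid4())
    store.add_schedule(schedule_id, "cron: 0 3 * * *", conflict_policy="replace")


# Without a static identifier, every restart adds another redundant schedule.
store = ToyDataStore()
for _ in range(3):
    start_app(store, use_static_id=False)
print(len(store.schedules))  # 3

# With a static identifier and the "replace" policy, restarts are idempotent.
store = ToyDataStore()
for _ in range(3):
    start_app(store, use_static_id=True)
print(len(store.schedules))  # 1
```

The fixed identifier makes the store detect the duplicate, and the conflict policy decides what happens next. Replacing keeps the schedule definition in sync with whatever the current application code says.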