Cron Jobs or Automated Background Processes in Magento
Why cron jobs are not always the best choice
Who should read this post?
Read this post to figure out how Pacemaker can solve your problems with cron jobs, especially if
- one or more Magento indexers are invalid
- your Magento cron jobs are not running
- prices and/or inventory data do not seem to be correct
Magento and its problems with Cron Jobs
Web applications usually only act in response to a client request. That is fine, but sometimes it is necessary to perform certain calculations or operations on the server without prior interaction of a client. For example, product data has to be compared regularly with the PIM system, or the status of an order in the ERP system needs to be checked and, if necessary, reconciled with the shop's own status. If this were only done when the client - from the perspective of an online store: the customer - visits the website, rendering the page would take far too long and the customer might lose interest.
A common solution to this problem is the so-called cron job. Nearly all Unix-based systems ship with the cron daemon, and since web servers mostly run on Linux, cron jobs are the method of choice for classic web applications. Magento (like all other common frameworks) provides its own configuration mechanism, so developers do not have to worry about operating-system-specific details of cron configuration. Furthermore, from an operator's perspective it is sufficient to configure a single cron job, because Magento's own scheduling logic takes care of executing all application-specific processes.
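For illustration, a Magento 2 module registers such tasks in an etc/crontab.xml file; the module, job and class names below are invented for this example:

```xml
<?xml version="1.0"?>
<!-- etc/crontab.xml of a hypothetical module -->
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="urn:magento:module:Magento_Cron:etc/crontab.xsd">
    <group id="default">
        <!-- Magento's scheduler invokes execute() on the given class -->
        <job name="acme_pim_product_sync"
             instance="Acme\PimConnector\Cron\ProductSync"
             method="execute">
            <!-- standard crontab expression: daily at 04:15 -->
            <schedule>15 4 * * *</schedule>
        </job>
    </group>
</config>
```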
Besides a number of advantages for operation and development, this approach has some drawbacks. Because many extensions register their own tasks, numerous processes may run within a single cron execution, and the total runtime can exceed the scheduled interval. This leads to a process jam even though the server capacity would be sufficient. The jam in turn leads to deadlocks and thus inconsistencies in the database, or to processes missing their regular execution slot and therefore not running at all or not running correctly. Without going into too much detail about the technical problems and entanglements at this point: the result is an unstable system with numerous, supposedly random errors, inconsistent data and new surprises every day. To make matters worse, the errors are barely reproducible, and this ticking time bomb causes frustration for the merchant as well as for the developers of the store.
How can these Problems be avoided?
Cron jobs are processes that are triggered purely on a time basis. Time is the only condition that decides whether a command is executed. Server administrators know the 3 o'clock problem, when the whole system is suddenly under heavy load because various programs start their backups at the same time. We therefore identified the restriction to time as the sole condition as one of the main problems of classic cron jobs. Instead of "Execute the product import from the PIM system daily at 04:15", we would prefer to formulate the condition as follows: "Execute the product import from the PIM system daily between 03:30 and 05:00, when the server load is below 60% and no other imports are running". It must therefore be possible to check and combine other kinds of conditions.
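A minimal sketch of how such combinable conditions might look in PHP - the interface and class names are invented for illustration and are not Pacemaker's actual API:

```php
<?php

// Hypothetical condition abstraction: each condition answers one question.
interface ConditionInterface
{
    public function isMet(): bool;
}

// "between 03:30 and 05:00"
final class IsWithinTimeWindow implements ConditionInterface
{
    public function __construct(private string $from, private string $to) {}

    public function isMet(): bool
    {
        $now = date('H:i');
        return $now >= $this->from && $now <= $this->to;
    }
}

// "server load is below 60%" (1-minute load average relative to CPU cores)
final class ServerLoadBelow implements ConditionInterface
{
    public function __construct(private float $maxLoadFactor) {}

    public function isMet(): bool
    {
        $cores = (int) shell_exec('nproc') ?: 1;
        return sys_getloadavg()[0] / $cores < $this->maxLoadFactor;
    }
}

// All conditions must hold before the process may start.
$conditions = [new IsWithinTimeWindow('03:30', '05:00'), new ServerLoadBelow(0.6)];
$mayRun = array_reduce(
    $conditions,
    fn (bool $carry, ConditionInterface $c): bool => $carry && $c->isMet(),
    true
);
```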
Long-running processes are another issue. Even the condition formulated above would not guarantee a successful product import if, for example, another import process runs throughout the defined time window and blocks our execution. If you take a closer look at such long-running processes, you will notice that they are usually so-called process monoliths: scripts that chain together many smaller processes. If you break down that process chain as well, you will probably find that the import process blocking our cron command itself consists of partial steps such as "Get data", "Transform data", "Import data" and "Re-indexing". During the first two sub-steps our product import could already have been executed; it was only the monolithic view of the neighboring process that prevented it. Consequently, it should be possible to address the individual process steps explicitly in our conditions.
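Expressed as data, such a decomposition could look like this (step names follow the example above; the structure is purely illustrative):

```php
<?php

// A monolithic import, decomposed into individually addressable steps.
// Explicit dependencies let a scheduler see that e.g. "transform" only
// needs "fetch" to be finished, so unrelated processes can run in parallel
// with the early steps instead of waiting for the whole monolith.
$importProcess = [
    'fetch'     => ['depends_on' => []],
    'transform' => ['depends_on' => ['fetch']],
    'import'    => ['depends_on' => ['transform']],
    'reindex'   => ['depends_on' => ['import']],
];
```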
The Pipeline Pattern as a Solution
Once the problems are identified, a solution can be developed. Our search led us to an architectural pattern that most developers know very well, albeit more from a user's perspective: the pipeline pattern used in most Continuous Integration (CI) tools, such as Jenkins or GitLab CI. There, developers define a pipeline to build, test and deliver software, and several conditions define which step may be triggered and when - for example via events, time schedules or manual approvals. In summary, the pattern can be described as follows:
1. You need a Declaration of the Pipelines
The pipelines replace the process monoliths: you define the individual steps that belong to a process. The individual steps are named and can be put in relation to each other and to neighboring pipelines.
We decided to use an XML-based declaration, since Magento already offers a mechanism to merge and read configurations across modules. This enables us to modify or extend existing pipelines with additional modules.
An exemplary configuration of an import pipeline consists of conditions that define when a pipeline should or may be started, plus several process steps. In our example the conditions are "IsExecutionTimeReached", "HasImportFiles" and "NoImportProzessIsRunning", each of which is represented by a class; numerous conditions can thus be linked together. Each process step has a class for the actual business logic, as well as further conditions describing how the individual steps relate to each other. Conditions can also be used to add control mechanisms, such as repeating a process step in case of an error. The relations between the steps ensure that they are processed in the correct order and allow independent steps to be executed in parallel.
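A sketch of what such a declaration could look like - element names and structure are illustrative, not necessarily Pacemaker's actual schema:

```xml
<?xml version="1.0"?>
<!-- illustrative pipeline declaration; the real Pacemaker schema may differ -->
<pipeline name="product_import">
    <!-- conditions deciding whether a new pipeline instance may be started -->
    <conditions>
        <condition class="IsExecutionTimeReached"/>
        <condition class="HasImportFiles"/>
        <condition class="NoImportProzessIsRunning"/>
    </conditions>
    <steps>
        <!-- each step references a class with the actual business logic -->
        <step name="fetch_data" class="Import\Step\FetchData"/>
        <step name="transform_data" class="Import\Step\TransformData">
            <depends-on step="fetch_data"/>
        </step>
        <step name="import_data" class="Import\Step\ImportData">
            <depends-on step="transform_data"/>
            <!-- conditions can also add control mechanisms such as retries -->
            <on-error retry="3"/>
        </step>
        <step name="reindex" class="Import\Step\Reindex">
            <depends-on step="import_data"/>
        </step>
    </steps>
</pipeline>
```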
2. You need a higher-level Control Process
An independent process is needed to evaluate the conditions of the pipelines as well as those of the process steps. In our implementation of the pattern we called this process "heartbeat". The heartbeat validates the status of all process steps against their conditions and thus decides whether a step is ready for execution. It also checks the conditions in the pipeline declarations, which decide whether a new pipeline may be started.
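A simplified sketch of that responsibility - all interfaces are invented for illustration, and checking the pipeline-level start conditions is omitted for brevity:

```php
<?php

// Simplified heartbeat sketch: it only evaluates conditions and hands ready
// steps over to the runners - it never executes business logic itself.

interface Step
{
    public function getName(): string;
    public function conditionsMet(): bool; // own conditions and step relations
}

interface Pipeline
{
    public function getId(): string;

    /** @return Step[] steps that are neither running nor finished */
    public function getPendingSteps(): array;
}

interface StepPublisher
{
    // e.g. pushes a "run this step" message onto a queue for the runners
    public function publish(string $pipelineId, string $stepName): void;
}

final class Heartbeat
{
    public function __construct(private StepPublisher $publisher) {}

    /** @param Pipeline[] $runningPipelines */
    public function beat(array $runningPipelines): void
    {
        foreach ($runningPipelines as $pipeline) {
            foreach ($pipeline->getPendingSteps() as $step) {
                if ($step->conditionsMet()) {
                    $this->publisher->publish($pipeline->getId(), $step->getName());
                }
            }
        }
    }
}
```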
This control process is in fact the only cron job needed to run the pipeline pattern. It is important to restrict the responsibility of this cron job to exactly the functions mentioned above: the heartbeat never executes the individual process steps itself, it only decides on their status. And this brings us to …
3. Execution of the Process Steps with Runners instead of Cron Jobs
Processing the individual steps requires an additional component. Most CI tools traditionally call this component a "Runner". A runner is a permanently running process that waits for an instruction from the heartbeat to execute a process step. Several runners can be active at the same time to allow parallel processing. This also means that server resources are used optimally - or limited in a natural way, via the number of runners.
For our implementation we used Magento's existing RabbitMQ integration. The heartbeat communicates with the runners via the message queue; the runners are basically simple consumers of that queue. This approach also allows us to scale across servers: the runners can run on dedicated machines and therefore do not strain the resources of the servers responsible for customer interaction.
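In Magento terms, such a runner can be registered as an ordinary message queue consumer. A sketch with invented module, queue and class names:

```xml
<?xml version="1.0"?>
<!-- etc/queue_consumer.xml of a hypothetical module -->
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="urn:magento:framework-message-queue:etc/consumer.xsd">
    <!-- the handler receives the "run this step" messages from the heartbeat -->
    <consumer name="pipeline.step.runner"
              queue="pipeline.steps"
              connection="amqp"
              handler="Acme\Pipeline\Queue\StepRunner::process"/>
</config>
```

Each runner is then started as a long-running process, for example with `bin/magento queue:consumers:start pipeline.step.runner`, and several of them can run in parallel - on the same machine or on dedicated worker servers.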
Conclusion
For us, the pipeline pattern has proven to be the perfect way to deal with the problems caused by complex background processes. Its integration within Pacemaker has served us well in numerous projects for many years and is the foundation of the product Pacemaker Enterprise.