Skip to content

Cron starts when it's already running #10650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kanduvisla opened this issue Aug 24, 2017 · 26 comments
Closed

Cron starts when it's already running #10650

kanduvisla opened this issue Aug 24, 2017 · 26 comments
Assignees
Labels
bug report Fixed in 2.2.x The issue has been fixed in 2.2 release line Issue: Clear Description Gate 2 Passed. Manual verification of the issue description passed Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed Issue: Ready for Work Gate 4. Acknowledged. Issue is added to backlog and ready for development Reproduced on 2.1.x The issue has been reproduced on latest 2.1 release Reproduced on 2.2.x The issue has been reproduced on latest 2.2 release Reproduced on 2.3.x The issue has been reproduced on latest 2.3 release

Comments

@kanduvisla
Copy link
Contributor

I noticed that when I start a cron that's already running it will run again. For example: I have a task that takes 15 minutes, but the cron executes it's every 5 minutes. In the database the cron gets the status running and the other scheduled jobs have pending. But after 5 minutes, the other job (with the same code) also starts to run. So now I have the same job running twice!

Preconditions

  1. Magento 2.1.7

Steps to reproduce

  1. Create a cron that takes a while (for example, sleep 15 minutes)
  2. Schedule it to run every 5 minutes

Expected result

The other jobs should wait for the first job to complete, even though they are scheduled to run every 5 minutes.

Actual result

The new job starts, ignoring the fact that it's already running...

I'm not sure if this is intended behavior or a bug. Am I expected to create my own lock / flag for this?

@magento-engcom-team magento-engcom-team added G1 Passed bug report Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed Issue: Clear Description Gate 2 Passed. Manual verification of the issue description passed and removed G1 Passed labels Sep 5, 2017
@scottsb
Copy link
Member

scottsb commented Sep 21, 2017

Note to anybody working on this: ticket #10279 was closed as a dup of this but contains additional details about the root cause that I had already tracked down.

@magento-engcom-team
Copy link
Contributor

@kanduvisla, thank you for your report.
We've created internal ticket(s) MAGETWO-80790 to track progress on the issue.

@magento-engcom-team magento-engcom-team added 2.1.x Issue: Ready for Work Gate 4. Acknowledged. Issue is added to backlog and ready for development Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Reproduced on 2.1.x The issue has been reproduced on latest 2.1 release Reproduced on 2.2.x The issue has been reproduced on latest 2.2 release Reproduced on 2.3.x The issue has been reproduced on latest 2.3 release labels Oct 3, 2017
@gabrielqs-redstage
Copy link

Hey there. I'm working on it #SQUASHTOBERFEST

@sylvainraye
Copy link
Contributor

sylvainraye commented Oct 15, 2017

@gabrielqs-redstage Sorry I was already working on it yesterday and sent the PR. It was not marked bc I forgot it while the ticket was open on my side.

@gabrielqs-redstage
Copy link

@ishakhsuvarov can you please assign this one to @diglin instead?

@maksek
Copy link
Contributor

maksek commented Oct 17, 2017

@gabrielqs-redstage reassigned.

@s-hoffman
Copy link

s-hoffman commented Dec 8, 2017

Hi,

Does anyone know why concurrent tasks with same job_code is not prevented by the following:

Magento\Cron\Observer\ProcessCronQueueObserver => if ($schedule->tryLockJob()) {
Magento\Cron\Model\Model\Schedule => if ($this->_getResource()->trySetJobUniqueStatusAtomic(
            $this->getId(),
            self::STATUS_RUNNING,
            self::STATUS_PENDING
        )) 
Magento\Cron\Model\ResourceModel\Schedule => trySetJobUniqueStatusAtomic

Query that was generated by the test was this

UPDATE `cron_schedule` AS `current`
 LEFT JOIN `cron_schedule` AS `existing` ON existing.job_code = current.job_code AND existing.status = 'running'
SET `current`.`status` = 'running'
WHERE (current.schedule_id = '111') AND (current.status = 'pending') AND (existing.schedule_id IS NULL)

It seems designed to prevent two concurrently running tasks with the same job code.
(It is also covered by tests at Magento\Cron\Model\ScheduleTest in the integration test-suite)

@elvinristi
Copy link
Contributor

@s-hoffman this is in progress in #12497

@s-hoffman
Copy link

@elvinristi, as I understand that ticket, it is adding MySql based locking around the cron group's execution and does not explain why trySetJobUniqueStatusAtomic is failing to catch this error.
(It may prevent concurrent execution by definition but does not directly address trySetJobUniqueStatusAtomic's failure).

@paveq
Copy link
Contributor

paveq commented Dec 11, 2017

@s-hoffman I think what Magento does in $schedule->tryLockJob() is locking individual scheduled execution run of a job, so that specific job run is not triggered again. What it does not prevent is running the same job code (but different scheduled execution run) at the same time, and it was never intended to do so. $schedule is referring to a row in cron_schedule table, where there are multiple rows per job code.

@ihor-sviziev ihor-sviziev self-assigned this Dec 15, 2017
@dmanners
Copy link
Contributor

@s-hoffman random thought but does your database contain more than 1 schedule for the given status? I wonder if the code fails if there is already more than one entry in the database.

@s-hoffman
Copy link

@paveq, Not [running the same job code (but different scheduled execution run) at the same time]?
What can I do when I do not see where in the code that is the intention?
As far as I can see, the method that is currently called, is named, trySetJobUniqueStatusAtomic.

The Sql that is generated by the method is:

UPDATE `cron_schedule` AS `current`
 LEFT JOIN `cron_schedule` AS `existing` ON existing.job_code = current.job_code AND existing.status = 'running'
SET `current`.`status` = 'running'
WHERE (current.schedule_id = '111') AND (current.status = 'pending') AND (existing.schedule_id IS NULL)

This clearly shows intent to check for currently running schedules, without checking scheduled time or any other timestamp field.

Sets schedule status only if no existing schedules with the same job code
     * have that status.  This is used to implement locking for cron jobs.

The comment also indicates an intent to implement locking.

Also, there are integration tests in .\dev\tests\integration\testsuite\Magento\Cron\Model\ScheduleTest.php
These tests testTryLockJobNoLockedJobsSucceeds & testTryLockJobAlreadyLockedFails & testTryLockJobOtherLockedFails & testTryLockJobDifferentJobLocked, try to cover the basic scenario of, no existing running jobs and yes existing running jobs.

The above description does include a temporal component. (But after 5 minutes, the other job). Otherwise the tests seem to cover the scenario described above.

@dmanners, I can some tests on my local using the SQL statement above, and tried to add more schedules. I was unable to replicate it by adding extra jobs, though it was guesswork testing. (low cost low time investment).

@dmanners
Copy link
Contributor

I have added a back-port pull request for the 2.1-develop branch so that both 2.1 and 2.2 should behave in the same way with regards to cron scheduling. (See #12805). We are also getting the core team involved with #12497 to try to work on cron scheduling with the community.

@paveq
Copy link
Contributor

paveq commented Dec 21, 2017

@s-hoffman Indeed, it looks like this was introduced in 2.2 here eacb702

Either it does not work, or if it does it causes even larger issues. If we blindly check for "running" status, since crashed job will cause that job to never run again until the row is removed from schedule table.

In my opinion MySQL based locking which gets released when connection closes is better solution here, as it actually depends on the running process.

@nobodypb
Copy link

@paveq Actually I just had an issue, where some cronjobs didn't start after updating from 2.1 to 2.2, because there where old entrys of failed jobs with running status in the schedule table.

@s-hoffman
Copy link

s-hoffman commented Dec 21, 2017

@paveq, This ticket is currently labeled_ 'Reproduced on 2.2.x'.
Is that label correct?

@xi-ao
Copy link

xi-ao commented Jan 11, 2018

@s-hoffman Yes, I can confirm we can reproduce the issue on a 2.2.2 EE.

@ghost
Copy link

ghost commented Jan 14, 2018

This issue also exists in CE 2.2.x. Its hilariously server breaking.

@johnny-longneck
Copy link

What's the state of this? We have 2.2.2. CE running and cron is not running correctly and needs to be fixed asap.

@ericvhileman
Copy link

We wrote an extension to fix these bugs, speed up performance, and control the execution of tasks: https://github.com/magemojo/m2-ce-cron

@ihor-sviziev
Copy link
Contributor

@ericvhileman why have you decided to create separate module instead creating PR with all your improvements?

@johnny-longneck
Copy link

@ihor-sviziev , I can just assume but I saw tickets and problems with already created Pull Requests being ignored for months. And as we know, major bugfixes often just get into the next major release, which happens once a year.

As users working with production environments, we need solutions asap.
Same happend when I found out paypal is no longer working starting with 2.2.0 - needed a community module like this to fix this.

@ihor-sviziev
Copy link
Contributor

ihor-sviziev commented Mar 23, 2018

@johnny-longneck thank you for clarifying. Specially I'd like to add these fixes directly into magento, it will fix issues for a much more people than separate module. Is there any list of issues and how they were fixed in your module?

PS: specially I'm creating PR with all needed changes and use following article to apply these changes into our projects: https://twitter.com/s3lf/status/957993636058288128. As result - these changes will be merged at some point of time and we're getting issue fixed on production right now.

Also if you have any issues with long process of merging - you can contact community maintainers in slack or just mention in comment, it will help to speedup this process.

@ghost
Copy link

ghost commented Mar 23, 2018

Yes most people agree @johnny-longneck , waiting on Magento to release a fix is a complete waste of time when problems exist in the present, not the next release so we move on.

Instead we get more 3rd party stuff added in releases which is basically an advertisement for a given vendor.

@ericvhileman
Copy link

ericvhileman commented Mar 23, 2018

@ihor-sviziev

  1. We're a hosting provider with many live customers on M2 with this problem. We need a fix right now and one that works for every version.

  2. We don't know what Magento's own plans are for a fix. This cron issue has been open for a very long time. It has not been properly addressed. It is impacting customers severely. See 1.

  3. We submitted a PR with fixes initially. No response. See 1.

  4. PR merges are very slow. Releases with fixes are very slow. See 1.

  5. M2 fixes are not being back ported. Customers on live M2 stores are not able to upgrade to the latest version. See 1.

We're happy to work with the core team to get this merged into the core. We will be at Imagine in April at the hackathon and will discuss it further with the comm eng team. See 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug report Fixed in 2.2.x The issue has been fixed in 2.2 release line Issue: Clear Description Gate 2 Passed. Manual verification of the issue description passed Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed Issue: Ready for Work Gate 4. Acknowledged. Issue is added to backlog and ready for development Reproduced on 2.1.x The issue has been reproduced on latest 2.1 release Reproduced on 2.2.x The issue has been reproduced on latest 2.2 release Reproduced on 2.3.x The issue has been reproduced on latest 2.3 release
Projects
None yet
Development

No branches or pull requests