Remove logging of dates from update-downloads #603
Er this I think indicates a bug? These should update at most like 1k crates per minute or something like that. If we're printing thousands of messages then something's going very wrong I think.
Hm soooo it might not be as much as I guessed, but take a look at papertrail for "worker updating versions" (to just get this log statement, not even the one I'm trying to get rid of). I think the worker got behind when sgrif's STD crate was crashing it? It seems to have started attempting to process more and more rows on Sunday, and it looks like it's still increasing today even though I deleted the STD crate (using delete-crate.rs, which also gets rid of the version_downloads rows) yesterday... It's hard to see if it's still erroring because of all these date lines though...
Ah yeah the last result for "worker main" showing a crash was yesterday... investigating more....
It looks like the script sleeps for 5 min, then loops over batches of 1000, but it does try to process all rows where processed=false every 5 min. Right now, I think this script is querying for more rows than it needs, and doing an extra operation on both the rows it needs and the rows it does not need.
The query that happens every 5 minutes is querying for all rows where processed = false.
Then it skips over processing any rows whose [...]
Then it gets the difference between [...]
We're then skipping the rest of the loop for rows whose [...]
Am I missing anything about how this works?
@carols10cents yeah that sounds about right. The tl;dr is that I don't know how to implement download counts. I assumed that every download updating a global counter would be incredibly slow (and lossy), so every download updates a version-local counter and that's it. Those counters are then picked up by this batch process (serially), which propagates the download counts upward (from version to crate, from crate to website). So the general gist is that this script looks at rows that need to be propagated upwards, but doesn't look at too many. I'd definitely believe there are more rows to be filtered out!
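To make the "propagate upward" idea concrete, here is a minimal in-memory sketch. It is not the actual update-downloads code: the `VersionDownload` and `Totals` types, the `counted` field, and the `propagate` function are all hypothetical names, and it assumes each row tracks how many of its downloads have already been rolled up.

```rust
use std::collections::HashMap;

/// Hypothetical version-local counter row (field names are assumptions).
#[derive(Debug, Clone)]
struct VersionDownload {
    version_id: u32,
    crate_id: u32,
    downloads: u64, // bumped on every download
    counted: u64,   // portion already propagated upward (assumed column)
    processed: bool,
}

/// Hypothetical rolled-up totals: per version, per crate, and site-wide.
#[derive(Debug, Default)]
struct Totals {
    per_version: HashMap<u32, u64>,
    per_crate: HashMap<u32, u64>,
    site_wide: u64,
}

/// Push the not-yet-counted downloads of every unprocessed row upward.
/// Returns the number of rows that actually contributed something.
fn propagate(rows: &mut [VersionDownload], totals: &mut Totals) -> usize {
    let mut updated = 0;
    for row in rows.iter_mut().filter(|r| !r.processed) {
        let delta = row.downloads.saturating_sub(row.counted);
        if delta == 0 {
            continue; // nothing new to propagate for this row
        }
        *totals.per_version.entry(row.version_id).or_insert(0) += delta;
        *totals.per_crate.entry(row.crate_id).or_insert(0) += delta;
        totals.site_wide += delta;
        row.counted = row.downloads;
        updated += 1;
    }
    updated
}
```

Running `propagate` a second time over the same rows adds nothing, since only the difference between `downloads` and `counted` is rolled up.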
That being said, 29k rows per 5 minutes should be 29 messages per 5 minutes, not 10k/5min...
Urgh of course this is printing a line per row. I'm crazy.
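The per-batch versus per-row difference can be sketched like this. It is only an illustration: `process_in_batches` and the log message text are made up, and log lines are collected into a `Vec` instead of being printed so the count is easy to check.

```rust
/// Process row ids in fixed-size batches and return the log lines that
/// would be emitted. One line per batch, none per row.
/// `batch_size` must be non-zero (`chunks` panics on 0).
fn process_in_batches(row_ids: &[u32], batch_size: usize) -> Vec<String> {
    let mut log = Vec::new();
    for batch in row_ids.chunks(batch_size) {
        // Per-batch message: this is the only place that logs.
        log.push(format!("updating {} versions", batch.len()));
        for _row_id in batch {
            // Per-row work would go here; logging inside this loop is
            // what turns 29 messages into thousands.
        }
    }
    log
}
```

With 29,000 rows and a batch size of 1000 this yields 29 log lines per pass, which matches the estimate above.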
A not-insignificant portion of our production logs is the update-downloads worker printing the date:
I think this is printing like... 1000 times every 30 seconds or so.
In addition to removing log noise, this should hopefully help our papertrail quota stretch a bit farther.