fix(concurrent): Reduce occurrences of database table is locked errors #146
Conversation
📝 Walkthrough

The changes in this pull request focus on the SQLite-backed response caching in `airbyte_cdk/sources/streams/http/http_client.py`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HttpClient
    participant Cache
    Client->>HttpClient: Make HTTP Request
    HttpClient->>Cache: Check SQLite Cache
    alt Cache Hit
        Cache-->>HttpClient: Return Cached Response
        HttpClient-->>Client: Return Response
    else Cache Miss
        HttpClient->>HttpClient: Create New Session
        HttpClient->>HttpClient: Send Request
        HttpClient->>HttpClient: Handle Response
        alt Error Occurs
            HttpClient->>HttpClient: Log Error
        else Success
            HttpClient->>Cache: Store Response in Cache
            HttpClient-->>Client: Return Response
        end
    end
```
What do you think about this diagram? Does it capture the flow accurately?
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
airbyte_cdk/sources/streams/http/http_client.py (1)
141-150: LGTM! The WAL mode implementation looks solid! 🎉

Great solution using WAL mode to prevent database locks! The implementation is clean and the comment explains the purpose well. Would you consider adding a brief comment about any potential limitations or side effects of WAL mode for future maintainers? For instance, the WAL file cleanup strategy, concurrent access patterns, etc. wdyt?
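For future readers, here is a minimal sketch of what enabling these options can look like with requests-cache's SQLite backend. This is an illustration only: the `fast_save`, `wal`, and `db_path` parameters belong to requests-cache's `SQLiteCache`, the cache file name is hypothetical, and the actual lines 141-150 of `http_client.py` may differ.

```python
# Minimal sketch (not the actual CDK code): enable write-ahead logging and
# relaxed fsync on the SQLite cache backing a cached requests session.
import requests_cache
from requests_cache.backends.sqlite import SQLiteCache

backend = SQLiteCache(
    db_path="cached_request_airbyte",  # hypothetical cache file name
    fast_save=True,  # PRAGMA synchronous=OFF: faster writes, small durability trade-off
    wal=True,        # PRAGMA journal_mode=WAL: readers no longer block writers
)
session = requests_cache.CachedSession(backend=backend)

response = session.get("https://example.com/api/resource")  # served from cache on repeat calls
```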
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
airbyte_cdk/sources/streams/http/http_client.py (1 hunks)
🔇 Additional comments (2)
airbyte_cdk/sources/streams/http/http_client.py (2)
148-150: Verify other SQLite usage patterns in the codebase 🔍
The WAL mode implementation looks good! Should we verify that there aren't any other SQLite instances in the codebase that might need similar treatment?
✅ Verification successful
SQLite WAL mode implementation looks good! 👍
From the search results, I can see that SQLite usage is well-contained within the HTTP client caching mechanism. The WAL mode is consistently applied in the only place where SQLite is directly used (airbyte_cdk/sources/streams/http/http_client.py). The other mentions are just in dependency specifications and changelog entries.
The implementation follows good practices:
- Uses WAL mode for preventing "database locked" errors
- Keeps SQLite usage isolated to the caching mechanism
- Maintains consistent naming pattern for cache files
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Search for other SQLite usage patterns
echo "Searching for SQLite related imports..."
rg "import.*sqlite" --type python

echo "Searching for SQLite configuration or initialization..."
rg "sqlite|SQLite" --type python -g "!test_*"
```

Length of output: 355
Script:

```shell
#!/bin/bash
# Let's try searching without specifying file type
echo "Searching for SQLite related imports..."
rg "import.*sqlite"

echo "\nSearching for SQLite configuration or initialization..."
rg "sqlite|SQLite" -g "!test_*"

echo "\nLooking for specific SQLite files..."
fd -e db -e sqlite -e sqlite3
```

Length of output: 6825
Line range hint 452-461: Consider tracking the temporary rate limit status printing as technical debt 📝
The current implementation using print() with flush=True works but might cause issues in the future. The TODO comment explains why this approach was chosen, but should we create a tracking issue to ensure we revisit this when the concurrent message repository is ready? I can help create one if you'd like!
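As a rough illustration of the interim approach being described (a hypothetical, simplified sketch; the actual message text and call site in lines 452-461 are not shown here):

```python
def emit_rate_limit_status(message: str) -> None:
    # Temporary approach: write directly to stdout and flush immediately so the
    # status is visible right away instead of sitting in the output buffer.
    # A concurrent-safe message repository is expected to replace this later.
    print(message, flush=True)

emit_rate_limit_status("Backing off due to rate limiting; will retry shortly")
```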
Let's check if there are any existing issues tracking this:
brianjlai
left a comment
Thanks for the very detailed explanation.

I agree with your assessment on risk. Given this is just running on a container, if for some reason it does get corrupted or there's an unexpected shutdown, the worst that can happen is that we need to start a new attempt. It doesn't seem like there are long-term repercussions or data loss.
Following the release of source-jira concurrent, we have seen some `database table is locked` issues on a connection.

History
source-jira:3.4.0
We have seen `sqlite3.OperationalError: database table is locked` here for the first time after enabling concurrency for source-jira. We had already seen similar issues (`database is locked`, but I can't see references for `database table is locked`). This error was happening consistently for this sync.

source-jira:3.4.0-dev.1e489a955b
Trying to wrap the write in a failsafe manner, we were now seeing `sqlite3.OperationalError: database table is locked: responses` on reads.

source-jira:3.4.0-dev.ba45526fa4
As part of this release, we've also wrapped the read. This caused the app to log warnings like `WARN Error while retrieving item from cache: database table is locked: responses` but did not crash because of that.

source-jira:3.4.0-dev.657e80a526
At that point, we fixed part of the memory issue. There were still a lot of `database table is locked` errors, but the connector was running longer because it would not consume the memory so fast.

source-jira:3.4.0-dev.272b13a278
Blindly set both `fast_save` and `wal` to True. At this point, we don't see the error anymore.

Proposed solution
The sync that was generating the issues is now running, and since there are >30 hours of I/O before it is done, it is annoying to just cancel it. I would have liked to test without `fast_save` because it seems to have some important drawbacks on consistency based on this, this and the statement "If the application running SQLite crashes, the data will be safe, but the database [might become corrupted](https://www.sqlite.org/howtocorrupt.html#cfgerr) if the operating system crashes or the computer loses power before that data has been written to the disk surface." in this description. In the last statement, I don't know what "SQLite crashes" means, but my reasoning is that if one thread crashes, there are a couple of possible things that can happen. I'm fine with this level of risk.
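For reference, the two settings being weighed here map to standard SQLite pragmas. This is generic SQLite behavior rather than code from this PR, and the file name below is hypothetical:

```python
import sqlite3

connection = sqlite3.connect("cached_request.sqlite")  # hypothetical cache file
# `wal=True` roughly corresponds to write-ahead logging, which lets readers and
# writers proceed concurrently instead of locking the whole table:
connection.execute("PRAGMA journal_mode=WAL")
# `fast_save=True` roughly corresponds to relaxing durability: writes return
# before data is fsynced, which is faster but can corrupt the file if the OS
# crashes or power is lost before the data reaches disk:
connection.execute("PRAGMA synchronous=OFF")
connection.close()
```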
If we still see some issues with the database being locked, we can add the SkipFailureSQLiteDict implementation that was used to fix the customer's connection, which would just ignore caching errors entirely.
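For illustration, a minimal sketch of what such a fallback could look like, assuming the cache storage is requests-cache's `SQLiteDict`. The class name comes from the description above; the exact implementation used on the customer's connection is not shown here and may differ:

```python
import logging
import sqlite3

from requests_cache.backends.sqlite import SQLiteDict

logger = logging.getLogger("airbyte")


class SkipFailureSQLiteDict(SQLiteDict):
    """Best-effort cache storage: log and swallow SQLite errors instead of failing the sync."""

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except sqlite3.OperationalError as exception:
            logger.warning(f"Error while retrieving item from cache: {exception}")
            raise KeyError(key)  # treat the locked/unreadable entry as a cache miss

    def __setitem__(self, key, value):
        try:
            super().__setitem__(key, value)
        except sqlite3.OperationalError as exception:
            logger.warning(f"Error while saving item to cache: {exception}")
```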
Summary by CodeRabbit
New Features
Refactor
These changes enhance the efficiency and reliability of the HTTP client, providing a smoother user experience.