feat: StreamThreadException investigation spike for Bing Ads source #710
Draft
devin-ai-integration wants to merge 4 commits into main from devin/1755287258-bing-ads-stream-thread-exception-spike
+339 −0
- Document root cause analysis of UTF-8 decoding error with GZIP data
- Identify issue in CompositeRawDecoder parser selection logic
- Outline investigation areas and proposed fixes for concurrent source framework
- Reference issue #8301 with campaign_labels stream error

Co-Authored-By: unknown <>
- Create test script demonstrating StreamThreadException root cause
- Reproduce exact error: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
- Test both failing scenario (missing Content-Encoding) and correct GZIP handling
- Validate header-based parser selection in CompositeRawDecoder

Co-Authored-By: unknown <>
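The failure named in this commit can be reproduced with the standard library alone. The following is a minimal sketch, not the PR's test_gzip_utf8_issue.py: the payload and the helper name `try_decode_as_text` are hypothetical, and the helper only stands in for whatever text parser receives the body when no Content-Encoding header marks it as GZIP.

```python
import gzip

# Hypothetical payload standing in for a GZIP-compressed response body.
payload = gzip.compress(b"Id,Name\n1,example\n")


def try_decode_as_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, as a plain-text parser would (hypothetical helper)."""
    return raw.decode("utf-8")


try:
    try_decode_as_text(payload)
    error_message = None
except UnicodeDecodeError as exc:
    # Byte 0 (0x1f) is valid ASCII, so decoding only fails at byte 1 (0x8b).
    error_message = str(exc)

print(error_message)
# 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```

The error points at position 1 rather than position 0 because the first magic byte, 0x1f, is a valid single-byte UTF-8 character.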
- Add ImprovedCompositeRawDecoder with auto-detection of GZIP content
- Detect GZIP magic bytes (0x1f 0x8b) when Content-Encoding header missing
- Provide better error handling for UTF-8 decoding of GZIP data
- Add recovery mechanism for StreamThreadException in campaign_labels stream
- Create Bing Ads compatible decoder configuration

Co-Authored-By: unknown <>
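The code of ImprovedCompositeRawDecoder is not shown on this page, so the following is only an illustrative sketch of the detection idea this commit describes: trust the Content-Encoding header when it is present, and otherwise fall back to checking the first two bytes for the GZIP magic number. The names `looks_like_gzip` and `decode_body` are hypothetical and are not part of the Airbyte CDK.

```python
import gzip
from typing import Mapping, Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every GZIP member


def looks_like_gzip(raw: bytes) -> bool:
    """Return True if the body starts with the GZIP magic bytes."""
    return raw[:2] == GZIP_MAGIC


def decode_body(raw: bytes, headers: Optional[Mapping[str, str]] = None) -> str:
    """Decode a response body to text, decompressing GZIP when detected.

    Detection uses the Content-Encoding header first and the magic
    bytes second, so a missing header no longer routes compressed
    bytes to the UTF-8 decoder.
    """
    content_encoding = ""
    for name, value in (headers or {}).items():
        if name.lower() == "content-encoding":
            content_encoding = value.lower()
    if "gzip" in content_encoding or looks_like_gzip(raw):
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


compressed = gzip.compress("Id,Name\n1,example\n".encode("utf-8"))
# No headers at all: the magic-byte check still triggers decompression.
print(decode_body(compressed))
```

One caveat of this approach is that a non-GZIP payload that happens to start with 0x1f 0x8b would be misrouted to the decompressor; plain UTF-8 text cannot do so, because 0x8b is never a valid second byte after 0x1f.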
👋 Greetings, Airbyte Team Member!
Here are some helpful tips and reminders for your convenience.

Testing This CDK Version
You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1755287258-bing-ads-stream-thread-exception-spike#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1755287258-bing-ads-stream-thread-exception-spike

Helpful Resources
PR Slash Commands
Airbyte Maintainers can execute the following slash commands on your PR:
- Apply ruff formatting to test_gzip_utf8_issue.py
- Apply ruff formatting to fix_gzip_parser_selection.py
- Ensure code style compliance for CI checks

Co-Authored-By: unknown <>
StreamThreadException investigation and fix (spike, do not merge)

Summary
This spike PR investigates and proposes a fix for issue #8301: a StreamThreadException in the Bing Ads source connector where the campaign_labels stream fails with:

'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Root Cause: Byte 0x8b is the second byte of the GZIP magic number (0x1f 0x8b), which indicates that GZIP-compressed data is being passed to a UTF-8 decoder. This occurs when the CompositeRawDecoder's parser selection logic fails to detect GZIP content (likely due to a missing Content-Encoding header), so the compressed data is treated as plain text.

Proposed Solution: An enhanced CompositeRawDecoder that auto-detects GZIP content by its magic bytes, with better error handling and graceful fallback mechanisms.

Review & Testing Checklist for Human
🔴 High Risk - 5 Critical Items
- […] test_gzip_utf8_issue.py in a proper environment to verify it reproduces the exact error (had import issues locally)
- […] CompositeRawDecoder or create new implementation - current proposal creates separate class
- […] campaign_labels stream to verify it resolves the issue
- […] CompositeRawDecoder […]

Recommended Test Plan:
- […] campaign_labels stream to reproduce the error

Notes
- […] CompositeRawDecoder - integration approach needs decision
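
Whichever integration approach is chosen, one design question is whether detection can happen without reading the whole body into memory. The sketch below assumes the decoder receives a readable binary stream (an assumption, not something this PR confirms) and uses `io.BufferedReader.peek` to inspect the first two bytes without consuming them. The function name `open_text_stream` is hypothetical.

```python
import gzip
import io
from typing import BinaryIO

GZIP_MAGIC = b"\x1f\x8b"


def open_text_stream(raw_stream: BinaryIO) -> io.TextIOWrapper:
    """Wrap a binary stream as UTF-8 text, decompressing if it starts with GZIP magic bytes."""
    buffered = io.BufferedReader(raw_stream)
    # peek() does not advance the position; it may return more than 2 bytes, so slice.
    head = buffered.peek(2)[:2]
    if head == GZIP_MAGIC:
        source = gzip.GzipFile(fileobj=buffered)
    else:
        source = buffered
    return io.TextIOWrapper(source, encoding="utf-8")


# Usage with in-memory streams standing in for a response body.
gz_lines = list(open_text_stream(io.BytesIO(gzip.compress(b"a\nb\n"))))
plain_text = open_text_stream(io.BytesIO(b"a\nb\n")).read()
print(gz_lines, repr(plain_text))
```

Because the peeked bytes stay in the buffer, both the GZIP reader and the plain-text path see the stream from its first byte.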