@gargsaumya gargsaumya commented Sep 3, 2025

Work Item / Issue Reference

AB#38110
AB#34162

GitHub Issue: #<ISSUE_NUMBER>


Summary

This pull request improves how the MSSQL Python driver fetches large object (LOB) data, such as long strings and large binary values. It adds chunked streaming for LOB columns so that variable-length data is no longer truncated, and it keeps type handling correct for both single-row and batch fetches. The driver now also detects LOB columns and switches to per-row streaming when needed, so large values are retrieved in full.

LOB Streaming and Fetching Improvements:

  • Introduced the FetchLobColumnData function in ddbc_bindings.cpp to stream LOB data (CHAR, WCHAR, and BINARY types) in chunks, correctly handling nulls, null-terminators, and platform-specific encoding. This prevents truncation and errors when fetching large columns.
  • Updated SQLGetData_wrap to use streaming for LOB columns or when data length is unknown/too large, for both narrow and wide character types, as well as binary data. This ensures correct retrieval of all data regardless of size.
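
The chunked loop described above can be sketched in Python. This is a minimal simulation and not the driver's C++ code: `make_fake_get_data` and `fetch_lob` are hypothetical names, and the fake only imitates how ODBC's `SQLGetData` hands back character data in NUL-terminated pieces.

```python
from typing import Callable, List, Optional, Tuple

SQL_NULL_DATA = -1  # ODBC indicator value meaning the column is NULL

# Each call returns (buffer, indicator, more_data_pending).
GetData = Callable[[int], Tuple[bytes, int, bool]]


def make_fake_get_data(value: Optional[bytes]) -> GetData:
    """Hypothetical stand-in for SQLGetData on a narrow-character column."""
    offset = 0

    def get_data(buffer_size: int) -> Tuple[bytes, int, bool]:
        nonlocal offset
        if value is None:
            return b"", SQL_NULL_DATA, False
        remaining = len(value) - offset
        # Character data leaves room for a trailing NUL in every buffer.
        take = min(remaining, buffer_size - 1)
        chunk = value[offset:offset + take] + b"\x00"
        offset += take
        return chunk, remaining, take < remaining

    return get_data


def fetch_lob(get_data: GetData, chunk_size: int = 4096) -> Optional[bytes]:
    """Read a column in fixed-size chunks until no data is pending."""
    if chunk_size < 2:
        raise ValueError("chunk_size must leave room for data plus a NUL")
    parts: List[bytes] = []
    while True:
        buffer, indicator, more = get_data(chunk_size)
        if indicator == SQL_NULL_DATA:
            return None  # SQL NULL maps to Python None
        parts.append(buffer[:-1])  # drop the per-chunk NUL terminator
        if not more:
            return b"".join(parts)
```

Stripping the terminator from every chunk, not only the last one, is what keeps stray NUL bytes out of the reassembled value.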

Batch Fetch Logic Enhancements:

  • Modified FetchBatchData to detect LOB columns and use streaming fetch for those columns, avoiding exceptions and ensuring all data is retrieved for large columns in batch operations.
  • Updated FetchMany_wrap to pre-scan columns for LOB types and, if any are found, fall back to row-by-row streaming fetch for those rows; otherwise, it proceeds with standard batch fetching.
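
The pre-scan and fallback can be pictured with a short Python sketch. `ColumnInfo`, `is_lob_column`, and `choose_fetch_strategy` are hypothetical names, the type set is an illustrative subset, and treating a reported column size of 0 as "unknown or unbounded" is an assumption, not something confirmed by the PR text.

```python
from dataclasses import dataclass
from typing import List

SQL_MAX_LOB_SIZE = 8000  # threshold named in the PR description

# Standard ODBC type codes for variable-length character and binary data.
SQL_VARCHAR, SQL_WVARCHAR, SQL_VARBINARY = 12, -9, -3
SQL_INTEGER = 4
LOB_CAPABLE_TYPES = {SQL_VARCHAR, SQL_WVARCHAR, SQL_VARBINARY}


@dataclass
class ColumnInfo:
    name: str
    sql_type: int
    column_size: int  # assumption: 0 means unknown or unbounded (MAX)


def is_lob_column(col: ColumnInfo) -> bool:
    """Treat a column as a LOB when its size is unknown or above the threshold."""
    if col.sql_type not in LOB_CAPABLE_TYPES:
        return False
    return col.column_size == 0 or col.column_size > SQL_MAX_LOB_SIZE


def choose_fetch_strategy(columns: List[ColumnInfo]) -> str:
    """Pre-scan the result set; one LOB column is enough to leave batch mode."""
    if any(is_lob_column(col) for col in columns):
        return "row_by_row_streaming"
    return "batch"
```

Scanning once before the fetch keeps the common case (no LOB columns) on the fast batch path.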

Type Mapping and Constants:

  • Adjusted _map_sql_type in cursor.py to map long string types to SQL_WVARCHAR/SQL_VARCHAR with length 0 for streaming, aligning with the new LOB streaming logic.
  • Defined SQL_MAX_LOB_SIZE (8000) as the threshold for LOB streaming, centralizing the logic for when to treat columns as LOBs.
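
As a rough illustration of the mapping idea, the sketch below binds long strings with a column size of 0. It is not the code in cursor.py: `map_string_param` and the `LONG_STRING_CHARS` cutoff are hypothetical, and choosing the narrow type for ASCII-only strings is an assumption made for the example.

```python
from typing import Tuple

SQL_VARCHAR = 12    # ODBC narrow variable-length character type
SQL_WVARCHAR = -9   # ODBC wide variable-length character type
LONG_STRING_CHARS = 4000  # hypothetical cutoff, not taken from cursor.py


def map_string_param(value: str) -> Tuple[int, int]:
    """Return (sql_type, column_size) for binding a Python str."""
    # Assumption: ASCII-only text binds as narrow, anything else as wide.
    sql_type = SQL_VARCHAR if value.isascii() else SQL_WVARCHAR
    if len(value) > LONG_STRING_CHARS:
        return sql_type, 0  # size 0 marks the value for the streaming path
    return sql_type, max(len(value), 1)
```

Reporting a size of 0 for long values means the binding layer never has to guess a buffer size that might truncate the data.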

These changes collectively make LOB handling more robust, reduce the risk of data truncation, and improve compatibility across platforms.

Copilot AI review requested due to automatic review settings September 3, 2025 12:30
@github-actions github-actions bot added the pr-size: medium Moderate update size label Sep 3, 2025
Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive streaming support for VARCHAR(MAX) data types by introducing a new LOB (Large Object) streaming mechanism in the C++ bindings and updating the Python cursor layer to handle long strings more efficiently.

Key changes:

  • Implements streaming-based data retrieval for large VARCHAR(MAX) columns to handle values that exceed buffer limits
  • Refactors SQL type mapping to use zero column size for long strings, triggering proper LOB handling
  • Adds comprehensive test coverage for VARCHAR(MAX) scenarios including boundary conditions, large values, and edge cases

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| mssql_python/pybind/ddbc_bindings.cpp | Adds FetchLobColumnData function for streaming large column data and updates SQLGetData_wrap to use streaming for VARCHAR(MAX) |
| mssql_python/cursor.py | Updates _map_sql_type to use SQL_VARCHAR/SQL_WVARCHAR with zero column size for long strings |
| tests/test_004_cursor.py | Adds comprehensive test suite for VARCHAR(MAX) covering various data sizes, edge cases, and transaction scenarios |

@gargsaumya gargsaumya changed the title FEAT: adding streaming support in fetch for varcharmax type FEAT: streaming support in fetchone for varcharmax data type Sep 3, 2025
@sumitmsft sumitmsft left a comment

Left a few comments. Please resolve

@gargsaumya gargsaumya force-pushed the saumya/streaming-fetchone branch from f6b7389 to e21b47e Compare September 10, 2025 07:25
@bewithgaurav bewithgaurav left a comment

Needs a re-review after the conflicts are resolved.

@gargsaumya gargsaumya force-pushed the saumya/streaming-fetchone branch from 11dac52 to 960edef Compare September 11, 2025 05:19
@gargsaumya gargsaumya force-pushed the saumya/streaming-fetchone branch from 4ee4c77 to 960edef Compare September 11, 2025 05:35
@gargsaumya gargsaumya force-pushed the saumya/streaming-fetchone branch from 78dc1e8 to 598a6be Compare September 11, 2025 05:50
@gargsaumya gargsaumya force-pushed the saumya/streaming-fetchone branch from 598a6be to 7f67326 Compare September 11, 2025 05:53
@gargsaumya

Needs a re-review after the conflicts are resolved.

The conflicts are now resolved. You can go ahead and re-review.

sumitmsft previously approved these changes Sep 12, 2025
bewithgaurav previously approved these changes Sep 15, 2025
### Work Item / Issue Reference  

> [AB#38110](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38110)
>
> [AB#34162](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/34162)
>
> GitHub Issue: #<ISSUE_NUMBER>

-------------------------------------------------------------------
### Summary   
This pull request improves NVARCHAR data handling in the SQL Server
Python bindings and adds comprehensive tests for NVARCHAR(MAX)
scenarios. The main changes include switching to streaming for large
NVARCHAR values, optimizing direct fetch for smaller values, and adding
tests for edge cases and boundaries to ensure correctness.

**NVARCHAR data handling improvements:**

* Updated the logic in `ddbc_bindings.cpp` to use streaming for large
NVARCHAR/NCHAR columns (over 4000 characters or unknown size) and direct
fetch for smaller values, optimizing performance and reliability.
* Refactored data conversion for NVARCHAR fetches, using `std::wstring`
for conversion and simplifying platform-specific handling for both
macOS/Linux and Windows.
* Improved handling of empty strings and NULLs for NVARCHAR columns,
ensuring correct Python types are returned and logging is more
descriptive.
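
A small Python sketch of the two decisions described in this list: when to stream, and how to turn streamed wide-character chunks into a Python string. The names `should_stream` and `decode_wide_chunks` are hypothetical, the chunks are assumed to be UTF-16LE with a two-byte terminator each, and representing "unknown size" as a column size of 0 is an assumption.

```python
from typing import Iterable

NVARCHAR_DIRECT_LIMIT = 4000  # characters; the cutoff quoted in the summary


def should_stream(column_size: int) -> bool:
    """Stream when the size is unknown (assumed 0) or above the cutoff."""
    return column_size == 0 or column_size > NVARCHAR_DIRECT_LIMIT


def decode_wide_chunks(chunks: Iterable[bytes]) -> str:
    """Join UTF-16LE chunks, dropping each chunk's two-byte terminator."""
    raw = b"".join(chunk[:-2] for chunk in chunks)
    # Decode once, after joining: a chunk boundary may fall between the two
    # halves of a surrogate pair, which would fail if decoded per chunk.
    return raw.decode("utf-16-le")
```

Decoding after concatenation is the design choice worth noting here, because characters outside the Basic Multilingual Plane occupy two UTF-16 code units that a chunk boundary can separate.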

**Testing enhancements:**

* Added new tests in `test_004_cursor.py` for NVARCHAR(MAX) covering
short strings, boundary conditions (4000 chars), streaming (4100+
chars), large values (100,000 chars), empty strings, NULLs, and
transaction rollback scenarios to verify correct behavior across all
edge cases.
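
The size matrix these tests cover can be sketched as a table of cases plus a round-trip check. This is not the code in `test_004_cursor.py`: `CASES`, `failing_cases`, and the `roundtrip` callable are hypothetical, and in a real test the callable would insert the value through a cursor and select it back. Transaction rollback is not modeled.

```python
from typing import Callable, List, Optional, Tuple

Roundtrip = Callable[[Optional[str]], Optional[str]]

# Sizes taken from the list above; the values themselves are arbitrary.
CASES: List[Tuple[str, Optional[str]]] = [
    ("short", "hello"),
    ("boundary_4000", "A" * 4000),
    ("streaming_4100", "B" * 4100),
    ("large_100000", "C" * 100_000),
    ("empty", ""),
    ("null", None),
]


def failing_cases(roundtrip: Roundtrip) -> List[str]:
    """Return the names of cases whose value changes after a round trip."""
    return [name for name, value in CASES if roundtrip(value) != value]


def truncating_roundtrip(value: Optional[str]) -> Optional[str]:
    """Fake backend that cuts values at 4000 characters, as a buggy fetch might."""
    return None if value is None else value[:4000]
```

Running `failing_cases(truncating_roundtrip)` flags only the two cases above the 4000-character boundary, which is the kind of truncation these tests are meant to catch.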

**VARCHAR/CHAR fetch improvements:**

* Improved direct fetch logic for small VARCHAR/CHAR columns and fixed
string conversion to use the actual data length, preventing potential
issues with null-termination and buffer size.
[[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R1825-R1830)
[[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L1841-L1850)
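
Why the actual data length matters can be shown with a short Python comparison. Both helpers are hypothetical and assume UTF-8 data; they only contrast trusting a NUL terminator with trusting the length reported alongside the data.

```python
def decode_nul_terminated(buffer: bytes) -> str:
    """Read up to the first NUL, or the whole buffer if none is found."""
    end = buffer.find(b"\x00")
    raw = buffer if end == -1 else buffer[:end]
    return raw.decode("utf-8")


def decode_with_length(buffer: bytes, data_len: int) -> str:
    """Read exactly the number of bytes the driver reported."""
    return buffer[:data_len].decode("utf-8")
```

With a buffer such as `b"hiXX"` and a reported length of 2, the length-based version yields `"hi"`, while the terminator-based version also picks up the stale bytes. A value that contains an embedded NUL shows the opposite failure: the terminator-based version cuts it short.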

@gargsaumya gargsaumya dismissed stale reviews from bewithgaurav and sumitmsft via fba171c September 15, 2025 02:59
@github-actions github-actions bot added pr-size: large Substantial code update and removed pr-size: medium Moderate update size labels Sep 15, 2025
@gargsaumya gargsaumya merged commit 1ed773c into main Sep 15, 2025
19 checks passed