Skip to content

Conversation

@gargsaumya
Copy link
Contributor

@gargsaumya gargsaumya commented Sep 3, 2025

Work Item / Issue Reference

AB#38110
AB#34162

GitHub Issue: #<ISSUE_NUMBER>


Summary

This pull request improves NVARCHAR data handling in the SQL Server Python bindings and adds comprehensive tests for NVARCHAR(MAX) scenarios. The main changes include switching to streaming for large NVARCHAR values, optimizing direct fetch for smaller values, and adding tests for edge cases and boundaries to ensure correctness.

NVARCHAR data handling improvements:

  • Updated the logic in ddbc_bindings.cpp to use streaming for large NVARCHAR/NCHAR columns (over 4000 characters or unknown size) and direct fetch for smaller values, optimizing performance and reliability.
  • Refactored data conversion for NVARCHAR fetches, using std::wstring for conversion and simplifying platform-specific handling for both macOS/Linux and Windows.
  • Improved handling of empty strings and NULLs for NVARCHAR columns, ensuring correct Python types are returned and logging is more descriptive.

Testing enhancements:

  • Added new tests in test_004_cursor.py for NVARCHAR(MAX) covering short strings, boundary conditions (4000 chars), streaming (4100+ chars), large values (100,000 chars), empty strings, NULLs, and transaction rollback scenarios to verify correct behavior across all edge cases.

VARCHAR/CHAR fetch improvements:

  • Improved direct fetch logic for small VARCHAR/CHAR columns and fixed string conversion to use the actual data length, preventing potential issues with null-termination and buffer size. [1] [2]

@github-actions github-actions bot added the pr-size: medium Moderate update size label Sep 3, 2025
@gargsaumya gargsaumya changed the title FEAT: streaming support in fetch flow for nvarcharmax data type FEAT: streaming support in fetchone for nvarcharmax data type Sep 3, 2025
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 3, 2025
Copy link
Contributor

@sumitmsft sumitmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments

@gargsaumya gargsaumya force-pushed the saumya/streaming-fetchone branch from f6b7389 to e21b47e Compare September 10, 2025 07:25
Copy link
Collaborator

@bewithgaurav bewithgaurav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will wait for latest main branch merge, will refactor this PR to a great extent

@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 11, 2025
@gargsaumya gargsaumya force-pushed the saumya/streaming-fetchone branch 2 times, most recently from 598a6be to 7f67326 Compare September 11, 2025 05:53
@gargsaumya gargsaumya force-pushed the saumya/fetchone-nvarcharmax branch from 63834f3 to d0ccd44 Compare September 11, 2025 06:20
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 11, 2025
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 11, 2025
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 11, 2025
@github-actions github-actions bot removed the pr-size: medium Moderate update size label Sep 11, 2025
@github-actions github-actions bot added the pr-size: medium Moderate update size label Sep 11, 2025
@gargsaumya
Copy link
Contributor Author

will wait for latest main branch merge, will refactor this PR to a great extent

The conflicts are now resolved. You can go ahead and re-review.

### Work Item / Issue Reference  
<!-- 
IMPORTANT: Please follow the PR template guidelines below.
For mssql-python maintainers: Insert your ADO Work Item ID below (e.g.
AB#37452)
For external contributors: Insert Github Issue number below (e.g. #149)
Only one reference is required - either GitHub issue OR ADO Work Item.
-->

<!-- mssql-python maintainers: ADO Work Item -->
>
[AB#34162](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/34162)

[AB#38110](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38110)

<!-- External contributors: GitHub Issue -->
> GitHub Issue: #<ISSUE_NUMBER>

-------------------------------------------------------------------
### Summary   
<!-- Insert your summary of changes below. Minimum 10 characters
required. -->
This pull request improves the handling of large object (LOB) columns
when fetching data from SQL Server in the
`mssql_python/pybind/ddbc_bindings.cpp` file. The main change is to
detect LOB columns (such as large strings or binary data) and switch to
a per-row fetching strategy using `SQLGetData_wrap` for those columns,
ensuring correct streaming and memory usage. For non-LOB columns, the
batch fetching logic remains unchanged, but now includes logic to handle
LOBs appropriately during batch fetches.

**LOB detection and fetch strategy:**

* Added logic in both `FetchMany_wrap` and `FetchAll_wrap` to detect LOB
columns by checking column data types and sizes, and to fall back to
per-row fetch using `SQLGetData_wrap` when LOBs are present. This avoids
buffer overflows and streams LOBs correctly.
[[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2647-R2675)
[[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2769-R2797)

* Updated calls to `FetchBatchData` in both wrappers to pass the list of
detected LOB columns, so batch fetches can handle them appropriately.
[[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2677-R2690)
[[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2770-R2812)

**Batch fetch improvements:**

* Modified the signature of `FetchBatchData` to accept the LOB columns
list and updated its logic to handle LOB columns differently: instead of
throwing exceptions when buffer sizes are insufficient, it now streams
LOB data using `FetchLobColumnData`.
[[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2345-R2345)
[[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2388-R2396)
[[3]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2408-R2410)
[[4]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2428-R2424)
[[5]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2520-R2518)

<!-- 
### PR Title Guide

> For feature requests
FEAT: (short-description)

> For non-feature requests like test case updates, config updates ,
dependency updates etc
CHORE: (short-description) 

> For Fix requests
FIX: (short-description)

> For doc update requests 
DOC: (short-description)

> For Formatting, indentation, or styling update
STYLE: (short-description)

> For Refactor, without any feature changes
REFACTOR: (short-description)

> For release related changes, without any feature changes
RELEASE: #<RELEASE_VERSION> (short-description) 

### Contribution Guidelines

External contributors:
- Create a GitHub issue first:
https://github.com/microsoft/mssql-python/issues/new
- Link the GitHub issue in the "GitHub Issue" section above
- Follow the PR title format and provide a meaningful summary

mssql-python maintainers:
- Create an ADO Work Item following internal processes
- Link the ADO Work Item in the "ADO Work Item" section above  
- Follow the PR title format and provide a meaningful summary
-->
@github-actions github-actions bot added pr-size: large Substantial code update and removed pr-size: medium Moderate update size labels Sep 15, 2025
@gargsaumya gargsaumya merged commit fba171c into saumya/streaming-fetchone Sep 15, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: large Substantial code update

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants