@eyalleshem
Contributor

@eyalleshem eyalleshem commented Oct 27, 2025

This continues the preparation for a borrowed Tokenizer (#2036) and adds more internal functions that borrow strings during tokenization.
This commit also handles places where the Tokenizer could modify the original strings. In such cases, the strategy is to create functions that return Cow<'a, str> and create an owned version of the string when modification is needed.
This commit is rebased on PR #2073 and will remain in draft until #2073 is reviewed.

@eyalleshem eyalleshem force-pushed the reduce-string-copies-cow branch from fa0db77 to d92e39a Compare October 27, 2025 10:15
@alamb
Contributor

alamb commented Oct 29, 2025

FYI @iffyio and @yoavcloud

Key points for this commit:
- Peekable alone isn't sufficient for working with string slices: slicing
  needs byte indexes (start/end), so the current byte position was added to
  the State struct.
  (Note: in the long term we could potentially remove Peekable and iterate
  using only the current position.)
- Created internal functions that return slices of the original query
  instead of allocating strings, then converted their results to String to
  maintain compatibility (the idea is to make a small, reviewable commit
  without changing the Token enum or the parser).
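
A minimal sketch of the byte-position idea described above, not the PR's actual code. The `State` shape, the `byte_pos` field name, and the `take_while_borrowed` / `take_while_owned` helpers are assumptions made for illustration. It shows how tracking a byte offset next to a `Peekable<Chars>` makes it possible to return a slice of the original query, with a thin wrapper that still returns `String`:

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical, simplified stand-in for the tokenizer's `State` struct.
struct State<'a> {
    source: &'a str,
    peekable: Peekable<Chars<'a>>,
    /// Byte offset of the next unread character in `source`.
    byte_pos: usize,
}

impl<'a> State<'a> {
    fn new(source: &'a str) -> Self {
        State { source, peekable: source.chars().peekable(), byte_pos: 0 }
    }

    fn peek(&mut self) -> Option<&char> {
        self.peekable.peek()
    }

    /// Advance one character and keep the byte offset in sync.
    fn next(&mut self) -> Option<char> {
        let ch = self.peekable.next()?;
        self.byte_pos += ch.len_utf8();
        Some(ch)
    }
}

/// Consume characters while `pred` holds and return them as a slice of the
/// original input, without allocating.
fn take_while_borrowed<'a>(state: &mut State<'a>, pred: impl Fn(char) -> bool) -> &'a str {
    let start = state.byte_pos;
    while let Some(&ch) = state.peek() {
        if !pred(ch) {
            break;
        }
        state.next();
    }
    let source: &'a str = state.source;
    &source[start..state.byte_pos]
}

/// Compatibility wrapper: same scan, but returns an owned `String`.
fn take_while_owned(state: &mut State<'_>, pred: impl Fn(char) -> bool) -> String {
    take_while_borrowed(state, pred).to_string()
}
```

Because the offset advances by `len_utf8()` for each character, the slice boundaries always fall on character boundaries, so the indexing does not panic on multi-byte input.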
@eyalleshem eyalleshem force-pushed the reduce-string-copies-cow branch from d92e39a to b5e4533 Compare November 6, 2025 19:00
  Add internal _borrowed() functions that return Cow<'a, str> to prepare for
  zero-copy tokenization. When the source string needs no transformation
  (no escaping), return Cow::Borrowed. When transformation is required,
  return Cow::Owned.
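
  The borrowed/owned split could look roughly like the sketch below. The function name and the choice of `''` as the escape sequence are illustrative assumptions, not the functions added in this PR:

```rust
use std::borrow::Cow;

/// Hypothetical helper. `body` is the text between the quotes of a
/// single-quoted SQL string, where `''` stands for one literal quote.
fn unescape_single_quoted_borrowed<'a>(body: &'a str) -> Cow<'a, str> {
    if !body.contains("''") {
        // Nothing to rewrite: hand back a slice of the original input.
        return Cow::Borrowed(body);
    }
    // An escape sequence is present: allocate only on this path.
    Cow::Owned(body.replace("''", "'"))
}
```

  Callers can treat both cases uniformly, since `Cow<'a, str>` dereferences to `&str`.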

  The Token enum still uses String, so borrowed values are converted via
  to_owned() for now. This maintains API compatibility while preparing the
  codebase for a future refactor where Token can hold borrowed strings.
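
  As a rough illustration of this compatibility layer, assuming a cut-down `Token` with a single variant and a hypothetical `_borrowed()` helper, the conversion at the boundary might look like the following. The sketch uses `Cow::into_owned()`, which yields a `String` and copies only when the value was borrowed:

```rust
use std::borrow::Cow;

/// Simplified stand-in for the tokenizer's `Token` enum (one variant only).
#[derive(Debug, PartialEq)]
enum Token {
    SingleQuotedString(String),
}

/// Hypothetical internal helper that borrows when no unescaping is needed.
fn strip_doubled_quotes_borrowed(body: &str) -> Cow<'_, str> {
    if body.contains("''") {
        Cow::Owned(body.replace("''", "'"))
    } else {
        Cow::Borrowed(body)
    }
}

/// Public-facing path: `Token` still owns its data, so convert here.
fn single_quoted_token(body: &str) -> Token {
    Token::SingleQuotedString(strip_doubled_quotes_borrowed(body).into_owned())
}
```

  If `Token` later gains a lifetime parameter, only the final conversion step would need to change under this layout.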

  Optimized: comments, quoted strings, dollar-quoted strings, quoted identifiers.
@eyalleshem eyalleshem force-pushed the reduce-string-copies-cow branch from b5e4533 to 954176e Compare November 7, 2025 08:19