
Conversation


@pavel-esir pavel-esir commented Sep 25, 2025

Description

Ticket: CVS-170883

Checklist:

  • Tests have been updated or added to cover the new code
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation

@Copilot Copilot AI review requested due to automatic review settings September 25, 2025 13:03
@pavel-esir pavel-esir marked this pull request as draft September 25, 2025 13:03
@github-actions github-actions bot added category: LLM LLM pipeline (stateful, static) category: sampling Sampling / Decoding algorithms category: cmake / build Cmake scripts category: Python API Python API for GenAI category: LLM samples GenAI LLM samples category: CPP API Changes in GenAI C++ public headers no-match-files category: GGUF GGUF file reader category: text streamer labels Sep 25, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds parsing functionality to OpenVINO GenAI, implementing both base parsers for batch processing and incremental parsers for streaming scenarios. The changes introduce parser classes for structured output processing, specifically reasoning content and tool calling, with support for both C++ and Python APIs.

  • Introduces new parser base classes (ParserBase, IncrementalParserBase) with concrete implementations (DeepSeekR1ReasoningParser, Llama32PythonicParser, BaseReasoningParser)
  • Adds TextParserStreamer class that extends TextStreamer with parsing capabilities for incremental text processing
  • Integrates parsers into the generation pipeline through GenerationConfig and provides comprehensive test coverage
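To make the incremental-parsing idea concrete, here is a standalone Python sketch of the concept. This is illustrative only: the class name, tags, and dict layout are assumptions, not the PR's actual DeepSeekR1ReasoningParser API.

```python
# Conceptual sketch of incremental reasoning-content extraction.
# Assumes each tag arrives whole in a single chunk; a real incremental
# parser must also handle tags split across streamed chunks.
class SketchReasoningParser:
    def __init__(self, open_tag="<think>", close_tag="</think>", starts_with_thinking=True):
        self.open_tag, self.close_tag = open_tag, close_tag
        self.in_reasoning = starts_with_thinking
        self.reasoning, self.content = [], []

    def parse(self, delta):
        """Route one streamed chunk to reasoning_content or content."""
        while delta:
            tag = self.close_tag if self.in_reasoning else self.open_tag
            if tag in delta:
                before, delta = delta.split(tag, 1)
                (self.reasoning if self.in_reasoning else self.content).append(before)
                self.in_reasoning = not self.in_reasoning
            else:
                (self.reasoning if self.in_reasoning else self.content).append(delta)
                delta = ""

    def message(self):
        return {"reasoning_content": "".join(self.reasoning),
                "content": "".join(self.content)}

p = SketchReasoningParser()
for chunk in ["compute 2+2 ", "step by step", "</think>", "The answer is 4."]:
    p.parse(chunk)
print(p.message())
# → {'reasoning_content': 'compute 2+2 step by step', 'content': 'The answer is 4.'}
```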

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 10 comments.

File Description
tests/python_tests/test_parsers.py Python test cases for parser functionality
tests/cpp/parser.cpp C++ unit tests for various parser implementations
tests/cpp/CMakeLists.txt Build configuration update to include nlohmann_json dependency
src/python/py_streamers.cpp Python bindings for TextParserStreamer class
src/python/py_parsers.cpp Python bindings for parser base classes
src/python/py_openvino_genai.cpp Integration of parser module initialization
src/python/py_generation_config.cpp Adds parsers field to GenerationConfig Python bindings
src/python/openvino_genai/py_openvino_genai.pyi Type definitions for parser classes and updated exports
src/python/openvino_genai/__init__.py Python module exports for parser classes
src/cpp/src/text_streamer.cpp Implementation of TextParserStreamer class
src/cpp/src/parsers.hpp Internal header with parser implementation details
src/cpp/src/parsers.cpp Core parser implementations and registration logic
src/cpp/src/llm/pipeline.cpp Integration of parsers into LLM generation pipeline
src/cpp/src/generation_config.cpp Adds parsers field support to GenerationConfig
src/cpp/include/openvino/genai/text_streamer.hpp Public header for TextParserStreamer
src/cpp/include/openvino/genai/parsers.hpp Public header defining parser interfaces
src/cpp/include/openvino/genai/llm_pipeline.hpp Adds parsed results field to DecodedResults
src/cpp/include/openvino/genai/generation_config.hpp Adds parsers field to GenerationConfig
samples/cpp/text_generation/parsed_output_sample.cpp Sample demonstrating parser usage


Comment on lines +110 to +150
extended = stream_string[:]
extended.append("")

for parser in parsers:
for (prev_subword, subword) in zip(extended, stream_string):
Copilot AI Sep 25, 2025

The zip operation pairs extended[i] with stream_string[i], so prev_subword and subword always refer to the same element; the trailing empty string appended to extended is simply dropped because zip stops at the shorter sequence. This never yields the intended previous/current relationship.

Suggested change
extended = stream_string[:]
extended.append("")
for parser in parsers:
for (prev_subword, subword) in zip(extended, stream_string):
# Pair each subword with its previous subword (first element has no previous)
prev_subwords = [""] + stream_string[:-1]
for parser in parsers:
for (prev_subword, subword) in zip(prev_subwords, stream_string):
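To illustrate the point, here is a standalone sketch (hypothetical helper names, not the PR's test code) comparing the buggy pairing with the suggested fix:

```python
# Correct pairing: each subword is preceded by the previous one,
# with "" standing in for "no previous" on the first element.
def pair_with_previous(stream_string):
    prev_subwords = [""] + stream_string[:-1]
    return list(zip(prev_subwords, stream_string))

# Buggy variant: the appended "" is silently dropped by zip,
# and every subword is paired with itself.
def buggy_pairs(stream_string):
    extended = stream_string[:]
    extended.append("")
    return list(zip(extended, stream_string))

print(pair_with_previous(["<think>", "reason", "</think>"]))
# → [('', '<think>'), ('<think>', 'reason'), ('reason', '</think>')]
print(buggy_pairs(["<think>", "reason", "</think>"]))
# → [('<think>', '<think>'), ('reason', 'reason'), ('</think>', '</think>')]
```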



class ReasoningParser : public IncrementalParserBase {
private:
std::shared_ptr<ReasoningParserImpl> m_impl;
Collaborator

Suggested change
std::shared_ptr<ReasoningParserImpl> m_impl;
std::unique_ptr<ReasoningParserImpl> m_impl;

Contributor Author

There is actually a difficulty with unique_ptr: it didn't compile. I suspect it requires a visible destructor, but I haven't managed to get it working yet. Can we leave it as is for the moment and continue the review while I try to change it to unique_ptr?

Or maybe we can even leave it as shared_ptr? The only place the pointer is created is in the ctor, so we are sure it's really unique, and we already use shared_ptr for other pimpls in our codebase.

@Wovchena Wovchena requested a review from sgonorov October 17, 2025 09:26
@pavel-esir pavel-esir marked this pull request as ready for review October 17, 2025 10:37
@apaniukov apaniukov requested a review from yatarkan October 17, 2025 10:39


streamer = TextParserStreamer(genai_tokenizer, parsers=[DeepSeekR1ReasoningParser()])
breakpoint()
Contributor Author

Remove this leftover breakpoint() call before merging.

namespace ov {
namespace genai {

class IncrementalParserBase {
Contributor

No virtual destructor - better add it

Contributor Author

Yes, I will add it.

static std::string name() { return "Phi4ReasoningParser"; }
};

class ParserBase {
Contributor

Should have virtual destructor

return m_pimpl->generate(inputs, generation_config, streamer);
auto res = m_pimpl->generate(inputs, generation_config, streamer);

// If streamer is of StreamerBase type, and it is TextParserStreamer, get parsed message
Contributor

I would've extracted this logic into a separate method, something like apply_parsers - otherwise generate is overburdened with unrelated logic. It is also too complex, with too much nesting; better to flatten it a little.

Contributor Author

Agreed. Will do that

: m_starts_with_thinking(starts_with_thinking),
m_keep_original_content(keep_original_content) {}

std::string parse(
Contributor

This method is too big, consider refactoring it into smaller ones.

}

// Ensure the backends are registered before main
static bool are_backends_registered = register_backends();
Contributor

Do we really need this bool, and why is it so important to register the backends before main? Can't they be lazily initialized on the first request? Maybe it's better to introduce some kind of singleton, like a ParserRegistry.
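As a sketch of the lazy-registry idea (Python for brevity; the ParserRegistry name follows the reviewer's suggestion, and the dicts merely stand in for real parser objects - this is not the PR's implementation):

```python
class ParserRegistry:
    """Lazily initialized singleton registry: factories are registered on
    first access instead of via a static bool evaluated before main()."""
    _instance = None

    def __init__(self):
        self._factories = {}

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
            cls._instance._register_builtins()
        return cls._instance

    def _register_builtins(self):
        # Mirrors the registered_incremental_parsers map from the PR;
        # plain dicts stand in for the real parser objects.
        self._factories["DeepSeekR1ReasoningParser"] = lambda: {"starts_with_thinking": True}
        self._factories["Phi4ReasoningParser"] = lambda: {"starts_with_thinking": False}

    def create(self, name):
        return self._factories[name]()
```

In C++ the same effect is typically achieved with a function-local static inside an accessor (a Meyers singleton), which is initialized on first use and is thread-safe since C++11, removing the need for the pre-main bool.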

// Regex to capture the [...] part
std::smatch m;
const std::string& text = input["content"].get_string();
std::regex r(R"(\[.*?\])");
Contributor

Suggested change
std::regex r(R"(\[.*?\])");
std::regex r(R"(^\[.*?\]$)");

We also need anchors; otherwise strings like xx[xxxx]xxx will pass the check even though they are not a proper tool-call payload.
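The effect of the missing anchors can be shown with a quick Python equivalent (the helper name is hypothetical; the pattern mirrors the regex above):

```python
import re

def looks_like_tool_call(text, anchored=True):
    # Anchored: the whole string must be the bracketed payload.
    pattern = r"^\[.*?\]$" if anchored else r"\[.*?\]"
    return re.search(pattern, text) is not None

print(looks_like_tool_call("[get_weather(city='Paris')]"))   # True
print(looks_like_tool_call("xx[xxxx]xxx"))                   # False: anchors reject it
print(looks_like_tool_call("xx[xxxx]xxx", anchored=False))   # True: unanchored false positive
```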

}

input["tool_calls"] = JsonContainer::array();
input["tool_calls"].push_back(JsonContainer({{"name", name}, {"arguments", kv}}));
Contributor

What if there is more than one tool call?
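For illustration, a standalone Python sketch of how a pythonic payload with several calls could be split into one tool_calls entry per function (hypothetical helper with simplified quoting rules - not the PR's parser):

```python
import re

def parse_pythonic_tool_calls(text):
    """Parse "[f(a='1'), g(b='2')]" into a list of tool-call dicts."""
    m = re.search(r"^\[(.*)\]$", text.strip())
    if not m:
        return []
    calls = []
    # One match per name(args) occurrence; assumes no nested parentheses.
    for call in re.finditer(r"(\w+)\(([^)]*)\)", m.group(1)):
        name, args_str = call.group(1), call.group(2)
        kv = dict(re.findall(r"(\w+)\s*=\s*'([^']*)'", args_str))
        calls.append({"name": name, "arguments": kv})
    return calls

print(parse_pythonic_tool_calls("[get_weather(city='Paris'), get_time(tz='UTC')]"))
```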

registered_incremental_parsers[DeepSeekR1ReasoningParser::name()] = []() { return std::make_shared<DeepSeekR1ReasoningParser>(/*starts_with_thinking*/ true); };
registered_incremental_parsers[Phi4ReasoningParser::name()] = []() { return std::make_shared<Phi4ReasoningParser>(/*starts_with_thinking*/ false); };

registered_base_parsers[Llama32PythonicToolParser::name()] = []() { return std::make_shared<Llama32PythonicToolParser>(); };
Contributor

Why no Llama32JsonToolParser?

Comment on lines +218 to +221
if (!generation_config.has_value() || (*generation_config).parsers.empty()) {
return res;
}

Contributor

Suggested change
if (!generation_config.has_value() || (*generation_config).parsers.empty()) {
return res;
}

return res;
}

std::vector<std::shared_ptr<ParserBase>> parsers = (*generation_config).parsers;
Contributor

Suggested change
std::vector<std::shared_ptr<ParserBase>> parsers = (*generation_config).parsers;
std::vector<std::shared_ptr<ParserBase>> parsers = generation_config->parsers;

for (auto& parser: m_parsers) {
message = parser->parse(m_parsed_message, m_text_buffer, message);
// Message can be modified inside parser, if parser for example extracted tool calling from message content
// but parser
Contributor

?

from transformers import AutoTokenizer
from utils.hugging_face import convert_and_save_tokenizer, download_and_convert_model
import re
import textwrap
Contributor

Suggested change
import textwrap

JsonContainer msg;
msg["content"] = res.texts[i];
for (auto& parser: parsers) {
// TODO: Check the state of incremental parser and reset if necessary
Contributor

Only applicable for incremental parsers

for (const auto& parsed: dr.parsed) {
auto json_str = parsed.to_json_string();
py::dict json_dict = json_mod.attr("loads")(json_str);

Contributor

Suggested change

Comment on lines +69 to +83
void call_parser(py::dict& msg, std::function<void(JsonContainer&)> func) {
auto msg_anymap = ov::genai::pybind::utils::py_object_to_any_map(msg);
auto msg_cpp = JsonContainer(msg_anymap);

func(msg_cpp);

auto json_str = msg_cpp.to_json_string();
py::dict result = json_mod.attr("loads")(json_str);

// update msg with result
msg.clear();
for (auto item : result) {
msg[item.first] = item.second;
}
}
Contributor

Suggested change
void call_parser(py::dict& msg, std::function<void(JsonContainer&)> func) {
auto msg_anymap = ov::genai::pybind::utils::py_object_to_any_map(msg);
auto msg_cpp = JsonContainer(msg_anymap);
func(msg_cpp);
auto json_str = msg_cpp.to_json_string();
py::dict result = json_mod.attr("loads")(json_str);
// update msg with result
msg.clear();
for (auto item : result) {
msg[item.first] = item.second;
}
}
py::dict call_parser(py::dict& msg, std::function<void(JsonContainer&)> func) {
auto msg_anymap = ov::genai::pybind::utils::py_object_to_any_map(msg);
auto msg_cpp = JsonContainer(msg_anymap);
func(msg_cpp);
auto json_str = msg_cpp.to_json_string();
return json_mod.attr("loads")(json_str);
}

void init_parsers(py::module_& m) {
py::class_<IncrementalParserBase, ConstructableIncrementalParserBase, std::shared_ptr<IncrementalParserBase>>(m, "IncrementalParserBase")
.def(py::init<>())
.def("parse", [](IncrementalParserBase& self,
Contributor

Use call_parser helper


StreamingStatus write(JsonContainer& message) override {
py::dict message_py;
auto json_obj = message.to_json();
Contributor

Changed API

message_py[py::cast(it.key())] = py::cast(it.value().get<std::string>());
}

// call python implementation which accepts py::dict instead of JsonContainer
Contributor

Add more description about the custom Python function call.

Contributor

Add test for a custom parser implementation


kv[std::string((*it)[1])] = std::string((*it)[2]);
}

input["tool_calls"] = JsonContainer::array();
Contributor

Duplicated input["tool_calls"] = JsonContainer::array(); - see L191

Comment on lines +8 to +10
#include <nlohmann/json.hpp>

using json = nlohmann::json;
Contributor

nlohmann/json is unused

.def(py::init<>())
.def_property_readonly("texts", [](const DecodedResults &dr) -> py::typing::List<py::str> { return pyutils::handle_utf8((std::vector<std::string>)dr); })
.def_readonly("scores", &DecodedResults::scores)
.def_property_readonly("parsed", [](const DecodedResults& dr) -> py::dict {
Contributor

The return type seems to be a list, not a dict.

Suggested change
.def_property_readonly("parsed", [](const DecodedResults& dr) -> py::dict {
.def_property_readonly("parsed", [](const DecodedResults& dr) -> py::list {

Comment on lines +97 to +106
static py::object json_mod = py::module_::import("json");
py::list result_dicts;

for (const auto& parsed: dr.parsed) {
auto json_str = parsed.to_json_string();
py::dict json_dict = json_mod.attr("loads")(json_str);

result_dicts.append(json_dict);
}
return result_dicts;
Contributor

Once PR #2816 is merged, you can use utility function:

Suggested change
static py::object json_mod = py::module_::import("json");
py::list result_dicts;
for (const auto& parsed: dr.parsed) {
auto json_str = parsed.to_json_string();
py::dict json_dict = json_mod.attr("loads")(json_str);
result_dicts.append(json_dict);
}
return result_dicts;
return pyutils::json_container_to_py_object(dr.parsed);

For now it also uses json string serialization, but can be optimized later with direct native conversion.

Comment on lines +70 to +71
auto msg_anymap = ov::genai::pybind::utils::py_object_to_any_map(msg);
auto msg_cpp = JsonContainer(msg_anymap);
Contributor

Suggested change
auto msg_anymap = ov::genai::pybind::utils::py_object_to_any_map(msg);
auto msg_cpp = JsonContainer(msg_anymap);
auto msg_cpp = pyutils::py_object_to_json_container(msg);

BTW, conversion through AnyMap does not preserve object key order, as AnyMap sorts keys alphabetically.
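A small Python illustration of the ordering concern (Python dicts keep insertion order, so sorted() stands in for what an alphabetically-sorted AnyMap would produce):

```python
msg = {"role": "assistant", "content": "hi", "tool_calls": []}
print(list(msg))    # insertion order: ['role', 'content', 'tool_calls']
print(sorted(msg))  # AnyMap-like order: ['content', 'role', 'tool_calls']
```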

message_py[py::cast(it.key())] = py::cast(it.value().get<std::string>());
}

// call python implementation which accepts py::dict instead of JsonContainer
Contributor

Is it possible to use the C++ implementation instead of Python to prevent the double conversion? Or am I missing something?

Comment on lines +88 to +89
auto msg_anymap = ov::genai::pybind::utils::py_object_to_any_map(message_py);
message = JsonContainer(msg_anymap);
Contributor

Suggested change
auto msg_anymap = ov::genai::pybind::utils::py_object_to_any_map(message_py);
message = JsonContainer(msg_anymap);
message = pyutils::py_object_to_json_container(message_py);

Comment on lines +177 to +183
static py::object json_mod = py::module_::import("json");

auto res = self.get_parsed_message();
auto json_str = res.to_json_string();
py::dict json_dict = json_mod.attr("loads")(json_str);

return json_dict;
Contributor

Suggested change
static py::object json_mod = py::module_::import("json");
auto res = self.get_parsed_message();
auto json_str = res.to_json_string();
py::dict json_dict = json_mod.attr("loads")(json_str);
return json_dict;
return pyutils::json_container_to_py_object(self.get_parsed_message());

add_executable(${TEST_TARGET_NAME} ${tests_src} $<TARGET_OBJECTS:openvino_genai_obj>)

target_link_libraries(${TEST_TARGET_NAME} PRIVATE $<TARGET_PROPERTY:openvino::genai,LINK_LIBRARIES> gtest_main gmock_main)
target_link_libraries(${TEST_TARGET_NAME} PRIVATE $<TARGET_PROPERTY:openvino::genai,LINK_LIBRARIES> gtest_main gmock_main nlohmann_json::nlohmann_json)
Contributor

Shouldn't be needed anymore

#include <gtest/gtest.h>
#include "openvino/genai/generation_config.hpp"
#include "openvino/genai/parsers.hpp"
#include "nlohmann/json.hpp"
Contributor

@yatarkan yatarkan Oct 17, 2025

Can we test with JsonContainer instead of nlohmann::json? JsonContainer has an equality operator that compares json objects under the hood.

6 participants