Fix ( colang v2 serialization ): prevent Unknown d_type by encoding non-registered dataclasses as dicts #1429

gautamvarmadatla · 2025-10-01T04:56:15Z

Summary

colang/v2_x state serialization was type-tagging all dataclasses (e.g., {"__type":"Foo","value":...}), which breaks decoding for classes outside Guardrails’ known types (name_to_class is built only from the colang_ast and flows modules). When such JSON is round-tripped via json_to_state, decoding raises Unknown d_type: Foo.
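To make the failure concrete, here is a minimal, self-contained sketch of the pre-fix behavior. `KnownNode`, `NAME_TO_CLASS`, `encode_old`, and `decode` are hypothetical stand-ins, not the actual functions in serialization.py. The sketch also assumes the decoder raises a `ValueError`; the real exception type may differ.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict


@dataclass
class KnownNode:  # hypothetical stand-in for a registered Guardrails type
    name: str


# Hypothetical registry; the real name_to_class comes from colang_ast and flows.
NAME_TO_CLASS: Dict[str, type] = {"KnownNode": KnownNode}


def encode_old(obj: Any) -> Any:
    """Pre-fix behavior (sketch): tag every dataclass with its class name."""
    if is_dataclass(obj) and not isinstance(obj, type):
        value = {f.name: encode_old(getattr(obj, f.name)) for f in fields(obj)}
        return {"__type": type(obj).__name__, "value": value}
    return obj


def decode(d: Any) -> Any:
    """Decoder that can only rebuild "dict" envelopes and registered classes."""
    if isinstance(d, dict) and "__type" in d:
        d_type = d["__type"]
        items = {k: decode(v) for k, v in d["value"].items()}
        if d_type == "dict":
            return items
        if d_type in NAME_TO_CLASS:
            return NAME_TO_CLASS[d_type](**items)
        raise ValueError(f"Unknown d_type: {d_type}")
    return d


@dataclass
class Foo:  # user-land dataclass, not in NAME_TO_CLASS
    value: int


payload = json.dumps(encode_old(Foo(1)))
try:
    decode(json.loads(payload))
except ValueError as err:
    print(err)  # Unknown d_type: Foo
```

Registered types still decode fine; only the unregistered tag `Foo` has nothing to map back to.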

This PR keeps type tags for known Guardrails classes (so they still round-trip as structured objects) and encodes unknown dataclasses as plain dicts ({"__type":"dict","value":...}), ensuring robust decoding for user-land or third-party dataclasses. Adds a unit test to prevent regressions.

What’s affected (scope in the framework)

Module: nemoguardrails/colang/v2_x/runtime/serialization.py
- encode_to_dict (encoding path) — changed
- decode_from_dict (decoding path) — unchanged, but now protected from unknown dataclass tags
- state_to_json / json_to_state — behavior preserved; round-trip is more resilient
Runtime surfaces that rely on state JSON:
- LLM rails logging & tracing (state snapshots emitted during generation/execution)
- Action/tool logging (e.g., passthrough and tool-calling paths that serialize intermediate state)
- Persistence/telemetry/debugging that stores or reloads State JSON

Changes

Encoding rule for dataclasses:
- If type(obj).__name__ is in name_to_class (i.e., Guardrails’ own Colang/flows types) → retain type tag ({"__type":"ClassName","value":...}) to enable full object reconstruction.
- If not in name_to_class (unknown/user-land dataclass) → encode as dict ({"__type":"dict","value":...}) to avoid Unknown d_type on decode.
Tests: tests/v2_x/test_serialization_dataclass.py ensures an unknown dataclass is encoded as a dict payload and decodes safely.
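A minimal sketch of the encoding rule above, assuming a registry shaped like name_to_class (class name to class). `KnownNode`, `NAME_TO_CLASS`, `encode`, and `decode` are illustrative only; the real encode_to_dict handles more cases than this sketch, which passes non-dataclass, non-dict values through unchanged.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict


@dataclass
class KnownNode:  # hypothetical stand-in for a registered Guardrails type
    name: str


# Assumed registry shape: class name -> class.
NAME_TO_CLASS: Dict[str, type] = {"KnownNode": KnownNode}


def encode(obj: Any) -> Any:
    """Keep the class tag only for registered dataclasses."""
    if is_dataclass(obj) and not isinstance(obj, type):
        value = {f.name: encode(getattr(obj, f.name)) for f in fields(obj)}
        type_name = type(obj).__name__
        if type_name in NAME_TO_CLASS:
            return {"__type": type_name, "value": value}
        # Unregistered (user-land) dataclass: downgrade to the dict envelope.
        return {"__type": "dict", "value": value}
    if isinstance(obj, dict):
        return {"__type": "dict", "value": {k: encode(v) for k, v in obj.items()}}
    # Other types pass through unchanged in this sketch.
    return obj


def decode(d: Any) -> Any:
    """Rebuild dict envelopes as dicts and registered tags as their classes."""
    if isinstance(d, dict) and "__type" in d:
        items = {k: decode(v) for k, v in d["value"].items()}
        if d["__type"] == "dict":
            return items
        return NAME_TO_CLASS[d["__type"]](**items)
    return d


@dataclass
class EvalScore:  # user-land dataclass, not registered
    label: str
    score: float


print(encode(EvalScore("ok", 0.87)))
# {'__type': 'dict', 'value': {'label': 'ok', 'score': 0.87}}
print(decode(json.loads(json.dumps(encode(KnownNode("main"))))))
# KnownNode(name='main')
```

The decision depends only on registry membership, so the decoder never sees a tag it cannot resolve.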

Rationale

Real-world states can contain custom dataclasses from actions, tools, or integration code. Previously, the encoder emitted {"__type":"CustomClass","value":...}, which decode_from_dict cannot map back (name_to_class only contains Guardrails’ own types), causing hard failures when logs are reloaded or states are restored.
This change preserves lossless round-trip for Guardrails’ native types, while guaranteeing JSON-safety and decode-safety for everything else.

Testing

Unit test (new):
- python -m pytest tests/v2_x/test_serialization_dataclass.py -q

Backward compatibility & risk

BC-safe: Known Guardrails classes still produce type-tagged JSON and decode to original objects as before.
Safer defaults: Unknown dataclasses previously produced JSON that could not be decoded; now they decode to plain dicts with the same field values.
Schema note: For unknown dataclasses, the on-wire shape remains the project’s typed envelope ({"__type":"dict","value":...}), so downstream consumers that already tolerate dict-encoded nodes remain compatible.
Performance: Negligible; only affects dataclass branch during encoding.

Developer notes

name_to_class is populated from colang_ast_module and flows_module. The new rule relies solely on that mapping to decide when to keep a class tag vs. downgrade to dict.
If future modules add decodable types, they will naturally benefit from the keep-tag path without changes here.
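One way such a registry could be assembled, as a sketch: collect every class a module exposes, keyed by attribute name. Whether the real code uses `inspect.getmembers` this way is an assumption, and the two module objects below are hypothetical stand-ins for colang_ast_module and flows_module.

```python
import inspect
import types
from dataclasses import dataclass
from typing import Dict

# Hypothetical stand-ins for colang_ast_module and flows_module.
colang_ast_module = types.ModuleType("colang_ast_module")
flows_module = types.ModuleType("flows_module")


@dataclass
class Spec:
    name: str


@dataclass
class FlowState:
    uid: str


colang_ast_module.Spec = Spec
flows_module.FlowState = FlowState


def build_name_to_class(*modules: types.ModuleType) -> Dict[str, type]:
    """Collect every class exposed by the given modules, keyed by attribute name."""
    name_to_class: Dict[str, type] = {}
    for module in modules:
        for name, cls in inspect.getmembers(module, inspect.isclass):
            name_to_class[name] = cls
    return name_to_class


name_to_class = build_name_to_class(colang_ast_module, flows_module)
print(sorted(name_to_class))  # ['FlowState', 'Spec']


# A class added to a registered module is picked up on the next build,
# so it takes the keep-tag path without changes to the encoder.
@dataclass
class NewEvent:
    kind: str


flows_module.NewEvent = NewEvent
print(sorted(build_name_to_class(colang_ast_module, flows_module)))
# ['FlowState', 'NewEvent', 'Spec']
```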

Links

Fixes bug: state_to_json() doesn't correctly serialize dataclasses #1378


codecov-commenter · 2025-10-01T05:09:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


Pouyanpi

Hi @gautamvarmadatla, thanks for opening this PR.

Your proposed fix (encoding unknown dataclasses as dicts) violates the contract of the state serialization test:

  check_equal_objects(state, state_2, "state")

Please see tests/v2_x/test_state_serialization.py.

Because:

goes in: EvalScore(label="ok", score=0.87) (dataclass)
comes out: {"label": "ok", "score": 0.87} (dict)

This breaks type assumptions in subsequent flow code that expects the dataclass type.
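The objection can be reproduced in a few lines of plain Python. This sketch simulates the dict fallback directly with `dataclasses.asdict` and `json` rather than calling the Guardrails serializer, so it only illustrates the shape of the problem: field values survive, the type does not. It assumes check_equal_objects compares types as well as values.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EvalScore:
    label: str
    score: float


original = EvalScore(label="ok", score=0.87)

# Simulate the PR's fallback: an unregistered dataclass becomes a plain dict on the wire.
wire = json.dumps({"__type": "dict", "value": asdict(original)})
restored = json.loads(wire)["value"]

print(restored)                         # {'label': 'ok', 'score': 0.87}
print(isinstance(restored, EvalScore))  # False
print(restored == original)             # False
# Python code that does restored.score would now raise AttributeError.
```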

The serialization system is designed for lossless round-trip of state objects. It only supports around 50 internal NeMo Guardrails types (from the colang_ast and flows modules). This is intentional: state serialization is for runtime state persistence, not for arbitrary Python objects.

Solution:

Dicts are the intended data format for action return values. They work perfectly with Colang's expression system:

async def get_eval_score():
    return {"label": "ok", "score": 0.87}

flow main
  $result = await get_eval_score()
  bot say "Score: {$result.score}"

Colang automatically wraps dicts in AttributeDict (see eval.py:138-141), so you get dot notation for free!
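For readers unfamiliar with the pattern, the idea behind dot access on dicts can be sketched with a small `dict` subclass. `AttributeDictSketch` is hypothetical and is not the Guardrails `AttributeDict`; its behavior (including wrapping nested dicts) is an assumption made for illustration.

```python
from typing import Any


class AttributeDictSketch(dict):
    """Hypothetical dict subclass allowing d.key as well as d["key"].

    __getattr__ only runs when normal attribute lookup fails, so keys that
    shadow dict methods (e.g. "items") still resolve to the method.
    """

    def __getattr__(self, name: str) -> Any:
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so chained access like result.meta.source works.
        return AttributeDictSketch(value) if isinstance(value, dict) else value


result = AttributeDictSketch({"label": "ok", "score": 0.87})
print(result.score)     # 0.87
print(result["label"])  # ok
```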

If you need dataclasses for type safety, IDE support, or validation in your Python code, convert them before returning:

from dataclasses import dataclass, asdict


@dataclass
class EvalScore:
    label: str
    score: float


async def get_eval_score():
    eval_score = EvalScore("ok", 0.87)
    # Your Python code gets type safety here
    return asdict(eval_score)  # convert to dict before returning
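Running the suggested pattern end to end shows what the runtime actually receives, and how Python code that needs the typed object can rebuild it with `EvalScore(**payload)`. The `asyncio.run` driver is only there to execute the coroutine outside Guardrails; in a real deployment the Guardrails runtime would invoke the action instead.

```python
import asyncio
from dataclasses import asdict, dataclass


@dataclass
class EvalScore:
    label: str
    score: float


async def get_eval_score() -> dict:
    # Build and validate with the dataclass, return a plain dict to the runtime.
    return asdict(EvalScore("ok", 0.87))


payload = asyncio.run(get_eval_score())
print(payload)  # {'label': 'ok', 'score': 0.87}

# Python code that wants the typed object back can rebuild it from the dict.
score = EvalScore(**payload)
print(score)  # EvalScore(label='ok', score=0.87)
```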

Please let me know if you have a different use case that the above solution does not fit.

fix (serialization) : encode unknown dataclasses as plain dicts;add unit test (787b7f4)

Signed-off-by: Gautam Datla <[email protected]>

gautamvarmadatla force-pushed the fix/dataclass-serialization branch from 689f847 to 787b7f4 on October 1, 2025 05:02

gautamvarmadatla mentioned this pull request on Oct 4, 2025 in bug: state_to_json() doesn't correctly serialize dataclasses #1378 (Closed)

tgasser-nv self-requested a review October 6, 2025 18:53

Pouyanpi requested a review from schuellc-nvidia October 13, 2025 12:28

Pouyanpi requested changes Oct 13, 2025
