Plugin Request: Implement Argument Normalizer Plugin (Native) #986

@crivetimihai

Feature Request: Implement Argument Normalizer Plugin (Native)

Implement a native (in‑process) plugin named ArgumentNormalizer that stabilizes prompt/tool inputs by normalizing:

  • Unicode (NFC/NFD/NFKC/NFKD) and control characters
  • Whitespace (trim, collapse internal spaces, normalize CR/LF, optional blank-line collapse)
  • Optional casing (none/lower/upper/title)
  • Numeric dates to ISO 8601 (YYYY-MM-DD) with day_first/year_first ambiguity handling
  • Numbers to a canonical form (strip thousands separators; . as decimal)

Default mode should be non-blocking (permissive), returning modified payloads when changes occur. Target hooks: prompt_pre_fetch, tool_pre_invoke.
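
As a rough illustration of the Unicode and whitespace steps listed above, here is a minimal sketch using only the Python standard library. The function name `normalize_text` is hypothetical; its parameters mirror the keys in the example configuration, but the order in which the steps run is an assumption, not something this issue specifies.

```python
import re
import unicodedata


def normalize_text(
    text: str,
    unicode_form: str = "NFC",
    remove_control_chars: bool = True,
    trim: bool = True,
    collapse_internal: bool = True,
    normalize_newlines: bool = True,
    collapse_blank_lines: bool = False,
) -> str:
    """Hypothetical helper: Unicode and whitespace cleanup for one string."""
    # Unicode normalization form: NFC, NFD, NFKC, or NFKD.
    text = unicodedata.normalize(unicode_form, text)
    if normalize_newlines:
        # Convert CRLF and bare CR to LF.
        text = text.replace("\r\n", "\n").replace("\r", "\n")
    if remove_control_chars:
        # Drop control characters (category Cc) but keep newlines and tabs.
        text = "".join(
            ch for ch in text
            if ch in "\n\r\t" or unicodedata.category(ch) != "Cc"
        )
    if collapse_internal:
        # Collapse runs of spaces/tabs inside a line; line breaks are kept.
        text = re.sub(r"[ \t]+", " ", text)
    if collapse_blank_lines:
        # Reduce three or more consecutive newlines to a single blank line.
        text = re.sub(r"\n{3,}", "\n\n", text)
    if trim:
        text = text.strip()
    return text
```

For example, `normalize_text("  Cafe\u0301 \t  au\r\nlait\x00  ")` composes the accent, drops the NUL byte, and returns `"Caf\u00e9 au\nlait"`.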

Problem Statement

Incoming args vary widely in normalization (Unicode forms, whitespace, casing), and frequently include ambiguous numeric dates or locale-specific number formatting. This leads to:

  • Prompt template mismatches and brittle tool args
  • Lower PII/regex filtering accuracy due to noisy inputs
  • Unnecessary retries and inconsistent behavior across environments

Proposal

Add plugins/argument_normalizer/argument_normalizer.py implementing a Plugin subclass with:

  • Config (ArgumentNormalizerConfig) for toggles and strategies (Unicode, whitespace, casing, dates, numbers)
  • Per-field regex-based overrides (field_overrides) to tune normalization by key path (e.g., user.name, items[0].title)
  • Safe recursive normalization for dict/list structures
  • Non-blocking results with modified_payload and minimal metadata; no violations by default
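
The recursive walk and the per-field overrides could look roughly like the sketch below. Everything here is illustrative: `normalize_args`, the `Override` tuple, and the "first matching regex wins" rule are assumptions, and the real `field_overrides` schema is not defined in this issue. Key paths are built in the `user.name` / `items[0].title` style mentioned above.

```python
import re
from typing import Any, Callable, List, Tuple

# Hypothetical override shape: (regex matched against the key path, normalizer).
Override = Tuple[str, Callable[[str], str]]


def normalize_args(
    value: Any,
    default: Callable[[str], str],
    overrides: List[Override],
    path: str = "",
) -> Any:
    """Return a normalized copy of value; non-string leaves pass through."""
    if isinstance(value, str):
        # Assumption: the first override whose pattern matches the path wins.
        for pattern, func in overrides:
            if re.search(pattern, path):
                return func(value)
        return default(value)
    if isinstance(value, dict):
        return {
            key: normalize_args(
                item, default, overrides, f"{path}.{key}" if path else str(key)
            )
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [
            normalize_args(item, default, overrides, f"{path}[{i}]")
            for i, item in enumerate(value)
        ]
    return value


args = {"user": {"name": "  Ada  "}, "items": [{"title": "  Hello   World "}], "count": 3}
lowered = lambda s: " ".join(s.split()).lower()  # default: collapse + lowercase
result = normalize_args(args, lowered, [(r"^user\.name$", str.strip)])
changed = result != args  # a plugin would return a modified payload only if True
```

Because the walk builds new containers instead of mutating the input, comparing `result` with `args` is a cheap way to decide whether a modified payload needs to be returned at all.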

Recommended ordering: run before PII filtering so detectors see stabilized inputs.

Scope

In scope:

  • prompt_pre_fetch and tool_pre_invoke hooks
  • Unicode normalization, whitespace cleanup, optional casing, numeric date and number normalization
  • Per-field overrides and conditions to target specific prompts/tools

Out of scope (for this issue):

  • Resource hooks; advanced locale-aware date parsing libraries; schema validation; rate limiting

Configuration (Example)

- name: "ArgumentNormalizer"
  kind: "plugins.argument_normalizer.argument_normalizer.ArgumentNormalizerPlugin"
  description: "Normalizes Unicode, whitespace, casing, dates, and numbers in args"
  version: "0.1.0"
  author: "Mihai Criveti"
  hooks: ["prompt_pre_fetch", "tool_pre_invoke"]
  mode: "permissive"
  priority: 40
  conditions: []
  config:
    enable_unicode: true
    unicode_form: "NFC"
    remove_control_chars: true
    enable_whitespace: true
    trim: true
    collapse_internal: true
    normalize_newlines: true
    collapse_blank_lines: false
    enable_casing: false
    case_strategy: "none"  # none|lower|upper|title
    enable_dates: true
    day_first: false
    year_first: false
    enable_numbers: true
    decimal_detection: "auto"  # auto|comma|dot
    field_overrides: []
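
To make the `decimal_detection` option more concrete, here is one possible reading of `auto|comma|dot`, sketched for strings that consist of a single number. The helper name `canonical_number` and the `auto` heuristics (rightmost separator is the decimal mark when both appear; a lone comma followed by exactly three digits counts as a thousands separator) are assumptions for illustration, not behavior defined by this issue.

```python
import re

# Digits with separators only between digits, e.g. "1,234.56" or "-1.234,5".
_NUMERIC = re.compile(r"^[+-]?\d+([.,]\d+)*$")


def canonical_number(text: str, decimal_detection: str = "auto") -> str:
    """Hypothetical helper: strip thousands separators, use '.' as decimal."""
    s = text.strip()
    if not _NUMERIC.match(s):
        return text  # not a plain number: leave untouched
    if decimal_detection == "comma":
        dec = ","
    elif decimal_detection == "dot":
        dec = "."
    else:
        last_dot, last_comma = s.rfind("."), s.rfind(",")
        if last_dot >= 0 and last_comma >= 0:
            # Both present: assume the rightmost one is the decimal mark.
            dec = "." if last_dot > last_comma else ","
        elif last_comma >= 0:
            # Only commas: "1,234" or "1,234,567" look like grouping.
            grouped = re.fullmatch(r"[+-]?\d{1,3}(,\d{3})+", s)
            dec = None if grouped else ","
        elif last_dot >= 0:
            # Only dots: a repeated dot can only be grouping.
            dec = None if s.count(".") > 1 else "."
        else:
            dec = None
    if dec is None or dec not in s:
        return s.replace(",", "").replace(".", "")
    if s.count(dec) > 1:
        return text  # contradictory input: stay conservative
    integer, _, fraction = s.rpartition(dec)
    if not fraction.isdigit():
        return text  # e.g. "1,234.5" forced to comma mode
    integer = integer.replace(",", "").replace(".", "")
    return f"{integer}.{fraction}"
```

Under these assumptions `canonical_number("1.234,56")` and `canonical_number("1,234.56")` both yield `"1234.56"`, while `"1,234"` becomes `"1234"` in `auto` mode and `"1.234"` when `decimal_detection="comma"` is set explicitly.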

Ordering Guidance

  • Argument Normalizer should precede PII Filter. Suggested priorities:
    • ArgumentNormalizer: 40
    • PIIFilterPlugin: 50

Acceptance Criteria

  • Native plugin class ArgumentNormalizerPlugin implements prompt_pre_fetch and tool_pre_invoke
  • Config supports Unicode, whitespace, casing, dates, numbers, and field_overrides
  • Non-blocking behavior by default; returns modified_payload when changes occur
  • Recursive normalization works for nested dict/list args
  • Unit tests cover unicode/whitespace, numbers (comma/dot locales), dates (day_first), casing, and nested structures
  • README under plugins/argument_normalizer/ documents behavior, config, overrides, and examples
  • Docs: plugin listed in llms/plugins-llms.md, with a note on ordering
  • Example config entry included in plugins/config.yaml (commented or sample)

Tasks

  1. Implement plugin in plugins/argument_normalizer/argument_normalizer.py
  2. Add unit tests in tests/unit/.../argument_normalizer/test_argument_normalizer.py
  3. Write plugin README with examples and tuning tips
  4. Update docs: add to llms/plugins-llms.md (Built‑in Plugins) and note ordering
  5. Add example configuration to plugins/config.yaml
  6. Validate with make doctest test and run targeted pytest selection

Risks & Mitigations

  • Over-normalization (e.g., changing intended casing): mitigated via field_overrides and by keeping enable_casing disabled by default
  • Ambiguous dates: interpretation is controlled by day_first/year_first; transformations are conservative by default
  • Locale edge cases for numbers: decimal_detection with explicit comma|dot override when needed
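
One conservative way to handle the ambiguous-date risk is sketched below. The helper `to_iso_date` is hypothetical and covers only numeric dates with a four-digit year; dates with two-digit years (where `year_first` would matter) and dates that are invalid under the chosen interpretation are deliberately left unchanged. That policy is an assumption about what "conservative" should mean here.

```python
import re
from datetime import date

# Three numeric fields joined by the same separator (/, -, or .).
_DATE = re.compile(r"(?<!\d)(\d{1,4})([/.-])(\d{1,2})\2(\d{1,4})(?!\d)")


def to_iso_date(text: str, day_first: bool = False) -> str:
    """Hypothetical helper: rewrite numeric dates in text as YYYY-MM-DD."""

    def _replace(match: "re.Match[str]") -> str:
        a, b, c = match.group(1), match.group(3), match.group(4)
        if len(a) == 4 and len(c) <= 2:
            # Year-first input such as 2024/3/4 is unambiguous.
            year, month, day = int(a), int(b), int(c)
        elif len(c) == 4 and len(a) <= 2:
            year = int(c)
            # day_first picks DD/MM/YYYY; otherwise MM/DD/YYYY.
            day, month = (int(a), int(b)) if day_first else (int(b), int(a))
        else:
            return match.group(0)  # no four-digit year: do not guess
        try:
            return date(year, month, day).isoformat()
        except ValueError:
            return match.group(0)  # invalid under this reading: keep as-is

    return _DATE.sub(_replace, text)
```

With the defaults, `to_iso_date("due 03/04/2024")` gives `"due 2024-03-04"`, and with `day_first=True` it gives `"due 2024-04-03"`. An input such as `"31/12/2024"` is left alone unless `day_first=True`, since month 31 is not valid under the default reading.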

Labels

documentation, enhancement, plugins
