Proposal: Optional Compression for Large Attributes (e.g., db.statement) in Spans and Logs #2068

@oszlak

Description

@oszlak

Background
Certain attributes in OpenTelemetry spans and logs — such as db.statement, http.request.body, and custom extra fields in logs — can carry extremely large payloads (multiple megabytes).

This creates several issues:

Increased span/log size impacts backend storage and indexing.

Large attributes can be difficult to search and slow to process.

Some backends may truncate or even drop large spans/logs entirely.

In many real-world cases (e.g., SQL statements, large prompts, large request bodies), users do not actually query directly on the full content of these attributes — they rely on other labels or metadata.

Proposal
Introduce an optional mechanism to compress large attributes at the SDK level before exporting them:

Use a lossless compression algorithm such as LZ4.

Base64-encode the compressed data so it remains compatible as an attribute value.

Optionally add a secondary indicator attribute, e.g. db.statement.compressed = true.

Compression would be applied only if the attribute exceeds a configurable size threshold (e.g., 1 KB, 10 KB).

Example Flow
Detect if db.statement (or other attr) exceeds threshold.

Compress with LZ4.

Base64 encode it.

Replace the attribute with the encoded value.

Optionally include a flag attribute to indicate compression.
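The flow above could be sketched roughly as follows. This is a hypothetical helper, not an existing OpenTelemetry API; it uses the stdlib zlib module so the example is self-contained, whereas the proposal itself suggests LZ4 (e.g., via the third-party lz4 package). The threshold value and attribute names are illustrative.

```python
import base64
import zlib

# Configurable size threshold above which compression kicks in (1 KB here).
COMPRESS_THRESHOLD = 1024

def maybe_compress_attribute(attrs: dict, key: str) -> None:
    """Compress attrs[key] in place if it exceeds the size threshold."""
    value = attrs.get(key)
    if not isinstance(value, str):
        return
    raw = value.encode("utf-8")
    if len(raw) <= COMPRESS_THRESHOLD:
        return  # small enough; leave it human-readable
    compressed = zlib.compress(raw)  # the proposal suggests LZ4 instead
    # Base64 keeps the value a valid string attribute.
    attrs[key] = base64.b64encode(compressed).decode("ascii")
    # Indicator attribute so consumers know to decompress.
    attrs[key + ".compressed"] = True

# Example: a large SQL statement crosses the threshold and gets compressed.
attrs = {
    "db.statement": "SELECT * FROM users WHERE id IN ("
    + ",".join(str(i) for i in range(500))
    + ")"
}
maybe_compress_attribute(attrs, "db.statement")
```

After this runs, db.statement holds the base64-encoded compressed payload and db.statement.compressed = true marks it for downstream tooling.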

Benefits
Reduces payload size significantly, especially for known large fields.

Retains data for inspection or debugging (just decompress + decode).

Avoids external storage costs and complexity.

Easier to implement and manage than uploading to cloud storage.

Minimal disruption to current workflows.

Concerns
Compressed attributes will not be human-readable or directly searchable in most backends.

Adds some CPU cost for compression (very minimal with LZ4).

Requires optional support in tooling to decompress for inspection.
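The tooling side is the inverse of the encoding step: decode base64, then decompress. A minimal sketch, again using stdlib zlib in place of the proposed LZ4 (function name is illustrative):

```python
import base64
import zlib

def decompress_attribute(encoded: str) -> str:
    """Recover the original attribute value from its base64+compressed form."""
    return zlib.decompress(base64.b64decode(encoded)).decode("utf-8")

# Round trip: encode the way the SDK would, then recover the original.
original = "SELECT name FROM accounts WHERE active = true " * 100
encoded = base64.b64encode(zlib.compress(original.encode("utf-8"))).decode("ascii")
restored = decompress_attribute(encoded)
```

A backend or CLI that sees the indicator attribute (e.g., db.statement.compressed = true) would apply this before displaying the value.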

Alternatives Considered
External cloud storage (e.g., upload large attributes to S3 and link in span):

Adds cost, latency, access control challenges, and complexity.

Querying or searching still requires custom tooling.

Dropping large attributes:

Complete data loss, limits post-mortem/debugging capabilities.

Truncating attributes:

Risk of breaking SQL/log structure, and still partial data loss.

Target Use Cases
db.statement for SQL queries in tracing.

http.request.body, http.response.body.

Large custom logRecord.attributes["extra"].

GenAI spans: prompts and responses.

Notes
Open to feedback and willing to help drive a proof of concept if the community finds this useful.
