reject event if fields count exceed 250 #1311


Merged — 11 commits merged into parseablehq:main on May 14, 2025

Conversation

@nikhilsinhaparseable (Contributor) commented May 12, 2025

keep all attributes as individual columns in the ingested event
add check for fields count in any dataset
reject event if fields count exceeds P_DATASET_FIELDS_ALLOWED_LIMIT
default value is set to 250
Fixes: #1310

Summary by CodeRabbit

  • New Features

    • Added a configurable limit for allowed dataset fields via command-line flag and environment variable.
    • Introduced error handling for datasets exceeding the allowed field count, returning clear error responses.
    • Added warnings when a dataset's field count approaches the configured limit, recommending creation of a new dataset.
  • Refactor

    • Simplified OpenTelemetry (OTel) log, trace, and metric attribute handling by removing intermediate attribute maps and merging steps.
    • Expanded lists of known OTel fields for logs, metrics, and traces, improving attribute classification.
    • Streamlined attribute insertion by consolidating attribute maps into a single flat structure.
  • Bug Fixes

    • Improved error propagation and HTTP response codes for ingestion errors related to field limits.

keep all attributes as individual columns in the ingested event
expose env `P_OTEL_ATTRIBUTES_ALLOWED_LIMIT` to configure the allowed limit
for attributes count
if attributes count in flattened event > the allowed limit
log the error, and reject the event

Fixes: parseablehq#1310
@coderabbitai (bot) commented May 12, 2025

## Walkthrough

The changes add a new CLI and environment option to limit the dataset field count, enforce this limit during ingestion with a new error variant, and simplify OpenTelemetry attribute handling by removing the intermediate "other_attributes" maps. Known OTel field lists are expanded, and attribute insertion is streamlined without a separate merging step.

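For illustration, the simplified attribute handling described above amounts to inserting every OTLP attribute directly as a top-level key. A minimal sketch, assuming serde_json and a stand-in `KeyValue` type (the real OTLP type lives in opentelemetry-proto):

```rust
use serde_json::{Map, Value};

// Stand-in for the OTLP KeyValue type, for illustration only.
struct KeyValue {
    key: String,
    value: Value,
}

// Insert each attribute as its own top-level column; no intermediate
// "other_attributes" map and no later merge step.
fn insert_attributes(map: &mut Map<String, Value>, attributes: &[KeyValue]) {
    for attribute in attributes {
        map.insert(attribute.key.clone(), attribute.value.clone());
    }
}

fn main() {
    let mut event = Map::new();
    let attrs = vec![KeyValue {
        key: "service.name".to_string(),
        value: Value::String("api".to_string()),
    }];
    insert_attributes(&mut event, &attrs);
    assert!(event.contains_key("service.name"));
}
```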
## Changes

| File(s)                                                                                  | Change Summary                                                                                                                                                                                                                                                                                     |
|-----------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `src/cli.rs`, `src/option.rs`                                                           | Added `dataset_fields_allowed_limit` option with CLI flag and env var, default 250, validated by a new parser function.                                                                                                                                                                           |
| `src/handlers/http/ingest.rs`                                                           | Added `FieldsCountLimitExceeded` variant to `PostError` enum and mapped it to HTTP 400 Bad Request.                                                                                                                                                                                             |
| `src/handlers/http/modal/utils/ingest_utils.rs`                                         | Added pre-ingestion check in `flatten_and_push_logs` to fetch stream schema and enforce dataset fields count limit, returning `FieldsCountLimitExceeded` error if exceeded.                                                                                                                     |
| `src/otel/logs.rs`, `src/otel/metrics.rs`, `src/otel/traces.rs`                         | Removed intermediate `other_attributes` maps; inserted attributes directly into main JSON maps; expanded known field lists; renamed some schema URL keys; simplified flattening and attribute insertion logic without changing core functionality.                                               |
| `src/otel/otel_utils.rs`                                                                | Removed constants and all logic related to attribute segregation and merging of `other_attributes`; simplified `flatten_attributes` and `insert_attributes` to handle attributes in a single flat map; removed multiple helper functions related to attribute merging and serialization.         |

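Based on the first row of the table, the new option plausibly looks like this in `src/cli.rs` — a sketch using clap's derive API; the exact attribute set and help text are assumptions, and the validator the table mentions would be wired in via `value_parser`:

```rust
use clap::Parser;

#[derive(Parser, Debug)]
pub struct Options {
    // Configurable dataset field-count limit: CLI flag or env var, default 250.
    #[arg(
        long,
        env = "P_DATASET_FIELDS_ALLOWED_LIMIT",
        default_value = "250",
        help = "allowed limit for fields count in a dataset"
    )]
    pub dataset_fields_allowed_limit: usize,
}

fn main() {
    let opts = Options::parse();
    println!("dataset fields limit = {}", opts.dataset_fields_allowed_limit);
}
```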
## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant IngestHandler
    participant StreamSchema
    participant OtelFlattener

    User->>CLI: Start with --dataset-field-count-limit or env var
    CLI->>IngestHandler: Pass parsed options including limit
    IngestHandler->>StreamSchema: Fetch schema for stream
    StreamSchema-->>IngestHandler: Return field count
    IngestHandler->>IngestHandler: Check if field count exceeds limit
    alt Field count exceeded
        IngestHandler-->>User: Return FieldsCountLimitExceeded error (HTTP 400)
    else Within limit
        IngestHandler->>OtelFlattener: flatten logs/traces/metrics with direct attribute insertion
        OtelFlattener-->>IngestHandler: Return flattened data
        IngestHandler-->>User: Success response
    end
```

## Assessment against linked issues

| Objective | Addressed | Explanation |
|-----------|-----------|-------------|
| Remove other_attributes field; insert all otel attributes at the top level (#1310) | ✅ | |
| Restore ability to query all otel attributes directly via SQL (#1310) | ✅ | |
| Do not segregate otel attributes into a nested structure (#1310) | ✅ | |
| Add attribute count limit enforcement and error handling (#1310) | ✅ | |

## Suggested labels

for next release

## Poem

Oh, what a hop for otel logs,
No more fields hidden in nested bogs!
Attributes now all up front and clear,
Query with SQL, nothing to fear.
If you add too many, a warning you'll meet—
But now your dashboards are rabbit-ly neat! 🐇✨



---

<details>
<summary>📜 Recent review details</summary>

**Configuration used: CodeRabbit UI**
**Review profile: CHILL**
**Plan: Pro**

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 566ea152d19202fd0672c84618de69d6bf48b874 and 1636d3108139d1e7c36134366cba15b6beb0917a.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `src/handlers/http/modal/utils/ingest_utils.rs` (2 hunks)

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (1)</summary>

* src/handlers/http/modal/utils/ingest_utils.rs

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms (10)</summary>

* GitHub Check: Build Default x86_64-pc-windows-msvc
* GitHub Check: Build Default aarch64-apple-darwin
* GitHub Check: Build Kafka aarch64-apple-darwin
* GitHub Check: Build Default x86_64-apple-darwin
* GitHub Check: Build Default aarch64-unknown-linux-gnu
* GitHub Check: Build Kafka x86_64-unknown-linux-gnu
* GitHub Check: Quest Smoke and Load Tests for Standalone deployments
* GitHub Check: Build Default x86_64-unknown-linux-gnu
* GitHub Check: coverage
* GitHub Check: Quest Smoke and Load Tests for Distributed deployments

</details>

</details>

@coderabbitai (bot) left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (2)
src/otel/traces.rs (2)

35-67: ⚠️ Potential issue

Mismatch between generated field names and KNOWN_FIELD_LIST will mis-classify built-in keys as “user attributes”

OTEL_TRACES_KNOWN_FIELD_LIST lists scope_schema_url / resource_schema_url, yet the code later inserts a plain schema_url key (see lines 97-102 & 129-133).
Because schema_url is not in this list, the enforcement block on lines 139-156 will count it as an extra attribute, potentially rejecting otherwise valid events once the limit is low.

```diff
@@
-    "scope_schema_url",
+    "scope_schema_url",
+    "schema_url",                 // added so it matches the inserted key
@@
-    "resource_schema_url",
+    "resource_schema_url",
```

(Alternatively rename the inserted keys – see next comment – but the list and the insertions must agree.)


97-104: ⚠️ Potential issue

Inserted key should match naming convention (scope_schema_url / resource_schema_url)

If you prefer the prefixed style used elsewhere (scope_metrics_schema_url, resource_metrics_schema_url, …), update the insertion sites instead of expanding the constant. One possible patch:

```diff
-        span_json.insert(
-            "schema_url".to_string(),
-            Value::String(scope_span.schema_url.clone()),
-        );
+        span_json.insert(
+            "scope_schema_url".to_string(),
+            Value::String(scope_span.schema_url.clone()),
+        );
@@
-        resource_span_json.insert(
-            "schema_url".to_string(),
-            Value::String(record.schema_url.clone()),
-        );
+        resource_span_json.insert(
+            "resource_schema_url".to_string(),
+            Value::String(record.schema_url.clone()),
+        );
```

Either apply this or extend the constant as shown in the previous comment; keeping them inconsistent will trigger false‐positive limit violations.

Also applies to: 129-133

🧹 Nitpick comments (7)
src/handlers/http/ingest.rs (1)

501-502: HTTP status choice – consider 400 (Bad Request) instead of 417

417 Expectation Failed is rarely used and some proxies drop it. The violation is a client input problem, so 400 keeps semantics simple and interoperable:

```diff
-PostError::OtelError(_) => StatusCode::EXPECTATION_FAILED,
+PostError::OtelError(_) => StatusCode::BAD_REQUEST,
```

Not blocking, just food for thought.

src/otel/logs.rs (1)

180-198: Attribute counting may over-count due to previous cloning

attribute_count is computed after merging all maps; cloning and then counting adds unnecessary heap allocations.

A micro-optimisation (avoids clone):

```diff
-        for resource_logs_json in &mut vec_resource_logs_json {
-            resource_logs_json.extend(resource_log_json.clone());
+        for mut resource_logs_json in vec_resource_logs_json {
+            resource_logs_json.extend(resource_log_json.clone());
```

(Not critical; optional.)

src/otel/traces.rs (1)

139-156: Attribute-count check may run hot and performs an unnecessary clone()

  1. .keys().filter(...).count() allocates an iterator every time.
  2. Immediately afterwards the entire Map is cloned only to be pushed into vec_otel_json.

A small optimisation keeps the code readable but avoids the extra allocation:

```diff
-            let attribute_count = resource_spans_json
-                .keys()
-                .filter(|key| !known_fields.contains(key.as_str()))
-                .count();
+            let attribute_count = resource_spans_json
+                .iter()
+                .filter(|(k, _)| !known_fields.contains(k.as_str()))
+                .count();
@@
-            vec_otel_json.push(Value::Object(resource_spans_json.clone()));
+            vec_otel_json.push(Value::Object(resource_spans_json));
```

(You then need to drop the .clone() call and move, not copy, the map.)

src/otel/metrics.rs (2)

35-71: Keep the “known fields” list minimal and accurate

OTEL_METRICS_KNOWN_FIELD_LIST contains resource_schema_url & scope_schema_url, but those keys are never inserted in this module. Dead entries do no harm today, yet they make the intent unclear and risk future drift. Consider pruning unused names to keep the whitelist self-documenting.

No functional breakage, but tightening this list helps prevent silent mismatches like the one flagged in traces.


568-586: Micro-optimisation: avoid re-scanning the entire map for every push

Just like in traces.rs, you can compute attribute_count before merging the map into vec_otel_json, and you can move the map instead of cloning it to cut memory traffic. Same patch pattern as previously suggested.

src/otel/otel_utils.rs (2)

152-162: Accept slices instead of &Vec for better API ergonomics

flatten_attributes (and callers such as insert_attributes) receive &Vec<KeyValue>.
The function never needs a growable vector, only iteration, so &[KeyValue] is the idiomatic choice and frees callers from owning a Vec.

```diff
-pub fn flatten_attributes(attributes: &Vec<KeyValue>) -> Map<String, Value> {
+pub fn flatten_attributes(attributes: &[KeyValue]) -> Map<String, Value> {
@@
-    for attribute in attributes {
+    for attribute in attributes {
```

Apply the same change to insert_attributes:

```diff
-pub fn insert_attributes(map: &mut Map<String, Value>, attributes: &Vec<KeyValue>) {
+pub fn insert_attributes(map: &mut Map<String, Value>, attributes: &[KeyValue]) {
```

This is a non-breaking internal refactor and yields a slightly more flexible API surface.
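A quick self-contained illustration of the ergonomics point (toy function, not the Parseable API): `&Vec<T>` callers already coerce to `&[T]`, and a slice additionally accepts arrays and sub-ranges:

```rust
// A slice parameter accepts every shape a &Vec does, and more.
fn count_items(items: &[i32]) -> usize {
    items.len()
}

fn main() {
    let v = vec![1, 2, 3];
    let a = [4, 5];
    assert_eq!(count_items(&v), 3); // &Vec<i32> coerces to &[i32]
    assert_eq!(count_items(&a), 2); // arrays work directly
    assert_eq!(count_items(&v[1..]), 2); // so do sub-slices
}
```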


197-201: Consider deriving Serialize for OtelError

OtelError is forwarded to HTTP handlers; deriving Serialize (or at least Display + IntoResponse) would let you return structured JSON error bodies rather than plain strings, improving client experience.

No immediate action required, raising for future UX enhancement.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d844b3 and 0eefe77.

📒 Files selected for processing (7)
  • src/cli.rs (1 hunks)
  • src/handlers/http/ingest.rs (3 hunks)
  • src/handlers/http/modal/utils/ingest_utils.rs (1 hunks)
  • src/otel/logs.rs (5 hunks)
  • src/otel/metrics.rs (12 hunks)
  • src/otel/otel_utils.rs (3 hunks)
  • src/otel/traces.rs (9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
🔇 Additional comments (2)
src/handlers/http/ingest.rs (1)

471-473: Good addition of domain-specific error variant

PostError::OtelError cleanly wraps lower-level OtelError and allows ergonomic ? propagation. Nice!

src/handlers/http/modal/utils/ingest_utils.rs (1)

66-88: Error propagation updated correctly

Switching to Result-returning flatteners with ? keeps the function concise and readable. 👍
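As a generic illustration of that pattern (toy types, not the project's actual definitions), `?` converts the flattener's error into the handler's error through a `From` impl:

```rust
#[derive(Debug)]
struct OtelError(String);

#[derive(Debug)]
enum PostError {
    Otel(OtelError),
}

// This From impl is what lets `?` translate OtelError into PostError.
impl From<OtelError> for PostError {
    fn from(e: OtelError) -> Self {
        PostError::Otel(e)
    }
}

// A Result-returning "flattener": callers propagate its error with `?`.
fn flatten(input: &str) -> Result<Vec<String>, OtelError> {
    if input.is_empty() {
        return Err(OtelError("empty payload".to_string()));
    }
    Ok(vec![input.to_string()])
}

fn handle(input: &str) -> Result<usize, PostError> {
    let rows = flatten(input)?; // OtelError -> PostError via From
    Ok(rows.len())
}

fn main() {
    assert!(handle("log line").is_ok());
    assert!(handle("").is_err());
}
```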

src/cli.rs Outdated
Comment on lines 371 to 379

```rust
    #[arg(
        long,
        env = "P_OTEL_ATTRIBUTES_ALLOWED_LIMIT",
        default_value = "200",
        help = "allowed limit for otel attributes"
    )]
    pub otel_attributes_allowed_limit: usize,
}
```

🛠️ Refactor suggestion

⚠️ Potential issue

otel_attributes_allowed_limit default is silently 0 when Options::default() is used

default_value = "200" only affects values parsed from CLI/env with clap; it is ignored when the code calls Options::default() (several unit-tests do this).
Because #[derive(Default)] zero-initialises primitive fields, the limit becomes 0, causing every OTEL event with ≥ 1 attribute to be rejected in test/bench environments.

Recommended fix (any one of):

```diff
-#[derive(Parser, Debug, Default)]
+#[derive(Parser, Debug)]
```

and remove all direct Options::default() calls, or

```diff
 #[derive(Parser, Debug, Default)]
 pub struct Options {
@@
-    #[arg(
-        long,
-        env = "P_OTEL_ATTRIBUTES_ALLOWED_LIMIT",
-        default_value = "200",
-        help = "allowed limit for otel attributes"
-    )]
-    pub otel_attributes_allowed_limit: usize,
+    #[arg(
+        long,
+        env = "P_OTEL_ATTRIBUTES_ALLOWED_LIMIT",
+        default_value_t = 200,
+        value_parser = clap::value_parser!(usize).range(1..),
+        help = "Maximum number of OTEL attributes allowed per event"
+    )]
+    pub otel_attributes_allowed_limit: usize,
 }
```

This guarantees 200 either via CLI or via Default and prevents accidental “blocking everything”.
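The pitfall is easy to reproduce without clap at all; a minimal demonstration (standalone sketch, field name borrowed from the PR):

```rust
#[derive(Debug, Default)]
struct Options {
    // clap's default_value = "200" applies only when clap parses CLI/env input;
    // derive(Default) zero-initialises this field instead.
    otel_attributes_allowed_limit: usize,
}

fn main() {
    let opts = Options::default();
    // Prints 0, not 200 — any code path using Options::default() gets a
    // limit of zero and would reject every event with at least one attribute.
    println!("{}", opts.otel_attributes_allowed_limit);
}
```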

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-#[arg(
-    long,
-    env = "P_OTEL_ATTRIBUTES_ALLOWED_LIMIT",
-    default_value = "200",
-    help = "allowed limit for otel attributes"
-)]
-pub otel_attributes_allowed_limit: usize,
-}
+#[derive(Parser, Debug, Default)]
+pub struct Options {
+    // … other fields …
+    #[arg(
+        long,
+        env = "P_OTEL_ATTRIBUTES_ALLOWED_LIMIT",
+        default_value_t = 200,
+        value_parser = clap::value_parser!(usize).range(1..),
+        help = "Maximum number of OTEL attributes allowed per event"
+    )]
+    pub otel_attributes_allowed_limit: usize,
+}
```

@coderabbitai (bot) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/option.rs (3)

177-190: Implement a constant for the attribute limit.

The validation function looks well-structured and follows the pattern of other validation functions in the module. However, the maximum attribute limit (200) is hardcoded in both the range check and the error message. Consider defining this as a constant at the module level to improve maintainability and ensure consistency if this limit needs to be changed in the future.

```diff
 pub mod validation {
     use std::{
         env, io,
         net::ToSocketAddrs,
         path::{Path, PathBuf},
     };

     use path_clean::PathClean;

     use super::{Compression, Mode};
+
+    // Maximum allowed OpenTelemetry attributes per event
+    const MAX_OTEL_ATTRIBUTES: usize = 200;

     // ... existing code ...

     pub fn validate_otel_attributes_allowed_limit(s: &str) -> Result<usize, String> {
         if let Ok(size) = s.parse::<usize>() {
-            if (1..=200).contains(&size) {
+            if (1..=MAX_OTEL_ATTRIBUTES).contains(&size) {
                 Ok(size)
             } else {
                 Err(format!(
                     "Invalid value for size. It should be between 1 and {}",
-                    200
+                    MAX_OTEL_ATTRIBUTES
                 ))
             }
         } else {
             Err("Invalid value for size. It should be given as integer value".to_string())
         }
     }
 }
```

188-188: Improve error message clarity.

The error message about invalid value is somewhat generic. Consider making it more specific to the OpenTelemetry attribute context.

-            Err("Invalid value for size. It should be given as integer value".to_string())
+            Err("Invalid value for OTEL_ATTRIBUTES_ALLOWED_LIMIT. It should be an integer value".to_string())

182-186: Standardize error message format.

The error message format differs slightly from other validation functions in this module. For consistency, consider simplifying the format pattern.

```diff
-                Err(format!(
-                    "Invalid value for size. It should be between 1 and {}",
-                    200
-                ))
+                Err(format!("Invalid value for OTEL_ATTRIBUTES_ALLOWED_LIMIT. It should be between 1 and {}", 200))
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 038df7c and a11790c.

📒 Files selected for processing (2)
  • src/cli.rs (1 hunks)
  • src/option.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cli.rs
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: CLAAssistant

add check for fields count in any dataset
reject event if fields count exceeds `P_DATASET_FIELDS_ALLOWED_LIMIT`
default value is set to 250
@nikhilsinhaparseable nikhilsinhaparseable changed the title remove other_attributes from otel logs/traces/metrics reject event if fields count exceed 250 May 13, 2025
@coderabbitai (bot) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9045f66 and df59fde.

📒 Files selected for processing (3)
  • src/handlers/http/ingest.rs (2 hunks)
  • src/handlers/http/modal/utils/ingest_utils.rs (2 hunks)
  • src/option.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/option.rs
  • src/handlers/http/ingest.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
src/validator.rs (1)
  • stream_name (33-68)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
🔇 Additional comments (3)
src/handlers/http/modal/utils/ingest_utils.rs (3)

50-51: Well defined constant with good naming.

Adding a named constant for the default limit value improves code readability and maintainability.


59-60: Good placement of validation check.

Performing the validation early in the ingestion process prevents wasteful processing of data that would ultimately be rejected. This is an efficient approach.


230-242: Good error handling with specific error type and detailed context.

The error case is well handled with:

  • Clear error logging that provides actionable guidance
  • A specific error type that includes all relevant context (stream name, field count, and limit)
  • Using the configured limit from options rather than the hardcoded constant

This makes both debugging and user experience better when limits are exceeded.

@coderabbitai (bot) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)

213-244: LGTM: Well-implemented field count validation.

The implementation correctly:

  • Retrieves the current field count from the schema
  • Sets a warning threshold at 80% of the allowed limit
  • Provides informative warning and error messages
  • Returns appropriate errors when the limit is exceeded

One minor suggestion: consider extracting the 0.8 threshold calculation to a named constant for better readability and maintainability.

```diff
+ // Percentage of allowed limit that triggers a warning
+ const DATASET_FIELDS_WARN_THRESHOLD_PERCENT: f64 = 0.8;

  fn verify_dataset_fields_count(stream_name: &str) -> Result<(), PostError> {
      let fields_count = PARSEABLE
          .get_stream(stream_name)?
          .get_schema()
          .fields()
          .len();
-     let dataset_fields_warn_threshold = 0.8 * DATASET_FIELDS_ALLOWED_LIMIT as f64;
+     let dataset_fields_warn_threshold = DATASET_FIELDS_WARN_THRESHOLD_PERCENT * PARSEABLE.options.dataset_fields_allowed_limit as f64;
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df59fde and e46ee5f.

📒 Files selected for processing (1)
  • src/handlers/http/modal/utils/ingest_utils.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (2)
src/handlers/http/modal/utils/ingest_utils.rs (2)

50-51: LGTM: Clear constant definition for field limit.

The constant DATASET_FIELDS_ALLOWED_LIMIT is well-defined with a reasonable value of 250, matching the PR objective of rejecting events that exceed this limit.


59-61: LGTM: Early validation of field count.

Good placement of the validation call at the beginning of the ingestion process. This ensures that field count is verified before any processing occurs, preventing unnecessary work on events that would eventually be rejected.

```diff
@@ -467,6 +467,8 @@ pub enum PostError {
     KnownFormat(#[from] known_schema::Error),
     #[error("Ingestion is not allowed to stream {0} as it is already associated with a different OTEL format")]
     IncorrectLogFormat(String),
+    #[error("Ingestion has been stoppped for dataset {0} as fields count {1} exceeds the allowed limit of {2}, Please create a new dataset.")]
+    FieldsLimitExceeded(String, usize, usize),
```

Suggested change

```diff
-    FieldsLimitExceeded(String, usize, usize),
+    FieldsCountLimitExceeded(String, usize, usize),
```

```diff
@@ -495,6 +497,7 @@ impl actix_web::ResponseError for PostError {
             PostError::MissingTimePartition(_) => StatusCode::BAD_REQUEST,
             PostError::KnownFormat(_) => StatusCode::BAD_REQUEST,
             PostError::IncorrectLogFormat(_) => StatusCode::BAD_REQUEST,
+            PostError::FieldsLimitExceeded(_, _, _) => StatusCode::BAD_REQUEST,
```

Suggested change

```diff
-            PostError::FieldsLimitExceeded(_, _, _) => StatusCode::BAD_REQUEST,
+            PostError::FieldsCountLimitExceeded(_, _, _) => StatusCode::BAD_REQUEST,
```

```rust
                fields_count,
                PARSEABLE.options.dataset_fields_allowed_limit);
            // Return an error if the fields count exceeds the limit
            return Err(PostError::FieldsLimitExceeded(
```

Suggested change

```diff
-            return Err(PostError::FieldsLimitExceeded(
+            return Err(PostError::FieldsCountLimitExceeded(
```

src/cli.rs Outdated

```rust
    #[arg(
        long,
        env = "P_DATASET_FIELDS_ALLOWED_LIMIT",
```

Suggested change

```diff
-        env = "P_DATASET_FIELDS_ALLOWED_LIMIT",
+        env = "P_DATASET_FIELD_COUNT_LIMIT",
```

"Fields count {0} for dataset {1} has exceeded the warning threshold of {2} fields, Parseable recommends creating a new dataset.",
fields_count,
stream_name,
dataset_fields_warn_threshold);

Suggested change

```diff
-        dataset_fields_warn_threshold);
+        stream_name,
+        fields_count,
+        PARSEABLE.options.dataset_fields_allowed_limit);
```

```rust
        .get_schema()
        .fields()
        .len();
    let dataset_fields_warn_threshold = 0.8 * DATASET_FIELDS_ALLOWED_LIMIT as f64;
```

Suggested change

```diff
-    let dataset_fields_warn_threshold = 0.8 * DATASET_FIELDS_ALLOWED_LIMIT as f64;
+    let dataset_fields_warn_threshold = 0.8 * PARSEABLE.options.dataset_fields_allowed_limit as f64;
```

src/option.rs Outdated
```rust
            Ok(size)
        } else {
            Err(format!(
                "Invalid value for P_DATASET_FIELDS_ALLOWED_LIMIT. It should be between 1 and {}",
```

Suggested change

```diff
-                "Invalid value for P_DATASET_FIELDS_ALLOWED_LIMIT. It should be between 1 and {}",
+                "Invalid value for P_DATASET_FIELD_COUNT_LIMIT. It should be between 1 and {}",
```

src/option.rs Outdated
```rust
            ))
        }
    } else {
        Err("Invalid value for P_DATASET_FIELDS_ALLOWED_LIMIT. It should be given as integer value".to_string())
```

Suggested change

```diff
-        Err("Invalid value for P_DATASET_FIELDS_ALLOWED_LIMIT. It should be given as integer value".to_string())
+        Err("Invalid value for P_DATASET_FIELD_COUNT_LIMIT. It should be given as integer value".to_string())
```

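Putting the renamed pieces together, the validator in `src/option.rs` plausibly ends up like this (a reconstruction from the excerpts above; the upper-bound constant is an assumption for illustration, not confirmed by the diff):

```rust
// Assumed ceiling for the configurable limit; the real bound may differ.
const DATASET_FIELDS_MAX: usize = 1000;

pub fn validate_dataset_fields_allowed_limit(s: &str) -> Result<usize, String> {
    match s.parse::<usize>() {
        Ok(size) if (1..=DATASET_FIELDS_MAX).contains(&size) => Ok(size),
        Ok(_) => Err(format!(
            "Invalid value for P_DATASET_FIELD_COUNT_LIMIT. It should be between 1 and {DATASET_FIELDS_MAX}"
        )),
        Err(_) => Err(
            "Invalid value for P_DATASET_FIELD_COUNT_LIMIT. It should be given as integer value"
                .to_string(),
        ),
    }
}

fn main() {
    assert_eq!(validate_dataset_fields_allowed_limit("250"), Ok(250));
    assert!(validate_dataset_fields_allowed_limit("0").is_err());
    assert!(validate_dataset_fields_allowed_limit("abc").is_err());
}
```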
```diff
-use super::otel_utils::merge_attributes_in_json;
-
-pub const OTEL_LOG_KNOWN_FIELD_LIST: [&str; 6] = [
+pub const OTEL_LOG_KNOWN_FIELD_LIST: [&str; 16] = [
```
Member:

What is the use of this list now? We're defaulting to separate columns, right?

Contributor Author:

this list was initially maintained to store the known field list along with the known log format name in stream info, with the idea that the UI could use the fields list to apply quick filters

Member:

Identifying useful columns is difficult; the chances of getting it wrong are high. In UX we're working on a way that lets users decide what is important to them.

Contributor Author:

we can create an issue and work on it in a separate PR; a similar change needs to be made wherever we detect the schema and add fields to stream_info, outside of OTel

nikhilsinhaparseable and others added 2 commits May 14, 2025 17:44
@nikhilsinhaparseable (Contributor Author) commented:

@nitisht all the above comments are incorporated in the latest commit

@coderabbitai (bot) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)

211-240: Implementation looks solid but contains redundant comment.

The function correctly implements the field count validation with appropriate thresholds and error handling. It gets the current field count, checks against a warning threshold (80% of the limit), and enforces the hard limit if exceeded.

However, there's a redundant comment at line 236-237 that repeats what the code already clearly shows at line 230-238.

```diff
     if fields_count > PARSEABLE.options.dataset_fields_allowed_limit {
         let error = PostError::FieldsCountLimitExceeded(
             stream_name.to_string(),
             fields_count,
             PARSEABLE.options.dataset_fields_allowed_limit,
         );
         tracing::error!("{}", error);
-        // Return an error if the fields count exceeds the limit
         return Err(error);
     }
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb1d328 and 5ff122a.

📒 Files selected for processing (4)
  • src/cli.rs (2 hunks)
  • src/handlers/http/ingest.rs (2 hunks)
  • src/handlers/http/modal/utils/ingest_utils.rs (2 hunks)
  • src/option.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/cli.rs
  • src/option.rs
  • src/handlers/http/ingest.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
src/validator.rs (1)
  • stream_name (33-68)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
🔇 Additional comments (2)
src/handlers/http/modal/utils/ingest_utils.rs (2)

57-59: Verification function well integrated.

The placement of the verify_dataset_fields_count call at the beginning of the flatten_and_push_logs function ensures early validation of field counts before any processing begins, aligning with the PR objective.


229-238: Error handling looks good.

The error creation and return logic is well-implemented. The PostError::FieldsCountLimitExceeded variant is properly constructed with all necessary context (stream name, actual count, and limit), and the error is logged before being returned.

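Stitching the diff fragments above together, the final check behaves roughly as follows. A self-contained analogue (a stub `Options` stands in for the project's `PARSEABLE.options`, `eprintln!` for `tracing::warn!`, and the field count is passed in rather than read from the stream schema):

```rust
#[derive(Debug)]
enum PostError {
    FieldsCountLimitExceeded(String, usize, usize),
}

struct Options {
    dataset_fields_allowed_limit: usize,
}

fn verify_dataset_fields_count(
    stream_name: &str,
    fields_count: usize,
    options: &Options,
) -> Result<(), PostError> {
    let limit = options.dataset_fields_allowed_limit;
    // Warn once the dataset crosses 80% of the configured limit.
    let warn_threshold = 0.8 * limit as f64;
    if fields_count > warn_threshold as usize {
        eprintln!(
            "Dataset {stream_name} has {fields_count} fields, which exceeds the warning \
             threshold of {warn_threshold}. Ingestion will not be possible after reaching \
             {limit} fields. We recommend creating a new dataset."
        );
    }
    // Hard-reject the event once the limit itself is exceeded.
    if fields_count > limit {
        return Err(PostError::FieldsCountLimitExceeded(
            stream_name.to_string(),
            fields_count,
            limit,
        ));
    }
    Ok(())
}

fn main() {
    let options = Options { dataset_fields_allowed_limit: 250 };
    assert!(verify_dataset_fields_count("demo", 210, &options).is_ok()); // warns only
    assert!(verify_dataset_fields_count("demo", 251, &options).is_err()); // rejected
}
```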
nitisht previously approved these changes May 14, 2025
```rust
    // Check if the fields count exceeds the warn threshold
    if fields_count > dataset_fields_warn_threshold as usize {
        tracing::warn!(
            "Total fields in dataset {0} has reached the warning threshold of {1}. Ingestion will not be possible after reaching {2} fields. We recommend creating a new dataset.",
```

Suggested change

```diff
-            "Total fields in dataset {0} has reached the warning threshold of {1}. Ingestion will not be possible after reaching {2} fields. We recommend creating a new dataset.",
+            "Dataset {0} has {1} fields, which exceeds the warning threshold of {2}. Ingestion will not be possible after reaching {3} fields. We recommend creating a new dataset.",
```

Member: Also need to add the field count in args below.

@nitisht (Member) left a comment: one minor comment.

@nitisht nitisht merged commit f240dde into parseablehq:main May 14, 2025
14 checks passed
@nikhilsinhaparseable nikhilsinhaparseable deleted the remove-other-attributes branch May 15, 2025 03:36
Successfully merging this pull request may close these issues:

- Log data format change between 2.1.0 and 2.2.0