reject event if fields count exceed 250 #1311
Conversation
- keep all attributes as individual columns in the ingested event
- expose env `P_OTEL_ATTRIBUTES_ALLOWED_LIMIT` to configure the allowed limit for attributes count
- if attributes count in flattened event > the allowed limit, log the error and reject the event

Fixes: parseablehq#1310
## Walkthrough
The changes add a new CLI and environment option to limit dataset fields count, enforce this limit during ingestion with a new error variant, and simplify OpenTelemetry attribute handling by removing intermediate "other_attributes" maps. Known otel field lists are expanded, and attribute insertion is streamlined without separate merging.
## Changes
| File(s) | Change Summary |
|-----------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `src/cli.rs`, `src/option.rs` | Added `dataset_fields_allowed_limit` option with CLI flag and env var, default 250, validated by a new parser function. |
| `src/handlers/http/ingest.rs` | Added `FieldsCountLimitExceeded` variant to `PostError` enum and mapped it to HTTP 400 Bad Request. |
| `src/handlers/http/modal/utils/ingest_utils.rs` | Added pre-ingestion check in `flatten_and_push_logs` to fetch stream schema and enforce dataset fields count limit, returning `FieldsCountLimitExceeded` error if exceeded. |
| `src/otel/logs.rs`, `src/otel/metrics.rs`, `src/otel/traces.rs` | Removed intermediate `other_attributes` maps; inserted attributes directly into main JSON maps; expanded known field lists; renamed some schema URL keys; simplified flattening and attribute insertion logic without changing core functionality. |
| `src/otel/otel_utils.rs` | Removed constants and all logic related to attribute segregation and merging of `other_attributes`; simplified `flatten_attributes` and `insert_attributes` to handle attributes in a single flat map; removed multiple helper functions related to attribute merging and serialization. |
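As a rough illustration of the pre-ingestion check summarized above, here is a minimal self-contained sketch; `Options`, `PostError`, and the function shape are simplified stand-ins for the PR's actual types, not its exact API:

```rust
// Hypothetical sketch of the pre-ingestion field-count guard; names are
// stand-ins for the PR's real types.
struct Options {
    dataset_fields_allowed_limit: usize, // default 250 in this PR
}

#[derive(Debug)]
enum PostError {
    // (stream name, current field count, configured limit)
    FieldsCountLimitExceeded(String, usize, usize),
}

fn verify_dataset_fields_count(
    stream_name: &str,
    current_fields: usize,
    options: &Options,
) -> Result<(), PostError> {
    // Reject the event outright once the schema exceeds the configured cap.
    if current_fields > options.dataset_fields_allowed_limit {
        return Err(PostError::FieldsCountLimitExceeded(
            stream_name.to_string(),
            current_fields,
            options.dataset_fields_allowed_limit,
        ));
    }
    Ok(())
}

fn main() {
    let opts = Options { dataset_fields_allowed_limit: 250 };
    assert!(verify_dataset_fields_count("demo", 10, &opts).is_ok());
    assert!(verify_dataset_fields_count("demo", 251, &opts).is_err());
}
```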
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant User
participant CLI
participant IngestHandler
participant StreamSchema
participant OtelFlattener
User->>CLI: Start with --dataset-field-count-limit or env var
CLI->>IngestHandler: Pass parsed options including limit
IngestHandler->>StreamSchema: Fetch schema for stream
StreamSchema-->>IngestHandler: Return field count
IngestHandler->>IngestHandler: Check if field count exceeds limit
alt Field count exceeded
IngestHandler-->>User: Return FieldsCountLimitExceeded error (HTTP 400)
else Within limit
IngestHandler->>OtelFlattener: flatten logs/traces/metrics with direct attribute insertion
OtelFlattener-->>IngestHandler: Return flattened data
IngestHandler-->>User: Success response
end
```
Actionable comments posted: 2
🔭 Outside diff range comments (2)
src/otel/traces.rs (2)
35-67: ⚠️ Potential issue

Mismatch between generated field names and `KNOWN_FIELD_LIST` will mis-classify built-in keys as “user attributes”

`OTEL_TRACES_KNOWN_FIELD_LIST` lists `scope_schema_url`/`resource_schema_url`, yet the code later inserts a plain `schema_url` key (see lines 97-102 & 129-133). Because `schema_url` is not in this list, the enforcement block on lines 139-156 will count it as an extra attribute, potentially rejecting otherwise valid events once the limit is low.

```diff
@@
-    "scope_schema_url",
+    "scope_schema_url",
+    "schema_url", // added so it matches the inserted key
@@
-    "resource_schema_url",
+    "resource_schema_url",
```

(Alternatively rename the inserted keys – see next comment – but the list and the insertions must agree.)
97-104: ⚠️ Potential issue

Inserted key should match naming convention (`scope_schema_url`/`resource_schema_url`)

If you prefer the prefixed style used elsewhere (`scope_metrics_schema_url`, `resource_metrics_schema_url`, …), update the insertion sites instead of expanding the constant. One possible patch:

```diff
-        span_json.insert(
-            "schema_url".to_string(),
-            Value::String(scope_span.schema_url.clone()),
-        );
+        span_json.insert(
+            "scope_schema_url".to_string(),
+            Value::String(scope_span.schema_url.clone()),
+        );
@@
-    resource_span_json.insert(
-        "schema_url".to_string(),
-        Value::String(record.schema_url.clone()),
-    );
+    resource_span_json.insert(
+        "resource_schema_url".to_string(),
+        Value::String(record.schema_url.clone()),
+    );
```

Either apply this or extend the constant as shown in the previous comment; keeping them inconsistent will trigger false-positive limit violations.

Also applies to: 129-133
🧹 Nitpick comments (7)
src/handlers/http/ingest.rs (1)
501-502
: HTTP status choice – consider 400 (Bad Request) instead of 417

417 Expectation Failed is rarely used and some proxies drop it. The violation is a client input problem, so 400 keeps semantics simple and interoperable:

```diff
-PostError::OtelError(_) => StatusCode::EXPECTATION_FAILED,
+PostError::OtelError(_) => StatusCode::BAD_REQUEST,
```

Not blocking, just food for thought.
src/otel/logs.rs (1)
180-198
: Attribute counting may over-count due to previous cloning

`attribute_count` is computed after merging all maps; cloning and then counting adds unnecessary heap allocations.

A micro-optimisation (avoids clone):

```diff
-    for resource_logs_json in &mut vec_resource_logs_json {
-        resource_logs_json.extend(resource_log_json.clone());
+    for mut resource_logs_json in vec_resource_logs_json {
+        resource_logs_json.extend(resource_log_json.clone());
```

(Not critical; optional.)
src/otel/traces.rs (1)
139-156
: Attribute-count check may run hot and performs an unnecessary `clone()`

- `.keys().filter(...).count()` allocates an iterator every time.
- Immediately afterwards the entire `Map` is cloned only to be pushed into `vec_otel_json`.

A small optimisation keeps the code readable but avoids the extra allocation:

```diff
-        let attribute_count = resource_spans_json
-            .keys()
-            .filter(|key| !known_fields.contains(key.as_str()))
-            .count();
+        let attribute_count = resource_spans_json
+            .iter()
+            .filter(|(k, _)| !known_fields.contains(k.as_str()))
+            .count();
@@
-        vec_otel_json.push(Value::Object(resource_spans_json.clone()));
+        vec_otel_json.push(Value::Object(resource_spans_json));
```

(You then need to drop the `.clone()` call and move, not copy, the map.)

src/otel/metrics.rs (2)
35-71
: Keep the “known fields” list minimal and accurate

`OTEL_METRICS_KNOWN_FIELD_LIST` contains `resource_schema_url` & `scope_schema_url`, but those keys are never inserted in this module. Dead entries do no harm today, yet they make the intent unclear and risk future drift. Consider pruning unused names to keep the whitelist self-documenting.

No functional breakage, but tightening this list helps prevent silent mismatches like the one flagged in traces.
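To make the drift risk concrete, a small illustrative sketch (field names invented, not the module's real list) of how such a whitelist feeds the attribute count — any key missing from the list is treated as a user attribute:

```rust
use serde_json::{json, Map, Value};

// Illustrative whitelist; the real OTEL_METRICS_KNOWN_FIELD_LIST is longer.
const KNOWN_FIELDS: [&str; 3] = ["metric_name", "metric_unit", "time_unix_nano"];

// Count keys that are not on the whitelist; a stale or missing entry here
// directly inflates this number and can trip the limit check.
fn user_attribute_count(record: &Map<String, Value>) -> usize {
    record
        .keys()
        .filter(|k| !KNOWN_FIELDS.contains(&k.as_str()))
        .count()
}

fn main() {
    let record = json!({
        "metric_name": "requests_total",
        "metric_unit": "1",
        "service.name": "checkout", // user attribute
    });
    if let Value::Object(map) = record {
        assert_eq!(user_attribute_count(&map), 1);
    }
}
```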
568-586
: Micro-optimisation: avoid re-scanning the entire map for every push

Just like in `traces.rs`, you can compute `attribute_count` before merging the map into `vec_otel_json`, and you can move the map instead of cloning it to cut memory traffic. Same patch pattern as previously suggested.

src/otel/otel_utils.rs (2)
152-162
: Accept slices instead of `&Vec` for better API ergonomics

`flatten_attributes` (and callers such as `insert_attributes`) receive `&Vec<KeyValue>`. The function never needs a growable vector, only iteration, so `&[KeyValue]` is the idiomatic choice and frees callers from owning a `Vec`.

```diff
-pub fn flatten_attributes(attributes: &Vec<KeyValue>) -> Map<String, Value> {
+pub fn flatten_attributes(attributes: &[KeyValue]) -> Map<String, Value> {
@@
-    for attribute in attributes {
+    for attribute in attributes {
```

Apply the same change to `insert_attributes`:

```diff
-pub fn insert_attributes(map: &mut Map<String, Value>, attributes: &Vec<KeyValue>) {
+pub fn insert_attributes(map: &mut Map<String, Value>, attributes: &[KeyValue]) {
```

This is a non-breaking internal refactor and yields a slightly more flexible API surface.
197-201
: Consider deriving `Serialize` for `OtelError`

`OtelError` is forwarded to HTTP handlers; deriving `Serialize` (or at least `Display + IntoResponse`) would let you return structured JSON error bodies rather than plain strings, improving client experience.

No immediate action required, raising for future UX enhancement.
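For illustration only, a minimal serializable error enum along the suggested lines (variant and field names are assumptions, not the crate's actual definition):

```rust
use serde::Serialize;

// Sketch only: a serializable error type lets handlers emit structured JSON
// bodies instead of plain strings. Names here are assumptions.
#[derive(Debug, Serialize)]
#[serde(tag = "error", rename_all = "snake_case")]
enum OtelError {
    AttributesCountExceeded { allowed: usize, actual: usize },
}

fn main() {
    let err = OtelError::AttributesCountExceeded { allowed: 200, actual: 215 };
    // prints: {"error":"attributes_count_exceeded","allowed":200,"actual":215}
    println!("{}", serde_json::to_string(&err).unwrap());
}
```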
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- src/cli.rs (1 hunks)
- src/handlers/http/ingest.rs (3 hunks)
- src/handlers/http/modal/utils/ingest_utils.rs (1 hunks)
- src/otel/logs.rs (5 hunks)
- src/otel/metrics.rs (12 hunks)
- src/otel/otel_utils.rs (3 hunks)
- src/otel/traces.rs (9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
🔇 Additional comments (2)
src/handlers/http/ingest.rs (1)
471-473: Good addition of domain-specific error variant

`PostError::OtelError` cleanly wraps lower-level `OtelError` and allows ergonomic `?` propagation. Nice!

src/handlers/http/modal/utils/ingest_utils.rs (1)

66-88: Error propagation updated correctly

Switching to `Result`-returning flatteners with `?` keeps the function concise and readable. 👍
src/cli.rs
Outdated
```rust
    #[arg(
        long,
        env = "P_OTEL_ATTRIBUTES_ALLOWED_LIMIT",
        default_value = "200",
        help = "allowed limit for otel attributes"
    )]
    pub otel_attributes_allowed_limit: usize,
}
```
🛠️ Refactor suggestion
`otel_attributes_allowed_limit` default is silently `0` when `Options::default()` is used

`default_value = "200"` only affects values parsed from CLI/env with clap; it is ignored when the code calls `Options::default()` (several unit-tests do this). Because `#[derive(Default)]` zero-initialises primitive fields, the limit becomes `0`, causing every OTEL event with ≥ 1 attribute to be rejected in test/bench environments.
Recommended fix (any one of):
```diff
-#[derive(Parser, Debug, Default)]
+#[derive(Parser, Debug)]
```
and remove all direct `Options::default()` calls, or
```diff
 #[derive(Parser, Debug, Default)]
 pub struct Options {
 @@
-    #[arg(
-        long,
-        env = "P_OTEL_ATTRIBUTES_ALLOWED_LIMIT",
-        default_value = "200",
-        help = "allowed limit for otel attributes"
-    )]
-    pub otel_attributes_allowed_limit: usize,
+    #[arg(
+        long,
+        env = "P_OTEL_ATTRIBUTES_ALLOWED_LIMIT",
+        default_value_t = 200,
+        value_parser = clap::value_parser!(usize).range(1..),
+        help = "Maximum number of OTEL attributes allowed per event"
+    )]
+    pub otel_attributes_allowed_limit: usize,
 }
```
This guarantees `200` either via CLI or via `Default` and prevents accidental “blocking everything”.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-#[arg(
-    long,
-    env = "P_OTEL_ATTRIBUTES_ALLOWED_LIMIT",
-    default_value = "200",
-    help = "allowed limit for otel attributes"
-)]
-pub otel_attributes_allowed_limit: usize,
-}
+#[derive(Parser, Debug, Default)]
+pub struct Options {
+    // … other fields …
+    #[arg(
+        long,
+        env = "P_OTEL_ATTRIBUTES_ALLOWED_LIMIT",
+        default_value_t = 200,
+        value_parser = clap::value_parser!(usize).range(1..),
+        help = "Maximum number of OTEL attributes allowed per event"
+    )]
+    pub otel_attributes_allowed_limit: usize,
+}
```
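A standalone reproduction of the pitfall (not from the PR) showing why `#[derive(Default)]` alone yields `0` for the limit:

```rust
// Minimal repro: #[derive(Default)] zero-initialises primitives, so any code
// path that builds Options::default() bypasses clap's default_value entirely.
#[derive(Debug, Default)]
struct Options {
    otel_attributes_allowed_limit: usize,
}

fn main() {
    let opts = Options::default();
    // clap's default_value = "200" never ran here; the field is 0.
    assert_eq!(opts.otel_attributes_allowed_limit, 0);
}
```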
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/option.rs (3)
177-190
: Implement a constant for the attribute limit.

The validation function looks well-structured and follows the pattern of other validation functions in the module. However, the maximum attribute limit (200) is hardcoded in both the range check and the error message. Consider defining this as a constant at the module level to improve maintainability and ensure consistency if this limit needs to be changed in the future.

```diff
 pub mod validation {
     use std::{
         env, io,
         net::ToSocketAddrs,
         path::{Path, PathBuf},
     };

     use path_clean::PathClean;

     use super::{Compression, Mode};
+
+    // Maximum allowed OpenTelemetry attributes per event
+    const MAX_OTEL_ATTRIBUTES: usize = 200;

     // ... existing code ...

     pub fn validate_otel_attributes_allowed_limit(s: &str) -> Result<usize, String> {
         if let Ok(size) = s.parse::<usize>() {
-            if (1..=200).contains(&size) {
+            if (1..=MAX_OTEL_ATTRIBUTES).contains(&size) {
                 Ok(size)
             } else {
                 Err(format!(
                     "Invalid value for size. It should be between 1 and {}",
-                    200
+                    MAX_OTEL_ATTRIBUTES
                 ))
             }
         } else {
             Err("Invalid value for size. It should be given as integer value".to_string())
         }
     }
 }
```
188-188
: Improve error message clarity.

The error message about invalid value is somewhat generic. Consider making it more specific to the OpenTelemetry attribute context.

```diff
-            Err("Invalid value for size. It should be given as integer value".to_string())
+            Err("Invalid value for OTEL_ATTRIBUTES_ALLOWED_LIMIT. It should be an integer value".to_string())
```
182-186
: Standardize error message format.

The error message format differs slightly from other validation functions in this module. For consistency, consider simplifying the format pattern.

```diff
-                Err(format!(
-                    "Invalid value for size. It should be between 1 and {}",
-                    200
-                ))
+                Err(format!("Invalid value for OTEL_ATTRIBUTES_ALLOWED_LIMIT. It should be between 1 and {}", 200))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/cli.rs (1 hunks)
- src/option.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cli.rs
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: CLAAssistant
- add check for fields count in any dataset
- reject event if fields count exceeds `P_DATASET_FIELDS_ALLOWED_LIMIT`
- default value is set to 250
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/handlers/http/ingest.rs (2 hunks)
- src/handlers/http/modal/utils/ingest_utils.rs (2 hunks)
- src/option.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/option.rs
- src/handlers/http/ingest.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
src/validator.rs (1)
- stream_name (33-68)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
🔇 Additional comments (3)
src/handlers/http/modal/utils/ingest_utils.rs (3)
50-51
: Well defined constant with good naming.

Adding a named constant for the default limit value improves code readability and maintainability.
59-60
: Good placement of validation check.

Performing the validation early in the ingestion process prevents wasteful processing of data that would ultimately be rejected. This is an efficient approach.
230-242
: Good error handling with specific error type and detailed context.

The error case is well handled with:
- Clear error logging that provides actionable guidance
- A specific error type that includes all relevant context (stream name, field count, and limit)
- Using the configured limit from options rather than the hardcoded constant
This makes both debugging and user experience better when limits are exceeded.
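A rough sketch of the early-exit shape being praised here (hypothetical signatures; the real code fetches the schema via the `PARSEABLE` handle):

```rust
// Hypothetical outline of the early-exit ingestion path; types simplified.
fn flatten_and_push_logs(stream_name: &str, payload: &[u8]) -> Result<(), String> {
    // 1. Validate the field count first, before any flattening work.
    verify_dataset_fields_count(stream_name)?;
    // 2. Only then spend CPU on flattening and pushing the event.
    let _flattened = flatten(payload)?;
    Ok(())
}

fn verify_dataset_fields_count(_stream_name: &str) -> Result<(), String> {
    Ok(()) // real check compares the schema field count with the configured limit
}

fn flatten(_payload: &[u8]) -> Result<Vec<u8>, String> {
    Ok(Vec::new())
}
```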
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
213-244
: LGTM: Well-implemented field count validation.

The implementation correctly:
- Retrieves the current field count from the schema
- Sets a warning threshold at 80% of the allowed limit
- Provides informative warning and error messages
- Returns appropriate errors when the limit is exceeded
One minor suggestion: consider extracting the 0.8 threshold calculation to a named constant for better readability and maintainability.
```diff
+// Percentage of allowed limit that triggers a warning
+const DATASET_FIELDS_WARN_THRESHOLD_PERCENT: f64 = 0.8;

 fn verify_dataset_fields_count(stream_name: &str) -> Result<(), PostError> {
     let fields_count = PARSEABLE
         .get_stream(stream_name)?
         .get_schema()
         .fields()
         .len();
-    let dataset_fields_warn_threshold = 0.8 * DATASET_FIELDS_ALLOWED_LIMIT as f64;
+    let dataset_fields_warn_threshold = DATASET_FIELDS_WARN_THRESHOLD_PERCENT
+        * PARSEABLE.options.dataset_fields_allowed_limit as f64;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/handlers/http/modal/utils/ingest_utils.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (2)
src/handlers/http/modal/utils/ingest_utils.rs (2)
50-51
: LGTM: Clear constant definition for field limit.

The constant `DATASET_FIELDS_ALLOWED_LIMIT` is well-defined with a reasonable value of 250, matching the PR objective of rejecting events that exceed this limit.
59-61
: LGTM: Early validation of field count.

Good placement of the validation call at the beginning of the ingestion process. This ensures that field count is verified before any processing occurs, preventing unnecessary work on events that would eventually be rejected.
src/handlers/http/ingest.rs
Outdated
```diff
@@ -467,6 +467,8 @@ pub enum PostError {
     KnownFormat(#[from] known_schema::Error),
     #[error("Ingestion is not allowed to stream {0} as it is already associated with a different OTEL format")]
     IncorrectLogFormat(String),
+    #[error("Ingestion has been stoppped for dataset {0} as fields count {1} exceeds the allowed limit of {2}, Please create a new dataset.")]
+    FieldsLimitExceeded(String, usize, usize),
```
Suggested change:

```diff
-    FieldsLimitExceeded(String, usize, usize),
+    FieldsCountLimitExceeded(String, usize, usize),
```
src/handlers/http/ingest.rs
Outdated
```diff
@@ -495,6 +497,7 @@ impl actix_web::ResponseError for PostError {
     PostError::MissingTimePartition(_) => StatusCode::BAD_REQUEST,
     PostError::KnownFormat(_) => StatusCode::BAD_REQUEST,
     PostError::IncorrectLogFormat(_) => StatusCode::BAD_REQUEST,
+    PostError::FieldsLimitExceeded(_, _, _) => StatusCode::BAD_REQUEST,
```
Suggested change:

```diff
-    PostError::FieldsLimitExceeded(_, _, _) => StatusCode::BAD_REQUEST,
+    PostError::FieldsCountLimitExceeded(_, _, _) => StatusCode::BAD_REQUEST,
```
```rust
                fields_count,
                PARSEABLE.options.dataset_fields_allowed_limit);
            // Return an error if the fields count exceeds the limit
            return Err(PostError::FieldsLimitExceeded(
```
Suggested change:

```diff
-            return Err(PostError::FieldsLimitExceeded(
+            return Err(PostError::FieldsCountLimitExceeded(
```
src/cli.rs
Outdated
```rust
    #[arg(
        long,
        env = "P_DATASET_FIELDS_ALLOWED_LIMIT",
```
env = "P_DATASET_FIELDS_ALLOWED_LIMIT", | |
env = "P_DATASET_FIELD_COUNT_LIMIT", |
"Fields count {0} for dataset {1} has exceeded the warning threshold of {2} fields, Parseable recommends creating a new dataset.", | ||
fields_count, | ||
stream_name, | ||
dataset_fields_warn_threshold); |
Suggested change:

```diff
-                dataset_fields_warn_threshold);
+                stream_name,
+                fields_count,
+                PARSEABLE.options.dataset_fields_allowed_limit);
```
```rust
        .get_schema()
        .fields()
        .len();
    let dataset_fields_warn_threshold = 0.8 * DATASET_FIELDS_ALLOWED_LIMIT as f64;
```
Suggested change:

```diff
-    let dataset_fields_warn_threshold = 0.8 * DATASET_FIELDS_ALLOWED_LIMIT as f64;
+    let dataset_fields_warn_threshold = 0.8 * PARSEABLE.options.dataset_fields_allowed_limit as f64;
```
src/option.rs
Outdated
```rust
                Ok(size)
            } else {
                Err(format!(
                    "Invalid value for P_DATASET_FIELDS_ALLOWED_LIMIT. It should be between 1 and {}",
```
"Invalid value for P_DATASET_FIELDS_ALLOWED_LIMIT. It should be between 1 and {}", | |
"Invalid value for P_DATASET_FIELD_COUNT_LIMIT. It should be between 1 and {}", |
src/option.rs
Outdated
```rust
                ))
            }
        } else {
            Err("Invalid value for P_DATASET_FIELDS_ALLOWED_LIMIT. It should be given as integer value".to_string())
```
Err("Invalid value for P_DATASET_FIELDS_ALLOWED_LIMIT. It should be given as integer value".to_string()) | |
Err("Invalid value for P_DATASET_FIELD_COUNT_LIMIT. It should be given as integer value".to_string()) |
```diff
-use super::otel_utils::merge_attributes_in_json;
-
-pub const OTEL_LOG_KNOWN_FIELD_LIST: [&str; 6] = [
+pub const OTEL_LOG_KNOWN_FIELD_LIST: [&str; 16] = [
```
What is the use of this list now? We're defaulting to separate columns, right?
This list was initially maintained to store the known field list along with the known log format name in the stream info, with the idea that the UI can use the fields list to apply quick filters.
Identifying useful columns is difficult; the chances of getting it wrong are high. In UX we're working on a way that lets users decide what is important for them.
We can create an issue and work on it in a separate PR; a similar change needs to be made wherever we detect schema and add fields to stream_info, beyond otel.
Co-authored-by: Nitish Tiwari <[email protected]> Signed-off-by: Nikhil Sinha <[email protected]>
@nitisht all the above comments are incorporated in the latest commit
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
211-240
: Implementation looks solid but contains redundant comment.

The function correctly implements the field count validation with appropriate thresholds and error handling. It gets the current field count, checks against a warning threshold (80% of the limit), and enforces the hard limit if exceeded.
However, there's a redundant comment at line 236-237 that repeats what the code already clearly shows at line 230-238.
```diff
     if fields_count > PARSEABLE.options.dataset_fields_allowed_limit {
         let error = PostError::FieldsCountLimitExceeded(
             stream_name.to_string(),
             fields_count,
             PARSEABLE.options.dataset_fields_allowed_limit,
         );
         tracing::error!("{}", error);
-        // Return an error if the fields count exceeds the limit
         return Err(error);
     }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/cli.rs (2 hunks)
- src/handlers/http/ingest.rs (2 hunks)
- src/handlers/http/modal/utils/ingest_utils.rs (2 hunks)
- src/option.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/cli.rs
- src/option.rs
- src/handlers/http/ingest.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
src/validator.rs (1)
- stream_name (33-68)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
🔇 Additional comments (2)
src/handlers/http/modal/utils/ingest_utils.rs (2)
57-59: Verification function well integrated.

The placement of the `verify_dataset_fields_count` call at the beginning of the `flatten_and_push_logs` function ensures early validation of field counts before any processing begins, aligning with the PR objective.

229-238: Error handling looks good.

The error creation and return logic is well-implemented. The `PostError::FieldsCountLimitExceeded` variant is properly constructed with all necessary context (stream name, actual count, and limit), and the error is logged before being returned.
```rust
            // Check if the fields count exceeds the warn threshold
            if fields_count > dataset_fields_warn_threshold as usize {
                tracing::warn!(
                    "Total fields in dataset {0} has reached the warning threshold of {1}. Ingestion will not be possible after reaching {2} fields. We recommend creating a new dataset.",
```
"Total fields in dataset {0} has reached the warning threshold of {1}. Ingestion will not be possible after reaching {2} fields. We recommend creating a new dataset.", | |
"Dataset {0} has {1} fields, which exceeds the warning threshold of {2}. Ingestion will not be possible after reaching {3} fields. We recommend creating a new dataset.", |
Also need to add the field count in args below.
one minor comment.
- keep all attributes as individual columns in the ingested event
- add check for fields count in any dataset
- reject event if fields count exceeds `P_DATASET_FIELDS_ALLOWED_LIMIT`
- default value is set to 250

Fixes: #1310
Summary by CodeRabbit
- New Features
- Refactor
- Bug Fixes