Feat: Metastore #1424
Conversation
Walkthrough
Moves metadata I/O and listing from direct object storage into a new Metastore abstraction, adds Metastore/MetastoreObject traits and an ObjectStore-backed Metastore, wires an Arc<dyn Metastore> into Parseable, implements MetastoreObject for many domain types, and propagates MetastoreError handling across handlers, storage, catalog, migration, and query flows.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant HTTP as HTTP Handler
    participant MS as Metastore
    participant State as In-memory State
    Client->>HTTP: POST /alerts
    HTTP->>HTTP: validate & build AlertConfig
    HTTP->>MS: put_alert(AlertConfig)
    MS-->>HTTP: ok / MetastoreError (detail)
    alt success
        HTTP->>State: insert/update in-memory + schedule task
        HTTP-->>Client: 201 Created
    else metastore error
        HTTP-->>Client: 4xx/5xx JSON (MetastoreError.to_detail)
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Updater as catalog::update_snapshot
    participant MS as Metastore
    participant Worker as Snapshot Processor
    Updater->>MS: get_stream_json(stream, base?)
    MS-->>Updater: Bytes(JSON)
    Updater->>MS: get_manifest_path(stream, lower, upper)
    MS-->>Updater: manifest_path
    Updater->>MS: get_manifest(stream, lower, upper, Some(manifest_path))
    MS-->>Updater: Manifest / None
    Updater->>Worker: process partition groups (using retrieved Manifests)
    Worker->>MS: put_manifest(...), put_stream_json(...)
    MS-->>Worker: ack / MetastoreError
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~150 minutes
Possibly related PRs
Pre-merge checks: 2 passed, 1 inconclusive.
Actionable comments posted: 29
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (18)
src/handlers/http/modal/query/querier_logstream.rs (1)
69-107: Ensure Metastore Metadata Is Purged in Querier Delete Handler
In src/handlers/http/modal/query/querier_logstream.rs (lines 69–107), after deleting storage and local dirs but before notifying ingestors, invoke a `delete_stream` (or equivalent) async method on the `Metastore` to remove persistent metadata (stream.json, schemas, manifests). The current Metastore trait (src/metastore/metastore_traits.rs) only defines `initiate_connection` and `list_objects`, so introduce and implement a `delete_stream` API for atomic, idempotent cleanup.
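A minimal sketch of such a trait extension, for illustration only — the trait name, method, and error plumbing here are assumptions, not existing API:

```rust
// Hypothetical extension to the Metastore trait; all names are illustrative.
use async_trait::async_trait;

use crate::metastore::MetastoreError;

#[async_trait]
pub trait MetastoreStreamCleanup {
    /// Remove all persisted metadata for a stream (stream.json, schemas,
    /// manifests). Implementations should be idempotent so retries after a
    /// partial failure converge to "stream fully deleted".
    async fn delete_stream(&self, stream_name: &str) -> Result<(), MetastoreError>;
}
```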
src/handlers/http/modal/utils/rbac_utils.rs (1)
34-38: Writes also bypass Metastore here. Align writes with the new read path to support non-object-store metastores.
```diff
 pub async fn put_metadata(metadata: &StorageMetadata) -> Result<(), ObjectStorageError> {
-    storage::put_remote_metadata(metadata).await?;
-    storage::put_staging_metadata(metadata)?;
-    Ok(())
+    PARSEABLE
+        .metastore
+        .put_parseable_metadata(metadata)
+        .await
+        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))
 }
```
src/alerts/target.rs (1)
98-120: Delete ordering can leave dangling persisted state on failure. Currently removes from memory, then deletes from metastore. If delete_target() fails, the entry vanishes from memory but remains persisted.
```diff
-        let target = self
-            .target_configs
-            .write()
-            .await
-            .remove(target_id)
-            .ok_or(AlertError::InvalidTargetID(target_id.to_string()))?;
-        PARSEABLE.metastore.delete_target(&target).await?;
-        Ok(target)
+        // Fetch a clone first, persist delete, then remove from memory
+        let target = {
+            let map = self.target_configs.read().await;
+            map.get(target_id)
+                .cloned()
+                .ok_or(AlertError::InvalidTargetID(target_id.to_string()))?
+        };
+        PARSEABLE.metastore.delete_target(&target).await?;
+        let _ = self.target_configs.write().await.remove(target_id);
+        Ok(target)
```
src/handlers/http/modal/ingest_server.rs (1)
307-329: Prevent panic on missing token and reconcile all ingestor metadata
Avoid using `unwrap()` on the `"token"` field—this will panic if the key is absent or not a string. Instead, parse every metadata entry to collect its `"token"` string, return an error if any entry lacks a valid token or if multiple distinct tokens are found, then compare the computed `Basic` credential against that single, verified token.
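A rough sketch of that reconciliation, assuming the entries arrive as serde_json::Value objects; the function name and error type are made up for illustration:

```rust
use std::collections::HashSet;

// Collect the `token` from every metadata entry; fail on a missing or
// non-string token, or on disagreement between entries.
fn reconcile_token(entries: &[serde_json::Value]) -> Result<String, String> {
    let mut tokens = HashSet::new();
    for (i, entry) in entries.iter().enumerate() {
        let token = entry
            .get("token")
            .and_then(|t| t.as_str())
            .ok_or_else(|| format!("metadata entry {i} lacks a string `token` field"))?;
        tokens.insert(token.to_owned());
    }
    match tokens.len() {
        0 => Err("no metadata entries to verify against".to_owned()),
        1 => Ok(tokens.into_iter().next().unwrap()),
        n => Err(format!("{n} distinct tokens found; expected exactly one")),
    }
}
```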
src/prism/home/mod.rs (1)
339-343: Preserve MetastoreError semantics instead of wrapping into Anyhow. You already added `PrismHomeError::MetastoreError(#[from])`. Let `?` convert `MetastoreError` directly so status codes and JSON details are preserved.
```diff
-        .list_streams()
-        .await
-        .map_err(|e| PrismHomeError::Anyhow(anyhow::Error::new(e)))?
+        .list_streams()
+        .await?
```
src/handlers/http/logstream.rs (3)
89-95: Avoid panic: propagate Metastore errors instead of unwrap(). Using unwrap() will 500/panic the handler on metastore failures. Propagate as StreamError::MetastoreError.
Apply:
```diff
-    let res = PARSEABLE
-        .metastore
-        .list_streams()
-        .await
-        .unwrap()
+    let res = PARSEABLE
+        .metastore
+        .list_streams()
+        .await?
```
399-426: Hot-tier enable flow: fix the available_size math and flip the in-memory flag only after persistence succeeds.
- available_size should reflect remaining capacity (size - used).
- set_hot_tier(true) before persistence can leave memory inconsistent if downstream ops fail.
Apply:
```diff
-    stream.set_hot_tier(true);
     let Some(hot_tier_manager) = HotTierManager::global() else {
         return Err(StreamError::HotTierNotEnabled(stream_name));
     };
     let existing_hot_tier_used_size = hot_tier_manager
         .validate_hot_tier_size(&stream_name, hottier.size)
         .await?;
     hottier.used_size = existing_hot_tier_used_size;
-    hottier.available_size = hottier.size;
+    hottier.available_size = hottier.size.saturating_sub(existing_hot_tier_used_size);
     hottier.version = Some(CURRENT_HOT_TIER_VERSION.to_string());
     hot_tier_manager
         .put_hot_tier(&stream_name, &mut hottier)
         .await?;
     let mut stream_metadata: ObjectStoreFormat = serde_json::from_slice(
         &PARSEABLE
             .metastore
             .get_stream_json(&stream_name, false)
             .await?,
     )?;
     stream_metadata.hot_tier_enabled = true;
     PARSEABLE
         .metastore
         .put_stream_json(&stream_metadata, &stream_name)
         .await?;
+
+    // Flip in-memory flag only after all side effects succeed
+    stream.set_hot_tier(true);
```
452-481: On hot-tier delete, also flip the metastore flag hot_tier_enabled = false. Symmetry with put_stream_hot_tier prevents stale metadata.
Apply:
```diff
     hot_tier_manager.delete_hot_tier(&stream_name).await?;
+    // Reflect deletion in metastore metadata
+    let mut stream_metadata: ObjectStoreFormat = serde_json::from_slice(
+        &PARSEABLE
+            .metastore
+            .get_stream_json(&stream_name, false)
+            .await?,
+    )?;
+    stream_metadata.hot_tier_enabled = false;
+    PARSEABLE
+        .metastore
+        .put_stream_json(&stream_metadata, &stream_name)
+        .await?;
+
     Ok((
         format!("hot tier deleted for stream {stream_name}"),
         StatusCode::OK,
     ))
```
src/enterprise/utils.rs (1)
81-91: Don’t swallow metastore errors from get_all_stream_jsons. Swallowing Err hides real failures and yields partial results. Propagate as ObjectStorageError (MetastoreError) and continue.
```diff
-    let obs = PARSEABLE.metastore.get_all_stream_jsons(stream, None).await;
-    if let Ok(obs) = obs {
-        for ob in obs {
-            if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
-                let snapshot = object_store_format.snapshot;
-                for manifest in snapshot.manifest_list {
-                    merged_snapshot.manifest_list.push(manifest);
-                }
-            }
-        }
-    }
+    let obs = PARSEABLE
+        .metastore
+        .get_all_stream_jsons(stream, None)
+        .await
+        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e)))?;
+    for ob in obs {
+        if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
+            let snapshot = object_store_format.snapshot;
+            merged_snapshot.manifest_list.extend(snapshot.manifest_list);
+        }
+    }
```
src/catalog/mod.rs (2)
300-347: Fragile path check can duplicate manifests; always fetch via Metastore. Relying on `contains(manifest_path(""))` can skip updates for metastore-backed paths, causing duplicate manifests for the same partition. Remove the path heuristic and use Metastore as the single source of truth.
```diff
-    let manifest_file_name = manifest_path("").to_string();
-    let should_update = manifests[pos].manifest_path.contains(&manifest_file_name);
-
-    if should_update {
-        if let Some(mut manifest) = PARSEABLE
+    if let Some(mut manifest) = PARSEABLE
         .metastore
         .get_manifest(
             stream_name,
             manifests[pos].time_lower_bound,
             manifests[pos].time_upper_bound,
         )
         .await
         .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?
-        {
+    {
         // Update existing manifest
         for change in partition_changes {
             manifest.apply_change(change);
         }
         PARSEABLE
             .metastore
             .put_manifest(
                 &manifest,
                 stream_name,
                 manifests[pos].time_lower_bound,
                 manifests[pos].time_upper_bound,
             )
             .await
             .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
         manifests[pos].events_ingested = events_ingested;
         manifests[pos].ingestion_size = ingestion_size;
         manifests[pos].storage_size = storage_size;
         Ok(None)
-        } else {
-            // Manifest not found, create new one
-            create_manifest(
-                partition_lower,
-                partition_changes,
-                stream_name,
-                false,
-                meta.clone(),
-                events_ingested,
-                ingestion_size,
-                storage_size,
-            )
-            .await
-        }
-    } else {
-        // Create new manifest for different partition
-        create_manifest(
-            partition_lower,
-            partition_changes,
-            stream_name,
-            false,
-            ObjectStoreFormat::default(),
-            events_ingested,
-            ingestion_size,
-            storage_size,
-        )
-        .await
-    }
+    } else {
+        // Manifest not found, create new one
+        create_manifest(
+            partition_lower,
+            partition_changes,
+            stream_name,
+            false,
+            meta.clone(),
+            events_ingested,
+            ingestion_size,
+            storage_size,
+        )
+        .await
+    }
```
482-499: Inconsistent persistence: read from Metastore, write to object storage.
`remove_manifest_from_snapshot` loads stream JSON via Metastore but writes the snapshot via `storage.put_snapshot(...)`. This will diverge state between Metastore and object store.
```diff
-    storage.put_snapshot(stream_name, meta.snapshot).await?;
+    PARSEABLE
+        .metastore
+        .put_stream_json(&meta, stream_name)
+        .await
+        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
```
src/handlers/http/cluster/mod.rs (3)
510-522: Don't panic on malformed stream.json; return a typed error instead. Using expect(...) will crash the node on a single bad entry. Propagate SerdeError as StreamError to keep the cluster healthy.
```diff
-    let stream_metadata: ObjectStoreFormat =
-        serde_json::from_slice(&ob).expect("stream.json is valid json");
+    let stream_metadata: ObjectStoreFormat =
+        serde_json::from_slice(&ob).map_err(StreamError::SerdeError)?;
```
1065-1099: Type mismatch: QuerierStatus holds QuerierMetadata but you populate it with NodeMetadata. This won’t compile. Either change QuerierStatus.metadata to NodeMetadata everywhere or request QuerierMetadata from get_node_info. The latter is minimal.
```diff
-    let querier_metadata: Vec<NodeMetadata> = get_node_info(NodeType::Querier).await?;
+    let querier_metadata: Vec<QuerierMetadata> =
+        get_node_info::<QuerierMetadata>(NodeType::Querier).await?;
 ...
-    let liveness_results: Vec<(String, bool, NodeMetadata)> = stream::iter(querier_metadata)
+    let liveness_results: Vec<(String, bool, QuerierMetadata)> =
+        stream::iter(querier_metadata)
```
1016-1037: Invalid if-let chain; this won’t compile on stable. Rewrite to first unwrap the Result, then guard on is_empty().
```diff
-    let cluster_metrics = fetch_cluster_metrics().await;
-    if let Ok(metrics) = cluster_metrics
-        && !metrics.is_empty()
-    {
+    let cluster_metrics = fetch_cluster_metrics().await;
+    if let Ok(metrics) = cluster_metrics {
+        if metrics.is_empty() {
+            return Ok(());
+        }
         info!("Cluster metrics fetched successfully from all ingestors");
         if let Ok(metrics_bytes) = serde_json::to_vec(&metrics) {
```
src/alerts/mod.rs (1)
987-1079: Do not hold a write lock across awaits; also don’t swallow metastore errors.
`load()` acquires a write lock on `self.alerts` and then awaits multiple times (metastore calls, serde, sender.send). This can deadlock and blocks readers. In addition, `unwrap_or_default()` on `get_alerts()` silently hides metastore failures and can boot with zero alerts.
Refactor to:
- Fetch and migrate alerts without any lock.
- Acquire a short write lock only to insert.
- Propagate (or at least log) metastore errors.
Apply this diff:
```diff
@@
-    async fn load(&self) -> anyhow::Result<()> {
-        let mut map = self.alerts.write().await;
-
-        // Get alerts path and read raw bytes for migration handling
-        let raw_objects = PARSEABLE.metastore.get_alerts().await.unwrap_or_default();
+    async fn load(&self) -> anyhow::Result<()> {
+        // Read raw alerts from metastore; fail fast on storage issues
+        let raw_objects = PARSEABLE
+            .metastore
+            .get_alerts()
+            .await
+            .map_err(|e| anyhow::anyhow!("failed to read alerts from metastore: {e}"))?;
@@
-        // Create alert task iff alert's state is not paused
+        // Create alert task iff alert's state is not paused
         if alert.get_state().eq(&AlertState::Disabled) {
-            map.insert(*alert.get_id(), alert);
+            {
+                let mut map = self.alerts.write().await;
+                map.insert(*alert.get_id(), alert);
+            }
             continue;
         }
@@
-        map.insert(*alert.get_id(), alert);
+        {
+            let mut map = self.alerts.write().await;
+            map.insert(*alert.get_id(), alert);
+        }
```
src/storage/azure_blob.rs (1)
687-689: Fix absolute_url: pass &str, not RelativePath.
`object_store::path::Path::parse` expects `&str`. The current call likely won’t compile.
Apply this diff:
```diff
-    fn absolute_url(&self, prefix: &RelativePath) -> object_store::path::Path {
-        object_store::path::Path::parse(prefix).unwrap()
-    }
+    fn absolute_url(&self, prefix: &RelativePath) -> object_store::path::Path {
+        object_store::path::Path::parse(prefix.as_str()).unwrap()
+    }
```
src/storage/object_storage.rs (2)
143-156: Defensive parsing for date=… in filenames to avoid panics. Indexing [0]/[1] will panic on unexpected names. Parse safely and error out cleanly.
```diff
-    let mut file_date_part = filename.split('.').collect::<Vec<&str>>()[0];
-    file_date_part = file_date_part.split('=').collect::<Vec<&str>>()[1];
+    let file_date_part = filename
+        .split('.')
+        .next()
+        .and_then(|s| s.split_once('=').map(|(_, v)| v))
+        .ok_or_else(|| ObjectStorageError::Custom(format!("unexpected data file name: {filename}")))?;
```
989-1010: Avoid panic on Schema::try_merge; return a proper error and consider missing-base handling. unwrap() will crash the process on merge conflicts. Map errors and, ideally, handle “schema not found” by seeding with the incoming schema.
```diff
-    let stream_schema = PARSEABLE
-        .metastore
-        .get_schema(stream_name)
-        .await
-        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
-
-    let new_schema = Schema::try_merge(vec![
-        schema,
-        serde_json::from_slice::<Schema>(&stream_schema)?,
-    ])
-    .unwrap();
+    let stream_schema = PARSEABLE
+        .metastore
+        .get_schema(stream_name)
+        .await
+        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
+
+    let new_schema = Schema::try_merge(vec![
+        serde_json::from_slice::<Schema>(&stream_schema)?,
+        schema,
+    ])
+    .map_err(|e| ObjectStorageError::Custom(format!("Schema merge failed: {e}")))?;
```
Follow-up: If get_schema can return a not-found error, treat that case by writing the incoming schema directly instead of failing. Want a patch for that as well?
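One possible shape for that follow-up, sketched against the crate's types; the `is_not_found()` probe on the metastore error is a hypothetical helper, not existing API:

```rust
// Sketch only: seed the stream schema when none exists yet, merge otherwise.
async fn merge_or_seed_schema(
    stream_name: &str,
    incoming: Schema,
) -> Result<Schema, ObjectStorageError> {
    match PARSEABLE.metastore.get_schema(stream_name).await {
        Ok(bytes) => {
            let existing: Schema = serde_json::from_slice(&bytes)?;
            Schema::try_merge(vec![existing, incoming])
                .map_err(|e| ObjectStorageError::Custom(format!("Schema merge failed: {e}")))
        }
        // Hypothetical not-found check; adapt to the real error variants.
        Err(e) if e.is_not_found() => Ok(incoming),
        Err(e) => Err(ObjectStorageError::MetastoreError(Box::new(e.to_detail()))),
    }
}
```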
🧹 Nitpick comments (46)
Cargo.toml (1)
61-61: Relax erased-serde version pin
Use a caret range to allow patch fixes within 0.3.x (<0.4) while avoiding unintended 0.4.x upgrades with a potentially higher MSRV:
```diff
- erased-serde = "=0.3.16"
+ erased-serde = "^0.3.16" # equivalent to >=0.3.16, <0.4
```
0.3.16 is unyanked and this lets you pick up all subsequent 0.3 patches.
src/handlers/http/modal/query/querier_logstream.rs (1)
165-189: Good switch to metastore-backed retrieval; improve diagnostics on parse failures.
Using `PARSEABLE.metastore.get_all_stream_jsons` is the right direction. Minor nit: include an identifier (e.g., source node or object id) in the error to aid debugging when a particular stream.json is malformed.
Example:
```diff
- error!("Failed to parse stream metadata: {:?}", e);
+ error!("Failed to parse stream metadata for {}: {:?}", stream_name, e);
```
src/handlers/http/ingest.rs (1)
517-526: JSON error body for MetastoreError matches other handlers. Consistent with the logstream/filters handlers. Consider centralizing this pattern to avoid repetition across enums, but not blocking.
src/storage/mod.rs (1)
289-291: New error variant is fine; consider Display over Debug for the detail. Using {:?} may leak internal structure. Prefer {0} if MetastoreErrorDetail implements Display, else keep as-is.
```diff
-    #[error("MetastoreError: {0:?}")]
+    #[error("MetastoreError: {0}")]
     MetastoreError(Box<MetastoreErrorDetail>),
```
src/handlers/http/mod.rs (1)
92-95: Handle an empty schema list explicitly for clearer errors.
`Schema::try_merge` errors on empty input; return a targeted error when no schemas exist for the stream.
```diff
 pub async fn fetch_schema(stream_name: &str) -> anyhow::Result<arrow_schema::Schema> {
-    let res: Vec<Schema> = PARSEABLE.metastore.get_all_schemas(stream_name).await?;
+    let res: Vec<Schema> = PARSEABLE.metastore.get_all_schemas(stream_name).await?;
+    if res.is_empty() {
+        anyhow::bail!("No schema found for stream '{}'", stream_name);
+    }
     let new_schema = Schema::try_merge(res)?;
     Ok(new_schema)
 }
```
src/handlers/http/users/dashboards.rs (2)
271-278: Make JSON error responses consistent with the rest of the codebase. Other modules explicitly set ContentType::json() when serializing metastore errors. Align for consistency.
```diff
-        match self {
-            DashboardError::MetastoreError(e) => {
-                actix_web::HttpResponse::build(self.status_code()).json(e.to_detail())
-            }
+        match self {
+            DashboardError::MetastoreError(e) => {
+                actix_web::HttpResponse::build(self.status_code())
+                    .insert_header(ContentType::json())
+                    .json(e.to_detail())
+            }
```
236-254: Remove the dead `DashboardError::ObjectStorage` variant.
`DashboardError::ObjectStorage` is never constructed or propagated—no code path in src/handlers/http/users/dashboards.rs invokes or maps an `ObjectStorageError` into this variant—so it can be removed to reduce noise.
src/handlers/http/alerts.rs (1)
263-263: Delete ordering — consider stopping the task before removing from memory. You delete in metastore, then from memory, then cancel the scheduled task. Flipping the last two reduces the window where a running task may reference a now-missing memory entry.
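Sketch of the suggested ordering; the handler shape, scheduler handle, and metastore method names here are assumptions, not the crate's actual API:

```rust
// Illustrative ordering only; the real handler wires these differently.
async fn delete_alert(alert_id: Ulid) -> Result<(), AlertError> {
    // 1. Persist the delete first so a crash cannot resurrect the alert.
    PARSEABLE.metastore.delete_alert(&alert_id).await?;
    // 2. Cancel the scheduled task while the in-memory entry still exists,
    //    so a running evaluation never observes a half-deleted alert.
    ALERTS.stop_task(&alert_id).await?;
    // 3. Only then drop the in-memory entry.
    ALERTS.remove(&alert_id).await;
    Ok(())
}
```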
src/prism/home/mod.rs (1)
229-231: LGTM: switch to metastore for stream metadata.
Using `get_all_stream_jsons(&stream, None)` aligns with the new abstraction. Consider paging/streaming later if this grows large, but fine for now.
src/query/mod.rs (3)
549-563: Parallelize manifest fetches to reduce latency. Fetching each manifest sequentially increases tail latency. Fire them concurrently and collect.
```diff
-    let mut all_manifest_files = Vec::new();
-    for manifest_item in merged_snapshot.manifests(&time_filter) {
-        // (fetch each serially)
-    }
+    use futures::future;
+    let futures = merged_snapshot
+        .manifests(&time_filter)
+        .into_iter()
+        .map(|mi| async move {
+            PARSEABLE
+                .metastore
+                .get_manifest(stream_name, mi.time_lower_bound, mi.time_upper_bound)
+                .await
+                .and_then(|m| m.ok_or_else(|| QueryError::CustomError("Missing manifest".into())))
+        });
+    let all_manifest_files: Vec<Manifest> = future::try_join_all(futures).await?;
```
525-542: Guard against duplicate manifest entries when merging snapshots. If multiple stream JSONs contain overlapping manifest_list entries, duplicates can slip in. Deduplicate by (time_lower_bound, time_upper_bound) or by manifest path before iterating.
```diff
-    if let Ok(obs) = obs {
-        for ob in obs {
-            if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
-                let snapshot = object_store_format.snapshot;
-                for manifest in snapshot.manifest_list {
-                    merged_snapshot.manifest_list.push(manifest);
-                }
-            }
-        }
-    }
+    if let Ok(obs) = obs {
+        use std::collections::BTreeSet;
+        let mut seen = BTreeSet::new();
+        for ob in obs {
+            if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
+                for m in object_store_format.snapshot.manifest_list {
+                    // adjust key as appropriate (e.g., include m.manifest_path if available)
+                    let key = (m.time_lower_bound, m.time_upper_bound);
+                    if seen.insert(key) {
+                        merged_snapshot.manifest_list.push(m);
+                    }
+                }
+            }
+        }
+    }
```
564-575: Remove stale commented-out code. The old object-store path is fully replaced. Keeping large commented blocks adds noise.
```diff
-    // let all_manifest_files = collect_manifest_files(
-    //     object_store,
-    //     merged_snapshot
-    //         .manifests(&time_filter)
-    //         .into_iter()
-    //         .sorted_by_key(|file| file.time_lower_bound)
-    //         .map(|item| item.manifest_path)
-    //         .collect(),
-    // )
-    // .await
-    // .map_err(|err| anyhow::Error::msg(err.to_string()))?;
+    // (old object-store manifest collection removed)
```
src/hottier.rs (2)
276-285: Wrap metastore errors without type erasure or misclassification. Mapping metastore errors into ObjectStorageError::MetastoreError(Box::new(e.to_detail())) risks losing the original type and may not match the variant’s expected payload. Prefer a dedicated HotTierError variant or map directly to HotTierError::Anyhow while preserving the original error string. Also, the variable name s3_manifest_file_list no longer reflects the source.
```diff
-    let mut s3_manifest_file_list = PARSEABLE
+    let mut manifest_map = PARSEABLE
         .metastore
         .get_all_manifest_files(&stream)
         .await
-        .map_err(|e| {
-            HotTierError::ObjectStorageError(ObjectStorageError::MetastoreError(Box::new(
-                e.to_detail(),
-            )))
-        })?;
+        .map_err(|e| HotTierError::Anyhow(e.into()))?;
```
Optionally, introduce a dedicated error variant:
```diff
 pub enum HotTierError {
+    #[error("{0}")]
+    Metastore(#[from] crate::metastore::error::MetastoreError),
 }
```
308-312: Avoid duplicate downloads when the same file appears in multiple manifests. If a file is present in multiple Manifest objects for a date, the current extend(...) can enqueue duplicates (later skipped by exists(), but still extra work). Dedup by file_path when building storage_combined_manifest.
```diff
-    for storage_manifest in manifest_files {
-        storage_combined_manifest
-            .files
-            .extend(storage_manifest.files.clone());
-    }
+    use std::collections::HashSet;
+    let mut seen = HashSet::new();
+    for storage_manifest in manifest_files {
+        for f in &storage_manifest.files {
+            if seen.insert(&f.file_path) {
+                storage_combined_manifest.files.push(f.clone());
+            }
+        }
+    }
```
src/handlers/http/logstream.rs (1)
50-83: Stream delete should also purge metastore entries for the stream. Currently only storage/staging/memory are cleaned. Please also remove the stream’s metastore record(s) to avoid orphaned metadata. If a single-call API exists (e.g., metastore.delete_stream or equivalent), use that; otherwise delete all related stream JSON objects via the metastore.
I can wire this once you confirm the metastore API name(s) intended for stream deletion.
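For illustration, the handler tail could look roughly like this once a `delete_stream`-style API exists; every metastore call below is hypothetical and the surrounding names follow the crate's style rather than its exact API:

```rust
// Hypothetical cleanup order inside the delete handler.
objectstore.delete_stream(&stream_name).await?;
PARSEABLE
    .metastore
    .delete_stream(&stream_name) // assumed API: removes stream.json, schemas, manifests
    .await
    .map_err(|e| StreamError::MetastoreError(Box::new(e.to_detail())))?;
PARSEABLE.streams.delete(&stream_name); // finally drop in-memory state
```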
src/handlers/http/users/filters.rs (1)
71-71: Consider 201 Created for POST. Minor UX: returning Created better reflects resource creation.
Apply:
```diff
- Ok((web::Json(filter), StatusCode::OK))
+ Ok((web::Json(filter), StatusCode::CREATED))
```
src/handlers/http/modal/mod.rs (1)
366-383: Tighten error handling and logs in load_from_storage. Improve observability; avoid shadowing and log the concrete error.
Apply:
```diff
-    async fn load_from_storage(node_type: NodeType) -> Vec<NodeMetadata> {
-        let obs = PARSEABLE.metastore.get_node_metadata(node_type).await;
-
-        let mut metadata = vec![];
-        if let Ok(obs) = obs {
-            for object in obs {
-                //convert to NodeMetadata
-                match serde_json::from_slice::<NodeMetadata>(&object) {
-                    Ok(node_metadata) => metadata.push(node_metadata),
-                    Err(e) => error!("Failed to deserialize NodeMetadata: {:?}", e),
-                }
-            }
-        } else {
-            error!("Couldn't read from storage");
-        }
+    async fn load_from_storage(node_type: NodeType) -> Vec<NodeMetadata> {
+        let mut metadata = vec![];
+        match PARSEABLE.metastore.get_node_metadata(node_type).await {
+            Ok(objects) => {
+                for object in objects {
+                    match serde_json::from_slice::<NodeMetadata>(&object) {
+                        Ok(node_metadata) => metadata.push(node_metadata),
+                        Err(e) => error!("Failed to deserialize NodeMetadata from metastore: {:?}", e),
+                    }
+                }
+            }
+            Err(e) => {
+                error!("Couldn't read node metadata from metastore: {}", e);
+            }
+        }
         // Return the metadata
         metadata
     }
```
src/storage/gcs.rs (1)
246-272: Remove dead commented code. Commented legacy listing code adds noise and can mislead future edits. Prefer deleting it (git history preserves it).
```diff
-    // async fn _list_streams(&self) -> Result<HashSet<LogStream>, ObjectStorageError> {
-    //     let mut result_file_list = HashSet::new();
-    //     let resp = self.client.list_with_delimiter(None).await?;
-    //
-    //     let streams = resp
-    //         .common_prefixes
-    //         .iter()
-    //         .flat_map(|path| path.parts())
-    //         .map(|name| name.as_ref().to_string())
-    //         .filter(|name| name != PARSEABLE_ROOT_DIRECTORY && name != USERS_ROOT_DIR)
-    //         .collect::<Vec<_>>();
-    //
-    //     for stream in streams {
-    //         let stream_path =
-    //             object_store::path::Path::from(format!("{}/{}", &stream, STREAM_ROOT_DIRECTORY));
-    //         let resp = self.client.list_with_delimiter(Some(&stream_path)).await?;
-    //         if resp
-    //             .objects
-    //             .iter()
-    //             .any(|name| name.location.filename().unwrap().ends_with("stream.json"))
-    //         {
-    //             result_file_list.insert(stream);
-    //         }
-    //     }
-    //
-    //     Ok(result_file_list)
-    // }
```
src/enterprise/utils.rs (1)
122-154: Tighten the map-building loop (avoid filter_map+for_each side effects). Minor readability/perf: iterate once and insert directly; the current pattern allocates and returns dummy values.
```diff
-    selected_files
-        .into_iter()
-        .filter_map(|file| {
+    for file in selected_files {
         let date = file.file_path.split("/").collect_vec();
         // … compute file_date …
         if file_date < time_range.start {
-            None
+            continue;
         } else {
             let date = date.as_slice()[1..4].iter().map(|s| s.to_string());
             let date = RelativePathBuf::from_iter(date);
             parquet_files.entry(date).or_default().push(file);
-            Some("")
         }
-        })
-        .for_each(|_| {});
+    }
```
src/metastore/mod.rs (1)
58-64: Avoid leaking internal structure in Display text for the `Error` variant.
`#[error("{self:?}")]` dumps internals. Prefer the message field.
```diff
-    #[error("{self:?}")]
+    #[error("{message}")]
     Error {
         status_code: StatusCode,
         message: String,
         flow: String,
     },
```
src/catalog/mod.rs (1)
350-360: Preserve metastore-aware metadata when creating for a “different partition.” If you retain this branch elsewhere, prefer `meta.clone()` over `ObjectStoreFormat::default()` to keep `time_partition` and stats intact.
src/correlation.rs (2)
98-111: Return 404 for missing correlations. Currently returns `AnyhowError` → 500. Prefer a dedicated `NotFound` variant mapped to 404.
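A sketch of the dedicated variant and status mapping, assuming a thiserror-style enum like the crate's other error types (exact variant set is illustrative):

```rust
#[derive(Debug, thiserror::Error)]
pub enum CorrelationError {
    #[error("correlation {0} not found")]
    NotFound(String),
    // ... existing variants, e.g. ...
    #[error("{0}")]
    AnyhowError(#[from] anyhow::Error),
}

impl actix_web::ResponseError for CorrelationError {
    fn status_code(&self) -> actix_web::http::StatusCode {
        match self {
            CorrelationError::NotFound(_) => actix_web::http::StatusCode::NOT_FOUND,
            _ => actix_web::http::StatusCode::INTERNAL_SERVER_ERROR,
        }
    }
}
```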
353-362: Nice: JSON error bodies for Metastore errors. Consider unifying other errors to JSON too for consistency, but this is fine for now.
src/handlers/http/cluster/mod.rs (3)
1010-1011: Typo in user-facing log: “schedular” → “scheduler”. Minor polish; logs are part of the UX.
```diff
- info!("Setting up schedular for cluster metrics ingestion");
+ info!("Setting up scheduler for cluster metrics ingestion");
```
768-775: Variable name nit: rename dresses → metrics. Improves readability.
```diff
-    let dresses = fetch_cluster_metrics().await.map_err(|err| {
+    let metrics = fetch_cluster_metrics().await.map_err(|err| {
         error!("Fatal: failed to fetch cluster metrics: {:?}", err);
         PostError::Invalid(err.into())
     })?;
-    Ok(actix_web::HttpResponse::Ok().json(dresses))
+    Ok(actix_web::HttpResponse::Ok().json(metrics))
```
813-836: Delete node metadata for all roles concurrently to reduce tail latency. The current sequential awaits add ~4x latency on slow stores. Use try_join! to short-circuit on the first error and run deletions in parallel.
```diff
-    // Delete ingestor metadata
-    let removed_ingestor = PARSEABLE
-        .metastore
-        .delete_node_metadata(&domain_name, NodeType::Ingestor)
-        .await?;
-
-    // Delete indexer metadata
-    let removed_indexer = PARSEABLE
-        .metastore
-        .delete_node_metadata(&domain_name, NodeType::Indexer)
-        .await?;
-
-    // Delete querier metadata
-    let removed_querier = PARSEABLE
-        .metastore
-        .delete_node_metadata(&domain_name, NodeType::Querier)
-        .await?;
-
-    // Delete prism metadata
-    let removed_prism = PARSEABLE
-        .metastore
-        .delete_node_metadata(&domain_name, NodeType::Prism)
-        .await?;
+    // Delete all node-type metadata concurrently
+    let (removed_ingestor, removed_indexer, removed_querier, removed_prism) =
+        futures::try_join!(
+            PARSEABLE
+                .metastore
+                .delete_node_metadata(&domain_name, NodeType::Ingestor),
+            PARSEABLE
+                .metastore
+                .delete_node_metadata(&domain_name, NodeType::Indexer),
+            PARSEABLE
+                .metastore
+                .delete_node_metadata(&domain_name, NodeType::Querier),
+            PARSEABLE
+                .metastore
+                .delete_node_metadata(&domain_name, NodeType::Prism),
+        )?;
```
src/parseable/mod.rs (1)
107-151: DRY: metastore construction duplicated across all storage variants. A tiny helper reduces repetition and keeps the wiring consistent.
```rust
// Add once near this module (outside diffs for clarity)
fn build_default_metastore(p: &impl ObjectStorageProvider) -> Arc<dyn Metastore> {
    Arc::new(ObjectStoreMetastore { storage: p.construct_client() })
}
```
```diff
-    // for now create a metastore without using a CLI arg
-    let metastore = ObjectStoreMetastore {
-        storage: args.storage.construct_client(),
-    };
+    let metastore = build_default_metastore(&args.storage);
 ...
-    Arc::new(metastore),
+    metastore,
```
src/metastore/metastores/object_store_metastore.rs (4)
336-341: Fix error message for clarity. The flow string mentions “get_all_streams” but the function is `get_all_stream_jsons`.
```diff
-    message: "Incorrect server mode passed as input. Only `Ingest` is allowed."
-        .into(),
-    flow: "get_all_streams with mode".into(),
+    message: "Incorrect server mode passed as input. Only `Ingest` is allowed."
+        .into(),
+    flow: "get_all_stream_jsons with mode".into(),
```
509-518: More precise schema filter.
`contains(".schema")` is brittle. Match the actual suffix.
```diff
-    Box::new(|file_name: String| file_name.contains(".schema")),
+    Box::new(|file_name: String| file_name.ends_with("schema.json")),
```
617-634: Delete all matching node metadata, not just the first. Returning after the first match can leave stale entries and inconsistent results.
If backward-compatible, iterate and delete all matches, and return whether any were deleted. I can provide a patch if you confirm desired semantics.
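One possible "delete all matches" shape, returning whether anything was removed; the listing helper and the entry's fields below are assumptions, not existing API:

```rust
// Sketch: scan every stored node-metadata object and delete all whose
// domain matches, instead of returning after the first hit.
async fn delete_node_metadata(
    &self,
    domain_name: &str,
    node_type: NodeType,
) -> Result<bool, MetastoreError> {
    let mut deleted_any = false;
    for entry in self.list_node_metadata(node_type).await? {
        if entry.domain_name == domain_name {
            self.storage.delete_object(&entry.path).await?;
            deleted_any = true; // keep scanning; do not return early
        }
    }
    Ok(deleted_any)
}
```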
126-135: Avoid hard-coded directory literals. Use the shared constant for the dashboards dir for consistency.
```diff
-    let dashboards_path = users_dir.join(&user).join("dashboards");
+    let dashboards_path = users_dir.join(&user).join(crate::handlers::http::users::DASHBOARDS_DIR);
```
src/migration/mod.rs (1)
29-29: Remove dead code and unused import.
`to_bytes` isn’t used here; the `Serialize` import is only for it.
```diff
-use serde::Serialize;
```
```diff
-#[inline(always)]
-pub fn to_bytes(any: &(impl ?Sized + Serialize)) -> Bytes {
-    serde_json::to_vec(any)
-        .map(|any| any.into())
-        .expect("serialize cannot fail")
-}
```
Also applies to: 413-418
src/users/dashboards.rs (2)
70-84: Avoid `unwrap()`s in the `MetastoreObject` impl. Make panics actionable with `expect` messages since the trait can’t return `Result`.
```diff
-        RelativePathBuf::from_iter([
-            USERS_ROOT_DIR,
-            self.author.as_ref().unwrap(),
-            DASHBOARDS_DIR,
-            &format!("{}.json", self.dashboard_id.unwrap()),
-        ])
+        RelativePathBuf::from_iter([
+            USERS_ROOT_DIR,
+            self.author
+                .as_ref()
+                .expect("dashboard.author must be set"),
+            DASHBOARDS_DIR,
+            &format!(
+                "{}.json",
+                self.dashboard_id
+                    .expect("dashboard.dashboard_id must be set")
+            ),
+        ])
         .to_string()
     }
     fn get_object_id(&self) -> String {
-        self.dashboard_id.unwrap().to_string()
+        self.dashboard_id
+            .expect("dashboard.dashboard_id must be set")
+            .to_string()
     }
```
236-243: Enforce title uniqueness per user, not globally. Prevents cross-user collisions.
```diff
-        let has_duplicate = dashboards
-            .iter()
-            .any(|d| d.title == dashboard.title && d.dashboard_id != dashboard.dashboard_id);
+        let has_duplicate = dashboards.iter().any(|d| {
+            d.author == dashboard.author
+                && d.title == dashboard.title
+                && d.dashboard_id != dashboard.dashboard_id
+        });
```
src/alerts/mod.rs (1)
105-141: Avoid side effects in migrate_from_v1 (optional). The migration helper both transforms and persists (put_alert). Consider returning the migrated AlertConfig and letting the caller persist. This improves testability and keeps the function pure.
src/query/stream_schema_provider.rs (1)
513-529: Log failures when get_all_stream_jsons() errs (optional). Currently ignored. Emit a warning to aid debugging in Query/Prism modes.
Apply this diff:
```diff
-        if PARSEABLE.options.mode == Mode::Query || PARSEABLE.options.mode == Mode::Prism {
-            let obs = PARSEABLE
-                .metastore
-                .get_all_stream_jsons(&self.stream, None)
-                .await;
-            if let Ok(obs) = obs {
+        if PARSEABLE.options.mode == Mode::Query || PARSEABLE.options.mode == Mode::Prism {
+            match PARSEABLE
+                .metastore
+                .get_all_stream_jsons(&self.stream, None)
+                .await
+            {
+                Ok(obs) => {
                 for ob in obs {
                     if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
                         let snapshot = object_store_format.snapshot;
                         for manifest in snapshot.manifest_list {
                             merged_snapshot.manifest_list.push(manifest);
                         }
                     }
                 }
-            }
+                }
+                Err(e) => tracing::warn!(
+                    "get_all_stream_jsons failed for {}: {}",
+                    self.stream,
+                    e
+                ),
+            }
         } else {
```
src/storage/azure_blob.rs (1)
709-719: list_dirs returns nested components, not just immediate children (optional).
Flattening `parts()` yields deeper-level names. For “dirs at root/prefix”, pick only the first component.
Apply this diff:
```diff
-    Ok(resp
-        .common_prefixes
-        .iter()
-        .flat_map(|path| path.parts())
-        .map(|name| name.as_ref().to_string())
-        .collect::<Vec<_>>())
+    Ok(resp
+        .common_prefixes
+        .iter()
+        .filter_map(|p| p.parts().next())
+        .map(|name| name.as_ref().to_string())
+        .collect::<Vec<_>>())
```
Repeat the same change for `list_dirs_relative()`.
src/storage/object_storage.rs (6)
272-276: Align the list_with_delimiter signature with object_store to avoid needless clones. object_store::ObjectStore expects Option<&Path>. Mirror that to prevent Path moves/copies and to match wrappers like metrics_layer.
```diff
-    async fn list_with_delimiter(
-        &self,
-        prefix: Option<object_store::path::Path>,
-    ) -> Result<ListResult, ObjectStorageError>;
+    async fn list_with_delimiter(
+        &self,
+        prefix: Option<&object_store::path::Path>,
+    ) -> Result<ListResult, ObjectStorageError>;
```
Note: implementors (S3/GCS/Azure/LocalFS) and call sites will need corresponding small changes (passing Some(&path) vs Some(path)).
294-300: Simplify schema cloning; avoid the Arc clone + deref dance. Use as_ref().clone() to get an owned Schema cleanly.
```diff
-    let s = &*schema.clone();
-    PARSEABLE
-        .metastore
-        .put_schema(s.clone(), stream_name)
+    let owned_schema: Schema = schema.as_ref().clone();
+    PARSEABLE
+        .metastore
+        .put_schema(owned_schema, stream_name)
         .await
         .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
```
606-613: Naming/doc mismatch. The doc says “create schema from storage” but the function name says “from_metastore” and it writes back into the metastore. Align the doc/name to avoid confusion.
550-604: Don’t silently ignore errors from get_all_stream_jsons. into_iter().next() on Result drops Err and returns None. This masks metastore failures. Prefer an explicit match and log/propagate errors.
```diff
-    if let Some(stream_metadata_obs) = PARSEABLE
-        .metastore
-        .get_all_stream_jsons(stream_name, Some(Mode::Ingest))
-        .await
-        .into_iter()
-        .next()
-        && !stream_metadata_obs.is_empty()
+    let fetch = PARSEABLE
+        .metastore
+        .get_all_stream_jsons(stream_name, Some(Mode::Ingest))
+        .await;
+    if let Ok(stream_metadata_obs) = &fetch
+        && !stream_metadata_obs.is_empty()
     {
         for stream_metadata_bytes in stream_metadata_obs.iter() {
             let stream_ob_metadata =
```
Optionally log fetch.err() to aid diagnostics.
315-328: Potential lost-update writes to stream.json; consider optimistic concurrency. All these read-modify-write updates to stream.json have no versioning/CAS. Concurrent writers can clobber each other (e.g., retention and stats updates racing).
Mitigations:
- Add a revision/etag field in ObjectStoreFormat and enforce “if-match” semantics in Metastore.put_stream_json.
- Or serialize updates via a per-stream async mutex in PARSEABLE.
Want me to draft a small revision/CAS design?
Also applies to: 337-350, 359-372, 403-416, 425-440, 447-461, 509-514
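A rough shape of the revision idea, sketched as a separate trait; every name here is illustrative, not part of the current codebase:

```rust
// Optimistic-concurrency sketch: reads return a revision token, writes must
// present it back; a mismatch means "re-read and retry".
pub struct Versioned<T> {
    pub value: T,
    pub revision: u64, // an object-store ETag would also work
}

#[async_trait::async_trait]
pub trait VersionedStreamJson {
    async fn get_stream_json_versioned(
        &self,
        stream: &str,
    ) -> Result<Versioned<ObjectStoreFormat>, MetastoreError>;

    /// Fails with a conflict error when `expected` no longer matches the
    /// stored revision, forcing the caller to redo the read-modify-write.
    async fn put_stream_json_if_match(
        &self,
        stream: &str,
        value: &ObjectStoreFormat,
        expected: u64,
    ) -> Result<u64, MetastoreError>;
}
```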
688-710: Minor: avoid a full sort to find min/max dates. You can derive min/max in one pass to reduce allocations for very large catalogs. Low impact but an easy win.
Example:
```rust
let (min_date, max_date) = parsed_dates.iter().fold((None, None), |acc, d| { /* update min/max */ acc });
```
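A complete, self-contained version of that one-pass idea, assuming the dates are plain chrono::NaiveDate values (the function name is made up):

```rust
use chrono::NaiveDate;

// Single pass over the parsed dates; no sort, no extra allocations.
fn min_max_dates(parsed_dates: &[NaiveDate]) -> Option<(NaiveDate, NaiveDate)> {
    parsed_dates.iter().fold(None, |acc, &d| match acc {
        None => Some((d, d)),
        Some((min, max)) => Some((min.min(d), max.max(d))),
    })
}
```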
src/metastore/metastore_traits.rs (3)
25-26: Use a single async_trait macro source across the codebase. Other files use async_trait::async_trait. Switch from tonic::async_trait for consistency and to avoid needless dependency coupling.
```diff
-use tonic::async_trait;
+use async_trait::async_trait;
```
(No functional change; just the macro import.)
Also applies to: 36-36
152-155: Consider requiring Send on MetastoreObject. These objects are serialized before awaits today, but adding Send is a safe default for future async usage.
```diff
-pub trait MetastoreObject: ErasedSerialize + Sync {
+pub trait MetastoreObject: ErasedSerialize + Send + Sync {
```
68-93: Clarify stream.json selection semantics in docs. The get_stream_json/get_all_stream_jsons docs are good; add a note that the metastore MUST honor server Mode when resolving the active stream.json to prevent cross-mode leakage.
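The note could read something like this on the trait method; the wording and the exact signature below are suggestions, not the trait's confirmed definition:

```rust
/// Fetch the raw `stream.json` bytes for `stream_name`.
///
/// Implementations MUST honor the server `Mode` when resolving which
/// `stream.json` is active (e.g. an Ingest node must never pick up a
/// Query node's copy), so mode-specific state cannot leak across modes.
async fn get_stream_json(&self, stream_name: &str, base: bool) -> Result<Bytes, MetastoreError>;
```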
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/enterprise/utils.rs (1)
81-91: Don’t swallow metastore errors; propagate them. Ignoring Err from get_all_stream_jsons can yield incomplete/incorrect query results.
Apply:
```diff
-    let obs = PARSEABLE.metastore.get_all_stream_jsons(stream, None).await;
-    if let Ok(obs) = obs {
-        for ob in obs {
+    let obs = PARSEABLE
+        .metastore
+        .get_all_stream_jsons(stream, None)
+        .await
+        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e)))?;
+    for ob in obs {
         if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
             let snapshot = object_store_format.snapshot;
             for manifest in snapshot.manifest_list {
                 merged_snapshot.manifest_list.push(manifest);
             }
         }
     }
-    }
```
src/catalog/mod.rs (1)
483-499: Persisting snapshot deletions to object store breaks the abstraction; write via metastore instead. You load meta via metastore but persist with storage.put_snapshot(), which can diverge from the configured Metastore. Persist through PARSEABLE.metastore.put_stream_json to avoid inconsistency.
Apply this diff:
```diff
-    storage.put_snapshot(stream_name, meta.snapshot).await?;
+    PARSEABLE
+        .metastore
+        .put_stream_json(&meta, stream_name)
+        .await
+        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
```
Optional follow-up: If the API allows, drop the storage parameter from this function and thread storage only into update_deleted_stats.
♻️ Duplicate comments (9)
src/enterprise/utils.rs (2)
65-71: Box the actual error, not its detail. Using e.to_detail() produces a String, which doesn’t implement Error and will fail type expectations.
Apply:
```diff
-        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?,
+        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e)))?,
```
95-109: Avoid panic on missing manifest and fix error boxing. Replace expect(...) with a structured error; also box e, not e.to_detail().
Apply:
```diff
-    for manifest_item in merged_snapshot.manifests(&time_filters) {
-        manifest_files.push(
-            PARSEABLE
-                .metastore
-                .get_manifest(
-                    stream,
-                    manifest_item.time_lower_bound,
-                    manifest_item.time_upper_bound,
-                    Some(manifest_item.manifest_path),
-                )
-                .await
-                .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?
-                .expect("Data is invalid for Manifest"),
-        )
-    }
+    for manifest_item in merged_snapshot.manifests(&time_filters) {
+        let maybe_manifest = PARSEABLE
+            .metastore
+            .get_manifest(
+                stream,
+                manifest_item.time_lower_bound,
+                manifest_item.time_upper_bound,
+                Some(manifest_item.manifest_path),
+            )
+            .await
+            .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e)))?;
+        match maybe_manifest {
+            Some(m) => manifest_files.push(m),
+            None => {
+                return Err(ObjectStorageError::Custom(format!(
+                    "Manifest not found for {} [{} - {}]",
+                    stream, manifest_item.time_lower_bound, manifest_item.time_upper_bound
+                )))
+            }
+        }
+    }
```
Optional concurrency (batch fetch) available on request.
src/metastore/mod.rs (3)
27-36: Ensure chrono’s serde feature is enabled. MetastoreErrorDetail.timestamp requires chrono with features=["serde"] to serialize.
Would you like me to open a follow-up to adjust Cargo.toml if it’s not already set?
91-146: Return 400 for client JSON issues in to_detail(). These are client errors; don’t stamp 500 in the serialized detail.
Apply:
```diff
-            MetastoreError::JsonParseError(e) => MetastoreErrorDetail {
+            MetastoreError::JsonParseError(e) => MetastoreErrorDetail {
                 operation: "JsonParseError".to_string(),
                 message: e.to_string(),
                 stream_name: None,
                 file_path: None,
                 timestamp: Some(chrono::Utc::now()),
                 metadata: std::collections::HashMap::new(),
-                status_code: 500,
+                status_code: 400,
             },
             MetastoreError::JsonSchemaError { message } => MetastoreErrorDetail {
                 operation: "JsonSchemaError".to_string(),
                 message: message.clone(),
                 stream_name: None,
                 file_path: None,
                 timestamp: Some(chrono::Utc::now()),
                 metadata: std::collections::HashMap::new(),
-                status_code: 500,
+                status_code: 400,
             },
             MetastoreError::InvalidJsonStructure { expected, found } => MetastoreErrorDetail {
                 operation: "InvalidJsonStructure".to_string(),
                 message: format!("Expected {}, found {}", expected, found),
                 stream_name: None,
                 file_path: None,
                 timestamp: Some(chrono::Utc::now()),
                 metadata: [
                     ("expected".to_string(), expected.clone()),
                     ("found".to_string(), found.clone()),
                 ]
                 .into_iter()
                 .collect(),
-                status_code: 500,
+                status_code: 400,
             },
             MetastoreError::MissingJsonField { field } => MetastoreErrorDetail {
                 operation: "MissingJsonField".to_string(),
                 message: format!("Missing required field: {}", field),
                 stream_name: None,
                 file_path: None,
                 timestamp: Some(chrono::Utc::now()),
                 metadata: [("field".to_string(), field.clone())].into_iter().collect(),
-                status_code: 500,
+                status_code: 400,
             },
             MetastoreError::InvalidJsonValue { field, reason } => MetastoreErrorDetail {
                 operation: "InvalidJsonValue".to_string(),
                 message: format!("Invalid value for field '{}': {}", field, reason),
                 stream_name: None,
                 file_path: None,
                 timestamp: Some(chrono::Utc::now()),
                 metadata: [
                     ("field".to_string(), field.clone()),
                     ("reason".to_string(), reason.clone()),
                 ]
                 .into_iter()
                 .collect(),
-                status_code: 500,
+                status_code: 400,
             },
```
149-159: Map JSON parse/validation variants to 400. StatusCode::BAD_REQUEST aligns with client-side JSON errors.
Apply:
```diff
-            MetastoreError::JsonParseError(..) => StatusCode::INTERNAL_SERVER_ERROR,
-            MetastoreError::JsonSchemaError { .. } => StatusCode::INTERNAL_SERVER_ERROR,
-            MetastoreError::InvalidJsonStructure { .. } => StatusCode::INTERNAL_SERVER_ERROR,
-            MetastoreError::MissingJsonField { .. } => StatusCode::INTERNAL_SERVER_ERROR,
-            MetastoreError::InvalidJsonValue { .. } => StatusCode::INTERNAL_SERVER_ERROR,
+            MetastoreError::JsonParseError(..) => StatusCode::BAD_REQUEST,
+            MetastoreError::JsonSchemaError { .. } => StatusCode::BAD_REQUEST,
+            MetastoreError::InvalidJsonStructure { .. } => StatusCode::BAD_REQUEST,
+            MetastoreError::MissingJsonField { .. } => StatusCode::BAD_REQUEST,
+            MetastoreError::InvalidJsonValue { .. } => StatusCode::BAD_REQUEST,
```
src/metastore/metastores/object_store_metastore.rs (3)
667-673: Exclude non-stream folders from list_streams. Avoid surfacing alerts/settings/system folders as streams.
Apply:
```diff
-            .filter(|name| name != PARSEABLE_ROOT_DIRECTORY && name != USERS_ROOT_DIR)
+            .filter(|name| {
+                name != PARSEABLE_ROOT_DIRECTORY
+                    && name != USERS_ROOT_DIR
+                    && name != ALERTS_ROOT_DIRECTORY
+                    && name != SETTINGS_ROOT_DIRECTORY
+                    && name != "lost+found"
+            })
```
107-111: Don’t unwrap ULIDs from untrusted input. Return 400 on bad IDs instead of panicking.
Apply:
```diff
-        let path = alert_json_path(Ulid::from_string(&obj.get_object_id()).unwrap());
+        let id = Ulid::from_string(&obj.get_object_id()).map_err(|_| MetastoreError::Error {
+            status_code: StatusCode::BAD_REQUEST,
+            message: "Invalid alert id".into(),
+            flow: "put_alert".into(),
+        })?;
+        let path = alert_json_path(id);
```
176-229: Filter migration: delete only migrated v1 files, not the entire directory. Also avoid an unnecessary clone on put.
Apply:
```diff
-        // read filter object
-        let filter_bytes = self
-            .storage
-            .get_objects(
-                Some(&filters_path),
-                Box::new(|file_name| file_name.ends_with(".json")),
-            )
-            .await?;
-
-        for filter in filter_bytes {
-            // deserialize into Value
-            let mut filter_value = serde_json::from_slice::<serde_json::Value>(&filter)?;
+        // list objects with names so we can delete migrated v1 files
+        let resp = self
+            .storage
+            .list_with_delimiter(Some(object_store::path::Path::from(filters_path.to_string())))
+            .await?;
+        for obj in resp.objects {
+            let obj_path = RelativePathBuf::from(obj.location.to_string());
+            let bytes = self.storage.get_object(&obj_path).await?;
+            let mut filter_value = serde_json::from_slice::<serde_json::Value>(&bytes)?;
@@
-            if version == Some("v1") {
-                // delete older version of the filter
-                self.storage.delete_object(&filters_path).await?;
+            if version == Some("v1") {
+                // delete the specific legacy file
+                self.storage.delete_object(&obj_path).await?;
@@
-            let filter_bytes = to_bytes(&filter_value);
-            self.storage.put_object(&path, filter_bytes.clone()).await?;
+            let filter_bytes = to_bytes(&filter_value);
+            self.storage.put_object(&path, filter_bytes).await?;
```
src/query/stream_schema_provider.rs (1)
412-430: Avoid panic on missing manifests; handle the Option from get_manifest. expect(...) will crash queries if a manifest is missing/compacted. Handle None by warning and skipping (or return a planning error).
Apply this diff:
```diff
-    for manifest_item in snapshot.manifests(time_filters) {
-        manifest_files.push(
-            PARSEABLE
-                .metastore
-                .get_manifest(
-                    stream_name,
-                    manifest_item.time_lower_bound,
-                    manifest_item.time_upper_bound,
-                    Some(manifest_item.manifest_path),
-                )
-                .await
-                .map_err(|e| DataFusionError::Plan(e.to_string()))?
-                .expect("Data is invalid for Manifest"),
-        )
-    }
+    for manifest_item in snapshot.manifests(time_filters) {
+        let manifest_opt = PARSEABLE
+            .metastore
+            .get_manifest(
+                stream_name,
+                manifest_item.time_lower_bound,
+                manifest_item.time_upper_bound,
+                Some(manifest_item.manifest_path.clone()),
+            )
+            .await
+            .map_err(|e| DataFusionError::Plan(e.to_string()))?;
+        if let Some(manifest) = manifest_opt {
+            manifest_files.push(manifest);
+        } else {
+            tracing::warn!(
+                "Manifest missing for stream={} [{:?} - {:?}]",
+                stream_name,
+                manifest_item.time_lower_bound,
+                manifest_item.time_upper_bound
+            );
+        }
+    }
```
🧹 Nitpick comments (9)
src/enterprise/utils.rs (1)
119-156: Simplify the selection loop; eliminate filter_map/for_each and unwraps. The current code is harder to read and can panic on malformed paths. Use a straight loop with safe parsing.
Apply:
```diff
-    for filter in time_filter_expr {
-        selected_files.retain(|file| !file.can_be_pruned(&filter))
-    }
-
-    selected_files
-        .into_iter()
-        .filter_map(|file| {
-            let date = file.file_path.split("/").collect_vec();
-
-            let year = &date[1][5..9];
-            let month = &date[1][10..12];
-            let day = &date[1][13..15];
-            let hour = &date[2][5..7];
-            let min = &date[3][7..9];
-            let file_date = Utc
-                .with_ymd_and_hms(
-                    year.parse::<i32>().unwrap(),
-                    month.parse::<u32>().unwrap(),
-                    day.parse::<u32>().unwrap(),
-                    hour.parse::<u32>().unwrap(),
-                    min.parse::<u32>().unwrap(),
-                    0,
-                )
-                .unwrap();
-
-            if file_date < time_range.start {
-                None
-            } else {
-                let date = date.as_slice()[1..4].iter().map(|s| s.to_string());
-
-                let date = RelativePathBuf::from_iter(date);
-
-                parquet_files.entry(date).or_default().push(file);
-                Some("")
-            }
-        })
-        .for_each(|_| {});
+    for filter in time_filter_expr {
+        selected_files.retain(|file| !file.can_be_pruned(&filter))
+    }
+
+    for file in selected_files.into_iter() {
+        let parts = file.file_path.split('/').collect_vec();
+        if parts.len() < 4 { continue; }
+        let y = parts.get(1).and_then(|s| s.get(5..9)).and_then(|s| s.parse::<i32>().ok());
+        let m = parts.get(1).and_then(|s| s.get(10..12)).and_then(|s| s.parse::<u32>().ok());
+        let d = parts.get(1).and_then(|s| s.get(13..15)).and_then(|s| s.parse::<u32>().ok());
+        let h = parts.get(2).and_then(|s| s.get(5..7)).and_then(|s| s.parse::<u32>().ok());
+        let n = parts.get(3).and_then(|s| s.get(7..9)).and_then(|s| s.parse::<u32>().ok());
+        let (Some(y), Some(m), Some(d), Some(h), Some(n)) = (y, m, d, h, n) else { continue };
+        if let Some(file_date) = Utc.with_ymd_and_hms(y, m, d, h, n, 0).single() {
+            if file_date < time_range.start { continue; }
+            let date = RelativePathBuf::from_iter(parts[1..4].iter().map(|s| s.to_string()));
+            parquet_files.entry(date).or_default().push(file);
+        }
+    }
```
src/metastore/metastores/object_store_metastore.rs (2)
67-79: Don’t panic in trait methods; return a structured “not implemented” error. unimplemented!() will crash the process if invoked via the trait object.
Apply:
```diff
-    async fn initiate_connection(&self) -> Result<(), MetastoreError> {
-        unimplemented!()
-    }
+    async fn initiate_connection(&self) -> Result<(), MetastoreError> {
+        Err(MetastoreError::Error {
+            status_code: StatusCode::NOT_IMPLEMENTED,
+            message: "initiate_connection is not implemented for ObjectStoreMetastore".into(),
+            flow: "initiate_connection".into(),
+        })
+    }
@@
-    async fn list_objects(&self) -> Result<(), MetastoreError> {
-        unimplemented!()
-    }
+    async fn list_objects(&self) -> Result<(), MetastoreError> {
+        Err(MetastoreError::Error {
+            status_code: StatusCode::NOT_IMPLEMENTED,
+            message: "list_objects is not implemented for ObjectStoreMetastore".into(),
+            flow: "list_objects".into(),
+        })
+    }
@@
-    async fn get_object(&self) -> Result<(), MetastoreError> {
-        unimplemented!()
-    }
+    async fn get_object(&self) -> Result<(), MetastoreError> {
+        Err(MetastoreError::Error {
+            status_code: StatusCode::NOT_IMPLEMENTED,
+            message: "get_object is not implemented for ObjectStoreMetastore".into(),
+            flow: "get_object".into(),
+        })
+    }
```
319-343: Fix the flow label for the bad-mode error. Make the flow string match the function for easier tracing.
Apply:
```diff
-    flow: "get_all_streams with mode".into(),
+    flow: "get_all_stream_jsons with mode".into(),
```
src/catalog/mod.rs (2)
300-303: Partition update routing heuristic may be brittle. Checking manifest_path.contains(manifest_path("")) risks false positives if paths change format. Prefer an explicit flag in ManifestItem or a helper that validates the backend/type.
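Sketch of replacing the string heuristic with an explicit marker; the new field and enum are assumptions layered on top of the existing ManifestItem, not current code:

```rust
use serde::{Deserialize, Serialize};

// Where the manifest actually lives; replaces `manifest_path.contains(...)`.
#[derive(Clone, Copy, Debug, Serialize, Deserialize)]
pub enum ManifestBackend {
    ObjectStore,
    Metastore,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct ManifestItem {
    pub manifest_path: String,
    pub backend: ManifestBackend,
    // ... existing bounds/stats fields ...
}
```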
386-404: Upper-bound construction: simplify or normalize the error type. The NaiveTime values are always valid; you can drop ok_or(...) and unwrap safely, or map the IOError explicitly to ObjectStorageError to avoid relying on a From<io::Error> impl.
```diff
-        .and_time(
-            NaiveTime::from_num_seconds_from_midnight_opt(23 * 3600 + 59 * 60 + 59, 999_999_999)
-                .ok_or(IOError::other("Failed to create upper bound for manifest"))?,
-        )
+        .and_time(
+            NaiveTime::from_num_seconds_from_midnight_opt(23 * 3600 + 59 * 60 + 59, 999_999_999)
+                .unwrap(),
+        )
```
src/query/stream_schema_provider.rs (1)
174-185: Typo: get_hottier_exectuion_plan. Nit: rename to get_hottier_execution_plan for consistency.
src/metastore/metastore_traits.rs (3)
36-38: Use a single async-trait macro across the codebase. Elsewhere uses async_trait::async_trait; here it’s tonic::async_trait. Standardize to the former to avoid confusion.
```diff
-use tonic::async_trait;
+use async_trait::async_trait;
```
And update the attribute accordingly.
126-131: Schema methods look fine; minor API polish. Consider accepting &Schema in put_schema to avoid clones by callers when they need to retain ownership.
```diff
-    async fn put_schema(&self, obj: Schema, stream_name: &str) -> Result<(), MetastoreError>;
+    async fn put_schema(&self, obj: &Schema, stream_name: &str) -> Result<(), MetastoreError>;
```
153-156: MetastoreObject bounds. Add Send to allow objects to cross tasks safely in async contexts.
```diff
-pub trait MetastoreObject: ErasedSerialize + Sync {
+pub trait MetastoreObject: ErasedSerialize + Send + Sync {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- src/catalog/mod.rs (6 hunks)
- src/enterprise/utils.rs (5 hunks)
- src/metastore/metastore_traits.rs (1 hunks)
- src/metastore/metastores/object_store_metastore.rs (1 hunks)
- src/metastore/mod.rs (1 hunks)
- src/query/mod.rs (4 hunks)
- src/query/stream_schema_provider.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/query/mod.rs
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
src/metastore/metastores/object_store_metastore.rs
src/query/stream_schema_provider.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
Applied to files:
src/enterprise/utils.rs
src/catalog/mod.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/enterprise/utils.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/enterprise/utils.rs
🧬 Code graph analysis (2)
src/metastore/mod.rs (3)
src/handlers/http/query.rs (2)
from (603-605), status_code (587-593)
src/storage/localfs.rs (2)
from (560-562), new (100-102)
src/storage/s3.rs (2)
from (862-870), from (874-876)
src/query/stream_schema_provider.rs (1)
src/parseable/mod.rs (2)
storage (282-284), new (178-192)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (10)
src/catalog/mod.rs (6)
116-122: Metastore-backed stream.json load — looks good. Error mapping to ObjectStorageError::MetastoreError keeps the surface consistent.
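For reference, a minimal sketch of the approved mapping pattern, assuming PARSEABLE, get_stream_json, and the MetastoreError variant from this PR (load_stream_json_mapped is a hypothetical wrapper):

async fn load_stream_json_mapped(stream: &str) -> Result<Bytes, ObjectStorageError> {
    PARSEABLE
        .metastore
        .get_stream_json(stream, false)
        .await
        // to_detail() yields a structured MetastoreErrorDetail, which the
        // ObjectStorageError::MetastoreError variant boxes
        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))
}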
126-129: Refactor to partition-first processing — good direction. Keeps the critical path free of storage handles and aligns with the new abstraction.
304-329: Happy path for existing-manifest updates — OK. Fetching via metastore, applying changes, then persisting is consistent with the new model. Verify that manifests[pos].manifest_path is always a fully qualified path acceptable to all Metastore implementations.
365-383: Finalizing snapshot via metastore — OK. Persisting stats and snapshot with put_stream_json is aligned with the abstraction.
434-447: Manifest write + path resolution through metastore — LGTM. Storing the resolved path in ManifestItem.manifest_path is the right move.
464-469: Snapshot inline update path — OK. Persisting via put_stream_json keeps both snapshot and stats in sync.
src/query/stream_schema_provider.rs (3)
490-498: Metastore-backed stream.json load — good. The serde error is mapped to Plan; consistent with the rest of the planner.
514-517: Merging multi-JSON snapshots from metastore — OK. Keeps Query/Prism behavior intact with centralized metadata.
561-562: Updated collect_from_snapshot signature usage — OK. Passing stream_name in aligns with metastore lookups.
src/metastore/metastore_traits.rs (1)
94-106: get_manifest signature is versatile — LGTM. Including manifest_url enables backends to shortcut lookups when paths are known.
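A sketch of how an ObjectStore-backed implementation might use that shortcut; derive_manifest_path is a hypothetical helper and error conversions are assumed, only the get_manifest signature itself is from this PR:

async fn get_manifest(
    &self,
    stream_name: &str,
    lower_bound: DateTime<Utc>,
    upper_bound: DateTime<Utc>,
    manifest_url: Option<String>,
) -> Result<Option<Manifest>, MetastoreError> {
    let path = match manifest_url {
        // shortcut: the caller already knows the exact object path
        Some(p) => RelativePathBuf::from(p),
        // otherwise derive one from the time bounds (hypothetical helper)
        None => self.derive_manifest_path(stream_name, lower_bound, upper_bound),
    };
    match self.storage.get_object(&path).await {
        Ok(bytes) => Ok(Some(serde_json::from_slice(&bytes)?)),
        // a real impl should distinguish NotFound from other storage errors
        Err(_) => Ok(None),
    }
}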
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/catalog/mod.rs (3)
373-382: Write path OK, but still susceptible to RMW races. This put_stream_json is good; pair it with a revision from the corresponding get_stream_json to avoid overwriting concurrent updates.
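One way to close that race, sketched under the assumption of a hypothetical revision token on reads and a compare-and-swap write; get_stream_json_versioned, put_stream_json_if, and is_revision_conflict do not exist in this PR:

async fn update_stream_json_guarded(stream: &str) -> Result<(), MetastoreError> {
    loop {
        let (bytes, revision) = PARSEABLE
            .metastore
            .get_stream_json_versioned(stream) // assumed: returns (Bytes, u64)
            .await?;
        let mut meta: ObjectStoreFormat = serde_json::from_slice(&bytes)?; // conversion assumed
        meta.first_event_at = None; // stand-in for the real mutation
        // write succeeds only if nobody else bumped the revision meanwhile
        match PARSEABLE
            .metastore
            .put_stream_json_if(&meta, stream, revision) // assumed: CAS-style put
            .await
        {
            Ok(()) => return Ok(()),
            Err(e) if e.is_revision_conflict() => continue, // assumed predicate; retry
            Err(e) => return Err(e),
        }
    }
}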
483-499: Mixing metastore read with object-store write will corrupt metadata; first_event_at is dropped. You load meta via metastore.get_stream_json but persist only meta.snapshot to object storage. This diverges sources and loses first_event_at/stats. Persist through the metastore. Apply this diff:
- let manifests = &mut meta.snapshot.manifest_list;
- // Filter out items whose manifest_path contains any of the dates_to_delete
- manifests.retain(|item| !dates.iter().any(|date| item.manifest_path.contains(date)));
+ let manifests = &mut meta.snapshot.manifest_list;
+ // Filter by manifest time-bounds to be metastore-agnostic
+ manifests.retain(|item| {
+     let lower = item.time_lower_bound.date_naive().to_string();
+     let upper = item.time_upper_bound.date_naive().to_string();
+     let key = if lower == upper { lower } else { format!("{lower}:{upper}") };
+     !dates.iter().any(|d| d == &key)
+ });
  PARSEABLE.get_stream(stream_name)?.reset_first_event_at();
  meta.first_event_at = None;
- storage.put_snapshot(stream_name, meta.snapshot).await?;
+ PARSEABLE
+     .metastore
+     .put_stream_json(&meta, stream_name)
+     .await
+     .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
495-495: Don’t rely on manifest_path.contains(date); compare against bounds-derived keys. String contains breaks for alternative metastores and different path schemes. The diff above switches to comparing {lower}:{upper} or just {lower} derived from the item’s time bounds, matching partition_path semantics.
♻️ Duplicate comments (4)
src/metastore/metastore_traits.rs (1)
39-41: Fix unusable generic object APIs (add selectors and concrete return types, or remove them). These stubs are not implementable or useful as-is. Either delete them and rely on the resource-specific methods, or add selectors and concrete return types.
- async fn list_objects(&self) -> Result<(), MetastoreError>;
- async fn get_object(&self) -> Result<(), MetastoreError>;
+ async fn list_objects(
+     &self,
+     prefix: &str,
+     limit: Option<usize>,
+ ) -> Result<Vec<String>, MetastoreError>;
+ async fn get_object(&self, path: &str) -> Result<Option<Bytes>, MetastoreError>;
src/query/stream_schema_provider.rs (1)
417-429: Don’t panic on missing manifests; handle None and continue (or error).
The .expect(...) call will crash queries when a manifest is missing or compacted. Map the Option into control flow with a warning, or return a descriptive error.
- for manifest_item in snapshot.manifests(time_filters) {
-     manifest_files.push(
-         PARSEABLE
-             .metastore
-             .get_manifest(
-                 stream_name,
-                 manifest_item.time_lower_bound,
-                 manifest_item.time_upper_bound,
-                 Some(manifest_item.manifest_path),
-             )
-             .await
-             .map_err(|e| DataFusionError::Plan(e.to_string()))?
-             .expect("Data is invalid for Manifest"),
-     )
- }
+ for manifest_item in snapshot.manifests(time_filters) {
+     let maybe = PARSEABLE
+         .metastore
+         .get_manifest(
+             stream_name,
+             manifest_item.time_lower_bound,
+             manifest_item.time_upper_bound,
+             Some(manifest_item.manifest_path),
+         )
+         .await
+         .map_err(|e| DataFusionError::Plan(e.to_string()))?;
+     if let Some(m) = maybe {
+         manifest_files.push(m);
+     } else {
+         tracing::warn!(
+             "Manifest missing for stream={} [{:?} - {:?}]",
+             stream_name,
+             manifest_item.time_lower_bound,
+             manifest_item.time_upper_bound
+         );
+     }
+ }
65-71: Box the actual error, not to_detail() (compile failure).
to_detail() is likely a String and doesn’t implement Error; boxing it won’t compile. Box e instead.
- .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?,
+ .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e)))?,
93-109: Avoid panic on missing manifest; return error or warn-and-skip.
.expect("Data is invalid for Manifest")will crash callers. Handle Option explicitly and also fix theto_detail()boxing.- let mut manifest_files = Vec::new(); - - for manifest_item in merged_snapshot.manifests(&time_filters) { - manifest_files.push( - PARSEABLE - .metastore - .get_manifest( - stream, - manifest_item.time_lower_bound, - manifest_item.time_upper_bound, - Some(manifest_item.manifest_path), - ) - .await - .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))? - .expect("Data is invalid for Manifest"), - ) - } + let mut manifest_files = Vec::new(); + for manifest_item in merged_snapshot.manifests(&time_filters) { + let maybe = PARSEABLE + .metastore + .get_manifest( + stream, + manifest_item.time_lower_bound, + manifest_item.time_upper_bound, + Some(manifest_item.manifest_path), + ) + .await + .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e)))?; + match maybe { + Some(m) => manifest_files.push(m), + None => { + tracing::warn!( + "Manifest missing for stream={} [{:?} - {:?}]", + stream, + manifest_item.time_lower_bound, + manifest_item.time_upper_bound + ); + } + } + }
🧹 Nitpick comments (7)
src/metastore/metastore_traits.rs (3)
153-156: Make MetastoreObject Send for cross-task safety. These objects are passed into async methods; adding Send reduces surprises in multi-threaded executors.
-pub trait MetastoreObject: ErasedSerialize + Sync {
+pub trait MetastoreObject: ErasedSerialize + Send + Sync {
25-26: Unify the async_trait import for consistency. Elsewhere we use async_trait::async_trait. Consider standardizing.
-use tonic::async_trait;
+use async_trait::async_trait;
126-131: Schema API consistency.
get_all_schemas returns a typed Vec<Schema> while get_schema returns raw Bytes. Consider returning Schema (or Option<Schema>) for symmetry.
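A sketch of the symmetric signature under that suggestion; schema_path is a hypothetical helper and the error conversion is assumed:

async fn get_schema(&self, stream_name: &str) -> Result<Option<Schema>, MetastoreError> {
    let path = schema_path(stream_name); // assumed helper for the schema object path
    match self.storage.get_object(&path).await {
        Ok(bytes) => Ok(Some(serde_json::from_slice::<Schema>(&bytes)?)),
        // a real impl should distinguish NotFound from other errors
        Err(_) => Ok(None),
    }
}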
src/query/stream_schema_provider.rs (1)
407-430: Optional: fetch manifests concurrently for latency. Batch the per-window calls with FuturesOrdered/FuturesUnordered and try_collect to reduce end-to-end latency on long ranges.
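One possible shape for that batching, assuming the get_manifest call and surrounding names from this PR (an untested sketch; FuturesOrdered::push_back requires futures 0.3.25+):

use futures::stream::{FuturesOrdered, TryStreamExt};

async fn fetch_manifests_concurrently(
    stream_name: &str,
    items: Vec<ManifestItem>,
) -> Result<Vec<Manifest>, DataFusionError> {
    // issue all fetches up front, then await them together in order
    let mut fetches = FuturesOrdered::new();
    for item in items {
        fetches.push_back(PARSEABLE.metastore.get_manifest(
            stream_name,
            item.time_lower_bound,
            item.time_upper_bound,
            Some(item.manifest_path),
        ));
    }
    // try_collect short-circuits on the first MetastoreError, like the serial loop
    let results: Vec<Option<Manifest>> = fetches
        .try_collect()
        .await
        .map_err(|e| DataFusionError::Plan(e.to_string()))?;
    // drop missing manifests, matching the warn-and-skip fix above
    Ok(results.into_iter().flatten().collect())
}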
src/enterprise/utils.rs (2)
123-156: Avoid the side-effectful filter_map(...).for_each(...); use a straight loop. Improves readability and avoids constructing and discarding dummy values.
- selected_files
-     .into_iter()
-     .filter_map(|file| {
-         let date = file.file_path.split("/").collect_vec();
-
-         let year = &date[1][5..9];
-         let month = &date[1][10..12];
-         let day = &date[1][13..15];
-         let hour = &date[2][5..7];
-         let min = &date[3][7..9];
-         let file_date = Utc
-             .with_ymd_and_hms(
-                 year.parse::<i32>().unwrap(),
-                 month.parse::<u32>().unwrap(),
-                 day.parse::<u32>().unwrap(),
-                 hour.parse::<u32>().unwrap(),
-                 min.parse::<u32>().unwrap(),
-                 0,
-             )
-             .unwrap();
-
-         if file_date < time_range.start {
-             None
-         } else {
-             let date = date.as_slice()[1..4].iter().map(|s| s.to_string());
-             let date = RelativePathBuf::from_iter(date);
-             parquet_files.entry(date).or_default().push(file);
-             Some("")
-         }
-     })
-     .for_each(|_| {});
+ for file in selected_files {
+     let parts = file.file_path.split('/').collect_vec();
+     let year = &parts[1][5..9];
+     let month = &parts[1][10..12];
+     let day = &parts[1][13..15];
+     let hour = &parts[2][5..7];
+     let min = &parts[3][7..9];
+     let file_date = Utc
+         .with_ymd_and_hms(
+             year.parse::<i32>().unwrap(),
+             month.parse::<u32>().unwrap(),
+             day.parse::<u32>().unwrap(),
+             hour.parse::<u32>().unwrap(),
+             min.parse::<u32>().unwrap(),
+             0,
+         )
+         .unwrap();
+     if file_date >= time_range.start {
+         let date = RelativePathBuf::from_iter(parts[1..4].iter().cloned());
+         parquet_files.entry(date).or_default().push(file);
+     }
+ }
128-143: Defensive parsing of partition path segments. Indexing/slicing and unwrap() will panic on unexpected paths. Consider validating the segment count and using get() plus strip_prefix("year=")-style parsing.
434-445: Avoid the double round-trip; let put_manifest return the resolved path/URL. You call put_manifest then get_manifest_path. Expose the path from put_manifest (or return a handle) to reduce race surface and latency, and to support metastores where the path is assigned at write-time.
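Sketched as a hypothetical variant of the put_manifest signature from this PR, with the caller storing the returned path directly:

/// Hypothetical variant: persist the manifest and return the resolved
/// path/URL it was stored under, removing the get_manifest_path round-trip.
async fn put_manifest(
    &self,
    obj: &dyn MetastoreObject,
    stream_name: &str,
    lower_bound: DateTime<Utc>,
    upper_bound: DateTime<Utc>,
) -> Result<String, MetastoreError>;

// Caller side (catalog), under the same assumption:
let path = PARSEABLE
    .metastore
    .put_manifest(&manifest, stream_name, lower_bound, upper_bound)
    .await?;
manifest_item.manifest_path = path;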
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
src/catalog/mod.rs (6 hunks)
src/enterprise/utils.rs (5 hunks)
src/handlers/http/ingest.rs (2 hunks)
src/metastore/metastore_traits.rs (1 hunks)
src/metastore/metastores/object_store_metastore.rs (1 hunks)
src/metastore/mod.rs (1 hunks)
src/query/mod.rs (4 hunks)
src/query/stream_schema_provider.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/query/mod.rs
- src/metastore/metastores/object_store_metastore.rs
- src/metastore/mod.rs
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
Applied to files:
src/enterprise/utils.rs
src/query/stream_schema_provider.rs
src/catalog/mod.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/enterprise/utils.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/enterprise/utils.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
src/query/stream_schema_provider.rs
🧬 Code graph analysis (5)
src/enterprise/utils.rs (1)
- src/parseable/mod.rs (4): storage (282-284), serde_json (338-338), serde_json (344-344), new (178-192)
src/query/stream_schema_provider.rs (1)
- src/parseable/mod.rs (4): storage (282-284), new (178-192), serde_json (338-338), serde_json (344-344)
src/handlers/http/ingest.rs (5)
- src/handlers/http/query.rs (2): status_code (587-593), error_response (595-599)
- src/metastore/mod.rs (1): status_code (149-159)
- src/handlers/http/logstream.rs (2): status_code (578-613), error_response (615-626)
- src/handlers/http/users/dashboards.rs (2): status_code (257-268), error_response (270-279)
- src/handlers/http/users/filters.rs (2): status_code (130-139), error_response (141-152)
src/metastore/metastore_traits.rs (15)
- src/storage/s3.rs (3): get_object (569-571), get_objects (573-613), list_streams (692-697)
- src/metastore/metastores/object_store_metastore.rs (35): get_object (77-79), get_objects (82-90), get_alerts (93-104), put_alert (107-111), delete_alert (114-120), get_targets (486-505), put_target (507-515), delete_target (517-525), get_dashboards (123-141), put_dashboard (144-152), delete_dashboard (155-161), get_filters (165-234), put_filter (237-245), delete_filter (248-255), get_correlations (258-276), put_correlation (279-285), delete_correlation (288-295), get_stream_json (300-315), put_stream_json (355-364), get_all_stream_jsons (318-352), get_all_manifest_files (367-409), get_manifest (412-446), put_manifest (462-472), delete_manifest (474-483), get_manifest_path (449-460), get_all_schemas (527-541), get_schema (543-545), put_schema (547-550), get_parseable_metadata (552-566), get_ingestor_metadata (568-577), put_parseable_metadata (579-587), get_node_metadata (589-604), delete_node_metadata (614-652), put_node_metadata (606-612), list_streams (654-691)
- src/storage/azure_blob.rs (3): get_object (452-454), get_objects (456-496), list_streams (575-580)
- src/storage/localfs.rs (3): get_object (140-159), get_objects (195-238), list_streams (290-310)
- src/storage/gcs.rs (3): get_object (386-388), get_objects (390-430), list_streams (509-514)
- src/storage/object_storage.rs (3): get_object (200-200), get_objects (202-206), list_streams (220-220)
- src/alerts/alert_traits.rs (1): get_targets (70-70)
- src/alerts/alert_types.rs (3): get_targets (282-284), get_object_path (70-72), get_object_id (74-76)
- src/users/dashboards.rs (3): delete_dashboard (294-315), get_object_path (71-79), get_object_id (81-83)
- src/users/filters.rs (3): delete_filter (130-133), get_object_path (50-57), get_object_id (59-61)
- src/catalog/manifest.rs (2): get_object_path (94-96), get_object_id (98-100)
- src/correlation.rs (2): get_object_path (217-219), get_object_id (221-223)
- src/alerts/target.rs (2): get_object_path (336-338), get_object_id (340-342)
- src/storage/store_metadata.rs (2): get_object_path (109-111), get_object_id (113-115)
- src/alerts/alert_structs.rs (2): get_object_path (534-536), get_object_id (530-532)
src/catalog/mod.rs (1)
- src/parseable/mod.rs (3): serde_json (338-338), serde_json (344-344), new (178-192)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: coverage
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (6)
src/handlers/http/ingest.rs (4)
38-38: Importing MetastoreError here is correct. Keeps the handler decoupled while enabling precise status mapping and JSON error bodies.
479-481: Good addition: propagate MetastoreError via a transparent PostError variant. The #[from] + #[error(transparent)] combo preserves the original context and makes conversions ergonomic.
485-501: Grouping + alias import looks fine. The grouped BAD_REQUEST mapping reads clearly and matches similar handlers.
483-516: All wrapped error types already have standalone ResponseError implementations—no further consistency checks needed.src/catalog/mod.rs (2)
108-113: All update_snapshot call sites correctly use the new signature. Verified the sole invocation in src/storage/object_storage.rs now passes only stream_name and manifest_files.
401-402: Check MSRV for std::io::Error::other and ensure error conversion. Confirm the project's MSRV (via rust-version in Cargo.toml or rust-toolchain) supports std::io::Error::other, and implement or verify that impl From<std::io::Error> for ObjectStorageError exists — otherwise the ? operator won't compile.
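For context, a minimal sketch of what that check protects: io::Error::other stabilized in Rust 1.74, and the From impl below is an assumption about this codebase's error type (the Custom variant is a guess at its catch-all):

use std::io;

// io::Error::other(e) is shorthand for io::Error::new(io::ErrorKind::Other, e)
// and requires MSRV >= 1.74.
fn to_io_error(msg: &str) -> io::Error {
    io::Error::other(msg.to_string())
}

// Assumed conversion so the ? operator can lift io::Error into ObjectStorageError.
impl From<io::Error> for ObjectStorageError {
    fn from(e: io::Error) -> Self {
        ObjectStorageError::Custom(e.to_string())
    }
}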
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/handlers/http/mod.rs (1)
90-95: Handle empty/invalid schema lists from the metastore to avoid merge errors. If get_all_schemas returns an empty vec or incompatible schemas, Schema::try_merge can fail; surface a clearer error. Apply:
  pub async fn fetch_schema(stream_name: &str) -> anyhow::Result<arrow_schema::Schema> {
-     let res: Vec<Schema> = PARSEABLE.metastore.get_all_schemas(stream_name).await?;
-
-     let new_schema = Schema::try_merge(res)?;
+     let res: Vec<Schema> = PARSEABLE.metastore.get_all_schemas(stream_name).await?;
+     if res.is_empty() {
+         return Err(anyhow::anyhow!("No schema found for stream '{stream_name}'"));
+     }
+     let new_schema = Schema::try_merge(res)
+         .map_err(|e| anyhow::anyhow!("Failed to merge schemas for '{stream_name}': {e}"))?;
      Ok(new_schema)
  }
346-353: Avoid panics on disk write; propagate errors and keep metastore write atomic-ish.
put_on_disk(...).expect(...) will crash the server on I/O issues. Propagate the error and only attempt the metastore write if the disk write succeeds. Apply:
  fn process_and_store_metadata(
      mut meta: Self,
      staging_path: &Path,
      node_type: NodeType,
  ) -> anyhow::Result<Arc<Self>> {
      Self::update_metadata(&mut meta, &PARSEABLE.options, node_type);
-     meta.put_on_disk(staging_path)
-         .expect("Couldn't write updated metadata to disk");
+     meta.put_on_disk(staging_path)?;
      PARSEABLE.metastore.put_node_metadata(&meta).await?;
      Ok(Arc::new(meta))
  }
356-363: Same here: replace expect with error propagation. Apply:
  async fn store_new_metadata(meta: Self, staging_path: &Path) -> anyhow::Result<Arc<Self>> {
-     meta.put_on_disk(staging_path)
-         .expect("Couldn't write new metadata to disk");
+     meta.put_on_disk(staging_path)?;
      PARSEABLE.metastore.put_node_metadata(&meta).await?;
      Ok(Arc::new(meta))
  }
491-522: Don’t expect on parsing stream.json; skip bad entries and continue. A single malformed or legacy stream.json will currently panic the handler. Apply:
- for ob in obs {
-     let stream_metadata: ObjectStoreFormat =
-         serde_json::from_slice(&ob).expect("stream.json is valid json");
+ for ob in obs {
+     let stream_metadata: ObjectStoreFormat = match serde_json::from_slice(&ob) {
+         Ok(v) => v,
+         Err(e) => {
+             warn!("Skipping invalid stream.json from metastore: {e}");
+             continue;
+         }
+     };
♻️ Duplicate comments (1)
src/alerts/mod.rs (1)
997-999: Don’t swallow metastore read failures at startup.
unwrap_or_default() hides outages and can drop all alerts. Propagate the error (or fail fast) so startup clearly reflects metastore health. Suggested diff:
- let raw_objects = PARSEABLE.metastore.get_alerts().await.unwrap_or_default();
+ let raw_objects = match PARSEABLE.metastore.get_alerts().await {
+     Ok(v) => v,
+     Err(e) => {
+         error!("Failed to load alerts from metastore: {e}");
+         return Err(anyhow::Error::new(e));
+     }
+ };
🧹 Nitpick comments (7)
src/alerts/mod.rs (3)
67-67: Consider trimming ObjectStorageError if unused here.
If alerts no longer directly touch object storage, consider removing this import and the AlertError variant in a follow-up to reduce surface area. If other paths still use it, ignore.
139-139: Persist-on-migrate: confirm idempotency and duplicates.
Ensure metastore.put_alert is idempotent for existing IDs (upsert) to avoid duplicate alerts if migrate_from_v1 is retried. Consider emitting a metric for migrate success/failure.
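A sketch of what idempotence means here, assuming put_alert keys objects by the alert's stable id (so re-running migration overwrites rather than duplicates; migrate_v1_alert is a hypothetical helper):

// Hypothetical retry-safe migration step: put_alert writes to a path derived
// from the alert's stable id, so repeating it is an overwrite, not a duplicate.
async fn migrate_v1_alert(alert: AlertConfig) -> Result<(), AlertError> {
    // same id -> same object path -> last write wins; safe to retry
    PARSEABLE.metastore.put_alert(&alert).await?;
    Ok(())
}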
980-981: HTTP mapping: consider 503 for transient metastore issues.
Mapping all MetastoreError to 500 is acceptable. If the error can indicate backend unavailability or timeouts, consider mapping those to 503 Service Unavailable.
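A sketch of that split, assuming a hypothetical is_transient() predicate on the error detail (the 500 fallback matches current behavior; the final arm is illustrative):

use actix_web::http::StatusCode;

// Map backend outages/timeouts to 503 so clients know to retry,
// and keep 500 for genuine server-side failures.
fn alert_error_status(err: &AlertError) -> StatusCode {
    match err {
        AlertError::MetastoreError(e) if e.is_transient() => StatusCode::SERVICE_UNAVAILABLE, // assumed predicate
        AlertError::MetastoreError(_) => StatusCode::INTERNAL_SERVER_ERROR,
        _ => StatusCode::BAD_REQUEST, // illustrative; the real mapping is per-variant
    }
}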
src/handlers/http/modal/mod.rs (2)
323-338: Load-from-storage swallows errors; prefer bubbling up for diagnosability. load_from_storage returns Vec<NodeMetadata> and drops metastore errors; load_node_metadata can't distinguish an empty read from a failed one. Consider changing to anyhow::Result<Vec<NodeMetadata>> and handling at the call site, or at least log the error details.
365-382: Include error context and avoid generic logs when the metastore read fails. Apply:
- async fn load_from_storage(node_type: NodeType) -> Vec<NodeMetadata> {
-     let obs = PARSEABLE.metastore.get_node_metadata(node_type).await;
+ async fn load_from_storage(node_type: NodeType) -> Vec<NodeMetadata> {
+     let obs = PARSEABLE.metastore.get_node_metadata(node_type).await;
      let mut metadata = vec![];
      if let Ok(obs) = obs {
          for object in obs {
              //convert to NodeMetadata
              match serde_json::from_slice::<NodeMetadata>(&object) {
                  Ok(node_metadata) => metadata.push(node_metadata),
                  Err(e) => error!("Failed to deserialize NodeMetadata: {:?}", e),
              }
          }
-     } else {
-         error!("Couldn't read from storage");
+     } else if let Err(e) = obs {
+         error!("Couldn't read node metadata from metastore: {:?}", e);
      }
      // Return the metadata
      metadata
  }
813-836: Optional: delete per-node-type concurrently to reduce latency. The four sequential deletes can be joined. Apply this diff:
- // Delete ingestor metadata
- let removed_ingestor = PARSEABLE
-     .metastore
-     .delete_node_metadata(&domain_name, NodeType::Ingestor)
-     .await?;
- // Delete indexer metadata
- let removed_indexer = PARSEABLE
-     .metastore
-     .delete_node_metadata(&domain_name, NodeType::Indexer)
-     .await?;
- // Delete querier metadata
- let removed_querier = PARSEABLE
-     .metastore
-     .delete_node_metadata(&domain_name, NodeType::Querier)
-     .await?;
- // Delete prism metadata
- let removed_prism = PARSEABLE
-     .metastore
-     .delete_node_metadata(&domain_name, NodeType::Prism)
-     .await?;
+ let (removed_ingestor, removed_indexer, removed_querier, removed_prism) = futures::future::join4(
+     PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Ingestor),
+     PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Indexer),
+     PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Querier),
+     PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Prism),
+ ).await;
+ let removed_ingestor = removed_ingestor?;
+ let removed_indexer = removed_indexer?;
+ let removed_querier = removed_querier?;
+ let removed_prism = removed_prism?;
769-775: Nit: variable naming.
dresses is likely meant to be metrics; rename for clarity. Apply:
- let dresses = fetch_cluster_metrics().await.map_err(|err| {
+ let metrics = fetch_cluster_metrics().await.map_err(|err| {
  ...
- Ok(actix_web::HttpResponse::Ok().json(dresses))
+ Ok(actix_web::HttpResponse::Ok().json(metrics))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
Cargo.toml (1 hunks)
src/alerts/alert_structs.rs (2 hunks)
src/alerts/alert_types.rs (5 hunks)
src/alerts/mod.rs (8 hunks)
src/handlers/http/cluster/mod.rs (4 hunks)
src/handlers/http/mod.rs (2 hunks)
src/handlers/http/modal/ingest_server.rs (2 hunks)
src/handlers/http/modal/mod.rs (5 hunks)
src/handlers/http/oidc.rs (1 hunks)
src/handlers/http/role.rs (1 hunks)
src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- src/handlers/http/oidc.rs
- src/alerts/alert_structs.rs
- src/handlers/http/role.rs
- src/lib.rs
- src/alerts/alert_types.rs
- Cargo.toml
- src/handlers/http/modal/ingest_server.rs
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-04-07T13:23:10.092Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1288
File: src/handlers/http/modal/mod.rs:279-301
Timestamp: 2025-04-07T13:23:10.092Z
Learning: For critical operations like writing metadata to disk in NodeMetadata::put_on_disk(), it's preferred to let exceptions propagate (using expect/unwrap) rather than trying to recover with fallback mechanisms, as the failure indicates a fundamental system issue that needs immediate attention.
Applied to files:
src/alerts/mod.rs
📚 Learning: 2025-07-24T11:09:21.781Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.
Applied to files:
src/alerts/mod.rs
📚 Learning: 2025-06-16T09:50:38.636Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
Applied to files:
src/handlers/http/cluster/mod.rs
🧬 Code graph analysis (3)
src/alerts/mod.rs (2)
- src/parseable/mod.rs (1): storage (282-284)
- src/alerts/alert_enums.rs (1): from (44-50)
src/handlers/http/cluster/mod.rs (2)
- src/parseable/mod.rs (1): storage (282-284)
- src/handlers/http/modal/mod.rs (4): node_type (569-569), node_type (582-584), domain_name (567-567), domain_name (574-576)
src/handlers/http/modal/mod.rs (2)
- src/metastore/metastore_traits.rs (2): get_object_path (154-154), get_object_id (155-155)
- src/parseable/mod.rs (1): new (178-192)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (10)
src/alerts/mod.rs (4)
59-59: Good: wire MetastoreError into alerts module.
Import aligns with new error variant usage.
107-107: Signature simplification is fine.
Dropping the store parameter makes migration independent of storage choice; call sites below are updated.
951-953: AlertError: transparent MetastoreError variant looks good.
This enables ? propagation from metastore operations.
1016-1020: Call sites updated to new migrate_from_v1 signature.
Looks correct. Also applies to: 1036-1040.
src/handlers/http/mod.rs (1)
27-27: Import change looks good. Brings PARSEABLE into scope for metastore access; no issues spotted.
src/handlers/http/modal/mod.rs (2)
36-41: MetastoreObject import is appropriate. Required for the impl below; no issues.
276-284: Verify keying semantics: object_id vs deletion by domain_name.
get_object_id() returns node_id, while the delete APIs in cluster use domain_name. Ensure the metastore implementation indexes/searches consistently to avoid orphaned or duplicate entries. Would you like me to add a helper in the metastore to delete by node_id (and have cluster call it), or to make object_id derive from domain_name?
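If the first option is taken, the helper might look roughly like this (a sketch; delete_node_metadata_by_id is hypothetical, and it assumes NodeMetadata exposes node_id, domain_name, and node_type):

// Hypothetical addition to the Metastore trait: delete by stable node_id
// instead of the mutable domain_name.
async fn delete_node_metadata_by_id(
    &self,
    node_id: &str,
    node_type: NodeType,
) -> Result<bool, MetastoreError> {
    let all = self.get_node_metadata(node_type).await?;
    for bytes in all {
        if let Ok(meta) = serde_json::from_slice::<NodeMetadata>(&bytes) {
            if meta.node_id == node_id {
                // reuse the existing domain-keyed delete for the actual removal
                return self.delete_node_metadata(&meta.domain_name, meta.node_type).await;
            }
        }
    }
    Ok(false) // nothing matched
}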
src/handlers/http/cluster/mod.rs (3)
47-47: Mode import addition is correct. Needed for get_all_stream_jsons(..., Some(Mode::Ingest)).
58-58: Updated modal imports look consistent. Aliasing and traits line up with generic usages below.
777-798: Generic decode and log-on-fail approach is fine. Deserializing into the requested T and filtering out invalid entries is acceptable here.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
src/parseable/streams.rs (3)
674-680: Avoid appending to stale .part files; truncate instead. Opening the part file with append(true) risks duplicating data if a previous run left a .part behind. Use write(true) + truncate(true) to start clean.
- let mut part_file = OpenOptions::new()
-     .create(true)
-     .append(true)
-     .open(part_path)
+ let mut part_file = OpenOptions::new()
+     .create(true)
+     .write(true)
+     .truncate(true)
+     .open(part_path)
      .map_err(|_| StagingError::Create)?;
540-549: Don’t default the time sort to column 0 when the partition field is missing. Falling back to index 0 silently sorts on the wrong column and sets encoding on a non-existent column path. Guard these operations behind a successful index lookup.
- let time_partition_field = time_partition.map_or(DEFAULT_TIMESTAMP_KEY, |tp| tp.as_str());
-
- // Find time partition index
- let time_partition_idx = merged_schema.index_of(time_partition_field).unwrap_or(0);
-
- let mut props = WriterProperties::builder()
+ let time_partition_field = time_partition.map_or(DEFAULT_TIMESTAMP_KEY, |tp| tp.as_str());
+ let mut props = WriterProperties::builder()
      .set_max_row_group_size(self.options.row_group_size)
      .set_compression(self.options.parquet_compression.into())
-     .set_column_encoding(
-         ColumnPath::new(vec![time_partition_field.to_string()]),
-         Encoding::DELTA_BINARY_PACKED,
-     );
-
- // Create sorting columns
- let mut sorting_column_vec = vec![SortingColumn {
-     column_idx: time_partition_idx as i32,
-     descending: true,
-     nulls_first: true,
- }];
+     ;
+
+ // Create sorting columns
+ let mut sorting_column_vec = Vec::new();
+ if let Ok(idx) = merged_schema.index_of(time_partition_field) {
+     props = props.set_column_encoding(
+         ColumnPath::new(vec![time_partition_field.to_string()]),
+         Encoding::DELTA_BINARY_PACKED,
+     );
+     sorting_column_vec.push(SortingColumn {
+         column_idx: idx as i32,
+         descending: true,
+         nulls_first: true,
+     });
+ } else {
+     warn!(
+         "Time partition field '{}' not found in schema; skipping time sort/encoding.",
+         time_partition_field
+     );
+ }
Also applies to: 551-557, 559-572
486-500: Update outdated comment to match actual schema path
The code writes {stream_name}.schema under data_path and get_schemas_if_present loads all *.schema files there. Remove or correct the comment “the path should be stream/.ingestor.{id}.schema” to reflect the actual {stream_name}.schema filename.
src/storage/object_storage.rs (3)
456-487: Don’t swallow all metastore errors; fallback only on NotFound.
match ... { Ok(..) => .., Err(_) => { ...fallback... } }hides network or authorization errors. Restrict fallback to a “not found” condition; propagate others.- let stream_metadata = match PARSEABLE - .metastore - .get_stream_json(stream_name, false) - .await - { - Ok(data) => data, - Err(_) => { + let stream_metadata = match PARSEABLE + .metastore + .get_stream_json(stream_name, false) + .await + { + Ok(data) => data, + Err(e) if e.to_detail().status_code == http::StatusCode::NOT_FOUND => { // get the base stream metadata let bytes = PARSEABLE .metastore .get_stream_json(stream_name, true) .await .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; let mut config = serde_json::from_slice::<ObjectStoreFormat>(&bytes) - .expect("parseable config is valid json"); + .map_err(|e| ObjectStorageError::Invalid(anyhow::anyhow!(e)))?; if PARSEABLE.options.mode == Mode::Ingest { config.stats = FullStats::default(); config.snapshot.manifest_list = vec![]; } PARSEABLE .metastore .put_stream_json(&config, stream_name) .await .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; bytes } + Err(e) => return Err(ObjectStorageError::MetastoreError(Box::new(e.to_detail()))), };Note: if
to_detail().status_codeisn’t available, switch on the crate’s concrete NotFound variant instead.
510-526: Bug: treatingResultasOptionbreaks the ingestor creation logic.
get_all_stream_jsons(...).awaitreturns aResult<_, _>, but the code uses.into_iter().next()andis_empty()as if it were aVec. This won’t compile.- if let Some(stream_metadata_obs) = - PARSEABLE.metastore.get_all_stream_jsons(stream_name, true).await - .into_iter().next() - && !stream_metadata_obs.is_empty() - { + if let Ok(stream_metadata_obs) = + PARSEABLE.metastore.get_all_stream_jsons(stream_name, Some(Mode::Ingest)).await + { + if stream_metadata_obs.is_empty() { + return Ok(Bytes::new()); + } let querier_stream_metadata = serde_json::from_slice::<ObjectStoreFormat>(&stream_metadata_obs[0])?; let stream_metadata = ObjectStoreFormat { stats: FullStats::default(), snapshot: Snapshot::default(), ..querier_stream_metadata }; let stream_metadata_bytes: Bytes = serde_json::to_vec(&stream_metadata)?.into(); PARSEABLE .metastore .put_stream_json(&stream_metadata, stream_name) .await .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; return Ok(stream_metadata_bytes); - } + }
539-589: SameResultvsOptionissue here; fix the let-chain and error handling.Use
if let Ok(vec)and early-return on empty; current code tries to call.is_empty()on aResult.- if let Some(stream_metadata_obs) = PARSEABLE - .metastore - .get_all_stream_jsons(stream_name, Some(Mode::Ingest)) - .await - .into_iter() - .next() - && !stream_metadata_obs.is_empty() - { + if let Ok(stream_metadata_obs) = PARSEABLE + .metastore + .get_all_stream_jsons(stream_name, Some(Mode::Ingest)) + .await + { + if stream_metadata_obs.is_empty() { + return Ok(Bytes::new()); + } for stream_metadata_bytes in stream_metadata_obs.iter() { let stream_ob_metadata = - serde_json::from_slice::<ObjectStoreFormat>(stream_metadata_bytes)?; + serde_json::from_slice::<ObjectStoreFormat>(stream_metadata_bytes) + .map_err(|e| ObjectStorageError::Invalid(anyhow::anyhow!(e)))?; all_log_sources.extend(stream_ob_metadata.log_source.clone()); } … let stream_ob_metadata = - serde_json::from_slice::<ObjectStoreFormat>(&stream_metadata_obs[0])?; + serde_json::from_slice::<ObjectStoreFormat>(&stream_metadata_obs[0]) + .map_err(|e| ObjectStorageError::Invalid(anyhow::anyhow!(e)))?; … PARSEABLE .metastore .put_stream_json(&stream_metadata, stream_name) .await .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; return Ok(stream_metadata_bytes); }src/alerts/target.rs (1)
112-120: Make delete atomic w.r.t. persistenceYou remove from memory before deleting from metastore. If delete_target() fails, memory diverges from durable state. Delete in metastore first, then remove from the map.
- let target = self - .target_configs - .write() - .await - .remove(target_id) - .ok_or(AlertError::InvalidTargetID(target_id.to_string()))?; - PARSEABLE.metastore.delete_target(&target).await?; - Ok(target) + // fetch without mutating memory + let target = self + .target_configs + .read() + .await + .get(target_id) + .cloned() + .ok_or(AlertError::InvalidTargetID(target_id.to_string()))?; + + // persist first + PARSEABLE.metastore.delete_target(&target).await?; + + // then update memory + let removed = self + .target_configs + .write() + .await + .remove(target_id) + .expect("target must exist after successful metastore delete"); + Ok(removed)src/query/mod.rs (1)
525-538: Don’t swallow metastore errors when enumerating stream.jsonThe if let Ok(obs) = obs silently hides failures and can undercount manifests. Propagate the error; optionally warn on per-file JSON parse failures.
- let obs = PARSEABLE - .metastore - .get_all_stream_jsons(stream_name, None) - .await; - if let Ok(obs) = obs { - for ob in obs { - if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) { - let snapshot = object_store_format.snapshot; - for manifest in snapshot.manifest_list { - merged_snapshot.manifest_list.push(manifest); - } - } - } - } + use tracing::warn; + let obs = PARSEABLE + .metastore + .get_all_stream_jsons(stream_name, None) + .await?; + for ob in obs { + match serde_json::from_slice::<ObjectStoreFormat>(&ob) { + Ok(osf) => merged_snapshot.manifest_list.extend(osf.snapshot.manifest_list), + Err(e) => warn!("Skipping invalid stream.json for {stream_name}: {e}"), + } + }src/alerts/mod.rs (1)
995-1010: Don’t hold the alerts write lock across awaits in load().load() acquires a write lock and then awaits (JSON parse, migration, metastore I/O, task scheduling). This blocks all readers/writers and risks head-of-line blocking. Take short, localized locks only when inserting into the map.
Apply this diff to narrow lock scope:
- let mut map = self.alerts.write().await; + // Avoid holding the write lock across async calls; take short, localized locks only when inserting. @@ - if alert.get_state().eq(&AlertState::Disabled) { - map.insert(*alert.get_id(), alert); - continue; - } + if alert.get_state().eq(&AlertState::Disabled) { + self.alerts.write().await.insert(*alert.get_id(), alert); + continue; + } @@ - map.insert(*alert.get_id(), alert); + self.alerts.write().await.insert(*alert.get_id(), alert);Also applies to: 1063-1066, 1083-1084
♻️ Duplicate comments (13)
src/handlers/http/role.rs (1)
145-150: Nice: panic removed in favor of a typed error.Replacing
expect("metadata is initialized")with an explicit error is the right call.src/handlers/http/oidc.rs (1)
447-453: Same serde_json error mapping issue as before — convert into ObjectStorageError.Mirror the fix used in role handler to ensure compilation and consistent error surfacing.
- Ok(serde_json::from_slice::<StorageMetadata>(&metadata)?) + let metadata: StorageMetadata = serde_json::from_slice(&metadata) + .map_err(|e| ObjectStorageError::Invalid(anyhow::anyhow!(e)))?; + Ok(metadata)Optional: prefer a 404-style error (or
MetastoreErrordetail) instead ofCustomfor the None case.src/storage/object_storage.rs (2)
272-276: Verify all backends implement the newlist_with_delimitersignature.S3/GCS/Azure look covered; LocalFS intentionally returns Unsupported. Please run the check below to confirm no missing impls.
#!/bin/bash set -euo pipefail rg -nP 'trait\s+ObjectStorage\b.*?list_with_delimiter' -C2 src/storage/object_storage.rs echo echo "Implementations touching list_with_delimiter:" rg -n 'list_with_delimiter\s*\(' -C2 src/storage | sed 's/^/ /'
982-999: Removeunwrap()onSchema::try_mergeto avoid panic.Bubble up a typed error; this was flagged previously elsewhere.
- let new_schema = Schema::try_merge(vec![ - schema, - serde_json::from_slice::<Schema>(&stream_schema)?, - ]) - .unwrap(); + let existing: Schema = serde_json::from_slice(&stream_schema) + .map_err(|e| ObjectStorageError::Invalid(anyhow::anyhow!(e)))?; + let new_schema = Schema::try_merge(vec![schema, existing]) + .map_err(|e| ObjectStorageError::Invalid(anyhow::anyhow!(e)))?;src/handlers/http/modal/utils/rbac_utils.rs (2)
24-37: Follow-up: confirm migration intent (reads via metastore, writes to remote+staging).This split reads from metastore but still writes both remote and staging. Verify this is intentional for the migration phase.
26-31: Map serde_json error into ObjectStorageError; avoid panic on missing metadata.Same fix as OIDC/role: convert the serde error and return a 404-style/typed error when metadata is absent.
- Ok(serde_json::from_slice::<StorageMetadata>(&metadata)?) + let metadata: StorageMetadata = serde_json::from_slice(&metadata) + .map_err(|e| ObjectStorageError::Invalid(anyhow::anyhow!(e)))?; + Ok(metadata)Optional:
- .ok_or_else(|| ObjectStorageError::Custom("parseable metadata not initialized".into()))?; + .ok_or_else(|| ObjectStorageError::NotFound("parseable metadata not initialized".into()))?;src/alerts/target.rs (1)
69-73: Persist-first update: good fixSwitching to metastore.put_target() before mutating memory keeps state consistent on failure. Nice.
src/query/mod.rs (1)
549-567: Good: graceful handling for missing manifestSwitch to ok_or_else + QueryError avoids panic and yields a clear error.
src/metastore/metastores/object_store_metastore.rs (3)
98-106: Validate ULID: good upgradeParsing ULID and returning a 400-style MetastoreError is the right behavior.
171-217: Filter migration deletes the whole directory; delete per-file insteaddelete_object(&filters_path) targets the directory. List objects with names, migrate each, and delete only the migrated v1 file.
- // read filter object - let filter_bytes = self - .storage - .get_objects( - Some(&filters_path), - Box::new(|file_name| file_name.ends_with(".json")), - ) - .await?; - - for filter in filter_bytes { - // deserialize into Value - let mut filter_value = serde_json::from_slice::<serde_json::Value>(&filter)?; + // list named objects so we can migrate and delete specific legacy files + let resp = self + .storage + .list_with_delimiter(Some(object_store::path::Path::from(filters_path.to_string()))) + .await?; + for obj in resp.objects { + let obj_path = RelativePathBuf::from(obj.location.to_string()); + let bytes = self.storage.get_object(&obj_path).await?; + let mut filter_value = serde_json::from_slice::<serde_json::Value>(&bytes)?; @@ - if version == Some("v1") { - // delete older version of the filter - self.storage.delete_object(&filters_path).await?; + if version == Some("v1") { + // delete the specific legacy file + self.storage.delete_object(&obj_path).await?; @@ - let filter_bytes = to_bytes(&filter_value); - self.storage.put_object(&path, filter_bytes.clone()).await?; + let filter_bytes = to_bytes(&filter_value); + self.storage.put_object(&path, filter_bytes).await?;
522-539: Don’t panic on malformed schemas; warn and skipMalformed schema bytes shouldn’t crash the server.
- .iter() - // we should be able to unwrap as we know the data is valid schema - .map(|byte_obj| { - serde_json::from_slice(byte_obj) - .unwrap_or_else(|_| panic!("got an invalid schema for stream: {stream_name}")) - }) - .collect()) + .iter() + .filter_map(|bytes| { + serde_json::from_slice::<Schema>(bytes) + .inspect_err(|e| warn!("Invalid schema for stream {}: {}", stream_name, e)) + .ok() + }) + .collect())src/query/stream_schema_provider.rs (1)
416-437: Fixed: no panic on missing manifests; graceful degrade.Replacing expect(...) with get_manifest(...)->Option and warning on None avoids crashes seen under concurrent compaction/missing manifests. This addresses the prior “avoid panic” feedback.
src/metastore/metastore_traits.rs (1)
39-40: Generic getter now has a selector—good.get_objects(&self, parent_path: &str) returns Bytes with a clear selector, addressing the earlier “generic API without selectors” concern.
🧹 Nitpick comments (18)
src/parseable/streams.rs (3)
351-359: Replace expect with graceful skip to avoid crash on odd filesystems.A missing created()/modified() timestamp will panic here. Prefer skipping the file with a warn.
- arrow_files.retain(|path| { - let creation = path - .metadata() - .ok() - .and_then(|meta| meta.created().or_else(|_| meta.modified()).ok()) - .expect("Arrow file should have a valid creation or modified time"); - minute_from_system_time(creation) < minute_from_system_time(exclude) - }); + arrow_files.retain(|path| { + let Some(creation) = path + .metadata() + .ok() + .and_then(|m| m.created().or_else(|_| m.modified()).ok()) + else { + warn!("Skipping arrow file with missing timestamps: {}", path.display()); + return false; + }; + minute_from_system_time(creation) < minute_from_system_time(exclude) + });
756-764: Avoid unwraps in schema update path.Both try_new(...).unwrap() and Schema::try_merge(...).unwrap() can panic during normal operations; return current_schema with a warn on failure.
- let record_reader = MergedRecordReader::try_new(&staging_files).unwrap(); + let record_reader = match MergedRecordReader::try_new(&staging_files) { + Ok(r) => r, + Err(e) => { + warn!("Failed to build record reader for updated schema: {e}"); + return current_schema; + } + }; @@ - Schema::try_merge(vec![schema, current_schema]).unwrap() + match Schema::try_merge(vec![schema, current_schema]) { + Ok(s) => s, + Err(e) => { + warn!("Schema merge failed; returning current schema: {e}"); + current_schema + } + }
1452-1466: Tests: pass the staging directory (not the file path) as arrow_path_to_parquet’s base.This better mirrors production calls, reduces confusion, and still validates the filename logic.
- let result = arrow_path_to_parquet(&file_path, &file_path, random_string); + let result = arrow_path_to_parquet(temp_dir.path(), &file_path, random123);- let result = arrow_path_to_parquet(&file_path, &file_path, random456); + let result = arrow_path_to_parquet(temp_dir.path(), &file_path, random456);Also applies to: 1469-1491
src/storage/object_storage.rs (2)
283-295: Minor: avoid unnecessary clone indirection. let s = &*schema.clone(); followed by s.clone() is roundabout. Use schema.as_ref().clone() or pass an owned Schema directly if put_schema accepts it.
- let s = &*schema.clone();
- PARSEABLE
-     .metastore
-     .put_schema(s.clone(), stream_name)
+ PARSEABLE
+     .metastore
+     .put_schema(schema.as_ref().clone(), stream_name)
      .await
304-311: De-duplicate “get-modify-put” stream_json updates. These blocks repeat parsing, mutation, and saving. Extract a small helper to reduce errors and ensure uniform error mapping.
Example helper (place outside the trait):
async fn update_stream_json<F>(stream: &str, f: F) -> Result<(), ObjectStorageError>
where
    F: FnOnce(&mut ObjectStoreFormat),
{
    let mut fmt: ObjectStoreFormat = serde_json::from_slice(
        &PARSEABLE.metastore.get_stream_json(stream, false).await
            .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?,
    ).map_err(|e| ObjectStorageError::Invalid(anyhow::anyhow!(e)))?;
    f(&mut fmt);
    PARSEABLE.metastore.put_stream_json(&fmt, stream).await
        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))
}
update_stream_json(...).Also applies to: 326-333, 348-355, 392-399, 414-420, 436-442
src/alerts/target.rs (1)
70-71: Avoid unnecessary clone on insert. You own target; no need to clone.
- let mut map = self.target_configs.write().await;
- map.insert(target.id, target.clone());
+ let mut map = self.target_configs.write().await;
+ map.insert(target.id, target);
513-519: Avoid fetching base stream.json when not usedIn Query/Prism modes you immediately switch to get_all_stream_jsons and ignore object_store_format. Gate the initial fetch to non-Query/Prism modes to save I/O.
- let object_store_format: ObjectStoreFormat = serde_json::from_slice( - &PARSEABLE - .metastore - .get_stream_json(stream_name, false) - .await?, - )?; + // Only needed outside Query/Prism modes (handled below) + let object_store_format: Option<ObjectStoreFormat> = if PARSEABLE.options.mode == Mode::Query + || PARSEABLE.options.mode == Mode::Prism + { + None + } else { + Some(serde_json::from_slice( + &PARSEABLE.metastore.get_stream_json(stream_name, false).await?, + )?) + };And later:
- } else { - merged_snapshot = object_store_format.snapshot; - } + } else if let Some(osf) = object_store_format { + merged_snapshot = osf.snapshot; + }
549-567: Deduplicate manifest windows before fetchingMultiple stream.json files can reference the same manifest; fetching twice wastes I/O and risks duplicates in results.
- for manifest_item in merged_snapshot.manifests(&time_filter) { + let mut seen = std::collections::HashSet::new(); + for manifest_item in merged_snapshot.manifests(&time_filter) { + if !seen.insert(manifest_item.manifest_path.clone()) { + continue; // already processed + } let manifest_opt = PARSEABLEWould you like me to add a quick test to assert no duplicate paths are returned for a given time window?
src/metastore/metastores/object_store_metastore.rs (1)
666-677: Exclude non-stream top-level folders like lost+found. Aligns with LocalFS behavior; avoids surfacing non-stream directories.
  .filter(|name| {
      name != PARSEABLE_ROOT_DIRECTORY
          && name != USERS_ROOT_DIR
          && name != SETTINGS_ROOT_DIRECTORY
-         && name != ALERTS_ROOT_DIRECTORY
+         && name != ALERTS_ROOT_DIRECTORY
+         && name != "lost+found"
  })
1011-1044: Unexpected fatal exit on unsupported alert types during load. Returning Err(...) for enterprise-only types (Anomaly/Forecast) aborts the entire load. Consider logging and skipping unsupported alerts so OSS can still serve existing alerts.
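A sketch of the skip-and-log alternative, inside the load loop; the AlertType enum shape is assumed, while the type names come from the comment above:

// Instead of aborting the whole load on enterprise-only alert kinds,
// log and move on so existing OSS alerts still get served.
match alert_config.alert_type {
    AlertType::Anomaly | AlertType::Forecast => {
        // assumed enum/variants; OSS builds skip rather than fail
        warn!("Skipping unsupported alert type for alert {}", alert_config.get_id());
        continue;
    }
    _ => { /* fall through to normal scheduling */ }
}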
src/query/stream_schema_provider.rs (3)
497-505: Deserialize stream.json via metastore; add context to errors. The current mapping loses context (e.g., the stream name). Consider enriching the Plan error for faster diagnosis.
- )
- .map_err(|e| DataFusionError::Plan(e.to_string()))?;
+ )
+ .map_err(|e| DataFusionError::Plan(format!(
+     "failed to deserialize stream.json for stream={}: {}",
+     self.stream, e
+ )))?;
521-536: Don’t silently swallow metastore errors when fetching all stream JSONs. If get_all_stream_jsons fails, the scan proceeds without those snapshots and no telemetry is emitted. At least warn; optionally propagate as a Plan error when running in Query/Prism modes.
- let obs = PARSEABLE
-     .metastore
-     .get_all_stream_jsons(&self.stream, None)
-     .await;
- if let Ok(obs) = obs {
+ let obs = PARSEABLE.metastore.get_all_stream_jsons(&self.stream, None).await;
+ match obs {
+     Ok(obs) => {
          for ob in obs {
              if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
                  let snapshot = object_store_format.snapshot;
                  for manifest in snapshot.manifest_list {
                      merged_snapshot.manifest_list.push(manifest);
                  }
              }
          }
-     }
+     }
+     Err(e) => {
+         tracing::warn!("failed to load stream.jsons from metastore for stream={}: {}", self.stream, e);
+     }
+ }
174-185: Minor: typos and lossy path conversion.
- get_hottier_exectuion_plan: spelling typo in “exectuion”.
- to_str().unwrap() can panic for non-UTF8 paths; prefer to_string_lossy().
Also applies to: 193-201
src/metastore/metastore_traits.rs (5)
25-26: Unify the async_trait import to avoid tonic coupling. Elsewhere the codebase uses async_trait::async_trait. Prefer that for consistency and to avoid an unnecessary tonic dependency surface here.
-use tonic::async_trait;
+use async_trait::async_trait;
@@
-#[async_trait]
+#[async_trait]
 pub trait Metastore: std::fmt::Debug + Send + Sync {
Also applies to: 36-38
98-104: Parameter naming: manifest_url is actually a path. Call it manifest_path to reflect semantics across backends (S3/local). This is a naming-only cleanup but reduces confusion.
- manifest_url: Option<String>,
+ manifest_path: Option<String>,
127-129: Borrow Schema in put_schema to avoid unnecessary moves. Arrow Schema is cheap to clone, but there's no need to move ownership.
- async fn put_schema(&self, obj: Schema, stream_name: &str) -> Result<(), MetastoreError>;
+ async fn put_schema(&self, obj: &Schema, stream_name: &str) -> Result<(), MetastoreError>;
151-154: MetastoreObject should be Send. Many put_* methods accept &dyn MetastoreObject and then await I/O (after serialization). While we serialize before the await, requiring Send is safer for future uses and matches typical trait-object bounds in async contexts.
-pub trait MetastoreObject: ErasedSerialize + Sync {
+pub trait MetastoreObject: ErasedSerialize + Send + Sync {
92-123: Given that the MetastoreObject impl for Manifest has both get_object_path and get_object_id unimplemented (panicking if called) and the put_manifest/delete_manifest implementations never invoke these methods, requiring a &dyn MetastoreObject is unnecessary — and misleading. Change the signatures to take &Manifest directly. Refactor:
In src/metastore/metastore_traits.rs update:
- async fn put_manifest(
-     &self,
-     obj: &dyn MetastoreObject,
+ async fn put_manifest(
+     &self,
+     manifest: &Manifest,
      stream_name: &str,
      lower_bound: DateTime<Utc>,
      upper_bound: DateTime<Utc>,
  ) -> Result<(), MetastoreError>;
- async fn delete_manifest(
-     &self,
-     stream_name: &str,
+ async fn delete_manifest(
+     &self,
+     stream_name: &str,
      lower_bound: DateTime<Utc>,
      upper_bound: DateTime<Utc>,
  ) -> Result<(), MetastoreError>;
object_store_metastore.rs,src/catalog/mod.rs) to pass&manifest: &Manifest. Document thatManifestis not a genericMetastoreObject.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
src/alerts/mod.rs (8 hunks)
src/alerts/target.rs (4 hunks)
src/handlers/http/alerts.rs (3 hunks)
src/handlers/http/modal/utils/rbac_utils.rs (1 hunks)
src/handlers/http/oidc.rs (1 hunks)
src/handlers/http/role.rs (1 hunks)
src/metastore/metastore_traits.rs (1 hunks)
src/metastore/metastores/object_store_metastore.rs (1 hunks)
src/metastore/mod.rs (1 hunks)
src/parseable/streams.rs (1 hunks)
src/query/mod.rs (3 hunks)
src/query/stream_schema_provider.rs (5 hunks)
src/storage/object_storage.rs (15 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/metastore/mod.rs
- src/handlers/http/alerts.rs
🧰 Additional context used
🧠 Learnings (9)
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/parseable/streams.rs
📚 Learning: 2025-09-06T04:26:17.168Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.168Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
src/handlers/http/oidc.rs
src/alerts/target.rs
src/metastore/metastore_traits.rs
src/handlers/http/role.rs
src/alerts/mod.rs
src/storage/object_storage.rs
src/handlers/http/modal/utils/rbac_utils.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
- src/metastore/metastores/object_store_metastore.rs
- src/query/stream_schema_provider.rs
- src/storage/object_storage.rs
📚 Learning: 2025-06-18T06:39:04.775Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Applied to files:
src/query/mod.rs
📚 Learning: 2025-04-07T13:23:10.092Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1288
File: src/handlers/http/modal/mod.rs:279-301
Timestamp: 2025-04-07T13:23:10.092Z
Learning: For critical operations like writing metadata to disk in NodeMetadata::put_on_disk(), it's preferred to let exceptions propagate (using expect/unwrap) rather than trying to recover with fallback mechanisms, as the failure indicates a fundamental system issue that needs immediate attention.
Applied to files:
- src/handlers/http/role.rs
- src/alerts/mod.rs
📚 Learning: 2025-07-24T11:09:21.781Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.
Applied to files:
src/alerts/mod.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
Applied to files:
- src/query/stream_schema_provider.rs
- src/storage/object_storage.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Applied to files:
src/storage/object_storage.rs
🧬 Code graph analysis (10)
src/handlers/http/oidc.rs (1)
- src/parseable/mod.rs (1): new (178-192)

src/alerts/target.rs (5)
- src/correlation.rs (4): update (134-164), update (236-243), get_object_path (217-219), get_object_id (221-223)
- src/users/filters.rs (3): update (124-128), get_object_path (50-57), get_object_id (59-61)
- src/handlers/http/targets.rs (1): update (50-76)
- src/metastore/metastore_traits.rs (2): get_object_path (152-152), get_object_id (153-153)
- src/storage/object_storage.rs (1): target_json_path (1066-1072)

src/metastore/metastores/object_store_metastore.rs (7)
- src/catalog/mod.rs (3): partition_path (528-540), file_name (56-56), file_name (66-68)
- src/storage/object_storage.rs (15): alert_json_path (1060-1062), filter_path (1042-1050), manifest_path (1075-1095), parseable_json_path (1054-1056), schema_path (1008-1020), stream_json_path (1023-1038), to_bytes (1002-1006), get_objects (202-206), new (77-86), serde_json (470-470), serde_json (514-514), serde_json (549-549), serde_json (575-575), serde_json (622-622), name (190-190)
- src/storage/azure_blob.rs (2): get_objects (456-496), name (161-163)
- src/storage/gcs.rs (2): get_objects (390-430), name (122-124)
- src/storage/localfs.rs (4): get_objects (195-238), from (560-562), new (100-102), name (72-74)
- src/storage/s3.rs (4): get_objects (573-613), from (862-870), from (874-876), name (293-295)
- src/storage/mod.rs (3): from (179-185), new (204-206), new (217-223)

src/metastore/metastore_traits.rs (10)
- src/metastore/metastores/object_store_metastore.rs (32): initiate_connection (67-69), get_objects (72-80), get_alerts (83-94), put_alert (97-106), delete_alert (109-115), get_targets (481-500), get_dashboards (118-136), put_dashboard (139-147), delete_dashboard (150-156), get_filters (160-229), put_filter (232-240), delete_filter (243-250), get_correlations (253-271), put_correlation (274-280), delete_correlation (283-290), get_stream_json (295-310), get_all_stream_jsons (313-347), get_all_manifest_files (362-404), get_manifest (407-441), put_manifest (457-467), delete_manifest (469-478), get_manifest_path (444-455), get_all_schemas (522-539), get_schema (541-543), put_schema (545-548), get_parseable_metadata (550-564), get_ingestor_metadata (566-575), put_parseable_metadata (577-585), get_node_metadata (587-602), delete_node_metadata (612-650), put_node_metadata (604-610), list_streams (652-695)
- src/storage/object_storage.rs (2): get_objects (202-206), list_streams (220-220)
- src/alerts/alert_traits.rs (1): get_targets (70-70)
- src/users/dashboards.rs (3): delete_dashboard (294-315), get_object_path (71-79), get_object_id (81-83)
- src/users/filters.rs (3): delete_filter (130-133), get_object_path (50-57), get_object_id (59-61)
- src/alerts/target.rs (2): get_object_path (336-338), get_object_id (340-342)
- src/catalog/manifest.rs (2): get_object_path (94-96), get_object_id (98-100)
- src/correlation.rs (2): get_object_path (217-219), get_object_id (221-223)
- src/storage/store_metadata.rs (2): get_object_path (109-111), get_object_id (113-115)
- src/alerts/alert_structs.rs (2): get_object_path (538-540), get_object_id (534-536)

src/query/mod.rs (1)
- src/parseable/mod.rs (2): storage (282-284), new (178-192)

src/handlers/http/role.rs (1)
- src/parseable/mod.rs (1): new (178-192)

src/alerts/mod.rs (3)
- src/parseable/mod.rs (1): storage (282-284)
- src/storage/localfs.rs (1): from (560-562)
- src/storage/s3.rs (2): from (862-870), from (874-876)

src/query/stream_schema_provider.rs (2)
- src/parseable/mod.rs (4): storage (282-284), new (178-192), serde_json (338-338), serde_json (344-344)
- src/metastore/metastores/object_store_metastore.rs (4): serde_json (182-182), serde_json (219-219), serde_json (400-400), serde_json (627-627)

src/storage/object_storage.rs (9)
- src/handlers/http/mod.rs (1): fetch_schema (90-95)
- src/storage/azure_blob.rs (1): list_with_delimiter (736-741)
- src/storage/gcs.rs (1): list_with_delimiter (657-662)
- src/storage/localfs.rs (3): list_with_delimiter (451-461), new (100-102), from (560-562)
- src/storage/s3.rs (4): list_with_delimiter (853-858), s (179-179), from (862-870), from (874-876)
- src/storage/metrics_layer.rs (1): list_with_delimiter (230-238)
- src/parseable/mod.rs (6): create_stream (647-728), new (178-192), serde_json (338-338), serde_json (344-344), custom_partition (789-789), custom_partition (961-961)
- src/metastore/metastores/object_store_metastore.rs (4): serde_json (182-182), serde_json (219-219), serde_json (400-400), serde_json (627-627)
- src/catalog/mod.rs (1): update_snapshot (108-129)

src/handlers/http/modal/utils/rbac_utils.rs (1)
- src/parseable/mod.rs (1): new (178-192)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (5)
src/parseable/streams.rs (1)
663-664: Good change: propagate merge failure instead of panicking. Replacing unwrap with error propagation here is correct and aligns with the PR's error-handling direction.
src/alerts/target.rs (1)
335-343: MetastoreObject impl looks correct. Path/id mapping matches `target_json_path` and ULID stringification.
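For reference, a minimal sketch of what such an impl typically looks like; exact signatures and return types are assumed, not confirmed by this diff:

```rust
impl MetastoreObject for Target {
    fn get_object_path(&self) -> String {
        // object-store JSON path for this target (helper from object_storage.rs)
        target_json_path(&self.id).to_string()
    }

    fn get_object_id(&self) -> String {
        // ULID rendered as its canonical string form
        self.id.to_string()
    }
}
```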
src/alerts/mod.rs (3)
107-142: Good migration to metastore and idempotent persist on upgrade. The v1→v2 migration persisting via PARSEABLE.metastore.put_alert(...) is clean and idempotent. Nice move off the object store path coupling.
951-953: AlertError integrates MetastoreError correctly and maps to 500. Transparent #[from] and HTTP 500 mapping look right for backend failures.
Also applies to: 980-981
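The shape being praised is roughly the following; a sketch only, with other variants elided:

```rust
#[derive(Debug, thiserror::Error)]
pub enum AlertError {
    // `#[error(transparent)]` + `#[from]` lets `?` lift MetastoreError into AlertError
    #[error(transparent)]
    MetastoreError(#[from] MetastoreError),
    // ... other variants elided
}
```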
282-313: Schema fetch failure aborts migration; confirm this is desired. Migration currently hard-fails if fetch_schema(stream) errors. If partial salvage is acceptable, log and skip only the offending condition (or the alert), not the entire migration.
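A minimal sketch of the skip-instead-of-abort alternative, assuming the surrounding loop shape and the `fetch_schema` signature:

```rust
let schema = match fetch_schema(stream).await {
    Ok(schema) => schema,
    Err(err) => {
        // salvage the remaining alerts instead of aborting the whole migration
        warn!("skipping alert migration for stream {stream}: schema fetch failed: {err}");
        continue;
    }
};
```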
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/alerts/target.rs (1)
112-120: Make delete atomic: persist first, then remove from memory. Currently removes from memory before the metastore delete. If the metastore call fails, memory diverges from durable state. Persist first, then mutate memory; also avoid holding any locks across .await.
```diff
-        let target = self
-            .target_configs
-            .write()
-            .await
-            .remove(target_id)
-            .ok_or(AlertError::InvalidTargetID(target_id.to_string()))?;
-        PARSEABLE.metastore.delete_target(&target).await?;
-        Ok(target)
+        // Get a snapshot without taking a write-lock
+        let target = {
+            let map = self.target_configs.read().await;
+            map.get(target_id)
+                .cloned()
+                .ok_or(AlertError::InvalidTargetID(target_id.to_string()))?
+        };
+        // Persist first
+        PARSEABLE.metastore.delete_target(&target).await?;
+        // Then mutate memory
+        let removed = self.target_configs.write().await.remove(target_id);
+        debug_assert!(removed.is_some());
+        Ok(target)
```

src/handlers/http/alerts.rs (1)
263-271: DELETE ordering: cancel the task before removing from memory. Doc says “disk, scheduled tasks, then memory.” Current order is disk → memory → tasks. If a task reads state during cancellation, deleting memory first can race. Reorder as below.
```diff
-    PARSEABLE.metastore.delete_alert(&*alert).await?;
-
-    // delete from memory
-    alerts.delete(alert_id).await?;
-
-    // delete the scheduled task
-    alerts.delete_task(alert_id).await?;
+    PARSEABLE.metastore.delete_alert(&*alert).await?;
+
+    // stop scheduled task first
+    alerts.delete_task(alert_id).await?;
+
+    // then delete from memory
+    alerts.delete(alert_id).await?;
```

src/storage/object_storage.rs (2)
138-156: Fix potential panic when parsing `filename` for date. Indexing into `split()` results will panic if the staged filename is malformed. Parse defensively and skip per-date metrics when the date segment is absent.

```diff
-        let mut file_date_part = filename.split('.').collect::<Vec<&str>>()[0];
-        file_date_part = file_date_part.split('=').collect::<Vec<&str>>()[1];
-        let compressed_size = path.metadata().map_or(0, |meta| meta.len());
+        let compressed_size = path.metadata().map_or(0, |meta| meta.len());
@@
-        EVENTS_STORAGE_SIZE_DATE
-            .with_label_values(&["data", stream_name, "parquet", file_date_part])
-            .add(compressed_size as i64);
+        if let Some(file_date_part) = filename
+            .splitn(2, '.')
+            .next()
+            .and_then(|s| s.strip_prefix("date="))
+        {
+            EVENTS_STORAGE_SIZE_DATE
+                .with_label_values(&["data", stream_name, "parquet", file_date_part])
+                .add(compressed_size as i64);
+        } else {
+            warn!("Failed to parse date from filename: {filename}; skipping per-date metric");
+        }
```
539-589: Don’t swallow metastore errors increate_stream_from_ingestor.
into_iter().next()on theResultignores errors and can mask real issues.- if let Some(stream_metadata_obs) = PARSEABLE - .metastore - .get_all_stream_jsons(stream_name, Some(Mode::Ingest)) - .await - .into_iter() - .next() - && !stream_metadata_obs.is_empty() - { + let stream_metadata_obs = PARSEABLE + .metastore + .get_all_stream_jsons(stream_name, Some(Mode::Ingest)) + .await + .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; + if !stream_metadata_obs.is_empty() { for stream_metadata_bytes in stream_metadata_obs.iter() { let stream_ob_metadata = serde_json::from_slice::<ObjectStoreFormat>(stream_metadata_bytes)?; all_log_sources.extend(stream_ob_metadata.log_source.clone()); }
♻️ Duplicate comments (9)
src/alerts/target.rs (1)
59-63: Load path: errors propagated and no lock held across await. Switched from unwrap_or_default() to Result and a fetch-before-lock pattern. This fixes silent drops and avoids holding the write-lock across .await.
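The pattern under discussion, as a sketch; names follow the surrounding code, exact signatures assumed:

```rust
// do the I/O first, with no lock held across the await
let targets = PARSEABLE.metastore.get_targets().await?;
// only then take the write-lock and mutate memory
let mut map = self.target_configs.write().await;
for target in targets {
    map.insert(target.id, target);
}
```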
src/handlers/http/alerts.rs (1)
211-219: POST: persist-then-memory ordering is correct. Writes to the metastore before updating in-memory state, which avoids partial state on failure.
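The ordering, reduced to a two-line sketch; `alerts.update` is illustrative of the in-memory step, not the exact API:

```rust
// 1) durable write first; bail out with no in-memory change on failure
PARSEABLE.metastore.put_alert(&alert_config).await?;
// 2) only after the metastore accepted the write, mutate in-memory state
alerts.update(&alert_config).await;
```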
src/metastore/mod.rs (2)
27-36: Ensure chrono's serde feature is enabled. MetastoreErrorDetail.timestamp uses DateTime; this needs chrono with the "serde" feature.
Run:
```bash
#!/bin/bash
# Verify chrono has serde feature in any Cargo.toml
rg -n --type toml -C2 'chrono\s*=\s*.*features.*\[\s*".*serde.*"\s*\]'
```
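If the feature turns out to be missing, the fix is a one-line Cargo.toml change; version number illustrative, the "serde" feature is chrono's documented opt-in:

```toml
chrono = { version = "0.4", features = ["serde"] }
```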
149-159: status_code() should return 4xx for client JSON errors (align with to_detail and handlers). These variants still map to 500. Return 400 to avoid misclassifying client faults; handlers already defer to e.status_code().
```diff
 pub fn status_code(&self) -> StatusCode {
     match self {
         MetastoreError::ObjectStorageError(..) => StatusCode::INTERNAL_SERVER_ERROR,
-        MetastoreError::JsonParseError(..) => StatusCode::INTERNAL_SERVER_ERROR,
-        MetastoreError::JsonSchemaError { .. } => StatusCode::INTERNAL_SERVER_ERROR,
-        MetastoreError::InvalidJsonStructure { .. } => StatusCode::INTERNAL_SERVER_ERROR,
-        MetastoreError::MissingJsonField { .. } => StatusCode::INTERNAL_SERVER_ERROR,
-        MetastoreError::InvalidJsonValue { .. } => StatusCode::INTERNAL_SERVER_ERROR,
+        MetastoreError::JsonParseError(..) => StatusCode::BAD_REQUEST,
+        MetastoreError::JsonSchemaError { .. } => StatusCode::BAD_REQUEST,
+        MetastoreError::InvalidJsonStructure { .. } => StatusCode::BAD_REQUEST,
+        MetastoreError::MissingJsonField { .. } => StatusCode::BAD_REQUEST,
+        MetastoreError::InvalidJsonValue { .. } => StatusCode::BAD_REQUEST,
         MetastoreError::Error { status_code, .. } => *status_code,
     }
 }
```

src/metastore/metastores/object_store_metastore.rs (4)
652-694: Exclude non-stream top-level folders (parity with LocalFS). Hide `lost+found` and similar non-stream dirs.

```diff
             .filter(|name| {
                 name != PARSEABLE_ROOT_DIRECTORY
                     && name != USERS_ROOT_DIR
                     && name != SETTINGS_ROOT_DIRECTORY
-                    && name != ALERTS_ROOT_DIRECTORY
+                    && name != ALERTS_ROOT_DIRECTORY
+                    && name != "lost+found"
             })
```
67-69: Don’t leaveunimplemented!()on a production pathReturn Ok(()) to avoid accidental panics via the trait.
- async fn initiate_connection(&self) -> Result<(), MetastoreError> { - unimplemented!() - } + async fn initiate_connection(&self) -> Result<(), MetastoreError> { + Ok(()) + }
171-216: Filter migration deletes entire directory; delete specific files
`delete_object(&filters_path)` targets the directory, not the migrated file. List objects with names, migrate, then delete per-object.

```diff
-        // read filter object
-        let filter_bytes = self
-            .storage
-            .get_objects(
-                Some(&filters_path),
-                Box::new(|file_name| file_name.ends_with(".json")),
-            )
-            .await?;
-
-        for filter in filter_bytes {
-            // deserialize into Value
-            let mut filter_value = serde_json::from_slice::<serde_json::Value>(&filter)?;
+        // list objects with names so we can delete v1 files specifically
+        let resp = self
+            .storage
+            .list_with_delimiter(Some(object_store::path::Path::from(filters_path.to_string())))
+            .await?;
+        for obj in resp.objects {
+            let obj_path = RelativePathBuf::from(obj.location.to_string());
+            let bytes = self.storage.get_object(&obj_path).await?;
+            let mut filter_value = serde_json::from_slice::<serde_json::Value>(&bytes)?;
@@
-            if version == Some("v1") {
-                // delete older version of the filter
-                self.storage.delete_object(&filters_path).await?;
+            if version == Some("v1") {
+                // delete the legacy v1 file only
+                self.storage.delete_object(&obj_path).await?;
@@
-            let filter_bytes = to_bytes(&filter_value);
-            self.storage.put_object(&path, filter_bytes.clone()).await?;
+            let filter_bytes = to_bytes(&filter_value);
+            self.storage.put_object(&path, filter_bytes).await?;
```
521-539: Don’t panic on malformed schemas; warn and skipReplace
unwrap_or_else(...panic!)withfilter_map+ warn to keep the server resilient.- .iter() - // we should be able to unwrap as we know the data is valid schema - .map(|byte_obj| { - serde_json::from_slice(byte_obj) - .unwrap_or_else(|_| panic!("got an invalid schema for stream: {stream_name}")) - }) - .collect()) + .iter() + .filter_map(|bytes| { + serde_json::from_slice::<Schema>(bytes) + .inspect_err(|err| warn!("Expected compatible schema for {stream_name}, error = {err}")) + .ok() + }) + .collect())src/storage/object_storage.rs (1)
616-626: Propagate metastore errors when reading stream JSONs. Current code ignores `Err(e)` from `get_all_stream_jsons`, potentially hiding real failures. This mirrors a prior review; apply the same fix here.

```diff
-    let stream_metas = PARSEABLE
-        .metastore
-        .get_all_stream_jsons(stream_name, None)
-        .await;
-    if let Ok(stream_metas) = stream_metas {
-        for stream_meta in stream_metas.iter() {
+    let stream_metas = PARSEABLE
+        .metastore
+        .get_all_stream_jsons(stream_name, None)
+        .await
+        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
+    for stream_meta in stream_metas.iter() {
         if let Ok(stream_meta) = serde_json::from_slice::<ObjectStoreFormat>(stream_meta) {
             // fetch unique log sources and their fields
             all_log_sources.extend(stream_meta.log_source.clone());
-            }
         }
-    }
+    }
```
🧹 Nitpick comments (7)
src/alerts/target.rs (1)
68-73: Update path is now atomic w.r.t. persistence. Persisting to the metastore before mutating memory prevents divergence on failure. One micro-nit: avoid the extra clone.
Apply:
```diff
-        let mut map = self.target_configs.write().await;
-        map.insert(target.id, target.clone());
+        let mut map = self.target_configs.write().await;
+        map.insert(target.id, target);
```

src/metastore/metastores/object_store_metastore.rs (2)
214-216: Remove unnecessary clone
`to_bytes` returns a new Bytes; no need to clone before put.

```diff
-        let filter_bytes = to_bytes(&filter_value);
-        self.storage.put_object(&path, filter_bytes.clone()).await?;
+        let filter_bytes = to_bytes(&filter_value);
+        self.storage.put_object(&path, filter_bytes).await?;
```
384-389: Avoid `unwrap()` on filename. Parse defensively to avoid panics on unexpected object keys.

```diff
-            .filter(|name| name.location.filename().unwrap().ends_with("manifest.json"))
+            .filter_map(|name| {
+                name.location
+                    .filename()
+                    .filter(|f| f.ends_with("manifest.json"))
+            })
+            .map(|_| ()) // keep the collector shape below; adjust as needed
```

Note: adjust the collector to carry the path alongside the filter if needed.
src/query/stream_schema_provider.rs (1)
521-536: Merging multiple stream.json snapshots. The merge loop is fine; consider logging a warn on JSON parse failures to aid debugging.

```diff
-                        if let Ok(object_store_format) =
-                            serde_json::from_slice::<ObjectStoreFormat>(&ob)
-                        {
+                        if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
                             let snapshot = object_store_format.snapshot;
                             for manifest in snapshot.manifest_list {
                                 merged_snapshot.manifest_list.push(manifest);
                             }
+                        } else {
+                            tracing::warn!("Invalid stream.json encountered for stream={}", self.stream);
                         }
```
151-154: Make MetastoreObject Send + Sync. Trait objects may cross threads; require Send to avoid surprises.

```diff
-pub trait MetastoreObject: ErasedSerialize + Sync {
+pub trait MetastoreObject: ErasedSerialize + Send + Sync {
```

Note: update all impls accordingly.
124-129: Consider returning `Result<Option, _>` for `get_schema`. Some streams may not have a schema; surfacing absence avoids mapping to storage-layer errors.
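A possible signature, assuming the payload stays as raw Bytes; callers could then distinguish "no schema yet" from a real backend failure:

```rust
async fn get_schema(&self, stream_name: &str) -> Result<Option<Bytes>, MetastoreError>;
```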
src/storage/object_storage.rs (1)
283-295: Avoid extra clones in `create_stream`. `let s = &*schema.clone(); … put_schema(s.clone(), …)` performs two clones. Use `schema.as_ref().clone()` once.

```diff
-    let s = &*schema.clone();
-    PARSEABLE
-        .metastore
-        .put_schema(s.clone(), stream_name)
+    PARSEABLE
+        .metastore
+        .put_schema(schema.as_ref().clone(), stream_name)
         .await
         .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- src/alerts/mod.rs (8 hunks)
- src/alerts/target.rs (4 hunks)
- src/handlers/http/alerts.rs (3 hunks)
- src/handlers/http/modal/utils/rbac_utils.rs (1 hunks)
- src/handlers/http/oidc.rs (1 hunks)
- src/handlers/http/role.rs (1 hunks)
- src/metastore/metastore_traits.rs (1 hunks)
- src/metastore/metastores/object_store_metastore.rs (1 hunks)
- src/metastore/mod.rs (1 hunks)
- src/parseable/streams.rs (1 hunks)
- src/query/mod.rs (1 hunks)
- src/query/stream_schema_provider.rs (5 hunks)
- src/storage/object_storage.rs (15 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- src/query/mod.rs
- src/parseable/streams.rs
- src/handlers/http/role.rs
- src/handlers/http/oidc.rs
- src/handlers/http/modal/utils/rbac_utils.rs
- src/alerts/mod.rs
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-06T04:26:17.168Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.168Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
- src/metastore/mod.rs
- src/metastore/metastore_traits.rs
- src/storage/object_storage.rs
- src/alerts/target.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
- src/metastore/metastores/object_store_metastore.rs
- src/query/stream_schema_provider.rs
- src/storage/object_storage.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Applied to files:
src/storage/object_storage.rs
🧬 Code graph analysis (6)
src/metastore/mod.rs (5)
- src/parseable/mod.rs (2): storage (282-284), new (178-192)
- src/handlers/http/query.rs (1): status_code (587-593)
- src/handlers/http/ingest.rs (1): status_code (484-516)
- src/handlers/http/logstream.rs (1): status_code (578-613)
- src/prism/home/mod.rs (1): status_code (484-494)

src/metastore/metastores/object_store_metastore.rs (8)
- src/catalog/mod.rs (3): partition_path (528-540), file_name (56-56), file_name (66-68)
- src/parseable/mod.rs (4): storage (282-284), new (178-192), serde_json (338-338), serde_json (344-344)
- src/storage/object_storage.rs (16): alert_json_path (1060-1062), filter_path (1042-1050), manifest_path (1075-1095), parseable_json_path (1054-1056), schema_path (1008-1020), stream_json_path (1023-1038), to_bytes (1002-1006), get_objects (202-206), new (77-86), serde_json (470-470), serde_json (514-514), serde_json (549-549), serde_json (575-575), serde_json (622-622), name (190-190), list_streams (220-220)
- src/metastore/metastore_traits.rs (5): initiate_connection (38-38), get_objects (39-39), get_stream_json (71-75), get_all_stream_jsons (86-90), list_streams (144-144)
- src/storage/azure_blob.rs (3): get_objects (456-496), name (161-163), list_streams (575-580)
- src/storage/gcs.rs (3): get_objects (390-430), name (122-124), list_streams (509-514)
- src/storage/localfs.rs (5): get_objects (195-238), from (560-562), new (100-102), name (72-74), list_streams (290-310)
- src/storage/s3.rs (5): get_objects (573-613), from (862-870), from (874-876), name (293-295), list_streams (692-697)

src/metastore/metastore_traits.rs (10)
- src/handlers/http/modal/mod.rs (6): node_type (569-569), node_type (582-584), domain_name (567-567), domain_name (574-576), get_object_path (277-279), get_object_id (281-283)
- src/metastore/metastores/object_store_metastore.rs (34): get_objects (72-80), get_alerts (83-94), put_alert (97-106), delete_alert (109-115), get_targets (481-500), put_target (502-510), delete_target (512-520), get_dashboards (118-136), put_dashboard (139-147), delete_dashboard (150-156), get_filters (160-229), put_filter (232-240), delete_filter (243-250), get_correlations (253-271), put_correlation (274-280), delete_correlation (283-290), get_stream_json (295-310), put_stream_json (350-359), get_all_stream_jsons (313-347), get_all_manifest_files (362-404), get_manifest (407-441), put_manifest (457-467), delete_manifest (469-478), get_manifest_path (444-455), get_all_schemas (522-539), get_schema (541-543), put_schema (545-548), get_parseable_metadata (550-564), get_ingestor_metadata (566-575), put_parseable_metadata (577-585), get_node_metadata (587-602), delete_node_metadata (612-650), put_node_metadata (604-610), list_streams (652-695)
- src/storage/object_storage.rs (2): get_objects (202-206), list_streams (220-220)
- src/alerts/alert_types.rs (3): get_targets (293-295), get_object_path (71-73), get_object_id (75-77)
- src/users/dashboards.rs (3): delete_dashboard (294-315), get_object_path (71-79), get_object_id (81-83)
- src/users/filters.rs (3): delete_filter (130-133), get_object_path (50-57), get_object_id (59-61)
- src/alerts/target.rs (2): get_object_path (336-338), get_object_id (340-342)
- src/correlation.rs (2): get_object_path (217-219), get_object_id (221-223)
- src/catalog/manifest.rs (2): get_object_path (94-96), get_object_id (98-100)
- src/storage/store_metadata.rs (2): get_object_path (109-111), get_object_id (113-115)

src/query/stream_schema_provider.rs (1)
- src/parseable/mod.rs (2): storage (282-284), new (178-192)

src/storage/object_storage.rs (10)
- src/handlers/http/mod.rs (1): fetch_schema (90-95)
- src/storage/azure_blob.rs (1): list_with_delimiter (736-741)
- src/storage/gcs.rs (1): list_with_delimiter (657-662)
- src/storage/localfs.rs (3): list_with_delimiter (451-461), new (100-102), from (560-562)
- src/storage/s3.rs (4): list_with_delimiter (853-858), s (179-179), from (862-870), from (874-876)
- src/storage/metrics_layer.rs (1): list_with_delimiter (230-238)
- src/parseable/mod.rs (6): create_stream (647-728), new (178-192), serde_json (338-338), serde_json (344-344), custom_partition (789-789), custom_partition (961-961)
- src/metastore/metastores/object_store_metastore.rs (4): serde_json (182-182), serde_json (219-219), serde_json (400-400), serde_json (627-627)
- src/handlers/http/logstream.rs (1): put_retention (213-238)
- src/catalog/mod.rs (1): update_snapshot (108-129)

src/alerts/target.rs (3)
- src/handlers/http/targets.rs (1): update (50-76)
- src/metastore/metastore_traits.rs (2): get_object_path (152-152), get_object_id (153-153)
- src/storage/object_storage.rs (1): target_json_path (1066-1072)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (11)
src/alerts/target.rs (1)
335-343: MetastoreObject impl: path/id wiring looks correct. ULID as object_id and an object-store JSON path are consistent with the existing layout.
src/handlers/http/alerts.rs (1)
466-485: PUT/modify: persist first, then replace state and task. LGTM: the flow is safe and minimizes inconsistent windows. Good separation of I/O vs. in-memory ops.
src/metastore/mod.rs (1)
91-146: to_detail(): 4xx mapping for client JSON errors. LGTM: JSON parse/validation variants correctly emit status_code 400 in the detail payload.
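A rough sketch of the classification being approved; the variant names come from this PR's status_code() match, but the exact representation of the status code inside MetastoreErrorDetail is assumed:

```rust
// inside to_detail(): client-side JSON faults carry 400, backend faults 500
let status_code = match self {
    MetastoreError::JsonParseError(..)
    | MetastoreError::JsonSchemaError { .. }
    | MetastoreError::InvalidJsonStructure { .. }
    | MetastoreError::MissingJsonField { .. }
    | MetastoreError::InvalidJsonValue { .. } => 400,
    _ => 500,
};
```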
src/query/stream_schema_provider.rs (3)
407-413: collect_from_snapshot: signature update looks good. Accepts `stream_name` and removes the object-store dependency from the helper.
416-437: Avoid panic on missing manifests: good change. Replaced `expect` with Option handling and a warning. This will prevent query crashes during concurrent compaction.
497-505: Metastore-backed stream.json ingestion is correct. The serde path + error mapping to `DataFusionError::Plan` is appropriate here.

src/metastore/metastore_traits.rs (1)
32-45: Trait surface is coherent and purpose-built. Centralized APIs per resource are clear; removal of overly-generic object getters is an improvement.
src/storage/object_storage.rs (4)
272-276: New trait method `list_with_delimiter` looks good. Signature aligns with the backend adapters (S3/GCS/Azure), and LocalFS intentionally returns Unsupported. No issues.
596-609: LGTM on `create_schema_from_metastore`. Clean handoff to `fetch_schema` and consistent error mapping to `ObjectStorageError::MetastoreError(Box::new(e.to_detail()))`.
918-920: LGTM on snapshot update path. Early-exit on empty manifests is clear and avoids unnecessary work.
667-712: Partition key formats are consistent across all storage backends. Verified that `list_dates` returns `date=YYYY-MM-DD` entries and that `list_hours`/`list_minutes` in S3, localfs, GCS, and Azure Blob filter on `hour=HH`/`minute=MM` prefixes without trailing slashes, matching the assumptions in `get_first_and_latest_event_from_storage`.
Actionable comments posted: 0
♻️ Duplicate comments (5)
src/metastore/metastores/object_store_metastore.rs (5)
447-454: Handle absolute manifest URLs explicitly. `RelativePathBuf::from("s3://...")` will not work. Either normalize or reject absolute URLs.
Apply:
```diff
-        let path = match manifest_url {
-            Some(url) => RelativePathBuf::from(url),
+        let path = match manifest_url {
+            Some(url) if url.starts_with("s3://")
+                || url.starts_with("gs://")
+                || url.starts_with("azblob://")
+                || url.starts_with("http://")
+                || url.starts_with("https://") =>
+            {
+                return Err(MetastoreError::Error {
+                    status_code: StatusCode::BAD_REQUEST,
+                    message: "Absolute manifest URLs are not supported by object-store metastore".into(),
+                    flow: "get_manifest".into(),
+                });
+            }
+            Some(url) => RelativePathBuf::from(url),
             None => {
                 let path = partition_path(stream_name, lower_bound, upper_bound);
                 manifest_path(path.as_str())
             }
         };
```

To confirm assumptions about `manifest_url` values across the repo, run:

```bash
#!/bin/bash
set -e
rg -nP -C2 '\bmanifest_url\b' | sed -n '1,200p'
```
704-709: Exclude non-stream top-level folders (lost+found). Hide `lost+found` and similar non-stream dirs, aligning with LocalFS behavior.
Apply:
```diff
             .filter(|name| {
                 name != PARSEABLE_ROOT_DIRECTORY
                     && name != USERS_ROOT_DIR
                     && name != SETTINGS_ROOT_DIRECTORY
-                    && name != ALERTS_ROOT_DIRECTORY
+                    && name != ALERTS_ROOT_DIRECTORY
+                    && name != "lost+found"
             })
```
67-69: Avoid panic: replace unimplemented!() with a no-op Ok(()). This trait method can be invoked; unimplemented!() will crash the server at runtime.
Apply:
```diff
-    async fn initiate_connection(&self) -> Result<(), MetastoreError> {
-        unimplemented!()
-    }
+    async fn initiate_connection(&self) -> Result<(), MetastoreError> {
+        Ok(())
+    }
```
201-249: Migration bug: deletes entire filters directory; also panics on malformed JSON
- Deleting `filters_path` nukes the whole dir; delete the specific legacy file you migrated.
- Multiple `.as_object().unwrap()` calls will panic on bad data.
- Avoid cloning `filter_bytes` unnecessarily.

Apply:
```diff
-        // read filter object
-        let filter_bytes = self
-            .storage
-            .get_objects(
-                Some(&filters_path),
-                Box::new(|file_name| file_name.ends_with(".json")),
-            )
-            .await?;
-
-        for filter in filter_bytes {
-            // deserialize into Value
-            let mut filter_value = serde_json::from_slice::<serde_json::Value>(&filter)?;
+        // list objects with names so we can migrate + delete specific v1 files
+        let resp = self
+            .storage
+            .list_with_delimiter(Some(object_store::path::Path::from(filters_path.to_string())))
+            .await?;
+        for obj in resp.objects {
+            let obj_path = RelativePathBuf::from(obj.location.to_string());
+            let bytes = self.storage.get_object(&obj_path).await?;
+            let mut filter_value = serde_json::from_slice::<serde_json::Value>(&bytes)?;
             if let Some(meta) = filter_value.clone().as_object() {
                 let version = meta.get("version").and_then(|version| version.as_str());
-                if version == Some("v1") {
-                    // delete older version of the filter
-                    self.storage.delete_object(&filters_path).await?;
+                if version == Some("v1") {
+                    // delete the specific legacy file
+                    self.storage.delete_object(&obj_path).await?;
                     filter_value = migrate_v1_v2(filter_value);
-                    let user_id = filter_value
-                        .as_object()
-                        .unwrap()
-                        .get("user_id")
-                        .and_then(|user_id| user_id.as_str());
-                    let filter_id = filter_value
-                        .as_object()
-                        .unwrap()
-                        .get("filter_id")
-                        .and_then(|filter_id| filter_id.as_str());
-                    let stream_name = filter_value
-                        .as_object()
-                        .unwrap()
-                        .get("stream_name")
-                        .and_then(|stream_name| stream_name.as_str());
+                    let (user_id, stream_name, filter_id) = match filter_value.as_object() {
+                        Some(obj) => (
+                            obj.get("user_id").and_then(|v| v.as_str()),
+                            obj.get("stream_name").and_then(|v| v.as_str()),
+                            obj.get("filter_id").and_then(|v| v.as_str()),
+                        ),
+                        None => (None, None, None),
+                    };
                     // if these values are present, create a new file
                     if let (Some(user_id), Some(stream_name), Some(filter_id)) =
                         (user_id, stream_name, filter_id)
                     {
                         let path =
                             filter_path(user_id, stream_name, &format!("{filter_id}.json"));
-                        let filter_bytes = to_bytes(&filter_value);
-                        self.storage.put_object(&path, filter_bytes.clone()).await?;
+                        let filter_bytes = to_bytes(&filter_value);
+                        self.storage.put_object(&path, filter_bytes).await?;
                     }
                 }
                 if let Ok(filter) = serde_json::from_value::<Filter>(filter_value) {
                     this.retain(|f: &Filter| f.filter_id != filter.filter_id);
                     this.push(filter);
                 }
             }
         }
```
562-571: Don’t panic on malformed schemas; warn and skipAvoid crashing the server due to a single bad file; mirror the targets pattern.
Apply:
```diff
-            .iter()
-            // we should be able to unwrap as we know the data is valid schema
-            .map(|byte_obj| {
-                serde_json::from_slice(byte_obj)
-                    .unwrap_or_else(|_| panic!("got an invalid schema for stream: {stream_name}"))
-            })
-            .collect())
+            .iter()
+            .filter_map(|bytes| {
+                serde_json::from_slice::<Schema>(bytes)
+                    .inspect_err(|err| warn!("Expected compatible schema, stream={stream_name}, error={err}"))
+                    .ok()
+            })
+            .collect())
```
🧹 Nitpick comments (4)
src/metastore/metastores/object_store_metastore.rs (2)
562-563: Tighten schema filter to avoid false positives
`.contains(".schema")` can match unintended names; use an exact suffix.

Apply:

```diff
-                Box::new(|file_name: String| file_name.contains(".schema")),
+                Box::new(|file_name: String| file_name.ends_with("schema.json")),
```
364-369: Minor: correct flow string. The flow label says “get_all_streams”; it should be “get_all_stream_jsons”.
Apply:
```diff
-                flow: "get_all_streams with mode".into(),
+                flow: "get_all_stream_jsons with mode".into(),
```

src/metastore/metastore_traits.rs (2)
151-154: Make MetastoreObject Send. Objects are passed across async boundaries; requiring Send avoids surprise trait-object limitations.
Apply:
```diff
-pub trait MetastoreObject: ErasedSerialize + Sync {
+pub trait MetastoreObject: ErasedSerialize + Send + Sync {
```

Note: update all impls accordingly.
97-109: Clarify contract: manifest_url must be a relative key. Trait docs should state that absolute URLs aren't supported by the object-store metastore (or define normalization semantics). This prevents backend mismatches.
Proposed doc addition (above get_manifest):
```diff
 /// manifest
+/// Note: `manifest_url`, when provided, must be a relative object key within the configured
+/// object store namespace (e.g., "<stream>/date=.../hostname.manifest.json"). Absolute URLs
+/// such as "s3://..." are not supported by the object-store metastore.
 async fn get_all_manifest_files(
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/alerts/alert_types.rs (5 hunks)
- src/metastore/metastore_traits.rs (1 hunks)
- src/metastore/metastores/object_store_metastore.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/alerts/alert_types.rs
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-06T04:26:17.168Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.168Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
- src/metastore/metastores/object_store_metastore.rs
- src/metastore/metastore_traits.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
src/metastore/metastores/object_store_metastore.rs
🧬 Code graph analysis (2)
src/metastore/metastores/object_store_metastore.rs (3)
- src/catalog/mod.rs (3): partition_path (528-540), file_name (56-56), file_name (66-68)
- src/storage/object_storage.rs (16): alert_json_path (1060-1062), filter_path (1042-1050), manifest_path (1075-1095), parseable_json_path (1054-1056), schema_path (1008-1020), stream_json_path (1023-1038), to_bytes (1002-1006), get_objects (202-206), new (77-86), serde_json (470-470), serde_json (514-514), serde_json (549-549), serde_json (575-575), serde_json (622-622), name (190-190), list_streams (220-220)
- src/storage/localfs.rs (5): get_objects (195-238), from (560-562), new (100-102), name (72-74), list_streams (290-310)

src/metastore/metastore_traits.rs (9)
- src/handlers/http/modal/mod.rs (7): fmt (259-261), node_type (569-569), node_type (582-584), domain_name (567-567), domain_name (574-576), get_object_path (277-279), get_object_id (281-283)
- src/metastore/metastores/object_store_metastore.rs (35): initiate_connection (67-69), get_objects (72-80), get_alerts (83-94), put_alert (97-106), delete_alert (109-115), get_llmconfigs (118-129), put_llmconfig (132-139), delete_llmconfig (142-148), get_dashboards (151-169), put_dashboard (172-180), delete_dashboard (183-189), get_filters (193-262), put_filter (265-273), delete_filter (276-283), get_correlations (286-304), put_correlation (307-313), delete_correlation (316-323), get_stream_json (328-343), put_stream_json (383-392), get_all_stream_jsons (346-380), get_all_manifest_files (395-437), get_manifest (440-474), put_manifest (490-500), delete_manifest (502-511), get_manifest_path (477-488), get_all_schemas (555-572), get_schema (574-576), put_schema (578-581), get_parseable_metadata (583-597), get_ingestor_metadata (599-608), put_parseable_metadata (610-618), get_node_metadata (620-635), delete_node_metadata (645-683), put_node_metadata (637-643), list_streams (685-728)
- src/storage/object_storage.rs (2): get_objects (202-206), list_streams (220-220)
- src/alerts/alert_types.rs (2): get_object_path (71-73), get_object_id (75-77)
- src/users/dashboards.rs (3): delete_dashboard (294-315), get_object_path (71-79), get_object_id (81-83)
- src/users/filters.rs (3): delete_filter (130-133), get_object_path (50-57), get_object_id (59-61)
- src/alerts/target.rs (2): get_object_path (336-338), get_object_id (340-342)
- src/correlation.rs (2): get_object_path (217-219), get_object_id (221-223)
- src/storage/store_metadata.rs (2): get_object_path (109-111), get_object_id (113-115)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (1)
src/metastore/metastores/object_store_metastore.rs (1)
96-105: LGTM: ULID parsing now returns 400 instead of panicking. Good fix; avoids unwrap() and maps invalid IDs to BAD_REQUEST.
Actionable comments posted: 0
♻️ Duplicate comments (6)
src/metastore/metastores/object_store_metastore.rs (6)
99-107: Return a stable validation message for bad ULIDs. Avoid leaking parser internals; use a clear client-facing message.

```diff
-        let id = Ulid::from_string(&obj.get_object_id()).map_err(|e| MetastoreError::Error {
+        let id = Ulid::from_string(&obj.get_object_id()).map_err(|_| MetastoreError::Error {
             status_code: StatusCode::BAD_REQUEST,
-            message: e.to_string(),
+            message: "Invalid alert id".into(),
             flow: "put_alert".into(),
         })?;
```
754-759: Exclude non-stream top-level folders (e.g., `lost+found`). Aligns with LocalFS and avoids surfacing system dirs as streams.

```diff
             .filter(|name| {
                 name != PARSEABLE_ROOT_DIRECTORY
                     && name != USERS_ROOT_DIR
                     && name != SETTINGS_ROOT_DIRECTORY
-                    && name != ALERTS_ROOT_DIRECTORY
+                    && name != ALERTS_ROOT_DIRECTORY
+                    && name != "lost+found"
             })
```
65-70: Don’t leaveunimplemented!()in trait impl; return Ok(()).Prevents accidental panics if invoked via trait.
- async fn initiate_connection(&self) -> Result<(), MetastoreError> { - unimplemented!() - } + async fn initiate_connection(&self) -> Result<(), MetastoreError> { + Ok(()) + }
253-299: Filter migration deletes the entire directory and can panic on bad JSON. Migrate per-file and remove unwraps. Delete the specific legacy file you migrated, not the whole folder; also remove the `.unwrap()`s and avoid cloning bytes.

```diff
-        // read filter object
-        let filter_bytes = self
-            .storage
-            .get_objects(
-                Some(&filters_path),
-                Box::new(|file_name| file_name.ends_with(".json")),
-            )
-            .await?;
-
-        for filter in filter_bytes {
-            // deserialize into Value
-            let mut filter_value = serde_json::from_slice::<serde_json::Value>(&filter)?;
+        // list objects with names so we can migrate and delete specific v1 files
+        let resp = self
+            .storage
+            .list_with_delimiter(Some(object_store::path::Path::from(filters_path.to_string())))
+            .await?;
+        for obj in resp.objects {
+            let obj_path = RelativePathBuf::from(obj.location.to_string());
+            let bytes = self.storage.get_object(&obj_path).await?;
+            let mut filter_value = serde_json::from_slice::<serde_json::Value>(&bytes)?;
@@
-                if version == Some("v1") {
-                    // delete older version of the filter
-                    self.storage.delete_object(&filters_path).await?;
+                if version == Some("v1") {
+                    // delete the specific legacy file
+                    self.storage.delete_object(&obj_path).await?;
                     filter_value = migrate_v1_v2(filter_value);
-                    let user_id = filter_value
-                        .as_object()
-                        .unwrap()
-                        .get("user_id")
-                        .and_then(|user_id| user_id.as_str());
-                    let filter_id = filter_value
-                        .as_object()
-                        .unwrap()
-                        .get("filter_id")
-                        .and_then(|filter_id| filter_id.as_str());
-                    let stream_name = filter_value
-                        .as_object()
-                        .unwrap()
-                        .get("stream_name")
-                        .and_then(|stream_name| stream_name.as_str());
+                    let (user_id, stream_name, filter_id) = match filter_value.as_object() {
+                        Some(obj) => (
+                            obj.get("user_id").and_then(|v| v.as_str()),
+                            obj.get("stream_name").and_then(|v| v.as_str()),
+                            obj.get("filter_id").and_then(|v| v.as_str()),
+                        ),
+                        None => (None, None, None),
+                    };
@@
-                        let filter_bytes = to_bytes(&filter_value);
-                        self.storage.put_object(&path, filter_bytes.clone()).await?;
+                        let filter_bytes = to_bytes(&filter_value);
+                        self.storage.put_object(&path, filter_bytes).await?;
```
496-503: Reject absolute manifest URLs; `RelativePathBuf::from` will not handle schemes. Return 400 for `s3://`, `gs://`, etc., or normalize upstream.

```diff
-        let path = match manifest_url {
-            Some(url) => RelativePathBuf::from(url),
+        let path = match manifest_url {
+            Some(url) if url.contains("://") => {
+                return Err(MetastoreError::Error {
+                    status_code: StatusCode::BAD_REQUEST,
+                    message: "Absolute manifest URLs are not supported by object-store metastore".into(),
+                    flow: "get_manifest".into(),
+                });
+            }
+            Some(url) => RelativePathBuf::from(url),
             None => {
                 let path = partition_path(stream_name, lower_bound, upper_bound);
                 manifest_path(path.as_str())
             }
         };
```
615-620: Don’t panic on malformed schemas; warn and skip.Keeps server resilient to stray/corrupt files.
- // we should be able to unwrap as we know the data is valid schema - .map(|byte_obj| { - serde_json::from_slice(byte_obj) - .unwrap_or_else(|_| panic!("got an invalid schema for stream: {stream_name}")) - }) + .filter_map(|bytes| { + serde_json::from_slice::<Schema>(bytes) + .inspect_err(|err| warn!("Expected compatible schema for {stream_name}, error = {err}")) + .ok() + })
🧹 Nitpick comments (6)
Cargo.toml (2)
61-61: Avoid the exact pin; allow patch updates for `erased-serde`. Use a non-pinned requirement to pick up compatible fixes automatically.

```diff
-erased-serde = "=0.3.16"
+erased-serde = "0.3.16"
```
39-39: Unify the `url` crate version with the build-dependency (2.5). Reduces duplicate deps and compile time.

```diff
-url = "2.4.0"
+url = "2.5"
```

src/metastore/metastores/object_store_metastore.rs (1)
72-74: Typo in docstring (“mutiple”). Minor cleanup.

```diff
- /// Fetch mutiple .json objects
+ /// Fetch multiple .json objects
```

src/metastore/metastore_traits.rs (3)
162-165: Add a `Send` bound to `MetastoreObject`. These objects cross async boundaries; `Send` prevents runtime surprises.

```diff
-pub trait MetastoreObject: ErasedSerialize + Sync {
+pub trait MetastoreObject: ErasedSerialize + Send + Sync {
```
63-66: Decouple the API from `DashMap` in the return type. Prefer `HashMap<String, Vec<Bytes>>`; callers can wrap in `DashMap` if needed.

```diff
-    async fn get_chats(&self) -> Result<DashMap<String, Vec<Bytes>>, MetastoreError>;
+    async fn get_chats(&self) -> Result<std::collections::HashMap<String, Vec<Bytes>>, MetastoreError>;
```
40-41: Clarify `get_objects` semantics. Document that `parent_path` is a relative prefix and that only `.json` files are returned.
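One way to capture that contract in the trait docs; the signature shown is an assumption, only the doc comment is the point:

```rust
/// Fetch all `.json` objects under `parent_path`.
///
/// `parent_path` is a prefix relative to the metastore root, never an
/// absolute URL; objects that are not `.json` files are skipped.
async fn get_objects(&self, parent_path: &str) -> Result<Vec<Bytes>, MetastoreError>;
```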
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (3)
- Cargo.toml (2 hunks)
- src/metastore/metastore_traits.rs (1 hunks)
- src/metastore/metastores/object_store_metastore.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-06T04:26:17.168Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.168Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
src/metastore/metastore_traits.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (1)
Cargo.toml (1)
39-39: Verify new deps and versions exist/are resolvable. Quick check against crates.io for `dashmap 6.1.0`, `erased-serde 0.3.16`, and `tokio ^1.43`:

```bash
#!/bin/bash
set -euo pipefail

check_ver() {
  local crate="$1" ver="$2"
  curl -fsSL "https://crates.io/api/v1/crates/${crate}" | jq -r --arg v "$ver" '
    .versions[] | select(.num==$v) | .num' | grep -qx "$ver" \
    && echo "OK: ${crate} ${ver} exists" || { echo "MISSING: ${crate} ${ver} not on crates.io"; exit 1; }
}

check_ver dashmap 6.1.0
check_ver erased-serde 0.3.16
# Using ^1.43, so ensure at least 1.43.0 exists
check_ver tokio 1.43.0 || true
```

Also applies to: 61-61, 70-75, 145-145
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (18)
src/handlers/http/modal/mod.rs (3)
341-353: Avoid panic on disk write; propagate the error instead. Replace expect with proper error context to prevent hard crashes if staging is unwritable.
```diff
-        meta.put_on_disk(staging_path)
-            .expect("Couldn't write updated metadata to disk");
+        meta.put_on_disk(staging_path)
+            .context("write updated metadata to disk")?;
```
356-363: Same here: don't panic on the new metadata write.

```diff
-        meta.put_on_disk(staging_path)
-            .expect("Couldn't write new metadata to disk");
+        meta.put_on_disk(staging_path)
+            .context("write new metadata to disk")?;
```
586-588: Bug: infinite recursion risk in Metadata::file_path. `self.file_path()` here resolves to the same trait method; use the inherent method explicitly.
```diff
-    fn file_path(&self) -> RelativePathBuf {
-        self.file_path()
-    }
+    fn file_path(&self) -> RelativePathBuf {
+        NodeMetadata::file_path(self)
+    }
```
495-522: Don’t panic on malformed stream.json; handle gracefully.Metastore may return a bad/legacy blob; avoid expect to keep stats endpoint resilient.
- for ob in obs { - let stream_metadata: ObjectStoreFormat = - serde_json::from_slice(&ob).expect("stream.json is valid json"); + for ob in obs { + let stream_metadata: ObjectStoreFormat = match serde_json::from_slice(&ob) { + Ok(v) => v, + Err(e) => { + warn!("Skipping invalid stream.json from metastore: {e:?}"); + continue; + } + };src/correlation.rs (1)
119-131: Don’t trust user_id from request; set it from the authenticated session.If user_id comes from the client body, a user can write under another user’s namespace. Derive and set correlation.user_id from session_key (or upstream handler) before persisting.
Would you like me to wire this here or at the HTTP layer? To locate the create caller and confirm how user_id is sourced, run:
#!/bin/bash # Find correlation create callers and user_id handling fd -t f -S -0 'correlation' src | xargs -0 -I{} rg -n -C3 'create\s*\(' {} rg -n -C3 'user_id' src | rg -n -C2 'correlation'src/handlers/http/alerts.rs (3)
324-330: update_notification_state doesn’t persist the change.State changes are in-memory only; they’ll be lost on restart. Persist the updated config.
alerts .update_notification_state(alert_id, new_notification_state) .await?; - let alert = alerts.get_alert_by_id(alert_id).await?; + let alert = alerts.get_alert_by_id(alert_id).await?; + // persist updated state + PARSEABLE + .metastore + .put_alert(&alert.to_alert_config()) + .await?; Ok(web::Json(alert.to_alert_config().to_response()))
354-360: disable_alert doesn't persist the new state. Persist the updated alert after mutating state.

```diff
     alerts
         .update_state(alert_id, AlertState::Disabled, Some("".into()))
         .await?;
-    let alert = alerts.get_alert_by_id(alert_id).await?;
+    let alert = alerts.get_alert_by_id(alert_id).await?;
+    PARSEABLE
+        .metastore
+        .put_alert(&alert.to_alert_config())
+        .await?;
     Ok(web::Json(alert.to_alert_config().to_response()))
```
392-398: enable_alert doesn't persist the new state. Same issue as disable; persist after the update.

```diff
     alerts
         .update_state(alert_id, AlertState::NotTriggered, Some("".into()))
         .await?;
-    let alert = alerts.get_alert_by_id(alert_id).await?;
+    let alert = alerts.get_alert_by_id(alert_id).await?;
+    PARSEABLE
+        .metastore
+        .put_alert(&alert.to_alert_config())
+        .await?;
     Ok(web::Json(alert.to_alert_config().to_response()))
```
112-119: Make delete atomic w.r.t. persistenceCurrently memory is updated before durable delete; on failure, memory diverges. Persist first, then update memory (or reinsert on failure).
- let target = self - .target_configs - .write() - .await - .remove(target_id) - .ok_or(AlertError::InvalidTargetID(target_id.to_string()))?; - PARSEABLE.metastore.delete_target(&target).await?; - Ok(target) + // 1) Read target without holding a write lock + let target = { + let map = self.target_configs.read().await; + map.get(target_id) + .cloned() + .ok_or(AlertError::InvalidTargetID(target_id.to_string()))? + }; + // 2) Persist delete + PARSEABLE.metastore.delete_target(&target).await?; + // 3) Update memory + let _ = self.target_configs.write().await.remove(target_id); + Ok(target)
311-313: Bug: potential underflow leads to a huge retry loop. `times - 1` will underflow when `times == 0`, creating `0..usize::MAX`. Guard or use saturating_sub.

```diff
-        for _ in 0..(times - 1) {
+        for _ in 0..times.saturating_sub(1) {
```

Also consider validating times >= 1 at parse time and defaulting to 1 if 0 is provided.
484-490: Avoid panic on invalid headers. `try_into().expect("valid_headers")` can panic on user-provided header names/values. Fail gracefully and log instead.

```diff
-        let request = client
-            .post(self.endpoint.clone())
-            .headers((&self.headers).try_into().expect("valid_headers"));
+        let headers: HeaderMap = match (&self.headers).try_into() {
+            Ok(h) => h,
+            Err(e) => {
+                error!("Invalid headers for webhook target: {e}");
+                HeaderMap::new()
+            }
+        };
+        let request = client.post(self.endpoint.clone()).headers(headers);
```
364-366: Fix off-by-one: allow exact-fit downloads without cleanup.

Using <= forces cleanup even when available_size equals file_size. Use < to only clean when strictly insufficient.

- if !self.is_disk_available(parquet_file.file_size).await?
-     || stream_hot_tier.available_size <= parquet_file.file_size
+ if !self.is_disk_available(parquet_file.file_size).await?
+     || stream_hot_tier.available_size < parquet_file.file_size
  {

src/migration/mod.rs (1)
237-253: Differentiate NotFound from real errors when falling back.

Swallowing all metastore errors and silently falling back to querier/ingestor can mask genuine failures (network/transient). Prefer fallback only on a NotFound-equivalent; otherwise propagate the error.
Example sketch:
match PARSEABLE.metastore.get_stream_json(stream, false).await {
    Ok(bytes) => Ok(bytes),
    Err(e) if e.is_not_found() => { /* fallback to querier/ingestor */ }
    Err(e) => return Err(e.into()),
}

src/catalog/mod.rs (1)
483-499: Bug: snapshot persisted to object storage, not metastore.

You load and mutate the snapshot via metastore JSON but write it back through storage.put_snapshot, bypassing the metastore and leaving the JSON stale/inconsistent when using non-object-store metastores.

- storage.put_snapshot(stream_name, meta.snapshot).await?;
+ // Persist the full updated stream JSON via metastore to keep metadata consistent.
+ PARSEABLE
+     .metastore
+     .put_stream_json(&meta, stream_name)
+     .await
+     .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;

Run to find any other lingering direct snapshot writes:

#!/bin/bash
rg -n 'put_snapshot\s*\(' -C2 --type=rust

src/handlers/http/logstream.rs (2)
91-95: Don’t unwrap metastore result in list(); propagate MetastoreError.
unwrap() will panic on metastore failures. You already have StreamError::MetastoreError(#[from]), so just use ?.

- .list_streams()
- .await
- .unwrap()
+ .list_streams()
+ .await?
452-481: Hot tier delete path leaves metadata and in-memory flag enabled.
put_stream_hot_tier sets hot_tier_enabled = true and toggles the in-memory flag. delete_stream_hot_tier should symmetrically set the in-memory flag to false and persist hot_tier_enabled = false in the metastore.

  hot_tier_manager.delete_hot_tier(&stream_name).await?;
- Ok((
+ // reflect the change in memory
+ PARSEABLE.get_stream(&stream_name)?.set_hot_tier(false);
+
+ // reflect the change in metastore
+ let mut stream_metadata: ObjectStoreFormat = serde_json::from_slice(
+     &PARSEABLE
+         .metastore
+         .get_stream_json(&stream_name, false)
+         .await?,
+ )?;
+ stream_metadata.hot_tier_enabled = false;
+ PARSEABLE
+     .metastore
+     .put_stream_json(&stream_metadata, &stream_name)
+     .await?;
+
+ Ok((
      format!("hot tier deleted for stream {stream_name}"),
      StatusCode::OK,
  ))

src/storage/object_storage.rs (2)
456-489: Don’t treat all metastore errors as missing; avoid expect() panics.
- Any metastore failure (network, 5xx) currently falls back as if missing, potentially masking real issues and overwriting data.
- Two expect("parseable config is valid json") calls can crash the process on corrupt metadata; return a typed error instead.

- let stream_metadata = match PARSEABLE
-     .metastore
-     .get_stream_json(stream_name, false)
-     .await
- {
-     Ok(data) => data,
-     Err(_) => {
+ let stream_metadata = match PARSEABLE.metastore.get_stream_json(stream_name, false).await {
+     Ok(data) => data,
+     Err(e) => {
+         if e.status_code() != http::StatusCode::NOT_FOUND {
+             return Err(ObjectStorageError::MetastoreError(Box::new(e.to_detail())));
+         }
          // get the base stream metadata
          let bytes = PARSEABLE
              .metastore
              .get_stream_json(stream_name, true)
              .await
              .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
-         let mut config = serde_json::from_slice::<ObjectStoreFormat>(&bytes)
-             .expect("parseable config is valid json");
+         let mut config = serde_json::from_slice::<ObjectStoreFormat>(&bytes)
+             .map_err(|e| ObjectStorageError::Custom(format!("invalid base stream json: {e}")))?;
          if PARSEABLE.options.mode == Mode::Ingest {
              config.stats = FullStats::default();
              config.snapshot.manifest_list = vec![];
          }
          PARSEABLE
              .metastore
              .put_stream_json(&config, stream_name)
              .await
              .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
-         bytes
+         serde_json::to_vec(&config)?.into()
      }
  };
- Ok(serde_json::from_slice(&stream_metadata).expect("parseable config is valid json"))
+ Ok(serde_json::from_slice(&stream_metadata)
+     .map_err(|e| ObjectStorageError::Custom(format!("invalid stream json: {e}")))?)

Additionally import http::StatusCode at module top if not already in scope:
use http::StatusCode;
539-547: Don't silently ignore metastore errors in create_stream_from_ingestor.

Using .await.into_iter().next() turns Err into "no data". Propagate the error as a storage failure.

- if let Some(stream_metadata_obs) = PARSEABLE
-     .metastore
-     .get_all_stream_jsons(stream_name, Some(Mode::Ingest))
-     .await
-     .into_iter()
-     .next()
-     && !stream_metadata_obs.is_empty()
+ let stream_metadata_obs = PARSEABLE
+     .metastore
+     .get_all_stream_jsons(stream_name, Some(Mode::Ingest))
+     .await
+     .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
+ if !stream_metadata_obs.is_empty()
♻️ Duplicate comments (8)
src/correlation.rs (1)
148-161: Update flow overwrites new data with old; validates the wrong object.

Calling updated_correlation.update(correlation) clobbers the client's fields with the stored record, and you validate the old record instead of the new/merged one. Merge "new over old" and validate accordingly.

- correlation.validate(session_key).await?;
- updated_correlation.update(correlation);
-
- // Update in metastore
- PARSEABLE
-     .metastore
-     .put_correlation(&updated_correlation)
-     .await?;
-
- // Update in memory
- self.write().await.insert(
-     updated_correlation.id.to_owned(),
-     updated_correlation.clone(),
- );
-
- Ok(updated_correlation)
+ // validate the incoming payload (or validate the merged object below)
+ updated_correlation.validate(session_key).await?;
+
+ // merge: new fields over existing, preserve id/user_id
+ let mut merged = correlation.clone();
+ merged.update(updated_correlation);
+
+ // persist
+ PARSEABLE.metastore.put_correlation(&merged).await?;
+
+ // update in memory
+ self.write()
+     .await
+     .insert(merged.id.to_owned(), merged.clone());
+
+ Ok(merged)

src/migration/mod.rs (1)
107-110: Keep in-memory parseable_json consistent in the default branch.

You write remote metadata but don't update the in-memory copy, diverging runtime state from the metastore. This mirrors a prior comment; please update the local buffer too.

- _ => {
-     let metadata = metadata_migration::remove_querier_metadata(storage_metadata);
-     put_remote_metadata(metadata).await?;
- }
+ _ => {
+     let metadata = metadata_migration::remove_querier_metadata(storage_metadata);
+     let _metadata: Bytes = serde_json::to_vec(&metadata)?.into();
+     *parseable_json = Some(_metadata);
+     put_remote_metadata(metadata).await?;
+ }

src/catalog/mod.rs (2)
300-361: Brittle manifest detection; rely on metastore presence instead.

Using contains(manifest_path("")) to decide update vs create is not metastore-agnostic and risks duplicates. Query the metastore for the specific bounds/path; update if found, else create. This echoes a prior comment; keeping as a nudge for the follow-up PR.

- let manifest_file_name = manifest_path("").to_string();
- let should_update = manifests[pos].manifest_path.contains(&manifest_file_name);
-
- if should_update {
-     if let Some(mut manifest) = PARSEABLE
+ if let Some(mut manifest) = PARSEABLE
      .metastore
      .get_manifest(
          stream_name,
          manifests[pos].time_lower_bound,
          manifests[pos].time_upper_bound,
          Some(manifests[pos].manifest_path.clone()),
      )
      .await
      .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?
  {
      // Update existing manifest
      …
      PARSEABLE
          .metastore
          .put_manifest(
              &manifest,
              stream_name,
              manifests[pos].time_lower_bound,
              manifests[pos].time_upper_bound,
          )
          .await
          .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
      manifests[pos].events_ingested = events_ingested;
      manifests[pos].ingestion_size = ingestion_size;
      manifests[pos].storage_size = storage_size;
      Ok(None)
- } else {
-     // Create new manifest for different partition
+ } else {
      create_manifest(
          partition_lower,
          partition_changes,
          stream_name,
          false,
-         ObjectStoreFormat::default(),
+         meta.clone(),
          events_ingested,
          ingestion_size,
          storage_size,
      )
      .await
  }
116-129: Add optimistic concurrency for metastore JSON updates.

Read-modify-write of stream JSON lacks ETag/revision checks; concurrent writers can clobber each other. Use a revision (If-Match) on put_stream_json and retry on conflict.
Sketch:
// get returns (bytes, rev)
let (bytes, rev) = PARSEABLE.metastore.get_stream_json_with_rev(stream_name, false).await?;
let mut meta: ObjectStoreFormat = serde_json::from_slice(&bytes)?;
loop {
    // mutate meta …
    match PARSEABLE.metastore.put_stream_json_if_match(&meta, stream_name, &rev).await {
        Ok(_) => break,
        Err(e) if e.is_conflict() => {
            let (bytes, new_rev) = PARSEABLE.metastore.get_stream_json_with_rev(stream_name, false).await?;
            meta = serde_json::from_slice(&bytes)?;
            // reapply and retry (bounded)
        }
        Err(e) => return Err(ObjectStorageError::MetastoreError(Box::new(e.to_detail()))),
    }
}

Also applies to: 377-381
src/handlers/http/role.rs (1)
145-150: Map serde error and prefer NotFound over Custom for missing metadata.
- Propagate a typed 404 (if available) instead of Custom when metadata is absent.
- Explicitly map serde_json::Error into ObjectStorageError to avoid relying on a possibly missing From impl.
  .await
  .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?
- .ok_or_else(|| ObjectStorageError::Custom("parseable metadata not initialized".into()))?;
- Ok(serde_json::from_slice::<StorageMetadata>(&metadata)?)
+ .ok_or_else(|| ObjectStorageError::NotFound("parseable metadata not initialized".into()))?;
+ let metadata: StorageMetadata = serde_json::from_slice(&metadata)
+     .map_err(|e| ObjectStorageError::Invalid(anyhow::anyhow!(e)))?;
+ Ok(metadata)

src/storage/object_storage.rs (2)
616-627: Propagate metastore errors in get_log_source_from_storage (not ignore).

Current code drops Err and continues. Return a typed error so callers can react.

- let stream_metas = PARSEABLE
-     .metastore
-     .get_all_stream_jsons(stream_name, None)
-     .await;
- if let Ok(stream_metas) = stream_metas {
-     for stream_meta in stream_metas.iter() {
+ let stream_metas = PARSEABLE
+     .metastore
+     .get_all_stream_jsons(stream_name, None)
+     .await
+     .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
+ for stream_meta in stream_metas.iter() {
      if let Ok(stream_meta) = serde_json::from_slice::<ObjectStoreFormat>(stream_meta) {
          // fetch unique log sources and their fields
          all_log_sources.extend(stream_meta.log_source.clone());
      }
- }
- }
+ }
982-999: Avoid unwrap on Schema::try_merge; handle NotFound and merge failures.
- unwrap() can crash the ingestor on incompatible schemas.
- For brand-new streams, treat NotFound as "no existing schema" and just persist the provided one.

- let stream_schema = PARSEABLE
-     .metastore
-     .get_schema(stream_name)
-     .await
-     .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
-
- let new_schema = Schema::try_merge(vec![
-     schema,
-     serde_json::from_slice::<Schema>(&stream_schema)?,
- ])
- .unwrap();
-
- PARSEABLE
-     .metastore
-     .put_schema(new_schema, stream_name)
-     .await
-     .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))
+ match PARSEABLE.metastore.get_schema(stream_name).await {
+     Ok(bytes) => {
+         let existing: Schema = serde_json::from_slice(&bytes)?;
+         let merged = Schema::try_merge(vec![schema, existing])
+             .map_err(|e| ObjectStorageError::Custom(format!("schema merge failed: {e}")))?;
+         PARSEABLE
+             .metastore
+             .put_schema(merged, stream_name)
+             .await
+             .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))
+     }
+     Err(e) if e.status_code() == http::StatusCode::NOT_FOUND => PARSEABLE
+         .metastore
+         .put_schema(schema, stream_name)
+         .await
+         .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail()))),
+     Err(e) => Err(ObjectStorageError::MetastoreError(Box::new(e.to_detail()))),
+ }

src/storage/azure_blob.rs (1)
575-580: Don't hard-error list_streams(); delegate or migrate callers before merge.

Returning Err here will break existing call-sites (migrations, HTTP handlers, UI) under Azure. Either provide a temporary compatibility shim or update all callers in this PR. A prior review already flagged this.
Option A (temporary shim):
@@
- async fn list_streams(&self) -> Result<HashSet<LogStream>, ObjectStorageError> {
-     // self._list_streams().await
-     Err(ObjectStorageError::Custom(
-         "Azure Blob Store doesn't implement list_streams".into(),
-     ))
- }
+ async fn list_streams(&self) -> Result<HashSet<LogStream>, ObjectStorageError> {
+     // Deprecated: callers should move to Metastore.
+     // Temporary back-compat to avoid breaking Azure users.
+     tracing::warn!("Azure Blob list_streams() is deprecated; delegating to list_old_streams()");
+     self.list_old_streams().await
+ }

Option B (no shim): update every .list_streams() caller to use Metastore or list_old_streams()/list_with_delimiter() and handle the Result properly (no unwraps). I can help generate the changes.
🧹 Nitpick comments (24)
src/handlers/http/modal/mod.rs (1)
365-382: Improve metastore read error handling and minor perf.
- Log the underlying error; current log loses the cause.
- Prefer match over if-let to retain the Err(e).
- Pre-allocate the vec.
- async fn load_from_storage(node_type: NodeType) -> Vec<NodeMetadata> {
-     let obs = PARSEABLE.metastore.get_node_metadata(node_type).await;
-
-     let mut metadata = vec![];
-     if let Ok(obs) = obs {
-         for object in obs {
-             //convert to NodeMetadata
-             match serde_json::from_slice::<NodeMetadata>(&object) {
-                 Ok(node_metadata) => metadata.push(node_metadata),
-                 Err(e) => error!("Failed to deserialize NodeMetadata: {:?}", e),
-             }
-         }
-     } else {
-         error!("Couldn't read from storage");
-     }
-     // Return the metadata
-     metadata
- }
+ async fn load_from_storage(node_type: NodeType) -> Vec<NodeMetadata> {
+     match PARSEABLE.metastore.get_node_metadata(node_type).await {
+         Ok(bytes_list) => {
+             let mut metadata = Vec::with_capacity(bytes_list.len());
+             for object in bytes_list {
+                 match serde_json::from_slice::<NodeMetadata>(&object) {
+                     Ok(node_metadata) => metadata.push(node_metadata),
+                     Err(e) => error!("Failed to deserialize NodeMetadata: {e:?}"),
+                 }
+             }
+             metadata
+         }
+         Err(e) => {
+             error!("Couldn't read node metadata from metastore: {e:?}");
+             Vec::new()
+         }
+     }
+ }

src/handlers/http/cluster/mod.rs (2)
783-798: LGTM: get_node_info now reads via metastore; add context on failure.

Tiny improvement: enrich errors with node_type for easier ops debugging.

- let metadata = PARSEABLE
-     .metastore
-     .get_node_metadata(node_type)
-     .await?
+ let metadata = PARSEABLE
+     .metastore
+     .get_node_metadata(node_type.clone())
+     .await
+     .with_context(|| format!("get_node_metadata failed for node_type={node_type:?}"))?
814-836: Delete metadata for all node types concurrently.

Saves network round-trips; simplifies result handling.

- // Delete ingestor metadata
- let removed_ingestor = PARSEABLE
-     .metastore
-     .delete_node_metadata(&domain_name, NodeType::Ingestor)
-     .await?;
-
- // Delete indexer metadata
- let removed_indexer = PARSEABLE
-     .metastore
-     .delete_node_metadata(&domain_name, NodeType::Indexer)
-     .await?;
-
- // Delete querier metadata
- let removed_querier = PARSEABLE
-     .metastore
-     .delete_node_metadata(&domain_name, NodeType::Querier)
-     .await?;
-
- // Delete prism metadata
- let removed_prism = PARSEABLE
-     .metastore
-     .delete_node_metadata(&domain_name, NodeType::Prism)
-     .await?;
+ let (ri, rx, rq, rp) = future::join4(
+     PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Ingestor),
+     PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Indexer),
+     PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Querier),
+     PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Prism),
+ ).await;
+ let removed_ingestor = ri?;
+ let removed_indexer = rx?;
+ let removed_querier = rq?;
+ let removed_prism = rp?;

src/correlation.rs (3)
59-75: Reduce lock contention in load(): parse before taking the write lock.

You hold the write lock while deserializing all items. Build a temporary Vec and acquire the lock only for inserts.

- let mut guard = self.write().await;
-
- for correlations_bytes in all_correlations {
-     let correlation = match serde_json::from_slice::<CorrelationConfig>(&correlations_bytes)
-     {
-         Ok(c) => c,
-         Err(e) => {
-             error!("Unable to load correlation file : {e}");
-             continue;
-         }
-     };
-
-     guard.insert(correlation.id.to_owned(), correlation);
- }
+ let mut parsed: Vec<CorrelationConfig> = Vec::new();
+ for correlations_bytes in all_correlations {
+     match serde_json::from_slice::<CorrelationConfig>(&correlations_bytes) {
+         Ok(c) => parsed.push(c),
+         Err(e) => {
+             error!("Unable to load correlation object: {e}");
+             continue;
+         }
+     };
+ }
+ let mut guard = self.write().await;
+ for correlation in parsed {
+     guard.insert(correlation.id.to_owned(), correlation);
+ }
339-349: Map Unauthorized to 403 Forbidden (or 401 if unauthenticated), not 400.

400 implies a client syntax error; permission failures should be 403.

- Self::Unauthorized => StatusCode::BAD_REQUEST,
+ Self::Unauthorized => StatusCode::FORBIDDEN,
66-66: Nit: log message still says "file".

We're not reading files anymore; s/file/object/.

- error!("Unable to load correlation file : {e}");
+ error!("Unable to load correlation object: {e}");

src/users/dashboards.rs (2)
70-84: Avoid unwrap() in MetastoreObject impl; panic risk pre-metadata.

get_object_path/get_object_id will panic if author/dashboard_id are None. Guard with expect() (and optional debug asserts) to fail fast with a clear message.

  impl MetastoreObject for Dashboard {
      fn get_object_path(&self) -> String {
-         RelativePathBuf::from_iter([
-             USERS_ROOT_DIR,
-             self.author.as_ref().unwrap(),
-             DASHBOARDS_DIR,
-             &format!("{}.json", self.dashboard_id.unwrap()),
-         ])
+         debug_assert!(self.author.is_some(), "Dashboard.author must be set before persisting");
+         debug_assert!(self.dashboard_id.is_some(), "Dashboard.dashboard_id must be set before persisting");
+         RelativePathBuf::from_iter([
+             USERS_ROOT_DIR,
+             self.author.as_ref().expect("Dashboard.author must be set before persisting"),
+             DASHBOARDS_DIR,
+             &format!("{}.json", self.dashboard_id.expect("Dashboard.dashboard_id must be set before persisting")),
+         ])
          .to_string()
      }

      fn get_object_id(&self) -> String {
-         self.dashboard_id.unwrap().to_string()
+         self.dashboard_id
+             .expect("Dashboard.dashboard_id must be set before persisting")
+             .to_string()
      }
  }
197-201: Minor: O(n^2) retain-per-insert during load.

Retaining on every push scales poorly. Consider collecting first, then dedup once by dashboard_id before appending, as in the sketch below.
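A self-contained sketch of the one-pass dedup (generic, since the Dashboard fields here are Options):

use std::collections::HashSet;

// Keep the last occurrence per id in O(n) instead of retain() per insert;
// iterating in reverse makes HashSet::insert reject earlier duplicates.
fn dedup_last_by_id<T>(items: Vec<T>, id_of: impl Fn(&T) -> String) -> Vec<T> {
    let mut seen = HashSet::new();
    let mut kept: Vec<T> = items
        .into_iter()
        .rev()
        .filter(|item| seen.insert(id_of(item)))
        .collect();
    kept.reverse(); // restore the original relative order of kept items
    kept
}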
src/alerts/target.rs (3)
69-73: Atomic update order is correct; drop the unnecessary clone.

Persisting to the metastore before mutating memory avoids divergence. Also, you can avoid cloning the Target.

- let mut map = self.target_configs.write().await;
- map.insert(target.id, target.clone());
+ let mut map = self.target_configs.write().await;
+ map.insert(target.id, target);
60-63: Optional: clear the map before load to avoid stale entries.

If load() can be called more than once, clear the in-memory map before inserting to prevent stale targets.

- let mut map = self.target_configs.write().await;
+ let mut map = self.target_configs.write().await;
+ map.clear();
  for target in targets {
      map.insert(target.id, target);
  }
335-343: Path helper coupling.

MetastoreObject::get_object_path uses storage::object_storage::target_json_path, coupling the trait to the object-store path layout. If you foresee non-object-store metastores, consider centralizing path generation under a metastore-focused module (e.g., metastore::paths) to decouple from the storage backend; a rough sketch follows.
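A rough sketch of such a module (the module and constant names are assumptions, not existing APIs): MetastoreObject impls would build logical keys here instead of importing storage path helpers.

pub mod paths {
    use relative_path::RelativePathBuf;

    pub const TARGETS_ROOT: &str = ".targets";

    /// Logical key for a target; an object-store backend maps this to a
    /// path, while other backends may treat it as an opaque key.
    pub fn target_json_path(target_id: &str) -> RelativePathBuf {
        RelativePathBuf::from_iter([TARGETS_ROOT, &format!("{target_id}.json")])
    }
}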
src/alerts/mod.rs (2)
980-981: Optional: return structured metastore error details.

When Self::MetastoreError(_), you can serialize and return MetastoreError::to_detail() for richer diagnostics instead of plain text.
Example inside error_response():
if let Self::MetastoreError(e) = self {
    let detail = e.to_detail();
    return actix_web::HttpResponse::build(self.status_code())
        .insert_header(ContentType::json())
        .body(serde_json::to_string(&detail).unwrap_or_else(|_| self.to_string()));
}
896-901: Optional: replace Result sum() with try_fold for clarity.

Summing Results can be ambiguous; try_fold makes error propagation explicit.

- plan.inputs()
-     .iter()
-     .map(|input| _get_number_of_agg_exprs(input))
-     .sum()
+ plan.inputs().iter().try_fold(0usize, |acc, input| {
+     _get_number_of_agg_exprs(input).map(|n| acc + n)
+ })

src/prism/home/mod.rs (3)
338-349: Propagate MetastoreError instead of erasing it to Anyhow.

You're dropping structured error info and status codes by mapping metastore errors to anyhow. Let it bubble up via PrismHomeError::MetastoreError (already added).

- let stream_titles: Vec<String> = PARSEABLE
-     .metastore
-     .list_streams()
-     .await
-     .map_err(|e| PrismHomeError::Anyhow(anyhow::Error::new(e)))?
+ let stream_titles: Vec<String> = PARSEABLE
+     .metastore
+     .list_streams()
+     .await?
      .into_iter()
229-254: Avoid relying on the first entry for dataset_type/time_partition.

stream_jsons[0] is ordering-sensitive. Consider selecting the latest/active format deterministically (e.g., by snapshot version or manifest recency) before reading telemetry_type/time_partition; see the sketch below.
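A hedged sketch (assuming the ObjectStoreFormat fields used elsewhere in this PR): pick the entry with the most recent manifest upper bound rather than trusting ordering.

// Deterministic selection: newest manifest upper bound wins.
fn pick_latest(formats: Vec<ObjectStoreFormat>) -> Option<ObjectStoreFormat> {
    formats.into_iter().max_by_key(|f| {
        f.snapshot
            .manifest_list
            .iter()
            .map(|m| m.time_upper_bound)
            .max()
    })
}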
306-334: Normalize the query to lowercase once.

Minor improvement: compute the lowercase query once and reuse it, to avoid repeated allocations and to ensure consistent case-insensitive matching across resources.

  pub async fn generate_home_search_response(
      key: &SessionKey,
      query_value: &str,
  ) -> Result<HomeSearchResponse, PrismHomeError> {
      let mut resources = Vec::new();
+     let q = query_value.to_lowercase();
      let (alert_titles, correlation_titles, dashboard_titles, filter_titles, stream_titles) =
          tokio::join!(
-             get_alert_titles(key, query_value),
-             get_correlation_titles(key, query_value),
-             get_dashboard_titles(query_value),
-             get_filter_titles(key, query_value),
+             get_alert_titles(key, &q),
+             get_correlation_titles(key, &q),
+             get_dashboard_titles(&q),
+             get_filter_titles(key, &q),
              get_stream_titles(key)
          );
@@
-     for title in stream_titles {
-         if title.to_lowercase().contains(query_value) {
+     for title in stream_titles {
+         if title.to_lowercase().contains(&q) {

Also applies to: 324-331
src/hottier.rs (2)
381-383: Don't unwrap parent(); handle the root-path edge case.

Avoid a potential panic if file_path is malformed.

- fs::create_dir_all(parquet_path.parent().unwrap()).await?;
- let mut file = fs::File::create(parquet_path.clone()).await?;
+ let Some(parent) = parquet_path.parent() else {
+     return Err(HotTierError::Anyhow(anyhow::anyhow!(
+         "invalid parquet path: {}",
+         parquet_path.display()
+     )));
+ };
+ fs::create_dir_all(parent).await?;
+ let mut file = fs::File::create(&parquet_path).await?;
308-312: Micro-opt: avoid cloning the whole Vec on each extend.

Use iter().cloned() or extend_from_slice for slightly less churn.

- for storage_manifest in manifest_files {
-     storage_combined_manifest
-         .files
-         .extend(storage_manifest.files.clone());
- }
+ for storage_manifest in manifest_files {
+     storage_combined_manifest
+         .files
+         .extend(storage_manifest.files.iter().cloned());
+ }

src/query/mod.rs (2)
525-541: Avoid swallowing metastore errors; or at least log them.

The current if let Ok(obs) = obs silently ignores failures. Prefer propagating or logging to aid debugging.

- let obs = PARSEABLE
-     .metastore
-     .get_all_stream_jsons(stream_name, None)
-     .await;
- if let Ok(obs) = obs {
+ match PARSEABLE.metastore.get_all_stream_jsons(stream_name, None).await {
+     Ok(obs) => {
          for ob in obs {
              if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
                  let snapshot = object_store_format.snapshot;
                  for manifest in snapshot.manifest_list {
                      merged_snapshot.manifest_list.push(manifest);
                  }
              }
          }
- }
+     }
+     Err(e) => {
+         tracing::warn!("get_all_stream_jsons failed for {stream_name}: {e}");
+     }
+ }
513-541: Optional: skip the single-stream fetch in Query/Prism mode.

You fetch stream_json and then immediately replace the snapshot using get_all_stream_jsons in Query/Prism. Save a roundtrip by fetching only what you use.

- let object_store_format: ObjectStoreFormat = serde_json::from_slice(
-     &PARSEABLE
-         .metastore
-         .get_stream_json(stream_name, false)
-         .await?,
- )?;
-
- // all the manifests will go here
- let mut merged_snapshot: Snapshot = Snapshot::default();
-
- // get a list of manifests
- if PARSEABLE.options.mode == Mode::Query || PARSEABLE.options.mode == Mode::Prism {
+ // all the manifests will go here
+ let mut merged_snapshot: Snapshot = Snapshot::default();
+ if PARSEABLE.options.mode == Mode::Query || PARSEABLE.options.mode == Mode::Prism {
      let obs = PARSEABLE
          .metastore
          .get_all_stream_jsons(stream_name, None)
          .await;
      if let Ok(obs) = obs {
          for ob in obs {
              if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
                  let snapshot = object_store_format.snapshot;
                  for manifest in snapshot.manifest_list {
                      merged_snapshot.manifest_list.push(manifest);
                  }
              }
          }
      }
  } else {
-     merged_snapshot = object_store_format.snapshot;
+     let bytes = PARSEABLE
+         .metastore
+         .get_stream_json(stream_name, false)
+         .await?;
+     let object_store_format: ObjectStoreFormat = serde_json::from_slice(&bytes)
+         .map_err(|e| QueryError::CustomError(format!("invalid stream.json for {stream_name}: {e}")))?;
+     merged_snapshot = object_store_format.snapshot;
  }

src/migration/mod.rs (2)
272-282: Ensure atomicity/consistency of stream_json + schema writes.

If put_stream_json succeeds and put_schema fails (or vice-versa), the metastore may enter an inconsistent state. Consider a transactional API, a two-phase write with idempotent retries and rollback, or storing the schema within the same CAS-protected write.
Also applies to: 289-299
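A hedged sketch of the two-phase variant (put_schema/put_stream_json mirror the Metastore calls in this PR; delete_schema is hypothetical): write the schema first and roll it back if the stream JSON write fails, so the pair is never left half-migrated.

async fn put_stream_json_and_schema(
    stream: &str,
    meta: &ObjectStoreFormat,
    schema: Schema,
) -> Result<(), MetastoreError> {
    PARSEABLE.metastore.put_schema(schema, stream).await?;
    if let Err(e) = PARSEABLE.metastore.put_stream_json(meta, stream).await {
        // best-effort rollback; surface the original error
        let _ = PARSEABLE.metastore.delete_schema(stream).await; // hypothetical
        return Err(e);
    }
    Ok(())
}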
377-381: Potential duplicate schema write.

You already persist the schema during versioned migrations (v1/v2). This unconditional write here may be redundant. If intentional, add a brief comment; otherwise gate it by version/dirty flag.
src/storage/object_storage.rs (1)
304-317: Pattern is consistent; consider factoring a get_mutate_put helper.

These repeated "get_stream_json → mutate → put_stream_json" blocks are correct. A small helper would DRY this up and centralize error mapping; one possible shape follows the list of affected ranges.
Also applies to: 326-339, 348-361, 392-405, 414-429, 436-450
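One possible shape for the helper (signatures mirror the get_stream_json/put_stream_json calls repeated in this file; the error mapping for serde follows the patterns used elsewhere in this review):

async fn get_mutate_put<F>(stream_name: &str, mutate: F) -> Result<(), ObjectStorageError>
where
    F: FnOnce(&mut ObjectStoreFormat),
{
    let bytes = PARSEABLE
        .metastore
        .get_stream_json(stream_name, false)
        .await
        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
    let mut meta: ObjectStoreFormat = serde_json::from_slice(&bytes)
        .map_err(|e| ObjectStorageError::Custom(format!("invalid stream json: {e}")))?;
    mutate(&mut meta);
    PARSEABLE
        .metastore
        .put_stream_json(&meta, stream_name)
        .await
        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))
}

Call sites would then shrink to something like get_mutate_put(stream_name, |m| m.retention = retention.clone()).await.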
src/storage/azure_blob.rs (1)
672-679: Delete the commented-out legacy list_manifest_files block.

Dead commented code adds noise and confuses future maintenance.

- // async fn list_manifest_files(
- //     &self,
- //     stream_name: &str,
- // ) -> Result<BTreeMap<String, Vec<String>>, ObjectStorageError> {
- //     let files = self._list_manifest_files(stream_name).await?;
-
- //     Ok(files)
- // }
+ // (removed: legacy list_manifest_files)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (45)
- Cargo.toml (2 hunks)
- src/alerts/alert_structs.rs (2 hunks)
- src/alerts/alert_traits.rs (2 hunks)
- src/alerts/alert_types.rs (5 hunks)
- src/alerts/mod.rs (8 hunks)
- src/alerts/target.rs (4 hunks)
- src/catalog/manifest.rs (2 hunks)
- src/catalog/mod.rs (6 hunks)
- src/correlation.rs (8 hunks)
- src/enterprise/utils.rs (5 hunks)
- src/handlers/http/alerts.rs (3 hunks)
- src/handlers/http/cluster/mod.rs (4 hunks)
- src/handlers/http/ingest.rs (2 hunks)
- src/handlers/http/logstream.rs (6 hunks)
- src/handlers/http/mod.rs (2 hunks)
- src/handlers/http/modal/ingest_server.rs (2 hunks)
- src/handlers/http/modal/mod.rs (5 hunks)
- src/handlers/http/modal/query/querier_logstream.rs (2 hunks)
- src/handlers/http/modal/utils/rbac_utils.rs (1 hunks)
- src/handlers/http/oidc.rs (1 hunks)
- src/handlers/http/query.rs (2 hunks)
- src/handlers/http/role.rs (1 hunks)
- src/handlers/http/users/dashboards.rs (3 hunks)
- src/handlers/http/users/filters.rs (6 hunks)
- src/hottier.rs (3 hunks)
- src/lib.rs (1 hunks)
- src/metastore/metastore_traits.rs (1 hunks)
- src/metastore/metastores/mod.rs (1 hunks)
- src/metastore/metastores/object_store_metastore.rs (1 hunks)
- src/metastore/mod.rs (1 hunks)
- src/migration/mod.rs (11 hunks)
- src/parseable/mod.rs (7 hunks)
- src/parseable/streams.rs (1 hunks)
- src/prism/home/mod.rs (5 hunks)
- src/query/mod.rs (3 hunks)
- src/query/stream_schema_provider.rs (5 hunks)
- src/storage/azure_blob.rs (6 hunks)
- src/storage/gcs.rs (6 hunks)
- src/storage/localfs.rs (4 hunks)
- src/storage/mod.rs (3 hunks)
- src/storage/object_storage.rs (15 hunks)
- src/storage/s3.rs (7 hunks)
- src/storage/store_metadata.rs (3 hunks)
- src/users/dashboards.rs (7 hunks)
- src/users/filters.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (27)
- src/catalog/manifest.rs
- src/alerts/alert_structs.rs
- src/handlers/http/oidc.rs
- src/handlers/http/ingest.rs
- src/parseable/streams.rs
- src/metastore/metastores/mod.rs
- src/handlers/http/mod.rs
- src/metastore/metastore_traits.rs
- src/handlers/http/users/filters.rs
- src/alerts/alert_traits.rs
- src/lib.rs
- src/handlers/http/users/dashboards.rs
- Cargo.toml
- src/handlers/http/modal/query/querier_logstream.rs
- src/handlers/http/query.rs
- src/handlers/http/modal/ingest_server.rs
- src/metastore/mod.rs
- src/storage/mod.rs
- src/handlers/http/modal/utils/rbac_utils.rs
- src/storage/localfs.rs
- src/metastore/metastores/object_store_metastore.rs
- src/parseable/mod.rs
- src/users/filters.rs
- src/storage/store_metadata.rs
- src/alerts/alert_types.rs
- src/enterprise/utils.rs
- src/query/stream_schema_provider.rs
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-09-06T04:26:17.168Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.168Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
- src/handlers/http/role.rs
- src/handlers/http/logstream.rs
- src/correlation.rs
- src/storage/object_storage.rs
- src/alerts/mod.rs
- src/alerts/target.rs
- src/prism/home/mod.rs
📚 Learning: 2025-04-07T13:23:10.092Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1288
File: src/handlers/http/modal/mod.rs:279-301
Timestamp: 2025-04-07T13:23:10.092Z
Learning: For critical operations like writing metadata to disk in NodeMetadata::put_on_disk(), it's preferred to let exceptions propagate (using expect/unwrap) rather than trying to recover with fallback mechanisms, as the failure indicates a fundamental system issue that needs immediate attention.
Applied to files:
- src/handlers/http/role.rs
- src/alerts/mod.rs
📚 Learning: 2025-06-18T06:39:04.775Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Applied to files:
- src/query/mod.rs
- src/correlation.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
- src/handlers/http/logstream.rs
- src/migration/mod.rs
- src/storage/object_storage.rs
- src/storage/gcs.rs
- src/storage/azure_blob.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
Applied to files:
- src/migration/mod.rs
- src/handlers/http/cluster/mod.rs
- src/storage/object_storage.rs
- src/catalog/mod.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
- src/migration/mod.rs
- src/storage/object_storage.rs
📚 Learning: 2025-06-16T09:50:38.636Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-07-24T11:09:21.781Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.
Applied to files:
src/alerts/mod.rs
📚 Learning: 2025-05-01T10:27:56.858Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
Applied to files:
src/users/dashboards.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: coverage
🔇 Additional comments (31)
src/handlers/http/modal/mod.rs (3)
40-40: LGTM: pulling in MetastoreObject is correct here.
276-284: LGTM: MetastoreObject impl aligns path/id with existing file layout.
312-338: Minor: verify NodeType::All is never passed to load_from_storage.

If All is used, you'd mix node types and the domain/port match below may select an unintended node.
src/handlers/http/cluster/mod.rs (1)
47-47: LGTM: Mode import required for the metastore API.

src/correlation.rs (1)
352-363: Structured metastore errors are correctly surfaced.

Good use of e.status_code() and a JSON body via e.to_detail(), aligned with the new MetastoreErrorDetail contract.
src/users/dashboards.rs (1)
218-219: Metastore-first persistence for dashboards looks good.

save_dashboard delegating to PARSEABLE.metastore.put_dashboard(dashboard) aligns with the new model.
src/handlers/http/alerts.rs (2)
211-219: POST now persists first; resolves the prior non-atomicity.

Writing to the metastore before mutating in-memory state is the right order.
263-263: DELETE persists first; good.

Deletion from the metastore before memory/task teardown avoids divergence on failures.
src/alerts/target.rs (1)
59-63: Good: no lock held across await.

You fetch from the metastore before taking the write lock. This prevents awaiting while holding the lock.
src/alerts/mod.rs (4)
951-953: Error wiring: MetastoreError passthrough.

Adding AlertError::MetastoreError with #[error(transparent)] is correct; callers can use ? on metastore ops.
107-142: Migration flow: metastore write on successful conversion.

Persisting the migrated v2 alert via PARSEABLE.metastore.put_alert() is correct and keeps storage consistent.
1017-1023: LGTM: v1 detection and migration trigger.

The version checks and fallbacks to migrate_from_v1() cover both explicit v1 and missing-version cases.
Also applies to: 1037-1044
1278-1284: Verify persistence responsibility for deletions.

Alerts::delete() only removes from memory. Confirm that HTTP handlers (or another layer) invoke PARSEABLE.metastore.delete_alert(...) so durable state is removed too; otherwise alerts will resurrect on restart.

#!/bin/bash
# Find durable delete call sites
rg -n -C2 'delete_alert\s*\(' src
# Confirm who calls Alerts::delete and whether it pairs with metastore deletion
rg -n -C3 'fn\s+delete\s*\(' src/alerts | cat

src/prism/home/mod.rs (2)
497-505: Good: JSON error body for MetastoreError using to_detail().

Returning a structured JSON payload for metastore failures aligns with the new error model and preserves status codes.
485-493: Good: status code delegation to MetastoreError.

Using e.status_code() keeps HTTP semantics consistent with metastore operations.
src/hottier.rs (2)
276-285: Good: wrap metastore failures as ObjectStorageError::MetastoreError with detail.

This preserves structured error info (operation, stream, etc.) per the metastore error model.
292-301: Signature change SGTM.

Switching to BTreeMap<String, Vec> matches the metastore shape and simplifies downstream processing.
src/query/mod.rs (3)
549-567: Nice: graceful handling when manifests are missing.

Replacing expect(...) with an Option-to-Result conversion prevents panics in the query path.
543-548: Verify time-bound inclusivity matches storage semantics.

Both bounds are Included; data paths elsewhere often use [start, end) for queries. Confirm we won't over-fetch manifests that start exactly at end.

#!/bin/bash
# Inspect manifest selection code and semantics for time bounds.
rg -nP "manifests\(\s*&\s*\[.*PartialTimeFilter::Low.*PartialTimeFilter::High" -n -C2 src | cat
rg -nP "struct\s+ManifestListItem|time_lower_bound|time_upper_bound" -n -C3 src | cat
1-1: Check QueryError supports CustomError and serde_json conversions.

If CustomError isn't available or From<serde_json::Error> exists already, adjust the mapping accordingly.

#!/bin/bash
rg -nP 'enum\s+QueryError' -C3 src | cat
rg -nP 'CustomError' -n -C2 src | cat
rg -nP 'impl\s+From<\s*serde_json::Error\s*>\s+for\s+QueryError' -n src | cat
435-442: Good: centralized persistence via metastore.

Converting JSON to StorageMetadata and persisting through PARSEABLE.metastore keeps the write path consistent with the new abstraction.
src/catalog/mod.rs (1)
434-447: Good: manifest creation path fully metastore-backed.

put_manifest + get_manifest_path + recording the URL keeps the flow backend-agnostic.
src/handlers/http/logstream.rs (1)
573-575: Good: MetastoreError is first-class and returned as structured JSON.

The new variant and ResponseError mapping (status + to_detail JSON) look solid and consistent with the metastore design.
Also applies to: 611-626
src/storage/object_storage.rs (1)
272-276: New ObjectStorage::list_with_delimiter requires full backend coverage.

Confirm every ObjectStorage implementation (S3, GCS, Azure, LocalFS) implements this. Otherwise the trait impl will fail, or default callers may panic.

#!/bin/bash
# Verify all backends implement the new trait method
rg -nP 'impl\s+ObjectStorage\s+for\s+' -C1 src/storage
rg -nP 'fn\s+list_with_delimiter\s*\(' -C2 src/storage

src/storage/gcs.rs (2)
657-662: LGTM: GCS list_with_delimiter delegates correctly.

Bridging to the underlying client is clean and consistent with the trait.
509-514: Returning Err for list_streams may break callers—verify and gate or implement minimal listing.
- If any non-metastore paths still call list_streams(), this will fail at runtime (and we saw an unwrap in logstream before the fix).
- Either implement a minimal listing (via list_with_delimiter + STREAM_METADATA_FILE_NAME) or guard all call sites to avoid using list_streams on GCS.

#!/bin/bash
# Find direct call sites to `list_streams` outside storage provider impls
rg -nP --type=rust -g '!src/storage/**' '\.list_streams\s*\(' -C2

src/storage/s3.rs (2)
853-858: LGTM: S3 list_with_delimiter is implemented via the client.

Matches the trait and other backends.
692-697: Same concern as GCS: list_streams returns Err—confirm no remaining callers.

If any path still relies on S3 list_streams(), you'll return an error. Either implement a minimal version or gate usage behind the metastore.

#!/bin/bash
# Find direct call sites to `list_streams` outside storage provider impls
rg -nP --type=rust -g '!src/storage/**' '\.list_streams\s*\(' -C2

src/storage/azure_blob.rs (3)
20-20: LGTM: HashSet import is appropriate for set-based listings.
37-42: LGTM: ListResult and Azure client imports align with delimiter-based listing.
55-58: LGTM: Super-imports updated correctly (metrics layer, stream metadata, path helpers).
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
src/handlers/http/alerts.rs (3)
324-329: Persist the updated notification state to the metastore.

This PATCH updates only memory; on restart the state will be lost. Persist the updated config.

  alerts
      .update_notification_state(alert_id, new_notification_state)
      .await?;
  let alert = alerts.get_alert_by_id(alert_id).await?;
+
+ // persist updated notification state
+ PARSEABLE
+     .metastore
+     .put_alert(&alert.to_alert_config())
+     .await?;
  Ok(web::Json(alert.to_alert_config().to_response()))
354-360: Persist the disable state to the metastore.

Disabled state isn't persisted; it will revert after a restart. Save after computing the new config.

  alerts
      .update_state(alert_id, AlertState::Disabled, Some("".into()))
      .await?;
  let alert = alerts.get_alert_by_id(alert_id).await?;
+
+ PARSEABLE
+     .metastore
+     .put_alert(&alert.to_alert_config())
+     .await?;
  Ok(web::Json(alert.to_alert_config().to_response()))
392-398: Persist the enable state to the metastore.

Enabled state changes also need persistence to survive restarts.

  alerts
      .update_state(alert_id, AlertState::NotTriggered, Some("".into()))
      .await?;
  let alert = alerts.get_alert_by_id(alert_id).await?;
+
+ PARSEABLE
+     .metastore
+     .put_alert(&alert.to_alert_config())
+     .await?;
  Ok(web::Json(alert.to_alert_config().to_response()))

src/handlers/http/logstream.rs (1)
89-95: Don't unwrap metastore.list_streams(); propagate as StreamError.

This can panic at runtime. Use ? to leverage the new MetastoreError conversion.

- let res = PARSEABLE
-     .metastore
-     .list_streams()
-     .await
-     .unwrap()
+ let res = PARSEABLE
+     .metastore
+     .list_streams()
+     .await?

src/storage/object_storage.rs (1)
299-361: RMW on stream.json without CAS; high risk of lost updates.

update_time_partition_limit_in_stream, update_custom_partition_in_stream, update_log_source_in_stream, put_stats, and put_retention all read the JSON and write it back without revisions. Multiple ingestors/actors will overwrite each other's fields. Introduce optimistic concurrency (expected revision/If-Match) on put_stream_json, and retry on conflict.
Also applies to: 409-429, 431-450
♻️ Duplicate comments (13)
src/handlers/http/oidc.rs (1)
445-453: Map the serde_json error into ObjectStorageError; ? won't compile here.

get_metadata() returns Result<StorageMetadata, ObjectStorageError>, but serde_json::from_slice yields serde_json::Error. Convert it explicitly.

  async fn get_metadata() -> Result<crate::storage::StorageMetadata, ObjectStorageError> {
      let metadata = PARSEABLE
          .metastore
          .get_parseable_metadata()
          .await
          .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?
          .ok_or_else(|| ObjectStorageError::Custom("parseable metadata not initialized".into()))?;
-     Ok(serde_json::from_slice::<StorageMetadata>(&metadata)?)
+     let metadata: StorageMetadata = serde_json::from_slice(&metadata)
+         .map_err(|e| ObjectStorageError::Invalid(anyhow::anyhow!(e)))?;
+     Ok(metadata)
  }

src/storage/store_metadata.rs (1)
108-116: Implement get_object_id to avoid a runtime panic.

This will panic if invoked. Return a stable ID for metastore keying (the file name is sufficient).

  impl MetastoreObject for StorageMetadata {
      fn get_object_path(&self) -> String {
          parseable_json_path().to_string()
      }

      fn get_object_id(&self) -> String {
-         unimplemented!()
+         PARSEABLE_METADATA_FILE_NAME.to_string()
      }
  }

src/handlers/http/modal/utils/rbac_utils.rs (1)
31-31: Map serde_json errors to ObjectStorageError instead of bubbling serde::Error.

Use a concrete ObjectStorageError variant for deserialization failures.
Apply:
- Ok(serde_json::from_slice::<StorageMetadata>(&metadata)?)
+ let metadata: StorageMetadata = serde_json::from_slice(&metadata)
+     .map_err(|e| ObjectStorageError::Invalid(anyhow::anyhow!(e)))?;
+ Ok(metadata)

If Invalid isn't available, replace it with your closest serialization/parse error variant.

src/enterprise/utils.rs (1)
95-109: Remove expect on missing manifest; return a structured error.
expect("Data is invalid for Manifest") can crash query paths. Convert the Option to a descriptive ObjectStorageError.

- .await
- .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?
- .expect("Data is invalid for Manifest"),
+ .await
+ .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?
+ .ok_or_else(|| ObjectStorageError::Custom(format!(
+     "Manifest not found for {} [{} - {}]",
+     stream, manifest_item.time_lower_bound, manifest_item.time_upper_bound
+ )))?,
FuturesUnorderedto reduce tail latency.src/parseable/mod.rs (1)
258-264: Startup gating bug:.is_ok()conflates “empty” with “error”.Treat empty stream list as false and propagate metastore errors.
- let has_streams = PARSEABLE.metastore.list_streams().await.is_ok(); + let has_streams = match PARSEABLE.metastore.list_streams().await { + Ok(streams) => !streams.is_empty(), + Err(e) => return Err(ObjectStorageError::MetastoreError(Box::new(e.to_detail()))), + };src/metastore/mod.rs (2)
27-36: Ensure chrono is compiled with serde for timestamp serialization.

MetastoreErrorDetail.timestamp: DateTime<Utc> needs chrono = { features = ["serde"] }.

#!/bin/bash
# Verify chrono serde feature is enabled across all Cargo.toml files
rg -nP '\bchrono\b' -g '*Cargo.toml' -C2
rg -nP 'chrono\s*=\s*{[^}]*features\s*=\s*\[[^]]*serde' -g '*Cargo.toml' -C2
149-159: Align status_code() with to_detail(): return 400 for client JSON errors.

The current mapping returns 500; this will confuse clients and handlers.

  pub fn status_code(&self) -> StatusCode {
      match self {
          MetastoreError::ObjectStorageError(..) => StatusCode::INTERNAL_SERVER_ERROR,
-         MetastoreError::JsonParseError(..) => StatusCode::INTERNAL_SERVER_ERROR,
-         MetastoreError::JsonSchemaError { .. } => StatusCode::INTERNAL_SERVER_ERROR,
-         MetastoreError::InvalidJsonStructure { .. } => StatusCode::INTERNAL_SERVER_ERROR,
-         MetastoreError::MissingJsonField { .. } => StatusCode::INTERNAL_SERVER_ERROR,
-         MetastoreError::InvalidJsonValue { .. } => StatusCode::INTERNAL_SERVER_ERROR,
+         MetastoreError::JsonParseError(..) => StatusCode::BAD_REQUEST,
+         MetastoreError::JsonSchemaError { .. } => StatusCode::BAD_REQUEST,
+         MetastoreError::InvalidJsonStructure { .. } => StatusCode::BAD_REQUEST,
+         MetastoreError::MissingJsonField { .. } => StatusCode::BAD_REQUEST,
+         MetastoreError::InvalidJsonValue { .. } => StatusCode::BAD_REQUEST,
          MetastoreError::Error { status_code, .. } => *status_code,
      }
  }

src/catalog/mod.rs (2)
300-361: Brittle manifest detection; pass correct meta; avoid duplicate manifests.
- contains(manifest_path("").to_string()) is not metastore-agnostic and can misclassify updates vs creates.
- Creating with ObjectStoreFormat::default() drops time_partition and other fields.
Refactor to: always attempt get_manifest with known bounds/path; update if Some, else create; and forward meta.clone() for creates.
- let manifest_file_name = manifest_path("").to_string();
- let should_update = manifests[pos].manifest_path.contains(&manifest_file_name);
-
- if should_update {
-     if let Some(mut manifest) = PARSEABLE
+ if let Some(mut manifest) = PARSEABLE
      .metastore
      .get_manifest(
          stream_name,
          manifests[pos].time_lower_bound,
          manifests[pos].time_upper_bound,
          Some(manifests[pos].manifest_path.clone()),
      )
      .await
      .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?
  {
      // Update existing manifest
      for change in partition_changes {
          manifest.apply_change(change);
      }
      PARSEABLE
          .metastore
          .put_manifest(
              &manifest,
              stream_name,
              manifests[pos].time_lower_bound,
              manifests[pos].time_upper_bound,
          )
          .await
          .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
      manifests[pos].events_ingested = events_ingested;
      manifests[pos].ingestion_size = ingestion_size;
      manifests[pos].storage_size = storage_size;
      Ok(None)
- } else {
-     // Manifest not found, create new one
-     create_manifest(
-         partition_lower,
-         partition_changes,
-         stream_name,
-         false,
-         meta.clone(),
-         events_ingested,
-         ingestion_size,
-         storage_size,
-     )
-     .await
- }
- } else {
-     // Create new manifest for different partition
+ } else {
+     // Manifest not found, create new one
      create_manifest(
          partition_lower,
          partition_changes,
          stream_name,
          false,
-         ObjectStoreFormat::default(),
+         meta.clone(),
          events_ingested,
          ingestion_size,
          storage_size,
      )
      .await
  }
116-122: Prevent lost updates on stream.json (add optimistic concurrency).

Both read-modify-write flows call metastore.get_stream_json(), mutate, then put_stream_json() without a revision/If-Match. Concurrent writers will clobber each other (stats, manifest_list, retention, etc.). Add a CAS-style argument to put_stream_json (expected revision/etag) and retry-on-conflict with a small backoff, or have the metastore expose an atomic update. This applies here and anywhere we RMW stream JSON.
Also applies to: 377-381
src/storage/azure_blob.rs (1)
575-580: Confirm no callers depend on list_streams() for Azure.

This returns an error; verify all call-sites are migrated to the metastore or list_old_streams/list_with_delimiter.

#!/bin/bash
# Remaining uses of list_streams (Azure or generic paths)
rg -nC2 --type=rust '\.list_streams\s*\(' src | sed -n '1,200p'

src/query/stream_schema_provider.rs (1)
521-536: Don't silently ignore metastore failures when merging snapshots.

Swallowing errors can hide outages and produce incomplete results. Propagate, or at least warn and bail.

- if PARSEABLE.options.mode == Mode::Query || PARSEABLE.options.mode == Mode::Prism {
-     let obs = PARSEABLE
-         .metastore
-         .get_all_stream_jsons(&self.stream, None)
-         .await;
-     if let Ok(obs) = obs {
+ if PARSEABLE.options.mode == Mode::Query || PARSEABLE.options.mode == Mode::Prism {
+     let obs = PARSEABLE
+         .metastore
+         .get_all_stream_jsons(&self.stream, None)
+         .await
+         .map_err(|e| DataFusionError::Plan(e.to_string()))?;
+     {
          for ob in obs {

src/storage/object_storage.rs (2)
611-649: Propagate get_all_stream_jsons errors when aggregating log sources.

Currently errors are ignored; callers get partial/empty results. Propagate metastore failures.

- let stream_metas = PARSEABLE
-     .metastore
-     .get_all_stream_jsons(stream_name, None)
-     .await;
- if let Ok(stream_metas) = stream_metas {
-     for stream_meta in stream_metas.iter() {
+ let stream_metas = PARSEABLE
+     .metastore
+     .get_all_stream_jsons(stream_name, None)
+     .await
+     .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
+ for stream_meta in stream_metas.iter() {
      if let Ok(stream_meta) = serde_json::from_slice::<ObjectStoreFormat>(stream_meta) {
          // fetch unique log sources and their fields
          all_log_sources.extend(stream_meta.log_source.clone());
      }
- }
- }
+ }
978-999: Avoid unwrap() on Schema::try_merge; handle merge errors.

A bad merge (schema drift) will panic the ingestor. Propagate as ObjectStorageError.

- let new_schema = Schema::try_merge(vec![
-     schema,
-     serde_json::from_slice::<Schema>(&stream_schema)?,
- ])
- .unwrap();
+ let existing: Schema = serde_json::from_slice(&stream_schema)?;
+ let new_schema = Schema::try_merge(vec![schema, existing])
+     .map_err(|e| ObjectStorageError::Custom(format!("schema merge failed: {e}")))?;
🧹 Nitpick comments (17)
src/handlers/http/modal/query/querier_logstream.rs (1)
173-181: Include the stream name in the parse error to aid diagnostics.

The log already warns, but attaching stream_name helps root-cause quickly.

  let stream_metadata: ObjectStoreFormat = match serde_json::from_slice(&ob) {
      Ok(d) => d,
      Err(e) => {
-         error!("Failed to parse stream metadata: {:?}", e);
+         error!("Failed to parse stream metadata for stream {}: {:?}", stream_name, e);
          continue;
      }
  };

src/metastore/metastore_traits.rs (3)
62-66: Avoid returning DashMap from a trait API.

Expose a plain HashMap to decouple callers from DashMap internals and avoid unnecessary concurrency primitives in return types.

- async fn get_chats(&self) -> Result<DashMap<String, Vec<Bytes>>, MetastoreError>;
+ async fn get_chats(&self) -> Result<std::collections::HashMap<String, Vec<Bytes>>, MetastoreError>;
128-134: Path terminology may not fit non-object-store metastores.

Consider renaming to get_manifest_locator (or document that "path" is a logical key, not necessarily a filesystem/object-store path).
162-165: Add Send to MetastoreObject for cross-thread safety.

These objects are passed into async methods; adding Send avoids surprises in multi-threaded executors.

- pub trait MetastoreObject: ErasedSerialize + Sync {
+ pub trait MetastoreObject: ErasedSerialize + Send + Sync {

src/handlers/http/alerts.rs (1)
263-270: Reorder: cancel the scheduled task before removing from memory.

The function comment says "disk, scheduled tasks, then memory," but the code does memory before tasks. Swap to avoid a window where a running task references a now-removed in-memory alert.

- // delete from memory
- alerts.delete(alert_id).await?;
-
- // delete the scheduled task
- alerts.delete_task(alert_id).await?;
+ // delete the scheduled task first
+ alerts.delete_task(alert_id).await?;
+ // then delete from memory
+ alerts.delete(alert_id).await?;

src/enterprise/utils.rs (1)
81-92: Don't silently ignore get_all_stream_jsons errors.

Swallowing errors can hide partial results and create nondeterminism. Prefer propagating (or at least logging) the error.
- let obs = PARSEABLE.metastore.get_all_stream_jsons(stream, None).await;
- if let Ok(obs) = obs {
+ let obs = PARSEABLE
+     .metastore
+     .get_all_stream_jsons(stream, None)
+     .await
+     .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
+ {

Optionally add a tracing::warn! with context if you intentionally continue on error.

src/parseable/mod.rs (3)
107-118: Deduplicate metastore wiring across storage backends.
Four nearly identical blocks construct ObjectStoreMetastore. Consider extracting a small helper/closure to reduce repetition and drift (sketch below).
Also applies to: 121-132, 134-145, 147-157
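For illustration, one shape the shared wiring could take. ObjectStoreMetastore and the ObjectStorageProvider trait are from this PR/codebase; the helper name and constructor signature are assumptions:

```rust
use std::sync::Arc;

// Hypothetical helper: each storage arm (S3, GCS, Azure, LocalFS) calls this
// once instead of repeating the ObjectStoreMetastore construction inline.
fn build_metastore(storage: Arc<dyn ObjectStorageProvider>) -> Arc<dyn Metastore> {
    Arc::new(ObjectStoreMetastore::new(storage))
}
```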
258-259: Prefer self.metastore over global PARSEABLE.metastore.
Using the receiver improves testability and reduces hidden globals in methods that already have &self.
Also applies to: 326-327
505-512: Route log-source updates through the Metastore abstraction.
Directly calling object storage here bypasses the new metastore layer and will break if a non-object-store metastore is configured. Happy to propose an interface like Metastore::put_stream_log_source(stream, &[LogSourceEntry]) and update call sites.
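A sketch of that interface, under the assumption the method joins the existing Metastore trait and reuses the crate's LogSourceEntry type:

```rust
// Hypothetical trait addition; replaces direct object-storage writes for
// per-stream log-source updates.
async fn put_stream_log_source(
    &self,
    stream_name: &str,
    log_source: &[LogSourceEntry],
) -> Result<(), MetastoreError>;
```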
src/handlers/http/modal/mod.rs (1)
365-382: Preserve error context when loading from metastore.
Currently logs a generic message and drops MetastoreError. Log detail (op, status) to aid ops; optionally return Result to callers.

```diff
-    async fn load_from_storage(node_type: NodeType) -> Vec<NodeMetadata> {
-        let obs = PARSEABLE.metastore.get_node_metadata(node_type).await;
-
-        let mut metadata = vec![];
-        if let Ok(obs) = obs {
+    async fn load_from_storage(node_type: NodeType) -> Vec<NodeMetadata> {
+        let mut metadata = vec![];
+        match PARSEABLE.metastore.get_node_metadata(node_type).await {
+            Ok(obs) => {
                 for object in obs {
                     //convert to NodeMetadata
                     match serde_json::from_slice::<NodeMetadata>(&object) {
                         Ok(node_metadata) => metadata.push(node_metadata),
                         Err(e) => error!("Failed to deserialize NodeMetadata: {:?}", e),
                     }
                 }
-        } else {
-            error!("Couldn't read from storage");
-        }
+            }
+            Err(e) => {
+                let d = e.to_detail();
+                error!(
+                    "Couldn't read NodeMetadata from metastore: {} (op={}, status={})",
+                    d.message, d.operation, d.status_code
+                );
+            }
+        }
         // Return the metadata
         metadata
     }
```
58-64: Improve Display for the Error variant.
Debug-printing the whole enum is noisy. Emit flow/message directly.

```diff
-    #[error("{self:?}")]
+    #[error("{flow}: {message}")]
     Error {
         status_code: StatusCode,
         message: String,
         flow: String,
     },
```
476-499: Avoid fuzzy date matching on manifest_path when deleting.
Filtering manifests via substring contains(date) is fragile across metastores and path schemes. Prefer: parse date bounds from each item, or ask the metastore for manifests by exact lower/upper bounds and delete/update by identity.
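As an illustration of the bounds-based alternative, a sketch that keeps only manifests overlapping the target date. The time_lower_bound/time_upper_bound fields mirror the snapshot's ManifestItem but should be treated as assumptions here:

```rust
use chrono::{DateTime, Duration, NaiveDate, Utc};

// Retain manifests whose time range intersects the date being deleted,
// instead of substring-matching `manifest_path`.
fn overlaps_date(item: &ManifestItem, date: NaiveDate) -> bool {
    let day_start: DateTime<Utc> = date.and_hms_opt(0, 0, 0).unwrap().and_utc();
    let day_end = day_start + Duration::days(1);
    item.time_lower_bound < day_end && item.time_upper_bound >= day_start
}
```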
src/storage/s3.rs (1)
822-847: Root listing and relative listing normalization.
- list_dirs: passing Path::from("/") typically yields no matches in object_store. Use None for root.
- list_dirs_relative: normalize to ensure trailing slash to avoid prefix bleed (dir vs dir-other).
```diff
-    async fn list_dirs(&self) -> Result<Vec<String>, ObjectStorageError> {
-        let pre = object_store::path::Path::from("/");
-        let resp = self.client.list_with_delimiter(Some(&pre)).await?;
+    async fn list_dirs(&self) -> Result<Vec<String>, ObjectStorageError> {
+        let resp = self.client.list_with_delimiter(None).await?;
@@
     async fn list_dirs_relative(
         &self,
         relative_path: &RelativePath,
     ) -> Result<Vec<String>, ObjectStorageError> {
-        let prefix = object_store::path::Path::from(relative_path.as_str());
+        let mut s = relative_path.as_str().to_string();
+        if !s.ends_with('/') { s.push('/'); }
+        let prefix = object_store::path::Path::from(s);
         let resp = self.client.list_with_delimiter(Some(&prefix)).await?;
```
491-499: Map metastore errors explicitly to StreamError.Using
?here depends on a From<...> impl; make the mapping explicit to avoid accidental type mismatches and clearer responses.- let obs = PARSEABLE - .metastore - .get_all_stream_jsons(stream_name, Some(Mode::Ingest)) - .await?; + let obs = PARSEABLE + .metastore + .get_all_stream_jsons(stream_name, Some(Mode::Ingest)) + .await + .map_err(|e| StreamError::Anyhow(e.into()))?;src/storage/object_storage.rs (2)
283-295: Nit: avoid needless Schema clone.You clone Schema twice before put_schema.
- let s = &*schema.clone(); - PARSEABLE - .metastore - .put_schema(s.clone(), stream_name) + PARSEABLE + .metastore + .put_schema((*schema).clone(), stream_name) .await
456-489: Differentiate NotFound from other metastore errors.
Falling back to base config on any error can mask real faults. Only synthesize from base when the error is NotFound, otherwise propagate.

```diff
-        let stream_metadata = match PARSEABLE
-            .metastore
-            .get_stream_json(stream_name, false)
-            .await
-        {
-            Ok(data) => data,
-            Err(_) => {
+        let stream_metadata = match PARSEABLE
+            .metastore
+            .get_stream_json(stream_name, false)
+            .await
+        {
+            Ok(data) => data,
+            Err(e) if e.to_detail().message.contains("NotFound") => {
                 // get the base stream metadata
                 let bytes = PARSEABLE
                     .metastore
                     .get_stream_json(stream_name, true)
                     .await
                     .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
@@
                 bytes
-            }
+            }
+            Err(e) => return Err(ObjectStorageError::MetastoreError(Box::new(e.to_detail()))),
         };
```
236-244: Dashboard title uniqueness should likely be per-author, not global.Current check forbids two users from using the same title. Consider scoping duplicate detection by author.
- let has_duplicate = dashboards + let has_duplicate = dashboards .iter() - .any(|d| d.title == dashboard.title && d.dashboard_id != dashboard.dashboard_id); + .any(|d| d.author == dashboard.author + && d.title == dashboard.title + && d.dashboard_id != dashboard.dashboard_id);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (45)
- Cargo.toml (2 hunks)
- src/alerts/alert_structs.rs (2 hunks)
- src/alerts/alert_traits.rs (2 hunks)
- src/alerts/alert_types.rs (5 hunks)
- src/alerts/mod.rs (8 hunks)
- src/alerts/target.rs (4 hunks)
- src/catalog/manifest.rs (2 hunks)
- src/catalog/mod.rs (6 hunks)
- src/correlation.rs (8 hunks)
- src/enterprise/utils.rs (5 hunks)
- src/handlers/http/alerts.rs (3 hunks)
- src/handlers/http/cluster/mod.rs (4 hunks)
- src/handlers/http/ingest.rs (2 hunks)
- src/handlers/http/logstream.rs (6 hunks)
- src/handlers/http/mod.rs (2 hunks)
- src/handlers/http/modal/ingest_server.rs (2 hunks)
- src/handlers/http/modal/mod.rs (5 hunks)
- src/handlers/http/modal/query/querier_logstream.rs (2 hunks)
- src/handlers/http/modal/utils/rbac_utils.rs (1 hunks)
- src/handlers/http/oidc.rs (1 hunks)
- src/handlers/http/query.rs (2 hunks)
- src/handlers/http/role.rs (1 hunks)
- src/handlers/http/users/dashboards.rs (3 hunks)
- src/handlers/http/users/filters.rs (6 hunks)
- src/hottier.rs (3 hunks)
- src/lib.rs (1 hunks)
- src/metastore/metastore_traits.rs (1 hunks)
- src/metastore/metastores/mod.rs (1 hunks)
- src/metastore/metastores/object_store_metastore.rs (1 hunks)
- src/metastore/mod.rs (1 hunks)
- src/migration/mod.rs (11 hunks)
- src/parseable/mod.rs (7 hunks)
- src/parseable/streams.rs (1 hunks)
- src/prism/home/mod.rs (5 hunks)
- src/query/mod.rs (3 hunks)
- src/query/stream_schema_provider.rs (5 hunks)
- src/storage/azure_blob.rs (6 hunks)
- src/storage/gcs.rs (6 hunks)
- src/storage/localfs.rs (4 hunks)
- src/storage/mod.rs (3 hunks)
- src/storage/object_storage.rs (15 hunks)
- src/storage/s3.rs (7 hunks)
- src/storage/store_metadata.rs (3 hunks)
- src/users/dashboards.rs (7 hunks)
- src/users/filters.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (23)
- src/metastore/metastores/mod.rs
- src/parseable/streams.rs
- src/alerts/alert_structs.rs
- src/lib.rs
- src/handlers/http/users/dashboards.rs
- src/storage/mod.rs
- Cargo.toml
- src/alerts/alert_types.rs
- src/alerts/alert_traits.rs
- src/migration/mod.rs
- src/handlers/http/role.rs
- src/metastore/metastores/object_store_metastore.rs
- src/handlers/http/mod.rs
- src/handlers/http/ingest.rs
- src/prism/home/mod.rs
- src/storage/localfs.rs
- src/users/filters.rs
- src/catalog/manifest.rs
- src/query/mod.rs
- src/alerts/target.rs
- src/handlers/http/query.rs
- src/handlers/http/modal/ingest_server.rs
- src/handlers/http/users/filters.rs
🧰 Additional context used
🧠 Learnings (12)
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
- src/handlers/http/logstream.rs
- src/handlers/http/modal/query/querier_logstream.rs
- src/parseable/mod.rs
- src/query/stream_schema_provider.rs
- src/storage/gcs.rs
- src/storage/azure_blob.rs
- src/storage/object_storage.rs
📚 Learning: 2025-09-06T04:26:17.168Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.168Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
- src/handlers/http/logstream.rs
- src/handlers/http/oidc.rs
- src/alerts/mod.rs
- src/parseable/mod.rs
- src/storage/store_metadata.rs
- src/enterprise/utils.rs
- src/metastore/mod.rs
- src/correlation.rs
- src/handlers/http/modal/utils/rbac_utils.rs
- src/metastore/metastore_traits.rs
- src/storage/object_storage.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
- src/handlers/http/modal/query/querier_logstream.rs
- src/catalog/mod.rs
- src/storage/object_storage.rs
📚 Learning: 2025-07-24T11:09:21.781Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.
Applied to files:
src/alerts/mod.rs
📚 Learning: 2025-04-07T13:23:10.092Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1288
File: src/handlers/http/modal/mod.rs:279-301
Timestamp: 2025-04-07T13:23:10.092Z
Learning: For critical operations like writing metadata to disk in NodeMetadata::put_on_disk(), it's preferred to let exceptions propagate (using expect/unwrap) rather than trying to recover with fallback mechanisms, as the failure indicates a fundamental system issue that needs immediate attention.
Applied to files:
src/alerts/mod.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
Applied to files:
- src/parseable/mod.rs
- src/query/stream_schema_provider.rs
- src/catalog/mod.rs
- src/storage/object_storage.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
- src/storage/store_metadata.rs
- src/enterprise/utils.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/enterprise/utils.rs
📚 Learning: 2025-06-18T06:39:04.775Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Applied to files:
src/correlation.rs
📚 Learning: 2025-06-16T09:50:38.636Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-05-01T10:27:56.858Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
Applied to files:
src/users/dashboards.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Applied to files:
src/storage/object_storage.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (34)
src/handlers/http/modal/query/querier_logstream.rs (2)
50-50: Import cleanup is correct.
STREAM_ROOT_DIRECTORY removal aligns with metastore-driven lookup.
60-67: Re-check stream existence gating against Mode.
Pattern used elsewhere avoids attempting storage creates outside Query mode. If this handler can run in non-Query modes, consider restoring the established check.

```diff
-    if !PARSEABLE.streams.contains(&stream_name)
-        && !PARSEABLE
-            .create_stream_and_schema_from_storage(&stream_name)
-            .await
-            .unwrap_or(false)
-    {
+    use crate::option::Mode;
+    if !PARSEABLE.streams.contains(&stream_name)
+        && (PARSEABLE.options.mode != Mode::Query
+            || !PARSEABLE
+                .create_stream_and_schema_from_storage(&stream_name)
+                .await
+                .unwrap_or(false))
+    {
         return Err(StreamNotFound(stream_name.clone()).into());
     }
```
src/correlation.rs (5)
57-63: LGTM: Loading via metastore.
Switch to PARSEABLE.metastore.get_correlations() with robust per-item deserialization looks good.
122-124: LGTM: Create persists to metastore first, then memory.
Order and error propagation are correct.
179-181: LGTM: Delete uses metastore.
Deletion now targets the metastore backend, matching the new architecture.
333-335: Error mapping for MetastoreError is solid.
Status code delegation and structured JSON body via to_detail() are correct.
Also applies to: 352-363
216-224: LGTM: MetastoreObject implementation.
get_object_path and get_object_id line up with trait expectations.
src/storage/store_metadata.rs (1)
293-298: Good: Metastore errors mapped with structured detail.
Properly converts MetastoreError via to_detail() into ObjectStorageError::MetastoreError. This matches the new metastore contract.
211-219: LGTM: persist-first, then mutate memory.
Switching the POST flow to write to the metastore before updating in-memory state fixes the previous non-atomic update.
467-471: LGTM: metastore persistence in modify flow.
Persisting the updated alert config via metastore before in-memory mutation aligns with the new persistence model.
src/handlers/http/modal/utils/rbac_utils.rs (1)
26-30: Good: metastore error mapped to structured detail.
Boxing e.to_detail() into ObjectStorageError::MetastoreError matches the expected type.
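For reference, the mapping pattern these comments approve of, as a minimal sketch (the get_parseable_metadata call stands in for any metastore read):

```rust
// MetastoreError::to_detail() yields the structured MetastoreErrorDetail
// that ObjectStorageError::MetastoreError expects to box.
let metadata = PARSEABLE
    .metastore
    .get_parseable_metadata()
    .await
    .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
```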
415-426: Hot-tier metadata write via metastore looks correct.
Fetching stream JSON, toggling hot_tier_enabled, and persisting back through the metastore is consistent with the new flow.
573-575: Good addition: surface MetastoreError in StreamError.
The transparent variant enables ergonomic ? from metastore calls.
611-626: Nice: JSON error body for metastore failures.
Returning to_detail() with proper status improves API diagnostics.
65-71: Good: consistent metastore error mapping.
Boxing e.to_detail() into ObjectStorageError::MetastoreError aligns with the error contract.
657-663: LGTM: list_with_delimiter passthrough.
Thin wrapper to the underlying client is correct and matches the trait’s intent.
509-514: Confirm no remaining callers rely on GCS::list_streams.
Since this returns Err, ensure all call sites either use metastore or handle the error without unwraps.

```bash
#!/bin/bash
# Find and review all call sites of `.list_streams(` and nearby handling.
rg -nP --type=rust -C3 '\.list_streams\s*\('

# Spot unwraps/expect after calls
rg -nP --type=rust -C2 '\.list_streams\s*\([\s\S]*?\)\s*\.unwrap\(\)'
```
270-287: Good migration: use metastore for manifest retrieval with structured error mapping.
Error wrapping via ObjectStorageError::MetastoreError(Box::new(e.to_detail())) keeps diagnostics consistent across layers.
292-312: Manifest aggregation refactor looks sound.
Switch to in-memory Manifest objects, extending files and sorting before processing is appropriate for the hot-tier sync.
241-247: Good: metastore errors mapped to structured ObjectStorageError.
Boxing e.to_detail() into ObjectStorageError::MetastoreError aligns with the new error contract.
320-333: Confirm structured error propagation for metastore list_streams.
If From<MetastoreError> for StreamError isn’t implemented to wrap the error as StreamError::Storage(ObjectStorageError::MetastoreError(Box::new(e.to_detail()))), replace the ? with an explicit map_err so callers receive a structured error:

```diff
-    let streams = PARSEABLE.metastore.list_streams().await?;
+    let streams = PARSEABLE
+        .metastore
+        .list_streams()
+        .await
+        .map_err(|e| StreamError::Storage(
+            ObjectStorageError::MetastoreError(Box::new(e.to_detail()))
+        ))?;
```
276-284: Nice: NodeMetadata implements MetastoreObject.
get_object_path and get_object_id choices look appropriate for addressing metadata items.
350-353: Good: persist NodeMetadata via metastore.
This matches the PR’s direction and keeps staging as a local cache only.
Also applies to: 360-363
src/metastore/mod.rs (1)
66-147: Good: to_detail uses 400 for client JSON errors and enriches context.
Variant-specific metadata and timestamps are helpful for debugging.
src/storage/s3.rs (2)
853-858: LGTM: delimiter listing bridge.
Thin wrapper over client.list_with_delimiter keeps the trait parity and error type consistent.
692-697: Guard remaining call-sites of list_streams().
Since S3 returns an error here, ensure no production path calls list_streams() for S3 backends.

```bash
#!/bin/bash
# Find usages of list_streams and show context
rg -nC2 --type=rust '\.list_streams\s*\(' src | sed -n '1,200p'
```

813-836: LGTM: node metadata deletion via metastore.
Cleanly removes per-type entries and reports success if any existed.
src/storage/azure_blob.rs (1)
736-742: LGTM: list_with_delimiter passthrough.
Brings Azure in line with the trait and object_store API.
src/alerts/mod.rs (2)
139-141: LGTM: persist migrated alerts via metastore.
Writes back the v2 alert through the new metastore path; good.
951-953: LGTM: propagate MetastoreError and map to 500.
AlertError now carries MetastoreError and ResponseError maps it to 500.
Also applies to: 980-981
src/query/stream_schema_provider.rs (2)
416-437: LGTM: manifest fetch now metastore-backed with missing-manifest tolerance.
Switch to metastore.get_manifest and warn on missing manifests prevents panics during compaction/races.
497-505: LGTM: load stream metadata via metastore.
Error mapping to DataFusionError is appropriate here.
src/storage/object_storage.rs (1)
272-276: LGTM: trait extended with list_with_delimiter.
Matches backend bridges and object_store API.
src/users/dashboards.rs (1)
182-203: LGTM: load dashboards via metastore with robust JSON parsing.
Good incremental load with dedupe by dashboard_id.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/handlers/http/logstream.rs (2)
400-406: Set in-memory hot-tier only after confirming feature is enabled.Currently set_hot_tier(true) happens before HotTierManager::global() check; on error, state stays true.
- stream.set_hot_tier(true); - let Some(hot_tier_manager) = HotTierManager::global() else { + let Some(hot_tier_manager) = HotTierManager::global() else { return Err(StreamError::HotTierNotEnabled(stream_name)); }; + stream.set_hot_tier(true);
60-69: Also delete metastore metadata when deleting a stream.
Object-store deletion won’t clean non-object-store metastores; streams may reappear in list_streams().
Follow-up (requires a new Metastore API; see comment in metastore_traits.rs):
- After storage deletion, call: PARSEABLE.metastore.delete_stream_jsons(&stream_name, None).await
- Log and continue on error to keep delete idempotent; see the sketch below.
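A minimal sketch of that follow-up, assuming the delete_stream_jsons API proposed in the metastore_traits.rs comment lands:

```rust
// Best-effort metastore cleanup after the object-store delete; failures are
// logged rather than propagated so stream deletion stays idempotent.
if let Err(e) = PARSEABLE
    .metastore
    .delete_stream_jsons(&stream_name, None)
    .await
{
    tracing::warn!("failed to remove metastore entries for {stream_name}: {e}");
}
```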
♻️ Duplicate comments (2)
src/metastore/metastore_traits.rs (2)
39-41: Generic get_objects is under-specified; add an optional filter or drop it.
Current signature forces callers to post-filter bytes, unlike object_store’s filtered get_objects.

```diff
-    async fn get_objects(&self, parent_path: &str) -> Result<Vec<Bytes>, MetastoreError>;
+    async fn get_objects(
+        &self,
+        parent_path: &str,
+        filter: Option<Box<dyn Fn(String) -> bool + Send>>,
+    ) -> Result<Vec<Bytes>, MetastoreError>;
```
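Call sites could then push filtering down into the metastore, e.g. (the parent path shown is illustrative):

```rust
// Fetch only `.json` objects under the parent path; other entries are
// skipped inside the metastore instead of post-filtered by every caller.
let objects = metastore
    .get_objects(
        "users/filters",
        Some(Box::new(|name: String| name.ends_with(".json"))),
    )
    .await?;
```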
77-101: Add delete API for stream.json to avoid orphaned records.
Without it, non-object-store metastores retain stale metadata after stream deletion.

```diff
     async fn get_all_stream_jsons(
         &self,
         stream_name: &str,
         mode: Option<Mode>,
     ) -> Result<Vec<Bytes>, MetastoreError>;
+
+    /// Delete all stream.json entries for a stream.
+    /// If `mode` is Some(Mode::Ingest), delete only ingestor stream.jsons; if None, delete all.
+    /// Returns number of deleted entries.
+    async fn delete_stream_jsons(
+        &self,
+        stream_name: &str,
+        mode: Option<Mode>,
+    ) -> Result<usize, MetastoreError>;
```

I can wire the trait and add the object_store_metastore implementation if you’d like.
🧹 Nitpick comments (2)
src/alerts/alert_types.rs (2)
188-193: Avoid referencing a temporary across .await.
Bind AlertConfig before calling put_alert to sidestep subtle borrow-lifetime pitfalls and re-cloning.

```diff
-            PARSEABLE
-                .metastore
-                .put_alert(&self.to_alert_config())
-                .await?;
+            let cfg = self.to_alert_config();
+            PARSEABLE.metastore.put_alert(&cfg).await?;
```
215-218: Same lifetime refactor here.
Apply the same local-binding pattern in both places.

```diff
-            PARSEABLE
-                .metastore
-                .put_alert(&self.to_alert_config())
-                .await?;
+            let cfg = self.to_alert_config();
+            PARSEABLE.metastore.put_alert(&cfg).await?;
```

Also applies to: 251-254
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
- Cargo.toml (2 hunks)
- src/alerts/alert_types.rs (5 hunks)
- src/alerts/mod.rs (8 hunks)
- src/handlers/http/logstream.rs (6 hunks)
- src/handlers/http/modal/query/querier_logstream.rs (2 hunks)
- src/metastore/metastore_traits.rs (1 hunks)
- src/metastore/metastores/object_store_metastore.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- Cargo.toml
- src/handlers/http/modal/query/querier_logstream.rs
- src/metastore/metastores/object_store_metastore.rs
- src/alerts/mod.rs
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.168Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
📚 Learning: 2025-09-06T04:26:17.168Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.168Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
- src/metastore/metastore_traits.rs
- src/handlers/http/logstream.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
src/handlers/http/logstream.rs
📚 Learning: 2025-03-28T06:17:01.201Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.
Applied to files:
src/handlers/http/logstream.rs
🧬 Code graph analysis (3)
src/alerts/alert_types.rs (4)
- src/parseable/mod.rs (1): storage (282-284)
- src/storage/object_storage.rs (1): alert_json_path (1060-1062)
- src/metastore/metastore_traits.rs (2): get_object_path (163-163), get_object_id (164-164)
- src/alerts/alert_structs.rs (2): get_object_path (538-540), get_object_id (534-536)
src/metastore/metastore_traits.rs (9)
- src/handlers/http/modal/mod.rs (6): node_type (569-569), node_type (582-584), domain_name (567-567), domain_name (574-576), get_object_path (277-279), get_object_id (281-283)
- src/metastore/metastores/object_store_metastore.rs (38): initiate_connection (68-70), get_objects (73-81), get_alerts (84-95), put_alert (98-107), delete_alert (110-116), get_llmconfigs (119-130), put_llmconfig (133-140), delete_llmconfig (143-149), get_dashboards (152-170), put_dashboard (173-181), delete_dashboard (184-190), get_chats (193-218), put_chat (221-229), delete_chat (232-238), get_filters (242-311), put_filter (314-322), delete_filter (325-332), get_correlations (335-353), put_correlation (356-362), delete_correlation (365-372), get_stream_json (377-392), put_stream_json (432-441), get_all_stream_jsons (395-429), get_all_manifest_files (444-486), get_manifest (489-523), put_manifest (539-549), delete_manifest (551-560), get_manifest_path (526-537), get_all_schemas (604-621), get_schema (623-625), put_schema (627-630), get_parseable_metadata (632-646), get_ingestor_metadata (648-657), put_parseable_metadata (659-667), get_node_metadata (669-684), delete_node_metadata (694-732), put_node_metadata (686-692), list_streams (734-777)
- src/storage/object_storage.rs (2): get_objects (202-206), list_streams (220-220)
- src/alerts/alert_types.rs (2): get_object_path (71-73), get_object_id (75-77)
- src/users/dashboards.rs (3): delete_dashboard (294-315), get_object_path (71-79), get_object_id (81-83)
- src/users/filters.rs (3): delete_filter (130-133), get_object_path (50-57), get_object_id (59-61)
- src/alerts/target.rs (2): get_object_path (336-338), get_object_id (340-342)
- src/correlation.rs (2): get_object_path (217-219), get_object_id (221-223)
- src/storage/store_metadata.rs (2): get_object_path (109-111), get_object_id (113-115)
src/handlers/http/logstream.rs (3)
- src/parseable/mod.rs (1): storage (282-284)
- src/correlation.rs (1): error_response (352-363)
- src/handlers/http/ingest.rs (1): error_response (518-529)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: coverage
🔇 Additional comments (4)
src/alerts/alert_types.rs (2)
38-43: Imports LGTM.
Correctly wires ThresholdAlert into the metastore layer.
70-78: MetastoreObject impl is correct and consistent with AlertConfig.
Path/id mapping matches alert_json_path(id) and Ulid semantics.
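For context, the approximate shape of that impl, reconstructed from the code-graph references (return types and the id field name are assumptions):

```rust
impl MetastoreObject for ThresholdAlert {
    // Mirrors AlertConfig: the object lives at alert_json_path(<ulid>).
    fn get_object_path(&self) -> String {
        alert_json_path(self.id).to_string()
    }

    fn get_object_id(&self) -> String {
        self.id.to_string()
    }
}
```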
src/handlers/http/logstream.rs (2)
91-94: list() correctly switches to metastore.
Authorization filter remains intact.
499-499: MetastoreError surfacing over HTTP is correct.
Status maps via e.status_code(); body uses to_detail() JSON. Aligns with ingest/correlation.
Also applies to: 572-574, 610-621
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/query/mod.rs (1)
550-563: Don’t swallow metastore/JSON errors; propagate them (and add a safe fallback).
Current code ignores failures from get_all_stream_jsons and JSON parse, potentially undercounting manifests silently. Propagate the call error, surface parse errors with context, and if nothing merges, fall back to the primary stream.json parsed above.

```diff
-    let obs = PARSEABLE
-        .metastore
-        .get_all_stream_jsons(stream_name, None)
-        .await;
-    if let Ok(obs) = obs {
-        for ob in obs {
-            if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
-                let snapshot = object_store_format.snapshot;
-                for manifest in snapshot.manifest_list {
-                    merged_snapshot.manifest_list.push(manifest);
-                }
-            }
-        }
-    }
+    let obs = PARSEABLE
+        .metastore
+        .get_all_stream_jsons(stream_name, None)
+        .await?;
+    for ob in obs {
+        let ob_format: ObjectStoreFormat = serde_json::from_slice(&ob)
+            .map_err(|e| QueryError::CustomError(format!("invalid stream.json for {stream_name}: {e}")))?;
+        merged_snapshot
+            .manifest_list
+            .extend(ob_format.snapshot.manifest_list);
+    }
+    // Fallback: if nothing merged (e.g., empty metastore listing), use the primary stream.json
+    if merged_snapshot.manifest_list.is_empty() {
+        merged_snapshot = object_store_format.snapshot;
+    }
```
♻️ Duplicate comments (3)
src/storage/localfs.rs (1)
463-474: Implement list_with_delimiter for LocalFS (delegate to object_store::local).
Many paths (e.g., the metrics layer) now rely on delimiter-based listing; returning Unsupported will break local/dev flows. Delegate to the object_store local backend.

```diff
-    async fn list_with_delimiter(
-        &self,
-        _prefix: Option<object_store::path::Path>,
-    ) -> Result<ListResult, ObjectStorageError> {
-        Err(ObjectStorageError::UnhandledError(Box::new(
-            std::io::Error::new(
-                std::io::ErrorKind::Unsupported,
-                "list_with_delimiter is not implemented for LocalFS",
-            ),
-        )))
-    }
+    async fn list_with_delimiter(
+        &self,
+        prefix: Option<object_store::path::Path>,
+    ) -> Result<ListResult, ObjectStorageError> {
+        let store = object_store::local::LocalFileSystem::new_with_prefix(self.root.clone())
+            .map_err(|e| ObjectStorageError::UnhandledError(Box::new(e)))?;
+        Ok(store.list_with_delimiter(prefix.as_ref()).await?)
+    }
```
538-543: Map serde_json parse error to QueryError with context (and improve readability).
Avoid relying on implicit From for serde_json::Error; attach stream_name for diagnostics and pull the bytes out first for clarity.

```diff
-    let object_store_format: ObjectStoreFormat = serde_json::from_slice(
-        &PARSEABLE
-            .metastore
-            .get_stream_json(stream_name, false)
-            .await?,
-    )?;
+    let bytes = PARSEABLE
+        .metastore
+        .get_stream_json(stream_name, false)
+        .await?;
+    let object_store_format: ObjectStoreFormat = serde_json::from_slice(&bytes)
+        .map_err(|e| QueryError::CustomError(format!("invalid stream.json for {stream_name}: {e}")))?;
```

574-594: Good fix: replaced panic with typed error when manifest is missing.
Graceful failure with ok_or_else is exactly what we want in query paths.
🧹 Nitpick comments (1)
src/query/mod.rs (1)
586-592: Tiny wording nit in error message.
Use path= instead of path- for clarity.

```diff
-                "Manifest not found for {stream_name} [{} - {}], path- {}",
+                "Manifest not found for {stream_name} [{} - {}], path={}",
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/metastore/metastores/object_store_metastore.rs (1 hunks)
- src/query/mod.rs (3 hunks)
- src/storage/localfs.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/metastore/metastores/object_store_metastore.rs
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
📚 Learning: 2025-09-06T04:26:17.191Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
src/query/mod.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
src/query/mod.rs
📚 Learning: 2025-06-18T06:39:04.775Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Applied to files:
src/query/mod.rs
🧬 Code graph analysis (2)
src/query/mod.rs (2)
- src/parseable/mod.rs (4): storage (282-284), serde_json (338-338), serde_json (344-344), new (178-192)
- src/metastore/metastores/object_store_metastore.rs (4): serde_json (264-264), serde_json (301-301), serde_json (482-482), serde_json (709-709)
src/storage/localfs.rs (5)
- src/storage/object_storage.rs (1): list_with_delimiter (272-275)
- src/storage/gcs.rs (1): list_with_delimiter (657-662)
- src/storage/azure_blob.rs (1): list_with_delimiter (736-741)
- src/storage/s3.rs (1): list_with_delimiter (853-858)
- src/storage/metrics_layer.rs (1): list_with_delimiter (230-238)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (3)
src/storage/localfs.rs (2)
417-427: Good: error propagation and precise filtering in list_minutes.
Propagating read_dir errors and filtering on "minute=" is correct and consistent.
20-20: LGTM on import adjustments.
Imports align with the new APIs and constants used in this file.
Also applies to: 31-31, 49-49
src/query/mod.rs (1)
60-60: LGTM: storage imports align with metastore-driven flow.
ObjectStoreFormat is now used below for stream.json parsing; consistent with the new design.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/storage/object_storage.rs (2)
539-589: Don’t swallow metastore errors and fix the invalid if let … && pattern.
- Using into_iter().next() on a Result hides errors; propagate as ObjectStorageError::MetastoreError(Box::new(e.to_detail())).
- if let Some(x) = … && !x.is_empty() is not valid; split the condition or use a prior binding.

```diff
-    if let Some(stream_metadata_obs) = PARSEABLE
-        .metastore
-        .get_all_stream_jsons(stream_name, Some(Mode::Ingest))
-        .await
-        .into_iter()
-        .next()
-        && !stream_metadata_obs.is_empty()
-    {
-        for stream_metadata_bytes in stream_metadata_obs.iter() {
+    let stream_metadata_obs = PARSEABLE
+        .metastore
+        .get_all_stream_jsons(stream_name, Some(Mode::Ingest))
+        .await
+        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
+
+    if !stream_metadata_obs.is_empty() {
+        for stream_metadata_bytes in stream_metadata_obs.iter() {
             let stream_ob_metadata =
                 serde_json::from_slice::<ObjectStoreFormat>(stream_metadata_bytes)?;
             all_log_sources.extend(stream_ob_metadata.log_source.clone());
         }
@@
         let stream_ob_metadata =
             serde_json::from_slice::<ObjectStoreFormat>(&stream_metadata_obs[0])?;
@@
         return Ok(stream_metadata_bytes);
     }
     Ok(Bytes::new())
```
958-959: Use accessor, not private field, for storage.
Accessing PARSEABLE.storage likely fails outside the parseable module. Use the public accessor for consistency with the rest of this file.

```diff
-    let object_store = PARSEABLE.storage.get_object_store();
+    let object_store = PARSEABLE.storage().get_object_store();
```
♻️ Duplicate comments (2)
src/handlers/http/logstream.rs (1)
487-493: Delete hot-tier doesn’t clear flags in-memory/metastore.
After deleting from HotTierManager, in-memory stream metadata and metastore JSON remain with hot_tier_enabled=true. This was flagged earlier and is still present.
Apply:
```diff
     hot_tier_manager.delete_hot_tier(&stream_name).await?;
+    // keep in-memory and metastore metadata consistent
+    PARSEABLE.get_stream(&stream_name)?.set_hot_tier(false);
+    let mut stream_metadata: ObjectStoreFormat = serde_json::from_slice(
+        &PARSEABLE.metastore.get_stream_json(&stream_name, false).await?,
+    )?;
+    stream_metadata.hot_tier_enabled = false;
+    PARSEABLE
+        .metastore
+        .put_stream_json(&stream_metadata, &stream_name)
+        .await?;
+
     Ok((
         format!("hot tier deleted for stream {stream_name}"),
         StatusCode::OK,
     ))
```
src/storage/object_storage.rs (1)
616-626: Propagate metastore failures instead of ignoring them. (Regression)
Current code drops Err from get_all_stream_jsons, masking real issues.

```diff
-        let stream_metas = PARSEABLE
-            .metastore
-            .get_all_stream_jsons(stream_name, None)
-            .await;
-        if let Ok(stream_metas) = stream_metas {
-            for stream_meta in stream_metas.iter() {
+        let stream_metas = PARSEABLE
+            .metastore
+            .get_all_stream_jsons(stream_name, None)
+            .await
+            .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
+        for stream_meta in stream_metas.iter() {
             if let Ok(stream_meta) = serde_json::from_slice::<ObjectStoreFormat>(stream_meta) {
                 // fetch unique log sources and their fields
                 all_log_sources.extend(stream_meta.log_source.clone());
-            }
-            }
-        }
+            }
+        }
```
🧹 Nitpick comments (3)
src/handlers/http/logstream.rs (1)
404-409: Available size should subtract used size.
Setting available_size = size ignores used_size; prefer saturating subtraction.
Apply:
```diff
-    hottier.available_size = hottier.size;
+    hottier.available_size = hottier.size.saturating_sub(hottier.used_size);
```
src/storage/object_storage.rs (2)
982-999: Optional: handle “schema not found” without failing merge.
If get_schema(stream_name) returns NotFound for new streams, merge will fail; persist the provided schema directly in that case.

```diff
-    let stream_schema = PARSEABLE
-        .metastore
-        .get_schema(stream_name)
-        .await
-        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
-
-    let new_schema = Schema::try_merge(vec![
-        schema,
-        serde_json::from_slice::<Schema>(&stream_schema)?,
-    ])
-    .map_err(|e| ObjectStorageError::Custom(e.to_string()))?;
+    let existing = match PARSEABLE.metastore.get_schema(stream_name).await {
+        Ok(bytes) => Some(bytes),
+        Err(e) => {
+            // Treat “not found” as no existing schema; otherwise propagate.
+            if e.status_code() == 404 { None } else {
+                return Err(ObjectStorageError::MetastoreError(Box::new(e.to_detail())));
+            }
+        }
+    };
+    let new_schema = if let Some(bytes) = existing {
+        let prev: Schema = serde_json::from_slice(&bytes)?;
+        Schema::try_merge(vec![schema, prev])
+            .map_err(|e| ObjectStorageError::Custom(format!("schema merge failed: {e}")))?
+    } else {
+        schema
+    };
```
667-712: Partition-derived timestamp logic: producers zero-pad, backends return raw prefixes; tighten validation or normalize.
- Verified writers and naming docs require two-digit zero-padding: src/utils/time.rs:168 (hour uses {hour:02}), parseable filename templates in src/parseable/streams.rs (use {:02} for hour), and the object_storage trait docs require hour=HH/minute=MM (src/storage/object_storage.rs:229-241).
- Backend list_* (src/storage/{s3.rs,gcs.rs,azure_blob.rs,localfs.rs}) merely filter prefixes starting with "hour=" / "minute=" and return them unchanged; they do not normalize or validate numeric range.
- Minor inconsistency: src/storage/field_stats.rs uses regex r"date=(\d{4}-\d{2}-\d{2}).hour=(\d{1,2}).minute=(\d{1,2})" (allows 1-digit), which is more permissive than the trait docs.
- Recommendation (optional refactor): either enforce/normalize two-digit hour=/minute= in list_hours/list_minutes (see the sketch after this list) or make extract_timestamp_for_date/consumers robust to unpadded values; alternatively tighten the field_stats regex to \d{2} to match the trait.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/handlers/http/logstream.rs (7 hunks)
- src/storage/object_storage.rs (15 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
- src/handlers/http/logstream.rs
- src/storage/object_storage.rs
📚 Learning: 2025-03-28T06:17:01.201Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.
Applied to files:
src/handlers/http/logstream.rs
📚 Learning: 2025-09-06T04:26:17.191Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
- src/handlers/http/logstream.rs
- src/storage/object_storage.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Applied to files:
src/storage/object_storage.rs
🧬 Code graph analysis (2)
src/handlers/http/logstream.rs (2)
- src/parseable/mod.rs (1): storage (282-284)
- src/correlation.rs (1): error_response (352-363)
src/storage/object_storage.rs (7)
- src/handlers/http/mod.rs (1): fetch_schema (90-95)
- src/storage/localfs.rs (3): list_with_delimiter (463-473), new (100-102), from (572-574)
- src/storage/azure_blob.rs (1): list_with_delimiter (736-741)
- src/storage/gcs.rs (1): list_with_delimiter (657-662)
- src/storage/s3.rs (4): list_with_delimiter (853-858), s (179-179), from (862-870), from (874-876)
- src/parseable/mod.rs (6): create_stream (647-728), new (178-192), serde_json (338-338), serde_json (344-344), custom_partition (789-789), custom_partition (961-961)
- src/catalog/mod.rs (1): update_snapshot (108-129)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (9)
src/handlers/http/logstream.rs (8)
31-31: Import looks correct.
ObjectStoreFormat is used below; no unused import.
91-94: Good: stream listing moved to metastore.
Authorization preserved; errors propagate as MetastoreError.
414-425: Hot-tier flag persist on enable: LGTM.
Correctly loads metastore JSON, flips hot_tier_enabled=true, and persists.
512-513: Import of MetastoreError: good.
Enables unified error propagation.
585-587: StreamError variant for MetastoreError: good addition.
Transparent conversion keeps call sites clean.
623-624: Status mapping delegates to metastore error.
Consistent with other modules; good.
628-637: JSON error body for metastore failures: LGTM.
Matches correlation module pattern and uses to_detail() for structured payloads.
59-75: Confirm whether stream delete also removes metastore metadata.
delete() erases storage and local data but doesn’t touch metastore. If objectstore.delete_stream() doesn’t cascade, you’ll leave orphaned stream JSON. Please verify and, if needed, delete metastore entries here.
src/storage/object_storage.rs (1)
272-276: Trait surface addition looks good.
The list_with_delimiter signature and ListResult import align with downstream implementations (S3/GCS/Azure) and LocalFS’ intentional Unsupported. No changes needed.
Force-pushed 0ae45a7 to 7d03cd4 (compare)
Force-pushed 7d03cd4 to 2b066a8 (compare)
Two new traits have been introduced, `Metastore` and `MetastoreObject`. The `Metastore` trait will handle Parseable's connection to the underlying metastore (object store, Postgres, etc.) and expose CRUD operations, whereas the `MetastoreObject` trait will expose any required methods for the underlying struct that needs to be treated as meta. For now, metadata means any data other than Parquet. This commit also contains changes for alerts to start interacting with their files using the metastore instead of `PARSEABLE.storage`.
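A condensed sketch of how the two traits relate; the method set is abridged and the exact signatures (return types, async_trait usage) are assumptions, so see metastore_traits.rs for the real surface:

```rust
// Anything that can live in the metastore identifies itself by a logical
// path and an id.
pub trait MetastoreObject: erased_serde::Serialize + Sync {
    fn get_object_path(&self) -> String;
    fn get_object_id(&self) -> String;
}

// The metastore hides the backing store (object store today, Postgres or
// similar later) behind CRUD-style operations.
#[async_trait::async_trait]
pub trait Metastore: Send + Sync {
    async fn initiate_connection(&self) -> Result<(), MetastoreError>;
    async fn put_alert(&self, obj: &dyn MetastoreObject) -> Result<(), MetastoreError>;
    async fn delete_alert(&self, obj: &dyn MetastoreObject) -> Result<(), MetastoreError>;
    // ...plus stream JSONs, manifests, schemas, dashboards, filters, targets, ...
}
```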
- Implemented changes for dashboard and alert interactions
- Changes to the `MetastoreError` object to make it more readable
- Path handling for file upload/deletion should be taken care of by the metastore
- Since we were migrating filters on load, the metastore definition now does that (it also has access to the path of the loaded filter)
Force-pushed 2b066a8 to 262f4a2 (compare)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
src/handlers/http/modal/mod.rs (1)
586-589: Bug: infinite recursion in Metadata::file_path for NodeMetadata.
self.file_path() calls the trait method itself.

```diff
-    fn file_path(&self) -> RelativePathBuf {
-        self.file_path()
-    }
+    fn file_path(&self) -> RelativePathBuf {
+        NodeMetadata::file_path(self)
+    }
```
src/alerts/target.rs (1)
112-120: Make delete atomic: persist first, then mutate memory (or roll back on error).
Currently removes from memory before metastore delete; on failure, states diverge.
Apply:
```diff
-        let target = self
-            .target_configs
-            .write()
-            .await
-            .remove(target_id)
-            .ok_or(AlertError::InvalidTargetID(target_id.to_string()))?;
-        PARSEABLE.metastore.delete_target(&target).await?;
-        Ok(target)
+        // Read the target without removing first
+        let target = {
+            let guard = self.target_configs.read().await;
+            guard
+                .get(target_id)
+                .cloned()
+                .ok_or(AlertError::InvalidTargetID(target_id.to_string()))?
+        };
+
+        // Persist deletion
+        PARSEABLE.metastore.delete_target(&target).await?;
+
+        // Mutate memory only after successful persistence
+        let removed = self
+            .target_configs
+            .write()
+            .await
+            .remove(target_id)
+            .expect("target must exist after successful delete");
+        Ok(removed)
```
398-426: Set in-memory hot-tier flag after persisting, not before.
Avoids memory/durable drift if manager/metastore writes fail.

```diff
-    stream.set_hot_tier(true);
     let Some(hot_tier_manager) = HotTierManager::global() else {
         return Err(StreamError::HotTierNotEnabled(stream_name));
     };
@@
     hot_tier_manager
         .put_hot_tier(&stream_name, &mut hottier)
         .await?;
@@
     stream_metadata.hot_tier_enabled = true;
@@
     PARSEABLE
         .metastore
         .put_stream_json(&stream_metadata, &stream_name)
         .await?;
+
+    // Update in-memory only after successful persistence
+    stream.set_hot_tier(true);
```
81-91: Don’t swallow metastore errors; propagate them.
Silently ignoring get_all_stream_jsons() failures can yield partial/incorrect results.Apply:
- let obs = PARSEABLE.metastore.get_all_stream_jsons(stream, None).await; - if let Ok(obs) = obs { - for ob in obs { - if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) { - let snapshot = object_store_format.snapshot; - for manifest in snapshot.manifest_list { - merged_snapshot.manifest_list.push(manifest); - } - } - } - } + let obs = PARSEABLE + .metastore + .get_all_stream_jsons(stream, None) + .await + .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; + for ob in obs { + if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) { + let snapshot = object_store_format.snapshot; + for manifest in snapshot.manifest_list { + merged_snapshot.manifest_list.push(manifest); + } + } + }src/alerts/mod.rs (1)
270-313: Validate logical operator to prevent query-string injection during migration.
conditions["operator"] is unvalidated and injected into SQL (AND/OR join). Restrict to ["and","or"] and default to AND.Outside-this-hunk patch sketch:
let logical_op = match conditions["operator"]
    .as_str()
    .map(|s| s.to_ascii_lowercase())
{
    Some(ref op) if op == "or" => "OR",
    _ => "AND",
};
let where_clause = where_clauses.join(&format!(" {logical_op} "));
src/query/stream_schema_provider.rs (1)
520-536: Don’t silently ignore metastore errors when merging multiple stream.json files. Swallowing Err may yield inconsistent/incomplete results. Propagate as DataFusionError::Plan (or at least warn).
Apply:
- if PARSEABLE.options.mode == Mode::Query || PARSEABLE.options.mode == Mode::Prism {
-     let obs = PARSEABLE
-         .metastore
-         .get_all_stream_jsons(&self.stream, None)
-         .await;
-     if let Ok(obs) = obs {
+ if PARSEABLE.options.mode == Mode::Query || PARSEABLE.options.mode == Mode::Prism {
+     let obs = PARSEABLE
+         .metastore
+         .get_all_stream_jsons(&self.stream, None)
+         .await
+         .map_err(|e| DataFusionError::Plan(e.to_string()))?;
+     {
          for ob in obs {
              if let Ok(object_store_format) = serde_json::from_slice::<ObjectStoreFormat>(&ob) {
                  let snapshot = object_store_format.snapshot;
                  for manifest in snapshot.manifest_list {
                      merged_snapshot.manifest_list.push(manifest);
                  }
              }
          }
      }
  } else {
      merged_snapshot = object_store_format.snapshot;
  }
src/storage/object_storage.rs (1)
667-789: Partition-walk timestamp derivation: fix date-format contract mismatch. get_first_and_latest_event_from_storage expects list_dates entries as "date=YYYY-MM-DD" (no trailing '/'); list_hours/list_minutes already normalize trailing slashes, but _list_dates in S3/GCS/Azure returns the stripped prefix without removing a possible trailing '/'. This can cause chrono::NaiveDate::parse_from_str to fail.
- Fix (pick one): normalize dates in _list_dates (src/storage/s3.rs:: _list_dates, src/storage/gcs.rs:: _list_dates, src/storage/azure_blob.rs:: _list_dates) by stripping a trailing '/' (same pattern used in list_hours/list_minutes), or trim the trailing '/' in get_first_and_latest_event_from_storage before parsing.
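A small sketch of the caller-side option, assuming listed entries look like "date=2025-09-11" and may carry a trailing '/' (the helper name is made up):

use chrono::NaiveDate;

// Normalize a listed date prefix before parsing it.
fn parse_listed_date(entry: &str) -> Option<NaiveDate> {
    let trimmed = entry.trim_end_matches('/'); // "date=2025-09-11/" -> "date=2025-09-11"
    let value = trimmed.strip_prefix("date=")?; // -> "2025-09-11"
    NaiveDate::parse_from_str(value, "%Y-%m-%d").ok()
}

fn main() {
    assert_eq!(
        parse_listed_date("date=2025-09-11/"),
        NaiveDate::from_ymd_opt(2025, 9, 11)
    );
}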
src/catalog/mod.rs (1)
483-499: Persist full stream meta via metastore (don’t write only snapshot to object storage). Writing only meta.snapshot drops top-level fields (e.g., first_event_at) and bypasses the metastore abstraction — persist the entire meta via the metastore instead of storage.put_snapshot.
Location: src/catalog/mod.rs (line with storage.put_snapshot)
- storage.put_snapshot(stream_name, meta.snapshot).await?;
+ PARSEABLE
+     .metastore
+     .put_stream_json(&meta, stream_name)
+     .await
+     .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
♻️ Duplicate comments (19)
src/handlers/http/modal/query/querier_logstream.rs (1)
48-51: Filter to ingestor stream.jsons to avoid double-counting. Pass Mode::Ingest and import Mode per intent “all ingestor stream jsons”.
@@ use crate::{
@@
-     storage::{ObjectStoreFormat, StreamType},
+     storage::{ObjectStoreFormat, StreamType},
+     option::Mode,
  };
@@
-     .get_all_stream_jsons(&stream_name, None)
+     .get_all_stream_jsons(&stream_name, Some(Mode::Ingest))
      .await?;
Also applies to: 165-169
src/alerts/target.rs (2)
59-63: Good: load() no longer awaits under lock and surfaces metastore failures. This correctly fetches from metastore first, then updates the map.
69-73: Good: update() persists before mutating in-memory map. Prevents divergence on persistence failure.
src/handlers/http/logstream.rs (2)
623-637: HTTP mapping + JSON body for MetastoreError is correct. Uses e.status_code() and e.to_detail(); returns structured JSON as intended.
476-487: Also clear in-memory hot-tier flag on delete; keep ordering atomic. You persist to metastore, but the in-memory flag remains true.
- let mut stream_metadata: ObjectStoreFormat = serde_json::from_slice(
+ let mut stream_metadata: ObjectStoreFormat = serde_json::from_slice(
      &PARSEABLE
          .metastore
          .get_stream_json(&stream_name, false)
          .await?,
  )?;
  stream_metadata.hot_tier_enabled = false;
- PARSEABLE
+ PARSEABLE
      .metastore
      .put_stream_json(&stream_metadata, &stream_name)
      .await?;
+
+ // Update in-memory only after persistence succeeds
+ PARSEABLE.get_stream(&stream_name)?.set_hot_tier(false);
src/storage/localfs.rs (2)
141-165: Fix path resolution in get_object(): current logic is unsafe and brittle. Substring/root-slice checks enable path traversal and break on Windows; always anchor to root.
- let file_path;
- #[cfg(windows)]
- {
-     file_path = path.to_path("");
- }
- #[cfg(not(windows))]
- {
-     let root_str = self.root.to_str().unwrap();
-     file_path = if path.to_string().contains(&root_str[1..]) && root_str.len() > 1 {
-         path.to_path("/")
-     } else {
-         self.path_in_root(path)
-     };
- }
+ let file_path = self.path_in_root(path);
474-484: Implement list_with_delimiter for LocalFS (don’t return Unsupported). Several call sites rely on this; returning Unsupported will break local/dev flows. Delegate to object_store’s LocalFileSystem.
async fn list_with_delimiter( &self, _prefix: Option<object_store::path::Path>, ) -> Result<ListResult, ObjectStorageError> { - Err(ObjectStorageError::UnhandledError(Box::new( - std::io::Error::new( - std::io::ErrorKind::Unsupported, - "list_with_delimiter is not implemented for LocalFS", - ), - ))) + let store = object_store::local::LocalFileSystem::new_with_prefix(self.root.clone()) + .map_err(|e| ObjectStorageError::UnhandledError(Box::new(e)))?; + Ok(store.list_with_delimiter(_prefix.as_ref()).await?) }src/parseable/mod.rs (1)
258-264: Startup gating bug: list_streams().is_ok() treats "no streams" as "streams exist".This can admit stale buckets or misconfigure fresh ones. Inspect the set and propagate metastore errors.
- let has_streams = PARSEABLE.metastore.list_streams().await.is_ok(); + let has_streams = match PARSEABLE.metastore.list_streams().await { + Ok(streams) => !streams.is_empty(), + Err(e) => return Err(ObjectStorageError::MetastoreError(Box::new(e.to_detail()))), + }; if !has_dirs && !has_parseable_json { return Ok(None); } if has_streams { return Ok(parseable_json_result); }src/migration/mod.rs (1)
107-110: Updateparseable_jsonin the default branch too.Currently remote metadata is written but the in-memory copy isn't updated.
_ => { let metadata = metadata_migration::remove_querier_metadata(storage_metadata); + let _metadata: Bytes = serde_json::to_vec(&metadata)?.into(); + *parseable_json = Some(_metadata); put_remote_metadata(metadata).await?; }src/enterprise/utils.rs (1)
95-109: Avoid panic on missing/invalid manifest; return a structured error.
.expect(...) will crash query on gaps; propagate a descriptive error instead. Apply:
- for manifest_item in merged_snapshot.manifests(&time_filters) { - manifest_files.push( - PARSEABLE - .metastore - .get_manifest( - stream, - manifest_item.time_lower_bound, - manifest_item.time_upper_bound, - Some(manifest_item.manifest_path), - ) - .await - .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))? - .expect("Data is invalid for Manifest"), - ) - } + for manifest_item in merged_snapshot.manifests(&time_filters) { + let maybe_manifest = PARSEABLE + .metastore + .get_manifest( + stream, + manifest_item.time_lower_bound, + manifest_item.time_upper_bound, + Some(manifest_item.manifest_path), + ) + .await + .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; + match maybe_manifest { + Some(m) => manifest_files.push(m), + None => { + return Err(ObjectStorageError::Custom(format!( + "Manifest not found for {} [{} - {}] at {}", + stream, + manifest_item.time_lower_bound, + manifest_item.time_upper_bound, + manifest_item.manifest_path + ))) + } + } + }src/alerts/mod.rs (3)
996-1005: Don’t hold the write lock across awaits in load().
You acquire a write lock before async migrations/IO, increasing contention risk. Apply:
- let mut map = self.alerts.write().await; + // Acquire write lock only when inserting/updating entries.And replace insert sites later in the loop with brief locks, e.g.:
// Disabled-branch insert { let mut w = self.alerts.write().await; w.insert(*alert.get_id(), alert.clone_box()); } // Final insert { let mut w = self.alerts.write().await; w.insert(*alert.get_id(), alert.clone_box()); }
1017-1017: Lock is still held during async migration.
migrate_from_v1().await runs while holding the write lock; move the lock to after migration.
1037-1037: Same here: avoid awaiting under the write lock.
Move lock to just the insertion.src/users/dashboards.rs (2)
70-84: Avoid unwrap() in MetastoreObject getters; can panic on malformed/legacy dashboards.Unwrapping author and dashboard_id risks runtime panics for older or bad entries. Prefer filtering invalid dashboards on load so in-memory state is always safe (and getters can assume presence).
Apply load-time validation (see next comment) and optionally change unwrap() to expect() with a clear message.
182-203: Filter invalid dashboards during load to prevent later panics.Skip entries missing author or dashboard_id when hydrating in-memory state.
Apply:
for dashboard in all_dashboards { if dashboard.is_empty() { continue; } @@ - if let Ok(dashboard) = serde_json::from_value::<Dashboard>(dashboard_value.clone()) { - this.retain(|d: &Dashboard| d.dashboard_id != dashboard.dashboard_id); - this.push(dashboard); - } else { - tracing::warn!("Failed to deserialize dashboard: {:?}", dashboard_value); - } + if let Ok(dashboard) = serde_json::from_value::<Dashboard>(dashboard_value.clone()) { + if dashboard.author.is_none() || dashboard.dashboard_id.is_none() { + tracing::warn!("Skipping dashboard missing author/id: {:?}", dashboard_value); + continue; + } + this.retain(|d: &Dashboard| d.dashboard_id != dashboard.dashboard_id); + this.push(dashboard); + } else { + tracing::warn!("Failed to deserialize dashboard: {:?}", dashboard_value); + }src/storage/object_storage.rs (1)
616-626: Propagate metastore errors instead of ignoring them in get_log_source_from_storage.Silently ignoring Err may hide issues and produce incomplete results.
Apply:
- let stream_metas = PARSEABLE - .metastore - .get_all_stream_jsons(stream_name, None) - .await; - if let Ok(stream_metas) = stream_metas { - for stream_meta in stream_metas.iter() { + let stream_metas = PARSEABLE + .metastore + .get_all_stream_jsons(stream_name, None) + .await + .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; + for stream_meta in stream_metas.iter() { if let Ok(stream_meta) = serde_json::from_slice::<ObjectStoreFormat>(stream_meta) { // fetch unique log sources and their fields all_log_sources.extend(stream_meta.log_source.clone()); - } - } - } + } + }src/catalog/mod.rs (3)
116-123: Add optimistic concurrency for metastore JSON RMW (duplicate). This is still a read-modify-write of stream JSON without CAS/ETag or revision checks, risking lost updates under concurrent writers. Previous comment already covers adding revision-aware get/conditional put with retry-on-conflict.
303-329: Brittle update gate: avoid contains(...) on manifest_path (duplicate). The branch is still guarded by a string contains check computed from object-store paths. Prefer an unconditional get_manifest by known bounds/path and branch on Option, keeping the flow metastore-agnostic; also remove the fallback that later uses ObjectStoreFormat::default().
377-381: Persist via metastore OK, but protect with CAS (duplicate). put_stream_json should enforce If-Match (or equivalent) to avoid clobbering concurrent updates. A sketch of the retry loop follows.
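A compact sketch of that retry loop, with hypothetical revision-aware methods (get_with_revision / put_if_revision are not this PR's API):

use bytes::Bytes;

pub enum CasError {
    Conflict,
    Other(String),
}

#[async_trait::async_trait]
pub trait RevisionedStore: Send + Sync {
    async fn get_with_revision(&self, key: &str) -> Result<(Bytes, u64), CasError>;
    async fn put_if_revision(&self, key: &str, val: Bytes, expected: u64) -> Result<(), CasError>;
}

// Read-modify-write that retries when another writer updated the key first.
pub async fn read_modify_write<F>(
    store: &dyn RevisionedStore,
    key: &str,
    mut mutate: F,
) -> Result<(), CasError>
where
    F: FnMut(Bytes) -> Bytes,
{
    loop {
        let (current, revision) = store.get_with_revision(key).await?;
        match store.put_if_revision(key, mutate(current), revision).await {
            Err(CasError::Conflict) => continue, // lost the race: re-read and retry
            other => return other,
        }
    }
}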
🧹 Nitpick comments (18)
src/handlers/http/cluster/mod.rs (2)
814-835: Consider consolidating node deletion results. The deletion logic correctly uses the metastore for all node types. However, you could make this more concise using a single async block or future::join4.
- // Delete ingestor metadata - let removed_ingestor = PARSEABLE - .metastore - .delete_node_metadata(&domain_name, NodeType::Ingestor) - .await?; - - // Delete indexer metadata - let removed_indexer = PARSEABLE - .metastore - .delete_node_metadata(&domain_name, NodeType::Indexer) - .await?; - - // Delete querier metadata - let removed_querier = PARSEABLE - .metastore - .delete_node_metadata(&domain_name, NodeType::Querier) - .await?; - - // Delete prism metadata - let removed_prism = PARSEABLE - .metastore - .delete_node_metadata(&domain_name, NodeType::Prism) - .await?; + // Delete metadata for all node types + let (removed_ingestor, removed_indexer, removed_querier, removed_prism) = future::join4( + PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Ingestor), + PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Indexer), + PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Querier), + PARSEABLE.metastore.delete_node_metadata(&domain_name, NodeType::Prism), + ).await; + + let removed_ingestor = removed_ingestor?; + let removed_indexer = removed_indexer?; + let removed_querier = removed_querier?; + let removed_prism = removed_prism?;
1009-1055: Add error handling for empty metrics. The function correctly gates ingestion based on non-empty metrics, but consider logging when metrics are empty to help with debugging.
let cluster_metrics = fetch_cluster_metrics().await; if let Ok(metrics) = cluster_metrics && !metrics.is_empty() { info!("Cluster metrics fetched successfully from all ingestors"); if let Ok(metrics_bytes) = serde_json::to_vec(&metrics) { if matches!( ingest_internal_stream( INTERNAL_STREAM_NAME.to_string(), bytes::Bytes::from(metrics_bytes), ) .await, Ok(()) ) { info!("Cluster metrics successfully ingested into internal stream"); } else { error!("Failed to ingest cluster metrics into internal stream"); } } else { error!("Failed to serialize cluster metrics"); } + } else if let Ok(metrics) = cluster_metrics { + // Metrics are empty, skip ingestion + info!("No cluster metrics available, skipping ingestion"); }src/handlers/http/modal/ingest_server.rs (1)
309-330: Validate credentials against all ingestor metadata, not just the first item. Current code checks only index 0; a mismatch on any node should fail.
- if !ingestor_metadata.is_empty() { - let ingestor_metadata_value: Value = - serde_json::from_slice(&ingestor_metadata[0]).expect("ingestor.json is valid json"); - let check = ingestor_metadata_value - .as_object() - .and_then(|meta| meta.get("token")) - .and_then(|token| token.as_str()) - .unwrap(); - - let token = base64::prelude::BASE64_STANDARD.encode(format!( - "{}:{}", - PARSEABLE.options.username, PARSEABLE.options.password - )); - let token = format!("Basic {token}"); - - if check != token { - return Err(anyhow::anyhow!( - "Credentials do not match with other ingestors. Please check your credentials and try again." - )); - } - } + if !ingestor_metadata.is_empty() { + let expected = format!( + "Basic {}", + base64::prelude::BASE64_STANDARD.encode(format!( + "{}:{}", + PARSEABLE.options.username, PARSEABLE.options.password + )) + ); + let mismatch = ingestor_metadata.iter().any(|bytes| { + serde_json::from_slice::<Value>(bytes) + .ok() + .and_then(|v| v.get("token").and_then(|t| t.as_str())) + .map_or(true, |t| t != expected) + }); + if mismatch { + return Err(anyhow::anyhow!( + "Credentials do not match with other ingestors. Please check your credentials and try again." + )); + } + }src/handlers/http/modal/mod.rs (1)
365-382: Preserve error context when loading node metadata from metastore. Log the underlying error; the current message drops detail.
- async fn load_from_storage(node_type: NodeType) -> Vec<NodeMetadata> { - let obs = PARSEABLE.metastore.get_node_metadata(node_type).await; - - let mut metadata = vec![]; - if let Ok(obs) = obs { - for object in obs { - //convert to NodeMetadata - match serde_json::from_slice::<NodeMetadata>(&object) { - Ok(node_metadata) => metadata.push(node_metadata), - Err(e) => error!("Failed to deserialize NodeMetadata: {:?}", e), - } - } - } else { - error!("Couldn't read from storage"); - } + async fn load_from_storage(node_type: NodeType) -> Vec<NodeMetadata> { + let mut metadata = vec![]; + match PARSEABLE.metastore.get_node_metadata(node_type).await { + Ok(obs) => { + for object in obs { + match serde_json::from_slice::<NodeMetadata>(&object) { + Ok(node_metadata) => metadata.push(node_metadata), + Err(e) => error!("Failed to deserialize NodeMetadata: {:?}", e), + } + } + } + Err(e) => { + error!("Couldn't read node metadata from metastore: {e}"); + } + } // Return the metadata metadata }src/users/filters.rs (2)
49-62: Avoid unwrap()s in MetastoreObject impl; make invariants explicit. Use expect() with clear messages so violations are actionable.
impl MetastoreObject for Filter { fn get_object_path(&self) -> String { - filter_path( - self.user_id.as_ref().unwrap(), - &self.stream_name, - &format!("{}.json", self.filter_id.as_ref().unwrap()), - ) + let user_id = self + .user_id + .as_ref() + .expect("Filter.user_id must be set before persistence"); + let filter_id = self + .filter_id + .as_ref() + .expect("Filter.filter_id must be set before persistence"); + filter_path(user_id, &self.stream_name, &format!("{filter_id}.json")) .to_string() } fn get_object_id(&self) -> String { - self.filter_id.as_ref().unwrap().clone() + self.filter_id + .as_ref() + .expect("Filter.filter_id must be set before persistence") + .clone() } }
175-187: Add a brief doc comment for migrate_v1_v2. Clarify its purpose and that it sets CURRENT_FILTER_VERSION.
+/// Migrates a v1 filter JSON payload to v2: +/// - hashes user_id +/// - sets `version` to CURRENT_FILTER_VERSION pub fn migrate_v1_v2(mut filter_meta: Value) -> Value {src/hottier.rs (4)
275-289: Correct the documentation to reference metastore instead of S3. The documentation still references fetching manifests from S3, but the implementation now retrieves them from the metastore.
- /// process the hot tier files for the stream - /// delete the files from the hot tier directory if the available date range is outside the hot tier range + /// Process the hot tier files for the stream + /// Fetches manifests from the metastore and manages hot tier storage
281-289: Consider validating the stream name before passing to metastore. The function directly passes the stream to the metastore without validation. Consider checking if the stream name is valid before making the metastore call.
let mut s3_manifest_file_list = PARSEABLE .metastore .get_all_manifest_files(&stream) .await .map_err(|e| { - HotTierError::ObjectStorageError(ObjectStorageError::MetastoreError(Box::new( - e.to_detail(), - ))) + HotTierError::ObjectStorageError(ObjectStorageError::MetastoreError(Box::new( + e.to_detail(), + ))) })?;
297-300: Update documentation to reflect metastore usage. The comment still mentions "collect all manifests from S3" while the implementation uses the metastore.
- /// process the hot tier files for the date for the stream - /// collect all manifests from metastore for the date, sort the parquet file list - /// in order to download the latest files first - /// download the parquet files if not present in hot tier directory + /// Process the hot tier files for the date for the stream + /// Collect all manifests from metastore for the date, sort the parquet file list + /// in order to download the latest files first + /// Download the parquet files if not present in hot tier directory
353-358: Update documentation to be consistent with metastore terminology. The comment still references "S3" while the implementation uses object storage.
- /// process the parquet file for the stream - /// check if the disk is available to download the parquet file - /// if not available, delete the oldest entry from the hot tier directory - /// download the parquet file from S3 to the hot tier directory - /// update the used and available size in the hot tier metadata - /// return true if the parquet file is processed successfully + /// Process the parquet file for the stream + /// Check if the disk is available to download the parquet file + /// If not available, delete the oldest entry from the hot tier directory + /// Download the parquet file from object storage to the hot tier directory + /// Update the used and available size in the hot tier metadata + /// Return true if the parquet file is processed successfullysrc/parseable/mod.rs (1)
107-118: Consider extracting metastore creation logic to reduce duplication.The metastore creation logic is duplicated across all storage options. Consider extracting this to a common function.
Create a helper function to create the metastore:
fn create_metastore_for_storage(storage: &dyn ObjectStorageProvider) -> Arc<dyn Metastore> { Arc::new(ObjectStoreMetastore { storage: storage.construct_client(), }) }Then use it in each branch:
- // for now create a metastore without using a CLI arg - let metastore = ObjectStoreMetastore { - storage: args.storage.construct_client(), - }; + let metastore = create_metastore_for_storage(&args.storage);src/migration/mod.rs (1)
272-282: Consider extracting the repeated pattern for stream JSON persistence.The pattern of converting to
ObjectStoreFormatand callingput_stream_jsonis repeated 6 times. Consider extracting this into a helper function.async fn persist_stream_json( stream_metadata_value: &Value, stream: &str, ) -> anyhow::Result<()> { let stream_json: ObjectStoreFormat = serde_json::from_value(stream_metadata_value.clone())?; PARSEABLE .metastore .put_stream_json(&stream_json, stream) .await?; Ok(()) }Then use it in each branch:
- let stream_json: ObjectStoreFormat = - serde_json::from_value(stream_metadata_value.clone())?; - PARSEABLE - .metastore - .put_stream_json(&stream_json, stream) - .await?; + persist_stream_json(&stream_metadata_value, stream).await?;Also applies to: 289-298, 306-312, 319-324, 328-334, 337-343
src/enterprise/utils.rs (2)
93-109: Optional: fetch manifests concurrently for latency reduction.
Use FuturesUnordered/try_collect to parallelize get_manifest() calls; throughput will improve on remote metastores.
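A sketch of that pattern, with a placeholder lookup standing in for the metastore call:

use futures::stream::{FuturesUnordered, TryStreamExt};

// Run fallible lookups concurrently; the first Err aborts the whole collection.
async fn fetch_all(keys: Vec<String>) -> Result<Vec<Vec<u8>>, std::io::Error> {
    let futs: FuturesUnordered<_> = keys.into_iter().map(lookup).collect();
    futs.try_collect().await
}

async fn lookup(_key: String) -> Result<Vec<u8>, std::io::Error> {
    Ok(Vec::new()) // placeholder for e.g. PARSEABLE.metastore.get_manifest(...)
}

Note that FuturesUnordered yields results in completion order; if manifest order matters downstream, carry an index with each future and sort after collecting.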
123-142: Make path parsing robust; avoid panics on unexpected file_path shapes.
Hard-coded indices and unwrap() can panic. Prefer a parser (e.g., a shared helper) and return errors or skip invalid entries. Example outside this hunk:
// Prefer a shared helper, e.g., extract_datetime_from_parquet_path_regex(path) // Fallback sketch: fn parse_ts_from_path(p: &str) -> Option<chrono::DateTime<chrono::Utc>> { let parts = p.split('/').collect::<Vec<_>>(); let (date, hour, minute) = (*parts.get(1)?, *parts.get(2)?, *parts.get(3)?); let (y, m, d) = (&date[5..9], &date[10..12], &date[13..15]); let h = &hour[5..7]; let mm = &minute[7..9]; chrono::Utc.with_ymd_and_hms(y.parse().ok()?, m.parse().ok()?, d.parse().ok()?, h.parse().ok()?, mm.parse().ok()?, 0).single() }src/storage/s3.rs (1)
426-458: Remove long commented-out code or guard it with cfg.
Dead/commented blocks hinder maintainability.src/alerts/mod.rs (1)
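If the block is worth keeping at all, a feature gate keeps it compiling on demand instead of rotting in comments (the feature name here is invented):

// Cargo.toml: [features] legacy-s3-listing = []
#[cfg(feature = "legacy-s3-listing")]
mod legacy_listing {
    // the previously commented-out implementation lives here and still type-checks
}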
233-261: Escape identifiers for stream names.
If a legacy v1 alert contains a double-quote in stream, it can break the generated SQL. Escape " by doubling:
let esc_stream = stream.replace('"', "\"\"");
format!("SELECT COUNT(*) as alert_value FROM \"{esc_stream}\"")
982-999: Handle brand-new streams when committing schema. If no existing schema exists, merging will fail. Fall back to storing the provided schema when get_schema returns an error.
Apply:
-pub async fn commit_schema_to_storage( +pub async fn commit_schema_to_storage( stream_name: &str, schema: Schema, ) -> Result<(), ObjectStorageError> { - let stream_schema = PARSEABLE - .metastore - .get_schema(stream_name) - .await - .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; - - let new_schema = Schema::try_merge(vec![ - schema, - serde_json::from_slice::<Schema>(&stream_schema)?, - ]) - .map_err(|e| ObjectStorageError::Custom(e.to_string()))?; - - PARSEABLE - .metastore - .put_schema(new_schema, stream_name) - .await - .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail()))) + match PARSEABLE.metastore.get_schema(stream_name).await { + Ok(existing_bytes) => { + let existing: Schema = serde_json::from_slice(&existing_bytes)?; + let merged = Schema::try_merge(vec![schema, existing]) + .map_err(|e| ObjectStorageError::Custom(format!("schema merge failed: {e}")))?; + PARSEABLE + .metastore + .put_schema(merged, stream_name) + .await + .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail()))) + } + Err(e) => { + // Treat absence as first write; otherwise, you may want to detect NotFound explicitly. + tracing::info!("No existing schema for stream {stream_name}; storing initial schema. {}", e); + PARSEABLE + .metastore + .put_schema(schema, stream_name) + .await + .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail()))) + } + } }src/catalog/mod.rs (1)
434-447: Avoid extra round-trip: have put_manifest return the manifest path. You write the manifest and then fetch its path. Expose the path from put_manifest (e.g., a Result carrying the written path) so callers don’t need a subsequent get_manifest_path. A possible signature is sketched below.
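A hypothetical signature for that change (types are stand-ins, not the PR's real Manifest/MetastoreError):

pub struct Manifest;
pub struct MetastoreError;

#[async_trait::async_trait]
pub trait ManifestStore: Send + Sync {
    // Returning the written path removes the follow-up get_manifest_path call.
    async fn put_manifest(
        &self,
        stream: &str,
        manifest: Manifest,
    ) -> Result<String, MetastoreError>; // Ok(path that was just written)
}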
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (45)
- Cargo.toml (2 hunks)
- src/alerts/alert_structs.rs (2 hunks)
- src/alerts/alert_traits.rs (2 hunks)
- src/alerts/alert_types.rs (5 hunks)
- src/alerts/mod.rs (8 hunks)
- src/alerts/target.rs (4 hunks)
- src/catalog/manifest.rs (2 hunks)
- src/catalog/mod.rs (6 hunks)
- src/correlation.rs (8 hunks)
- src/enterprise/utils.rs (5 hunks)
- src/handlers/http/alerts.rs (3 hunks)
- src/handlers/http/cluster/mod.rs (4 hunks)
- src/handlers/http/ingest.rs (2 hunks)
- src/handlers/http/logstream.rs (7 hunks)
- src/handlers/http/mod.rs (2 hunks)
- src/handlers/http/modal/ingest_server.rs (2 hunks)
- src/handlers/http/modal/mod.rs (5 hunks)
- src/handlers/http/modal/query/querier_logstream.rs (2 hunks)
- src/handlers/http/modal/utils/rbac_utils.rs (1 hunks)
- src/handlers/http/oidc.rs (1 hunks)
- src/handlers/http/query.rs (2 hunks)
- src/handlers/http/role.rs (1 hunks)
- src/handlers/http/users/dashboards.rs (3 hunks)
- src/handlers/http/users/filters.rs (6 hunks)
- src/hottier.rs (3 hunks)
- src/lib.rs (1 hunks)
- src/metastore/metastore_traits.rs (1 hunks)
- src/metastore/metastores/mod.rs (1 hunks)
- src/metastore/metastores/object_store_metastore.rs (1 hunks)
- src/metastore/mod.rs (1 hunks)
- src/migration/mod.rs (11 hunks)
- src/parseable/mod.rs (7 hunks)
- src/parseable/streams.rs (1 hunks)
- src/prism/home/mod.rs (5 hunks)
- src/query/mod.rs (3 hunks)
- src/query/stream_schema_provider.rs (5 hunks)
- src/storage/azure_blob.rs (6 hunks)
- src/storage/gcs.rs (6 hunks)
- src/storage/localfs.rs (5 hunks)
- src/storage/mod.rs (3 hunks)
- src/storage/object_storage.rs (16 hunks)
- src/storage/s3.rs (7 hunks)
- src/storage/store_metadata.rs (3 hunks)
- src/users/dashboards.rs (7 hunks)
- src/users/filters.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (25)
- src/alerts/alert_traits.rs
- src/metastore/metastores/mod.rs
- Cargo.toml
- src/catalog/manifest.rs
- src/handlers/http/users/dashboards.rs
- src/prism/home/mod.rs
- src/handlers/http/oidc.rs
- src/handlers/http/query.rs
- src/alerts/alert_structs.rs
- src/parseable/streams.rs
- src/lib.rs
- src/metastore/metastore_traits.rs
- src/metastore/metastores/object_store_metastore.rs
- src/handlers/http/mod.rs
- src/storage/store_metadata.rs
- src/storage/gcs.rs
- src/handlers/http/role.rs
- src/handlers/http/ingest.rs
- src/alerts/alert_types.rs
- src/handlers/http/users/filters.rs
- src/metastore/mod.rs
- src/query/mod.rs
- src/handlers/http/alerts.rs
- src/correlation.rs
- src/handlers/http/modal/utils/rbac_utils.rs
🧰 Additional context used
🧠 Learnings (18)
📓 Common learnings
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
📚 Learning: 2025-09-06T04:26:17.191Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
- src/storage/mod.rs
- src/handlers/http/logstream.rs
- src/parseable/mod.rs
- src/enterprise/utils.rs
- src/storage/object_storage.rs
- src/alerts/target.rs
- src/alerts/mod.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
- src/storage/mod.rs
- src/migration/mod.rs
- src/handlers/http/modal/query/querier_logstream.rs
- src/handlers/http/cluster/mod.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
- src/handlers/http/logstream.rs
- src/parseable/mod.rs
- src/migration/mod.rs
- src/handlers/http/modal/query/querier_logstream.rs
- src/query/stream_schema_provider.rs
- src/storage/object_storage.rs
- src/storage/azure_blob.rs
📚 Learning: 2025-03-28T06:17:01.201Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.
Applied to files:
src/handlers/http/logstream.rs
📚 Learning: 2025-09-11T06:35:24.721Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/storage/azure_blob.rs:736-742
Timestamp: 2025-09-11T06:35:24.721Z
Learning: The Azure Blob Storage client's `list_with_delimiter()` method handles Azure-specific implementation details internally, including proper root listing behavior and path normalization, so manual prefix handling is not needed when delegating to this method.
Applied to files:
- src/storage/s3.rs
- src/storage/localfs.rs
- src/storage/azure_blob.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
Applied to files:
- src/parseable/mod.rs
- src/users/filters.rs
- src/migration/mod.rs
- src/query/stream_schema_provider.rs
- src/storage/object_storage.rs
- src/handlers/http/cluster/mod.rs
- src/catalog/mod.rs
📚 Learning: 2025-09-05T09:27:12.659Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/users/filters.rs:116-121
Timestamp: 2025-09-05T09:27:12.659Z
Learning: The Filters::load() function in src/users/filters.rs is only called once at server initialization, so there's no risk of duplicate entries from repeated invocations.
Applied to files:
src/users/filters.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/handlers/http/modal/query/querier_logstream.rs
📚 Learning: 2025-08-18T17:59:31.642Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/handlers/http/modal/utils/ingest_utils.rs:149-156
Timestamp: 2025-08-18T17:59:31.642Z
Learning: The time_partition parameter in push_logs() function in src/handlers/http/modal/utils/ingest_utils.rs is determined by the caller (flatten_and_push_logs). OSS callers pass None, enterprise callers pass the appropriate value (None or Some<>), and OTEL callers always pass None. The push_logs() function should not add additional time partition logic since it's already handled at the caller level.
Applied to files:
src/handlers/http/modal/query/querier_logstream.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/enterprise/utils.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/enterprise/utils.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-09-11T06:32:03.705Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/storage/localfs.rs:142-153
Timestamp: 2025-09-11T06:32:03.705Z
Learning: In LocalFS implementation, the root path is typically either "/" for Unix systems or a drive letter path for Windows systems. The user mentioned that substring matching against sliced root paths works in their context, but this approach can still have security vulnerabilities with path traversal and cross-platform issues.
Applied to files:
src/storage/localfs.rs
📚 Learning: 2025-06-16T09:50:38.636Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-05-01T10:27:56.858Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
Applied to files:
src/users/dashboards.rs
📚 Learning: 2025-07-24T11:09:21.781Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.
Applied to files:
src/alerts/mod.rs
📚 Learning: 2025-04-07T13:23:10.092Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1288
File: src/handlers/http/modal/mod.rs:279-301
Timestamp: 2025-04-07T13:23:10.092Z
Learning: For critical operations like writing metadata to disk in NodeMetadata::put_on_disk(), it's preferred to let exceptions propagate (using expect/unwrap) rather than trying to recover with fallback mechanisms, as the failure indicates a fundamental system issue that needs immediate attention.
Applied to files:
src/alerts/mod.rs
🧬 Code graph analysis (20)
src/storage/mod.rs (4)
src/alerts/alert_types.rs (2)
get_object_path (71-73), get_object_id (75-77)
src/metastore/metastore_traits.rs (2)
get_object_path (163-163), get_object_id (164-164)
src/catalog/manifest.rs (2)
get_object_path (94-96), get_object_id (98-100)
src/storage/store_metadata.rs (2)
get_object_path (109-111), get_object_id (113-115)
src/handlers/http/logstream.rs (3)
src/parseable/mod.rs (1)
storage(282-284)src/handlers/http/ingest.rs (1)
error_response(518-529)src/correlation.rs (1)
error_response(352-363)
src/hottier.rs (2)
src/parseable/mod.rs (2)
storage(282-284)new(178-192)src/catalog/manifest.rs (1)
default(71-76)
src/storage/s3.rs (7)
src/storage/object_storage.rs (2)
parseable_json_path(1054-1056)list_with_delimiter(272-275)src/storage/mod.rs (1)
to_object_store_path(293-295)src/storage/localfs.rs (1)
list_with_delimiter(474-484)src/storage/azure_blob.rs (1)
list_with_delimiter(736-741)src/storage/gcs.rs (1)
list_with_delimiter(657-662)src/storage/metrics_layer.rs (1)
list_with_delimiter(230-238)src/correlation.rs (1)
path(227-234)
src/parseable/mod.rs (1)
src/storage/object_storage.rs (3)
new(77-86)create_stream_from_ingestor(533-593)create_schema_from_metastore(596-609)
src/users/filters.rs (2)
src/metastore/metastore_traits.rs (2)
get_object_path(163-163)get_object_id(164-164)src/storage/object_storage.rs (1)
filter_path(1042-1050)
src/migration/mod.rs (2)
src/parseable/mod.rs (1)
storage(282-284)src/storage/store_metadata.rs (1)
put_remote_metadata(292-298)
src/handlers/http/modal/query/querier_logstream.rs (1)
src/parseable/mod.rs (1)
storage(282-284)
src/handlers/http/modal/mod.rs (3)
src/alerts/alert_types.rs (2)
get_object_path(71-73)get_object_id(75-77)src/metastore/metastore_traits.rs (2)
get_object_path(163-163)get_object_id(164-164)src/parseable/mod.rs (1)
new(178-192)
src/enterprise/utils.rs (1)
src/parseable/mod.rs (2)
storage(282-284)new(178-192)
src/query/stream_schema_provider.rs (1)
src/parseable/mod.rs (4)
storage(282-284)new(178-192)serde_json(338-338)serde_json(344-344)
src/storage/object_storage.rs (7)
src/handlers/http/mod.rs (1)
fetch_schema(90-95)src/storage/localfs.rs (3)
list_with_delimiter(474-484)new(100-102)from(583-585)src/storage/azure_blob.rs (1)
list_with_delimiter(736-741)src/storage/gcs.rs (1)
list_with_delimiter(657-662)src/storage/s3.rs (4)
list_with_delimiter(853-858)s(179-179)from(862-870)from(874-876)src/storage/metrics_layer.rs (1)
list_with_delimiter(230-238)src/catalog/mod.rs (1)
update_snapshot(108-129)
src/handlers/http/modal/ingest_server.rs (1)
src/parseable/mod.rs (2)
storage(282-284)new(178-192)
src/storage/localfs.rs (4)
src/storage/object_storage.rs (1)
list_with_delimiter(272-275)src/storage/azure_blob.rs (1)
list_with_delimiter(736-741)src/storage/gcs.rs (1)
list_with_delimiter(657-662)src/storage/s3.rs (1)
list_with_delimiter(853-858)
src/handlers/http/cluster/mod.rs (2)
src/parseable/mod.rs (1)
storage(282-284)src/handlers/http/modal/mod.rs (4)
node_type(569-569)node_type(582-584)domain_name(567-567)domain_name(574-576)
src/catalog/mod.rs (1)
src/parseable/mod.rs (3)
serde_json(338-338)serde_json(344-344)new(178-192)
src/alerts/target.rs (4)
src/correlation.rs (4)
update(134-164)update(236-243)get_object_path(217-219)get_object_id(221-223)src/metastore/metastore_traits.rs (2)
get_object_path(163-163)get_object_id(164-164)src/alerts/alert_structs.rs (2)
get_object_path(538-540)get_object_id(534-536)src/storage/object_storage.rs (1)
target_json_path(1066-1072)
src/users/dashboards.rs (4)
src/correlation.rs (2)
get_object_path(217-219)get_object_id(221-223)src/metastore/metastore_traits.rs (2)
get_object_path(163-163)get_object_id(164-164)src/alerts/alert_structs.rs (2)
get_object_path(538-540)get_object_id(534-536)src/users/filters.rs (2)
get_object_path(50-57)get_object_id(59-61)
src/alerts/mod.rs (3)
src/parseable/mod.rs (1)
storage(282-284)src/storage/localfs.rs (1)
from(583-585)src/storage/s3.rs (2)
from(862-870)from(874-876)
src/storage/azure_blob.rs (6)
src/storage/object_storage.rs (2)
parseable_json_path(1054-1056)list_with_delimiter(272-275)src/storage/mod.rs (1)
to_object_store_path(293-295)src/storage/localfs.rs (1)
list_with_delimiter(474-484)src/storage/gcs.rs (1)
list_with_delimiter(657-662)src/storage/s3.rs (1)
list_with_delimiter(853-858)src/storage/metrics_layer.rs (1)
list_with_delimiter(230-238)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
Force-pushed from 262f4a2 to f61e969 (Compare)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
src/users/dashboards.rs (2)
234-247: Don’t hold RwLock across await in create(); move put_dashboard outside locks. Awaiting while holding a write lock can deadlock and blocks concurrent readers/writers.
- let mut dashboards = self.0.write().await; - - let has_duplicate = dashboards - .iter() - .any(|d| d.title == dashboard.title && d.dashboard_id != dashboard.dashboard_id); - - if has_duplicate { - return Err(DashboardError::Metadata("Dashboard title must be unique")); - } - - self.save_dashboard(dashboard).await?; - - dashboards.push(dashboard.clone()); + // preflight duplicate check (read lock) + { + let dashboards = self.0.read().await; + let has_duplicate = dashboards + .iter() + .any(|d| d.title == dashboard.title && d.dashboard_id != dashboard.dashboard_id); + if has_duplicate { + return Err(DashboardError::Metadata("Dashboard title must be unique")); + } + } + // persist outside of any lock + self.save_dashboard(dashboard).await?; + // update in-memory + let mut dashboards = self.0.write().await; + dashboards.push(dashboard.clone());
260-287: Don’t hold RwLock across await in update(); read → persist → write. Refactor to avoid awaiting under the lock and keep the in-memory update isolated.
- let mut dashboards = self.0.write().await; - - let existing_dashboard = dashboards - .iter() - .find(|d| d.dashboard_id == Some(dashboard_id) && d.author == Some(user_id.to_string())) - .cloned() - .ok_or_else(|| { - DashboardError::Metadata( - "Dashboard does not exist or you do not have permission to access it", - ) - })?; + // lookup under read lock + let existing_dashboard = { + let dashboards = self.0.read().await; + dashboards + .iter() + .find(|d| d.dashboard_id == Some(dashboard_id) && d.author == Some(user_id.to_string())) + .cloned() + } + .ok_or_else(|| { + DashboardError::Metadata( + "Dashboard does not exist or you do not have permission to access it", + ) + })?; @@ - let has_duplicate = dashboards - .iter() - .any(|d| d.title == dashboard.title && d.dashboard_id != dashboard.dashboard_id); - - if has_duplicate { - return Err(DashboardError::Metadata("Dashboard title must be unique")); - } - - self.save_dashboard(dashboard).await?; - - dashboards.retain(|d| d.dashboard_id != Some(dashboard_id)); - dashboards.push(dashboard.clone()); + // preflight duplicate check under read lock + { + let dashboards = self.0.read().await; + let has_duplicate = dashboards + .iter() + .any(|d| d.title == dashboard.title && d.dashboard_id != dashboard.dashboard_id); + if has_duplicate { + return Err(DashboardError::Metadata("Dashboard title must be unique")); + } + } + // persist outside of locks + self.save_dashboard(dashboard).await?; + // update in-memory + let mut dashboards = self.0.write().await; + dashboards.retain(|d| d.dashboard_id != Some(dashboard_id)); + dashboards.push(dashboard.clone());src/handlers/http/logstream.rs (2)
57-76: Deleting a stream must also purge metastore artifacts (or risk stale UI/ACLs). Currently only object storage and local staging are cleaned. If a non-object-store metastore is configured, the stream, schema, alerts, targets, etc. will linger. Add metastore deletions (stream JSON + schema at minimum), or extend the Metastore trait with a coarse delete_stream(stream_name).
// Delete from storage objectstore.delete_stream(&stream_name).await?; + // Delete from metastore + PARSEABLE + .metastore + .delete_stream(&stream_name) + .await?;
400-413: Set in-memory flag after successful write; fix available_size math.
- Set
stream.set_hot_tier(true)only afterput_hot_tiersucceeds.available_sizeshould besize - used_size, notsize.Apply:
- stream.set_hot_tier(true); @@ - hottier.used_size = existing_hot_tier_used_size; - hottier.available_size = hottier.size; + hottier.used_size = existing_hot_tier_used_size; + hottier.available_size = hottier + .size + .saturating_sub(existing_hot_tier_used_size); @@ - hot_tier_manager - .put_hot_tier(&stream_name, &mut hottier) - .await?; + hot_tier_manager + .put_hot_tier(&stream_name, &mut hottier) + .await?; + // flip in-memory flag only after success + stream.set_hot_tier(true);src/storage/object_storage.rs (1)
539-589: BUG: Error from metastore is silently dropped; incorrect Result handling.
await.into_iter().next()turns Err into None, masking failures. Properly match and propagate metastore errors.- if let Some(stream_metadata_obs) = PARSEABLE - .metastore - .get_all_stream_jsons(stream_name, Some(Mode::Ingest)) - .await - .into_iter() - .next() - && !stream_metadata_obs.is_empty() - { + let stream_metadata_obs = PARSEABLE + .metastore + .get_all_stream_jsons(stream_name, Some(Mode::Ingest)) + .await + .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; + if !stream_metadata_obs.is_empty() { for stream_metadata_bytes in stream_metadata_obs.iter() { let stream_ob_metadata = serde_json::from_slice::<ObjectStoreFormat>(stream_metadata_bytes)?; all_log_sources.extend(stream_ob_metadata.log_source.clone()); } ... let stream_metadata_bytes: Bytes = serde_json::to_vec(&stream_metadata)?.into(); - PARSEABLE - .metastore - .put_stream_json(&stream_metadata, stream_name) - .await - .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; + PARSEABLE + .metastore + .put_stream_json(&stream_metadata, stream_name) + .await + .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; return Ok(stream_metadata_bytes); }
♻️ Duplicate comments (9)
src/users/dashboards.rs (3)
299-305: Good: Reusing validated dashboard for deletion avoids TOCTOU and unwrap().This addresses the prior suggestion by deleting the already ownership-validated object.
70-84: Avoid unwrap() in MetastoreObject impl; enforce invariants or filter invalid entries.get_object_path/get_object_id unwrap author/dashboard_id; malformed/legacy entries can panic at runtime. Fix by filtering in load() so only valid dashboards reach memory (see suggested diff below on Lines 182–203). Optionally add a debug_assert to document the invariant.
182-203: Filter malformed dashboards during load() to prevent unwrap panics later.Skip entries missing author or dashboard_id. This makes MetastoreObject unwraps safe.
@@ - if let Ok(dashboard) = serde_json::from_value::<Dashboard>(dashboard_value.clone()) { - this.retain(|d: &Dashboard| d.dashboard_id != dashboard.dashboard_id); - this.push(dashboard); - } else { - tracing::warn!("Failed to deserialize dashboard: {:?}", dashboard_value); - } + if let Ok(dashboard) = serde_json::from_value::<Dashboard>(dashboard_value.clone()) { + if dashboard.author.is_none() || dashboard.dashboard_id.is_none() { + tracing::warn!("Skipping dashboard missing author/id: {:?}", dashboard_value); + continue; + } + this.retain(|d: &Dashboard| d.dashboard_id != dashboard.dashboard_id); + this.push(dashboard); + } else { + tracing::warn!("Failed to deserialize dashboard: {:?}", dashboard_value); + }src/handlers/http/logstream.rs (1)
476-487: Also clear in-memory hot-tier flag on delete.You persist
hot_tier_enabled = false, but the in-memory stream remains enabled. Please also callset_hot_tier(false).hot_tier_manager.delete_hot_tier(&stream_name).await?; + PARSEABLE.get_stream(&stream_name)?.set_hot_tier(false);src/storage/object_storage.rs (3)
616-627: Don’t ignore get_all_stream_jsons errors; propagate metastore failures.Same concern previously flagged.
- let stream_metas = PARSEABLE - .metastore - .get_all_stream_jsons(stream_name, None) - .await; - if let Ok(stream_metas) = stream_metas { - for stream_meta in stream_metas.iter() { - if let Ok(stream_meta) = serde_json::from_slice::<ObjectStoreFormat>(stream_meta) { - // fetch unique log sources and their fields - all_log_sources.extend(stream_meta.log_source.clone()); - } - } - } + let stream_metas = PARSEABLE + .metastore + .get_all_stream_jsons(stream_name, None) + .await + .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; + for stream_meta in stream_metas.iter() { + if let Ok(stream_meta) = serde_json::from_slice::<ObjectStoreFormat>(stream_meta) { + all_log_sources.extend(stream_meta.log_source.clone()); + } + }
982-999: Handle brand-new streams: schema merge should tolerate missing existing schema.If
get_schema(stream_name)returns NotFound, write the provided schema directly instead of erroring.- let stream_schema = PARSEABLE - .metastore - .get_schema(stream_name) - .await - .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?; - - let new_schema = Schema::try_merge(vec![ - schema, - serde_json::from_slice::<Schema>(&stream_schema)?, - ]) - .map_err(|e| ObjectStorageError::Custom(e.to_string()))?; + let new_schema = match PARSEABLE.metastore.get_schema(stream_name).await { + Ok(existing_schema_bytes) => { + let existing: Schema = serde_json::from_slice(&existing_schema_bytes)?; + Schema::try_merge(vec![schema, existing]) + .map_err(|e| ObjectStorageError::Custom(format!("schema merge failed: {e}")))? + } + Err(e) if e.status_code() == http::StatusCode::NOT_FOUND => schema, + Err(e) => return Err(ObjectStorageError::MetastoreError(Box::new(e.to_detail()))), + };
272-276: LocalFS: implement list_with_delimiter or guard callers.LocalFS currently returns Unsupported (src/storage/localfs.rs:474); multiple callers invoke storage.list_with_delimiter (notably src/metastore/metastores/object_store_metastore.rs:746–767 and src/storage/metrics_layer.rs:230–233), so running with LocalFS will produce runtime errors.
- Fix: implement list_with_delimiter for LocalFS (delegate to object_store’s LocalFileSystem like S3/GCS/Azure do) or add caller-side guards/fallbacks.
src/storage/localfs.rs (2)
142-165: Path construction is unsafe and platform-fragile; always anchor to root.Substring checks on
self.rootandto_path("/")allow traversal and break on Windows. Resolve within root consistently.- let file_path; - #[cfg(windows)] - { - file_path = path.to_path(""); - } - #[cfg(not(windows))] - { - let root_str = self.root.to_str().unwrap(); - file_path = if path.to_string().contains(&root_str[1..]) && root_str.len() > 1 { - path.to_path("/") - } else { - self.path_in_root(path) - }; - } + let file_path = self.path_in_root(path);If older manifests depend on absolute paths, handle that upstream by storing relative paths, or add a guarded, explicit absolute-path branch that rejects
..components and verifies containment withinself.root.
474-484: Implement list_with_delimiter for LocalFS (required). Forward the call to object_store::local::LocalFileSystem instead of returning Unsupported — multiple callers (metastore, metrics_layer, etc.) rely on this and will break otherwise.
async fn list_with_delimiter( &self, _prefix: Option<object_store::path::Path>, ) -> Result<ListResult, ObjectStorageError> { - Err(ObjectStorageError::UnhandledError(Box::new( - std::io::Error::new( - std::io::ErrorKind::Unsupported, - "list_with_delimiter is not implemented for LocalFS", - ), - ))) + let store = object_store::local::LocalFileSystem::new_with_prefix(self.root.clone()) + .map_err(|e| ObjectStorageError::UnhandledError(Box::new(e)))?; + Ok(store.list_with_delimiter(_prefix.as_ref()).await?) }
🧹 Nitpick comments (3)
src/users/dashboards.rs (1)
205-207: Prefer replace over append to avoid duplicates if load() runs multiple times. Assigning replaces the in-memory snapshot atomically.
- let mut s = self.0.write().await;
- s.append(&mut this);
+ let mut s = self.0.write().await;
+ *s = this;
456-460: Don’t panic on bad JSON; minor refactor.
expect("parseable config is valid json")will panic. Prefer error propagation to preserve metastore error semantics.- let mut config = serde_json::from_slice::<ObjectStoreFormat>(&bytes) - .expect("parseable config is valid json"); + let mut config: ObjectStoreFormat = serde_json::from_slice(&bytes)?; @@ - Ok(serde_json::from_slice(&stream_metadata).expect("parseable config is valid json")) + Ok(serde_json::from_slice(&stream_metadata)?)Also applies to: 465-469, 478-483
510-526: Treat NotFound differently; propagate other metastore errors.Swallowing all errors by
if let Ok(...)hides real failures. Consider matching on the error and only ignoring 404.- if let Ok(querier_stream_json_bytes) = - PARSEABLE.metastore.get_stream_json(stream_name, true).await - { + match PARSEABLE.metastore.get_stream_json(stream_name, true).await { + Ok(querier_stream_json_bytes) => { ... - return Ok(stream_metadata_bytes); - } + return Ok(stream_metadata_bytes); + } + Err(e) if e.status_code() == http::StatusCode::NOT_FOUND => {} + Err(e) => return Err(ObjectStorageError::MetastoreError(Box::new(e.to_detail()))), + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/handlers/http/logstream.rs (7 hunks)
- src/storage/localfs.rs (5 hunks)
- src/storage/object_storage.rs (16 hunks)
- src/users/dashboards.rs (7 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
- src/handlers/http/logstream.rs
- src/storage/object_storage.rs
📚 Learning: 2025-03-28T06:17:01.201Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.
Applied to files:
src/handlers/http/logstream.rs
📚 Learning: 2025-09-06T04:26:17.191Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
- src/handlers/http/logstream.rs
- src/storage/object_storage.rs
📚 Learning: 2025-09-11T06:32:03.705Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/storage/localfs.rs:142-153
Timestamp: 2025-09-11T06:32:03.705Z
Learning: In LocalFS implementation, the root path is typically either "/" for Unix systems or a drive letter path for Windows systems. The user mentioned that substring matching against sliced root paths works in their context, but this approach can still have security vulnerabilities with path traversal and cross-platform issues.
Applied to files:
src/storage/localfs.rs
📚 Learning: 2025-09-11T06:35:24.721Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/storage/azure_blob.rs:736-742
Timestamp: 2025-09-11T06:35:24.721Z
Learning: The Azure Blob Storage client's `list_with_delimiter()` method handles Azure-specific implementation details internally, including proper root listing behavior and path normalization, so manual prefix handling is not needed when delegating to this method.
Applied to files:
src/storage/localfs.rs
📚 Learning: 2025-05-01T10:27:56.858Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
Applied to files:
src/users/dashboards.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Applied to files:
src/storage/object_storage.rs
🧬 Code graph analysis (4)
src/handlers/http/logstream.rs (2)
src/parseable/mod.rs (3)
- storage (282-284)
- serde_json (338-338)
- serde_json (344-344)

src/handlers/http/ingest.rs (1)
- error_response (518-529)

src/storage/localfs.rs (5)

src/storage/object_storage.rs (1)
- list_with_delimiter (272-275)

src/storage/gcs.rs (1)
- list_with_delimiter (657-662)

src/storage/azure_blob.rs (1)
- list_with_delimiter (736-741)

src/storage/s3.rs (1)
- list_with_delimiter (853-858)

src/storage/metrics_layer.rs (1)
- list_with_delimiter (230-238)
src/users/dashboards.rs (2)
src/metastore/metastore_traits.rs (2)
- get_object_path (163-163)
- get_object_id (164-164)

src/users/filters.rs (2)
- get_object_path (50-57)
- get_object_id (59-61)
src/storage/object_storage.rs (7)
src/handlers/http/mod.rs (1)
- fetch_schema (90-95)

src/storage/localfs.rs (3)
- list_with_delimiter (474-484)
- new (100-102)
- from (583-585)

src/storage/gcs.rs (1)
- list_with_delimiter (657-662)

src/storage/azure_blob.rs (1)
- list_with_delimiter (736-741)

src/storage/s3.rs (4)
- list_with_delimiter (853-858)
- s (179-179)
- from (862-870)
- from (874-876)

src/parseable/mod.rs (6)
- create_stream (647-728)
- new (178-192)
- serde_json (338-338)
- serde_json (344-344)
- custom_partition (789-789)
- custom_partition (961-961)

src/catalog/mod.rs (1)
- update_snapshot (108-129)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
🔇 Additional comments (22)
src/users/dashboards.rs (5)
21-21: LGTM: RelativePathBuf import is appropriate for metastore paths.
28-31: LGTM: Clean wiring of DASHBOARDS_DIR/USERS_ROOT_DIR, MetastoreObject, and PARSEABLE.
215-221: LGTM: save_dashboard abstraction matches metastore responsibility.
307-313: LGTM: In-memory removal after metastore delete keeps state consistent.
218-218: No change needed — MetastoreErrorDetail is preserved.

DashboardError holds MetastoreError, and error_response already calls e.to_detail(); elsewhere, MetastoreError is converted with ObjectStorageError::MetastoreError(Box::new(e.to_detail())). No changes required.
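For readers following this pattern elsewhere, here is a minimal sketch of the boundary conversion described in the learnings. The stand-in types below only approximate the project's real ones, so treat the field set and signatures as assumptions:

```rust
// Stand-ins so the sketch compiles on its own; the real types live in the
// Parseable codebase.
#[derive(Debug)]
struct MetastoreErrorDetail {
    operation: String,
    message: String,
    stream_name: Option<String>,
}

#[derive(Debug)]
struct MetastoreError {
    detail: MetastoreErrorDetail,
}

impl MetastoreError {
    // Per the learnings above: returns the structured detail, not a string.
    fn to_detail(self) -> MetastoreErrorDetail {
        self.detail
    }
}

#[derive(Debug)]
enum ObjectStorageError {
    MetastoreError(Box<MetastoreErrorDetail>),
}

// The boundary conversion used across handlers and storage: box the
// structured detail so context survives the layer crossing.
fn to_storage_err(e: MetastoreError) -> ObjectStorageError {
    ObjectStorageError::MetastoreError(Box::new(e.to_detail()))
}
```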
src/handlers/http/logstream.rs (5)
31-31: LGTM: new imports align with metastore flow.
91-94: LGTM: list() correctly pivots to metastore and propagates errors.
414-425: LGTM: persist hot-tier flag via metastore JSON.
585-587: LGTM: add MetastoreError to StreamError.
623-637: LGTM: JSON error body for metastore failures matches ingest handler behavior.

src/storage/object_storage.rs (12)
25-25: LGTM: imports for ListResult, Snapshot, users paths, and stream metadata constants.
Also applies to: 46-46, 53-53, 67-67
283-289: LGTM: commit schema then stream JSON through metastore with proper error mapping.
Also applies to: 290-295
304-317: LGTM: time_partition_limit update via metastore JSON.
326-339: LGTM: custom_partition update path is consistent.
348-361: LGTM: log_source update via metastore JSON.
392-405: LGTM: first_event_at updater; maps metastore errors to storage errors.
414-429: LGTM: put_stats persists through metastore JSON.
436-450: LGTM: put_retention writes to metastore JSON and returns early with Ok(..).
596-608: LGTM: create_schema_from_metastore aligns with new fetch/put flow.
667-713: LGTM: directory-walk for first/latest timestamp avoids manifest I/O.

Nice improvement; clear min/max selection and RFC3339 formatting.
Also applies to: 714-789
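To make the min/max selection concrete, a small sketch under stated assumptions: partition directories are named like `date=2025-09-11`, and `first_and_latest` is a hypothetical helper rather than the function under review:

```rust
use chrono::{NaiveDate, TimeZone, Utc};

/// Pick the earliest and latest event dates from partition directory names
/// like "date=2025-09-11", returning RFC3339 strings.
fn first_and_latest(dirs: &[String]) -> Option<(String, String)> {
    let mut dates: Vec<NaiveDate> = dirs
        .iter()
        .filter_map(|d| d.strip_prefix("date="))
        .filter_map(|d| NaiveDate::parse_from_str(d, "%Y-%m-%d").ok())
        .collect();
    dates.sort_unstable();
    let (first, latest) = (dates.first()?, dates.last()?);
    // Render midnight UTC of each date as an RFC3339 timestamp.
    let to_rfc3339 = |d: &NaiveDate| {
        Utc.from_utc_datetime(&d.and_hms_opt(0, 0, 0).unwrap())
            .to_rfc3339()
    };
    Some((to_rfc3339(first), to_rfc3339(latest)))
}
```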
918-920: LGTM: manifest updates delegated to catalog::update_snapshot.
958-958: LGTM: reuse cached object store via provider().get_object_store().
Force-pushed from f61e969 to 293e2ae
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/storage/object_storage.rs (1)
616-627: Propagate metastore errors instead of silently ignoring them.

The code currently ignores errors from `get_all_stream_jsons`, which could mask real issues. This should propagate the error properly.

```diff
     let mut all_log_sources: Vec<LogSourceEntry> = Vec::new();
-    let stream_metas = PARSEABLE
-        .metastore
-        .get_all_stream_jsons(stream_name, None)
-        .await;
-    if let Ok(stream_metas) = stream_metas {
-        for stream_meta in stream_metas.iter() {
-            if let Ok(stream_meta) = serde_json::from_slice::<ObjectStoreFormat>(stream_meta) {
-                // fetch unique log sources and their fields
-                all_log_sources.extend(stream_meta.log_source.clone());
-            }
+    let stream_metas = PARSEABLE
+        .metastore
+        .get_all_stream_jsons(stream_name, None)
+        .await
+        .map_err(|e| ObjectStorageError::MetastoreError(Box::new(e.to_detail())))?;
+    for stream_meta in stream_metas.iter() {
+        if let Ok(stream_meta) = serde_json::from_slice::<ObjectStoreFormat>(stream_meta) {
+            // fetch unique log sources and their fields
+            all_log_sources.extend(stream_meta.log_source.clone());
         }
     }
```
🧹 Nitpick comments (3)
src/storage/s3.rs (3)
19-20: Remove unused import `HashSet`.

The import of `HashSet` on line 20 is no longer used after the removal of `_list_streams` and related functionality.

```diff
 use std::{
-    collections::HashSet,
     fmt::Display,
```
426-458: Consider removing or implementing commented manifest listing code.

The `_list_manifest_files` function is commented out but left in the codebase. If this functionality is no longer needed with the metastore abstraction, it should be removed entirely. If it might be needed in the future, consider adding a TODO comment explaining why it's retained.

Remove the entire commented block if the functionality is deprecated, or add a clear TODO comment if it's planned for future implementation:

```diff
-    // async fn _list_manifest_files(
-    //     &self,
-    //     stream: &str,
-    // ) -> Result<BTreeMap<String, Vec<String>>, ObjectStorageError> {
-    //     ...
-    // }
+    // TODO: Implement manifest listing via metastore if needed
+    // Previously implemented direct object storage listing
```
789-796: Remove commented-out code for list_manifest_files.

The public interface method `list_manifest_files` is also commented out. Since this appears to be deprecated functionality replaced by the metastore abstraction, it should be removed entirely.

```diff
-    // async fn list_manifest_files(
-    //     &self,
-    //     stream_name: &str,
-    // ) -> Result<BTreeMap<String, Vec<String>>, ObjectStorageError> {
-    //     let files = self._list_manifest_files(stream_name).await?;
-
-    //     Ok(files)
-    // }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/handlers/http/logstream.rs (7 hunks)
- src/storage/localfs.rs (5 hunks)
- src/storage/object_storage.rs (16 hunks)
- src/storage/s3.rs (8 hunks)
- src/users/dashboards.rs (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/handlers/http/logstream.rs
- src/storage/localfs.rs
- src/users/dashboards.rs
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
📚 Learning: 2025-09-11T06:35:24.721Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/storage/azure_blob.rs:736-742
Timestamp: 2025-09-11T06:35:24.721Z
Learning: The Azure Blob Storage client's `list_with_delimiter()` method handles Azure-specific implementation details internally, including proper root listing behavior and path normalization, so manual prefix handling is not needed when delegating to this method.
Applied to files:
src/storage/s3.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-09-06T04:26:17.191Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Applied to files:
src/storage/object_storage.rs
🧬 Code graph analysis (2)
src/storage/s3.rs (7)
src/storage/object_storage.rs (2)
- parseable_json_path (1054-1056)
- list_with_delimiter (272-275)

src/storage/mod.rs (1)
- to_object_store_path (293-295)

src/metastore/metastores/object_store_metastore.rs (2)
- resp (454-460)
- resp (748-759)

src/storage/azure_blob.rs (3)
- resp (713-718)
- resp (728-733)
- list_with_delimiter (736-741)

src/storage/gcs.rs (3)
- resp (634-639)
- resp (649-654)
- list_with_delimiter (657-662)

src/storage/localfs.rs (1)
- list_with_delimiter (474-484)

src/storage/metrics_layer.rs (1)
- list_with_delimiter (230-238)
src/storage/object_storage.rs (7)
src/handlers/http/mod.rs (1)
- fetch_schema (90-95)

src/storage/localfs.rs (3)
- list_with_delimiter (474-484)
- new (100-102)
- from (583-585)

src/storage/s3.rs (4)
- list_with_delimiter (853-858)
- s (179-179)
- from (862-870)
- from (874-876)

src/storage/azure_blob.rs (1)
- list_with_delimiter (736-741)

src/storage/gcs.rs (1)
- list_with_delimiter (657-662)

src/parseable/mod.rs (6)
- create_stream (647-728)
- new (178-192)
- serde_json (338-338)
- serde_json (344-344)
- custom_partition (789-789)
- custom_partition (961-961)

src/catalog/mod.rs (1)
- update_snapshot (108-129)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
🔇 Additional comments (10)
src/storage/s3.rs (5)
39-44: LGTM! Clean implementation of list_with_delimiter.

The `ListResult` import and the new `list_with_delimiter` implementation are clean; the method properly delegates to the underlying client with consistent error handling.
56-58: Good consolidation of imports.

The cleanup of removed imports (`USERS_ROOT_DIR`, `BTreeMap`) and the removal of public exports (`SCHEMA_FILE_NAME`, `STREAM_ROOT_DIRECTORY`) align well with the metastore abstraction changes.
349-349: Fix error propagation with the `?` operator.

Good fix replacing `unwrap()` with `?` to properly propagate errors instead of panicking.
692-696: LGTM! Proper error handling for list_streams.

Returning an explicit error message instead of implementing undefined behavior is a good defensive programming practice. This clearly indicates that S3 doesn't support the legacy list_streams method.
853-858: LGTM! Clean implementation of list_with_delimiter.

The implementation properly delegates to the underlying client and correctly handles the optional prefix by using `as_ref()`.

src/storage/object_storage.rs (5)
272-275: LGTM! Well-designed trait method addition.

The `list_with_delimiter` method is properly defined as an async trait method with appropriate error handling. All storage backends have corresponding implementations.
283-288: LGTM! Clean schema ownership handling.

The code properly clones the inner Schema from the Arc reference and passes it by value to the metastore, avoiding any reference clone issues.
988-992: LGTM! Proper error handling for Schema::try_merge.

Excellent fix replacing `unwrap()` with proper error propagation via `map_err`. This prevents potential panics during schema merging.
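As a sketch of the pattern being praised here: arrow-schema's `Schema::try_merge` returns a `Result`, so incompatibilities can be mapped into the caller's error type instead of unwrapped (`StorageError` below is a stand-in, not the project's ObjectStorageError):

```rust
use arrow_schema::{ArrowError, Schema};

#[derive(Debug)]
struct StorageError(String); // stand-in for the project's error type

// Merge candidate schemas, surfacing incompatibilities as a typed error
// instead of panicking via unwrap().
fn merge_schemas(schemas: Vec<Schema>) -> Result<Schema, StorageError> {
    Schema::try_merge(schemas).map_err(|e: ArrowError| StorageError(e.to_string()))
}
```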
919-919: Good removal of unnecessary await on a non-async function.

The removal of `await` is correct since `catalog::update_snapshot` is not an async function, based on the relevant code snippets.
958-958: LGTM! Consistent usage of storage abstraction.

The change to use `PARSEABLE.storage().get_object_store()` is consistent with the new metastore architecture pattern.
This PR introduces the Metastore to Parseable. Anything that is not parquet can be considered metadata. Until now, all metadata was stored as objects in object storage. The new traits `Metastore` and `MetastoreObject` allow users to implement their own metastores; by default, Parseable uses the object store as its metastore (a sketch of the extension point follows under Description).

Description
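As an illustration of that extension point, here is a minimal sketch of the object side of the contract. The trait shape is inferred from src/metastore/metastore_traits.rs and the filters/dashboards implementations referenced in the review above, and `SavedQuery` is a hypothetical type, so treat the exact signatures as assumptions:

```rust
use relative_path::RelativePathBuf;

// Assumed shape of the trait, per the code-graph references above.
trait MetastoreObject: serde::Serialize {
    fn get_object_path(&self) -> RelativePathBuf;
    fn get_object_id(&self) -> String;
}

// Hypothetical metadata object used only for this sketch.
#[derive(serde::Serialize)]
struct SavedQuery {
    id: String,
    user_id: String,
    sql: String,
}

impl MetastoreObject for SavedQuery {
    // Mirrors the users/<user>/... layout that filters and dashboards use.
    fn get_object_path(&self) -> RelativePathBuf {
        RelativePathBuf::from(format!("users/{}/queries/{}.json", self.user_id, self.id))
    }

    fn get_object_id(&self) -> String {
        self.id.clone()
    }
}
```

A custom backend would pair such objects with its own `Metastore` implementation (get/put/delete over a database, for instance) instead of the default object-store-backed one.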
This PR has:
Summary by CodeRabbit
New Features
Refactor
User-visible behavior
Chores