Skip to content

Conversation

@parmesant
Copy link
Contributor

@parmesant parmesant commented Nov 25, 2025

Remove the cluster scheduler task as it's not required anymore

Description


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed a typo in an analytics log message.
  • Refactor

    • Removed automatic periodic ingestion of cluster and billing metrics.
    • Made cluster billing metrics retrieval accessible to other parts of the system.
    • Reduced informational logging to lower noise.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 25, 2025

Walkthrough

This PR fixes a typo in an analytics log message and removes the periodic cluster metrics scheduler and its wiring, making the cluster billing metrics fetch function public.

Changes

Cohort / File(s) Summary
Analytics typo
src/analytics.rs
Corrected log message: "schedular" → "scheduler" in init_analytics_scheduler; removed an extra blank line in build_metrics (formatting only).
Cluster metrics refactor
src/handlers/http/cluster/mod.rs
Removed scheduler imports (AsyncScheduler, Interval), CLUSTER_METRICS_INTERVAL_SECONDS, and init_cluster_metrics_schedular; dropped ingest_internal_stream usage; made fetch_cluster_billing_metrics public (pub async fn); reduced logging imports by removing info!.

Sequence Diagram(s)

Previous flow (with scheduler)

sequenceDiagram
  participant Scheduler as Scheduler (AsyncScheduler)
  participant Cluster as ClusterModule
  participant Fetch as fetch_cluster_billing_metrics()
  participant Ingest as ingest_internal_stream()

  Note over Scheduler,Cluster: periodic job
  Scheduler->>Cluster: trigger job
  Cluster->>Fetch: call fetch_cluster_billing_metrics()
  Fetch-->>Cluster: return metrics
  Cluster->>Ingest: ingest metrics
  Ingest-->>Cluster: ack
Loading

New flow (scheduler removed, fetch public)

sequenceDiagram
  participant Caller as External Caller / Task
  participant Fetch as pub fetch_cluster_billing_metrics()
  participant Cluster as ClusterModule

  Note over Caller,Fetch: on-demand or caller-managed scheduling
  Caller->>Fetch: call fetch_cluster_billing_metrics()
  Fetch-->>Caller: return metrics
  Note over Caller,Cluster: caller decides ingestion/wiring
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas to review:
    • Call sites that previously relied on the internal scheduler — ensure they now invoke fetch_cluster_billing_metrics or an alternative scheduling mechanism.
    • Confirm removal of ingest_internal_stream references doesn't leave unreachable code or missing imports elsewhere.
    • Validate visibility change of fetch_cluster_billing_metrics does not expose unintended internals.

Possibly related PRs

Poem

🐰 I hopped through code at break of day,
Found a typo hopping in my way.
Scheduler boxed and set aside,
Metrics fetch now steps outside.
A tidy burrow, clean and bright — hooray! 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description is largely incomplete, providing only a single-line intent without detailed goals, rationale, key changes, or completion of required checklist items. Expand the description to include the rationale for removal, list key code changes (e.g., removed imports, functions, and logging), and check the testing/documentation checklist items or explain why they're not applicable.
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'Updates for parseable cluster scheduler' is overly vague and generic, using non-descriptive phrasing that doesn't clearly convey the main change of removing the cluster scheduler. Use a more specific title like 'Remove cluster scheduler task' or 'Remove cluster metrics scheduler' to clearly indicate the primary change.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f706208 and 159cf7d.

📒 Files selected for processing (2)
  • src/analytics.rs (1 hunks)
  • src/handlers/http/cluster/mod.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/analytics.rs
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-10-20T17:48:53.444Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1448
File: src/handlers/http/cluster/mod.rs:1370-1400
Timestamp: 2025-10-20T17:48:53.444Z
Learning: In src/handlers/http/cluster/mod.rs, the billing metrics processing logic should NOT accumulate counter values from multiple Prometheus samples with the same labels. The intended behavior is to convert each received counter from nodes into individual events for ingestion, using `.insert()` to store the counter value directly.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-20T06:28:46.051Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1409
File: src/catalog/mod.rs:472-492
Timestamp: 2025-08-20T06:28:46.051Z
Learning: In the Parseable codebase, the `domain_name` field in NodeMetadata/IngestorMetadata always includes a trailing slash (e.g., "http://host:port/"). This is ensured by the `to_url_string` function in src/handlers/http/cluster/utils.rs which formats URLs as `format!("http://{str}/")`. Therefore, when concatenating with `base_path_without_preceding_slash()` which returns paths like "api/v1", the pattern `"{}{}"` is correct and produces valid URLs like "http://host:port/api/v1".

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-18T09:59:20.177Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.

Applied to files:

  • src/handlers/http/cluster/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (2)
src/handlers/http/cluster/mod.rs (2)

38-38: LGTM - Cleaned up unused import.

The removal of info! from the tracing imports is consistent with the removal of the cluster scheduler functionality.


1485-1485: Repository access unavailable for verification

I'm unable to execute shell scripts to verify the callers and access controls due to repository cloning failures. The original review comment appropriately requests verification of:

  1. New caller/HTTP endpoint invoking this function
  2. Authentication and authorization controls if exposed
  3. Rate limiting for concurrent network calls to cluster nodes

Given that fetch_cluster_billing_metrics performs expensive operations (concurrent network calls to all cluster nodes) and has been made public, these verification points remain valid. However, I cannot confirm or refute whether these concerns have been addressed in the PR without access to the repository.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 25, 2025

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 25, 2025
@parmesant
Copy link
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@nikhilsinhaparseable nikhilsinhaparseable merged commit 53fe248 into parseablehq:main Nov 25, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants