-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat(libstore): make cycle errors more verbose #14218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Co-Authored-By: Milan Hauth <[email protected]>
Previously, `CycleEdgeScanSink::operator()` copied the entire `getResult()` `StringSet` twice on every 64KB chunk to detect newly found hashes. For large files, this created O(n * chunks) overhead. Now we track which hashes have been recorded for the current file using `recordedForCurrentFile`, avoiding the set copies. The insert() returns true only for newly seen hashes, making this O(1) per hash found.
The `hashPathMap` was being passed to `CycleEdgeScanSink` and stored as a member variable, but was never actually used. The sink only needs the hash strings for detection via `RefScanSink`, not the full `StorePath` mapping.
Moved the `edges` member variable from public to private section for better encapsulation. Access is provided through the `getEdges()` method.
Replaced raw `read()` with `readFull()` helper, which properly handles partial reads and `EINTR`. The previous code manually checked for errors but didn't handle the case where `read()` returns fewer bytes than requested.
After refactoring to use `RefScanSink`, we no longer manually search for hashes in buffers, so the `refLength` constant (hash length) is unused. `RefScanSink` handles this internally.
src/libstore/build/find-cycles.cc
Outdated
auto st = lstat(path); | ||
|
||
debug("scanForCycleEdges2: scanning path = %s", path); | ||
|
||
if (S_ISREG(st.st_mode)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would want to do this with std::filesystem
stuff, so it is portable to Windows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we reuse NarAccessor to avoid the filesystem stuff completely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. In generally I do get the sense that there is a lot of deduplication that can still be done here. I just haven't looked into the details yet.
(Also, FWIW, my plan with CA derivations is to make cycles impossible by construction.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored it to use NarAccessor
The single-pass greedy algorithm could fail to connect all edges if they arrived in certain orders. For example, edges A→B, C→D, D→A, B→C would result in two paths [D→A→B→C] and [C→D] instead of one complete cycle. Added a second pass that repeatedly tries to merge existing multiedges with each other until no more merges are possible. This ensures we find complete cycle paths regardless of edge discovery order.
…File APIs Replaced POSIX-specific file operations with portable alternatives to improve Windows compatibility.
Added explicit `#include "nix/util/archive.hh"` since we use the caseHackSuffix constant from it.
Replace string concatenation for path joining with type-safe `std::filesystem::path operator/`.
…anPath The "2" suffix was unclear and didn't communicate the function's purpose. The new name better describes what it does: walks the filesystem tree and scans each file using the provided sink.
Changed `walkAndScanPath` to take `std::filesystem::path` instead of `string`, eliminating repeated string→path conversions throughout the function.
…acOS only Since caseHackSuffix is only used inside `#ifdef __APPLE__` blocks, gate the archive.hh include with the same condition.
Move `find-cycles.hh` to `src/libstore/include/nix/store/build/` to ensure it is properly installed as a public header and can be used by tests and other consumers of the library.
Replace the multi-pass O(n²) algorithm with a single-pass O(n) approach using hashmaps to track path endpoints. Before: - First pass: O(n*m) to join each edge with existing multiedges - Second pass: O(m²) repeated merging until no more merges possible - Required multiple iterations through the entire dataset Now: - Single O(n) pass through edges - Two hashmaps (pathStartingAt, pathEndingAt) enable O(1) lookups - Immediately identifies connections without linear searches - Handles all cases in one pass: * Extending existing paths (prepend/append) * Joining two separate paths * Forming cycles when a path connects to itself
Make the cyclic outputs test compatible with older daemons that don't have the enhanced cycle detection. The test now accepts both: - New behavior: detailed cycle analysis with file paths - Old behavior: basic cycle error message This ensures daemon compatibility tests pass when running the new test suite against older daemon versions (e.g., 2.28.4).
Replace direct `std::filesystem` operations with `SourceAccessor`..
d7e7203
to
6c39e5e
Compare
Motivation
Attempting to revive the extremely useful #6585
Context
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.