An Enhancement Proposal Workflow #1674

janfb · 2025-10-13T18:56:56Z

I suggest we adopt an Enhancement Proposal (EP) workflow to better organize and discuss larger refactorings and API changes. EPs should live on the docs website and have a correspondng GH discussions entry for public discussions.
This will help us plan and reach consensus on major changes (e.g., roadmap toward sbi 1.0, deprecations, architectural refactors), while keeping the process transparent and easy to follow.

I created a corresponding entry on the website, an explanation of the workflow and an example EP-01 for the planned training infrastructure refactoring (numfocus/small-development-grant-proposals#60).

In general, I see this as a first step towards a more transparent and professional governance model.

All open for discussions, let's discuss in next week's org meeting.

codecov · 2025-10-13T19:03:36Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.63%. Comparing base (4d1dc5f) to head (0e3a5d8).
⚠️ Report is 7 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1674      +/-   ##
==========================================
- Coverage   87.77%   84.63%   -3.15%     
==========================================
  Files         134      137       +3     
  Lines       11126    11473     +347     
==========================================
- Hits         9766     9710      -56     
- Misses       1360     1763     +403

Flag	Coverage Δ
unittests	`84.63% <ø> (-3.15%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 48 files with indirect coverage changes

michaeldeistler

Thanks!

The general idea of EPs is great, I really like it! We should also advertise this on the contributing.md (or even require it for "external" PRs).

I do have doubts about the new abstractions though. For example, we currently have EarlyStopping. In the future, we would probably also have LRScheduler, ClippingMethod, and possibly more. Should we really aim to build the ultimate training loop? I think this has a few downsides:

all of these features have to be well-documented (otherwise will forever be unused)
they create more abstractions, which will make it harder for new maintainers
more testing...

The simple alternative would be to simply point people to the flexible training loop. Everybody can use an LLM to get, for example, a learning rate scheduler. Of course, this requires a bit more work for users, but getting to know yet another sbi abstraction (which includes reading and understanding documentation, testing the feature, making sure it works,...) is also non-zero work for users.

michaeldeistler · 2025-10-14T07:30:20Z

docs/proposals/ep-01-pluggable-training.md

+
+### For Maintainers
+
+- **Reduced code duplication**: Extract shared training logic into reusable components


I am fine with doing this for this PR, but in general, I don't think that code duplication is always a bad thing. Introducing "reusable components" adds cognitive load and will make the package harder to maintain in the future.

agreed, not all code duplication is avoidable and sometimes it's better to keep them in favor of adding more complexity and routing in the "reusable components".
Regarding the plan for our standard methods, e.g., NPE, NLE, NRE etc, I think it does make sense to centralize the loop and it will make the code more readable and better maintainable.

michaeldeistler · 2025-10-14T07:32:47Z

mkdocs/docs/proposals/ep-01-pluggable-training.md

+
+- **Seamless experiment tracking**: Support for TensorBoard, WandB, MLflow, and stdout
+  without changing existing code
+- **Multiple early stopping strategies**: Patience-based, plateau detection, and custom


is there any evidence that patience-based is not working well? I would only add more methods this if it is really low-effort to implement and maintain other methods or if it they are really important for performance.

Also agreed! In general, my aim would be to add really lightweight interfaces for adding external tools, and not implement those external tools ourselves internally. E.g, for the logging, it would be great to just add the possibility of connecting other common loggers such that users create the logger instance on their side and just pass the logger and we call the logger just like before, i.e., agnostic to which logger it really is.

A similar approach should be taken for the early stopping. We should check, whether we really need this, and if so, aim for a lightweight interface to support them, and otherwise refer to the flexible training interface.

michaeldeistler · 2025-10-14T07:33:20Z

docs/proposals/ep-01-pluggable-training.md

+    learning_rate=5e-4,
+    max_epochs=1000,
+    device="cuda"
+)


I don't really see the point of this. All of our train config args are ints, floats, or bools. Why do we need this additional abstraction?

The current API mixes training-specific args (common to all sbi methods) and algorithm-specific args, e.g., exclude_invalid_x, resume_training. Adding the dataclass layers helps separating those. Additionally, it helps the user to see which options actually exist and what they do more easily (easier than checking the API ref on the website).

Adding the dataclass layers helps separating those. Additionally, it helps the user to see which options actually exist and what they do more easily (easier than checking the API ref on the website).

I don't understand this. Why is it more easy to look up the docs for TrainConfig() than to look up the docs for .train()?

With typed dataclasses you get IDE autocomplete and inline docs (arg names, types, defaults). I can imagine many users discover options this way instead of reading long docstrings in the source file or the API refs.

the IDE support, at least for my VSCode settings also look nicer for the dataclass configs, e.g. comparing the train() view vs TrainConfig view.

I really don't get it :D

I do get inline docs and autocomplete in VS code (and in jupyter lab) for the .train() method.

For me, the inline docs for .train() and MCMCPosteriorParameters are also formatted in the exact way.

Let's chat on Thursday

michaeldeistler · 2025-10-14T07:33:56Z

docs/proposals/ep-01-pluggable-training.md

+)
+
+# Method-specific loss configuration
+loss_args = LossArgsNPE(exclude_invalid_x=True)


I would not consider exclude_invalid_x to live in LossArgsNPE

Agreed, this should be somewhere else semantically, but where? it does not belong the training args either. Or we call the args NPEArgs.. that's some we need to decide upon implementation.

I think this is one of the downsides of grouping Args into categories.

True, we would need to come up with a meaningful grouping. However, at the moment there is non and the user just sees a bunch of arguments.

michaeldeistler · 2025-10-14T07:34:30Z

docs/proposals/ep-01-pluggable-training.md

+early_stop = EarlyStopping.validation_loss(patience=20, min_delta=1e-4)
+
+# Stop when learning rate drops too low
+early_stop = EarlyStopping.lr_threshold(min_lr=1e-6)


Why is this not a part of train_config?

Agreed, this is a place where we should simplify things and potentially gather all training related options in one place, e.g., TrainConfig. Then we need to discuss whether and how we want to offer early stopping options.

janfb · 2025-10-29T10:24:53Z

Thanks!

The general idea of EPs is great, I really like it! We should also advertise this on the contributing.md (or even require it for "external" PRs).

I do have doubts about the new abstractions though. For example, we currently have EarlyStopping. In the future, we would probably also have LRScheduler, ClippingMethod, and possibly more. Should we really aim to build the ultimate training loop? I think this has a few downsides:

all of these features have to be well-documented (otherwise will forever be unused)

they create more abstractions, which will make it harder for new maintainers

more testing...

The simple alternative would be to simply point people to the flexible training loop. Everybody can use an LLM to get, for example, a learning rate scheduler. Of course, this requires a bit more work for users, but getting to know yet another sbi abstraction (which includes reading and understanding documentation, testing the feature, making sure it works,...) is also non-zero work for users.

Thanks for the review @michaeldeistler ! I agree with most of your concerns (see comments to your comments) and I have changed the proposal accordingly, e.g., I added a "Non-goals" section to stress these points.

I also moved the proposals to the docs/ folder (had put them in mkdocs by mistake before).

add ep workflow and ep01

634a980

janfb requested a review from michaeldeistler October 13, 2025 18:57

michaeldeistler reviewed Oct 14, 2025

View reviewed changes

move proposals to docs, add non-goals

0e3a5d8

michaeldeistler approved these changes Oct 29, 2025

View reviewed changes

janfb merged commit 76facab into main Oct 30, 2025
9 checks passed

janfb deleted the enhancement-proposals branch October 30, 2025 16:58

janfb mentioned this pull request Oct 31, 2025

Add built-in WandB and TensorBoard logger adapters #1683

Open


		### For Maintainers

		- Reduced code duplication: Extract shared training logic into reusable components

An Enhancement Proposal Workflow #1674

An Enhancement Proposal Workflow #1674

Conversation

janfb commented Oct 13, 2025

Uh oh!

codecov bot commented Oct 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

michaeldeistler left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

janfb Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

janfb commented Oct 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Oct 13, 2025 •

edited

Loading

janfb Oct 29, 2025 •

edited

Loading