@gvdongen gvdongen commented Feb 25, 2025

I tried to consolidate the cluster documentation between

This might make it easier to find the needed information.
I didn't remove anything, just restructured it.
I prefer not having a restatectl page with too much output on it because this will get outdated soon.

A follow-up PR will collapse the terminal output together with the command to make the page more concise: https://terminal-improvements.documentation-beg.pages.dev/operate/clusters

@gvdongen gvdongen changed the title from Polishing to Clean up cluster documentation Feb 25, 2025
cloudflare-workers-and-pages bot commented Feb 26, 2025

Deploying documentation with Cloudflare Pages

Latest commit: 800e646
Status: ✅  Deploy successful!
Preview URL: https://f044379c.documentation-beg.pages.dev
Branch Preview URL: https://polishing.documentation-beg.pages.dev

@gvdongen gvdongen marked this pull request as ready for review February 26, 2025 16:18
@tillrohrmann tillrohrmann left a comment

LGTM. Thanks for updating the cluster documentation @gvdongen. +1 for merging :-)

Comment on lines 150 to 170
```shell
restatectl status
```

<details className={"grey-details"}>
<summary>Output</summary>

```
Node Configuration (v3)
NODE GEN NAME ADDRESS ROLES
N1 2 n1 http://127.0.0.1:5122/ admin | log-server | metadata-server | worker

Log Configuration (v2)
Default Provider Config: Local
L-ID FROM-LSN KIND LOGLET-ID REPLICATION SEQUENCER NODESET
0 1 Local N/A N/A N/A N/A

Alive partition processors (nodes config v3, partition table v2)
P-ID NODE MODE STATUS LEADER EPOCH SEQUENCER APPLIED-LSN ARCHIVED-LSN LAST-UPDATE
0 N1:2 Leader Active N1:2 e1 1 - 615 ms ago
```
Contributor

By now, this is already outdated.

Contributor Author

I will just remove it for now, while it is still changing often.

```
</details>

The cluster controller reconfigures the log nodeset to exclude `N1`. Depending on the configured log replication level, you may see a warning about compromised availability or, if insufficient log servers are available to achieve the minimum required replication, the log will stop accepting writes altogether. Take care, as `restatectl` does not currently check whether the cluster will be able to generate new nodesets with the remaining log servers.
Contributor

We are now checking whether it is possible to create new node sets after marking a given node or set of nodes as read-only.

@pcholakov
Contributor

Hey @gvdongen, I merged #556 and took the liberty of merging the latest main into your PR branch. I have added a new section to the troubleshooting page, which is now a heading under the clusters page. I am not super fond of having these procedures collapsed by default: if an operator needs to discover and follow these guides urgently, under pressure, this might add unnecessary friction.

A secondary concern is having a simple, clean URL that we can print in the restate-server logs if needed (see this pending change). I'm going to hold off on merging that PR until we've merged this one; my main interest is in having a relatively short, SEO-friendly URL that points directly to the steps for fixing the issue.

I moved the new troubleshooting sub-section to the cluster page to resolve the conflict and keep the content from being deleted. I've tried to keep it in the style of the one you created, but the combination of collapsed details + steps doesn't look very good visually. I expect that we'll have more of these "standard operating procedure" type entries in the docs in the future, and I think it's worth planning for them to be separate pages to avoid overwhelming those following them.

gvdongen commented Mar 4, 2025

@pcholakov
Your changes look good!

To keep the URLs compact, we can override the anchor for the subsection heading with `### My long header {#custom-id}`. What do you think?

The problem with separate pages is that it creates a long sidebar that is not intuitive and makes information hard to discover. So that's why for now I would just keep it together.

I will have a look at the steps inside the details section.

@gvdongen gvdongen merged commit 9fa6983 into main Mar 5, 2025
2 checks passed
@pcholakov
Contributor

This is great, thank you very much @gvdongen! 🚢

@gvdongen gvdongen deleted the polishing branch March 5, 2025 08:38