
Commit 03acdec

Add snapshot configuration links and examples to cluster pages
We want to steer users towards always using a snapshot repository.
- expand Docker Compose example with Minio
- update all cluster references to link to the snapshots page
- add strong recommendation to always use snapshots for clusters
1 parent 4d81af6 commit 03acdec

File tree

5 files changed: +68 -44 lines changed

docs/deploy/server/cluster/deployment.mdx

Lines changed: 5 additions & 1 deletion
@@ -11,7 +11,7 @@ import Admonition from '@theme/Admonition';
 This page describes how you can deploy a distributed Restate cluster.
 
 <Admonition type="tip" title="Quickstart using Docker">
-Check out the [Restate cluster guide](/guides/cluster) for a docker-compose ready-made example.
+Check out the [Restate cluster guide](/guides/cluster) for a Docker Compose ready-made example.
 </Admonition>
 
 <Admonition type="tip" title="Migrating an existing single-node deployment">
@@ -24,6 +24,10 @@ This page describes how you can deploy a distributed Restate cluster.
 To understand the terminology used on this page, it might be helpful to read through the [architecture reference](/references/architecture).
 </Admonition>
 
+<Admonition type="caution">
+Snapshots are essential to support safe log trimming and also allow you to set partition replication to a subset of all cluster nodes, while still allowing for fast partition fail-over to any live node. Snapshots are also necessary to add more nodes in the future.
+</Admonition>
+
 To deploy a distributed Restate cluster without external dependencies, you need to configure the following settings in your [server configuration](/operate/configuration/server):
 
 ```toml restate.toml
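
For reference, here is a hedged sketch of what such a `restate.toml` could contain. The keys are inferred from the `RESTATE_*` environment variables in this commit's Docker Compose example (double underscores nest TOML sections); the hostnames are illustrative, and this is not the exact block elided by the diff above.

```toml
# Sketch only: inferred from the RESTATE_* env vars used elsewhere in this commit
cluster-name = "restate-cluster"
roles = ["admin", "worker", "log-server", "metadata-server"]

[bifrost]
default-provider = "replicated"

[bifrost.replicated-loglet]
default-log-replication = 2  # every log record must be stored on at least 2 nodes

[metadata-server]
type = "replicated"

[metadata-client]
addresses = ["http://restate-1:5122", "http://restate-2:5122", "http://restate-3:5122"]
```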

docs/deploy/server/cluster/growing-cluster.mdx

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ This allows the new node to discover the metadata servers and join the cluster.
 <Admonition type="note" title="Growing the cluster in the future">
 If you plan to scale your cluster over time, we strongly recommend enabling snapshotting.
 Without it, newly added nodes may not be fully utilized by the system.
-See the [snapshotting documentation](/operate/data-backup#snapshotting) for more details.
+See the [snapshotting documentation](/operate/snapshots) for more details.
 </Admonition>
 
 <Admonition type="note" title="Shrinking the cluster">
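
As a concrete illustration of the admonition above, a node joining an already-provisioned cluster would be configured along these lines. This is a hedged sketch that mirrors the `RESTATE_AUTO_PROVISION: "false"` pattern used for `restate-2` and `restate-3` in the cluster guide; the node name and address are hypothetical.

```toml
# Hypothetical restate.toml for a fourth node joining the existing cluster
cluster-name = "restate-cluster"               # must match the running cluster
node-name = "restate-4"                        # illustrative
advertised-address = "http://restate-4:5122"   # illustrative; other nodes must be able to reach it
auto-provision = false                         # the cluster is already provisioned

[metadata-client]
addresses = ["http://restate-1:5122", "http://restate-2:5122", "http://restate-3:5122"]
```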

docs/guides/cluster.mdx

Lines changed: 55 additions & 39 deletions
@@ -19,88 +19,95 @@ This guide shows how to deploy a distributed Restate cluster consisting of 3 nodes
 
 <Step stepLabel="1" title="Deploy the Restate cluster using Docker">
 
-To deploy a 3 node distributed Restate cluster, copy the `docker-compose.yml` and run `docker compose up`.
+To deploy a 3-node distributed Restate cluster, create a file `docker-compose.yml` and run `mkdir restate-data object-store
+&& docker compose up`.
 
 ```yaml docker-compose.yml
-x-environment: &common-envs
-  RESTATE_CLUSTER_NAME: "my-cluster"
-  # In this setup every node fulfills every role.
-  RESTATE_ROLES: '["admin","worker","log-server","metadata-server"]'
-  # To customize logging, check https://docs.restate.dev/operate/monitoring/logging
+x-environment: &common-env
+  RESTATE_CLUSTER_NAME: "restate-cluster"
+  # Every node runs every role
+  RESTATE_ROLES: '["admin", "worker", "log-server", "metadata-server"]'
+  # For more on logging, see: https://docs.restate.dev/operate/monitoring/logging
   RESTATE_LOG_FILTER: "restate=info"
   RESTATE_BIFROST__DEFAULT_PROVIDER: "replicated"
-  RESTATE_BIFROST__REPLICATED_LOGLET__DEFAULT_LOG_REPLICATION: 2
+  RESTATE_BIFROST__REPLICATED_LOGLET__DEFAULT_LOG_REPLICATION: 2 # We require a minimum of 2 nodes to accept writes
   RESTATE_METADATA_SERVER__TYPE: "replicated"
-  # This needs to be configured with the hostnames/ports the nodes can use to talk to each other.
-  # In this setup, they interact within the "internal" Docker compose network setup.
+  # The addresses where nodes can reach each other over the "internal" Docker Compose network
   RESTATE_METADATA_CLIENT__ADDRESSES: '["http://restate-1:5122","http://restate-2:5122","http://restate-3:5122"]'
+  # Partition snapshotting, see: https://docs.restate.dev/operate/snapshots
+  RESTATE_WORKER__SNAPSHOTS__DESTINATION: "s3://restate/snapshots"
+  RESTATE_WORKER__SNAPSHOTS__SNAPSHOT_INTERVAL_NUM_RECORDS: "1000"
+  RESTATE_WORKER__SNAPSHOTS__AWS_REGION: "local"
+  RESTATE_WORKER__SNAPSHOTS__AWS_ENDPOINT_URL: "http://minio:9000"
+  RESTATE_WORKER__SNAPSHOTS__AWS_ALLOW_HTTP: true
+  RESTATE_WORKER__SNAPSHOTS__AWS_ACCESS_KEY_ID: "minioadmin"
+  RESTATE_WORKER__SNAPSHOTS__AWS_SECRET_ACCESS_KEY: "minioadmin"
+
+x-defaults: &defaults
+  image: docker.restate.dev/restatedev/restate:1.2
+  extra_hosts:
+    - "host.docker.internal:host-gateway"
 
 services:
   restate-1:
-    image: docker.restate.dev/restatedev/restate:1.2
+    <<: *defaults
     ports:
-      # Ingress port
-      - "8080:8080"
-      # Admin/UI port
-      - "9070:9070"
-      # Admin query port (psql)
-      - "9071:9071"
-      # Node port
-      - "5122:5122"
+      - "8080:8080" # Ingress
+      - "9070:9070" # Admin
+      - "5122:5122" # Node-to-node communication
     environment:
-      <<: *common-envs
+      <<: *common-env
       RESTATE_NODE_NAME: restate-1
       RESTATE_FORCE_NODE_ID: 1
-      # This needs to be configured with the hostname/port the other Restate nodes can use to talk to this node.
-      RESTATE_ADVERTISED_ADDRESS: "http://restate-1:5122"
-      # Only restate-1 provisions the cluster
-      RESTATE_AUTO_PROVISION: "true"
-    extra_hosts:
-      - "host.docker.internal:host-gateway"
+      RESTATE_ADVERTISED_ADDRESS: "http://restate-1:5122" # Other Restate nodes must be able to reach us using this address
+      RESTATE_AUTO_PROVISION: "true" # Only the first node provisions the cluster
 
   restate-2:
-    image: docker.restate.dev/restatedev/restate:1.2
+    <<: *defaults
     ports:
       - "25122:5122"
       - "29070:9070"
-      - "29071:9071"
       - "28080:8080"
     environment:
-      <<: *common-envs
+      <<: *common-env
       RESTATE_NODE_NAME: restate-2
       RESTATE_FORCE_NODE_ID: 2
       RESTATE_ADVERTISED_ADDRESS: "http://restate-2:5122"
-      # Only restate-1 provisions the cluster
       RESTATE_AUTO_PROVISION: "false"
-    extra_hosts:
-      - "host.docker.internal:host-gateway"
 
   restate-3:
-    image: docker.restate.dev/restatedev/restate:1.2
+    <<: *defaults
     ports:
       - "35122:5122"
       - "39070:9070"
-      - "39071:9071"
       - "38080:8080"
     environment:
-      <<: *common-envs
+      <<: *common-env
       RESTATE_NODE_NAME: restate-3
      RESTATE_FORCE_NODE_ID: 3
       RESTATE_ADVERTISED_ADDRESS: "http://restate-3:5122"
-      # Only restate-1 provisions the cluster
       RESTATE_AUTO_PROVISION: "false"
-    extra_hosts:
-      - "host.docker.internal:host-gateway"
+
+  minio:
+    image: quay.io/minio/minio
+    # volumes:
+    #   - object-store:/data
+    entrypoint: "/bin/sh"
+    # Ensure a bucket called "restate" exists on startup:
+    command: "-c 'mkdir -p /data/restate && /usr/bin/minio server --quiet /data'"
+    ports:
+      - "9000:9000"
 ```
 
-The cluster uses the `replicated` Bifrost provider and replicates data to 2 nodes.
+The cluster uses the `replicated` Bifrost provider and replicates log writes to a minimum of 2 nodes.
 Since we are running with 3 nodes, the cluster can tolerate 1 node failure without becoming unavailable.
+By default, partition state is replicated to all workers (though each partition has only one acting leader at a time).
 
 The `replicated` metadata cluster consists of all nodes since they all run the `metadata-server` role.
 Since the `replicated` metadata cluster requires a majority quorum to operate, the cluster can tolerate 1 node failure without becoming unavailable.
 
 Take a look at the [cluster deployment documentation](/deploy/server/cluster/deployment) for more information on how to configure and deploy a distributed Restate cluster.
-
+In this example we also deployed a Minio server to host the cluster snapshots bucket. Visit [Snapshots](/operate/snapshots) to learn more about why this is strongly recommended for all clusters.
 </Step>
 
 <Step stepLabel="2" title="Check the cluster status">
@@ -143,10 +150,19 @@ Take a look at the [cluster deployment documentation](/deploy/server/cluster/deployment)
 ```
 </Step>
 
+<Step stepLabel="7" title="Create snapshots">
+Try instructing the partition processors to create a snapshot of their state in the object store bucket:
+```shell
+docker compose exec restate-1 restatectl snapshot create
+```
+Navigate to the Minio console at [http://localhost:9000](http://localhost:9000) and browse the bucket contents (default credentials: `minioadmin`/`minioadmin`).
+</Step>
+
 <Step end={true} stepLabel="🎉" title="Congratulations, you managed to run your first distributed Restate cluster and simulated some failures!"/>
 
 
 Here are some next steps for you to try:
 
 - Try to configure a 5 server Restate cluster that can tolerate up to 2 server failures.
+- Trim the logs (either manually, or by setting up automatic trimming) _before_ adding more nodes.
 - Try to deploy a 3 server Restate cluster using Kubernetes.
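
The snapshots created in step 7 can also be inspected without the Minio console, using any S3-compatible client pointed at the mapped port. For example, assuming the AWS CLI is installed locally and using the credentials from the compose file above:

```shell
# List the snapshot objects written by `restatectl snapshot create`
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_DEFAULT_REGION=local
aws --endpoint-url http://localhost:9000 s3 ls s3://restate/snapshots/ --recursive
```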

docs/guides/local-to-replicated.mdx

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ Once you restart your Restate server, it will start using the replicated metadata
 type = "replicated"
 ```
 
-If you plan to extend your single-node deployment to a multi-node deployment, you also need to [configure the snapshot repository](/operate/data-backup#snapshotting).
+If you plan to extend your single-node deployment to a multi-node deployment, you also need to [configure the snapshot repository](/operate/snapshots).
 This allows new nodes to join the cluster by restoring the latest snapshot.
 
 ```toml restate.toml
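
The repository configuration block itself is truncated in the diff. As a minimal hedged sketch (the bucket name is illustrative, and the keys are inferred from the `RESTATE_WORKER__SNAPSHOTS__*` variables added in the cluster guide), it would look roughly like:

```toml
[worker.snapshots]
destination = "s3://my-snapshots-bucket/my-cluster"  # illustrative bucket and prefix
snapshot-interval-num-records = 1000                 # publish a snapshot every 1000 records
```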

docs/operate/snapshots.mdx

Lines changed: 6 additions & 2 deletions
@@ -11,12 +11,16 @@ import Admonition from '@theme/Admonition';
 This page covers configuring a Restate cluster to share partition snapshots for fast fail-over and bootstrapping new nodes. For backup of Restate nodes, see [Data Backup](/operate/data-backup).
 </Admonition>
 
-Restate workers can be configured to periodically publish snapshots of their partition state to a shared destination. Snapshots are not necessarily backups. Rather, snapshots allow nodes that had not previously served a partition to bootstrap a copy of its state. Without snapshots, placing a partition processor on a node that wasn't previously a follower would require the full replay of that partition's log. Replaying the log might take a long time - and is impossible if the log gets trimmed.
-
 <Admonition type="note" title="Architectural overview">
 To understand the terminology used on this page, it might be helpful to read through the [architecture reference](/references/architecture).
 </Admonition>
 
+<Admonition type="caution">
+Snapshots are essential to support safe log trimming and also allow you to set partition replication to a subset of all cluster nodes, while still allowing for fast partition fail-over to any live node. Snapshots are also necessary to add more nodes in the future.
+</Admonition>
+
+Restate workers can be configured to periodically publish snapshots of their partition state to a shared destination. Snapshots are not necessarily backups. Rather, snapshots allow nodes that had not previously served a partition to bootstrap a copy of its state. Without snapshots, placing a partition processor on a node that wasn't previously a follower would require the full replay of that partition's log. Replaying the log might take a long time - and is impossible if the log gets trimmed.
+
 ## Configuring Snapshots
 Restate clusters should always be configured with a snapshot repository to allow nodes to efficiently share partition state, and for new nodes to be added to the cluster in the future.
 Restate currently supports using Amazon S3 (or an API-compatible object store) as a shared snapshot repository.
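
For an API-compatible store such as the Minio container used in the cluster guide, the environment variables added in this commit translate to TOML roughly as follows. This is a hedged sketch; the endpoint and credentials are taken from that example, and the kebab-case keys are inferred from the env-var names.

```toml
[worker.snapshots]
destination = "s3://restate/snapshots"
snapshot-interval-num-records = 1000
# Overrides for an S3-compatible object store (values from the guide's Minio setup)
aws-region = "local"
aws-endpoint-url = "http://minio:9000"
aws-allow-http = true
aws-access-key-id = "minioadmin"
aws-secret-access-key = "minioadmin"
```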
