Merged
120 changes: 120 additions & 0 deletions README.md
@@ -67,6 +67,10 @@ opinions. Please communicate with us on [Slack](https://t.mp/slack) in the `#rub
- [Activity Worker Shutdown](#activity-worker-shutdown)
- [Activity Concurrency and Executors](#activity-concurrency-and-executors)
- [Activity Testing](#activity-testing)
- [Telemetry](#telemetry)
- [Metrics](#metrics)
- [OpenTelemetry Tracing](#opentelemetry-tracing)
- [OpenTelemetry Tracing in Workflows](#opentelemetry-tracing-in-workflows)
- [Ractors](#ractors)
- [Platform Support](#platform-support)
- [Development](#development)
@@ -1034,6 +1038,122 @@ it will raise the error raised in the activity.
The constructor of the environment has multiple keyword arguments that can be set to affect the activity context for the
activity.

### Telemetry

#### Metrics

Metrics can be configured on a `Temporalio::Runtime`. Only one runtime is expected to be created for the entire
application and it should be created before any clients are created. For example, this configures Prometheus to export
metrics at `http://127.0.0.1:9000/metrics`:

```ruby
require 'temporalio/runtime'

Temporalio::Runtime.default = Temporalio::Runtime.new(
  telemetry: Temporalio::Runtime::TelemetryOptions.new(
    metrics: Temporalio::Runtime::MetricsOptions.new(
      prometheus: Temporalio::Runtime::PrometheusMetricsOptions.new(
        bind_address: '127.0.0.1:9000'
      )
    )
  )
)
```

Now every client created will use this runtime. Setting the default will fail if a runtime has already been requested or
a default already set. Technically a runtime can be created without setting the default and be set on each client via
the `runtime` parameter, but this is discouraged because a runtime represents a heavy internal engine not meant to be
created multiple times.

OpenTelemetry metrics can be configured instead by passing `Temporalio::Runtime::OpenTelemetryMetricsOptions` as the
`opentelemetry` parameter to the metrics options. See API documentation for details.

#### OpenTelemetry Tracing

OpenTelemetry tracing for clients, activities, and workflows can be enabled using the
`Temporalio::Contrib::OpenTelemetry::TracingInterceptor`. Specifically, when creating a client, set the interceptor like
so:

```ruby
require 'opentelemetry/api'
require 'opentelemetry/sdk'
require 'temporalio/client'
require 'temporalio/contrib/open_telemetry'

# ... assumes my_otel_tracer_provider is a tracer provider created by the user
my_tracer = my_otel_tracer_provider.tracer('my-otel-tracer')
Comment on lines +1083 to +1084
Member:

Maybe an example of one that someone would use would be nice

Member Author:

Yeah, I should just change this to

Suggested change
- # ... assumes my_otel_tracer_provider is a tracer provider created by the user
- my_tracer = my_otel_tracer_provider.tracer('my-otel-tracer')
+ my_tracer = OpenTelemetry.tracer_provider.tracer('my-otel-tracer')

Per https://opentelemetry.io/docs/languages/ruby/instrumentation/#acquiring-a-tracer. I guess most people configure the global instead of creating a tracing provider.


my_client = Temporalio::Client.connect(
  'localhost:7233', 'my-namespace',
  interceptors: [Temporalio::Contrib::OpenTelemetry::TracingInterceptor.new(my_tracer)]
)
```

Now many high-level client calls and activities/workflows on workers using this client will have spans created on that
OpenTelemetry tracer.

##### OpenTelemetry Tracing in Workflows

OpenTelemetry works by creating spans as necessary and in some cases serializing them to Temporal headers to be
deserialized by workflows/activities to be set on the context. However, OpenTelemetry requires spans to be finished
where they start, so spans cannot be resumed. This is fine for client calls and activity attempts, but Temporal
workflows are resumable functions that may start on a different machine than they complete. Due to this, spans created
by workflows are immediately closed since there is no way for the span to actually span machines. They are also not
created during replay. The spans still become the proper parents of other spans if they are created.
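
The header-based propagation described above can be sketched with a toy model. This is a minimal illustration in plain Ruby, not the OpenTelemetry API and not the SDK's interceptor: `ToySpan`, the `toy_*` helpers, and the `toy-trace-context` header key are all hypothetical names invented for this example.

```ruby
require 'json'
require 'securerandom'

# Toy span (hypothetical, not an OpenTelemetry span): just identifiers and an ended flag.
ToySpan = Struct.new(:name, :trace_id, :span_id, :parent_id, :ended, keyword_init: true)

# The part of a span that can cross process boundaries.
def toy_context(span)
  { 'trace_id' => span.trace_id, 'span_id' => span.span_id }
end

# Start a span, optionally under a parent context hash.
def toy_start_span(name, parent: nil)
  ToySpan.new(
    name: name,
    trace_id: parent ? parent['trace_id'] : SecureRandom.hex(16),
    span_id: SecureRandom.hex(8),
    parent_id: parent && parent['span_id'],
    ended: false
  )
end

# Serialize the span context into a copy of the headers (the key name is made up).
def toy_inject(span, headers)
  headers.merge('toy-trace-context' => JSON.generate(toy_context(span)))
end

# Restore the parent context from headers, or nil if none was sent.
def toy_extract(headers)
  raw = headers['toy-trace-context']
  raw && JSON.parse(raw)
end

# Client side: the span starts and ends in the same process.
client_span = toy_start_span('StartWorkflow:MyWorkflow')
headers = toy_inject(client_span, {})
client_span.ended = true

# Worker side, possibly another machine: only the headers arrive.
run_span = toy_start_span('RunWorkflow:MyWorkflow', parent: toy_extract(headers))
run_span.ended = true # closed immediately, as workflow spans are

# An already-ended span can still be the parent of a later span.
custom_span = toy_start_span('my-span', parent: toy_context(run_span))
puts "#{custom_span.name} -> parent #{custom_span.parent_id}"
```

Only the identifiers travel in the headers, which is why a span can be a parent without staying open.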

Custom spans can be created inside of workflows using class methods on the
`Temporalio::Contrib::OpenTelemetry::Workflow` module. For example:

```ruby
class MyWorkflow < Temporalio::Workflow::Definition
  def execute
    # Sleep for a bit
    Temporalio::Workflow.sleep(10)
    # Run activity in span
    Temporalio::Contrib::OpenTelemetry::Workflow.with_completed_span(
      'my-span',
      attributes: { 'my-attr' => 'some val' }
    ) do
      # Execute an activity
      Temporalio::Workflow.execute_activity(MyActivity, start_to_close_timeout: 10)
    end
  end
end
```

If this all executes on one worker (because Temporal has a concept of stickiness that caches instances), the span tree
may look like:

```
StartWorkflow:MyWorkflow            <-- created by client outbound
  RunWorkflow:MyWorkflow            <-- created inside workflow on first task
    my-span                         <-- created inside workflow by code
      StartActivity:MyActivity      <-- created inside workflow when first called
        RunActivity:MyActivity      <-- created inside activity attempt 1
    CompleteWorkflow:MyWorkflow     <-- created inside workflow on last task
```

However if, say, the worker crashed during the 10s sleep and the workflow was resumed (i.e. replayed) on another worker,
the span tree may look like:

```
StartWorkflow:MyWorkflow            <-- created by client outbound
  RunWorkflow:MyWorkflow            <-- created by workflow inbound on first task
  my-span                           <-- created inside the workflow
    StartActivity:MyActivity        <-- created by workflow outbound
      RunActivity:MyActivity        <-- created by activity attempt 1 inbound
  CompleteWorkflow:MyWorkflow       <-- created by workflow inbound on last task
```

Notice how the spans are no longer under `RunWorkflow`. This is because spans inside the workflow are not created on
replay, so there is no parent on replay. But there are no orphans because we still have the overarching parent of
`StartWorkflow` that was created by the client and is serialized into Temporal headers so it can always be the parent.
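
The parenting rule behind these two trees can be sketched as a small simulation. This is a toy model in plain Ruby under simplifying assumptions, not the SDK's interceptor: `ToyWorkflowTracer` and `toy_span_parents` are hypothetical names, spans are represented only by their names, and a worker crash is modeled as a fresh tracer that knows nothing but the header parent.

```ruby
# Toy model only: records [span_name, parent_name] pairs instead of real spans.
class ToyWorkflowTracer
  def initialize(header_parent, exported)
    @current = header_parent # parent restored from headers, always available
    @exported = exported     # shared list standing in for the span exporter
  end

  # Export an already-completed span unless replaying. With a block, the span
  # is the current parent only while the block runs.
  def completed_span(name, replaying: false)
    previous = @current
    unless replaying
      @exported << [name, @current]
      @current = name
    end
    return unless block_given?

    begin
      yield
    ensure
      @current = previous
    end
  end
end

# Returns a hash of span name => parent name for the example workflow.
def toy_span_parents(crash_during_sleep:)
  exported = [['StartWorkflow:MyWorkflow', nil]] # client span, carried in headers
  tracer = ToyWorkflowTracer.new('StartWorkflow:MyWorkflow', exported)
  tracer.completed_span('RunWorkflow:MyWorkflow') # first task on the first worker

  if crash_during_sleep
    # A new worker has only the headers and replays the first task,
    # so RunWorkflow is neither exported again nor made the current parent.
    tracer = ToyWorkflowTracer.new('StartWorkflow:MyWorkflow', exported)
    tracer.completed_span('RunWorkflow:MyWorkflow', replaying: true)
  end

  tracer.completed_span('my-span') do
    tracer.completed_span('StartActivity:MyActivity') do
      # The activity attempt takes its parent from the activity headers.
      exported << ['RunActivity:MyActivity', 'StartActivity:MyActivity']
    end
  end
  tracer.completed_span('CompleteWorkflow:MyWorkflow')
  exported.to_h
end

toy_span_parents(crash_during_sleep: true).each do |name, parent|
  puts "#{name} <- #{parent.inspect}"
end
```

With `crash_during_sleep: false` the parent of `my-span` is `RunWorkflow:MyWorkflow`; with `crash_during_sleep: true` it falls back to `StartWorkflow:MyWorkflow`, and no span is left without a parent.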

As a reminder, `StartWorkflow` and `RunActivity` spans do last the length of their calls (the time to start the
workflow and the time to run the activity attempt, respectively). The other spans have no measurable duration because
they are created in workflows and closed immediately, since long-lived spans cannot work for durable software that may
resume on other machines.

### Ractors

It was an original goal to have workflows actually be Ractors for deterministic state isolation and have the library
4 changes: 4 additions & 0 deletions temporalio/Gemfile
@@ -12,6 +12,10 @@ group :development do
gem 'grpc', '~> 1.69'
gem 'grpc-tools', '~> 1.69'
gem 'minitest'
# We are intentionally not pinning OTel versions here so that CI tests the latest. This also means that the OTel
# contrib library does not require specific versions; we are relying on the compatibility rigor of OTel.
Comment on lines +15 to +16
Member:

Good luck lol

Member Author:

👍 The problem is that Ruby doesn't have a concept of "optional dependency constraints" so I can't put a constraint on users' OTel library, I can only fail to require it or add advanced validation to check the OTel version which I don't want to do. So this is the best way to at least make sure that we always work with the latest. If there was enough concern about breaking support for older OTel API versions, we could make a separate GH step to only test with a specific oldest OTel API version.

Member Author:

I think we do want to make this an option (and I also opened temporalio/sdk-python#794). I will update this PR.
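
For illustration, the kind of runtime version validation mentioned in this thread could be sketched as follows. This is a hypothetical sketch and not code from this PR: the helper name `toy_gem_satisfies?` and the `'>= 1.0'` requirement are made up for the example. It relies only on RubyGems' `Gem.loaded_specs`, `Gem::Requirement`, and `Gem::Version`.

```ruby
require 'rubygems'

# Hypothetical helper: true if the loaded gem meets the requirement, false if it
# does not, and nil if the gem is not loaded at all (so the caller can decide).
def toy_gem_satisfies?(gem_name, requirement, loaded_specs: Gem.loaded_specs)
  spec = loaded_specs[gem_name]
  return nil unless spec

  Gem::Requirement.new(requirement).satisfied_by?(spec.version)
end

# Example check; the minimum version here is an arbitrary placeholder.
case toy_gem_satisfies?('opentelemetry-api', '>= 1.0')
when nil then puts 'opentelemetry-api is not loaded'
when false then puts 'opentelemetry-api is older than the placeholder minimum'
else puts 'opentelemetry-api version looks acceptable'
end
```

The `loaded_specs:` parameter exists only so the check can be exercised without the gem installed.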

gem 'opentelemetry-api'
gem 'opentelemetry-sdk'
gem 'rake'
gem 'rake-compiler'
gem 'rbs', '~> 3.5.3'
3 changes: 3 additions & 0 deletions temporalio/lib/temporalio/activity/info.rb
@@ -59,6 +59,9 @@ module Activity
# @return [String] Workflow run ID that started this activity.
# @!attribute workflow_type
# @return [String] Workflow type name that started this activity.
#
# @note WARNING: This class may have required parameters added to its constructor. Users should not instantiate this
# class or it may break in incompatible ways.
class Info; end # rubocop:disable Lint/EmptyClass
end
end
6 changes: 3 additions & 3 deletions temporalio/lib/temporalio/client.rb
@@ -242,7 +242,7 @@ def start_workflow(
rpc_options: nil
)
@impl.start_workflow(Interceptor::StartWorkflowInput.new(
- workflow:,
+ workflow: Workflow::Definition._workflow_type_from_workflow_parameter(workflow),
args:,
workflow_id: id,
task_queue:,
@@ -386,7 +386,7 @@ def start_update_with_start_workflow(
@impl.start_update_with_start_workflow(
Interceptor::StartUpdateWithStartWorkflowInput.new(
update_id: id,
- update:,
+ update: Workflow::Definition::Update._name_from_parameter(update),
args:,
wait_for_stage:,
start_workflow_operation:,
@@ -449,7 +449,7 @@ def signal_with_start_workflow(
)
@impl.signal_with_start_workflow(
Interceptor::SignalWithStartWorkflowInput.new(
- signal:,
+ signal: Workflow::Definition::Signal._name_from_parameter(signal),
args:,
start_workflow_operation:,
rpc_options:
@@ -58,7 +58,7 @@ def initialize(
headers: {}
)
@options = Options.new(
- workflow:,
+ workflow: Workflow::Definition._workflow_type_from_workflow_parameter(workflow),
args:,
id:,
task_queue:,
6 changes: 3 additions & 3 deletions temporalio/lib/temporalio/client/workflow_handle.rb
@@ -222,7 +222,7 @@ def signal(signal, *args, rpc_options: nil)
@client._impl.signal_workflow(Interceptor::SignalWorkflowInput.new(
workflow_id: id,
run_id:,
- signal:,
+ signal: Workflow::Definition::Signal._name_from_parameter(signal),
args:,
headers: {},
rpc_options:
@@ -254,7 +254,7 @@ def query(
@client._impl.query_workflow(Interceptor::QueryWorkflowInput.new(
workflow_id: id,
run_id:,
- query:,
+ query: Workflow::Definition::Query._name_from_parameter(query),
args:,
reject_condition:,
headers: {},
@@ -291,7 +291,7 @@ def start_update(
workflow_id: self.id,
run_id:,
update_id: id,
- update:,
+ update: Workflow::Definition::Update._name_from_parameter(update),
args:,
wait_for_stage:,
headers: {},