Commit 58e8b34

feat: add new metrics
Add info metrics about providers, enterprises, organizations, repositories and pools. Also expose most of the configurable pool information as metrics, e.g. max runners as garm_pool_max_runners.

Signed-off-by: Mario Constanti <[email protected]>
1 parent a48ec0c commit 58e8b34

File tree

10 files changed: +579 -96 lines changed

doc/config_metrics.md

Lines changed: 48 additions & 4 deletions
@@ -2,11 +2,55 @@

 This is one of the features in GARM that I really love having. For one thing, it's community contributed and for another, it really adds value to the project. It allows us to create some pretty nice visualizations of what is happening with GARM.

-At the moment there are only three meaningful metrics being collected, besides the default ones that the prometheus golang package enables by default. These are:
+## Common metrics

-* `garm_health` - This is a gauge that is set to 1 if GARM is healthy and 0 if it is not. This is useful for alerting.
-* `garm_runner_status` - This is a gauge value that gives us details about the runners garm spawns
-* `garm_webhooks_received` - This is a counter that increments every time GARM receives a webhook from GitHub.
+| Metric name | Type | Labels | Description |
+|-------------|------|--------|-------------|
+| `garm_health` | Gauge | `controller_id`=&lt;controller id&gt; <br>`name`=&lt;hostname&gt; | This is a gauge that is set to 1 if GARM is healthy and 0 if it is not. This is useful for alerting. |
+| `garm_webhooks_received` | Counter | `controller_id`=&lt;controller id&gt; <br>`name`=&lt;hostname&gt; | This is a counter that increments every time GARM receives a webhook from GitHub. |
+
+## Enterprise metrics
+
+| Metric name | Type | Labels | Description |
+|-------------|------|--------|-------------|
+| `garm_enterprise_info` | Gauge | `id`=&lt;enterprise id&gt; <br>`name`=&lt;enterprise name&gt; | This is a gauge that is set to 1 and exposes enterprise information |
+| `garm_enterprise_pool_manager_status` | Gauge | `id`=&lt;enterprise id&gt; <br>`name`=&lt;enterprise name&gt; <br>`running`=&lt;true\|false&gt; | This is a gauge that is set to 1 if the enterprise pool manager is running and set to 0 if not |
+
+## Organization metrics
+
+| Metric name | Type | Labels | Description |
+|-------------|------|--------|-------------|
+| `garm_organization_info` | Gauge | `id`=&lt;organization id&gt; <br>`name`=&lt;organization name&gt; | This is a gauge that is set to 1 and exposes organization information |
+| `garm_organization_pool_manager_status` | Gauge | `id`=&lt;organization id&gt; <br>`name`=&lt;organization name&gt; <br>`running`=&lt;true\|false&gt; | This is a gauge that is set to 1 if the organization pool manager is running and set to 0 if not |
+
+## Repository metrics
+
+| Metric name | Type | Labels | Description |
+|-------------|------|--------|-------------|
+| `garm_repository_info` | Gauge | `id`=&lt;repository id&gt; <br>`name`=&lt;repository name&gt; | This is a gauge that is set to 1 and exposes repository information |
+| `garm_repository_pool_manager_status` | Gauge | `id`=&lt;repository id&gt; <br>`name`=&lt;repository name&gt; <br>`running`=&lt;true\|false&gt; | This is a gauge that is set to 1 if the repository pool manager is running and set to 0 if not |
+
+## Provider metrics
+
+| Metric name | Type | Labels | Description |
+|-------------|------|--------|-------------|
+| `garm_provider_info` | Gauge | `description`=&lt;provider description&gt; <br>`name`=&lt;provider name&gt; <br>`type`=&lt;internal\|external&gt; | This is a gauge that is set to 1 and exposes provider information |
+
+## Pool metrics
+
+| Metric name | Type | Labels | Description |
+|-------------|------|--------|-------------|
+| `garm_pool_info` | Gauge | `flavor`=&lt;flavor&gt; <br>`id`=&lt;pool id&gt; <br>`image`=&lt;image name&gt; <br>`os_arch`=&lt;defined OS arch&gt; <br>`os_type`=&lt;defined OS name&gt; <br>`pool_owner`=&lt;owner name&gt; <br>`pool_type`=&lt;repository\|organization\|enterprise&gt; <br>`prefix`=&lt;prefix&gt; <br>`provider`=&lt;provider name&gt; <br>`tags`=&lt;concatenated list of pool tags&gt; <br> | This is a gauge that is set to 1 and exposes pool information |
+| `garm_pool_status` | Gauge | `enabled`=&lt;true\|false&gt; <br>`id`=&lt;pool id&gt; | This is a gauge that is set to 1 if the pool is enabled and set to 0 if not |
+| `garm_pool_bootstrap_timeout` | Gauge | `id`=&lt;pool id&gt; | This is a gauge that is set to the pool's bootstrap timeout |
+| `garm_pool_max_runners` | Gauge | `id`=&lt;pool id&gt; | This is a gauge that is set to the pool's maximum number of runners |
+| `garm_pool_min_idle_runners` | Gauge | `id`=&lt;pool id&gt; | This is a gauge that is set to the pool's minimum number of idle runners |
+
+## Runner metrics
+
+| Metric name | Type | Labels | Description |
+|-------------|------|--------|-------------|
+| `garm_runner_status` | Gauge | `controller_id`=&lt;controller id&gt; <br>`hostname`=&lt;hostname&gt; <br>`name`=&lt;runner name&gt; <br>`pool_id`=&lt;pool id&gt; <br>`pool_owner`=&lt;owner name&gt; <br>`pool_type`=&lt;repository\|organization\|enterprise&gt; <br>`provider`=&lt;provider name&gt; <br>`runner_status`=&lt;running\|stopped\|error\|pending_delete\|deleting\|pending_create\|creating\|unknown&gt; <br>`status`=&lt;idle\|pending\|terminated\|installing\|failed\|active&gt; <br> | This is a gauge value that gives us details about the runners GARM spawns |

 More metrics will be added in the future.

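
The tables above follow a common pattern: the `*_info` gauges always carry the value 1 and put the descriptive data in labels, while the numeric pool settings are separate gauges keyed only by `id`. As a rough illustration of what a scrape might return for two of these metrics, here is a minimal sketch that renders samples in the Prometheus text exposition format. The `formatSample` helper, the pool ID `pool-1` and the values are hypothetical, and the quoting via `%q` is only a stand-in for the real label escaping rules, adequate for plain ASCII values.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatSample renders one gauge sample as: name{label="value",...} value.
// Labels are sorted by name so the output is deterministic.
// Hypothetical helper for illustration; GARM itself relies on the
// Prometheus client library to produce this output.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q adds the surrounding quotes and escapes quotes and backslashes.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	// Hypothetical pool ID and values, not taken from a real deployment.
	fmt.Println(formatSample("garm_pool_status", map[string]string{"enabled": "true", "id": "pool-1"}, 1))
	fmt.Println(formatSample("garm_pool_max_runners", map[string]string{"id": "pool-1"}, 10))
}
```

Because every pool metric shares the `id` label, the numeric gauges can be related back to the descriptive labels on `garm_pool_info` when building dashboards.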

metrics/enterprise.go

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+package metrics
+
+import (
+	"log"
+	"strconv"
+
+	"github.com/cloudbase/garm/auth"
+	"github.com/prometheus/client_golang/prometheus"
+)
+
+// CollectEnterpriseMetric collects the metrics for the enterprise objects
+func (c *GarmCollector) CollectEnterpriseMetric(ch chan<- prometheus.Metric, hostname string, controllerID string) {
+	ctx := auth.GetAdminContext()
+
+	enterprises, err := c.runner.ListEnterprises(ctx)
+	if err != nil {
+		log.Printf("listing enterprises: %s", err)
+		return
+	}
+
+	for _, enterprise := range enterprises {
+
+		enterpriseInfo, err := prometheus.NewConstMetric(
+			c.enterpriseInfo,
+			prometheus.GaugeValue,
+			1,
+			enterprise.Name, // label: name
+			enterprise.ID,   // label: id
+		)
+		if err != nil {
+			log.Printf("cannot collect enterpriseInfo metric: %s", err)
+			continue
+		}
+		ch <- enterpriseInfo
+
+		enterprisePoolManagerStatus, err := prometheus.NewConstMetric(
+			c.enterprisePoolManagerStatus,
+			prometheus.GaugeValue,
+			bool2float64(enterprise.PoolManagerStatus.IsRunning),
+			enterprise.Name, // label: name
+			enterprise.ID,   // label: id
+			strconv.FormatBool(enterprise.PoolManagerStatus.IsRunning), // label: running
+		)
+		if err != nil {
+			log.Printf("cannot collect enterprisePoolManagerStatus metric: %s", err)
+			continue
+		}
+		ch <- enterprisePoolManagerStatus
+	}
+}
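
The pool manager status metric above encodes the same boolean twice: as the gauge value via `bool2float64` and as the `running` label via `strconv.FormatBool`. The definition of `bool2float64` is not part of this excerpt, so the following is only a minimal sketch of how such a helper presumably behaves, assuming it maps `true` to 1 and `false` to 0.

```go
package main

import (
	"fmt"
	"strconv"
)

// bool2float64 maps a boolean to the 1/0 value used for a gauge sample.
// Assumption: the helper referenced in the diff behaves like this; its
// real definition is not shown in this excerpt.
func bool2float64(b bool) float64 {
	if b {
		return 1
	}
	return 0
}

func main() {
	for _, running := range []bool{true, false} {
		// The label carries the textual form, the sample carries the number.
		fmt.Printf("running=%q value=%g\n", strconv.FormatBool(running), bool2float64(running))
	}
}
```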

metrics/health.go

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+package metrics
+
+import (
+	"log"
+
+	"github.com/prometheus/client_golang/prometheus"
+)
+
+func (c *GarmCollector) CollectHealthMetric(ch chan<- prometheus.Metric, hostname string, controllerID string) {
+	m, err := prometheus.NewConstMetric(
+		c.healthMetric,
+		prometheus.GaugeValue,
+		1,
+		hostname,
+		controllerID,
+	)
+	if err != nil {
+		log.Printf("error on creating health metric: %s", err)
+		return
+	}
+	ch <- m
+}

metrics/instance.go

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+package metrics
+
+import (
+	"log"
+
+	"github.com/cloudbase/garm/auth"
+	"github.com/prometheus/client_golang/prometheus"
+)
+
+// CollectInstanceMetric collects the metrics for the runner instances
+// reflecting the statuses and the pool they belong to.
+func (c *GarmCollector) CollectInstanceMetric(ch chan<- prometheus.Metric, hostname string, controllerID string) {
+	ctx := auth.GetAdminContext()
+
+	instances, err := c.runner.ListAllInstances(ctx)
+	if err != nil {
+		log.Printf("cannot collect metrics, listing instances: %s", err)
+		return
+	}
+
+	pools, err := c.runner.ListAllPools(ctx)
+	if err != nil {
+		log.Printf("listing pools: %s", err)
+		return
+	}
+
+	type poolInfo struct {
+		Name         string
+		Type         string
+		ProviderName string
+	}
+
+	poolNames := make(map[string]poolInfo)
+	for _, pool := range pools {
+		if pool.EnterpriseName != "" {
+			poolNames[pool.ID] = poolInfo{
+				Name:         pool.EnterpriseName,
+				Type:         string(pool.PoolType()),
+				ProviderName: pool.ProviderName,
+			}
+		} else if pool.OrgName != "" {
+			poolNames[pool.ID] = poolInfo{
+				Name:         pool.OrgName,
+				Type:         string(pool.PoolType()),
+				ProviderName: pool.ProviderName,
+			}
+		} else {
+			poolNames[pool.ID] = poolInfo{
+				Name:         pool.RepoName,
+				Type:         string(pool.PoolType()),
+				ProviderName: pool.ProviderName,
+			}
+		}
+	}
+
+	for _, instance := range instances {
+
+		m, err := prometheus.NewConstMetric(
+			c.instanceMetric,
+			prometheus.GaugeValue,
+			1,
+			instance.Name,                           // label: name
+			string(instance.Status),                 // label: status
+			string(instance.RunnerStatus),           // label: runner_status
+			poolNames[instance.PoolID].Name,         // label: pool_owner
+			poolNames[instance.PoolID].Type,         // label: pool_type
+			instance.PoolID,                         // label: pool_id
+			hostname,                                // label: hostname
+			controllerID,                            // label: controller_id
+			poolNames[instance.PoolID].ProviderName, // label: provider
+		)
+
+		if err != nil {
+			log.Printf("cannot collect runner metric: %s", err)
+			continue
+		}
+		ch <- m
+	}
+}
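
The three branches that fill `poolNames` differ only in which name they pick as the pool owner: the enterprise name wins, then the organization name, and the repository name is the fallback. A minimal sketch of that precedence follows. The `pool` struct and `ownerName` function are hypothetical stand-ins that model only the fields used here, not GARM's real types.

```go
package main

import "fmt"

// pool is a hypothetical stand-in for the pool type used in the diff.
// Only the fields needed for owner resolution are modelled.
type pool struct {
	ID             string
	EnterpriseName string
	OrgName        string
	RepoName       string
}

// ownerName applies the same precedence as the if/else chain in the diff:
// enterprise first, then organization, then repository.
func ownerName(p pool) string {
	switch {
	case p.EnterpriseName != "":
		return p.EnterpriseName
	case p.OrgName != "":
		return p.OrgName
	default:
		return p.RepoName
	}
}

func main() {
	// Hypothetical pools, for illustration only.
	pools := []pool{
		{ID: "p1", EnterpriseName: "example-enterprise"},
		{ID: "p2", OrgName: "example-org"},
		{ID: "p3", RepoName: "example-repo"},
	}

	owners := make(map[string]string, len(pools))
	for _, p := range pools {
		owners[p.ID] = ownerName(p)
	}
	fmt.Println(owners["p1"], owners["p2"], owners["p3"])

	// Looking up a pool ID that is not in the map yields the zero value,
	// so a runner whose pool was not listed would get an empty owner.
	fmt.Printf("%q\n", owners["missing"])
}
```

The last lookup shows a property the collector inherits from Go maps: an instance whose `PoolID` is absent from `poolNames` would be reported with empty `pool_owner`, `pool_type` and `provider` labels rather than being skipped.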
