Conversation

Contributor

aljesusg commented Dec 1, 2025

Kiali will run Gevals when the task filter is set to 'kiali'

Signed-off-by: Alberto Gutierrez <[email protected]>
Contributor Author

aljesusg commented Dec 1, 2025

First approach @Cali0707
I think it’s better to run the Kiali evals only in specific cases, rather than alongside everything else in the full weekly set. WDYT?

Collaborator

Cali0707 left a comment

@aljesusg I left a comment on one of your tasks, but having looked at more of the PR, I think it applies to most of them. In general, we support the LLM judge for responses where we expect the LLM to give us information or a summary of tool findings.

In terms of when to run kiali vs. the whole set of evals I defer to @manusa - what do you think Marc?

Comment on lines +35 to +36
inline: |-
#!/usr/bin/env bash
Collaborator

@aljesusg do you maybe want to use the LLM judge implementation here to check whether the model response contains the correct info? See an example here:

verify:
  contains: "division by zero"

Member

manusa commented Dec 2, 2025

In terms of when to run kiali vs. the whole set of evals I defer to @manusa - what do you think Marc?

I understand that this might be beneficial to run on PRs that affect the Kiali toolset, so the filter sounds like a good approach.
However, we should probably document this to avoid forgetting how the PR comments work.

Additionally, we could improve the workflow dispatch to include a drop-down of the features/toolsets to test so at some point we can dispatch an execution for a given toolset evaluation.
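
As a rough illustration of the drop-down idea, a GitHub Actions `workflow_dispatch` trigger can declare an input of type `choice`. The sketch below is hypothetical: the input name `toolset` and the option values are assumptions for illustration, not taken from this PR or the existing workflow.

```yaml
# Hypothetical sketch: the input name and option values are assumptions,
# not the repository's actual workflow definition.
on:
  workflow_dispatch:
    inputs:
      toolset:
        description: "Toolset whose evaluation tasks should run"
        type: choice
        required: false
        default: all
        options:
          - all
          - kiali
```

A job in the same workflow could then read the selection through `${{ inputs.toolset }}` and pass it on as the task filter.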
