Commit 3bc69e5

Merge pull request #782 from VladimirKadlec/main
add evaluation
2 parents 18ca0fb + 166b168 commit 3bc69e5

1 file changed: +75 -5 lines changed
docs/demos/lcore/lcore.md

Lines changed: 75 additions & 5 deletions
@@ -374,11 +374,81 @@ uv run llama stack list-providers
 
 ## Evaluation
 
-* Motivation
-* Evaluation tool
-  - Ragas
-  - Deep Eval
-* Statistical significance
+---
+
+## Why Evaluate an LLM System?
+
+* Measure performance
+* Ensure good user experience
+* Detect bias & harm
+* Comply with ethical & legal standards
+
+---
+
+## Benefits of Evaluation
+
+* Improvement:
+  - Pinpoints weaknesses (e.g., hallucinations)
+  - Enables data-driven model tuning
+
+* Benchmarking:
+  - Compare models (GPT, Gemini, Granite, etc.)
+  - Ensures reliability over time
+
+---
+### Lightspeed Evaluation Framework
+
+<font size="10">[https://github.com/lightspeed-core/lightspeed-evaluation/](https://github.com/lightspeed-core/lightspeed-evaluation)</font>
+---
+
+### Lightspeed Evaluation Framework
+
+* Multi-Framework LLM as a Judge
+  - Ragas, DeepEval and custom implementations
+* Turn & Conversation-Level
+  - Individual queries and multi-turn conversations
+* Tools/Agents Support
+* LLM Providers
+  - OpenAI, Watsonx, Gemini, vLLM and others
+* Setup/Cleanup Scripts
+* Statistical Analysis
+
+---
+```yaml
+- conversation_group_id: "test_conversation"
+  description: "Sample evaluation"
+
+  # Optional: Environment setup/cleanup scripts, when API is enabled
+  setup_script: "scripts/setup_env.sh" # Run before conversation
+  cleanup_script: "scripts/cleanup_env.sh" # Run after conversation
+
+  # Conversation-level metrics
+  conversation_metrics:
+    - "deepeval:conversation_completeness"
+
+  conversation_metrics_metadata:
+    "deepeval:conversation_completeness":
+      threshold: 0.8
+
+  turns:
+    - turn_id: id1
+      query: What is OpenShift Virtualization?
+      response: null # Populated by API if enabled, otherwise provide
+      contexts:
+        - OpenShift Virtualization is an extension of the OpenShift ...
+      attachments: [] # Attachments (Optional)
+      expected_response: OpenShift Virtualization is an extension of the OpenShift Container Platform that allows running virtual machines alongside containers
+      expected_intent: "explain a concept" # Expected intent for intent evaluation
+
+      # Per-turn metrics (overrides system defaults)
+      turn_metrics:
+        - "ragas:faithfulness"
+        - "custom:answer_correctness"
+        - "custom:intent_eval"
+```
+---
+
+## Demo
 
 ---
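Note on the sample YAML added in this diff: metrics are named with `framework:metric` strings (for example `ragas:faithfulness`), and `conversation_metrics_metadata` attaches a `threshold` to a metric. The snippet below is a minimal sketch of how such identifiers and thresholds could be interpreted. It is not taken from the lightspeed-evaluation code base: `MetricId`, `parse_metric_id`, `passes`, the `score >= threshold` rule, and the `0.5` fallback are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricId:
    """Hypothetical holder for the two halves of a 'framework:metric' string."""
    framework: str
    name: str


def parse_metric_id(identifier: str) -> MetricId:
    """Split 'ragas:faithfulness' into framework='ragas', name='faithfulness'."""
    framework, sep, name = identifier.partition(":")
    if not sep or not framework or not name:
        raise ValueError(f"expected 'framework:metric', got {identifier!r}")
    return MetricId(framework, name)


def passes(score: float, metadata: dict, identifier: str,
           default_threshold: float = 0.5) -> bool:
    """Assumed rule: a metric passes when its score reaches the threshold.

    `metadata` mirrors the shape of `conversation_metrics_metadata` in the
    YAML; the fallback threshold is an illustrative assumption.
    """
    threshold = metadata.get(identifier, {}).get("threshold", default_threshold)
    return score >= threshold


# Values copied from the YAML sample in the diff.
metadata = {"deepeval:conversation_completeness": {"threshold": 0.8}}
metric = parse_metric_id("deepeval:conversation_completeness")
print(metric.framework, metric.name)
print(passes(0.85, metadata, "deepeval:conversation_completeness"))
```

Keeping the framework prefix separate from the metric name makes it straightforward to dispatch each metric to the matching backend (Ragas, DeepEval, or a custom implementation), which is one plausible reading of the "Multi-Framework" bullet in the slides.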
