Binary file added rfcs/docs/Rust/alloc-Rust.pdf
Binary file not shown.
Binary file added rfcs/docs/Rust/core-Rust.pdf
Binary file not shown.
Binary file added rfcs/docs/Rust/proc_macro-Rust.pdf
Binary file not shown.
Binary file added rfcs/docs/Rust/std-Rust.pdf
Binary file not shown.
Binary file added rfcs/docs/Rust/test-Rust.pdf
Binary file not shown.
77 changes: 77 additions & 0 deletions src/evaluation/evaluate_retrieval.py
@@ -0,0 +1,77 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
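Automatic coarse-labeling script: calls an LLM to score each retrieval result 0/1.
Input: a JSON results file (may contain multiple queries).
Output: a new JSON file with an llm_score field added.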
"""
import json
import os
import requests
from tqdm import tqdm
from datetime import datetime

# ===== 1. Configuration =====
INPUT_FILE = "rust_rag_dataset_deepseek50_result_0927215213.json" # your raw results
OUTPUT_FILE = f"retrieve_results_scored_{datetime.now().strftime('%Y-%m-%d-%H:%M:%S')}.json"
API_KEY = ""
URL = "https://api.chatanywhere.tech/v1/chat/completions"
MODEL = "gpt-5-mini"
# =====================

HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

SYS_PROMPT = (
"You are a Rust language expert. "
"Given a user question and a document paragraph, output ONLY 1 if the paragraph can help answer the question or the paragraph is related to the question; "
"otherwise output 0. Do not explain."
)

def llm_judge(question: str, doc_content: str) -> int:
    """Return 0 or 1."""
    doc_clip = doc_content[:1500]  # truncate to avoid overly long prompts
    payload = {
        "model": MODEL,
        "temperature": 0,
        "messages": [
            {"role": "system", "content": SYS_PROMPT},
            {"role": "user", "content": f"Question: {question}\nDocument: {doc_clip}"}
        ]
    }
    resp = requests.post(URL, headers=HEADERS, json=payload, timeout=30)
    if resp.status_code != 200:
        print("API error:", resp.text)
        return 0
    try:
        return int(resp.json()["choices"][0]["message"]["content"].strip()[0])
    except Exception as e:
        print("Parse error:", e)
        return 0

def main():
    with open(INPUT_FILE, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Wrap a single object in a list, remembering the original shape
    was_single = isinstance(data, dict)
    if was_single:
        data = [data]
    err_cnt = 0

    for item in tqdm(data, desc="LLM judging"):
        q = item["question"]
        if not item.get("source_documents"):
            err_cnt += 1
            continue
        for doc in item["source_documents"]:
            score = llm_judge(q, doc["content"])
            doc["llm_score"] = score  # new field

    # If the input was a single object, save a single object as well
    with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
        json.dump(data[0] if was_single else data, f, ensure_ascii=False, indent=2)

    print(f"Scored results saved -> {OUTPUT_FILE}")
    print(f"Total errors: {err_cnt}")

if __name__ == "__main__":
    main()
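Review note: the script writes a per-document llm_score but does not aggregate it. A minimal sketch of one possible summary follows, assuming the output keeps the "source_documents" / "llm_score" layout used above. The function name summarize_scores and the two metrics (hit rate and mean precision per query) are illustrative assumptions, not part of this PR.

```python
def summarize_scores(items):
    """Return (hit_rate, mean_precision) over queries that have scored documents.

    hit_rate: fraction of queries with at least one document scored 1.
    mean_precision: average, over queries, of the share of documents scored 1.
    Queries without any scored document are skipped.
    """
    hits = 0
    precisions = []
    for item in items:
        docs = item.get("source_documents") or []
        scores = [d["llm_score"] for d in docs if "llm_score" in d]
        if not scores:
            continue
        precisions.append(sum(scores) / len(scores))
        if any(scores):
            hits += 1
    n = len(precisions)
    if n == 0:
        return 0.0, 0.0
    return hits / n, sum(precisions) / n


# Hypothetical scored items in the same shape the script produces.
example_items = [
    {"question": "q1", "source_documents": [
        {"content": "a", "llm_score": 1},
        {"content": "b", "llm_score": 0},
    ]},
    {"question": "q2", "source_documents": [{"content": "c", "llm_score": 0}]},
    {"question": "q3", "source_documents": []},
]
hit_rate, mean_precision = summarize_scores(example_items)
```

To use it on a real run, load the scored JSON file with json.load and pass the resulting list; a single-object file would need to be wrapped in a list first, as main() does.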
26 changes: 26 additions & 0 deletions src/evaluation/rust_rag_dataset_chatgpt25.csv
@@ -0,0 +1,26 @@
id,question,context
1,What is the Box type used for in Rust?,"The Box type is a smart pointer type. There can only be one owner of a Box, and the owner can decide to mutate the contents, which live on the heap."
2,How does Rc differ from Box in Rust?,The Rc type is a non-threadsafe reference-counted pointer type intended for sharing memory within a thread.
3,What is the difference between Rc and Arc?,The Arc type is the threadsafe equivalent of the Rc type. It provides the same functionality but requires the contained type T to be shareable.
4,Which Rust type is used for heap allocation of arrays?,"Vec is a contiguous growable array type with heap-allocated contents, written Vec<T>."
5,What does the alloc crate provide?,This library provides smart pointers and collections for managing heap-allocated values.
6,What is the Rust Core Library?,The Rust Core Library is the dependency-free foundation of The Rust Standard Library. It defines the intrinsic and primitive building blocks of all Rust code.
7,Does the core library provide I/O or concurrency?,"The core library is minimal: it isn’t even aware of heap allocation, nor does it provide concurrency or I/O."
8,What is the boolean type in Rust?,The boolean type is represented as bool.
9,What function handles panics in the core library?,"This function takes one argument, a &panic::PanicInfo. Consumers of the core library must define it with #[panic_handler]."
10,What does the Option type represent?,Option represents optional values and is used for values that may or may not be present.
11,What is the purpose of the proc_macro crate?,A support library for macro authors when defining new macros. It provides the types consumed in the interfaces of procedurally defined macro definitions such as function-like macros #[proc_macro].
12,What does the TokenStream type represent?,"TokenStream represents an abstract stream of tokens, or, more specifically, a sequence of token trees."
13,What does the quote! macro do?,The quote! macro accepts arbitrary tokens and expands into a TokenStream describing the input.
14,What does the Ident struct represent in proc_macro?,Ident represents an identifier.
15,What does the Span type represent in proc_macro?,"Span represents a region of source code, along with macro expansion information."
16,What is the Rust Standard Library?,"The Rust Standard Library is the foundation of portable Rust software, providing core types, macros, I/O, multithreading, and more."
17,What does the Result type represent in Rust?,"The Result<T, E> type is used for error handling and represents either success (Ok) or failure (Err)."
18,What is the Rust Prelude?,"The Rust Prelude is a small collection of items, mostly traits, that are imported into every module of every crate by default."
19,What does the collections module provide?,"The collections module defines maps, sets, linked lists and other typical collection types, including HashMap<K, V>."
20,What types are used for contiguous memory in Rust?,"Vec<T>, [T; N], and [T] are the three common ways to deal with contiguous regions of memory."
21,What is the purpose of the test crate?,Support code for rustc’s built-in unit-test and micro-benchmarking framework.
22,What does the Bencher struct represent?,Bencher is used for benchmarking in Rust's test framework.
23,What is the black_box function used for?,The black_box function is used in benchmarks to prevent compiler optimizations on values.
24,What does the assert_test_result function do?,It is invoked when unit tests terminate. Returns Result::Err if the test is considered a failure.
25,What does the run_tests_console function do?,It runs provided tests reporting process and results to the stdout.
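Review note: several context values contain commas and are therefore quoted, so the file should be read with a CSV parser rather than split on commas. A minimal sketch using the standard csv module follows, assuming the id,question,context header shown above. The helper name load_dataset is hypothetical, and the sample row is copied from row 4 of this file.

```python
import csv
import io


def load_dataset(fileobj):
    """Parse the id,question,context CSV into a list of dicts with an int id."""
    reader = csv.DictReader(fileobj)
    return [
        {"id": int(row["id"]), "question": row["question"], "context": row["context"]}
        for row in reader
    ]


# In-memory sample so the sketch runs without the real file.
sample = io.StringIO(
    "id,question,context\n"
    "4,Which Rust type is used for heap allocation of arrays?,"
    '"Vec is a contiguous growable array type with heap-allocated contents, written Vec<T>."\n'
)
rows = load_dataset(sample)
```

For the real file, open it with open(path, newline="", encoding="utf-8") and pass the file object to load_dataset.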