The CodeQwen 1.5 model supports fill-in-the-middle (https://github.com/QwenLM/CodeQwen1.5?tab=readme-ov-file#2-file-level-code-completion-fill-in-the-middle), so I was hoping to use the /infill
API to leverage it.
After #6689 was merged I expected it to work out of the box, but I guess the FIM tokens are only set correctly in the GGUF model files for CodeLlama and CodeGemma, not for CodeQwen?
I tested it with codeqwen-1_5-7b-chat-q3_k_m.gguf:
curl --location 'http://localhost:9090/infill' \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --data '{
    "prompt": "",
    "input_prefix": "public int gcd(int x, int y) {",
    "input_suffix": "\n}",
    "n_predict": 100,
    "stream": false
  }'
Which gave the following response:
{
  "content": "WriteLine (\n '\n{\n \"id\": \"x\",\n \"name\": \"x\",\n \"description\": \"x\",\n \"version\": \"x\",\n \"author\": \"x\",\n \"license\": \"x\",\n \"type\": \"x\",\n \"main\": \"x\",\n \"dependencies\": [],\n \"devDependencies\": [],\n \"scripts\": {\n \"start\": \"node x",
  "id_slot": 0,
  "stop": true,
  "model": "/home/user/Downloads/codeqwen-1_5-7b-chat-q3_k_m.gguf",
  //...
}
This looks like gibberish. I suppose llama.cpp can't find the FIM prefix, suffix, and middle tokens, so the assembled prompt doesn't make any sense to the model?
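For reference, here is a minimal sketch of how a FIM prompt would be assembled manually when the sentinel tokens are known. The `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` token names follow the CodeQwen 1.5 README linked above, but they are an assumption here; verify them against the model's actual tokenizer config before relying on this as a workaround (e.g. sending the assembled string to the plain /completion endpoint instead of /infill).

```python
# Hedged sketch: manually building a fill-in-the-middle prompt.
# The sentinel token spellings below are assumed from the CodeQwen 1.5
# README and may differ per model; check the tokenizer metadata.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the cursor in FIM sentinels;
    the model then generates the missing middle after <fim_middle>."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt("public int gcd(int x, int y) {", "\n}")
print(prompt)
```

If /infill were reading the right token IDs from the GGUF metadata, it would be producing an equivalent prompt internally from `input_prefix` and `input_suffix`.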
The same request, but with CodeLlama, produces a much more expected answer:
{
  "content": "\n return (x % y == 0) ? y : gcd(y, x % y);\n }\n\n public static void main(String[] args) {\n int x = 30, y = 20;\n GCD gcd = new GCD();\n System.out.println(gcd.gcd(x, y));\n }\n}\n\n// 30\n\n// 20",
  "id_slot": 0,
  "stop": true,
  "model": "/home/user/.codegpt/models/gguf/codellama-7b-instruct.Q4_K_M.gguf",
  "tokens_predicted": 100,
  "tokens_evaluated": 18,
  //...
}