Commit 7c6c079

ggml : fix quantize_row_q8_0() ARM_NEON rounding
1 parent eafd47f commit 7c6c079

1 file changed: +1 -2 lines changed

ggml.c

Lines changed: 1 addition & 2 deletions
@@ -1093,8 +1093,7 @@ static void quantize_row_q8_0(const float * restrict x, void * restrict vy, int
 
         for (int l = 0; l < 8; l++) {
             const float32x4_t v = vmulq_n_f32(srcv[l], id);
-            //TODO: rounding
-            const int32x4_t vi = vcvtq_s32_f32(v);
+            const int32x4_t vi = vcvtnq_s32_f32(v);
 
             y[i].qs[4*l + 0] = vgetq_lane_s32(vi, 0);
             y[i].qs[4*l + 1] = vgetq_lane_s32(vi, 1);
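
For context on why the one-line change matters: `vcvtq_s32_f32` converts by truncating toward zero, while `vcvtnq_s32_f32` rounds to the nearest integer with ties going to even. The following is a minimal portable sketch of that difference in scalar C. It is not the ggml code: `cvt_truncate` and `cvt_nearest_even` are hypothetical stand-ins for a single lane of each intrinsic, and the sketch assumes inputs whose magnitude fits comfortably in `int32_t` (as scaled q8_0 values do).

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for one lane of vcvtq_s32_f32:
 * a C float-to-int cast truncates toward zero. */
static int32_t cvt_truncate(float v) {
    return (int32_t) v;
}

/* Hypothetical scalar stand-in for one lane of vcvtnq_s32_f32:
 * round to nearest, ties to even. Assumes |v| is small enough
 * that the cast below cannot overflow. */
static int32_t cvt_nearest_even(float v) {
    int32_t t = (int32_t) v;              /* truncated toward zero      */
    float frac = v - (float) t;           /* remainder, same sign as v  */
    float mag = frac < 0.0f ? -frac : frac;

    /* Step away from zero when past the halfway point, or exactly
     * at it while the truncated value is odd (so the result is even). */
    if (mag > 0.5f || (mag == 0.5f && (t & 1))) {
        t += (v < 0.0f) ? -1 : 1;
    }
    return t;
}
```

With these helpers, a scaled value of `126.9f` becomes 126 under truncation but 127 under nearest rounding, and `-2.7f` becomes -2 versus -3. Truncation therefore biases every quantized value toward zero by up to one step, whereas nearest rounding keeps the error within half a step.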
