Commit 611ec77

snadampal authored and hodlen committed
ggml : update softmax n_task calculation (ggml-org#5126)
Updated the softmax n_tasks calculation to use the maximum number of threads available instead of capping it at 4. This improved prompt eval performance by around 5% for DOT kernels and by around 10% for MMLA kernels on AWS Graviton3.
1 parent 2dbab79 commit 611ec77

1 file changed: +1 -1 lines changed


ggml.c

Lines changed: 1 addition & 1 deletion
@@ -16597,7 +16597,7 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) {
             } break;
         case GGML_OP_SOFT_MAX:
             {
-                n_tasks = MIN(MIN(4, n_threads), ggml_nrows(node->src[0]));
+                n_tasks = MIN(n_threads, ggml_nrows(node->src[0]));
             } break;
         case GGML_OP_CONV_TRANSPOSE_1D:
             {
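
To make the effect of the one-line change concrete, here is a minimal standalone sketch of the two formulas. The helper names `softmax_n_tasks_old` and `softmax_n_tasks_new` are hypothetical and do not exist in ggml.c; the row count is passed in directly rather than read from `ggml_nrows(node->src[0])`, and the `MIN` macro is assumed to have the usual ternary form.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the usual ternary MIN macro; defined here so the sketch stands alone. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper: the formula before the commit.
 * Softmax was limited to at most 4 tasks, however many threads were available. */
static int softmax_n_tasks_old(int n_threads, int64_t nrows) {
    assert(n_threads > 0 && nrows > 0);
    return (int) MIN(MIN(4, n_threads), nrows);
}

/* Hypothetical helper: the formula after the commit.
 * The task count follows the thread count and is still capped by the number of rows. */
static int softmax_n_tasks_new(int n_threads, int64_t nrows) {
    assert(n_threads > 0 && nrows > 0);
    return (int) MIN(n_threads, nrows);
}
```

Under these assumptions, a 16-thread run over 512 rows gets 4 tasks with the old formula and 16 with the new one. With 4 or fewer threads, or with fewer rows than threads, both formulas return the same value, so the change only matters on machines with more than 4 threads.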
