train-text-from-scratch.exe stops after "begin training" (tensor->src0 is null) #1869

@Entretoize

Description

I'm running the latest release (master-254a7a7) as follows:

bin\train-text-from-scratch.exe --vocab-model models\ggml-vocab.bin --checkpoint-in chk-lamartine-256x16.bin --checkpoint-out chk-lamartine-256x16.bin --model-out ggml-lamartine-265x16-f32.bin --train-data "shakespeare.txt"
I have tried this with several models.

Expected Behavior

Training should run for a long time.

Current Behavior

Training stops immediately without printing any error:

D:\git\llama.cpp>bin\train-text-from-scratch.exe --vocab-model models\ggml-vocab.bin --ctx 64 --embd 256 --head 8 --layer 16 --checkpoint-in chk-lamartine-256x16.bin --checkpoint-out chk-lamartine-256x16.bin --model-out ggml-lamartine-265x16-f32.bin --train-data "alphonsedelamartine.txt" -t 6 -b 1 -n 32 --seed 2 --adam-iter 16 --print-details-interval 0 --predict 16 --use-flash
main: seed: 2
llama.cpp: loading model from models\ggml-vocab.bin
llama_model_load_internal: format     = ggjt v1 (pre #1405)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 1 (mostly F16)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
main: tokenize training data
main: number of training tokens: 474
print_params: n_vocab: 32000
print_params: n_ctx:   64
print_params: n_embd:  256
print_params: n_mult:  256
print_params: n_head:  8
print_params: n_ff:    768
print_params: n_layer: 16
print_params: n_rot:   32
main: number of unique tokens: 253
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080
main: init model
load_checkpoint: Training iterations: 0.
load_checkpoint: Training samples:    0.
load_checkpoint: Training tokens:     0.
main: opt iter 0
used_mem model+cache: 242364416 bytes
main: begin training
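The `n_ff` and `n_rot` values in the log are not passed on the command line; they are derived from `n_embd`, `n_mult`, and `n_head`. A minimal sketch, assuming the feed-forward sizing rule llama.cpp used around this release (round 2/3 of 4 * `n_embd` up to a multiple of `n_mult`), reproduces them. `compute_n_ff` and `compute_n_rot` are hypothetical helper names for illustration only.

```python
# Sketch: recompute the derived hyperparameters printed in the log.
# Assumption: n_ff follows llama.cpp's sizing rule of the time, i.e.
# 2 * (4 * n_embd) / 3 rounded up to a multiple of n_mult (integer math).

def compute_n_ff(n_embd: int, n_mult: int) -> int:
    """Feed-forward width, rounded up to a multiple of n_mult."""
    hidden = 2 * (4 * n_embd) // 3
    return ((hidden + n_mult - 1) // n_mult) * n_mult


def compute_n_rot(n_embd: int, n_head: int) -> int:
    """Rotary dimension, equal to the size of one attention head."""
    return n_embd // n_head


if __name__ == "__main__":
    # Model being trained (print_params lines).
    print("train:", compute_n_ff(256, 256), compute_n_rot(256, 8))      # 768 32
    # Vocab model header (llama_model_load_internal lines).
    print("vocab:", compute_n_ff(4096, 256), compute_n_rot(4096, 32))  # 11008 128
```

Both the trained model (768, 32) and the 7B vocab header (11008, 128) match the printed values, so the hyperparameter setup at least appears self-consistent up to the point where training begins.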

Environment and Context

Windows 11
NVIDIA GeForce RTX 3080
Ryzen 7 2700
32 GB RAM
