Add special token modification capability #6778
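For orientation: the capability this PR adds is exposed through the gguf-py metadata script. Below is a minimal sketch of how it might be invoked; the script path, flag name, and token values are assumptions drawn from the commit messages ("Add special token modification capability", "improve help text"), not confirmed by this page:

    # Hypothetical invocation -- script path and flags are assumptions, not verified:
    python3 gguf-py/scripts/gguf-new-metadata.py input.gguf output.gguf \
        --special-token eos '<|im_end|>'

The idea is to rewrite a GGUF file's special-token metadata in place of conversion, without requantizing or otherwise touching the tensor data.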

Closed
wants to merge 142 commits into from
Changes from all commits
142 commits
9e4968c
Add special token modification capability
CISC Apr 20, 2024
8d36967
improve help text
CISC Apr 20, 2024
c4e6f6f
flake--
CISC Apr 20, 2024
a2410b6
fix multiple tokens warning
CISC Apr 20, 2024
aed82f6
common : try to fix Android CI (#6780)
ggerganov Apr 20, 2024
b8109bc
doc : server tests require llama to be built with curl enabled (#6788)
kaetemi Apr 20, 2024
e5956f5
make script executable
CISC Apr 21, 2024
ff5d21e
switch to namedtuple, no need to dataclass
CISC Apr 21, 2024
b97bc39
llama : support Llama 3 HF conversion (#6745)
pcuenca Apr 21, 2024
89b0bf0
llava : use logger in llava-cli (#6797)
jart Apr 21, 2024
2cca09d
readme : add Fedora instructions (#6783)
Man2Dev Apr 21, 2024
e8d35f4
doc : add link to falcon (#6789)
kaetemi Apr 21, 2024
c1386c9
gguf-py : add IQ1_M to GGML_QUANT_SIZES (#6761)
pmysl Apr 21, 2024
7dbdba5
llama : add llama-3 chat template (#6751)
DifferentialityDevelopment Apr 21, 2024
b9cc76d
ggml : fix ggml_backend_cpu_supports_op() for CPY (#0)
ggerganov Apr 21, 2024
40f74e4
llama : add option to render special/control tokens (#6807)
ggerganov Apr 21, 2024
5cf5e7d
`build`: generate hex dump of server assets during build (#6661)
ochafik Apr 21, 2024
e9b4a1b
flake.lock: Update
github-actions[bot] Apr 21, 2024
c0956b0
ci: fix job are cancelling each other (#6781)
phymbert Apr 22, 2024
8960fe8
llama : fix typo in <|im_end|> token text (#6745)
ggerganov Apr 22, 2024
e931888
ggml : fix calloc argument ordering. (#6820)
airlied Apr 22, 2024
192090b
llamafile : improve sgemm.cpp (#6796)
jart Apr 22, 2024
4e96a81
[SYCL] Windows default build instructions without -DLLAMA_SYCL_F16 fl…
aahouzi Apr 23, 2024
c8297c6
llama : add phi3 support (#6852)
liuwei-git Apr 24, 2024
3fec68b
convert : add support of codeqwen due to tokenizer (#6707)
JustinLin610 Apr 24, 2024
abd3314
llama : add phi 3 chat template (#6857)
tristandruyen Apr 24, 2024
c0d1b3e
ggml : move 32-bit arm compat in ggml-impl.h (#6865)
ggerganov Apr 24, 2024
28103f4
Server: fix seed for multiple slots (#6835)
JohannesGaessler Apr 24, 2024
37246b1
common : revert showing control tokens by default for server (#6860)
K-Mistele Apr 24, 2024
3fe847b
server : do not apply Markdown formatting in code sections (#6850)
mgroeber9110 Apr 24, 2024
b4e4b8a
llama : add llama_get_pooling_type function (#6862)
iamlemec Apr 24, 2024
784e11d
README: add graphic for matrix multiplication (#6881)
JohannesGaessler Apr 24, 2024
1966eb2
quantize : add '--keep-split' to quantize model into shards (#6688)
zj040045 Apr 25, 2024
aa750c1
tests : minor bash stuff (#6902)
ggerganov Apr 25, 2024
5477041
ggml : fix MIN / MAX macros (#6904)
ggerganov Apr 25, 2024
4ab99d8
clip : rename lerp function to avoid conflict (#6894)
danbev Apr 25, 2024
5154372
ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (#6906)
ggerganov Apr 25, 2024
0ead1f1
llama : check that all the tensor data is in the model file (#6885)
slaren Apr 25, 2024
3fe0596
readme : update model list (#6908)
BarfingLemurs Apr 25, 2024
853d06f
ci : tmp disable slow tests
ggerganov Apr 25, 2024
d6e1d44
llama : synchronize before get/set session data (#6911)
slaren Apr 25, 2024
fa0b4ad
cmake : remove obsolete ANDROID check
ggerganov Apr 25, 2024
dba497e
cmake : restore LLAMA_LLAMAFILE_DEFAULT
ggerganov Apr 25, 2024
46e12c4
llava : add support for moondream vision language model (#6899)
vikhyat Apr 25, 2024
5790c8d
bench: server add stop word for PHI-2 (#6916)
phymbert Apr 26, 2024
7d641c2
ci: fix concurrency for pull_request_target (#6917)
phymbert Apr 26, 2024
d4a9afc
ci: server: fix python installation (#6918)
phymbert Apr 26, 2024
83b72cb
Merge pull request from GHSA-p5mv-gjc5-mwqv
ggerganov Apr 26, 2024
9e4e077
ci: server: fix python installation (#6922)
phymbert Apr 26, 2024
7f5ff55
server: stop generation at `n_ctx_train` if `n_predict` is not set (#…
phymbert Apr 26, 2024
bbe3c6e
ci: server: fix python installation (#6925)
phymbert Apr 26, 2024
4b1c3c9
llamafile : use 64-bit integers in sgemm (#6928)
jart Apr 26, 2024
e2764cd
gguf : fix mismatch between alloc and free functions (#6929)
slaren Apr 26, 2024
017e699
add basic tensor data validation function (#6884)
slaren Apr 26, 2024
0c4d489
quantize: add imatrix and dataset metadata in GGUF (#6658)
phymbert Apr 26, 2024
928e0b7
Reset schedule earlier to allow overlap with ggml graph computation o…
agray3 Apr 26, 2024
b736833
ci: server: tests python env on github container ubuntu latest / fix …
phymbert Apr 27, 2024
4dba7e8
Replace "alternative" boolean operator in conditional compilation dir…
mgroeber9110 Apr 27, 2024
6e472f5
flake.lock: Update
github-actions[bot] Apr 28, 2024
ce023f6
add device version in device list (#6959)
arthw Apr 28, 2024
7bb36cc
gguf : enforce that tensor names are unique (#6905)
ngxson Apr 28, 2024
e00b4a8
Fix more int overflow during quant (PPL/CUDA). (#6563)
dranger003 Apr 28, 2024
c4f708a
llama : fix typo LAMMAFILE -> LLAMAFILE (#6974)
JohannesGaessler Apr 29, 2024
ca7f29f
ci : add building in MSYS2 environments (Windows) (#6967)
przemoc Apr 29, 2024
577277f
make : change GNU make default CXX from g++ to c++ (#6966)
przemoc Apr 29, 2024
3055a41
convert : fix conversion of some BERT embedding models (#6937)
christianazinn Apr 29, 2024
3f16747
sampling : use std::random_device{}() for default random seed (#6962)
dwrensha Apr 29, 2024
f4ab2a4
llama : fix BPE pre-tokenization (#6920)
ggerganov Apr 29, 2024
24affa7
readme : update hot topics
ggerganov Apr 29, 2024
ffe6665
llava-cli : multiple images (#6969)
cpumaxx Apr 29, 2024
544f1f1
ggml : fix __MSC_VER -> _MSC_VER (#6977)
ggerganov Apr 29, 2024
d2c898f
ci : tmp disable gguf-split (#6983)
ggerganov Apr 29, 2024
b8a7a5a
build(cmake): simplify instructions (`cmake -B build && cmake --build…
ochafik Apr 29, 2024
5539e6f
main : fix typo in comment in main.cpp (#6985)
danbev Apr 29, 2024
b8c1476
Extending grammar integration tests (#6644)
HanClinto Apr 29, 2024
8843a98
Improve usability of --model-url & related flags (#6930)
ochafik Apr 29, 2024
952d03d
convert : use utf8 encoding (#7000)
ggerganov Apr 30, 2024
9c67c27
ggml : add Flash Attention (#5021)
ggerganov Apr 30, 2024
a68a1e7
metal : log more info on error (#6987)
bakkot Apr 30, 2024
77e15be
metal : remove deprecated error code (#7008)
ggerganov Apr 30, 2024
f364eb6
switch to using localizedDescription (#7010)
bakkot Apr 30, 2024
a8f9b07
perplexity: more statistics, added documentation (#6936)
JohannesGaessler Apr 30, 2024
c4ec9c0
ci : exempt confirmed bugs from being tagged as stale (#7014)
slaren May 1, 2024
1613ef8
CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019)
JohannesGaessler May 1, 2024
3ea0d36
Server: add tests for batch size, different seeds (#6950)
JohannesGaessler May 1, 2024
8d608a8
main : fix off by one error for context shift (#6921)
l3utterfly May 1, 2024
b0d943d
Update LOG_IMPL and LOG_TEE_IMPL (#7029)
a-downing May 1, 2024
6ecf318
chore: fix typo in llama.cpp (#7032)
alwqx May 2, 2024
60325fa
Remove .attention from skipped tensors to match more accurately (#7051)
bartowski1182 May 2, 2024
433def2
llama : rename ctx to user_data in progress_callback (#7045)
danbev May 3, 2024
a2ac89d
convert.py : add python logging instead of print() (#6511)
mofosyne May 3, 2024
92139b9
tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)
ggerganov May 4, 2024
03fb8a0
If first token generated from the server is the stop word the server …
maor-ps May 4, 2024
fcd84a0
Fix Linux /sys cpu path to guess number of cores (#7064)
viric May 4, 2024
cf768b7
Tidy Android Instructions README.md (#7016)
Jeximo May 4, 2024
8425001
gguf-split: add --no-tensor-first-split (#7072)
ngxson May 4, 2024
d39f203
typing++
CISC May 4, 2024
158215c
add progress bar
CISC May 4, 2024
6fbd432
py : logging and flake8 suppression refactoring (#7081)
mofosyne May 5, 2024
889bdd7
command-r : add BPE pre-tokenization (#7063)
dranger003 May 5, 2024
ca36326
readme : add note that LLaMA 3 is not supported with convert.py (#7065)
lyledean1 May 5, 2024
8f8acc8
Disable benchmark on forked repo (#7034)
CISC May 5, 2024
628b299
Adding support for the --numa argument for llama-bench. (#7080)
kunnis May 5, 2024
bcdee0d
minor : fix trailing whitespace
ggerganov May 6, 2024
b3a995b
flake.lock: Update (#7079)
ggerganov May 6, 2024
858f6b7
Add an option to build without CUDA VMM (#7067)
WilliamTambellini May 6, 2024
947d3ad
ci : add GG_BUILD_EXTRA_TESTS_0 env (#7098)
ggerganov May 7, 2024
04976db
docs: fix typos (#7124)
omahs May 7, 2024
3af34c1
main : update log text (EOS to EOG) (#7104)
RhinoDevel May 7, 2024
53d6c52
readme : update hot topics
ggerganov May 7, 2024
260b7c6
server : update readme with undocumented options (#7013)
K-Mistele May 7, 2024
b6aa670
Fix OLMo HF to GGUF conversion (#6910)
nopperl May 7, 2024
af0a5b6
server: fix incorrectly reported token probabilities (#7125)
JohannesGaessler May 7, 2024
48b2f9c
Fixed save_imatrix to match old behaviour for MoE (#7099)
jukofyork May 8, 2024
c780e75
Further tidy on Android instructions README.md (#7077)
Jeximo May 8, 2024
c0e6fbf
metal : fix unused warning
ggerganov May 8, 2024
3855416
ggml : introduce bfloat16 support (#6412)
jart May 8, 2024
acdce3c
compare-llama-bench.py: add missing basicConfig (#7138)
mofosyne May 8, 2024
7e0b6a7
py : also print the normalizers
ggerganov May 8, 2024
4cd621c
convert : add BPE pre-tokenization for DBRX (#7132)
dranger003 May 8, 2024
1fd9c17
clean up json_value & server_log (#7142)
ngxson May 8, 2024
229ffff
llama : add BPE pre-tokenization for Qwen2 (#7114)
jklj077 May 8, 2024
ad211ed
convert.py : --vocab-only generates false but valid params (#7027)
20kdc May 8, 2024
911b390
server : add_special option for tokenize endpoint (#7059)
JohanAR May 8, 2024
465263d
sgemm : AVX Q4_0 and Q8_0 (#6891)
netrunnereve May 8, 2024
83330d8
main : add --conversation / -cnv flag (#7108)
May 8, 2024
26458af
metal : use `vm_allocate` instead of `posix_memalign` on macOS (#7078)
giladgd May 8, 2024
bd1871f
server : add themes + favicon (#6848)
jboero May 8, 2024
9da243b
Revert "llava : add support for moondream vision language model (#6899)"
ggerganov May 8, 2024
c12452c
JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143)
JohannesGaessler May 8, 2024
bc4bba3
Introduction of CUDA Graphs to LLama.cpp (#6766)
agray3 May 8, 2024
f98eb31
convert-hf : save memory with lazy evaluation (#7075)
compilade May 8, 2024
4426e29
cmake : fix typo (#7151)
cebtenzzre May 8, 2024
ed72533
Add special token modification capability
CISC Apr 20, 2024
27caf19
improve help text
CISC Apr 20, 2024
8737ca1
flake--
CISC Apr 20, 2024
3e3e7c3
fix multiple tokens warning
CISC Apr 20, 2024
bc92f65
make script executable
CISC Apr 21, 2024
87e2d73
switch to namedtuple, no need to dataclass
CISC Apr 21, 2024
981bd44
typing++
CISC May 4, 2024
609df3c
add progress bar
CISC May 4, 2024
144d99a
Merge branch 'modify-special-tokens-metadata' of github.com:CISC/llam…
CISC May 9, 2024
8 changes: 3 additions & 5 deletions .devops/main-intel.Dockerfile
@@ -10,14 +10,12 @@ WORKDIR /app

 COPY . .

-RUN mkdir build && \
-    cd build && \
-    if [ "${LLAMA_SYCL_F16}" = "ON" ]; then \
+RUN if [ "${LLAMA_SYCL_F16}" = "ON" ]; then \
         echo "LLAMA_SYCL_F16 is set" && \
         export OPT_SYCL_F16="-DLLAMA_SYCL_F16=ON"; \
     fi && \
-    cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ${OPT_SYCL_F16} && \
-    cmake --build . --config Release --target main
+    cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ${OPT_SYCL_F16} && \
+    cmake --build build --config Release --target main

 FROM intel/oneapi-basekit:$ONEAPI_VERSION as runtime
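A recurring change across the .devops and CI hunks in this PR is the build invocation: the stateful mkdir build && cd build idiom is replaced by cmake -B. The two forms are equivalent; a minimal shell sketch of the before/after:

    # Old pattern: creates and enters build/ by hand; mkdir fails on re-run
    mkdir build && cd build
    cmake .. && cmake --build . --config Release --target main

    # New pattern: cmake manages build/ itself; no directory change, safe to re-run
    cmake -B build
    cmake --build build --config Release --target main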
6 changes: 2 additions & 4 deletions .devops/main-vulkan.Dockerfile
@@ -14,10 +14,8 @@ RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key
 # Build it
 WORKDIR /app
 COPY . .
-RUN mkdir build && \
-    cd build && \
-    cmake .. -DLLAMA_VULKAN=1 && \
-    cmake --build . --config Release --target main
+RUN cmake -B build -DLLAMA_VULKAN=1 && \
+    cmake --build build --config Release --target main

 # Clean up
 WORKDIR /
8 changes: 3 additions & 5 deletions .devops/server-intel.Dockerfile
@@ -10,14 +10,12 @@ WORKDIR /app

 COPY . .

-RUN mkdir build && \
-    cd build && \
-    if [ "${LLAMA_SYCL_F16}" = "ON" ]; then \
+RUN if [ "${LLAMA_SYCL_F16}" = "ON" ]; then \
         echo "LLAMA_SYCL_F16 is set" && \
         export OPT_SYCL_F16="-DLLAMA_SYCL_F16=ON"; \
     fi && \
-    cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
-    cmake --build . --config Release --target server
+    cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
+    cmake --build build --config Release --target server

 FROM intel/oneapi-basekit:$ONEAPI_VERSION as runtime
6 changes: 2 additions & 4 deletions .devops/server-vulkan.Dockerfile
@@ -18,10 +18,8 @@ RUN apt-get update && \
 # Build it
 WORKDIR /app
 COPY . .
-RUN mkdir build && \
-    cd build && \
-    cmake .. -DLLAMA_VULKAN=1 -DLLAMA_CURL=1 && \
-    cmake --build . --config Release --target server
+RUN cmake -B build -DLLAMA_VULKAN=1 -DLLAMA_CURL=1 && \
+    cmake --build build --config Release --target server

 # Clean up
 WORKDIR /
16 changes: 15 additions & 1 deletion .flake8
@@ -1,3 +1,17 @@
 [flake8]
 max-line-length = 125
-ignore = W503
+ignore = E203,E211,E221,E225,E231,E241,E251,E261,E266,E501,E701,E704,W503
+exclude =
+    # Do not traverse examples
+    examples,
+    # Do not include package initializers
+    __init__.py,
+    # No need to traverse our git directory
+    .git,
+    # There's no value in checking cache directories
+    __pycache__,
+    # No need to include the build path
+    build,
+    # This contains builds that we don't want to check
+    dist  # This is generated with `python build .` for package releases
+# max-complexity = 10
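Because the ignore and exclude lists now live in .flake8 rather than in the workflow, local runs pick up the same rules automatically; for example (flake8-no-print is the plugin the CI and pre-commit configs below install):

    pip install flake8 flake8-no-print
    flake8    # reads max-line-length, ignore and exclude from .flake8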
22 changes: 16 additions & 6 deletions .github/workflows/bench.yml
@@ -32,7 +32,7 @@ on:
     - cron: '04 2 * * *'

 concurrency:
-  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}-${{ github.event.inputs.sha }}
+  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.head_ref || github.run_id }}-${{ github.event.inputs.sha }}
   cancel-in-progress: true

 jobs:
@@ -52,7 +52,19 @@ jobs:
       ftype: q4_0
       pr_comment_enabled: "true"

-    if: ${{ github.event.inputs.gpu-series == 'Standard_NC4as_T4_v3' || github.event.schedule || github.event.pull_request || github.head_ref == 'master' || github.ref_name == 'master' || github.event.push.ref == 'refs/heads/master' }}
+    if: |
+      inputs.gpu-series == 'Standard_NC4as_T4_v3'
+      || (
+        github.event_name == 'schedule'
+        && github.ref_name == 'master'
+        && github.repository_owner == 'ggerganov'
+      )
+      || github.event_name == 'pull_request_target'
+      || (
+        github.event_name == 'push'
+        && github.event.ref == 'refs/heads/master'
+        && github.repository_owner == 'ggerganov'
+      )
     steps:
       - name: Clone
         id: checkout
@@ -96,9 +108,7 @@ jobs:
         id: cmake_build
         run: |
           set -eux
-          mkdir build
-          cd build
-          cmake .. \
+          cmake -B build \
             -DLLAMA_NATIVE=OFF \
             -DLLAMA_BUILD_SERVER=ON \
             -DLLAMA_CURL=ON \
@@ -109,7 +119,7 @@ jobs:
             -DLLAMA_FATAL_WARNINGS=OFF \
             -DLLAMA_ALL_WARNINGS=OFF \
             -DCMAKE_BUILD_TYPE=Release;
-          cmake --build . --config Release -j $(nproc) --target server
+          cmake --build build --config Release -j $(nproc) --target server

       - name: Download the dataset
         id: download_dataset
57 changes: 57 additions & 0 deletions .github/workflows/build.yml
@@ -593,6 +593,63 @@ jobs:
       run: |
         make swift

+  windows-msys2:
+    runs-on: windows-latest
+
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - { sys: UCRT64,  env: ucrt-x86_64,  build: Release }
+          - { sys: CLANG64, env: clang-x86_64, build: Release }
+
+    steps:
+      - name: Clone
+        uses: actions/checkout@v4
+
+      - name: Setup ${{ matrix.sys }}
+        uses: msys2/setup-msys2@v2
+        with:
+          update: true
+          msystem: ${{matrix.sys}}
+          install: >-
+            base-devel
+            mingw-w64-${{matrix.env}}-toolchain
+            mingw-w64-${{matrix.env}}-cmake
+            mingw-w64-${{matrix.env}}-openblas
+
+      - name: Build using make
+        shell: msys2 {0}
+        run: |
+          make -j $(nproc)
+
+      - name: Clean after building using make
+        shell: msys2 {0}
+        run: |
+          make clean
+
+      - name: Build using make w/ OpenBLAS
+        shell: msys2 {0}
+        run: |
+          make LLAMA_OPENBLAS=1 -j $(nproc)
+
+      - name: Build using CMake
+        shell: msys2 {0}
+        run: |
+          cmake -B build
+          cmake --build build --config ${{ matrix.build }} -j $(nproc)
+
+      - name: Clean after building using CMake
+        shell: msys2 {0}
+        run: |
+          rm -rf build
+
+      - name: Build using CMake w/ OpenBLAS
+        shell: msys2 {0}
+        run: |
+          cmake -B build -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
+          cmake --build build --config ${{ matrix.build }} -j $(nproc)
+
   windows-latest-cmake:
     runs-on: windows-latest
2 changes: 1 addition & 1 deletion .github/workflows/close-issue.yml
@@ -12,7 +12,7 @@ jobs:
     steps:
       - uses: actions/stale@v5
         with:
-          exempt-issue-labels: "refactor,help wanted,good first issue,research"
+          exempt-issue-labels: "refactor,help wanted,good first issue,research,bug"
          days-before-issue-stale: 30
          days-before-issue-close: 14
          stale-issue-label: "stale"
3 changes: 1 addition & 2 deletions .github/workflows/python-lint.yml
@@ -20,5 +20,4 @@ jobs:
       - name: flake8 Lint
         uses: py-actions/flake8@v2
         with:
-          ignore: "E203,E211,E221,E225,E231,E241,E251,E261,E266,E501,E701,E704,W503"
-          exclude: "examples/*,examples/*/**,*/**/__init__.py"
           plugins: "flake8-no-print"
41 changes: 19 additions & 22 deletions .github/workflows/server.yml
@@ -23,7 +23,7 @@ on:
     - cron: '2 4 * * *'

 concurrency:
-  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
+  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.head_ref || github.run_id }}
   cancel-in-progress: true

 jobs:
@@ -41,23 +41,16 @@ jobs:
           sanitizer: ""
       fail-fast: false # While -DLLAMA_SANITIZE_THREAD=ON is broken

-    container:
-      image: ubuntu:latest
-      ports:
-        - 8888
-      options: --cpus 4
-
     steps:
       - name: Dependencies
        id: depends
        run: |
-          apt-get update
-          apt-get -y install \
+          sudo apt-get update
+          sudo apt-get -y install \
            build-essential \
            xxd \
            git \
            cmake \
-            python3-pip \
            curl \
            wget \
            language-pack-en \
@@ -70,6 +63,17 @@ jobs:
          fetch-depth: 0
          ref: ${{ github.event.inputs.sha || github.event.pull_request.head.sha || github.sha || github.head_ref || github.ref_name }}

+      - name: Python setup
+        id: setup_python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Tests dependencies
+        id: test_dependencies
+        run: |
+          pip install -r examples/server/tests/requirements.txt
+
       - name: Verify server deps
         id: verify_server_deps
         run: |
@@ -90,20 +94,14 @@ jobs:
       - name: Build
         id: cmake_build
         run: |
-          mkdir build
-          cd build
-          cmake .. \
+          cmake -B build \
            -DLLAMA_NATIVE=OFF \
            -DLLAMA_BUILD_SERVER=ON \
            -DLLAMA_CURL=ON \
            -DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \
            -DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON ;
-          cmake --build . --config ${{ matrix.build_type }} -j $(nproc) --target server
+          cmake --build build --config ${{ matrix.build_type }} -j $(nproc) --target server

-      - name: Tests dependencies
-        id: test_dependencies
-        run: |
-          pip install -r examples/server/tests/requirements.txt

       - name: Tests
         id: server_integration_tests
@@ -129,6 +127,7 @@
         uses: actions/checkout@v4
         with:
           fetch-depth: 0
+          ref: ${{ github.event.inputs.sha || github.event.pull_request.head.sha || github.sha || github.head_ref || github.ref_name }}

       - name: libCURL
         id: get_libcurl
@@ -142,10 +141,8 @@
       - name: Build
         id: cmake_build
         run: |
-          mkdir build
-          cd build
-          cmake .. -DLLAMA_CURL=ON -DCURL_LIBRARY="$env:RUNNER_TEMP/libcurl/lib/libcurl.dll.a" -DCURL_INCLUDE_DIR="$env:RUNNER_TEMP/libcurl/include"
-          cmake --build . --config Release -j ${env:NUMBER_OF_PROCESSORS} --target server
+          cmake -B build -DLLAMA_CURL=ON -DCURL_LIBRARY="$env:RUNNER_TEMP/libcurl/lib/libcurl.dll.a" -DCURL_INCLUDE_DIR="$env:RUNNER_TEMP/libcurl/include"
+          cmake --build build --config Release -j ${env:NUMBER_OF_PROCESSORS} --target server

       - name: Python setup
         id: setup_python
20 changes: 20 additions & 0 deletions .gitignore
@@ -2,6 +2,7 @@
 *.a
 *.so
 *.gguf
+*.gguf.json
 *.bin
 *.exe
 *.dll
@@ -34,6 +35,7 @@ lcov-report/
 gcovr-report/

 build*
+!build.zig
 cmake-build-*
 out/
 tmp/
@@ -100,7 +102,25 @@ qnt-*.txt
 perf-*.txt

 examples/jeopardy/results.txt
+examples/server/*.html.hpp
+examples/server/*.js.hpp
+examples/server/*.mjs.hpp

 poetry.lock
 poetry.toml
 nppBackup
+
+# Test binaries
+/tests/test-grammar-parser
+/tests/test-llama-grammar
+/tests/test-double-float
+/tests/test-grad0
+/tests/test-opt
+/tests/test-quantize-fns
+/tests/test-quantize-perf
+/tests/test-sampling
+/tests/test-tokenizer-0
+/tests/test-tokenizer-1-spm
+/tests/test-tokenizer-1-bpe
+/tests/test-rope
+/tests/test-backend-ops
5 changes: 3 additions & 2 deletions .pre-commit-config.yaml
@@ -3,13 +3,14 @@
 exclude: prompts/.*.txt
 repos:
 - repo: https://github.com/pre-commit/pre-commit-hooks
-  rev: v3.2.0
+  rev: v4.6.0
   hooks:
   - id: trailing-whitespace
   - id: end-of-file-fixer
   - id: check-yaml
   - id: check-added-large-files
 - repo: https://github.com/PyCQA/flake8
-  rev: 6.0.0
+  rev: 7.0.0
   hooks:
   - id: flake8
+    additional_dependencies: [flake8-no-print]
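With the hook revisions bumped and the flake8-no-print dependency declared, the same checks can be reproduced locally with the standard pre-commit CLI:

    pip install pre-commit
    pre-commit run --all-files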