[apologies for early send, accidentally hit enter]
Hey there! Turns out we think on extremely similar wavelengths - I did the exact same thing as you, for the exact same reasons (libraryification), and using similar abstractions: https://github.com/philpax/ggllama
A couple of differences I spotted on a quick perusal:

- My version builds on both Windows and Linux, but fails to infer correctly past the first round. Windows performance is also pretty crappy because ggml doesn't support multithreading on Windows.
- I use `PhantomData` with the `Tensor`s to prevent them from outliving the `Context` they're spawned from (see the sketch after this list).
- I vendored llama.cpp in so that I could track it more directly, use its ggml.c/h, and make it obvious which version I was porting.
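For anyone unfamiliar with the `PhantomData` trick, here's a minimal sketch of the idea (the types and method names are illustrative, not the actual ggllama API): a zero-sized `PhantomData<&'a Context>` field makes the borrow checker treat each tensor as borrowing its context, even though the real structs would only hold raw ggml pointers.

```rust
use std::marker::PhantomData;

/// Stand-in for a wrapper around a raw `*mut ggml_context` (illustrative).
struct Context {
    _private: (),
}

/// Stand-in for a wrapper around a raw `*mut ggml_tensor` (illustrative).
/// The `PhantomData<&'a Context>` field ties each tensor's lifetime to the
/// context it came from, even though no actual reference is stored.
struct Tensor<'a> {
    _private: (),
    _marker: PhantomData<&'a Context>,
}

impl Context {
    fn new() -> Self {
        Context { _private: () }
    }

    /// Borrowing `&self` here means the returned tensor cannot outlive `self`.
    fn new_tensor_1d(&self, _n: usize) -> Tensor<'_> {
        Tensor {
            _private: (),
            _marker: PhantomData,
        }
    }
}

fn main() {
    let ctx = Context::new();
    let tensor = ctx.new_tensor_1d(16);
    // drop(ctx); // error[E0505]: cannot move out of `ctx` because it is borrowed
    let _ = tensor;
}
```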
Given that yours actually works, I think it's more promising :p
What are your immediate plans, and what do you want people to help you out with? My plan was to get it working, then librarify it, make a standalone Discord bot with it as a showcase, and then investigate using a Rust-native solution for the tensor manipulation (burn, ndarray, arrayfire, etc.) to free it from the ggml dependency - see the sketch below for one possible shape that could take.
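To make that last step concrete, here's one hypothetical way such an abstraction could look (entirely illustrative - none of these trait or method names come from burn, ndarray, or arrayfire): the model code is written once against a backend trait, and ggml or a Rust-native crate plugs in behind it.

```rust
/// Hypothetical backend trait (names are illustrative): the model is
/// written against this interface, so the tensor library becomes swappable.
trait TensorBackend {
    type Tensor;

    fn matmul(&self, a: &Self::Tensor, b: &Self::Tensor) -> Self::Tensor;
    fn softmax(&self, x: &Self::Tensor) -> Self::Tensor;
}

/// A piece of the model written once, generically over any backend.
fn attention_scores<B: TensorBackend>(backend: &B, q: &B::Tensor, k: &B::Tensor) -> B::Tensor {
    // scores = softmax(Q . K^T) -- transpose and scaling elided for brevity.
    let scores = backend.matmul(q, k);
    backend.softmax(&scores)
}

/// Toy 1x1 "tensor" backend, just to show the trait being implemented.
struct Scalar;

impl TensorBackend for Scalar {
    type Tensor = f32;

    fn matmul(&self, a: &f32, b: &f32) -> f32 {
        a * b
    }

    fn softmax(&self, _x: &f32) -> f32 {
        1.0 // softmax over a single element is always 1
    }
}

fn main() {
    let backend = Scalar;
    let s = attention_scores(&backend, &0.5, &2.0);
    assert_eq!(s, 1.0);
}
```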