Add git-based build information for better issue tracking #1232

DannyDaemonic · 2023-04-29T13:05:47Z

Description:

This pull request adds a build-info.h header file, which will contain git-related build information such as branch and commit count. This will help users and maintainers to identify the build being used when reporting issues or bugs. The build-info.h file is generated using CMake and Makefile whenever .git/index changes, and is printed to stderr in main.cpp.

Changes:

Modified .gitignore to include build-info.h.
Added build information generation to CMakeLists.txt (and updated main's CMakeLists.txt to add a dependency).
Added build information generation to Makefile.
Modified examples/main/main.cpp to print build information at runtime.

This will allow us to track the build version in a more granular way and may be helpful in identifying and addressing issues or bugs until an official versioning system is introduced.

Edit: In the end this also ended being added to all examples.

slaren · 2023-04-29T14:20:30Z

I think this is a good idea. One question, currently it looks like this outputs the number of commits in HEAD, but how do we find what version they are using from this information? I think I would prefer a commit hash id, such as obtained by git rev-parse --short HEAD.

DannyDaemonic · 2023-04-29T14:56:49Z

@slaren

but how do we find what version they are using from this information?

To go the other direction:

git rev-list HEAD --skip=$(( $(git rev-list HEAD --count) - BUILD_NUMBER)) --max-count=1

So if you were looking for 461:

$ git rev-list HEAD --skip=$(( $(git rev-list HEAD --count) - 461)) --max-count=1
305eb5afd51325e3142c01c17431febb7c67de87

It's not as straight forward I suppose. I just thought a build number would be something easier for people to understand and without looking up the hash, judge how old the build is, and at the same time allow people (or documentation) to say "you need at least build number 461 to do X."

If you want the direct hash, how would you feel about a combination? Something like:

build = 461 (305eb5afd51325e3142c01c17431febb7c67de87)
seed  = 189273487

j-f1 · 2023-04-29T15:03:59Z

How about pulling out the commit date?

build = 2022-04-29 (305eb5afd51325e3142c01c17431febb7c67de87)

Green-Sky · 2023-04-29T15:08:14Z

Did you test how this behaves when it is a cmake subdirectory?

edit: also git submodules. which have a .git file that points to the root.

DannyDaemonic · 2023-04-29T15:35:29Z

@j-f1:

The date might be better for a quick judge of how old something is. I thought a straight number component would be more useful in the long run because it could be used more easily programmatically.

For example, if we wanted to tag all the new quantizations with which build number they were quantized in and there were some quantization bugs fixed in commit number 670, you just have to test if the build number in the quantized file was < 760 and then warn the user they should requantize for best performance. The hash alone wouldn't be good enough and a date would mostly work, but it's not exact (and time zones?) and no one wants to parse date strings.

@Green-Sky:

I tried it with main (which is in another subdirectory, if that's what you mean?) and it worked fine. The BUILD_INFO project it makes is itself relative to the source root so even when I write CMAKE_CURRENT_SOURCE_DIR it's still using that path and should use only the .git at that that particular location but there might be some edge cases I'm not thinking of? I can replace CMAKE_CURRENT_SOURCE_DIR with CMAKE_SOURCE_DIR just to cover my bases.

It's still all configured on my dev machine if there's a particular cmake command you want me to try out.

Green-Sky · 2023-04-29T16:31:02Z

I tried it with main (which is in another subdirectory, if that's what you mean?)

hm no.

The BUILD_INFO project it makes is itself relative to the source root so even when I write CMAKE_CURRENT_SOURCE_DIR it's still using that path and should use only the .git at that that particular location but there might be some edge cases I'm not thinking of?

I was thinking of including llama.cpp as a git submodule or git subtree and including the cmake via eg add_subirectory(external/llama.cpp) .

CMAKE_CURRENT_SOURCE_DIR

In the llama.cpp's project root directory is the way to go, I think.

DannyDaemonic · 2023-04-29T16:32:33Z

I committed a change so it shows the hash and the build number. I think this is better after all.

main: build = 464 (5e7d90a838c2543687fcff2e603b1189b6bc1084)
main: seed  = 1682785118

Previously, this would have shown my branch instead (conditionally, because it wasn't master), but now it just always shows the hash, which I do admit is easier. The example even points to this exact commit: 5e7d90a838c2543687fcff2e603b1189b6bc1084

And while I don't think it was necessary (because BUILD_INFO is treated as its own project in cmake), I didn't see any harm in replacing all the CMAKE_CURRENT_SOURCE_DIR with CMAKE_SOURCE_DIR just to be extra careful. (Spoiler, I undid this below.)

DannyDaemonic · 2023-04-29T16:37:15Z

@Green-Sky Oh I see what you're saying. If someone else is using this project in their own project, cmake should be smart about that since it would still be a sub directory, but I haven't tested it and now I think CMAKE_CURRENT_SOURCE_DIR was actually the right thing to do. I need to go back undo that.

DannyDaemonic · 2023-04-29T17:09:53Z

Alright, I put CMAKE_CURRENT_SOURCE_DIR back and checked for any other spots it may have caused issues when used with add_subdirectory(). I also made the building comments a little more descript.

sw

If the working directory is not a git repo (for example when downloading a ZIP from Github), the Cmake build fails:

...
-- Configuring done (0.4s)
CMake Error at examples/main/CMakeLists.txt:5 (add_dependencies):
  The dependency target "BUILD_INFO" of target "main" does not exist.

Can we handle this gracefully?

DannyDaemonic · 2023-04-29T19:07:16Z

@sw Yeah, it's embarrassingly easy to fix. Everything else is all set up to generate dummy values. I just overlooked that dependency line because I originally wasn't going to add it (since BUILD_INFO is it's own thing and part of ALL) but added it the last minute in case someone builds only main (which could cause other issues anyway), and it missed the .git testing. Give me a second.

Edit: git will still spit out an error claiming "not a git repository", but only when configuring, not during builds. I can also hide that but I think maybe it's good people see that so if they look at their logs they will understand why their build info is "unknown"?

sw · 2023-04-29T19:30:46Z

Thanks @DannyDaemonic, I've noticed another problem, though. CMake doesn't find the git executable:

-- Could NOT find Git (missing: GIT_EXECUTABLE)

This is on Ubuntu 22.04 with cmake 3.22.1 and git 2.34.1

Also seems to happen in the CI build: https://github.com/ggerganov/llama.cpp/actions/runs/4840382315/jobs/8625989141?pr=1232#step:4:22

I think you have to use find_package(Git) in the top-level CMakeLists.txt as well, not just in the generated BUILD_INFO.cmake.

examples/main/main.cpp

sw · 2023-04-29T19:58:34Z

Couldn't you add BUILD_INFO.cmake as a regular file in scripts/, instead of writing it to the build directory?

I think the goal of this PR is good, the implementation just seems very complicated. But I'm no Cmake expert.

DannyDaemonic · 2023-04-29T23:08:56Z

@sw

I think you have to use find_package(Git) in the top-level CMakeLists.txt as well, not just in the generated BUILD_INFO.cmake.

Hmm, I don't know why that would make a difference when BUILD_INFO has its own top-level project (at build/BUILD_INFO.vcxproj. Unless you mean it didn't work when called during the config but then it worked after? Either way, I added a fallback on which, that always works (outside of Windows anyway). I tested it and it was able to find git even in a subshell. I also double checked and it was working in Windows without the which work around. 🤷

As for making a separate cmake file, all of the other .cmake files are generated by the CMakeLists.txt files, so it seemed natural for it to create BUILD_INFO.cmake as well. It was also easier to work with the code in one place, but it has grown quite a bit since so I separated it out and left just the project stuff behind in CMakeLists.txt.

The other downside to separating it into scripts was it messed with the path stuff Green-Sky and I were talking about earlier. Even thought I was using CMAKE_CURRENT_SOURCE_DIR it was using the build directory and apparently you can't override the source directories with -D. I was able to find a way to do it with nmake 3.12, which luckily, we already require, it but it took longer than I'd like to admit to figure out what was wrong...

@slaren

Alright. I've put it in all the examples. I tried to place it after parameter checks that could cause an exit, but before the seed is written out so that they'd be printed together. This also means I had to update the makefile to depend on the .h file for generation. The only thing is this made the Makefile a bit messy. We could probably clean it up with a pattern. But I'm not sure that's desired. I can always try to clean it up in a separate PR to see if that's prefered.

slaren · 2023-04-29T23:46:33Z

This is very minor, but I wonder if the filter-out in the Makefile could be replaced by a more generic $(filter-out %.h,$^).

DannyDaemonic · 2023-04-30T00:27:12Z

@slaren Oh, Yeah, I could do that much without reshaping the makefile. Previously, I was thinking of something like this:

# Define the targets
TARGETS := main quantize quantize-stats perplexity embedding

# Define dependencies for each target
main_DEPS := ggml.o llama.o common.o $(OBJS)
quantize_DEPS := ggml.o llama.o $(OBJS)
quantize-stats_DEPS := ggml.o llama.o $(OBJS)
perplexity_DEPS := ggml.o llama.o common.o $(OBJS)
embedding_DEPS := ggml.o llama.o common.o $(OBJS)

# Pattern rule for compiling and linking
%.out: %.cpp $(%_DEPS) build-info.h
	$(CXX) $(CXXFLAGS) $(filter-out *.h,$^) -o $@ $(LDFLAGS)

But I don't know if that's too far removed from a ~~normal~~ simple makefile.

slaren · 2023-04-30T00:42:38Z

Yeah that probably would be better in the long run. I also noticed that the LLAMA_GPROF and LLAMA_PERF flags don't work with cuBLAS. There is some refactoring to do there, but that would be a different PR.

ggerganov

Use 4 spaces indentation in the CMake

DannyDaemonic · 2023-04-30T08:08:25Z

@ggerganov: Oh, good catch.

@slaren I tried to make that particular pattern/rule work but it was really not having it. The only way to join the target with a variable is to use an $eval.

It still looks great to use:

TARGETS_CPP += quantize
DEPS_quantize := examples/quantize/quantize.cpp ggml.o llama.o

TARGETS_CPP += quantize-stats
DEPS_quantize-stats := examples/quantize-stats/quantize-stats.cpp ggml.o llama.o

TARGETS_CPP += perplexity
DEPS_perplexity := examples/perplexity/perplexity.cpp ggml.o llama.o common.o

TARGETS_CPP += embedding
DEPS_embedding := examples/embedding/embedding.cpp ggml.o llama.o common.o

But then you have something like this at the end that includes all the extras (OBJS, build-info, etc):

define template_cpp
OUTP_$(1) ?= $(1)
$(1): $$(DEPS_$(1)) $$(OBJS) build-info.h
	$$(CXX) $$(CXXFLAGS) $$(filter-out build-info.h,$$^) -o $$(OUTP_$(1)) $$(LDFLAGS)
	$$(if $$(value EXEC_$(1)),$$(EXEC_$(1)))
endef

$(foreach target,$(TARGETS_CPP),$(eval $(call template_cpp,$(target))))

I commented it nicely but I don't know if that's still preferred.

Thoughts?

SlyEcho · 2023-04-30T09:04:50Z

Makefile

+define template_cpp
+OUTP_$(1) ?= $(1)
+$(1): $$(DEPS_$(1)) $$(OBJS) build-info.h
+	$$(CXX) $$(CXXFLAGS) $$(filter-out build-info.h,$$^) -o $$(OUTP_$(1)) $$(LDFLAGS)
+	$$(if $$(value EXEC_$(1)),$$(EXEC_$(1)))
+endef


This is too "smart", in my humble opinion.

The Makefile could be much simpler, actually.

VPATH= examples examples/main %.o: %.c $(CC) $(CPPFLAGS) $(CFLAGS) -c $< -o $@ %.o: %.cpp $(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< -o $@ %: %.o $(CXX) $(LDFLAGS) -o $@ $^ $(LDLIBS) ggml.o: ggml.h ggml-cuda.h llama.o: ggml.h ggml-cuda.h llama.h llama_util.h common.o: common.h main: ggml.o llama.o common.o $(OBJS)

The Makefile was already pretty busy when I started trying to edit build-info.h in. Each target had its own rule. Since the build info has to be generated from git, the example projects have to depend on it. But putting it in the prerequisites was somehow causing the compilers on macOS to try compiling it and failing, so you then have to exclude it from the compiler with $(filter-out build-info.h, $^). I couldn't get it to pass the build tests any other way, so I had to edit that into every example target.

Originally, I had thought I could use a static pattern rule for this (even mentioned as much earlier) and hoped to wrap up all the example target rules into one pattern rule so everyone wouldn't have to see a dozen lines of $(filter-out build-info.h, $^).

My goal was to put keep all the weird stuff in one place. This way, no has to understand, or copy and paste without understanding, code to change the makefile when a new project is added. There were so many little things different. It seemed each target had its own exception. (main wants to echo text when it's done, benchmark's source is actually benchmark-q4_0-matmult.c not benchmark.c.

Maybe if I had started with a Makefile from scratch I could have come up with a better solution, but I just wanted to get the example projects in line with the change. I do think a pattern rule is ideal, but I just couldn't get one to work in practice.

I hated the foreach eval when I first thought about it, but after stumbling over patterns for way too long, it seemed like the only solution where each example, each program, each test, could just be written the same way in the Makefile. My argument for this approach is it's easy to add future examples and if you need to change the way we're building those examples you just need to edit one line (the template), which also means escaping your $ with another $.

What I can do is revert it to how it was before and just make the bare minimum changes to each target.. After spending a little too much time fighting with patterns, I kind of wish I had just done to begin with, but since I arrived at a destination that wasn't too bad I decided to push it and see what people's thoughts were. Part of the reason I pointed out the "smart" code in a comment after my commit was so that I could try to judge people's comfort level with it. I just want to make sure the code is easy to follow and that people are comfortable updating it if needed.

I guess it's not too crazy if I look at the whole file.

The Makefile was already pretty busy when I started trying to edit build-info.h in. Each target had its own rule. Since the build info has to be generated from git, the example projects have to depend on it. But putting it in the prerequisites was somehow causing the compilers on macOS to try compiling it and failing

Wouldn't it be better to generate build-info.c that has global variables and have everyone link to build-info.o? It also means that it wouldn't be necessary to rebuild everything on every git pull if some files have not changed.

main wants to echo text when it's done

Double colon rule could work?

benchmark's source is actually benchmark-q4_0-matmult.c not benchmark.c

I would say it's two targets, a PHONY benchmark that depends on benchmark-q4_0-matmult and runs it.

I do think the general consensus has been that Makefile needs some refactoring, but this PR was never intended to do that. I've reset it to a more minimally changed version and just tried to make it less ugly with *.h instead of build-info.h, which actually covers future cases of header only includes. I still have my new Makefile stashed away; maybe I'll open a different pull request with a draft for cleaning up Makefile to see if there's any interest there, but I'm less comfortable taking that role as I'm not the person who edits new stuff in regularly.

Building a build-info.o would add a little more complexity to the process, but I do agree it would be most effective at preventing unnecessary recompilation. I imagine when there's an official version system, that's how it will be done (but included in ggml.o, llama.o, or both). I still like the minimalism of the header to get build info into the examples. And while it can cause unnecessary rebuilds, this is limited to the examples and doesn't include the larger files like ggml.c and llama.c and the other .o (and .so) files.

…ild into to all examples, added dependencies to Makefile

DannyDaemonic · 2023-05-01T02:13:57Z

Rebased with less "fancy" Makefile.

Used shorter hash as originally recommended by @slaren. If this is no longer desired, let me know and I'll put it back.

…wouldn't change it

Green-Sky added enhancement New feature or request build Compilation issues labels Apr 29, 2023

DannyDaemonic force-pushed the build-info branch from 378233a to 947e94f Compare April 29, 2023 17:07

DannyDaemonic force-pushed the build-info branch from 947e94f to ecbfe9d Compare April 29, 2023 18:19

sw suggested changes Apr 29, 2023

View reviewed changes

DannyDaemonic requested a review from sw April 29, 2023 19:24

slaren reviewed Apr 29, 2023

View reviewed changes

examples/main/main.cpp Outdated Show resolved Hide resolved

DannyDaemonic force-pushed the build-info branch 2 times, most recently from 40aec68 to 8977b8a Compare April 29, 2023 22:40

sw approved these changes Apr 29, 2023

View reviewed changes

ggerganov approved these changes Apr 30, 2023

View reviewed changes

DannyDaemonic force-pushed the build-info branch from 3866987 to e3a515f Compare April 30, 2023 07:54

SlyEcho reviewed Apr 30, 2023

View reviewed changes

DannyDaemonic force-pushed the build-info branch from 8aeda32 to b1dc97b Compare May 1, 2023 01:44

DannyDaemonic added 7 commits April 30, 2023 18:44

Add git-based build information for better issue tracking

8272f4a

macOS fix

fbcfc84

"build (hash)" and "CMAKE_SOURCE_DIR" changes

db0b835

Redo "CMAKE_CURRENT_SOURCE_DIR" and clearer build messages

eac5d68

Fix conditional dependency on missing target

d2b6d2c

Broke out build-info.cmake, added find_package fallback, and added bu…

3e7ec6e

…ild into to all examples, added dependencies to Makefile

4 space indenting for cmake, attempt to clean up my mess in Makefile

ddb3e88

DannyDaemonic force-pushed the build-info branch 5 times, most recently from 8653aca to 8f8715c Compare May 1, 2023 02:09

DannyDaemonic force-pushed the build-info branch from 8f8715c to f58307a Compare May 1, 2023 03:01

Short hash, less fancy Makefile, and don't modify build-info.h if it …

5289251

…wouldn't change it

DannyDaemonic force-pushed the build-info branch from f58307a to 5289251 Compare May 1, 2023 03:05

ggerganov approved these changes May 1, 2023

View reviewed changes

slaren approved these changes May 1, 2023

View reviewed changes

slaren merged commit f4cef87 into ggml-org:master May 1, 2023

DannyDaemonic deleted the build-info branch May 1, 2023 22:40

kuvaus mentioned this pull request May 2, 2023

fix build-info.h for git submodules #1289

Merged

kuvaus mentioned this pull request May 11, 2023

gpt4all-backend: Add temperature sampling with repetition penalty nomic-ai/gpt4all#541

Closed

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open

Add git-based build information for better issue tracking #1232

Add git-based build information for better issue tracking #1232

Uh oh!

Conversation

DannyDaemonic commented Apr 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description:

Changes:

Uh oh!

slaren commented Apr 29, 2023

Uh oh!

DannyDaemonic commented Apr 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

j-f1 commented Apr 29, 2023

Uh oh!

Green-Sky commented Apr 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DannyDaemonic commented Apr 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Green-Sky commented Apr 29, 2023

Uh oh!

DannyDaemonic commented Apr 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DannyDaemonic commented Apr 29, 2023

Uh oh!

DannyDaemonic commented Apr 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sw left a comment

Choose a reason for hiding this comment

Uh oh!

DannyDaemonic commented Apr 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sw commented Apr 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

sw commented Apr 29, 2023

Uh oh!

DannyDaemonic commented Apr 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

slaren commented Apr 29, 2023

Uh oh!

DannyDaemonic commented Apr 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

slaren commented Apr 30, 2023

Uh oh!

ggerganov left a comment

Choose a reason for hiding this comment

Uh oh!

DannyDaemonic commented Apr 30, 2023

Uh oh!

SlyEcho Apr 30, 2023

Choose a reason for hiding this comment

Uh oh!

DannyDaemonic Apr 30, 2023

Choose a reason for hiding this comment

Uh oh!

SlyEcho Apr 30, 2023

Choose a reason for hiding this comment

Uh oh!

DannyDaemonic May 1, 2023

Choose a reason for hiding this comment

Uh oh!

DannyDaemonic commented May 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

DannyDaemonic commented Apr 29, 2023 •

edited

Loading

DannyDaemonic commented Apr 29, 2023 •

edited

Loading

Green-Sky commented Apr 29, 2023 •

edited

Loading

DannyDaemonic commented Apr 29, 2023 •

edited

Loading

DannyDaemonic commented Apr 29, 2023 •

edited

Loading

DannyDaemonic commented Apr 29, 2023 •

edited

Loading

DannyDaemonic commented Apr 29, 2023 •

edited

Loading

sw commented Apr 29, 2023 •

edited

Loading

DannyDaemonic commented Apr 29, 2023 •

edited

Loading

DannyDaemonic commented Apr 30, 2023 •

edited

Loading

DannyDaemonic commented May 1, 2023 •

edited

Loading