-
Notifications
You must be signed in to change notification settings - Fork 12k
Add git-based build information for better issue tracking #1232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think this is a good idea. One question, currently it looks like this outputs the number of commits in HEAD, but how do we find what version they are using from this information? I think I would prefer a commit hash id, such as obtained by |
To go the other direction: git rev-list HEAD --skip=$(( $(git rev-list HEAD --count) - BUILD_NUMBER)) --max-count=1 So if you were looking for 461: $ git rev-list HEAD --skip=$(( $(git rev-list HEAD --count) - 461)) --max-count=1
305eb5afd51325e3142c01c17431febb7c67de87 It's not as straight forward I suppose. I just thought a build number would be something easier for people to understand and without looking up the hash, judge how old the build is, and at the same time allow people (or documentation) to say "you need at least build number 461 to do X." If you want the direct hash, how would you feel about a combination? Something like:
|
How about pulling out the commit date?
|
Did you test how this behaves when it is a cmake subdirectory? edit: also git submodules. which have a .git file that points to the root. |
The date might be better for a quick judge of how old something is. I thought a straight number component would be more useful in the long run because it could be used more easily programmatically. For example, if we wanted to tag all the new quantizations with which build number they were quantized in and there were some quantization bugs fixed in commit number 670, you just have to test if the build number in the quantized file was < 760 and then warn the user they should requantize for best performance. The hash alone wouldn't be good enough and a date would mostly work, but it's not exact (and time zones?) and no one wants to parse date strings. I tried it with main (which is in another subdirectory, if that's what you mean?) and it worked fine. The BUILD_INFO project it makes is itself relative to the source root so even when I write It's still all configured on my dev machine if there's a particular cmake command you want me to try out. |
hm no.
I was thinking of including llama.cpp as a git submodule or git subtree and including the cmake via eg
In the llama.cpp's project root directory is the way to go, I think. |
I committed a change so it shows the hash and the build number. I think this is better after all.
Previously, this would have shown my branch instead (conditionally, because it wasn't master), but now it just always shows the hash, which I do admit is easier. The example even points to this exact commit: 5e7d90a838c2543687fcff2e603b1189b6bc1084 And while I don't think it was necessary (because BUILD_INFO is treated as its own project in cmake), I didn't see any harm in replacing all the |
@Green-Sky Oh I see what you're saying. If someone else is using this project in their own project, cmake should be smart about that since it would still be a sub directory, but I haven't tested it and now I think |
378233a
to
947e94f
Compare
Alright, I put |
947e94f
to
ecbfe9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the working directory is not a git repo (for example when downloading a ZIP from Github), the Cmake build fails:
...
-- Configuring done (0.4s)
CMake Error at examples/main/CMakeLists.txt:5 (add_dependencies):
The dependency target "BUILD_INFO" of target "main" does not exist.
Can we handle this gracefully?
@sw Yeah, it's embarrassingly easy to fix. Everything else is all set up to generate dummy values. I just overlooked that dependency line because I originally wasn't going to add it (since BUILD_INFO is it's own thing and part of ALL) but added it the last minute in case someone builds only main (which could cause other issues anyway), and it missed the .git testing. Give me a second. Edit: git will still spit out an error claiming "not a git repository", but only when configuring, not during builds. I can also hide that but I think maybe it's good people see that so if they look at their logs they will understand why their build info is "unknown"? |
Thanks @DannyDaemonic, I've noticed another problem, though. CMake doesn't find the git executable:
This is on Ubuntu 22.04 with cmake 3.22.1 and git 2.34.1 Also seems to happen in the CI build: https://github.com/ggerganov/llama.cpp/actions/runs/4840382315/jobs/8625989141?pr=1232#step:4:22 I think you have to use |
Couldn't you add I think the goal of this PR is good, the implementation just seems very complicated. But I'm no Cmake expert. |
40aec68
to
8977b8a
Compare
Hmm, I don't know why that would make a difference when BUILD_INFO has its own top-level project (at As for making a separate cmake file, all of the other The other downside to separating it into Alright. I've put it in all the examples. I tried to place it after parameter checks that could cause an exit, but before the seed is written out so that they'd be printed together. This also means I had to update the makefile to depend on the .h file for generation. The only thing is this made the Makefile a bit messy. We could probably clean it up with a pattern. But I'm not sure that's desired. I can always try to clean it up in a separate PR to see if that's prefered. |
This is very minor, but I wonder if the |
@slaren Oh, Yeah, I could do that much without reshaping the makefile. Previously, I was thinking of something like this: # Define the targets
TARGETS := main quantize quantize-stats perplexity embedding
# Define dependencies for each target
main_DEPS := ggml.o llama.o common.o $(OBJS)
quantize_DEPS := ggml.o llama.o $(OBJS)
quantize-stats_DEPS := ggml.o llama.o $(OBJS)
perplexity_DEPS := ggml.o llama.o common.o $(OBJS)
embedding_DEPS := ggml.o llama.o common.o $(OBJS)
# Pattern rule for compiling and linking
%.out: %.cpp $(%_DEPS) build-info.h
$(CXX) $(CXXFLAGS) $(filter-out *.h,$^) -o $@ $(LDFLAGS) But I don't know if that's too far removed from a |
Yeah that probably would be better in the long run. I also noticed that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use 4 spaces indentation in the CMake
3866987
to
e3a515f
Compare
@ggerganov: Oh, good catch. @slaren I tried to make that particular pattern/rule work but it was really not having it. The only way to join the target with a variable is to use an It still looks great to use: TARGETS_CPP += quantize
DEPS_quantize := examples/quantize/quantize.cpp ggml.o llama.o
TARGETS_CPP += quantize-stats
DEPS_quantize-stats := examples/quantize-stats/quantize-stats.cpp ggml.o llama.o
TARGETS_CPP += perplexity
DEPS_perplexity := examples/perplexity/perplexity.cpp ggml.o llama.o common.o
TARGETS_CPP += embedding
DEPS_embedding := examples/embedding/embedding.cpp ggml.o llama.o common.o But then you have something like this at the end that includes all the extras (OBJS, build-info, etc): define template_cpp
OUTP_$(1) ?= $(1)
$(1): $$(DEPS_$(1)) $$(OBJS) build-info.h
$$(CXX) $$(CXXFLAGS) $$(filter-out build-info.h,$$^) -o $$(OUTP_$(1)) $$(LDFLAGS)
$$(if $$(value EXEC_$(1)),$$(EXEC_$(1)))
endef
$(foreach target,$(TARGETS_CPP),$(eval $(call template_cpp,$(target)))) I commented it nicely but I don't know if that's still preferred. Thoughts? |
Makefile
Outdated
define template_cpp | ||
OUTP_$(1) ?= $(1) | ||
$(1): $$(DEPS_$(1)) $$(OBJS) build-info.h | ||
$$(CXX) $$(CXXFLAGS) $$(filter-out build-info.h,$$^) -o $$(OUTP_$(1)) $$(LDFLAGS) | ||
$$(if $$(value EXEC_$(1)),$$(EXEC_$(1))) | ||
endef |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too "smart", in my humble opinion.
The Makefile could be much simpler, actually.
VPATH= examples examples/main
%.o: %.c
$(CC) $(CPPFLAGS) $(CFLAGS) -c $< -o $@
%.o: %.cpp
$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< -o $@
%: %.o
$(CXX) $(LDFLAGS) -o $@ $^ $(LDLIBS)
ggml.o: ggml.h ggml-cuda.h
llama.o: ggml.h ggml-cuda.h llama.h llama_util.h
common.o: common.h
main: ggml.o llama.o common.o $(OBJS)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Makefile was already pretty busy when I started trying to edit build-info.h
in. Each target had its own rule. Since the build info has to be generated from git, the example projects have to depend on it. But putting it in the prerequisites was somehow causing the compilers on macOS to try compiling it and failing, so you then have to exclude it from the compiler with $(filter-out build-info.h, $^)
. I couldn't get it to pass the build tests any other way, so I had to edit that into every example target.
Originally, I had thought I could use a static pattern rule for this (even mentioned as much earlier) and hoped to wrap up all the example target rules into one pattern rule so everyone wouldn't have to see a dozen lines of $(filter-out build-info.h, $^)
.
My goal was to put keep all the weird stuff in one place. This way, no has to understand, or copy and paste without understanding, code to change the makefile when a new project is added. There were so many little things different. It seemed each target had its own exception. (main wants to echo text when it's done, benchmark's source is actually benchmark-q4_0-matmult.c
not benchmark.c
.
Maybe if I had started with a Makefile from scratch I could have come up with a better solution, but I just wanted to get the example projects in line with the change. I do think a pattern rule is ideal, but I just couldn't get one to work in practice.
I hated the foreach
eval when I first thought about it, but after stumbling over patterns for way too long, it seemed like the only solution where each example, each program, each test, could just be written the same way in the Makefile. My argument for this approach is it's easy to add future examples and if you need to change the way we're building those examples you just need to edit one line (the template), which also means escaping your $
with another $
.
What I can do is revert it to how it was before and just make the bare minimum changes to each target.. After spending a little too much time fighting with patterns, I kind of wish I had just done to begin with, but since I arrived at a destination that wasn't too bad I decided to push it and see what people's thoughts were. Part of the reason I pointed out the "smart" code in a comment after my commit was so that I could try to judge people's comfort level with it. I just want to make sure the code is easy to follow and that people are comfortable updating it if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's not too crazy if I look at the whole file.
The Makefile was already pretty busy when I started trying to edit build-info.h in. Each target had its own rule. Since the build info has to be generated from git, the example projects have to depend on it. But putting it in the prerequisites was somehow causing the compilers on macOS to try compiling it and failing
Wouldn't it be better to generate build-info.c
that has global variables and have everyone link to build-info.o
? It also means that it wouldn't be necessary to rebuild everything on every git pull
if some files have not changed.
main wants to echo text when it's done
Double colon rule could work?
benchmark's source is actually benchmark-q4_0-matmult.c not benchmark.c
I would say it's two targets, a PHONY benchmark
that depends on benchmark-q4_0-matmult
and runs it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think the general consensus has been that Makefile needs some refactoring, but this PR was never intended to do that. I've reset it to a more minimally changed version and just tried to make it less ugly with *.h
instead of build-info.h
, which actually covers future cases of header only includes. I still have my new Makefile stashed away; maybe I'll open a different pull request with a draft for cleaning up Makefile to see if there's any interest there, but I'm less comfortable taking that role as I'm not the person who edits new stuff in regularly.
Building a build-info.o
would add a little more complexity to the process, but I do agree it would be most effective at preventing unnecessary recompilation. I imagine when there's an official version system, that's how it will be done (but included in ggml.o
, llama.o
, or both). I still like the minimalism of the header to get build info into the examples. And while it can cause unnecessary rebuilds, this is limited to the examples and doesn't include the larger files like ggml.c and llama.c and the other .o
(and .so
) files.
…ild into to all examples, added dependencies to Makefile
8653aca
to
8f8715c
Compare
Rebased with less "fancy" Makefile. Used shorter hash as originally recommended by @slaren. If this is no longer desired, let me know and I'll put it back. |
…wouldn't change it
Description:
This pull request adds a build-info.h header file, which will contain git-related build information such as branch and commit count. This will help users and maintainers to identify the build being used when reporting issues or bugs. The build-info.h file is generated using CMake and Makefile whenever
.git/index
changes, and is printed to stderr inmain.cpp
.Changes:
.gitignore
to includebuild-info.h
.CMakeLists.txt
(and updated main'sCMakeLists.txt
to add a dependency).Makefile
.examples/main/main.cpp
to print build information at runtime.This will allow us to track the build version in a more granular way and may be helpful in identifying and addressing issues or bugs until an official versioning system is introduced.
Edit: In the end this also ended being added to all examples.