Skip to content

Fix CI #133

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
LegNeato opened this issue Jan 27, 2025 · 18 comments
Closed

Fix CI #133

LegNeato opened this issue Jan 27, 2025 · 18 comments

Comments

@LegNeato
Copy link
Contributor

CI is broken and hasn't worked in years. Now that we are rebooting the project (#130) we need CI.

@buk0vec
Copy link

buk0vec commented Jan 27, 2025

Is the plan for CI to still not include testing due to lack of CUDA support in actions? I don't really see another workaround aside from self-hosted runners which I don't think is super feasible on a public repo

@LegNeato
Copy link
Contributor Author

The first step is getting it to build. Then making sure any compiletests work.

At that point I'll rustle up some sponsors for hosted runners (I think they work fine for public repos? Embark had them for rust-gpu)

@MikeRomaniuk
Copy link

For the CI there is a great tool: https://github.com/matklad/cargo-xtask
I'd be eager to see this adopted in this project.

@Schmiedium
Copy link
Contributor

I've made some progress on this, I updated CI in my own fork so that it gets far enough to build the project, at least on the Ubuntu container. I've made some progress towards getting the project to build, there were a lot of places where the codegen no longer adheres to the rustc private api, and I think I've fixed and updated a lot of those issues.

I'm going to look into the rust-gpu codegen crate to see what that looks like and hopefully gain some insight that'll help me fix the nvvm codegen

@LegNeato
Copy link
Contributor Author

LegNeato commented Mar 7, 2025

@Schmiedium I merged your PR, but I think the rustc version is going to be much too new to actually work.

@Schmiedium
Copy link
Contributor

Thank you! That makes sense, I wasn't quite sure which version to target here. Do you have a rough idea of which version would be good to try and target? I can try to use the same one as Rust-GPU, or do you think that's still too new?

@LegNeato
Copy link
Contributor Author

LegNeato commented Mar 7, 2025

Rust-gpu had to vendor in parts of the compiler to get to that version. I would do the one before, as that's the one I personally forward ported and it wasn't too hard. I'm not at a computer right now, but you should be able to check git history

@Schmiedium
Copy link
Contributor

Excellent, I'll target that then. Thanks!

@LegNeato
Copy link
Contributor Author

LegNeato commented Mar 13, 2025

@LegNeato
Copy link
Contributor Author

I don't use windows, anyone want to take a crack at fixing the windows CI link error? Comment here if so!

@Schmiedium
Copy link
Contributor

I can take a crack at it, I think it should just be some environment variables to set properly. I was planning to put together a windows container with CUDA pre installed to hopefully speed some things up too, so if that works it'll hopefully speed up and fix the windows CI

@jorge-ortega
Copy link
Collaborator

@Schmiedium Windows CI failures are due to float_ext.rs being compiled on a non-cuda target. Nothing uses these in our crates, so differences in linker behavior for unused symbols are likely why we only see this on Windows. Likely fix seems to be to just mark those all as gpu_only.

@Schmiedium
Copy link
Contributor

Thanks! That makes sense, you think it's that the symbols go unused and the windows linker throws an error for that? I made the change you suggested and am running the CI test on my fork now. Hopefully that works.

I was trying to figure out how to get the linker to link the lib_cuda_std.rlib against the bitcode library that ships with the CUDA install. If the issue is that the symbols are unused though, that wouldn't have worked either.

@Schmiedium
Copy link
Contributor

Schmiedium commented Mar 17, 2025

@jorge-ortega Looks like that fixed it!

This is the error from the linker, it looks like. Seems like it generates a new symbol and then that newly generated one goes unfound

@jorge-ortega
Copy link
Collaborator

That makes sense, you think it's that the symbols go unused and the windows linker throws an error for that?

It's not that its unused so it errors, it's that even if its unused, the linker still checks if it can find all the symbols it references. You can change the linker behavior by passing /FORCE:UNRESOLVED to the linker via a build.rs (or other means), but don't think we should.

At some point, we might want to reevaluate how to vend mixed or gpu specific code. Marking as gpu_only just delays the error to runtime. I'd rather it fails at compile time if users make that mistake, but definitely not linker errors.

@Schmiedium
Copy link
Contributor

I see, so the goal is to make sure everything can be found even if it's not used.

At some point, we might want to reevaluate how to vend mixed or gpu specific code. Marking as gpu_only just delays the error to runtime. I'd rather it fails at compile time if users make that mistake, but definitely not linker errors.

I had the same concern, it did seem that this fix would just delay breakage until these functions do get used somewhere. I think you're right, if we can figure out how to convert that to a rust compiler error then that would be best. The linker error seems like the worst option, at least to me.

I'm still unclear why this didn't fail locally though? Like I can run the same commands as in CI, and I don't get that linker error. I couldn't even replicate the error locally

@LegNeato
Copy link
Contributor Author

Let's just get CI working and work on the longterm fix after.

@LegNeato
Copy link
Contributor Author

Looks like we have all green on main--windows and linux, CUDA 11.2 and 12.8!

https://github.com/Rust-GPU/Rust-CUDA/actions/runs/13913964011/job/38933766108

Let's close this and file new tasks as things come up.

Thanks @jorge-ortega and @Schmiedium ! 🍻 🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants