
Conversation

@mikepapadim (Member)

No description provided.

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces precision-aware embedding handling by storing embeddings in FP16 format and performing on-the-fly conversion to FP32 during GPU inference, reducing memory footprint while maintaining computational accuracy.

  • Embeddings are now stored in reduced precision (FP16) and converted to FP32 on GPU during inference
  • Introduced new TornadoVM conversion kernels (convertFP16toFP32 and convertFP32toFP16) for efficient precision conversion
  • Refactored tensor loading to support both standard and TornadoVM-compatible memory layouts with proper header handling
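At bit level, the new conversion kernels perform the standard IEEE 754 binary16 ↔ binary32 widening/narrowing. As a rough, self-contained sketch of the FP16→FP32 direction (plain Java for illustration only — the PR's actual kernels operate on `HalfFloatArray`/`FloatArray` through TornadoVM's `KernelContext`):

```java
public class HalfPrecisionDemo {

    // Widen an IEEE 754 binary16 value (stored in the low 16 bits of h) to a 32-bit float.
    static float fp16ToFp32(int h) {
        int sign = (h >>> 15) & 0x1;
        int exp  = (h >>> 10) & 0x1F;
        int mant = h & 0x3FF;
        int bits;
        if (exp == 0x1F) {
            // Inf / NaN: exponent field becomes all-ones in binary32 as well
            bits = (sign << 31) | 0x7F800000 | (mant << 13);
        } else if (exp != 0) {
            // Normal number: rebias exponent from 15 to 127 (difference of 112)
            bits = (sign << 31) | ((exp + 112) << 23) | (mant << 13);
        } else if (mant != 0) {
            // binary16 subnormal becomes a normal binary32 value after normalization
            int e = 1;
            while ((mant & 0x400) == 0) { mant <<= 1; e--; }
            mant &= 0x3FF;
            bits = (sign << 31) | ((e + 112) << 23) | (mant << 13);
        } else {
            // Signed zero
            bits = sign << 31;
        }
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        System.out.println(fp16ToFp32(0x3C00)); // prints 1.0
        System.out.println(fp16ToFp32(0xC000)); // prints -2.0
    }
}
```

This is also where the memory saving comes from: each stored embedding element occupies 2 bytes (`Short.BYTES`) instead of 4, halving the embedding table's footprint, at the cost of one elementwise widening pass on the GPU per lookup.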

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 24 comments.

| File | Description |
| --- | --- |
| `src/main/java/org/beehive/gpullama3/tornadovm/layers/Activation.java` | Updated activation layer to use FP16→FP32 conversion kernel instead of empty copy task |
| `src/main/java/org/beehive/gpullama3/tornadovm/kernels/TransformerComputeKernels.java` | Added GPU kernels for FP16↔FP32 precision conversion |
| `src/main/java/org/beehive/gpullama3/tensor/tornado/TornadoTensor.java` | Removed size tracking from base class (moved to subclasses), added spacing to documentation |
| `src/main/java/org/beehive/gpullama3/tensor/tornado/Q8_0TornadoTensor.java` | Moved size field from parent to this class, added TODO for Q8_0 loading fix |
| `src/main/java/org/beehive/gpullama3/tensor/tornado/FP32TornadoTensor.java` | Renamed field to tornadoNativeArray, added factory method for memory segment loading |
| `src/main/java/org/beehive/gpullama3/tensor/tornado/FP16TornadoTensor.java` | Renamed field to tornadoNativeArray, added factory method for memory segment loading |
| `src/main/java/org/beehive/gpullama3/tensor/GGUF.java` | Split tensor loading into loadTensorsStandard and loadTensorsTornado with proper memory layout handling |
| `src/main/java/org/beehive/gpullama3/model/loader/*.java` | Updated model loaders to remove loadWeights parameter and use loadTornadoTensor instead of loadTornadoTensorAsFP32 |
| `src/main/java/org/beehive/gpullama3/model/loader/ModelLoader.java` | Updated to always load weights, replaced manual FP32 conversion with GPU-based approach |
| `src/main/java/org/beehive/gpullama3/model/loader/AbstractModelLoader.java` | Removed loadWeights parameter, added logic to select tensor loading method based on TornadoVM usage |
| `src/main/java/org/beehive/gpullama3/model/ModelType.java` | Updated all model type loaders to remove loadWeights parameter |
| `src/main/java/org/beehive/gpullama3/inference/state/*.java` | Added embeddingX field (HalfFloatArray) to all state classes for FP16 embedding storage |
| `src/main/java/org/beehive/gpullama3/inference/InferenceCore.java` | Modified embedding copy to use FP16 format (Short.BYTES) targeting state.embeddingX |
| `set_paths` | Commented out TornadoVM path configuration variables |
| `.github/workflows/build-and-run.yml` | Disabled Spotless formatting check, switched TornadoVM clone to develop branch |


Comment on lines 52 to 53:

```java
System.out.println("[GGUF] fileChannel = FileChannel.open(modelPath, READ, WRITE);");
fileChannel = FileChannel.open(modelPath, READ, WRITE);
```
Copilot AI · Nov 26, 2025

File opened with WRITE permission unnecessarily. The file is opened with READ and WRITE permissions, but based on the method name and usage, only read access should be required for loading metadata. Opening with write permissions creates an unnecessary security risk. Consider using only READ permission unless write access is actually needed.

Suggested change:

```diff
-System.out.println("[GGUF] fileChannel = FileChannel.open(modelPath, READ, WRITE);");
-fileChannel = FileChannel.open(modelPath, READ, WRITE);
+System.out.println("[GGUF] fileChannel = FileChannel.open(modelPath, READ);");
+fileChannel = FileChannel.open(modelPath, READ);
```
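As a minimal, self-contained sketch of the suggested pattern — read-only channel plus read-only memory mapping (the file name, contents, and helper are hypothetical, not the PR's GGUF loader):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadOnlyMapDemo {

    // Memory-map the file read-only and return its first byte.
    static byte firstByte(Path p) throws IOException {
        // READ only: loading metadata never writes, so WRITE permission is not requested.
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            // READ_ONLY mapping matches the channel's permissions.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Illustrative stand-in for a model file: four bytes of fake tensor data.
        Path p = Files.createTempFile("demo", ".bin");
        Files.write(p, new byte[] {1, 2, 3, 4});
        System.out.println(firstByte(p)); // prints 1
        Files.deleteIfExists(p);
    }
}
```

Note that `FileChannel.map` with `MapMode.READ_ONLY` works on a channel opened only for reading; a `READ_WRITE` mapping is what would require the channel to be opened with WRITE as well.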

```java
* Loads GGUF tensor data using a TornadoVM-compatible memory layout.
*
* <p>This method parses the GGUF tensor list and memory-maps each tensor
* in {@link TornadoNativeArray} layout directly from the underlying{@link FileChannel}.
```
Copilot AI · Nov 26, 2025

Missing space in JavaDoc. There should be a space between 'underlying' and the opening brace.

Suggested change:

```diff
-* in {@link TornadoNativeArray} layout directly from the underlying{@link FileChannel}.
+* in {@link TornadoNativeArray} layout directly from the underlying {@link FileChannel}.
```

```java
);

}
// @formatter:off
```
Copilot AI · Nov 26, 2025

Incorrect formatter comment. This should be @formatter:on to match the opening @formatter:off comment, not @formatter:off.

Suggested change:

```diff
-// @formatter:off
+// @formatter:on
```

```java
: new ChatTokens("<|im_start|>", "<|im_end|>", "", "<|end_of_text|>", "<|endoftext|>");
return new Qwen3(config, tokenizer, weights, ChatFormat.create(tokenizer, chatTokens));
}
// @formatter:off
```
Copilot AI · Nov 26, 2025

Incorrect formatter comment. This should be @formatter:on to match the opening @formatter:off comment, not @formatter:off.

Suggested change:

```diff
-// @formatter:off
+// @formatter:on
```

```java
);
return config;
}
// @formatter:off
```
Copilot AI · Nov 26, 2025

Incorrect formatter comment. This should be @formatter:on to match the opening @formatter:off comment, not @formatter:off.

Suggested change:

```diff
-// @formatter:off
+// @formatter:on
```

```diff
 run: |
   cd ${{ github.workspace }}
-  ./mvnw -T12C -Pspotless spotless:check
+  #./mvnw -T12C -Pspotless spotless:check
```
Copilot AI · Nov 26, 2025

Code formatting check is commented out. The Spotless code formatting check has been commented out, which means code style violations will not be caught in CI. This should be re-enabled to maintain code quality standards, or removed entirely if no longer needed.

Suggested change:

```diff
-#./mvnw -T12C -Pspotless spotless:check
+./mvnw -T12C -Pspotless spotless:check
```

Comment on lines 149 to 151:

```java
* Load a tensor and manually convert to FP32 (FloatArray).
* Used for embeddings that currently are treated as FP32.
* TODO: it is ultra-slow and should be removed
```
Copilot AI · Nov 26, 2025

Outdated documentation. The JavaDoc states "Used for embeddings that currently are treated as FP32" and includes a TODO about removal, but based on the PR changes, embeddings are now handled as FP16 with GPU-based conversion. The documentation should be updated to clarify the current usage or deprecation status of this method.

Suggested change:

```diff
-* Load a tensor and manually convert to FP32 (FloatArray).
-* Used for embeddings that currently are treated as FP32.
-* TODO: it is ultra-slow and should be removed
+* [DEPRECATED] Loads a tensor and manually converts it to FP32 (FloatArray).
+* Embeddings are now handled as FP16 with GPU-based conversion; this method remains only for legacy compatibility.
+* This method is ultra-slow and should not be used in new code. Scheduled for removal.
```


```diff
 public class FP32TornadoTensor extends TornadoTensor {
-    private final FloatArray values;
+    private final FloatArray tornadoNativeArray;
```
Copilot AI · Nov 26, 2025

Field naming could be more descriptive. The field was renamed from values to tornadoNativeArray, matching the sibling class FP16TornadoTensor. While the naming is now consistent between the two classes, a more descriptive name like data or array would be clearer, since the "Tornado" context is already evident from the class name.

```java
fields.wrapK = new FloatArray(nEmbdKGqa);
fields.wrapV = new FloatArray(nEmbdKGqa);

fields.embeddingX = new HalfFloatArray(config.dim());
```
Copilot AI · Nov 26, 2025

Duplicate field initialization. The field fields.embeddingX is initialized twice - once at line 70 and again at line 80 with the same value. The second initialization should be removed.

Suggested change:

```diff
-fields.embeddingX = new HalfFloatArray(config.dim());
+// Removed duplicate initialization of fields.embeddingX
```

```java
    wrapX.set(i, x.get(i).getFloat32());
}

public static void convertFP32toFP16(KernelContext context,  FloatArray wrapX, HalfFloatArray x) {
```
Copilot AI · Nov 26, 2025

Extra whitespace after parameter. There are two spaces between the comma and FloatArray - should be one space.

Suggested change:

```diff
-public static void convertFP32toFP16(KernelContext context,  FloatArray wrapX, HalfFloatArray x) {
+public static void convertFP32toFP16(KernelContext context, FloatArray wrapX, HalfFloatArray x) {
```
