@Fletterio
Contributor

Checks that the arithmetic and comparison operators work as expected, and that the code compiles for the GPU.


add_subdirectory(70_FLIPFluids)
add_subdirectory(71_RayTracingPipeline)
add_subdirectory(73_Mortons EXCLUDE_FROM_ALL)

Actually, can you make it example 14 or 15? The low numbers are where I keep the basic HLSL/C++ examples.

Comment on lines +10 to +15
[numthreads(256, 1, 1)]
[shader("compute")]
void main(uint3 invocationID : SV_DispatchThreadID)
{
    if (invocationID.x == 0)
        fillTestValues(inputTestValues[0], outputTestValues[0]);

Just make it a 1,1,1 workgroup and always call fillTestValues(inputTestValues[gl_GlobalInvocationID.x], outputTestValues[gl_GlobalInvocationID.x]).

Comment on lines +253 to +255
// Disabled: current glm implementation is wrong
//verifyTestValue("subBorrowResult", expectedTestValues.subBorrow.result, testValues.subBorrow.result, testType);
//verifyTestValue("subBorrowBorrow", expectedTestValues.subBorrow.borrow, testValues.subBorrow.borrow, testType);

Then use an alternative implementation of subBorrow instead of GLM's.

Assigned to @Przemog1 for the next PR.

Contributor Author

Actually, GLM merged my PR and I pushed a fix to our branch, so it should be usable now.

Yes, so it's just a matter of updating GLM.
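
For reference, the behaviour the disabled subBorrow checks expect can be written down without GLM. The following is a minimal sketch that follows the GLSL usubBorrow definition for 32-bit unsigned operands; the struct and function names are hypothetical and are not taken from this PR or from Nabla.

```cpp
#include <cstdint>

// Hypothetical result type. The member names mirror the test's
// subBorrow.result / subBorrow.borrow fields, but the type is an assumption.
struct SubBorrowResult
{
    uint32_t result; // (x - y) modulo 2^32
    uint32_t borrow; // 1 if x < y, otherwise 0
};

// Reference implementation of the GLSL usubBorrow semantics.
inline SubBorrowResult subBorrowReference(uint32_t x, uint32_t y)
{
    SubBorrowResult r;
    r.result = x - y;             // unsigned arithmetic wraps modulo 2^32
    r.borrow = (x < y) ? 1u : 0u; // a borrow occurs only when y exceeds x
    return r;
}
```

A reference like this could serve as the expected-value source on the CPU side while the GLM version is being updated.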


#include <nabla.h>
#include "app_resources/testCommon.hlsl"
#include "ITester.h"
Member

@devshgraphicsprogramming, Nov 28, 2025

Why are we reusing things from a different example?

If you want ITester to be general, please move it to an appropriate directory such as common/include/nbl/examples/testing.

See https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/tree/08c898d5af460ba6469a78fb625216e27a1bc8a8/common/include/nbl/examples

Assigned to @Przemog1 for the next PR.

Comment on lines +1 to +11
#ifndef _NBL_EXAMPLES_TESTS_22_CPP_COMPAT_I_TESTER_INCLUDED_
#define _NBL_EXAMPLES_TESTS_22_CPP_COMPAT_I_TESTER_INCLUDED_

#include <nabla.h>
#include "app_resources/common.hlsl"
#include "nbl/application_templates/MonoDeviceApplication.hpp"

using namespace nbl;

class ITester
{

Please unify this with ex22; I don't want this much duplicated code.

Assigned to @Przemog1 for the next PR.

Comment on lines +228 to +269
template<typename InputStruct, typename OutputStruct>
OutputStruct dispatch(const InputStruct& input)
{
    // Update input buffer
    if (!m_inputBufferAllocation.memory->map({ 0ull,m_inputBufferAllocation.memory->getAllocationSize() }, video::IDeviceMemoryAllocation::EMCAF_READ))
        logFail("Failed to map the Device Memory!\n");

    const video::ILogicalDevice::MappedMemoryRange memoryRange(m_inputBufferAllocation.memory.get(), 0ull, m_inputBufferAllocation.memory->getAllocationSize());
    if (!m_inputBufferAllocation.memory->getMemoryPropertyFlags().hasFlags(video::IDeviceMemoryAllocation::EMPF_HOST_COHERENT_BIT))
        m_device->invalidateMappedMemoryRanges(1, &memoryRange);

    std::memcpy(static_cast<InputStruct*>(m_inputBufferAllocation.memory->getMappedPointer()), &input, sizeof(InputStruct));

    m_inputBufferAllocation.memory->unmap();

    // record command buffer
    m_cmdbuf->reset(video::IGPUCommandBuffer::RESET_FLAGS::NONE);
    m_cmdbuf->begin(video::IGPUCommandBuffer::USAGE::NONE);
    m_cmdbuf->beginDebugMarker("test", core::vector4df_SIMD(0, 1, 0, 1));
    m_cmdbuf->bindComputePipeline(m_pipeline.get());
    m_cmdbuf->bindDescriptorSets(nbl::asset::EPBP_COMPUTE, m_pplnLayout.get(), 0, 1, &m_ds.get());
    m_cmdbuf->dispatch(1, 1, 1);
    m_cmdbuf->endDebugMarker();
    m_cmdbuf->end();

    video::IQueue::SSubmitInfo submitInfos[1] = {};
    const video::IQueue::SSubmitInfo::SCommandBufferInfo cmdbufs[] = { {.cmdbuf = m_cmdbuf.get()} };
    submitInfos[0].commandBuffers = cmdbufs;
    const video::IQueue::SSubmitInfo::SSemaphoreInfo signals[] = { {.semaphore = m_semaphore.get(), .value = ++m_semaphoreCounter, .stageMask = asset::PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT} };
    submitInfos[0].signalSemaphores = signals;

    m_api->startCapture();
    m_queue->submit(submitInfos);
    m_api->endCapture();

    m_device->waitIdle();
    OutputStruct output;
    std::memcpy(&output, static_cast<OutputStruct*>(m_outputBufferAllocation.memory->getMappedPointer()), sizeof(OutputStruct));
    m_device->waitIdle();

    return output;
}

Why are we dispatching once per test? We could dispatch all tests in parallel (one invocation per test iteration)!

Assigned to @Przemog1 for the next PR.
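
The "one invocation per test iteration" layout suggested above can be modelled on the CPU as follows. This is only a sketch of the indexing scheme: the Input and Output structs and the body of fillTestValues are hypothetical stand-ins, and the loop plays the role of a dispatch of testCount workgroups of size 1,1,1. It does not use the Nabla API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the example's input and output test structs.
struct Input  { uint32_t a; uint32_t b; };
struct Output { uint32_t sum; uint32_t less; };

// Stand-in for the shader-side fillTestValues: computes one test iteration.
inline void fillTestValues(const Input& in, Output& out)
{
    out.sum  = in.a + in.b;
    out.less = (in.a < in.b) ? 1u : 0u;
}

// CPU model of a single batched dispatch: every loop iteration corresponds
// to one invocation, which reads inputs[id] and writes outputs[id].
inline std::vector<Output> runBatch(const std::vector<Input>& inputs)
{
    std::vector<Output> outputs(inputs.size());
    for (std::size_t invocationID = 0; invocationID < inputs.size(); ++invocationID)
        fillTestValues(inputs[invocationID], outputs[invocationID]);
    return outputs;
}
```

With this layout the host would upload all inputs once, submit once, and read back all outputs once, instead of mapping, submitting, and waiting for every individual test.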

Comment on lines +205 to +248
/*
void fillSecondTestValues(NBL_CONST_REF_ARG(InputTestValues) input)
{
uint64_t2 Vec2A = { input.coordX, input.coordY };
uint64_t2 Vec2B = { input.coordZ, input.coordW };
uint64_t3 Vec3A = { input.coordX, input.coordY, input.coordZ };
uint64_t3 Vec3B = { input.coordY, input.coordZ, input.coordW };
uint64_t4 Vec4A = { input.coordX, input.coordY, input.coordZ, input.coordW };
uint64_t4 Vec4B = { input.coordY, input.coordZ, input.coordW, input.coordX };
int64_t2 Vec2ASigned = int64_t2(Vec2A);
int64_t2 Vec2BSigned = int64_t2(Vec2B);
int64_t3 Vec3ASigned = int64_t3(Vec3A);
int64_t3 Vec3BSigned = int64_t3(Vec3B);
int64_t4 Vec4ASigned = int64_t4(Vec4A);
int64_t4 Vec4BSigned = int64_t4(Vec4B);
morton::code<false, fullBits_4, 4, emulated_uint64_t> morton_emulated_4A = morton::code<false, fullBits_4, 4, emulated_uint64_t>::create(Vec4A);
morton::code<true, fullBits_2, 2, emulated_uint64_t> morton_emulated_2_signed = morton::code<true, fullBits_2, 2, emulated_uint64_t>::create(Vec2ASigned);
morton::code<true, fullBits_3, 3, emulated_uint64_t> morton_emulated_3_signed = morton::code<true, fullBits_3, 3, emulated_uint64_t>::create(Vec3ASigned);
morton::code<true, fullBits_4, 4, emulated_uint64_t> morton_emulated_4_signed = morton::code<true, fullBits_4, 4, emulated_uint64_t>::create(Vec4ASigned);
output.mortonEqual_emulated_4 = uint32_t4(morton_emulated_4A.equal<false>(uint16_t4(Vec4B)));
output.mortonUnsignedLess_emulated_4 = uint32_t4(morton_emulated_4A.lessThan<false>(uint16_t4(Vec4B)));
mortonSignedLess_emulated_2 = uint32_t2(morton_emulated_2_signed.lessThan<false>(int32_t2(Vec2BSigned)));
mortonSignedLess_emulated_3 = uint32_t3(morton_emulated_3_signed.lessThan<false>(int32_t3(Vec3BSigned)));
mortonSignedLess_emulated_4 = uint32_t4(morton_emulated_4_signed.lessThan<false>(int16_t4(Vec4BSigned)));
uint16_t castedShift = uint16_t(input.shift);
arithmetic_right_shift_operator<morton::code<true, fullBits_2, 2, emulated_uint64_t> > rightShiftSignedEmulated2;
mortonSignedRightShift_emulated_2 = rightShiftSignedEmulated2(morton_emulated_2_signed, castedShift);
arithmetic_right_shift_operator<morton::code<true, fullBits_3, 3, emulated_uint64_t> > rightShiftSignedEmulated3;
mortonSignedRightShift_emulated_3 = rightShiftSignedEmulated3(morton_emulated_3_signed, castedShift);
arithmetic_right_shift_operator<morton::code<true, fullBits_4, 4, emulated_uint64_t> > rightShiftSignedEmulated4;
mortonSignedRightShift_emulated_4 = rightShiftSignedEmulated4(morton_emulated_4_signed, castedShift);
}
*/

@Fletterio what's this commented-out block about?

Contributor Author

I think something (likely a DXC bug) was preventing me from running all the tests in a single shader, so I was in the middle of moving the commented-out code to a different shader.

OK, so the bug is no longer there and we can remove this commented-out block of code?

Contributor Author

The bug is probably still there; I thought I had reported it, but it seems I had not. These are tests that fail to compile if you add them to testCommon.hlsl. I temporarily commented them out here, with the idea of having several separate test shaders so that these could be tested on their own. Specifically, they are tests of the comparison operators for a 4D Morton code with 16 bits per coordinate stored in an emulated uint64, and tests of the arithmetic right shift operator for a 2D, 3D, or 4D Morton code backed by an emulated uint64.
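
To make the case described above concrete, here is a naive CPU-side sketch of a 4D Morton code with 16 bits per coordinate packed into 64 bits, plus a 64-bit value split into two 32-bit halves. All names are hypothetical, the bit ordering (coordinate 0 in the lowest bit of each group) is an assumption, and this is not how nbl's morton::code or emulated_uint64_t are implemented; it could only serve as a reference for expected values.

```cpp
#include <array>
#include <cstdint>

// Interleave four 16-bit coordinates into one 64-bit Morton code.
// Assumed layout: bit i of coordinate c lands at bit position 4*i + c.
inline uint64_t mortonEncode4(const std::array<uint16_t, 4>& coords)
{
    uint64_t code = 0;
    for (unsigned bit = 0; bit < 16; ++bit)
        for (unsigned c = 0; c < 4; ++c)
            code |= uint64_t((coords[c] >> bit) & 1u) << (4 * bit + c);
    return code;
}

// Inverse of mortonEncode4: gather every fourth bit back into its coordinate.
inline std::array<uint16_t, 4> mortonDecode4(uint64_t code)
{
    std::array<uint16_t, 4> coords = {{0, 0, 0, 0}};
    for (unsigned bit = 0; bit < 16; ++bit)
        for (unsigned c = 0; c < 4; ++c)
            coords[c] |= uint16_t(((code >> (4 * bit + c)) & 1u) << bit);
    return coords;
}

// Naive component-wise unsigned "less than": decode, then compare.
inline std::array<bool, 4> mortonLessThan4(uint64_t code, const std::array<uint16_t, 4>& rhs)
{
    const std::array<uint16_t, 4> lhs = mortonDecode4(code);
    std::array<bool, 4> result = {{false, false, false, false}};
    for (unsigned c = 0; c < 4; ++c)
        result[c] = lhs[c] < rhs[c];
    return result;
}

// Hypothetical emulated 64-bit integer built from two 32-bit halves.
struct EmulatedU64 { uint32_t lo; uint32_t hi; };
inline EmulatedU64 splitU64(uint64_t v) { return { uint32_t(v), uint32_t(v >> 32) }; }
inline uint64_t joinU64(EmulatedU64 v) { return (uint64_t(v.hi) << 32) | v.lo; }
```

Decoding before comparing is slow but easy to verify, which is the property one wants from a test oracle.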

Comment on lines +22 to +36
#ifndef __HLSL_VERSION

constexpr uint64_t smallBitsMask_2 = (uint64_t(1) << smallBits_2) - 1;
constexpr uint64_t mediumBitsMask_2 = (uint64_t(1) << mediumBits_2) - 1;
constexpr uint64_t fullBitsMask_2 = (uint64_t(1) << fullBits_2) - 1;

constexpr uint64_t smallBitsMask_3 = (uint64_t(1) << smallBits_3) - 1;
constexpr uint64_t mediumBitsMask_3 = (uint64_t(1) << mediumBits_3) - 1;
constexpr uint64_t fullBitsMask_3 = (uint64_t(1) << fullBits_3) - 1;

constexpr uint64_t smallBitsMask_4 = (uint64_t(1) << smallBits_4) - 1;
constexpr uint64_t mediumBitsMask_4 = (uint64_t(1) << mediumBits_4) - 1;
constexpr uint64_t fullBitsMask_4 = (uint64_t(1) << fullBits_4) - 1;

#endif

Are these variables used anywhere?

Contributor Author

No clue what I had in mind here.

OK, @kevyuu, delete them.
