Getting Invalid Kernel when calling clEnqueueNDRangeKernel #95

Closed
@jaredonline

Description

Hey there!

I'm getting an error when trying to do a custom implementation of a neural network (for fun). My code runs fine with the CPU backend (slowly, but fine), but with the CUDA and OpenCL backends it crashes with the following. (Actually, OpenCL shows this; CUDA doesn't print anything, but I assume it's the same issue.)

In function void opencl::evalNodes(std::vector<opencl::Param>&, std::vector<opencl::JIT::Node*>)
In file src/backend/opencl/jit.cpp:267
OpenCL Error (-48): Invalid Kernel when calling clEnqueueNDRangeKernel

In function af_err af_transpose(void**, af_array, bool)
In file src/api/c/transpose.cpp:51

thread 'main' panicked at 'Error message: Eror either in ArrayFire or in a project upstream', /home/jared/.cargo/registry/src/github.com-1ecc6299db9ec823/arrayfire-3.4.0/src/error.rs:72

Happy to provide a Rust stack trace and the code if it's helpful. This is on Ubuntu 16.04, CUDA 8.0, ArrayFire 3.4.2, arrayfire-rust 3.4.0, running on an Nvidia GeForce GTX 980 Ti. Let me know what other information might be useful, and thanks in advance =D

Edit: In case it's helpful, here's the Rust code:

// iter of type (usize, (Array, Array)) => (index, (activation, zeta))
let iter = activations.iter().rev().zip(zetas.iter().rev()).rev().enumerate().rev();
// inject / fold / reduce with starting blank delta, receiving
// Array, (usize, (Array, Array)) => delta, (index, (activation, zeta))
let _ = iter.fold(constant::<f64>(0f64, Dim4::new(&[1, 1, 1, 1])), |delta, (index, (activation, zeta))| {
    if index == self.num_layers - 1 {
        self.cost_derivative(activation, target) * tanh_prime(zeta)
    } else if index == 0 {
        nabla_b.push(delta.clone());
        let activation_t = transpose(activation, false);
        let mul = matmul(&delta, &activation_t, MatProp::NONE, MatProp::NONE);
        nabla_w.push(mul.clone());
        delta
    } else {
        let activation_t = transpose(activation, false);
        let mul = matmul(&delta, &activation_t, MatProp::NONE, MatProp::NONE);
        nabla_w.push(mul.clone());
        nabla_b.push(delta.clone());

        let activated = tanh_prime(zeta);
        let weight = transpose(&self.weights[index], false);
        let new_delta = matmul(&weight, &delta, MatProp::NONE, MatProp::NONE) * activated;
        new_delta
    }
});

It's blowing up at the let activation_t = transpose(activation, false); line in the else clause (presumably because that clause is hit before the else if clause?). Specifically, it blows up on any access to the activation's data, even an af_print!("", activation) call.
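In case a minimal version is easier to poke at, here's a stripped-down sketch of what I think the failing path boils down to, assuming the trigger is the lazy JIT evaluation running on first data access (the shape and names are made up for illustration, and tanh stands in for my own tanh_prime):

#[macro_use]
extern crate arrayfire;

use arrayfire::{constant, tanh, transpose, Dim4};

fn main() {
    // A column vector standing in for a layer with more than 3 nodes.
    let zeta = constant::<f64>(0.5f64, Dim4::new(&[4, 1, 1, 1]));
    // tanh builds a lazy JIT node on the CUDA/OpenCL backends.
    let activation = tanh(&zeta);
    // The first real access forces the JIT kernel to build and run;
    // this is where I'd expect the Invalid Kernel error to surface.
    let activation_t = transpose(&activation, false);
    af_print!("activation_t", activation_t);
}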

In the iter construction up there, both activations and zetas are of type Vec<Array>. This only happens when, for some reason, I create a neural network whose last layer has more than 3 nodes in it.
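To make those shapes concrete, here's roughly how activations and zetas get built in my forward pass (simplified: layer_sizes and the constant-initialized weights are stand-ins for my actual fields, not the real code):

use arrayfire::{constant, matmul, tanh, Array, Dim4, MatProp};

// Simplified forward pass showing how activations/zetas accumulate.
fn forward_sketch() -> (Vec<Array>, Vec<Array>) {
    // A last layer of 4 (> 3) nodes is what triggers the crash for me.
    let layer_sizes = [2u64, 5, 4];
    let mut activations: Vec<Array> = Vec::new();
    let mut zetas: Vec<Array> = Vec::new();

    let mut a = constant::<f64>(1f64, Dim4::new(&[layer_sizes[0], 1, 1, 1]));
    activations.push(a.clone());
    for i in 1..layer_sizes.len() {
        // Placeholder weights; the real ones are randomly initialized.
        let w = constant::<f64>(0.1f64, Dim4::new(&[layer_sizes[i], layer_sizes[i - 1], 1, 1]));
        let z = matmul(&w, &a, MatProp::NONE, MatProp::NONE);
        a = tanh(&z);
        zetas.push(z);
        activations.push(a.clone());
    }
    (activations, zetas)
}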

Edit 2: I'm using ArrayFire 3.4.2, not 3.4.0:

ArrayFire v3.4.2 (OpenCL, 64-bit Linux, build 2da9967)
[0] NVIDIA  : GeForce GTX 980 Ti, 6076 MB
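(For reference, that banner is just the output of arrayfire::info(); I call it at startup roughly like this, with the explicit backend selection being my own setup:)

use arrayfire::{info, set_backend, Backend};

fn main() {
    // Select the OpenCL backend explicitly, then print the
    // version/device banner quoted above.
    set_backend(Backend::OPENCL);
    info();
}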
