From ca11636a8bfbf2ec90526b3c23c35da745e2c4d5 Mon Sep 17 00:00:00 2001
From: bauom
Date: Tue, 3 Jan 2023 11:11:19 +0100
Subject: [PATCH 1/5] started GPU_ddocs

---
 developer_docs/GPU_getting_started.md | 80 +++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)
 create mode 100644 developer_docs/GPU_getting_started.md

diff --git a/developer_docs/GPU_getting_started.md b/developer_docs/GPU_getting_started.md
new file mode 100644
index 0000000000..aec7cedd55
--- /dev/null
+++ b/developer_docs/GPU_getting_started.md
@@ -0,0 +1,80 @@
+# Getting started with GPUs
+
+## Decorators:
+
+* Kernel
+
+  In Pyccel, the `@kernel` decorator is used to indicate that a function should be treated as a CUDA kernel. A CUDA kernel is a function that is executed on a GPU and is typically used to perform a parallel computation over a large dataset.
+
+  The `@kernel` decorator is used to indicate to Pyccel that the function should be compiled as a CUDA kernel, and that it can be launched on a GPU using the appropriate syntax. For example, if we have a function decorated with `@kernel` like this:
+
+  > Kernels cannot return a value, which is why the result is passed to the kernel as an output argument (`out` below).
+
+  ```Python
+  from pyccel import cuda
+  from pyccel.decorators import kernel
+
+  @kernel
+  def my_kernel(x: 'float64[:]', y: 'float64[:]', out: 'float64[:]'):
+      i = cuda.threadIdx(0) + cuda.blockIdx(0) * cuda.blockDim(0)
+      if i >= x.shape[0]:
+          return
+      out[i] = x[i] + y[i]
+  ```
+  We can launch this kernel on a GPU using the following syntax:
+  ```Python
+  ng = 128  # number of blocks in the grid
+  tn = 64   # number of threads per block
+  my_kernel[ng, tn](x, y, out)
+  ```
+* Device
+
+  The `@device` decorator is similar to the `@kernel` decorator, but indicates that the function should be compiled and executed on the GPU as a device function, rather than as a kernel.
+
+  Device functions are similar to kernels, but are executed within the context of a kernel. They can be called only from kernels, and are typically used for operations that are too small to justify launching a separate kernel, or for operations that need to be performed repeatedly within the context of a kernel.
+
+  ```Python
+  from pyccel import cuda
+  from pyccel.decorators import device, kernel
+
+  @device
+  def my_device_function(x: 'float32'):
+      return x * x
+
+  # Call the device function from a kernel
+  @kernel
+  def my_kernel(x: 'float32[:]', y: 'float32[:]'):
+      i = cuda.threadIdx(0) + cuda.blockIdx(0) * cuda.blockDim(0)
+      if i >= x.shape[0]:
+          return
+      y[i] = my_device_function(x[i])
+  ```
+
+## Built-in variables:
+
+* `cuda.threadIdx(dim)`: Returns the index of the current thread within the block in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the thread block minus one.
+
+* `cuda.blockIdx(dim)`: Returns the index of the current block within the grid in a given dimension `dim`.
+`dim` is an integer between 0 and the number of dimensions of the grid minus one.
+
+* `cuda.blockDim(dim)`: Returns the size of the thread block in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the thread block minus one.
+
+* `cuda.gridDim(dim)`: Returns the size of the grid, in blocks, in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the grid minus one.
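+
+To make the `dim` argument concrete, the following kernel computes a separate global index for each of the two dimensions of an array. This is an illustrative sketch only: it assumes that a kernel can be launched with a two-dimensional grid and block, a case whose launch syntax is not covered by this guide.
+
+```Python
+from pyccel import cuda
+from pyccel.decorators import kernel
+
+@kernel
+def scale_2d(a: 'float64[:,:]', alpha: 'float64'):
+    # Dimension 0 of the launch configuration indexes the rows
+    i = cuda.threadIdx(0) + cuda.blockIdx(0) * cuda.blockDim(0)
+    # Dimension 1 indexes the columns
+    j = cuda.threadIdx(1) + cuda.blockIdx(1) * cuda.blockDim(1)
+    if i >= a.shape[0] or j >= a.shape[1]:
+        return
+    a[i, j] = alpha * a[i, j]
+```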
+
+> These built-in variables are provided by the CUDA runtime and are available to all CUDA kernels, regardless of the programming language being used. They describe the execution configuration of the kernel (the indices and dimensions of its threads and blocks) and are typically combined to compute each thread's global index into the data.
+
+```Python
+from pyccel import cuda
+from pyccel.decorators import kernel
+
+@kernel
+def my_kernel(x: 'float64[:]', y: 'float64[:]', out: 'float64[:]'):
+    i = cuda.threadIdx(0) + cuda.blockIdx(0) * cuda.blockDim(0)
+    if i >= x.shape[0]:
+        return
+    out[i] = x[i] + y[i]
+```
+The kernel uses the `cuda.threadIdx(0)`, `cuda.blockIdx(0)`, and `cuda.blockDim(0)` built-in variables to compute the global index of the current thread within the input arrays.
+
+The global index in the first dimension is computed as the thread index plus the block index multiplied by the block size. This allows each thread to compute its own index within the input arrays, and to exit if its index falls outside the bounds of the arrays.

From 046361c45e8c78077cc142885fd164d697f9c5e1 Mon Sep 17 00:00:00 2001
From: bauom
Date: Wed, 4 Jan 2023 14:48:52 +0100
Subject: [PATCH 2/5] codacy errors

---
 developer_docs/GPU_getting_started.md | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/developer_docs/GPU_getting_started.md b/developer_docs/GPU_getting_started.md
index aec7cedd55..5c0d305a7a 100644
--- a/developer_docs/GPU_getting_started.md
+++ b/developer_docs/GPU_getting_started.md
@@ -1,8 +1,8 @@
 # Getting started with GPUs
 
-## Decorators:
+## Decorators
 
-* Kernel
+ * Kernel
 
   In Pyccel, the `@kernel` decorator is used to indicate that a function should be treated as a CUDA kernel. A CUDA kernel is a function that is executed on a GPU and is typically used to perform a parallel computation over a large dataset.
 
@@ -27,7 +27,7 @@
   ng = 128  # number of blocks in the grid
   tn = 64   # number of threads per block
   my_kernel[ng, tn](x, y, out)
   ```
-* Device
+ * Device
 
   The `@device` decorator is similar to the `@kernel` decorator, but indicates that the function should be compiled and executed on the GPU as a device function, rather than as a kernel.
@@ -50,16 +50,15 @@
      y[i] = my_device_function(x[i])
   ```
 
-## Built-in variables:
+## Built-in variables
 
-* `cuda.threadIdx(dim)`: Returns the index of the current thread within the block in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the thread block minus one.
+ * `cuda.threadIdx(dim)`: Returns the index of the current thread within the block in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the thread block minus one.
 
-* `cuda.blockIdx(dim)`: Returns the index of the current block within the grid in a given dimension `dim`.
-`dim` is an integer between 0 and the number of dimensions of the grid minus one.
+ * `cuda.blockIdx(dim)`: Returns the index of the current block within the grid in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the grid minus one.
 
-* `cuda.blockDim(dim)`: Returns the size of the thread block in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the thread block minus one.
+ * `cuda.blockDim(dim)`: Returns the size of the thread block in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the thread block minus one.
 
-* `cuda.gridDim(dim)`: Returns the size of the grid, in blocks, in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the grid minus one.
+ * `cuda.gridDim(dim)`: Returns the size of the grid, in blocks, in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the grid minus one.
 
 > These built-in variables are provided by the CUDA runtime and are available to all CUDA kernels, regardless of the programming language being used. They describe the execution configuration of the kernel (the indices and dimensions of its threads and blocks) and are typically combined to compute each thread's global index into the data.

From 1f1706e462dc88e2c14e14a311958f9f83ddd74e Mon Sep 17 00:00:00 2001
From: bauom <40796259+bauom@users.noreply.github.com>
Date: Thu, 5 Jan 2023 12:18:12 +0000
Subject: [PATCH 3/5] removed repetition

Co-authored-by: Fatima-zahra RAMDANI <52450718+framdani@users.noreply.github.com>
---
 developer_docs/GPU_getting_started.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/developer_docs/GPU_getting_started.md b/developer_docs/GPU_getting_started.md
index 5c0d305a7a..f798d71f4b 100644
--- a/developer_docs/GPU_getting_started.md
+++ b/developer_docs/GPU_getting_started.md
@@ -6,7 +6,7 @@
 
   In Pyccel, the `@kernel` decorator is used to indicate that a function should be treated as a CUDA kernel. A CUDA kernel is a function that is executed on a GPU and is typically used to perform a parallel computation over a large dataset.
 
-  The `@kernel` decorator is used to indicate to Pyccel that the function should be compiled as a CUDA kernel, and that it can be launched on a GPU using the appropriate syntax. For example, if we have a function decorated with `@kernel` like this:
+  By applying the `@kernel` decorator to a function, Pyccel recognizes it as a CUDA kernel that can be launched on a GPU using the appropriate syntax. For example:
 
   > Kernels cannot return a value, which is why the result is passed to the kernel as an output argument (`out` below).

From 8b72efb49821b048f373e80905e9f126f4e979ed Mon Sep 17 00:00:00 2001
From: bauom <40796259+bauom@users.noreply.github.com>
Date: Mon, 9 Jan 2023 17:24:26 +0000
Subject: [PATCH 4/5] mention the dimensionality of array in the kernel example.

Co-authored-by: Fatima-zahra RAMDANI <52450718+framdani@users.noreply.github.com>
---
 developer_docs/GPU_getting_started.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/developer_docs/GPU_getting_started.md b/developer_docs/GPU_getting_started.md
index f798d71f4b..2bba8cfa95 100644
--- a/developer_docs/GPU_getting_started.md
+++ b/developer_docs/GPU_getting_started.md
@@ -74,6 +74,6 @@ def my_kernel(x: 'float64[:]', y: 'float64[:]', out: 'float64[:]'):
         return
     out[i] = x[i] + y[i]
 ```
-The kernel uses the `cuda.threadIdx(0)`, `cuda.blockIdx(0)`, and `cuda.blockDim(0)` built-in variables to compute the global index of the current thread within the input arrays.
+The kernel uses the `cuda.threadIdx(0)`, `cuda.blockIdx(0)`, and `cuda.blockDim(0)` built-in variables to compute the global index of the current thread within the 1D input arrays.
 
 The global index in the first dimension is computed as the thread index plus the block index multiplied by the block size. This allows each thread to compute its own index within the input arrays, and to exit if its index falls outside the bounds of the arrays.
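+
+The arithmetic behind the launch configuration deserves one worked example. The names below (`n`, `ng`, `tn`) are illustrative rather than part of the documented API: the grid is sized by ceiling division so that `ng * tn >= n`, and the surplus threads exit through the bounds check shown above.
+
+```Python
+n  = 1000                  # number of array elements
+tn = 64                    # threads per block
+ng = (n + tn - 1) // tn    # ceiling division: ng = 16, so 16 * 64 = 1024 threads
+
+# the 1024 - 1000 = 24 surplus threads return early at the bounds check
+my_kernel[ng, tn](x, y, out)
+```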
From ccd8f3b83468498ed76d1235d03b9f1ac8d85186 Mon Sep 17 00:00:00 2001
From: bauom
Date: Mon, 9 Jan 2023 18:41:07 +0100
Subject: [PATCH 5/5] fixed codacy error indents

---
 developer_docs/GPU_getting_started.md | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/developer_docs/GPU_getting_started.md b/developer_docs/GPU_getting_started.md
index 5c0d305a7a..403beb7b01 100644
--- a/developer_docs/GPU_getting_started.md
+++ b/developer_docs/GPU_getting_started.md
@@ -2,7 +2,7 @@
 
 ## Decorators
 
- * Kernel
+* Kernel
 
   In Pyccel, the `@kernel` decorator is used to indicate that a function should be treated as a CUDA kernel. A CUDA kernel is a function that is executed on a GPU and is typically used to perform a parallel computation over a large dataset.
@@ -21,16 +21,19 @@
          return
      out[i] = x[i] + y[i]
   ```
+
   We can launch this kernel on a GPU using the following syntax:
+
   ```Python
   ng = 128  # number of blocks in the grid
   tn = 64   # number of threads per block
   my_kernel[ng, tn](x, y, out)
   ```
+
- * Device
+* Device
 
   The `@device` decorator is similar to the `@kernel` decorator, but indicates that the function should be compiled and executed on the GPU as a device function, rather than as a kernel.
-  
+
   Device functions are similar to kernels, but are executed within the context of a kernel. They can be called only from kernels, and are typically used for operations that are too small to justify launching a separate kernel, or for operations that need to be performed repeatedly within the context of a kernel.
 
@@ -52,13 +55,13 @@
 
 ## Built-in variables
 
- * `cuda.threadIdx(dim)`: Returns the index of the current thread within the block in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the thread block minus one.
+* `cuda.threadIdx(dim)`: Returns the index of the current thread within the block in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the thread block minus one.
 
- * `cuda.blockIdx(dim)`: Returns the index of the current block within the grid in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the grid minus one.
+* `cuda.blockIdx(dim)`: Returns the index of the current block within the grid in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the grid minus one.
 
- * `cuda.blockDim(dim)`: Returns the size of the thread block in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the thread block minus one.
+* `cuda.blockDim(dim)`: Returns the size of the thread block in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the thread block minus one.
 
- * `cuda.gridDim(dim)`: Returns the size of the grid, in blocks, in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the grid minus one.
+* `cuda.gridDim(dim)`: Returns the size of the grid, in blocks, in a given dimension `dim`. `dim` is an integer between 0 and the number of dimensions of the grid minus one.
 
 > These built-in variables are provided by the CUDA runtime and are available to all CUDA kernels, regardless of the programming language being used. They describe the execution configuration of the kernel (the indices and dimensions of its threads and blocks) and are typically combined to compute each thread's global index into the data.
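+
+A typical use of `cuda.gridDim`, which none of the examples above exercise, is a grid-stride loop: each thread processes several elements, so a grid of fixed size can cover an array of any length. The kernel below is an illustrative sketch built only from the decorators and built-ins described above:
+
+```Python
+from pyccel import cuda
+from pyccel.decorators import kernel
+
+@kernel
+def scale(x: 'float64[:]', alpha: 'float64'):
+    start  = cuda.threadIdx(0) + cuda.blockIdx(0) * cuda.blockDim(0)
+    stride = cuda.blockDim(0) * cuda.gridDim(0)  # total number of threads in the grid
+    for i in range(start, x.shape[0], stride):
+        x[i] = alpha * x[i]
+```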
@@ -74,6 +77,6 @@ def my_kernel(x: 'float64[:]', y: 'float64[:]', out: 'float64[:]'):
         return
     out[i] = x[i] + y[i]
 ```
-The kernel uses the `cuda.threadIdx(0)`, `cuda.blockIdx(0)`, and `cuda.blockDim(0)` built-in variables to compute the global index of the current thread within the input arrays. 
+The kernel uses the `cuda.threadIdx(0)`, `cuda.blockIdx(0)`, and `cuda.blockDim(0)` built-in variables to compute the global index of the current thread within the input arrays.
 
 The global index in the first dimension is computed as the thread index plus the block index multiplied by the block size. This allows each thread to compute its own index within the input arrays, and to exit if its index falls outside the bounds of the arrays.
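+
+To close the loop, here is a host-side sketch of how the pieces above fit together. It assumes that NumPy arrays can be passed directly to a kernel launch; this guide does not specify how host/device memory transfers are handled, so treat the snippet as pseudocode rather than a confirmed API:
+
+```Python
+import numpy as np
+
+n   = 1_000_000
+x   = np.random.rand(n)
+y   = np.random.rand(n)
+out = np.zeros(n)
+
+tn = 64                  # threads per block
+ng = (n + tn - 1) // tn  # enough blocks to cover all n elements
+
+# launch the vector-addition kernel defined above: out[i] = x[i] + y[i]
+my_kernel[ng, tn](x, y, out)
+```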