OpenCL incrementing integer after each kernel execution
I have a kernel that I need to execute multiple times (using clEnqueueNDRangeKernel), and one of its arguments is an integer that needs to be incremented after each execution.
Rather than having the host assign an incrementing value (using clSetKernelArg) before enqueuing each kernel execution, is there a purely "device-side" way to achieve this, e.g. having the kernel increment a global integer itself once the final work item has run? (I'm still new to OpenCL, so I might be barking up the wrong tree here.)
opencl
asked yesterday


Andrew Stephens
1 Answer
It is possible to do this on the kernel side, but I would not, as it may hurt kernel performance. Anyway, it could be done this way:
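__kernel void my_kernel(__global int* counter, __global int* other_data, ...)
{
    // some operations on other_data, etc.

    // Let exactly one work item per launch increment the counter.
    // (get_local_id(0) == 0 would be true once per work-group instead.)
    // The assumption is that the kernel uses one dimension only, with no global offset.
    if (get_global_id(0) == 0)
        atomic_inc(counter); // atomic, as kernels may run in parallel
    // Note: this work item does not wait for the others, so the increment
    // may happen before the rest of the launch has finished.
}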
So, to summarize: rather than adding a branch in which a single work item does the work while the others waste cycles, I would keep using clSetKernelArg and increment the counter on the host side. Some operations are well suited to the GPU, and incrementing a counter is not one of them.
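As a side note on the guard in the kernel: how often a condition of the form `get_local_id(0) == 0` or `get_global_id(0) == 0` is true per launch can be checked with plain arithmetic. The sketch below is ordinary host-side C, not OpenCL; `count_guard_hits` is a hypothetical helper that simply walks a one-dimensional NDRange (assuming no global offset and a global size that is a multiple of the local size) and counts the work items that would pass each guard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of OpenCL: simulates a 1-D NDRange on the
 * host and counts the work items that would pass the guard
 *   get_local_id(0) == 0   (use_global_id == 0), or
 *   get_global_id(0) == 0  (use_global_id != 0).
 * Assumes no global offset and global_size divisible by local_size. */
size_t count_guard_hits(size_t global_size, size_t local_size, int use_global_id)
{
    assert(local_size > 0 && global_size % local_size == 0);

    size_t hits = 0;
    for (size_t gid = 0; gid < global_size; ++gid) {
        size_t lid = gid % local_size;           /* ID within the work-group */
        size_t id  = use_global_id ? gid : lid;  /* ID the guard compares to 0 */
        if (id == 0)
            ++hits;                              /* this work item would increment */
    }
    return hits;
}
```

For a global size of 1024 and a local size of 64, `count_guard_hits(1024, 64, 0)` gives 16 (one hit per work-group), while `count_guard_hits(1024, 64, 1)` gives 1. Neither guard makes the chosen work item wait for the rest of the launch, so this only tells you how many increments happen, not when.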
 
 
This does not increment the value once the last work item has run. It increments it whenever the work item with global_id 0 in the first dimension happens to finish.
 – Jovasa
yesterday
 @Jovasa you are right, updated.
 – doqtor
 yesterday
 
 
 
edited yesterday
answered yesterday
doqtor