Common Mistake in Using OpenCL (2): Mis-aligned Vector Field
In OpenCL C, a vector variable needs be aligned to the size of the vector in bytes. For example, a float4 variable needs to be aligned to a 16-byte boundary. And the ‘data’ field in the following struct also needs be aligned to a 16-byte boundary.
struct {
int flag;
float4 data;
} var;
An OpenCL C compiler will take care of the vector alignment for the variables defined within a OpenCL C program. However, it usually is not the case for variables defined in a host program. The OpenCL C vector type is not a native type in the host C, therefore the host compiler may not be aware of the alignment requirement. Mismatched access may happen when a struct variable is passed from the host to the device.
For example,
typedef struct
int flag;
float4 data;
} my_struct;
__kernel void foo(__global my_struct *in, ...)
{
... = in->data;
}
will probably break if the type of the passed in argument is defined as
typedef struct {
int flag;
float data[4];
} my_struct;
in the host program.
You need to manually pad the data structure to look like
typedef struct {
int flag;
int pad[3];
float data[4];
} my_struct;
or use the ‘cl_float4′ provided in the ‘cl_platform.h’, like
typedef struct {
cl_int flag;
cl_float4 data;
} my_struct;
[Note that some implementation may not align 'cl_float4' type correctly. You still need to RTFM :(]
There will be cases where you are not able to change the data structure. For example, you may be parallelizing a legacy program using OpenCL or you may be working with new existing data. In such cases, you can
- either do not use vector type for such fields, or
- use the vloadn()/vstoren() functions defined in Section 6.11.7 of the OpenCL Spec (1.0).

