Home > Parallel Programming > Common Mistake in Using OpenCL (2): Mis-aligned Vector Field

Common Mistake in Using OpenCL (2): Mis-aligned Vector Field

August 28th, 2009

In OpenCL C, a vector variable needs be aligned to the size of the vector in bytes. For example, a float4 variable needs to be aligned to a 16-byte boundary. And the ‘data’ field in the following struct also needs be aligned to a 16-byte boundary.

struct {
    int flag;
    float4 data;
} var;

An OpenCL C compiler will take care of the vector alignment for the variables defined within a OpenCL C program. However, it usually is not the case for variables defined in a host program. The OpenCL C vector type is not a native type in the host C, therefore the host compiler may not be aware of the alignment requirement. Mismatched access may happen when a struct variable is passed from the host to the device.

For example,

typedef struct
    int flag;
    float4 data;
} my_struct;

__kernel void foo(__global my_struct *in, ...)
{
    ... = in->data;
}

will probably break if the type of the passed in argument is defined as

typedef struct {
     int flag;
     float data[4];
} my_struct;

in the host program.

You need to manually pad the data structure to look like

typedef struct {
    int flag;
    int pad[3];
    float data[4];
} my_struct;

or use the ‘cl_float4′ provided in the ‘cl_platform.h’, like

typedef struct {
     cl_int flag;
     cl_float4 data;
} my_struct;

[Note that some implementation may not align 'cl_float4' type correctly. You still need to RTFM :(]

There will be cases where you are not able to change the data structure. For example, you may be parallelizing a legacy program using OpenCL or you may be working with new existing data. In such cases, you can

  • either do not use vector type for such fields, or
  • use the vloadn()/vstoren() functions defined in Section 6.11.7 of the OpenCL Spec (1.0).
Categories: Parallel Programming Tags:
  1. John
    September 28th, 2009 at 19:59 | #1

    just following up.

    manually padding for 32 bit architecture as you did with 3 integers is correct. So is it fair to say that for 64 bit architecture, manual padding would be 1 int? This is in consideration of the size of int changing, hence I could see the use of cl_int, cl_float4

  2. September 29th, 2009 at 20:34 | #2

    On most 64-bit systems, the size of C ‘int’ stays 32-bit. C ‘long’ type and pointer type become 64-bit. So you probably still need to pad the structure with 3 ‘int’s. The size of cl_float4 does not change with the host system. The header file defines the alignment requirement of cl_float4 if the host compiler supports alignment directive.

  1. No trackbacks yet.