Archive for August, 2009

Common Mistake in Using OpenCL (2): Mis-aligned Vector Field

August 28th, 2009

In OpenCL C, a vector variable needs to be aligned to the size of the vector in bytes. For example, a float4 variable needs to be aligned to a 16-byte boundary. The ‘data’ field in the following struct also needs to be aligned to a 16-byte boundary.

struct {
    int flag;
    float4 data;
} var;

An OpenCL C compiler takes care of vector alignment for variables defined within an OpenCL C program. However, this is usually not the case for variables defined in a host program. The OpenCL C vector types are not native types in host C, so the host compiler may not be aware of the alignment requirement. When a struct variable is passed from the host to the device, the two sides may then disagree about where its fields are located.

For example,

typedef struct {
    int flag;
    float4 data;
} my_struct;

__kernel void foo(__global my_struct *in, ...)
{
    ... = in->data;
}

will probably break if the type of the passed-in argument is defined as

typedef struct {
     int flag;
     float data[4];
} my_struct;

in the host program.
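One way to see the mismatch from the host side is to look at where the host compiler actually places ‘data’. The sketch below is plain host C, not OpenCL C. The names host_struct, device_data_offset, host_data_offset, layouts_match and report_layout are made up for illustration, and the value 16 is an assumption: it is the offset an OpenCL C compiler is expected to use for a float4 that follows an int, given the alignment rule above.

```c
#include <stddef.h>
#include <stdio.h>

/* Host-side definition from the post: 'data' is a plain float array,
 * so the host compiler only aligns it to the alignment of float. */
typedef struct {
    int flag;
    float data[4];
} host_struct;

/* Assumed device-side offset of 'data': a float4 must start on a
 * 16-byte boundary, so it cannot directly follow a 4-byte int. */
static const size_t device_data_offset = 16;

/* Offset of 'data' as laid out by the host compiler. */
size_t host_data_offset(void)
{
    return offsetof(host_struct, data);
}

/* Returns 1 if the host layout agrees with the assumed device layout. */
int layouts_match(void)
{
    return host_data_offset() == device_data_offset;
}

/* Prints both offsets so that a mismatch is visible. */
void report_layout(void)
{
    printf("host offset of data:   %zu\n", host_data_offset());
    printf("device offset of data: %zu\n", device_data_offset);
    printf("layouts match:         %s\n", layouts_match() ? "yes" : "no");
}
```

On a typical host with a 4-byte int, host_data_offset() is 4 rather than 16, so layouts_match() returns 0 and the kernel would read ‘data’ from the wrong bytes.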

You need to manually pad the data structure to look like

typedef struct {
    int flag;
    int pad[3];
    float data[4];
} my_struct;
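To keep the manual padding from silently going wrong, the host build can check the layout at compile time. This is a minimal sketch, assuming a C11 host compiler (for _Static_assert) and a 4-byte int. The names padded_struct, padded_data_offset and padded_struct_size are illustrative only.

```c
#include <stddef.h>

/* Manually padded host struct from the post. */
typedef struct {
    int flag;
    int pad[3];      /* fills bytes 4..15 so that 'data' starts at byte 16 */
    float data[4];
} padded_struct;

/* Compile-time checks: the build fails if the padding is wrong. */
_Static_assert(offsetof(padded_struct, data) == 16,
               "data must start on a 16-byte boundary");
_Static_assert(sizeof(padded_struct) == 32,
               "struct size must match the 32-byte device layout");

/* Run-time accessors for the same two quantities. */
size_t padded_data_offset(void) { return offsetof(padded_struct, data); }
size_t padded_struct_size(void) { return sizeof(padded_struct); }
```

The size check matters when an array of such structs is passed: the device-side struct is 32 bytes because of the float4 alignment, so the host element size has to be 32 bytes as well.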

or use the ‘cl_float4’ type provided in ‘cl_platform.h’, like

typedef struct {
     cl_int flag;
     cl_float4 data;
} my_struct;

[Note that some implementations may not align the 'cl_float4' type correctly. You still need to RTFM :(]

There will be cases where you are not able to change the data structure. For example, you may be parallelizing a legacy program using OpenCL, or you may be working with existing data. In such cases, you can either

  • avoid using vector types for such fields, or
  • use the vloadn()/vstoren() functions defined in Section 6.11.7 of the OpenCL Spec (1.0).
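
The second option can be sketched in OpenCL C as follows. This is device code, not host code. The struct keeps the unpadded layout of the host definition, so the field is a float array rather than a float4. The ‘out’ argument and the scaling by 2 are made up for illustration.

```c
/* OpenCL C device code (a sketch, not taken from the original program). */
typedef struct {
    int flag;
    float data[4];   /* same unpadded layout as the host struct */
} my_struct;

__kernel void foo(__global my_struct *in, __global float4 *out)
{
    size_t i = get_global_id(0);

    /* vload4 reads four consecutive floats starting at the pointer.
     * The pointer only needs float alignment, not 16-byte alignment. */
    float4 v = vload4(0, in[i].data);

    out[i] = v * 2.0f;

    /* vstore4 writes a float4 back under the same relaxed alignment. */
    vstore4(v, 0, in[i].data);
}
```

The vector arithmetic still happens on a float4 inside the kernel; only the loads and stores go through the element-aligned pointer.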

Categories: Parallel Programming

Common Mistake in Using OpenCL (1): Missing Vector Type When Constructing Vectors from Scalars

August 15th, 2009

[Following the style of my 'Common Mistakes in Using OpenMP' series, I am starting a new 'Common Mistakes in Using OpenCL' series.]

In OpenCL C, one can construct a vector from a set of scalars or vectors by writing a parenthesized vector type followed by a parenthesized set of expressions. For example,

float4 fv4 = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
int2 x = (int2)(1, 2);
int2 y = (int2)(3, 4);
int4 iv4 = (int4)(x, y);

The vector type in front of the parenthesized list is important. Take a look at the following example.

int4 y = (int4)(10, 10, 10, 10);
int4 z = (1, 2, 3, 4) + y;
// z equals (int4)(14, 14, 14, 14) at this point.

It took quite a while for a colleague and me to figure out why the vector ‘z’ was getting the value (int4)(14, 14, 14, 14) after the second assignment.

Without a vector type, (1, 2, 3, 4) is a comma expression (OpenCL spec 1.0 rev 43, p.145 item l). Just like in C, the expression is evaluated from left to right and takes the value of its last operand, 4. The addition then becomes a binary operation with a scalar operand and a vector operand. According to the promotion rule on p. 142 (item a), the scalar 4 is widened to the vector (int4)(4, 4, 4, 4). Therefore (int4)(14, 14, 14, 14) is the sum!
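
The same evaluation can be reproduced in plain host C, where the comma operator behaves the same way. This is only an emulation: host C has no int4, so the hypothetical helper add_scalar4 spells out the scalar widening by hand, and the other function names are made up as well.

```c
#include <stdio.h>

/* (1, 2, 3, 4) is a comma expression: every operand is evaluated and
 * discarded except the last one. Compilers usually warn about the
 * unused values, which is a useful hint when this happens by accident. */
int comma_value(void)
{
    int v = (1, 2, 3, 4);
    return v;
}

/* Emulates "scalar + int4": the scalar is widened to all four lanes. */
void add_scalar4(int s, const int y[4], int z[4])
{
    for (int i = 0; i < 4; ++i)
        z[i] = s + y[i];
}

/* Reproduces the surprising result from the post. */
void print_surprise(void)
{
    const int y[4] = {10, 10, 10, 10};
    int z[4];

    add_scalar4(comma_value(), y, z);
    printf("%d %d %d %d\n", z[0], z[1], z[2], z[3]);  /* 14 14 14 14 */
}
```

Writing (int4)(1, 2, 3, 4) + y in the kernel instead gives the intended (11, 12, 13, 14).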

Categories: Parallel Programming