Platform versions

OpenCL is designed to support devices with different capabilities under a single platform, including devices that conform to different versions of the OpenCL specification. When writing an OpenCL application, you need to query the implementation for the versions it supports. There are two main version identifiers to consider:

  • Platform Version: Indicates the version of the OpenCL runtime supported.
  • Device Version: Indicates the device capabilities and attributes. The conformant version reported for a device cannot be greater than the platform version.
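
Both version identifiers are returned as strings that begin with "OpenCL <major>.<minor>" followed by vendor-specific text. The following is a minimal sketch, in plain C with no OpenCL headers, of how an application might extract the numeric version from such a string and check that a device version does not exceed the platform version. The type version_pair and the functions parse_cl_version and device_within_platform are hypothetical helpers introduced for illustration; they are not part of the OpenCL API.

```c
#include <stdio.h>

/* Hypothetical helper type: major/minor pair taken from a version string. */
typedef struct {
    int major;
    int minor;
} version_pair;

/*
 * Parse a string laid out as "OpenCL <major>.<minor> <vendor text>".
 * Returns 1 on success, 0 if the string does not match that layout.
 */
int parse_cl_version(const char *text, version_pair *out)
{
    int major = 0, minor = 0;

    if (text == NULL || out == NULL)
        return 0;
    if (sscanf(text, "OpenCL %d.%d", &major, &minor) != 2)
        return 0;

    out->major = major;
    out->minor = minor;
    return 1;
}

/* Returns 1 if the device version is not greater than the platform version. */
int device_within_platform(version_pair device, version_pair platform)
{
    if (device.major != platform.major)
        return device.major < platform.major;
    return device.minor <= platform.minor;
}
```

In a real program the two input strings would come from CL_PLATFORM_VERSION and CL_DEVICE_VERSION; each is passed through parse_cl_version and the resulting pairs are compared with device_within_platform.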

Query platforms

Now let's write an OpenCL program to get the platform details. Use the get_platform_property example in this chapter.

The OpenCL standard specifies APIs to determine the platform configuration. To query the platform versions and other details of the OpenCL implementation, the following two APIs are used:

cl_int clGetPlatformIDs (cl_uint num_entries,
  cl_platform_id *platforms,
  cl_uint *num_platforms);
cl_int clGetPlatformInfo(cl_platform_id platform,
  cl_platform_info param_name,
  size_t param_value_size,
  void *param_value,
  size_t *param_value_size_ret);

clGetPlatformIDs is used to obtain the total number of platforms available in the system. There can be more than one platform: if you install two OpenCL runtimes, for example the AMD APP SDK and the Intel OpenCL runtime for the CPU, you should see two platforms in the system. Because the number of platforms is not known in advance, you usually don't want to pre-allocate the memory for storing them. Instead, before getting the actual platform IDs, the application should query for the number of OpenCL implementations available in the system. This is done using the following OpenCL call:

clError = clGetPlatformIDs(0, NULL, &num_platforms);

This call stores the total number of available platforms in num_platforms. Once we have that number, we can allocate memory and query for the platform IDs of the various OpenCL implementations as follows:

platforms = (cl_platform_id *)malloc(num_platforms * sizeof(cl_platform_id));
clError = clGetPlatformIDs(num_platforms, platforms, NULL);

Once the list of platforms is obtained, you can query for the platform attributes in a loop for each platform. In the example we have queried the following parameters using the API clGetPlatformInfo:

CL_PLATFORM_NAME
CL_PLATFORM_VENDOR
CL_PLATFORM_VERSION
CL_PLATFORM_PROFILE
CL_PLATFORM_EXTENSIONS

Example:

clError = clGetPlatformInfo (platforms[index], CL_PLATFORM_NAME, 1024, &queryBuffer, NULL);
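
CL_PLATFORM_EXTENSIONS is returned as a single space-separated list of extension names, so an application usually needs to search that string for the extension it cares about. The sketch below shows one way to do this in plain C. has_extension is a hypothetical helper, not an OpenCL function, and it assumes the list has already been copied into a null-terminated buffer by clGetPlatformInfo.

```c
#include <string.h>

/*
 * Return 1 if `name` appears as a whole token in the space-separated
 * `extensions` list, 0 otherwise. A bare strstr() is not enough, because
 * one extension name can be a prefix of another.
 */
int has_extension(const char *extensions, const char *name)
{
    size_t name_len;
    const char *p;

    if (extensions == NULL || name == NULL || name[0] == '\0')
        return 0;

    name_len = strlen(name);
    p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning of the list or after a space... */
        int starts_token = (p == extensions) || (p[-1] == ' ');
        /* ...and must be followed by a space or the end of the list. */
        int ends_token = (p[name_len] == ' ') || (p[name_len] == '\0');

        if (starts_token && ends_token)
            return 1;
        p++; /* keep searching after this partial match */
    }
    return 0;
}
```

The same helper could be applied to the string returned for a device's extension list, since it only depends on the space-separated layout.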

In the get_device_property example for this chapter, we default to the first available platform and query the properties of all the devices in that platform. Take a look at how that example obtains the platform:

clError = clGetPlatformIDs(1, &platform, &num_platforms);

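Note the difference between the calls to clGetPlatformIDs in the two examples discussed. The first example passes num_entries as 0 and platforms as NULL to obtain only the count. This one passes num_entries as 1, so only the first platform ID is returned, while num_platforms still receives the total number of platforms available.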

In this section we wrote a small program to print the platform details. Take a look at how we allocate memory for the platform IDs and how we get the details of each platform. As an exercise, try installing multiple OpenCL implementations on your system and see how many OpenCL platforms are enumerated by the function clGetPlatformIDs.

Multiple OpenCL implementations can be installed on the same system. You might wonder how the application picks the appropriate runtime. The answer is the OpenCL Installable Client Driver (ICD), which we will study in a later section.

Query devices

We shall now continue with getting the attributes and resource limitations of an OpenCL device. In the last program we were able to print all the platform information available. In this example we shall try to enhance the existing code to print some basic device attributes and resource information for the first available platform. We will implement a function PrintDeviceInfo(), which will print the device specific information. The following two OpenCL APIs are used in the example:

cl_int clGetDeviceIDs (cl_platform_id platform,
  cl_device_type device_type,
  cl_uint num_entries,
  cl_device_id *devices,
  cl_uint *num_devices);
cl_int clGetDeviceInfo (cl_device_id device,
  cl_device_info param_name,
  size_t param_value_size,
  void *param_value,
  size_t *param_value_size_ret);

In the same way as we did for platforms, we first determine the number of devices available, and then allocate memory for each device found in the platform.

clError = clGetDeviceIDs (platform, 
  CL_DEVICE_TYPE_ALL, 
  0, NULL, &num_devices);

The above call returns the number of available devices of all types (CL_DEVICE_TYPE_ALL). To count only the CPU or GPU devices, pass CL_DEVICE_TYPE_CPU or CL_DEVICE_TYPE_GPU instead.

To make this clearer, we have added the PrintDeviceInfo function:

void PrintDeviceInfo(cl_device_id device)
{
  char queryBuffer[1024] = "";  /* receives string-valued properties */
  cl_uint queryInt = 0;         /* CL_DEVICE_MAX_COMPUTE_UNITS is a cl_uint */
  cl_int clError;               /* CL_SUCCESS when a query succeeds */

  /* String-valued queries write a null-terminated string into queryBuffer. */
  clError = clGetDeviceInfo(device, CL_DEVICE_NAME,
    sizeof(queryBuffer), queryBuffer, NULL);
  printf("CL_DEVICE_NAME: %s\n", queryBuffer);
  queryBuffer[0] = '\0';
  clError = clGetDeviceInfo(device, CL_DEVICE_VENDOR,
    sizeof(queryBuffer), queryBuffer, NULL);
  printf("CL_DEVICE_VENDOR: %s\n", queryBuffer);
  queryBuffer[0] = '\0';
  clError = clGetDeviceInfo(device, CL_DRIVER_VERSION,
    sizeof(queryBuffer), queryBuffer, NULL);
  printf("CL_DRIVER_VERSION: %s\n", queryBuffer);
  queryBuffer[0] = '\0';
  clError = clGetDeviceInfo(device, CL_DEVICE_VERSION,
    sizeof(queryBuffer), queryBuffer, NULL);
  printf("CL_DEVICE_VERSION: %s\n", queryBuffer);

  /* Integer-valued query: pass the address and size of the cl_uint. */
  clError = clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
    sizeof(queryInt), &queryInt, NULL);
  printf("CL_DEVICE_MAX_COMPUTE_UNITS: %u\n", queryInt);
}

Note that each param_name passed to clGetDeviceInfo returns a different data type. In the routine PrintDeviceInfo, the CL_DEVICE_MAX_COMPUTE_UNITS param_name returns an unsigned integer (cl_uint), whereas the CL_DRIVER_VERSION param_name returns a character buffer.

The preceding function prints the following information about the device:

CL_DEVICE_NAME
CL_DEVICE_VENDOR
CL_DRIVER_VERSION
CL_DEVICE_VERSION
CL_DEVICE_MAX_COMPUTE_UNITS

The following are the values of CL_DEVICE_MAX_COMPUTE_UNITS reported by different platforms when you query a GPU type device:

For APU like processors:

AMD A10 5800K - 6

AMD Trinity has 6 SIMD engines (compute units) and each has 64 processing elements.

Intel HD 4000 - 16

Intel HD 4000 has 16 compute units and each is a single thread processor.

For discrete graphics:

NVIDIA GTX 680 - 8

The NVIDIA GTX 680 has a total of eight compute units and each has 192 processing elements.

AMD Radeon HD 7870 - 32

The AMD Radeon HD 7870 GPU has 32 compute units and each has 64 processing elements.
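
The compute unit count alone says little about the size of a device; multiplying it by the number of processing elements per compute unit gives a rough total. The following is a minimal sketch of that arithmetic using a hypothetical helper, total_processing_elements. Note that CL_DEVICE_MAX_COMPUTE_UNITS reports only the first factor; the per-unit figure is assumed to come from vendor documentation, as the numbers quoted above do.

```c
/*
 * Hypothetical helper: rough total of processing elements on a device.
 * compute_units      - value reported for CL_DEVICE_MAX_COMPUTE_UNITS
 * elements_per_unit  - processing elements per compute unit (from vendor data)
 */
unsigned int total_processing_elements(unsigned int compute_units,
                                       unsigned int elements_per_unit)
{
    return compute_units * elements_per_unit;
}
```

With the figures quoted above, the AMD A10 5800K comes to 6 x 64 = 384 processing elements and the NVIDIA GTX 680 to 8 x 192 = 1536, even though the two devices differ by only two compute units.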

Having more compute units does not necessarily make a GPU device faster. The number of compute units varies across different computer architectures and across different hardware vendors. Even within a single vendor there are different families, such as the NVIDIA Kepler and Fermi architectures or the AMD Radeon HD 6XXX and Radeon HD 7XXX architectures. The OpenCL specification is targeted at programming these different kinds of devices from different vendors. As an enhancement to the sample program, print the device attributes and resource sizes for the param_name instances listed as follows:

  • CL_DEVICE_TYPE
  • CL_DEVICE_MAX_CLOCK_FREQUENCY
  • CL_DEVICE_IMAGE_SUPPORT
  • CL_DEVICE_SINGLE_FP_CONFIG

Besides these there are many more device attributes which can be queried. Take a look at the different param_name instances provided in the OpenCL specification 1.2, table 4.3. You should try out all the param_name instances and try to understand each device property.