06 Jun

OpenCL -> Vulkan: A Porting Guide (#1)

Vulkan is the newest kid on the block when it comes to cross-platform, widely supported GPGPU compute. Vulkan’s primacy as the high-performance rendering API powering the latest versions of Android, coupled with Windows and Linux desktop drivers from all major vendors, means that we have a good way to run compute workloads on a wide range of devices.

OpenCL is the venerable old boy of GPGPU these days, having been around since 2009. A huge variety of software projects have used OpenCL to run compute workloads and speed up their applications.

Given Vulkan’s rising prominence, how does one port from OpenCL to Vulkan?

This is part 1 of my guide for how things map between the APIs!

cl_platform_id -> VkInstance

In OpenCL, the first thing you do is get the platform identifiers (using clGetPlatformIDs).

// We do not strictly need to initialize this to 0 (as it'll
// be set by clGetPlatformIDs), but given a lot of people do
// not check the error code returns, it's safer to 0
// initialize.
cl_uint numPlatforms = 0;
if (CL_SUCCESS != clGetPlatformIDs(
    0,
    nullptr,
    &numPlatforms)) {
  // ... error!
}

std::vector<cl_platform_id> platforms(numPlatforms);

if (CL_SUCCESS != clGetPlatformIDs(
    platforms.size(),
    platforms.data(),
    nullptr)) {
  // ... error!
}

Each cl_platform_id is a handle into an individual vendor’s OpenCL driver: if you had both an AMD and an NVIDIA implementation of OpenCL on your system, you’d get two cl_platform_ids returned.

Vulkan is different here. Instead of getting one or more handles to individual vendors’ implementations, we create a single VkInstance (via vkCreateInstance).

const VkApplicationInfo applicationInfo = {
  VK_STRUCTURE_TYPE_APPLICATION_INFO,
  0,
  "MyAwesomeApplication",
  0,
  "",
  0,
  VK_MAKE_VERSION(1, 0, 0)
};
 
const VkInstanceCreateInfo instanceCreateInfo = {
  VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
  0,
  0,
  &applicationInfo,
  0,
  0,
  0,
  0
};
 
VkInstance instance;
if (VK_SUCCESS != vkCreateInstance(
    &instanceCreateInfo,
    0,
    &instance)) {
  // ... error!
}

This single instance allows us to access multiple vendor implementations of the Vulkan API through a single object.

cl_device_id -> VkPhysicalDevice

In OpenCL, you can query one or more cl_device_ids from each cl_platform_id that we previously queried (via clGetDeviceIDs). When querying for devices, we specify a cl_device_type, which lets us ask the driver either for its default device (normally a GPU) or for a specific device type. We’ll use CL_DEVICE_TYPE_ALL, which instructs the driver to return all the devices it knows about, so that we can choose from them.

cl_uint numDevices = 0;

for (cl_uint i = 0; i < platforms.size(); i++) {
  // We do not strictly need to initialize this to 0 (as it'll
  // be set by clGetDeviceIDs), but given a lot of people do
  // not check the error code returns, it's safer to 0
  // initialize.
  cl_uint numDevicesForPlatform = 0;

  if (CL_SUCCESS != clGetDeviceIDs(
      platforms[i],
      CL_DEVICE_TYPE_ALL,
      0,
      nullptr,
      &numDevicesForPlatform)) {
    // ... error!
  }

  numDevices += numDevicesForPlatform;
}

std::vector<cl_device_id> devices(numDevices);

// reset numDevices as we'll use it for our insertion offset
numDevices = 0;

for (cl_uint i = 0; i < platforms.size(); i++) {
  cl_uint numDevicesForPlatform = 0;

  if (CL_SUCCESS != clGetDeviceIDs(
      platforms[i],
      CL_DEVICE_TYPE_ALL,
      0,
      nullptr,
      &numDevicesForPlatform)) {
    // ... error!
  }

  if (CL_SUCCESS != clGetDeviceIDs(
      platforms[i],
      CL_DEVICE_TYPE_ALL,
      numDevicesForPlatform,
      devices.data() + numDevices,
      nullptr)) {
    // ... error!
  }

  numDevices += numDevicesForPlatform;
}

The code above is a bit of a mouthful – but it is the easiest way to get every device that the system knows about.
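If the two separate loops feel heavy, the same gathering can be done in one pass by growing the vector as each platform reports its count. The sketch below is only an illustration of that shape: FakePlatform, FakeDeviceId and fakeGetDeviceIDs are hypothetical stand-ins (so it compiles without an OpenCL SDK) that mimic the calling convention of clGetDeviceIDs, and gatherAllDevices is a made-up helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, not part of OpenCL.
using FakeDeviceId = int;
struct FakePlatform {
  std::vector<FakeDeviceId> devices;
};

// Mimics the shape of clGetDeviceIDs: a separate input capacity
// (numEntries) and an optional output count (numDevices).
// Returns 0 on success, in the same way CL_SUCCESS is 0.
int fakeGetDeviceIDs(const FakePlatform& platform, uint32_t numEntries,
                     FakeDeviceId* devices, uint32_t* numDevices) {
  if (numDevices) {
    *numDevices = static_cast<uint32_t>(platform.devices.size());
  }
  if (devices) {
    for (uint32_t i = 0; i < numEntries && i < platform.devices.size(); i++) {
      devices[i] = platform.devices[i];
    }
  }
  return 0;
}

// Appends every platform's devices to 'out' in a single pass.
// Returns false as soon as any query reports an error.
bool gatherAllDevices(const std::vector<FakePlatform>& platforms,
                      std::vector<FakeDeviceId>& out) {
  for (const FakePlatform& platform : platforms) {
    uint32_t count = 0;
    if (0 != fakeGetDeviceIDs(platform, 0, nullptr, &count)) {
      return false;
    }
    if (0 == count) {
      continue;  // nothing to append for this platform
    }
    const std::size_t offset = out.size();
    out.resize(offset + count);
    if (0 != fakeGetDeviceIDs(platform, count, out.data() + offset,
                              nullptr)) {
      return false;
    }
  }
  return true;
}
```

With a real SDK the stand-in call would be replaced by clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, ...); the trade-off is one vector resize per platform in exchange for dropping the second counting loop.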

In contrast, since Vulkan gave us a single VkInstance, we query that single instance for all of the VkPhysicalDevices it knows about (via vkEnumeratePhysicalDevices). A Vulkan physical device is a link to the actual hardware that the Vulkan code is going to execute on.

uint32_t physicalDeviceCount = 0;

if (VK_SUCCESS != vkEnumeratePhysicalDevices(
    instance,
    &physicalDeviceCount,
    0)) {
  // ... error!
}

std::vector<VkPhysicalDevice> physicalDevices(physicalDeviceCount);

if (VK_SUCCESS != vkEnumeratePhysicalDevices(
    instance,
    &physicalDeviceCount,
    physicalDevices.data())) {
  // ... error!
}

A prominent API design fork can be seen between vkEnumeratePhysicalDevices and clGetDeviceIDs – Vulkan reuses the integer return parameter to the function (the parameter that lets you query the number of physical devices present) to also pass into the driver the number of physical devices we want filled out. In contrast, OpenCL uses an extra parameter for this. These patterns are repeated throughout both APIs.
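Because the Vulkan-style in/out count shows up on so many enumerate calls, it can be wrapped once and reused. The following is a minimal sketch under stated assumptions: fakeEnumerate is a hypothetical stand-in with the same parameter shape as a Vulkan enumerate function (minus the instance handle), and enumerateAll is an invented helper name, not something either API provides.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in: 'count' is read as the capacity of 'items' and
// written back as the number available (items == nullptr) or the number
// actually written (items != nullptr). Returns 0 on success.
int fakeEnumerate(uint32_t* count, int* items) {
  static const int available[] = {10, 20, 30};
  const uint32_t total = 3;
  if (nullptr == items) {
    *count = total;
    return 0;
  }
  const uint32_t n = (*count < total) ? *count : total;
  for (uint32_t i = 0; i < n; i++) {
    items[i] = available[i];
  }
  *count = n;
  return 0;
}

// Wraps the count-then-fill idiom for any callable of that shape.
template <typename T, typename Enumerate>
bool enumerateAll(Enumerate enumerate, std::vector<T>& out) {
  uint32_t count = 0;
  // First call: no storage, so only the count is reported.
  if (0 != enumerate(&count, static_cast<T*>(nullptr))) {
    return false;
  }
  out.resize(count);
  // Second call: the same variable now carries our capacity in.
  if (0 != enumerate(&count, out.data())) {
    return false;
  }
  out.resize(count);  // the callee may write fewer than first reported
  return true;
}
```

With the real API, the callable could be a lambda that captures the instance and forwards to vkEnumeratePhysicalDevices. Note that this sketch treats any non-zero result as failure, which would include VK_INCOMPLETE.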

cl_context -> VkDevice

Here is where it gets trickier between the APIs. OpenCL has a notion of a context, which you can think of as your way, as the user, to view and interact with what the system is doing. OpenCL allows multiple devices that belong to a single platform to be shared within a context. In contrast, Vulkan is fixed to having a single physical device per ‘context’, which Vulkan calls a VkDevice.

To make the porting easier, and because in all honesty I’ve yet to see any real use-case or benefit from having multiple OpenCL devices in a single context, we’ll make our OpenCL code create its cl_context using a single cl_device_id (via clCreateContext).

// One of the devices in our std::vector
cl_device_id device = ...;

cl_int errorcode;

cl_context context = clCreateContext(
    nullptr,
    1,
    &device,
    nullptr,
    nullptr,
    &errorcode);

if (CL_SUCCESS != errorcode) {
  // ... error!
}

The above highlights the single biggest travesty in the OpenCL API: the error code has changed from being something returned from the API call to an optional pointer parameter at the end of the signature. In API design, I’d say this is rule #1 in how not to mess up an API. (If you’re interested, these are two great API talks: Designing and Evaluating Reusable Components by Casey Muratori, and Hourglass Interfaces for C++ APIs by Stefanus Du Toit.)
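One way to paper over the inconsistency in your own code is a thin adapter that puts the status back in the return value. This is only a sketch of the idea: FakeContext and fakeCreateContext are hypothetical stand-ins that imitate the shape of a clCreate* call (handle returned, status through a trailing optional pointer), and createContextChecked is an invented name.

```cpp
#include <cstdint>

// Hypothetical handle type, not part of OpenCL.
struct FakeContext {
  int id;
};

// Imitates the clCreate* convention: the object is the return value and
// the status goes out through an optional trailing pointer. The failure
// value is arbitrary; 0 stands for success.
FakeContext* fakeCreateContext(bool simulateFailure, int32_t* errcodeRet) {
  static FakeContext theContext = {42};
  if (errcodeRet) {
    *errcodeRet = simulateFailure ? -1 : 0;
  }
  return simulateFailure ? nullptr : &theContext;
}

// Adapter: always asks for the status, returns it, and only hands the
// object back through 'out' on success.
int32_t createContextChecked(bool simulateFailure, FakeContext** out) {
  int32_t errorcode = -1;  // pessimistic default in case it is never set
  FakeContext* context = fakeCreateContext(simulateFailure, &errorcode);
  if (0 != errorcode) {
    return errorcode;
  }
  *out = context;
  return 0;
}
```

Call sites can then test every call the same way, whether the underlying function returns its status or writes it through a pointer.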

For Vulkan, when creating our VkDevice object, we specifically enable the features we want to use from the device upfront. The easy way to do this is to first call vkGetPhysicalDeviceFeatures, and then pass the result of this into our create device call, enabling all features that the device supports.

When creating our VkDevice, we need to explicitly request which queues we want to use. OpenCL has no real analogous concept to this. The naive comparison is to compare VkQueues against cl_command_queues, but I’ll show in a later post that this is a wrong conflation. Suffice to say, for our purposes we’ll query for all queue families that support compute functionality, as that is almost what OpenCL is doing behind the scenes in the cl_context.

// One of the physical devices in our std::vector
VkPhysicalDevice physicalDevice = ...;

VkPhysicalDeviceFeatures physicalDeviceFeatures;

vkGetPhysicalDeviceFeatures(
    physicalDevice,
    &physicalDeviceFeatures);

uint32_t queueFamilyPropertiesCount = 0;

vkGetPhysicalDeviceQueueFamilyProperties(
    physicalDevice,
    &queueFamilyPropertiesCount,
    0);

// Create a temporary std::vector to allow us to query for
// all the queue families our physical device supports.
std::vector<VkQueueFamilyProperties> queueFamilyProperties(
    queueFamilyPropertiesCount);

vkGetPhysicalDeviceQueueFamilyProperties(
    physicalDevice,
    &queueFamilyPropertiesCount,
    queueFamilyProperties.data());

uint32_t numQueueFamiliesThatSupportCompute = 0;

for (uint32_t i = 0; i < queueFamilyProperties.size(); i++) {
  if (VK_QUEUE_COMPUTE_BIT &
      queueFamilyProperties[i].queueFlags) {
    numQueueFamiliesThatSupportCompute++;
  }
}

// Create a temporary std::vector to allow us to specify all
// queues on device creation
std::vector<VkDeviceQueueCreateInfo> queueCreateInfos(
    numQueueFamiliesThatSupportCompute);

// Reset so we can re-use as an index
numQueueFamiliesThatSupportCompute = 0;

// The priority must outlive the loop: each
// VkDeviceQueueCreateInfo only stores a pointer to it, and
// vkCreateDevice reads through that pointer later.
const float queuePriority = 1.0f;

for (uint32_t i = 0; i < queueFamilyProperties.size(); i++) {
  if (VK_QUEUE_COMPUTE_BIT &
      queueFamilyProperties[i].queueFlags) {
    const VkDeviceQueueCreateInfo deviceQueueCreateInfo = {
        VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
        0,
        0,
        i,
        1,
        &queuePriority
    };

    queueCreateInfos[numQueueFamiliesThatSupportCompute] =
        deviceQueueCreateInfo;

    numQueueFamiliesThatSupportCompute++;
  }
}

const VkDeviceCreateInfo deviceCreateInfo = {
    VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
    0,
    0,
    static_cast<uint32_t>(queueCreateInfos.size()),
    queueCreateInfos.data(),
    0,
    0,
    0,
    0,
    // Enable every feature the physical device reported
    &physicalDeviceFeatures
};

VkDevice device;
if (VK_SUCCESS != vkCreateDevice(
    physicalDevice,
    &deviceCreateInfo,
    0,
    &device)) {
  // ... error!
}

Vulkan’s almost legendary verbosity strikes here – we’re having to write a lot more code than the equivalent in OpenCL to get an almost analogous handle. The plus here is that for the Vulkan driver, it can do a lot more upfront allocations because a much higher proportion of its state is known at creation time – that is the fundamental approach of Vulkan, we are trading upfront verbosity for a more efficient application overall.
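Some of that verbosity is self-inflicted, though: the count-then-fill loops over the queue families can be collapsed into one pass that collects the matching indices. Here is a rough sketch of just that filtering step, written so it compiles without the Vulkan headers. kFakeComputeBit is a stand-in chosen to match the value of VK_QUEUE_COMPUTE_BIT, the input is assumed to be the queueFlags field of each family in order, and computeFamilyIndices is a hypothetical helper name.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for VK_QUEUE_COMPUTE_BIT (an assumption of this sketch, so
// that it builds without the Vulkan headers).
const uint32_t kFakeComputeBit = 0x2;

// Given the queueFlags of each queue family, in family order, returns
// the indices of the families that advertise compute support.
std::vector<uint32_t> computeFamilyIndices(
    const std::vector<uint32_t>& queueFlags) {
  std::vector<uint32_t> indices;
  for (uint32_t i = 0; i < queueFlags.size(); i++) {
    if (kFakeComputeBit & queueFlags[i]) {
      indices.push_back(i);
    }
  }
  return indices;
}
```

Each returned index would then become the queueFamilyIndex of one VkDeviceQueueCreateInfo, so the vector of create infos can be sized from indices.size() without a separate counting loop.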

Ok – so we’ve now got the API to the point where we can think about actually using the plethora of hardware available from these APIs! Stay tuned for the next in the series where I’ll cover porting from OpenCL’s cl_command_queue to Vulkan’s VkQueue.

2 thoughts on “OpenCL -> Vulkan: A Porting Guide (#1)”

  1. Re: OpenCL functions that don’t return cl_int error codes… I agree that it’s jarring and surprising but at least the API is fairly consistent in using this pattern for clCreateXXX() functions (except for two, I think?).

    One nice benefit of the camel case naming and consistently returning an error code is that you can use a macro to neatly check for errors on those functions you don’t expect to fail (and are fatal): https://gist.github.com/allanmac/9328bb2d6a99b86883195f8f78fd1b93

    Looking forward to the rest of the posts in this series!

    • I think my biggest gripe is that the first two functions you have to use have cl_int as the return code, and then everything else switches. The number of times I see new people make this mistake is enough proof that it would have been better for the API to pick either return error code or pointer argument return, just not a mix of the two!
