First, clone this repository. It contains two folders: the starter project and the final project. Open the Xcode project in
The starter project contains the boilerplate code. The main logic is located in ViewController.swift. If you compile and run the Image Editor Demo app, you’ll see that it can only choose an image, display it in a UIImageView and export it via UIActivityViewController. There are two sliders, Temperature and Tint, that do absolutely nothing yet. From this point on, we’re going to add image editing functionality to our app.
GPU Side: Image Editing Kernel
In order to adjust images using the GPU, we need to provide it with some sort of instructions, or a program. Originally, such GPU programs were used only in 3D pipelines and were often responsible for lighting and shading effects, which is why over time they came to be called shaders. In Metal, shaders are written in Metal Shading Language (MSL), which is based on C++. This means that MSL has some restrictions (there are no lambda expressions, no dynamic_cast operator, etc.) and some extensions (support for textures, buffers, etc.), but for the most part shader functions look very similar to ordinary C++ code.
Create a Shaders.metal file that will contain our compute code.
It should look like this:
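As a point of reference, a newly created Metal file in Xcode usually contains only the standard library include and the using directive. The header comment that Xcode generates is omitted here, and the exact template may differ between Xcode versions:

```metal
// Shaders.metal
#include <metal_stdlib>   // Metal standard library: textures, math functions, etc.
using namespace metal;    // lets us write texture2d instead of metal::texture2d
```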
Now, add the following snippet of code:
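A minimal sketch of a starter kernel that matches the description in the following sections: two textures, a thread position, a boundary check and a plain copy. The texture indices 0 and 1 and the local names textureSize and sourceValue are assumptions made for this sketch.

```metal
#include <metal_stdlib>
using namespace metal;

// Copies every pixel of the source texture into the destination texture.
kernel void adjustments(texture2d<float, access::read> source [[ texture(0) ]],
                        texture2d<float, access::write> destination [[ texture(1) ]],
                        uint2 position [[ thread_position_in_grid ]]) {
    // Size of the texture we are writing into.
    const uint2 textureSize = uint2(destination.get_width(),
                                    destination.get_height());

    // Threads that fall outside the texture have nothing to do.
    if (position.x >= textureSize.x || position.y >= textureSize.y) {
        return;
    }

    // Read the pixel at this thread's position and write it back unchanged.
    const float4 sourceValue = source.read(position);
    destination.write(sourceValue, position);
}
```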
This is our starter shader function. Currently, it can only copy pixels from one image to another. Let’s take a look at how it works.
Kernel Function Declaration
The first keyword in the declaration is kernel. It means that the function is a compute kernel, a set of instructions for general-purpose computing. Metal also provides vertex and fragment function types, which are used in 3D rendering.
void means that our function does not return anything. Kernels always return void; they only read, modify and write data.
The name of our function is adjustments. You can name your functions whatever you like. The only function naming restriction in MSL is that you cannot call a function main.
Kernel Function Arguments
In the arguments section of the function we can see source, destination and position. Source and destination are textures. A texture is a structured collection of texture elements, often called texels or pixels. The exact configuration of these texture elements depends on the type of texture. The source pixels are stored in a two-dimensional texture and are read as floats. This is exactly what the templated type texture2d<float, access::read> describes. To write the result, we use a texture of a similar type with write access.
Metal provides a number of texture templates:
texture2d_array and more. You can see all of them in chapter 2.8 of MSL specification, but speaking about image processing, you will need only
The first texture template parameter is the data type. It specifies the type of the components returned when reading from a texture, or the type of the components supplied when writing to it. The data type can be float, half, int, uint, short or ushort. Most of the time you’re going to use
The access template parameter describes how the texture data can be accessed:
read means that you can access the texture only for reading;
write is used for destination textures that the result is written into;
read_write can be used for textures that are both read from and written to. Note that read_write textures are supported only on recent devices (Apple A11 and later);
sample allows you both to read the texture and to sample it with a sampler. Sampling is a more advanced way of fetching data than reading, but it also takes more time.
Each texture argument has to be given a unique index. This is done with the [[ texture(n) ]] attribute, where n is the number of the texture slot used when passing the texture object to the encoder on the CPU side.
The final argument is position. When a kernel function is submitted for execution, it executes over an n-dimensional grid of threads, where n is one, two or three. A thread is an instance of the kernel function that executes for each point in this grid, and thread_position_in_grid identifies its position in the grid.
Generally, when working with images, you aim to dispatch a grid of threads of the same size as the image. In such cases there is a one-to-one correspondence between the position of a destination pixel and the position of the thread in the grid that computes its value.
Kernel Function Body: Boundary Check
Now let’s look at the body of the kernel function. First, we get the size of the texture. Then, inside the if statement, we use the texture size to ignore out-of-bounds execution via an early return. To understand why we do this, we need to get familiar with how threads work in parallel.
Threads are organised into threadgroups that are executed together and can share a common block of threadgroup memory. In most image processing kernels, threads run independently of each other, but sometimes shader functions are designed so that the threads in a threadgroup collaborate on their working set, for example when calculating the mean, min or max of a texture.
The threads in a threadgroup are further organised into single-instruction, multiple-data (SIMD) groups that execute concurrently. It is important to note that the threads in a SIMD group execute the same code. If there is an if branch in the shader code and one of the threads in a SIMD group takes a different path from the others, all threads in that group execute both branches, and the execution time for the group is the sum of the execution times of both branches. So it is good practice to avoid if statements in shaders or to make them as thin as possible.
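To illustrate what avoiding a branch can look like, here is a small sketch of a hypothetical threshold kernel (it is not part of the demo project). The branching variant is kept as a comment next to a branch-free variant built on the standard step function. The sketch assumes the grid has exactly the size of the texture, so it has no boundary check, and in practice the compiler may already turn such a small branch into a select on its own.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel: turns every pixel black or white depending on its luma.
kernel void thresholdSketch(texture2d<float, access::read> source [[ texture(0) ]],
                            texture2d<float, access::write> destination [[ texture(1) ]],
                            uint2 position [[ thread_position_in_grid ]]) {
    const float4 value = source.read(position);
    const float luma = dot(value.rgb, float3(0.2126f, 0.7152f, 0.0722f));

    // Branching variant: threads of one SIMD group may disagree on the condition.
    //   float3 result;
    //   if (luma >= 0.5f) { result = float3(1.0f); } else { result = float3(0.0f); }

    // Branch-free variant: step(edge, x) returns 0 when x < edge and 1 otherwise,
    // so every thread executes the same instructions.
    const float3 result = float3(step(0.5f, luma));

    destination.write(float4(result, value.a), position);
}
```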
So, given that we need to minimise the number of ifs in shaders, why do we still have one at the beginning of the function? The answer is that older Metal devices can only dispatch uniformly sized threadgroups, which constrains the total size of the grid: each grid dimension has to be a multiple of the threadgroup size. To support such devices, you need to dispatch enough threadgroups for the grid to cover the whole image. For example, a 100×100 image processed with 16×16 threadgroups needs 7×7 threadgroups, which gives a 112×112 grid, so the threads in the last 12 rows and columns fall outside the image. In order to ignore this out-of-bounds execution at the edges of the grid, we make an early return.
Modern devices support non-uniform threadgroup sizes, and Metal is able to generate smaller threadgroups along the edges of the grid.
In order to optimise the instructions for modern devices and avoid unnecessary code branching, we can create a separate version of our kernel without the boundary check.
One traditional way to do this is to use preprocessor macro definitions. For example, we could do something like this:
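A minimal sketch of the macro approach. The macro name DEVICE_SUPPORTS_NON_UNIFORM_THREADGROUPS is hypothetical; the idea is that the library is compiled twice, once with the macro defined and once without it.

```metal
#include <metal_stdlib>
using namespace metal;

kernel void adjustments(texture2d<float, access::read> source [[ texture(0) ]],
                        texture2d<float, access::write> destination [[ texture(1) ]],
                        uint2 position [[ thread_position_in_grid ]]) {
#ifndef DEVICE_SUPPORTS_NON_UNIFORM_THREADGROUPS
    // This check is compiled only into the variant meant for older devices.
    if (position.x >= destination.get_width() ||
        position.y >= destination.get_height()) {
        return;
    }
#endif
    destination.write(source.read(position), position);
}
```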
Compiling one function many times with different preprocessor macros to enable different features is known as the ubershader approach. Its drawback is that the size of the resulting shader library increases significantly.
Another way is to use Metal’s function constants. Function constants provide the same ease of use as preprocessor macros but move the generation of the specific variants to the creation of the compute pipeline state (the state the GPU is in while executing the instructions), so you don’t have to compile the variants offline.
Let’s declare our function constant by adding the following piece of code before the kernel function declaration:
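A sketch of such a declaration at program scope. The constant's name and the index 0 are assumptions; whatever index is used here has to match the one used on the CPU side when the pipeline state is created.

```metal
#include <metal_stdlib>
using namespace metal;

// No initial value here: it is supplied when the compute pipeline state is created.
constant bool deviceSupportsNonuniformThreadgroups [[ function_constant(0) ]];
```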
Next, replace the boundary check with the following:
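A sketch of the copy kernel with the boundary check wrapped in the function constant. The constant is redeclared so that the listing stands on its own; its name and index are assumptions.

```metal
#include <metal_stdlib>
using namespace metal;

constant bool deviceSupportsNonuniformThreadgroups [[ function_constant(0) ]];

kernel void adjustments(texture2d<float, access::read> source [[ texture(0) ]],
                        texture2d<float, access::write> destination [[ texture(1) ]],
                        uint2 position [[ thread_position_in_grid ]]) {
    const uint2 textureSize = uint2(destination.get_width(),
                                    destination.get_height());

    // When the constant is true, this whole block can be dropped
    // from the variant generated for the pipeline state.
    if (!deviceSupportsNonuniformThreadgroups) {
        if (position.x >= textureSize.x || position.y >= textureSize.y) {
            return;
        }
    }

    const float4 sourceValue = source.read(position);
    destination.write(sourceValue, position);
}
```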
Similar to textures, function constants (FCs) also need to be given indices, using the [[ function_constant(n) ]] attribute. As you can see, function constants are not initialised in the Metal function source. Instead, their values are specified by index n during the creation of the compute pipeline state. To learn more about FCs, look at chapter 5.8 of the MSL spec.
Great! Now, if the device supports non-uniform threadgroups, the compute pipeline state will be created with the function constant set to true, and the boundary check will be removed from the GPU instructions.
The boundary check is a common pattern used in almost every image processing compute shader, so, honestly, it just gets copy-pasted at the beginning of every function 🙂.
Kernel Function Body: Texture Read & Write
Finally, the last two lines of code demonstrate how to read and write texture data. To get the pixel value at a certain position, you can use the read texture member function, and write to store a value. Metal also allows you to gather from a texture as well as get its width, height and number of mipmap levels. If you want to learn more about it, read chapter 2.8 of the MSL specification.
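As a small illustration of these member functions, here is a sketch of a hypothetical kernel (not part of the demo project) that mirrors an image horizontally. It uses get_width and get_height for the boundary check, read to fetch a pixel from a different position and write to store it. It assumes the source and destination textures have the same size.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel: writes a horizontally mirrored copy of the source.
kernel void flipHorizontallySketch(texture2d<float, access::read> source [[ texture(0) ]],
                                   texture2d<float, access::write> destination [[ texture(1) ]],
                                   uint2 position [[ thread_position_in_grid ]]) {
    const uint width = source.get_width();
    const uint height = source.get_height();
    if (position.x >= width || position.y >= height) {
        return;
    }

    // Read from the mirrored column, write to this thread's own position.
    const uint2 mirrored = uint2(width - 1 - position.x, position.y);
    destination.write(source.read(mirrored), position);
}
```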
Kernel Function: Adjustments
Now let’s add the adjustments functionality to our shader. Replace the arguments of the kernel with the following:
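A sketch of what the extended argument list could look like, based on the discussion that follows: temperature and tint arrive as read-only float references. The buffer indices 0 and 1 are assumptions, and the body is still the plain copy, so the two new arguments are not used yet.

```metal
#include <metal_stdlib>
using namespace metal;

constant bool deviceSupportsNonuniformThreadgroups [[ function_constant(0) ]];

kernel void adjustments(texture2d<float, access::read> source [[ texture(0) ]],
                        texture2d<float, access::write> destination [[ texture(1) ]],
                        constant float& temperature [[ buffer(0) ]],  // read-only on the GPU
                        constant float& tint [[ buffer(1) ]],         // read-only on the GPU
                        uint2 position [[ thread_position_in_grid ]]) {
    const uint2 textureSize = uint2(destination.get_width(),
                                    destination.get_height());
    if (!deviceSupportsNonuniformThreadgroups) {
        if (position.x >= textureSize.x || position.y >= textureSize.y) {
            return;
        }
    }

    // Body unchanged for now: temperature and tint are not applied yet.
    destination.write(source.read(position), position);
}
```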
All arguments to functions that are pointers or references to a type must be declared with an address space attribute. An address space attribute specifies the region of memory from which buffer memory objects are allocated. There are a number of address spaces: device, constant, thread, threadgroup and threadgroup_imageblock, but the most commonly used are the first two. The device address space refers to buffer memory objects allocated from the device memory pool that are both readable and writeable, while the constant address space refers to read-only memory. A buffer memory object can be declared as a pointer or reference to a scalar, vector or user-defined structure. If you need to pass some data to a shader and you’re sure you won’t modify it, it’s good practice to use the constant address space, because Metal applies some optimisations to such buffers for better memory access.
Similar to textures and function constants, the [[ buffer(n) ]] attribute assigns an index to a buffer.
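To contrast the two most common address spaces, here is a sketch of a hypothetical kernel (not part of the demo project) that scales an array in place. The values buffer lives in the device address space because it is written to, while gain is only read and therefore uses constant. The sketch assumes the grid has exactly as many threads as the array has elements.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel: multiplies every element of a buffer by a gain factor.
kernel void scaleValuesSketch(device float* values [[ buffer(0) ]],   // readable and writeable
                              constant float& gain [[ buffer(1) ]],   // read-only
                              uint index [[ thread_position_in_grid ]]) {
    values[index] = values[index] * gain;
}
```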
So, we pass temperature and tint as references to floats that will be modified with UI sliders on the CPU side and accessed as read-only on the GPU side. Let’s use these values to adjust the pixels of the texture. In order to change the temperature and tint of the image, we first need to convert its color space from RGB to LAB. Color spaces are a huge subject for another article, but the main idea is that we can interpret color values in different ways, and a commonly used approach to image editing requires working in LAB. Now, import a header with convenience conversion functions below
Next, replace the last two lines of the adjustments function with the following code:
As we can see, each thread reads a pixel value from the source texture at its position, converts the value from the RGB color space to LAB, adjusts it using the temperature and tint arguments and converts it back to RGB. The resulting value is written to the destination texture at the same position it was read from in the source.
If you did everything right, the final kernel should look like this:
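A self-contained sketch of how the finished kernel could be put together. Several parts are assumptions rather than the project's actual code: the rgbToLab and labToRgb helpers are simplified stand-ins for the conversion header (they treat RGB as linear values with sRGB primaries and a D65 white point), tint is mapped to the a channel and temperature to the b channel, and the scale factor of 10 is arbitrary.

```metal
#include <metal_stdlib>
using namespace metal;

constant bool deviceSupportsNonuniformThreadgroups [[ function_constant(0) ]];

// Hypothetical stand-ins for the conversion header; D65 reference white.
constant float3 kWhitePoint = float3(0.95047f, 1.0f, 1.08883f);

static float3 rgbToLab(float3 rgb) {
    // RGB -> XYZ, one dot product per output component.
    const float3 xyz = float3(dot(rgb, float3(0.4124f, 0.3576f, 0.1805f)),
                              dot(rgb, float3(0.2126f, 0.7152f, 0.0722f)),
                              dot(rgb, float3(0.0193f, 0.1192f, 0.9505f)));
    const float3 t = max(xyz / kWhitePoint, float3(0.0f));
    // Cube root above the threshold, linear segment below it.
    const float3 f = select(7.787f * t + 16.0f / 116.0f,
                            pow(t, float3(1.0f / 3.0f)),
                            t > float3(0.008856f));
    return float3(116.0f * f.y - 16.0f,    // L: lightness
                  500.0f * (f.x - f.y),    // a: green-magenta axis
                  200.0f * (f.y - f.z));   // b: blue-yellow axis
}

static float3 labToRgb(float3 lab) {
    const float fy = (lab.x + 16.0f) / 116.0f;
    const float3 f = float3(fy + lab.y / 500.0f, fy, fy - lab.z / 200.0f);
    const float3 cubed = f * f * f;
    const float3 t = select((f - 16.0f / 116.0f) / 7.787f,
                            cubed,
                            cubed > float3(0.008856f));
    const float3 xyz = t * kWhitePoint;
    // XYZ -> RGB, inverse of the matrix used above.
    return float3(dot(xyz, float3( 3.2406f, -1.5372f, -0.4986f)),
                  dot(xyz, float3(-0.9689f,  1.8758f,  0.0415f)),
                  dot(xyz, float3( 0.0557f, -0.2040f,  1.0570f)));
}

kernel void adjustments(texture2d<float, access::read> source [[ texture(0) ]],
                        texture2d<float, access::write> destination [[ texture(1) ]],
                        constant float& temperature [[ buffer(0) ]],
                        constant float& tint [[ buffer(1) ]],
                        uint2 position [[ thread_position_in_grid ]]) {
    const uint2 textureSize = uint2(destination.get_width(),
                                    destination.get_height());
    if (!deviceSupportsNonuniformThreadgroups) {
        if (position.x >= textureSize.x || position.y >= textureSize.y) {
            return;
        }
    }

    const float4 sourceValue = source.read(position);
    float3 lab = rgbToLab(sourceValue.rgb);
    lab.y += tint * 10.0f;          // shift along the a axis (scale is an assumption)
    lab.z += temperature * 10.0f;   // shift along the b axis (scale is an assumption)
    const float3 rgb = clamp(labToRgb(lab), float3(0.0f), float3(1.0f));

    // Alpha is passed through untouched.
    destination.write(float4(rgb, sourceValue.a), position);
}
```

With both sliders at zero the LAB value is left as it is, so the output should match the input up to rounding in the conversion.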
Congratulations! You’ve written your first Metal compute shader 🎉! In the next chapter we are going to write the encoder for this kernel.