Do Intel GPUs dream of training neural networks? That’s a question deep learning practitioners, myself included, have probably wondered about for a very long time. During the past 4 months, together with my colleague and great friend Paolo Volpe, I’ve tried to find an answer to this question. The answer is that not only do they dream of it, they are capable of doing it, and in the future we might see a complete paradigm shift in the way we work with processing units, all thanks to Intel’s work on developing a common framework called oneAPI. So, if you stick around until the end of this article, you’ll be able to train your neural network on an Intel GPU.
oneAPI
Before diving into the nuts and bolts let’s quickly recap what Intel oneAPI is.
From Intel’s website:
oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures—for faster application performance, more productivity, and greater innovation. The oneAPI industry initiative encourages collaboration on the oneAPI specification and compatible oneAPI implementations across the ecosystem.
This means that, if and when oneAPI is fully functional, it will provide access to any processing unit through a common framework; ideally, I could train a neural network written in PyTorch by simply selecting a oneAPI device, of any brand, on any machine.
As you can quickly notice, this is a complete paradigm shift compared to proprietary technologies such as Nvidia’s CUDA: any hardware maker can, in principle, implement its operations in oneAPI, and everything that has been written against the oneAPI specification can be easily ported. As you will see later, any code written for oneAPI can be run on any oneAPI engine (e.g. CPU, GPU…) “on demand”, as long as all the operations being used are supported.
Developing for oneAPI
To develop using oneAPI you need:
- A device that supports oneAPI
- The oneAPI deep learning toolkit
Luckily, you don’t need to own any of those things: as long as you have a device that supports SSH or Visual Studio Code, you can use Intel’s cloud development platform, Intel DevCloud, which provides free access to all the resources you need to develop on oneAPI.
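For example, once you have completed Intel’s one-time SSH setup (which, following the standard DevCloud instructions, adds a devcloud alias to your ~/.ssh/config), connecting is a single command:

$ ssh devcloud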
Implementing a neural network using primitives
Training a neural network on oneAPI cannot currently be done through classical tools such as TensorFlow or Keras: although they do support oneAPI, they are only compatible with CPU operations for now. This is because the engine is in its primordial state, and lots of operations are still being tested and optimized.
However, this does not mean that we cannot devise our own solution to train a neural network, and that’s where oneDNN and the oneDNN wrapper come into play!
oneDNN is the official oneAPI framework that provides the primitive operations for training a neural network on a oneAPI engine. An example of a primitive operation could be the backward data pass of a fully connected layer.
oneDNN Wrapper is the project we developed, which lets you implement a neural network from scratch using the oneDNN primitive descriptors. It is quite convenient because it provides an easy and readable solution: it takes care of all the memory allocation steps and implements stochastic gradient descent, providing a Keras-like approach to neural network implementation.
Training a neural network
This tutorial assumes that development is being done on Intel DevCloud; some modifications might be needed when developing on a local machine (for example, the DNNL_PATH variable might differ depending on where Intel oneAPI has been installed).
Download and build the necessary components
Once you are connected to DevCloud, or you have installed the Intel oneAPI DL framework locally, the first step is to clone the oneDNN wrapper repo:
$ git clone https://github.com/BrozzSama/onednn-wrapper
This will save a copy of onednn-wrapper in the onednn-wrapper folder.
Building can be done using CMake, following the procedure used by most oneAPI applications:
source /opt/intel/oneapi/setvars.sh --force
export EXAMPLE_ROOT=./src/
mkdir dpcpp
cd dpcpp
cmake ../src -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=dpcpp -DDNNL_CPU_RUNTIME=SYCL -DDNNL_GPU_RUNTIME=SYCL -DDNNL_VERBOSE=ON -Ddnnl_DIR=/glob/development-tools/versions/oneapi/2021.2/inteloneapi/dnnl/2021.2.0/lib/cmake/dnnl
make onednn-file-name-cpp
Let’s unpack what the script is doing:
- The source command uses the script provided by Intel to set the proper paths; it is available on DevCloud.
- EXAMPLE_ROOT is the variable containing the directory in which CMake will look for the application that we will develop.
- mkdir and cd respectively create and enter the dpcpp directory.
- cmake creates the Makefile for our application and sets some flags:
  - -DCMAKE_BUILD_TYPE chooses the Debug build type, which allows using GDB in case we need to debug our application (although this is fairly complicated, and we generally resort to print statements)
  - -DCMAKE_C_COMPILER and -DCMAKE_CXX_COMPILER choose the compilers. This command assumes that clang and dpcpp both come from the Intel compiler; this may not be the case on devices outside DevCloud, and it might need to be changed accordingly
  - -DDNNL_CPU_RUNTIME=SYCL and -DDNNL_GPU_RUNTIME=SYCL choose SYCL as the runtime (read more: SYCL - Wikipedia)
  - -DDNNL_VERBOSE=ON enables verbosity; if DNNL_VERBOSE=1 is set when running an application, we will see verbose logging coming from the engine
  - -Ddnnl_DIR sets the directory containing the oneDNN CMake configuration files. This has to be set because some versions of oneDNN might have trouble finding the correct files.
- make onednn-file-name-cpp builds the example whose source is a file called file_name.cpp contained in the src directory. You can see some more examples of this naming convention inside the repository (notably onednn_training_skin.cpp, etc.)
Preparing the dataset
With these few lines, you should be able to build at least one of the provided examples. Here, we will walk through onednn_training_skin.cpp. To train, you first need to obtain the dataset from here and save it in the data directory inside onednn-wrapper; next, you can run the dataset_utils/dataset_skin.py script to generate the training and validation datasets. Note that this script automatically looks for the file using the data_path variable, so you might need to change it according to where you have downloaded the file.
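As a sketch, the preparation might look like this; the dataset filename below is hypothetical and depends on what you downloaded, and data_path inside the script must point to the same location:

$ mkdir -p data
$ mv ~/Downloads/Skin_NonSkin.txt data/   # hypothetical filename, adjust to your download
$ python dataset_utils/dataset_skin.py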
Training the network
If you haven’t encountered any errors, you are now ready to train your very first neural network using oneAPI. This can be done by typing:
$ ./dpcpp/onednn-training-skin-cpp gpu config/config_skin_gpu.json
If something seems to be not working, make sure you check the parameters in config_skin_gpu.json, for example the training and validation dataset paths. If all went well, you should now have a NumPy file with the loss history and the predicted values for the validation set. These can be analyzed using the provided data_analysis.ipynb notebook, or whatever tool you are comfortable with.
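A quick way to confirm that the primitives are actually being dispatched to the GPU is the verbose logging we enabled at build time: setting DNNL_VERBOSE=1 at runtime prints one line per executed primitive, including the engine it ran on.

$ DNNL_VERBOSE=1 ./dpcpp/onednn-training-skin-cpp gpu config/config_skin_gpu.json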
Customizing your network
So far you have been able to build and train an existing example, but a more interesting scenario is training your own custom neural network on a oneAPI engine. Here we will quickly recap the necessary API calls, but you are highly encouraged to go look at the documentation and at the examples that are already available. This tutorial will mostly take parts of the Skin dataset example and explain them.
Data loading
To load the data, a DataLoader class is available. The constructor of this class takes in a file flattened in row-major order and puts it in a format that is suitable for oneAPI (i.e. a dnnl::memory object). The only parameters that are required are the path to the features, the path to the labels, and the shape of the features:
auto dataset_path = config_file["dataset_path"];
auto labels_path = config_file["labels_path"];
auto dataset_path_val = config_file["dataset_path_val"];
auto labels_path_val = config_file["labels_path_val"];
std::vector<long> dataset_shape = {3}; //Skin dataset
// Data loader
DataLoader skin_data(dataset_path, labels_path, batch, dataset_shape, eng);
std::cout << "Dataloader instantiated\n";
Here we are simply getting the paths from the config file and specifying that our dataset has 3 features. Notice that dataset_shape is a vector: this means that, in theory, we could also load a multi-dimensional feature (for example an RGB image). The batch size is computed automatically from the file size.
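The validation set, whose paths we already read from the config file above, can be loaded with a second instance of the same class; a minimal sketch, assuming the same constructor signature:

// Hypothetical validation loader, mirroring the training one above
DataLoader skin_data_val(dataset_path_val, labels_path_val, batch, dataset_shape, eng);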
Initializing layers
A layer is initialized as a C++ object. Layers correspond to primitive operations in oneDNN; for example, a Dense layer can be created as follows:
Dense fc1(fc1_output_size, input_memory, net_fwd, net_fwd_args, eng);
This automatically takes care of four things:
- It allocates the memory required for the operation, using a descriptor created from the input and output parameters.
- It initializes the primitive operation, meaning that it puts in place all the required memory handles and describes what operation they will perform.
- It exposes the handles as public members: this lets you simply call fc1.arg_dst to access the destination memory, and so on.
- It adds the operation to the net_fwd pipeline, meaning that it chains together the operations that have to be performed.
Now that you have your first layer, you can build your forward pipeline by connecting the elements together. For example, if you wanted to attach a ReLU activation to the previous Dense layer, you could simply do:
Eltwise relu1(dnnl::algorithm::eltwise_relu, 0.f, 0.f, fc1.arg_dst, net_fwd, net_fwd_args, eng);
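Putting it together, the forward pipeline of a small binary classifier like the one in the Skin example could be sketched as follows. This is only a sketch assuming the constructor signatures shown above: the layer sizes are illustrative, and dnnl::algorithm::eltwise_logistic (a standard oneDNN algorithm) stands in for the sigmoid output activation:

// Illustrative forward pipeline: Dense -> ReLU -> Dense -> sigmoid
Dense fc1(fc1_output_size, input_memory, net_fwd, net_fwd_args, eng);
Eltwise relu1(dnnl::algorithm::eltwise_relu, 0.f, 0.f, fc1.arg_dst, net_fwd, net_fwd_args, eng);
Dense fc2(1, relu1.arg_dst, net_fwd, net_fwd_args, eng); // single output for binary classification
Eltwise sigmoid1(dnnl::algorithm::eltwise_logistic, 0.f, 0.f, fc2.arg_dst, net_fwd, net_fwd_args, eng);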
Using layers to train
OK, now comes the real question: what about training? To train the neural network we need to do backpropagation. In oneDNN, backpropagation is done in three steps:
- The backward data pass, which computes the gradient of the loss with respect to each layer’s input, starting from the gradient with respect to its output
- The backward weights pass, which computes the gradient of the loss with respect to the weights, using the result of the backward data pass
- The weights update, which performs mini-batch gradient descent
This is done by creating three additional pipelines (net_bwd_data, net_bwd_weights, and net_sgd) alongside the forward pass.
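Once the pipelines are populated, each training iteration executes them in order. The following is a minimal sketch of that loop, assuming the wrapper keeps the convention of the official oneDNN training examples, where each net_* is a std::vector<dnnl::primitive>, each net_*_args is the matching vector of argument maps, and s is a dnnl::stream created on the engine:

// Hedged sketch: run forward, backward data, backward weights and SGD in sequence
for (size_t i = 0; i < net_fwd.size(); ++i)
    net_fwd.at(i).execute(s, net_fwd_args.at(i));
for (size_t i = 0; i < net_bwd_data.size(); ++i)
    net_bwd_data.at(i).execute(s, net_bwd_data_args.at(i));
for (size_t i = 0; i < net_bwd_weights.size(); ++i)
    net_bwd_weights.at(i).execute(s, net_bwd_weights_args.at(i));
for (size_t i = 0; i < net_sgd.size(); ++i)
    net_sgd.at(i).execute(s, net_sgd_args.at(i));
s.wait(); // make sure all primitives have finished before the next iteration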
Backward data
For each layer that has a non-unitary derivative, we need to perform the backward data pass. This is done by initializing the back_data layers; in the case of a Dense layer, the operations are as follows:
Dense_back_data fc2_back_data(sigmoid1_back_data.arg_diff_src, fc2, net_bwd_data, net_bwd_data_args, eng);
Eltwise_back relu1_back_data(dnnl::algorithm::eltwise_relu, 0.f, 0.f, relu1, fc2_back_data.arg_diff_src, net_bwd_data, net_bwd_data_args, eng);
This backpropagates the gradients through the operations and provides us with the memory objects that we need to perform the backward pass on the weights.
Backward weights
Starting from the gradient of the source we can compute and store the gradient of the weights as follows:
Dense_back_weights fc2_back_weights(sigmoid1_back_data.arg_diff_src, fc2, net_bwd_weights, net_bwd_weights_args, eng);
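The weight gradients of the first layer are obtained in the same way, starting from the diff memory produced by the ReLU backward pass. A sketch assuming the same constructor signature (this produces the fc1_back_weights object used in the SGD call below):

Dense_back_weights fc1_back_weights(relu1_back_data.arg_diff_src, fc1, net_bwd_weights, net_bwd_weights_args, eng);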
Gradient Descent
To perform gradient descent we use:
updateWeights_SGD(fc1.arg_weights, fc1_back_weights.arg_diff_weights, learning_rate, net_sgd, net_sgd_args, eng);
This operation does not create an object; it only adds a primitive, since there is no real need to access any public member.
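The same call is repeated for every layer that has weights; for example, for fc2, reusing the fc2_back_weights object created above:

updateWeights_SGD(fc2.arg_weights, fc2_back_weights.arg_diff_weights, learning_rate, net_sgd, net_sgd_args, eng);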
All of the above operations can be seen in a full example here: onednn-wrapper/onednn_training_skin.cpp at master · BrozzSama/onednn-wrapper (github.com), along with some more detailed documentation on how to load data and on the prototypes of the various objects.
Conclusion
The onednn-wrapper is by no means stable or perfect: it currently only works with Dense layers, and it has some stability issues when dealing with larger operations such as convolutions. Nevertheless, it provides a nice and easy-to-understand playground on which students and deep learning practitioners can experiment with the lower-level side of machine learning.
If you want to learn more and get more detailed information, I highly recommend reading:
- The official oneDNN documentation, which provides a nice overview of how the API works: oneDNN: Main Page (oneapi-src.github.io)
- The documentation of onednn-wrapper: oneDNN Wrapper (brozzsama.github.io)
- …and its source code, which provides some nice code snippets in an easy-to-read and intuitive fashion: BrozzSama/onednn-wrapper (github.com)