How we wrote xtensor 2/N: Access Operators

Johan Mabille
Jun 20, 2019 · 5 min read

xtensor is a comprehensive framework for N-D array processing, including an extensible expression system, lazy evaluation, and many other features that cannot be covered in a single article. In this post, we focus on the access operators for N-D containers.

In the previous article, we started to design an N-dimensional container. We detailed the implementation of methods related to shape, strides, and memory layout. Let us now get to the access operators.

First access operator: operator()

Since you can access elements of std::vector via operator[], it seems natural to use the same operator to access elements of our N-D container:
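To see why this cannot work, here is a minimal illustration assuming only std::vector (the function name is invented for the example): inside the brackets, a comma-separated list is parsed as the built-in comma operator, so a multi-index subscript silently degenerates into a single-index one.

```cpp
#include <vector>

// Illustrative only: in C++11/14/17, "1, 2" inside brackets is an
// expression using the comma operator: it evaluates 1, discards it,
// and yields 2 — so v[1, 2] means exactly v[2].
inline double comma_pitfall(const std::vector<double>& v)
{
    return v[1, 2];  // many compilers warn here; equivalent to v[2]
}
```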

Unfortunately, the C++ standard requires operator[] to accept exactly one argument, so the previous code is not valid. C++ linear algebra libraries traditionally use operator() to access the elements of their containers instead, since it can accept an arbitrary number of arguments. In order to respect the principle of least surprise, we do the same in xtensor.

Since xarray has a dynamic number of dimensions, it should provide several overloads of operator(), with different numbers of arguments:
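A minimal sketch of what such fixed-arity overloads could look like, assuming a hypothetical container holding its data in a strided 1-D buffer (class name and members are illustrative, not xtensor's actual code):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container with one hand-written overload per arity.
class fixed_arity_array
{
public:
    using size_type = std::size_t;

    fixed_arity_array(std::vector<size_type> strides,
                      std::vector<double> data)
        : m_strides(std::move(strides)), m_data(std::move(data))
    {
    }

    double& operator()(size_type i)
    {
        return m_data[i * m_strides[0]];
    }

    double& operator()(size_type i, size_type j)
    {
        return m_data[i * m_strides[0] + j * m_strides[1]];
    }

    double& operator()(size_type i, size_type j, size_type k)
    {
        return m_data[i * m_strides[0] + j * m_strides[1]
                      + k * m_strides[2]];
    }

    // ... one overload per supported number of dimensions, up to some
    // arbitrary maximum.

private:
    std::vector<size_type> m_strides;
    std::vector<double> m_data;
};
```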

Instead of defining several overloads, we could rely on preprocessor metaprogramming to reduce the amount of code to write, but we still have an issue: we have to fix an arbitrary maximum number of arguments.

Fortunately for us, C++11 introduced variadic templates, and they are a perfect solution to our problem:
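A sketch of the variadic declarations in a hypothetical container class — this is not xtensor's exact code, and the private offset helper anticipates the recursion discussed further below in the post:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container: a single pair of variadic operator()
// overloads replaces all the fixed-arity ones.
class variadic_array
{
public:
    using reference = double&;
    using const_reference = const double&;
    using size_type = std::size_t;

    variadic_array(std::vector<size_type> strides, std::vector<double> data)
        : m_strides(std::move(strides)), m_data(std::move(data))
    {
    }

    template <class... Args>
    reference operator()(Args... args)
    {
        return m_data[offset(0, args...)];
    }

    template <class... Args>
    const_reference operator()(Args... args) const
    {
        return m_data[offset(0, args...)];
    }

private:
    // Terminates the recursion: no index left.
    size_type offset(size_type) const
    {
        return 0;
    }

    // Accumulates i * m_strides[dim], then recurses on the rest.
    template <class Arg, class... Args>
    size_type offset(size_type dim, Arg i, Args... args) const
    {
        return static_cast<size_type>(i) * m_strides[dim]
            + offset(dim + 1, args...);
    }

    std::vector<size_type> m_strides;
    std::vector<double> m_data;
};
```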

With this API, it is valid to call the access operator with any number of arguments:
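A standalone illustration of this instantiation behavior, using a hypothetical arity function rather than xtensor itself:

```cpp
#include <cstddef>

// A single variadic function template accepts any number of arguments;
// each distinct number of arguments instantiates a separate function.
template <class... Args>
constexpr std::size_t arity(Args...)
{
    return sizeof...(Args);
}
```

Calling arity(1) and arity(1, 2, 3) yields 1 and 3 respectively, and each call instantiates a distinct function — just as each arity of operator() instantiates a distinct overload.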

Each time operator() is called with a different number of arguments, the compiler instantiates a new overload of this operator.

Implementing operator()

If you paid attention to the previous declaration, you probably noticed two overloads of operator(): one that returns a reference, and another one that returns a const_reference. This is an idiom used in C++ containers for most element-access methods (not only operators, but also methods returning iterators); it helps guarantee const-correctness.

Although these methods return different types, their implementation is the same: compute the offset of the element in the 1-D buffer from the N indices, and return the element at this location:
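A sketch of the two overloads delegating to a shared helper; the data_offset shown here is a plain loop so the example stands alone, whereas the post derives a recursive version of it next (names are illustrative, not xtensor's actual code):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Shared offset computation: inner product of indices and strides.
// Written as a loop here only to keep the sketch self-contained.
template <class S, class... Args>
std::size_t data_offset(const S& strides, Args... args)
{
    const std::size_t indices[] = { static_cast<std::size_t>(args)... };
    std::size_t offset = 0;
    for (std::size_t d = 0; d < sizeof...(Args); ++d)
    {
        offset += indices[d] * strides[d];
    }
    return offset;
}

// Minimal container sketch.
class narray_sketch
{
public:
    using reference = double&;
    using const_reference = const double&;
    using size_type = std::size_t;

    narray_sketch(std::vector<size_type> strides, std::vector<double> data)
        : m_strides(std::move(strides)), m_data(std::move(data))
    {
    }

    // Mutable access: returns a reference into the 1-D buffer.
    template <class... Args>
    reference operator()(Args... args)
    {
        return m_data[data_offset(m_strides, args...)];
    }

    // Read-only access: same offset computation, const_reference result.
    template <class... Args>
    const_reference operator()(Args... args) const
    {
        return m_data[data_offset(m_strides, args...)];
    }

private:
    std::vector<size_type> m_strides;
    std::vector<double> m_data;
};
```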

In order to prevent code duplication, we implement the offset computation in a dedicated function. Besides, other components of the framework may need this feature later.

Now, how to deal with variadic templates? C++ does not give direct access to each element of a template parameter pack, so we have to rely on overloads and recursion. Let’s illustrate this with a simpler example: assume we want to compute the sum of all the indices:
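A minimal sketch of such a sum function; note that in a self-contained file the terminating overload must be declared first so the recursive one can find it:

```cpp
#include <cstddef>

// Second overload in the description below: empty pack, sum is zero.
// This terminates the recursion.
inline std::size_t sum()
{
    return 0;
}

// First overload: peels the first index off the pack, adds it, and
// recurses on the remaining indices.
template <class... Args>
std::size_t sum(std::size_t i, Args... args)
{
    return i + sum(args...);
}
```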

The first overload implements the recursion on the variadic template. Instead of accepting a single parameter pack, it takes an explicit index and a parameter pack. When it calls itself with the parameter pack only, the pack is split: its first element binds to the argument i, and the remaining elements form the new parameter pack args. When the parameter pack is empty, the second overload is called. This ends the recursion.

Let’s apply this principle to the computation of the data offset. Remember from the previous article that the data offset is the sum of the products of indices and strides: sum(ik * sk).
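A hedged reconstruction of this recursive data_offset, based on the description in the following paragraphs (xtensor's exact signatures differ slightly, as discussed at the end of the post):

```cpp
#include <cstddef>

// Empty parameter pack: nothing left to add; ends the recursion.
template <std::size_t dim, class S>
std::size_t data_offset(const S&)
{
    return 0;
}

// Adds i * strides[dim], then recurses on the remaining indices with
// the next dimension.
template <std::size_t dim, class S, class... Args>
std::size_t data_offset(const S& strides, std::size_t i, Args... args)
{
    return i * strides[dim] + data_offset<dim + 1>(strides, args...);
}
```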

The recursion proceeds as explained previously: the function accepts an index and a parameter pack, uses the given index to increment the offset, and calls itself recursively with the remaining parameter pack. When the parameter pack is empty, the first overload is called, which terminates the recursion.

We use a template parameter for the strides so that data_offset can work with any 1-D sequence providing operator[] (keep this in mind, it will be useful in the near future).

The last thing to notice is the additional template parameter dim. Since we need to access the stride corresponding to the k-th index during the computation, we need to propagate which part of the sum is being computed. This is done with this template parameter, which is increased at each recursive call.

If you wonder why we pass the current dimension as a template argument rather than as a classical function argument, the reason is performance: using a value known at compile time allows the compiler to enable some optimizations and produce more efficient assembly.

Dealing with more or fewer arguments

Having an access operator that accepts an arbitrary number of arguments raises a new issue: what happens if the user calls it with fewer or more arguments than the number of dimensions of the tensor? We could throw an exception; however, we will see later that this would prevent a correct implementation of the broadcasting rules. For now, let us adopt the following behavior:

  • if operator() is called with too many arguments, we drop the left-most ones
  • if operator() is called with too few arguments, we prepend zero-valued indices until we match the number of dimensions

There is a deeper reason for choosing to drop the left-most indices and to prepend zero-valued indices: it is the one rule that ensures

(a + b)[i0, …, in] = a[i0, …, in] + b[i0, …, in]

i.e. commutativity of element access and broadcasting. We will get back to this and give more details in another article of the series.

To translate this into code, we add a new function instead of changing the one defined previously. Indeed, we do not want to perform the check at each recursive call; besides, a function should do one thing and one thing only, according to the single responsibility principle.

Since we want to keep data_offset as our API, and since the check should be performed before entering the recursion, we rename the current implementation to raw_data_offset:
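The renamed implementation could look like this — same body as before, shown here as a self-contained sketch:

```cpp
#include <cstddef>

// Empty parameter pack: ends the recursion.
template <std::size_t dim, class S>
std::size_t raw_data_offset(const S&)
{
    return 0;
}

// Peels one index off the pack, adds its contribution, and moves to
// the next dimension.
template <std::size_t dim, class S, class... Args>
std::size_t raw_data_offset(const S& strides, std::size_t i, Args... args)
{
    return i * strides[dim] + raw_data_offset<dim + 1>(strides, args...);
}
```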

And we implement the checks in data_offset:
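A hedged sketch of the checked data_offset, together with the raw_data_offset it dispatches to so that the example is self-contained (xtensor's exact signatures differ, as noted in the conversions section below):

```cpp
#include <cstddef>
#include <vector>

// Recursion terminator for raw_data_offset: empty pack adds nothing.
template <std::size_t dim, class S>
std::size_t raw_data_offset(const S&)
{
    return 0;
}

// Adds i * strides[dim] and recurses on the next dimension.
template <std::size_t dim, class S, class... Args>
std::size_t raw_data_offset(const S& strides, std::size_t i, Args... args)
{
    return i * strides[dim] + raw_data_offset<dim + 1>(strides, args...);
}

// No index left: equivalent to indexing with zeros only.
template <class S>
std::size_t data_offset(const S&)
{
    return 0;
}

// Checked entry point: compares the number of indices with the number
// of dimensions before entering the recursion.
template <class S, class Arg, class... Args>
std::size_t data_offset(const S& strides, Arg arg, Args... args)
{
    constexpr std::size_t nargs = sizeof...(Args) + 1;
    if (nargs == strides.size())
    {
        // One index per dimension: direct computation.
        return raw_data_offset<0>(strides, arg, args...);
    }
    else if (nargs > strides.size())
    {
        // Too many indices: drop the left-most one and start over.
        return data_offset(strides, args...);
    }
    else
    {
        // Too few indices: skip the first strides, which is equivalent
        // to prepending zero-valued indices. The iterator acts as a
        // 1-D sequence providing operator[], as raw_data_offset requires.
        auto it = strides.cend() - nargs;
        return raw_data_offset<0>(it, arg, args...);
    }
}
```

With strides {3, 1}, for instance, data_offset(strides, 7, 1, 2) drops the 7 and returns the same offset as data_offset(strides, 1, 2).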

The first two branches are pretty straightforward. If the number of indices matches the number of dimensions, we call the implementation function. If too many arguments are passed, we rely on the recursion on the parameter pack to drop left-most indices until we match the number of dimensions.

Otherwise, we “adapt” the strides vector so it appears to have the correct number of elements. A naive implementation would copy the last elements of the strides into a new container; however, this would be inefficient due to the hidden memory allocation.

Remember that the implementation of raw_data_offset accepts any type providing operator[] for the strides? Random access iterators provide precisely such an operator, so we can use them as 1-D containers. Therefore we require that S provide iterator pairs that can be passed directly to raw_data_offset!

Handling conversions

If you search for data_offset or operator() in the code base of xtensor, you will notice that the implementation is slightly different from the one in this blog post: there are additional static_cast and you can specify the return type of data_offset and raw_data_offset via an additional template parameter. This is to avoid warnings about conversions when using unsigned integers for the strides and calling access operators with signed integers for instance. We ensure that all the integers involved in the computation have the same type.

So far, we have implemented a skeleton of an N-dimensional array, with methods related to size, shape, and layout, and the access operators. In the next article, we will focus on the constructors and the value semantics of the xarray class.

More about the Series

This post is just one episode of a long series of articles:

How We Wrote Xtensor

Johan Mabille

Scientific computing software engineer at QuantStack, C++ teacher, template metamage