How we wrote xtensor 8/N: iterators

Johan Mabille
10 min read · Jan 6, 2020

xtensor is a comprehensive framework for N-D array processing, including an extensible expression system, lazy evaluation, and many other features that cannot be covered in a single article. In this post, we focus on iterators.

In the previous article, we implemented the broadcasting rules so that we can compute the shape of arbitrarily complex trees that involve arrays with different but compatible shapes. This is the first step when assigning an expression to an array. The second step is to perform the actual computation and to assign the result. A naive approach would be to generate nested loops that make use of the access operator. However, this would be highly inefficient: we would compute the offset of each element in the underlying 1-D buffer even though the elements are contiguous. A better approach is to rely on iterators that we increment to get the next element.

Trivial assignment

When all the arrays of an expression have the same shape and the same layout, the assignment is said to be trivial: we can directly operate upon the underlying buffers as if we were manipulating a 1-D expression. The simplest option for xarray is to provide methods that return an iterator on the underlying buffer:
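A minimal sketch of such an interface could look like the following; the storage_type and m_storage names are assumptions and may differ from the actual code of the series:

template <class T>
class xarray
{
public:

    using storage_type = std::vector<T>;
    using const_iterator = typename storage_type::const_iterator;

    // 1-D iteration over the underlying buffer
    const_iterator begin() const;
    const_iterator end() const;

    // ...

private:

    storage_type m_storage;
};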

The implementation of these methods is trivial as they simply forward to the underlying storage object:
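Assuming the m_storage member from the sketch above, the forwarding could look like this:

template <class T>
inline auto xarray<T>::begin() const -> const_iterator
{
    return m_storage.begin();
}

template <class T>
inline auto xarray<T>::end() const -> const_iterator
{
    return m_storage.end();
}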

However, for xfunction, things are more complicated since there is no underlying buffer. xfunction must provide an iterator object that:

  • holds iterators on the underlying operands
  • increases these iterators when it is itself incremented
  • triggers the computation when it is dereferenced

xfunction iterator

Since xfunction stores a tuple of operands, it seems natural for xfunction_iterator to store a tuple of iterators. Besides, xfunction_iterator needs access to the functor stored in xfunction. Instead of copying it (this can be heavy for complicated functions), let’s store a pointer to the xfunction object in xfunction_iterator and make the latter a friend class of xfunction:
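Here is a simplified sketch of what this could look like; the exact template parameters and member names are assumptions and may differ from the real code:

// Forward declaration of xfunction, so that xfunction_iterator can refer to it.
template <class F, class... CT>
class xfunction;

template <class F, class... CT>
class xfunction_iterator
{
public:

    using xfunction_type = xfunction<F, CT...>;
    // Tuple of iterators on the operands; the iterator type is deduced from
    // the return type of begin() (see below).
    using data_type = std::tuple<decltype(std::declval<const std::decay_t<CT>&>().begin())...>;

    xfunction_iterator(const xfunction_type* func, data_type&& it)
        : p_f(func), m_it(std::move(it))
    {
    }

    // operator*, deref_impl, operator++ and the other Random Access Iterator
    // methods (declarations omitted, implementations below)

private:

    const xfunction_type* p_f;  // pointer to the xfunction, gives access to its functor
    data_type m_it;             // iterators on the operands
};

// Inside xfunction, the iterator is declared as a friend so that it can
// access the stored functor:
// friend class xfunction_iterator<F, CT...>;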

First, we forward-declare the xfunction class so that we can use it in the definition of xfunction_iterator as we would do with regular (i.e. non-template) classes.

Then comes the xfunction_iterator class itself. In this article, we focus on the dereferencing and the increment operators only, for the sake of simplicity. Implementing the remaining methods required by the Random Access Iterator concept should be straightforward. Besides, as we will see in a future article, we can rely on advanced metaprogramming techniques to avoid implementing ALL of these methods.

The last point to comment on is the way we define data_type. We want to store a tuple of iterators on the underlying operands. A common and simple way to do it would be to use the following type:

using data_type = std::tuple<
    typename std::decay_t<CT>::const_iterator...>;

However, as we will see in the next section, some expressions might implement begin and end methods that return a type different from const_iterator. Therefore, the only way to know which type to store is to ask for the decltype of the object returned by begin. We can achieve this thanks to std::declval, which converts a type into a reference type and makes it possible to use member functions in decltype expressions without going through constructors.
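To illustrate the technique outside of xtensor, here is a small standalone example; the E type is made up for the illustration:

#include <type_traits>
#include <utility>
#include <vector>

// A type without a default constructor: decltype(E().begin()) would not compile.
struct E
{
    explicit E(std::vector<int> d) : data(std::move(d)) {}
    auto begin() const { return data.cbegin(); }
    std::vector<int> data;
};

// std::declval<E>() yields a reference to E usable only in unevaluated contexts,
// so the member call can appear inside decltype without constructing anything.
using iterator_type = decltype(std::declval<E>().begin());
static_assert(std::is_same<iterator_type, std::vector<int>::const_iterator>::value,
              "the deduced type is the one returned by begin()");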

xfunction iterator implementation

As said previously, in this article we focus on the implementation of the dereferencing and the increment operators. Let’s start with the simplest of them, operator*. It retrieves the arguments to pass to the functor stored in the xfunction object by dereferencing the iterators stored in m_it:
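A sketch of the two methods, assuming the functor is stored as m_f in xfunction and that deref_impl is declared in xfunction_iterator:

template <class F, class... CT>
inline auto xfunction_iterator<F, CT...>::operator*() const
{
    // Expand the tuple of iterators thanks to an index sequence.
    return deref_impl(std::make_index_sequence<sizeof...(CT)>());
}

template <class F, class... CT>
template <std::size_t... I>
inline auto xfunction_iterator<F, CT...>::deref_impl(std::index_sequence<I...>) const
{
    // Dereference each operand iterator and pass the values to the functor.
    return (p_f->m_f)(*std::get<I>(m_it)...);
}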

We rely on the same technique as for implementing the access operator of xfunction: create an index sequence and pass it to the deref_impl function where the parameter pack will be expanded to retrieve all the iterators in the tuple and pass them to the functor.

The increment operator is slightly more complex to implement. Basically, it increments all the iterators in the data tuple. If the iterators were stored in an std::vector, we would write something like:

std::for_each(m_it.begin(), m_it.end(), [](auto& iter) { ++iter; });

However, the operands are stored in an std::tuple object, for which element access is resolved at compile time. Therefore we need an equivalent of std::for_each that operates on a tuple and unrolls the loop at compile time. This should sound familiar if you have read the article about broadcasting and how we implemented the xt::accumulate function. We rely on exactly the same principles for implementing xt::for_each, that is, recursion and SFINAE to operate on a parameter pack:
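A sketch of such a function following this approach; the real xt::for_each has a few more overloads (const tuples, noexcept specifications) that are not shown here:

namespace xt
{
    namespace detail
    {
        // End of the recursion: no element left to visit.
        template <std::size_t I, class F, class... T>
        inline std::enable_if_t<I == sizeof...(T), void>
        for_each_impl(F&& /*f*/, std::tuple<T...>& /*t*/)
        {
        }

        // Apply f to the I-th element, then recurse on the next one.
        template <std::size_t I, class F, class... T>
        inline std::enable_if_t<I < sizeof...(T), void>
        for_each_impl(F&& f, std::tuple<T...>& t)
        {
            f(std::get<I>(t));
            for_each_impl<I + 1, F, T...>(std::forward<F>(f), t);
        }
    }

    template <class F, class... T>
    inline void for_each(F&& f, std::tuple<T...>& t)
    {
        detail::for_each_impl<0, F, T...>(std::forward<F>(f), t);
    }
}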

Now that we have a for_each function that operates on tuples, implementing the increment operator of xfunction_iterator is trivial:
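A sketch, reusing xt::for_each with a lambda that increments each operand iterator:

template <class F, class... CT>
inline xfunction_iterator<F, CT...>& xfunction_iterator<F, CT...>::operator++()
{
    xt::for_each([](auto& it) { ++it; }, m_it);
    return *this;
}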

Broadcasting and steppers

Different layouts

Now assume that the tensors of the expression tree have the same shape, but a different layout. Then we cannot rely on the iterators of the underlying buffers anymore. We must be able to specify the iteration order (row-major or column-major) so that all the tensors are traversed in the same way.

If the traversal order is the same as the tensor layout, we return the underlying buffer’s iterator. Otherwise, we need to return something else. Since C++ forbids different return types based on a runtime condition, the only option we have is to pass the iteration order as a template parameter:
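The container interface could therefore expose something along these lines; the layout_type enumeration is simplified here, the real one has more values:

enum class layout_type
{
    row_major,
    column_major
};

// In the container interface (xarray and friends):

// The return type depends on L: the buffer iterator if L matches the
// layout of the container, a different (broadcasting) iterator otherwise.
template <layout_type L>
auto begin() const;

template <layout_type L>
auto end() const;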

We will not delve into the implementation details just yet. For now, what matters here is that the return type depends on the layout template parameter. This justifies a posteriori the definition of data_type in the previous section, which asks for the return type of begin instead of using an inner type alias.

Different shapes

Another common use case is operating on tensors with different but compatible shapes. For instance, assume we want to add a 1-D tensor (a) to each line of a 2-D tensor (b). Iterating on the first line of the xfunction object that models the operation is trivial: we simply iterate over the underlying buffers of a and b. However, when we move from the last element of the first line to the first element of the next line, things get more complicated: for b, we continue to iterate over the underlying buffer, but for a, we need to come back to the first element:

However, this circular behavior should not be systematic: when the iterator of b comes to the end of the buffer, the iterator of a should do the same. This means that we need to remember the current step of the iteration so that we know when we reach the end. This can be done with a list of indices (one per dimension of the resulting expression) that we increment with the iterator. When this list matches the shape of the expression, we’ve reached the end.

Notice that we can reuse the same principle when iterating over a tensor with a traversal order different from its layout; the only difference will be how we increment the internal list of indices.

Storing a list of indices for each node of the expression would be highly inefficient. It would be better to store it once in the iterator of the root node and have lightweight objects that allow stepping by a given amount in any direction for each node of the expression. Therefore, we have an additional concept to implement: the “stepper”, which provides a raw interface for stepping by a given amount in a given direction. The iterator gathers the stepper and a list of indices to implement an STL-compliant iterator interface.

Stepper interface

Although the steppers are tied to the expression they refer to, they all provide the same API:
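A sketch of that API; the exact reference and size_type aliases are left out, the real steppers also define the usual inner types of an STL-compliant iterator:

// Common API provided by all steppers (signatures sketched):

reference operator*() const;                      // element the stepper refers to

void step(size_type dim, size_type n = 1);        // move n positions forward in dimension dim
void step_back(size_type dim, size_type n = 1);   // move n positions backward in dimension dim

void reset(size_type dim);                        // step_back(dim, shape[dim] - 1)
void reset_back(size_type dim);                   // step(dim, shape[dim] - 1)

void to_begin();                                  // move to the first position
void to_end(layout_type l);                       // move to the past-the-end position for layout l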

operator* is similar to that of iterators: it returns the element in the expression that the stepper refers to. reset and reset_back are equivalent to step_back and step called with dim and shape[dim] - 1 as arguments, where shape is the shape of the underlying expression.

The steppers are initialized with a “position” (that may be an index, a pointer to the underlying buffer of a container-based expression, etc…) in the expression, and can then be used to browse the expression in any direction:

In this diagram, the data is stored in row-major order, and we step in the first dimension (dimension index starts at 0). The positions of the stepper are represented by the red dots.

The to_end method takes a layout parameter because the ending positions of a stepper depend on the layout used to iterate. Indeed, if we call step_back after a call to to_end, we want the stepper to point to the last element. To ensure this for both row-major order and column-major order iterations, the ending positions must be set as shown below:

The red dots are the position of a stepper iterating in column-major while the green ones are the positions of a stepper iterating in row-major order. Thus, if we assume that p is a pointer to the last element (the square containing 11), the ending positions of the stepper are p + 1 in row-major, and p + 3 in column-major order.

xfunction stepper

The xfunction_stepper class is similar to xfunction_iterator: it embeds a tuple of steppers (those of the underlying operands of the xfunction) and forwards its method calls to them. It also defines the same inner types as those of an STL-compliant iterator. These types will be reused by the iterator built upon the stepper.

Reusing the for_each method exposed in the previous section makes the implementation of xfunction_stepper pretty straightforward:
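A sketch of the stepper, where every method forwards to the steppers of the operands; the const_stepper inner type and the member names are assumptions:

template <class F, class... CT>
class xfunction_stepper
{
public:

    using size_type = std::size_t;

    void step(size_type dim, size_type n = 1)
    {
        xt::for_each([dim, n](auto& s) { s.step(dim, n); }, m_st);
    }

    void step_back(size_type dim, size_type n = 1)
    {
        xt::for_each([dim, n](auto& s) { s.step_back(dim, n); }, m_st);
    }

    void reset(size_type dim)
    {
        xt::for_each([dim](auto& s) { s.reset(dim); }, m_st);
    }

    void reset_back(size_type dim)
    {
        xt::for_each([dim](auto& s) { s.reset_back(dim); }, m_st);
    }

    void to_end(layout_type l)
    {
        xt::for_each([l](auto& s) { s.to_end(l); }, m_st);
    }

    // operator* uses the same index_sequence expansion as xfunction_iterator

private:

    const xfunction<F, CT...>* p_f;                                 // access to the functor
    std::tuple<typename std::decay_t<CT>::const_stepper...> m_st;   // operand steppers
};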

The implementation of operator* is similar to that of the iterator dereferencing operator: rely on an intermediate function and parameter pack expansion to retrieve the operands from the steppers stored in the tuple data member.

xarray stepper

The steppers of container-based expressions rely on strides, backstrides, and an offset for stepping:
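A sketch of such a stepper; get_stepper_iterator and the member names follow the text, the rest is an assumption:

// C::const_iterator if C is const, C::iterator otherwise.
template <class C>
using get_stepper_iterator = std::conditional_t<std::is_const<C>::value,
                                                typename C::const_iterator,
                                                typename C::iterator>;

template <class C>
class xstepper
{
public:

    using container_type = C;
    using subiterator_type = get_stepper_iterator<C>;
    using size_type = typename C::size_type;

    xstepper(container_type* c, subiterator_type it, size_type offset)
        : p_c(c), m_it(it), m_offset(offset)
    {
    }

    // step, step_back (shown below), reset, reset_back, to_begin, to_end...
    // reset and reset_back rely on the backstrides of the container.

private:

    container_type* p_c;      // gives access to strides and backstrides
    subiterator_type m_it;    // position in the underlying buffer
    size_type m_offset;       // broadcasting offset (see below)
};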

get_stepper_iterator returns C::const_iterator if C is const, and C::iterator otherwise. The m_offset member is required because of broadcasting rules. Consider the following expression:
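For instance, with shapes chosen to match the description below (the values themselves are arbitrary):

xt::xarray<double> a = {{ 1.,  2.,  3.,  4.},
                        { 5.,  6.,  7.,  8.},
                        { 9., 10., 11., 12.}};   // shape {3, 4}
xt::xarray<double> b = { 1.,  2.,  3.,  4.};     // shape {4}

auto r = a + b;   // lazy xfunction, broadcast shape {3, 4}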

r is an xfunction representing the sum of a and b. The stepper specific to this expression holds the steppers of the arguments of the function; calling step or step_back results in calling step or step_back of the steppers of a and b.

According to the broadcasting rules, the shape of r is {3, 4}. Thus, calling r.stepper_begin().step(0, 1) will eventually call b.stepper_begin().step(0, 1). However, since the shape of b is {4}, we want this call to be a no-op to avoid undefined behavior. To achieve that, a broadcasting offset is added to the stepper:
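A sketch of step and step_back with this offset; dim is expressed in the broadcast (function) shape, so dimensions smaller than the offset do not exist for the operand and are ignored:

template <class C>
inline void xstepper<C>::step(size_type dim, size_type n)
{
    if (dim >= m_offset)
    {
        // Map the broadcast dimension to a dimension of the container.
        m_it += static_cast<std::ptrdiff_t>(n * p_c->strides()[dim - m_offset]);
    }
    // else: broadcast dimension, stepping is a no-op
}

template <class C>
inline void xstepper<C>::step_back(size_type dim, size_type n)
{
    if (dim >= m_offset)
    {
        m_it -= static_cast<std::ptrdiff_t>(n * p_c->strides()[dim - m_offset]);
    }
}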

This implementation takes into account that shapes are aligned on their trailing dimensions when broadcasting and that dimensions are stored in ascending order; here dimension 1 of a corresponds to dimension 0 of b.

This implementation ensures that a step in dimension 0 of the function updates the stepper of a while the stepper of b remains unchanged; on the other hand, stepping in dimension 1 will update both steppers, as illustrated below:

The red dots are initial stepper positions, the green dots and blue dots are the positions of the steppers after calling step with different dimension arguments.

Broadcasting iterators

The xtensor iterator is implemented in the xiterator class, which provides an STL-compliant iterator interface and is built upon the steppers. Whereas the steppers are tied to the expression they refer to, xiterator is generic enough to work with any kind of stepper.

An iterator holds a stepper, the shape of the underlying expression, and a multi-dimensional index:
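A sketch of the class; its template parameters are the stepper type, the shape type, and the layout used for iterating:

template <class St, class S, layout_type L>
class xiterator
{
public:

    using stepper_type = St;
    using shape_type = S;
    using index_type = shape_type;
    using reference = typename stepper_type::reference;

    xiterator(stepper_type st, shape_type shape)
        : m_st(std::move(st)), m_shape(std::move(shape)), m_index(m_shape.size(), 0)
    {
    }

    reference operator*() const;
    xiterator& operator++();
    // remaining Random Access Iterator methods...

private:

    stepper_type m_st;     // stepper of the underlying expression
    shape_type m_shape;    // broadcast shape of the expression
    index_type m_index;    // current multi-dimensional index
};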

The reason for having a layout_type template parameter will be detailed shortly. As for xfunction_iterator, in this article we focus on the dereferencing and the incrementing operators. Dereferencing the iterator simply forwards to the stepper:
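A sketch:

template <class St, class S, layout_type L>
inline auto xiterator<St, S, L>::operator*() const -> reference
{
    return *m_st;
}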

A call to operator++ increments the index and calls the step method of the stepper accordingly. The way the index is incremented depends on the layout used for iterating. For a row-major order iteration over a container with shape {3, 4}, the index iteration sequence is:

{0, 0}, {0, 1}, {0, 2}, {0, 3},
{1, 0}, {1, 1}, {1, 2}, {1, 3},
{2, 0}, {2, 1}, {2, 2}, {2, 3}

This is why we passed the layout as a template parameter of xiterator. When a member of the index reaches its maximum value, it is reset to 0 and the member in the next dimension is increased. This translates into calls to two methods of the stepper, first reset and then step. This is illustrated by the following picture:

The green arrows represent the iteration from {0, 0} to {0, 3}. The blue arrows illustrate what happens when the index is increased from {0, 3} to {1, 0}: first the stepper is reset to {0, 0}, then step(0, 1) is called, setting the stepper to the position {1, 0}.

We could implement this logic inside the operator++ method. However, as we will see in the next article, we will reuse this logic in another part of the library. Therefore, let’s implement it in a static function of the stepper_tools class. As we saw above, the way the stepper is incremented depends on the iteration order. We can make stepper_tools a template class and rely on template specialization to distinguish between row_major and column_major iteration orders (only the specialization for row_major is displayed below):
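Here is a sketch of it, built on the stepper API described above; the way the index is set to mark the past-the-end state is an assumption:

template <layout_type L>
struct stepper_tools
{
    // increment_stepper is provided by the specializations below
};

template <>
struct stepper_tools<layout_type::row_major>
{
    template <class S, class IT, class ST>
    static void increment_stepper(S& stepper, IT& index, const ST& shape)
    {
        std::size_t i = index.size();
        while (i != 0)
        {
            --i;
            if (index[i] + 1 != shape[i])
            {
                // This dimension can still be increased: step and stop.
                ++index[i];
                stepper.step(i, 1);
                return;
            }
            else
            {
                // Maximum reached: reset this dimension to 0 and carry
                // over to the next (outer) one.
                index[i] = 0;
                if (i != 0)
                {
                    stepper.reset(i);
                }
            }
        }
        // Every dimension was at its maximum value: past-the-end state.
        std::copy(shape.cbegin(), shape.cend(), index.begin());
        stepper.to_end(layout_type::row_major);
    }
};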

When the stepper is incremented while it has reached the last element, the to_end method is called. This moves the stepper to the position that makes it possible to decrement the iterator and have it pointing to the last element.

The implementation of the incrementing operator is now straightforward:
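It simply delegates to stepper_tools (sketch):

template <class St, class S, layout_type L>
inline xiterator<St, S, L>& xiterator<St, S, L>::operator++()
{
    stepper_tools<L>::increment_stepper(m_st, m_index, m_shape);
    return *this;
}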

And we’re done with the implementation of iterators for now. We have omitted some points in this article to keep it simple. We will get back to them in a future article. At this point, we have all the necessary equipment to implement expression assignment, which is the topic of the next article.

More about the Series

This post is just one episode of a long series of articles:

How We Wrote Xtensor
