zarray: a dynamic expression system based on xtensor
In this article we demonstrate how we pushed xtensor one step further, implementing a dynamic expression system on top of it.
xtensor is a comprehensive C++framework for multi-dimensional array processing, including an extensible expression system, lazy evaluation, broadcasting, and many other features.
It exposes an API similar to that of NumPy covering a growing portion of the functionalities and more, and has acquired significant adoption over the past few years.
Although it allows to write C++ code that looks like NumPy, a major difference is that xtensor is statically typed, meaning the type of the array data is specified at compile time. That model is not suited to discover type at runtime. The goal of zarray is to fill the gap between the statically typed world and the dynamically typed world, so that one can write code like:
In order not to reinvent the wheel,
zarray is built on top of
xtensor to reuse its expression system.
The problem of pure static types
Assume you want to instantiate a tensor whose data is stored in a file. You cannot implement a single function for that, since you will discover the data type along with the data when you read the file. Indeed, you cannot have a return type depending on runtime information in C++:
Does that mean that we are doomed to implement reading methods for all possible data types and switch between them depending on the type we discover? Fortunately for us, there is a technique that can help us here.
Type erasure to the rescue
The idea is to wrap the different types into classes that all inherit from a common base. You can then use this common base class as the return type of a function, or as the value type of a container. Therefore, type erasure allows you to use various concrete types through a single generic interface. The concrete type is “erased”.
Let’s illustrate this with the templated class
zexpression_wrapper is used to wrap an xarray, but it can even wrap xtensor unevaluated expressions!
With this wrapper, one can now write the
We could go further and have template functions that build the right expression wrappers. These functions could be registered in a map whose key would be a string representing the data type. This would replace the cascading
if structure with a simple lookup in the map and would simplify the code of
read_tensor. But that’s an implementation detail.
Although this is enough to achieve type erasure, this design is not very easy to use: users have to dynamically allocate the objects, you can manipulate them throught pointers or references to the
zarray_impl class only, and they must not forget to free memory (or use smart pointers as above). It would be nice if we could add some value semantics, so that we could manipulate
zarray_impl objects as we manipulate
This can be easily implemented with a facade:
We can now copy and move
zarray objects just like we do with
xarray objects, and no longer need to bother with memory management. The architecture of the wrappers is represented on the following diagram:
These wrappers are really convenient for implementing deserialization methods. However, they are quite limited: you cannot do more than instantiating them. Next step is to add arithmetic operations and mathematical functions.
Dispatching the operations
In this section we illustrate the principle of dispatching with the addition operator.
When we write
za = zb + zc where
zarray objects, we want to propagate the addition to the wrapped
xc to compute
xa = xb + xc. Therefore, we want to do the opposite of type erasure: we want to retrieve the real type of objects hidden behind a generic interface. This is called dispatching: the function that operates on the interface dispatches the call to specialized functions that operate on concrete types.
The most common case of dispatching is single dispatch. It is achieved with inheritance and a virtual method: the program will “dispatch” the call of the virtual method from the base class to the implementation method of the concrete type.
When more than one argument is involved, we speak of multiple dispatch (or multimethods). In our case, we need to implement double-dispatch for unary operators and mathematical methods (because we dispatch on both the single argument and the result) and triple-dispatch for binary operators.
I will not go through all the details of implementing a generic multiple dispatch module, this deserves a chapter in a book. And actually, Andrei Alexandrescu already wrote it in “Modern C++ design”. The implementation in
zarray is based on that from Alexandrescu, with some improvements: we rewrote it in modern C++, making it easy to generalize to arbitrary arity. And we added the ability to do partial dispatching.
The principle of multiple dispatch is rather simple: each concrete type has a unique index. The specialized functions which operate on concrete types are registered into a dictionary whose keys are multi-indexes, built from the unique indices of concrete types. The function that operates on the generic types simply asks its arguments for their indices, looks for the specialized function in the dictionary, and invokes it. The specialized function then downcasts its arguments to their real type, and forwards the call to the wrapped objects:
There is still a slight issue. If you look at the wrappers again, you should notice that they can wrap absolutely all expressions. This means that the number of concrete types is infinite, which makes implementing the dispatchers an impossible task. Therefore, we need to reduce the number of possible concrete types.
The idea is to introduce an intermediate layer between the generic API and the wrapper, that is templated by the
value_type (i.e. the type of the data stored in the expression):
zexpression_wrapper can hold non-evaluated expressions, we must add an
xarray cache data member, used for precomputing the expression. It is returned by reference in the
However, for performance consideration, when
zexpression_wrapper holds an
xarray object, we don’t want to copy it into the cache data member. The
get_array method can return it directly. Therefore, we need a special wrapper for
So far, we have implemented type erasure and multiple dispatch. They are enough to have a computation framework that operates on data whose type is unknown at compile time. However, such a framework would lose a key feature of
xtensor: the lazyness. Indeed, the current implementation of specialized functions for arithmetic operators and mathematic functions returns a wrapper on
xarray instead of an non-evaluated expression.
The idea of lazy computation is to build arbitrarily complex expressions and to evaluate them at once when we need the result, without allocating temporaries.
zarray implements laziness the same way as
xtensor: instead of computing a result that is returned as a temporary
zarray object, the arithmetic operators and mathematical functions are overloaded to return an object representing a node in the expression tree:
zfunction internally holds the operands and the dispatcher that performs the computation when the user evaluates the whole expression.
Although this looks quite simple, overloading all the arithmetic operators and the mathematical functions is a lot of work. Besides, it’s a lot of boilerplate code very similar to that in
xtensor (see xmath.hpp and xoperation.hpp).
xtensor provides a mean to reuse most of this code. All we have to do is to create a special tag for zarray expressions, and specialize the select_xfunction_expression metafunction so that it returns a
zfunction instead of an
xfunction, and we get all the arithmetic operations and mathematic functions for free!
We can now build an expression tree whose leaves are
zarray objects. The dynamic expression system is almost over, the only missing part is the assignment.
Assignment of dynamic expressions
We cannot perform the assignment of dynamic expressions as we do in
xtensor, that is, iterate over the whole expression tree and compute each value of the result in a single loop. Indeed, since the real type of the operands is erased, so is that of the result. Only the dispatcher of an operator or a mathematic function knows the actual type of the result.
Does that mean that we must perform the computation node by node, with the dispatcher allocating a temporary holding the result? If so, that would make the entire design useless.
Fortunately, even if we cannot eliminate the computation node by node, we can at least allocate a single temporary for the whole computation. The idea is quite simple: if the dispatchers know the result type, they do not have to allocate the result. We can split the assignment process: first, we ask the dispatchers for the result type (they can return the index of the concrete type, used to retrieve the specialized functions in the dispatcher dictionary for instance).
Then we allocate the result and pass it to the right hand side expression tree (via an
assign_to method for instance). The result object is then passed as an argument to the dispatchers, along with the operands. This way, the dispatcher only runs the assignment loop, no extra temporary is allocated.
This is illustrated in the following diagram:
Although still in early preview, zarray already contains all the building blocks required to build a full operational dynamic expression system. It also supports views and chunked arrays, as we will see in a future article.
The development of zarray is led by QuantStack.
About the Author
Johan Mabille is a scientific software developer at QuantStack, specializing in high-performance computing in C++. Prior to joining QuantStack, Johan was a quant developer at HSBC.