xtensor is a C++ library meant for numerical analysis with multi-dimensional array expressions.
It exposes an API similar to that of NumPy, covering a growing portion of its functionality, and more. Although adoption has been growing over the past few months, getting into the code base in order to contribute may be difficult.
This series of articles aims to reduce the entry cost by explaining the internals of the library and the reasons for its design. We will walk the reader through the design process from the beginning and we will explain the C++ techniques and patterns used in xtensor.
xtensor is a comprehensive framework for N-D array processing, including an extensible expression system, lazy evaluation, and many other features that cannot be covered in a single article. In this post, we focus on the first building block of the library: N-dimensional containers.
A first N-dimensional container
Our N-dimensional container will be a sequence container, therefore it should provide an API similar to that of the sequence containers from the STL (for both inner types and methods) so that it feels familiar to C++ developers.
However, since we are developing an N-dimensional container, additional methods are needed compared to std::vector. Let's forget iterators for now (we will get back to them later); the skeleton of our first N-dimensional container looks like:
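A minimal sketch of such a skeleton follows; the exact inner types and signatures in xtensor may differ:

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Sketch of a first N-dimensional container. The inner types follow
// the STL sequence-container conventions; implementations are omitted.
template <class T>
class xarray
{
public:

    using value_type = T;
    using reference = value_type&;
    using const_reference = const value_type&;
    using pointer = value_type*;
    using const_pointer = const value_type*;
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;

    using shape_type = std::vector<size_type>;

    // Number of elements in the container
    size_type size() const;
    // Number of dimensions
    size_type dimension() const;
    // Size of each dimension
    const shape_type& shape() const;

    // Change the shape, possibly changing the number of elements
    void resize(const shape_type& shape);
    // Change the shape, keeping the number of elements unchanged
    void reshape(const shape_type& shape);

private:

    shape_type m_shape;
};
```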
Here we only defined inner types and methods related to size (number of elements in the container) and shape (size of each dimension). The inner types are similar to those of any STL sequence container so I won’t expand on them.
xarray has a dynamic number of dimensions (like NumPy arrays), thus its shape member (
m_shape) is a dynamically resizable container, and we provide resize and
reshape methods. The former resizes the container to any shape, which may change the total number of elements, while the latter keeps the total number of elements unchanged. Since their implementation implies resizing the underlying storage container, we skip it for now. The implementation of the other methods is pretty straightforward:
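A possible sketch of these implementations; the constructor is added here only to make the snippet self-contained, and is not part of the skeleton above:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Illustrative implementations of the size/shape methods.
template <class T>
class xarray
{
public:

    using size_type = std::size_t;
    using shape_type = std::vector<size_type>;

    // Convenience constructor for this standalone snippet
    explicit xarray(const shape_type& shape)
        : m_shape(shape)
    {
    }

    // Total number of elements: the product of the shape values
    size_type size() const
    {
        return std::accumulate(m_shape.begin(), m_shape.end(),
                               size_type(1), std::multiplies<size_type>());
    }

    // Number of dimensions
    size_type dimension() const
    {
        return m_shape.size();
    }

    const shape_type& shape() const
    {
        return m_shape;
    }

private:

    shape_type m_shape;
};
```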
Now that inner types and size/shape methods are defined, we can focus on storage container and element access operator.
Storage container and strided indexing scheme
Let’s forget the N-dimensional case and focus on 2-D tensors for simplicity. The most obvious way to store data is to use a vector of vectors:
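For instance, a deliberately naive sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive 2-D storage: one inner vector per row (or per column).
// This is the design we are about to reject.
template <class T>
class matrix
{
public:

    matrix(std::size_t rows, std::size_t cols)
        : m_data(rows, std::vector<T>(cols))
    {
    }

    T& operator()(std::size_t i, std::size_t j)
    {
        return m_data[i][j];
    }

private:

    std::vector<std::vector<T>> m_data;
};
```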
The innermost vectors can be the rows or the columns of the matrix, depending on the storage order we choose. However, this design has two major drawbacks. First, it is inefficient: many dynamic allocations occur each time we resize the tensor. Second, it does not scale to higher dimensions: we would need one nested vector per dimension, so the type of
m_data would depend on the number of dimensions; that is, a static type would depend on a dynamic value, which is not possible. Fortunately, both problems have the same solution: use a 1-D buffer for storing the data, together with a mapping function that, given the N-D index of an element, computes its location in the 1-D buffer.
In the case of a matrix, elements in the same row are stored contiguously, and rows are stored one after the other in the 1-D buffer. For 3-D tensors, matrices are then stored one after the other, which is easily generalized to higher dimensions.
The offset between two consecutive elements in the same dimension is called a stride and the scheme used to map indices into a location in the buffer is a strided indexing scheme. In such a scheme, the index
(i0, ..., in) corresponds to the offset
sum(ik * sk) from the beginning of the one-dimensional buffer, where
(s0, ..., sn) are the strides of the array. Some particular cases of strided schemes implement well-known memory layouts:
- the row-major layout (or C layout) is a strided indexing scheme where the strides grow from right to left. That's the layout we used in the example above. For a 2-D tensor, elements in the same row are stored contiguously.
- the column-major layout (or Fortran layout) is a strided indexing scheme where the strides grow from left to right. For a 2-D tensor, elements in the same column are stored contiguously.
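As an illustration, the offset formula can be written as a small free function (a sketch, not part of xtensor's API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes the 1-D buffer offset of the element at `index`,
// given the strides: offset = sum(index[k] * strides[k]).
inline std::size_t element_offset(const std::vector<std::size_t>& index,
                                  const std::vector<std::size_t>& strides)
{
    std::size_t offset = 0;
    for (std::size_t k = 0; k < index.size(); ++k)
    {
        offset += index[k] * strides[k];
    }
    return offset;
}
```

For a 2×3 row-major matrix the strides are (3, 1), so the element at index (1, 2) lives at offset 1\*3 + 2\*1 = 5.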
We can now complete the skeleton of the xarray class:
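A sketch of the completed skeleton, with a 1-D buffer plus shape and strides (signatures are illustrative):

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Completed skeleton: data lives in a 1-D buffer; shape and strides
// describe the strided indexing scheme.
template <class T>
class xarray
{
public:

    using value_type = T;
    using size_type = std::size_t;

    using shape_type = std::vector<size_type>;
    // Same underlying type as shape_type; the dedicated alias
    // makes the API clearer.
    using strides_type = std::vector<size_type>;
    using storage_type = std::vector<T>;

    size_type size() const;
    size_type dimension() const;

    const shape_type& shape() const;
    const strides_type& strides() const;

    // Resizing now also requires the strides
    void resize(const shape_type& shape, const strides_type& strides);
    void reshape(const shape_type& shape, const strides_type& strides);

private:

    shape_type m_shape;
    strides_type m_strides;
    storage_type m_data;
};
```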
The first thing to notice is that we define a dedicated type for the strides. Even though it is the same as the shape's type, giving it a different name makes for a clearer API. The second thing to notice is the change in the resize and
reshape methods: they now require an additional parameter, the strides.
Having to compute and pass the strides each time we want to resize or reshape a tensor is going to be cumbersome, especially since, most of the time, a row-major or a column-major memory layout is used. In both cases, the strides can be deduced from the layout. Let's start by defining an enumeration for the layout:
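A minimal sketch of such an enumeration (xtensor's actual definition may carry different underlying values):

```cpp
// Memory layout of the container. `dynamic` covers any strided
// scheme that is neither row-major nor column-major.
enum class layout_type
{
    dynamic,
    row_major,
    column_major
};
```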
dynamic will be used for any strided scheme that is neither row_major nor
column_major. Before we add the overloads of resize and
reshape, let's consider adding the layout as a template parameter. Indeed, although the layout could be a parameter set at runtime only, being able to set it at compile time will enable nice optimizations later. Thus the
xarray has two layout parameters: one static (the template parameter) and one dynamic (a new data member). When the static layout is row_major or
column_major, the dynamic one has the same value, and trying to change it will throw an exception. When the static layout is
dynamic, the layout data member can take any value of layout_type.
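A sketch of this design, combining the static template parameter with the dynamic data member; the exception type and method names are illustrative, not xtensor's exact code:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

enum class layout_type { dynamic, row_major, column_major };

// Overridable default layout
#ifndef XTENSOR_DEFAULT_LAYOUT
#define XTENSOR_DEFAULT_LAYOUT layout_type::row_major
#endif

// The static layout is a template parameter; the dynamic layout is a
// data member that must agree with it when the static layout is fixed.
template <class T, layout_type L = XTENSOR_DEFAULT_LAYOUT>
class xarray
{
public:

    layout_type layout() const
    {
        return m_layout;
    }

    void set_layout(layout_type layout)
    {
        // A fixed static layout forbids changing the dynamic one
        if (L != layout_type::dynamic && layout != L)
        {
            throw std::runtime_error("cannot change the layout of a static-layout xarray");
        }
        m_layout = layout;
    }

private:

    layout_type m_layout = (L == layout_type::dynamic) ? layout_type::row_major : L;
};
```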
XTENSOR_DEFAULT_LAYOUT is a macro that sets the default layout type to
row_major. Having a macro instead of a hard-coded value allows us to override it for testing the library with
column_major containers. We strongly discourage overriding this macro in userland.
Resize and reshape
The resize and reshape overloads that accept a layout parameter have to compute new strides based on the shape and the layout. Since there is a chance that other parts of the library will need this functionality, we encapsulate it in a dedicated free function:
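A sketch of such a free function, assuming the layout_type enumeration introduced earlier; the caller is expected to have resized the strides container to the shape's size beforehand:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class layout_type { dynamic, row_major, column_major };

// Fills `strides` for the given shape and layout, and returns the total
// number of elements. Works with any containers providing size() and
// operator[], so it is not tied to a particular container choice.
template <class shape_type, class strides_type>
std::size_t compute_strides(const shape_type& shape,
                            layout_type layout,
                            strides_type& strides)
{
    std::size_t data_size = 1;
    if (layout == layout_type::row_major)
    {
        // Cumulative product of the shape values, from right to left
        for (std::size_t i = shape.size(); i != 0; --i)
        {
            strides[i - 1] = data_size;
            data_size *= shape[i - 1];
        }
    }
    else
    {
        // Cumulative product of the shape values, from left to right
        for (std::size_t i = 0; i < shape.size(); ++i)
        {
            strides[i] = data_size;
            data_size *= shape[i];
        }
    }
    return data_size;
}
```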
This function accepts arbitrary container types for the shape and the strides; this way it is not tied to the choice of containers we made in
xarray. The implementation is pretty straightforward: each stride is the cumulative product of the shape values, from right to left or from left to right depending on the layout. Note that
compute_strides also computes the total number of elements, so the caller does not need to traverse the shape again.
However, the overloads accepting
strides need to compute the size of the 1-D buffer. Again, there’s a chance that we need to reuse this feature, so we implement it in a dedicated function:
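A possible sketch of this function; the data_size name follows the one used later in the text, while the signature is illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Size of the 1-D buffer required for a given shape: the product of
// the shape values. Container-agnostic, like compute_strides.
template <class shape_type>
std::size_t data_size(const shape_type& shape)
{
    return std::accumulate(shape.begin(), shape.end(),
                           std::size_t(1), std::multiplies<std::size_t>());
}
```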
We can now implement the resize overload that accepts explicit strides:
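A sketch of this overload, shown in a reduced class for self-containment; the data_size helper is inlined here and the checking utilities are omitted:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Reduced xarray illustrating the strides-taking resize overload.
template <class T>
class xarray
{
public:

    using size_type = std::size_t;
    using shape_type = std::vector<size_type>;
    using strides_type = std::vector<size_type>;

    void resize(const shape_type& shape, const strides_type& strides)
    {
        m_shape = shape;
        m_strides = strides;
        // Resize the 1-D buffer to hold the required number of elements
        m_data.resize(data_size(shape));
    }

    const shape_type& shape() const { return m_shape; }
    const strides_type& strides() const { return m_strides; }
    size_type size() const { return m_data.size(); }

private:

    // Product of the shape values
    static size_type data_size(const shape_type& shape)
    {
        return std::accumulate(shape.begin(), shape.end(),
                               size_type(1), std::multiplies<size_type>());
    }

    shape_type m_shape;
    strides_type m_strides;
    std::vector<T> m_data;
};
```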
Utility functions such as check_size were created to improve code readability:
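A plausible sketch of check_size; in this version it verifies that a new shape keeps the total number of elements unchanged, as reshape requires, though the actual check in the library may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Throws if the number of elements implied by `shape` differs from
// `expected_size` (illustrative sketch of a readability helper).
template <class shape_type>
void check_size(const shape_type& shape, std::size_t expected_size)
{
    std::size_t new_size = std::accumulate(shape.begin(), shape.end(),
                                           std::size_t(1),
                                           std::multiplies<std::size_t>());
    if (new_size != expected_size)
    {
        throw std::runtime_error("invalid size: " + std::to_string(new_size));
    }
}
```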
The last methods to implement are the resize and reshape overloads that accept a layout parameter:
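A sketch of the layout-taking resize overload, again in a reduced self-contained class; compute_strides is inlined here, and the helper names are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class layout_type { dynamic, row_major, column_major };

// Reduced xarray: the strides are computed in place into m_strides, and
// the element count returned by compute_strides is reused, so the shape
// is traversed only once.
template <class T>
class xarray
{
public:

    using size_type = std::size_t;
    using shape_type = std::vector<size_type>;
    using strides_type = std::vector<size_type>;

    void resize(const shape_type& shape, layout_type layout)
    {
        m_shape = shape;
        m_layout = layout;
        // Resizing m_strides is cheap when its size does not change
        m_strides.resize(shape.size());
        size_type data_size = compute_strides(shape, layout, m_strides);
        m_data.resize(data_size);
    }

    const strides_type& strides() const { return m_strides; }
    size_type size() const { return m_data.size(); }

private:

    // Fills strides for the given layout; returns the element count
    static size_type compute_strides(const shape_type& shape,
                                     layout_type layout,
                                     strides_type& strides)
    {
        size_type ds = 1;
        if (layout == layout_type::row_major)
        {
            for (size_type i = shape.size(); i != 0; --i)
            {
                strides[i - 1] = ds;
                ds *= shape[i - 1];
            }
        }
        else
        {
            for (size_type i = 0; i < shape.size(); ++i)
            {
                strides[i] = ds;
                ds *= shape[i];
            }
        }
        return ds;
    }

    shape_type m_shape;
    strides_type m_strides;
    layout_type m_layout = layout_type::row_major;
    std::vector<T> m_data;
};
```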
The implementation of these methods does not rely on the previous overloads for the following reasons:
- we don't want data_size to be called twice, which would be the case if resize(shape, layout) were simply forwarding to the overload accepting strides;
- we don't want to allocate temporary strides in the overloads accepting a layout parameter and pass them to the overloads accepting new strides; instead, we prefer to resize m_strides in place, which allows an optimization when its size does not change. Keep in mind that m_strides is a dynamic container, and resizing comes at a cost.
Note: the implementation of resize and reshape is not final; we will revisit it at a later stage of the series.
That’s almost it for this first article; since we added a
layout_type data member, we also provide an accessor. At this stage, the
xarray API is:
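Putting the pieces together, the API at this stage can be summarized as a declaration-only sketch; the exact signatures in xtensor may differ:

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

enum class layout_type { dynamic, row_major, column_major };

#ifndef XTENSOR_DEFAULT_LAYOUT
#define XTENSOR_DEFAULT_LAYOUT layout_type::row_major
#endif

// Summary of the xarray API built so far (declarations only).
template <class T, layout_type L = XTENSOR_DEFAULT_LAYOUT>
class xarray
{
public:

    using value_type = T;
    using reference = value_type&;
    using const_reference = const value_type&;
    using pointer = value_type*;
    using const_pointer = const value_type*;
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;

    using shape_type = std::vector<size_type>;
    using strides_type = std::vector<size_type>;
    using storage_type = std::vector<T>;

    size_type size() const;
    size_type dimension() const;

    const shape_type& shape() const;
    const strides_type& strides() const;
    layout_type layout() const;

    void resize(const shape_type& shape);
    void resize(const shape_type& shape, layout_type layout);
    void resize(const shape_type& shape, const strides_type& strides);

    void reshape(const shape_type& shape);
    void reshape(const shape_type& shape, layout_type layout);

private:

    shape_type m_shape;
    strides_type m_strides;
    layout_type m_layout;
    storage_type m_data;
};
```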
In the next article we will implement the access operators.
More about the Series
This post is just one episode of a long series of articles detailing the internals of xtensor.