How we wrote xtensor 4/N: Value Semantics

Johan Mabille
11 min read · Jul 16, 2019

xtensor is a comprehensive framework for N-D array processing, including an extensible expression system, lazy evaluation, and many other features that cannot be covered in a single article. In this post, we focus on the value semantics of N-D containers.

In the previous articles, we started to implement an N-D container. We covered the methods related to shape, dimension and layout, we went through the details of the access operator, and eventually, we added convenient constructors. Now is the time to focus on assignment operators and value semantics.

Value Semantics vs Entity Semantics

Value Semantics means you focus more on the values stored in the objects than on the objects themselves. Two objects are considered to be equal if they hold the same value:

std::complex<double> a(4, 2);
std::complex<double> b(4, 2);
std::cout << std::boolalpha;
std::cout << (a == b) << std::endl; // Outputs "true"
std::cout << (&a == &b) << std::endl; // Outputs "false"

This is the default semantics you get in C++, and it closely follows the notation and reasoning of mathematics. A class implementing value semantics is a concrete type, and its instances are manipulated as objects or references of this type, not as objects or references of an abstract base type. Therefore the following code should be avoided if derived implements value semantics:

class base { ... };
class derived : public base { ... };
void some_function(const base& b) { ... }
derived i;
some_function(i);

The next section examines this question more closely.

Value semantics is one of those features that make C++ unique among programming languages, especially object-oriented ones. Most of them only provide value semantics for basic builtin types (often called primitive types), and you cannot implement it for your own types:

i1 = 4
i2 = 5
i2 = i1
i2 += 4
# The following prints 4
print(i1)
class MyClass:
    def __init__(self, value):
        self.value = value

m1 = MyClass(4)
m2 = MyClass(5)
m2 = m1
m2.value += 4
# The following prints 8, not 4
print(m1.value)

What they provide instead is entity semantics, which is also available in C++. With entity semantics, we focus on the objects themselves rather than the values they hold. Distinct objects are always considered to be different, even if they hold the same value. The point of entity semantics is to capture the API in an abstract base class and implement different behaviors in inheriting classes. When operating on instances of these inheriting classes, their real type is usually hidden behind the abstract base type:

class base { ... };
class inheriting : public base { ... };
base* b = new inheriting;
// manipulates b as a base instead of an inheriting

Since we invoke methods of the base class but we actually want their redefinitions in the inheriting classes to be called, these methods have to be virtual. This is another major difference between value semantics and entity semantics: since we always know the exact type of instances of classes that implement value semantics, these classes never have to declare virtual methods.

This gives a performance advantage to value semantics under some circumstances. Do not get me wrong here: declaring a method as virtual does not add a lot of overhead, usually only one line of assembly code. For methods longer than a few lines, this is negligible. The problem is that most of the time virtual methods cannot be inlined by the compiler. For access methods of containers, this means keeping several lines that could have been replaced by a simple memory read. Given that access methods can be heavily called in computation libraries, this results in a significant performance hit.
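To make the inlining argument concrete, here is a minimal sketch. All the names in it (abstract_accessor, virtual_vector, plain_vector, sum_virtual, sum_plain) are hypothetical and are not part of xtensor. Both loops compute the same result, but the first one only sees the abstract interface, so each at() is typically an indirect call, while in the second one the exact type is known and at() can be reduced to a memory read. Whether the compiler actually inlines (or manages to devirtualize) depends on the compiler and the optimization level.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Entity-style access: every read goes through a virtual call.
class abstract_accessor
{
public:
    virtual ~abstract_accessor() = default;
    virtual double at(std::size_t i) const = 0;
    virtual std::size_t size() const = 0;
};

class virtual_vector : public abstract_accessor
{
public:
    explicit virtual_vector(std::vector<double> data) : m_data(std::move(data)) {}
    double at(std::size_t i) const override { return m_data[i]; }
    std::size_t size() const override { return m_data.size(); }
private:
    std::vector<double> m_data;
};

// Value-style access: the exact type is known at the call site.
class plain_vector
{
public:
    explicit plain_vector(std::vector<double> data) : m_data(std::move(data)) {}
    double at(std::size_t i) const { return m_data[i]; }
    std::size_t size() const { return m_data.size(); }
private:
    std::vector<double> m_data;
};

// Only the abstract interface is visible here.
double sum_virtual(const abstract_accessor& a)
{
    double res = 0.;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        res += a.at(i);
    }
    return res;
}

// The concrete type is visible here, at() is a candidate for inlining.
double sum_plain(const plain_vector& v)
{
    double res = 0.;
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        res += v.at(i);
    }
    return res;
}
```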

Since xarray is designed to represent mathematical objects, providing access operators, and is meant to be used in heavy computation, it is natural to use value semantics.

Implementation of Value Semantics

Giving value semantics to a class means defining a constructor and an assignment operator that allow you to perform deep copies of its instances. C++11 added move semantics, which enables binding to rvalue references, so that copies can be avoided when working with temporaries. Therefore we have to define the following:

template <class T, layout_type L, class A>
class xarray
{
public:
    // Copy semantics
    xarray(const xarray& rhs);
    xarray& operator=(const xarray& rhs);
    // Move semantics
    xarray(xarray&& rhs);
    xarray& operator=(xarray&& rhs);
};

Let’s start with copy semantics. The implementation is straightforward:

template <class T, layout_type L, class A>
xarray<T, L, A>::xarray(const xarray& rhs)
    : m_shape(rhs.m_shape)
    , m_strides(rhs.m_strides)
    , m_storage(rhs.m_storage)
    , m_layout(rhs.m_layout)
{
}
template <class T, layout_type L, class A>
auto xarray<T, L, A>::operator=(const xarray& rhs) -> self_type&
{
    m_shape = rhs.m_shape;
    m_strides = rhs.m_strides;
    m_storage = rhs.m_storage;
    m_layout = rhs.m_layout;
    return *this;
}

Move semantics is just as simple:

template <class T, layout_type L, class A>
xarray<T, L, A>::xarray(xarray&& rhs)
    : m_shape(std::move(rhs.m_shape))
    , m_strides(std::move(rhs.m_strides))
    , m_storage(std::move(rhs.m_storage))
    , m_layout(std::move(rhs.m_layout))
{
}
template <class T, layout_type L, class A>
auto xarray<T, L, A>::operator=(xarray&& rhs) -> self_type&
{
    m_shape = std::move(rhs.m_shape);
    m_strides = std::move(rhs.m_strides);
    m_storage = std::move(rhs.m_storage);
    m_layout = std::move(rhs.m_layout);
    return *this;
}

Actually, since the xarray class embeds objects that all implement value semantics, and since it does not inherit from any other class, we could rely on the constructors and assignment operators generated by the compiler.

However, we will soon have to refactor the xarray class in a way that will prevent the compiler from generating them, so we did not waste our time defining these methods beforehand. Nevertheless, we can reduce the amount of code thanks to a feature introduced with C++11:

template <class T, layout_type L, class A>
class xarray
{
public:
    // Copy semantics
    xarray(const xarray& rhs) = default;
    xarray& operator=(const xarray& rhs) = default;
    // Move semantics
    xarray(xarray&& rhs) = default;
    xarray& operator=(xarray&& rhs) = default;
};

Here the = default syntax means “generate the constructor / assignment operator as the compiler would if it could”. This way, we no longer have to write their implementation.
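As a quick check of what the defaulted methods do, here is a minimal sketch built on a hypothetical, heavily simplified stand-in for xarray (mini_array and its members are assumptions made for illustration, not the real xtensor class). Since its data members are standard vectors, the defaulted copy performs a deep copy and the defaulted move transfers the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for xarray.
class mini_array
{
public:
    mini_array(std::vector<std::size_t> shape, std::vector<double> storage)
        : m_shape(std::move(shape)), m_storage(std::move(storage))
    {
    }

    // Copy semantics
    mini_array(const mini_array&) = default;
    mini_array& operator=(const mini_array&) = default;
    // Move semantics
    mini_array(mini_array&&) = default;
    mini_array& operator=(mini_array&&) = default;

    double& operator[](std::size_t i) { return m_storage[i]; }
    const double& operator[](std::size_t i) const { return m_storage[i]; }
    const double* data() const { return m_storage.data(); }

private:
    std::vector<std::size_t> m_shape;
    std::vector<double> m_storage;
};

static_assert(std::is_copy_constructible<mini_array>::value, "copyable");
static_assert(std::is_move_constructible<mini_array>::value, "movable");

// Returns true if modifying the copy leaves the original untouched.
bool copy_is_deep()
{
    mini_array a({3}, {1.0, 2.0, 3.0});
    mini_array b = a;
    b[0] = 42.0;
    return a[0] == 1.0 && b[0] == 42.0;
}

// Returns true if the move reuses the buffer of the source instead of copying it.
bool move_steals_buffer()
{
    mini_array a({3}, {1.0, 2.0, 3.0});
    const double* buffer = a.data();
    mini_array b = std::move(a);
    return b.data() == buffer;
}
```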

At this point, a legitimate question is: what could prevent the compiler from generating default constructors and assignment operators? That is the point of the following section, which discusses design principles and patterns that will be used in the refactoring previously mentioned.

Value semantics and inheritance

The slicing problem

Remember from the first section that when dealing with objects that implement value semantics, we know their exact type; the classes of such objects should not be part of hierarchies of classes with public inheritance.

To understand why this can be problematic, let’s consider the following classes:

class container
{
public:
    // ... public API ...
private:
    storage_type m_storage;
};
class strided_container : public container
{
public:
    // ... public API ...
private:
    strides_type m_strides;
};

Here we rely on the copy and move methods generated by the compiler. This implementation does not prevent us from writing:

void assign_container(const container& src, container& dst)
{
    dst = src;
}
strided_container sc1;
// ... initializes sc1 ...
strided_container sc2;
// ... initializes sc2 ...
assign_container(sc1, sc2);

The problem here is that only the container part of sc1 will be assigned to sc2. The data members specific to strided_container will never be copied from sc1 to sc2, resulting in an incomplete assignment. This is known as object slicing.
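The slicing can be observed with a small runnable sketch. The classes below mirror the ones above, with two assumptions made for illustration: storage_type and strides_type are replaced with standard vectors, and hypothetical constructors and accessors (storage(), strides()) are added so that the result can be inspected.

```cpp
#include <cassert>
#include <utility>
#include <vector>

class container
{
public:
    explicit container(std::vector<double> storage) : m_storage(std::move(storage)) {}
    const std::vector<double>& storage() const { return m_storage; }
private:
    std::vector<double> m_storage;
};

class strided_container : public container
{
public:
    strided_container(std::vector<double> storage, std::vector<int> strides)
        : container(std::move(storage)), m_strides(std::move(strides))
    {
    }
    const std::vector<int>& strides() const { return m_strides; }
private:
    std::vector<int> m_strides;
};

// Only sees the container part of its arguments.
void assign_container(const container& src, container& dst)
{
    dst = src;
}

// Returns true if the storage was copied while the strides were left behind.
bool assignment_is_sliced()
{
    strided_container sc1({1.0, 2.0}, {2, 1});
    strided_container sc2({0.0}, {1});
    assign_container(sc1, sc2);
    return sc2.storage() == sc1.storage() && sc2.strides() != sc1.strides();
}
```

After the call, sc2 holds the storage of sc1 but still has its own strides, so it is left in an inconsistent state.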

Protected copy and move semantics

A solution to this problem would be to never use inheritance with classes that implement value semantics. However, this is too restrictive: you might want to capture common behaviors of different types in a base class so that you do not have to duplicate their implementation.

Another solution is to forbid the assignment of base objects in external code:

class container
{
protected:
    container& operator=(const container& rhs)
    {
        m_storage = rhs.m_storage;
        return *this;
    }
private:
    storage_type m_storage;
};

With such a declaration, only classes inheriting from container can call its assignment operator, and the implementation of assign_container will trigger a compilation error. Here, only the copy assignment operator has been declared as protected, but the reasoning is the same for the copy constructor and the move semantics methods.

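Note that protected methods remain accessible to the inheriting classes, so the compiler can still generate the copy and move methods of these classes, and the generated methods are public. Assigning complete objects therefore keeps working, while the sliced assignment through container is rejected:

strided_container sc1;
strided_container sc2;
sc1 = sc2; // OK, the generated operator= assigns both m_storage and m_strides

When the inheriting class needs its own implementation (for instance because it manages a resource), the assignment operator is defined as a public member that forwards to the base class: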

class strided_container : public container
{
public:
    strided_container& operator=(const strided_container& rhs)
    {
        container::operator=(rhs);
        m_strides = rhs.m_strides;
        return *this;
    }
    // ... similar implementation of constructor and move semantics
private:
    strides_type m_strides;
};

When the implementation is trivial, we can use the = default syntax mentioned in the previous section:

class strided_container : public container
{
public:
    strided_container(const strided_container&) = default;
    strided_container& operator=(const strided_container&) = default;
    strided_container(strided_container&&) = default;
    strided_container& operator=(strided_container&&) = default;
private:
    strides_type m_strides;
};
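The effect of the protected methods can be checked at compile time with the standard type traits, which perform their access checks from a context unrelated to the classes. The sketch below is only an illustration: storage_type and strides_type are replaced with standard vectors, and the constructors and accessors are hypothetical additions.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

class container
{
public:
    const std::vector<double>& storage() const { return m_storage; }
protected:
    explicit container(std::vector<double> storage) : m_storage(std::move(storage)) {}
    container(const container&) = default;
    container& operator=(const container&) = default;
    container(container&&) = default;
    container& operator=(container&&) = default;
private:
    std::vector<double> m_storage;
};

class strided_container : public container
{
public:
    strided_container(std::vector<double> storage, std::vector<int> strides)
        : container(std::move(storage)), m_strides(std::move(strides))
    {
    }
    strided_container(const strided_container&) = default;
    strided_container& operator=(const strided_container&) = default;
    strided_container(strided_container&&) = default;
    strided_container& operator=(strided_container&&) = default;
    const std::vector<int>& strides() const { return m_strides; }
private:
    std::vector<int> m_strides;
};

// External code can neither copy nor assign the base part alone...
static_assert(!std::is_copy_constructible<container>::value, "no sliced copy");
static_assert(!std::is_copy_assignable<container>::value, "no sliced assignment");
// ... but complete objects keep their value semantics.
static_assert(std::is_copy_constructible<strided_container>::value, "copyable");
static_assert(std::is_copy_assignable<strided_container>::value, "assignable");

// Returns true if both the storage and the strides have been assigned.
bool assignment_is_complete()
{
    strided_container sc1({1.0, 2.0}, {2, 1});
    strided_container sc2({0.0}, {1});
    sc2 = sc1;
    return sc2.storage() == sc1.storage() && sc2.strides() == sc1.strides();
}
```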

Private inheritance

Defining copy and move semantics methods as protected methods is possible when you define your own class. However, you may want to reuse code from third-party classes that were not meant to be used as base classes, and which define copy and move semantics methods as public methods. In that case, public inheritance can lead to the slicing issue described before.

Wrapping the class and forwarding the calls can result in a lot of boilerplate code, especially when the wrapped class provides a lot of methods. For instance, if you want to reuse the std::vector container, you need to define more than thirty methods! A common pattern to solve this problem elegantly is the combination of private inheritance and using declarations:

template <class T>
class my_container : private std::vector<T>
{
public:
    using base_type = std::vector<T>;
    using base_type::operator[];
    using base_type::at;
    using base_type::front;
    using base_type::back;
    // ... Implementation of copy and move semantics ...
};

Private inheritance makes all the public methods of the base class accessible to the inheriting class only, including constructors and assignment operators. Therefore the object slicing cannot happen.

A using declaration makes the specified method of the base class public again. This kills two birds with one stone: we no longer have to write a lot of boilerplate code, and we do not even need to reimport all the methods of the base class, we can choose the ones we want to expose. And last but not least, a using declaration imports all the overloads of the name, so when a method has a const overload (such as an access operator), both of them are imported! Sweet.
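Here is a minimal runnable sketch of the pattern. The constructors and the read_through_const function are hypothetical additions for the sake of the example. The static assertion checks that external code cannot convert a pointer to my_container into a pointer to its base class, which is what rules out slicing.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

template <class T>
class my_container : private std::vector<T>
{
public:
    using base_type = std::vector<T>;

    my_container() = default;
    explicit my_container(std::size_t n, const T& value = T())
        : base_type(n, value)
    {
    }

    // Each using declaration re-exposes every overload of the name
    using base_type::operator[];
    using base_type::front;
    using base_type::back;
    using base_type::size;
};

// The base class is not visible from the outside
static_assert(!std::is_convertible<my_container<int>*, std::vector<int>*>::value,
              "private base is inaccessible");

// Writes through the non-const operator[], reads through the const overloads.
int read_through_const()
{
    my_container<int> c(3, 7);   // three elements equal to 7
    c[0] = 1;                    // non-const operator[]
    const my_container<int>& cref = c;
    return cref[0] + cref.back(); // const operator[] and const back()
}
```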

Destructor

A point that we have not addressed yet is the destructor. Consider the following code:

class my_base
{
public:
    my_base();
    ~my_base();
protected:
    my_base(const my_base&) = default;
    my_base& operator=(const my_base&) = default;
    my_base(my_base&&) = default;
    my_base& operator=(my_base&&) = default;
};
class my_derived : public my_base
{
public:
    my_derived() { /* ... */ }
    ~my_derived()
    {
        delete[] p_data;
    }
    my_derived(const my_derived&) { /* ... */ }
    my_derived& operator=(const my_derived&) { /* ... */ }
    my_derived(my_derived&&) { /* ... */ }
    my_derived& operator=(my_derived&&) { /* ... */ }
private:
    double* p_data;
};

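Similarly to the object slicing issue, this design can lead to incomplete object destruction and memory leaks. Deleting a my_derived through a pointer to my_base, whose destructor is not virtual, is undefined behavior, and in practice the destructor of my_derived is never called: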

void some_function()
{
my_base* b = new my_derived();
// ...
delete b;
}

Even if my_derived is not intended to be used this way, our design does not prevent it.

Traditional polymorphism (that is, entity semantics) solves this problem by declaring the destructor as virtual. However, with value semantics, this is something we want to avoid. Since similar problems have similar solutions, declaring the destructor of the base class as protected makes some_function invalid and fixes the design.

For consistency, the constructors of the base class (including the default one if it exists) should be declared as protected too.
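A minimal sketch of the resulting design follows. The first() accessor and the use_by_value function are hypothetical, and the copy methods of my_derived are simply deleted to keep the example short. The static assertions show that external code can no longer destroy a my_base, hence that the delete expression in some_function would not compile, while my_derived remains perfectly usable.

```cpp
#include <cassert>
#include <type_traits>

class my_base
{
protected:
    my_base() = default;
    ~my_base() = default;
    my_base(const my_base&) = default;
    my_base& operator=(const my_base&) = default;
    my_base(my_base&&) = default;
    my_base& operator=(my_base&&) = default;
};

class my_derived : public my_base
{
public:
    my_derived() : p_data(new double[4]()) {}  // four zero-initialized values
    ~my_derived() { delete[] p_data; }
    // Deleted for brevity only; a real class would implement deep copies
    my_derived(const my_derived&) = delete;
    my_derived& operator=(const my_derived&) = delete;

    double first() const { return p_data[0]; }
private:
    double* p_data;
};

// "delete b" with b of type my_base* is rejected: the destructor is not accessible
static_assert(!std::is_destructible<my_base>::value, "protected destructor");
// Objects of the inheriting class can still be created and destroyed
static_assert(std::is_destructible<my_derived>::value, "public destructor");

double use_by_value()
{
    my_derived d;  // ~my_derived, then ~my_base, run at scope exit
    return d.first();
}
```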

Digression: Entity Semantics and inheritance

Although xtensor does not make use of entity semantics, covering the details of its implementation cannot hurt. Besides, it might give a better understanding of its differences with value semantics.

When manipulating objects with entity semantics, we want to hide their real type and deal with an abstract interface only:

class my_base
{
public:
    virtual void do_something() = 0;
};
class my_derived : public my_base
{
public:
    void do_something() override { /* ... */ }
};
void some_function(my_base& base)
{
    //...
    base.do_something();
}

That is, the usage we wanted to avoid with value semantics is precisely the one we want with entity semantics. Therefore, we face the same object slicing and potential memory leak issues:

void assign(const my_base* src, my_base* dst)
{
    *dst = *src; // incomplete assignment
    delete src; // incomplete destruction
}
my_base* src = new my_derived;
my_base* dst = new my_derived;
assign(src, dst);

The memory leak issue is simple to address: declare the destructor as virtual.

class my_base
{
public:
    virtual ~my_base();
    virtual void do_something() = 0;
};
class my_derived : public my_base
{
public:
    ~my_derived() override;
    void do_something() override;
};

This, combined with the patterns we detailed in the previous section, is the source of the following rule of thumb you probably know:

The destructor of a base class should be either public and virtual or protected and non-virtual.
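The "public and virtual" half of the rule can be illustrated with a small sketch. The global counters and the destroyed_completely function are hypothetical instrumentation added to observe which destructors run when an object is destroyed through a pointer to its base class.

```cpp
#include <cassert>
#include <memory>

// Hypothetical instrumentation: count the destructor calls
int base_destroyed = 0;
int derived_destroyed = 0;

class my_base
{
public:
    virtual ~my_base() { ++base_destroyed; }
    virtual void do_something() = 0;
};

class my_derived : public my_base
{
public:
    ~my_derived() override { ++derived_destroyed; }
    void do_something() override {}
};

// Destroys a my_derived through a pointer to my_base and
// returns true if both destructors have been called exactly once.
bool destroyed_completely()
{
    base_destroyed = 0;
    derived_destroyed = 0;
    {
        std::unique_ptr<my_base> p(new my_derived);
        p->do_something();
    }  // p is destroyed here, through the base type
    return base_destroyed == 1 && derived_destroyed == 1;
}
```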

The object slicing problem requires a more drastic solution. Remember that with entity semantics, we focus more on the objects themselves than on the data they hold. Since we manipulate objects through pointers or references to their base type, and since constructors and assignment operators cannot be virtual, this means that we should not be able to assign or copy an object to another. Before C++11, the pattern was to declare copy semantics methods as private and never implement them:

class my_base
{
public:
    virtual ~my_base();
    virtual void do_something() = 0;
private:
    my_base(const my_base&); // Never implemented
    my_base& operator=(const my_base&); // Never implemented
};

This prevents the compiler from generating the copy constructor and the assignment operator in the inheriting classes. C++11 introduced a more explicit syntax:

class my_base
{
public:
    virtual ~my_base();
    virtual void do_something() = 0;
    my_base(const my_base&) = delete;
    my_base& operator=(const my_base&) = delete;
    my_base(my_base&&) = delete;
    my_base& operator=(my_base&&) = delete;
};

It is sometimes useful to be able to create an object as the copy of another one. A typical pattern that makes use of this technique is the prototype: objects are always instantiated as copies of a prototype object. Therefore we need to emulate a virtual copy constructor, usually referred to as a cloning method:

class my_base
{
public:
    virtual ~my_base();
    virtual void do_something() = 0;
    virtual my_base* clone() const = 0;
    my_base& operator=(const my_base&) = delete;
    my_base(my_base&&) = delete;
    my_base& operator=(my_base&&) = delete;
protected:
    my_base(const my_base&);
};
class my_derived : public my_base
{
public:
    my_derived* clone() const override
    {
        return new my_derived(*this);
    }
protected:
    // Forwards to the copy constructor of my_base
    my_derived(const my_derived& rhs) : my_base(rhs) { /* ... */ }
};

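The copy constructor is removed from the list of the deleted methods and declared as protected so that it cannot be called from outside of the class. The clone function delegates its implementation to the copy constructor of the inheriting class, which must in turn invoke the copy constructor of the base class in its initializer list (a defaulted copy constructor does so automatically; a user-defined one that omits it calls the default constructor of the base class instead).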
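The cloning pattern can be exercised with a short runnable sketch. The value() method, the m_value member and the cloned_value function are hypothetical additions that make the result of the copy observable.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

class my_base
{
public:
    virtual ~my_base() = default;
    virtual double value() const = 0;
    virtual my_base* clone() const = 0;
    my_base& operator=(const my_base&) = delete;
    my_base(my_base&&) = delete;
    my_base& operator=(my_base&&) = delete;
protected:
    my_base() = default;
    my_base(const my_base&) = default;
};

class my_derived : public my_base
{
public:
    explicit my_derived(double value) : m_value(value) {}
    double value() const override { return m_value; }
    // Covariant return type: callers holding a my_derived get a my_derived back
    my_derived* clone() const override { return new my_derived(*this); }
protected:
    my_derived(const my_derived& rhs) : my_base(rhs), m_value(rhs.m_value) {}
private:
    double m_value;
};

// External code cannot copy or assign, it can only clone
static_assert(!std::is_copy_constructible<my_derived>::value, "protected copy");
static_assert(!std::is_copy_assignable<my_derived>::value, "deleted assignment");

// Clones an object through its base type; returns the value held by the
// clone, or -1 if clone() did not create a distinct object.
double cloned_value()
{
    std::unique_ptr<my_base> original(new my_derived(3.0));
    std::unique_ptr<my_base> copy(original->clone());
    return copy.get() != original.get() ? copy->value() : -1.0;
}
```

Returning a raw pointer follows the signature used above; the caller is then responsible for taking ownership, here with std::unique_ptr.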

Conclusion

Value semantics and entity semantics are key concepts in C++. Implementing them is not that hard with the following rules of thumb:

Value semantics

  • Constructors, destructor and assignment operators of a public base class must be declared as protected
  • When using an existing class with a public destructor as a base class, use private inheritance and using declarations
  • In the inheriting classes, if you implement one of the following: destructor, copy constructor, move constructor or assignment operators, you should implement them all

Entity semantics

  • The destructor of the base class should be public and virtual
  • The copy constructor, the move constructor and the assignment operators should be declared as public and deleted
  • If you need cloning, declare a public virtual method that relies on the copy constructor, and declare the latter as protected.

In the next article, we will focus on xtensor’s expression system. Indeed, xtensor is not only an implementation of N-D arrays, but also a lazy expression system for array manipulation.

More about the Series

This post is just one episode of a long series of articles:

How We Wrote Xtensor

Johan Mabille

Scientific computing software engineer at QuantStack, C++ teacher, template metamage