1. Introduction


The present paper tries to explain in detail a possible implementation of the C++ object model based on the proposal of the C++ Common ABI [1] and the paper of Nathan Sidwell [2]. It does not claim to explain the object model of your favorite C++ compiler, which may differ from the one presented here. Some other C++ object models have been considered, but the selected object model has some advantages over the others:

This paper uses a color convention to make contextual information easier to spot:

Since these colors will not appear on a printed copy, all boxes are labeled by context and follow the section numbering, except the C++98 citations, which follow the Standard's tags.

1.1 Class hierarchy


The C++ class hierarchy of figure 1.1 will be built step by step throughout this paper to show what the underlying object layouts and the hidden data structures look like. To allow a better understanding of the object model, the C++ code of each class will be translated into equivalent C89 code with the help of some constants and macros defined in the file object_model.h. For clarity, this equivalent code will be split into interface files (e.g. X_layout.h) and implementation files (e.g. X_layout.c), a choice which does not reflect any rule or constraint of C++.

Figure 1.1 Complete class hierarchy of DCABBA

The class hierarchy of DCABBA is summarized in the sample code C++ 1.1. Each inheritance level of this hierarchy will show a new facet of the object model.

C++ 1.1
struct A { ... };
struct B { ... };
struct C : virtual B { ... };
struct AB : virtual A, virtual B { ... };
struct BA : virtual B, virtual A { ... };
struct ABBA : protected AB, BA { ... };
struct D { ... };
struct DCABBA : D, C, ABBA { ... };

All the classes involved in the construction of DCABBA are polymorphic except the class D, which is monomorphic (see note 1.1, C++98 [class.virtual] and definition 1.1). The protected inheritance of AB in the definition of class ABBA will introduce the difference between inheritance of implementation and inheritance of type. It will also show how dynamic_cast<>() behaves with access protection.

1.2 Polymorphism


Etymologically, the adjective polymorphic qualifies objects that can take many (poly) forms (morph). Object-oriented programming languages introduced more subtle terms to distinguish the various forms of polymorphism supported by object-oriented concepts.

Note 1.1
In this paper, the term polymorphism refers to subtyping polymorphism (i.e. inheritance). The terms parametric polymorphism and ad-hoc polymorphism will be discarded in favor of the more specific C++ concepts of templates and overloading, respectively.

It is worthwhile to see what C++98 says about polymorphic classes and virtual functions.

C++98 [class.virtual]
A class that declares or inherits a virtual function is called a polymorphic class. A virtual function must be a non-static member function since a virtual function call relies on a specific object for determining which function to invoke. Explicit qualification with the scope operator :: suppresses the virtual call mechanism.

We need to generalize this definition to allow further extensions specific to the object model because, as we will see later, a class that inherits a virtual base will also be a polymorphic class, even if its definition does not contain any virtual function.

Definition 1.1
A class is said to be polymorphic as soon as there exists some runtime link between all the instances of the class and the class definition. This link takes the form of a pointer to the class virtual table in the object and provides the support for dynamic binding, dynamic typing and dynamic casting. A class without this link is said to be monomorphic and does not support these concepts.

2. Polymorphic class


The simple polymorphic class A lets us introduce the basics of the layout of polymorphic objects.

C++ 2.1
struct A {
  A(long a_) : a(a_)      {}
  virtual ~A()            { a = 0; }
  virtual void f() const  {}
protected:
  long a;
};

The class A defines a non-trivial constructor, a virtual destructor, a constant virtual function f and a protected data member a. A class definition involves a few representative elements, and one of the most important is the object layout. Object size must remain as small as possible since a large number of instances may exist simultaneously at runtime.

2.1 Object layout


From C++98 [class.virtual] and definition 1.1, we know that if a class has a virtual function, all its instances must hold a hidden pointer somewhere. The location of this pointer must be relative to the this pointer, since the latter may point to a subobject included in a larger object of a derived type. So it is time for the first ABI rule.

Rule 2.1
The pointer to the class virtual table must be located at offset zero from the class instance's this pointer.

According to this rule, the layout of instances of A should be:

C 2.1
typedef struct {
  const VTABLE *vptr;                   /* compiler */
  long a;                               /* A member */
} A;

As expected, the data member a is present in the object layout of A. But it must be placed after the special hidden pointer vptr required by definition 1.1, which must be the first member to respect rule 2.1. The identifier (if any) of this pointer is implementation-defined, but identifiers like __vtbl or __vptr are often found in the literature. To enhance the readability of the code, the double underscore prefix __ reserved for implementation-specific identifiers will be omitted. On the other hand, hidden members or statements will be followed by the comment compiler on the same line.
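Rule 2.1 can be checked mechanically with offsetof. The following is only a minimal sketch: it assumes the ptrdiff_t slot type used for VTABLE in this paper, and the struct M (class A stripped of its hidden pointer) and the function A_layout_check are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stddef.h>                     /* offsetof, ptrdiff_t */

typedef ptrdiff_t VTABLE;               /* slot type assumed by this sketch */

typedef struct {
  const VTABLE *vptr;                   /* compiler */
  long a;                               /* A member */
} A;

typedef struct {                        /* hypothetical: A without its vptr */
  long a;
} M;

/* Returns 1 when the layout of A respects rule 2.1. */
int A_layout_check(void)
{
  assert(offsetof(A, vptr) == 0);       /* rule 2.1: vptr at offset zero */
  assert(offsetof(A, a) >= sizeof(const VTABLE *));
  return sizeof(A) > sizeof(M);         /* the hidden pointer has a cost */
}
```

The last comparison makes the price of polymorphism visible: every instance of A carries one extra pointer compared to its monomorphic counterpart.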

The type pointed to by vptr (i.e. the type of the slots of the virtual tables) is implementation-defined. For the purposes of the object model layout, we need a C89 integral type able to hold a signed offset, a pointer to data and a pointer to function, since all these kinds of data will be stored in the virtual tables.

C 2.2
#include <stddef.h>                     /* ptrdiff_t */
typedef ptrdiff_t VTABLE;

If this type does not fit the implementation requirements, the compiler is free to use a more appropriate integral type or a union type. In C99, intptr_t would have been more appropriate.
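To make the union alternative concrete, here is one possible shape for such a slot. This is a sketch under assumptions, not the type used in the rest of the paper: the names VSLOT, VFUNC, answer and VSLOT_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>                     /* ptrdiff_t */

typedef void (*VFUNC)(void);            /* hypothetical generic function pointer */

typedef union {                         /* hypothetical union-based slot */
  ptrdiff_t   offset;                   /* offset to top or to a virtual base */
  const void *data;                     /* e.g. the address of a type_info    */
  VFUNC       func;                     /* address of a virtual function      */
} VSLOT;

static int answer(void) { return 42; }  /* stand-in for a member function */

/* Stores one value of each kind and reads it back through the same member. */
int VSLOT_demo(void)
{
  static const char name[] = "A";
  VSLOT s[3];

  s[0].offset = -16;
  s[1].data   = name;
  s[2].func   = (VFUNC)answer;

  assert(s[0].offset == -16);
  assert(s[1].data == (const void *)name);
  return ((int (*)(void))s[2].func)();  /* cast back before calling */
}
```

Compared to a single integral type, the union avoids integer/pointer conversions, but a C89 static initializer can only set its first member, which makes constant tables harder to write.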

2.2 Class interface


Apart from the object layout, the class must also export the declaration of its member functions, virtual or not, and its type_info (to be discussed later in this section).

C 2.3
void A_ctor(A *this, long a);
void A_dtor(A *this);
void A_f   (const A *this);
extern const type_info A_info[];

Note 2.1
C++ uses an algorithm called name mangling to reflect the scope (i.e. namespace and class), overloading (i.e. function parameters) and parameterization (i.e. template parameters) of external identifiers at the ABI level (see [1]).

To simulate class membership in C89, member functions will be prefixed by the class name. Data members will be prefixed by the class name only if they are part of a subobject. Given a class X with a data member a and a member function f, the equivalent C89 names will be a (or X_a if in a subobject) and X_f, respectively. The prefix X is also a reminder of the type of the object pointed to by this.

The first hidden argument of a non-static member function is always the this pointer, which points to the current object. Inside a member function, this can be used explicitly to disambiguate access to a member which shares its identifier with another visible symbol. It is also used implicitly each time a data member or a virtual function is accessed.

Note 2.2
In the definition of a non-static member function of class X which declares a member a and a virtual function f:

If the member function is const-qualified like A::f() in C++ 2.1, then this is a pointer to a constant object. It is not possible to assign a value to this or to take its address. To simulate these restrictions in C89, this will be register- and const-qualified in the member function definitions (see C 2.5). Some people claim that it would have been cleaner to define this as a reference.

C++98 [class.this]
In the body of a non-static member function, the keyword this is a non-lvalue expression whose value is the address of the object for which the function is called. The type of this in a member function of class X is X*. If the member function is declared const, the type of this is const X*.

Finally, to simplify the use of the helper macros defined in object_model.h, some extra constants and types are defined. They represent a small part of the implicit knowledge that the C89 compiler lacks to behave like the C++ compiler.

C 2.4
enum { 
  INDEX(A_dtor) = FIRST_VFUNC,
  INDEX(A_f   ) = AFTER_VFUNC(A_dtor)
};
typedef void (TYPE(A_dtor))(A*);
typedef void (TYPE(A_f   ))(const A*);

The enumerated constants are the indexes of the virtual functions inside the virtual table of the class. FIRST_VFUNC is the index of the first virtual function and AFTER_VFUNC() returns the index of the virtual function just after the operand (relative increment).

These indexes are invariant under inheritance and are therefore always known to the compiler from the class definition. For example, any class derived from A will use the same index INDEX(A_f) to find its own definition of the virtual function f inside its virtual table when it is reinterpreted as class A.

The new type definitions make it possible to reinterpret the slots of the virtual table as pointers to functions.
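The following sketch shows how a virtual call such as p->f() could be expressed with these indexes and types. The macro definitions are guesses made for the illustration (the real ones live in object_model.h and are not shown in this paper), the table is reduced to its function slots, and call_f, f_calls and dispatch_demo are hypothetical names. Converting between ptrdiff_t and function pointers is implementation-defined, as discussed in section 2.1.

```c
#include <assert.h>
#include <stddef.h>                     /* ptrdiff_t */

/* Hypothetical stand-ins for the macros of object_model.h */
#define INDEX(f)        f##_index
#define TYPE(f)         f##_type
#define FIRST_VFUNC     0
#define AFTER_VFUNC(f)  (INDEX(f) + 1)

typedef ptrdiff_t VTABLE;
typedef struct { const VTABLE *vptr; long a; } A;

enum {
  INDEX(A_dtor) = FIRST_VFUNC,
  INDEX(A_f)    = AFTER_VFUNC(A_dtor)
};
typedef void (TYPE(A_dtor))(A *);
typedef void (TYPE(A_f))(const A *);

static int f_calls;                     /* trace, for illustration only */

static void A_dtor(A *this)       { this->a = 0; }
static void A_f   (const A *this) { (void)this; f_calls++; }

/* Function slots only; filled at run time to avoid constant-expression issues */
static VTABLE A_vfuncs[2];

/* Equivalent of the C++ virtual call p->f() */
void call_f(const A *p)
{
  ((TYPE(A_f) *)p->vptr[INDEX(A_f)])(p);
}

/* Returns the number of times A_f ran through the table. */
int dispatch_demo(void)
{
  A obj;

  f_calls = 0;
  A_vfuncs[INDEX(A_dtor)] = (VTABLE)A_dtor;
  A_vfuncs[INDEX(A_f)]    = (VTABLE)A_f;
  obj.vptr = A_vfuncs;
  obj.a    = 1;

  call_f(&obj);                                     /* obj.f()  */
  ((TYPE(A_dtor) *)obj.vptr[INDEX(A_dtor)])(&obj);  /* obj.~A() */
  assert(obj.a == 0);
  return f_calls;
}
```

The caller never names A_f directly: it only needs the index and the function type, which is why a derived class can substitute its own function in the same slot.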

2.3 Class member functions


The member functions of class A are easy to translate to C. As a general rule, the member function definitions will stick to the smallest implementation, including constructors and destructors.

C 2.5
void A_ctor(register A* const this, long a)
{
  this->vptr = A_vtbl;                  /* compiler */
  this->a = a;
}
void A_dtor(register A* const this)
{
  this->vptr = A_vtbl;                  /* compiler */
  this->a = 0;
}
void A_f(register const A* const this) {}

As before, the compiler comments mark where the compiler inserts hidden statements. The constructor A_ctor and the destructor A_dtor have to set the pointer vptr to point to the virtual table A_vtbl of class A before user code takes over, to ensure that the polymorphic object pointed to by this is exactly of type A. Indeed, the constructors and the destructor of class A could be invoked on a subobject of type A by the constructors and the destructor of a derived class with a different setting for vptr. These assignments could be skipped by the compiler if it can prove that polymorphism is not used before the function returns.

C++98 allows a class to declare many constructors [class.ctor] but only one destructor [class.dtor]. The citation [class.cdtor] has heavy consequences on the number of virtual tables required per class, since it leaves little freedom other than reassigning the object's vptr pointer on entry to its constructors and destructor.

C++98 [class.ctor]
A constructor is used to initialize objects of its class type. No return type can be specified for it. The address of a constructor shall not be taken. A constructor shall not be virtual or static. A constructor can be invoked for a cv-qualified object. A constructor shall not be cv-qualified. cv-qualifiers are not applied to an object under construction and come into effect once the constructor for the most derived object ends.

C++98 [class.dtor]
A destructor is used to destroy objects of its class type. No parameter or return type can be specified for it. The address of a destructor shall not be taken. A destructor shall not be static. A destructor can be invoked for a cv-qualified object. A destructor shall not be cv-qualified. cv-qualifiers are not applied to an object under destruction and cease to be in effect once the destructor of the most derived object starts.

C++98 [class.cdtor]
Member functions, including virtual functions, can be called during construction or destruction.

When a virtual function is called directly or indirectly from a constructor (including member initializers) or from a destructor, and the object to which the call applies is the object under construction or destruction, the function called is the one defined in the constructor or the destructor's own class or in one of its bases, but not a function overriding it in a class derived from the constructor or destructor's class, or overriding it in one of the other bases of the most derived object.

If the virtual function call uses an explicit class member access and the object expression refers to the object under construction or destruction but its type is neither the constructor or destructor's own class or one of its bases, the result of the call is undefined.
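The effect of [class.cdtor] on vptr can be observed with a small experiment. The sketch below assumes a hypothetical class Y derived non-virtually from A (Y is not part of the DCABBA hierarchy) and virtual tables reduced to a single slot for f; the names Y, vcall_f, trace and cdtor_demo are introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>                     /* ptrdiff_t */
#include <string.h>                     /* strcat */

typedef ptrdiff_t VTABLE;
typedef struct { const VTABLE *vptr; long a; } A;
typedef struct { A base; long y; } Y;   /* hypothetical: struct Y : A */
typedef void F(const A *);

static char trace[8];                   /* which f ran, in call order */

static void A_f(const A *this) { (void)this; strcat(trace, "A"); }
static void Y_f(const A *this) { (void)this; strcat(trace, "Y"); }

static VTABLE A_vtbl[1], Y_vtbl[1];     /* simplified: slot 0 holds f */

static void vcall_f(const A *p) { ((F *)p->vptr[0])(p); }

static void A_ctor(A *this, long a)
{
  this->vptr = A_vtbl;                  /* compiler */
  this->a = a;
  vcall_f(this);                        /* virtual call inside A's ctor */
}
static void A_dtor(A *this)
{
  this->vptr = A_vtbl;                  /* compiler */
  vcall_f(this);                        /* virtual call inside A's dtor */
  this->a = 0;
}
static void Y_ctor(Y *this, long a, long y)
{
  A_ctor(&this->base, a);               /* compiler: base constructed first */
  this->base.vptr = Y_vtbl;             /* compiler */
  this->y = y;
  vcall_f(&this->base);
}
static void Y_dtor(Y *this)
{
  this->base.vptr = Y_vtbl;             /* compiler */
  vcall_f(&this->base);
  this->y = 0;
  A_dtor(&this->base);                  /* compiler: base destroyed last */
}

/* Constructs then destroys a Y and returns the trace of the calls to f. */
const char *cdtor_demo(void)
{
  Y obj;

  trace[0] = '\0';
  A_vtbl[0] = (VTABLE)A_f;
  Y_vtbl[0] = (VTABLE)Y_f;
  Y_ctor(&obj, 1, 2);
  Y_dtor(&obj);
  assert(obj.base.vptr == A_vtbl);      /* last assignment made by A_dtor */
  return trace;
}
```

While A_ctor or A_dtor runs, the object behaves exactly as an A, even though it is a subobject of a Y; this is only possible because each constructor and destructor reassigns vptr on entry.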

2.4 Class virtual table


The virtual table is the key hidden data structure of a polymorphic class, shared by all its instances through their vptr pointer. Virtual tables hold different data types in their slots, and every new declaration of a virtual function or virtual base makes them grow in one or two directions, as shown by the arrows on the right in figure 2.1. Slots tagged as offsets hold a signed integral value which must be added to the this pointer to find either the most derived top object (i.e. negative offset in slot [-2]) or a virtual base subobject (i.e. positive or negative offsets in slots [-3] to [-n-2]). The type_info slot holds the address of the runtime type information object of the class, required by typeid() and dynamic_cast<>(). The remaining slots with positive indexes hold the addresses of the member functions declared virtual or the addresses of equivalent thunks (see TODO REFERENCE TO THUNK DEFINITION).

Figure 2.1 Virtual table layout with slots of type VTABLE.

If we apply this virtual table layout to class A (see C++ 2.1), we obtain the table described in C 2.6.

C 2.6
static const VTABLE A_vtbl_[] = {
  0,                                    /* A->A       */
  (VTABLE)A_info,                       /* type_info  */
  (VTABLE)A_dtor,                       /* dtor       */
  (VTABLE)A_f                           /* f          */
};
static const VTABLE* const A_vtbl = A_vtbl_ + VSTART(NO_VBASE);

Class A has no virtual base, so its virtual table A_vtbl_ starts with the offset to top, which is always zero in the primary virtual table of a class. The comment A->A specifies that this offset must be added to this to convert the pointee from type A to type A (trivial case). Then follow the address of the class type_info and the addresses of the two member functions declared virtual.

The pointer A_vtbl provides direct access to the correct entry point of the virtual table and avoids propagating the VSTART() adjustment into constructors and destructors. This slot adjustment depends on the number of virtual bases, that is NO_VBASE in the case of A.
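Once vptr points to the adjusted entry, the hidden slots are reached with negative indexes. The sketch below assumes that VSTART(n) skips n virtual base offsets plus the offset to top and the type_info slot; this definition and the value of NO_VBASE are guesses, since object_model.h is not shown here. The type_info is reduced to a name, the table is filled at run time for portability, and A_typeid, A_top and vtbl_demo are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>                     /* ptrdiff_t */

typedef ptrdiff_t VTABLE;
typedef struct { const char *name; } type_info;   /* reduced to the name */

/* Hypothetical definitions; the real ones live in object_model.h */
#define NO_VBASE   0
#define VSTART(n)  ((n) + 2)            /* n vbase offsets + top + type_info */

typedef struct { const VTABLE *vptr; long a; } A;

static const type_info A_info[] = {{ "A" }};
static void A_dtor(A *this)       { this->a = 0; }
static void A_f   (const A *this) { (void)this; }

static VTABLE A_vtbl_[4];
static const VTABLE *A_vtbl = A_vtbl_ + VSTART(NO_VBASE);

static void A_vtbl_init(void)
{
  A_vtbl_[0] = 0;                       /* A->A      : slot [-2] */
  A_vtbl_[1] = (VTABLE)A_info;          /* type_info : slot [-1] */
  A_vtbl_[2] = (VTABLE)A_dtor;          /* dtor      : slot [0]  */
  A_vtbl_[3] = (VTABLE)A_f;             /* f         : slot [1]  */
}

/* Rough equivalent of typeid(*p) */
const type_info *A_typeid(const A *p)
{
  return (const type_info *)p->vptr[-1];
}

/* Address of the most derived object that contains *p */
const void *A_top(const A *p)
{
  return (const char *)p + p->vptr[-2];
}

/* Returns 1 when both hidden slots can be read back from an instance. */
int vtbl_demo(void)
{
  A obj;

  A_vtbl_init();
  obj.vptr = A_vtbl;
  obj.a    = 7;
  assert(A_top(&obj) == (const void *)&obj);   /* offset to top is zero */
  return A_typeid(&obj) == A_info;
}
```

Keeping the function slots at index zero and above means that the index of a virtual function does not change when virtual bases add offset slots on the negative side.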

2.5 Class runtime type information (RTTI)


The last missing piece of the class is its hidden type_info object, which can only be retrieved through the typeid() operator.

C++98 [expr.typeid]
The result of a typeid expression is an lvalue of type const std::type_info. The lifetime of the object referred to by the lvalue extends to the end of the program. The top-level cv-qualifiers of the lvalue expression are always ignored.

When typeid is applied to an expression being an lvalue of a polymorphic type, the result refers to a type_info object representing the type of the most derived object to which the lvalue refers. The expression is evaluated.

When typeid is applied to an expression other than an lvalue of a polymorphic type, the result refers to a type_info object representing the static type of the expression. The expression is not evaluated.

The typeid operator can be applied either to a type or to an expression. It evaluates its operand (at runtime) only if it is an lvalue expression referring to a polymorphic type. This behavior can be compared to the C99 sizeof operator, which evaluates its operand only if it has a dynamic size (i.e. a variable length array).
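The comparison with sizeof can be tried directly. The sketch below requires a C99 compiler with support for variable length arrays (it is not valid C89), and the function name sizeof_demo is hypothetical.

```c
#include <assert.h>
#include <stddef.h>                     /* size_t */

/* Returns the value of n after two sizeof expressions with a side effect. */
int sizeof_demo(void)
{
  int n = 3;
  size_t fixed   = sizeof(n++);         /* static size: n++ is not evaluated */
  size_t dynamic = sizeof(int[n++]);    /* VLA type: n++ is evaluated        */

  assert(fixed == sizeof(int));
  assert(dynamic == 3 * sizeof(int));   /* size computed with the old n      */
  return n;
}
```

Only the second operand is evaluated, in the same way that typeid evaluates its operand only when the answer cannot be deduced from the static type.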

C++98 defines type_info as a polymorphic class (i.e. with a virtual destructor) which is meant to be specialized for specific kinds of types (see the C++ Common ABI [1] for a possible list of specializations).

C++98 [lib.type.info]
The class type_info describes type information generated by the implementation. Objects of this class effectively store a pointer to a name for the type, and an encoded value suitable for comparing two types for equality or collating order. The names, encoding rule, and collating sequence for types are all unspecified and may differ between programs.

In principle, equality and collating can be achieved directly using the address of the class type_info object. We still have to store the class name as a string literal plus some information on the bases (see C 2.7). The types type_info and struct base_info are defined in the header object_model.h.

C 2.7
typedef struct {
  const char             *name;
  const struct base_info *base;
  unsigned                base_cnt;
} type_info;

struct base_info {
  const type_info        *base_type;
  const ptrdiff_t         base_offset;
  const enum { ... }      base_flag;
};

The type struct base_info does not exist in C++. It is only defined for the purpose of the present object model. Its base_flag member holds information about the derivation of the bases, such as public_base, protected_base, private_base and virtual_base, which dynamic_cast<>() requires to behave correctly.
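To give an idea of how these records might be consulted, the sketch below walks the base list of a type_info and reports whether a given class is reachable through public bases only. It is not the dynamic_cast<>() algorithm of this paper: the numeric values of the flags are invented (the enumeration is elided above), base_flag is stored as an int, and the classes Y and Z as well as the function is_public_base are hypothetical.

```c
#include <assert.h>
#include <stddef.h>                     /* ptrdiff_t */

enum {                                  /* hypothetical flag values */
  public_base    = 1,
  protected_base = 2,
  private_base   = 4,
  virtual_base   = 8
};

typedef struct {
  const char             *name;
  const struct base_info *base;
  unsigned                base_cnt;
} type_info;

struct base_info {
  const type_info *base_type;
  ptrdiff_t        base_offset;
  int              base_flag;           /* combination of the flags above */
};

static const type_info A_info[] = {{ "A", 0, 0 }};

/* hypothetical: struct Y : public A */
static const struct base_info Y_bases[] = {{ A_info, 0, public_base }};
static const type_info Y_info[] = {{ "Y", Y_bases, 1 }};

/* hypothetical: struct Z : protected A */
static const struct base_info Z_bases[] = {{ A_info, 0, protected_base }};
static const type_info Z_info[] = {{ "Z", Z_bases, 1 }};

/* Returns 1 if base is type itself or is reachable through public bases. */
int is_public_base(const type_info *type, const type_info *base)
{
  unsigned i;

  if (type == base)                     /* types are compared by address */
    return 1;
  for (i = 0; i < type->base_cnt; i++) {
    const struct base_info *b = &type->base[i];
    if ((b->base_flag & public_base) && is_public_base(b->base_type, base))
      return 1;
  }
  return 0;
}
```

With these tables, a conversion from Y to A would be accepted while a conversion from Z to A would be rejected, which is the kind of decision the access flags are meant to support.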

The content of the type_info object is rather simple since the class A does not have any base.

C 2.8
const type_info A_info[] = {{
  "A",                                  /* class name */
  0, 0                                  /* no base    */
}};

That's it! We have finished implementing the layout of our first class.


3. Another polymorphic class



4. Single virtual inheritance


4.1 Object layout


4.2 Class interface


4.3 Class member functions


4.4 Class virtual table


4.5 Class runtime type information (RTTI)


4.6 Class thunks


Some slots of the virtual table hold the addresses of very small intermediate pieces of code called thunks, which make the required offset adjustment (if any) to the this pointer before calling the corresponding member function (see TODO REF FOR MORE INFO ON THUNK).
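A thunk can be pictured as follows. The sketch deliberately ignores virtual tables and virtual bases and uses a fixed, compile-time offset; with virtual inheritance the adjustment would have to be read from the virtual table instead. The classes P, Q and R and the functions R_g, R_g_thunk and thunk_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>                     /* offsetof */

typedef struct { long p; } P;           /* hypothetical first base  */
typedef struct { long q; } Q;           /* hypothetical second base */
typedef struct {                        /* hypothetical: struct R : P, Q */
  P    P_part;
  Q    Q_part;
  long r;
} R;

/* R's version of a function first declared in Q; it expects an R. */
static long R_g(const R *this)
{
  return this->P_part.p + this->Q_part.q + this->r;
}

/* Thunk: called with this pointing to the Q subobject of an R, it moves
   this back to the start of the R object before calling the real function. */
static long R_g_thunk(const Q *this)
{
  return R_g((const R *)((const char *)this - offsetof(R, Q_part)));
}

/* Calls R_g through the thunk, starting from a pointer to the Q subobject. */
long thunk_demo(void)
{
  R obj;
  const Q *q;

  obj.P_part.p = 1;
  obj.Q_part.q = 2;
  obj.r        = 3;
  q = &obj.Q_part;                      /* what a caller holding a Q* sees */
  assert((const void *)q != (const void *)&obj);
  return R_g_thunk(q);
}
```

The caller only knows about Q, so it passes the address of the Q subobject; the thunk is the piece of code that knows where this subobject sits inside R.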


5. Multiple virtual inheritance



6. Inheritance and subtyping



7. Monomorphic class



8. Putting all together



9. Dynamic casting



96. Protocols


In type_info: protocol_cnt + protocol
In protocol_info: same as base_info, but the offset is within the vtable.
protocol_cast<>().
Declaration
Pointer like invocation +


97. Naming convention


Thunk


98. File source code


Files


99. References


If you want to know more about the existing C++ object models you can read the following references:

  1. C++ ABI Summary
  2. A Common Vendor ABI for C++ (PDF)
    Nathan Sidwell, ACCU 2003.
  3. Empirical Study of Object-Layout Strategies and Optimization Techniques
    Natalie Eckel, Research Thesis
  4. Design and Evolution of C++
    Bjarne Stroustrup, Addison-Wesley 1996.
  5. The C++ Programming Language
    Bjarne Stroustrup, Addison-Wesley 2000.
  6. Inside the C++ Object Model
    Stanley B. Lippman, Addison-Wesley 1996.

100. Reminder


  1. Subtyping and the Liskov substitution principle + paper ref
  2. Covariant, contravariant types
  3. Cast (explicit) and type coercion (implicit)
  4. Exception.
  5. pure virtual and default init.