The present paper explains in detail a possible implementation of the C++ object model based on the proposal of the C++ Common ABI [1] and the paper of Nathan Sidwell [2]. It does not claim to explain the object model of your favorite C++ compiler, which may differ from the one presented here. Some other C++ object models have been considered, but the selected object model has some advantages over the others:
This paper uses a color convention to make contextual information more visible:
Since these colors should not appear on a printed copy, all boxes have been labeled by context and follow the section numbering, except the C++98 citations, which follow the Standard's tags.
The C++ class hierarchy of figure 1.1 will be built step by step throughout this paper to show what the underlying object layouts and the hidden data structures look like. To allow a better understanding of the object model, the C++ code of each class will be translated into equivalent C89 code with the help of some constants and macros defined in the file object_model.h. For clarity, this equivalent code will be split into interface files (e.g. X_layout.h) and implementation files (e.g. X_layout.c), a choice which does not reflect any rule or constraint of C++.
DCABBA
The class hierarchy of DCABBA is summarized in the sample code C++ 1.1. Each inheritance level of this hierarchy will show a new facet of the object model.
struct A { ... };
struct B { ... };
struct C : virtual B { ... };
struct AB : virtual A, virtual B { ... };
struct BA : virtual B, virtual A { ... };
struct ABBA : protected AB, BA { ... };
struct D { ... };
struct DCABBA : D, C, ABBA { ... };
All the classes involved in the construction of DCABBA are polymorphic except the class D, which is monomorphic (see note 1.1, C++98 [class.virtual] and definition 1.1). The protected inheritance of AB in the definition of class ABBA will introduce the difference between inheritance of implementation and inheritance of type. It will also show how dynamic_cast<>() behaves with access protection.
Etymologically, the adjective polymorphic qualifies objects that can take many (poly) forms (morph). Object-oriented programming languages introduced more subtle terms to distinguish the various forms of polymorphism supported by object-oriented concepts.
It is worthwhile to see what C++98 says about polymorphic classes and virtual functions.
::
suppresses the virtual call mechanism.
We need to generalize this definition to allow further extensions specific to the object model because, as we will see later, a class that inherits a virtual base is also a polymorphic class, even if its definition does not contain any virtual function.
The simple polymorphic class A lets us introduce the basics of the layout of polymorphic objects.
struct A {
  A(long a_) : a(a_) {}
  virtual ~A() { a = 0; }
  virtual void f() const {}
protected:
  long a;
};
The class A defines a non-trivial constructor, a virtual destructor, a constant virtual function f and a protected data member a. A class definition involves a few representative elements, and one of the most important of them is the object layout. Object size must remain as small as possible, since a large number of instances may exist simultaneously at runtime.
From C++98
[class.virtual] and definition
1.1, we know that if a class has a virtual function all its
instances must hold a hidden pointer somewhere. The location of this
pointer must be relative to the this
pointer since the
latter may point to a subobject included in a larger object of derived
type. So it is time for the first ABI rule.
this
pointer.
According to this rule, the layout of instances of A
should be:
typedef struct {
  const VTABLE *vptr; /* compiler */
  long          a;    /* A member */
} A;
As expected, the data member a is present in the object layout of A. But it must be placed after the special hidden pointer vptr required by definition 1.1, which must be the first member to respect rule 2.1. The identifier (if any) of this pointer is implementation defined, but we often find identifiers like __vtbl or __vptr in the literature. To enhance the readability of the code, the double-underscore prefix __ reserved for implementation-specific identifiers will be omitted. On the other hand, hidden members or statements will be followed by the compiler comment on the same line.
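Rule 2.1 can be checked mechanically in C. The sketch below reuses the layout of A given above; the helpers vptr_of, layout_respects_rule and vptr_of_finds_table are hypothetical names introduced only for this illustration and are not part of object_model.h.

```c
#include <stddef.h>

typedef ptrdiff_t VTABLE;          /* slot type used by the paper */

typedef struct {
  const VTABLE *vptr;              /* compiler */
  long          a;                 /* A member */
} A;

/* Hypothetical helper: read the hidden pointer of any polymorphic
   (sub)object without knowing its type. This is valid only because
   the pointer is stored at offset 0 of the object. */
static const VTABLE *vptr_of(const void *obj)
{
  return *(const VTABLE *const *)obj;
}

/* Hypothetical check: vptr comes first, user members come after it. */
static int layout_respects_rule(void)
{
  return offsetof(A, vptr) == 0
      && offsetof(A, a) >= sizeof(const VTABLE *);
}

/* Returns 1 when vptr_of() finds the table stored in an A object. */
static int vptr_of_finds_table(void)
{
  static const VTABLE table[1] = { 0 };
  A obj;
  obj.vptr = table;
  obj.a = 42;
  return vptr_of(&obj) == table;
}
```

Because the pointer sits at a fixed offset from this, code that only holds a pointer to a subobject can still reach the virtual table without knowing the complete type.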
The type pointed to by vptr (i.e. the slots of the virtual tables) is implementation defined. For the purposes of the object model layout, we need a C89 integral type able to hold a signed offset, a pointer to data and a pointer to function, since all these kinds of data will be stored in the virtual tables.
typedef ptrdiff_t VTABLE;
If this type does not fit the implementation requirements, the compiler is free to use a more appropriate integral type or a union type. In C99, intptr_t would have been more appropriate.
Apart from the object layout, the class must also export the
declaration of its member functions, virtual or not, and its
type_info
(to be discussed later in this section).
void A_ctor(A *this, long a);
void A_dtor(A *this);
void A_f   (const A *this);

extern const type_info A_info[];
To simulate class membership in C89, member functions will be prefixed by the class name. Data members will be prefixed by the class name only if they are part of a subobject. Given a class X with a data member a and a member function f, the C89 equivalent names will be respectively a (or X_a if in a subobject) and X_f. The prefix X is also a reminder of the object type pointed to by this.
The first hidden argument of a non-static member function is always the this pointer, which points to the current object. Inside a member function, this can be used explicitly to disambiguate access to a member which shares its identifier with another visible symbol. It is also used implicitly each time a data member or a virtual function is accessed.
Inside a member function of a class X which declares a member a and a virtual function f:
a = 0 is equivalent to this->a = 0.
f() is equivalent to this->f(this).
The virtual invocation f() is not equivalent to this->X::f(this), which is equivalent to the static invocation X::f(this) (see C++98 [class.virtual]). If this points to a subobject included in a larger object of a derived type and if this derived type overrides f, the two invocations differ.
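The difference between the two invocations can be sketched in C as follows. Everything here is an assumption made for the illustration: the derived class Y, the generic slot type VFUNC, the single-slot tables, and the fact that f returns an int tag (in the paper it returns nothing) so that the selected function can be observed.

```c
typedef void (*VFUNC)(void);             /* generic slot type (assumption) */

typedef struct { const VFUNC *vptr; long a; } X;
typedef struct { X base; long b; } Y;    /* hypothetical class derived from X */

typedef int (*X_f_type)(const X *);      /* real signature stored in slot 0 */

static int X_f(const X *this) { (void)this; return 1; }
static int Y_f(const X *this) { (void)this; return 2; }  /* override of f */

static const VFUNC X_vtbl[] = { (VFUNC)X_f };
static const VFUNC Y_vtbl[] = { (VFUNC)Y_f };

/* f(), i.e. this->f(this): the callee is fetched from the virtual table. */
static int virtual_call(const X *this)
{
  return ((X_f_type)this->vptr[0])(this);
}

/* this->X::f(this): the callee is fixed at compile time. */
static int static_call(const X *this)
{
  return X_f(this);
}

/* Invoke f on the X subobject of a Y object, virtually or statically. */
static int demo(int use_virtual)
{
  Y y;
  y.base.vptr = Y_vtbl;                  /* compiler */
  y.base.a = 0;
  y.b = 0;
  return use_virtual ? virtual_call(&y.base) : static_call(&y.base);
}
```

With this setup, demo(1) reaches the override through the table while demo(0) always runs X_f, whatever the dynamic type of the object is.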
If the member function is const-qualified like
A::f()
in C++ 2.1,
then this
is a pointer to a constant object. It is not
possible to assign a value to this
nor to take its
address. To simulate these points in C89, this
will be register
and const
qualified in the
member function definition (see C 2.5). Some people claim that it would have been
cleaner to define this
as a reference.
this
is a non-lvalue expression whose value is the
address of the object for which the function is called. The type of
this
in a member function of class X
is
X*
. If the member function is declared
const
, the type of this
is const
X*
.
Finally, to simplify the use of the helper macros defined in object_model.h, some extra constants and types are defined. They represent a small part of the implicit knowledge that the C89 compiler lacks to behave like the C++ compiler.
enum {
  INDEX(A_dtor) = FIRST_VFUNC,
  INDEX(A_f   ) = AFTER_VFUNC(A_dtor)
};

typedef void (TYPE(A_dtor))(A*);
typedef void (TYPE(A_f   ))(const A*);
The enumerated constants are the indexes of the virtual functions
inside the virtual table of the class. FIRST_VFUNC
is the
index of the first virtual function and AFTER_VFUNC()
returns the index of the virtual function just after the operand
(relative increment).
These indexes are invariant under inheritance and therefore always known by the compiler from the class definition. For example, any derived class of A will use the same index INDEX(A_f) to find its own definition of the virtual function f inside its virtual table when reinterpreted as being of class A.
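The invariance of the indexes can be illustrated with a reduced sketch. The macro definitions below are hypothetical stand-ins, since the content of object_model.h is not shown here; the derived class E, its extra virtual function g and the argument-less functions returning an int tag are also assumptions made to keep the example short.

```c
typedef int (*VFUNC)(void);              /* simplified slot type (assumption) */

/* Hypothetical stand-ins for the macros of object_model.h */
#define FIRST_VFUNC     0
#define INDEX(f)        f##_index
#define AFTER_VFUNC(f)  (INDEX(f) + 1)

enum {
  INDEX(A_dtor) = FIRST_VFUNC,
  INDEX(A_f)    = AFTER_VFUNC(A_dtor)
};

static int A_dtor(void) { return 10; }
static int A_f(void)    { return 11; }

/* Hypothetical derived class E: overrides both functions and adds g. */
static int E_dtor(void) { return 20; }
static int E_f(void)    { return 21; }
static int E_g(void)    { return 22; }

static const VFUNC A_vtbl[] = { A_dtor, A_f };
static const VFUNC E_vtbl[] = { E_dtor, E_f, E_g };  /* A's slots come first */

/* The caller only knows class A: the same index works for both tables. */
static int call_f(const VFUNC *vptr)
{
  return vptr[INDEX(A_f)]();
}
```

New virtual functions of a derived class are appended after the inherited slots, which is why code compiled against A alone keeps working on tables of derived classes.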
The new type definitions make it possible to reinterpret the slots of the virtual table as pointers to functions.
The member functions of class A are easy to translate to C. As a general rule, the member function definitions will stick to the smallest implementation, including constructors and destructors.
void A_ctor(register A* const this, long a)
{
  this->vptr = A_vtbl; /* compiler */
  this->a = a;
}

void A_dtor(register A* const this)
{
  this->vptr = A_vtbl; /* compiler */
  this->a = 0;
}

void A_f(register const A* const this)
{
}
As before, the compiler comments mark where the compiler inserts hidden statements. The constructor A_ctor and the destructor A_dtor have to set the pointer vptr to point to the virtual table A_vtbl of class A before user code takes over, to ensure that the polymorphic object pointed to by this is exactly of type A. Indeed, the constructors and the destructor of class A could be invoked on a subobject of type A by the constructors and the destructor of a derived class with a different setting for vptr. These assignments could be skipped by the compiler if it can prove that polymorphism is not used before the function returns.
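The effect of these hidden assignments can be observed with a small sketch. The derived class E, the virtual function id, the member id_seen_in_ctor and the simplified slot type are all assumptions introduced for this illustration; only the pattern of assigning vptr on entry to the constructor comes from the text above.

```c
typedef int (*VFUNC)(const void *this);  /* simplified slot type (assumption) */

typedef struct { const VFUNC *vptr; long a; int id_seen_in_ctor; } A;
typedef struct { A base; } E;            /* hypothetical class derived from A */

static int A_id(const void *this) { (void)this; return 'A'; }
static int E_id(const void *this) { (void)this; return 'E'; }  /* override */

static const VFUNC A_vtbl[] = { A_id };
static const VFUNC E_vtbl[] = { E_id };

static void A_ctor(A *this, long a)
{
  this->vptr = A_vtbl;                          /* compiler */
  this->a = a;
  this->id_seen_in_ctor = this->vptr[0](this);  /* virtual call in the ctor */
}

static void E_ctor(E *this, long a)
{
  A_ctor(&this->base, a);                       /* compiler: base first */
  this->base.vptr = E_vtbl;                     /* compiler */
}

/* Which function did the virtual call inside A_ctor select? */
static int id_during_base_ctor(void)
{
  E e;
  E_ctor(&e, 1);
  return e.base.id_seen_in_ctor;
}

/* Which function does the same virtual call select once E is built? */
static int id_after_ctor(void)
{
  E e;
  E_ctor(&e, 1);
  return e.base.vptr[0](&e);
}
```

While A_ctor runs, the object behaves exactly as an A, so the call selects A_id; only after E_ctor has reassigned vptr does the same call select E_id.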
C++98 allows a class to declare many constructors [class.ctor] but only one destructor [class.dtor]. The citation [class.cdtor] has heavy consequences on the number of virtual tables required per class, since it does not give much freedom other than reassigning the object's vptr pointer on entry to its constructors and destructors.
When a virtual function is called directly or indirectly from a constructor (including member initializers) or from a destructor, and the object to which the call applies is the object under construction or destruction, the function called is the one defined in the constructor or the destructor's own class or in one of its bases, but not a function overriding it in a class derived from the constructor or destructor's class, or overriding it in one of the other bases of the most derived object.
If the virtual function call uses an explicit class member access and the object expression refers to the object under construction or destruction but its type is neither the constructor or destructor's own class or one of its bases, the result of the call is undefined.
The virtual table is the key hidden data structure of a polymorphic class, shared by all its instances through their vptr pointer. Virtual tables hold different data types in their slots, and every new declaration of a virtual function or virtual base makes them grow in one or two directions, as shown by the arrows on the right of figure 2.1. Slots tagged as offsets hold a signed integral value which must be added to the this pointer to find either the most derived top object (i.e. negative offset in slot [-2]) or a virtual base subobject (i.e. positive or negative offsets in slots [-3] to [-n-2]). The type_info slot holds the address of the runtime type information object of the class required by typeid() and dynamic_cast<>(). The remaining slots with non-negative indexes hold the addresses of the member functions declared virtual or the addresses of equivalent thunks (see TODO REFERENCE TO THUNK DEFINITION).
VTABLE
.
If we apply this virtual table layout to class A (see C++ 2.1), we obtain the table described in C 2.6.
static const VTABLE A_vtbl_[] = {
  0,               /* A->A      */
  (VTABLE)A_info,  /* type_info */
  (VTABLE)A_dtor,  /* dtor      */
  (VTABLE)A_f      /* f         */
};

static const VTABLE* const A_vtbl = A_vtbl_ + VSTART(NO_VBASE);
The class A has no virtual base, so its virtual table A_vtbl_ starts with the offset to top, which is always zero in the primary virtual table of a class. The comment A->A specifies that this offset must be added to this to convert the pointee from type A to type A (trivial case). Then follow the address of the class type_info and the addresses of the two member functions declared virtual.
The pointer A_vtbl provides direct access to the correct entry point of the virtual table and avoids propagating the VSTART() adjustment into constructors and destructors. This slot adjustment depends on the number of virtual bases, that is NO_VBASE in the case of A.
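The negative indexing described above can be sketched as follows. This version uses the union alternative mentioned earlier instead of an integral slot type, and relies on C99 designated initializers; the type SLOT, the reduced type_info, the definition of VSTART(n) as n + 2 and the accessor names are all assumptions chosen to match the slot numbering given in the text, not the actual content of object_model.h.

```c
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; } type_info;   /* reduced (assumption) */

typedef union {                  /* union slot type (assumption) */
  ptrdiff_t   offset;            /* offset to top or to a virtual base */
  const void *data;              /* address of the type_info object    */
  void      (*func)(void);       /* address of a virtual function      */
} SLOT;

#define NO_VBASE   0
#define VSTART(n)  ((n) + 2)     /* hypothetical: n vbase offsets + 2 slots */

static void A_dtor(void) {}      /* signatures reduced for the sketch */
static void A_f(void)    {}

static const type_info A_info[] = {{ "A" }};

static const SLOT A_vtbl_[] = {
  { .offset = 0      },          /* [-2] A->A      */
  { .data   = A_info },          /* [-1] type_info */
  { .func   = A_dtor },          /* [ 0] dtor      */
  { .func   = A_f    }           /* [ 1] f         */
};

static const SLOT *const A_vtbl = A_vtbl_ + VSTART(NO_VBASE);

/* Offset to add to this to reach the most derived object. */
static ptrdiff_t offset_to_top(const SLOT *vptr)
{
  return vptr[-2].offset;
}

/* Returns 1 when the type_info slot describes a class called name. */
static int type_is_named(const SLOT *vptr, const char *name)
{
  const type_info *info = vptr[-1].data;
  return strcmp(info->name, name) == 0;
}
```

Since vptr already points past the offset and type_info slots, virtual functions are reached with small non-negative indexes and the bookkeeping data with small negative ones, independently of how many virtual bases the class has.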
The last missing piece of the class is its type_info
hidden object which can only be retrieved through the
typeid()
operator.
typeid
expression is an
lvalue of type const std::type_info
. The lifetime of the
object referred by the lvalue extends to the end of the program. The
top level cv-qualifier of the lvalue expression are always
ignored.
When typeid
is applied to an expression being an lvalue
of a polymorphic type, the result refers to a
type_info
object representing the type of the most
derived object to which the lvalue refers. The expression is
evaluated.
When typeid
is applied to an expression other than an
lvalue of a polymorphic type, the result refers to a
type_info
object representing the static type of
the expression. The expression is not evaluated.
The typeid operator can be applied either to a type or to an expression. It evaluates its operand (at runtime) only if it is an lvalue expression referring to a polymorphic type. This behavior can be compared to the C99 sizeof operator, which evaluates its operand only if it refers to a dynamic size (i.e. a variable length array).
C++98 defines type_info as a polymorphic class (i.e. with a virtual destructor) which should be specialized for specific kinds of types (see the C++ Common ABI [1] for a possible list of specializations).
type_info
describes type
information generated by the implementation. Objects of this class
effectively store a pointer to a name for the type, and an encoded
value suitable for comparing two types for equality or collating
order. The names, encoding rule, and collating sequence for types are
all unspecified and may differ between programs.
In principle, equality and collating can be achieved directly using the address of the class type_info object. We still have to store the class name as a string literal plus some information on the bases (see C 2.7). The types type_info and struct base_info are defined in the header object_model.h.
typedef struct {
  const char             *name;
  const struct base_info *base;
  unsigned                base_cnt;
} type_info;

struct base_info {
  const type_info *base_type;
  const ptrdiff_t  base_offset;
  const enum { ... } base_flag;
};
The type struct base_info
does not exist in C++. It is
only defined for the purpose of the present object model. Its
base_flag
member holds information about the derivation
of the bases like public_base
,
protected_base
, private_base
and
virtual_base
which are required by
dynamic_cast
to behave correctly.
The content of the type_info
object is rather simple
since the class A
does not have any base.
const type_info A_info[] = {{
  "A",  /* class name */
  0, 0  /* no base    */
}};
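Equality and collating through the address of the type_info object can be sketched as below. The functions type_info_equal and type_info_before and the second object B_info are hypothetical names added for the illustration. The ordering goes through uintptr_t (C99), an assumption made because comparing pointers to unrelated objects with < is undefined in C.

```c
#include <stdint.h>

struct base_info;                /* left incomplete, not needed here */

typedef struct {
  const char             *name;
  const struct base_info *base;
  unsigned                base_cnt;
} type_info;

static const type_info A_info[] = {{ "A", 0, 0 }};
static const type_info B_info[] = {{ "B", 0, 0 }};  /* hypothetical class B */

/* Two types are equal when they share the same type_info object. */
static int type_info_equal(const type_info *x, const type_info *y)
{
  return x == y;
}

/* Collating order: any total order will do, here the address order. */
static int type_info_before(const type_info *x, const type_info *y)
{
  return (uintptr_t)x < (uintptr_t)y;
}
```

This only works if each class has a single type_info object in the whole program, which is why the resulting order is unspecified and may differ between programs.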
That's it! We have finished implementing the layout of our first class.
the addresses of some very small intermediate pieces of code called thunks, which make the required offset adjustment (if any) to the this pointer before calling the corresponding member function (see TODO REF FOR MORE INFO ON THUNK).
In type_info: protocol_cnt + protocol
In protocol_info: same as base_info, but the offset is within the vtable.
protocol_cast<>().
Declaration
Pointer like invocation +
Files
If you want to know more about the existing C++ object models you can read the following references: