Safe Template Library: what is it, and why is it needed?
C++ is a very good language, but if applied naively it can lead to big problems because of latent,
hard-to-find "fatal" errors. Technically, these occur as errors in manipulating raw pointers and free memory.
The fundamental reason is that the concept of object-oriented programming, as usually adopted,
is oriented to the classification of objects but not to the classification of relations between objects.
If the relations are not classified properly, copying and deleting objects can lead to their
disintegration and destruction.
Fortunately, C++ templates allow these problems to be solved by native means of C++.
The relations are conveniently classified as either ownership of an object or a reference to an
"alien" object.
Ownership assumes "deep copy semantics", including deep deletion.
A reference assumes that the pointer can be deleted while the object continues to exist.
The referenced object can also be deleted; dereferencing a pointer to a deleted object then triggers
an error (as a rule).
An "owned" object is always allocated in heap memory.
A referenced object can be allocated in any memory.
Deleting a reference could also trigger deletion of the object when the last reference to it is
deleted, if the pointer is programmed to do so, but this option is rarely used.
Technically, this can be implemented through "smart" pointers, which are non-trivial templates.
Although the notion of smart pointers was known at the time this library was written, no available
package provided the two corresponding kinds of pointers.
Sets of homogeneous objects are conveniently handled through the technique known as template
containers. Individual objects of variable type can be safely handled through smart pointers.
The mixed case of heterogeneous sets can be handled by combining these two techniques.
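As an illustration of the mixed case, here is a minimal sketch in standard C++ (C++14), not SafeTL code: a heterogeneous set stored in a std::vector of std::unique_ptr, whose copy constructor deep-copies every element through a virtual clone() function. The names Item, Wire, Tube, ItemSet and clone() are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical polymorphic element type.
struct Item {
    virtual ~Item() = default;
    virtual std::unique_ptr<Item> clone() const = 0;  // deep copy of *this
    virtual std::string name() const = 0;
};

struct Wire : Item {
    std::unique_ptr<Item> clone() const override { return std::make_unique<Wire>(*this); }
    std::string name() const override { return "wire"; }
};

struct Tube : Item {
    std::unique_ptr<Item> clone() const override { return std::make_unique<Tube>(*this); }
    std::string name() const override { return "tube"; }
};

// A container (for the set) combined with smart pointers (for the variable
// element type): copying the set copies every element, deleting it deletes them.
class ItemSet {
public:
    ItemSet() = default;
    ItemSet(const ItemSet& other) {
        items_.reserve(other.items_.size());
        for (const auto& p : other.items_) items_.push_back(p->clone());
    }
    ItemSet& operator=(const ItemSet& other) {
        ItemSet tmp(other);       // copy first, so self-assignment is harmless
        items_.swap(tmp.items_);  // the old elements are destroyed with tmp
        return *this;
    }
    ItemSet(ItemSet&&) = default;
    ItemSet& operator=(ItemSet&&) = default;

    void add(std::unique_ptr<Item> item) { items_.push_back(std::move(item)); }
    std::size_t size() const { return items_.size(); }
    const Item& operator[](std::size_t i) const {
        assert(i < items_.size());
        return *items_[i];
    }

private:
    std::vector<std::unique_ptr<Item>> items_;
};
```

Copying an ItemSet yields independent elements of the same dynamic types, which is the "deep copy semantics" of ownership applied to a whole heterogeneous set.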
This "safe template library" has many other useful features,
for instance printing of output lists with indentation
reflecting class aggregation, containers and loops, convenient
assertion checks, etc.
There is a semi-joking saying, known to experienced programmers, that "every program has an error".
This is especially true for C++ programs written with raw pointers
and direct manipulation of free memory.
This is frequently visible even to the naked eye.
Just take any large program whose source code is publicly available, open the first class, and check the following:
1. Is there a non-private assignment operator? (In general it is unimportant whether this operator is ever called in the
current version of the program: if it exists, it can be called after the next source code modification, either by the
original author, by you, or by a third party. But you could also look at whether and how it is already called, and thus
whether this is going to be a sleeping or a real error.)
2. Are all class data members (except those which should not be copied) listed in it, whether or not they are pointers?
3. Is there a defense against self-assignment? If not, can self-assignment lead to bad effects such as loss of parts?
4. Are there raw pointers among the data members? If yes, are the pointee objects deleted in the destructor? If yes,
where and how are they allocated, or are their addresses just passed in?
5. Does the destructor check the pointers for NULL? If not, are the pointers always initialized with addresses of valid
objects? If yes, are they always initialized with addresses of valid objects or NULLs?
6. Does the assignment operator delete the old objects similarly to the destructor (with or without NULL checks), or at
least check for their absence?
7. Does the destructor delete the objects while the assignment operator copies only their addresses, not the addressed
objects (thus leaving dangling pointers after the first copy is deleted)?
And so on.
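For comparison, here is a minimal sketch of what a hand-written class with a raw owning pointer has to look like to pass the checks above: every data member copied, harmless self-assignment, the old pointee deleted exactly once, and a deep rather than shallow copy. The class Track is hypothetical, not taken from any real program, and the copy-and-swap idiom it uses is one standard way to meet the checklist, not the SafeTL approach.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class owning one heap object through a raw pointer.
class Track {
public:
    Track(const std::string& label, double weight)
        : label_(new std::string(label)), weight_(weight) {}

    // Deep copy: the new object gets its own pointee, not a shared address.
    Track(const Track& other)
        : label_(new std::string(*other.label_)), weight_(other.weight_) {}

    // Copy-and-swap: every data member is copied, self-assignment is
    // harmless, and the old pointee is deleted exactly once (by tmp).
    Track& operator=(const Track& other) {
        Track tmp(other);
        std::swap(label_, tmp.label_);
        std::swap(weight_, tmp.weight_);
        return *this;
    }

    ~Track() { delete label_; }  // label_ always points to a valid object

    const std::string& label() const {
        assert(label_ != nullptr);
        return *label_;
    }
    void set_label(const std::string& s) { *label_ = s; }
    double weight() const { return weight_; }

private:
    std::string* label_;  // owned; never NULL, never shared between copies
    double weight_;       // not a pointer, but it must be copied too
};
```

Even this small class needs three carefully coordinated member functions; forgetting any one of them produces exactly the "sleeping" errors described here.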
Even such elementary tests, performed just by looking into the source code of someone else's program,
reveal "sleeping" or potential errors in roughly every second or third class of large publicly available programs.
These errors can be called "sleeping" because they are present in the source code, but they are not
necessarily triggered along the current paths of computation.
These conclusions are well confirmed by automated tests; see one of the papers in the bibliography
of the preprint of my SIGPLAN paper referenced below. Although most of the errors we can find are
"sleeping", they eventually manifest themselves, as shown by running these programs under non-standard
conditions, with memory-leak detectors, or after slight modifications.
Using SafeTL allows one to avoid all these problems.
Note that SafeTL is a part of wcpplib, which serves as an auxiliary library for the HEED
program (as well as for many other research projects which the author developed in 1999-2007).
SafeTL does not contain anything HEED-specific, and it contains no physics or mathematics at all.
So it is just a general-purpose library.
The ideas of safe pointers in SafeTL were first briefly described in an appendix of a preprint
issued in 2001. The appendix is short, and it is reproduced below as a good introduction.
A more complete (but still not absolutely complete) form was published
in ACM SIGPLAN Notices, Vol. 42 (2007), No. 4 (April), Pages 23-31;
see its preprint with the shortened title "Raw pointers considered harmful".
Below, the chapter from the appendix of the old preprint (2001) is reproduced.
The library continues to be developed, so some newer remarks and slight corrections derived
from practical experience are placed in brackets "<<>>".
Safe Use of Pointers
(chapter from preprint issued in 2001)
Pointers play the principal role in object-oriented programming in C++. At the same time they represent
the most unreliable element of the language. Manipulations with pointers do not automatically force
adequate manipulations with the addressed objects, and vice versa. If a pointer is used to establish certain
relations between objects, its inertness can eventually corrupt the program. However, it turns out that the
logic of these relationships almost always falls into one of two categories, and both of them can be defined
and automated by well-protected template and regular classes. Although the use of such classes gives the
computer some additional work, it reduces programming time and provides a certain scalability and
reusability of the program, and hence seems reasonable.
In short, pointers support relations between objects in which one object can access or use another object,
but the latter is not a physical or logical component of the former.
Another application of pointers is the support of polymorphism, when one object accesses another one whose
type is not completely determined (we mean handling derived types with virtual functions; the term polymorphism
is sometimes used for other purposes, but here we use the meaning suggested in [B. Stroustrup. The C++
Programming Language.]). What is missing is support of polymorphism for logical components, although such a
relation occurs very frequently.
Moreover, even the supported relation, the reference to an alien object, is not supported completely and is
prone to programming errors, since deleting the addressed object does not annul or clear the pointer and does
not deny further access through it. Attempts to simulate the relation of logical inclusion frequently lead to
errors for the same reason. In general, any regular pointer appearing as a class member of an application class
(except in well-debugged auxiliary library classes) means a programming error, either now or in the future.
This makes the result of the computation random, depending on whether the hardware detects a
"segmentation fault" or on what happens to be found at the place of the deleted object.
This motivates substituting the regular pointers, when they are used as class members, by objects of template
classes. Since we have only two main types of relations, the reference to an alien object and the logical inclusion,
we need just two base templates simulating these relations. As follows from the discussion above, the first one should
just mimic the regular pointer and protect it from the errors arising from careless manipulations with the addressed object.
This type is called the protected pointer and is denoted by "ProtPtr<>". <<Practice indicated that this name, and also
"AutoCont" (see below), is not expressive enough. Today I use the more pronounced names "PassivePtr<>" and
"ActivePtr<>".>> The addressed object should know the addresses of all protected pointers to it and clear them
at its deletion. <<Practice indicated that this is neither efficient nor necessary. The addressed object can simply
provide the address of an intermediate reference-counter object, through which all references are made and which
knows whether the addressed object exists or not.>> As an optional feature, it can clear them when it is replaced by
another object through the assignment operator. Technically, the class of the addressed object should be derived from
a special base class called "RegProtPtr", "Register of Protected Pointers" <<Today "RegPassivePtr">>.
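The improved scheme from the bracketed remark, an intermediate reference-counter object through which all references go, can be sketched as follows. This is a minimal illustration under stated assumptions, not SafeTL's actual code: ControlBlock, RegisteredSketch and PassivePtrSketch are hypothetical names that merely play the roles of the counter, RegPassivePtr and PassivePtr, and the optional clearing of pointers at assignment is not modelled.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical intermediate object shared by the addressed object and all
// passive pointers to it.
struct ControlBlock {
    bool alive = true;       // does the addressed object still exist?
    long pointer_count = 0;  // how many passive pointers use this block
};

// Hypothetical analogue of the "register" base class.
class RegisteredSketch {
public:
    RegisteredSketch() : block_(new ControlBlock) {}
    // A copy is a new object with its own block; pointers to the original
    // do not start pointing to the copy.
    RegisteredSketch(const RegisteredSketch&) : block_(new ControlBlock) {}
    // Assignment changes the value, not the identity: keep our own block.
    RegisteredSketch& operator=(const RegisteredSketch&) { return *this; }
    virtual ~RegisteredSketch() {
        block_->alive = false;                          // tell the pointers
        if (block_->pointer_count == 0) delete block_;  // nobody needs it
    }
    ControlBlock* control_block() const { return block_; }

private:
    ControlBlock* block_;
};

// Hypothetical passive pointer; T must derive from RegisteredSketch.
template <class T>
class PassivePtrSketch {
public:
    PassivePtrSketch() = default;
    explicit PassivePtrSketch(T* obj) { attach(obj, obj ? obj->control_block() : nullptr); }
    PassivePtrSketch(const PassivePtrSketch& other) { attach(other.obj_, other.block_); }
    PassivePtrSketch& operator=(const PassivePtrSketch& other) {
        if (this != &other) {
            release();
            attach(other.obj_, other.block_);
        }
        return *this;
    }
    ~PassivePtrSketch() { release(); }

    // Returns nullptr once the addressed object has been deleted.
    T* get() const { return (block_ && block_->alive) ? obj_ : nullptr; }
    T& operator*() const {
        T* p = get();
        if (!p) throw std::logic_error("passive pointer to a deleted object");
        return *p;
    }
    T* operator->() const { return &**this; }

private:
    void attach(T* obj, ControlBlock* block) {
        obj_ = obj;
        block_ = block;
        if (block_) ++block_->pointer_count;
    }
    void release() {
        if (block_) {
            assert(block_->pointer_count > 0);
            // The last pointer to an already deleted object frees the block.
            if (--block_->pointer_count == 0 && !block_->alive) delete block_;
        }
        obj_ = nullptr;
        block_ = nullptr;
    }
    T* obj_ = nullptr;
    ControlBlock* block_ = nullptr;
};
```

In this sketch the addressed object may live in any memory, including the stack, as the text requires; only the small control block is allocated on the heap, and it survives until both the object and the last pointer are gone.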
The second type should support the logical inclusion of one object into another. If the pointer is copied or
deleted, either together with the object to which it belongs or separately, this action should force the same
operation on the addressed object. In other words, this pointer should control the addressed object.
Technically, such an object should be allocated in free memory.
To provide correct copying of objects of derived classes, each such class should define the same virtual member
function "copy(void)", which merely allocates a new instance of the object, a copy of itself, in free memory.
The controlling pointer is denoted by
"AutoCont<>" <<Today "ActivePtr">>.
The protected pointers do not have significant additional functionality compared with regular pointers,
and perhaps they do not need a special notation in class diagrams.
The controlling pointers need a special notation. They are marked by an open crossed box at the side of the
addressed object. <<These notations were used in the old preprint. Today, in view of UML, other notations
can also be proposed. In particular, I would rather use different notations for regular pointers and for passive
pointers, to indicate when a relation is not automated and not safe (if it is expressed by a regular pointer; I strictly
avoid using them as class members). Note that even the newest UML
[G. Booch, J. Rumbaugh, and I. Jacobson, "The Unified Modeling Language User Guide", Second ed. Addison-Wesley, 2005]
does not recognize these types of relations (polymorphic inclusion and polymorphic reference to an independent
object) and does not provide adequate notations for them.>> Thus, if a connection is marked by a closed circle at
one end and by a closed box at the other end, this means that the object of the class marked by the box is a member
of the object of the class marked by the circle. The same with an open box at the other end means that the object
with the circle knows only the address of the object with the box, but does not control it. The intermediate crossed
open box means that the addressed object is logically a member of the object with the circle and is controlled by it.
In the two latter cases the actual type of the addressed object may be either the one denoted or any type derived from it.
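The controlling pointer and the virtual copy() function described above, together with the point that the actual type of the addressed object may be derived from the declared one, can be illustrated by the following minimal sketch. ActivePtrSketch, Shape, Square and Rectangle are hypothetical names; this is not SafeTL's ActivePtr, only an illustration of the semantics under these assumptions.

```cpp
#include <cassert>
#include <utility>

// Hypothetical hierarchy: every class defines the same virtual copy().
struct Shape {
    virtual ~Shape() = default;
    virtual Shape* copy() const = 0;  // allocate a copy of *this in free memory
    virtual double area() const = 0;
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    Square* copy() const override { return new Square(*this); }
    double area() const override { return side * side; }
};

struct Rectangle : Shape {
    double w, h;
    Rectangle(double w_, double h_) : w(w_), h(h_) {}
    Rectangle* copy() const override { return new Rectangle(*this); }
    double area() const override { return w * h; }
};

// Hypothetical controlling (active) pointer: it owns the addressed object
// and copies or deletes it together with itself.
template <class T>
class ActivePtrSketch {
public:
    ActivePtrSketch() = default;
    explicit ActivePtrSketch(T* owned) : p_(owned) {}
    // Deep copy through the virtual copy(): the dynamic type is preserved.
    ActivePtrSketch(const ActivePtrSketch& other)
        : p_(other.p_ ? other.p_->copy() : nullptr) {}
    ActivePtrSketch& operator=(const ActivePtrSketch& other) {
        ActivePtrSketch tmp(other);  // copy first: safe for self-assignment
        std::swap(p_, tmp.p_);       // the old object is deleted with tmp
        return *this;
    }
    ~ActivePtrSketch() { delete p_; }

    T* get() const { return p_; }
    T& operator*() const { assert(p_ != nullptr); return *p_; }
    T* operator->() const { assert(p_ != nullptr); return p_; }

private:
    T* p_ = nullptr;
};
```

A member of type ActivePtrSketch<Shape> thus behaves like a polymorphic logical component: copying the enclosing object copies a Square as a Square, and deleting it deletes the Square.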
Many additional explanations can be found in the SafeTL manual.
For users of HEED: you do not need to read this separate SafeTL manual. It is formed from the part of the
HEED manual related to SafeTL, automatically extracted from the LaTeX source by a preprocessor, with
some small changes such as a different title and introduction.
The package "safetl" is available via this
link.
The latest version of safetl is dated 11.12.2008.
List of changes:
03.06.2006: Removal of inline void
PassivePtr::pass(...), since it was confusing and not necessary
(and it also had a misprint; thanks to
Yukihiro Kato <katoy@hep.kindai.ac.jp>).
26.09.2006:
1. Removal of the inclusion of deprecated headers from all components.
2. Other small corrections which make compilation possible on Scientific Linux 4.1.
3. Many improvements in wcpplib/safetl/, such as the inclusion of pilfer functions in the active pointer, arrays and lists.
Also improvements of indexing in template arrays in order to increase speed.
4. Some of the components of wcpplib can now be compiled on Windows (XP, possibly others too) by Microsoft Visual Studio C++ (v. 8).
The necessary bat-files for batch compilation are included.
5. Creation of the SafeTL manual, which consists of extracts from the HEED manual made by a preprocessor.
So some parts of these manuals are identical and will be changed synchronously, whereas the introductions are different.
04.10.2006:
1. Inclusion of the INS_CRETURN macro in array indexing.
2. Removal of the default printing of raw addresses of counter objects in PassivePtr. Correction of the test program for BlkArr.
Both changes help to avoid platform-dependent differences in the output of test programs.
3. Adding scripts run_single_test.b and run_single_test_vs.bat
for single test and
19.01.2007:
Many changes accumulated over time in wcpplib.
They are mostly in the matrix package, in order to make it more efficient (avoiding unnecessary assignments and
copying of matrices), and also in safetl, to make it more flexible and hopefully safer.
24.01.2007
A small correction.
16.03.2007
Many accumulated improvements, in particular the inclusion of indexing
through [][]... in multidimensional arrays.
10.04.2007
Slight corrections in accordance with wcpplib
01.08.2007
Many small corrections.
In particular, the macro USE_DELETE_AT_ZERO_COUNT in AbsPtr.h is now switched on by default, because an application was found
very recently for which it possibly makes sense to delete specially marked objects automatically at zero reference count
(one can bypass this by other means, but currently this seems to be the simplest approach).
Note that the whole methodology, in particular, suggests that shared pointers are unnecessary and should not be used. But there is
no rule without exceptions. Just as we occasionally use goto with much benefit, we can perhaps use, in some very special cases, this
deletion, which is a characteristic feature of shared smart pointers. What those cases are can be shown by the following. The whole
methodology is created for and oriented to ordinary programs, that is, transient programs and transient objects. This was never stated
in the texts (for the sake of brevity) but was assumed. If persistent objects appear, something may need to be modified. Persistent
objects and various databases are a very special field, in which discussion and accumulation of experience are still in progress.
Many probably redundant things have been proposed in numerous attempts to marry object-oriented programming and databases. The author
leaves this huge field outside the scope of his research so far. But the application mentioned above has some relation to persistence.
In that application some objects (histograms) are created in a transient program and handled through various means, including active
and passive pointers; their addresses are kept in a list of passive pointers in a special manager class, and this class writes all
"registered" objects to disk. Then it needs to read them back in another program and another interactive session and do something
with them. One could create another manager class with a list of active pointers and assign the objects read from disk to them, or
switch between the two lists somehow. But the easiest solution seen so far is to use the same manager class and the same
passive-pointer list, and to make the reading function assign
given_object.s_allow_del_at_zero_count = 1; which causes this object to be deleted automatically, possibly at the end of the session,
when the list itself is deleted. I remark that this application is very new and not much experience has been accumulated with it.
But it is perhaps better to switch on the corresponding possibility in the default setup. Note again that to really make the object
self-deleting (with this macro switched on), one should assign given_object.s_allow_del_at_zero_count = 1; for this particular object.
So, in general, passive pointers remain passive pointers, and they do not delete anything by default.
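The opt-in self-deletion described in this entry can be sketched as follows. This is a hypothetical illustration, not the code of AbsPtr.h: Registered, Passive, CountBlock and the public flag allow_del_at_zero_count only mimic the roles of RegPassivePtr, PassivePtr, the reference counter and s_allow_del_at_zero_count, and the macro switch itself is not modelled. What it shows is that the last passive pointer deletes the object only if that particular object was explicitly marked.

```cpp
#include <cassert>

// Hypothetical counter block shared by an object and its passive pointers.
struct CountBlock {
    bool alive = true;
    long count = 0;
};

// Hypothetical registered base class with an opt-in flag (off by default).
// Objects that set the flag must have been allocated with new.
class Registered {
public:
    bool allow_del_at_zero_count = false;

    Registered() : block_(new CountBlock) {}
    Registered(const Registered&) = delete;             // kept minimal
    Registered& operator=(const Registered&) = delete;
    virtual ~Registered() {
        block_->alive = false;
        if (block_->count == 0) delete block_;
    }
    CountBlock* block() const { return block_; }

private:
    CountBlock* block_;
};

// Hypothetical passive pointer; assignment omitted for brevity.
template <class T>
class Passive {
public:
    explicit Passive(T* obj) : obj_(obj), block_(nullptr) {
        assert(obj != nullptr);
        block_ = obj->block();
        ++block_->count;
    }
    Passive(const Passive& o) : obj_(o.obj_), block_(o.block_) { ++block_->count; }
    Passive& operator=(const Passive&) = delete;
    ~Passive() {
        if (--block_->count > 0) return;
        if (!block_->alive) {
            delete block_;  // object already gone: the last pointer frees the block
        } else if (obj_->allow_del_at_zero_count) {
            delete obj_;    // opt-in: the object (and its block) die with the last pointer
        }
        // Otherwise passive pointers stay passive and delete nothing.
    }
    T* get() const { return block_->alive ? obj_ : nullptr; }

private:
    T* obj_;
    CountBlock* block_;
};
```

Keeping the flag per object, rather than per pointer type, matches the entry's point that switching the feature on globally changes nothing until a specific object is marked.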
17.08.2007 Added two pairs of source files which had
become required because of recent developments but were not included in
the distributed packages.
19.09.2007 Added an option in PassivePtr which allows the control variables
to be handled through bit operations. It was expected to provide the best
performance and memory usage, but it did not. So this option
remains for future research, and the default way of working remains the
same: character-type control variables and function parameters.
07-08.11.2007
Improved the manual and reconfigured the compilation scripts; for SafeTL itself
there are almost no changes.
The manual now has a large new chapter devoted to installation.
There are now examples of environment setup files,
which you can use as templates for editing.
Note that the build program concom has changed (slightly),
so you are advised to reinstall it if you have already installed it.
15.11.2007
Compilation scripts improved: the main script now better distinguishes
the output from compilation of separate directories.
30.11.2007
Added convenient functions last_el() to DynLinArr and BlkArr,
and removed a frightening but wrong comment from the file AbsPtr.h, located right
before the beginning of the definition of ActivePtr:
/*
But there are occasions,
when one controlling pointer points to a base class, and another
controlling pointer points to a derived class, and the latter
pointer should has the type of derived class.
In these cases the system cannot be compiled unless the "derived" pointer
refers to different copy function. This is provided by suppling different
class name to the second argument of the following template.
*/
One can supply a different class name referring to a different copy function;
that is all OK. But today the only reason for this is dealing with
classes supplied by other parties, in which other names of copy functions
are built in. Today there are no problems with an arbitrarily
complicated tree of derived classes having the same copy function in each.
This wrong comment seems to have been written in the previous millennium
(the word "controlling" indicates this) and was kept
in the file by oversight.
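One plausible reason why a single copy-function name suffices across a hierarchy, and the one illustrated here, is C++ covariant return types: an override of copy() in a derived class may return a pointer to the derived class. In the minimal sketch below, Base, Derived and Owning are hypothetical names and Owning is not SafeTL's ActivePtr; it shows that an owning pointer to the base class and one to the derived class can both use the same copy().

```cpp
#include <cassert>

// Hypothetical hierarchy in which every class defines copy() under the same name.
struct Base {
    virtual ~Base() = default;
    virtual Base* copy() const { return new Base(*this); }
    virtual int kind() const { return 0; }
};

struct Derived : Base {
    int extra = 42;
    // Covariant return type: the same virtual function, but its static
    // return type is Derived*.
    Derived* copy() const override { return new Derived(*this); }
    int kind() const override { return 1; }
};

// Hypothetical minimal owning pointer; assignment omitted for brevity.
template <class T>
class Owning {
public:
    explicit Owning(T* p) : p_(p) {}
    // For T = Derived, p_->copy() already has type Derived*, so neither a
    // second copy-function name nor a cast is needed.
    Owning(const Owning& other) : p_(other.p_ ? other.p_->copy() : nullptr) {}
    Owning& operator=(const Owning&) = delete;
    ~Owning() { delete p_; }
    T* get() const { return p_; }
    T* operator->() const { assert(p_ != nullptr); return p_; }

private:
    T* p_;
};
```

Had Derived::copy() been declared to return Base*, the copy constructor of Owning<Derived> would fail to compile without a cast; with the covariant return it compiles, and Owning<Base> still copies a Derived as a Derived through virtual dispatch.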
03.06.2008
The FunNameStack mechanism can now coexist with
multi-threading. But other classes may not be able to work in multiple threads.
11.12.2008
Still alive. Many small improvements.
17.12.2008
Correction of some technical details in AbsList, which turned out not to
compile with some newer GNU compilers.
26.02.2009
A slight modification of the test file.
28.02.2010
Some small modifications.
Please do not hesitate to inform me if you find some files missing in these
archives. This has already been reported many times by different people, and there
are reasons to assume that some files were also missing in previous versions (due to
some disorder in the scripts with which I compile them).
It can be used under the terms of the "GNU Lesser General Public License"
(version 2.1).
This page was created 16.12.2005.
Last modified 26.02.2009
Last modified 06.11.2015
Igor.Smirnov@cern.ch
Copyright 2005 Igor B. Smirnov