
Safe Template Library: what's this, why is this?

      C++ is a very good language, but if applied naively it can lead to big problems because of latent and difficult-to-find "fatal" errors. Technically, these arise as errors in the manipulation of raw pointers and free memory. The fundamental reason is that object-oriented programming, as usually adopted, is oriented to the classification of objects but not to the classification of relations between objects. If the relations are not classified properly, the copying and deletion of objects can lead to their disintegration and destruction.

      Fortunately, C++ templates allow these problems to be solved by native means of C++.

      The relations are conveniently classified as either ownership or reference to an "alien" object. Ownership assumes "deep copy semantics", including deep deletion. Reference assumes that the pointer can be deleted while the object continues to exist. The referenced object can also be deleted; dereferencing a pointer to a deleted object then triggers an error (as a rule). An "owned" object is always allocated in heap memory. A referenced object can be allocated in any memory. The deletion of the last reference to an object can also trigger the deletion of that object, if the pointer is programmed to do so, but this is a rarely used option.

      Technically this can be implemented through "smart" pointers, which are non-trivial templates. While the notion of smart pointers was known at the time of writing this library, an available package with the two corresponding pointers was absent.

      Sets of homogeneous objects are conveniently handled through the technique known as template containers. Individual objects of variable type can be safely handled through smart pointers. The mixed case of heterogeneous sets can be handled by a mix of these two techniques.

      There are many other useful features of this "safe template library". For instance there is printing of output lists with indentation reflecting class aggregation, containers and loops, convenient assertion checks, etc.

      There is a semi-joking saying known to experienced programmers: "every program has an error".

      This is especially true for C++ programs written with raw pointers and direct manipulation of free memory. It is frequently visible even to the naked eye. Just take any large program with publicly available source text, open the first class, and check the following:

      1. Is there a non-private assignment operator? In general it is unimportant whether this operator is ever called in the current version of the program: if it exists, it can be called after the next source code modification, either by the original author, or by you, or by a third party. You could also look at whether and how it is already called, and thus whether this is going to be a sleeping or a real error.
      2. Are all class data members (except those which should not be copied) listed in it? This concerns all members, not necessarily only pointers.
      3. Is there a defense against self-assignment? If not, can self-assignment lead to bad effects such as loss of parts?
      4. Are there raw pointers among the data members? If yes, are the pointee objects deleted in the destructor? If yes, where and how are they allocated, or are their addresses just passed in?
      5. Does the destructor check the pointers for NULL? If not, are the pointers always initialized with addresses of valid objects? If yes, are the pointers always initialized with addresses of valid objects or with NULL?
      6. Does the assignment operator delete the old objects in the same way as the destructor (with or without NULL checks), or does it at least check for their absence?
      7. Does the destructor delete objects while the assignment copies addresses rather than the addressed objects (thus leaving dangling pointers after the deletion of the first copy)?

      And so on.
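      Several of these checks can be illustrated by a minimal sketch. The class names NaiveHolder and CheckedHolder and both helper functions are hypothetical and are not part of SafeTL. The naive class deliberately has no deleting destructor, so that the shared address can be observed without actually deleting the same object twice.

```cpp
#include <cassert>

// Hypothetical class of the kind the checks are aimed at: the
// compiler-generated assignment copies the address, not the object.
struct NaiveHolder {
  int* value;  // raw pointer member
  explicit NaiveHolder(int v) : value(new int(v)) {}
  // No destructor on purpose: with "delete value" in a destructor,
  // two holders sharing one address would delete it twice.
};

// The same idea after the audit: deep copy, self-assignment guard,
// deletion of the old object in both assignment and destructor.
class CheckedHolder {
 public:
  explicit CheckedHolder(int v) : value_(new int(v)) {}
  CheckedHolder(const CheckedHolder& other) : value_(new int(*other.value_)) {}
  CheckedHolder& operator=(const CheckedHolder& other) {
    if (this != &other) {                   // defense against self-assignment
      int* fresh = new int(*other.value_);  // copy the pointee first
      delete value_;                        // then delete the old object
      value_ = fresh;
    }
    return *this;
  }
  ~CheckedHolder() { delete value_; }
  int get() const { return *value_; }
  void set(int v) { *value_ = v; }
  const int* address() const { return value_; }

 private:
  int* value_;  // never null by construction
};

// True if the naive class ends up with two holders of one address.
bool naive_assignment_shares_pointee() {
  NaiveHolder a(1), b(2);
  int* lost = b.value;  // remember the address so it can be freed below
  b = a;                // shallow copy: b.value now equals a.value
  bool shared = (a.value == b.value);
  delete a.value;       // free each allocation exactly once
  delete lost;
  return shared;
}

// True if the checked class keeps the two objects independent.
bool checked_assignment_is_deep() {
  CheckedHolder a(1), b(2);
  b = a;
  a.set(5);             // must not affect b
  CheckedHolder& alias = b;
  b = alias;            // self-assignment must be harmless
  assert(b.get() == 1);
  return b.get() == 1 && a.address() != b.address();
}
```

      Both functions are expected to return true. In the naive case a destructor that deletes the pointee would turn the shared address into a double deletion, which is the kind of "sleeping" error discussed here.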

      Even such elementary tests, performed by just looking into the source code of another's program, reveal "sleeping" or potential errors in roughly every second or third class in large publicly available programs. These errors can be called "sleeping" because they are present in the source code but are not necessarily triggered along the current execution paths. These conclusions are well confirmed by automated tests; see one of the papers from the bibliography given in the preprint of my SIGPLAN paper referenced below. Although most of the errors we can find are "sleeping", they eventually exhibit themselves, as shown by running these programs in non-standard conditions, running them with memory-leak detectors, or running them after slight modifications.

      The application of SafeTL allows one to avoid all these problems.

      Note that SafeTL is a part of wcpplib, which serves as an auxiliary library for the HEED program (as well as for many other research projects which the author developed in 1999-2007). SafeTL does not contain any HEED-specific code, nor any physics or mathematics at all. So it is just a general-purpose library.

      The ideas of safe pointers from SafeTL were first briefly described in the appendix of a preprint issued in 2001. The appendix to this preprint is short, and it is repeated below as a good introduction. In a more complete (but still not absolutely complete) form it is published in ACM SIGPLAN Notices, Vol. 42 (2007), No. 4 (April), Pages 23-31; see its preprint with the shortened title "Raw pointers considered harmful".

      Below, the chapter from the appendix of the old preprint (2001) is reproduced. Things are being developed. Some newer remarks and slight corrections derived from practical experience are placed in brackets "<<>>".

Safe Use of Pointers (chapter from preprint issued in 2001)

    Pointers play the principal role in object-oriented programming in C++. At the same time they represent the most unreliable element of the language. Manipulations with pointers do not automatically force adequate manipulations with the addressed objects, and vice versa. If a pointer is used for establishing certain relations between objects, its inertness can eventually corrupt the program. However, it turns out that the logic of these relationships almost always falls into one of two categories, and both of them may be determined and automated by well-protected template and regular classes. Although the use of such classes gives the computer some additional work, it reduces programming time, provides a certain scalability and re-use of the program, and hence seems reasonable.

    In short, pointers support those relations between objects in which one object can access or use another object, but the latter is not a physical or logical component of the first. Another application of pointers is the support of polymorphism, when one object accesses another one whose type is not completely determined (we mean handling derived types with virtual functions; the term polymorphism is sometimes used for other purposes, but here we use the meaning suggested in [B. Stroustrup. The C++ programming language.]). What is missing is support of polymorphism for logical components, although such a relation occurs very frequently.

    Moreover, even the supported relation, the reference to an alien object, is not supported in full and is prone to programming errors, since the deletion of the addressed object does not result in annulling or clearing the pointer and in denying further access to it. The attempt to simulate the relation of logical inclusion frequently leads to errors for the same reason. In general, any regular pointer appearing as a class member of an application class (except well-debugged accessory library classes) means a programming error, either now or in the future. This makes the result of computing random, depending on whether the hardware detects a "segmentation fault" or on what is found at the place of the deleted object.

    This motivates the substitution of regular pointers, when they are used as class members, by objects of template classes. Since we have only two main types of relations, the reference to an alien object and the logical inclusion, we need just two base templates simulating these relations. As follows from the discussion above, the first one should just mimic the regular pointer and protect it from the errors that appear with careless manipulation of the addressed object. This type is called the protected pointer and denoted by "ProtPtr<>". <<Practice indicated that this name and also "AutoCont" (see below) are not expressive enough. Today I use a set of more pronounced names, "PassivePtr<>" and "ActivePtr<>".>> The addressed object should know the addresses of all protected pointers to it and clear them at its deletion. <<Practice indicated that this is not efficient and is not necessary. The addressed object need only know the address of an intermediate reference counter object, through which all references are made, and which knows whether the addressed object exists or not.>> As an optional feature, it can clear them when it is replaced by another object through the assignment operator. Technically, the class of the addressed object should be derived from a special base class called "RegProtPtr", "Register of Protected Pointers" <<Today "RegPassivePtr">>.
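    The idea of the intermediate counter object can be sketched as follows. This is only an illustration under stated assumptions, not the SafeTL implementation: the names Counter, Registered, Passive, Detector and demo_passive are hypothetical stand-ins, and the real "PassivePtr<>" and "RegPassivePtr" offer more than is shown here.

```cpp
#include <cassert>

class Registered;

// Hypothetical intermediate counter: it may outlive the addressed object
// for as long as at least one pointer is still attached to it.
struct Counter {
  Registered* object;  // reset to nullptr when the addressed object dies
  long refs;           // number of pointers attached to this counter
};

// Hypothetical stand-in for the base class of addressed objects.
class Registered {
 public:
  Registered() : counter_(new Counter{this, 0}) {}
  Registered(const Registered&) : counter_(new Counter{this, 0}) {}
  Registered& operator=(const Registered&) { return *this; }  // keeps its own counter
  virtual ~Registered() {
    counter_->object = nullptr;                // all pointers now read as null
    if (counter_->refs == 0) delete counter_;  // nobody is left to free it
  }
  Counter* counter() const { return counter_; }

 private:
  Counter* counter_;
};

// Hypothetical protected pointer: it never deletes the addressed object.
template <class T>
class Passive {
 public:
  Passive() : counter_(nullptr) {}
  explicit Passive(T* obj) : counter_(nullptr) {
    attach(obj ? obj->counter() : nullptr);
  }
  Passive(const Passive& other) : counter_(nullptr) { attach(other.counter_); }
  Passive& operator=(const Passive& other) {
    if (this != &other) { detach(); attach(other.counter_); }
    return *this;
  }
  ~Passive() { detach(); }
  // Null if nothing is attached or the addressed object was deleted.
  T* get() const {
    return (counter_ && counter_->object) ? static_cast<T*>(counter_->object)
                                          : nullptr;
  }

 private:
  void attach(Counter* c) { counter_ = c; if (c) ++c->refs; }
  void detach() {
    if (!counter_) return;
    // The last pointer frees the counter if the object is already gone.
    if (--counter_->refs == 0 && counter_->object == nullptr) delete counter_;
    counter_ = nullptr;
  }
  Counter* counter_;
};

struct Detector : Registered { int id = 7; };

// Returns the id seen while the object lived, or -1 if a pointer
// failed to read as null after the deletion of the object.
int demo_passive() {
  Detector* d = new Detector;
  Passive<Detector> p(d), q;
  q = p;
  assert(q.get() != nullptr);
  int seen = q.get()->id;  // the object is alive: access works
  delete d;                // the owner deletes the object
  bool cleared = (p.get() == nullptr && q.get() == nullptr);
  return cleared ? seen : -1;
}
```

    In this sketch a pointer to a deleted object reads as null instead of dangling, so a checked dereference can report the error rather than touch freed memory.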

    The second type should support the logical inclusion of one object into another. If the pointer is copied or deleted, either together with the object to which it belongs or separately, this action should force a similar operation on the addressed object. In other words, this pointer should control the addressed object. Technically, such an object should be allocated in free memory. To provide correct copying of objects of derived classes, each such class should define the same virtual member function "copy(void)", which merely allocates a new instance of the object, a copy of itself, in free memory. The controlling pointer is denoted by "AutoCont<>" <<Today "ActivePtr">>.
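    A controlling pointer of this kind can be sketched as follows. The names Owning, Shape, Triangle, Square, Drawing and demo_owning are hypothetical and are not the SafeTL API. The const qualifier on copy() is also an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical polymorphic base: every derived class defines copy().
struct Shape {
  virtual ~Shape() {}
  virtual Shape* copy() const = 0;  // allocates a copy of *this in free memory
  virtual int sides() const = 0;
};
struct Triangle : Shape {
  Triangle* copy() const override { return new Triangle(*this); }
  int sides() const override { return 3; }
};
struct Square : Shape {
  Square* copy() const override { return new Square(*this); }
  int sides() const override { return 4; }
};

// Hypothetical controlling pointer: copying or deleting the pointer
// copies or deletes the addressed object as well.
template <class T>
class Owning {
 public:
  Owning() : ptr_(nullptr) {}
  explicit Owning(const T& obj) : ptr_(obj.copy()) {}
  Owning(const Owning& other)
      : ptr_(other.ptr_ ? other.ptr_->copy() : nullptr) {}
  Owning& operator=(const Owning& other) {
    if (this != &other) {
      T* fresh = other.ptr_ ? other.ptr_->copy() : nullptr;
      delete ptr_;  // the old object goes away with the old value
      ptr_ = fresh;
    }
    return *this;
  }
  ~Owning() { delete ptr_; }
  T* get() const { return ptr_; }

 private:
  T* ptr_;
};

// With such a member, the compiler-generated copy of Drawing is deep.
struct Drawing { Owning<Shape> part; };

int demo_owning() {
  Square sq;
  Triangle tr;
  Drawing a;
  a.part = Owning<Shape>(sq);
  Drawing b = a;               // deep copy through Square::copy()
  a.part = Owning<Shape>(tr);  // replaces a's part only
  assert(a.part.get() != b.part.get());
  return 10 * a.part.get()->sides() + b.part.get()->sides();
}
```

    Here demo_owning() returns 34: after the copy, a holds a triangle while b still holds its own square, and each object is deleted exactly once when its Drawing goes out of scope.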

    The protected pointers do not have significant additional functionality with respect to regular pointers, and perhaps they do not need special notation in class diagrams. The controlling pointers need a special notation. They are marked by an open crossed box at the side of the addressed object <<These notations were used in the old preprint. Today, in view of UML, other notations can also be proposed. In particular, I would rather use different notations for regular pointers and for passive pointers, to indicate when the relation is not automated and not safe (that is, when it is expressed by a regular pointer; I strictly avoid the use of these as class members). Note that even the newest UML [G. Booch, J. Rumbaugh, and I. Jacobson, "The Unified Modeling Language User Guide", Second ed. Addison-Wesley, 2005] does not recognize these types of relations (polymorphic inclusion and polymorphic reference to an independent object) and does not provide adequate notations for them.>> Thus, if the connection is marked by a closed circle at one end and by a closed box at the other end, this means that the object of the class marked by the box is a member of the object of the class marked by the circle. The same with an open box at the other end means that the object with the circle knows only the address of the object with the box, but does not control it. The intermediate case, a crossed open box, means that the addressed object is logically a member of the object with the circle and is controlled by it. In the two latter cases the actual type of the addressed object may be either the one denoted or any type derived from it.

Many additional explanations can be found in the manual of SafeTL.
For users of HEED: you do not need to read this separate manual of SafeTL. It consists of the part of the HEED manual related to SafeTL, automatically extracted from the LaTeX source text by a preprocessor, with some small changes such as a different title and introduction.
The package "safetl" is available via this link.

The latest version of safetl is dated 11.12.2008.

List of changes:

03.06.2006: Removal of inline void PassivePtr::pass(...), since it was confusing and not necessary
(it also had a misprint; thanks to Yukihiro Kato <katoy@hep.kindai.ac.jp>)

26.09.2006:
1. Removal of inclusion of deprecated headers in all the components.

2. Other small corrections which make compilation possible on Scientific Linux 4.1

3. Many improvements in wcpplib/safetl/, such as the inclusion of pilfer functions in the active pointer, arrays and list.
Also improvements of the indexing of template arrays in order to increase the speed.

4. Some of the components of wcpplib can be now compiled at Windows (XP, possibly at others too) by Microsoft Visual Studio C++ (v. 8).
Necessary bat-files for batch compilation are included.

5. Creating manual of SafeTL, which represents some extractions from manual of HEED, created by a preprocessor.
So some of parts of these manuals are identical and will be changed synchronously, whereas introductions are different.

04.10.2006:

1. Inclusion of INS_CRETURN macro in array indexing.

2. Removing the printing by default of raw addresses of counter
objects in PassivePtr. Correction of the test program for BlkArr.
Both changes avoid platform-dependent differences in the output
of test programs.

3. Adding scripts run_single_test.b and run_single_test_vs.bat
for single test and

19.01.2007:

A lot of changes in wcpplib accumulated over the time passed.
The changes in wcpplib are mostly in the matrix package, in order to make it more efficient (to avoid unnecessary assignments and copying of matrices),
and also in safetl, to make it more flexible and hopefully safer.

24.01.2007

A small correction.

16.03.2007

Many accumulated improvements, in particular inclusion of indexing through [][]... in multidimensional arrays.

10.04.2007

Slight corrections in accordance with wcpplib

01.08.2007

Many small corrections.
In particular, the macro USE_DELETE_AT_ZERO_COUNT in AbsPtr.h is now switched on by default, because one application was very recently found for which it possibly makes sense to delete specially marked objects automatically at zero reference count (one can bypass this by other means, but currently this seems to be the simplest approach). Note that the whole methodology suggests, in particular, that shared pointers are unnecessary and should not be used. But there are no rules without exceptions. Just as we occasionally use goto with much benefit, we can perhaps use this deletion, which is a characteristic feature of shared smart pointers, in some very special cases. What those cases are can be shown by the following. The whole methodology is created for and oriented to ordinary programs, that is, transient programs and transient objects. This was never mentioned in the texts (for the sake of brevity) but was assumed. If persistent objects appear, something may need to be modified. Persistent objects and various databases are a very special field, in which discussion and accumulation of experience are still in progress. Many probably redundant things have been proposed in numerous attempts to marry object-oriented programming and databases. The author leaves this huge field outside the scope of his research so far. But the application mentioned above has some relation to persistence. In that application some objects (histograms) are created in a transient program and handled through various means, including active and passive pointers; their addresses are kept in a list of passive pointers in a special manager class, and this class writes all "registered" objects to disk. Then they need to be read back in another program and another interactive session, where something is done with them. One could create another manager class with a list of active pointers and assign the objects read from disk to them, or switch between the two lists somehow.
But the easiest solution seen so far is to use the same manager class and the same passive-pointer list, and to make the reading function assign given_object.s_allow_del_at_zero_count = 1; which makes this object be deleted automatically, possibly at the end of the session, when the list itself is deleted. I remark that this application is very new and not much experience has been accumulated with it. But it is perhaps better to switch on the corresponding possibility in the default setup. Note again that to really make an object self-deleting (with this macro switched on) one should assign given_object.s_allow_del_at_zero_count = 1; for this particular object. So in general passive pointers remain passive pointers, and they do not delete anything by default.
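
The per-object flag can be illustrated by a deliberately simplified sketch. Only the member name s_allow_del_at_zero_count is taken from the text above; the classes Histogram and Ref, the counter g_alive and the function demo_zero_count are hypothetical. The sketch keeps the reference count in the object itself and omits the machinery that clears pointers when the object is deleted, so it is not how PassivePtr is implemented.

```cpp
#include <cassert>

static int g_alive = 0;  // counts live Histogram objects, for the demo only

// Hypothetical addressed object with a per-object opt-in flag.
struct Histogram {
  int refs = 0;                        // number of Ref pointers attached
  char s_allow_del_at_zero_count = 0;  // off by default: nothing is deleted
  Histogram() { ++g_alive; }
  ~Histogram() { --g_alive; }
};

// Hypothetical non-owning pointer that deletes only flagged objects.
class Ref {
 public:
  explicit Ref(Histogram* h) : h_(h) { if (h_) ++h_->refs; }
  Ref(const Ref& other) : h_(other.h_) { if (h_) ++h_->refs; }
  Ref& operator=(const Ref&) = delete;  // left out to keep the sketch short
  ~Ref() {
    if (h_ && --h_->refs == 0 && h_->s_allow_del_at_zero_count) delete h_;
  }

 private:
  Histogram* h_;
};

int demo_zero_count() {
  Histogram* owned = new Histogram;   // deleted by its creator, as usual
  Histogram* loaded = new Histogram;  // as if read from disk: nobody owns it
  loaded->s_allow_del_at_zero_count = 1;
  {
    Ref a(owned), b(loaded), c(b);
    assert(g_alive == 2);
  }                                   // all pointers are gone here
  int alive_after_refs = g_alive;     // owned survives, loaded was deleted
  delete owned;
  return alive_after_refs * 10 + g_alive;
}
```

In this sketch demo_zero_count() returns 10: one object is still alive after the pointers disappear (the unflagged one), and none after its creator deletes it.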

17.08.2007 Added two couples of source files, which had become required because of recent developments but were not included in the distributed packages.

19.09.2007 Added an option in PassivePtr which allows control variables to be handled through bit operations. It was expected to give the best performance and memory use, but it did not. So this option remains for future research, and the default way of working remains the same: character-type control variables and function parameters.

07-08.11.2007 Improved manual and reconfigured compilation scripts; as far as SafeTL itself is concerned, there are almost no changes.
The manual now has a large new chapter devoted to installation.
There are now examples of environment setup files, which you can use as templates for editing.
Note that the build program concom has changed (slightly), so you are advised to reinstall it if you have installed it already.

15.11.2007 Compilation scripts improved: main script now better discriminates output from compilation of separated directories.

30.11.2007 Added convenient functions last_el() to DynLinArr and BlkArr, and removed a frightening but wrong comment from the file AbsPtr.h, right before the beginning of the definition of ActivePtr: /* But there are occasions, when one controlling pointer points to a base class, and another controlling pointer points to a derived class, and the latter pointer should has the type of derived class. In these cases the system cannot be compiled unless the "derived" pointer refers to different copy function. This is provided by suppling different class name to the second argument of the following template. */
One can supply a different class name referring to a different copy function; that is all fine. But today the only reason for doing so is when dealing with classes supplied by other parties, in which other names of copy functions are built in. Today there are no problems in dealing with an arbitrarily complicated tree of derived classes with the same copy function in each. This wrong comment seems to have been written in the previous millennium (the name "controlling" indicates this) and was kept in the file due to an oversight.

03.06.2008
The FunNameStack mechanism can now coexist with multi-threading. But other classes may not be able to work in multiple threads.

11.12.2008 Still alive. Many small improvements.

17.12.2008 Correction of some technical details in AbsList, which turned out not to compile with some more recent GNU compilers.

26.02.2009 A slight modification of testing file.

28.02.2010 Some small modifications.

Please do not hesitate to inform me if you find some files missing in these archives. This has been reported many times already by different people, and there are reasons to assume that in previous versions some were also missing (due to some disorder in the scripts with which I compile them).

It can be used according to "GNU Lesser General Public License" (version 2.1).


This page was created 16.12.2005.
Last modified 26.02.2009
Last modified 06.11.2015

Igor.Smirnov@cern.ch