Some semantic notations

"ioX, NbX, NoX, MoX[•], MbX[•], MobX[1:2,•]"

A few specific prefixes are used consistently in the names of variables to recall their meaning (at least this is meant to be so).
X
stands here for an arbitrary string chosen to specify further the variable considered.
•
refers to a dimension (single or multiple … see later).
io
is an ordinal number, characteristic of an ‘order’.
For example io=3 may designate the 3rd element of a list of objects.
Nb
is a cardinal number, characteristic of a ‘quantity’.
For example Nb=50 may designate the quantity of litres of gasoline in the tank of your car while io=3 (the 3rd litre) corresponds clearly to a different concept.
No
is a label, ‘describing’ the object considered.
For practical reasons, this variable is defined as an integer (when needed, other variables of ASCII nature could as well be defined).
Remark: I intentionally correlate the letters ‘o’ and ‘b’ with the words ‘numerO’ and ‘nomBre’, which specify ordinal and cardinal quantities, respectively, in the French language. It is unfortunate that some languages mix up the 2 concepts by using a single word such as ‘number’ or ‘numero’.
Mo[•]
is an array of objects of type ‘No’, while the implicit index is of type ‘io’.
Mb[•]
is similarly an array of objects of type ‘Nb’ (implicit index ‘io’).
Notice that ‘io’ and ‘No’ may be confused in some cases. Each plane has been given a name (No) through the Detector Data Base (currently the Detector.Data file), usually increasing within each sub-detector along the direction of the beam. The natural succession of integers (1,2,…) has been adopted in the absence of a reason to make things complicated when they need not be. For Drift Chambers a 2-digit code is used instead: 10*I+J, where I varies from 1 to 8 for the 8 modules (the large chamber counts for 2) and J increases naturally along the beam starting from 1. Thus ‘No’ has a definite meaning: it identifies the plane considered.
Assume now that one treats a set of points measured in the dEdX sub-detector along a track. A natural running index ‘io’ would run from 1 to 3 and would be equal to ‘No’ (thus the two could be confused). It may however happen that no measurement was found in plane No=2, hence io=[1,2] would correspond to No=[1,3]. Such a confusion cannot exist for the Drift Chambers.
Mob[1,•]
is an array of ordinal numbers (addresses) pointing into the variable ‘X[•]’.
Mob[2,•]
is an array of cardinal numbers.
Here the indices ‘1’ and ‘2’ are implicitly correlated to the letters ‘o’ (for ‘numerO=ordinal’) and ‘b’ (for ‘nomBre=cardinal’).
For example, let
i=MobX(1,•)
n=MobX(2,•)
Then
X(i) is the first element considered
X(i+n-1) is the last element considered
These ‘Mob’ concepts are extensively discussed in the rest of this note.
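
The rule above (first element X(i), last element X(i+n-1)) can be sketched as follows. This is only an illustration under stated assumptions: Python is used for convenience, the helper name mob_slice and the sample values are hypothetical, and the Mob table is held as two plain lists. Python lists count from 0 whereas the addresses of this note count from 1, hence the shift by one in the slice.

```python
def mob_slice(x, mob, k):
    """Return the block of x described by entry k of a Mob table.

    mob[0][k] plays the role of Mob(1,k): the address (counted from 1)
                of the first element of the block.
    mob[1][k] plays the role of Mob(2,k): the number of elements.
    """
    i = mob[0][k]
    n = mob[1][k]
    first = i            # X(i) is the first element considered
    last = i + n - 1     # X(i+n-1) is the last element considered
    # Convert the 1-based inclusive range [first, last] to a Python slice.
    return x[first - 1:last]


# Hypothetical content: three blocks of sizes 3, 2 and 1 stored in series.
x = [10, 11, 12, 20, 21, 30]
mob_x = [[1, 4, 6],   # addresses, the 'o' row
         [3, 2, 1]]   # sizes, the 'b' row

second_block = mob_slice(x, mob_x, 1)   # elements X(4) and X(5)
```

With these sample values the second block starts at address 4 and holds 2 elements, so second_block contains 20 and 21.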


Now why is it useful to formalise all this?



It became necessary to devise methods of dynamic storage some 25-odd years ago, when memory space was expensive while the demand for it was growing fast. HYDRA management opened the way, pushed by the strong incentive of Bubble Chamber data.
An extended successor, ZEBRA, followed (other similar systems, such as BOSS in Hamburg, were developed as well; I do not intend to be exhaustive on this subject). These programs took care of the data space problem at the cost of a fairly heavy load on users, one that only large groups could adopt (or rather could not avoid, in order to maintain consistency between the contributions of small groups).

FORTRAN90 has in principle adopted some similar structural aspects. The difficulty of putting together the contributions of small groups in ever larger collaborations (currently LHC) seems to make it necessary to impose more general programming concepts: Object Oriented languages such as C++, and even a species specially adapted to LHC.

I am sure that we should not embark on such adventures (DIRAC is too small … to waste time). Thus I developed a very simple implementation of dynamical storage that will minimise the recurrent problem of checking all over the program for dimension overflows. In addition, grouping in common arrays objects that are of a similar nature, though related to different sub-detectors, helps in writing code that is, as often as possible, detector independent. The trivial idea is based on the variable described above, Mob[1:2,•], which I try to clarify in the following.



  1. Dynamical storage => space saving

    Space saving is no longer a serious objective because the cost of memory is now fairly low. However, for this very reason (!), one is tempted to define (very) large dimensions for all variables in order to be protected against the inability to process some odd events.

    Let Xi(Ni) [i=1,Nb] be a set of Nb variables which are used to store some parameters characteristic of an event. X1 and X2 could represent ADC signals from the first 2 planes of the dEdX sub-detector, definitely concepts of a common nature. X3 might represent ADC signals from Vertical Hodoscopes, again similar concepts though from a different sub-detector. X4 might represent TDC information, which is certainly a different concept (whatever the sub-detector).
    Given this, one needs to set each of the corresponding dimensions (here N1 to N4) to the maximum value it can reach in any event. Assume that one has the possibility to store the relevant event information successively into the variables X1, X2, X3 and X4. This means completing the filling of X1 before starting that of X2, which implies some reasonable ordering of the input data (this IS NOT necessarily the case; it is however for DIRAC … as far as I know). The sizes of the slots needed to store each of these variables, say J1 to J4, are by construction bounded from above by the corresponding dimensions, N1 to N4.
    The idea is to consider instead a new variable, say X, into which one stores successively the variables of the example above, that is X1 to X4. The overall size is in this case [J1+J2+J3+J4] rather than [N1+N2+N3+N4]. To be precise, X must be given a dimension in the program that is Max [Sum {Ji}] rather than Sum [Max {Ji}], the former being in general smaller than the latter (strictly speaking, never larger).
    This evidently requires recording somewhere at which address in X[•] the information corresponding to Xi starts and how many successive words (i.e. Ji) are allocated to it. This is where MobX(1:2,•) enters into the game: the value 1 of the first index gives the address (in X) and the value 2 gives the size used. For example, in the simple case mentioned above, Xi starts at X(MobX(1,i)) and spans MobX(2,i) objects.
    Notice that it might happen that one variable needs more space than originally foreseen, due for example to local errors in the original data. Minor overflows (beyond the expected maximal dimension) can still be handled by this method, since the excess can be absorbed by the space left unused by the other variables.


  2. Convenience of space sharing

    The maximum allowed number of dimensions varies with compilers, but large values (4 IS already large) are normally avoided because they generate catastrophic speed problems.

    The dimension of X[•], represented so far by the shorthand "•", may be of a reasonably arbitrary nature. X(i:j), where i,j are ordered (i≤j) signed integers, is 1-dimensional, but the 2-dimensional X(1:3,-7:25) is equally valid for the discussion here (and more dimensions are still acceptable). However, objects stored in series into X[•] MUST share the same dimensions after the first one (when several are used). Indeed MobX(1,i), an address, refers implicitly to the first dimension of X[•]; the other dimensions MUST be common to all "i". Otherwise MobX would have to be a more complex object.
    The first index may be implicitly matched to multiple dimensions, which might turn out to be tricky. For example one may store a set of points, intersections of a track with a set of sub-detectors, using either XYZ(i) or X(i), Y(i), Z(i). MobXYZ or MobX, MobY, MobZ would give access to these variables. The second index, j, of these Mob would represent the sub-detector. Mob(1,j) would point into the arrays while Mob(2,j) would indicate the number of successive items of information per sub-detector. NOTICE that in the XYZ example Mob(2,j) would be a multiple of 3 (and consequently i=Mob(1,j) would vary by multiples of 3). Depending on the problem considered one may have to choose between the solutions XYZ or X,Y,Z. This is a VERY simple case, which could be solved much more easily by using XYZ(3,i) instead of XYZ(i). The real (non trivial) interest of the method appears when the ‘3’ mentioned above is i-dependent. For example dEdX measures a slab number and an ADC (2 items of information, which may later turn out to be 3 if one observes TDC as well), while Drift Chambers measure a wire number and a drift time.
    Notice that there is a definite interest in processing information, when possible, in a unified way by storing information from different sub-detectors into a common array (or set of arrays), thus allowing the use of common procedures. In contrast, manipulating variables whose names are sub-detector dependent requires explicitly different code elements even in the case where the underlying data processing could be made common (imagine a global fit of all information on a track!). Keep in mind that duplication of code elements, with slight modifications in the variable names, is a nightmare for maintenance when bugs (or improvements) require an action.
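
The storage scheme of the two points above can be sketched as follows. This is a minimal illustration, not the code of the program itself: Python is used for convenience, the names pack and unpack, the capacity argument and the sample events are all hypothetical, and addresses are kept 1-based as in the text. The single test against the capacity stands for the one place where a dimension overflow has to be checked.

```python
def pack(blocks, capacity):
    """Store variable-length blocks one after another in a common array.

    Returns (x, mob), where mob[0][k] is the 1-based address in x at which
    block k starts (the role of MobX(1,k)) and mob[1][k] is the number of
    words it occupies (the role of MobX(2,k)).
    """
    x = []
    mob = [[], []]
    for block in blocks:
        # The only overflow check: does the total still fit into X?
        if len(x) + len(block) > capacity:
            raise OverflowError("common array X is full")
        mob[0].append(len(x) + 1)   # address of the first word of the block
        mob[1].append(len(block))   # number of words used by the block
        x.extend(block)
    return x, mob


def unpack(x, mob, k):
    """Recover block k from the common array."""
    i, n = mob[0][k], mob[1][k]
    return x[i - 1:i - 1 + n]


# Two hypothetical events; each inner list holds the words of X1..X4.
events = [
    [[5, 7], [3], [9, 9, 9, 9], []],
    [[1], [2, 2, 2], [4], [6, 6]],
]

# Separate arrays need Sum[Max{Ji}] words, the common array Max[Sum{Ji}].
sum_of_max = sum(max(len(ev[k]) for ev in events) for k in range(4))
max_of_sum = max(sum(len(block) for block in ev) for ev in events)

x, mob_x = pack(events[0], capacity=max_of_sum)
```

For these sample events separate arrays would need 11 words in total, whereas the common array needs 7. Note also that an empty block (here X4 of the first event) simply gets a size of 0.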

A practical application



At an early stage of the program I devised a structure to store information from Monte Carlo results, together with a ‘unified’ way to represent the detector parameters I needed. This part of the code exists in the current Pam file.
There is also a part that I devised in order to have a primitive display of the events. Pattern recognition was done in all parts of the detector in an utterly brutal way (ALL combinations are considered), but this is unimportant because I meant to look at only a few hundred events.

Track fitting is a bit smarter (though restricted to straight lines) but done only piecewise (projections). I had thus enough information to display events (measurements and tracks). The measurements (Monte Carlo!) were stored into a dynamical structure (routine GetMeasFromMC) which I later transposed into GetMeasFromData that is operational for real data (it is an interface to the decoding routines of Valeri). This same structure is used in the non-Mac event display developed by J.-L. Narjoux.

The geometrical characteristics of the detector are defined in terms of planes (Cherenkov counters are so far considered as ‘planes at entrance and exit windows’). A set of Global planes is defined at a fairly fundamental level (routine RefFramesDefine). A different set concerns so-called Measurement planes (routine InitReconst), which is currently hard coded in terms of ‘projections’ (locally defined as Classes). This part could be generalised (i.e. made adjustable from outside the program, avoiding recompilation) if felt useful. Finally there are links between these objects and variables whose names must ultimately be detector dependent.

A label had been associated with each sub-detector (the vague English concept of ‘number’). In principle this is an identifier ‘No’, but one may be tempted to consider it an ‘io’ because successive integers were mostly used along the beam (1 has been used for the target, thus the first sub-detector, MSGC, is labelled 2):
  1. Target
  2. MSGC
  3. Scintillating Fibres
  4. Ionisation Detector
  5. Drift Chambers
  6. Vertical Hodoscopes
  7. Horizontal Hodoscopes
  8. Pre Shower counters
  9. Muon counters
  10. Cherenkov

Unfortunately no information was available at that time for the Cherenkov, hence it was forgotten in the list of labels, which was defined according to the detectors’ positions along the beam. In addition, further sub-detectors are already being considered for the future, so one cannot rely on this implicit rule. You had thus better get used to distinguishing between the two concepts ‘io’ and ‘No’.
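
The difference between the two concepts can be sketched with the dEdX and Drift Chamber examples given earlier in this note. This is only an illustration: Python is used for convenience, the function names are hypothetical, and the upper bound of 9 on J is my assumption (it merely keeps the code within 2 digits).

```python
def drift_chamber_label(module, plane):
    """Label 'No' of a Drift Chamber plane, using the 2-digit code 10*I+J."""
    if not 1 <= module <= 8:
        raise ValueError("I runs from 1 to 8")
    if not 1 <= plane <= 9:
        # Assumption: J stays below 10 so that the code keeps 2 digits.
        raise ValueError("J is assumed to run from 1 to 9")
    return 10 * module + plane


def index_to_label(measured_labels):
    """Map the running index 'io' (counted from 1) to the label 'No'
    of each plane that actually produced a measurement."""
    return {io: no for io, no in enumerate(measured_labels, start=1)}


# dEdX planes along a track: plane No=2 gave no measurement.
io_to_no = index_to_label([1, 3])

# A Drift Chamber plane: module I=3, plane J=2.
label = drift_chamber_label(3, 2)
```

Here io=2 designates the second measurement found, which belongs to the plane labelled No=3, while the Drift Chamber plane of module 3, position 2 carries the label 32, a value that cannot be mistaken for a running index.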


Indices used here (MSGC is taken as an example, where the detector name is used):


General links:
Sub-detector dependent (here MSGC taken as an example):