
Recent developments in neBEM

1) Improvements in computational efficiency:


During the course of this work, the neBEM toolkit has been improved significantly. The major challenge in these developments has been to increase the efficiency of the solver, while maintaining its precision.

Code parallelization: Open Multi-Processing (OpenMP) is an Application Programming Interface (API) that supports multi-platform shared-memory multiprocessor programming in C, C++ and Fortran on most processor architectures and operating systems. It consists of a set of compiler directives, library routines and environment variables that influence run-time behavior, and it provides a simple, scalable way of developing parallel applications on platforms ranging from standard desktops to supercomputers. Recently, we have successfully implemented OpenMP in the neBEM field solver. The parallelization has been applied to several computation-intensive sub-functions of the toolkit, such as the computation of the influence coefficient matrix, matrix inversion, and the evaluation of potential and field at desired locations. These routines are computation-intensive because there can be thousands of elements whose charge densities need to be evaluated, and the influence of all these elements has to be accounted for at every evaluation point. The task becomes even more demanding when the basic structure is repeated in order to conform to the real geometry of a detector. Parallelization has therefore proved very important in improving the computational efficiency of the solver.

We have tested these implementations on 4, 6, 8 and 16 cores. The observed reduction in computational time has been significant, while the precision of the solution has been preserved. In Table 1, we present a comparison of the time taken to solve for the charge densities in two typical problems, one involving 3489 elements with 20 repetitions of the basic structure and the other 10683 elements with 10 repetitions. In Table 2, we present the time taken to generate a three-dimensional map of the potential and the three components of the electric field for the same devices. It may be noted here that the user inputs related to the invocation of OpenMP during a specific solution are passed to neBEM via a file (named neBEMProcess.inp) residing in the directory from which Garfield is being executed. A minimal illustration of the parallelization strategy is given after Table 2.


Table 1: Computational time for calculation of charge density using code parallelization
Problem specification              1 thread    2 threads   4 threads   6 threads   8 threads   16 threads
3489 elements, periodicity = 20    6 m 20 s    4 m 18 s    3 m 29 s    3 m 7 s     3 m 20 s    5 m 2 s
10683 elements, periodicity = 10   64 m 51 s   35 m 43 s   28 m 53 s   34 m 6 s    36 m 49 s   47 m 13 s


Table 2: Computational time for calculation of potential and field map using code parallelization
Problem specification              1 thread    2 threads   4 threads   6 threads   8 threads   16 threads
3489 elements, periodicity = 20    30 m 57 s   30 m 24 s   32 m 1 s    30 m 36 s   31 m 35 s   31 m 42 s
10683 elements, periodicity = 10   26 m 23 s   25 m 53 s   25 m 56 s   27 m 51 s   28 m 19 s   29 m 59 s
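As an illustration of the approach, the following minimal sketch shows the kind of OpenMP parallelization applied to filling an influence coefficient matrix. The element description and the 1/r influence kernel are simplified placeholders introduced only for this example; they are not the actual neBEM data structures or functions.

/* A minimal, self-contained sketch (not the actual neBEM source) of the
 * OpenMP parallelization applied to the influence coefficient matrix.
 * The element description and the influence kernel are placeholders.   */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { double x, y, z; } Point;      /* element collocation point */

/* hypothetical stand-in for the real influence kernel (here simply 1/r) */
static double Influence(Point fld, Point src)
{
    double dx = fld.x - src.x, dy = fld.y - src.y, dz = fld.z - src.z;
    double r = sqrt(dx*dx + dy*dy + dz*dz);
    return (r > 1.0e-12) ? 1.0 / r : 0.0;
}

int main(void)
{
    const int nEle = 2000;                     /* number of boundary elements */
    Point *ele = malloc(nEle * sizeof(Point));
    double *Inf = malloc((size_t)nEle * nEle * sizeof(double));

    for (int i = 0; i < nEle; ++i) {           /* dummy element positions */
        ele[i].x = 0.01 * i; ele[i].y = 0.0; ele[i].z = 0.0;
    }

    /* Each row of the matrix is independent of the others, so the outer
     * loop is simply distributed over the available threads.            */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < nEle; ++i)
        for (int j = 0; j < nEle; ++j)
            Inf[(size_t)i * nEle + j] = Influence(ele[i], ele[j]);

    printf("Inf[0][1] = %g\n", Inf[1]);
    free(ele); free(Inf);
    return 0;
}

Compiled with OpenMP enabled (for instance, gcc -fopenmp), the number of threads can be controlled through the OMP_NUM_THREADS environment variable; without OpenMP the same code simply runs serially.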

Reduced order modeling: Reduced order modeling (ROM) is a concept commonly used in the numerical simulation of complex physical systems such as turbulent fluid flow and plasma dynamics. The idea is simple: the level of detail with which the physical phenomena are modelled is kept at an optimum. A similar approach, applied only to the spatial discretization of a problem, is known as adaptive meshing; there, the solution is usually attempted at a given spatial discretization and the solver is expected to refine or coarsen the mesh in order to meet the desired accuracy specifications. For neBEM, we have presently implemented an algorithm which allows us to ignore the finer variations of the charge density on a primitive provided (i) the primitive is not on the base device (as opposed to the repeated virtual devices generated in order to simulate the periodic nature of a detector geometry) and (ii) it is far enough away that the influence of the average charge density on the primitive is equivalent to the influence estimated while preserving the real charge density variation on the primitive. It may be mentioned here that this order reduction in the charge density variation is applied only at the evaluation stage, not while actually computing the charge densities on each of the elements. Although the ROM algorithm is, at present, implemented only for periodic geometries, it can be very useful in non-periodic geometries as well. Moreover, there is no reason to stop the order reduction at the primitive level: it can continue through the merging of original primitives into larger ones, and even to the lumping of several primitives into a component of the complete device, where the average charge density is assumed to be representative of the component itself.

The user input controlling the ROM level is provided through the same neBEMProcess.inp file mentioned above. For example, the parameter primAfter = 5 in the input file instructs the solver to ignore the individual elements of any primitive situated on a structure beyond the fifth repetition of the base device; a minimal sketch of this evaluation logic is given after Table 3. We have studied the effect of different values of primAfter on a typical problem having 2000 elements and 60 periodicities of the basic structure. In the present case, it can be seen that setting primAfter = 5 has a negligible effect on the evaluated potential and field for this device. In Table 3, we present the time taken to generate a map of potential and field for different levels of primAfter, together with the associated errors.


Table 3: Computational time for calculation of charge density, potential and field map, and error estimation using ROM
PrimAfter   Time for charge density   Time for potential and field map   Error
0           3 m 25 s                  141 m 2 s                          -
2           4 m 42 s                  27 m 59 s                          0.5 %
5           4 m 27 s                  52 m 56 s                          0.3 %
10          3 m 26 s                  71 m 28 s                          0.1 %
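The evaluation-stage logic can be pictured with the minimal sketch below. The data structure, the primAfter handling and the simplified 1/r influence are illustrative assumptions made only for this example, not the actual neBEM implementation.

/* Sketch of ROM at the evaluation stage (placeholder names and a
 * simplified kernel, not the actual neBEM source): primitives lying on
 * repetitions of the base device beyond primAfter contribute through
 * their average charge density instead of an element-by-element sum.   */
#include <math.h>
#include <stdio.h>

#define MAXELE 4

typedef struct { double x, y, z; } Point;

typedef struct {
    int repetitionIndex;            /* 0 = base device, 1, 2, ... = repetitions */
    int nElements;
    Point elemCentre[MAXELE];
    double elemArea[MAXELE];
    double elemChargeDensity[MAXELE];
    Point centre;                   /* primitive centroid */
    double area;                    /* total primitive area */
    double avgChargeDensity;        /* area-averaged charge density */
} Primitive;

/* simplified 1/r influence of a small charged patch at the point p */
static double PatchInfluence(Point p, Point c, double area)
{
    double dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
    return area / sqrt(dx*dx + dy*dy + dz*dz);
}

static double Potential(Point p, const Primitive *prim, int nPrim, int primAfter)
{
    double pot = 0.0;
    for (int ip = 0; ip < nPrim; ++ip) {
        if (primAfter > 0 && prim[ip].repetitionIndex > primAfter) {
            /* coarse evaluation: one term using the average charge density */
            pot += prim[ip].avgChargeDensity
                   * PatchInfluence(p, prim[ip].centre, prim[ip].area);
        } else {
            /* detailed evaluation: sum over all elements of the primitive */
            for (int ie = 0; ie < prim[ip].nElements; ++ie)
                pot += prim[ip].elemChargeDensity[ie]
                       * PatchInfluence(p, prim[ip].elemCentre[ie],
                                        prim[ip].elemArea[ie]);
        }
    }
    return pot;
}

int main(void)
{
    /* one primitive on the base device and a copy on the 7th repetition */
    Primitive prim[2];
    for (int ip = 0; ip < 2; ++ip) {
        prim[ip].repetitionIndex = (ip == 0) ? 0 : 7;
        prim[ip].nElements = 2;
        for (int ie = 0; ie < 2; ++ie) {
            prim[ip].elemCentre[ie].x = 0.1 * ie + 7.0 * ip;
            prim[ip].elemCentre[ie].y = 0.0;
            prim[ip].elemCentre[ie].z = 0.0;
            prim[ip].elemArea[ie] = 0.5;
            prim[ip].elemChargeDensity[ie] = (ie == 0) ? 1.0 : 0.8;
        }
        prim[ip].centre.x = 0.05 + 7.0 * ip;
        prim[ip].centre.y = 0.0;
        prim[ip].centre.z = 0.0;
        prim[ip].area = 1.0;
        prim[ip].avgChargeDensity = 0.9;
    }

    Point p = {0.0, 0.0, 1.0};
    printf("detailed: %g   ROM (primAfter = 5): %g\n",
           Potential(p, prim, 2, 0), Potential(p, prim, 2, 5));
    return 0;
}

The charge densities themselves are still solved with full element-level detail; the reduction is applied only when the already-computed densities are used to evaluate potential and field, as stated above.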

Fast Volume: As expected, the time needed to estimate the potential and field for a complex device is significant. This is especially true if the device is composed of hundreds of primitives, thousands of elements and hundreds of repetitions. Reducing the time taken to estimate the electrostatic properties becomes increasingly important when complex processes such as avalanche, Monte Carlo tracking and microscopic tracking are being modelled. In order to model these phenomena within a reasonable span of time, we have implemented the concept of using pre-computed values of the potential and field at a large number of nodal points in a set of suitable volumes. These volumes are chosen such that they can be repeated to represent any region of a given device, and simple trilinear interpolation is used to find the properties at non-nodal points. The associated volume is named the Fast Volume, and the inputs related to this volume are provided via an input file (FastVol.inp) residing in the directory from which Garfield is being executed. It may be noted that staggered volumes are allowed (which takes care of GEM and other similar structures), that it is possible to omit parts of a Fast Volume from being computed (inside a dielectric, or for other reasons), and that computed Fast Volume values can be ignored in certain regions so that the more complete and accurate evaluation is used for points in those regions. In order to preserve accuracy despite the use of trilinear interpolation, the nodes should be chosen such that they are sparse in regions where the potential and field change slowly and closely packed where these properties change fast. Moreover, nodes should be kept away from singular surfaces and edges as far as possible, since the very sharp gradients occurring in those regions are unlikely to be modelled correctly under the assumption of linear variation. In Table 4, potentials and fields estimated by direct evaluation are compared with those obtained using the Fast Volume. The maximum difference between the two estimates has been found to be 0.3 %, which is very small, and its effect on the modelling of avalanches etc. has been found to be negligible. A small sketch of the trilinear interpolation step is given after Table 4.


Table 4: Effect of FastVol
Quantity                                 Without FastVol   With FastVol
Computation time for charge density      15 s              5 m 16 s (includes calculation of the FastVol)
Computation time for field map           6 m 33 s          1 s
Error in electric field                  -                 0.3 %
Computation time for 10 drift lines      7 m 54 s          2 s
Number of avalanche electrons            -                 -
Computation time for 100 avalanches      -                 21 s
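The interpolation step itself is straightforward. The sketch below shows trilinear interpolation within one cell of a regular grid of pre-computed nodal values; the layout and names are illustrative only and do not correspond to the actual Fast Volume bookkeeping in neBEM.

/* Minimal sketch of trilinear interpolation inside one cell of a regular
 * grid of pre-computed potentials (illustrative only).                  */
#include <stdio.h>

/* V[ix][iy][iz] holds the nodal values at the 8 corners of a unit cell;
 * (fx, fy, fz) are the fractional coordinates of the point in the cell. */
static double Trilinear(double V[2][2][2], double fx, double fy, double fz)
{
    double c00 = V[0][0][0] * (1 - fx) + V[1][0][0] * fx;
    double c10 = V[0][1][0] * (1 - fx) + V[1][1][0] * fx;
    double c01 = V[0][0][1] * (1 - fx) + V[1][0][1] * fx;
    double c11 = V[0][1][1] * (1 - fx) + V[1][1][1] * fx;
    double c0  = c00 * (1 - fy) + c10 * fy;
    double c1  = c01 * (1 - fy) + c11 * fy;
    return c0 * (1 - fz) + c1 * fz;
}

int main(void)
{
    /* nodal potentials at the 8 corners of a cell (arbitrary numbers) */
    double V[2][2][2] = {{{0.0, 1.0}, {2.0, 3.0}},
                         {{4.0, 5.0}, {6.0, 7.0}}};
    /* value at the cell centre: the average of the 8 corner values */
    printf("V(centre) = %g\n", Trilinear(V, 0.5, 0.5, 0.5));
    return 0;
}

Because the interpolation is linear in each coordinate, its accuracy depends entirely on how well the nodal spacing follows the local field variation, which is why the node placement recommendations above matter.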

3d Map for Garfield++: Garfield++ imports field maps from solvers such as ANSYS, CST etc. In order to equip neBEM with a similar map generation capability, appropriate functions have been written. These functions can compute the map utilizing multiple CPUs of a given compute node through the OpenMP protocol, so the map is computed within a reasonable amount of time. The map file is written as plain text, resulting in large file sizes. In this version, the generated map has a fixed mesh size throughout the computational volume. More flexibility in this respect is expected in future versions, which will lead to smaller files and faster computation. Please go through the discussion related to the Fast Volume above in order to get a clearer idea of the approach adopted for generating the map and the precautions to be taken. The inputs determining the map are given via neBEMMap.inp and are quite straightforward. Please note that the present version of the map is 0.1, indicating a rather early release. There are two output files in the BoundaryCondition (BC) directory (in Outputs/Model/Mesh): MapInfo.out and MapFPR.out. These files need to be read by the Garfield++ script in order to import the data related to potential, field and region. A Fortran Garfield script and related files are made available to the user (FieldMapforGEM.tgz). A Garfield++ script that uses the resulting field map is also provided (GEM3dMap.tgz).
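A minimal sketch of such a map generation loop is given below: the nodal values are computed in parallel over a regular grid and then written out as plain text. The grid parameters, the stand-in Field() function and the output format are assumptions made only for this illustration; the real map format is described by MapInfo.out and MapFPR.out as mentioned above.

/* Illustrative sketch (not the actual neBEM map writer): evaluate potential
 * and field on a regular grid in parallel and write the map as plain text. */
#include <stdio.h>

#define NX 21
#define NY 21
#define NZ 21

static double pot[NX*NY*NZ], ex[NX*NY*NZ], ey[NX*NY*NZ], ez[NX*NY*NZ];

/* stand-in for the neBEM evaluation of potential and field at (x, y, z) */
static void Field(double x, double y, double z,
                  double *p, double *fx, double *fy, double *fz)
{
    *p = x + 2.0 * y + 3.0 * z;          /* dummy linear potential */
    *fx = -1.0; *fy = -2.0; *fz = -3.0;
}

int main(void)
{
    const double xmin = 0.0, dx = 0.005; /* grid origin and spacing (cm) */
    const double ymin = 0.0, dy = 0.005;
    const double zmin = 0.0, dz = 0.005;

    /* every node is independent, so the loop over nodes is parallelized */
    #pragma omp parallel for
    for (int i = 0; i < NX; ++i)
        for (int j = 0; j < NY; ++j)
            for (int k = 0; k < NZ; ++k) {
                int n = (i * NY + j) * NZ + k;
                Field(xmin + i * dx, ymin + j * dy, zmin + k * dz,
                      &pot[n], &ex[n], &ey[n], &ez[n]);
            }

    /* write the map sequentially, one node per line */
    FILE *fmap = fopen("ExampleMap.out", "w");
    if (!fmap) return 1;
    for (int n = 0; n < NX * NY * NZ; ++n)
        fprintf(fmap, "%d %le %le %le %le\n", n, pot[n], ex[n], ey[n], ez[n]);
    fclose(fmap);
    return 0;
}

With a fixed mesh size the file grows with the cube of the number of nodes per axis, which is why the more flexible meshing foreseen for future versions is expected to reduce both file size and computation time.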

2) Optimization of numerical models:


For precise and efficient computation of the electrostatic field configuration within a given detector geometry, it is often necessary to optimize the numerical model of the detector. Otherwise, the computation may become unnecessarily detailed on the one hand and, on the other, it may lose the accuracy necessary to follow the complicated physics processes occurring inside the detector volume.

Satisfaction of the parallel plate condition: In most of the detector geometries considered in this work, the gaps in the detectors are rather small in comparison to the size of the foil or mesh. For example, for a Micromegas detector the mesh measures 10 cm by 10 cm, while the amplification gap is of the order of a hundred micrometres and the drift gap is of the order of 1 cm. This feature needs to be preserved while modelling the characteristics of the detector. However, modelling the full 10 cm by 10 cm geometry is essentially a waste of computational effort, since very little happens beyond the middle of the detector (unless we are interested in edge effects in particular). So, we have tried to strike a balance between computational precision and computational effort by optimizing (a) the placement of the drift plane, and (b) the choice of the number of repetitive structures beyond the base device model. For example, to model the above Micromegas detector, we have considered a basic cell structure of small, fixed extent along both the X and Y axes. The number of repetitions of this structure is then chosen such that the parallel plate condition is satisfied. As shown in Figure 1, when the extent along X and Y is 5 times the drift length, the field is smooth throughout the volume. It can also be observed that, instead of making the drift gap 1.2 cm as in the experiment, it is possible to reduce it to 0.1 cm and still obtain the same variation of the two fields in question. In the experimental condition, the mesh and drift voltages have been set to -410 V and -650 V, respectively. In order to maintain the correct value of the drift field, the potential at the drift plane in the latter case has been adjusted to -427 V instead of the experimentally applied voltage.
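The required adjustment of the drift-plane potential follows from keeping the nominal drift field unchanged. As a back-of-the-envelope check, assuming a uniform field E = (V_mesh - V_drift) / d between the mesh and the drift plane:

    E_drift = (-410 V - (-650 V)) / 1.2 cm = 200 V/cm
    V_drift = V_mesh - E_drift * d = -410 V - 200 V/cm x 0.1 cm = -430 V

which is close to the adjusted value of -427 V quoted above; the small residual difference presumably reflects the fact that the field near the mesh is not exactly that of an ideal parallel plate.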

Figure 1: The axial electric field for different drift lengths (panels: Driftlength-1, Driftlength-2).

Wire modelling - thin wire versus polygonal cylinder: Better wire modelling is achieved if the wire is represented as a cylinder whose cross-section is a polygon. The accuracy improves as the polygon approaches the cross-section of the real wire. However, the computational effort increases substantially as more sides are added to the polygon, since each additional side adds one more surface primitive to the problem. A thin wire model is the other extreme of the representation. In this case, it is assumed that the real wire can be replaced by a line charge situated at the axis of the real wire. The potential boundary condition is satisfied at the wire surface, but this imposes a cylindrical symmetry at a distance equal to the radius of the real wire. When cylindrical elements are used, the voltage boundary condition is applied on each surface panel of the cylinder. The thin-wire approximation also neglects the dipole moment that is created to ensure equal potential on both surfaces of the wire. The wire element approach, which requires much less computational effort, is therefore acceptable only when the asymmetry in the potential distribution around the wire is not very large. In Figure 2, we present the consequences of using the two models to represent the same experimental problem. Here, the micromesh of a Micromegas detector has been modelled using wire elements and cylindrical elements, respectively. The potential contours in the drift and amplification regions, obtained using these two representations, are shown in Figure 2. As illustrated, the thin-wire approximation visibly affects the calculated potential.
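The origin of this limitation can be illustrated with the textbook potential of an infinite line charge of linear charge density lambda (an idealization used here only for illustration, not a formula taken from the neBEM source):

    V(r) = - (lambda / (2 pi epsilon_0)) ln(r / r_0)

Since this potential depends only on the radial distance r from the wire axis, satisfying the boundary condition at r = a (the wire radius) enforces one and the same potential all around the wire circumference. Any azimuthal variation of the true surface charge, in particular the dipole moment mentioned above, is therefore lost, which is why the thin-wire model is acceptable only when the potential distribution around the wire is nearly symmetric.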

Figure 2: The potential contours for the wire element at (a) 200 V/cm and (c) 2000 V/cm, and for the cylindrical element at (b) 200 V/cm and (d) 2000 V/cm.

