Monotonic Lagrangian Grid User Guide

David Fyfe, Ronald Kolbe, Jay Boris, Robert Sinkovits
Laboratory for Computational Physics and Fluid Dynamics
Naval Research Laboratory
Washington, DC 20375-5320


Table of Contents





I. MLG Data Structure



I.1. Philosophy and Motivation for the MLG Data Structure

The Monotonic Lagrangian Grid (MLG) data structure was developed for molecular dynamics to simplify the required N body force calculations[1]. For N independent objects, there exist the possibility for N(N-1)/2 interactions; this is called the N2 problem. However, for most problems only the strong or "near neighbor" interactions need to be considered. For most "near neighbor" algorithms this requires that a list of objects near in space be kept and updated when the objects move. The MLG data structure performs the task of organizing these N objects so that near neighbors can be found without the need for near neighbor lists.

The MLG data structure is a dynamic data structure that maps objects in physical space into computer memory with well-defined memory offsets for near neighbors. By mapping nearby objects into contiguous memory locations, the data structure allows the vectorization and parallelization of algorithms for the generation, maintenance and search of the data. Data sets can be partitioned and merged easily, which is especially important for distributed and parallel algorithms. Finally, small changes in the data, such as those that would occur for a spatial position during a particle dynamics simulation, do not trigger global changes in the MLG data structure. While the MLG data structure was originally developed for 3D spatial based data, its application can be expanded easily to incorporate other types of data in an arbitrary number of dimensions.

Motivation for the general definition of an MLG can be obtained by discussing its original 3D spatial definition [1]. Suppose N = Nx Ny Nz objects (particles) are moving in space. Because of the form of N, it is natural to reference each object with three subscripts (i,j,k). For lattice particle simulations it is also natural to associate each subscript with a physical dimension. Each of the objects will have three physical coordinates X(i,j,k), Y(i,j,k), and Z(i,j,k) as well as other data, (see figure 1.).

[Object Record Image]
Figure 1. Object Data Record as Stored in Computer Memory


The MLG data structure orders the N objects such that the X positions of all the objects increase monotonically with index i, the Y positions of all objects increase with index j, and the Z positions of all objects increase monotonically with index k.

Mathematically, the N objects in three dimensional space with locations, X(i, j, k), Y(i, j, k), and Z(i, j, k), constitute an MLG if and only if

Given N random locations, the spatial lattice defined by an MLG is irregular. When the N object locations satisfy Eqs. (1) and any additional constraints or relations specifying other than infinite-space boundary conditions, they are in "MLG order." This ordering is useful because the direction for going from one node to another in space and in the MLG is the same. Further, nodes which lie between two nodes in space will also be between them in the MLG. Thus, neighbors in real space also have neighboring address indices in the MLG.

I.2. Constructing an MLG Data Structure

Reference [1] also provides an existence proof for an MLG in the form of a construction procedure for any data. The basic procedure is to first sort all objects according to one of the relations of the MLG. One then partitions the objects into subsets and sorts each subset according to one of the other relations. One does this partitioning and sorting recursively until each relation is satisfied. For previous example, with the 3D spatial data, one first sorts the objects by their Z coordinate. In this sorted list, one assigns the first Nx Ny objects the index k=1, the next Nx Ny objects the index k=2, an so on. For each of the Nx Ny planes, one sorts the objects by their Y coordinate and assigns the first Nx objects of each of these sorted lists the index j=1 and so on. Finally one sorts each of the Ny Nz subsets of Nx objects assigning the first object in each subset the index i=1, the second object the index i=2, etc. This MLG construction is only as efficient as each of the individual sorts for each relation. If the 1D sorts are O(N log N), then the above construction algorithm is O(N log N). A pictorial view of this construction algorithm in two dimensions for a 4x4 MLG on 16 objects was given by Picone et al [2]. The 16 objects, lettered from A to P in Fig. 2, have associated with them a position (X,Y) in a 2D space. The previous construction algorithm proceeds as follows:



Figure 2. Two Dimensional Space of 16 Objects

  1. Order all 16 objects according to increasing y-coordinate, that is from lowest to highest y-value. In Fig. 2 these are labeled A through P.



    Figure 3. Two Dimensional Space of 16 Objects in Groups of 4 by Y

  2. Give the objects with the four lowest y-values (that is objects A, B, C, and D a j-index of one. Give those with the next four lowest y-values (E, F, G, and H) a j-index of two and so on. Figure 3 shows each set of four connected with straight lines.



    Figure 4. Two Dimensional Space of 16 Objects with i-index and j-index Based on the MLG Rules

  3. Now assign an i-index to each object in a given set of four with the same j-index according to the selected MLG rule. This assignment is shown in Fig. 4. Note that those objects with the same i-index are connected with a line.



Figure 5. The Two Dimensional MLG for the 16 Objects

The two-dimensional MLG in index space looks like Fig. 5. The letters in Fig. 5 match the letters labeling specific objects on the previous figures. Objects (letters) that are adjacent in memory (index space) are connected by lines in Figs. 3 and 4. Thus adjacency of the object data in computer memory corresponds to geometric adjacency of the objects.


Figure 6. The Near Neighbors of Object H in Real Space

The eight near neighbors of object H in index space are objects A, C, D, E, F, K, J, and L in real space as shown in in Fig. 6. The dashed line in Fig. 6 encloses the near neighbors of H in index space (compare with Fig. 5).

In general, more than one MLG data structure can exist for a given set of data. The description for the construction of a 2D MLG data structure just given is simple and easy to follow, but may not produce the "best" MLG data structure for the problem. The choice as to which MLG is best must be based on the data, problem being solved and additional constraints. Section II.3 will describe an approach to generate a better MLG from the given data.

Finally, this construction algorithm is not optimal for objects which are already "almost" in order. Data of this type occur in molecular dynamics calculations where the relative particle (object) positions change slowly over the course of one time step for reasons of accuracy and stability. In this case, only a few particles are locally out of MLG order. The previous general construction algorithm can result in objects moving long distances in the MLG data structure before order is restored. Boris[1] suggests using a bubble sort algorithm in this case to restore MLG order. A few iterates of a bubble sort will recover MLG order in O(N) time.

On completely random data, the bubble sort is not optimal. The MLG package described in this report includes a Shell sort [4] option for those data sets that require non-local sorting. A Shell sort starts by doing a bubble sort on objects N/2 away. Once this is complete a bubble sort on object N/4 away is carried out. This is done recursively until the standard bubble sort is done on the entire data set. The advantage to this scheme is that objects require movement to the far end of the MLG get there in fewer swaps. The Shell sort is a worst case O(N3/2) algorithm, but is shown to be O(N1.27) for random data when N < 60,000[4]. Other sorting algorithms are possible for complete sorting in one direction, but have not been implemented in the MLG package at this writing. The Shell sort offers the advantage of starting the global sort at varying length scales. This has some advantages for parallel computing and merging data sets where the sets to be merged have already been sorted.



I.3. General MLG Data Structure

In order to define a general multi-dimensional MLG, some additional terms need to be discussed. In the previous motivational discussion the term object was used to describe the data which determined the MLG. Here, an object in the MLG is a fixed length record of data (words or bytes) which are to be grouped together. Some or all of the data in the record may determine the MLG. In a 3D particle dynamics calculation, for example, an object consists of word data for the three positions, three velocities, possibly three accelerations, and a mass, (see figure 7). Only the three positions determine the MLG.




Figure 7. Part of a 3D Particle Dynamic Object

The term axis is used to describe a particular MLG dimension. For the 3D particle dynamics calculation, the MLG has three axes, one for each of the subscripts i, j, and k. In general, an axis may not be associated with any particular physical datum of an object, although in the particle dynamics example above they were. However, how objects are stored along an axis is determined by the data. Each axis is associated with a prescribed offset in computer memory. The offset depends on the length of each axis. The axis length is the number of objects along a given axis. For the above example, these axis lengths are Nx, Ny, and Nz. (Since we don't want to associate a physical position with an axis, a better denotation for these variables would be Ni, Nj, and Nk. By definition axis 1 is associated with a memory offset which is the length of one object record. Axis 2 is associated with a memory offset which is the length of axis 1 times the length of one object record. Axis 3 is associated with a memory offset which is the product of the lengths of axis 1 and axis 2 and the length of 1 object record. Figure 8 shows the object data stored in memory and the axis offsets. In general, axis m is associated with a memory offset which is the product of the lengths of all axes less than m and the length of one object record. This is nothing more than the way FORTRAN stores multi-dimensional arrays in a linear memory with the first MLG dimension of the array associated with axis 1, the second with axis 2, and so forth.




Figure 8. Memory Storage of Objects with Axis Offsets.

An MLG index is an M-tuple, j = (j1, j2, ... , jM) which uniquely describes the location of an object within the M-dimensional MLG data structure. If O(j) is an MLG object, its neighboring objects along each MLG dimension are the 2M objects O(j+em),m=1,...,M and O(j-em),m=1,...,M. The M-tuple em is the unit offset in dimension m and is defined as the M-tuple with mth component 1 and with all other components vanishing. An MLG link is used to describe the connection between neighboring objects along an MLG dimension in the MLG data structure and can be denoted by the ordered pair of objects (O(j),O(j+em))

The general multi-dimensional MLG data structure is defined in the following way:

[... I need HTML 3 for this!! ...]

Note that each object may be composed of many data items. The MLG data structure is a simple, compact way of indexing and storing data describing a number of objects in computer memory. The data for each object (node) is stored at memory location j in the MLG data structure such that for each axis, m, the object ordering obeys a rule which is a function of the data describing two adjacent objects. In the above 3D spatial example the objects are in MLG order for axis 1 when [ Mathematical Expression]. Figure 4 illustrates the application of this rule in 2D space.

An MLG does not have to be associated with objects moving in physical space. As an example, chemical compounds with their related characteristics could be the objects. A variable for one dimension might be the toxicity of the compound. The monotonic rule would sort the compounds by increasing toxicity. Another example could be graduating college football players with the variables being weight, height, speed, test scores, etc. Now, each variable could be defined for a dimension, and the monotonic rule would sort weight, height and test scores in increasing order and speed in decreasing order. Using the MLG a player with the appropriate abilities could be found quickly.

The MLG axis rule need not be associated with a single datum within the record. For example, a set of 4096 objects are moving in space and the total momentum is an important characteristic for collisions. Therefore, a 4D MLG, (8, 8, 8, 8) organized by the 3D position, x, y, z and total momentum, P would seem natural. The function for the total momentum is [ Mathematical Expression ] where m is the mass, and u, v, w are the x, y, z velocities respectfully. The objects would have a data vector O associated with them. The object O has seven components, (x, y, z, u, v, w, m). The three rules, eq. (1), apply to the first three axes. The fourth rule would be [ P rule ] for the fourth axis where P is a function of the respective u, v, w and m. The objects with the largest momentum in a given spatial region could quickly be identified.

A more general data example would sort street intersections and associate accidents. The data associated with each intersection would be:

The streets' name or number represent two types of data, character and numbers. The rule for the axis could sort the V_1 data alphanumerically. For the north-south streets, the rule could be which street is farthest east, A or B. That is, the streets would be order from west to east. A similar rule could be developed for the east-west streets. An interesting two dimensional MLG for this data could be to sort the intersections by type on one axis and by number of accidents on the other. One could then query the database for the 4-way stop intersections with the greatest number of accidents for the purpose of converting them to stop light control.

I.4. "Holes"

The previous example and discussion has assumed that the data of N objects to be organized into an MLG data structure could be arranged into M axes with each axis of length lm where N is the product of the axis lengths. But, what happens when the number of objects is not a product of reasonable numbers? If 31 objects are to be arranged into a two-dimensional MLG data structure, it could not be done since 31 is a prime number. Fifteen objects can to be arranged into an MLG that is either (3, 5) or (5, 3), but if the problem and data are better presented by an MLG (4, 4), what is to be done? The answer is very important and can be quite complex. An additional object, called a "hole", is defined and added to the data set. The 31 object set becomes a 32 object set and can be arranged into an MLG (8, 4); and the 15 objects become 16 objects and can be arranged into an MLG (4, 4). Any number of holes can be added to a system to form a desired MLG.


Figure 9. Two Dimensional Space of 15 Objects

As an example of this process, consider the data shown in Fig. 9. This example has only 15 data points and a MLG (4, 4) can not be constructed. A sixteenth data point could be added anywhere in space. The location chosen affects the MLG data structure and the quality of the "near-neighbors" ordering. How to chose this location is data and problem dependent. As such the MLG library of routines discussed in this report does not contain a routine for hole generation. In the present example, an additional data point, C can be added as shown by the circle in Fig. 10.



Figure 10. Two Dimensional Space of 15 Objects with Hole Added

The resulting MLG is shown in Fig. 11;


Figure 11. Two Dimensional Space of 15 Objects with Hole Added with i-index and j-index Based on MLG Rules

and the matrix of near-neighbors to point H is shown in Fig. 12. In the matrix, only "real" data points are shown.


Figure 12. Two Dimensional Space MLG for 15 Objects with Hole Added

The concept of a "Hole" is apparent from Figures 11 and 12, where the additional data point is a hole in the matrix or data structure. The hole is not a "real" data point but a spacer in the data set to make the MLG possible or to control or improve the MLG structure. While the addition of this spacer appears very simple (for this example it was), the placement of the hole in an actual data set is not a nontrivial matter. The hole and its location takes on special importance for algorithm writing and near-neighbor location in systems where the objects change dynamically.

For a MLG that contains nonuniformly distributed objects, the addition of holes can assist the maintenance of near-neighbor relationships by providing additional degrees of freedom. Nonuniformly distributed objects can result in a MLG ordering that has objects far apart in physical space next to each other in MLG space. This ordering is caused by the warping of the MLG links by the high density region. This effect is similar to large gravitational objects in space which warp the surrounding space. The ideal placement of holes is to provide a uniform distribution of objects and to remove steep density gradients, thereby, reducing the warping of the MLG links.



I.5. Using the MLG

The MLG data structure was original developed to facilitate algorithm writing and accessing near-neighbors for the Cray, a vector computer. While it may not be apparent from the discussion so far, the MLG data structure can be considered a gating algorithm. A gating algorithm selects which objects are close to another by some measurement. In the case of the MLG data structure, the axes, rules and holes perform a coarse gating operation and measurement without using a calculated value of the measurement. If this idea is kept in mind while writing algorithms using the MLG data structure and rule for constructing the MLG, efficient O(N) scaling algorithms should result.

In forming an MLG data structure, the objects and their data have been effectively mapped and gated into an array and adjacent computer memory as shown in Fig. 5. Objects close to H in the array are outlined by the dashed line in Fig. 6; these objects have index offsets of +1 or -1 from the H indices of (2,3). If we are looking for collisions or "strong interactions", these objects with offsets of +1 or -1 would be considered. To ensure that all interactions are found, all objects with offsets of +2 or -2 may be examined to see if they interact. The amount of offset required depends on the total number of objects, N, the local density of the objects, and the distance of paossible "interactions" between objects.

The MLG data structure has its greatest benefit when objects interact only with some near neighbor objects. The calculation of object interactions can obtain the long vector lengths needed for optimal vector speeds on vector machines by making the inner loops over objects and any outer loops over object offsets. Object data is accessed from memory with constant stride. An example of an algorithm for a vector machine is shown in Appendix D. The purpose of that routine is to evaluate the positional relationship between MLG objects with offsets of +1 or -1. This particular evaluation determines the directional cosine along each axis that is formed by an object and its + 1 offset along that axis. Figure 13 shows a small section of a two directional MLG. The discussion that follows describes how the objects A and B will be evaluated by that routine along the i axis.



Figure 13. MLG Evaluation Performed by Routine Given in Appendix A

The first block of do-loops, starting with line 67 and finishing with line 84, calculates the dj distance, (dxdi in the code), and starts the calculation of the distance from A to B , (link_eval(aijk) in the code). The second block of do-loops, starting with line 87 and finishing with line 104, calculates the di distance, completes the distance from A to B, and calculates the directional cosine. In both blocks, the innermost loops, lines 73, 74 and lines 94, 95 are over the links in the MLG using the fact that there is a one-to-one correspondence between the links in the MLG and the objects which are not at the end of a MLG axis. Even scalar machines with virtual memory or data caches can take advantage of this constant stride in memory by having "contiguous" blocks of data in physical memory or cache. A similar kind of routine structure can be built to compute all interactions with given MLG memory offsets.

The distance calculation is common in particle simulations. The algorithm discussed in the previous paragraph must be modified if holes are part of the MLG data structure. One approach is to include an item in the object record which indicates whether the object is a hole or a real object. A convenient indicator or mask for particle simulations is the value 1 for a real object and 0 for a hole. When the interaction between the objects is calculated one merely multiplies by the indicator for each object before adding to the total force for each object. Again, how the user treats holes is problem dependent. Other creative ways to indicate holes may be required for other problems.



II. Description of the MLG library



II.1. Basic Sorting Routines

The MLG Library Functions described in this section have been run and tested on standard UNIX boxes, (Cray Supercomputers, Convex mini-super computers and IBM, DEC, SGI and Sun workstations). The functions as written should run on shared memory computers, but have not been implemented and tested on distributed memory computers. The concepts underlying MLG library functions and MLG data structure work on distributed memory computers[5]. Future releases of this library will include code and library calls that incorporate this class of computers. These functions have been successfully linked with FORTRAN and C language code. Ada and other languages have not been tried but will be tried in the future releases. Details about language implementation can be found in Appendix A for FORTRAN and Appendix B for C. How one constructs an object record for each language is shown there.

II. 1. a. MLG_Shell_Sort

The routine MLG_Shell_Sort provides general purpose MLG generation from user prescribed data using a shell sort, (see [4]). The user prescribed data can be totally unsorted data, locally sorted data sets (e.g. merged MLG data sets), or a slightly perturbed MLG. To prescribe the degree of orderliness, the user specifies a value for Submlgs for each sort axis. The value of Submlgs is used by the shell sort to divide the data into subdivisions. For totally unordered data, the value should be log2(Axis_Length), and for slightly perturbed data, the value should be 0 indicating no subdivision. For locally sorted data, a value between 0 and log2(Axis_Length) should be used depending on how the data was merged. A FORTRAN program would call this integer function after including type statements as in the following example:
   
       integer status
       integer MLG_Shell_Sort
          .
          .
          .
       status = MLG_Shell_Sort 
     &       (Data_type, Ndim, swap_length, record_length, Axis_Length, 
     &        MLG, vrule, Nouter_its, Axis_Order, Sort_direction, 
     &        Submlgs, Ninner_its, Axis_Swaps)

Negative return value of status indicates an error, (see Section II.4). A positive return value of status is the actual number of outer iterations taken to reach the final MLG. The first six arguments describe the MLG structure and the object data. On all machines except Cray computers, the argument Data_type indicates whether the data records are stored as (8 bit) bytes or four-byte (32 bit) words. If the entire record consists of eight-byte (64 bit) data (e. g. double precision floating point numbers) the MLG data should be passed to the routine in terms of the equivalent record length in four-byte words and with Data_type=1. In general, the routine is more efficient using records stored as word data than the equivalent records stored as byte data. For Cray computers, byte data is currently unavailable and Data_type=1 means eight-byte (64 bit) words. The argument vrule (vector rule) is an external routine which describes the rules to use for each axis of the MLG. The next five arguments describe the type of sorting procedure to use. The argument Axis_Swaps provides the user with a diagnostic. The argument Submlg represents the orderliness of the data and the amount of sorting required along an axis. Submlgs takes values of zero for slightly perturbed data to log2(Axis_Length)-1 for unsorted data.

Input:

integer Data_type
= 0, byte data
= 1, word data
integer Ndim
Dimension of the MLG (Number of sort axes)
integer record_length
Maximum items per record in units of Data_type (first dimension of array MLG)
integer swap_length
The number of items in each record which determine the MLG in units of Data_type (How much of the first dimension of MLG is actually used)
integer Axis_Length(Ndim)
The length of each MLG axis, the product of which is the number of records in the MLG (Npoints)
user defined type MLG(record_length, Npoints)
Array of Npoints data records to be organized as an MLG.
external vrule(axis, MLGleft, MLGright, record_length, Nlow, skip, Nhigh, result)
External subroutine for FORTRAN and a void function for C of eight arguments which compares two arrays of records from the MLG. See Section II.1.e for a detailed description about this routine.
integer Nouter_its
Maximum number of outer iterations (passes) over each MLG axis
integer Axis_Order(Ndim)
The order in which the axes are sorted. (Typically a permitation of [1, ..., Ndim])
integer Sort_direction(Ndim)
= 0, traverse the axis downward
= 1, alternate directions, starting downward
= 2, traverse the axis upward
= 3, alternate directions, starting upward
integer Submlgs(Ndim)
Number of power of two records in subdivisions for each axis
= 0 gives a bubble sort
= log2(Axis_Length)-1 gives a Shell sort
integer Ninner_its(Ndim)
Maximum number of inner iterations you are willing to do, typically 1 or 2. Ninner_its(i) > Axis_Length(i) will guarantee that axis "i" is completely sorted before moving to the next axis.
Output:
user defined type MLG(record_length, Npoints)
Sorted array of data records in MLG order
integer Axis_Swaps(Ndim)
The number of individual record swaps done on each axis
Logical Structure of MLG_Shell_Sort

The shell sort repeatedly divides the data into coarser subdivisions and sorts the data within the subdivisions. The number of subdivisions determines the locality of the swapping process. Zero subdivisions is a completely local bubble swap. A log2(Axis_Length) subdivisions would give the most global swapping process. All iterative loops terminate if no swaps were detected for that iteration. The innermost loop is one pass of the basic bubble sort for that axis at the current subdivision level. The number of inner iterations determines how completely the subdivision of the current sort axis is sorted before moving on to the next subdivision. The number of subdivisions is decremented by one with each pass over the axis on the belief that no additional global sorts at the coarsest level desired is needed. As each axis is sorted, it may generate unsorted data on other axes. Hence, an outer iteration to perform a swapping operation on each axis again is needed. Rotating the axes for each outer iteration minimizes any bias the swapping algorithm may have toward a given axis. One can recover the original MLG algorithm of Boris[1] by choosing zero subdivisions, one inner iteration and alternating swap directions for all axes and by choosing an axis order which does each axis in succession starting with the first.

The innermost loop of the routine MLG_Shell_Sort calls either MLG_Swap_Words or MLG_Swap_Bytes depending on the type of data indicated. Both MLG_Swap routines use the same algorithm to perform a single bubble sort pass over the axis and subdivision indicated.


II.1.a MLG_Swap_Xxxxs

The MLG_Swap_Xxxxs, where Xxxx stands for either Word or Byte, provides a single bubble sort pass over the axis and subdivision indicated. These routines are not generally called by a user. A FORTRAN program would call this integer function after including type statements as in the following example:
       integer status
       integer MLG_Swap_Xxxxs
         .
         .
         .
       status = MLG_Swap_Xxxxs
     &        (Ndim, record_length, swap_length, Axis_Length, MLG, 
     &         vrule,axis, offset)

Negative return value of status indicates an error, (see Section II.4). A positive return value of status is the number of swaps performed.

Input:

integer Ndim
Dimension of the MLG (Number of sort axes)
integer record_length
Maximum items per record in units of Data_type (first dimension of array MLG)
integer swap_length
The number of items in each record which determine the MLG in units of Data_type (How much of the first dimension of MLG is actually used)
integer Axis_Length(Ndim)
The length of each MLG axis, the product of which is the number of records in the MLG (Npoints)
user defined type MLG(record_length, Npoints)
Array of Npoints data records to be organized as an MLG.
external vrule(axis, MLGleft, MLGright, record_length, Nlow, skip, Nhigh, result)
External subroutine for FORTRAN and a void function for C of eight arguments which compares two arrays of records from the MLG. See Section II.1.e for a detailed description about this routine.
integer axis
The axis on which to perform the swap
integer offset
The comparison distance along the sort axis between records
If offset < 0, the axis is traversed downward
If offset > 0, the axis is traversed upward
Output:
user defined type MLG(record_length, Npoints)
The newly arranged data
Logical Structure of MLG_Swap_Xxxxs

The MLG_Swap_Xxxxs routines treat the Npoints data records of the MLG array of objects as a three dimensional array with dimensions (Nlow, Axis_Length, Nhigh). The dimension Nlow is the product of all axis lengths below the current sort axis and Nhigh is the product of all axis lengths above the current sort axis. In the previous 3D spatial MLG (see Section I.3), when sorting axis 1, Nlow = 1, Axis_Length = Nx, and Nhigh = NyNz; when sorting axis 2, Nlow = Nx, Axis_Length = Ny, and Nhigh = Nz; and when sorting axis 3, Nlow = NxNy, Axis_Length = Nz, and Nhigh = 1. Since the only dependencies are along the current sort axis, optimal vectorization can be achieved making the innermost loops over the MLG data involve the Nlow and Nhigh dimensions. In effect, the routine carries out the comparison and swaps on Ndim-1 dimensional hyper-planes of the MLG.

Evaluation of the rule is performed by calling the user supplied routine vrule. The efficiency of MLG_swap algorithm is highly influenced by the efficiency of this user supplied routine. The MLG_library includes a "floating-point-less-than" rule for the special case when each axis is associated with the corresponding word in the MLG record. A listing of this routine is provided in Appendix D.2. The output of routine vrule is an array of logicals which describe whether the rule was satisfied for each comparison. A search of this array is conducted to find all "false" entries. The MLG_Swap routines use the efficient search routines WHENEQ on the Cray and ilsteq for the Convex and the FORTRAN equivalent loop for other scalar architectures. The loop over all out-of-MLG-order links can then be vectorized using gather/scatter techniques.

This algorithm was tested on 4096 particles moving randomly in a cubical box. A 16 x 16 x 16 MLG was used. Timings indicate that the above routine was 2.5 times faster than the "traditional" if-test vector switch and swap on a Convex C210. The difference between using the Convex veclib routine ilsteq and the FORTRAN equivalent was approximately 20%. Boris[1] also suggested a vectorized algorithm which involves several floating point operations on the object positions to obtain the equivalent of a vector switch. That algorithm relies on the fact that a spatial MLG is determined by floating point data. In general, floating point data may not even be involved, hence that algorithm is not general.

The MLG_Swap routines achieve their vectorization by processing all lines of data parallel to the sort axis simultaneously. This means that these algorithms achieve no vectorization for a one-dimensional MLG, since there is only one line of data. It is still possible to achieve some vectorization by using an even-odd sorting pass. The even-odd sorting pass requires additional scratch space in multi-dimensions, but increases overall vector length. Care also must be taken in multi-dimensions to avoid "psuedo"-links between the end of one parallel line and the beginning of the next parallel line. This even-odd algorithm has not been implemented as part of this package, but has shown promise for a spatial MLG on the CM-2[5].


II.1.c. MLG_Index_Shell_Sort

Frequently only a few items in the object determine the MLG. Boris[1] noted that efficiency can be obtained by only swapping those items in the objects which determine the MLG. Once MLG order has been obtained the remaining data items comprising the object can be reorganized into the MLG ordering ready determine. The MLG_library contains five routines associated with this idea. The routine MLG_Index_Shell_Sort performs the similar function as MLG_Shell_Sort, (II.1.a) except that only those items in the object which determine the MLG are swapped. The user prescribed data can be totally unsorted data, locally sorted data sets (e.g. merged MLG data sets), and a slightly perturbed MLG. A FORTRAN program would call this integer function after including type statements as in the following example:
        integer status
        integer MLG_Index_Shell_Sort
            .
            .
            .
        status = MLG_Index_Shell_Sort
     &       (Data_type, Ndim, swap_length, swap_data,
     &        record_length, Axis_Length, MLG,  
     &        vrule, Nouter_its, Axis_Order, Sort_direction, Submlgs, 
     &        Ninner_its, Axis_Swaps, MLG_index)

This routine has two additional arguments, namely, swap_data and MLG_index. The argument swap_data indicates which items in the object determine the MLG. A return argument MLG_index indicates where the current object was originally located prior to sorting. The argument Submlg represents the orderliness of the data and the amount of sorting required along a axis. Submlgs takes values of zero for slightly perturbed data to log2 Axis_Length - 1 for unsorted data, (see II.1.a). A negative return value of status indicates an error, (see II.4). A positive return value of status is the actual number of outer iterations taken to reach the final MLG. On all machines except Cray computers, the argument Data_type indicates whether the data records are stored as (8 bit) bytes or four-byte (32 bit) words. If the entire record consists of eight-byte (64 bit) data (e.g. double precision floating point numbers) the MLG data should be passed to the routine in terms of the equivalent record length in four-byte words and with Data_type=1. In general, the routine is more efficient using records stored as word data than the equivalent records stored as byte data. For Cray computers, byte data is currently unavailable and Data_type=1 means eight-byte (64 bit) words.

Input:

integer Data_type
= 0, byte data
= 1, word data
integer Ndim
Dimension of the MLG (Number of sort axes)
integer record_length
Maximum items per record in units of Data_type (first dimension of array MLG)
integer swap_length
The number of items in each record which determine the MLG in units of Data_type (How much of the first dimension of MLG is actually used)
integer swap_data(swap_length)
The particular items in each record which determine the MLG
integer Axis_Length(Ndim)
The length of each MLG axis, the product of which is the number of records in the MLG (Npoints)
user defined type MLG(record_length, Npoints)
Array of Npoints data records to be organized as an MLG.
external vrule(axis, MLGleft, MLGright, record_length, Nlow, skip, Nhigh, result)
External subroutine for FORTRAN and a void function for C of eight arguments which compares two arrays of records from the MLG. See Section II.1.e for a detailed description about this routine.
integer Nouter_its
Maximum number of outer iterations (passes) over each MLG axis
integer Axis_Order(Ndim)
The order in which the axes are sorted. (Typically a permitation of [1, ..., Ndim])
integer Sort_direction(Ndim)
= 0, traverse the axis downward
= 1, alternate directions, starting downward
= 2, traverse the axis upward
= 3, alternate directions, starting upward
integer Submlgs(Ndim)
Number of power of two records in subdivisions for each axis
= 0 gives a bubble sort
= log2(Axis_Length)-1 gives a Shell sort
integer Ninner_its(Ndim)
Maximum number of inner iterations you are willing to do, typically 1 or 2. Ninner_its(i) > Axis_Length(i) will guarantee that axis "i" is completely sorted before moving to the next axis.
Output:
user defined type MLG(record_length, Npoints)
Sorted array of data records in MLG order
integer Axis_Swaps(Ndim)
The number of individual record swaps done on each axis
,em>integer MLG_index(Npoints)
The original location in the array MLG where the remaining data items are located.

The logical structure of MLG_Index_Shell_Sort is similar to MLG_Shell_Sort (II.1.a}) The major difference being the addition of the array MLG_Index to MLG_Index_Shell_Sort and the index sort. As with MLG_Shell_Sort, a set of companion routines, MLG_Index_Swap_Word and MLG_Index_Swap_Bytes, are called by MLG_Index_Shell_Sort to perform the index swap depending on the type of data indicated. The companion routines swap the required MLG data items and MLG_Index. These companion routines are generally not called by a user. If the user wishes to directly call these companion routines, Section II.1.b, which describes the companion routines for MLG_Shell_Sort, MLG_Swap_Xxxxs, applies to MLG_Index_Swap_Word and MLG_Index_Swap_Bytes. Similar to the differences between MLG_Index_Shell_Sort and MLG_Shell_Sort, the array MLG_Index must be added to the calling statement. Since MLG_Index_Shell_Sort only sorts the required MLG data items and MLG_Index, the remaining data items must be move from their old location to their new location within the MLG using MLG_Index. Two companion routines, MLG_Index_Move_Words or MLG_Index_Move_Bytes, move the remaining data items depending on the type of data. These routines must called following MLG_Index_Shell_Sort to complete the new MLG and are described in the next section.


II.1.d. MLG_Index_Move_Xxxxs

In order to get a complete update of the MLG, MLG_Index_Shell_Sort must be followed by a call to either MLG_Index_Move_Words or MLG_Index_Move_Bytes depending on the type of data in the object. A FORTRAN program would call this integer function after including type statements as in the following example:
        integer status
        integer MLG_Index_Move_Xxxxs
           .
           .
           .
        status = MLG_Index_Move_Xxxxs
     &           (swap_length, swap_data, record_length, Npoints, MLG, 
     &            MLG_index)

Xxxx means either Word or Byte depending on which is appropriate. A negative return value of status indicates an error, (see II.4), while a zero return value of status indicates success.

Input:

integer swap_length
The number of items in each record which are left to be swapped
integer swap_data(swap_length)
The particular items in each record to swap
integer record_length
Maximum items per record in units of Data_type (first dimension of array MLG)
integer Npoints
The Number of records in the MLG
user defined type MLG(record_length, Npoints)
Array of Npoints data records to be organized as an MLG.
integer MLG_index(Npoints) Pointer from where to get data
Output:
user defined type MLG(record_length, Npoints)
The newly arranged data
Logical Structure of MLG_Index_Move_Xxxxs

The argument swap_data here indicates those items in the object that still need to be updated. The algorithm for MLG_Index_Move_Xxxxs simply involves gather/scatter operations. In general a global gather operation using the MLG_index array can complete the MLG swap. However, this approach moves every object's associated data whether the object moved or not. Therefore if only a few objects have changed locations in the MLG data structure, it pays to gather those few objects and scatter them back into the MLG data structure to complete the swap.


II.1.e. The "rule" routine

To provide the greatest flexibility in the construction of an MLG, the MLG library allows the user to define a rule for sorting the data. This rule is provided by the user in the form of an external subroutine in FORTRAN or a void function in C. This routine has the following form in FORTRAN 77:

       subroutine vrule(axis, MLGleft, MLGright, record_length, 
     &                  Nlow, skip, Nhigh, result)

       integer    axis, record_length, Nlow, skip, Nhigh
       <MLGtype>  MLGleft (record_length,Nlow,skip,Nhigh)
       <MLGtype>  MLGright(record_length,Nlow,skip,Nhigh)
       logical    result(Nlow,Nhigh)

where the arguments are:

Input:
integer axis
The current sort axis
user defined type MLGleft(record_length,Nlow,skip,Nhigh)
The current lower or left array on axis "axis"
user defined type MLGright(record_length,Nlow,skip,Nhigh)
The current upper or right array on axis "axis"
integer record_length
The length of the MLG record in units of the user's declared MLG data type
integer Nlow
The number of records in the MLG dimensions below "axis"
integer skip
The length of "axis"
integer Nhigh
The number of records in the MLG dimensions above "axis"

Output:
logical result(Nlow,Nhigh)
The results of the Nlow x Nhigh comparisons

For programming languages that support the STRUCTURE data type, the argument record_length may be superfluous, but still must be in the argument list. In fact with respect to the first subscript, the arguments MLGleft and MLGright should be declared in the same way that the user's main program declares his array of records to be sorted.

The last three subscripts for MLGleft and MLGright effectively partition the MLG records into hyperplanes normal to the current swap axis. The user is expected to return a point-by-point comparison of the first hyperplane of MLGleft, i.e. MLGleft(i,1,k) with the first hyperplane of MLGright, i.e. MLGright(i,1,k) and return the point-by-point result in array result. The array result should contain the value .true. if the rule is satisfied and .false. if the rule is not satisfied for the axis under consideration.

Appendix C.2 shows a FORTRAN 77 implementation of the "floating-point-less-than" rule used by routine EZMLG_sort.



II.2. MLG sorting made easy

The typical MLG application does not require complete control of the sorting process. The MLG library includes two routines, EZMLG_Sort and MLG_Sort_w which assume the following defaults for the sorting process. Nouter is chosen to be the sum of the axis lengths on the belief that in worst case a record must travel across the MLG data structure. The axes are taken in order from 1 to Ndim. For each axis, the sort direction is taken as alternating starting upward; there are no subdivisions of any axis (i.e. local swaps only); and only one inner iteration will be performed. This is the local algorithm of Boris. The records are assumed to consist of word data. Finally, these routines assume the user is not interested in the swap diagnostics. The approach used by both routines is to use MLG_Shell_Sort for a MLG whose record length is less than or equal to Ndim+1 in length, but use MLG_Index_Shell_Sort followed by a call to MLG_Index_Move_Words for longer record lengths.

II.2.a. EZMLG_Sort

The routine EZMLG_Sort adds the assumptions that the first Ndim words in the record are floating point numbers which are associated with the corresponding MLG axis and each axis is to be sorted in increasing order. These assumptions are typical of particle simulations. If these assumptions are too restrictive, the user should study MLG_Sort_w (Section II.2.b). A FORTRAN program would call EZMLG_Sort after including type statements as in the following example:

        integer status
        integer EZMLG_Sort
           .
           .
           .
        status = EZMLG_Sort (Ndim, record_length, Axis_Length, MLG)


Input:
integer Ndimn
Dimension of the MLG (Number of sort axes)
integer record_length
Maximum number of words per record (first dimension of array MLG)
integer Axis_Length(Ndim)
The length of each MLG axis, the product of which is the number of records in the MLG (Npoints)
user defined type MLG(record_length, Npoints) Array of Npoints data records to be organized as an MLG.

Output:
user defined type MLG(record_length, Npoints)
The fully sorted array of data

II.2.b. MLG_Sort_w

EZMLG_Sort restrictes the MLG axis rule to floating point numbers associated with the axis sorted in increasing order. The routine MLG_Sort_w, while requiring word data like EZMLG_Sort, allows control over the MLG axis rules. Three additional arguments over those required by the routine EZMLG_Sort are needed. A FORTRAN program would call this integer function after including type statements as in the following example:

        integer status
        integer MLG_Sort_w
            .
            .
            .
        status = MLG_Sort_w 
     &           (Ndim, record_length, swap_length, swap_data, 
     &            Axis_Length, MLG, vrule)


Input:
integer Ndim
Dimension of the MLG (Number of sort axes)
integer record_length
Maximum number of words per record (first dimension of array MLG)
integer swap_length
The number of items in each record which determine the MLG in words
integer swap_data(swap_length)
The particular items in each record which determine the MLG
integer Axis_Length(Ndim)
The length of each MLG axis, the product of which is the number of records in the MLG (Npoints)
user defined typeMLG(record_length, Npoints)}
Array of Npoints data records to be organized as an MLG.
external vrule(axis, MLGleft, MLGright, record_length, Nlow, skip, Nhigh, result)
external subroutine for FORTRAN and a void function for C of eight arguments which compares two arrays. Section II.5 has a detailed description about this routine.

Output:
user defined typeMLG(record_length, Npoints)}
Fully sorted array of data



II.3. Finding a better MLG



II.3.a. MLG_Anneal

As mentioned earlier, the previous sorting algorithms do not necessarily produce the "best" MLG. The MLG_library provides the routine MLG_anneal as a tool to improve the MLG quality. This routine includes a call to MLG_Shell_Sort and therefore require the arguments listed with that routine, (see Section II.1.a.). A FORTRAN program would call this integer function after including type statements as in the following example:

       integer status
       integer MLG_Anneal
           .
           .
           .
       status = MLG_Anneal
     &          (Data_type, Ndim, swap_length, swap_data,
     &           record_length, Axis_Length, MLG,  
     &           vrule, Nouter_its, Axis_Order, Sort_direction, Submlgs, 
     &           Ninner_its, Axis_Swaps, MLG_eval, Npass)

Additional arguments above those needed for the Shell sort are needed to determine MLG quality. In particular, a routine for the evaluation of the current MLG is needed. The library includes the routine MLG_eval_space, an evaluation routine based on link lengths. The complete argument description is:

Input:

integer Data_type
= 0, byte data
= 1, word data
integer Ndim
Dimension of the MLG (Number of sort axes)
integer record_length
Maximum items per record in units of Data_type (first dimension of array MLG)
integer swap_length
The number of items in each record which determine the MLG in units of Data_type
integer swap_data(swap_length)
The particular items in each record which determine the MLG
integer Axis_Length(Ndim)
The length of each MLG axis, the product of which is the number of records in the MLG (Npoints)
user defined type MLG(record_length, Npoints)
Array of Npoints data records to be organized as an MLG.
external vrule(axis, MLGleft, MLGright, record_length, Nlow, skip, Nhigh, result)
External subroutine for FORTRAN and a void function for C of eight arguments which compares two arrays of records from the MLG. See Section II.1.e for a detailed description about this routine.
integer Nouter_its
Maximum number of outer iterations (passes) over each MLG axis
integer Axis_Order(Ndim)
The order in which the axes are sorted. (Typically a permitation of [1, ..., Ndim])
integer Sort_direction(Ndim)
= 0, traverse the axis downward
= 1, alternate directions, starting downward
= 2, traverse the axis upward
= 3, alternate directions, starting upward
integer Submlgs(Ndim)
Number of power of two records in subdivisions for each axis
= 0 gives a bubble sort
= log2(Axis_Length)-1 gives a Shell sort
integer Ninner_its(Ndim)
Maximum number of inner iterations you are willing to do, typically 1 or 2. Ninner_its(i) > Axis_Length(i) will guarantee that axis "i" is completely sorted before moving to the next axis.
external MLG_eval (Ndim, record_length, Axis_Length, MLG, link_eval)
External real function of 5 arguments which evaluates an MLG and each of the axis links. The return value is a positive floating point number. Smaller values indicate better MLG structure. The argument link_eval(Ndim,Npoint) returns a real number between 0.0 (a bad link) and 1.0 (a good link) which evaluates all links in the MLG structure. The first dimension of link_eval is over the axes.
integer Npass
The maximum number of annealing passes
Output:
user defined type MLG(record_length, Npoints)
Sorted array of data records in MLG order
integer Axis_Swaps(Ndim)
The number of individual record swaps done on each axis
Logical Structure of MLG_Anneal

I.4. Error Reports

In all the MLG library routines, negative return values of status indicate an error. The following table is a list of all negative return values and their meaning.

  - 1  Ndim is non-positive.

  - 3  A non-positive axis was detected. Check values of Axis_Order.

  - 4  An axis > Ndim was detected. Check values of Axis_Order.

  - 5  A non-positive record item was detected. Check swap_data.

  - 6  A record item > record_length was detected. Check swap_data.

  -10  A non-positive axis length was detected.

  -18  A bad sort direction was detected.

  -21  Insufficient dynamic memory available for this problem.



References

  1. Boris, J. A Vectorized "Near Neighbors" Algorithm of Order N Using a Monotonic Logical Grid, J. of Computational Physics 66, 1-20, (1986).
  2. Picone, J. M., Lambrakos, S. G., and Boris, J. P., Timing Analysis of the Montonic Logical Grid for Many-body Dynamics, SIAM J. Sci. Stat Comput. 11, No. 2, 368-388, (March 1990).
  3. Lambrakos, S. G., and Boris, J. P., "Geometric Properties of the Monotonic Logical Grid Algorithms for Near Neighbor Calculations", NRL Memorandum Report 5761, April 24, 1986.
  4. Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vettering, W. T., Numerical Recipes, Cambridge University Press Cambridge (1989), pp. 244-245.
  5. Kolbe, R. L., Boris, J. P. and Picone, J. M., "Battle Engagement Area Simulator/Tracker", NRL Memorandum Report 6705, Oct. 1990.


Appendices


Appendix A. Calling the package from FORTRAN

Calling the MLG library from the FORTRAN programming language should not cause any problems, since the MLG library was written in FORTRAN. The MLG library requires contiguous object records. This requirement can be a problem for a object record with more than one type of data (e.g. integer, real, and character) in the record. FORTRAN 90 has a structure declaration similar to the C language with which one can generate contiguous records. FORTRAN 77 will require certain data declarations to construct the continuous object records.

  1        integer  n, n4
  2         parameter (n=5, n4=4*n)
  3        real     object_record (n, nx, ny, nz)
  4        integer  integer_record (n, nx, ny, nz)
  5        byte     byte_record (4*n, nx, ny, nz)
  6
  7        equilvalence (object_record, integer_record)
  8        equilvalence (object_record, byte_record)   
  9
 10        integer  i, j, k
 11        integer  Axis_length(3)
 11
 12        do k=1,nz
 13          do j=1,ny
 14            do i=1,nx
 15              object_record(1,i,j,k) = float(i-1)
 16              object_record(2,i,j,k) = float(j-1)
 17              object_record(3,i,j,k) = float(k-1)
 18              integer_record(4,i,j,k) = i+nx*(j-1+ny*(k-1))
 19              byte_record(17,i,j,k) = mod(i,128)
 20              byte_record(18,i,j,k) = mod(j,128)
 21              byte_record(19,i,j,k) = mod(k,128)
 22              byte_record(20,i,j,k) = 1
 23            end do
 24          end do
 25        end do
 26        byte_record(20,4,2,2) = 0
 27        axis_length(1) = nx
 28        axis_length(1) = ny
 29        axis_length(1) = nz
 30
 31        call ezmlg_sort (3, n, axis_length, object_record)
 

In the FORTRAN code fragment just presented, object records with mixed types are generated. The initial declaration of OBJECT_record is for a real variable. This declaration creates NX x NY x NZ object records of N words. In order to include integer and byte data in this object record, additional variables are declared named INTEGER_record and BYTE_record. These variables are sized to match OBJECT_record and then equivalenced to OBJECT_record. It becomes the user responsibility to keep track of word boundaries and to properly load the data arrays.


Appendix B. Calling the package from C

In calling the MLG library from the C programming language, two differences between the way C handles computer memory and the way FORTRAN handles computer memory must be kept in mind. In C, array indices start at zero; whereas in FORTRAN, array indices start at one. The MLG library follows the FORTRAN conventions for array indices. Some of the routine arguments, such as swap_data, are arrays of indices into the MLG array of data. The values of these index arrays must use the FORTRAN convention. Hence, one might initialize swap_data with

      for(i=0; i<record_length; i++)
        swap_data[i] = i+1;

The language C stores arrays row-wise; FORTRAN stores arrays column-wise. The MLG library follows the FORTRAN conventions for array storage. The following C code fragment initializes an object data base.


  1 struct OBJECT_swap { float X, Y, Z; };
  2 struct OBJECT_data { float VX, VY, VZ; };
  3 struct OBJECTstruct { struct OBJECT_swap swap; struct OBJECT_data data; };
  4 EXTERN struct OBJECTstruct particles[NI][NJ][NK];
  5 float  x, y, z, dx, dy, dz;
  6 int    i, j, k, status, record_length, Axis_Length[3];
  7 int    ezmlg_sort_
  8         (int *Ndim, int *record_length, int *Axis_Length, 
  9             struct OBJECTstruct *particles);
 10
 11 for(i=0;i<NI;i++){
 12    x = (float) i;
 13    dx = x / NI;
 14    for(j=0;j<NJ;j++){
 15       y = (float) j;
 16       dy = y / NJ;
 17       for(k=0;k<NK;k++){
 18          z = (float) k;
 19          dz = z / NK;
 20          particles[i][j][k].swap.X = x + 0.1*dy*dz;
 21          particles[i][j][k].swap.Y = y + 0.1*dx*dz;
 22          particles[i][j][k].swap.Z = z + 0.1*dx*dy;
 23       }
 24    }
 25 }
 26
 27 Ndim = 3;
 28 record_length = 6;
 29 Axis_Length[0] = NI;
 30 Axis_Length[1] = NJ;
 31 Axis_Length[2] = NK;
 32 status = ezmlg_sort_
 33         (*Ndim, *record_length, *Axis_Length, *particles);

The data is originally in MLG order with the "floating-point-less-than" rule if we associate Z with axis 1, Y with axis 2 and X with axis 3. If a call to EZMLG_Sort is made, massive swapping will occur since EZMLG_Sort will associate X with axis 1, Y with axis 2, and Z with axis 3. There are three ways to fix this problem. The first way is to interchange the indices i and k in lines 20-22 from [i][j][k] to [k][j][i] and the declaration in line 4 from [NI][NJ][NK] to [NK][NJ][NI]. This in effect stores the arrays column-wise so that the words in the record are associated with the axes in the way that EZMLG_Sort expects. A second approach is to replace "axis" in lines 59 and 67 of the rule of Appendix C.2 with "4-axis". This has the effect of making the MLG library associate the axes with the words in the record in the same way as the C code has initialized the arrays. A third approach is to reorder X, Y, Z in the structure on line 1 of the code to Z, Y, X and change the definition of Axis_Length in lines 29-31. The first and third approaches have the advantage that one can use the "floating-point-less-than" rule provided by the package.

Since the MLG library is written in FORTRAN and FORTRAN arguments are passed by address, all arguments provided must be pointers to memory.

Finally, the actual routine name one would use in the C program is currently machine dependent. On the the Convex and Sun workstations, the FORTRAN compiler converts all routine names to lower case and pre- and post-appends an underscore to the routine name, while the C compiler only pre-appends an underscore. Hence, to call a MLG library routine on these machines one look like the above example. For other machine architectures one should consult the appropriate manuals.


Appendix C. Examples


Appendix C.1. MLG link evaluation


  1 C**********************************************************************
  2 c
  3 c               MLG  evaluation
  4 c
  5 C**********************************************************************
  6 c
  7           real function MLG_eval_space
  8      &              (Ndim, words, Axis_Length, MLG, link_eval)
  9 c
 10 c     Evaluate a MLG and its links
 11 c
 12 c     Assumptions
 13 c
 14 c     1)  MLG record consists of 4-byte word data
 15 c     2)  The first "Ndim" words are the spatial coordinates of points
 16 c
 17 c
 18 c     Arguments:
 19 c
 20 c       Input:
 21 c
 22 c         Ndim      The dimension of the MLG
 23 c         words     The number of words per record in the MLG
 24 c         Axis_Length(Ndim)
 25 c                   The length of each axis in the MLG
 26 c         MLG(words,Npoint)
 27 c                   The data (here treated as real*4)
 28 c
 29 c       Output:
 30 c
 31 c         link_eval(Npoint,Ndim)
 32 c                   The dot product of the link with the MLG axis
 33 c
 34 c         Return value
 35 c                   The sum of all line lengths in the MLG
 36 c
 37 c
 38           Implicit   None
 39 c
 40 c      Subroutine Arguments
 41 c
 42           Integer    Ndim,   words
 43           integer    Axis_Length(Ndim)
 44           real       MLG(words,1)
 45           real       link_eval(1)
 46 c
 47 c      Local declarations
 48 c
 49           integer    n,      axis,   kl,     jk,     ijk,     aijk,   np
 50           integer    Nlow,   Nhigh,  Npoints
 51           real       dxdi
 52           real       EPS
 53           data       EPS / 1.0e-30 /
 54 c
 55 c
 56           Npoints = 1
 57           do n=1,Ndim
 58             Npoints = Npoints*Axis_Length(n)
 59           end do
 60 c
 61           do aijk=1,Ndim*Npoints
 62             link_eval(aijk) = 0.0
 63           end do
 64 c
 65 c      Compute all line length components
 66 c
 67           do n=1,Ndim
 68             Nlow  = 1
 69             do axis=1,Ndim
 70               Nhigh = Nlow*Axis_Length(axis)
 71               if(axis .ne. n)  then
 72                 np = (axis-1)*Npoints
 73                 do kl=0,Npoints-Nhigh,Nhigh
 74                   do jk=1,Nhigh-Nlow
 75                     ijk = jk + kl
 76                     aijk = np + ijk
 77                     dxdi = MLG(n,ijk+Nlow) - MLG(n,ijk)
 78                     link_eval(aijk) = link_eval(aijk) + dxdi**2
 79                   end do
 80                 end do
 81               end if
 82               Nlow = Nhigh
 83             end do
 84           end do
 85 c
 86           MLG_eval_space = 0.0
 87           do n=1,Ndim
 88             Nlow  = 1
 89             do axis=1,n-1
 90               Nlow =  Nlow*Axis_Length(axis)
 91             end do
 92             Nhigh = Nlow*Axis_Length(n)
 93             np = (n-1)*Npoints
 94             do kl=0,Npoints-Nhigh,Nhigh
 95               do jk=1,Nhigh-Nlow
 96                 ijk = jk + kl
 97                 aijk = np + ijk
 98                 dxdi = MLG(n,ijk+Nlow) - MLG(n,ijk)
 99                 link_eval(aijk) = sqrt(dxdi + link_eval(aijk))
100                 MLG_eval_space = MLG_eval_space + link_eval(aijk)
101                 link_eval(aijk) = dxdi/(link_eval(aijk)+EPS)
102               end do
103             end do
104           end do
105 c
106           return
107           end
108 c
109 C**********************************************************************


Appendix C.2. MLG "floating-point-less-than" rule

This sample rule compares two hyperplanes of the MLG associating the first word of the MLG with the first axis, the second word with the second axis, etc. Each axis is to be sorted from lowest to highest value. The if test at line 55 is included to improve vector efficiency by making the larger of Nlow or min(Nhigh, 8) the inner loop. This trade-off between vector length and memory stride was generated by experimentation on a Convex C210. The results of the comparisons are stored in array dont_swap_if.

  1 C*************************************************************************
  2 C
  3 C                 vectorizing MLG swap rules
  4 C
  5 C*************************************************************************
  6 c
  7           subroutine MLG_swap_rule_flt 
  8      &          (axis, MLGleft, MLGright, words, Nlow, skip, Nhigh, 
  9      &           dont_swap_if)
 10 c
 11 c    This routine compares two records of the MLG for i=1,Nlow,k=1,Nhigh
 12 c    This rule assumes 
 13 c          1)  word aligned real*4 
 14 c          2)  the axis variables appear in order in the record
 15 c
 16 c    Arguments:
 17 c
 18 c      Input:
 19 c
 20 c        axis     The axis to which to apply the rule
 21 c        MLGleft(words,Nlow,skip,Nhigh)
 22 c                 The current lower record on axis "axis"
 23 c        MLGright(words,Nlow,skip,Nhigh)
 24 c                 The current higher record on axis "axis"
 25 c        words    The number of words in a record
 26 c        Nlow     The number of records in the MLG dimensions below "axis"
 27 c        skip     The length of "axis"
 28 c        Nhigh    The number of records in the MLG dimensions above "axis"
 29 c
 30 c      Output:
 31 c
 32 c        dont_swap_if(Nlow,Nhigh)
 33 c             dont_swap_if(i,k) = .true. if MLGleft(axis,i,1,k) <= 
 34 c                                                          MLGright(axis,i,1,k)
 35 c
 36           implicit  none
 37 c
 38 c    Subroutine arguments
 39 c
 40           integer axis
 41           integer words, Nlow, skip, Nhigh
 42           real    MLGleft(words,Nlow,skip,Nhigh)
 43           real    MLGright(words,Nlow,skip,Nhigh)
 44           logical dont_swap_if(Nlow,Nhigh)
 45 c
 46 c    Local declarations
 47 c
 48           integer  i, k
 49 c
 50 c    Employ the vector loop which gives maximum efficiency
 51 c       If Nlow is big enough (currently 8) always use contiguous memory
 52 c       loop as the vector loop, otherwise take the outer dimension for
 53 c       increased vector length.
 54 c
 55           if(Nlow .ge. min(Nhigh,8))  then
 56             do k=1,Nhigh
 57               do i=1,Nlow
 58                 dont_swap_if(i,k) = 
 59      &                 MLGleft(axis,i,1,k) .le. MLGright(axis,i,1,k)
 60               end do
 61             end do
 62           else
 63             do i=1,Nlow
 64 c$dir prefer_vector
 65               do k=1,Nhigh
 66                 dont_swap_if(i,k) = 
 67      &                 MLGleft(axis,i,1,k) .le. MLGright(axis,i,1,k)
 68               end do
 69             end do
 70           end if
 71 c
 72           return
 73           end
 74 c
 75 C**********************************************************************



Last modified: June 16, 1997

fyfe@lcp.nrl.navy.mil