The MLG data structure is a dynamic data structure that maps objects in physical space into computer memory with well-defined memory offsets for near neighbors. By mapping nearby objects into contiguous memory locations, the data structure allows the vectorization and parallelization of algorithms for the generation, maintenance and search of the data. Data sets can be partitioned and merged easily, which is especially important for distributed and parallel algorithms. Finally, small changes in the data, such as those that would occur for a spatial position during a particle dynamics simulation, do not trigger global changes in the MLG data structure. While the MLG data structure was originally developed for 3D spatial based data, its application can be expanded easily to incorporate other types of data in an arbitrary number of dimensions.
Motivation for the general definition of an MLG can be obtained by discussing
its original 3D spatial definition [1]. Suppose N = N_{x}
N_{y} N_{z}
objects (particles) are moving in space. Because of the form of N, it
is
natural to reference each object with three subscripts (i,j,k).
For lattice
particle simulations it is also natural to associate each subscript with a
physical dimension. Each of the objects will have three physical coordinates
X(i,j,k), Y(i,j,k), and Z(i,j,k) as well as other data,
(see figure 1.).
Mathematically, the N objects in three dimensional space with locations, X(i, j, k), Y(i, j, k), and Z(i, j, k), constitute an MLG if and only if
The eight near neighbors of object H in index space are objects A, C, D, E, F, K, J, and L in real space as shown in in Fig. 6. The dashed line in Fig. 6 encloses the near neighbors of H in index space (compare with Fig. 5).
In general, more than one MLG data structure can exist for a given set of data. The description for the construction of a 2D MLG data structure just given is simple and easy to follow, but may not produce the "best" MLG data structure for the problem. The choice as to which MLG is best must be based on the data, problem being solved and additional constraints. Section II.3 will describe an approach to generate a better MLG from the given data.
Finally, this construction algorithm is not optimal for objects which are already "almost" in order. Data of this type occur in molecular dynamics calculations where the relative particle (object) positions change slowly over the course of one time step for reasons of accuracy and stability. In this case, only a few particles are locally out of MLG order. The previous general construction algorithm can result in objects moving long distances in the MLG data structure before order is restored. Boris[1] suggests using a bubble sort algorithm in this case to restore MLG order. A few iterates of a bubble sort will recover MLG order in O(N) time.
On completely random data, the bubble sort is not optimal. The MLG package described in this report includes a Shell sort [4] option for those data sets that require non-local sorting. A Shell sort starts by doing a bubble sort on objects N/2 away. Once this is complete a bubble sort on object N/4 away is carried out. This is done recursively until the standard bubble sort is done on the entire data set. The advantage to this scheme is that objects require movement to the far end of the MLG get there in fewer swaps. The Shell sort is a worst case O(N^{3/2}) algorithm, but is shown to be O(N^{1.27}) for random data when N < 60,000[4]. Other sorting algorithms are possible for complete sorting in one direction, but have not been implemented in the MLG package at this writing. The Shell sort offers the advantage of starting the global sort at varying length scales. This has some advantages for parallel computing and merging data sets where the sets to be merged have already been sorted.
The general multi-dimensional MLG data structure is defined in the following way:
Note that each object may be composed of many data items. The MLG data structure is a simple, compact way of indexing and storing data describing a number of objects in computer memory. The data for each object (node) is stored at memory location j in the MLG data structure such that for each axis, m, the object ordering obeys a rule which is a function of the data describing two adjacent objects. In the above 3D spatial example the objects are in MLG order for axis 1 when [ Mathematical Expression]. Figure 4 illustrates the application of this rule in 2D space.
An MLG does not have to be associated with objects moving in physical space. As an example, chemical compounds with their related characteristics could be the objects. A variable for one dimension might be the toxicity of the compound. The monotonic rule would sort the compounds by increasing toxicity. Another example could be graduating college football players with the variables being weight, height, speed, test scores, etc. Now, each variable could be defined for a dimension, and the monotonic rule would sort weight, height and test scores in increasing order and speed in decreasing order. Using the MLG a player with the appropriate abilities could be found quickly.
The MLG axis rule need not be associated with a single datum within the record. For example, a set of 4096 objects are moving in space and the total momentum is an important characteristic for collisions. Therefore, a 4D MLG, (8, 8, 8, 8) organized by the 3D position, x, y, z and total momentum, P would seem natural. The function for the total momentum is [ Mathematical Expression ] where m is the mass, and u, v, w are the x, y, z velocities respectfully. The objects would have a data vector O associated with them. The object O has seven components, (x, y, z, u, v, w, m). The three rules, eq. (1), apply to the first three axes. The fourth rule would be [ P rule ] for the fourth axis where P is a function of the respective u, v, w and m. The objects with the largest momentum in a given spatial region could quickly be identified.
A more general data example would sort street intersections and associate accidents. The data associated with each intersection would be:
As an example of this process, consider the data shown in Fig. 9.
This example
has only 15 data points and a MLG (4, 4) can not be constructed. A
sixteenth data point could be added anywhere in space. The location chosen
affects the MLG data structure and the quality of the "near-neighbors"
ordering. How to chose this
location is data and problem dependent. As such the MLG library of routines
discussed in this report does not contain a routine for hole generation. In the
present example, an additional data point, C can be added as shown by
the circle in Fig. 10.
The concept of a "Hole" is apparent from Figures 11 and 12, where the additional data point is a hole in the matrix or data structure. The hole is not a "real" data point but a spacer in the data set to make the MLG possible or to control or improve the MLG structure. While the addition of this spacer appears very simple (for this example it was), the placement of the hole in an actual data set is not a nontrivial matter. The hole and its location takes on special importance for algorithm writing and near-neighbor location in systems where the objects change dynamically.
For a MLG that contains nonuniformly distributed objects, the addition of holes can assist the maintenance of near-neighbor relationships by providing additional degrees of freedom. Nonuniformly distributed objects can result in a MLG ordering that has objects far apart in physical space next to each other in MLG space. This ordering is caused by the warping of the MLG links by the high density region. This effect is similar to large gravitational objects in space which warp the surrounding space. The ideal placement of holes is to provide a uniform distribution of objects and to remove steep density gradients, thereby, reducing the warping of the MLG links.
In forming an MLG data structure, the objects and their data have been effectively mapped and gated into an array and adjacent computer memory as shown in Fig. 5. Objects close to H in the array are outlined by the dashed line in Fig. 6; these objects have index offsets of +1 or -1 from the H indices of (2,3). If we are looking for collisions or "strong interactions", these objects with offsets of +1 or -1 would be considered. To ensure that all interactions are found, all objects with offsets of +2 or -2 may be examined to see if they interact. The amount of offset required depends on the total number of objects, N, the local density of the objects, and the distance of paossible "interactions" between objects.
The MLG data structure has its greatest benefit when objects interact only
with some near neighbor objects. The calculation of object interactions can
obtain the long vector lengths needed for optimal vector speeds on vector
machines by making the inner loops over objects and any outer loops over
object offsets. Object data is accessed from memory with constant stride.
An
example of an algorithm for a vector machine is shown in
Appendix D. The purpose
of that routine is to evaluate the positional relationship between MLG objects
with offsets of +1 or -1. This particular evaluation determines the
directional cosine along each axis that is formed by an object and its + 1
offset along that axis. Figure 13 shows a small section of a two directional
MLG. The discussion that follows describes how the objects A and
B will be evaluated by that routine along the i axis.
The first block of do-loops, starting with line 67 and finishing with line 84, calculates the d_{j} distance, (dxdi in the code), and starts the calculation of the distance from A to B , (link_eval(aijk) in the code). The second block of do-loops, starting with line 87 and finishing with line 104, calculates the d_{i} distance, completes the distance from A to B, and calculates the directional cosine. In both blocks, the innermost loops, lines 73, 74 and lines 94, 95 are over the links in the MLG using the fact that there is a one-to-one correspondence between the links in the MLG and the objects which are not at the end of a MLG axis. Even scalar machines with virtual memory or data caches can take advantage of this constant stride in memory by having "contiguous" blocks of data in physical memory or cache. A similar kind of routine structure can be built to compute all interactions with given MLG memory offsets.
The distance calculation is common in particle simulations. The algorithm discussed in the previous paragraph must be modified if holes are part of the MLG data structure. One approach is to include an item in the object record which indicates whether the object is a hole or a real object. A convenient indicator or mask for particle simulations is the value 1 for a real object and 0 for a hole. When the interaction between the objects is calculated one merely multiplies by the indicator for each object before adding to the total force for each object. Again, how the user treats holes is problem dependent. Other creative ways to indicate holes may be required for other problems.
integer status integer MLG_Shell_Sort . . . status = MLG_Shell_Sort & (Data_type, Ndim, swap_length, record_length, Axis_Length, & MLG, vrule, Nouter_its, Axis_Order, Sort_direction, & Submlgs, Ninner_its, Axis_Swaps)
Negative return value of status indicates an error, (see Section II.4). A positive return value of status is the actual number of outer iterations taken to reach the final MLG. The first six arguments describe the MLG structure and the object data. On all machines except Cray computers, the argument Data_type indicates whether the data records are stored as (8 bit) bytes or four-byte (32 bit) words. If the entire record consists of eight-byte (64 bit) data (e. g. double precision floating point numbers) the MLG data should be passed to the routine in terms of the equivalent record length in four-byte words and with Data_type=1. In general, the routine is more efficient using records stored as word data than the equivalent records stored as byte data. For Cray computers, byte data is currently unavailable and Data_type=1 means eight-byte (64 bit) words. The argument vrule (vector rule) is an external routine which describes the rules to use for each axis of the MLG. The next five arguments describe the type of sorting procedure to use. The argument Axis_Swaps provides the user with a diagnostic. The argument Submlg represents the orderliness of the data and the amount of sorting required along an axis. Submlgs takes values of zero for slightly perturbed data to log_{2}(Axis_Length)-1 for unsorted data.
Input:
The shell sort repeatedly divides the data into coarser subdivisions and sorts the data within the subdivisions. The number of subdivisions determines the locality of the swapping process. Zero subdivisions is a completely local bubble swap. A log_{2}(Axis_Length) subdivisions would give the most global swapping process. All iterative loops terminate if no swaps were detected for that iteration. The innermost loop is one pass of the basic bubble sort for that axis at the current subdivision level. The number of inner iterations determines how completely the subdivision of the current sort axis is sorted before moving on to the next subdivision. The number of subdivisions is decremented by one with each pass over the axis on the belief that no additional global sorts at the coarsest level desired is needed. As each axis is sorted, it may generate unsorted data on other axes. Hence, an outer iteration to perform a swapping operation on each axis again is needed. Rotating the axes for each outer iteration minimizes any bias the swapping algorithm may have toward a given axis. One can recover the original MLG algorithm of Boris[1] by choosing zero subdivisions, one inner iteration and alternating swap directions for all axes and by choosing an axis order which does each axis in succession starting with the first.
The innermost loop of the routine MLG_Shell_Sort calls either MLG_Swap_Words or MLG_Swap_Bytes depending on the type of data indicated. Both MLG_Swap routines use the same algorithm to perform a single bubble sort pass over the axis and subdivision indicated.
integer status integer MLG_Swap_Xxxxs . . . status = MLG_Swap_Xxxxs & (Ndim, record_length, swap_length, Axis_Length, MLG, & vrule,axis, offset)
Negative return value of status indicates an error, (see Section II.4). A positive return value of status is the number of swaps performed.
Input:
The MLG_Swap_Xxxxs routines treat the Npoints data records of the MLG array of objects as a three dimensional array with dimensions (Nlow, Axis_Length, Nhigh). The dimension Nlow is the product of all axis lengths below the current sort axis and Nhigh is the product of all axis lengths above the current sort axis. In the previous 3D spatial MLG (see Section I.3), when sorting axis 1, Nlow = 1, Axis_Length = N_{x}, and Nhigh = N_{y}N_{z}; when sorting axis 2, Nlow = N_{x}, Axis_Length = N_{y}, and Nhigh = N_{z}; and when sorting axis 3, Nlow = N_{x}N_{y}, Axis_Length = N_{z}, and Nhigh = 1. Since the only dependencies are along the current sort axis, optimal vectorization can be achieved making the innermost loops over the MLG data involve the Nlow and Nhigh dimensions. In effect, the routine carries out the comparison and swaps on Ndim-1 dimensional hyper-planes of the MLG.
Evaluation of the rule is performed by calling the user supplied routine vrule. The efficiency of MLG_swap algorithm is highly influenced by the efficiency of this user supplied routine. The MLG_library includes a "floating-point-less-than" rule for the special case when each axis is associated with the corresponding word in the MLG record. A listing of this routine is provided in Appendix D.2. The output of routine vrule is an array of logicals which describe whether the rule was satisfied for each comparison. A search of this array is conducted to find all "false" entries. The MLG_Swap routines use the efficient search routines WHENEQ on the Cray and ilsteq for the Convex and the FORTRAN equivalent loop for other scalar architectures. The loop over all out-of-MLG-order links can then be vectorized using gather/scatter techniques.
This algorithm was tested on 4096 particles moving randomly in a cubical box. A 16 x 16 x 16 MLG was used. Timings indicate that the above routine was 2.5 times faster than the "traditional" if-test vector switch and swap on a Convex C210. The difference between using the Convex veclib routine ilsteq and the FORTRAN equivalent was approximately 20%. Boris[1] also suggested a vectorized algorithm which involves several floating point operations on the object positions to obtain the equivalent of a vector switch. That algorithm relies on the fact that a spatial MLG is determined by floating point data. In general, floating point data may not even be involved, hence that algorithm is not general.
The MLG_Swap routines achieve their vectorization by processing all lines of data parallel to the sort axis simultaneously. This means that these algorithms achieve no vectorization for a one-dimensional MLG, since there is only one line of data. It is still possible to achieve some vectorization by using an even-odd sorting pass. The even-odd sorting pass requires additional scratch space in multi-dimensions, but increases overall vector length. Care also must be taken in multi-dimensions to avoid "psuedo"-links between the end of one parallel line and the beginning of the next parallel line. This even-odd algorithm has not been implemented as part of this package, but has shown promise for a spatial MLG on the CM-2[5].
integer status integer MLG_Index_Shell_Sort . . . status = MLG_Index_Shell_Sort & (Data_type, Ndim, swap_length, swap_data, & record_length, Axis_Length, MLG, & vrule, Nouter_its, Axis_Order, Sort_direction, Submlgs, & Ninner_its, Axis_Swaps, MLG_index)
This routine has two additional arguments, namely, swap_data and MLG_index. The argument swap_data indicates which items in the object determine the MLG. A return argument MLG_index indicates where the current object was originally located prior to sorting. The argument Submlg represents the orderliness of the data and the amount of sorting required along a axis. Submlgs takes values of zero for slightly perturbed data to log_{2} Axis_Length - 1 for unsorted data, (see II.1.a). A negative return value of status indicates an error, (see II.4). A positive return value of status is the actual number of outer iterations taken to reach the final MLG. On all machines except Cray computers, the argument Data_type indicates whether the data records are stored as (8 bit) bytes or four-byte (32 bit) words. If the entire record consists of eight-byte (64 bit) data (e.g. double precision floating point numbers) the MLG data should be passed to the routine in terms of the equivalent record length in four-byte words and with Data_type=1. In general, the routine is more efficient using records stored as word data than the equivalent records stored as byte data. For Cray computers, byte data is currently unavailable and Data_type=1 means eight-byte (64 bit) words.
Input:
The logical structure of MLG_Index_Shell_Sort is similar to MLG_Shell_Sort (II.1.a}) The major difference being the addition of the array MLG_Index to MLG_Index_Shell_Sort and the index sort. As with MLG_Shell_Sort, a set of companion routines, MLG_Index_Swap_Word and MLG_Index_Swap_Bytes, are called by MLG_Index_Shell_Sort to perform the index swap depending on the type of data indicated. The companion routines swap the required MLG data items and MLG_Index. These companion routines are generally not called by a user. If the user wishes to directly call these companion routines, Section II.1.b, which describes the companion routines for MLG_Shell_Sort, MLG_Swap_Xxxxs, applies to MLG_Index_Swap_Word and MLG_Index_Swap_Bytes. Similar to the differences between MLG_Index_Shell_Sort and MLG_Shell_Sort, the array MLG_Index must be added to the calling statement. Since MLG_Index_Shell_Sort only sorts the required MLG data items and MLG_Index, the remaining data items must be move from their old location to their new location within the MLG using MLG_Index. Two companion routines, MLG_Index_Move_Words or MLG_Index_Move_Bytes, move the remaining data items depending on the type of data. These routines must called following MLG_Index_Shell_Sort to complete the new MLG and are described in the next section.
integer status integer MLG_Index_Move_Xxxxs . . . status = MLG_Index_Move_Xxxxs & (swap_length, swap_data, record_length, Npoints, MLG, & MLG_index)Xxxx means either Word or Byte depending on which is appropriate. A negative return value of status indicates an error, (see II.4), while a zero return value of status indicates success.
Input:
The argument swap_data here indicates those items in the object that still
need
to be updated. The algorithm for MLG_Index_Move_Xxxxs simply involves
gather/scatter operations. In general a global gather operation using the
MLG_index array can complete the MLG swap. However, this approach moves every
object's associated data whether the object moved or not. Therefore if only a
few objects have changed locations in the MLG data structure, it pays to
gather
those few objects and scatter them back into the MLG data structure to complete
the swap.
subroutine vrule(axis, MLGleft, MLGright, record_length, & Nlow, skip, Nhigh, result) integer axis, record_length, Nlow, skip, Nhigh <MLGtype> MLGleft (record_length,Nlow,skip,Nhigh) <MLGtype> MLGright(record_length,Nlow,skip,Nhigh) logical result(Nlow,Nhigh)where the arguments are:
For programming languages that support the STRUCTURE data type, the argument record_length may be superfluous, but still must be in the argument list. In fact with respect to the first subscript, the arguments MLGleft and MLGright should be declared in the same way that the user's main program declares his array of records to be sorted.
The last three subscripts for MLGleft and MLGright effectively partition the MLG records into hyperplanes normal to the current swap axis. The user is expected to return a point-by-point comparison of the first hyperplane of MLGleft, i.e. MLGleft(i,1,k) with the first hyperplane of MLGright, i.e. MLGright(i,1,k) and return the point-by-point result in array result. The array result should contain the value .true. if the rule is satisfied and .false. if the rule is not satisfied for the axis under consideration.
Appendix C.2 shows a FORTRAN 77 implementation of the "floating-point-less-than" rule used by routine EZMLG_sort.
integer status integer EZMLG_Sort . . . status = EZMLG_Sort (Ndim, record_length, Axis_Length, MLG)
integer status integer MLG_Sort_w . . . status = MLG_Sort_w & (Ndim, record_length, swap_length, swap_data, & Axis_Length, MLG, vrule)
integer status integer MLG_Anneal . . . status = MLG_Anneal & (Data_type, Ndim, swap_length, swap_data, & record_length, Axis_Length, MLG, & vrule, Nouter_its, Axis_Order, Sort_direction, Submlgs, & Ninner_its, Axis_Swaps, MLG_eval, Npass)
Additional arguments above those needed for the Shell sort are needed to determine MLG quality. In particular, a routine for the evaluation of the current MLG is needed. The library includes the routine MLG_eval_space, an evaluation routine based on link lengths. The complete argument description is:
Input:
- 1 Ndim is non-positive. - 3 A non-positive axis was detected. Check values of Axis_Order. - 4 An axis > Ndim was detected. Check values of Axis_Order. - 5 A non-positive record item was detected. Check swap_data. - 6 A record item > record_length was detected. Check swap_data. -10 A non-positive axis length was detected. -18 A bad sort direction was detected. -21 Insufficient dynamic memory available for this problem.
1 integer n, n4 2 parameter (n=5, n4=4*n) 3 real object_record (n, nx, ny, nz) 4 integer integer_record (n, nx, ny, nz) 5 byte byte_record (4*n, nx, ny, nz) 6 7 equilvalence (object_record, integer_record) 8 equilvalence (object_record, byte_record) 9 10 integer i, j, k 11 integer Axis_length(3) 11 12 do k=1,nz 13 do j=1,ny 14 do i=1,nx 15 object_record(1,i,j,k) = float(i-1) 16 object_record(2,i,j,k) = float(j-1) 17 object_record(3,i,j,k) = float(k-1) 18 integer_record(4,i,j,k) = i+nx*(j-1+ny*(k-1)) 19 byte_record(17,i,j,k) = mod(i,128) 20 byte_record(18,i,j,k) = mod(j,128) 21 byte_record(19,i,j,k) = mod(k,128) 22 byte_record(20,i,j,k) = 1 23 end do 24 end do 25 end do 26 byte_record(20,4,2,2) = 0 27 axis_length(1) = nx 28 axis_length(1) = ny 29 axis_length(1) = nz 30 31 call ezmlg_sort (3, n, axis_length, object_record)
In the FORTRAN code fragment just presented, object records with mixed types are generated. The initial declaration of OBJECT_record is for a real variable. This declaration creates NX x NY x NZ object records of N words. In order to include integer and byte data in this object record, additional variables are declared named INTEGER_record and BYTE_record. These variables are sized to match OBJECT_record and then equivalenced to OBJECT_record. It becomes the user responsibility to keep track of word boundaries and to properly load the data arrays.
for(i=0; i<record_length; i++) swap_data[i] = i+1;
The language C stores arrays row-wise; FORTRAN stores arrays column-wise. The MLG library follows the FORTRAN conventions for array storage. The following C code fragment initializes an object data base.
1 struct OBJECT_swap { float X, Y, Z; }; 2 struct OBJECT_data { float VX, VY, VZ; }; 3 struct OBJECTstruct { struct OBJECT_swap swap; struct OBJECT_data data; }; 4 EXTERN struct OBJECTstruct particles[NI][NJ][NK]; 5 float x, y, z, dx, dy, dz; 6 int i, j, k, status, record_length, Axis_Length[3]; 7 int ezmlg_sort_ 8 (int *Ndim, int *record_length, int *Axis_Length, 9 struct OBJECTstruct *particles); 10 11 for(i=0;i<NI;i++){ 12 x = (float) i; 13 dx = x / NI; 14 for(j=0;j<NJ;j++){ 15 y = (float) j; 16 dy = y / NJ; 17 for(k=0;k<NK;k++){ 18 z = (float) k; 19 dz = z / NK; 20 particles[i][j][k].swap.X = x + 0.1*dy*dz; 21 particles[i][j][k].swap.Y = y + 0.1*dx*dz; 22 particles[i][j][k].swap.Z = z + 0.1*dx*dy; 23 } 24 } 25 } 26 27 Ndim = 3; 28 record_length = 6; 29 Axis_Length[0] = NI; 30 Axis_Length[1] = NJ; 31 Axis_Length[2] = NK; 32 status = ezmlg_sort_ 33 (*Ndim, *record_length, *Axis_Length, *particles);
The data is originally in MLG order with the "floating-point-less-than" rule if we associate Z with axis 1, Y with axis 2 and X with axis 3. If a call to EZMLG_Sort is made, massive swapping will occur since EZMLG_Sort will associate X with axis 1, Y with axis 2, and Z with axis 3. There are three ways to fix this problem. The first way is to interchange the indices i and k in lines 20-22 from [i][j][k] to [k][j][i] and the declaration in line 4 from [NI][NJ][NK] to [NK][NJ][NI]. This in effect stores the arrays column-wise so that the words in the record are associated with the axes in the way that EZMLG_Sort expects. A second approach is to replace "axis" in lines 59 and 67 of the rule of Appendix C.2 with "4-axis". This has the effect of making the MLG library associate the axes with the words in the record in the same way as the C code has initialized the arrays. A third approach is to reorder X, Y, Z in the structure on line 1 of the code to Z, Y, X and change the definition of Axis_Length in lines 29-31. The first and third approaches have the advantage that one can use the "floating-point-less-than" rule provided by the package.
Since the MLG library is written in FORTRAN and FORTRAN arguments are passed by address, all arguments provided must be pointers to memory.
Finally, the actual routine name one would use in the C program is
currently machine dependent. On the the Convex and Sun workstations, the
FORTRAN compiler converts all routine names to lower case and
pre- and post-appends an underscore to the routine name,
while the C compiler only pre-appends an underscore. Hence, to call a MLG
library routine on these machines one look like the above example.
For other machine architectures one should consult the appropriate manuals.
1 C********************************************************************** 2 c 3 c MLG evaluation 4 c 5 C********************************************************************** 6 c 7 real function MLG_eval_space 8 & (Ndim, words, Axis_Length, MLG, link_eval) 9 c 10 c Evaluate a MLG and its links 11 c 12 c Assumptions 13 c 14 c 1) MLG record consists of 4-byte word data 15 c 2) The first "Ndim" words are the spatial coordinates of points 16 c 17 c 18 c Arguments: 19 c 20 c Input: 21 c 22 c Ndim The dimension of the MLG 23 c words The number of words per record in the MLG 24 c Axis_Length(Ndim) 25 c The length of each axis in the MLG 26 c MLG(words,Npoint) 27 c The data (here treated as real*4) 28 c 29 c Output: 30 c 31 c link_eval(Npoint,Ndim) 32 c The dot product of the link with the MLG axis 33 c 34 c Return value 35 c The sum of all line lengths in the MLG 36 c 37 c 38 Implicit None 39 c 40 c Subroutine Arguments 41 c 42 Integer Ndim, words 43 integer Axis_Length(Ndim) 44 real MLG(words,1) 45 real link_eval(1) 46 c 47 c Local declarations 48 c 49 integer n, axis, kl, jk, ijk, aijk, np 50 integer Nlow, Nhigh, Npoints 51 real dxdi 52 real EPS 53 data EPS / 1.0e-30 / 54 c 55 c 56 Npoints = 1 57 do n=1,Ndim 58 Npoints = Npoints*Axis_Length(n) 59 end do 60 c 61 do aijk=1,Ndim*Npoints 62 link_eval(aijk) = 0.0 63 end do 64 c 65 c Compute all line length components 66 c 67 do n=1,Ndim 68 Nlow = 1 69 do axis=1,Ndim 70 Nhigh = Nlow*Axis_Length(axis) 71 if(axis .ne. n) then 72 np = (axis-1)*Npoints 73 do kl=0,Npoints-Nhigh,Nhigh 74 do jk=1,Nhigh-Nlow 75 ijk = jk + kl 76 aijk = np + ijk 77 dxdi = MLG(n,ijk+Nlow) - MLG(n,ijk) 78 link_eval(aijk) = link_eval(aijk) + dxdi**2 79 end do 80 end do 81 end if 82 Nlow = Nhigh 83 end do 84 end do 85 c 86 MLG_eval_space = 0.0 87 do n=1,Ndim 88 Nlow = 1 89 do axis=1,n-1 90 Nlow = Nlow*Axis_Length(axis) 91 end do 92 Nhigh = Nlow*Axis_Length(n) 93 np = (n-1)*Npoints 94 do kl=0,Npoints-Nhigh,Nhigh 95 do jk=1,Nhigh-Nlow 96 ijk = jk + kl 97 aijk = np + ijk 98 dxdi = MLG(n,ijk+Nlow) - MLG(n,ijk) 99 link_eval(aijk) = sqrt(dxdi + link_eval(aijk)) 100 MLG_eval_space = MLG_eval_space + link_eval(aijk) 101 link_eval(aijk) = dxdi/(link_eval(aijk)+EPS) 102 end do 103 end do 104 end do 105 c 106 return 107 end 108 c 109 C**********************************************************************
1 C************************************************************************* 2 C 3 C vectorizing MLG swap rules 4 C 5 C************************************************************************* 6 c 7 subroutine MLG_swap_rule_flt 8 & (axis, MLGleft, MLGright, words, Nlow, skip, Nhigh, 9 & dont_swap_if) 10 c 11 c This routine compares two records of the MLG for i=1,Nlow,k=1,Nhigh 12 c This rule assumes 13 c 1) word aligned real*4 14 c 2) the axis variables appear in order in the record 15 c 16 c Arguments: 17 c 18 c Input: 19 c 20 c axis The axis to which to apply the rule 21 c MLGleft(words,Nlow,skip,Nhigh) 22 c The current lower record on axis "axis" 23 c MLGright(words,Nlow,skip,Nhigh) 24 c The current higher record on axis "axis" 25 c words The number of words in a record 26 c Nlow The number of records in the MLG dimensions below "axis" 27 c skip The length of "axis" 28 c Nhigh The number of records in the MLG dimensions above "axis" 29 c 30 c Output: 31 c 32 c dont_swap_if(Nlow,Nhigh) 33 c dont_swap_if(i,k) = .true. if MLGleft(axis,i,1,k) <= 34 c MLGright(axis,i,1,k) 35 c 36 implicit none 37 c 38 c Subroutine arguments 39 c 40 integer axis 41 integer words, Nlow, skip, Nhigh 42 real MLGleft(words,Nlow,skip,Nhigh) 43 real MLGright(words,Nlow,skip,Nhigh) 44 logical dont_swap_if(Nlow,Nhigh) 45 c 46 c Local declarations 47 c 48 integer i, k 49 c 50 c Employ the vector loop which gives maximum efficiency 51 c If Nlow is big enough (currently 8) always use contiguous memory 52 c loop as the vector loop, otherwise take the outer dimension for 53 c increased vector length. 54 c 55 if(Nlow .ge. min(Nhigh,8)) then 56 do k=1,Nhigh 57 do i=1,Nlow 58 dont_swap_if(i,k) = 59 & MLGleft(axis,i,1,k) .le. MLGright(axis,i,1,k) 60 end do 61 end do 62 else 63 do i=1,Nlow 64 c$dir prefer_vector 65 do k=1,Nhigh 66 dont_swap_if(i,k) = 67 & MLGleft(axis,i,1,k) .le. MLGright(axis,i,1,k) 68 end do 69 end do 70 end if 71 c 72 return 73 end 74 c 75 C**********************************************************************
fyfe@lcp.nrl.navy.mil