Titanium implementations of distributed sorts. Author: Amir Kamil Version: 1.0 Date: 6/18/03 This program implements two distributed sorts in Titanium. The first is a radix sort, and the second a sample sort, based on the algorithm used by the Berkeley NOW (Network of Workstations) group. --------------- |Program Usage| --------------- Sort [opt1 ... optN] [num] Options (optK): -r uses radix sort instead of default sample sort -d[m] sets the debugging level to m, or 1 if m is not given -h prints this message num the number of elements each processor generates ------------------------ |Algorithm Descriptions| ------------------------ Radix sort This sort proceeds in the following steps: 1. Each processor generates random numbers. 2. Processor 0 collects all the numbers from each processor and broadcasts the resulting array. 3. Each processor computes the number and positions of the elements it is to sort in that array. The positions of processor i's elements are all less than the positions of processor j's elements if i < j. 4. For i:=0 to 32, incremented by k: a. Each processor copies its numbers from the global array and bucketizes them using the i to i+k-1 digits in each number. b. Processor 0 collects the buckets from each processor and combines them, retaining the order such that an element from processor i comes before an element from processor j if i < j. c. Processor 0 copies the buckets into the global array such that an element from bucket i comes before an element from bucket j if i < j, and such that the relative orders of elements from the same bucket is preserved. The resulting numbers are all on processor 0, in sorted order. In the above algorithm, k is a parameter that can be varied. Sample sort This sort proceeds in the following steps: 1. Each processor generates random numbers. 2. Each processor computes k samples from its numbers and sends them to processor 0. 3. Processor 0 sorts the k*n samples using radix sort, then chooses every kth number to be a splitter (n-1 total splitters). 4. Processor 0 broadcasts the splitters, and each processor bucketizes its numbers according to the splitters as follows: a number that is between splitters i and i+1 goes in bucket i+1. There are n total buckets on each processor. 5. The processors exchange their buckets, and processor i retrieves bucket i from each processor. 6. Each processor locally sorts the numbers it retrieved using radix sort. The resulting numbers are distributed across the processors as follows: * for any two processors i and j, if i < j, then all keys on processor i are less than any key on processor j * for each processor, the keys on that processor are sorted Probabilistically, the number of keys on each processor should be roughly the same. -------------------- |Class Descriptions| -------------------- The program is structured so as to maximally reuse code between the two sorts. I will only give brief overviews on each of the classes used. Information about the class members can be found in the Javadoc for this program. The classes used are as follows: class Sort This is the interface to the program, and where the distributed sort code resides. class Util This class contains utility functions used by the sorts, such as local sorting algorithms and a random number generator. class IntVector This class implements a growable array-based structure for storing ints. class IntVectorCombiner implements ObjectOp This class is used in scans and reductions to combine IntVectors. class Buckets This class implements a set of buckets storing ints (based on IntVectors) and is parameterized (in the normal sense of the word, not the template sense) by a Bucketizer that determines in which bucket a given number should be stored. class BucketsCombiner implements ObjectOp This class is used in scans and reductions to combine Buckets'. interface Bucketizer This interface gives the specification for classes that map ints to buckets. class RadixBucketizer implements Bucketizer This class maps ints to buckets according to a subset of the digits in each int. The subset to be used can be specified. class SplitBucketizer implements Bucketizer This class maps ints to buckets using a set of splitters. ----------------------- |Possible Improvements| ----------------------- Extensive performance testing of this program have not been done, so it is unknown how it compares to other implementations. There are a few parameters already in the program that can be varied, such as the number of digits to use in each iteration of radix sort or the number of samples each processor computes in sample sort. These parameters can potentially have a large effect on performance. Besides them, the following improvements can be made: Algorithms: - Radix sort: - do first iteration of sort before sending numbers to first processor - radix sort only the lower digits, then do insertion sort on the rest Implementation: - Radix sort - do explicit array copies instead of scans when initially sending the numbers to processor 0 (would be nullified by first algorithm improvement) - Sample sort - do explicit array copies instead of scans when sending samples to processor 0