001: package net.sf.saxon.sort;
002:
003: /*
004: Copyright ? 1999 CERN - European Organization for Nuclear Research.
005: Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose
006: is hereby granted without fee, provided that the above copyright notice appear in all copies and
007: that both that copyright notice and this permission notice appear in supporting documentation.
008: CERN makes no representations about the suitability of this software for any purpose.
009: It is provided "as is" without expressed or implied warranty.
010: */
011:
012: /**
013: * Modified by Michael Kay to use the Saxon Sortable interface rather than a separate IntComparator and Swapper
014: */
015:
016: /**
017: Generically sorts arbitrary shaped data (for example multiple arrays, 1,2 or 3-d matrices, and so on) using a
018: quicksort or mergesort. This class addresses two problems, namely
019: <ul>
020: <li><i>Sorting multiple arrays in sync</i>
021: <li><i>Sorting by multiple sorting criteria</i> (primary, secondary, tertiary,
022: ...)
023: </ul>
024: <h4>Sorting multiple arrays in sync</h4>
025: <p>
026: Assume we have three arrays X, Y and Z. We want to sort all three arrays by
027: X (or some arbitrary comparison function). For example, we have<br>
028: <tt>X=[3, 2, 1], Y=[3.0, 2.0, 1.0], Z=[6.0, 7.0, 8.0]</tt>. The output should
029: be <tt><br>
030: X=[1, 2, 3], Y=[1.0, 2.0, 3.0], Z=[8.0, 7.0, 6.0]</tt>. </p>
031: <p>How can we achive this? Here are several alternatives. We could ... </p>
032: <ol>
033: <li> make a list of Point3D objects, sort the list as desired using a comparison
034: function, then copy the results back into X, Y and Z. The classic object-oriented
035: way. </li>
036: <li>make an index list [0,1,2,...,N-1], sort the index list using a comparison function,
037: then reorder the elements of X,Y,Z as defined by the index list. Reordering
038: cannot be done in-place, so we need to copy X to some temporary array, then
039: copy in the right order back from the temporary into X. Same for Y and Z.
040: </li>
041: <li> use a generic quicksort or mergesort which, whenever two elements in X are swapped,
042: also swaps the corresponding elements in Y and Z. </li>
043: </ol>
044: Alternatives 1 and 2 involve quite a lot of copying and allocate significant amounts
045: of temporary memory. Alternative 3 involves more swapping, more polymorphic message dispatches, no copying and does not need any temporary memory.
046: <p> This class implements alternative 3. It operates on arbitrary shaped data.
047: In fact, it has no idea what kind of data it is sorting. Comparisons and swapping
048: are delegated to user provided objects which know their data and can do the
049: job.
050: <p> Lets call the generic data <tt>g</tt> (it may be one array, three linked lists
051: or whatever). This class takes a user comparison function operating on two indexes
052: <tt>(a,b)</tt>, namely an {@link Sortable}. The comparison function determines
053: whether <tt>g[a]</tt> is equal, less or greater than <tt>g[b]</tt>. The sort,
054: depending on its implementation, can decide to swap the data at index <tt>a</tt>
055: with the data at index <tt>b</tt>. It calls a user provided {@link Sortable}
056: object that knows how to swap the data of these indexes.
057: <p>The following snippet shows how to solve the problem.
058: <table>
059: <td class="PRE">
060: <pre>
061: final int[] x;
062: final double[] y;
063: final double[] z;
064:
065: x = new int[] {3, 2, 1 };
066: y = new double[] {3.0, 2.0, 1.0};
067: z = new double[] {6.0, 7.0, 8.0};
068:
069:
070: // this one knows how to swap two indexes (a,b)
071: Swapper swapper = new Swapper() {
072: public void swap(int a, int b) {
073: int t1; double t2, t3;
074: t1 = x[a]; x[a] = x[b]; x[b] = t1;
075: t2 = y[a]; y[a] = y[b]; y[b] = t2;
076: t3 = z[a]; z[a] = z[b]; z[b] = t3;
077: }
078: };
079: // simple comparison: compare by X and ignore Y,Z<br>
080: IntComparator comp = new IntComparator() {
081: public int compare(int a, int b) {
082: return x[a]==x[b] ? 0 : (x[a]<x[b] ? -1 : 1);
083: }
084: };
085:
086: System.out.println("before:");
087: System.out.println("X="+Arrays.toString(x));
088: System.out.println("Y="+Arrays.toString(y));
089: System.out.println("Z="+Arrays.toString(z));
090:
091: GenericSorting.quickSort(0, X.length, comp, swapper);
092: // GenericSorting.mergeSort(0, X.length, comp, swapper);
093:
094: System.out.println("after:");
095: System.out.println("X="+Arrays.toString(x));
096: System.out.println("Y="+Arrays.toString(y));
097: System.out.println("Z="+Arrays.toString(z));
098: </pre>
099: </td>
100: </table>
101: <h4>Sorting by multiple sorting criterias (primary, secondary, tertiary, ...)</h4>
102: <p>Assume again we have three arrays X, Y and Z. Now we want to sort all three
103: arrays, primarily by Y, secondarily by Z (if Y elements are equal). For example,
104: we have<br>
105: <tt>X=[6, 7, 8, 9], Y=[3.0, 2.0, 1.0, 3.0], Z=[5.0, 4.0, 4.0, 1.0]</tt>. The
106: output should be <tt><br>
107: X=[8, 7, 9, 6], Y=[1.0, 2.0, 3.0, 3.0], Z=[4.0, 4.0, 1.0, 5.0]</tt>. </p>
108: <p>Here is how to solve the problem. All code in the above example stays the same,
109: except that we modify the comparison function as follows</p>
110: <table>
111: <td class="PRE">
112: <pre>
113: //compare by Y, if that doesn't help, reside to Z
114: IntComparator comp = new IntComparator() {
115: public int compare(int a, int b) {
116: if (y[a]==y[b]) return z[a]==z[b] ? 0 : (z[a]<z[b] ? -1 : 1);
117: return y[a]<y[b] ? -1 : 1;
118: }
119: };
120: </pre>
121: </td>
122: </table>
123:
124: <h4>Notes</h4>
125: <p></p>
126: <p> Sorts involving floating point data and not involving comparators, like, for
127: example provided in the JDK {@link java.util.Arrays} and in the Colt
128: (cern.colt.Sorting) handle floating point numbers in special ways to guarantee
129: that NaN's are swapped to the end and -0.0 comes before 0.0. Methods delegating
130: to comparators cannot do this. They rely on the comparator. Thus, if such boundary
131: cases are an issue for the application at hand, comparators explicitly need
132: to implement -0.0 and NaN aware comparisons. Remember: <tt>-0.0 < 0.0 == false</tt>,
133: <tt>(-0.0 == 0.0) == true</tt>, as well as <tt>5.0 < Double.NaN == false</tt>,
134: <tt>5.0 > Double.NaN == false</tt>. Same for <tt>float</tt>.
135: <h4>Implementation </h4>
136: <p>The quicksort is a derivative of the JDK 1.2 V1.26 algorithms (which are, in
137: turn, based on Bentley's and McIlroy's fine work).
138: The mergesort is a derivative of the JAL algorithms, with optimisations taken from the JDK algorithms.
139: Both quick and merge sort are "in-place", i.e. do not allocate temporary memory (helper arrays).
140: Mergesort is <i>stable</i> (by definition), while quicksort is not.
141: A stable sort is, for example, helpful, if matrices are sorted successively
142: by multiple columns. It preserves the relative position of equal elements.
143:
144: @author wolfgang.hoschek@cern.ch
145: @version 1.0, 03-Jul-99
146: */
147: public class GenericSorter extends Object {
148:
149: private static final int SMALL = 7;
150: private static final int MEDIUM = 7;
151: private static final int LARGE = 40;
152:
153: /**
154: * Makes this class non instantiable, but still let's others inherit from it.
155: */
156: protected GenericSorter() {
157: }
158:
159: /**
160: * Sorts the specified range of elements according
161: * to the order induced by the specified comparator. All elements in the
162: * range must be <i>mutually comparable</i> by the specified comparator
163: * (that is, <tt>c.compare(a, b)</tt> must not throw an
164: * exception for any indexes <tt>a</tt> and
165: * <tt>b</tt> in the range).<p>
166: *
167: * The sorting algorithm is a tuned quicksort,
168: * adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a
169: * Sort Function", Software-Practice and Experience, Vol. 23(11)
170: * P. 1249-1265 (November 1993). For details, see
171: * http://citeseer.ist.psu.edu/bentley93engineering.html.
172: * This algorithm offers n*log(n) performance on many data sets that cause other
173: * quicksorts to degrade to quadratic performance.
174: *
175: * @param fromIndex the index of the first element (inclusive) to be sorted.
176: * @param toIndex the index of the last element (exclusive) to be sorted.
177: * @param c the comparator to determine the order of the generic data;
178: * an object that knows how to swap the elements at any two indexes (a,b).
179: *
180: */
181: public static void quickSort(int fromIndex, int toIndex, Sortable c) {
182: quickSort1(fromIndex, toIndex - fromIndex, c);
183: }
184:
185: /**
186: * Sorts the specified sub-array into ascending order.
187: */
188: private static void quickSort1(int off, int len, Sortable comp) {
189: // Insertion sort on smallest arrays
190: if (len < SMALL) {
191: for (int i = off; i < len + off; i++)
192: for (int j = i; j > off && (comp.compare(j - 1, j) > 0); j--) {
193: comp.swap(j, j - 1);
194: }
195: return;
196: }
197:
198: // Choose a partition element, v
199: int m = off + (len >>> 1); // len/2; // Small arrays, middle element
200:
201: if (len > MEDIUM) {
202: int l = off;
203: int n = off + len - 1;
204: if (len > LARGE) { // Big arrays, pseudomedian of 9
205: int s = len >>> 3; // len/8;
206: l = med3(l, l + s, l + 2 * s, comp);
207: m = med3(m - s, m, m + s, comp);
208: n = med3(n - 2 * s, n - s, n, comp);
209: }
210: // m = med3(l, m, n, comp); // Mid-size, med of 3
211: // manually inlined (most time is spent near the leafs of the recursion tree)
212: //a = comp.compare(l,m);
213: //b = comp.compare(l,n);
214: int c = comp.compare(m, n);
215: m = (comp.compare(l, m) < 0 ? (c < 0 ? m : comp.compare(l,
216: n) < 0 ? n : l) : (c > 0 ? m
217: : comp.compare(l, n) > 0 ? n : l));
218: }
219: //long v = x[m];
220:
221: // Establish Invariant: v* (<v)* (>v)* v*
222: int a = off, b = a, c = off + len - 1, d = c;
223: while (true) {
224: int comparison;
225: while (b <= c && ((comparison = comp.compare(b, m)) <= 0)) {
226: if (comparison == 0) {
227: if (a == m)
228: m = b; // pivot is moving target; DELTA to JDK !!!
229: else if (b == m)
230: m = a; // pivot is moving target; DELTA to JDK !!!
231: comp.swap(a++, b);
232: }
233: b++;
234: }
235: while (c >= b && ((comparison = comp.compare(c, m)) >= 0)) {
236: if (comparison == 0) {
237: if (c == m)
238: m = d; // pivot is moving target; DELTA to JDK !!!
239: else if (d == m)
240: m = c; // pivot is moving target; DELTA to JDK !!!
241: comp.swap(c, d--);
242: }
243: c--;
244: }
245: if (b > c)
246: break;
247: if (b == m)
248: m = d; // pivot is moving target; DELTA to JDK !!!
249: else if (c == m)
250: m = c; // pivot is moving target; DELTA to JDK !!!
251: comp.swap(b++, c--);
252: }
253:
254: // Swap partition elements back to middle
255:
256: int s = Math.min(a - off, b - a);
257: // vecswap(swapper, off, b-s, s);
258: // manually inlined
259: int aa = off;
260: int bb = b - s;
261: while (--s >= 0)
262: comp.swap(aa++, bb++);
263: int n = off + len;
264: s = Math.min(d - c, n - d - 1);
265: // vecswap(swapper, b, n-s, s); // manually inlined
266: aa = b;
267: bb = n - s;
268: while (--s >= 0)
269: comp.swap(aa++, bb++);
270:
271: // Recursively sort non-partition-elements
272: if ((s = b - a) > 1)
273: quickSort1(off, s, comp);
274: if ((s = d - c) > 1)
275: quickSort1(n - s, s, comp);
276: }
277:
278: /**
279: * Returns the index of the median of the three indexed elements.
280: */
281: private static int med3(int a, int b, int c, Sortable comp) {
282: int bc = comp.compare(b, c);
283: return (comp.compare(a, b) < 0 ? (bc < 0 ? b : comp.compare(a,
284: c) < 0 ? c : a) : (bc > 0 ? b
285: : comp.compare(a, c) > 0 ? c : a));
286: }
287:
288: // /**
289: // * Swaps x[a .. (a+n-1)] with x[b .. (b+n-1)].
290: // */
291: // private static void vecswap(Swapper swapper, int a, int b, int n) {
292: // for (int i=0; i<n; i++, a++, b++) swapper.swap(a, b);
293: // }
294:
295: /**
296: * Sorts the specified range of elements according
297: * to the order induced by the specified comparator. All elements in the
298: * range must be <i>mutually comparable</i> by the specified comparator
299: * (that is, <tt>c.compare(a, b)</tt> must not throw an
300: * exception for any indexes <tt>a</tt> and
301: * <tt>b</tt> in the range).<p>
302: *
303: * This sort is guaranteed to be <i>stable</i>: equal elements will
304: * not be reordered as a result of the sort.<p>
305: *
306: * The sorting algorithm is a modified mergesort (in which the merge is
307: * omitted if the highest element in the low sublist is less than the
308: * lowest element in the high sublist). This algorithm offers guaranteed
309: * n*log(n) performance, and can approach linear performance on nearly
310: * sorted lists.
311: *
312: * @param fromIndex the index of the first element (inclusive) to be sorted.
313: * @param toIndex the index of the last element (exclusive) to be sorted.
314: * @param c the comparator to determine the order of the generic data;
315: * an object that knows how to swap the elements at any two indexes (a,b).
316: *
317: */
318: public static void mergeSort(int fromIndex, int toIndex, Sortable c) {
319: /*
320: * We retain the same method signature as quickSort. Given only a
321: * comparator and swapper we do not know how to copy and move elements
322: * from/to temporary arrays. Hence, in contrast to the JDK mergesorts
323: * this is an "in-place" mergesort, i.e. does not allocate any temporary
324: * arrays. A non-inplace mergesort would be faster in most cases, but
325: * would require non-intuitive delegate objects. Remember that an
326: * in-place merge phase requires N logN swaps, while an out-of-place
327: * merge phase requires only N swaps. This doesn't matter much if swaps
328: * are cheap and comparisons are expensive. Nonetheless this can
329: * certainly be suboptimal.
330: */
331:
332: // Insertion sort on smallest arrays
333: if (toIndex - fromIndex < SMALL) {
334: for (int i = fromIndex; i < toIndex; i++) {
335: for (int j = i; j > fromIndex
336: && (c.compare(j - 1, j) > 0); j--) {
337: c.swap(j, j - 1);
338: }
339: }
340: return;
341: }
342:
343: // Recursively sort halves
344: int mid = (fromIndex + toIndex) >>> 1; // (fromIndex + toIndex) / 2;
345: mergeSort(fromIndex, mid, c);
346: mergeSort(mid, toIndex, c);
347:
348: // If list is already sorted, nothing left to do. This is an
349: // optimization that results in faster sorts for nearly ordered lists.
350: if (c.compare(mid - 1, mid) <= 0)
351: return;
352:
353: // Merge sorted halves
354: inplaceMerge(fromIndex, mid, toIndex, c);
355: }
356:
357: /**
358: * Transforms two consecutive sorted ranges into a single sorted
359: * range. The initial ranges are <code>[first, middle)</code>
360: * and <code>[middle, last)</code>, and the resulting range is
361: * <code>[first, last)</code>.
362: * Elements in the first input range will precede equal elements in the
363: * second.
364: */
365: private static void inplaceMerge(int first, int middle, int last,
366: Sortable comp) {
367: if (first >= middle || middle >= last)
368: return;
369: if (last - first == 2) {
370: if (comp.compare(middle, first) < 0) {
371: comp.swap(first, middle);
372: }
373: return;
374: }
375: int firstCut;
376: int secondCut;
377: if (middle - first > last - middle) {
378: firstCut = first + ((middle - first) >>> 1); // first + ((middle - first) / 2);
379: // secondCut = lower_bound(middle, last, firstCut, comp);
380: // manually inlined for speed (speedup = 2)
381: int _first = middle;
382: int len = last - _first;
383: while (len > 0) {
384: int half = len >>> 1; // len / 2;
385: int mid = _first + half;
386: if (comp.compare(mid, firstCut) < 0) {
387: _first = mid + 1;
388: len -= half + 1;
389: } else {
390: len = half;
391: }
392: }
393: secondCut = _first;
394: } else {
395: secondCut = middle + ((last - middle) >>> 1); // middle + ((last - middle) / 2);
396: // firstCut = upper_bound(first, middle, secondCut, comp);
397: // manually inlined for speed (speedup = 2)
398: int _first = first;
399: int len = middle - _first;
400: while (len > 0) {
401: int half = len >>> 1; // len / 2;
402: int mid = _first + half;
403: if (comp.compare(secondCut, mid) < 0) {
404: len = half;
405: } else {
406: _first = mid + 1;
407: len -= half + 1;
408: }
409: }
410: firstCut = _first;
411: }
412:
413: // rotate(firstCut, middle, secondCut, swapper);
414: // is manually inlined for speed
415: // (hotspot compiler inlining in recursive methods seems to work only for
416: // small call depths, even if methods are "static private")
417: // speedup = 1.7
418: // begin inline
419: int first2 = firstCut;
420: int middle2 = middle;
421: int last2 = secondCut;
422: if (middle2 != first2 && middle2 != last2) {
423: int first1 = first2;
424: int last1 = middle2;
425: while (first1 < --last1)
426: comp.swap(first1++, last1);
427: first1 = middle2;
428: last1 = last2;
429: while (first1 < --last1)
430: comp.swap(first1++, last1);
431: first1 = first2;
432: last1 = last2;
433: while (first1 < --last1)
434: comp.swap(first1++, last1);
435: }
436: // end inline
437:
438: middle = firstCut + (secondCut - middle);
439: inplaceMerge(first, firstCut, middle, comp);
440: inplaceMerge(middle, secondCut, last, comp);
441: }
442:
443: // /**
444: // * Performs a binary search on an already-sorted range: finds the first
445: // * position where an element can be inserted without violating the ordering.
446: // * Sorting is by a user-supplied comparison function.
447: // * @param array Array containing the range.
448: // * @param first Beginning of the range.
449: // * @param last One past the end of the range.
450: // * @param x Element to be searched for.
451: // * @param comp Comparison function.
452: // * @return The largest index i such that, for every j in the
453: // * range <code>[first, i)</code>,
454: // * <code>comp.apply(array[j], x)</code> is
455: // * <code>true</code>.
456: // * @see Sorting#upper_bound
457: // * @see Sorting#equal_range
458: // * @see Sorting#binary_search
459: // */
460: // private static int lower_bound(int first, int last, int x, IntComparator comp) {
461: // int len = last - first;
462: // while (len > 0) {
463: // int half = len >>> 1; // len / 2;
464: // int middle = first + half;
465: // if (comp.compare(middle, x)<0) {
466: // first = middle + 1;
467: // len -= half + 1;
468: // }
469: // else {
470: // len = half;
471: // }
472: // }
473: // return first;
474: // }
475: //
476: // /**
477: // * Performs a binary search on an already-sorted range: finds the last
478: // * position where an element can be inserted without violating the ordering.
479: // * Sorting is by a user-supplied comparison function.
480: // * @param array Array containing the range.
481: // * @param first Beginning of the range.
482: // * @param last One past the end of the range.
483: // * @param x Element to be searched for.
484: // * @param comp Comparison function.
485: // * @return The largest index i such that, for every j in the
486: // * range <code>[first, i)</code>,
487: // * <code>comp.apply(x, array[j])</code> is
488: // * <code>false</code>.
489: // * @see Sorting#lower_bound
490: // * @see Sorting#equal_range
491: // * @see Sorting#binary_search
492: // */
493: // private static int upper_bound(int first, int last, int x, IntComparator comp) {
494: // int len = last - first;
495: // while (len > 0) {
496: // int half = len >>> 1; // len / 2;
497: // int middle = first + half;
498: // if (comp.compare(x, middle)<0) {
499: // len = half;
500: // }
501: // else {
502: // first = middle + 1;
503: // len -= half + 1;
504: // }
505: // }
506: // return first;
507: // }
508:
509: }
|