Implements a simple differencing algortithm.
version: $Revision: 7756 $ author: Juanco Anez author: author: Overview of Algorithm author:
author: author: by bwm author:
author: author: The algorithm is optimised for situations where the input sequences author: have few repeated objects. If it is given input with many repeated author: objects it will report sub-optimal changes. However, given author: appropriate input, it is fast, and linear in memory usage. author:
author: author: The algorithm consists of the following steps: author:
author: author: - compute an equivalence set for the input data
author: - translate each element of the orginal and revised input
author: sequences to a member of the equivalence set author: - match the the input sequences to determine the deltas, i.e. the
author: differences between the original and revised sequences. author:
author: author: The first step is to compute a an equivalence set for the input data. author: The equivalence set is computed from objects that are in the original author: input sequence author:
author: author: eq(x) = the index of the first occurence of x in the original sequence. author:
author: author: With this equivalence function, the algorithm can compare integers author: rather than strings, which is considerably more efficient. author:
author: author: The second step is to compute the datastructure on which the author: algorithm will operate. Having computed the equivalence function in author: the previous step, we can compute two arrays where indx[i] = author: eqs(orig[i]) and jndx[i] = eqs(rev[i]). The algorithm can now operate author: on indx and jndx instead of orig and rev. Thus, comparisons are then author: on O(int == int) instead of O(Object.equals(Object)). author:
author: author: The algorithm now matches indx and jndx. Whilst indx[i] == jndx[i] it author: skips matching objects in the sequence. In seeking to match objects author: in the input sequence it assumes that each object is likely to be author: unique. It uses the known characteristics of the unique equivalence author: function. It can tell from the eq value if this object appeared in author: the other sequence at all. If it did not, there is no point in author: searching for a match. author:
author: author: Recall that the eq function value is the index earliest occurrence in author: the orig sequence. This information is used to search efficiently for author: the next match. The algorithm is perfect when all input objects are author: unique, but degrades when input objects are not unique. When input author: objects are not unique an optimal match may not be found, but a author: correct match will be. author:
author: author: Having identified common matching objects in the orig and revised author: sequences, the differences between them are easily computed. author:
See Also: Delta See Also: Revision See Also: Modifications: 27/Apr/2003 bwm Added some comments whilst See Also: trying to figure out the algorithm 03 May 2003 bwm Created this See Also: implementation class by refactoring it out of the Diff class to enable See Also: plug in difference algorithms |