Morten Kloster <> (moklo)

r1140247, r1140203, r1140195, r1139725, r1132811, r1128921, r1128862, r1128857

r1128857 | stefan2 | 2011-05-29 10:35:40 +0000 (Sun, 29 May 2011)

Simpler/faster LCS algorithm in libsvn_diff by elimination of idx.

* subversion/libsvn_diff/lcs.c
 (svn_diff__snake): Removed idx parameter (always 0)
 (svn_diff__lcs): Removed idx variable (always 0) , let d have either
  sign, and moved the origo of fp by d. New version no longer chooses
  the shorter file as the "original", and can thus give different LCS's
  depending on the order of the input files even when the files have
  different lengths. Depending on circumstances, speed can be ~15% faster 
  for core lcs algorithm.

Patch by: Morten Kloster <morklo{_AT_}>

r1128862 | stefan2 | 2011-05-29 10:49:58 +0000 (Sun, 29 May 2011)

Describe the general idea behind the LCS algorithm in place.

* subversion/libsvn_diff/lcs.c
  (svn_diff__lcs): add description

Patch by: Morten Kloster <morklo{_AT_}>

r1128921 | stefan2 | 2011-05-29 18:04:50 +0000 (Sun, 29 May 2011)

Faster LCS algorithm in libsvn_diff by reworking fp argument.
This change benefits mainly the 32 bit VS compiler.

* subversion/libsvn_diff/lcs.c
  (svn_diff__snake): expect sum of fp and k as a single argument;
   replace int1+1 > int2 by int1 >= int2
  (svn_diff__lcs): adapt caller

Patch by: Morten Kloster <morklo{_AT_}>
          (plus a small tweak by me)

r1132811 | jcorvel | 2011-06-06 22:34:41 +0000 (Mon, 06 Jun 2011)

Speed-up of libsvn_diff using token counts.
By using indices, not node pointers, to refer to tokens, and counting
the number of each token, the longest common subsequence (lcs)
algorithm at the heart of libsvn_diff becomes much faster in many
situations by skipping tokens that are unique to one source or the other.

* subversion/libsvn_diff/token.c
  (svn_diff__node_t): Added 'index' member.
  (svn_diff__tree_t): Added 'node_count' member.
  (svn_diff__get_node_count): New function.
  (tree_insert_token): Assigns indices to nodes.
  (svn_diff__get_tokens): Uses token_index (svn_diff__token_index_t), not
   node pointer, to identify node in position struct.

* subversion/libsvn_diff/lcs.c
  (svn_diff__snake): Takes token counts as additional argument.
   Skips tokens that are unique to one or the other file.
  (svn_diff__lcs): Takes token counts and number of different tokens
   as additional arguments. Calculates number of tokens unique to
   each source and uses this to compute effective length difference.

* subversion/libsvn_diff/diff.h
  (svn_diff__token_index_t): New type for token indices/counts.
  (svn_diff__position_t): Replaced node pointer member by a
   token_index (svn_diff__token_index_t).
  Updated and added new function declarations.

* subversion/libsvn_diff/diff.c
* subversion/libsvn_diff/diff3.c
* subversion/libsvn_diff/diff4.c
  (svn_diff_diff_2, svn_diff_diff_3, svn_diff_diff_4): Count the
   number of times each token appears in each source.
  (svn_diff__resolve_conflict): Takes number of different tokens
   as additional argument. Counts the number of times
   each token appears in each (partial) source.
  (svn_diff__get_token_counts): New function.

Patch by: Morten Kloster <morklo{_AT_}>
          (some small whitespace fixes and docstring movements by me)

r1139725 | moklo | 2011-06-26 07:49:36 +0000 (Sun, 26 Jun 2011)

Whitespace adjustments

* subversion/libsvn_diff/diff3.c
  Made first half of file match second half and norm

r1140195 | moklo | 2011-06-27 15:04:37 +0000 (Mon, 27 Jun 2011)

Creating new experimental branch for my work on lcs/diff/merge

r1140203 | moklo | 2011-06-27 15:16:16 +0000 (Mon, 27 Jun 2011)

* COMMITTERS: Add moklo as partial committer. (diff-improvements 

r1140247 | moklo | 2011-06-27 17:41:10 +0000 (Mon, 27 Jun 2011)

Speeds up LCS algorithm for well-separated sections of change
 By exploiting unique matches between the two files, it is possible to
 "restart" the LCS algorithm at times and reset the buildup of square
 computational complexity.

* subversion/libsvn_diff/lcs.c
 (svn_diff__snake_t): Added uniquematchcount member.
 (svn_diff__snake): Increments uniquematchcount each time
  a matching token occurs only once in each file.
 (clear_fp): New function that removes nonviable states.
 (svn_diff__lcs): Restarts LCS algorithm from p=0 if the number
  of unique matches exceeds the number of mismatches, using
  the first state where that occurs as the new start state.