Morten Kloster <morklo@gmail.com>


Patch
r1132811, r1128921, r1128862, r1128857

r1132811 | jcorvel | 2011-06-06 17:34:41 -0500 (Mon, 06 Jun 2011)

Speed-up of libsvn_diff using token counts.
By using indices, not node pointers, to refer to tokens, and counting
the number of each token, the longest common subsequence (lcs)
algorithm at the heart of libsvn_diff becomes much faster in many
situations by skipping tokens that are unique to one source or the other.

* subversion/libsvn_diff/token.c
  (svn_diff__node_t): Added 'index' member.
  (svn_diff__tree_t): Added 'node_count' member.
  (svn_diff__get_node_count): New function.
  (tree_insert_token): Assigns indices to nodes.
  (svn_diff__get_tokens): Uses token_index (svn_diff__token_index_t), not
   node pointer, to identify node in position struct.

* subversion/libsvn_diff/lcs.c
  (svn_diff__snake): Takes token counts as additional argument.
   Skips tokens that are unique to one or the other file.
  (svn_diff__lcs): Takes token counts and number of different tokens
   as additional arguments. Calculates number of tokens unique to
   each source and uses this to compute effective length difference.

* subversion/libsvn_diff/diff.h
  (svn_diff__token_index_t): New type for token indices/counts.
  (svn_diff__position_t): Replaced node pointer member by a
   token_index (svn_diff__token_index_t).
  Updated and added new function declarations.

* subversion/libsvn_diff/diff.c
* subversion/libsvn_diff/diff3.c
* subversion/libsvn_diff/diff4.c
  (svn_diff_diff_2, svn_diff_diff_3, svn_diff_diff_4): Count the
   number of times each token appears in each source.
  (svn_diff__resolve_conflict): Takes number of different tokens
   as additional argument. Counts the number of times
   each token appears in each (partial) source.
  (svn_diff__get_token_counts): New function.

Patch by: Morten Kloster <morklo{_AT_}gmail.com>
          (some small whitespace fixes and docstring movements by me)


r1128921 | stefan2 | 2011-05-29 13:04:50 -0500 (Sun, 29 May 2011)

Faster LCS algorithm in libsvn_diff by reworking fp argument.
This change benefits mainly the 32 bit VS compiler.

* subversion/libsvn_diff/lcs.c
  (svn_diff__snake): expect sum of fp and k as a single argument;
   replace int1+1 > int2 by int1 >= int2
  (svn_diff__lcs): adapt caller

Patch by: Morten Kloster <morklo{_AT_}gmail.com>
          (plus a small tweak by me)

r1128862 | stefan2 | 2011-05-29 05:49:58 -0500 (Sun, 29 May 2011)

Describe the general idea behind the LCS algorithm in place.

* subversion/libsvn_diff/lcs.c
  (svn_diff__lcs): add description

Patch by: Morten Kloster <morklo{_AT_}gmail.com>

r1128857 | stefan2 | 2011-05-29 05:35:40 -0500 (Sun, 29 May 2011)

Simpler/faster LCS algorithm in libsvn_diff by elimination of idx.

* subversion/libsvn_diff/lcs.c
 (svn_diff__snake): Removed idx parameter (always 0)
 (svn_diff__lcs): Removed idx variable (always 0) , let d have either
  sign, and moved the origo of fp by d. New version no longer chooses
  the shorter file as the "original", and can thus give different LCS's
  depending on the order of the input files even when the files have
  different lengths. Depending on circumstances, speed can be ~15% faster 
  for core lcs algorithm.

Patch by: Morten Kloster <morklo{_AT_}gmail.com>