r1132811 | jcorvel | 2011-06-06 17:34:41 -0500 (Mon, 06 Jun 2011) Speed-up of libsvn_diff using token counts. By using indices, not node pointers, to refer to tokens, and counting the number of each token, the longest common subsequence (lcs) algorithm at the heart of libsvn_diff becomes much faster in many situations by skipping tokens that are unique to one source or the other. * subversion/libsvn_diff/token.c (svn_diff__node_t): Added 'index' member. (svn_diff__tree_t): Added 'node_count' member. (svn_diff__get_node_count): New function. (tree_insert_token): Assigns indices to nodes. (svn_diff__get_tokens): Uses token_index (svn_diff__token_index_t), not node pointer, to identify node in position struct. * subversion/libsvn_diff/lcs.c (svn_diff__snake): Takes token counts as additional argument. Skips tokens that are unique to one or the other file. (svn_diff__lcs): Takes token counts and number of different tokens as additional arguments. Calculates number of tokens unique to each source and uses this to compute effective length difference. * subversion/libsvn_diff/diff.h (svn_diff__token_index_t): New type for token indices/counts. (svn_diff__position_t): Replaced node pointer member by a token_index (svn_diff__token_index_t). Updated and added new function declarations. * subversion/libsvn_diff/diff.c * subversion/libsvn_diff/diff3.c * subversion/libsvn_diff/diff4.c (svn_diff_diff_2, svn_diff_diff_3, svn_diff_diff_4): Count the number of times each token appears in each source. (svn_diff__resolve_conflict): Takes number of different tokens as additional argument. Counts the number of times each token appears in each (partial) source. (svn_diff__get_token_counts): New function. Patch by: Morten Kloster <morklo{_AT_}gmail.com> (some small whitespace fixes and docstring movements by me)
r1128921 | stefan2 | 2011-05-29 13:04:50 -0500 (Sun, 29 May 2011) Faster LCS algorithm in libsvn_diff by reworking fp argument. This change benefits mainly the 32 bit VS compiler. * subversion/libsvn_diff/lcs.c (svn_diff__snake): expect sum of fp and k as a single argument; replace int1+1 > int2 by int1 >= int2 (svn_diff__lcs): adapt caller Patch by: Morten Kloster <morklo{_AT_}gmail.com> (plus a small tweak by me)
r1128862 | stefan2 | 2011-05-29 05:49:58 -0500 (Sun, 29 May 2011) Describe the general idea behind the LCS algorithm in place. * subversion/libsvn_diff/lcs.c (svn_diff__lcs): add description Patch by: Morten Kloster <morklo{_AT_}gmail.com>
r1128857 | stefan2 | 2011-05-29 05:35:40 -0500 (Sun, 29 May 2011) Simpler/faster LCS algorithm in libsvn_diff by elimination of idx. * subversion/libsvn_diff/lcs.c (svn_diff__snake): Removed idx parameter (always 0) (svn_diff__lcs): Removed idx variable (always 0) , let d have either sign, and moved the origo of fp by d. New version no longer chooses the shorter file as the "original", and can thus give different LCS's depending on the order of the input files even when the files have different lengths. Depending on circumstances, speed can be ~15% faster for core lcs algorithm. Patch by: Morten Kloster <morklo{_AT_}gmail.com>