CVS Logs and ChangeLogs

Here are some tips on writing CVS log messages so they work as ChangeLog entries too.

todo: more cvs-log-oriented discussion is needed

These are excerpted and paraphrased from Jim Blandy's somewhat longer essay "Maintaining the ChangeLog", included later on this page and also highly recommended.


Writing Log Entries
===================

Certain guidelines should be adhered to when writing log messages:

Make a log entry for every change.  The value of the log becomes much
less if developers cannot rely on its completeness.  Even if you've
only changed comments, write an entry that says, "Doc fix."  The only
changes you needn't log are small changes that have no effect on the
source, like formatting tweaks.

Log entries should be full sentences, not sentence fragments.
Fragments are more often ambiguous, and it takes only a few more
seconds to write out what you mean.  Fragments like `New file' or `New
function' are acceptable, because they are standard idioms, and all
further details should appear in the source code.

The log entry should name every affected function, variable, macro,
makefile target, grammar rule, etc, including the names of symbols
that are being removed in this commit.  This helps people do automated
searches through the logs later.  Don't hide names in wildcards,
because the globbed portion may be what someone searches for later.
For example, this is bad:

   (twirling_baton_*): removed these obsolete structures.
   (handle_parser_warning): pass data directly to callees, instead of
   storing in twirling_baton_*.

Later on, when someone is trying to figure out what happened to
`twirling_baton_fast', they may not find it if they just search for
"fast".  A better entry would be:

   (twirling_baton_fast, twirling_baton_slow): removed these obsolete
   structures. 
   (handle_parser_warning): pass data directly to callees, instead of
   storing in twirling_baton_*.

The wildcard is okay in the description for `handle_parser_warning',
but only because the two structures were mentioned by full name
elsewhere in the log entry.

There are some common-sense exceptions to the need to name everything
that was changed:

   * If you have made a change which requires trivial changes
     throughout the rest of the program (e.g., renaming a variable),
     you needn't name all the functions affected.

   * If you have rewritten a file completely, the reader understands
     that everything in it has changed, so your log entry may simply
     give the file name, and say "Rewritten".

In general, there is a tension between making entries easy to find by
searching for identifiers, and wasting time or producing unreadable
entries by being exhaustive.  Use your best judgement --- and be
considerate of your fellow developers.

For large changes or change groups, group the log entry into
paragraphs separated by blank lines.  Each paragraph should be a set
of changes that accomplishes a single goal.  Independent changes
should be in separate paragraphs.  It helps to start out each group
with a sentence or two summarizing the change.

One should never need the log entries to understand the current code.
If you find yourself writing a significant explanation in the log, you
should consider carefully whether your text doesn't actually belong in
a comment, alongside the code it explains.  Here's an example of doing
it right:

       (consume_count): If `count' is unreasonable, return 0 and don't
       advance input pointer.  

And then, in `consume_count' in `cplus-dem.c':

   while (isdigit ((unsigned char)**type))
     {
       count *= 10;
       count += **type - '0';
       /* A sanity check.  Otherwise a symbol like
         `_Utf390_1__1_9223372036854775807__9223372036854775'
         can cause this function to return a negative value.
         In this case we just consume until the end of the string.  */
      if (count > strlen (*type))
        {
          *type = save;
          return 0;
        }

This is why a new function, for example, needs only a log entry saying
"New Function" --- all the details should be in the source.

The above is taken from a longer essay by Jim Blandy, "Maintaining the ChangeLog", reproduced here:

Maintaining the ChangeLog
=========================

A project's ChangeLog provides a history of development.  Comments in
the code should explain the code's present state, but ChangeLog
entries should explain how and when it got that way.  The ChangeLog
must show:

* the relative order in which changes entered the code, so you can
  see the context in which a change was made, and

* the date at which the change entered the code, so you can relate the
  change to outside events, like branch cuts, code freezes, and
  releases.

In the case of CVS, these refer to when the change was committed,
because that is the context in which other developers will see the
change.

Every change to the sources should have a ChangeLog entry.  The value
of the ChangeLog becomes much less if developers cannot rely on its
completeness.  Even if you've only changed comments, write an entry
that says, "Doc fix."  The only changes you needn't log are small
changes that have no effect on the source, like formatting tweaks.

In order to keep the ChangeLog a manageable size, at the beginning of
each year, the ChangeLog should be renamed to "ChangeLog-YYYY", and a
fresh ChangeLog file started.


How to write ChangeLog entries
------------------------------

ChangeLog entries should be full sentences, not sentence fragments.
Fragments are more often ambiguous, and it takes only a few more
seconds to write out what you mean.  Fragments like `New file' or `New
function' are acceptable, because they are standard idioms, and all
further details should appear in the source code.

The log entry should mention every file changed.  It should also
mention by name every function, variable, macro, makefile target,
grammar rule, etc. you changed.  However, there are common-sense
exceptions:

* If you have made a change which requires trivial changes throughout
  the rest of the program (e.g., renaming a variable), you needn't
  name all the functions affected.

* If you have rewritten a file completely, the reader understands that
  everything in it has changed, so your log entry may simply give the
  file name, and say "Rewritten".

In general, there is a tension between making entries easy to find by
searching for identifiers, and wasting time or producing unreadable
entries by being exhaustive.  Use your best judgement --- and be
considerate of your fellow developers.

Group ChangeLog entries into "paragraphs", separated by blank lines.
Each paragraph should be a set of changes that accomplish a single
goal.  Independent changes should be in separate paragraphs.  For
example:

    1999-03-24  Stan Shebs  <shebs@andros.cygnus.com>

            * configure.host (mips-dec-mach3*): Use mipsm3, not mach3.

            Attempt to sort out SCO-related configs.
            * configure.host (i[3456]86-*-sysv4.2*): Use this instead of
            i[3456]86-*-sysv4.2MP and i[3456]86-*-sysv4.2uw2*.
            (i[3456]86-*-sysv5*): Recognize this.
            * configure.tgt (i[3456]86-*-sco3.2v5*, i[3456]86-*-sco3.2v4*):
            Recognize these.

Even though this entry describes two changes to `configure.host',
they're in separate paragraphs, because they're unrelated changes.
The second change to `configure.host' is grouped with another change
to `configure.tgt', because they both serve the same purpose.

Also note that the author has kindly recorded his overall motivation
for the paragraph, so we don't have to glean it from the individual
changes.

The header line for the ChangeLog entry should have the format shown
above.  If you are using an old version of Emacs (before 20.1) that
generates entries with more verbose dates, consider using
`etc/add-log.el', from the GDB source tree.  If you are using vi,
consider using the macro in `etc/add-log.vi'.  Both of these generate
entries in the newer, terser format.

One should never need the ChangeLog to understand the current code.
If you find yourself writing a significant explanation in the
ChangeLog, you should consider carefully whether your text doesn't
actually belong in a comment, alongside the code it explains.  Here's
an example of doing it right:

  1999-02-23  Tom Tromey  <tromey@cygnus.com>

          * cplus-dem.c (consume_count): If `count' is unreasonable,
          return 0 and don't advance input pointer.

And then, in `consume_count' in `cplus-dem.c':

   while (isdigit ((unsigned char)**type))
     {
       count *= 10;
       count += **type - '0';
       /* A sanity check.  Otherwise a symbol like
         `_Utf390_1__1_9223372036854775807__9223372036854775'
         can cause this function to return a negative value.
         In this case we just consume until the end of the string.  */
      if (count > strlen (*type))
        {
          *type = save;
          return 0;
        }

This is why a new function, for example, needs only a log entry saying
"New Function" --- all the details should be in the source.

Avoid the temptation to abbreviate filenames or function names, as in
this example (mostly real, but slightly exaggerated):

	* gdbarch.[ch] (gdbarch_tdep, gdbarch_bfd_arch_info,
 	gdbarch_byte_order, {set,}gdbarch_long_bit,
 	{set,}gdbarch_long_long_bit, {set,}gdbarch_ptr_bit): Corresponding
 	functions.

This makes it difficult for others to search the ChangeLog for changes
to the file or function they are interested in.  For example, if you
searched for `set_gdbarch_long_bit', you would not find the above
entry, because the writer used CSH-style globbing to abbreviate the
list of functions.  If you gave up, and made a second pass looking for
gdbarch.c, you wouldn't find that either.  Consider your poor readers,
and write out the names.


ChangeLogs and the CVS log
--------------------------

CVS maintains its own logs, which you can access using the `cvs log'
command.  This duplicates the information present in the ChangeLog,
but binds each entry to a specific revision, which can be helpful at
times.

However, the CVS log is no substitute for the ChangeLog files.

* CVS provides no easy way to see the changes made to a set of files
  in chronological order.  They're sorted first by filename, not by date.

* Unless you put full ChangeLog paragraphs in your CVS log entries, it's 
  difficult to pull together changes that cross several files.

* CVS doesn't segregate log entries for branches from those for the
  trunk in any useful way.

In some circumstances, though, the CVS log is more useful than the
ChangeLog, so we maintain both.  When you commit a change, you should
provide appropriate text in both the ChangeLog and the CVS log.

It is not necessary to provide CVS log entries for ChangeLog changes,
since it would simply duplicate the contents of the file itself.  


Writing ChangeLog entries for merges
------------------------------------

Revision management software like CVS can introduce some confusion
when writing ChangeLog entries.  For example, one might write a change
on a branch, and then merge it into the trunk months later.  In that
case, what position and date should the developer use for the
ChangeLog entry --- that of the original change, or the date of the
merge?

The principles described at the top need to hold for both the original
change and the merged change.  That is:

* On the branch (or trunk) where the change is first committed, the
  ChangeLog entry should be written as normal, inserted at the top of
  the ChangeLog and reflecting the date the change was committed to
  the branch (or trunk).

* When the change is then merged (to the trunk, or to another branch),
  the ChangeLog entry should have the following form:

  1999-03-26  Jim Blandy  <jimb@zwingli.cygnus.com>

           Merged change from foobar_20010401_branch:

           1999-03-16  Keith Seitz  <keiths@cygnus.com>
           [...]

  In this case, "Jim Blandy" is doing the merge on March 26; "Keith
  Seitz" is the original author of the change, who committed it to
  `foobar_20010401_branch' on March 16.
  
  As shown here, the entry for the merge should be like any other
  change --- inserted at the top of the ChangeLog, and stamped with
  the date the merge was committed.  It should indicate the origin of
  the change, and provide the full text of the original entry,
  indented to avoid being confused with a true log entry.  Remember
  that people looking for the merge will search for the original
  changelog text, so it's important to preserve it unchanged.

  For the merge entry, we use the merge date, and not the original
  date, because this is when the change appears on the trunk or branch
  this ChangeLog documents.  Its impact on these sources is
  independent of when or where it originated.

This approach preserves the structure of the ChangeLog (entries appear
in order, and dates reflect when they appeared), but also provides
full information about changes' origins.

(Back to Karl Fogel's home page.)