Thoughts on Chapter 5

C. Michael Pilato cmpilato at red-bean.com
Sun Feb 25 15:01:09 CST 2007


C. Michael Pilato wrote:
> Brian W. Fitzpatrick wrote:
>> OK.  This chapter covers a *ton* of data about an arcane subject and
>> it's a nice fluid read, but reading the chapter end-to-end, I felt
>> like I had to wade through a *ton* of BDB minutiae that 99% of the
>> repository admins won't ever have to deal with.  I don't have a
>> solution in mind for this, but I found it to be distracting and wonder
>> if we can't better title sections that are BDB specific so that FSFS
>> admins don't have to read all the way through just to find out that it
>> doesn't apply to them.
> 
> Yes, that was something we talked about doing -- I simply forgot to do
> it.  :-)

Okay, the "Repository Recovery" section was the only one not clearly
marked with "BDB".  Maybe you felt others should have been, like
"Removing dead transactions", due to misunderstanding the facts?  At any
rate, I think an FSFS-minded person could breeze over the sections that
have the words "Berkeley DB" in them and a) not miss anything and b) not
get too much irrelevant material.

>> You may need to take some of these comments with a grain of salt as I
>> personally don't recommend that people use bdb at all.  Aren't we
>> going to prescribe one over the other?
> 
> Even if we prescribe one, the book should still contain the information
> necessary to assist those who chose the other one.  I personally would
> still stay on this side of an outright prescription of FSFS; but yeah, I
> think we should be able to say something like, "These days, most folks
> choose FSFS for its flexibility in various deployment scenarios and ease
> of administration."

So, I didn't make a clear prescription, because my conscience won't
allow me to do so.  The memory of data-losing FSFS bugs as recent as
Subversion 1.3 is far too clear in my mind.  But I *did* state that
today both backends should be deemed reliable.  I added a comparison
table entry "Reliability: data integrity" which points out that newer
FSFS should be great, and BDB is also great but only if properly
deployed.  And I also explained why FSFS is pretty much the (correct)
choice everyone makes today, without falsely slandering Berkeley DB.  To
my knowledge, there has never been a data lossage bug in Berkeley DB
that didn't turn out to be a problem with the deployment configuration.

(Let me be very clear here -- I use FSFS myself all the time, now.  I'm
merely aiming at journalistic responsibility.)

>> "Planning Your Repository Organization":
>> - one other reason to have separate repositories is when you have
>> completely different types of data in each project: eg, one project
>> has source code, and another has 100MB Photoshop files in it.
> 
> Really?  Why is that?  (I can't quickly think of a reason why that would
> actually matter.)

*Poke*

>> In the table:
>> - "Scalability: repository size": I don't understand what this
>> means--does this mean that fsfs repositories take up less space on
>> disk or that you can't use it for repositories with tons of data (and
>> if it's the latter, I think it's incorrect--Apache uses fsfs).
> 
> That could be more clear, yes?  I'm pretty sure that when Ben added
> this, he was talking about space consumed on disk.

Fixed to be "Scalability: repository disk usage".  And I removed the
"slightly" here ... my experience is that FSFS is non-trivially smaller,
especially when the BDB logfiles haven't been purged.

>> - "Performance": Isn't BDB < 10% faster than FSFS in checking out the
>> latest revision?  I thought ghudson mailed stats on this to the list
>> that showed it's a negligible difference.

Added "slightly" here.

>> -We should note that BDB has an extra dependency: BDB itself
> 
> +1

I did this in the prose, rather than in the table.  The extra dependency
isn't really meaningful to non-developers or non-build-from-sourcers.

>> - Also, doesn't FSFS deal better with mixed repository access
>> mechanisms (http:// + svn://)?  Should we mention this?
> 
> Well, it deals better with mixed access by different *OS users*.  BDB has no
> problem doing http:// + svn:// if httpd and svnserve run as the same
> user.  But I dunno how to make this fit into a smallish table.  :-)

Turns out the table covered this already in the umask/groups row.

>> - BDB & FSFS subsections: Maybe these could be divided into a
>> "summary" and "gritty details" part?  I really doubt that most admins
>> give a hoot that BDB directory mods are O(n^2) and FSFS's are O(n).
> 
> Oh, I'm happy to toss that little bit altogether.

Removed that detail.  I didn't detect strong enough reason to bifurcate
the sections any more.

>> - FSFS subsection:  fsfs really isn't "immature" any more, and it's
>> been stress tested a lot.  I'd say that this paragraph is mostly FUD
>> and should go.
> 
> Agreed.  (Though, it's hard not to remember two relatively recent data
> lossage bugs in the backend ... something we've never had with BDB.)

See above.  Those data lossage bugs require me to tread carefully here.  So
I lost the stuff about lack of testing and how we were only guessing
about performance figures -- those have been validated by now.  And I
clearly indicated that FSFS is the choice nearly everyone is making.

>> "Creating the Repository"
>> - Maybe move the 1st tip up a little bit?

I kept this where it was because it follows the first use of 'svnadmin
create'.  Was there some other location you had in mind?

>> - Make the Warning more threatening?  We had some dude on the #svn
>> channel talking about how he edited one of his rev files (I am *not*
>> kidding).

Done.

>> "svndumpfilter":
>> - 1st footnote:  I used to agree that the inability to obliterate a
>> rev is a feature, but after talking to dozens of people in various
>> roles (open source, closed source, including the BSD dudes), I now
>> think that it *is* a missing feature.  FreeBSD *can't* have to do
>> something that would require thousands of people to recheckout huge
>> working copies (eg the ports tree).
> 
> +1

I had originally intended this footnote as a joke, so I rewrote the joke
to be more clear.  I then added another footnote later which points out
that obliterate has real use-cases, and that svn-dev-folk are aware of
the problem and interested in solving it.

>> "repository recovery":
>> - This should be specified as BDB specific in the title
> 
> +1

Done (as mentioned before).

>> "repository replication":
>> - The 'svn>' prompt confused me--I thought it was some sort of weird
>> svn shell at first.
> 
> Yeah, that's not necessary.  I'll drop it.

Fixed.

>> - using the username 'syncprop' in your examples is extremely
>> confusing--reminds me of properties.  Can't we use harry or sally?
> 
> I thought I used "syncproc" (as in "synchronization process").  I don't
> want to use harry or sally because I go out of my way to recommend that
> you set up a custom user for sync stuffs.  Oops!  I see now that
> sometimes I typed "syncprop" by accident.  Will fix.  Maybe I'll just
> make everything use "syncuser", which is more clear.

Used syncuser/syncpass instead.

-- 
C. Michael Pilato <cmpilato at red-bean.com>

"The Christian ideal has not been tried and found wanting.  It has
 been found difficult; and left untried."  -- G. K. Chesterton



