Guile Mailing List Archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: avl-trees vs. hashes
On 28 Oct 1998, Harvey J. Stein wrote:
> Jay Glascoe <jglascoe@jay.giss.nasa.gov> writes:
>
> > 1. not storing the key's hash in the entry would be a big win for
> > deletion in certain, relatively common situations. For example, if the
> > keys and values you use are small (relative to the size of the [big]
> > integer hash object), like small strings, integers, booleans, then
> >
> > (hash (key . value))
> >
> > is much larger than
> >
> > (key . value)
> >
> > and guile will have a much easier time gc'ing the smaller of the two.
>
> What do you mean, "for deletion"? Because of the GC, more memory =
> more CPU = slower in general (because more memory gets scanned), so
> including the hash value makes the program slower in general.
>
> OTOH, all hash operations requiring looking for items might be faster
> because you can test for hash equality before testing key equality &
> the hash equality test might be faster.
>
> > 2. not storing the hash would be a big loss for resize since it's
> > certainly faster (err, at least I hope it is) to do this
> >
> > index = (scm_num2long(SCM_CAR(entry), "foo", "bar")) % vec_len;
> >
> > rather than
> >
> > hash = my_fave_hasher(SCM_CAR(entry));
> > index = hash % vec_len;
> >
> > but then again, the speed of the hasher depends on the size of the key
> > you're hashing. So if you've small keys...
>
> I can't imagine the hasher being faster than just looking up the
> number, so it should always be faster to look them up than to
> recompute all the hashes. However, it might be a minor point for
> small keys because of the other work needed when rehashing.
>
> But, using the above trick it can make all ops that require locating
> an object faster (as mentioned above).
I think the best solution is to _not_ store the hash by default. Reason: The
user could do so if he wishes. In a generic hashtable implementation, which
allows to provide a hash-function and an equality predicate, you could do the
following:
* Keys are pairs (hash . real-key)
* The hash-function which is provided to make-hashtable is simply 'car.
* The equality function first compares the cars, then performs the real
equality test on the real-keys:
(define (my-hash-equal? k1 k2)
(and (eq? (car k1) (car k2))
(real-equality-test (cdr k1) (cdr k2))))
* Instead of (hashtable-set! table key value) you do
(hashtable-set! table (cons (real-hasher key) key) value)
* make-hashtable is called as follows (or similarly):
(make-hashtable car my-hash-equal?)
By _not_ adding the hash you have the possibility to get the best of all
worlds.
Jay, you have done a great job providing the auto-resizing hashtables.
What do you think of my suggestion?
Best regards,
Dirk Herrmann
Guile Home |
Main Index |
Thread Index