Guile Mailing List Archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: avl-trees vs. hashes
On 28 Oct 1998, Harvey J. Stein wrote:
> Jay Glascoe <jglascoe@jay.giss.nasa.gov> writes:
>
> > 1. not storing the key's hash in the entry would be a big win for
> > deletion in certain, relatively common situations. For example, if the
>
> What do you mean, "for deletion"? Because of the GC, more memory =
> more CPU = slower in general (because more memory gets scanned), so
> including the hash value makes the program slower in general.
>
All I know is that my hash tables outperform (speed-wise) Guile's hash
tables at everything except deletion. If I test
(set! mytab (make-proc))
(for-each (insert-entry mytab insert-proc) lines)
(for-each (find-entry mytab find-proc) lines)
(for-each (delete-entry mytab delete-proc) lines)
auto-shrink enabled
*time: 557*
(gc-time-taken . 332)
(cells-allocated . 35957)
(cell-heap-size . 98304)
(bytes-malloced . 93495)
(gc-malloc-threshold . 150000)
(cell-heap-segments (537251512 . 536989368) (537909288 . 537385000))
auto-shrink disabled
*time: 564*
(gc-time-taken . 323)
(cells-allocated . 35945)
(cell-heap-size . 98304)
(bytes-malloced . 93497)
(gc-malloc-threshold . 150000)
(cell-heap-segments (537251512 . 536989368) (537963016 . 537438728))
Guile hash
*time: 448*
(gc-time-taken . 241)
(cells-allocated . 36002)
(cell-heap-size . 98304)
(bytes-malloced . 126346)
(gc-malloc-threshold . 238159)
(cell-heap-segments (537251512 . 536989368) (537962712 . 537438424))
in fact, Guile's hashes are faster at deletion than they are at insertion!
if I test
(set! mytab (make-proc))
(for-each (insert-entry mytab insert-proc) lines)
(for-each (find-entry mytab find-proc) lines)
(for-each (insert-entry mytab insert-proc) lines)
auto-shrink enabled
*time: 398*
(gc-time-taken . 185)
(cells-allocated . 65968)
(cell-heap-size . 294912)
(bytes-malloced . 109871)
(gc-malloc-threshold . 150000)
(cell-heap-segments (537251512 . 536989368) (537909288 . 537385000)
(539510888 . 537938024))
Guile hash
*time: 635*
(gc-time-taken . 433)
(cells-allocated . 56009)
(cell-heap-size . 98304)
(bytes-malloced . 126346)
(gc-malloc-threshold . 238159)
(cell-heap-segments (537251512 . 536989368) (537962712 . 537438424))
> OTOH, all hash operations requiring looking for items might be faster
> because you can test for hash equality before testing key equality &
> the hash equality test might be faster.
yes, in almost all cases. it's faster, e.g. to check whether two symbols
are eq? because then, in the C code, you're comparing two long's,
whereas testing two SCM numbers requires that you do two SCM_INUM()'s
#define SCM_INUM(x) (((x)>>1)>>1)
> I can't imagine the hasher being faster than just looking up the
> number, so it should always be faster to look them up than to
> recompute all the hashes. However, it might be a minor point for
> small keys because of the other work needed when rehashing.
the problem is that a full size long won't fit into a SCM INUM. I'm
currently masking (and'ing) my hash with 2,097,157 which does fit, but
obviously won't work work for extremely huge tables (>2,000,000 entries).
If I insist on a range (- 2^31, 2^31 - 1) for my hashes, then they'll be
stored as "big numbers". Converting a big number to a long is like
converting a string to a number (not very fast).
Guile Home |
Main Index |
Thread Index