MOO-cows Mailing List Archive


Re: hash & balanced trees



Charles Bancroft writes: 

> Richard Godard writes:

>> That's fine for mutable data type... but unless I'm mistaken MOO
>> lists are not.

As seen by MOO programmers, lists are immutable.  How they're
implemented internally doesn't matter.  For example, when executing

   l = {"foo", "bar", "baz"};
   l[2] = "quux";

we currently implement the second line as something close to

   l = listset(l, "quux", 2);

making a new copy of l, mutating the copy, placing the copy in l, and
then throwing away the original value of l.
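The copy semantics can be sketched in Python (a hypothetical `listset` helper for illustration, not the server's actual C code; note MOO lists are 1-indexed):

```python
def listset(lst, value, index):
    """Return a fresh list with element `index` (1-based) replaced
    by `value`; the original list is left untouched."""
    new = list(lst)          # malloc + copy
    new[index - 1] = value   # mutate the copy
    return new               # caller rebinds; old value is dropped

l = ["foo", "bar", "baz"]
l = listset(l, "quux", 2)    # l is now ["foo", "quux", "baz"]
```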

However, in this instance we can (theoretically) avoid the
malloc/copy/free by noting that there are no other references to l as
we enter the l[2]="quux" statement--and nobody will notice if we just
mutate the list l points to directly.  Anybody have a cite for "The
implementation may cheat, but it must not get caught"?
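The refcount cheat might look like this sketch (a toy `RcList` cell with an explicit refcount; the real server tracks refcounts in C):

```python
class RcList:
    """Toy reference-counted list cell (hypothetical)."""
    def __init__(self, items):
        self.items = items
        self.refcount = 1

def listset_opt(cell, value, index):
    """Like listset, but mutate in place when we hold the only
    reference -- nobody can observe the difference, so the cheat
    is never caught."""
    if cell.refcount == 1:
        cell.items[index - 1] = value
        return cell
    # Someone else can still see the old value: copy as usual.
    cell.refcount -= 1
    new = RcList(list(cell.items))
    new.items[index - 1] = value
    return new
```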

It's possible that we could reimplement MOO lists as some fancier data
structure internally, but I don't think it's going to happen.

> Not sure that one would want to replace the moo list data type with
> a hash or tree.  Rather, by adding new builtin support for
> alternative types, it gives programmers a choice.

FWIW, I am firmly opposed to giving programmers mutable data types as
a choice.

> Why not add both a $hash and a $tree =8?

Cos they both would likely implement $dictionary, and their
performance characteristics aren't all that different.

While I'm at it, I should point out that naive hash tables are not
necessarily a good implementation of dictionaries.  Given a relatively
full dictionary, d, say we want to insert a new element:

   // Force an additional reference, since this situation comes
   // up in code a lot, especially for hacking on properties:
   old_d = d;             
   d["jay"] = #3920;

The worst thing we can do is copy the entire dictionary and add "jay"
to the new copy of d.  You may have to do this for hash tables.
(Come to think of it, you can pull the same kind of trick below with
hash chains, but I need to think about this more before I can get rid
of my distaste for hashing.)
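The worst case is easy to see with a flat table: to keep old_d intact, a persistent insert has to copy every entry (Python dict standing in for the hash table here, purely as a sketch):

```python
def dict_insert(d, key, value):
    """Persistent insert into a flat hash table: the whole table
    must be copied (O(n)) so existing references still see the
    pre-insert dictionary."""
    new = dict(d)        # copy every entry
    new[key] = value
    return new

old_d = {"alice": 1, "bob": 2}
d = dict_insert(old_d, "jay", 3920)   # old_d is unchanged
```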

Consider instead a binary search tree, with each node reference
counted.  As we descend the tree, we copy each node along our path,
but while we're copying we only have to bump the refcount on the paths
not taken.  We insert "jay" at the appropriate spot, do whatever
rebalancing needs to be done, and return the head of our new tree,
which shares all but O(log n) nodes with the old tree.
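Path copying can be sketched like so (rebalancing and refcount bookkeeping omitted for brevity; in the real thing you'd bump the refcount on each shared subtree instead of relying on the host language's GC):

```python
class Node:
    __slots__ = ("key", "value", "left", "right")
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value = key, value
        self.left, self.right = left, right

def insert(node, key, value):
    """Return the root of a new tree.  Only the O(depth) nodes on
    the search path are copied; every subtree off the path is
    shared, unmodified, with the old tree."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value,
                    insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value,
                    node.left, insert(node.right, key, value))
    return Node(key, value, node.left, node.right)  # replace value
```

Inserting on the left side of the root, for instance, leaves the entire right subtree physically shared between old and new trees.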

At least at first glance, trees have better memory utilization than
hash tables, so this certainly argues for trees.  It may come down to
which one we can pull better tricks for, though.  For example, we can
rebalance trees or compact hash tables during idle CPU time---or
inside of a hash chain, it should be possible to do MRU sorting as
long as it's clear that we're not sharing substructure.  Interaction
with the allocator is critical for anything as cons-y as these
structs, too.
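The MRU idea on a hash chain is just move-to-front, something like this sketch (a bucket chain modeled as a Python list of pairs; again, only safe when the chain isn't shared with another table):

```python
def mtf_lookup(chain, key):
    """Search a hash-bucket chain and move the hit to the front,
    so hot keys get cheaper over time.  Mutates the chain, which
    is only legal when no other table shares this substructure."""
    for i, (k, v) in enumerate(chain):
        if k == key:
            chain.insert(0, chain.pop(i))  # promote to front
            return v
    return None
```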

I'm staring at all those baroque search techniques at the back of the
data structures book and my head hurts.  Thinking is hard.  Anybody
have any ideas?
-- 
Jay Carlson    nop@nop.com    nop@kagoona.mitre.org

Flat text is just *never* what you want.   ---stephen p spackman
