From minor-owner@red-bean.com Thu Jul 17 18:46:49 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6HNknDs031287
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Thu, 17 Jul 2003 18:46:49 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6HNkmpl031285
	for minor-commits@red-bean.com; Thu, 17 Jul 2003 18:46:48 -0500
Date: Thu, 17 Jul 2003 18:46:48 -0500
Message-Id: <200307172346.h6HNkmpl031285@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 14 - trunk/doc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-17 18:46:45 -0500 (Thu, 17 Jul 2003)
New Revision: 14

Modified:
   trunk/doc/design
Log:
Note that it's okay to use Linux-specific stuff, when doing so would
be a big win.

Add "C interface" section.



Modified: trunk/doc/design
==============================================================================
--- trunk/doc/design	2003-07-12 06:32:10 UTC (rev 13)
+++ trunk/doc/design	2003-07-17 23:46:45 UTC (rev 14)
@@ -74,7 +74,6 @@
   with MzScheme, RScheme, Bigloo, and SCM --- even where those systems
   use ahead-of-time compilation to native code via C.
 
-
 ** Secondary Goals
 
 Once the basic requirements have been met, here are others that are
@@ -125,6 +124,14 @@
 different targets.  But the whole JIT aspect of Minor is very exciting
 to me, and IA-32 is the obvious place to start.
 
+It's fine for Minor to take advantage of features specific to Linux
+and the GNU toolchain, where doing so would make Minor easier to use,
+or provide a helpful feature.  If other platforms don't have
+equivalent facilities, that's fine: GNU/Linux sets the standard;
+anyone else has to catch up.  This doesn't mean we can be gratuitously
+Linux-specific; it just means that we're not lashed to the least
+common denominator.
+
 ** Multi-threaded, generational GC
 
 Minor should use generational, non-incremental garbage collection.
@@ -192,7 +199,125 @@
 form), I think this might be much easier than one would expect, and
 very helpful.
 
+** C interface
 
+To make it easier to use Minor Scheme within existing build systems,
+and with existing tools, the Minor ahead-of-time compiler should
+produce ordinary ELF relocatable object files (".o files"), that the
+user can link with other .o files generated by other compilers to
+produce working mixed-language executables.  The user will probably
+need to link against a run-time library, too ("-lminor").  The Minor
+AOT compiler should also be able to produce header files containing
+declarations for the functions, variables, and other constructs the
+Scheme module exports.
+
+For example, the following makefile should work:
+
+    %.o: %.scm
+            minor-compile -c $<
+    %.h: %.scm
+            minor-compile --header $<
+
+    prog: main.o gcd.o
+            cc main.o gcd.o -lminor -o prog
+    main.o: main.c gcd.h
+    gcd.o gcd.h: gcd.scm
+
+Here is main.c:
+
+    #include <stdio.h>
+    #include <stdlib.h>
+    #include "gcd.h"
+
+    int
+    main (int argc, char **argv)
+    {
+      printf ("%d\n", gcd (atoi (argv[1]), atoi (argv[2])));
+      return 0;
+    }
+
+Here is gcd.scm:
+
+    (module gcd Minor
+      (import c-ffi)
+      
+      (define (gcd a b)
+        (if (= b 0)
+            a
+            (gcd b (remainder a b))))
+
+      ;; Export the Scheme function gcd as a C function named "gcd",
+      ;; which expects two int arguments, and returns an int.
+      (export-c-function int "gcd" (int int) gcd))
+
+The command "minor-compile --header gcd.scm" should produce the file
+"gcd.h", containing something like the following:
+
+    /* gcd.h --- C functions exported by gcd.scm
+       Generated by minor-compile 0.0 as follows:
+
+         minor-compile --header gcd.scm
+
+    */
+
+    int gcd (int, int);
+
+The build process should behave like this:
+
+    $ make
+    minor-compile --header gcd.scm
+    cc -c main.c
+    minor-compile -c gcd.scm
+    cc main.o gcd.o -lminor -o prog
+    $ ./prog 12 16
+    4
+    $
+
+The 'export-c-function' form takes care of generating code for the
+C-visible function that converts the incoming arguments to Scheme
+objects, and the outgoing return value to a C value.  As used here,
+the wrapper will also make sure Minor has been initialized (i.e., call
+mn_init), and find an appropriate mn_call object to use for the
+conversions and the call.  Options to 'export-c-function' not shown
+here will allow C code to pass an mn_call object explicitly; in this
+case, the initialization check will be unnecessary.
+
+So, to create a stand-alone executable in Scheme, one could write:
+
+    (module gcd2 minor
+      (import c-ffi)
+
+      (define (gcd a b)
+        (if (= b 0)
+            a
+            (gcd b (remainder a b))))
+
+      (define (main argv)
+        (display (gcd (string->number (list-ref argv 1))
+                      (string->number (list-ref argv 2))))
+        (newline))
+
+      ;; Export the Scheme function main as a C function named "main".
+      ;; The C function takes two arguments: an int named "argc" and a
+      ;; char ** named "argv".  argv is an array whose length is argc,
+      ;; where each element is a null-terminated string of chars.
+      (export-c-function int "main" ((int argc)
+                                     ((list c-string argc) argv))
+
+        ;; The C "main" function converts its argv to the natural
+        ;; corresponding Scheme type (a list of strings), and passes
+        ;; only that list to the Scheme main; argc is unnecessary in
+        ;; Scheme.
+        (main argv)))
+
+One could compile and run this file as follows:
+
+    $ minor-compile -c gcd2.scm
+    $ cc gcd2.o -lminor -o gcd2
+    $ ./gcd2 30 40
+    10
+    $ 
+
+
 * The Minor ABI
 
 The interface for loading new machine code into a running Minor
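
Annotation on the "C interface" examples in the diff above: the two
sample sessions (./prog 12 16 and ./gcd2 30 40) can be checked
against a plain C rendering of the same algorithm.  This is only a
reference sketch.  It is not what minor-compile would generate, and
the names ref_gcd and ref_gcd_from_strings are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Same algorithm as the Scheme gcd in gcd.scm, written as a loop.
   Hypothetical name: ref_gcd is not part of Minor.  */
int
ref_gcd (int a, int b)
{
  while (b != 0)
    {
      int r = a % b;
      a = b;
      b = r;
    }
  return a;
}

/* Parse two decimal strings and return their gcd, mirroring what
   main.c does with argv[1] and argv[2].  */
int
ref_gcd_from_strings (const char *s1, const char *s2)
{
  assert (s1 != NULL && s2 != NULL);
  return ref_gcd (atoi (s1), atoi (s2));
}
```

With the arguments from the sample sessions, ref_gcd (12, 16) is 4
and ref_gcd (30, 40) is 10, matching the output shown.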



From minor-owner@red-bean.com Fri Jul 18 21:42:07 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6J2g6Ds014610
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Fri, 18 Jul 2003 21:42:07 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6J2g6mm014608
	for minor-commits@red-bean.com; Fri, 18 Jul 2003 21:42:06 -0500
Date: Fri, 18 Jul 2003 21:42:06 -0500
Message-Id: <200307190242.h6J2g6mm014608@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 15 - trunk/include/minor
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-18 21:42:04 -0500 (Fri, 18 Jul 2003)
New Revision: 15

Modified:
   trunk/include/minor/minor.h
Log:
* include/minor/minor.h (mn_thread_first_call): Rename from
mn_thread_outermost_call.  Clarify documentation.



Modified: trunk/include/minor/minor.h
==============================================================================
--- trunk/include/minor/minor.h	2003-07-17 23:46:45 UTC (rev 14)
+++ trunk/include/minor/minor.h	2003-07-19 02:42:04 UTC (rev 15)
@@ -657,18 +657,18 @@
    object, return false, and set the pending exception.  */
 bool mn_thread_to_pthread (mn_call *, mn_ref *thread, pthread_t *pthread_p);
 
-/* Return the outermost Minor call object for the calling thread.
+/* Return the first Minor call object for the calling thread.
 
    A call object corresponds to a particular Minor->C call, but a
    pthread's start function is called by the POSIX threads
-   implementation, not by Minor, so there is no appropriate call
-   object already available.  (The call object of the thread that
-   created this one can't be used: call objects belong to specific
-   threads.)
+   implementation, not by Minor, so it doesn't begin life with any
+   call object it can use with the functions in this interface.  (Call
+   objects belong to specific threads, so the call object of the
+   thread that created this one can't be used.)
 
-   So for each thread we create a special "outermost" call object,
-   that a new thread can use to call Minor functions.  */
-mn_ref *mn_thread_outermost_call (mn_call *);
+   So for each thread we create a special "first" call object, that a
+   new thread can use to call Minor functions.  */
+mn_ref *mn_thread_first_call (mn_call *);
 
 
 /* Mutexes.  */
@@ -819,8 +819,8 @@
 /* Initializing the Minor system.  */
 
 /* Initialize Minor.  This function must be called before any other
-   function in this interface.  Return the outermost call object for
-   the calling thread, as if we had called mn_thread_outermost_call.  */
+   function in this interface.  Return the first call object for the
+   calling thread, as if we had called mn_thread_first_call.  */
 mn_call *mn_init (void);
 
 



From minor-owner@red-bean.com Sat Jul 19 02:09:27 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6J79QDs022017
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sat, 19 Jul 2003 02:09:26 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6J79QEr022015
	for minor-commits@red-bean.com; Sat, 19 Jul 2003 02:09:26 -0500
Date: Sat, 19 Jul 2003 02:09:26 -0500
Message-Id: <200307190709.h6J79QEr022015@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 16 - trunk/doc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-19 02:09:23 -0500 (Sat, 19 Jul 2003)
New Revision: 16

Modified:
   trunk/doc/design
Log:
* doc/design (The Minor ABI): Just refer to "C Interface" for an
explanation of how Minor .o files behave.
(C Interface, Target IA-32 Linux --- first): Tweaks.


Modified: trunk/doc/design
==============================================================================
--- trunk/doc/design	2003-07-19 02:42:04 UTC (rev 15)
+++ trunk/doc/design	2003-07-19 07:09:23 UTC (rev 16)
@@ -127,7 +127,7 @@
 It's fine for Minor to take advantage of features specific to Linux
 and the GNU toolchain, where doing so would make Minor easier to use,
 or provide a helpful feature.  If other platforms don't have
-equivalent facilities, that's fine: GNU/Linux sets the standard;
+equivalent facilities, that's fine --- GNU/Linux sets the standard;
 anyone else has to catch up.  This doesn't mean we can be gratuitously
 Linux-specific; it just means that we're not lashed to the least
 common denominator.
@@ -246,8 +246,10 @@
             (gcd b (remainder a b))))
 
       ;; Export the Scheme function gcd as a C function named "gcd",
-      ;; which expects two int arguments, and returns an int.
-      (export-c-function int "gcd" (int int) gcd))
+      ;; which expects two int arguments, and returns an int.  The
+      ;; C function's arguments are passed to the Scheme function in
+      ;; the obvious way.
+      (export-c-function int "gcd" (int int) (gcd ...)))
 
 The command "minor-compile --header gcd.scm" should produce the file
 "gcd.h", containing something like the following:
@@ -328,50 +330,13 @@
 ever be any other clients, but this is still an important interface,
 and one worth keeping simple and documenting.)
 
-The Minor ahead-of-time compiler translates Scheme modules into
-ordinary ELF relocatable object files (.o files).  You can link
-these together with .o files from other compilers, or from other
-languages, to 
+As described above in "C Interface", the Minor ahead-of-time compiler
+translates Scheme modules into ordinary ELF relocatable object files
+(.o files), which can be linked using the standard system linker to
+produce a working executable.  The Minor ABI describes the form
+relocatable object files should have in order to be linked against the
+Minor runtime, or loaded into an existing Minor session.
 
-[work-in-progress helpful detritus follows]
-
-Exported bindings in the
-Scheme module appear as symbol definitions in the ELF file; references
-to bindings in other modules appear as undefined symbols.  You can use
-the normal system linker to combine these files with other .o files,
-generated by Minor or by other compilers for programs written in other
-languages, if you also include the Minor runtime dynamic library in
-the link.
-
-These .o files contain
-symbol definitions for exported variables,
-
-
-You can link these with other .o files,
-produced by Minor or by other compilers for other languages, using the
-standard system linker, to produce stand-alone executables or shared
-libraries that include Scheme code.  These
-
-
-
-
-
-Minor uses ELF relocatable object files to represent machine code
-
-provides a documented, public interface for loading machine code
-into a running Minor 
-
-
-This interface serves two purposes:
-- allowing other programs to generate machine code for use with Minor,
-  and
-
-- documenting how the Minor compiler itself loads the code it
-  generates.
-
-
-
-
 * Implementation
 
 ** Macros
@@ -392,11 +357,6 @@
 
 *** Code Annotations
 
-** ABI
-
-(how to represent a pre-compiled Scheme module in an ELF file, so that
-it can be linked with other .o files to produce a working executable)
-
 ** Object File Handling
 *** generic ELF assembler
 *** IA-32 assembler



From minor-owner@red-bean.com Sun Jul 20 17:02:59 2003
Received: from zenia.home (12-223-225-216.client.insightbb.com [12.223.225.216])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6KM2wDr019076
	for <minor@red-bean.com>; Sun, 20 Jul 2003 17:02:59 -0500
Received: by zenia.home (Postfix, from userid 5433)
	id 0F54B204F5; Sun, 20 Jul 2003 17:05:39 -0500 (EST)
Sender: jimb@zenia.home
To: minor@red-bean.com
Subject: Minor databases
From: Jim Blandy <jimb@redhat.com>
Date: 20 Jul 2003 17:05:39 -0500
Message-ID: <vt24r1ghn0s.fsf@zenia.home>
Lines: 39
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii


Karl, during your visit, at one point you said, "Databases are cool."
This reminded me of Paul Graham's quip about how, between programming
languages and databases, the former was more likely to disappear
over time.  I'm a willing student, but I don't grok this yet (I'm
still at the BTDTDB stage).  Nonetheless, I've been thinking about what
sorts of databases would be nice to have for Minor:

- nightly regression testing results (for producing tinderbox-like
  reports)

- nightly automatic benchmark results - if we could have a web page
  with graphs showing performance over time, measuring not just
  various traditional benchmarks but also things like start-up time,
  code size, compile time, link time --- then inadvertent sudden hits
  would be easy to spot.

- Profiling results - I'm imagining a graph whose vertical axis is
  Subversion revision number (REV), and whose horizontal axis is
  code address (ADDR), where the color at REV, ADDR goes from
  violet to red depending on how much time the program spends there.
  Hot spots would show up as bright red areas.

- bug reports (obviously)

- features yet to be implemented (track like bugs)

- mail archives (??)

What do you think?

What can we bump up in priority for Minor to help all this happen?

Database access, obviously.  Could you re-implement what Craig B. did?

Would XML be helpful here?  I remember when we were talking about your
ideas for bug-tracking stuff, that I suggested you use XML internally,
to represent whatever microsyntax you wanted to use in your subject
headers, etc.


From minor-owner@red-bean.com Sun Jul 20 18:14:01 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6KNE0Ds021047
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sun, 20 Jul 2003 18:14:01 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6KNE0hF021045
	for minor-commits@red-bean.com; Sun, 20 Jul 2003 18:14:00 -0500
Date: Sun, 20 Jul 2003 18:14:00 -0500
Message-Id: <200307202314.h6KNE0hF021045@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 17 - trunk/include/arch/ia-32
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-20 18:13:57 -0500 (Sun, 20 Jul 2003)
New Revision: 17

Modified:
   trunk/include/arch/ia-32/gc-map.h
Log:
More work.


Modified: trunk/include/arch/ia-32/gc-map.h
==============================================================================
--- trunk/include/arch/ia-32/gc-map.h	2003-07-19 07:09:23 UTC (rev 16)
+++ trunk/include/arch/ia-32/gc-map.h	2003-07-20 23:13:57 UTC (rev 17)
@@ -4,126 +4,222 @@
 #ifndef MINOR_ARCH_IA32_GC_MAP_H
 #define MINOR_ARCH_IA32_GC_MAP_H
 
-/* A "doting object" is an object in one generation that points to an
-   object in a younger generation.
+/* The map data structure is used for the following tasks:
 
-   A "doting page" is a page on which a doting object starts.  Doting
-   objects can be quite large, and cover many pages, but only the page
-   on which a doting object starts is a doting page.  */
+   - To restrict collection to a limited portion of the heap, the
+     generational garbage collector needs to be able to find all
+     pointers from the uncollected portion into the collected portion:
+     these act as roots for the partial collection.
 
+     Since, in practice, pointers from older objects to younger
+     objects are rare, we can reduce the amount of bookkeeping needed
+     here by, when collecting generation G, always including all
+     generations younger than G in the collection as well.  This means
+     we only need to track pointers in older generations to objects in
+     younger generations --- the rare kind.
+
+     The write barrier records when the mutator creates such
+     old->young pointers in this map, for the collector to use.
+
+   - The generational collector also needs to be able to quickly
+     determine which generation an object belongs to, to decide
+     whether to stop tracing, or continue.
+
+   - When we're done with a collection, we need to be able to find all
+     the pages belonging to now-empty "from" spaces, to free them.
+
+   Keep in mind that we have to deal with objects from two very
+   different sources:
+   - mutators allocating objects as they run, and
+   - .o and .so files being loaded from disk.
+
+   In the latter case, we have less control of and less information
+   about where objects are allocated: we only have what we can get the
+   standard static and dynamic linkers to tell us.  Without custom
+   linker scripts to define helpful symbols, this isn't much.
+
+   A "doting object" is an object in one generation that points to an
+   object in a younger generation.  A "doting page" is a page on which
+   a doting object starts.  Doting objects can be quite large, and
+   cover many pages, but only the page on which a doting object starts
+   is a doting page.  */
+
+
 /* For every 4kb page managed by the garbage collector, we keep the
    following information.  (Since there is an instance of this
-   structure for every page, it needs to be kept small.)  */
+   structure for every page, it needs to be kept small.)
+
+   At the machine code level, referencing an (unsigned) bit field
+   turns into:
+   - a memory reference to read the word containing the bit field,
+   - a mask, to get rid of bits that don't belong to the field, and
+   - a right shift, to put the bitfield's least significant bit at
+     the bottom of the register.
+
+   But note that a lot of these fields are indices within a page, or
+   portions of page addresses.  So the first thing we're going to do
+   with such values is shift them left again, to multiply by 8 (for
+   first_doting_object and last_doting_object) or by 4k (for
+   next_doting_page and next_generation_page).  So the compiler could
+   combine the right shift and the left shift into a single operation.
+   
+   We can do even better: if we make sure that the right shift (the
+   bitfield's position within the word) and the left shift (the factor
+   we need to multiply it by to get a page offset or a page address)
+   are the same, then they cancel each other out, and all we need to
+   do is fetch and mask.  So first_doting_object, last_doting_object,
+   next_doting_page, and next_generation_page are all aligned this
+   way.
+
+   Remember, premature optimization is the root of all evil.  */
 struct gc_page
 {
   /* The following fields should all pack into a single 32-bit word.  */
 
   /* The generation to which the objects in this page belong.  Zero is
-     the youngest generation.  */
+     the youngest generation.  Seven is the "dummy generation", used
+     for memory areas we haven't allocated a separate gc_page array
+     for yet.  */
   unsigned generation : 3;
 
   /* If this is a doting page, this is the offset within this page of
      the start of the first doting object that begins on this page ---
-     divided by eight.  4k / 8 == 512, so we need nine bits for this
-     field.  To find all the doting pointers, we start here and scan
-     all the pointers in each object that starts on this page.
+     divided by eight.  If this is not a doting page, then
+     last_doting_object == 0 and first_doting_object > 0.
 
-     (Note that, since the IA-32 C ABI packs bitfields into words
-     starting at the least significant end, fetching this field
-     entails fetching (at least) a 16-bit value, and then shifting
-     that right three bits.  But since the next thing we're usually
-     going to do is multiply that value by eight --- that is, shift
-     left 3 --- to get the object's byte offset in the page, the
-     optimizer should notice that the shifts cancel each other out,
-     and just mask off the high and low bits to get the byte offset.
-     Remember, premature optimization is the root of all evil.)  */
+     4k / 8 == 512, so we need nine bits for this field.  To find all
+     the doting pointers, we start here and scan until
+     last_doting_object.  */
   unsigned first_doting_object : 9;
 
   /* All the pages that contain doting objects are kept in a
-     singly-linked list; there is one list per generation.  The last
-     page in the list points back to the first page.  If no doting
-     objects start on this page, this field is zero.  Otherwise, this
-     field is the link in that list: the address of the next such page
-     in this generation, divided by 4k.  The last page in the chain
-     points back to the first page.  If the page is not a member of
-     such a list, this is zero.
-
-     (Same trick here: the value here is an address divided by 4k, but
-     it's twelve bits up in the 32-bit word.  So fetching it entails
-     shifting right twelve bits, using it entails shifting left twelve
-     bits, and it should all cancel out.  Fetch and mask, and you've
-     got the value you want.)  */
+     singly-linked list; there is one list per generation.  This field
+     is the link in that list: the address of the next such page in
+     this generation, divided by 4k.  For the last page in the chain,
+     this field is zero.  */
   unsigned next_doting_page : 20;
 
   /* The following fields should all pack into a single 32-bit word.  */
 
-  /* Non-zero if any object begins on this page, zero otherwise.
-     (Perhaps this page is the middle of some large object.)  */
-  unsigned any_starts : 1;
-
   /* Non-zero if this page is the first in a contiguous block of two
      or more pages belonging to the same generation, or on the same
      free list.  */
   unsigned first_contiguous : 1;
 
-  /* A free bit!  */
-  unsigned : 1;
+  /* Free bits!  */
+  unsigned : 2;
 
-  /* If any_starts is non-zero, this is the offset in this page at
-     which the first object starts --- divided by eight.  (An object
-     could begin on a previous page and continue into this one.)  */
-  unsigned first_start : 9;
+  /* If this is a doting page, this is the offset within this page of
+     the start of the last doting object that begins on this page ---
+     divided by eight.  If this is not a doting page, then this is
+     zero.  */
+  unsigned last_doting_object : 9;
 
   /* All the pages in a generation are kept in a singly-linked list.
      All free pages are kept in a list.  This is the link in those
      lists.  It's helpful to recognize contiguous blocks of pages, so
      this takes care of that, too.
-
-     If first_contiguous is non-zero, this is the address of the last
-     page in the contiguous run --- divided by 4k.  Otherwise, if
-     first_contiguous is zero, this is the address of the next page in
-     the list --- divided by 4k.  If this is the last page in the
-     list, this is zero.  */
-  unsigned next_page : 20;
+     - If first_contiguous is non-zero, this is the address of the last
+       page in the contiguous run --- divided by 4k.
+     - Otherwise, if first_contiguous is zero, this is the address of
+       the next page in the list --- divided by 4k.  If this is the last
+       page in the list, this is zero.  */
+  unsigned next_generation_page : 20;
 };
 
 
 /* The map of all pages is a two-level tree.  Given a 32-bit address
    ADDR, the 'struct gc_page' for that page is:
 
-      ia32_page_map[(ADDR >> 22) & 0x3ff][(ADDR >> 12) & 0x3ff]
+      mn__ia32_page_map[(ADDR >> 22) & 0x3ff][(ADDR >> 12) & 0x3ff]
 
-   In other words, the top ten bits select an element from the
-   top-level array that points to the first element of an array of
-   'struct gc_page' structures.  */
-struct gc_page *ia32_page_map[1 << 10];
+   In other words, we use the top ten bits of the object's address to
+   index the top-level array, yielding a pointer to the start of a
+   second-level array; we use the next ten bits to index into
+   that array, yielding a gc_page structure.
 
+   At the moment, for objects created by loading object files (either
+   .o files we load ourselves or .so files introduced by the dynamic
+   linker), we don't have any way of knowing where their .minor_data
+   sections start and end.  All we do know is that they'll be aligned
+   to 4k boundaries, and that all the .minor_data sections will be
+   concatenated together, not interleaved with non-Scheme objects.
 
+   So, if we find a pointer to a page we're not aware of having
+   allocated, we just assume it belongs to a loaded object file's
+   .minor_data section.
+
+   For areas we haven't knowingly allocated any memory to, we put
+   neither zeros nor garbage in the top-level array, and leave no
+   garbage in the second-level array.  Rather, we have one special
+   second-level "dummy" gc_page array which all such top-level
+   pointers point to, in which every gc_page looks like:
+
+    {
+      generation = 7,
+      first_doting_object = 1,
+      next_doting_page = 0,
+      first_contiguous = 0,
+      last_doting_object = 0,
+      next_generation_page = 0
+    }
+
+   Generation 7 is the dummy generation: only the dummy gc_page
+   array's pages belong to generation 7.  When the write barrier
+   actually needs to record a doting object in a page that only has a
+   dummy gc_page array, then we actually allocate a real gc_page array
+   for the region, initialize its gc_pages, and then record the doting
+   object.
+
+   Since we record the first and last doting objects in a page, we can
+   cope even when portions of that page hold data from sections other
+   than .minor_data.  Since all .minor_data sections are concatenated
+   together, the heap sections from one object file will never be
+   interleaved with other non-heap sections.  And since sections on
+   the i386 are always aligned on 4k boundaries (our page size), we
+   know that we will never see heap and non-heap data interleaved on a
+   single page.  Thus, if we have the offsets of the first and last
+   doting objects, we know that everything between them must be a heap
+   object, so we know how to scan it for pointers to objects in
+   younger generations.
+
+   By using generation 7 as the "dummy" generation, the GC can decide
+   whether to traverse an object simply by comparing its page's
+   generation number to the number of the oldest generation being
+   collected: dummy pages will always be "too old to collect", so
+   they'll be ignored.  */
+struct gc_page *mn__ia32_page_map[1 << 10];
+
+
 /* Return a pointer to the 'struct gc_page' object for ADDR.  */
 #define GC_PAGE(addr)                                           \
-  (&(ia32_page_map[((unsigned int) (addr) >> 22) & 0x3ff]       \
+  (&(mn__ia32_page_map[((unsigned int) (addr) >> 22) & 0x3ff]   \
                   [((unsigned int) (addr) >> 12) & 0x3ff]))
 
 
 /* A single generation.  */
 struct gc_generation
 {
+  /* The base address of the first page in this generation.  */
+  void *first_generation_page;
+
   /* The base address of the first doting page in this generation.  */
   void *first_doting_page;
 
-  /* How many times this generation has been collected, without
-     collecting any older generations.  */
+  /* How many collections have gone by without collecting any
+     generations older than this.  */
   int collections;
 
   /* When 'collections' reaches this number, the next collection will
-     include older generations.  */
+     include the next older generation.  */
   int threshold;
 };
 
 
 /* The table of all generations.  Generation zero is the youngest
-   generation.  */
-struct gc_generation generations[8];
+   generation.  Generation 7 is the dummy generation, for areas we
+   haven't allocated anything to yet.  */
+struct gc_generation generations[7];
 
 
 #endif /* MINOR_ARCH_IA32_GC_MAP_H */
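
Annotation on the gc_page comment in the diff above: the "fetch and
mask" point can be made concrete with explicit 32-bit words instead
of C bit fields.  The sketch below assumes the first word is laid
out with generation in bits 0..2, first_doting_object in bits 3..11,
and next_doting_page in bits 12..31.  That layout is what the
comment's alignment argument implies, but the diff does not spell it
out, and bit-field placement is up to the compiler and ABI.  The
function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the first 32-bit word of a gc_page:
     bits  0..2   generation
     bits  3..11  first_doting_object  (byte offset / 8)
     bits 12..31  next_doting_page     (page address / 4k)  */
#define FIRST_DOTING_SHIFT 3
#define FIRST_DOTING_MASK  (UINT32_C (0x1ff) << FIRST_DOTING_SHIFT)
#define NEXT_PAGE_SHIFT    12
#define NEXT_PAGE_MASK     (UINT32_C (0xfffff) << NEXT_PAGE_SHIFT)

/* Build the word from a generation, a byte offset within the page
   (a multiple of 8), and a page address (a multiple of 4k).  */
uint32_t
pack_word0 (uint32_t generation, uint32_t first_byte_offset,
            uint32_t next_page_addr)
{
  assert (generation < 8);
  assert (first_byte_offset < 4096 && first_byte_offset % 8 == 0);
  assert (next_page_addr % 4096 == 0);
  return (generation
          | ((first_byte_offset / 8) << FIRST_DOTING_SHIFT)
          | ((next_page_addr / 4096) << NEXT_PAGE_SHIFT));
}

/* The obvious way: shift the field down, mask it, then scale it
   back up to a byte offset.  */
uint32_t
first_doting_offset_naive (uint32_t word)
{
  uint32_t field = (word >> FIRST_DOTING_SHIFT) & 0x1ff;
  return field * 8;
}

/* Because the field sits 3 bits up and is scaled by 8 (1 << 3),
   the two shifts cancel: masking alone yields the byte offset.  */
uint32_t
first_doting_offset_masked (uint32_t word)
{
  return word & FIRST_DOTING_MASK;
}

/* Likewise, a field 12 bits up that holds address / 4k needs only
   a mask to recover the page address.  */
uint32_t
next_doting_page_addr (uint32_t word)
{
  return word & NEXT_PAGE_MASK;
}
```

Both first_doting_offset functions return the same value for any
word; the masked form just skips the shift pair.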



From minor-owner@red-bean.com Sun Jul 20 19:19:52 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6L0JqDs023025
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sun, 20 Jul 2003 19:19:52 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6L0Jq0d023023
	for minor-commits@red-bean.com; Sun, 20 Jul 2003 19:19:52 -0500
Date: Sun, 20 Jul 2003 19:19:52 -0500
Message-Id: <200307210019.h6L0Jq0d023023@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 18 - trunk/include/arch/ia-32
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-20 19:19:48 -0500 (Sun, 20 Jul 2003)
New Revision: 18

Modified:
   trunk/include/arch/ia-32/gc-map.h
Log:
More thinking.


Modified: trunk/include/arch/ia-32/gc-map.h
==============================================================================
--- trunk/include/arch/ia-32/gc-map.h	2003-07-20 23:13:57 UTC (rev 17)
+++ trunk/include/arch/ia-32/gc-map.h	2003-07-21 00:19:48 UTC (rev 18)
@@ -168,8 +168,8 @@
    array's pages belong to generation 7.  When the write barrier
    actually needs to record a doting object in a page that only has a
    dummy gc_page array, then we actually allocate a real gc_page array
-   for the region, initialize its gc_pages, and then record the doting
-   object.
+   for the region, initialize its gc_pages, mark them all as belonging
+   to generation 6, and then record the doting object.
 
    Since we record the first and last doting objects in a page, we can
    cope even when portions of that page hold data from sections other
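
Annotation on the dummy-generation scheme in the diff above: a
minimal sketch of how a shared dummy second-level array, the
generation comparison, and the switch to generation 6 could fit
together.  It uses an unpacked stand-in for struct gc_page, and all
names here (page_info, map_init, page_lookup, page_is_collected,
record_doting_object) are hypothetical.  Only the indexing and the
generation numbers come from the comments in gc-map.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum
{
  DUMMY_GENERATION = 7,         /* never collected */
  LOADED_GENERATION = 6,        /* given to a region when it becomes real */
  REGION_PAGES = 1 << 10        /* 4k pages per top-level entry */
};

/* Simplified stand-in for struct gc_page: plain fields, no packing.  */
struct page_info
{
  unsigned generation;
  unsigned first_doting_object; /* byte offset / 8 */
  unsigned last_doting_object;  /* byte offset / 8 */
};

static struct page_info dummy_region[REGION_PAGES];
static struct page_info *page_map[1 << 10];

static void
init_region (struct page_info *region, unsigned generation)
{
  for (int i = 0; i < REGION_PAGES; i++)
    {
      region[i].generation = generation;
      region[i].first_doting_object = 1;   /* first > last: not doting */
      region[i].last_doting_object = 0;
    }
}

/* Point every top-level entry at the shared dummy region.  */
void
map_init (void)
{
  init_region (dummy_region, DUMMY_GENERATION);
  for (int i = 0; i < (1 << 10); i++)
    page_map[i] = dummy_region;
}

/* Same indexing as GC_PAGE: top ten bits, then the next ten.  */
struct page_info *
page_lookup (uint32_t addr)
{
  return &page_map[(addr >> 22) & 0x3ff][(addr >> 12) & 0x3ff];
}

/* Should a collection of generations 0..OLDEST trace into ADDR's
   page?  Dummy pages (generation 7) always answer no.  */
int
page_is_collected (uint32_t addr, unsigned oldest)
{
  assert (oldest < DUMMY_GENERATION);
  return page_lookup (addr)->generation <= oldest;
}

/* Write-barrier slow path: note that a doting object starts at ADDR.
   Return 0 on success, -1 if a real region could not be allocated.  */
int
record_doting_object (uint32_t addr)
{
  unsigned top = (addr >> 22) & 0x3ff;
  unsigned offset = (addr & 0xfff) >> 3;

  if (page_map[top] == dummy_region)
    {
      struct page_info *real = malloc (REGION_PAGES * sizeof *real);
      if (real == NULL)
        return -1;
      init_region (real, LOADED_GENERATION);
      page_map[top] = real;
    }

  struct page_info *page = page_lookup (addr);
  if (page->first_doting_object > page->last_doting_object)
    page->first_doting_object = page->last_doting_object = offset;
  else if (offset < page->first_doting_object)
    page->first_doting_object = offset;
  else if (offset > page->last_doting_object)
    page->last_doting_object = offset;
  return 0;
}
```

Call map_init once before any lookup.  Until record_doting_object
touches a region, every page in it reports generation 7 and is
skipped by page_is_collected; afterwards the whole region reports
generation 6.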



From minor-owner@red-bean.com Sun Jul 20 19:23:33 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6L0NXDs023148
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sun, 20 Jul 2003 19:23:33 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6L0NXEt023146
	for minor-commits@red-bean.com; Sun, 20 Jul 2003 19:23:33 -0500
Date: Sun, 20 Jul 2003 19:23:33 -0500
Message-Id: <200307210023.h6L0NXEt023146@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 19 - trunk/include/arch/ia-32
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-20 19:23:30 -0500 (Sun, 20 Jul 2003)
New Revision: 19

Modified:
   trunk/include/arch/ia-32/gc-map.h
Log:
Note some FIXME's.


Modified: trunk/include/arch/ia-32/gc-map.h
==============================================================================
--- trunk/include/arch/ia-32/gc-map.h	2003-07-21 00:19:48 UTC (rev 18)
+++ trunk/include/arch/ia-32/gc-map.h	2003-07-21 00:23:30 UTC (rev 19)
@@ -4,6 +4,16 @@
 #ifndef MINOR_ARCH_IA32_GC_MAP_H
 #define MINOR_ARCH_IA32_GC_MAP_H
 
+/* FIXME: the discussion of generation seven, and the initial state of
+   the map for areas we haven't allocated yet, needs to be improved.
+
+   FIXME: in general, objects loaded from .o files are not necessarily
+   treated as belonging to generation 7, then generation 6 as
+   described.  Since we're doing the loading and relocating ourselves,
+   we have complete information.  It's only objects that come from the
+   main executable, or from shared libraries, that we need to handle
+   this way.  */
+
 /* The map data structure is used for the following tasks:
 
    - To restrict collection to a limited portion of the heap, the



From minor-owner@red-bean.com Sun Jul 20 21:38:30 2003
Received: from pimout5-ext.prodigy.net (pimout5-ext.prodigy.net [207.115.63.73])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6L2cTDr026931
	for <minor@red-bean.com>; Sun, 20 Jul 2003 21:38:29 -0500
Received: from floss.red-bean.com (adsl-65-42-85-186.dsl.chcgil.ameritech.net [65.42.85.186])
	by pimout5-ext.prodigy.net (8.12.9/8.12.9) with ESMTP id h6L2cSxT164100;
	Sun, 20 Jul 2003 22:38:28 -0400
Received: from kfogel by floss.red-bean.com with local (Exim 3.34 #1 (Debian))
	id 19eLmI-0004IF-00; Sun, 20 Jul 2003 16:31:34 -0500
To: Jim Blandy <jimb@redhat.com>
Cc: minor@red-bean.com
Subject: Re: Minor databases
References: <vt24r1ghn0s.fsf@zenia.home>
Reply-to: kfogel@red-bean.com
Emacs: indefensible, reprehensible, and fully extensible.
From: Karl Fogel <kfogel@floss.red-bean.com>
Date: 20 Jul 2003 16:31:34 -0500
In-Reply-To: <vt24r1ghn0s.fsf@zenia.home>
Message-ID: <87k7acholl.fsf@floss.red-bean.com>
Lines: 107
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Jim Blandy <jimb@redhat.com> writes:
> - nightly regression testing results (for producing tinderbox-like
>   reports)
> 
> - nightly automatic benchmark results - if we could have a web page
>   with graphs showing performance over time, measuring not just
>   various traditional benchmarks but also things like start-up time,
>   code size, compile time, link time --- then inadvertent sudden hits
>   would be easy to spot.
> 
> - Profiling results - I'm imagining a graph whose vertical axis is
>   Subversion revision number (REV), and whose horizontal axis is
>   code address (ADDR), where the color at REV, ADDR goes from
>   violet to red depending on how much time the program spends there.
>   Hot spots would show up as bright red areas.
> 
> - bug reports (obviously)
> 
> - features yet to be implemented (track like bugs)
> 
> - mail archives (??)
> 
> What do you think?

The one that I personally find most compelling is a bug database -- as
in, using Minor to finally make a good bug tracker (i.e., what Gnats /
BugZilla / IssueZilla should have been).  If we happen to store
Minor's bugs in it, so much the better!  (I'm considering bugs and
feature requests to be the same kinds of object, since in practice
they're tracked the same way.)

A mail database would be great too.  Matt Braithwaite's been thinking
about that problem for a long time.  He wants to keep all his sent and
received mail in one database.  Interface-wise, that's a slightly
different problem, since it's one user browsing private data, not the
public browsing public data.  But the underlying DB schema should be
the same.  It's a hard problem, though, harder than a bug tracker,
I'll bet.  Email has a zillion years of history and interface that has
to be dealt with compatibly; whereas a bug tracker you can do any way
you like, as long as you supply the features people are expecting.
Imports of existing bug data are done on an "as losslessly as we can"
basis.

The other stuff are neat ideas, but they strike me as the sort of
thing a core developer shouldn't spend time on.  Volunteers will test
and profile, and can usually be coaxed into providing their results in
organized, parseable ways.  If they need to extend Minor in order to
do that, they'll let you know.  (I don't mean to say that core
developers shouldn't profile when attacking a specific problem, of
course.)

> What can we bump up in priority for Minor to help all this happen?

Uh, code generation? :-)

I mean, I don't think basic database support requires anything
special.  Just a working Scheme system with decent I/O.  Obviously
there may eventually be optimizations that could help Minor connect to
DBs more efficiently, but that's for the future.

> Database access, obviously.  Could you re-implement what Craig B. did?

Yup.  You get a basic scheme system working, and I'll make some sort
of database access happen.  I can't promise full ODBC compliance, only
because I don't know what that entails, but I know what I need to
actually *use* a database, and will do that part in as ODBC-y a way
as possible.

I'd go and talk to Craig and Jesse, too.  I know they'd be happy to
help out, at the very least with advice.

> Would XML be helpful here?  I remember when we were talking about your
> ideas for bug-tracking stuff, that I suggested you use XML internally,
> to represent whatever microsyntax you wanted to use in your subject
> headers, etc.

Hmmm.  Actually, no, I think we don't need XML for internal stuff,
since XML is just a highly blown-up way of representing lists, and
we've already got pretty good support for lists in Scheme :-).  XML is
most useful as a pseudo-generic import/export format, IMHO -- it saves
time because you don't have to make up a format or write [most of] a
parser.  Take the bug database: if it's a matter of exporting a set of
bugs as a discrete, portable package, then XML is the obvious choice
(because it's what recipients expect, not because it's technically
superior to anything in particular).  But if it's a matter of internal
data representation or cross-component communication, then heck, why
use XML when you have Scheme?

> Karl, during your visit, at one point you said, "Databases are cool."
> This reminded me of Paul Graham's quip about how, between programming
> languages and databases, the former was more likely to disappear
> over time.  I'm a willing student, but I don't grok this yet (I'm
> still at the BTDTDB stage).  Nonetheless, I've been thinking about what
> sorts of databases would be nice to have for Minor:

I guess they start to seem cool when you have to deal with problems
involving large amounts of regularized data....  When we write the bug
tracking system (or when you start to use it), I'll bet you'll grok it
:-).

Don't really understand Paul Graham's quip, though, I must admit.

   languages : databases :: stoves : woks

What use is one if the other disappears?

Btw, has Oscar seen any of the Minor design or interfaces yet?


From minor-owner@red-bean.com Mon Jul 21 02:05:28 2003
Received: from zenia.home (12-223-225-216.client.insightbb.com [12.223.225.216])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6L75RDr001972;
	Mon, 21 Jul 2003 02:05:27 -0500
Received: by zenia.home (Postfix, from userid 5433)
	id 98504204F5; Mon, 21 Jul 2003 02:08:12 -0500 (EST)
Sender: jimb@zenia.home
To: kfogel@red-bean.com
Cc: minor@red-bean.com
Subject: Re: Minor databases
References: <vt24r1ghn0s.fsf@zenia.home> <87k7acholl.fsf@floss.red-bean.com>
From: Jim Blandy <jimb@redhat.com>
Date: 21 Jul 2003 02:08:12 -0500
In-Reply-To: <87k7acholl.fsf@floss.red-bean.com>
Message-ID: <vt2wuecfjc3.fsf@zenia.home>
Lines: 96
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii


Karl Fogel <kfogel@floss.red-bean.com> writes:
> A mail database would be great too.  Matt Braithwaite's been thinking
> about that problem for a long time.  He wants to keep all his sent and
> received mail in one database.  Interface-wise, that's a slightly
> different problem, since it's one user browsing private data, not the
> public browsing public data.  But the underlying DB schema should be
> the same.  It's a hard problem, though, harder than a bug tracker,
> I'll bet.  Email has a zillion years of history and interface that has
> to be dealt with compatibly; whereas a bug tracker you can do any way
> you like, as long as you supply the features people are expecting.
> Imports of existing bug data are done on an "as losslessly as we can"
> basis.

Actually, what we're competing with for the mail database is things
like mhonarc and the like.  We can certainly do as well as that.

> The other stuff are neat ideas, but they strike me as the sort of
> thing a core developer shouldn't spend time on.  Volunteers will test
> and profile, and can usually be coaxed into providing their results in
> organized, parseable ways.  If they need to extend Minor in order to
> do that, they'll let you know.  (I don't mean to say that core
> developers shouldn't profile when attacking a specific problem, of
> course.)
> 
> > What can we bump up in priority for Minor to help all this happen?
> 
> Uh, code generation? :-)

... Yez.  When feature A depends on feature B, then feature A
obviously cannot be promoted ahead of B in the to-do list, no matter
what priority one assigns to it.  Instead, in such a situation,
feature B inherits A's priority.  Since every possible Minor feature
depends on code generation, they inherit the highest priority.  But
although it won't affect what we do this month, or perhaps even this
year, I think it's still interesting to ask what our extended
priorities should be.
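
A minimal sketch of that inheritance rule, in C.  The 'struct feature'
type, the MAX_DEPS limit, and the function name inherit_priority are
all made up for illustration; nothing like them exists in Minor.  The
sketch assumes the dependency graph has no cycles.

```c
#include <stddef.h>

enum { MAX_DEPS = 4 };          /* illustrative limit, not from Minor */

/* A to-do list entry (hypothetical type).  */
struct feature
{
  const char *name;
  int priority;                     /* larger means more urgent */
  struct feature *deps[MAX_DEPS];   /* features this one depends on */
  size_t n_deps;
};

/* Raise the priority of everything F depends on, directly or
   indirectly, to at least F's own priority.  Assumes the dependency
   graph is acyclic, so the recursion terminates.  */
static void
inherit_priority (struct feature *f)
{
  for (size_t i = 0; i < f->n_deps; i++)
    {
      struct feature *dep = f->deps[i];

      if (dep->priority < f->priority)
        dep->priority = f->priority;
      inherit_priority (dep);
    }
}
```

Calling inherit_priority on a high-priority feature such as database
access would lift everything underneath it, code generation included,
to the same priority.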

So, I'm not asking, "What do I need to do in the GC [say] to support
database access."  Rather, I'm asking, "Amongst those things we
imagine doing once the basic Scheme system runs, what can we bump up
in priority?"

But essentially, you're right, we don't have to think about that yet.

> > Would XML be helpful here?  I remember when we were talking about your
> > ideas for bug-tracking stuff, that I suggested you use XML internally,
> > to represent whatever microsyntax you wanted to use in your subject
> > headers, etc.
> 
> Hmmm.  Actually, no, I think we don't need XML for internal stuff,
> since XML is just a highly blown-up way of representing lists, and
> we've already got pretty good support for lists in Scheme :-).  XML is
> most useful as a pseudo-generic import/export format, IMHO -- it saves
> time because you don't have to make up a format or write [most of] a
> parser.  Take the bug database: if it's a matter of exporting a set of
> bugs as a discrete, portable package, then XML is the obvious choice
> (because it's what recipients expect, not because it's technically
> superior to anything in particular).  But if it's a matter of internal
> data representation or cross-component communication, then heck, why
> use XML when you have Scheme?

Well, okay --- a bug-tracking system should allow you to:
- include links to other bugs in the textual field of the bug: "The
  solution to <bug 14954> might work here, too."
- refer to source files in a way that allows them to be links:
  "The way things are done at <file foo.c line 12> is probably wrong."

And stuff like that.  Sure, when it's in the heap, it's all
s-expressions, and when it's in the database, those portions of the
data that are tabular are all broken out into database fields.  But
it's the more free-form stuff that I was wondering about.

> Don't really understand Paul Graham's quip, though, I must admit.
> 
>    languages : databases :: stoves : woks
> 
> What use is one if the other disappears?

I took it as: are you sure programming languages are the best way to
get computers to do what you want?  What if you had an AI system that
would consult with you about what you needed to do, and together help
you evolve your schema, queries, entry procedures, and so on?

That is, the part of the scenario that won't go away is that people
have large datasets.  But the techniques for handling them will
(hopefully) improve.

> Btw, has Oscar seen any of the Minor design or interfaces yet?

Yeah, I forwarded him links into the Subversion repository.  He said:

> Looking over the design document nearly cured me of my desire to get
> involved.  :o) By which I mean, oh yes, there is quite a lot of work
> to be done.  I'm also reminded of all the things we've done in Chez
> Scheme that probably should be written up.


From minor-owner@red-bean.com Mon Jul 21 14:21:15 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6LJLEDs026570
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Mon, 21 Jul 2003 14:21:15 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6LJLEKo026568
	for minor-commits@red-bean.com; Mon, 21 Jul 2003 14:21:14 -0500
Date: Mon, 21 Jul 2003 14:21:14 -0500
Message-Id: <200307211921.h6LJLEKo026568@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 20 - in trunk: . include
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-21 14:21:12 -0500 (Mon, 21 Jul 2003)
New Revision: 20

Added:
   trunk/arch/
Removed:
   trunk/include/arch/
Log:
Re-arrange per-arch stuff to resemble the Linux kernel sources.

Rather than having 'arch/ia-32', 'arch/sparc' etc. subdirectories
scattered throughout the sources, we'll have one top-level 'arch'
directory, with per-architecture subdirectories, each of which mirrors
the overall Minor hierarchy: 'arch/ia-32/gc', 'arch/ia-32/include'
(for non-installed headers), and so on.

The exception will be 'include/minor', the directory for installed
header files.  The whole point of this is to match what actually ends
up installed: 'include/minor/*.h' is installed as
"$prefix/include/minor/*.h".  If there are ever any installed
arch-specific files, they'll need to be installed as
"$prefix/include/minor/arch/MUMBLE/*.h".  So there will be a separate
'arch' subdirectory of 'include/minor'.


Copied: trunk/arch (from rev 19, trunk/include/arch)



From minor-owner@red-bean.com Mon Jul 21 14:22:37 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6LJMbDs026634
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Mon, 21 Jul 2003 14:22:37 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6LJMa75026632
	for minor-commits@red-bean.com; Mon, 21 Jul 2003 14:22:36 -0500
Date: Mon, 21 Jul 2003 14:22:36 -0500
Message-Id: <200307211922.h6LJMa75026632@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 21 - in trunk/arch/ia-32: . include
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-21 14:22:33 -0500 (Mon, 21 Jul 2003)
New Revision: 21

Added:
   trunk/arch/ia-32/include/
   trunk/arch/ia-32/include/gc-map.h
Removed:
   trunk/arch/ia-32/gc-map.h
Log:
Finish some rearrangement.  I wasn't sure if Subversion could handle
it if I did it all at once.  Probably would have been fine.


Copied: trunk/arch/ia-32/include/gc-map.h (from rev 20, trunk/arch/ia-32/gc-map.h)

Deleted: trunk/arch/ia-32/gc-map.h
==============================================================================
--- trunk/arch/ia-32/gc-map.h	2003-07-21 19:21:12 UTC (rev 20)
+++ trunk/arch/ia-32/gc-map.h	2003-07-21 19:22:33 UTC (rev 21)
@@ -1,235 +0,0 @@
-/* gc-map.h --- tracking GC'd memory on the IA-32
-   Jim Blandy <jimb@red-bean.com> --- July 2003  */
-
-#ifndef MINOR_ARCH_IA32_GC_MAP_H
-#define MINOR_ARCH_IA32_GC_MAP_H
-
-/* FIXME: the discussion of generation seven, and the initial state of
-   the map for areas we haven't allocated yet, needs to be improved.
-
-   FIXME: in general, objects loaded from .o files are not necessarily
-   treated as belonging to generation 7, then generation 6 as
-   described.  Since we're doing the loading and relocating ourselves,
-   we have complete information.  It's only objects that come from the
-   main executable, or from shared libraries, that we need to handle
-   this way.  */
-
-/* The map data structure is used for the following tasks:
-
-   - To restrict collection to a limited portion of the heap, the
-     generational garbage collector needs to be able to find all
-     pointers from the uncollected portion into the collected portion:
-     these act as roots for the partial collection.
-
-     Since, in practice, pointers from older objects to younger
-     objects are rare, we can reduce the amount of bookkeeping needed
-     here by, when collecting generation G, always including all
-     generations younger than G in the collection as well.  This means
-     we only need to track pointers in older generations to objects in
-     younger generations --- the rare kind.
-
-     The write barrier records when the mutator creates such
-     old->young pointers in this map, for the collector to use.
-
-   - The generational collector also needs to be able to quickly
-     determine which generation an object belongs to, to decide
-     whether to stop tracing, or continue.
-
-   - When we're done with a collection, we need to be able to find all
-     the pages belonging to now-empty "from" spaces, to free them.
-
-   Keep in mind that we have to deal with objects from two very
-   different sources:
-   - mutators allocating objects as they run, and
-   - .o and .so files being loaded from disk.
-
-   In the latter case, we have less control of and less information
-   about where objects are allocated: we only have what we can get the
-   standard static and dynamic linkers to tell us.  Without custom
-   linker scripts to define helpful symbols, this isn't much.
-
-   A "doting object" is an object in one generation that points to an
-   object in a younger generation.  A "doting page" is a page on which
-   a doting object starts.  Doting objects can be quite large, and
-   cover many pages, but only the page on which a doting object starts
-   is a doting page.  */
-
-
-/* For every 4kb page managed by the garbage collector, we keep the
-   following information.  (Since there is an instance of this
-   structure for every page, it needs to be kept small.)
-
-   At the machine code level, referencing an (unsigned) bit field
-   turns into:
-   - a memory reference to read the word containing the bit field,
-   - a mask, to get rid of bits that don't belong to the field, and
-   - a right shift, to put the bitfield's least significant bit at
-     the bottom of the register.
-
-   But note that a lot of these fields are indices within a page, or
-   portions of page addresses.  So the first thing we're going to do
-   with such values is shift them left again, to multiply by 8 (for
-   first_doting_object and last_doting_object) or by 4k (for
-   next_doting_page and next_generation_page).  So the compiler could
-   combine the right shift and the left shift into a single operation.
-   
-   We can do even better: if we make sure that the right shift (the
-   bitfield's position within the word) and the left shift (the factor
-   we need to multiply it by to get a page offset or a page address)
-   are the same, then they cancel each other out, and all we need to
-   do is fetch and mask.  So first_doting_object, last_doting_object,
-   next_doting_page, and next_generation_page are all aligned this
-   way.
-
-   Remember, premature optimization is the root of all evil.  */
-struct gc_page
-{
-  /* The following fields should all pack into a single 32-bit word.  */
-
-  /* The generation to which the objects in this page belong.  Zero is
-     the youngest generation.  Seven is the "dummy generation", used
-     for memory areas we haven't allocated a separate gc_page arary
-     for yet.  */
-  unsigned generation : 3;
-
-  /* If this is a doting page, this is the offset within this page of
-     the start of the first doting object that begins on this page ---
-     divided by eight.  If this is not a doting page, then
-     last_doting_object == 0 and first_doting_object > 0.
-
-     4k / 8 == 512, so we need nine bits for this field.  To find all
-     the doting pointers, we start here and scan until
-     last_doting_object.  */
-  unsigned first_doting_object : 9;
-
-  /* All the pages that contain doting objects are kept in a
-     singly-linked list; there is one list per generation.  This field
-     is the link in that list: the address of the next such page in
-     this generation, divided by 4k.  For the last page in the chain,
-     this field is zero.  */
-  unsigned next_doting_page : 20;
-
-  /* The following fields should all pack into a single 32-bit word.  */
-
-  /* Non-zero if this page is the first in a contiguous block of two
-     or more pages belonging to the same generation, or on the same
-     free list.  */
-  unsigned first_contiguous : 1;
-
-  /* Free bits!  */
-  unsigned : 2;
-
-  /* If this is a doting page, this is the offset within this page of
-     the start of the last doting object that begins on this page ---
-     divided by eight.  If this is not a doting page, then this is
-     zero.  */
-  unsigned last_doting_object : 9;
-
-  /* All the pages in a generation are kept in a singly-linked list.
-     All free pages are kept in a list.  This is the link in those
-     lists.  It's helpful to recognize contiguous blocks of pages, so
-     this takes care of that, too.
-     - If first_contiguous is non-zero, this is the address of the last
-       page in the contiguous run --- divided by 4k.
-     - Otherwise, if first_contiguous is zero, this is the address of
-       the next page in the list --- divided by 4k.  If this is the last
-       page in the list, this is zero.  */
-  unsigned next_generation_page : 20;
-};
-
-
-/* The map of all pages is a two-level tree.  Given a 32-bit address
-   ADDR, the 'struct gc_page' for that page is:
-
-      mn__ia32_page_map[(ADDR >> 22) & 0x3ff][(ADDR >> 12) & 0x3ff]
-
-   In other words, we use the top ten bits of the object's address to
-   index the top-level array, yielding a pointer to the start of a
-   second-level array; we use the next higher ten bits to index into
-   that array, yielding a gc_page structure.
-
-   At the moment, for objects created by loading object files (either
-   .o files we load ourselves or .so files introduced by the dynamic
-   linker), we don't have any way of knowing where their .minor_data
-   sections start and end.  All we do know is that they'll be aligned
-   to 4k boundaries, and that all the .minor_data sections will be
-   concatenated together, not interleaved with non-Scheme objects.
-
-   So, if we find a pointer to a page we're not aware of having
-   allocated, we just assume it belongs to a loaded object file's
-   .minor_data section.
-
-   For areas we haven't knowingly allocated any memory to, we don't
-   put zeros or leave garbage in the top-level array, or garbage in
-   the second-level array.  Rather, we have one special second-level
-   "dummy" gc_page array which all the top-level pointers point into,
-   in which every gc_page looks like:
-
-    {
-      generation = 7,
-      first_doting_object = 1,
-      next_doting_page = 0,
-      first_contiguous = 0,
-      last_doting_object = 0,
-      next_generation_page = 0
-    }
-
-   Generation 7 is the dummy generation: only the dummy gc_page
-   array's pages belong to generation 7.  When the write barrier
-   actually needs to record a doting object in a page that only has a
-   dummy gc_page array, then we actually allocate a real gc_page array
-   for the region, initialize its gc_pages, mark them all as belonging
-   to generation 6, and then record the doting object.
-
-   Since we record the first and last doting objects in a page, we can
-   cope even when portions of that page hold data from sections other
-   than .minor_data.  Since all .minor_data sections are concatenated
-   together, the heap sections from one object file will never be
-   interleaved with other non-heap sections.  And since sections on
-   the i386 are always aligned on 4k boundaries --- our page size, we
-   know that we will never see heap and non-heap data interleaved on a
-   single page.  Thus, if we have the offsets of the first and last
-   doting objects, we know that everything between them must be a heap
-   object, so we know how to scan it for pointers to objects in
-   younger generations.
-
-   By using generation 7 as the "dummy" generation, the GC can decide
-   whether to traverse an object simply by comparing its page's
-   generation number to the number of the oldest generation being
-   collected: dummy pages will always be "too old to collect", so
-   they'll be ignored.  */
-struct gc_page *mn__ia32_page_map[1 << 10];
-
-
-/* Return a pointer to the 'struct gc_page' object for ADDR.  */
-#define GC_PAGE(addr)                                           \
-  (&(mn__ia32_page_map[((unsigned int) (addr) >> 22) & 0x3ff]   \
-                  [((unsigned int) (addr) >> 12) & 0x3ff]))
-
-
-/* A single generation.  */
-struct gc_generation
-{
-  /* The base address of the first page in this collection.  */
-  void *first_generation_page;
-
-  /* The base address of the first doting page in this generation.  */
-  void *first_doting_page;
-
-  /* How many collections have gone by without collecting any
-     generations older than this.  */
-  int collections;
-
-  /* When 'collections' reaches this number, the next collection will
-     include the next older generation.  */
-  int threshold;
-};
-
-
-/* The table of all generations.  Generation zero is the youngest
-   generation.  Generation 7 is the dummy generation, for areas we
-   haven't allocated anything to yet.  */
-struct gc_generation generations[7];
-
-
-#endif /* MINOR_ARCH_IA32_GC_MAP_H */
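
A minimal sketch of how the two-level map and the dummy array described
in the header above might fit together.  Only the index arithmetic, the
generation numbers 6 and 7, and the dummy field values come from the
header's comments.  The names dummy_pages, page_map, map_init,
gc_page_for, materialize_region, and should_trace are hypothetical, the
gc_page structure is cut down to its first word, and addresses are
passed as uint32_t so the sketch also runs on a 64-bit host.  Note that
the second index, (ADDR >> 12) & 0x3ff, selects the ten bits just below
the top ten.

```c
#include <stdint.h>
#include <stdlib.h>

enum
{
  MAP_ENTRIES = 1 << 10,
  OLDEST_REAL_GENERATION = 6,
  DUMMY_GENERATION = 7
};

/* Cut-down stand-in for the header's gc_page: first word only.  */
struct gc_page
{
  unsigned generation : 3;
  unsigned first_doting_object : 9;
  unsigned next_doting_page : 20;
};

static struct gc_page dummy_pages[MAP_ENTRIES];   /* shared dummy array */
static struct gc_page *page_map[MAP_ENTRIES];     /* top-level array */

/* Point every top-level entry at the shared dummy array.  */
static void
map_init (void)
{
  for (int i = 0; i < MAP_ENTRIES; i++)
    {
      dummy_pages[i].generation = DUMMY_GENERATION;
      dummy_pages[i].first_doting_object = 1;   /* "not a doting page" */
      dummy_pages[i].next_doting_page = 0;
      page_map[i] = dummy_pages;
    }
}

/* Return the gc_page describing the 4k page that holds ADDR.  */
static struct gc_page *
gc_page_for (uint32_t addr)
{
  return &page_map[(addr >> 22) & 0x3ff][(addr >> 12) & 0x3ff];
}

/* Give the 4MB region holding ADDR a real second-level array whose
   pages all belong to generation 6.  Returns 0 on success (or if the
   region already has a real array), -1 if allocation fails.  */
static int
materialize_region (uint32_t addr)
{
  unsigned top = (addr >> 22) & 0x3ff;

  if (page_map[top] != dummy_pages)
    return 0;

  struct gc_page *real = malloc (MAP_ENTRIES * sizeof *real);
  if (real == NULL)
    return -1;

  for (int i = 0; i < MAP_ENTRIES; i++)
    {
      real[i] = dummy_pages[i];
      real[i].generation = OLDEST_REAL_GENERATION;
    }
  page_map[top] = real;
  return 0;
}

/* Non-zero if the object at ADDR is young enough to be traced when
   collecting generations 0 through OLDEST_COLLECTED.  */
static int
should_trace (uint32_t addr, unsigned oldest_collected)
{
  return gc_page_for (addr)->generation <= oldest_collected;
}
```

Because dummy pages carry generation 7, should_trace rejects them with
the same comparison it uses for real pages; no separate "is this page
mapped?" test is needed.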



From minor-owner@red-bean.com Mon Jul 21 23:52:13 2003
Received: from pimout6-ext.prodigy.net (pimout6-ext.prodigy.net [207.115.63.78])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6M4qCDr012763
	for <minor@red-bean.com>; Mon, 21 Jul 2003 23:52:12 -0500
Received: from floss.red-bean.com (adsl-65-42-85-186.dsl.chcgil.ameritech.net [65.42.85.186])
	by pimout6-ext.prodigy.net (8.12.9/8.12.9) with ESMTP id h6M4qB3S060412;
	Tue, 22 Jul 2003 00:52:11 -0400
Received: from kfogel by floss.red-bean.com with local (Exim 3.34 #1 (Debian))
	id 19ekLI-00009r-00; Mon, 21 Jul 2003 18:45:20 -0500
To: Jim Blandy <jimb@redhat.com>
Cc: minor@red-bean.com
Subject: Re: Minor databases
References: <vt24r1ghn0s.fsf@zenia.home> <87k7acholl.fsf@floss.red-bean.com>
	<vt2wuecfjc3.fsf@zenia.home>
Reply-to: kfogel@red-bean.com
X-Windows: the cutting edge of obsolescence.
From: Karl Fogel <kfogel@floss.red-bean.com>
Date: 21 Jul 2003 18:45:20 -0500
In-Reply-To: <vt2wuecfjc3.fsf@zenia.home>
Message-ID: <87d6g3qwa7.fsf@floss.red-bean.com>
Lines: 54
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Jim Blandy <jimb@redhat.com> writes:
> ... Yez.  When feature A depends on feature B, then feature A
> obviously cannot be promoted ahead of B in the to-do list, no matter
> what priority one assigns to it.  Instead, in such a situation,
> feature B inherits A's priority.  Since every possible Minor feature
> depends on code generation, they inherit the highest priority.  But
> although it won't affect what we do this month, or perhaps even this
> year, I think it's still interesting to ask what our extended
> priorities should be.
> 
> So, I'm not asking, "What do I need to do in the GC [say] to support
> database access."  Rather, I'm asking, "Amongst those things we
> imagine doing once the basic Scheme system runs, what can we bump up
> in priority?"
> 
> But essentially, you're right, we don't have to think about that yet.

That's what I was really saying, I guess.  I can't find a motivation
to think that far ahead.

> Well, okay --- a bug-tracking system should allow you to:
> - include links to other bugs in the textual field of the bug: "The
>   solution to <bug 14954> might work here, too."
> - refer to source files in a way that allows them to be links:
>   "The way things are done at <file foo.c line 12> is probably wrong."
> 
> And stuff like that.  Sure, when it's in the heap, it's all
> s-expressions, and when it's in the database, those portions of the
> data that are tabular are all broken out into database fields.  But
> it's the more free-form stuff that I was wondering about.

One solution is to use URLs.  Since your code is in Subversion, any
reference into it is a URL.  And obviously individual issues will have
URLs, such as

   http://foo.bar.com/mytracker/query.cgi?id=1729

so you can refer to bugs that way.  Some trackers do even more regular
expression matching, so if you type "issue #1729" it automatically
makes it into a link to the proper URL.
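
A rough sketch of that kind of matching, in C, using plain string
scanning rather than a regular expression library.  The function name
linkify_issues and the TRACKER_URL macro are made up for illustration;
the URL is just the placeholder from the example above, and the
"issue #N <URL>" output form is one arbitrary choice among many.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Placeholder base URL, taken from the example above.  */
#define TRACKER_URL "http://foo.bar.com/mytracker/query.cgi?id="

/* Copy TEXT into OUT, rewriting each "issue #N" as "issue #N <URL>".
   Returns the number of references rewritten, or -1 if OUT (of
   OUT_SIZE bytes) is too small.  */
static int
linkify_issues (const char *text, char *out, size_t out_size)
{
  static const char marker[] = "issue #";
  const size_t marker_len = sizeof marker - 1;
  size_t used = 0;
  int count = 0;

  while (*text)
    {
      if (strncmp (text, marker, marker_len) == 0
          && isdigit ((unsigned char) text[marker_len]))
        {
          const char *digits = text + marker_len;
          size_t n = strspn (digits, "0123456789");
          int written = snprintf (out + used, out_size - used,
                                  "issue #%.*s <%s%.*s>",
                                  (int) n, digits, TRACKER_URL,
                                  (int) n, digits);

          if (written < 0 || (size_t) written >= out_size - used)
            return -1;
          used += (size_t) written;
          text = digits + n;
          count++;
        }
      else
        {
          /* Leave room for this character and the terminator.  */
          if (used + 1 >= out_size)
            return -1;
          out[used++] = *text++;
        }
    }

  if (used >= out_size)
    return -1;
  out[used] = '\0';
  return count;
}
```

A real tracker would presumably emit markup for its own front end
instead of an angle-bracketed URL, and would need to decide how strict
to be about case and spacing in the "issue #" prefix.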

There seem to be a lot of ad hoc solutions out there for the free form
stuff; I haven't thought much about whether it's possible to do better
than that...

> Yeah, I forwarded him links into the Subversion repository.  He said:
> 
> > Looking over the design document nearly cured me of my desire to get
> > involved.  :o) By which I mean, oh yes, there is quite a lot of work
> > to be done.  I'm also reminded of all the things we've done in Chez
> > Scheme that probably should be written up.

Heh!  Well, that means there's meat there, at least, if it intimidated
him :-).


From minor-owner@red-bean.com Tue Jul 22 19:54:26 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6N0sPDs022623
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 22 Jul 2003 19:54:25 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6N0sPee022621
	for minor-commits@red-bean.com; Tue, 22 Jul 2003 19:54:25 -0500
Date: Tue, 22 Jul 2003 19:54:25 -0500
Message-Id: <200307230054.h6N0sPee022621@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 22 - trunk/arch/ia-32/include
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-22 19:54:23 -0500 (Tue, 22 Jul 2003)
New Revision: 22

Modified:
   trunk/arch/ia-32/include/gc-map.h
Log:
* arch/ia-32/include/gc-map.h: I think this is done.


Modified: trunk/arch/ia-32/include/gc-map.h
==============================================================================
--- trunk/arch/ia-32/include/gc-map.h	2003-07-21 19:22:33 UTC (rev 21)
+++ trunk/arch/ia-32/include/gc-map.h	2003-07-23 00:54:23 UTC (rev 22)
@@ -4,50 +4,88 @@
 #ifndef MINOR_ARCH_IA32_GC_MAP_H
 #define MINOR_ARCH_IA32_GC_MAP_H
 
-/* FIXME: the discussion of generation seven, and the initial state of
-   the map for areas we haven't allocated yet, needs to be improved.
+/* The GC map is a table mapping every heap object's address onto a
+   mn__ia32_gc_page structure describing the page the object lives in.
+   The mn__ia32_gc_page structure says which generation the objects it
+   contains belong to, and is also where the write barrier records
+   old->young pointers.
 
-   FIXME: in general, objects loaded from .o files are not necessarily
-   treated as belonging to generation 7, then generation 6 as
-   described.  Since we're doing the loading and relocating ourselves,
-   we have complete information.  It's only objects that come from the
-   main executable, or from shared libraries, that we need to handle
-   this way.  */
+   In more detail, here are the jobs the GC map needs to do:
 
-/* The map data structure is used for the following tasks:
+   - The whole idea of generational garbage collection is to usually
+     collect only part of the heap.  Occasionally, you'll need to do a
+     full collection, but if you can focus your time on portions of
+     the heap that contain more garbage, then that time will be more
+     productive, and free up more memory for the mutator to waste.
 
-   - To restrict collection to a limited portion of the heap, the
-     generational garbage collector needs to be able to find all
-     pointers from the uncollected portion into the collected portion:
-     these act as roots for the partial collection.
+     But to restrict collection to a limited portion of the heap, the
+     collector needs to be able to find all pointers from the
+     uncollected portion into the collected portion: these act as
+     roots for the partial collection.
 
      Since, in practice, pointers from older objects to younger
      objects are rare, we can reduce the amount of bookkeeping needed
-     here by, when collecting generation G, always including all
-     generations younger than G in the collection as well.  This means
-     we only need to track pointers in older generations to objects in
-     younger generations --- the rare kind.
+     here by, when collecting generation G, always collecting all
+     generations younger than G as well.  This means we only need to
+     track pointers in older generations to objects in younger
+     generations --- the rare kind.
 
-     The write barrier records when the mutator creates such
-     old->young pointers in this map, for the collector to use.
+     Since a newly allocated object can only be initialized with
+     pointers to existing objects, old->young pointers can only be
+     created by mutation.  Thus, every bit of code that mutates a heap
+     object in Minor needs to include a "write barrier": code that
+     checks whether an old->young pointer is being created, and record
+     such pointers in the GC map, for the collector to use in finding
+     roots for partial collections.
 
    - The generational collector also needs to be able to quickly
-     determine which generation an object belongs to, to decide
-     whether to stop tracing, or continue.
+     determine which generation an object belongs to, to recognize
+     when a pointer points out of the portion of the heap it's
+     collecting.
 
    - When we're done with a collection, we need to be able to find all
      the pages belonging to now-empty "from" spaces, to free them.
 
-   Keep in mind that we have to deal with objects from two very
-   different sources:
-   - mutators allocating objects as they run, and
-   - .o and .so files being loaded from disk.
+   There are two ways objects can come into existence:
 
-   In the latter case, we have less control of and less information
-   about where objects are allocated: we only have what we can get the
-   standard static and dynamic linkers to tell us.  Without custom
-   linker scripts to define helpful symbols, this isn't much.
+   - The mutator can allocate them in the usual way, with 'cons',
+     'make-vector', etc.
 
+   - Executable files and shared libraries may contain objects,
+     constructed at compile-time, linked by the system linker, and
+     introduced into memory by the kernel or the dynamic linker.
+
+   In the first case, code generated by Minor, or hand-written for
+   Minor, handles the allocation, so it can follow whatever
+   conventions we find useful.
+
+   But in the second case, Minor has only limited control over the
+   allocation.  Minor can ensure that all the heap objects in a
+   particular executable or shared library are contiguous, and not
+   interleaved with other sorts of data.  But the GC has no way to
+   find out at run time where each executable/shared library's heap
+   objects are.  This means that the GC can't reliably free up those
+   pages for re-use; they might also contain non-heap objects.  That,
+   in turn, means that the GC might as well never relocate such
+   objects, or even bother to collect them at all --- much better to
+   simply ignore them, except to check for old->young pointers.
+
+   So, when we allocate pages for a thread to allocate new objects
+   into, we mark the pages as belonging to generation zero.  And when
+   we allocate pages to hold objects the collector has promoted from
+   one generation to the next, we record the appropriate generation
+   for them as well.  But we assume that all other pages belong to
+   generation seven, the "immortal generation".  Any objects that we
+   find here must have come from executable files or shared libraries.
+   Other objects are never promoted into the immortal generation.
+
+   Note that when we load a .o file ourselves --- say, when we load a
+   module compiled by the ahead-of-time compiler --- that's Minor code
+   doing the loading, not the kernel or the dynamic linker.  So we can
+   place the .o file's objects (and procedures) in any generation we
+   want.  So that falls in the first category of allocation, not the
+   second.
+
    A "doting object" is an object in one generation that points to an
    object in a younger generation.  A "doting page" is a page on which
    a doting object starts.  Doting objects can be quite large, and
@@ -82,14 +120,14 @@
    way.
 
    Remember, premature optimization is the root of all evil.  */
-struct gc_page
+struct mn__ia32_gc_page
 {
   /* The following fields should all pack into a single 32-bit word.  */
 
   /* The generation to which the objects in this page belong.  Zero is
      the youngest generation.  Seven is the "dummy generation", used
-     for memory areas we haven't allocated a separate gc_page arary
-     for yet.  */
+     for memory areas we haven't allocated a separate mn__ia32_gc_page
+     array for yet.  */
   unsigned generation : 3;
 
   /* If this is a doting page, this is the offset within this page of
@@ -116,7 +154,7 @@
      free list.  */
   unsigned first_contiguous : 1;
 
-  /* Free bits!  */
+  /* Unused bits!  */
   unsigned : 2;
 
   /* If this is a doting page, this is the offset within this page of
@@ -139,32 +177,57 @@
 
 
 /* The map of all pages is a two-level tree.  Given a 32-bit address
-   ADDR, the 'struct gc_page' for that page is:
+   ADDR, the 'struct mn__ia32_gc_page' for that page is:
 
       mn__ia32_page_map[(ADDR >> 22) & 0x3ff][(ADDR >> 12) & 0x3ff]
 
    In other words, we use the top ten bits of the object's address to
-   index the top-level array, yielding a pointer to the start of a
-   second-level array; we use the next higher ten bits to index into
-   that array, yielding a gc_page structure.
+   index the top-level array, yielding a pointer to a second-level
+   array; then we use the next ten bits to index into that array,
+   yielding a mn__ia32_gc_page structure.
 
-   At the moment, for objects created by loading object files (either
-   .o files we load ourselves or .so files introduced by the dynamic
-   linker), we don't have any way of knowing where their .minor_data
-   sections start and end.  All we do know is that they'll be aligned
-   to 4k boundaries, and that all the .minor_data sections will be
-   concatenated together, not interleaved with non-Scheme objects.
+   Initially, before we've allocated any heap pages at all, every
+   entry in mn__ia32_page_map points to the same second-level array
+   object: mn__ia32_immortal_pages.  This creates the appearance of a
+   fully populated tree, with a mn__ia32_gc_page struct for every 4k
+   page in the IA-32's 32-bit address space --- even though
+   mn__ia32_page_map and mn__ia32_immortal_pages occupy only 1k * 4b +
+   1k * 8b == 12kb.
 
-   So, if we find a pointer to a page we're not aware of having
-   allocated, we just assume it belongs to a loaded object file's
-   .minor_data section.
+   As we allocate pages for new allocation, or for to-spaces during
+   collection, we need to record these allocations in the map.  Since
+   mn__ia32_immortal_pages is (potentially) shared by many top-level
+   array entries, we handle it in a copy-on-write fashion: when the
+   mn__ia32_gc_page struct we want to tweak is actually an element of
+   mn__ia32_immortal_pages, we allocate a fresh second-level table,
+   initialize it to be a copy of mn__ia32_immortal_pages, and then
+   tweak the appropriate mn__ia32_gc_page.  So as the program runs, we
+   use map memory only for the interesting parts.
 
-   For areas we haven't knowingly allocated any memory to, we don't
-   put zeros or leave garbage in the top-level array, or garbage in
-   the second-level array.  Rather, we have one special second-level
-   "dummy" gc_page array which all the top-level pointers point into,
-   in which every gc_page looks like:
+   As described above, GNU/Linux doesn't tell us which regions of an
+   executable or shared library contain heap objects: we just
+   occasionally find heap references to objects on pages we've never
+   touched before.  So the initial state of a mn__ia32_gc_page struct
+   has to be appropriate for such objects.  This means:
 
+     - The pages' generation should be the immortal generation ---
+       generation seven.
+
+     - The pages contain no doting objects.  Objects in executables or
+       shared libraries may only (initially) point to objects in other
+       executables or shared libraries, since they were linked by the
+       static linker: otherwise, the static linker would complain
+       about unresolved references.
+
+     - The pages will never be freed.  We don't scavenge objects from
+       executables or shared libraries: we can't be sure where the
+       regions start and end, so we couldn't free the area for reuse
+       after the live objects have been copied out of them.  So
+       next_generation_page and first_contiguous don't need to be
+       initialized to anything special.
+
+   Thus, the default mn__ia32_gc_page struct looks like this:
+
     {
       generation = 7,
       first_doting_object = 1,
@@ -174,62 +237,76 @@
       next_generation_page = 0
     }
 
-   Generation 7 is the dummy generation: only the dummy gc_page
-   array's pages belong to generation 7.  When the write barrier
-   actually needs to record a doting object in a page that only has a
-   dummy gc_page array, then we actually allocate a real gc_page array
-   for the region, initialize its gc_pages, mark them all as belonging
-   to generation 6, and then record the doting object.
+   Of course, the mutator may create doting objects in executables and
+   shared libraries, so it's not the case that every executable or
+   shared object page will always look like this.  But initially, this
+   is fine.
 
-   Since we record the first and last doting objects in a page, we can
-   cope even when portions of that page hold data from sections other
-   than .minor_data.  Since all .minor_data sections are concatenated
-   together, the heap sections from one object file will never be
-   interleaved with other non-heap sections.  And since sections on
-   the i386 are always aligned on 4k boundaries --- our page size, we
-   know that we will never see heap and non-heap data interleaved on a
-   single page.  Thus, if we have the offsets of the first and last
-   doting objects, we know that everything between them must be a heap
-   object, so we know how to scan it for pointers to objects in
-   younger generations.
+   Since the write barrier records the first and last doting objects
+   in a page, and the GC never looks outside that range, things will
+   work correctly even if the initial or tail end of a page holds
+   non-heap objects.  So if the linker concatenates the .minor.data
+   section with the .data or .bss section, for example, things will be
+   fine.
 
-   By using generation 7 as the "dummy" generation, the GC can decide
-   whether to traverse an object simply by comparing its page's
-   generation number to the number of the oldest generation being
-   collected: dummy pages will always be "too old to collect", so
-   they'll be ignored.  */
-struct gc_page *mn__ia32_page_map[1 << 10];
+   Problems would arise if non-heap objects were interleaved with heap
+   objects on a page: first_doting_object could end up pointing before
+   them, while last_doting_object was pointing after them.  However,
+   we know this can't happen:
 
+   - Within a single executable or shared library, the static linker
+     concatenates all the .minor.data sections, without interleaving
+     other sections.  So we don't have to worry about intra-exec/solib
+     interleavings.
 
-/* Return a pointer to the 'struct gc_page' object for ADDR.  */
-#define GC_PAGE(addr)                                           \
-  (&(mn__ia32_page_map[((unsigned int) (addr) >> 22) & 0x3ff]   \
-                  [((unsigned int) (addr) >> 12) & 0x3ff]))
+   - The IA-32 ABI requires that ELF load segments be aligned on page
+     boundaries.  This means that two non-empty data segments can't
+     appear on the same page.  So we don't have to worry about
+     inter-exec/solib interleavings, either.
 
+   Using the oldest generation, generation 7, as the "immortal"
+   generation means that the collector's test for whether to scavenge
+   an object doesn't need a special case to recognize immortal
+   objects.  The obvious way to write the test, "Is this object's
+   generation less than or equal to the oldest generation we're
+   collecting?" will correctly decline to traverse an immortal object.
+   Since the collector asks this of every object it touches, it's
+   important for this test to be fast.  */
+extern struct mn__ia32_gc_page *mn__ia32_page_map[1 << 10];
 
+/* The array of immortal pages.  */
+extern struct mn__ia32_gc_page mn__ia32_immortal_pages[1 << 10];
+
+
+/* The 'struct mn__ia32_gc_page' object for ADDR.  */
+#define MN__IA32_GC_PAGE(addr)                                  \
+  (mn__ia32_page_map[((unsigned int) (addr) >> 22) & 0x3ff]     \
+                  [((unsigned int) (addr) >> 12) & 0x3ff])
+
+
 /* A single generation.  */
-struct gc_generation
+struct mn__ia32_gc_generation
 {
-  /* The base address of the first page in this collection.  */
+  /* The base address of the first page in this generation, or zero if
+     the generation contains no pages.  This is invalid in the
+     immortal generation.  */
   void *first_generation_page;
 
-  /* The base address of the first doting page in this generation.  */
+  /* The base address of the first doting page in this generation.
+     Zero if the generation contains no doting pages.  */
   void *first_doting_page;
 
-  /* How many collections have gone by without collecting any
-     generations older than this.  */
+  /* How many collections we've done since the last time we collected
+     any generations older than this.  */
   int collections;
-
-  /* When 'collections' reaches this number, the next collection will
-     include the next older generation.  */
-  int threshold;
 };
 
 
 /* The table of all generations.  Generation zero is the youngest
-   generation.  Generation 7 is the dummy generation, for areas we
-   haven't allocated anything to yet.  */
-struct gc_generation generations[7];
+   generation.  Generation 7 is the immortal generation, for pages in
+   executables and shared libraries (actually, for any page we didn't
+   allocate ourselves).  */
+extern struct mn__ia32_gc_generation mn__ia32_generations[8];
 
 
 #endif /* MINOR_ARCH_IA32_GC_MAP_H */


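A rough sketch of the two-level map described in the gc-map.h comments
above, for readers following along.  Nothing here comes from the Minor
sources beyond the field names and the default values given in the
comment: the sketch_ names are hypothetical, addresses are passed as
uint32_t values rather than real pointers so the code runs on any
host, and malloc stands in for whatever allocator the real code would
use.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ENTRIES (1 << 10)

/* Same fields as the header's page struct; the exact bit layout is
   up to the compiler and is not relied on here.  */
struct sketch_gc_page
{
  unsigned generation : 3;
  unsigned first_doting_object : 9;
  unsigned next_doting_page : 20;
  unsigned : 3;
  unsigned last_doting_object : 9;
  unsigned next_generation_page : 20;
};

/* The shared second-level array, and the top-level array.  */
static struct sketch_gc_page sketch_immortal_pages[SKETCH_ENTRIES];
static struct sketch_gc_page *sketch_page_map[SKETCH_ENTRIES];

/* Make every top-level entry point at the shared immortal array, so
   the tree looks fully populated.  */
void
sketch_map_init (void)
{
  int i;

  for (i = 0; i < SKETCH_ENTRIES; i++)
    {
      sketch_immortal_pages[i].generation = 7;
      sketch_immortal_pages[i].first_doting_object = 1;
      sketch_immortal_pages[i].next_doting_page = 0;
      sketch_immortal_pages[i].last_doting_object = 0;
      sketch_immortal_pages[i].next_generation_page = 0;
      sketch_page_map[i] = sketch_immortal_pages;
    }
}

/* Read-only lookup: top ten bits, then the next ten bits.  */
struct sketch_gc_page *
sketch_gc_page_lookup (uint32_t addr)
{
  return &sketch_page_map[(addr >> 22) & 0x3ff][(addr >> 12) & 0x3ff];
}

/* Lookup for modification.  If the entry still lives in the shared
   immortal array, copy that array first and repoint the top-level
   entry at the copy.  Returns NULL if the copy can't be allocated.  */
struct sketch_gc_page *
sketch_gc_page_for_write (uint32_t addr)
{
  unsigned top = (addr >> 22) & 0x3ff;

  if (sketch_page_map[top] == sketch_immortal_pages)
    {
      struct sketch_gc_page *fresh = malloc (sizeof sketch_immortal_pages);

      if (! fresh)
        return NULL;
      memcpy (fresh, sketch_immortal_pages, sizeof sketch_immortal_pages);
      sketch_page_map[top] = fresh;
    }
  return &sketch_page_map[top][(addr >> 12) & 0x3ff];
}

/* The collector's test: is the object at ADDR in a generation we're
   collecting?  Immortal pages (generation 7) never pass.  */
int
sketch_should_scavenge (uint32_t addr, unsigned oldest_collected)
{
  return sketch_gc_page_lookup (addr)->generation <= oldest_collected;
}
```

Marking one page as generation zero costs one fresh second-level
array; every other top-level entry keeps sharing the immortal array,
and pages nobody has touched still read as generation seven.
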

From minor-owner@red-bean.com Wed Jul 23 01:07:54 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6N67sDs000401
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Wed, 23 Jul 2003 01:07:54 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6N67r1K000399
	for minor-commits@red-bean.com; Wed, 23 Jul 2003 01:07:53 -0500
Date: Wed, 23 Jul 2003 01:07:53 -0500
Message-Id: <200307230607.h6N67r1K000399@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 23 - trunk/arch/ia-32/include
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-23 01:07:50 -0500 (Wed, 23 Jul 2003)
New Revision: 23

Modified:
   trunk/arch/ia-32/include/gc-map.h
Log:
More edits.


Modified: trunk/arch/ia-32/include/gc-map.h
==============================================================================
--- trunk/arch/ia-32/include/gc-map.h	2003-07-23 00:54:23 UTC (rev 22)
+++ trunk/arch/ia-32/include/gc-map.h	2003-07-23 06:07:50 UTC (rev 23)
@@ -21,7 +21,7 @@
      But to restrict collection to a limited portion of the heap, the
      collector needs to be able to find all pointers from the
      uncollected portion into the collected portion: these act as
-     roots for the partial collection.
+     additional roots for the partial collection.
 
      Since, in practice, pointers from older objects to younger
      objects are rare, we can reduce the amount of bookkeeping needed
@@ -30,20 +30,19 @@
      track pointers in older generations to objects in younger
      generations --- the rare kind.
 
-     Since a newly allocated object can only be initialized with
-     pointers to existing objects, old->young pointers can only be
-     created by mutation.  Thus, every bit of code that mutates a heap
-     object in Minor needs to include a "write barrier": code that
-     checks whether an old->young pointer is being created, and record
-     such pointers in the GC map, for the collector to use in finding
-     roots for partial collections.
+     How do we track such pointers?  Since a newly allocated object
+     can only be initialized with pointers to existing objects,
+     old->young pointers can only be created by mutation.  Thus, every
+     bit of code that mutates a heap object in Minor needs to include
+     a "write barrier": code that checks whether an old->young pointer
+     is being created, and records such pointers in the GC map, for
+     the collector to use in finding roots for partial collections.
 
-   - The generational collector also needs to be able to quickly
-     determine which generation an object belongs to, to recognize
-     when a pointer points out of the portion of the heap it's
-     collecting.
+   - We also need to be able to quickly determine which generation an
+     object belongs to, to recognize when a pointer points out of the
+     portion of the heap we're collecting.
 
-   - When we're done with a collection, we need to be able to find all
+   - When we've finished collecting, we need to be able to find all
      the pages belonging to now-empty "from" spaces, to free them.
 
    There are two ways objects can come into existence:
@@ -53,7 +52,8 @@
 
    - Executable files and shared libraries may contain objects,
      constructed at compile-time, linked by the system linker, and
-     introduced into memory by the kernel or the dynamic linker.
+     introduced into memory by the kernel doing an 'exec' or the
+     dynamic linker.
 
    In the first case, code generated by Minor, or hand-written for
    Minor, handles the allocation, so it can follow whatever
@@ -61,30 +61,37 @@
 
    But in the second case, Minor has only limited control over the
    allocation.  Minor can ensure that all the heap objects in a
-   particular executable or shared library are contiguous, and not
-   interleaved with other sorts of data.  But the GC has no way to
-   find out at run time where each executable/shared library's heap
-   objects are.  This means that the GC can't reliably free up those
-   pages for re-use; they might also contain non-heap objects.  That,
-   in turn, means that the GC might as well never relocate such
-   objects, or even bother to collect them at all --- much better to
-   simply ignore them, except to check for old->young pointers.
+   particular executable or shared library appear in one contiguous
+   chunk, not interleaved with other sorts of non-heap data --- from C
+   code, say.  But the GC has no way to find out at run time where
+   each executable/shared library's chunk of heap objects is.  (I
+   think we'd need a custom linker script, or some messy stuff based
+   on the C++ static initializer support, but, bleah.)  This means
+   that the GC can't reliably free up such memory for re-use; it can't
+   tell where Minor heap objects end and foreign non-heap objects
+   begin.  That, in turn, means that the GC might as well never
+   relocate such objects, or even bother to collect them at all ---
+   much better to simply ignore them, except to track old->young
+   pointers.
 
-   So, when we allocate pages for a thread to allocate new objects
-   into, we mark the pages as belonging to generation zero.  And when
-   we allocate pages to hold objects the collector has promoted from
-   one generation to the next, we record the appropriate generation
-   for them as well.  But we assume that all other pages belong to
-   generation seven, the "immortal generation".  Any objects that we
-   find here must have come from executable files or shared libraries.
-   Other objects are never promoted into the immortal generation.
+   So, when we allocate fresh pages for a thread to allocate from, we
+   mark them in the GC map as belonging to generation zero, the
+   youngest generation.  And when we allocate pages to hold objects
+   the collector is promoting from one generation to the next, we
+   record the appropriate generation for them as well.  But we assume
+   that all other pages belong to generation seven, the "immortal
+   generation".  Any objects that we find here must have come from
+   executable files or shared libraries.  Other objects are never
+   promoted into the immortal generation --- they come to rest in
+   generation six.
 
    Note that when we load a .o file ourselves --- say, when we load a
-   module compiled by the ahead-of-time compiler --- that's Minor code
-   doing the loading, not the kernel or the dynamic linker.  So we can
-   place the .o file's objects (and procedures) in any generation we
-   want.  So that falls in the first category of allocation, not the
-   second.
+   module previously compiled by the ahead-of-time compiler --- that's
+   Minor code turning that stream of bytes into objects and
+   procedures, not the kernel or the dynamic linker.  Since the
+   allocation is under our code's control, we can place the .o file's
+   objects (and procedures) in any generation we want.  So loading .o
+   files falls in the first category of allocation, not the second.
 
    A "doting object" is an object in one generation that points to an
    object in a younger generation.  A "doting page" is a page on which
@@ -97,32 +104,37 @@
    following information.  (Since there is an instance of this
    structure for every page, it needs to be kept small.)
 
-   At the machine code level, referencing an (unsigned) bit field
+   From the "premature optimization is the root of all evil" dept:
+
+   At the machine code level, fetching an (unsigned) bit field
    turns into:
-   - a memory reference to read the word containing the bit field,
+   - a memory reference to fetch the word containing the bit field,
    - a mask, to get rid of bits that don't belong to the field, and
    - a right shift, to put the bitfield's least significant bit at
-     the bottom of the register.
+     the right end of the register.
 
-   But note that a lot of these fields are indices within a page, or
-   portions of page addresses.  So the first thing we're going to do
-   with such values is shift them left again, to multiply by 8 (for
-   first_doting_object and last_doting_object) or by 4k (for
+   But note that a lot of fields in this struct are indices within a
+   page, or portions of page addresses.  So the first thing we're
+   going to do with such values is shift them left again, to multiply
+   by 8 (for first_doting_object and last_doting_object) or by 4k (for
    next_doting_page and next_generation_page).  So the compiler could
-   combine the right shift and the left shift into a single operation.
+   combine the right shift of the field fetch and the left shift of
+   the multiply into a single operation, net left or net right.
    
    We can do even better: if we make sure that the right shift (the
-   bitfield's position within the word) and the left shift (the factor
-   we need to multiply it by to get a page offset or a page address)
-   are the same, then they cancel each other out, and all we need to
-   do is fetch and mask.  So first_doting_object, last_doting_object,
-   next_doting_page, and next_generation_page are all aligned this
-   way.
+   bitfield's position within the word) and the left shift (the log2
+   of the factor we need to multiply it by to get a page offset or a
+   page address) are the *same*, then the shifts cancel each other
+   out, and all we need to do is fetch and mask.
 
-   Remember, premature optimization is the root of all evil.  */
+   So first_doting_object, last_doting_object, next_doting_page, and
+   next_generation_page are all aligned this way.  Since page
+   addresses and offsets within a page are disjoint portions of an
+   address word, things fit together pretty nicely.  */
 struct mn__ia32_gc_page
 {
-  /* The following fields should all pack into a single 32-bit word.  */
+  /* The following three fields should all pack into a single 32-bit
+     word.  */
 
   /* The generation to which the objects in this page belong.  Zero is
      the youngest generation.  Seven is the "dummy generation", used
@@ -147,15 +159,11 @@
      this field is zero.  */
   unsigned next_doting_page : 20;
 
-  /* The following fields should all pack into a single 32-bit word.  */
+  /* The following three fields should all pack into a single 32-bit
+     word.  */
 
-  /* Non-zero if this page is the first in a contiguous block of two
-     or more pages belonging to the same generation, or on the same
-     free list.  */
-  unsigned first_contiguous : 1;
-
   /* Unused bits!  */
-  unsigned : 2;
+  unsigned : 3;
 
   /* If this is a doting page, this is the offset within this page of
      the start of the last doting object that begins on this page ---
@@ -164,14 +172,10 @@
   unsigned last_doting_object : 9;
 
   /* All the pages in a generation are kept in a singly-linked list.
-     All free pages are kept in a list.  This is the link in those
-     lists.  It's helpful to recognize contiguous blocks of pages, so
-     this takes care of that, too.
-     - If first_contiguous is non-zero, this is the address of the last
-       page in the contiguous run --- divided by 4k.
-     - Otherwise, if first_contiguous is zero, this is the address of
-       the next page in the list --- divided by 4k.  If this is the last
-       page in the list, this is zero.  */
+     All free pages are kept in a list, too.  This is the link in
+     those lists.  This is the address of the next page in the list
+     --- divided by 4k.  If this is the last page in the list, this is
+     zero.  */
   unsigned next_generation_page : 20;
 };
 
@@ -184,7 +188,7 @@
    In other words, we use the top ten bits of the object's address to
    index the top-level array, yielding a pointer to a second-level
    array; then we use the next ten bits to index into that array,
-   yielding a mn__ia32_gc_page structure.
+   yielding a mn__ia32_gc_page structure for a particular page.
 
    Initially, before we've allocated any heap pages at all, every
    entry in mn__ia32_page_map points to the same second-level array
@@ -194,37 +198,41 @@
    mn__ia32_page_map and mn__ia32_immortal_pages occupy only 1k * 4b +
    1k * 8b == 12kb.
 
-   As we allocate pages for new allocation, or for to-spaces during
-   collection, we need to record these allocations in the map.  Since
-   mn__ia32_immortal_pages is (potentially) shared by many top-level
-   array entries, we handle it in a copy-on-write fashion: when the
-   mn__ia32_gc_page struct we want to tweak is actually an element of
-   mn__ia32_immortal_pages, we allocate a fresh second-level table,
-   initialize it to be a copy of mn__ia32_immortal_pages, and then
-   tweak the appropriate mn__ia32_gc_page.  So as the program runs, we
-   use map memory only for the interesting parts.
+   As we allocate pages for newly allocated objects, or for to-spaces
+   during collection, we need to record these allocations in the map.
+   Since mn__ia32_immortal_pages is (potentially) shared by many
+   top-level array entries, we handle things in a copy-on-write
+   fashion: when the mn__ia32_gc_page struct we want to tweak is
+   actually an element of mn__ia32_immortal_pages, we allocate a fresh
+   second-level table, initialize it to be a copy of
+   mn__ia32_immortal_pages, change the appropriate entry in the
+   top-level array to point to it, and then tweak the appropriate
+   mn__ia32_gc_page.  So as the program runs, we dedicate map memory
+   only to the interesting parts, without making any assumptions about
+   where in the address space malloc/mmap will give us pages from.
 
    As described above, GNU/Linux doesn't tell us which regions of an
    executable or shared library contain heap objects: we just
    occasionally find heap references to objects on pages we've never
-   touched before.  So the initial state of a mn__ia32_gc_page struct
-   has to be appropriate for such objects.  This means:
+   touched before.  So the initial state of an mn__ia32_gc_page struct
+   has to be appropriate for such objects.  This tells us several
+   things:
 
-     - The pages' generation should be the immortal generation ---
-       generation seven.
+   - The pages' generation should be the immortal generation ---
+     generation seven.
 
-     - The pages contain no doting objects.  Objects in executables or
-       shared libraries may only (initially) point to objects in other
-       executables or shared libraries, since they were linked by the
-       static linker: otherwise, the static linker would complain
-       about unresolved references.
+   - The pages (initially) contain no doting objects.  Objects in
+     executables or shared libraries may only point to objects in
+     other executables or shared libraries, since they were linked by
+     the static linker: otherwise, the static linker would have
+     complained about unresolved references.
 
-     - The pages will never be freed.  We don't scavenge objects from
-       executables or shared libraries: we can't be sure where the
-       regions start and end, so we couldn't free the area for reuse
-       after the live objects have been copied out of them.  So
-       next_generation_page and first_contiguous don't need to be
-       initialized to anything special.
+   - The pages will never be freed.  We don't scavenge objects from
+     executables or shared libraries: we can't be sure where the
+     regions of heap objects start and end, so we couldn't free the
+     area for reuse after the live objects have been copied out of
+     them anyway.  So next_generation_page doesn't need to be
+     initialized to anything special.
 
    Thus, the default mn__ia32_gc_page struct looks like this:
 
@@ -232,44 +240,49 @@
       generation = 7,
       first_doting_object = 1,
       next_doting_page = 0,
-      first_contiguous = 0,
       last_doting_object = 0,
       next_generation_page = 0
     }
 
+   Every element of mn__ia32_immortal_pages looks like that.
+
    Of course, the mutator may create doting objects in executables and
    shared libraries, so it's not the case that every executable or
    shared object page will always look like this.  But initially, this
    is fine.
 
-   Since the write barrier records the first and last doting objects
-   in a page, and the GC never looks outside that range, things will
-   work correctly even if the initial or tail end of a page holds
-   non-heap objects.  So if the linker concatenates the .minor.data
-   section with the .data or .bss section, for example, things will be
-   fine.
+   Since the write barrier records the offsets of the first and last
+   doting objects in a page, and the GC never looks outside that
+   range, things will work correctly even if the initial or tail end
+   of a page holds non-heap objects.  So if the linker concatenates
+   the .minor.data section built by the Minor compiler with the .data
+   or .bss section built by the C compiler, for example, things will
+   be fine.
 
    Problems would arise if non-heap objects were interleaved with heap
-   objects on a page: first_doting_object could end up pointing before
-   them, while last_doting_object was pointing after them.  However,
-   we know this can't happen:
+   objects on a page: if first_doting_object happened to end up
+   pointing before some non-heap objects, and last_doting_object
+   happened to end up pointing after them, then the scan for doting
+   pointers would end up sweeping through non-heap objects.
 
-   - Within a single executable or shared library, the static linker
-     concatenates all the .minor.data sections, without interleaving
-     other sections.  So we don't have to worry about intra-exec/solib
-     interleavings.
+   However, that sort of interleaving can't happen:
 
-   - The IA-32 ABI requires that ELF load segments be aligned on page
-     boundaries.  This means that two non-empty data segments can't
-     appear on the same page.  So we don't have to worry about
-     inter-exec/solib interleavings, either.
+   - Within a single executable or shared library, the static linker's
+     normal behavior is to concatenate all the .minor.data sections,
+     without interleaving other sections.  So we don't have to worry
+     about intra-exec or intra-shared library interleavings.
 
+   - The IA-32 ABI requires that ELF load segments be aligned on 4kb
+     page boundaries.  This means that two non-empty data segments
+     can't appear on the same page.  So we don't have to worry about
+     inter-executable or inter-shared library interleavings, either.
+
    Using the oldest generation, generation 7, as the "immortal"
    generation means that the collector's test for whether to scavenge
    an object doesn't need a special case to recognize immortal
    objects.  The obvious way to write the test, "Is this object's
    generation less than or equal to the oldest generation we're
-   collecting?" will correctly decline to traverse an immortal object.
+   collecting?" will correctly decline to traverse immortal objects.
    Since the collector asks this of every object it touches, it's
    important for this test to be fast.  */
 extern struct mn__ia32_gc_page *mn__ia32_page_map[1 << 10];
@@ -281,10 +294,10 @@
 /* The 'struct mn__ia32_gc_page' object for ADDR.  */
 #define MN__IA32_GC_PAGE(addr)                                  \
   (mn__ia32_page_map[((unsigned int) (addr) >> 22) & 0x3ff]     \
-                  [((unsigned int) (addr) >> 12) & 0x3ff])
+                    [((unsigned int) (addr) >> 12) & 0x3ff])
 
 
-/* A single generation.  */
+/* A single heap generation.  */
 struct mn__ia32_gc_generation
 {
   /* The base address of the first page in this generation, or zero if
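
The copy-on-write handling of the page map described in the comment above
can be sketched roughly as follows.  This is only an illustration, not code
from the tree: the struct is a simplified stand-in for mn__ia32_gc_page
(plain fields instead of bitfields), and the names map_init, page_for_read,
page_for_write, and should_scavenge are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct mn__ia32_gc_page.  */
struct gc_page
{
  unsigned generation;
  unsigned first_doting_object;
  unsigned last_doting_object;
};

#define ENTRIES  1024           /* 1 << 10 entries at each level */
#define IMMORTAL 7              /* generation seven */

static struct gc_page immortal_pages[ENTRIES];  /* shared default table */
static struct gc_page *page_map[ENTRIES];       /* top-level array */

/* Point every top-level entry at the shared immortal table.  */
static void
map_init (void)
{
  for (int i = 0; i < ENTRIES; i++)
    {
      immortal_pages[i].generation = IMMORTAL;
      immortal_pages[i].first_doting_object = 1;  /* first > last: empty */
      immortal_pages[i].last_doting_object = 0;
      page_map[i] = immortal_pages;
    }
}

/* Read-only lookup: top ten bits, then the next ten bits.  */
static struct gc_page *
page_for_read (uint32_t addr)
{
  return &page_map[(addr >> 22) & 0x3ff][(addr >> 12) & 0x3ff];
}

/* Lookup for modification.  If the second-level table is still the
   shared one, replace it with a private copy first.  Returns NULL if
   the copy can't be allocated.  */
static struct gc_page *
page_for_write (uint32_t addr)
{
  unsigned top = (addr >> 22) & 0x3ff;

  if (page_map[top] == immortal_pages)
    {
      struct gc_page *fresh = malloc (sizeof immortal_pages);
      if (! fresh)
        return NULL;
      memcpy (fresh, immortal_pages, sizeof immortal_pages);
      page_map[top] = fresh;
    }

  return &page_map[top][(addr >> 12) & 0x3ff];
}

/* The scavenge test: no special case for immortal objects, since
   generation seven is never <= the oldest generation collected.  */
static int
should_scavenge (uint32_t addr, unsigned oldest_collected)
{
  return page_for_read (addr)->generation <= oldest_collected;
}
```

Readers go through page_for_read and never allocate; only code that is
about to change a page's entry pays for a private second-level table, so
untouched regions of the address space keep sharing the immortal table.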



From minor-owner@red-bean.com Wed Jul 23 20:53:31 2003
Received: from pimout6-ext.prodigy.net (pimout6-ext.prodigy.net [207.115.63.78])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6O1rVDr009091;
	Wed, 23 Jul 2003 20:53:31 -0500
Received: from floss.red-bean.com (adsl-65-42-85-186.dsl.chcgil.ameritech.net [65.42.85.186])
	by pimout6-ext.prodigy.net (8.12.9/8.12.9) with ESMTP id h6O1rU3S320270;
	Wed, 23 Jul 2003 21:53:30 -0400
Received: from kfogel by floss.red-bean.com with local (Exim 3.34 #1 (Debian))
	id 19fQVX-0000pf-00; Wed, 23 Jul 2003 15:46:43 -0500
To: jimb@sanpietro.red-bean.com
Cc: minor@red-bean.com
Subject: Re: rev 20 - in trunk: . include
References: <200307211921.h6LJLEKo026568@sanpietro.red-bean.com>
Reply-to: kfogel@red-bean.com
Emacs: if it payed rent for disk space, you'd be rich.
From: Karl Fogel <kfogel@floss.red-bean.com>
Date: 23 Jul 2003 15:46:43 -0500
In-Reply-To: <200307211921.h6LJLEKo026568@sanpietro.red-bean.com>
Message-ID: <871xwhynrg.fsf@floss.red-bean.com>
Lines: 24
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

jimb@sanpietro.red-bean.com writes:
> Re-arrange per-arch stuff to resemble the Linux kernel sources.
> 
> Rather than having 'arch/ia-32', 'arch/sparc' etc. subdirectories
> scattered throughout the sources, we'll have one top-level 'arch'
> directory, with per-architecture subdirectories, each of which mirrors
> the overall Minor hierarchy: 'arch/ia-32/gc', 'arch/ia-32/include'
> (for non-installed headers), and so on.
> 
> The exception will be 'include/minor', the directory for installed
> header files.  The whole point of this is to match what actually ends
> up installed: 'include/minor/*.h' is installed as
> "$prefix/include/minor/*.h".  If there are ever any installed
> arch-specific files, they'll need to be installed as
> "$prefix/include/minor/arch/MUMBLE/*.h".  So there will be a separate
> 'arch' subdirectory of 'include/minor'.
> 
> 
> Copied: trunk/arch (from rev 19, trunk/include/arch)

Quick check: is this the sort of comment that would be better
documented somewhere in the tree, rather than the log message?

-K


From minor-owner@red-bean.com Thu Jul 24 17:46:56 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6OMkuDs017749
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Thu, 24 Jul 2003 17:46:56 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6OMkumk017747
	for minor-commits@red-bean.com; Thu, 24 Jul 2003 17:46:56 -0500
Date: Thu, 24 Jul 2003 17:46:56 -0500
Message-Id: <200307242246.h6OMkumk017747@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 24 - trunk/arch/ia-32
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-24 17:46:53 -0500 (Thu, 24 Jul 2003)
New Revision: 24

Removed:
   trunk/arch/ia-32/include/
Log:
* arch/ia-32/include: Removed, since it's empty now.




From minor-owner@red-bean.com Thu Jul 24 17:48:12 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6OMmBDs017837
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Thu, 24 Jul 2003 17:48:11 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6OMmB6R017835
	for minor-commits@red-bean.com; Thu, 24 Jul 2003 17:48:11 -0500
Date: Thu, 24 Jul 2003 17:48:11 -0500
Message-Id: <200307242248.h6OMmB6R017835@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 25 - in trunk/arch/ia-32: . gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-24 17:48:08 -0500 (Thu, 24 Jul 2003)
New Revision: 25

Added:
   trunk/arch/ia-32/gc/
Modified:
   trunk/arch/ia-32/gc/gc-map.h
Log:
* arch/ia-32/gc/gc-map.h: Some notes about making write barriers SMP-safe.

* arch/ia-32/include/gc-map.h: Moved to...
* arch/ia-32/gc/gc-map.h: ... here, since it's internal to the GC, not 
internal to Minor.


Copied: trunk/arch/ia-32/gc (from rev 22, trunk/arch/ia-32/include)

Modified: trunk/arch/ia-32/gc/gc-map.h
==============================================================================
--- trunk/arch/ia-32/include/gc-map.h	2003-07-23 00:54:23 UTC (rev 22)
+++ trunk/arch/ia-32/gc/gc-map.h	2003-07-24 22:48:08 UTC (rev 25)
@@ -4,6 +4,19 @@
 #ifndef MINOR_ARCH_IA32_GC_MAP_H
 #define MINOR_ARCH_IA32_GC_MAP_H
 
+/* To do:
+
+   - How will the write barrier update the GC map on SMP systems?
+     Holding and releasing a mutex on every store is not going to fly;
+     can we use locked CMPXCHG instructions?
+
+   - If we need control over which instruction accesses the words of
+     the gc_page structure, then bitfields aren't very helpful, since
+     we have to hand-code the reads and writes anyway.  Better for
+     gc_page to simply have two words, with macros to pull them apart
+     and reassemble them, and let C code do all the computation up to
+     the very store, which would be an in-line asm.  */
+
 /* The GC map is a table mapping every heap object's address onto a
    mn__ia32_gc_page structure describing the page the object lives in.
    The mn__ia32_gc_page structure says which generation the objects it
@@ -21,7 +34,7 @@
      But to restrict collection to a limited portion of the heap, the
      collector needs to be able to find all pointers from the
      uncollected portion into the collected portion: these act as
-     roots for the partial collection.
+     additional roots for the partial collection.
 
      Since, in practice, pointers from older objects to younger
      objects are rare, we can reduce the amount of bookkeeping needed
@@ -30,20 +43,19 @@
      track pointers in older generations to objects in younger
      generations --- the rare kind.
 
-     Since a newly allocated object can only be initialized with
-     pointers to existing objects, old->young pointers can only be
-     created by mutation.  Thus, every bit of code that mutates a heap
-     object in Minor needs to include a "write barrier": code that
-     checks whether an old->young pointer is being created, and record
-     such pointers in the GC map, for the collector to use in finding
-     roots for partial collections.
+     How do we track such pointers?  Since a newly allocated object
+     can only be initialized with pointers to existing objects,
+     old->young pointers can only be created by mutation.  Thus, every
+     bit of code that mutates a heap object in Minor needs to include
+     a "write barrier": code that checks whether an old->young pointer
+     is being created, and records such pointers in the GC map, for
+     the collector to use in finding roots for partial collections.
 
-   - The generational collector also needs to be able to quickly
-     determine which generation an object belongs to, to recognize
-     when a pointer points out of the portion of the heap it's
-     collecting.
+   - We also need to be able to quickly determine which generation an
+     object belongs to, to recognize when a pointer points out of the
+     portion of the heap we're collecting.
 
-   - When we're done with a collection, we need to be able to find all
+   - When we've finished collecting, we need to be able to find all
      the pages belonging to now-empty "from" spaces, to free them.
 
    There are two ways objects can come into existence:
@@ -53,7 +65,8 @@
 
    - Executable files and shared libraries may contain objects,
      constructed at compile-time, linked by the system linker, and
-     introduced into memory by the kernel or the dynamic linker.
+     introduced into memory by the kernel doing an 'exec' or the
+     dynamic linker.
 
    In the first case, code generated by Minor, or hand-written for
    Minor, handles the allocation, so it can follow whatever
@@ -61,30 +74,37 @@
 
    But in the second case, Minor has only limited control over the
    allocation.  Minor can ensure that all the heap objects in a
-   particular executable or shared library are contiguous, and not
-   interleaved with other sorts of data.  But the GC has no way to
-   find out at run time where each executable/shared library's heap
-   objects are.  This means that the GC can't reliably free up those
-   pages for re-use; they might also contain non-heap objects.  That,
-   in turn, means that the GC might as well never relocate such
-   objects, or even bother to collect them at all --- much better to
-   simply ignore them, except to check for old->young pointers.
+   particular executable or shared library appear in one contiguous
+   chunk, not interleaved with other sorts of non-heap data --- from C
+   code, say.  But the GC has no way to find out at run time where
+   each executable/shared library's chunk of heap objects is.  (I
+   think we'd need a custom linker script, or some messy stuff based
+   on the C++ static initializer support, but, bleah.)  This means
+   that the GC can't reliably free up such memory for re-use; it can't
+   tell where Minor heap objects end and foreign non-heap objects
+   begin.  That, in turn, means that the GC might as well never
+   relocate such objects, or even bother to collect them at all ---
+   much better to simply ignore them, except to track old->young
+   pointers.
 
-   So, when we allocate pages for a thread to allocate new objects
-   into, we mark the pages as belonging to generation zero.  And when
-   we allocate pages to hold objects the collector has promoted from
-   one generation to the next, we record the appropriate generation
-   for them as well.  But we assume that all other pages belong to
-   generation seven, the "immortal generation".  Any objects that we
-   find here must have come from executable files or shared libraries.
-   Other objects are never promoted into the immortal generation.
+   So, when we allocate fresh pages for a thread to allocate from, we
+   mark them in the GC map as belonging to generation zero, the
+   youngest generation.  And when we allocate pages to hold objects
+   the collector is promoting from one generation to the next, we
+   record the appropriate generation for them as well.  But we assume
+   that all other pages belong to generation seven, the "immortal
+   generation".  Any objects that we find here must have come from
+   executable files or shared libraries.  Other objects are never
+   promoted into the immortal generation --- they come to rest in
+   generation six.
 
    Note that when we load a .o file ourselves --- say, when we load a
-   module compiled by the ahead-of-time compiler --- that's Minor code
-   doing the loading, not the kernel or the dynamic linker.  So we can
-   place the .o file's objects (and procedures) in any generation we
-   want.  So that falls in the first category of allocation, not the
-   second.
+   module previously compiled by the ahead-of-time compiler --- that's
+   Minor code turning that stream of bytes into objects and
+   procedures, not the kernel or the dynamic linker.  Since the
+   allocation is under our code's control, we can place the .o file's
+   objects (and procedures) in any generation we want.  So loading .o
+   files falls in the first category of allocation, not the second.
 
    A "doting object" is an object in one generation that points to an
    object in a younger generation.  A "doting page" is a page on which
@@ -97,32 +117,37 @@
    following information.  (Since there is an instance of this
    structure for every page, it needs to be kept small.)
 
-   At the machine code level, referencing an (unsigned) bit field
+   From the "premature optimization is the root of all evil" dept:
+
+   At the machine code level, fetching an (unsigned) bit field
    turns into:
-   - a memory reference to read the word containing the bit field,
+   - a memory reference to fetch the word containing the bit field,
    - a mask, to get rid of bits that don't belong to the field, and
    - a right shift, to put the bitfield's least significant bit at
-     the bottom of the register.
+     the right end of the register.
 
-   But note that a lot of these fields are indices within a page, or
-   portions of page addresses.  So the first thing we're going to do
-   with such values is shift them left again, to multiply by 8 (for
-   first_doting_object and last_doting_object) or by 4k (for
+   But note that a lot of fields in this struct are indices within a
+   page, or portions of page addresses.  So the first thing we're
+   going to do with such values is shift them left again, to multiply
+   by 8 (for first_doting_object and last_doting_object) or by 4k (for
    next_doting_page and next_generation_page).  So the compiler could
-   combine the right shift and the left shift into a single operation.
+   combine the right shift of the field fetch and the left shift of
+   the multiply into a single operation, net left or net right.
    
    We can do even better: if we make sure that the right shift (the
-   bitfield's position within the word) and the left shift (the factor
-   we need to multiply it by to get a page offset or a page address)
-   are the same, then they cancel each other out, and all we need to
-   do is fetch and mask.  So first_doting_object, last_doting_object,
-   next_doting_page, and next_generation_page are all aligned this
-   way.
+   bitfield's position within the word) and the left shift (the log2
+   of the factor we need to multiply it by to get a page offset or a
+   page address) are the *same*, then the shifts cancel each other
+   out, and all we need to do is fetch and mask.
 
-   Remember, premature optimization is the root of all evil.  */
+   So first_doting_object, last_doting_object, next_doting_page, and
+   next_generation_page are all aligned this way.  Since page
+   addresses and offsets within a page are disjoint portions of an
+   address word, things fit together pretty nicely.  */
 struct mn__ia32_gc_page
 {
-  /* The following fields should all pack into a single 32-bit word.  */
+  /* The following three fields should all pack into a single 32-bit
+     word.  */
 
   /* The generation to which the objects in this page belong.  Zero is
      the youngest generation.  Seven is the "dummy generation", used
@@ -147,15 +172,11 @@
      this field is zero.  */
   unsigned next_doting_page : 20;
 
-  /* The following fields should all pack into a single 32-bit word.  */
+  /* The following three fields should all pack into a single 32-bit
+     word.  */
 
-  /* Non-zero if this page is the first in a contiguous block of two
-     or more pages belonging to the same generation, or on the same
-     free list.  */
-  unsigned first_contiguous : 1;
-
   /* Unused bits!  */
-  unsigned : 2;
+  unsigned : 3;
 
   /* If this is a doting page, this is the offset within this page of
      the start of the last doting object that begins on this page ---
@@ -164,14 +185,10 @@
   unsigned last_doting_object : 9;
 
   /* All the pages in a generation are kept in a singly-linked list.
-     All free pages are kept in a list.  This is the link in those
-     lists.  It's helpful to recognize contiguous blocks of pages, so
-     this takes care of that, too.
-     - If first_contiguous is non-zero, this is the address of the last
-       page in the contiguous run --- divided by 4k.
-     - Otherwise, if first_contiguous is zero, this is the address of
-       the next page in the list --- divided by 4k.  If this is the last
-       page in the list, this is zero.  */
+     All free pages are kept in a list, too.  This is the link in
+     those lists.  This is the address of the next page in the list
+     --- divided by 4k.  If this is the last page in the list, this is
+     zero.  */
   unsigned next_generation_page : 20;
 };
 
@@ -184,7 +201,7 @@
    In other words, we use the top ten bits of the object's address to
    index the top-level array, yielding a pointer to a second-level
    array; then we use the next ten bits to index into that array,
-   yielding a mn__ia32_gc_page structure.
+   yielding a mn__ia32_gc_page structure for a particular page.
 
    Initially, before we've allocated any heap pages at all, every
    entry in mn__ia32_page_map points to the same second-level array
@@ -194,37 +211,41 @@
    mn__ia32_page_map and mn__ia32_immortal_pages occupy only 1k * 4b +
    1k * 8b == 12kb.
 
-   As we allocate pages for new allocation, or for to-spaces during
-   collection, we need to record these allocations in the map.  Since
-   mn__ia32_immortal_pages is (potentially) shared by many top-level
-   array entries, we handle it in a copy-on-write fashion: when the
-   mn__ia32_gc_page struct we want to tweak is actually an element of
-   mn__ia32_immortal_pages, we allocate a fresh second-level table,
-   initialize it to be a copy of mn__ia32_immortal_pages, and then
-   tweak the appropriate mn__ia32_gc_page.  So as the program runs, we
-   use map memory only for the interesting parts.
+   As we allocate pages for newly allocated objects, or for to-spaces
+   during collection, we need to record these allocations in the map.
+   Since mn__ia32_immortal_pages is (potentially) shared by many
+   top-level array entries, we handle things in a copy-on-write
+   fashion: when the mn__ia32_gc_page struct we want to tweak is
+   actually an element of mn__ia32_immortal_pages, we allocate a fresh
+   second-level table, initialize it to be a copy of
+   mn__ia32_immortal_pages, change the appropriate entry in the
+   top-level array to point to it, and then tweak the appropriate
+   mn__ia32_gc_page.  So as the program runs, we dedicate map memory
+   only to the interesting parts, without making any assumptions about
+   where in the address space malloc/mmap will give us pages from.
 
    As described above, GNU/Linux doesn't tell us which regions of an
    executable or shared library contain heap objects: we just
    occasionally find heap references to objects on pages we've never
-   touched before.  So the initial state of a mn__ia32_gc_page struct
-   has to be appropriate for such objects.  This means:
+   touched before.  So the initial state of an mn__ia32_gc_page struct
+   has to be appropriate for such objects.  This tells us several
+   things:
 
-     - The pages' generation should be the immortal generation ---
-       generation seven.
+   - The pages' generation should be the immortal generation ---
+     generation seven.
 
-     - The pages contain no doting objects.  Objects in executables or
-       shared libraries may only (initially) point to objects in other
-       executables or shared libraries, since they were linked by the
-       static linker: otherwise, the static linker would complain
-       about unresolved references.
+   - The pages (initially) contain no doting objects.  Objects in
+     executables or shared libraries may only point to objects in
+     other executables or shared libraries, since they were linked by
+     the static linker: otherwise, the static linker would have
+     complained about unresolved references.
 
-     - The pages will never be freed.  We don't scavenge objects from
-       executables or shared libraries: we can't be sure where the
-       regions start and end, so we couldn't free the area for reuse
-       after the live objects have been copied out of them.  So
-       next_generation_page and first_contiguous don't need to be
-       initialized to anything special.
+   - The pages will never be freed.  We don't scavenge objects from
+     executables or shared libraries: we can't be sure where the
+     regions of heap objects start and end, so we couldn't free the
+     area for reuse after the live objects have been copied out of
+     them anyway.  So next_generation_page doesn't need to be
+     initialized to anything special.
 
    Thus, the default mn__ia32_gc_page struct looks like this:
 
@@ -232,44 +253,49 @@
       generation = 7,
       first_doting_object = 1,
       next_doting_page = 0,
-      first_contiguous = 0,
       last_doting_object = 0,
       next_generation_page = 0
     }
 
+   Every element of mn__ia32_immortal_pages looks like that.
+
    Of course, the mutator may create doting objects in executables and
    shared libraries, so it's not the case that every executable or
    shared object page will always look like this.  But initially, this
    is fine.
 
-   Since the write barrier records the first and last doting objects
-   in a page, and the GC never looks outside that range, things will
-   work correctly even if the initial or tail end of a page holds
-   non-heap objects.  So if the linker concatenates the .minor.data
-   section with the .data or .bss section, for example, things will be
-   fine.
+   Since the write barrier records the offsets of the first and last
+   doting objects in a page, and the GC never looks outside that
+   range, things will work correctly even if the initial or tail end
+   of a page holds non-heap objects.  So if the linker concatenates
+   the .minor.data section built by the Minor compiler with the .data
+   or .bss section built by the C compiler, for example, things will
+   be fine.
 
    Problems would arise if non-heap objects were interleaved with heap
-   objects on a page: first_doting_object could end up pointing before
-   them, while last_doting_object was pointing after them.  However,
-   we know this can't happen:
+   objects on a page: if first_doting_object happened to end up
+   pointing before some non-heap objects, and last_doting_object
+   happened to end up pointing after them, then the scan for doting
+   pointers would end up sweeping through non-heap objects.
 
-   - Within a single executable or shared library, the static linker
-     concatenates all the .minor.data sections, without interleaving
-     other sections.  So we don't have to worry about intra-exec/solib
-     interleavings.
+   However, that sort of interleaving can't happen:
 
-   - The IA-32 ABI requires that ELF load segments be aligned on page
-     boundaries.  This means that two non-empty data segments can't
-     appear on the same page.  So we don't have to worry about
-     inter-exec/solib interleavings, either.
+   - Within a single executable or shared library, the static linker's
+     normal behavior is to concatenate all the .minor.data sections,
+     without interleaving other sections.  So we don't have to worry
+     about intra-exec or intra-shared library interleavings.
 
+   - The IA-32 ABI requires that ELF load segments be aligned on 4kb
+     page boundaries.  This means that two non-empty data segments
+     can't appear on the same page.  So we don't have to worry about
+     inter-executable or inter-shared library interleavings, either.
+
    Using the oldest generation, generation 7, as the "immortal"
    generation means that the collector's test for whether to scavenge
    an object doesn't need a special case to recognize immortal
    objects.  The obvious way to write the test, "Is this object's
    generation less than or equal to the oldest generation we're
-   collecting?" will correctly decline to traverse an immortal object.
+   collecting?" will correctly decline to traverse immortal objects.
    Since the collector asks this of every object it touches, it's
    important for this test to be fast.  */
 extern struct mn__ia32_gc_page *mn__ia32_page_map[1 << 10];
@@ -281,10 +307,10 @@
 /* The 'struct mn__ia32_gc_page' object for ADDR.  */
 #define MN__IA32_GC_PAGE(addr)                                  \
   (mn__ia32_page_map[((unsigned int) (addr) >> 22) & 0x3ff]     \
-                  [((unsigned int) (addr) >> 12) & 0x3ff])
+                    [((unsigned int) (addr) >> 12) & 0x3ff])
 
 
-/* A single generation.  */
+/* A single heap generation.  */
 struct mn__ia32_gc_generation
 {
   /* The base address of the first page in this generation, or zero if
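
The "fetch and mask" trick from the bitfield discussion above can be
illustrated with explicit words and masks.  This is a sketch under stated
assumptions, not the header's actual code: it assumes the first word is
laid out with generation in bits 0-2, the object offset field in bits 3-11,
and the page number field in bits 12-31 (how a compiler allocates C
bitfields is implementation-defined), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout of one 32-bit word of the page structure.  */
#define GEN_MASK    0x00000007u   /* bits 0-2: generation */
#define OFFSET_MASK 0x00000ff8u   /* bits 3-11: object offset / 8 */
#define PAGE_MASK   0xfffff000u   /* bits 12-31: page address / 4k */

/* Build a word from a generation, a byte offset within the page
   (a multiple of 8), and a page address (a multiple of 4k).  */
static uint32_t
pack_word (unsigned generation, uint32_t byte_offset, uint32_t page_addr)
{
  return ((generation & GEN_MASK)
          | (byte_offset & OFFSET_MASK)
          | (page_addr & PAGE_MASK));
}

/* Because each field sits at the bit position equal to log2 of its
   scale factor, a single mask yields the scaled value directly.  */
static unsigned word_generation (uint32_t w) { return w & GEN_MASK; }
static uint32_t word_byte_offset (uint32_t w) { return w & OFFSET_MASK; }
static uint32_t word_page_addr (uint32_t w) { return w & PAGE_MASK; }

/* The general case, for comparison: shift right to extract the field,
   then shift left to scale it.  When POS == LOG2_SCALE the two shifts
   cancel, which is what the masks above exploit.  */
static uint32_t
fetch_then_scale (uint32_t w, unsigned pos, unsigned width,
                  unsigned log2_scale)
{
  uint32_t field = (w >> pos) & ((1u << width) - 1u);
  return field << log2_scale;
}
```

With this layout, word_byte_offset and word_page_addr return the same
values as fetch_then_scale (w, 3, 9, 3) and fetch_then_scale (w, 12, 20, 12),
but without either shift.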

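
The To-do note in this revision asks whether locked CMPXCHG instructions
could replace a mutex in the write barrier.  One possible shape for that,
sketched with C11 atomics (a compare-and-exchange on a 32-bit word is
typically compiled to a locked cmpxchg on IA-32), is shown below.  It
assumes, hypothetically, that the first and last doting offsets live in the
same word, which is not the layout in this revision; the packing macros and
note_doting_object are made up for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical packing: first offset in bits 0-8, last in bits 16-24.
   Offsets are 9-bit values (object start / 8 within a 4k page).  */
#define FIRST(w)          ((w) & 0x1ffu)
#define LAST(w)           (((w) >> 16) & 0x1ffu)
#define PACK(first, last) ((uint32_t) (first) | ((uint32_t) (last) << 16))

/* Same convention as the default page struct: first = 1, last = 0
   means "no doting objects".  */
#define EMPTY_RANGE PACK (1, 0)

/* Widen the doting range in *RANGE to include OFFSET, without a mutex.  */
static void
note_doting_object (_Atomic uint32_t *range, unsigned offset)
{
  uint32_t old = atomic_load (range);

  for (;;)
    {
      unsigned first = FIRST (old);
      unsigned last = LAST (old);

      if (first > last)
        first = last = offset;          /* range was empty */
      else
        {
          if (offset < first)
            first = offset;
          if (offset > last)
            last = offset;
        }

      uint32_t updated = PACK (first, last);
      if (updated == old)
        return;                         /* already covered; no store */

      /* Store UPDATED only if the word still holds OLD.  On failure,
         OLD is refreshed with the current value and we recompute.  */
      if (atomic_compare_exchange_weak (range, &old, updated))
        return;
    }
}
```

Whether this is good enough for the real write barrier depends on how the
rest of the page structure is updated; the sketch covers only the range word.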


From minor-owner@red-bean.com Sat Jul 26 17:01:04 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6QM13Ds008556
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sat, 26 Jul 2003 17:01:04 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6QM13LF008554
	for minor-commits@red-bean.com; Sat, 26 Jul 2003 17:01:03 -0500
Date: Sat, 26 Jul 2003 17:01:03 -0500
Message-Id: <200307262201.h6QM13LF008554@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 26 - trunk/arch/ia-32/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-26 17:01:00 -0500 (Sat, 26 Jul 2003)
New Revision: 26

Modified:
   trunk/arch/ia-32/gc/gc-map.h
Log:
* arch/ia-32/gc/gc-map.h: Note ideas for performance improvements.


Modified: trunk/arch/ia-32/gc/gc-map.h
==============================================================================
--- trunk/arch/ia-32/gc/gc-map.h	2003-07-24 22:48:08 UTC (rev 25)
+++ trunk/arch/ia-32/gc/gc-map.h	2003-07-26 22:01:00 UTC (rev 26)
@@ -4,19 +4,6 @@
 #ifndef MINOR_ARCH_IA32_GC_MAP_H
 #define MINOR_ARCH_IA32_GC_MAP_H
 
-/* To do:
-
-   - How will the write barrier update the GC map on SMP systems?
-     Holding and releasing a mutex on every store is not going to fly;
-     can we use locked CMPXCHG instructions?
-
-   - If we need control over which instruction accesses the words of
-     the gc_page structure, then bitfields aren't very helpful, since
-     we have to hand-code the reads and writes anyway.  Better for
-     gc_page to simply have two words, with macros to pull them apart
-     and reassemble them, and let C code do all the computation up to
-     the very store, which would be an in-line asm.  */
-
 /* The GC map is a table mapping every heap object's address onto a
    mn__ia32_gc_page structure describing the page the object lives in.
    The mn__ia32_gc_page structure says which generation the objects it
@@ -335,4 +322,118 @@
 extern struct mn__ia32_gc_generation mn__ia32_generations[8];
 
 
+/* Future directions:
+
+   * Faster SMP access:
+
+   At the moment, there's a single mutex that protects the entire gc
+   map, so all stores of boxed values by all threads are serialized.
+   This is bad for programs that make heavy use of side-effect-based
+   data structures like hash tables or vectors.
+
+   However, I think I've figured out a nice representation for the
+   GC map that would require very little mutual exclusion, using
+   hand-written IA-32 assembly code for the critical operations.
+   While this would involve non-trivial changes to the GC map, the
+   essential ideas would remain the same: using a two-level tree to
+   represent a sparse array, and recording for each page the range
+   of starting offsets of objects that might contain doting
+   pointers.  Since the changes aren't fundamental, that means that
+   it's a performance improvement that can be put off for later: we
+   don't cause ourselves any trouble by doing things slow and simple
+   for now.
+
+   There are two basic ideas:
+
+   ** Use locked cmpxchg to update doting ranges.
+
+   If we move the first_doting_object and last_doting_object fields to
+   the same word, then we can use the IA-32 cmpxchg instruction to
+   update them atomically, without holding a mutex.  It's the standard
+   cmpxchg idiom:
+
+   - Read the word containing the first_doting_object and
+     last_doting_object fields.  Keep a copy of the original value.
+
+   - In a temporary, compute the new values the fields should have:
+     push first_d_o back or push last_d_o forward to include the new
+     doting object's offset.
+
+   - Use a locked cmpxchg instruction to atomically verify that the
+     word hasn't changed from its original value, and if it hasn't,
+     plunk the new value in.
+
+   - If the word had changed, then someone else snuck in before us and
+     did an update; start the process from the beginning.
+
+   (This idiom is a little like a database transaction: you get to do
+   complex stuff that appears atomic to the outside world, but you may
+   have to abort and start from the beginning if there's contention.)
+
+   ** Use a tree of flags to let the collector find doting pages.
+
+   At the moment, we're using a singly-linked list to record the set
+   of doting pages, so the collector can find them efficiently.  It
+   may be possible to devise some way to add pages to the linked list
+   without locking, but I think it would involve mfence or sfence
+   instructions, and I'm not really comfortable enough reasoning about
+   those to be able to evaluate possible solutions.
+
+   But there's a decent enough way to handle things based on plain old
+   locked 'or' instructions.
+
+   - First of all, if a top-level map element points to
+     mn__ia32_immortal_pages, then it doesn't contain any doting
+     pages.  I think it's okay to require the collector to scan the
+     entire top-level array and examine only those regions that have
+     had real second-level arrays allocated to them.
+
+   - Set aside two bits in each page structure, L1 and L2.
+
+   - For every i such that 0 <= i < 1024 and i % 128 is zero, let
+     page[i].L1 be set if and only if there are any doting objects in
+     pages i .. i + 127.
+
+   - For every j such that 0 <= j < 1024 and j % 16 is zero, let
+     page[j].L2 be set if and only if there are doting objects in
+     pages j .. j + 15.
+
+   So the collector scans the top-level array looking for second-level
+   arrays that have actual data in them.  Then, it can examine every
+   128'th page's L1 bit, and skip the entire 128 if it's clear.  If
+   it's set, then it can examine the L2 bit of every 16'th page within
+   those 128 pages, and see whether to scan the individual gc_page
+   structures for interesting first_doting_object / last_doting_object
+   values.
+
+   If doting objects are scattered around in the wrong way, then the
+   collector could possibly end up scanning all 1024 gc_page
+   structures in the second-level array.  But if they're clumped, then
+   this will do well.  This is the same kind of compromise we made
+   with first_doting_object and last_doting_object anyway: some usage
+   patterns could end up forcing you to scan a lot of objects that
+   have no doting pointers.
+
+   The beauty of this is the write barrier: once you've done the
+   cmpxchg trick above to get the first_doting_object /
+   last_doting_object values set up, then you just do two locked 'or'
+   instructions: if your doting object is on page k, then you set the
+   L2 bit on page [k & ~15], and the L1 bit on page [k & ~127].
+
+   That's it.  No mutexes required.  No memory fence reasoning
+   required.
+
+   Now, an NPTL futex-based mutex is as efficient as a spin lock when
+   there is no contention, so I'm not sure whether just using a mutex
+   and doing everything in the obvious fashion might actually be just
+   as fast in practice.  This is all something to experiment with.
+
+   The important conclusion is that there are several different
+   approaches which should perform pretty well that don't require
+   really fundamental revisions to the data structure.  This means we
+   don't have to get this right before we can proceed with other
+   things --- we can put it off until we have actual running code to
+   benchmark.  */
+  
+
 #endif /* MINOR_ARCH_IA32_GC_MAP_H */
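
As a rough illustration of the "locked cmpxchg" idea in the comment
added by this revision, here is a minimal sketch in C11.  It uses
atomic_compare_exchange_weak from <stdatomic.h> in place of the
hand-written IA-32 assembly the comment has in mind.  The packing of
the two fields into one 32-bit word, the encoding of an empty range,
and the names pack_range and doting_range_add are all assumptions made
for the example; none of them come from gc-map.h.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical packing: first_doting_object in the high 16 bits,
   last_doting_object in the low 16 bits.  An empty range has
   first > last, so the min/max logic below needs no special case.  */
#define DOTING_RANGE_EMPTY ((uint32_t) 0xffff0000u)

static inline uint32_t
pack_range (uint32_t first, uint32_t last)
{
  return (first << 16) | (last & 0xffffu);
}

/* Widen the doting range stored in *WORD so that it includes OFFSET,
   a byte offset within a 4kb page (so it fits in 16 bits).  */
static void
doting_range_add (_Atomic uint32_t *word, uint32_t offset)
{
  /* Read the word, and keep a copy of the original value.  */
  uint32_t old = atomic_load (word);

  for (;;)
    {
      /* In temporaries, compute the values the fields should have.  */
      uint32_t first = old >> 16;
      uint32_t last = old & 0xffffu;
      if (offset < first)
        first = offset;
      if (offset > last)
        last = offset;
      uint32_t updated = pack_range (first, last);

      /* Nothing to do if the range already covers OFFSET.  */
      if (updated == old)
        return;

      /* Store UPDATED only if *WORD still holds OLD.  On failure, OLD
         is refreshed with the current contents and we start over.  */
      if (atomic_compare_exchange_weak (word, &old, updated))
        return;
    }
}
```

A word would start out as DOTING_RANGE_EMPTY; each call then pushes
the first field back or the last field forward as needed, without
holding a mutex.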

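The "tree of flags" scheme from the same comment can be sketched along
the following lines.  This is only an illustration under stated
assumptions: struct page_sketch is a stand-in for the real gc_page
structure, the has_doting field stands in for a non-empty
first_doting_object / last_doting_object range, and the function names
are invented for the example.  C11 atomic_fetch_or plays the role of
the locked 'or' instruction.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_L2_ARRAY 1024 /* page structures per second-level array */
#define FLAG_L1 0x1u            /* doting objects in pages i .. i + 127 */
#define FLAG_L2 0x2u            /* doting objects in pages j .. j + 15 */

/* Hypothetical stand-in for the real per-page structure.  */
struct page_sketch
{
  _Atomic uint32_t flags;       /* L1 and L2 bits */
  _Atomic uint32_t has_doting;  /* stand-in for a non-empty doting range */
};

/* Tail of the write barrier: page K now holds a doting object.  */
static void
mark_doting_page (struct page_sketch *pages, size_t k)
{
  atomic_store (&pages[k].has_doting, 1);
  atomic_fetch_or (&pages[k & ~(size_t) 15].flags, FLAG_L2);
  atomic_fetch_or (&pages[k & ~(size_t) 127].flags, FLAG_L1);
}

/* Collector side: store the indices of up to MAX_FOUND doting pages
   in FOUND, skipping groups whose summary bits are clear.  Return the
   number of indices stored.  */
static size_t
find_doting_pages (struct page_sketch *pages, size_t *found,
                   size_t max_found)
{
  size_t n = 0;

  for (size_t i = 0; i < PAGES_PER_L2_ARRAY; i += 128)
    {
      if (! (atomic_load (&pages[i].flags) & FLAG_L1))
        continue;               /* skip all 128 pages */

      for (size_t j = i; j < i + 128; j += 16)
        {
          if (! (atomic_load (&pages[j].flags) & FLAG_L2))
            continue;           /* skip these 16 pages */

          for (size_t k = j; k < j + 16; k++)
            if (atomic_load (&pages[k].has_doting) && n < max_found)
              found[n++] = k;
        }
    }

  return n;
}
```

For a doting object on page 300, for example, the barrier sets the L2
bit on page 288 (300 & ~15) and the L1 bit on page 256 (300 & ~127).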


From minor-owner@red-bean.com Tue Jul 29 12:31:23 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h6THVMDs006722
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 29 Jul 2003 12:31:22 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h6THVMPU006720
	for minor-commits@red-bean.com; Tue, 29 Jul 2003 12:31:22 -0500
Date: Tue, 29 Jul 2003 12:31:22 -0500
Message-Id: <200307291731.h6THVMPU006720@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 27 - trunk/doc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-07-29 12:31:18 -0500 (Tue, 29 Jul 2003)
New Revision: 27

Modified:
   trunk/doc/design
Log:
Add reference to Roland's paper on atomic heap transactions.


Modified: trunk/doc/design
==============================================================================
--- trunk/doc/design	2003-07-26 22:01:00 UTC (rev 26)
+++ trunk/doc/design	2003-07-29 17:31:18 UTC (rev 27)
@@ -392,6 +392,13 @@
 Bristol.  http://people.redhat.com/drepper/dsohowto.pdf
 Available from http://people.redhat.com/drepper, which also has slides.
 
+Atomic heap transactions and fine-grain interrupts.
+Olin Shivers, James W. Clark and Roland McGrath.
+In \emph{Proceedings of the 1999 ACM International Conference
+    on Functional Programming (ICFP)},
+    September, 1999, Paris, France.
+http://www.ai.mit.edu/~shivers/citations.html#heap
+
 - Mike Ashley's flow analysis paper
 - Soft typing
 - PLT's successor to soft typing



From minor-owner@red-bean.com Sat Aug  2 01:35:15 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h726ZEqX019946
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sat, 2 Aug 2003 01:35:14 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h726ZETp019944
	for minor-commits@red-bean.com; Sat, 2 Aug 2003 01:35:14 -0500
Date: Sat, 2 Aug 2003 01:35:14 -0500
Message-Id: <200308020635.h726ZETp019944@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 28 - trunk/arch/ia-32/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-02 01:35:11 -0500 (Sat, 02 Aug 2003)
New Revision: 28

Modified:
   trunk/arch/ia-32/gc/gc-map.h
Log:
Talk about store lists.  All that multi-threaded synchronization hair
is gone, gone, gone.


Modified: trunk/arch/ia-32/gc/gc-map.h
===================================================================
--- trunk/arch/ia-32/gc/gc-map.h	2003-07-29 17:31:18 UTC (rev 27)
+++ trunk/arch/ia-32/gc/gc-map.h	2003-08-02 06:35:11 UTC (rev 28)
@@ -6,10 +6,17 @@
 
 /* The GC map is a table mapping every heap object's address onto a
    mn__ia32_gc_page structure describing the page the object lives in.
-   The mn__ia32_gc_page structure says which generation the objects it
-   contains belong to, and is also where the write barrier records
-   old->young pointers.
+   This structure says which generation the objects it contains belong
+   to, and is also where we record doting objects.
 
+   A "doting object" is an object in one generation that points to an
+   object in a younger generation.  A "doting page" is a page on which
+   a doting object starts.  Doting objects can be quite large, and
+   cover many pages, but only the page on which a doting object starts
+   is a doting page.
+
+   Function of the GC Map ============================================
+
    In more detail, here are the jobs the GC map needs to do:
 
    - The whole idea of generational garbage collection is to usually
@@ -27,15 +34,16 @@
      objects are rare, we can reduce the amount of bookkeeping needed
      here by, when collecting generation G, always collecting all
      generations younger than G as well.  This means we only need to
-     track pointers in older generations to objects in younger
-     generations --- the rare kind.
+     track pointers in objects in older generations to objects in
+     younger generations --- the rare kind.  These are the doting
+     objects.
 
      How do we track such pointers?  Since a newly allocated object
-     can only be initialized with pointers to existing objects,
-     old->young pointers can only be created by mutation.  Thus, every
+     can only be initialized with pointers to existing objects, an
+     object can become a doting object only by mutation.  Thus, every
      bit of code that mutates a heap object in Minor needs to include
-     a "write barrier": code that checks whether an old->young pointer
-     is being created, and records such pointers in the GC map, for
+     a "write barrier": code that allows the GC to check whether a
+     doting object has been created, and record it in the GC map, for
      the collector to use in finding roots for partial collections.
 
    - We also need to be able to quickly determine which generation an
@@ -45,6 +53,9 @@
    - When we've finished collecting, we need to be able to find all
      the pages belonging to now-empty "from" spaces, to free them.
 
+
+   Dynamically vs. Statically Allocated Objects ======================
+
    There are two ways objects can come into existence:
 
    - The mutator can allocate them in the usual way, with 'cons',
@@ -93,17 +104,43 @@
    objects (and procedures) in any generation we want.  So loading .o
    files falls in the first category of allocation, not the second.
 
-   A "doting object" is an object in one generation that points to an
-   object in a younger generation.  A "doting page" is a page on which
-   a doting object starts.  Doting objects can be quite large, and
-   cover many pages, but only the page on which a doting object starts
-   is a doting page.  */
 
+   Mutators' Interface to the GC Map ===================================
 
-/* For every 4kb page managed by the garbage collector, we keep the
-   following information.  (Since there is an instance of this
-   structure for every page, it needs to be kept small.)
+   The mutators' write barrier code does not access the GC map
+   directly.  Instead, mutator threads simply construct store lists
+   --- lists of every object they've ever mutated --- and hand them to
+   the collector when needed.  When a collection starts, the collector
+   records the potentially doting objects mentioned in each thread's
+   store list in the GC map, and then throws the store lists away.
+   This indirect arrangement has the following advantages:
 
+   - Mutators don't need to know about the GC map structure.  It's way
+     too complex to be part of a stable ABI.  The GC map remains
+     strictly internal to the GC.
+
+   - The overhead of the write barrier is the allocation of one pair.
+     The pair is allocated in the new object area, so the cache is
+     always hot.
+
+   - Since this map is only updated and consulted by the GC, it
+     doesn't compete for registers and cache with real mutator code at
+     every store operation; it only gets involved when a GC is about
+     to happen, which trashes both of those things anyway.
+
+   - Since store lists are per-thread, we never have to think about
+     synchronization when building them.
+
+   - Since mutator threads never access the GC map directly, we don't
+     have to worry about synchronization when accessing it, either.  */
+
+
+/* For every 4kb page managed by the garbage collector, we have an
+   instance of the following structure.
+
+   (Since there is an instance of this structure for every page, it
+   needs to be kept small.  8b : 4kb :: 1 : 512.)
+
    From the "premature optimization is the root of all evil" dept:
 
    At the machine code level, fetching an (unsigned) bit field
@@ -321,119 +358,4 @@
    allocate ourselves).  */
 extern struct mn__ia32_gc_generation mn__ia32_generations[8];
 
-
-/* Future directions:
-
-   * Faster SMP access:
-
-   At the moment, there's a single mutex that protects the entire gc
-   map, so all stores of boxed values by all threads are serialized.
-   This is bad for programs that make heavy use of side-effect-based
-   data structures like hash tables or vectors.
-
-   However, I think I've figured out a nice representation for the
-   GC map that would require very little mutual exclusion, using
-   hand-written IA-32 assembly code for the critical operations.
-   While this would involve non-trivial changes to the GC map, the
-   essential ideas would remain the same: using a two-level tree to
-   represent a sparse array, and recording for each page the range
-   of starting offsets of objects that might contain doting
-   pointers.  Since the changes aren't fundamental, that means that
-   it's a performance improvement that can be put off for later: we
-   don't cause ourselves any trouble by doing things slow and simple
-   for now.
-
-   There are two basic ideas:
-
-   ** Use locked cmpxchg to update doting ranges.
-
-   If we move the first_doting_object and last_doting_object fields to
-   the same word, then we can use the IA-32 cmpxchg instruction to
-   update them atomically, without holding a mutex.  It's the standard
-   cmpxchg idiom:
-
-   - Read the word containing the first_doting_object and
-     last_doting_object fields.  Keep a copy of the original value.
-
-   - In a temporary, compute the new values the fields should have:
-     push first_d_o back or push last_d_o forward to include the new
-     doting object's offset.
-
-   - Use a locked cmpxchg instruction to atomically verify that the
-     word hasn't changed from its original value, and if it hasn't,
-     plunk the new value in.
-
-   - If the word had changed, then someone else snuck in before us and
-     did an update; start the process from the beginning.
-
-   (This idiom is a little like a database transaction: you get to do
-   complex stuff that appears atomic to the outside world, but you may
-   have to abort and start from the beginning if there's contention.)
-
-   ** Use a tree of flags to let the collector find doting pages.
-
-   At the moment, we're using a singly-linked list to record the set
-   of doting pages, so the collector can find them efficiently.  It
-   may be possible to devise some way to add pages to the linked list
-   without locking, but I think it would involve mfence or sfence
-   instructions, and I'm not really comfortable enough reasoning about
-   those to be able to evaluate possible solutions.
-
-   But there's a decent enough way to handle things based on plain old
-   locked 'or' instructions.
-
-   - First of all, if a top-level map element points to
-     mn__ia32_immortal_pages, then it doesn't contain any doting
-     pages.  I think it's okay to require the collector to scan the
-     entire top-level array and examine only those regions that have
-     had real second-level arrays allocated to them.
-
-   - Set aside two bits in each page structure, L1 and L2.
-
-   - For every i such that 0 <= i < 1024 and i % 128 is zero, let
-     page[i].L1 be set if and only if there are any doting objects in
-     pages i .. i + 127.
-
-   - For every j such that 0 <= j < 1024 and j % 16 is zero, let
-     page[j].L2 be set if and only if there are doting objects in
-     pages j .. j + 15.
-
-   So the collector scans the top-level array looking for second-level
-   arrays that have actual data in them.  Then, it can examine every
-   128'th page's L1 bit, and skip the entire 128 if it's clear.  If
-   it's set, then it can examine the L2 bit of every 16'th page within
-   those 128 pages, and see whether to scan the individual gc_page
-   structures for interesting first_doting_object / last_doting_object
-   values.
-
-   If doting objects are scattered around in the wrong way, then the
-   collector could possibly end up scanning all 1024 gc_page
-   structures in the second-level array.  But if they're clumped, then
-   this will do well.  This is the same kind of compromise we made
-   with first_doting_object and last_doting_object anyway: some usage
-   patterns could end up forcing you to scan a lot of objects that
-   have no doting pointers.
-
-   The beauty of this is the write barrier: once you've done the
-   cmpxchg trick above to get the first_doting_object /
-   last_doting_object values set up, then you just do two locked 'or'
-   instructions: if your doting object is on page k, then you set the
-   L2 bit on page [k & ~15], and the L1 bit on page [k & ~127].
-
-   That's it.  No mutexes required.  No memory fence reasoning
-   required.
-
-   Now, an NPTL futex-based mutex is as efficient as a spin lock when
-   there is no contention, so I'm not sure whether just using a mutex
-   and doing everything in the obvious fashion might actually be just
-   as fast in practice.  This is all something to experiment with.
-
-   The important conclusion is that there are several different
-   approaches which should perform pretty well that don't require
-   really fundamental revisions to the data structure.  This means we
-   don't have to get this right before we can proceed with other
-   things --- we can put it off until we have actual running code to
-   benchmark.  */
-  
-
 #endif /* MINOR_ARCH_IA32_GC_MAP_H */
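
To make the store-list arrangement described in this revision a bit
more concrete, here is a minimal sketch of a write barrier whose only
cost is the allocation of one pair on a per-thread list.  Everything
named here (struct pair_sketch, struct thread_sketch, alloc_pair,
store_with_barrier, drain_store_list) is hypothetical, and the "new
object area" is reduced to a plain array of pairs; the real allocator
and object layout are not shown in the diff.

```c
#include <stddef.h>

/* A pair, as a stand-in for a Scheme cons cell.  */
struct pair_sketch
{
  void *car;                    /* the mutated object */
  struct pair_sketch *cdr;      /* rest of the store list */
};

/* Hypothetical per-thread mutator state.  */
struct thread_sketch
{
  struct pair_sketch *alloc_ptr;   /* next free pair in the new object area */
  struct pair_sketch *alloc_limit; /* end of the new object area */
  struct pair_sketch *store_list;  /* objects mutated since the last GC */
};

/* Bump-allocate a pair from THREAD's new object area.  Return NULL if
   the area is full; a real mutator would start a collection here.  */
static struct pair_sketch *
alloc_pair (struct thread_sketch *thread)
{
  if (thread->alloc_ptr == thread->alloc_limit)
    return NULL;
  return thread->alloc_ptr++;
}

/* Store VALUE into *SLOT, a field of OBJECT, and note OBJECT on
   THREAD's store list.  No locking is needed, since the list belongs
   to this thread alone.  Return 0, without storing, if the new object
   area is full.  */
static int
store_with_barrier (struct thread_sketch *thread, void *object,
                    void **slot, void *value)
{
  struct pair_sketch *p = alloc_pair (thread);
  if (! p)
    return 0;
  p->car = object;
  p->cdr = thread->store_list;
  thread->store_list = p;
  *slot = value;
  return 1;
}

/* Collector side: copy up to MAX_OUT potentially doting objects from
   THREAD's store list into OUT (standing in for recording them in the
   GC map), then throw the list away.  Return the number copied.  */
static size_t
drain_store_list (struct thread_sketch *thread, void **out, size_t max_out)
{
  size_t n = 0;
  for (struct pair_sketch *p = thread->store_list; p; p = p->cdr)
    if (n < max_out)
      out[n++] = p->car;
  thread->store_list = NULL;
  return n;
}
```

Note that the list comes back most-recent-first, and that the same
object may appear more than once; the collector is the one that decides
which entries are really doting objects.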



From minor-owner@red-bean.com Sat Aug  2 01:38:10 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h726c9qX020019
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sat, 2 Aug 2003 01:38:09 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h726c9dR020016
	for minor-commits@red-bean.com; Sat, 2 Aug 2003 01:38:09 -0500
Date: Sat, 2 Aug 2003 01:38:09 -0500
Message-Id: <200308020638.h726c9dR020016@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 29 - trunk/doc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-02 01:38:06 -0500 (Sat, 02 Aug 2003)
New Revision: 29

Modified:
   trunk/doc/design
Log:
- Note that the C API shouldn't preclude concurrent collection, either.
- Sketch the order of attack.
- Add reference to Olin's history of T.


Modified: trunk/doc/design
===================================================================
--- trunk/doc/design	2003-08-02 06:35:11 UTC (rev 28)
+++ trunk/doc/design	2003-08-02 06:38:06 UTC (rev 29)
@@ -52,7 +52,7 @@
 
 - Minor Scheme should provide a thread-aware C API, to allow C code to
   call Scheme code.  The API's design should not preclude the use of
-  copying, generational, or incremental garbage collection.
+  copying, generational, incremental, or concurrent garbage collection.
 
 - Minor Scheme should provide a full foreign function interface, to
   allow Scheme code to call C code.
@@ -113,6 +113,53 @@
 
 Here are the specific approaches we'll use to achieve the goals above.
 
+** GC, then interpreter, then compiler.
+
+We'll produce Minor in the following steps:
+
+- Write the GC first.  SMP-safe and everything.
+
+- Many of the functions in <minor/minor.h> are just constructors and
+  accessors, and are trivial given a GC.  Those are next.  Obviously,
+  'mn_eval' is not in this category.
+
+- Implement an interpreter in C for 'core Scheme', the language
+  produced by macro expansion.  (Not sure how this relates to
+  modules.)  Write this on top of the C API; since the API allows you
+  to define Scheme procedures in C, this will be indistinguishable at
+  the user level from the native code implementation.
+
+- Implement, in core Scheme, the macro expander and module system.
+  Effectively, this implements full Scheme by translation into core
+  Scheme.
+
+  Extend the C runtime as necessary, but try to write as much as
+  possible in Scheme, since the performance of the interpreter is not
+  important, and the Scheme code can be shared with later stages.
+
+- Implement machine-code procedures.  This entails designing:
+  - representations for machine-code procedures
+  - calling/returning conventions for Scheme procedures
+  - representations for Scheme continuations
+  - representations for C continuations in Scheme
+  - code->register use mappings
+  - code->heap use mappings
+  - code->unwinding info mappings (for backtraces and exceptions)
+  - code<->source location mappings
+  - code->catch block mappings
+
+- Implement a JIT compiler for core Scheme in full Scheme.
+
+- Implement separate compilation, so that the standard Unix linker can
+  link separately compiled modules into a running program.  Use
+  ordinary ELF relocatable object files to represent separately
+  compiled modules.
+
+- Build an interpreter-free Scheme executable, by compiling the JIT
+  compiler to ELF files and then linking them in.  Now you have an
+  interactive, native-code, JIT compiler!
+
+
 ** Target IA-32 Linux --- first.
 
 The first release of Minor Scheme will target IA-32 family (i386,
@@ -387,6 +434,10 @@
 Technical Report CMU-CS-91-145, School of Computer Science.
 ftp://cs.cmu.edu/afs%2Fcs.cmu.edu%2Fuser%2Fshivers%2Flib%2Fpapers/diss.ps.Z
 
+Olin Shivers's history of T has pointers to a bunch of seminal Scheme
+papers and dissertations:
+http://www.paulgraham.com/thist.html
+
 Ulrich Drepper.  How to Write Shared Libraries.
 2002-11-3, based on a tutorial given at UKUUG 2002 Conference in
 Bristol.  http://people.redhat.com/drepper/dsohowto.pdf



From minor-owner@red-bean.com Sat Aug  2 02:04:20 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h7274KqX020948
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sat, 2 Aug 2003 02:04:20 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h7274KKC020946
	for minor-commits@red-bean.com; Sat, 2 Aug 2003 02:04:20 -0500
Date: Sat, 2 Aug 2003 02:04:20 -0500
Message-Id: <200308020704.h7274KKC020946@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 30 - trunk/doc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-02 02:04:17 -0500 (Sat, 02 Aug 2003)
New Revision: 30

Modified:
   trunk/doc/design
Log:
- Note need for code->environment mapping, for debugging.
- Expand grumbling about modules.


Modified: trunk/doc/design
===================================================================
--- trunk/doc/design	2003-08-02 06:38:06 UTC (rev 29)
+++ trunk/doc/design	2003-08-02 07:04:17 UTC (rev 30)
@@ -71,8 +71,8 @@
   source file.
 
 - Minor Scheme should have good performance.  It should compare well
-  with MzScheme, RScheme, Bigloo, and SCM --- even where those systems
-  use ahead-of-time compilation to native code via C.
+  with MzScheme, RScheme, Bigloo, SCM, and MIT Scheme --- both in
+  interactive performance, and in the speed of pre-compiled code.
 
 ** Secondary Goals
 
@@ -124,11 +124,14 @@
   'mn_eval' is not in this category.
 
 - Implement an interpreter in C for 'core Scheme', the language
-  produced by macro expansion.  (Not sure how this relates to
-  modules.)  Write this on top of the C API; since the API allows you
-  to define Scheme procedures in C, this will be indistinguishable at
-  the user level from the native code implementation.
+  produced by macro expansion.  Write this on top of the C API; since
+  the API allows you to define Scheme procedures in C, this will be
+  indistinguishable at the user level from the native code
+  implementation.
 
+  (I have this feeling modules are involved here.  What do the free
+  variables in a macro-expanded program refer to?)
+
 - Implement, in core Scheme, the macro expander and module system.
   Effectively, this implements full Scheme by translation into core
   Scheme.
@@ -146,6 +149,7 @@
   - code->heap use mappings
   - code->unwinding info mappings (for backtraces and exceptions)
   - code<->source location mappings
+  - code->environment mappings (for debugging)
   - code->catch block mappings
 
 - Implement a JIT compiler for core Scheme in full Scheme.



From minor-owner@red-bean.com Sat Aug  2 16:22:05 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h72LM4qX015504
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sat, 2 Aug 2003 16:22:04 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h72LM48I015502
	for minor-commits@red-bean.com; Sat, 2 Aug 2003 16:22:04 -0500
Date: Sat, 2 Aug 2003 16:22:04 -0500
Message-Id: <200308022122.h72LM48I015502@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 31 - in trunk: . arch/ia-32/gc doc gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-02 16:22:00 -0500 (Sat, 02 Aug 2003)
New Revision: 31

Added:
   trunk/gc/
   trunk/gc/gc-map.h
Removed:
   trunk/arch/ia-32/gc/gc-map.h
Modified:
   trunk/doc/design
Log:
gc-map.h can be portable, with some parameters.  This will be renamed, too.


Deleted: trunk/arch/ia-32/gc/gc-map.h
===================================================================
--- trunk/arch/ia-32/gc/gc-map.h	2003-08-02 07:04:17 UTC (rev 30)
+++ trunk/arch/ia-32/gc/gc-map.h	2003-08-02 21:22:00 UTC (rev 31)
@@ -1,361 +0,0 @@
-/* gc-map.h --- tracking GC'd memory on the IA-32
-   Jim Blandy <jimb@red-bean.com> --- July 2003  */
-
-#ifndef MINOR_ARCH_IA32_GC_MAP_H
-#define MINOR_ARCH_IA32_GC_MAP_H
-
-/* The GC map is a table mapping every heap object's address onto a
-   mn__ia32_gc_page structure describing the page the object lives in.
-   This structure says which generation the objects it contains belong
-   to, and is also where we record doting objects.
-
-   A "doting object" is an object in one generation that points to an
-   object in a younger generation.  A "doting page" is a page on which
-   a doting object starts.  Doting objects can be quite large, and
-   cover many pages, but only the page on which a doting object starts
-   is a doting page.
-
-   Function of the GC Map ============================================
-
-   In more detail, here are the jobs the GC map needs to do:
-
-   - The whole idea of generational garbage collection is to usually
-     collect only part of the heap.  Occasionally, you'll need to do a
-     full collection, but if you can focus your time on portions of
-     the heap that contain more garbage, then that time will be more
-     productive, and free up more memory for the mutator to waste.
-
-     But to restrict collection to a limited portion of the heap, the
-     collector needs to be able to find all pointers from the
-     uncollected portion into the collected portion: these act as
-     additional roots for the partial collection.
-
-     Since, in practice, pointers from older objects to younger
-     objects are rare, we can reduce the amount of bookkeeping needed
-     here by, when collecting generation G, always collecting all
-     generations younger than G as well.  This means we only need to
-     track pointers in objects in older generations to objects in
-     younger generations --- the rare kind.  These are the doting
-     objects.
-
-     How do we track such pointers?  Since a newly allocated object
-     can only be initialized with pointers to existing objects, an
-     object can become a doting object only by mutation.  Thus, every
-     bit of code that mutates a heap object in Minor needs to include
-     a "write barrier": code that allows the GC to check whether a
-     doting object has been created, and record it in the GC map, for
-     the collector to use in finding roots for partial collections.
-
-   - We also need to be able to quickly determine which generation an
-     object belongs to, to recognize when a pointer points out of the
-     portion of the heap we're collecting.
-
-   - When we've finished collecting, we need to be able to find all
-     the pages belonging to now-empty "from" spaces, to free them.
-
-
-   Dynamically vs. Statically Allocated Objects ======================
-
-   There are two ways objects can come into existence:
-
-   - The mutator can allocate them in the usual way, with 'cons',
-     'make-vector', etc.
-
-   - Executable files and shared libraries may contain objects,
-     constructed at compile-time, linked by the system linker, and
-     introduced into memory by the kernel doing an 'exec' or the
-     dynamic linker.
-
-   In the first case, code generated by Minor, or hand-written for
-   Minor, handles the allocation, so it can follow whatever
-   conventions we find useful.
-
-   But in the second case, Minor has only limited control over the
-   allocation.  Minor can ensure that all the heap objects in a
-   particular executable or shared library appear in one contiguous
-   chunk, not interleaved with other sorts of non-heap data --- from C
-   code, say.  But the GC has no way to find out at run time where
-   each executable/shared library's chunk of heap objects is.  (I
-   think we'd need a custom linker script, or some messy stuff based
-   on the C++ static initializer support, but, bleah.)  This means
-   that the GC can't reliably free up such memory for re-use; it can't
-   tell where Minor heap objects end and foreign non-heap objects
-   begin.  That, in turn, means that the GC might as well never
-   relocate such objects, or even bother to collect them at all ---
-   much better to simply ignore them, except to track old->young
-   pointers.
-
-   So, when we allocate fresh pages for a thread to allocate from, we
-   mark them in the GC map as belonging to generation zero, the
-   youngest generation.  And when we allocate pages to hold objects
-   the collector is promoting from one generation to the next, we
-   record the appropriate generation for them as well.  But we assume
-   that all other pages belong to generation seven, the "immortal
-   generation".  Any objects that we find here must have come from
-   executable files or shared libraries.  Other objects are never
-   promoted into the immortal generation --- they come to rest in
-   generation six.
-
-   Note that when we load a .o file ourselves --- say, when we load a
-   module previously compiled by the ahead-of-time compiler --- that's
-   Minor code turning that stream of bytes into objects and
-   procedures, not the kernel or the dynamic linker.  Since the
-   allocation is under our code's control, we can place the .o file's
-   objects (and procedures) in any generation we want.  So loading .o
-   files falls in the first category of allocation, not the second.
-
-
-   Mutators' Interface to the GC Map ===================================
-
-   The mutators' write barrier code does not access the GC map
-   directly.  Instead, mutator threads simply construct store lists
-   --- lists of every object they've ever mutated --- and hand them to
-   the collector when needed.  When a collection starts, the collector
-   records the potentially doting objects mentioned in each thread's
-   store list in the GC map, and then throws the store lists away.
-   This indirect arrangement has the following advantages:
-
-   - Mutators don't need to know about the GC map structure.  It's way
-     too complex to be part of a stable ABI.  The GC map remains
-     strictly internal to the GC.
-
-   - The overhead of the write barrier is the allocation of one pair.
-     The pair is allocated in the new object area, so the cache is
-     always hot.
-
-   - Since this map is only updated and consulted by the GC, it
-     doesn't compete for registers and cache with real mutator code at
-     every store operation; it only gets involved when a GC is about
-     to happen, which trashes both of those things anyway.
-
-   - Since store lists are per-thread, we never have to think about
-     synchronization when building them.
-
-   - Since mutator threads never access the GC map directly, we don't
-     have to worry about synchronization when accessing it, either.  */
-
-
-/* For every 4kb page managed by the garbage collector, we have an
-   instance of the following structure.
-
-   (Since there is an instance of this structure for every page, it
-   needs to be kept small.  8b : 4kb :: 1 : 512.)
-
-   From the "premature optimization is the root of all evil" dept:
-
-   At the machine code level, fetching an (unsigned) bit field
-   turns into:
-   - a memory reference to fetch the word containing the bit field,
-   - a mask, to get rid of bits that don't belong to the field, and
-   - a right shift, to put the bitfield's least significant bit at
-     the right end of the register.
-
-   But note that a lot of fields in this struct are indices within a
-   page, or portions of page addresses.  So the first thing we're
-   going to do with such values is shift them left again, to multiply
-   by 8 (for first_doting_object and last_doting_object) or by 4k (for
-   next_doting_page and next_generation_page).  So the compiler could
-   combine the right shift of the field fetch and the left shift of
-   the multiply into a single operation, net left or net right.
-   
-   We can do even better: if we make sure that the right shift (the
-   bitfield's position within the word) and the left shift (the log2
-   of the factor we need to multiply it by to get a page offset or a
-   page address) are the *same*, then the shifts cancel each other
-   out, and all we need to do is fetch and mask.
-
-   So first_doting_object, last_doting_object, next_doting_page, and
-   next_generation_page are all aligned this way.  Since page
-   addresses and offsets within a page are disjoint portions of an
-   address word, things fit together pretty nicely.  */
-struct mn__ia32_gc_page
-{
-  /* The following three fields should all pack into a single 32-bit
-     word.  */
-
-  /* The generation to which the objects in this page belong.  Zero is
-     the youngest generation.  Seven is the "dummy generation", used
-     for memory areas we haven't allocated a separate mn__ia32_gc_page
-     array for yet.  */
-  unsigned generation : 3;
-
-  /* If this is a doting page, this is the offset within this page of
-     the start of the first doting object that begins on this page ---
-     divided by eight.  If this is not a doting page, then
-     last_doting_object == 0 and first_doting_object > 0.
-
-     4k / 8 == 512, so we need nine bits for this field.  To find all
-     the doting pointers, we start here and scan until
-     last_doting_object.  */
-  unsigned first_doting_object : 9;
-
-  /* All the pages that contain doting objects are kept in a
-     singly-linked list; there is one list per generation.  This field
-     is the link in that list: the address of the next such page in
-     this generation, divided by 4k.  For the last page in the chain,
-     this field is zero.  */
-  unsigned next_doting_page : 20;
-
-  /* The following three fields should all pack into a single 32-bit
-     word.  */
-
-  /* Unused bits!  */
-  unsigned : 3;
-
-  /* If this is a doting page, this is the offset within this page of
-     the start of the last doting object that begins on this page ---
-     divided by eight.  If this is not a doting page, then this is
-     zero.  */
-  unsigned last_doting_object : 9;
-
-  /* All the pages in a generation are kept in a singly-linked list.
-     All free pages are kept in a list, too.  This is the link in
-     those lists.  This is the address of the next page in the list
-     --- divided by 4k.  If this is the last page in the list, this is
-     zero.  */
-  unsigned next_generation_page : 20;
-};
-
-
-/* The map of all pages is a two-level tree.  Given a 32-bit address
-   ADDR, the 'struct mn__ia32_gc_page' for that page is:
-
-      mn__ia32_page_map[(ADDR >> 22) & 0x3ff][(ADDR >> 12) & 0x3ff]
-
-   In other words, we use the top ten bits of the object's address to
-   index the top-level array, yielding a pointer to a second-level
-   array; then we use the next ten bits to index into that array,
-   yielding a mn__ia32_gc_page structure for a particular page.
-
-   Initially, before we've allocated any heap pages at all, every
-   entry in mn__ia32_page_map points to the same second-level array
-   object: mn__ia32_immortal_pages.  This creates the appearance of a
-   fully populated tree, with a mn__ia32_gc_page struct for every 4k
-   page in the IA-32's 32-bit address space --- even though
-   mn__ia32_page_map and mn__ia32_immortal_pages occupy only 1k * 4b +
-   1k * 8b == 12kb.
-
-   As we allocate pages for newly allocated objects, or for to-spaces
-   during collection, we need to record these allocations in the map.
-   Since mn__ia32_immortal_pages is (potentially) shared by many
-   top-level array entries, we handle things in a copy-on-write
-   fashion: when the mn__ia32_gc_page struct we want to tweak is
-   actually an element of mn__ia32_immortal_pages, we allocate a fresh
-   second-level table, initialize it to be a copy of
-   mn__ia32_immortal_pages, change the appropriate entry in the
-   top-level array to point to it, and then tweak the appropriate
-   mn__ia32_gc_page.  So as the program runs, we dedicate map memory
-   only to the interesting parts, without making any assumptions about
-   where in the address space malloc/mmap will give us pages from.
-
-   As described above, GNU/Linux doesn't tell us which regions of an
-   executable or shared library contain heap objects: we just
-   occasionally find heap references to objects on pages we've never
-   touched before.  So the initial state of an mn__ia32_gc_page struct
-   has to be appropriate for such objects.  This tells us several
-   things:
-
-   - The pages' generation should be the immortal generation ---
-     generation seven.
-
-   - The pages (initially) contain no doting objects.  Objects in
-     executables or shared libraries may only point to objects in
-     other executables or shared libraries, since they were linked by
-     the static linker: otherwise, the static linker would have
-     complained about unresolved references.
-
-   - The pages will never be freed.  We don't scavenge objects from
-     executables or shared libraries: we can't be sure where the
-     regions of heap objects start and end, so we couldn't free the
-     area for reuse after the live objects have been copied out of
-     them anyway.  So next_generation_page and first_contiguous don't
-     need to be initialized to anything special.
-
-   Thus, the default mn__ia32_gc_page struct looks like this:
-
-    {
-      generation = 7,
-      first_doting_object = 1,
-      next_doting_page = 0,
-      last_doting_object = 0,
-      next_generation_page = 0
-    }
-
-   Every element of mn__ia32_immortal_pages looks like that.
-
-   Of course, the mutator may create doting objects in executables and
-   shared libraries, so it's not the case that every executable or
-   shared object page will always look like this.  But initially, this
-   is fine.
-
-   Since the write barrier records the offsets of the first and last
-   doting objects in a page, and the GC never looks outside that
-   range, things will work correctly even if the initial or tail end
-   of a page holds non-heap objects.  So if the linker concatenates
-   the .minor.data section built by the Minor compiler with the .data
-   or .bss section built by the C compiler, for example, things will
-   be fine.
-
-   Problems would arise if non-heap objects were interleaved with heap
-   objects on a page: if first_doting_object happened to end up
-   pointing before some non-heap objects, and last_doting_object
-   happened to end up pointing after them, then the scan for doting
-   pointers would end up sweeping through non-heap objects.
-
-   However, that sort of interleaving can't happen:
-
-   - Within a single executable or shared library, the static linker's
-     normal behavior is to concatenate all the .minor.data sections,
-     without interleaving other sections.  So we don't have to worry
-     about intra-exec or intra-shared library interleavings.
-
-   - The IA-32 ABI requires that ELF load segments be aligned on 4kb
-     page boundaries.  This means that two non-empty data segments
-     can't appear on the same page.  So we don't have to worry about
-     inter-executable or inter-shared library interleavings, either.
-
-   Using the oldest generation, generation 7, as the "immortal"
-   generation means that the collector's test for whether to scavenge
-   an object doesn't need a special case to recognize immortal
-   objects.  The obvious way to write the test, "Is this object's
-   generation less than or equal to the oldest generation we're
-   collecting?" will correctly decline to traverse immortal objects.
-   Since the collector asks this of every object it touches, it's
-   important for this test to be fast.  */
-extern struct mn__ia32_gc_page *mn__ia32_page_map[1 << 10];
-
-/* The array of immortal pages.  */
-extern struct mn__ia32_gc_page mn__ia32_immortal_pages[1 << 10];
-
-
-/* The 'struct mn__ia32_gc_page' object for ADDR.  */
-#define MN__IA32_GC_PAGE(addr)                                  \
-  (mn__ia32_page_map[((unsigned int) (addr) >> 22) & 0x3ff]     \
-                    [((unsigned int) (addr) >> 12) & 0x3ff])
-
-
-/* A single heap generation.  */
-struct mn__ia32_gc_generation
-{
-  /* The base address of the first page in this generation, or zero if
-     the generation contains no pages.  This is invalid in the
-     immortal generation.  */
-  void *first_generation_page;
-
-  /* The base address of the first doting page in this generation.
-     Zero if the generation contains no doting pages.  */
-  void *first_doting_page;
-
-  /* How many collections we've done since the last time we collected
-     any generations older than this.  */
-  int collections;
-};
-
-
-/* The table of all generations.  Generation zero is the youngest
-   generation.  Generation 7 is the immortal generation, for pages in
-   executables and shared libraries (actually, for any page we didn't
-   allocate ourselves).  */
-extern struct mn__ia32_gc_generation mn__ia32_generations[8];
-
-#endif /* MINOR_ARCH_IA32_GC_MAP_H */

Modified: trunk/doc/design
===================================================================
--- trunk/doc/design	2003-08-02 07:04:17 UTC (rev 30)
+++ trunk/doc/design	2003-08-02 21:22:00 UTC (rev 31)
@@ -151,7 +151,23 @@
   - code<->source location mappings
   - code->environment mappings (for debugging)
   - code->catch block mappings
+  - an API to let Scheme code wrap up hunks of machine code as actual
+    procedure objects, annotated as above
 
+- Implement an assembler.  This has two parts:
+  - an arch-independent part: just a library for putting together byte
+    strings that contain references to labels: blocks, labels, relocs,
+    annotations, etc.
+  - an arch-dependent part: this emits machine-code representations of
+    some architecture's instructions, using the arch-independent
+    facilities
+
+  The result of the assembly process is blocks of bytes, annotated
+  with labels it defines, labels it refers to, and relocs to say how
+  to patch the latter's values into the blocks.  This is what the
+  machine code -> procedure API expects; it's also what the ELF reader
+  and writer (see below) operate on.
+
 - Implement a JIT compiler for core Scheme in full Scheme.
 
 - Implement separate compilation, so that the standard Unix linker can

Copied: trunk/gc/gc-map.h (from rev 28, trunk/arch/ia-32/gc/gc-map.h)
===================================================================
--- trunk/arch/ia-32/gc/gc-map.h	2003-08-02 06:35:11 UTC (rev 28)
+++ trunk/gc/gc-map.h	2003-08-02 21:22:00 UTC (rev 31)
@@ -0,0 +1,361 @@
+/* map.h --- tracking GC'd memory
+   Jim Blandy <jimb@red-bean.com> --- July 2003  */
+
+#ifndef MINOR_GC_MAP_H
+#define MINOR_GC_MAP_H
+
+/* The GC map is a table mapping every heap object's address onto a
+   gc_page structure describing the page the object lives in.
+   This structure says which generation the objects it contains belong
+   to, and is also where we record doting objects.
+
+   A "doting object" is an object in one generation that points to an
+   object in a younger generation.  A "doting page" is a page on which
+   a doting object starts.  Doting objects can be quite large, and
+   cover many pages, but only the page on which a doting object starts
+   is a doting page.
+
+   Function of the GC Map ============================================
+
+   In more detail, here are the jobs the GC map needs to do:
+
+   - The whole idea of generational garbage collection is to usually
+     collect only part of the heap.  Occasionally, you'll need to do a
+     full collection, but if you can focus your time on portions of
+     the heap that contain more garbage, then that time will be more
+     productive, and free up more memory for the mutator to waste.
+
+     But to restrict collection to a limited portion of the heap, the
+     collector needs to be able to find all pointers from the
+     uncollected portion into the collected portion: these act as
+     additional roots for the partial collection.
+
+     Since, in practice, pointers from older objects to younger
+     objects are rare, we can reduce the amount of bookkeeping needed
+     here by, when collecting generation G, always collecting all
+     generations younger than G as well.  This means we only need to
+     track pointers in objects in older generations to objects in
+     younger generations --- the rare kind.  These are the doting
+     objects.
+
+     How do we track such pointers?  Since a newly allocated object
+     can only be initialized with pointers to existing objects, an
+     object can become a doting object only by mutation.  Thus, every
+     bit of code that mutates a heap object in Minor needs to include
+     a "write barrier": code that allows the GC to check whether a
+     doting object has been created, and record it in the GC map, for
+     the collector to use in finding roots for partial collections.
+
+   - We also need to be able to quickly determine which generation an
+     object belongs to, to recognize when a pointer points out of the
+     portion of the heap we're collecting.
+
+   - When we've finished collecting, we need to be able to find all
+     the pages belonging to now-empty "from" spaces, to free them.
+
+
+   Dynamically vs. Statically Allocated Objects ======================
+
+   There are two ways objects can come into existence:
+
+   - The mutator can allocate them in the usual way, with 'cons',
+     'make-vector', etc.
+
+   - Executable files and shared libraries may contain objects,
+     constructed at compile-time, linked by the system linker, and
+     introduced into memory by the kernel doing an 'exec' or the
+     dynamic linker.
+
+   In the first case, code generated by Minor, or hand-written for
+   Minor, handles the allocation, so it can follow whatever
+   conventions we find useful.
+
+   But in the second case, Minor has only limited control over the
+   allocation.  Minor can ensure that all the heap objects in a
+   particular executable or shared library appear in one contiguous
+   chunk, not interleaved with other sorts of non-heap data --- from C
+   code, say.  But the GC has no way to find out at run time where
+   each executable/shared library's chunk of heap objects is.  (I
+   think we'd need a custom linker script, or some messy stuff based
+   on the C++ static initializer support, but, bleah.)  This means
+   that the GC can't reliably free up such memory for re-use; it can't
+   tell where Minor heap objects end and foreign non-heap objects
+   begin.  That, in turn, means that the GC might as well never
+   relocate such objects, or even bother to collect them at all ---
+   much better to simply ignore them, except to track old->young
+   pointers.
+
+   So, when we allocate fresh pages for a thread to allocate from, we
+   mark them in the GC map as belonging to generation zero, the
+   youngest generation.  And when we allocate pages to hold objects
+   the collector is promoting from one generation to the next, we
+   record the appropriate generation for them as well.  But we assume
+   that all other pages belong to generation seven, the "immortal
+   generation".  Any objects that we find here must have come from
+   executable files or shared libraries.  Other objects are never
+   promoted into the immortal generation --- they come to rest in
+   generation six.
+
+   Note that when we load a .o file ourselves --- say, when we load a
+   module previously compiled by the ahead-of-time compiler --- that's
+   Minor code turning that stream of bytes into objects and
+   procedures, not the kernel or the dynamic linker.  Since the
+   allocation is under our code's control, we can place the .o file's
+   objects (and procedures) in any generation we want.  So loading .o
+   files falls in the first category of allocation, not the second.
+
+
+   Mutators' Interface to the GC Map ===================================
+
+   The mutators' write barrier code does not access the GC map
+   directly.  Instead, mutator threads simply construct store lists
+   --- lists of every object they've ever mutated --- and hand them to
+   the collector when needed.  When a collection starts, the collector
+   records the potentially doting objects mentioned in each thread's
+   store list in the GC map, and then throws the store lists away.
+   This indirect arrangement has the following advantages:
+
+   - Mutators don't need to know about the GC map structure.  It's way
+     too complex to be part of a stable ABI.  The GC map remains
+     strictly internal to the GC.
+
+   - The overhead of the write barrier is the allocation of one pair.
+     The pair is allocated in the new object area, so the cache is
+     always hot.
+
+   - Since this map is only updated and consulted by the GC, it
+     doesn't compete for registers and cache with real mutator code at
+     every store operation; it only gets involved when a GC is about
+     to happen, which trashes both of those things anyway.
+
+   - Since store lists are per-thread, we never have to think about
+     synchronization when building them.
+
+   - Since mutator threads never access the GC map directly, we don't
+     have to worry about synchronization when accessing it, either.  */
+
+
+/* For every 4kb page managed by the garbage collector, we have an
+   instance of the following structure.
+
+   (Since there is an instance of this structure for every page, it
+   needs to be kept small.  8b : 4kb :: 1 : 512.)
+
+   From the "premature optimization is the root of all evil" dept:
+
+   At the machine code level, fetching an (unsigned) bit field
+   turns into:
+   - a memory reference to fetch the word containing the bit field,
+   - a mask, to get rid of bits that don't belong to the field, and
+   - a right shift, to put the bitfield's least significant bit at
+     the right end of the register.
+
+   But note that a lot of fields in this struct are indices within a
+   page, or portions of page addresses.  So the first thing we're
+   going to do with such values is shift them left again, to multiply
+   by 8 (for first_doting_object and last_doting_object) or by 4k (for
+   next_doting_page and next_generation_page).  So the compiler could
+   combine the right shift of the field fetch and the left shift of
+   the multiply into a single operation, net left or net right.
+   
+   We can do even better: if we make sure that the right shift (the
+   bitfield's position within the word) and the left shift (the log2
+   of the factor we need to multiply it by to get a page offset or a
+   page address) are the *same*, then the shifts cancel each other
+   out, and all we need to do is fetch and mask.
+
+   So first_doting_object, last_doting_object, next_doting_page, and
+   next_generation_page are all aligned this way.  Since page
+   addresses and offsets within a page are disjoint portions of an
+   address word, things fit together pretty nicely.  */
+struct gc_page
+{
+  /* The following three fields should all pack into a single 32-bit
+     word.  */
+
+  /* The generation to which the objects in this page belong.  Zero is
+     the youngest generation.  Seven is the "dummy generation", used
+     for memory areas we haven't allocated a separate gc_page
+     array for yet.  */
+  unsigned generation : 3;
+
+  /* If this is a doting page, this is the offset within this page of
+     the start of the first doting object that begins on this page ---
+     divided by eight.  If this is not a doting page, then
+     last_doting_object == 0 and first_doting_object > 0.
+
+     4k / 8 == 512, so we need nine bits for this field.  To find all
+     the doting pointers, we start here and scan until
+     last_doting_object.  */
+  unsigned first_doting_object : 9;
+
+  /* All the pages that contain doting objects are kept in a
+     singly-linked list; there is one list per generation.  This field
+     is the link in that list: the address of the next such page in
+     this generation, divided by 4k.  For the last page in the chain,
+     this field is zero.  */
+  unsigned next_doting_page : 20;
+
+  /* The following three fields should all pack into a single 32-bit
+     word.  */
+
+  /* Unused bits!  */
+  unsigned : 3;
+
+  /* If this is a doting page, this is the offset within this page of
+     the start of the last doting object that begins on this page ---
+     divided by eight.  If this is not a doting page, then this is
+     zero.  */
+  unsigned last_doting_object : 9;
+
+  /* All the pages in a generation are kept in a singly-linked list.
+     All free pages are kept in a list, too.  This is the link in
+     those lists.  This is the address of the next page in the list
+     --- divided by 4k.  If this is the last page in the list, this is
+     zero.  */
+  unsigned next_generation_page : 20;
+};
+
+
+/* The map of all pages is a two-level tree.  Given a 32-bit address
+   ADDR, the 'struct gc_page' for that page is:
+
+      mn__gc_map[(ADDR >> 22) & 0x3ff][(ADDR >> 12) & 0x3ff]
+
+   In other words, we use the top ten bits of the object's address to
+   index the top-level array, yielding a pointer to a second-level
+   array; then we use the next ten bits to index into that array,
+   yielding a gc_page structure for a particular page.
+
+   Initially, before we've allocated any heap pages at all, every
+   entry in mn__gc_map points to the same second-level array
+   object: mn__gc_immortal_pages.  This creates the appearance of a
+   fully populated tree, with a gc_page struct for every 4k
+   page in the IA-32's 32-bit address space --- even though
+   mn__gc_map and mn__gc_immortal_pages occupy only 1k * 4b +
+   1k * 8b == 12kb.
+
+   As we allocate pages for newly allocated objects, or for to-spaces
+   during collection, we need to record these allocations in the map.
+   Since mn__gc_immortal_pages is (potentially) shared by many
+   top-level array entries, we handle things in a copy-on-write
+   fashion: when the gc_page struct we want to tweak is
+   actually an element of mn__gc_immortal_pages, we allocate a fresh
+   second-level table, initialize it to be a copy of
+   mn__gc_immortal_pages, change the appropriate entry in the
+   top-level array to point to it, and then tweak the appropriate
+   gc_page.  So as the program runs, we dedicate map memory
+   only to the interesting parts, without making any assumptions about
+   where in the address space malloc/mmap will give us pages from.
+
+   As described above, GNU/Linux doesn't tell us which regions of an
+   executable or shared library contain heap objects: we just
+   occasionally find heap references to objects on pages we've never
+   touched before.  So the initial state of a gc_page struct
+   has to be appropriate for such objects.  This tells us several
+   things:
+
+   - The pages' generation should be the immortal generation ---
+     generation seven.
+
+   - The pages (initially) contain no doting objects.  Objects in
+     executables or shared libraries may only point to objects in
+     other executables or shared libraries, since they were linked by
+     the static linker: otherwise, the static linker would have
+     complained about unresolved references.
+
+   - The pages will never be freed.  We don't scavenge objects from
+     executables or shared libraries: we can't be sure where the
+     regions of heap objects start and end, so we couldn't free the
+     area for reuse after the live objects have been copied out of
+     them anyway.  So next_generation_page and first_contiguous don't
+     need to be initialized to anything special.
+
+   Thus, the default gc_page struct looks like this:
+
+    {
+      generation = 7,
+      first_doting_object = 1,
+      next_doting_page = 0,
+      last_doting_object = 0,
+      next_generation_page = 0
+    }
+
+   Every element of mn__gc_immortal_pages looks like that.
+
+   Of course, the mutator may create doting objects in executables and
+   shared libraries, so it's not the case that every executable or
+   shared object page will always look like this.  But initially, this
+   is fine.
+
+   Since the write barrier records the offsets of the first and last
+   doting objects in a page, and the GC never looks outside that
+   range, things will work correctly even if the initial or tail end
+   of a page holds non-heap objects.  So if the linker concatenates
+   the .minor.data section built by the Minor compiler with the .data
+   or .bss section built by the C compiler, for example, things will
+   be fine.
+
+   Problems would arise if non-heap objects were interleaved with heap
+   objects on a page: if first_doting_object happened to end up
+   pointing before some non-heap objects, and last_doting_object
+   happened to end up pointing after them, then the scan for doting
+   pointers would end up sweeping through non-heap objects.
+
+   However, that sort of interleaving can't happen:
+
+   - Within a single executable or shared library, the static linker's
+     normal behavior is to concatenate all the .minor.data sections,
+     without interleaving other sections.  So we don't have to worry
+     about intra-exec or intra-shared library interleavings.
+
+   - The IA-32 ABI requires that ELF load segments be aligned on 4kb
+     page boundaries.  This means that two non-empty data segments
+     can't appear on the same page.  So we don't have to worry about
+     inter-executable or inter-shared library interleavings, either.
+
+   Using the oldest generation, generation 7, as the "immortal"
+   generation means that the collector's test for whether to scavenge
+   an object doesn't need a special case to recognize immortal
+   objects.  The obvious way to write the test, "Is this object's
+   generation less than or equal to the oldest generation we're
+   collecting?" will correctly decline to traverse immortal objects.
+   Since the collector asks this of every object it touches, it's
+   important for this test to be fast.  */
+extern struct gc_page *mn__gc_map[1 << 10];
+
+/* The array of immortal pages.  */
+extern struct gc_page mn__gc_immortal_pages[1 << 10];
+
+
+/* The 'struct gc_page' object for ADDR.  */
+#define GC_PAGE(addr)                                           \
+  (mn__gc_map[((unsigned int) (addr) >> 22) & 0x3ff]            \
+             [((unsigned int) (addr) >> 12) & 0x3ff])
+
+
+/* A single heap generation.  */
+struct gc_generation
+{
+  /* The base address of the first page in this generation, or zero if
+     the generation contains no pages.  This is invalid in the
+     immortal generation.  */
+  void *first_generation_page;
+
+  /* The base address of the first doting page in this generation.
+     Zero if the generation contains no doting pages.  */
+  void *first_doting_page;
+
+  /* How many collections we've done since the last time we collected
+     any generations older than this.  */
+  int collections;
+};
+
+
+/* The table of all generations.  Generation zero is the youngest
+   generation.  Generation 7 is the immortal generation, for pages in
+   executables and shared libraries (actually, for any page we didn't
+   allocate ourselves).  */
+extern struct gc_generation mn__gc_generations[8];
+
+#endif /* MINOR_GC_MAP_H */

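The fetch-and-mask trick described in the gc_page comment above can be sketched with explicit masks.  This is only an illustration, not code from the Minor tree.  It assumes bit fields are laid out starting from the least significant bit (as GCC does on IA-32), so that generation sits in bits 0-2, first_doting_object in bits 3-11, and next_doting_page in bits 12-31.  The names pack_word, generation_of, first_doting_offset, and next_doting_page_addr are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the first 32-bit word of struct gc_page.  */
#define GENERATION_MASK 0x00000007u   /* bits 0-2   */
#define OBJECT_MASK     0x00000ff8u   /* bits 3-11  */
#define PAGE_MASK       0xfffff000u   /* bits 12-31 */

/* Build the word from a generation, the byte offset of the first
   doting object within its page, and the address of the next doting
   page.  Nothing is shifted: each value already occupies the bits of
   its field.  */
static uint32_t
pack_word (unsigned generation, uint32_t object_offset,
           uint32_t next_page_addr)
{
  assert (generation < 8);
  assert ((object_offset & ~OBJECT_MASK) == 0);  /* 8-byte aligned, < 4k */
  assert ((next_page_addr & ~PAGE_MASK) == 0);   /* 4k aligned */
  return generation | object_offset | next_page_addr;
}

/* Each accessor is one mask, with no shift.  */
static unsigned
generation_of (uint32_t word)
{
  return word & GENERATION_MASK;
}

static uint32_t
first_doting_offset (uint32_t word)
{
  return word & OBJECT_MASK;
}

static uint32_t
next_doting_page_addr (uint32_t word)
{
  return word & PAGE_MASK;
}
```

Because each field's bit position equals the log2 of its scale factor (3 for 8-byte units, 12 for 4k pages), the right shift of the field fetch and the left shift of the multiply cancel, and a single mask recovers a byte offset or a page address.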

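The two-level map and its copy-on-write update, as described in the comment on mn__gc_map above, might look roughly like the following.  This is a minimal sketch under stated assumptions, not the Minor implementation: addresses are modelled as uint32_t so the code runs on any host, the page record is a simplified struct page_info rather than the packed struct gc_page, and the names map_init, map_lookup, map_lookup_for_write, and should_scavenge are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LEVEL_SIZE (1u << 10)
#define IMMORTAL_GENERATION 7u

/* Simplified stand-in for struct gc_page.  */
struct page_info
{
  unsigned generation;
  unsigned first_doting_object;
  unsigned last_doting_object;
};

/* Shared second-level table: every entry describes an immortal page
   with no doting objects.  */
static struct page_info immortal_pages[LEVEL_SIZE];

/* Top-level table, indexed by the top ten address bits.  */
static struct page_info *page_map[LEVEL_SIZE];

static void
map_init (void)
{
  for (unsigned i = 0; i < LEVEL_SIZE; i++)
    {
      immortal_pages[i].generation = IMMORTAL_GENERATION;
      immortal_pages[i].first_doting_object = 1;
      immortal_pages[i].last_doting_object = 0;
      page_map[i] = immortal_pages;
    }
}

/* Read-only lookup: two shifts, two masks, two indexings.  */
static struct page_info *
map_lookup (uint32_t addr)
{
  return &page_map[(addr >> 22) & 0x3ff][(addr >> 12) & 0x3ff];
}

/* Lookup for modification: if the second-level table is still the
   shared immortal one, replace it with a private copy first.
   Returns NULL if that copy cannot be allocated.  */
static struct page_info *
map_lookup_for_write (uint32_t addr)
{
  unsigned top = (addr >> 22) & 0x3ff;

  if (page_map[top] == immortal_pages)
    {
      struct page_info *fresh = malloc (sizeof immortal_pages);
      if (fresh == NULL)
        return NULL;
      memcpy (fresh, immortal_pages, sizeof immortal_pages);
      page_map[top] = fresh;
    }
  return &page_map[top][(addr >> 12) & 0x3ff];
}

/* The collector's test: scavenge only objects whose generation is no
   older than the oldest generation being collected.  Immortal pages
   fail this for any real generation (0 through 6), with no special
   case.  */
static int
should_scavenge (uint32_t addr, unsigned oldest_collected)
{
  return map_lookup (addr)->generation <= oldest_collected;
}
```

Marking one page as generation zero privatizes only the second-level table covering its 4 MB region; every other top-level entry keeps pointing at the shared immortal table.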

From minor-owner@red-bean.com Sat Aug  2 16:23:35 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h72LNYqX015582
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sat, 2 Aug 2003 16:23:35 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h72LNY1B015580
	for minor-commits@red-bean.com; Sat, 2 Aug 2003 16:23:34 -0500
Date: Sat, 2 Aug 2003 16:23:34 -0500
Message-Id: <200308022123.h72LNY1B015580@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 32 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-02 16:23:31 -0500 (Sat, 02 Aug 2003)
New Revision: 32

Added:
   trunk/gc/generic-map.h
Removed:
   trunk/gc/gc-map.h
Log:
Rename gc-map.h to generic-map.h.


Deleted: trunk/gc/gc-map.h
===================================================================
--- trunk/gc/gc-map.h	2003-08-02 21:22:00 UTC (rev 31)
+++ trunk/gc/gc-map.h	2003-08-02 21:23:31 UTC (rev 32)
@@ -1,361 +0,0 @@
-/* map.h --- tracking GC'd memory
-   Jim Blandy <jimb@red-bean.com> --- July 2003  */
-
-#ifndef MINOR_GC_MAP_H
-#define MINOR_GC_MAP_H
-
-/* The GC map is a table mapping every heap object's address onto a
-   gc_page structure describing the page the object lives in.
-   This structure says which generation the objects it contains belong
-   to, and is also where we record doting objects.
-
-   A "doting object" is an object in one generation that points to an
-   object in a younger generation.  A "doting page" is a page on which
-   a doting object starts.  Doting objects can be quite large, and
-   cover many pages, but only the page on which a doting object starts
-   is a doting page.
-
-   Function of the GC Map ============================================
-
-   In more detail, here are the jobs the GC map needs to do:
-
-   - The whole idea of generational garbage collection is to usually
-     collect only part of the heap.  Occasionally, you'll need to do a
-     full collection, but if you can focus your time on portions of
-     the heap that contain more garbage, then that time will be more
-     productive, and free up more memory for the mutator to waste.
-
-     But to restrict collection to a limited portion of the heap, the
-     collector needs to be able to find all pointers from the
-     uncollected portion into the collected portion: these act as
-     additional roots for the partial collection.
-
-     Since, in practice, pointers from older objects to younger
-     objects are rare, we can reduce the amount of bookkeeping needed
-     here by, when collecting generation G, always collecting all
-     generations younger than G as well.  This means we only need to
-     track pointers in objects in older generations to objects in
-     younger generations --- the rare kind.  These are the doting
-     objects.
-
-     How do we track such pointers?  Since a newly allocated object
-     can only be initialized with pointers to existing objects, an
-     object can become a doting object only by mutation.  Thus, every
-     bit of code that mutates a heap object in Minor needs to include
-     a "write barrier": code that allows the GC to check whether a
-     doting object has been created, and record it in the GC map, for
-     the collector to use in finding roots for partial collections.
-
-   - We also need to be able to quickly determine which generation an
-     object belongs to, to recognize when a pointer points out of the
-     portion of the heap we're collecting.
-
-   - When we've finished collecting, we need to be able to find all
-     the pages belonging to now-empty "from" spaces, to free them.
-
-
-   Dynamically vs. Statically Allocated Objects ======================
-
-   There are two ways objects can come into existence:
-
-   - The mutator can allocate them in the usual way, with 'cons',
-     'make-vector', etc.
-
-   - Executable files and shared libraries may contain objects,
-     constructed at compile-time, linked by the system linker, and
-     introduced into memory by the kernel doing an 'exec' or the
-     dynamic linker.
-
-   In the first case, code generated by Minor, or hand-written for
-   Minor, handles the allocation, so it can follow whatever
-   conventions we find useful.
-
-   But in the second case, Minor has only limited control over the
-   allocation.  Minor can ensure that all the heap objects in a
-   particular executable or shared library appear in one contiguous
-   chunk, not interleaved with other sorts of non-heap data --- from C
-   code, say.  But the GC has no way to find out at run time where
-   each executable/shared library's chunk of heap objects is.  (I
-   think we'd need a custom linker script, or some messy stuff based
-   on the C++ static initializer support, but, bleah.)  This means
-   that the GC can't reliably free up such memory for re-use; it can't
-   tell where Minor heap objects end and foreign non-heap objects
-   begin.  That, in turn, means that the GC might as well never
-   relocate such objects, or even bother to collect them at all ---
-   much better to simply ignore them, except to track old->young
-   pointers.
-
-   So, when we allocate fresh pages for a thread to allocate from, we
-   mark them in the GC map as belonging to generation zero, the
-   youngest generation.  And when we allocate pages to hold objects
-   the collector is promoting from one generation to the next, we
-   record the appropriate generation for them as well.  But we assume
-   that all other pages belong to generation seven, the "immortal
-   generation".  Any objects that we find here must have come from
-   executable files or shared libraries.  Other objects are never
-   promoted into the immortal generation --- they come to rest in
-   generation six.
-
-   Note that when we load a .o file ourselves --- say, when we load a
-   module previously compiled by the ahead-of-time compiler --- that's
-   Minor code turning that stream of bytes into objects and
-   procedures, not the kernel or the dynamic linker.  Since the
-   allocation is under our code's control, we can place the .o file's
-   objects (and procedures) in any generation we want.  So loading .o
-   files falls in the first category of allocation, not the second.
-
-
-   Mutators' Interface to the GC Map ===================================
-
-   The mutators' write barrier code does not access the GC map
-   directly.  Instead, mutator threads simply construct store lists
-   --- lists of every object they've ever mutated --- and hand them to
-   the collector when needed.  When a collection starts, the collector
-   records the potentially doting objects mentioned in each thread's
-   store list in the GC map, and then throws the store lists away.
-   This indirect arrangement has the following advantages:
-
-   - Mutators don't need to know about the GC map structure.  It's way
-     too complex to be part of a stable ABI.  The GC map remains
-     strictly internal to the GC.
-
-   - The overhead of the write barrier is the allocation of one pair.
-     The pair is allocated in the new object area, so the cache is
-     always hot.
-
-   - Since this map is only updated and consulted by the GC, it
-     doesn't compete for registers and cache with real mutator code at
-     every store operation; it only gets involved when a GC is about
-     to happen, which trashes both of those things anyway.
-
-   - Since store lists are per-thread, we never have to think about
-     synchronization when building them.
-
-   - Since mutator threads never access the GC map directly, we don't
-     have to worry about synchronization when accessing it, either.  */
-
-
-/* For every 4kb page managed by the garbage collector, we have an
-   instance of the following structure.
-
-   (Since there is an instance of this structure for every page, it
-   needs to be kept small.  8b : 4kb :: 1 : 512.)
-
-   From the "premature optimization is the root of all evil" dept:
-
-   At the machine code level, fetching an (unsigned) bit field
-   turns into:
-   - a memory reference to fetch the word containing the bit field,
-   - a mask, to get rid of bits that don't belong to the field, and
-   - a right shift, to put the bitfield's least significant bit at
-     the right end of the register.
-
-   But note that a lot of fields in this struct are indices within a
-   page, or portions of page addresses.  So the first thing we're
-   going to do with such values is shift them left again, to multiply
-   by 8 (for first_doting_object and last_doting_object) or by 4k (for
-   next_doting_page and next_generation_page).  So the compiler could
-   combine the right shift of the field fetch and the left shift of
-   the multiply into a single operation, net left or net right.
-   
-   We can do even better: if we make sure that the right shift (the
-   bitfield's position within the word) and the left shift (the log2
-   of the factor we need to multiply it by to get a page offset or a
-   page address) are the *same*, then the shifts cancel each other
-   out, and all we need to do is fetch and mask.
-
-   So first_doting_object, last_doting_object, next_doting_page, and
-   next_generation_page are all aligned this way.  Since page
-   addresses and offsets within a page are disjoint portions of an
-   address word, things fit together pretty nicely.  */
-struct gc_page
-{
-  /* The following three fields should all pack into a single 32-bit
-     word.  */
-
-  /* The generation to which the objects in this page belong.  Zero is
-     the youngest generation.  Seven is the "dummy generation", used
-     for memory areas we haven't allocated a separate gc_page
-     array for yet.  */
-  unsigned generation : 3;
-
-  /* If this is a doting page, this is the offset within this page of
-     the start of the first doting object that begins on this page ---
-     divided by eight.  If this is not a doting page, then
-     last_doting_object == 0 and first_doting_object > 0.
-
-     4k / 8 == 512, so we need nine bits for this field.  To find all
-     the doting pointers, we start here and scan until
-     last_doting_object.  */
-  unsigned first_doting_object : 9;
-
-  /* All the pages that contain doting objects are kept in a
-     singly-linked list; there is one list per generation.  This field
-     is the link in that list: the address of the next such page in
-     this generation, divided by 4k.  For the last page in the chain,
-     this field is zero.  */
-  unsigned next_doting_page : 20;
-
-  /* The following three fields should all pack into a single 32-bit
-     word.  */
-
-  /* Unused bits!  */
-  unsigned : 3;
-
-  /* If this is a doting page, this is the offset within this page of
-     the start of the last doting object that begins on this page ---
-     divided by eight.  If this is not a doting page, then this is
-     zero.  */
-  unsigned last_doting_object : 9;
-
-  /* All the pages in a generation are kept in a singly-linked list.
-     All free pages are kept in a list, too.  This is the link in
-     those lists.  This is the address of the next page in the list
-     --- divided by 4k.  If this is the last page in the list, this is
-     zero.  */
-  unsigned next_generation_page : 20;
-};
-
-
-/* The map of all pages is a two-level tree.  Given a 32-bit address
-   ADDR, the 'struct gc_page' for that page is:
-
-      mn__gc_map[(ADDR >> 22) & 0x3ff][(ADDR >> 12) & 0x3ff]
-
-   In other words, we use the top ten bits of the object's address to
-   index the top-level array, yielding a pointer to a second-level
-   array; then we use the next ten bits to index into that array,
-   yielding a gc_page structure for a particular page.
-
-   Initially, before we've allocated any heap pages at all, every
-   entry in mn__gc_map points to the same second-level array
-   object: mn__gc_immortal_pages.  This creates the appearance of a
-   fully populated tree, with a gc_page struct for every 4k
-   page in the IA-32's 32-bit address space --- even though
-   mn__gc_map and mn__gc_immortal_pages occupy only 1k * 4b +
-   1k * 8b == 12kb.
-
-   As we allocate pages for newly allocated objects, or for to-spaces
-   during collection, we need to record these allocations in the map.
-   Since mn__gc_immortal_pages is (potentially) shared by many
-   top-level array entries, we handle things in a copy-on-write
-   fashion: when the gc_page struct we want to tweak is
-   actually an element of mn__gc_immortal_pages, we allocate a fresh
-   second-level table, initialize it to be a copy of
-   mn__gc_immortal_pages, change the appropriate entry in the
-   top-level array to point to it, and then tweak the appropriate
-   gc_page.  So as the program runs, we dedicate map memory
-   only to the interesting parts, without making any assumptions about
-   where in the address space malloc/mmap will give us pages from.
-
-   As described above, GNU/Linux doesn't tell us which regions of an
-   executable or shared library contain heap objects: we just
-   occasionally find heap references to objects on pages we've never
-   touched before.  So the initial state of a gc_page struct
-   has to be appropriate for such objects.  This tells us several
-   things:
-
-   - The pages' generation should be the immortal generation ---
-     generation seven.
-
-   - The pages (initially) contain no doting objects.  Objects in
-     executables or shared libraries may only point to objects in
-     other executables or shared libraries, since they were linked by
-     the static linker: otherwise, the static linker would have
-     complained about unresolved references.
-
-   - The pages will never be freed.  We don't scavenge objects from
-     executables or shared libraries: we can't be sure where the
-     regions of heap objects start and end, so we couldn't free the
-     area for reuse after the live objects have been copied out of
-     them anyway.  So next_generation_page and first_contiguous don't
-     need to be initialized to anything special.
-
-   Thus, the default gc_page struct looks like this:
-
-    {
-      generation = 7,
-      first_doting_object = 1,
-      next_doting_page = 0,
-      last_doting_object = 0,
-      next_generation_page = 0
-    }
-
-   Every element of mn__gc_immortal_pages looks like that.
-
-   Of course, the mutator may create doting objects in executables and
-   shared libraries, so it's not the case that every executable or
-   shared object page will always look like this.  But initially, this
-   is fine.
-
-   Since the write barrier records the offsets of the first and last
-   doting objects in a page, and the GC never looks outside that
-   range, things will work correctly even if the initial or tail end
-   of a page holds non-heap objects.  So if the linker concatenates
-   the .minor.data section built by the Minor compiler with the .data
-   or .bss section built by the C compiler, for example, things will
-   be fine.
-
-   Problems would arise if non-heap objects were interleaved with heap
-   objects on a page: if first_doting_object happened to end up
-   pointing before some non-heap objects, and last_doting_object
-   happened to end up pointing after them, then the scan for doting
-   pointers would end up sweeping through non-heap objects.
-
-   However, that sort of interleaving can't happen:
-
-   - Within a single executable or shared library, the static linker's
-     normal behavior is to concatenate all the .minor.data sections,
-     without interleaving other sections.  So we don't have to worry
-     about intra-exec or intra-shared library interleavings.
-
-   - The IA-32 ABI requires that ELF load segments be aligned on 4kb
-     page boundaries.  This means that two non-empty data segments
-     can't appear on the same page.  So we don't have to worry about
-     inter-executable or inter-shared library interleavings, either.
-
-   Using the oldest generation, generation 7, as the "immortal"
-   generation means that the collector's test for whether to scavenge
-   an object doesn't need a special case to recognize immortal
-   objects.  The obvious way to write the test, "Is this object's
-   generation less than or equal to the oldest generation we're
-   collecting?" will correctly decline to traverse immortal objects.
-   Since the collector asks this of every object it touches, it's
-   important for this test to be fast.  */
-extern struct gc_page *mn__gc_map[1 << 10];
-
-/* The array of immortal pages.  */
-extern struct gc_page mn__gc_immortal_pages[1 << 10];
-
-
-/* The 'struct gc_page' object for ADDR.  */
-#define GC_PAGE(addr)                                           \
-  (mn__gc_map[((unsigned int) (addr) >> 22) & 0x3ff]            \
-                    [((unsigned int) (addr) >> 12) & 0x3ff])
-
-
-/* A single heap generation.  */
-struct gc_generation
-{
-  /* The base address of the first page in this generation, or zero if
-     the generation contains no pages.  This is invalid in the
-     immortal generation.  */
-  void *first_generation_page;
-
-  /* The base address of the first doting page in this generation.
-     Zero if the generation contains no doting pages.  */
-  void *first_doting_page;
-
-  /* How many collections we've done since the last time we collected
-     any generations older than this.  */
-  int collections;
-};
-
-
-/* The table of all generations.  Generation zero is the youngest
-   generation.  Generation 7 is the immortal generation, for pages in
-   executables and shared libraries (actually, for any page we didn't
-   allocate ourselves).  */
-extern struct gc_generation mn__gc_generations[8];
-
-#endif /* MINOR_GC_MAP_H */
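
The "fetch and mask" packing described in the header comment above can be sketched with explicit masks instead of bit fields.  This is only an illustration, assuming 8-byte object alignment, 4kb pages, and fields allocated from the least significant bit upward (bit-field layout is implementation-defined in C).  All demo_* and DEMO_* names are hypothetical and do not appear in the Minor sources.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LOG_OBJECT_ALIGN 3   /* objects are 8-byte aligned (assumed) */
#define DEMO_LOG_PAGE_SIZE 12     /* pages are 4kb (assumed) */

/* Assumed layout of one 32-bit word:
   bits [0, 3)   generation
   bits [3, 12)  first doting object offset, already in byte units
   bits [12, 32) next doting page address, already in byte units  */
#define DEMO_GENERATION_MASK \
  ((UINT32_C (1) << DEMO_LOG_OBJECT_ALIGN) - 1)
#define DEMO_OFFSET_MASK \
  (((UINT32_C (1) << DEMO_LOG_PAGE_SIZE) - 1) & ~DEMO_GENERATION_MASK)
#define DEMO_PAGE_MASK \
  (~((UINT32_C (1) << DEMO_LOG_PAGE_SIZE) - 1))

/* Build a packed word.  FIRST_OFFSET is a byte offset within the page
   (a multiple of 8); NEXT_PAGE is a page-aligned address.  */
static uint32_t
demo_pack (unsigned generation, uint32_t first_offset, uint32_t next_page)
{
  return ((generation & DEMO_GENERATION_MASK)
          | (first_offset & DEMO_OFFSET_MASK)
          | (next_page & DEMO_PAGE_MASK));
}

/* Each accessor is a single mask: because the field's position in the
   word equals the log2 of its scale factor, no shift is needed.  */
static unsigned
demo_generation (uint32_t word)
{
  return word & DEMO_GENERATION_MASK;
}

static uint32_t
demo_first_doting_offset (uint32_t word)
{
  return word & DEMO_OFFSET_MASK;
}

static uint32_t
demo_next_doting_page (uint32_t word)
{
  return word & DEMO_PAGE_MASK;
}
```

With bit fields, the compiler has to discover this cancellation itself; the explicit masks just make the intended result visible.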

Copied: trunk/gc/generic-map.h (from rev 31, trunk/gc/gc-map.h)
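
The copy-on-write handling of the shared immortal table, as described in the header comment above, might look roughly like the sketch below.  It assumes the 10/10/12 address split of that revision and uses a simplified page struct with plain fields instead of bit fields.  The demo_* names are hypothetical; the real mn__gc_map code is not shown in this commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct gc_page: plain fields, no packing.  */
struct demo_page
{
  unsigned generation;
  unsigned first_doting_object;
  unsigned last_doting_object;
};

#define DEMO_LEVEL_BITS 10
#define DEMO_LEVEL_SIZE (1u << DEMO_LEVEL_BITS)
#define DEMO_IMMORTAL 7

static struct demo_page demo_immortal_pages[DEMO_LEVEL_SIZE];
static struct demo_page *demo_map[DEMO_LEVEL_SIZE];

/* Fill the immortal table with the default entry, and point every
   top-level slot at that one shared table.  */
static void
demo_map_init (void)
{
  unsigned i;

  for (i = 0; i < DEMO_LEVEL_SIZE; i++)
    {
      demo_immortal_pages[i].generation = DEMO_IMMORTAL;
      demo_immortal_pages[i].first_doting_object = 1;
      demo_immortal_pages[i].last_doting_object = 0;
    }
  for (i = 0; i < DEMO_LEVEL_SIZE; i++)
    demo_map[i] = demo_immortal_pages;
}

/* Read-only lookup: never allocates.  */
static const struct demo_page *
demo_page_lookup (uint32_t addr)
{
  return &demo_map[(addr >> 22) & 0x3ff][(addr >> 12) & 0x3ff];
}

/* Lookup for modification: if the top-level slot still points at the
   shared immortal table, give it a private copy first.  Returns NULL
   if the copy cannot be allocated.  */
static struct demo_page *
demo_page_for_write (uint32_t addr)
{
  unsigned top = (addr >> 22) & 0x3ff;

  if (demo_map[top] == demo_immortal_pages)
    {
      struct demo_page *copy = malloc (sizeof demo_immortal_pages);
      if (! copy)
        return NULL;
      memcpy (copy, demo_immortal_pages, sizeof demo_immortal_pages);
      demo_map[top] = copy;
    }
  return &demo_map[top][(addr >> 12) & 0x3ff];
}
```

Only the top-level slots that cover pages we actually modify get a private second-level table; every other slot keeps sharing the immortal one.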



From minor-owner@red-bean.com Mon Aug  4 06:20:10 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h74BK9qX017010
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Mon, 4 Aug 2003 06:20:09 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h74BK91v017006
	for minor-commits@red-bean.com; Mon, 4 Aug 2003 06:20:09 -0500
Date: Mon, 4 Aug 2003 06:20:09 -0500
Message-Id: <200308041120.h74BK91v017006@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 33 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-04 06:20:05 -0500 (Mon, 04 Aug 2003)
New Revision: 33

Modified:
   trunk/gc/generic-map.h
Log:
* gc/generic-map.h: Make this architecture-generic; let an
arch-specific header file provide parameters for the details.


Modified: trunk/gc/generic-map.h
===================================================================
--- trunk/gc/generic-map.h	2003-08-02 21:23:31 UTC (rev 32)
+++ trunk/gc/generic-map.h	2003-08-04 11:20:05 UTC (rev 33)
@@ -1,8 +1,8 @@
-/* map.h --- tracking GC'd memory
+/* generic-map.h --- tracking GC'd memory, given per-arch parameters
    Jim Blandy <jimb@red-bean.com> --- July 2003  */
 
-#ifndef MINOR_GC_MAP_H
-#define MINOR_GC_MAP_H
+#ifndef MINOR_GC_GENERIC_MAP_H
+#define MINOR_GC_GENERIC_MAP_H
 
 /* The GC map is a table mapping every heap object's address onto a
    gc_page structure describing the page the object lives in.
@@ -132,14 +132,76 @@
      synchronization when building them.
 
    - Since mutator threads never access the GC map directly, we don't
-     have to worry about synchronization when accessing it, either.  */
+     have to worry about synchronization when accessing it, either.
 
 
-/* For every 4kb page managed by the garbage collector, we have an
+   Architecture Parameters ==========================================
+
+   The GC map data structure defined here is meant to be useable by
+   many different architectures.  Rather than #including this file
+   directly, you should first #include a header file (traditionally
+   named gc-map.h) from the appropriate arch/FOO/gc directory, which
+   will #define some parameters, and then #include this file for you.
+   Here are the parameters the arch-specific gc-map.h file should
+   #define:
+
+   GC_MAP_LOG_OBJECT_ALIGN --- the log base 2 of the minimum alignment
+   for every object.  For example, if every object must be aligned on
+   an eight-byte boundary, this would be 3.
+
+   GC_MAP_LOG_NUM_GENERATIONS --- the log base 2 of the number of
+   generations we support, including the nursery and the immortal
+   generation.  This must not be greater than GC_MAP_LOG_OBJECT_ALIGN,
+   for bit-packing reasons; see "premature optimization", below.
+
+   GC_MAP_LOG_PAGE_SIZE --- the log base 2 of the number of bytes per
+   page on the system.  By "page", what we really mean is the minimum
+   required alignment of the data and code segments, according to the
+   ABI.  Choosing that as the page size helps us ensure that the heap
+   areas of two different executable / shared library ELF files will
+   never fall in the purview of the same gc_page structure.
+
+   GC_MAP_FIRST_LEVEL_BITS --- the number of bits to take from the
+   most significant end of the address to use as the index into the
+   top-level array.
+
+   GC_MAP_SECOND_LEVEL_BITS --- the number of bits to take from the
+   most significant end of the address, after the chunk for the
+   top-level index, to use as the index into the second-level array.
+
+   GC_MAP_ADDRESS_BITS --- the total number of bits in an address.
+   This must be GC_MAP_LOG_PAGE_SIZE + GC_MAP_FIRST_LEVEL_BITS
+   + GC_MAP_SECOND_LEVEL_BITS; it's just present as a checksum.
+
+   (To support systems with 64-bit addresses, we could have optional
+   GC_MAP_{THIRD,FOURTH}_LEVEL_BITS macros, whose presence would
+   request the creation of a deeper tree.  Or perhaps someone can come
+   up with something more clever.)  */
+
+#ifndef GC_MAP_LOG_PAGE_SIZE
+#error "must #include a processor-specific gc-map.h before generic-map.h"
+#endif
+
+#if GC_MAP_LOG_NUM_GENERATIONS > GC_MAP_LOG_OBJECT_ALIGN
+#error "generation count too large for object alignment"
+#endif
+
+#if (GC_MAP_ADDRESS_BITS \
+     != (GC_MAP_FIRST_LEVEL_BITS \
+         + GC_MAP_SECOND_LEVEL_BITS \
+         + GC_MAP_LOG_PAGE_SIZE))
+#error "address not subdivided properly"
+#endif
+
+#define GC_MAP_NUM_GENERATIONS (1 << GC_MAP_LOG_NUM_GENERATIONS)
+
+
+/* For every page managed by the garbage collector, we have an
    instance of the following structure.
 
    (Since there is an instance of this structure for every page, it
-   needs to be kept small.  8b : 4kb :: 1 : 512.)
+   needs to be kept small.  If GC_MAP_LOG_PAGE_SIZE is 12, then 8b :
+   4kb :: 1 : 512.)
 
    From the "premature optimization is the root of all evil" dept:
 
@@ -153,7 +215,8 @@
    But note that a lot of fields in this struct are indices within a
    page, or portions of page addresses.  So the first thing we're
    going to do with such values is shift them left again, to multiply
-   by 8 (for first_doting_object and last_doting_object) or by 4k (for
+   by 1 << GC_MAP_LOG_OBJECT_ALIGN (for first_doting_object and
+   last_doting_object) or by 1 << GC_MAP_LOG_PAGE_SIZE (for
    next_doting_page and next_generation_page).  So the compiler could
    combine the right shift of the field fetch and the left shift of
    the multiply into a single operation, net left or net right.
@@ -170,70 +233,87 @@
    address word, things fit together pretty nicely.  */
 struct gc_page
 {
-  /* The following three fields should all pack into a single 32-bit
-     word.  */
+  /* The following three fields should all pack into a single
+     address-sized word.  */
 
   /* The generation to which the objects in this page belong.  Zero is
      the youngest generation.  Seven is the "dummy generation", used
      for memory areas we haven't allocated a separate gc_page
      array for yet.  */
-  unsigned generation : 3;
+  unsigned generation : GC_MAP_LOG_NUM_GENERATIONS;
 
+  /* Make sure next bitfield is nicely aligned.  */
+  int : GC_MAP_LOG_OBJECT_ALIGN - GC_MAP_LOG_NUM_GENERATIONS;
+
   /* If this is a doting page, this is the offset within this page of
      the start of the first doting object that begins on this page ---
-     divided by eight.  If this is not a doting page, then
-     last_doting_object == 0 and first_doting_object > 0.
+     divided by 1 << GC_MAP_LOG_OBJECT_ALIGN.  To find all the doting
+     pointers, we start here and scan until last_doting_object.  If
+     this is not a doting page, then last_doting_object == 0 and
+     first_doting_object > 0.  */
+  unsigned first_doting_object
+    : GC_MAP_LOG_PAGE_SIZE - GC_MAP_LOG_OBJECT_ALIGN;
 
-     4k / 8 == 512, so we need nine bits for this field.  To find all
-     the doting pointers, we start here and scan until
-     last_doting_object.  */
-  unsigned first_doting_object : 9;
-
   /* All the pages that contain doting objects are kept in a
      singly-linked list; there is one list per generation.  This field
      is the link in that list: the address of the next such page in
-     this generation, divided by 4k.  For the last page in the chain,
-     this field is zero.  */
-  unsigned next_doting_page : 20;
+     this generation, divided by 1 << GC_MAP_LOG_PAGE_SIZE.  For the
+     last page in the chain, this field is zero.  */
+  unsigned next_doting_page : GC_MAP_ADDRESS_BITS - GC_MAP_LOG_PAGE_SIZE;
 
-  /* The following three fields should all pack into a single 32-bit
-     word.  */
+  /* The following three fields should all pack into a single
+     address-sized word.  */
 
   /* Unused bits!  */
-  unsigned : 3;
+  unsigned : GC_MAP_LOG_OBJECT_ALIGN;
 
   /* If this is a doting page, this is the offset within this page of
      the start of the last doting object that begins on this page ---
-     divided by eight.  If this is not a doting page, then this is
-     zero.  */
-  unsigned last_doting_object : 9;
+     divided by 1 << GC_MAP_LOG_OBJECT_ALIGN.  If this is not a doting
+     page, then this is zero.  */
+  unsigned last_doting_object
+    : GC_MAP_LOG_PAGE_SIZE - GC_MAP_LOG_OBJECT_ALIGN;
 
   /* All the pages in a generation are kept in a singly-linked list.
      All free pages are kept in a list, too.  This is the link in
      those lists.  This is the address of the next page in the list
-     --- divided by 4k.  If this is the last page in the list, this is
-     zero.  */
-  unsigned next_generation_page : 20;
+     --- divided by 1 << GC_MAP_LOG_PAGE_SIZE.  If this is the last
+     page in the list, this is zero.  */
+  unsigned next_generation_page : GC_MAP_ADDRESS_BITS - GC_MAP_LOG_PAGE_SIZE;
 };
 
 
-/* The map of all pages is a two-level tree.  Given a 32-bit address
-   ADDR, the 'struct gc_page' for that page is:
+#define GC_MAP_FIRST_LEVEL_SHIFT \
+  (GC_MAP_ADDRESS_BITS - GC_MAP_FIRST_LEVEL_BITS)
+#define GC_MAP_FIRST_LEVEL_MASK \
+  ((1 << GC_MAP_FIRST_LEVEL_BITS) - 1)
 
-      mn__gc_map[(ADDR >> 22) & 0x3ff][(ADDR >> 12) & 0x3ff]
+#define GC_MAP_SECOND_LEVEL_SHIFT \
+  (GC_MAP_ADDRESS_BITS - GC_MAP_FIRST_LEVEL_BITS - GC_MAP_SECOND_LEVEL_BITS)
+#define GC_MAP_SECOND_LEVEL_MASK \
+  ((1 << GC_MAP_SECOND_LEVEL_BITS) - 1)
 
-   In other words, we use the top ten bits of the object's address to
-   index the top-level array, yielding a pointer to a second-level
-   array; then we use the next ten bits to index into that array,
-   yielding a gc_page structure for a particular page.
 
+/* The map of all pages is a two-level tree.  Given an address ADDR,
+   the 'struct gc_page' for that page is:
+
+      mn__gc_map
+        [(ADDR >> GC_MAP_FIRST_LEVEL_SHIFT) & GC_MAP_FIRST_LEVEL_MASK]
+        [(ADDR >> GC_MAP_SECOND_LEVEL_SHIFT) & GC_MAP_SECOND_LEVEL_MASK]
+
+   In other words, we use the top clump of bits of the object's
+   address to index the top-level array, yielding a pointer to a
+   second-level array; then we use the next clump of bits to index into
+   that array, yielding a gc_page structure for a particular page.
+
    Initially, before we've allocated any heap pages at all, every
-   entry in mn__gc_map points to the same second-level array
-   object: mn__gc_immortal_pages.  This creates the appearance of a
-   fully populated tree, with a gc_page struct for every 4k
-   page in the IA-32's 32-bit address space --- even though
-   mn__gc_map and mn__gc_immortal_pages occupy only 1k * 4b +
-   1k * 8b == 12kb.
+   entry in mn__gc_map points to the same second-level array object:
+   mn__gc_immortal_pages.  This creates the appearance of a fully
+   populated tree, with a gc_page struct for every page in the address
+   space --- even though mn__gc_map and mn__gc_immortal_pages occupy
+   only ((1 << GC_MAP_FIRST_LEVEL_BITS) * sizeof (a pointer)
+         + (1 << GC_MAP_SECOND_LEVEL_BITS) * sizeof (struct gc_page))
+   which is 12kb on a typical 32-bit system.
 
    As we allocate pages for newly allocated objects, or for to-spaces
    during collection, we need to record these allocations in the map.
@@ -274,7 +354,7 @@
    Thus, the default gc_page struct looks like this:
 
     {
-      generation = 7,
+      generation = GC_MAP_NUM_GENERATIONS - 1,
       first_doting_object = 1,
       next_doting_page = 0,
       last_doting_object = 0,
@@ -309,29 +389,32 @@
      without interleaving other sections.  So we don't have to worry
      about intra-exec or intra-shared library interleavings.
 
-   - The IA-32 ABI requires that ELF load segments be aligned on 4kb
-     page boundaries.  This means that two non-empty data segments
-     can't appear on the same page.  So we don't have to worry about
-     inter-executable or inter-shared library interleavings, either.
+   - We choose GC_MAP_LOG_PAGE_SIZE so that the ABI requires that ELF
+     load segments be aligned at least on page boundaries.  This means
+     that two non-empty data segments can't appear on the same page.
+     So we don't have to worry about inter-executable or inter-shared
+     library interleavings, either.
 
-   Using the oldest generation, generation 7, as the "immortal"
-   generation means that the collector's test for whether to scavenge
-   an object doesn't need a special case to recognize immortal
-   objects.  The obvious way to write the test, "Is this object's
-   generation less than or equal to the oldest generation we're
-   collecting?" will correctly decline to traverse immortal objects.
-   Since the collector asks this of every object it touches, it's
-   important for this test to be fast.  */
-extern struct gc_page *mn__gc_map[1 << 10];
+   Using the oldest generation as the "immortal" generation means that
+   the collector's test for whether to scavenge an object doesn't need
+   a special case to recognize immortal objects.  The obvious way to
+   write the test, "Is this object's generation less than or equal to
+   the oldest generation we're collecting?" will correctly decline to
+   traverse immortal objects.  Since the collector asks this of every
+   object it touches, it's important for this test to be fast.  */
+extern struct gc_page *mn__gc_map[1 << GC_MAP_FIRST_LEVEL_BITS];
 
 /* The array of immortal pages.  */
-extern struct gc_page mn__gc_immortal_pages[1 << 10];
+extern struct gc_page mn__gc_immortal_pages[1 << GC_MAP_SECOND_LEVEL_BITS];
 
 
 /* The 'struct gc_page' object for ADDR.  */
 #define GC_PAGE(addr)                                           \
-  (mn__gc_map[((unsigned int) (addr) >> 22) & 0x3ff]            \
-                    [((unsigned int) (addr) >> 12) & 0x3ff])
+  (mn__gc_map                                                   \
+   [((unsigned int) (addr) >> GC_MAP_FIRST_LEVEL_SHIFT)         \
+    & GC_MAP_FIRST_LEVEL_MASK]                                  \
+   [((unsigned int) (addr) >> GC_MAP_SECOND_LEVEL_SHIFT)        \
+    & GC_MAP_SECOND_LEVEL_MASK])
 
 
 /* A single heap generation.  */
@@ -353,9 +436,9 @@
 
 
 /* The table of all generations.  Generation zero is the youngest
-   generation.  Generation 7 is the immortal generation, for pages in
-   executables and shared libraries (actually, for any page we didn't
-   allocate ourselves).  */
-extern struct gc_generation mn__gc_generations[8];
+   generation.  Generation GC_MAP_NUM_GENERATIONS - 1 is the immortal
+   generation, for pages in executables and shared libraries
+   (actually, for any page we didn't allocate ourselves).  */
+extern struct gc_generation mn__gc_generations[GC_MAP_NUM_GENERATIONS];
 
-#endif /* MINOR_GC_MAP_H */
+#endif /* MINOR_GC_GENERIC_MAP_H */
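
As a rough check of the parameter scheme in this revision, here is what an arch-specific gc-map.h for IA-32 might #define, followed by the derived shifts and masks.  The parameter values are assumptions, chosen to reproduce the constants that were hard-coded before this change (8 generations, 8-byte alignment, 4kb pages, 10 + 10 index bits).  No arch/FOO/gc header appears in this commit, and the demo_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed contents of a hypothetical arch/ia32/gc/gc-map.h.  */
#define GC_MAP_LOG_OBJECT_ALIGN     3
#define GC_MAP_LOG_NUM_GENERATIONS  3
#define GC_MAP_LOG_PAGE_SIZE        12
#define GC_MAP_FIRST_LEVEL_BITS     10
#define GC_MAP_SECOND_LEVEL_BITS    10
#define GC_MAP_ADDRESS_BITS         32

/* The same consistency check generic-map.h performs.  */
#if (GC_MAP_ADDRESS_BITS \
     != (GC_MAP_FIRST_LEVEL_BITS \
         + GC_MAP_SECOND_LEVEL_BITS \
         + GC_MAP_LOG_PAGE_SIZE))
#error "address not subdivided properly"
#endif

/* Derived exactly as in generic-map.h.  */
#define GC_MAP_FIRST_LEVEL_SHIFT \
  (GC_MAP_ADDRESS_BITS - GC_MAP_FIRST_LEVEL_BITS)
#define GC_MAP_FIRST_LEVEL_MASK \
  ((1 << GC_MAP_FIRST_LEVEL_BITS) - 1)
#define GC_MAP_SECOND_LEVEL_SHIFT \
  (GC_MAP_ADDRESS_BITS - GC_MAP_FIRST_LEVEL_BITS - GC_MAP_SECOND_LEVEL_BITS)
#define GC_MAP_SECOND_LEVEL_MASK \
  ((1 << GC_MAP_SECOND_LEVEL_BITS) - 1)

/* Index into the top-level array for ADDR.  */
static unsigned
demo_first_index (uint32_t addr)
{
  return (addr >> GC_MAP_FIRST_LEVEL_SHIFT) & GC_MAP_FIRST_LEVEL_MASK;
}

/* Index into the second-level array for ADDR.  Note that the index is
   masked with the MASK macro, not the SHIFT macro.  */
static unsigned
demo_second_index (uint32_t addr)
{
  return (addr >> GC_MAP_SECOND_LEVEL_SHIFT) & GC_MAP_SECOND_LEVEL_MASK;
}
```

With these values the shifts come out to 22 and 12 and both masks to 0x3ff, matching the constants the old GC_PAGE macro used.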



From minor-owner@red-bean.com Mon Aug  4 06:20:36 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h74BKZqX017047
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Mon, 4 Aug 2003 06:20:35 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h74BKZuL017045
	for minor-commits@red-bean.com; Mon, 4 Aug 2003 06:20:35 -0500
Date: Mon, 4 Aug 2003 06:20:35 -0500
Message-Id: <200308041120.h74BKZuL017045@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 34 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-04 06:20:33 -0500 (Mon, 04 Aug 2003)
New Revision: 34

Added:
   trunk/gc/refs.h
Log:
* gc/refs.h: New header file.


Added: trunk/gc/refs.h
===================================================================
--- trunk/gc/refs.h	2003-08-04 11:20:05 UTC (rev 33)
+++ trunk/gc/refs.h	2003-08-04 11:20:33 UTC (rev 34)
@@ -0,0 +1,289 @@
+/* refs.h --- interface between reference manager and GC
+   Jim Blandy <jimb@red-bean.com> --- July 2003  */
+
+#ifndef MINOR_GC_REFS_H
+#define MINOR_GC_REFS_H
+
+#include <stdbool.h>
+#include <pthread.h>
+#include <signal.h>
+#include "minor/minor.h"
+#include "tagged.h"
+
+
+/* Requesting a garbage collection.  */
+
+/* Request that a garbage collection take place.  CALL is the current
+   thread's youngest call; through there, we can find all the roots on
+   its C stack.  */
+void mn__request_collection (mn_call *call);
+
+
+
+/* Root set management for C code.  */
+
+/* Locking:
+
+   The garbage collector needs to be able to find every heap object
+   reference every thread is holding, so it can adjust each one if it
+   relocates the object it refers to.  We help it accomplish this in
+   two ways:
+
+   - For all compiled Scheme code, we have a map, indexed by
+     instruction address, indicating which registers hold references
+     at that point in the code.  We provide similar maps for the stack
+     frame and the allocation area.  Machine code covered by these
+     maps is called "mapped code".
+
+   - For everything else, each thread has a chain of mn_call
+     structures, each of which has a ref_group structure, each of
+     which has a bunch of mn_ref structures, each of which contains
+     one tagged_t value.  Unless it holds THREAD->mutex, these are the
+     only tagged_t values allowed to be live in a thread.
+
+   THREAD->in_mapped_code indicates which case applies.
+
+   The THREAD structure, its mn_call structures, and so on have only
+   one writer: only the thread itself is allowed to push and pop
+   calls, and create and destroy references.  But they have two
+   readers: the thread itself, and the collecting thread.
+
+   In this situation, the thread itself only needs to hold
+   THREAD->mutex when *writing* its data structures.  It can read
+   without synchronization --- the only writes to be seen are its own.
+   However, since it does hold the mutex while writing, and the
+   collecting thread needs to hold THREAD->mutex for reading or
+   writing, the collecting thread is guaranteed to see all the mutator
+   thread's writes performed before it released the mutex.
+
+   Let's suppose a thread decides to perform a collection.  Call it
+   the "collecting thread".  Call all the other threads "mutator
+   threads".
+
+   Before the collecting thread can begin, it must first acquire every
+   mutator thread's THREAD->mutex, and its own.  This ensures that
+   either the mutator thread has no tagged_t values other than its
+   references, or that it's executing mapped code.  If the mutator is
+   executing mapped code, then THREAD->in_mapped_code is set; the
+   collecting thread must:
+   - send the mutator thread a signal,
+   - wait for the signal to be received,
+   - gather the mutator thread's registers, and
+   - use the PC to find the map entry saying which registers hold heap
+     references and which don't.
+
+   When the collection is complete, the collecting thread releases
+   each mutator thread's THREAD->mutex.  */
+
+
+/* The 'reference bookkeeping rules' apply to:
+   - some fields of mn_thread structures,
+   - mn_call structures,
+   - ref_group structures,
+   - ref_clump structures, and
+   - mn_ref structures, except for their 'u.obj' and 'global' fields.
+
+   They are as follows:
+   - If this is a global reference, or one of the structures managing
+     the global references, you must hold mn__global_ref_mutex to
+     access it.
+   - If this is a local reference belonging to some thread OWNER, or
+     one of the structures managing OWNER's local references:
+     - If you are OWNER, then you may read without synchronization,
+       but you must hold OWNER->mutex to write.
+     - Otherwise, you must hold OWNER->mutex to read or write.  */
+
+
+/* A Minor reference.  */
+struct mn_ref
+{
+  /* You must be holding THREAD->mutex if you have any tagged_t values
+     live, other than the one in here.  */
+  union
+  {
+    /* If this is a live reference, the object we refer to, as a tagged
+       value.  */
+    tagged_t obj;
+
+    /* If we are clump->refs[0], this is a pointer to the clump that
+       contains us.  */
+    struct ref_clump *clump;
+  } u;
+
+  /* The self, next, and prev fields are governed by the reference
+     bookkeeping rules.  */
+
+  /* The index of this ref in the clump's REFS array.  So if REF is a
+     mn_ref *, then the clump containing *REF must be
+     ref[-ref->self].u.clump.  This is how we find the clump's free
+     list.  */
+  unsigned self : 10;
+
+  /* All the allocated references in a clump are in a doubly-linked
+     list, headed by the REFS[0] and chained through these fields.
+     This allows the GC to traverse only the live references in a
+     clump, ignoring free references.
+
+     All the free references in a clump are in a singly-linked list,
+     chained through NEXT.  This allows us to find a free reference to
+     allocate in constant time.  */
+  unsigned next : 10, prev : 10;
+
+  /* True iff this is a global reference.
+
+     You may always access this, without holding any locks.  (If this
+     ref is local, then you must have allocated it yourself, or else
+     you shouldn't be touching it.  If this ref is global, then either
+     you allocated it, in which case you're guaranteed to see the
+     right value for it, or someone else allocated it, in which case
+     you must have gotten a pointer to it via some shared data
+     structure, which you should have accessed using proper
+     synchronization.)  */
+  bool global : 1;
+};
+
+
+/* A clump of references.  We allocate them in bulk.  
+   These are managed according to the reference bookkeeping rules.  */
+
+/* NUM_REFS_PER_CLUMP needs to fit in one third of a word, like the
+   'next', 'prev', and 'self' fields of a 'struct mn_ref'.  */
+#define NUM_REFS_PER_CLUMP (1024)
+
+struct ref_clump
+{
+  /* All the reference clumps belonging to a particular reference
+     group are chained together in a singly-linked list.  To free the
+     reference group, we free each clump in this list.  */
+  struct ref_clump *next;
+
+  /* The references that live in this clump.  REFS[0] is the head of
+     the doubly-linked list of allocated references.  */
+  mn_ref refs[NUM_REFS_PER_CLUMP];
+
+  /* The index of the first free ref in REFS.  */
+  short first_free;
+};
+
+
+/* A group of references --- local or global.
+   Managed according to the reference bookkeeping rules.  */
+struct ref_group
+{
+  /* Links in the doubly-linked list of all reference groups.  */
+  struct ref_group *prev, *next;
+
+  /* The head of the singly-linked list of reference clumps belonging
+     to this group.  */
+  struct ref_clump *clumps;
+
+  /* A singly-linked, null-terminated list of all the free references
+     in this group's clumps, if any.  */
+  mn_ref *free_refs;
+};
+
+
+/* A particular Scheme->C call.  These are managed according to the
+   reference bookkeeping rules.  */
+struct mn_call
+{
+  /* The thread this call belongs to.  */
+  struct mn_thread *thread;
+
+  /* All the local references belonging to this call.  */
+  struct ref_group *local_refs;
+
+  /* The address of a word on the stack closer to the stack bottom
+     than all the C frames associated with this call.  We initialize
+     this on every C->Minor call, and then use it to detect stale
+     mn_calls when we return to C code: if a word in the youngest
+     (i.e., currently running) stack frame is ever closer to the
+     bottom than this, then it must also be closer to the bottom than
+     all of this call's C frames; but since it's the current frame,
+     all those C frames must have been popped, so this mn_call is
+     invalid.  */
+  int *watermark;
+
+  /* The next older call.  Since a local ref is fine to use as long as
+     its call is still live, that means that if a young call is live,
+     all the calls below it on the stack must be live --- even if the
+     GC can prove that no continuation will ever return to that C
+     frame.  So we need to be able to find those calls anyway.  */
+  struct mn_call *next;
+
+  /* When we have compiled Scheme code that can call C code, then
+     we'll also need a pointer to the machine code continuation
+     here.  */
+};
+
+
+/* The per-thread structure for Minor reference tracking.  Every
+   thread in the system that has ever touched a Minor heap object has
+   one of these structures; all the mn_call objects for the thread
+   point back to it.  */
+struct mn_thread
+{
+  /* Links in the doubly-linked list of all threads that have touched
+     Minor heap objects.  You must hold mn__thread_list_mutex while accessing
+     these fields.  */
+  struct mn_thread *next, *prev;
+
+  /* You must hold this mutex:
+     - while you have any tagged_t value live,
+     - whenever writing to in_mapped_code, and
+     - whenever required to by the reference bookkeeping rules.  */
+  pthread_mutex_t mutex;
+
+  /* The system thread object for this thread, or zero if the thread
+     has died.  */
+  pthread_t pthread;
+
+  /* The youngest live mn_call in this thread.  Managed according to
+     the reference bookkeeping rules.  */
+  struct mn_call *youngest_call;
+};
+
+
+
+/* Root set management.  */
+
+/* A `mn_ref' object is a member of the garbage collector's root set;
+   the object a mn_ref refers to is, by definition, live.  mn_ref
+   objects live in groups, which can be freed en masse.
+
+   You needn't hold any mutex to hold a reference to a mn_ref, or to
+   a reference group.  That's the whole point.  */
+
+/* Create a new reference group, and return a pointer to it.  */
+struct ref_group *mn__make_ref_group (void);
+
+/* Free GROUP, and all the references in it.  */
+void mn__free_ref_group (struct ref_group *group);
+
+/* Return a new reference to OBJ in GROUP.  Since the caller is
+   holding a reference to OBJ, it must be holding mn__obj_mutex while
+   it calls this function.  */
+mn_ref *mn__make_ref (struct ref_group *group, tagged_t obj);
+
+/* Return the object to which REF refers.  Since the caller will hold
+   a reference to an object when this function returns, it must be
+   holding mn__obj_mutex while it calls this function.  */
+tagged_t mn__ref_object (mn_ref *ref);
+
+/* Make REF refer to OBJ.  Since the caller is holding a reference
+   to OBJ, it must be holding mn__obj_mutex while it calls this
+   function.  */
+void mn__set_ref (mn_ref *ref, tagged_t obj);
+
+/* Return the group to which the reference REF belongs.  */
+struct ref_group *mn__ref_group (mn_ref *ref);
+
+/* Free the reference REF.  */
+void mn__free_ref (mn_ref *ref);
+
+/* A reference group for references to objects we never change.  */
+extern struct ref_group *mn__static_ref_group;
+
+
+
+#endif /* MINOR_GC_REFS_H */
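
The way a reference finds its clump through its 'self' index can be
sketched like this.  It is a simplified illustration, not the layout
in refs.h: the sketch_ types are hypothetical stand-ins that keep only
the fields needed for the lookup, and a void pointer stands in for
tagged_t.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_REFS_PER_CLUMP 1024

struct sketch_clump;

/* Simplified stand-in for struct mn_ref.  */
struct sketch_ref
{
  union
  {
    void *obj;                    /* stand-in for a tagged_t value */
    struct sketch_clump *clump;   /* meaningful only in refs[0] */
  } u;
  unsigned self : 10;             /* this ref's index within its clump */
};

/* Simplified stand-in for struct ref_clump.  */
struct sketch_clump
{
  struct sketch_ref refs[SKETCH_REFS_PER_CLUMP];
};

/* Record each ref's own index, and make refs[0] point back at CLUMP.  */
static void
sketch_init_clump (struct sketch_clump *clump)
{
  size_t i;

  for (i = 0; i < SKETCH_REFS_PER_CLUMP; i++)
    {
      clump->refs[i].u.obj = NULL;
      clump->refs[i].self = (unsigned) i;
    }
  clump->refs[0].u.clump = clump;
}

/* Step back REF->self elements to reach refs[0], then follow its
   back pointer to the clump.  */
static struct sketch_clump *
sketch_clump_of (struct sketch_ref *ref)
{
  struct sketch_ref *head = ref - ref->self;

  assert (head->self == 0);
  return head->u.clump;
}
```

Because refs[0] stores the back pointer in the same union that holds
the object, slot zero cannot itself be handed out as a live reference
in this sketch.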



From minor-owner@red-bean.com Mon Aug  4 06:21:06 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h74BL5qX017076
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Mon, 4 Aug 2003 06:21:05 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h74BL53N017074
	for minor-commits@red-bean.com; Mon, 4 Aug 2003 06:21:05 -0500
Date: Mon, 4 Aug 2003 06:21:05 -0500
Message-Id: <200308041121.h74BL53N017074@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 35 - trunk/arch/ia-32/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-04 06:21:03 -0500 (Mon, 04 Aug 2003)
New Revision: 35

Added:
   trunk/arch/ia-32/gc/gc-map.h
Log:
New arch-specific GC map parameter file.


Added: trunk/arch/ia-32/gc/gc-map.h
===================================================================
--- trunk/arch/ia-32/gc/gc-map.h	2003-08-04 11:20:33 UTC (rev 34)
+++ trunk/arch/ia-32/gc/gc-map.h	2003-08-04 11:21:03 UTC (rev 35)
@@ -0,0 +1,18 @@
+/* gc-map.h --- GC map parameters for the IA-32
+   Jim Blandy <jimb@red-bean.com> --- Aug 2003  */
+
+#ifndef MINOR_ARCH_GC_GC_MAP_H
+#define MINOR_ARCH_GC_GC_MAP_H
+
+/* Provide parameters for "generic-map.h".  See that file for
+   details.  */
+#define GC_MAP_LOG_OBJECT_ALIGN (3)
+#define GC_MAP_LOG_NUM_GENERATIONS (3)
+#define GC_MAP_LOG_PAGE_SIZE (12)
+#define GC_MAP_FIRST_LEVEL_BITS (10)
+#define GC_MAP_SECOND_LEVEL_BITS (10)
+#define GC_MAP_ADDRESS_BITS (32)
+
+#include "generic-map.h"
+
+#endif /* MINOR_ARCH_GC_GC_MAP_H */
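
The arithmetic behind the parameters above can be checked with a short
sketch.  The SKETCH_ names and the derived macros are assumptions made
for illustration; how generic-map.h actually derives its values is not
shown in this message.

```c
#include <assert.h>

/* The same values as the IA-32 parameters, under hypothetical names.  */
#define SKETCH_LOG_OBJECT_ALIGN     3
#define SKETCH_LOG_NUM_GENERATIONS  3
#define SKETCH_LOG_PAGE_SIZE        12
#define SKETCH_FIRST_LEVEL_BITS     10
#define SKETCH_SECOND_LEVEL_BITS    10
#define SKETCH_ADDRESS_BITS         32

/* Quantities one would expect to be derived from the logarithms.  */
#define SKETCH_OBJECT_ALIGN     (1u << SKETCH_LOG_OBJECT_ALIGN)
#define SKETCH_NUM_GENERATIONS  (1u << SKETCH_LOG_NUM_GENERATIONS)
#define SKETCH_PAGE_SIZE        (1u << SKETCH_LOG_PAGE_SIZE)

/* Compile-time check: the array size is -1, and compilation fails,
   unless the two index levels plus the page offset cover every
   address bit.  */
typedef char sketch_bits_cover_address
  [(SKETCH_FIRST_LEVEL_BITS + SKETCH_SECOND_LEVEL_BITS
    + SKETCH_LOG_PAGE_SIZE == SKETCH_ADDRESS_BITS) ? 1 : -1];

/* The highest-numbered generation is the immortal one.  */
static int
sketch_generation_is_immortal (unsigned gen)
{
  assert (gen < SKETCH_NUM_GENERATIONS);
  return gen == SKETCH_NUM_GENERATIONS - 1;
}
```

With these values there are eight generations, so generation 7 is the
immortal one, pages are 4096 bytes, and objects are aligned to 8 bytes.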



From minor-owner@red-bean.com Mon Aug  4 06:21:47 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h74BLkqX017102
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Mon, 4 Aug 2003 06:21:46 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h74BLkrD017100
	for minor-commits@red-bean.com; Mon, 4 Aug 2003 06:21:46 -0500
Date: Mon, 4 Aug 2003 06:21:46 -0500
Message-Id: <200308041121.h74BLkrD017100@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 36 - trunk/doc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-04 06:21:44 -0500 (Mon, 04 Aug 2003)
New Revision: 36

Added:
   trunk/doc/locks
Log:
Document the system's partial ordering for all locks.


Added: trunk/doc/locks
===================================================================
--- trunk/doc/locks	2003-08-04 11:21:03 UTC (rev 35)
+++ trunk/doc/locks	2003-08-04 11:21:44 UTC (rev 36)
@@ -0,0 +1,11 @@
+Here is the global ordering of mutexes within Minor.  If A and B are
+mutexes, "A < B" means that A must be locked before B, if both are to
+be held simultaneously.  In other words, a thread must not try to
+acquire mutex A if it is already holding mutex B; instead, it must
+release mutex B, acquire mutex A, and then re-acquire mutex B.
+
+The order in which locks appear here to the left of the "<" respects
+the overall partial ordering of locks.
+
+mn__thread_list_mutex < THREAD->mutex  (for any mn_thread structure *THREAD)
+mn__thread_list_mutex < mn__global_ref_group_mutex
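
The first rule above can be sketched with plain pthreads calls.  This
is a minimal illustration, not code from the tree: the sketch_ names
and the 'visits' counter are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for mn__thread_list_mutex.  */
static pthread_mutex_t sketch_thread_list_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for struct mn_thread.  */
struct sketch_thread
{
  pthread_mutex_t mutex;
  int visits;                   /* protected by MUTEX */
};

static int
sketch_thread_init (struct sketch_thread *t)
{
  t->visits = 0;
  return pthread_mutex_init (&t->mutex, NULL);
}

/* Take the list mutex first and the per-thread mutex second, as the
   ordering requires, and release them in the opposite order.
   Returns 0 on success, or a pthreads error number.  */
static int
sketch_visit (struct sketch_thread *t)
{
  int err = pthread_mutex_lock (&sketch_thread_list_mutex);

  if (err != 0)
    return err;
  err = pthread_mutex_lock (&t->mutex);
  if (err == 0)
    {
      t->visits++;
      pthread_mutex_unlock (&t->mutex);
    }
  pthread_mutex_unlock (&sketch_thread_list_mutex);
  return err;
}
```

A caller that already holds a thread's mutex and then finds it needs
the list mutex must drop the thread mutex first.  Anything that mutex
protects may change while it is released, so the caller should
re-check its state after re-acquiring it.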



From minor-owner@red-bean.com Tue Aug  5 00:22:32 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h755MWqX027259
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 5 Aug 2003 00:22:32 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h755MVN3027257
	for minor-commits@red-bean.com; Tue, 5 Aug 2003 00:22:31 -0500
Date: Tue, 5 Aug 2003 00:22:31 -0500
Message-Id: <200308050522.h755MVN3027257@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 37 - trunk/doc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-05 00:22:28 -0500 (Tue, 05 Aug 2003)
New Revision: 37

Removed:
   trunk/doc/notes
Log:
* doc/notes: Delete.  Everything in here is now in 'design'.


Deleted: trunk/doc/notes
===================================================================
--- trunk/doc/notes	2003-08-04 11:21:44 UTC (rev 36)
+++ trunk/doc/notes	2003-08-05 05:22:28 UTC (rev 37)
@@ -1,82 +0,0 @@
-Steps:
-
-- Read Matt Flatt paper, as it relates to modules and environments
-- GC
-- interpreter for core Scheme, using the GC's C API
-- Minor Scheme, in core Scheme, C primitives
-- GC interface for machine code (loading; root tracking)
-- assembler
-- Compiler, in Minor Scheme
-- ELF library, for separate compilation
-
-
-Tactics:
-- runs on ia32 Linux first, then x86-64, IA64
-- multi-threaded, generational GC
-- JIT
-- machine code verifier, to check JIT
-- produces ELF .o files that the standard Linux linker can turn
-  into executables
-- benchmark suite
-- test suite
-- automated nightly testing and benchmarking
-
-Features:
-- Full numeric tower (loads gmp only when needed)
-- error messages include full source locations, with macro tracebacks
-- Dybvig/Waddell macros
-- Flatt-style phase model (can we have D/W modules with this?)
-- API for working with ELF .o files (cross-clean)
-- API for working with assembly code and objects (cross-clean)
-  - in memory, and in ELF .o files
-- Linker (just exec's Linux "ld")
-- can generate stand-alone executables, that don't depend on the JIT,
-  assembler, linker, etc.
-- JNI-like C interface 
-- Unicode strings
-- lexer and parser generator
-- interface generator
-  - system calls
-  - Gtk
-- manual
-  - tutorial
-  - reference (like GNU C library manual)
-  - IA32 ABI
-- profiling, with oprofile kernel interface
-
-
-
-Build process
-
-- multi-threaded, generational GC, in C, to be shared by C interpreter
-  and native code compiler; C interpreter uses GC's C API
-- Bootstrap language tower:
-  - core Scheme interpreter, written in C (okay if not too fast)
-  - Minor Scheme interpreter, written in core Scheme (okay if not too fast)
-  - Minor Scheme JIT toolchain, written in Minor Scheme:
-    - JIT compiler
-    - assembler
-    - runtime library
-  - compile JIT toolchain to .o files with interpreted compiler, no optimizing
-  - link with GC to produce stand-alone executable, sans core Scheme
-  - compile JIT toolchain to .o files with compiled compiler, and
-    check against the .o files produced by the interpreted compiler;
-    should be identical
-  - compile JIT toolchain to .o files with optimizations turned on
-  - link again to produce stand-alone executable
-  - recompile JIT with optimized JIT, with optimizations turned on
-  - check against the .o files produced by the optimized compiler;
-    should be identical
-
-
-Boxed values
-
-variable-length lowtags:
-
-vvv.........00 - fixnum, vvv... are 2's complement value
-vvv...ttttt001 - immediate types, ttttt is more type info, vvv is value
-ppp........010 - pair
-ppp........011 - symbol
-ppp........101 - procedure
-ppp........110 - string
-ppp........111 - other object type, first word is type tag



From minor-owner@red-bean.com Tue Aug  5 01:32:41 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h756WeqX029970
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 5 Aug 2003 01:32:40 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h756WePW029968
	for minor-commits@red-bean.com; Tue, 5 Aug 2003 01:32:40 -0500
Date: Tue, 5 Aug 2003 01:32:40 -0500
Message-Id: <200308050632.h756WePW029968@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 38 - trunk/include/minor
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-05 01:32:37 -0500 (Tue, 05 Aug 2003)
New Revision: 38

Modified:
   trunk/include/minor/minor.h
Log:
* include/minor/minor.h: To do item: can signal handlers operate on
heap objects?


Modified: trunk/include/minor/minor.h
===================================================================
--- trunk/include/minor/minor.h	2003-08-05 05:22:28 UTC (rev 37)
+++ trunk/include/minor/minor.h	2003-08-05 06:32:37 UTC (rev 38)
@@ -825,3 +825,8 @@
 
 
 #endif /* MINOR_H */
+
+/* To do:
+
+   Can signal handlers operate on heap objects?  How can a signal
+   handler get an mn_call?  */



From minor-owner@red-bean.com Tue Aug  5 01:45:11 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h756jBqX030423
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 5 Aug 2003 01:45:11 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h756jAvp030421
	for minor-commits@red-bean.com; Tue, 5 Aug 2003 01:45:10 -0500
Date: Tue, 5 Aug 2003 01:45:10 -0500
Message-Id: <200308050645.h756jAvp030421@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 39 - trunk/include/minor
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-05 01:45:08 -0500 (Tue, 05 Aug 2003)
New Revision: 39

Modified:
   trunk/include/minor/minor.h
Log:
* include/minor/minor.h: Update reference to description of leases.


Modified: trunk/include/minor/minor.h
===================================================================
--- trunk/include/minor/minor.h	2003-08-05 06:32:37 UTC (rev 38)
+++ trunk/include/minor/minor.h	2003-08-05 06:45:08 UTC (rev 39)
@@ -440,7 +440,7 @@
 
 /* If it's important to avoid copying, then we could introduce a
    lease-based interface here.  Leases are described in the file
-   LEASES, at the top of the Minor source tree.  */
+   doc/leases.  */
 
 /* Return a Minor string object whose contents are the same as the
    null-terminated ISO 8859-1 string STR.  This makes a copy of



From minor-owner@red-bean.com Tue Aug  5 02:25:33 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h757PWqX031722
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 5 Aug 2003 02:25:32 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h757PWv0031720
	for minor-commits@red-bean.com; Tue, 5 Aug 2003 02:25:32 -0500
Date: Tue, 5 Aug 2003 02:25:32 -0500
Message-Id: <200308050725.h757PWv0031720@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 40 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-05 02:25:29 -0500 (Tue, 05 Aug 2003)
New Revision: 40

Modified:
   trunk/gc/refs.h
Log:
* gc/refs.h: More thinking.  Looks good now, wonder how it'll look with sleep.


Modified: trunk/gc/refs.h
===================================================================
--- trunk/gc/refs.h	2003-08-05 06:45:08 UTC (rev 39)
+++ trunk/gc/refs.h	2003-08-05 07:25:29 UTC (rev 40)
@@ -10,6 +10,76 @@
 #include "minor/minor.h"
 #include "tagged.h"
 
+/* To do:
+
+   We want to protect C->Scheme and Scheme->C calls with a mutex.  For
+   a Scheme->C call (say), the Scheme code would acquire the mutex,
+   make the transition to C code, and then the C code has to release
+   the mutex.  Since the same mutex is being accessed by both C and
+   Scheme, we need to share an implementation, which implies calling
+   the same shared library functions.  But isn't the call to acquire
+   the mutex in the first place a Scheme->C call?  So we have to do a
+   Scheme->C call in order to do a Scheme->C call!  Infinite
+   regress!
+
+   But a thread running Scheme code should be able to call (say)
+   pthread_mutex_lock and still let a collecting thread find the
+   information it needs.  We know that C frame isn't going to
+   introduce any new references to heap objects, so as long as the
+   collecting thread can still send the mutator thread a signal and
+   find the Scheme continuation, it's just as if we'd interrupted
+   Scheme code directly.
+
+   So a mutator Scheme thread wanting to call a C function like
+   pthread_mutex_lock just needs to:
+
+   - in annotated code, save the information needed to reconstruct its
+     Scheme continuation, and set a flag in the per-thread structure,
+
+   - call the C function, and finally,
+
+   - in annotated code, clear the flag.
+
+   By "Scheme continuation", I just mean all the live registers.  If
+   this whole process is wrapped up as a Scheme function, then we're
+   at a function entry point, so our conservative approximation of the
+   live registers would be the callee-saves registers.  (If you think
+   straight, you can see that the PC and the SP are always
+   callee-saves registers.)
+
+   Saving the continuation, and setting and clearing the flag must be
+   done in assembly code, for several reasons:
+   - it involves working with registers directly,
+   - it needs to be annotated, and
+   - it needs to be sure that the stores are done in the right order,
+     so the signal handler can be sure to see them.  (If we were
+     writing in C, the equivalent carefulness would be putting
+     'volatile' on all the right objects.)
+
+   The job of the mutator's handler for the collection signal is to
+   give the collecting thread a Scheme continuation it can start from
+   to find all the thread's heap references.  So the signal handler
+   should check the flag.  If the flag is set, it should use the saved
+   Scheme continuation.  If it is clear, then it should use its
+   sigcontext to construct a continuation: the annotations for the PC
+   say which registers contain heap references.
+
+   But this gives us a mutex-less calling protocol for functions like
+   pthread_mutex_lock.  Why can't we use it for everything?
+
+   We need the mutex to protect the stack of mn_call objects, and the
+   ref structures they point to: those are structures that communicate
+   between the mutator's C code and the collector, so there should be
+   a synchronization mechanism covering them.
+
+   But to call a function that doesn't expect an mn_call argument, and
+   thus can't refer to the heap at all, you don't need any of that.
+
+   We could clarify the multiple-readers, single-writer mess below by
+   simply saying, "You always have to hold this mutex to do anything
+   to these structures," and then noting the exception for the owning
+   thread reading local ref structures.  */
+
 
 /* Requesting a garbage collection.  */
 
@@ -33,7 +103,7 @@
      instruction address, indicating which registers hold references
      at that point in the code.  We provide similar maps for the stack
      frame and the allocation area.  Machine code covered by these
-     maps is called "mapped code".
+     maps is called "annotated code".
 
    - For everything else, each thread has a chain of mn_call
      structures, each of which has a ref_group structure, each of
@@ -41,7 +111,7 @@
      one tagged_t value.  Unless it holds THREAD->mutex, these are the
      only tagged_t values allowed to be live in a thread.
 
-   THREAD->in_mapped_code indicates which case applies.
+   THREAD->in_annotated_code indicates which case applies.
 
    The THREAD structure, its mn_call structures, and so on have only
    one writer: only the thread itself is allowed to push and pop
@@ -63,8 +133,8 @@
    Before the collecting thread can begin, it must first acquire every
    mutator thread's THREAD->mutex, and its own.  This ensures that
    either the mutator thread has no tagged_t values other than its
-   references, or that it's executing mapped code.  If the mutator is
-   executing mapped code, then THREAD->in_mapped_code is set; the
+   references, or that it's executing annotated code.  If the mutator is
+   executing annotated code, then THREAD->in_annotated_code is set; the
    collecting thread must:
    - send the mutator thread a signal,
    - wait for the signal to be received,
@@ -230,7 +300,7 @@
 
   /* You must hold this mutex:
      - while you have any tagged_t value live,
-     - whenever writing to in_mapped_code, and
+     - whenever writing to in_annotated_code, and
      - whenever required to by the reference bookkeeping rules.  */
   pthread_mutex_t mutex;
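
The flag protocol described in the "To do" comment of this revision
can be sketched in C.  The comment says the real thing must be
assembly; here 'volatile' stands in for that care, as the comment
itself suggests.  All sketch_ names are hypothetical, and a
"continuation" is reduced to a PC and an SP for illustration.

```c
#include <assert.h>
#include <signal.h>

/* A Scheme continuation, reduced to two registers for the sketch.  */
struct sketch_regs
{
  unsigned long pc, sp;
};

/* Hypothetical per-thread state shared with the signal handler.  */
struct sketch_thread_state
{
  /* Nonzero while the thread is inside a plain C call.  */
  volatile sig_atomic_t continuation_saved;
  /* Meaningful only while continuation_saved is nonzero.  */
  volatile unsigned long saved_pc, saved_sp;
};

/* Before calling C: store the continuation first, then set the flag,
   so a handler that sees the flag also sees the saved registers.  */
static void
sketch_enter_c (struct sketch_thread_state *t, struct sketch_regs live)
{
  t->saved_pc = live.pc;
  t->saved_sp = live.sp;
  t->continuation_saved = 1;
}

/* After the C call returns: clear the flag.  */
static void
sketch_leave_c (struct sketch_thread_state *t)
{
  t->continuation_saved = 0;
}

/* What the collection-signal handler would do: use the saved
   continuation if the flag is set, otherwise the registers it
   interrupted (the sigcontext, in the real system).  */
static struct sketch_regs
sketch_pick_continuation (const struct sketch_thread_state *t,
                          struct sketch_regs interrupted)
{
  struct sketch_regs result = interrupted;

  if (t->continuation_saved)
    {
      result.pc = t->saved_pc;
      result.sp = t->saved_sp;
    }
  return result;
}
```

No mutex appears anywhere in this path, which is why the protocol
suits functions such as pthread_mutex_lock that cannot touch the heap.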
 



From minor-owner@red-bean.com Wed Aug  6 22:40:25 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h773ePqX007667
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Wed, 6 Aug 2003 22:40:25 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h773eOno007664
	for minor-commits@red-bean.com; Wed, 6 Aug 2003 22:40:24 -0500
Date: Wed, 6 Aug 2003 22:40:24 -0500
Message-Id: <200308070340.h773eOno007664@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 41 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-08-06 22:40:21 -0500 (Wed, 06 Aug 2003)
New Revision: 41

Modified:
   trunk/gc/refs.h
Log:
* gc/refs.h: Notes about how to simplify the GC's interface.


Modified: trunk/gc/refs.h
===================================================================
--- trunk/gc/refs.h	2003-08-05 07:25:29 UTC (rev 40)
+++ trunk/gc/refs.h	2003-08-07 03:40:21 UTC (rev 41)
@@ -10,6 +10,30 @@
 #include "minor/minor.h"
 #include "tagged.h"
 
+/* Restructure:
+
+   The GC shouldn't care about calls; its interface should simply be
+   functions that create and destroy ref groups.
+
+   The GC should have a global list of threads that might be running
+   Scheme code.  Each thread should have a flag indicating whether:
+
+   - its current Scheme continuation is one explicitly saved
+     (i.e. we're running C code at the moment, and that C code uses
+     refs for everything), or
+
+   - its current registers are its continuation (i.e. start with
+     regs in the sigcontext)
+
+   We can then build the mn_call stuff on top of this abstraction.
+
+   It's true that the global list of threads, with each thread's list
+   of calls and their associated ref groups, plus the single global
+   ref group, contains the same information that the global list of
+   ref groups does.  That is, you can find all ref groups either way.
+   Big deal; it's worth duplicating a few pointer fields to 1) reduce
+   the complexity of the GC, and 2) simplify the locking rules.  */
+
 /* To do:
 
    We want to protect C->Scheme and Scheme->C calls with a mutex.  For



From minor-owner@red-bean.com Tue Sep  2 01:07:05 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h82675nd002512
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 2 Sep 2003 01:07:05 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h82674pv002510
	for minor-commits@red-bean.com; Tue, 2 Sep 2003 01:07:04 -0500
Date: Tue, 2 Sep 2003 01:07:04 -0500
Message-Id: <200309020607.h82674pv002510@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 46 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-02 01:07:01 -0500 (Tue, 02 Sep 2003)
New Revision: 46

Modified:
   trunk/gc/roots.c
Log:
* gc/roots.c: Doc fixes.  Simplify refs (doubling their size).
Functions to allocate and free refs.  Don't use pthread_getspecific in
handle_wait_signal; it's not async-signal-safe.  Post to the semaphore before
returning from handle_wait_signal, to sync memory.
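
The handler discipline this log message describes can be sketched as
follows.  It is a hedged illustration only: the sketch_ names are
hypothetical, SIGUSR1 is an arbitrary choice, and for brevity the
thread signals itself, whereas a real collector would signal a
different thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>

/* Hypothetical semaphore the handler posts to.  */
static sem_t sketch_signal_seen;

/* The handler makes only async-signal-safe calls.  POSIX lists
   sem_post as safe to call in a handler; pthread_getspecific is not
   on that list.  */
static void
sketch_handle_wait_signal (int signo)
{
  int saved_errno = errno;

  (void) signo;
  sem_post (&sketch_signal_seen);
  errno = saved_errno;
}

/* Install the handler, signal the calling thread, and wait until the
   handler has posted.  Returns 0 on success, -1 on failure.  */
static int
sketch_signal_and_wait (void)
{
  struct sigaction sa;

  if (sem_init (&sketch_signal_seen, 0, 0) != 0)
    return -1;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = sketch_handle_wait_signal;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGUSR1, &sa, NULL) != 0)
    return -1;
  if (pthread_kill (pthread_self (), SIGUSR1) != 0)
    return -1;
  /* sem_wait can be interrupted by a signal; retry in that case.  */
  while (sem_wait (&sketch_signal_seen) != 0)
    if (errno != EINTR)
      return -1;
  sem_destroy (&sketch_signal_seen);
  return 0;
}
```

POSIX counts sem_post and sem_wait among the functions that
synchronize memory, so the waiter sees whatever the handler wrote
before posting.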


Modified: trunk/gc/roots.c
===================================================================
--- trunk/gc/roots.c	2003-08-25 07:18:53 UTC (rev 45)
+++ trunk/gc/roots.c	2003-09-02 06:07:01 UTC (rev 46)
@@ -1,90 +1,40 @@
 /* roots.c --- tracking roots for garbage collection
    Jim Blandy <jimb@red-bean.com> --- August 2003  */
 
-/* The purpose of a garbage collector is to free the storage
-   associated with objects that the computation will no longer use:
-   doing so can't do any harm.
+/* To begin a collection, the first task is to find each thread's PC,
+   and see what it's executing.  We use the following protocol for
+   this:
 
-   But it's impossible to do this job perfectly: in some cases, the
-   only way to tell tell whether a computation will use a given object
-   is to run the computation to completion.  But by that point any
-   memory we might reclaim would never be re-used anyway, so freeing
-   it would be useless.
-
-   So we use an approximation: an object is live only if there are any
-   other live objects pointing to it.  An object referred to only by
-   objects that will never be used will itself never be used, so it
-   must be okay to free it.
-
-   But that definition is circular: you can't say whether an object is
-   live until you know if any of the objects pointing to it are live.
-
-   So we add some exceptions: a root is an object that is live, simply
-   by definition.  Starting from these axiomatically live objects and
-   looking out at the other objects they refer to, the objects *they*
-   refer to, and so on, we can find all the live objects in memory.
-
-   The only real roots in the system are the program counter registers
-   of every running thread.  Each thread's PC points into a block of
-   machine instructions, and from there all hell breaks loose:
-
-   - Any register used by any instruction we will execute in that
-     block is live.  We relax this a bit, and say all registers are
-     live.  (Register values may not always refer to heap objects, so
-     they may not be relevant to collection, but that's a separate
-     issue.)
-
-   - Any global variable referred to by any instruction we will
-     execute is live.  We relax this a bit, too, and say that if the
-     current block contains any instruction that refers to a global
-     variable, then that variable is live --- even if that instruction
-     will never be reached.
-
-   - Any other blocks of code the current block could jump to are
-     live.  We relax this in the same way, and say that if the current
-     block contains a jump instruction to any other block, then the
-     target block is live, even if the jump instruction will never be
-     reached.
-
-   - The stack pointer points to a stack frame, which we treat like
-     any other heap object.  It may have slots that point to other
-     heap objects.  It probably also has a continuation --- a return
-     address and a stack frame --- which should be considered live.
-
-   So, to begin a collection, the first task is to find each thread's
-   PC, and see what it refers to.  We use the following protocol for
-   that:
-
    - Every function in the C API requires an mn_call object, except
      for two: mn_thread_first_call and mn_init.  This means that we
      can use those functions to maintain a list of all the threads
      that could possibly operate on references.  For each such thread,
      we register handlers for two signals, gc_wait_signal and
-     gc_resume_signal.  We use the 'sigaction' system call to
-     ensure that gc_resume_signal is blocked when
-     gc_wait_signal's handler is called.
+     gc_resume_signal.  We use the 'sigaction' system call to ensure
+     that gc_resume_signal is blocked when gc_wait_signal's handler is
+     called.
 
    - A thread wishing to perform a collection acquires mn__gc_mutex,
      the global GC mutex.  This is also the mutex that protects the
      global thread list.
 
-   - The collecting thread walks the thread list, sending each thread
-     gc_wait_signal.
+   - The collecting thread walks the thread list, sending every thread
+     (other than itself) gc_wait_signal.
 
-   - The signalled thread's handler for gc_wait_signal receives a
-     pointer to a sigcontext structure as one of its arguments; this
-     contains the register values of the interrupted code.  The
-     handler takes the following steps:
+   - Each signalled thread's handler for gc_wait_signal
+     receives a pointer to a sigcontext structure as one of its
+     arguments; this structure contains the register values of the
+     interrupted code.  The handler takes the following steps:
 
      - It stores a pointer to this sigcontext in its per-thread
-       structure, where the GC can find it.
+       structure, where the collecting thread can find it.
 
      - It does a sem_post on thread_waiting_semaphore, to tell the GC
-       that it has stored its sigcontext pointer.
+       that it has stored its sigcontext pointer in the thread
+       structure.
 
-     - It does a sigsuspend, with every signal but
-       gc_resume_signal blocked.  That signal has a trivial
-       handler.
+     - It does a sigsuspend, with every signal but gc_resume_signal
+       blocked.  That signal has a trivial handler.
 
      - When the sigsuspend returns, the gc_wait_signal handler
        returns.
@@ -102,37 +52,41 @@
      stored its sigcontext in its thread structure, and we can find its
      PC.
 
+   After posting to thread_waiting_semaphore, but before it is sent
+   gc_resume_signal, a thread is considered to be "waiting for
+   collection".  We use this term in describing the rules for
+   accessing some of the fields in the structures below.
+
    Now, in order to map out these blocks of code, find out which
-   global variables they refer to, which other blocks they jump to,
-   and how the stack frames are laid out, we need extensive
+   global variables they refer to, which other code blocks they jump
+   to, and how the stack frames are laid out, we need extensive
    annotations from the compiler.  The Minor compiler provides these
    annotations, but the C compiler does not, so we need to handle C
    code specially.
 
    The only pointers to heap objects C code is allowed to have are
-   those in mn_ref objects.  Since even the most trivial operation on
-   those pointers gives the C compiler freedom to load them into
-   registers, make derived values, exclusive-or them with 45, do
-   magic, and then exclusive-or them back, etc., this rule basically
-   means that C code can't operate on them at all.
+   those in mn_ref objects.  This makes things simple.  There are no C
+   global variables pointing into the heap.  Registers, as used by C
+   functions, don't point into the heap either: they point at refs.
+   As do stack frames.  All our difficult problems are gone.
 
-   This makes things simple.  There are no C global variables pointing
-   into the heap.  Registers, as used by C functions, don't point into
-   the heap either.  Nor do stack frames.  All our difficult problems
-   are gone.  (Of course, C global vars, stack frames, and registers
-   can point to mn_refs; that's legit.)
+   But since even the most trivial operation on those pointers gives
+   the C compiler freedom to load them into registers, make derived
+   values, exclusive-or them with 45, do magic, and then exclusive-or
+   them back, etc., this rule basically means that C code can't
+   operate on them at all.
 
-   But it's still a bit harsh, so we relax it a bit.  We allow code
-   internal to the Minor C library to set a flag in the thread
-   structure, "incoherent", indicating that it's operating on heap
-   object references directly.  This flag is volatile, and has type
+   That's a bit harsh, so we relax it a bit.  We allow code internal
+   to the Minor C library to set a flag in the thread structure,
+   "incoherent", indicating that it's operating on heap object
+   references directly.  This flag is volatile, and has type
    sig_atomic_t, so the gc_wait_signal signal handler can check it,
    before it does anything else.  If it is set, then the handler sets
-   the thread's wait_when_coherent_again flag (also volatile and
+   the thread's collection_waiting flag (also volatile and
    sig_atomic_t), and returns.  When the interrupted code is finished
    operating on pointers to heap objects, and they are all safely
    packed away in mn_ref objects again, it clears the incoherent flag.
-   Then, if the wait_when_coherent_again flag is set, then it sends
+   Then, if the collection_waiting flag is set, it sends
    itself a gc_wait_signal.
 
    This is nice, because it means that C code can work on heap
@@ -146,8 +100,29 @@
    - when we need to allocate or free global references, to protect
      the shared data structures managing global references; and
    - when a thread gets its initial mn_call object, or when it dies,
-     to protect the global thread list.  */
+     to protect the global thread list.
 
+   The rules for when the various structure fields may be accessed are
+   horribly complex.  All these 'volatile' annotations, rules for
+   where hoist and drop barriers need to go, and so on, are hard to
+   keep track of.
+
+   But I think it all follows from the decision to not require the
+   user's C code to call a safe-point "check for gc" function
+   periodically, with collections blocking indefinitely if they fail
+   to do so.  In most cases, the user probably doesn't even have
+   control over all the libraries their program will be using, so they
+   can't make the safe point calls that would be needed.  Furthermore,
+   requiring safe point calls is an ongoing maintenance burden:
+   keeping track of which loops need safe point calls, and whether
+   each modification changes the status of some existing loop, is too
+   hard.
+
+   But given the decision not to require safe points, I think it
+   follows that one needs to use signals to gather threads' state.
+   And given that, one needs to worry about code reordering --- thus
+   the 'volatile' qualifiers and the hoist/drop barriers.  */
+
 #include <assert.h>
 #include <stdbool.h>
 #include <signal.h>
@@ -155,13 +130,23 @@
 #include <semaphore.h>
 #include "minor/minor.h"
 #include "roots.h"
+#include "tagged.h"
 
 
+/* Global mutexes protecting the collector's structures.  */
+
+
+/* This mutex protects various structures, including the thread list
+   and the global ref list.  See the comments for the individual
+   structures for details.  */
+static pthread_mutex_t mn__gc_mutex;
+
+
 /* The thread list.  */
 
-/* A thread that has at least one mn_call.  Since every function in
-   the C API that operates on mn_refs requires an mn_call argument,
-   there is one of these for every thread that could have any local
+/* A thread that has ever had an mn_call.  Since every function in the
+   C API that operates on mn_refs requires an mn_call argument, there
+   is one of these for every thread that could have any local
    refs.  */
 struct mn_thread
 {
@@ -174,25 +159,64 @@
      to access this field.  */
   pthread_t thread;
 
-  /* The youngest call in this thread.  */
-  mn_call *youngest_call;
+  /* The following fields may only be accessed by THREAD itself ---
+     either by ordinary code, or the gc_wait_signal handler.  */
 
   /* True if this thread has any live pointers to heap objects, other
-     than in mn_refs, false otherwise.  */
+     than in mn_refs; false otherwise.  Ordinary code sets and clears
+     this; the signal handler reads it.  */
   volatile sig_atomic_t incoherent;
 
   /* True if this thread should send itself gc_wait_signal once it's
-     coherent again (i.e., after clearing 'incoherent').  */
-  volatile sig_atomic_t wait_when_coherent_again;
+     coherent again (i.e., after clearing 'incoherent').  The signal
+     handler sets this; ordinary code reads and clears it.  */
+  volatile sig_atomic_t collection_waiting;
 
-  /* When waiting, a sigcontext structure giving this thread's current
-     registers.  At other times, this is garbage.  */
+  /* When waiting for collection, this is a sigcontext structure
+     giving the values of the thread's registers when it received the
+     gc_wait_signal.  At other times, this is garbage.
+
+     This field may only be accessed by THREAD's gc_wait_signal
+     handler when the thread is not waiting for collection, and by the
+     collecting thread while it is.  */
   struct sigcontext *regs;
+
+  /* The youngest call in this thread.
+
+     This field, and the structures it refers to, may only be modified
+     by the thread's ordinary code while the 'incoherent' flag is set.
+
+     If the thread is going to make any changes to non-volatile fields
+     of those structures, there must be hoist and drop barriers inside
+     the set and clear of 'incoherent', to prevent the compiler from
+     moving the instructions that modify the structure outside the
+     instructions that set and clear 'incoherent'.
+
+     Note that the ref's 'obj' field is volatile, so accesses to it
+     won't be reordered with respect to assignments to the (also
+     volatile) 'incoherent' flag.  So if you are just accessing that,
+     and not freeing or allocating refs, you don't need any hoist or
+     drop barriers.  */
+  mn_call *youngest_call;
 };
 
 
+/* The current thread's structure.
+
+   Note that you can't replace this with a POSIX thread-specific
+   value; pthread_getspecific isn't async-safe, and handle_wait_signal
+   needs to be able to find its thread structure.  Boehm has a hash
+   table mapping pthread_t values onto his thread structures, and he
+   knows the hash table isn't being modified while he's stopping
+   threads, so his signal handler just calls pthread_self (which isn't
+   async-signal-safe) and looks itself up.  */
+static __thread struct mn_thread *self;
+
 /* The head of the global list of all threads.  */
-static struct mn_thread thread_list;
+static struct mn_thread thread_list = {
+  &thread_list, &thread_list,
+  0, 0, 0, 0, 0
+};
 
 
 /* We use this key's destructor function to remove the thread from the
@@ -201,63 +225,86 @@
 
 
 
-/* References and reference clumps.  */
+/* References, reference clumps, and reference groups.  */
 
 
-/* A Minor reference.  */
+
+
+/* A Minor reference.
+
+   At the moment, this is four words long.  I came up with a way to
+   make it fit in two words, but it's hairy, and I don't know for sure
+   that it's necessary, so I took it out.
+
+   (The idea was to make the allocated and free lists per-clump; that
+   way, if NUM_REFS_PER_CLUMP is 1024, you only need ten bits for
+   'next' and 'prev'; they can be bitfields.  Then, you need a way to
+   find the appropriate allocated and free lists for a given ref.  So
+   you put 'obj' in a union with 'struct ref_clump *clump', and then
+   make each clump's refs[0] special: refs[0].u.clump points to the
+   'struct clump' that contains it.  (refs[0] in each clump is
+   dedicated to holding this pointer; you don't use it as a normal
+   ref.)  Now, add another 10-bit field, 'self', which holds each
+   ref's index in the 'refs' array; this means that
+   R[-R->self].u.clump is the clump containing the reference R.  Three
+   ten-bit bitfields fit in a single 32-bit word.
+
+   Anyway, you can see why I took it out.)  */
+
 struct mn_ref
 {
-  /* These fields can be accessed without synchronization, but you
-     must ensure that nobody will ever see an incomplete
-     assignment.  */
-  union
-  {
-    /* If this is a live reference, the object we refer to, as a tagged
-       value.  */
-    tagged_t obj;
+  /* The object we refer to, as a tagged value (assuming this
+     reference is allocated).
 
-    /* If we are clump->refs[0], this is a pointer to the clump that
-       contains us.  */
-    struct mn_clump *clump;
-  } u;    
+     You must have your 'incoherent' flag set while accessing this.
+     If this is a global ref, it's up to the user's code to ensure that
+     two threads don't read or write a ref at the same time.
 
-  /* True iff this is a global reference.  You may always read this,
-     without holding any locks.  This should only be written when a
-     ref is being allocated, in which case nobody else should be able
-     to see it anyway.  */
-  bool global : 1;
+     This is volatile, to keep compilers from moving reads and
+     writes to this around reads and writes to the thread's
+     'incoherent' flag, which is also volatile.  */
+  volatile tagged_t obj;
 
-  /* If this is a global ref, you must hold mn__gc_mutex to access
-     these fields.  If this is a local ref, then only the owning
-     thread should ever access it, unless it's been stopped for GC.  */
+  /* The group this reference belongs to.  We use this to distinguish
+     local and global refs, and to find the right allocated / free
+     lists.
 
-  /* The index of this ref in the clump's REFS array.  So if REF is a
-     mn_ref *, then the clump containing *REF must be
-     ref[-ref->self]->u.clump.  This is how we find the clump's free
-     list.  */
-  unsigned self : 10;
+     You may always read this, without holding any locks.  This should
+     only be changed when a ref is being allocated or deallocated, in
+     which case nobody else should be able to see it anyway.  */
+  struct ref_group *group;
 
-  /* All the allocated references in a clump are in a doubly-linked
-     list, headed by the REFS[0] and chained through these fields.
-     This allows the GC to traverse only the live references in a
-     clump, ignoring free references, while still allowing a reference
-     to be freed quickly.
+  /* All the allocated references in a ref group G are in a
+     doubly-linked list, headed by G->allocated and chained through
+     these fields.  This allows the GC to traverse only the live
+     references in a group, ignoring free references, while still
+     allowing a reference to be freed quickly.
 
-     All the free references in a clump are in a singly-linked list,
-     chained through NEXT.  This allows us to find a free reference to
-     allocate in constant time.  */
-  unsigned next : 10, prev : 10;
+     All the references in a group that have been allocated before,
+     but are free at the moment, are in a singly-linked list, headed
+     by G->free and chained through the 'next' field.  This allows us
+     to find previously freed refs in constant time.
+
+     (There may also be some refs that have never been allocated at
+     all, at the end of the first clump; see the 'first_never_used'
+     field of 'struct ref_clump'.)
+
+     If this is a global ref, you must hold mn__gc_mutex to access
+     these fields.  If this is a local ref, then only the owning
+     thread should ever access it, unless it's been stopped for
+     GC.  */
+  struct mn_ref *next, *prev;
 };
 
 
-/* NUM_REFS_PER_CLUMP needs to fit in one third of a word, like the
-   'next', 'prev', and 'self' fields of a 'struct mn_ref'.  */
 #define NUM_REFS_PER_CLUMP (1024)
 
-/* A clump of references.  We allocate references in bulk.  If this
-   clump holds global references, then you must hold mn__gc_mutex to
-   access this structure.  Otherwise, it holds local references, and
-   only the owning thread should ever touch it.  */
+
+/* A clump of references.  We allocate references a clump at a time.
+   If this clump holds global references, then you must hold
+   mn__gc_mutex to access this structure.  Otherwise, it holds local
+   references, and you must set your 'incoherent' flag while accessing
+   it.  */
 struct ref_clump
 {
   /* All the reference clumps belonging to a particular reference
@@ -265,26 +312,140 @@
      reference group, we free each clump in this list.  */
   struct ref_clump *next;
 
-  /* The references that live in this clump.  REFS[0] is the head of
-     the doubly-linked list of allocated references.  */
+  /* The references that live in this clump.  */
   mn_ref refs[NUM_REFS_PER_CLUMP];
 
-  /* The index of the first free ref in REFS, or zero if there are
-     none.  */
-  short first_free;
+  /* The index of the first ref in REFS that has never been allocated.
+     All refs from this point to the end of REFS are free.  If this is
+     NUM_REFS_PER_CLUMP, then every ref in this clump was allocated at
+     some point.
+
+     This must be NUM_REFS_PER_CLUMP in all but the first clump in the
+     group's clump list.  We should never need to allocate a new clump
+     if there are any other clumps containing virgin refs, which means
+     there should only ever be one such clump, which means we can
+     stipulate that it stay at the front of the list.  */
+  short first_never_used;
 };
 
 
-/* The first in the list of clumps holding global references.  You
-   must hold mn__gc_mutex to access this.  */
-static struct ref_clump *global_ref_clumps;
+/* A reference group.  */
+struct ref_group
+{
+  /* All the clumps belonging to this reference group.  */
+  struct ref_clump *clumps;
 
+  /* The head of the free list.  This is a singly-linked list, chained
+     through the refs' 'next' fields.
+
+     Note that this only includes references that have been allocated
+     before, but are free at the moment.  There may also be some refs
+     that have never been allocated at all, at the end of the first
+     clump; see the 'first_never_used' field of 'struct ref_clump'.  */
+  mn_ref *free;
+
+  /* The head of the allocated list.  */
+  mn_ref allocated;
+};
+
+
+/* The group containing all global references.  You must hold
+   mn__gc_mutex to access any of the structures this points to, except
+   for the refs' 'group' and 'obj' fields.  */
+static struct ref_group *global_refs;
+
+
+/* Allocate a fresh reference to OBJ in group G.
+
+   You must have your 'incoherent' flag set (since you're holding a
+   direct reference to the heap), and if G is the global reference
+   group, you must hold mn__gc_mutex.  */
+static mn_ref *
+make_ref (struct ref_group *g, tagged_t obj)
+{
+  mn_ref *r;
+
+  /* Are there any refs on the free list?  */
+  if (g->free)
+    {
+      /* Yes, peel off the one on the front.  */
+      r = g->free;
+      g->free = r->next;
+    }
+
+  /* Does the first clump have any virgin refs?  */
+  else if (g->clumps && g->clumps->first_never_used < NUM_REFS_PER_CLUMP)
+    {
+      /* Yes, peel off the next virgin ref.  */
+      r = &g->clumps->refs[g->clumps->first_never_used++];
+    }
+  else
+    {
+      /* Allocate and initialize a new reference clump, add it to the
+         group's clump list, and grab the first reference from it.  */
+      struct ref_clump *new
+        = (struct ref_clump *) mn__gc_xmalloc (sizeof (*new));
+
+      new->next = g->clumps;
+      g->clumps = new;
+      r = &new->refs[0];
+      new->first_never_used = 1;
+    }      
+
+  /* Add R to the allocated list.  */
+  r->next = g->allocated.next;
+  r->prev = &g->allocated;
+  g->allocated.next->prev = r;
+  g->allocated.next = r;
+  r->group = g;
+  r->obj = obj;
+  return r;
+}
+
+
+/* Free the reference R.  
+
+   You must have your 'incoherent' flag set, and if R is a global
+   reference, you must hold mn__gc_mutex.  */
+static void
+free_ref (mn_ref *r)
+{
+  struct ref_group *g = r->group;
+
+  /* Remove R from its allocated list.  */
+  r->next->prev = r->prev;
+  r->prev->next = r->next;
+
+  /* Add R to its free list.  */
+  r->next = g->free;
+  g->free = r;
+}
+
+
+/* Free the reference group G.  */
+static void
+free_ref_group (struct ref_group *g)
+{
+  struct ref_clump *c, *next;
+
+  for (c = g->clumps; c; c = next)
+    {
+      next = c->next;
+      mn__gc_xfree (c);
+    }
+
+  mn__gc_xfree (g);
+}
+
+
 
-/* Stopping the world, and getting its registers.  */
+/* Stopping the world, and getting its threads' registers.  */
 
-/* The signal the collecting thread sends to other mutator threads
-   to tell them to stop what they're doing, record their registers, and
-   wait for the GC to complete.  */
+/* The signal the collecting thread sends to other mutator threads to
+   tell them to stop what they're doing, record their registers, and
+   wait for the GC to complete.  This is also the signal we send them
+   to indicate that the collection is complete, and they may
+   continue.  */
 static int gc_wait_signal;
 
 /* The signal the collecting thread sends to waiting mutator threads
@@ -301,17 +462,12 @@
 static void
 handle_wait_signal (int signo, siginfo_t *info, void *context)
 {
-  /* Hmm, is pthread_getspecific async-safe?  Boehm doesn't use it, so
-     I bet it isn't.  Use __thread?  */
-  struct mn_thread *self
-    = (struct mn_thread *) pthread_getspecific (mn_thread_key);
-
   /* Is this thread incoherent at the moment?  */
   if (self->incoherent)
     {
       /* Request that it re-send the wait signal to itself once it's
          coherent, and return.  */
-      self->wait_when_coherent_again = true;
+      self->collection_waiting = true;
       return;
     }
 
@@ -325,7 +481,10 @@
   self->regs = (struct sigcontext *) context;
 
   /* We've provided the info the collecting thread needs, so post to
-     thread_waiting_semaphore to allow it to continue.  */
+     thread_waiting_semaphore to allow it to continue.
+
+     sem_post is async-safe, and is memory-synchronizing, so the
+     collecting thread will see all our writes.  */
   assert (sem_post (&thread_waiting_semaphore) == 0);
     
   /* Wait for the collecting thread to re-awaken us, by sending us
@@ -344,13 +503,28 @@
        the first gc_wait_signal has definitely been delivered, and
        it's blocked while the signal handler is running, so that'll
        just remain pending across this sigsuspend call, and be
-       delivered as soon as this signal handler returns.  */
+       delivered as soon as this signal handler returns.
+
+       The sig.*set functions and sigsuspend are async-safe.  */
     sigset_t wait_for_resume_set;
     sigfillset (&wait_for_resume_set);
     sigdelset (&wait_for_resume_set, gc_resume_signal);
     assert (sigsuspend (&wait_for_resume_set) == -1
             && errno == EINTR);
   }
+
+  /* According to POSIX, sigsuspend does not synchronize memory with
+     respect to other threads, so at this point we're not guaranteed
+     to see whatever changes the collector has made to the heap.  The
+     only function that does synchronize memory, is async-safe (and
+     thus can be used in a signal handler), and is useable in this
+     context without a lot of fuss, seems to be sem_post.  So even
+     though there's no reason for the collecting thread to wait for
+     all the waiting threads to resume, we needlessly post to
+     thread_waiting_semaphore here, and the collecting thread waits
+     for every thread, to make sure the semaphore's count is back to
+     zero.  */
+  assert (sem_post (&thread_waiting_semaphore) == 0);
 }
 
 
@@ -362,9 +536,9 @@
 }
 
 
-/* Stop all threads other than MYSELF, and bring the 'regs' member of
-   their mn_thread structures up to date.  You must hold mn__gc_mutex
-   while calling this function.  */
+/* Stop all threads other than MYSELF, bring the 'regs' member of
+   their mn_thread structures up to date, and synchronize memory.  You
+   must hold mn__gc_mutex while calling this function.  */
 static void
 stop_all_threads (struct mn_thread *myself)
 {
@@ -375,20 +549,32 @@
     if (t != myself)
       assert (pthread_kill (t->thread, gc_wait_signal) == 0);
 
-  /* Now, we wait for all other threads to post to the semaphore.  */
+  /* Now, we wait for all other threads to post to the semaphore.
+     sem_wait is a memory-synchronizing operation, so we will see all
+     threads' changes to the heap, and to their references.  */
   for (t = thread_list.next; t != &thread_list; t = t->next)
     if (t != myself)
       assert (sem_wait (&thread_waiting_semaphore) == 0);
 }
 
 
-/* Resume all threads other than MYSELF.  */
+/* Synchronize memory, and resume all threads other than MYSELF.  You
+   must hold mn__gc_mutex while calling this function.  */
 static void
 resume_all_threads (struct mn_thread *myself)
 {
   for (t = thread_list.next; t != &thread_list; t = t->next)
     if (t != myself)
       assert (pthread_kill (t->thread, gc_resume_signal) == 0);
+
+  /* Now, we wait for all the threads to post to the semaphore.  It's
+     not necessary for the collecting thread to wait for all the other
+     threads to resume, but it undoes the effects of the second
+     sem_post call in handle_wait_signal, which we need to do to
+     ensure memory synchronization; see the comments there.  */
+  for (t = thread_list.next; t != &thread_list; t = t->next)
+    if (t != myself)
+      assert (sem_wait (&thread_waiting_semaphore) == 0);
 }
 
 
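
Editorial sketch (not part of the commit): the ref group scheme in
this revision (a free list, a clump with never-used refs at its tail,
and a circular allocated list headed by a sentinel) can be exercised
on its own along these lines.  The demo_* names are hypothetical,
plain malloc and free stand in for mn__gc_xmalloc and mn__gc_xfree,
a void pointer stands in for tagged_t, and the locking and
'incoherent' rules are omitted.  The sketch assumes the sentinel is
initialized to point at itself, which the diff does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_REFS_PER_CLUMP 4   /* tiny, so a demo can cross clumps */

struct demo_group;

struct demo_ref
{
  void *obj;
  struct demo_group *group;
  struct demo_ref *next, *prev;
};

struct demo_clump
{
  struct demo_clump *next;
  struct demo_ref refs[DEMO_REFS_PER_CLUMP];
  int first_never_used;
};

struct demo_group
{
  struct demo_clump *clumps;
  struct demo_ref *free;       /* singly linked through 'next' */
  struct demo_ref allocated;   /* sentinel of a circular list */
};

static void
demo_group_init (struct demo_group *g)
{
  g->clumps = NULL;
  g->free = NULL;
  g->allocated.next = g->allocated.prev = &g->allocated;
}

static struct demo_ref *
demo_make_ref (struct demo_group *g, void *obj)
{
  struct demo_ref *r;

  if (g->free)
    {
      /* Reuse a previously freed ref.  */
      r = g->free;
      g->free = r->next;
    }
  else if (g->clumps && g->clumps->first_never_used < DEMO_REFS_PER_CLUMP)
    /* Take the next never-used ref from the first clump.  */
    r = &g->clumps->refs[g->clumps->first_never_used++];
  else
    {
      /* Push a new clump on the front and take its first ref.  */
      struct demo_clump *c = malloc (sizeof *c);
      if (!c)
        abort ();
      c->next = g->clumps;
      g->clumps = c;
      c->first_never_used = 1;
      r = &c->refs[0];
    }

  /* Link R in right after the sentinel.  */
  r->next = g->allocated.next;
  r->prev = &g->allocated;
  g->allocated.next->prev = r;
  g->allocated.next = r;

  r->group = g;
  r->obj = obj;
  return r;
}

static void
demo_free_ref (struct demo_ref *r)
{
  struct demo_group *g = r->group;

  /* Unlink from the allocated list, then push on the free list.  */
  r->next->prev = r->prev;
  r->prev->next = r->next;
  r->next = g->free;
  g->free = r;
}

/* Walk the allocated list, as a collector would to find roots.  */
static size_t
demo_count_allocated (const struct demo_group *g)
{
  const struct demo_ref *r;
  size_t n = 0;

  for (r = g->allocated.next; r != &g->allocated; r = r->next)
    n++;
  return n;
}

static void
demo_group_destroy (struct demo_group *g)
{
  struct demo_clump *c, *next;

  for (c = g->clumps; c; c = next)
    {
      next = c->next;
      free (c);
    }
  demo_group_init (g);
}
```

Freeing a ref and then allocating again hands back the same ref,
since the free list is consulted first; only the allocated list is
ever walked when counting roots.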



From minor-owner@red-bean.com Tue Sep  2 16:58:56 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h82Lwtnd015580
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 2 Sep 2003 16:58:55 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h82LwtXX015578
	for minor-commits@red-bean.com; Tue, 2 Sep 2003 16:58:55 -0500
Date: Tue, 2 Sep 2003 16:58:55 -0500
Message-Id: <200309022158.h82LwtXX015578@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 47 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-02 16:58:52 -0500 (Tue, 02 Sep 2003)
New Revision: 47

Modified:
   trunk/gc/roots.c
   trunk/gc/roots.h
Log:
* gc/roots.c, gc/roots.h: Doc work.

Use an explicit memory barrier, provided by Minor arch-dep code, after
the collector is done.  POSIX doesn't provide any kind of function we
can use for hand-off that both synchronizes memory and is
async-signal-safe.

Add calls.

Break the roots.h interface into separate functions for starting and
stopping the world, walking ref roots, and walking threads.

Use both POSIX thread-specific data (for the destructor) and a
__thread variable (for async-signal safety).
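
Editorial sketch (not part of the commit): keeping the same pointer
both in a __thread variable and under a POSIX key might look roughly
like this.  The demo_* names are hypothetical, the __thread keyword
assumes GCC or a compatible compiler, and the destructor only records
that it ran, where the real one would unlink the thread from the
thread list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct demo_thread
{
  int id;
};

/* Readable without calling any library function, which is what a
   signal handler wants.  */
static __thread struct demo_thread *demo_self;

/* Exists only so that a destructor runs when the thread exits.  */
static pthread_key_t demo_self_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

/* Written by the exiting thread, read by the joiner after
   pthread_join.  */
static int demo_destroyed;

static void
demo_self_destroy (void *value)
{
  /* The key's value arrives as the argument.  */
  struct demo_thread *t = value;

  demo_destroyed += t->id;
  free (t);
}

static void
demo_make_key (void)
{
  pthread_key_create (&demo_self_key, demo_self_destroy);
}

static struct demo_thread *
demo_register_self (int id)
{
  struct demo_thread *t = malloc (sizeof *t);

  if (!t)
    abort ();
  t->id = id;

  pthread_once (&demo_key_once, demo_make_key);
  demo_self = t;
  pthread_setspecific (demo_self_key, t);
  return t;
}

static void *
demo_thread_main (void *arg)
{
  struct demo_thread *t = demo_register_self (*(int *) arg);

  /* Both routes lead to the same structure.  */
  if (demo_self != t || pthread_getspecific (demo_self_key) != t)
    return NULL;
  return arg;
}

/* Run one thread that registers itself with ID; return the value the
   destructor recorded, or -1 if the two pointers disagreed.  */
static int
demo_run_one_thread (int id)
{
  pthread_t th;
  void *result;

  demo_destroyed = 0;
  pthread_create (&th, NULL, demo_thread_main, &id);
  pthread_join (th, &result);
  return result == &id ? demo_destroyed : -1;
}
```

After demo_run_one_thread (7) returns, the destructor has run exactly
once for that thread, so the result is 7.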



Modified: trunk/gc/roots.c
===================================================================
--- trunk/gc/roots.c	2003-09-02 06:07:01 UTC (rev 46)
+++ trunk/gc/roots.c	2003-09-02 21:58:52 UTC (rev 47)
@@ -133,15 +133,6 @@
 #include "tagged.h"
 
 
-/* Global mutexes protecting the collector's structures.  */
-
-
-/* This mutex protects various structures, including the thread list
-   and the global ref list.  See the comments for the individual
-   structures for details.  */
-static pthread_mutex_t mn__gc_mutex;
-
-
 /* The thread list.  */
 
 /* A thread that has ever had an mn_call.  Since every function in the
@@ -174,7 +165,8 @@
 
   /* When waiting for collection, this is a sigcontext structure
      giving the values of the thread's registers when it received the
-     gc_wait_signal.  At other times, this is garbage.
+     gc_wait_signal.  In the collecting thread, this is zero.  At
+     other times, this is garbage.
 
      This field may only be accessed by THREAD's gc_wait_signal
      handler when the thread is not waiting for collection, and by the
@@ -201,17 +193,6 @@
 };
 
 
-/* The current thread's structure.
-
-   Note that you can't replace this with a POSIX thread-specific
-   value; pthread_getspecific isn't async-safe, and handle_wait_signal
-   needs to be able to find its thread structure.  Boehm has a hash
-   table mapping pthread_t values onto his thread structures, and he
-   knows the hash table isn't being modified while he's stopping
-   threads, so his signal handler just calls pthread_self (which isn't
-   async-signal-safe) and looks itself up.  */
-static __thread struct mn_thread *self;
-
 /* The head of the global list of all threads.  */
 static struct mn_thread thread_list = {
   &thread_list, &thread_list,
@@ -219,17 +200,313 @@
 };
 
 
-/* We use this key's destructor function to remove the thread from the
-   list when it dies.  */
-static pthread_key_t mn_thread_key;
+/* Two pointers to the current thread's structure --- one as a
+   __thread variable, and one as a POSIX thread-specific value.
 
+   This is kind of dumb, but we need them both.  pthread_getspecific
+   isn't async-safe, so we can't use it in handle_wait_signal.
+   __thread variables are async-safe, but don't have destructor
+   functions, so we can't use them to keep the thread list up to date.
 
+   Boehm has a hash table mapping pthread_t values onto his thread
+   structures, and he knows the hash table isn't being modified while
+   he's stopping threads, so his signal handler just calls
+   pthread_self (which isn't officially async-signal-safe) and looks
+   itself up.  */
+static __thread struct mn_thread *self;
+static pthread_key_t self_key;
+
+
+/* Create an entry for the calling thread, add it to the thread list,
+   and return it.  You must *not* hold mn__gc_mutex while calling this
+   function.  */
+static struct mn_thread *
+make_thread (void)
+{
+  struct mn_thread *t = (struct mn_thread *) mn__gc_xmalloc (sizeof (*t));
+
+  t->thread = pthread_self ();
+  t->incoherent = 0;
+  t->collection_waiting = 0;
+  t->youngest_call = 0;
+
+  self = t;
+  pthread_setspecific (self_key, (void *) t);
+
+  pthread_mutex_lock (&mn__gc_mutex);
+  t->next = thread_list.next;
+  t->prev = &thread_list;
+  thread_list.next->prev = t;
+  thread_list.next = t;
+  pthread_mutex_unlock (&mn__gc_mutex);
+
+  return t;
+}
+
+
+/* Destructor for self_key.  Remove ourselves from the thread list,
+   and free all the memory we hold.  */
+static void
+self_key_destroy (void *self_untyped)
+{
+  pthread_mutex_lock (&mn__gc_mutex);
+  self->next->prev = self->prev;
+  self->prev->next = self->next;
+  pthread_mutex_unlock (&mn__gc_mutex);
+
+  pop_calls_up_to (0);
+
+  mn__gc_xfree (self);
+  self = 0;
+}
+
+
 
-/* References, reference clumps, and reference groups.  */
+/* Stopping the world, and getting its threads' registers.  */
 
+/* The signal the collecting thread sends to other mutator threads to
+   tell them to stop what they're doing, record their registers, and
+   wait for the GC to complete.  (A separate signal, gc_resume_signal,
+   below, tells them that the collection is complete and they may
+   continue.)  */
+static int gc_wait_signal;
 
+/* The signal the collecting thread sends to waiting mutator threads
+   to tell them they can continue.  */
+static int gc_resume_signal;
 
+/* Mutator threads post to this semaphore to indicate that they are
+   waiting.  The collecting thread waits once on this semaphore for
+   every mutator thread.  */
+static sem_t thread_waiting_semaphore;
 
+
+/* The handler for gc_wait_signal.  */
+static void
+handle_wait_signal (int signo, siginfo_t *info, void *context)
+{
+  /* Is this thread incoherent at the moment?  */
+  if (self->incoherent)
+    {
+      /* Request that it re-send the wait signal to itself once it's
+         coherent, and return.  */
+      self->collection_waiting = true;
+      return;
+    }
+
+  /* The 'context' argument is a pointer to a sigcontext structure,
+     which holds the values this thread's registers had before it
+     received the GC wait signal.  Save that pointer in our thread
+     structure, so the collecting thread can find it.
+
+     This is really what it's all about; everything else here is just
+     synchronization chit-chat.  */
+  self->regs = (struct sigcontext *) context;
+
+  /* We've provided the info the collecting thread needs, so post to
+     thread_waiting_semaphore to allow it to continue.
+
+     sem_post is async-safe, and is memory-synchronizing, so the
+     collecting thread will see all our writes.  */
+  assert (sem_post (&thread_waiting_semaphore) == 0);
+    
+  /* Wait for the collecting thread to re-awaken us, by sending us
+     gc_resume_signal.  */
+  {
+    /* You might think there would be a race condition here: what if
+       the collecting thread completes the collection and sends us
+       gc_resume_signal before we wait for it?  But it's okay: when we
+       established the handler for gc_wait_signal, we asked that
+       gc_resume_signal also be blocked while this handler is running.
+       So if the collecting thread sends us the signal early, it'll
+       just remain pending until we do the sigsuspend here.
+
+       In fact, collection could complete and a new collection could
+       start before we get to the sigsuspend.  But that's okay, too:
+       the first gc_wait_signal has definitely been delivered, and
+       it's blocked while the signal handler is running, so that'll
+       just remain pending across this sigsuspend call, and be
+       delivered as soon as this signal handler returns.
+
+       The sig.*set functions and sigsuspend are async-safe.  */
+    sigset_t wait_for_resume_set;
+    sigfillset (&wait_for_resume_set);
+    sigdelset (&wait_for_resume_set, gc_resume_signal);
+    assert (sigsuspend (&wait_for_resume_set) == -1
+            && errno == EINTR);
+  }
+
+  /* Make sure that we can see all the work the collecting thread has
+     done.  Ensure that no reads or writes can be moved across this
+     point, by either the compiler or the memory model.  */
+  mn__memory_barrier ();
+}
+
+
+static void
+handle_resume_signal (int signo)
+{
+  /* Nothing needs to be done here.  We only send gc_resume_signal to
+     make the call to sigsuspend in handle_wait_signal return.  */
+}
+
+
+void
+mn__pause_mutator_threads ()
+{
+  struct mn_thread *t;
+
+  /* First, send all other threads the wait signal.  */
+  for (t = thread_list.next; t != &thread_list; t = t->next)
+    if (t != self)
+      assert (pthread_kill (t->thread, gc_wait_signal) == 0);
+
+  /* Now, we wait for all other threads to post to the semaphore.
+     sem_wait is a memory-synchronizing operation, so we will see all
+     threads' changes to the heap, and to their references.  */
+  for (t = thread_list.next; t != &thread_list; t = t->next)
+    if (t != self)
+      assert (sem_wait (&thread_waiting_semaphore) == 0);
+}
+
+
+void
+mn__continue_mutator_threads ()
+{
+  struct mn_thread *t;
+
+  /* Make sure all the mutator threads can see the collection work
+     we've just done: neither the compiler nor the memory model may
+     move reads or writes across this point.  */
+  mn__memory_barrier ();
+  for (t = thread_list.next; t != &thread_list; t = t->next)
+    if (t != self)
+      assert (pthread_kill (t->thread, gc_resume_signal) == 0);
+}
+
+
+/* Choose signals the collecting thread should use to stop other
+   threads before a collection and resume them when we're done, and
+   set up the appropriate handlers.  */
+static void
+init_gc_signals (void)
+{
+  /* Ideally, there'd be some sanctioned way to allocate two signals
+     for our use, so we could be sure that we're not stepping on some
+     other module.  But there isn't --- not even for real-time
+     signals.  So we just hard-code things.  *sigh*  */
+#ifdef SIGRTMIN
+  /* These are chosen not to conflict with the signals Boehm's GC
+     uses, which are, in turn, chosen not to conflict with the ones
+     LinuxThreads uses.  */
+  gc_wait_signal = SIGRTMIN + 7;
+  gc_resume_signal = SIGRTMIN + 8;
+#else
+#error "cannot find an appropriate set of GC signals"
+  /* If you send me an appropriate clause for your system, I'd be
+     happy to include it amongst the above.  */
+#endif
+  
+  /* Set up handlers for the signals.  */
+  {
+    struct sigaction action, oldaction;
+
+    /* The wait signal has the real handler that does all the
+       suspension work.  To avoid a race condition (described in
+       handle_wait_signal), we arrange for gc_resume_signal to be
+       blocked while the handler runs.  */
+    action.sa_sigaction = handle_wait_signal;
+    sigemptyset (&action.sa_mask);
+    sigaddset (&action.sa_mask, gc_resume_signal);
+    action.sa_flags = SA_SIGINFO;
+    sigaction (gc_wait_signal, &action, &oldaction);
+
+    /* If there was already a handler established for this signal,
+       then someone must be already using it for something else, so
+       abort.  */
+    assert (oldaction.sa_sigaction == SIG_DFL
+            && oldaction.sa_handler == SIG_DFL);
+
+    /* The resume signal has a trivial handler.  The only function of
+       this signal is to make the call to sigsuspend return.  */
+    action.sa_handler = handle_resume_signal;
+    sigemptyset (&action.sa_mask);
+    action.sa_flags = 0;
+    sigaction (gc_resume_signal, &action, &oldaction);
+
+    /* As above.  */
+    assert (oldaction.sa_sigaction == SIG_DFL
+            && oldaction.sa_handler == SIG_DFL);
+  }
+}
+
+
+void
+mn__walk_threads (void (*threadf) (struct sigcontext *))
+{
+  struct mn_thread *head = &thread_list;
+  struct mn_thread *t;
+
+  for (t = head->next; t != head; t = t->next)
+    if (t->regs)
+      threadf (t->regs);
+}
+
+
+
+/* Incoherent sections.  */
+
+/* Most of the structures in this file are shared between ordinary
+   code and the handler for gc_wait_signal.  (Really, they're shared
+   with the collector, but the signal handler takes care of handing
+   them off to the collector, and waiting for them to be returned.)
+
+   If you are going to work with any data shared with the signal
+   handler, you should call start_incoherent first, and call
+   end_incoherent when you are done.  The GC wait signal handler
+   assumes that, if the incoherent flag is clear, it can go ahead and
+   access all the thread's call, ref group, ref clump, and reference
+   data structures.  For this to be safe, the ordinary code must keep
+   the compiler from moving its accesses to those structures outside
+   the incoherent section.  */
+
+
+/* Call this function before working with any refs' obj pointers, or
+   with any other data that the signal handler uses.  */
+static inline void
+start_incoherent (void)
+{
+  self->incoherent = true;
+}
+
+
+/* Call this function once you are done working with that data.  */
+static inline void
+end_incoherent (void)
+{
+  self->incoherent = false;
+
+  /* Ensure that no reads or writes can be moved across this point,
+     by either the compiler or the memory model.  */
+  mn__memory_barrier ();
+
+  /* Is someone else trying to get a collection started?  */
+  if (self->collection_waiting)
+    {
+      self->collection_waiting = false;
+      pthread_kill (pthread_self (), gc_wait_signal);
+    }
+}
+
+
+
+/* References, reference clumps, and reference groups.  */
+
+
 /* A Minor reference.
 
    At the moment, this is four words long.  I came up with a way to
@@ -422,6 +699,21 @@
 }
 
 
+/* Return a new, empty reference group.  */
+static struct ref_group *
+make_ref_group (void)
+{
+  struct ref_group *g = (struct ref_group *) mn__gc_xmalloc (sizeof (*g));
+
+  g->clumps = 0;
+  g->free = 0;
+  g->allocated.next = &g->allocated;
+  g->allocated.prev = &g->allocated;
+
+  return g;
+}
+
+
 /* Free the reference group G.  */
 static void
 free_ref_group (struct ref_group *g)
@@ -438,199 +730,100 @@
 }
 
 
-
-/* Stopping the world, and getting its threads' registers.  */
+/* Apply ROOTF to a pointer to the 'obj' field of every reference in
+   the reference group G.  */
+static void
+walk_ref_group_refs (struct ref_group *g, void (*rootf) (tagged_t *))
+{
+  mn_ref *r;
+  mn_ref *head = &g->allocated;
 
-/* The signal the collecting thread sends to other mutator threads to
-   tell them to stop what they're doing, record their registers, and
-   wait for the GC to complete.  This is also the signal we send them
-   to indicate that the collection is complete, and they may
-   continue.  */
-static int gc_wait_signal;
+  for (r = head->next; r != head; r = r->next)
+    rootf (&r->obj);
+}
 
-/* The signal the collecting thread sends to waiting mutator threads
-   to tell them they can continue.  */
-static int gc_resume_signal;
 
-/* Mutator threads post to this semaphore to indicate that they are
-   waiting.  The collecting thread waits once on this semaphore for
-   every mutator thread.  */
-static sem_t thread_waiting_semaphore;
+
+/* Calls.  */
 
 
-/* The handler for gc_wait_signal.  */
-static void
-handle_wait_signal (int signo, siginfo_t *info, void *context)
+/* We create one of these structures for each Minor->C call, and one
+   for the "outermost" C code, outside of any Minor call.  You must
+   have your 'incoherent' flag set to access this structure.  */
+struct mn_call
 {
-  /* Is this thread incoherent at the moment?  */
-  if (self->incoherent)
-    {
-      /* Request that it re-send the wait signal to itself once it's
-         coherent, and return.  */
-      self->collection_waiting = true;
-      return;
-    }
+  /* The next older call on the C stack.  */
+  struct mn_call *older_call;
 
-  /* The 'context' argument is a pointer to a sigcontext structure,
-     which holds the values this thread's registers had before it
-     received the GC wait signal.  Save that pointer in our thread
-     structure, so the collecting thread can find it.
+  /* The local references that belong to this call.  */
+  struct ref_group *local_refs;
 
-     This is really what it's all about; everything else here is just
-     synchronization chit-chat.  */
-  self->regs = (struct sigcontext *) context;
+  /* There will eventually need to be something here to represent the
+     Scheme continuation waiting for this C call to return.  */
+};
 
-  /* We've provided the info the collecting thread needs, so post to
-     thread_waiting_semaphore to allow it to continue.
 
-     sem_post is async-safe, and is memory-synchronizing, so the
-     collecting thread will see all our writes.  */
-  assert (sem_post (&thread_waiting_semaphore) == 0);
-    
-  /* Wait for the collecting thread to re-awaken us, by sending us
-     gc_resume_signal.  */
-  {
-    /* You might think there would be a race condition here: what if
-       the collecting thread completes the collection and sends us
-       gc_resume_signal before we wait for it?  But it's okay: when we
-       established the handler for gc_wait_signal, we asked that
-       gc_resume_signal also be blocked while this handler is running.
-       So if the collecting thread sends us the signal early, it'll
-       just remain pending until we do the sigsuspend here.
+/* Create a new call object for the current thread, push it on the
+   stack as the youngest call, and return it.  */
+static mn_call *
+push_call (void)
+{
+  mn_call *c = (struct mn_call *) mn__gc_xmalloc (sizeof (*c));
+  c->local_refs = make_ref_group ();
 
-       In fact, collection could complete and a new collection could
-       start before we get to the sigsuspend.  But that's okay, too:
-       the first gc_wait_signal has definitely been delivered, and
-       it's blocked while the signal handler is running, so that'll
-       just remain pending across this sigsuspend call, and be
-       delivered as soon as this signal handler returns.
+  start_incoherent ();
+  c->older_call = self->youngest_call;
+  self->youngest_call = c;
+  end_incoherent ();
 
-       The sig.*set functions and sigsuspend are async-safe.  */
-    sigset_t wait_for_resume_set;
-    sigfillset (&wait_for_resume_set);
-    sigdelset (&wait_for_resume_set, gc_resume_signal);
-    assert (sigsuspend (&wait_for_resume_set) == -1
-            && errno == EINTR);
-  }
-
-  /* According to POSIX, sigsuspend does not synchronize memory with
-     respect to other threads, so at this point we're not guaranteed
-     to see whatever changes the collector has made to the heap.  The
-     only function that does synchronize memory, is async-safe (and
-     thus can be used in a signal handler), and is useable in this
-     context without a lot of fuss, seems to be sem_post.  So even
-     though there's no reason for the collecting thread to wait for
-     all the waiting threads to resume, we needlessly post to
-     thread_waiting_semaphore here, and the collecting thread waits
-     for every thread, to make sure the semaphore's count is back to
-     zero.  */
-  assert (sem_post (&thread_waiting_semaphore) == 0);
+  return c;
 }
 
 
+/* Pop calls from the current thread's stack until YOUNGEST is the
+   youngest call.  */
 static void
-handle_resume_signal (int signo)
+pop_calls_up_to (mn_call *youngest)
 {
-  /* Nothing needs to be done here.  We only send gc_resume_signal to
-     make the call to sigsuspend in handle_wait_signal return.  */
-}
+  struct mn_call *here, *next;
 
+  start_incoherent ();
 
-/* Stop all threads other than MYSELF, bring the 'regs' member of
-   their mn_thread structures up to date, and synchronize memory.  You
-   must hold mn__gc_mutex while calling this function.  */
-static void
-stop_all_threads (struct mn_thread *myself)
-{
-  struct mn_thread *t;
+  for (here = self->youngest_call;
+       here != youngest;
+       here = next)
+    {
+      /* If we got through the whole list and never found YOUNGEST,
+         then our caller is confused.  */
+      assert (here);
+      next = here->older_call;
+      free_ref_group (here->local_refs);
+      mn__gc_xfree (here);
+    }
 
-  /* First, send all other threads the wait signal.  */
-  for (t = thread_list.next; t != &thread_list; t = t->next)
-    if (t != myself)
-      assert (pthread_kill (t->thread, gc_wait_signal) == 0);
+  self->youngest_call = youngest;
 
-  /* Now, we wait for all other threads to post to the semaphore.
-     sem_wait is a memory-synchronizing operation, so we will see all
-     threads' changes to the heap, and to their references.  */
-  for (t = thread_list.next; t != &thread_list; t = t->next)
-    if (t != myself)
-      assert (sem_wait (&thread_waiting_semaphore) == 0);
+  end_incoherent ();
 }
 
 
-/* Synchronize memory, and resume all threads other than MYSELF.  You
-   must hold mn__gc_mutex while calling this function.  */
-static void
-resume_all_threads (struct mn_thread *myself)
+void
+mn__walk_ref_roots (void (*rootf) (tagged_t *))
 {
-  for (t = thread_list.next; t != &thread_list; t = t->next)
-    if (t != myself)
-      assert (pthread_kill (t->thread, gc_resume_signal) == 0);
+  struct mn_thread *t;
 
-  /* Now, we wait for all the threads to post to the semaphore.  It's
-     not necessary for the collecting thread to wait for all the other
-     threads to resume, but it undoes the effects of the second
-     sem_post call in handle_wait_signal, which we need to do to
-     ensure memory synchronization; see the comments there.  */
   for (t = thread_list.next; t != &thread_list; t = t->next)
-    if (t != myself)
-      assert (sem_wait (&thread_waiting_semaphore) == 0);
-}
+    {
+      mn_call *c;
 
+      assert (! t->incoherent);
 
-/* Choose signals the collecting thread should use to stop other
-   threads before a collection and resume them when we're done, and
-   set up the appropriate handlers.  */
-static void
-init_gc_signals (void)
-{
-  /* Ideally, there'd be some sanctioned way to allocate two signals
-     for our use, so we could be sure that we're not stepping on some
-     other module.  But there isn't --- not even for real-time
-     signals.  So we just hard-code things.  *sigh*  */
-#ifdef SIGRTMIN
-  /* These are chosen not to conflict with the signals Boehm's GC
-     uses, which are, in turn, chosen not to conflict with the ones
-     LinuxThreads uses.  */
-  gc_wait_signal = SIGRTMIN + 7;
-  gc_resume_signal = SIGRTMIN + 8;
-#else
-#error "cannot find an appropriate set of GC signals"
-  /* If you send me an appropriate clause for your system, I'd be
-     happy to include it amongst the above.  */
-#endif
-  
-  /* Set up handlers for the signals.  */
-  {
-    struct sigaction action, old_action;
+      for (c = t->youngest_call; c; c = c->older_call)
+        walk_ref_group_refs (c->local_refs, rootf);
+    }
 
-    /* The wait signal has the real handler that does all the
-       suspension work.  To avoid a race condition (described in
-       handle_wait_signal), we arrange for gc_resume_signal to be
-       blocked while the handler runs.  */
-    action.sa_sigaction = handle_wait_signal;
-    sigemptyset (&action.sa_mask);
-    sigaddset (&action.sa_mask, gc_resume_signal);
-    action.sa_flags = SA_SIGINFO;
-    sigaction (gc_wait_signal, &action, &oldaction);
-
-    /* If there was already a handler established for this signal,
-       then someone must be already using it for something else, so
-       abort.  */
-    assert (oldaction.sa_sigaction == SIG_DFL
-            && oldaction.sa_handler == SIG_DFL);
-
-    /* The resume signal has a trivial handler.  The only function of
-       this signal is to make the call to sigsuspend return.  */
-    action.sa_sigaction = handle_resume_signal;
-    sigemptyset (&action.sa_mask);
-    action.sa_flags = 0;
-    sigaction (gc_resume_signal, &action, &oldaction);
-
-    /* As above.  */
-    assert (oldaction.sa_sigaction == SIG_DFL
-            && oldaction.sa_handler == SIG_DFL);
-  }
+  walk_ref_group_refs (global_refs, rootf);
 }
 
 
@@ -641,13 +834,7 @@
 mn__init_roots ()
 {
   init_gc_signals ();
+  global_refs = make_ref_group ();
   assert (sem_init (&thread_waiting_semaphore, 0, 0) == 0);
-  assert (pthread_key_create (&mn_thread_key, mn_thread_key_destroy) == 0);
+  assert (pthread_key_create (&self_key, self_key_destroy) == 0);
 }
-
-
-/* Issues:
-
-   Is pthread_getspecific async-safe?  If not, then the code to set
-   'self' in handle_wait_signal needs to be reworked.  Check the POSIX
-   spec.  */
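
The incoherent-section logic in roots.c can be illustrated with a minimal single-threaded sketch: a wait signal that arrives while the 'incoherent' flag is set is only noted, and the code that clears the flag re-sends it. Everything named sketch_ or demo_ here, the choice of SIGUSR1, and the times_stopped counter are assumptions made for illustration; none of it is Minor's code, and the real handler records registers and suspends rather than bumping a counter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-ins for a thread's 'incoherent' and
   'collection_waiting' flags, plus a counter for the demo.  */
static volatile sig_atomic_t sketch_incoherent;
static volatile sig_atomic_t sketch_collection_waiting;
static volatile sig_atomic_t sketch_times_stopped;

/* Stand-in for the wait-signal handler: if the thread is in the
   middle of updating shared structures, just leave a note and
   return; otherwise count this as a "stop".  */
static void
sketch_handle_wait (int signo)
{
  (void) signo;
  if (sketch_incoherent)
    {
      sketch_collection_waiting = 1;
      return;
    }
  sketch_times_stopped++;
}

static void
sketch_start_incoherent (void)
{
  sketch_incoherent = 1;
}

/* Clear the flag, then re-send the signal if one was deferred.  */
static void
sketch_end_incoherent (int signo)
{
  sketch_incoherent = 0;
  if (sketch_collection_waiting)
    {
      sketch_collection_waiting = 0;
      raise (signo);
    }
}

/* Deliver SIGNO once inside an incoherent section.  Return the
   number of stops seen after leaving the section, or -1 if setup
   failed or the handler stopped while the flag was still set.  */
static int
demo_deferred_stop (int signo)
{
  struct sigaction action;
  int stopped_inside;

  action.sa_handler = sketch_handle_wait;
  sigemptyset (&action.sa_mask);
  action.sa_flags = 0;
  if (sigaction (signo, &action, 0) != 0)
    return -1;

  sketch_times_stopped = 0;
  sketch_start_incoherent ();
  raise (signo);                  /* Arrives mid-update: deferred.  */
  stopped_inside = sketch_times_stopped;
  sketch_end_incoherent (signo);  /* Re-sent and handled here.  */

  return stopped_inside == 0 ? sketch_times_stopped : -1;
}
```

Calling demo_deferred_stop (SIGUSR1) shows the deferral: no stop is counted while the flag is set, and one is counted as the section ends.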

Modified: trunk/gc/roots.h
===================================================================
--- trunk/gc/roots.h	2003-09-02 06:07:01 UTC (rev 46)
+++ trunk/gc/roots.h	2003-09-02 21:58:52 UTC (rev 47)
@@ -4,6 +4,7 @@
 #ifndef MN__GC_ROOTS_H
 #define MN__GC_ROOTS_H
 
+#include <signal.h>
 #include <pthread.h>
 #include "tagged.h"
 
@@ -12,8 +13,27 @@
    documentation.  */
 extern pthread_mutex_t mn__gc_mutex;
 
-/* Apply F to the address of every root in the system.
-   You must hold mn__gc_mutex to call this function.  */
-void mn__for_each_root (void (*f) (tagged_t *));
+/* Stop all mutator threads, other than the calling thread, and
+   synchronize memory with them.  You must hold mn__gc_mutex while
+   calling this function.  */
+void mn__pause_mutator_threads (void);
+
+/* Apply ROOTF to a pointer to the heap pointer in every live
+   reference in the system, both local and global.  You must hold
+   mn__gc_mutex while calling this function, and all mutator threads
+   must be stopped (other than the caller).  */
+void mn__walk_ref_roots (void (*rootf) (tagged_t *));
+
+/* Apply THREADF to a pointer to a sigcontext structure holding the
+   registers of every thread in the system that could be running
+   Scheme code; if THREADF changes register values, the thread will
+   resume with the changed values.  You must hold mn__gc_mutex to call
+   this function, and all mutator threads must be stopped (other than
+   the caller).  */
+void mn__walk_threads (void (*threadf) (struct sigcontext *));
+
+/* Allow all mutator threads to continue running.  You must hold
+   mn__gc_mutex while calling this function.  */
+void mn__continue_mutator_threads (void);
    
 #endif /* MN__GC_ROOTS_H */
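
The pause / continue pair declared in roots.h rests on a signal-and-semaphore handshake. Below is a minimal sketch of that handshake with one mutator thread, assuming SIGUSR1 and SIGUSR2 in place of the real-time signals and using hypothetical sketch_ and demo_ names throughout. It is not Minor's implementation: it saves no registers and takes no mutex. It only shows that the mutator makes no progress between the semaphore post and the resume signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>

#define SKETCH_WAIT_SIGNAL   SIGUSR1   /* assumption, for the demo */
#define SKETCH_RESUME_SIGNAL SIGUSR2   /* assumption, for the demo */

static sem_t sketch_stopped;           /* posted by a stopped mutator */
static atomic_ulong sketch_work_done;  /* the mutator's "progress" */
static atomic_int sketch_quit;

/* Tell the collector we have stopped, then sleep until resumed.  The
   resume signal is blocked while this handler runs (see sa_mask in
   the demo), so an early resume stays pending until sigsuspend.  */
static void
sketch_handle_wait (int signo)
{
  int saved_errno = errno;
  sigset_t mask;

  (void) signo;
  sem_post (&sketch_stopped);
  sigfillset (&mask);
  sigdelset (&mask, SKETCH_RESUME_SIGNAL);
  sigsuspend (&mask);
  errno = saved_errno;
}

/* Exists only so that sigsuspend returns.  */
static void
sketch_handle_resume (int signo)
{
  (void) signo;
}

static void *
sketch_mutator (void *arg)
{
  (void) arg;
  while (! atomic_load (&sketch_quit))
    atomic_fetch_add (&sketch_work_done, 1);
  return 0;
}

/* Return 1 if the mutator made no progress while paused, 0 if it
   did, and -1 if setup failed.  */
static int
demo_stop_the_world (void)
{
  struct sigaction action;
  struct timespec delay = { 0, 20 * 1000 * 1000 };  /* 20 ms */
  pthread_t thread;
  unsigned long before, after;

  atomic_store (&sketch_quit, 0);
  if (sem_init (&sketch_stopped, 0, 0) != 0)
    return -1;

  action.sa_handler = sketch_handle_wait;
  sigemptyset (&action.sa_mask);
  sigaddset (&action.sa_mask, SKETCH_RESUME_SIGNAL);
  action.sa_flags = 0;
  if (sigaction (SKETCH_WAIT_SIGNAL, &action, 0) != 0)
    return -1;

  action.sa_handler = sketch_handle_resume;
  sigemptyset (&action.sa_mask);
  if (sigaction (SKETCH_RESUME_SIGNAL, &action, 0) != 0)
    return -1;

  if (pthread_create (&thread, 0, sketch_mutator, 0) != 0)
    return -1;

  /* Pause: signal the mutator, then wait for it to post.  */
  if (pthread_kill (thread, SKETCH_WAIT_SIGNAL) != 0)
    return -1;
  while (sem_wait (&sketch_stopped) != 0)
    if (errno != EINTR)
      return -1;

  /* The world is stopped; a collector would walk roots here.  */
  before = atomic_load (&sketch_work_done);
  nanosleep (&delay, 0);
  after = atomic_load (&sketch_work_done);

  /* Continue: wake the mutator, then let it finish.  */
  if (pthread_kill (thread, SKETCH_RESUME_SIGNAL) != 0)
    return -1;
  atomic_store (&sketch_quit, 1);
  pthread_join (thread, 0);

  return before == after;
}
```

On systems where the thread library is separate from libc, this needs to be built with -pthread.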



From minor-owner@red-bean.com Tue Sep  2 17:31:27 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h82MVRnd017080
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 2 Sep 2003 17:31:27 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h82MVRAB017078
	for minor-commits@red-bean.com; Tue, 2 Sep 2003 17:31:27 -0500
Date: Tue, 2 Sep 2003 17:31:27 -0500
Message-Id: <200309022231.h82MVRAB017078@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 48 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-02 17:31:23 -0500 (Tue, 02 Sep 2003)
New Revision: 48

Modified:
   trunk/gc/generic-map.h
   trunk/gc/roots.c
   trunk/gc/roots.h
Log:
Remove my name from the head of each file.  In the long run, I hope
that lots of people will contribute to Minor, and it will be
misleading for my name to sit at the top.


Modified: trunk/gc/generic-map.h
===================================================================
--- trunk/gc/generic-map.h	2003-09-02 21:58:52 UTC (rev 47)
+++ trunk/gc/generic-map.h	2003-09-02 22:31:23 UTC (rev 48)
@@ -1,5 +1,4 @@
-/* generic-map.h --- tracking GC'd memory, given per-arch parameters
-   Jim Blandy <jimb@red-bean.com> --- July 2003  */
+/* generic-map.h --- tracking GC'd memory, given per-arch parameters.  */
 
 #ifndef MINOR_GC_GENERIC_MAP_H
 #define MINOR_GC_GENERIC_MAP_H

Modified: trunk/gc/roots.c
===================================================================
--- trunk/gc/roots.c	2003-09-02 21:58:52 UTC (rev 47)
+++ trunk/gc/roots.c	2003-09-02 22:31:23 UTC (rev 48)
@@ -1,6 +1,6 @@
-/* roots.c --- tracking roots for garbage collection
-   Jim Blandy <jimb@red-bean.com> --- August 2003  */
+/* roots.c --- tracking roots for garbage collection.  */
 
+
 /* To begin a collection, the first task is to find each thread's PC,
    and see what it's executing.  We use the following protocol for
    this:

Modified: trunk/gc/roots.h
===================================================================
--- trunk/gc/roots.h	2003-09-02 21:58:52 UTC (rev 47)
+++ trunk/gc/roots.h	2003-09-02 22:31:23 UTC (rev 48)
@@ -1,5 +1,4 @@
-/* roots.h --- finding garbage collection roots
-   Jim Blandy <jimb@red-bean.com> --- August 2003  */
+/* roots.h --- finding garbage collection roots.  */
 
 #ifndef MN__GC_ROOTS_H
 #define MN__GC_ROOTS_H



From minor-owner@red-bean.com Tue Sep  2 23:21:20 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h834LJnd002551
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 2 Sep 2003 23:21:20 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h834LJvM002549
	for minor-commits@red-bean.com; Tue, 2 Sep 2003 23:21:19 -0500
Date: Tue, 2 Sep 2003 23:21:19 -0500
Message-Id: <200309030421.h834LJvM002549@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 49 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-02 23:21:16 -0500 (Tue, 02 Sep 2003)
New Revision: 49

Added:
   trunk/gc/refs.c
   trunk/gc/refs.h
Modified:
   trunk/gc/roots.c
Log:
* gc/refs.c, gc/refs.h: Move reference and reference group code into
its own file.
* gc/roots.c: No longer here.


Added: trunk/gc/refs.c
===================================================================
--- trunk/gc/refs.c	2003-09-02 22:31:23 UTC (rev 48)
+++ trunk/gc/refs.c	2003-09-03 04:21:16 UTC (rev 49)
@@ -0,0 +1,197 @@
+/* refs.c --- implementation of references, and reference groups.  */
+
+#include "refs.h"
+
+
+/* The reference clump and reference group types.  */
+
+
+#define NUM_REFS_PER_CLUMP (1024)
+
+
+/* A clump of references.  We allocate references a clump at a time.
+   If this clump holds global references, then you must hold
+   mn__gc_mutex to access this structure.  Otherwise, it holds local
+   references, and you must set your 'incoherent' flag while accessing
+   it.  */
+struct ref_clump
+{
+  /* All the reference clumps belonging to a particular reference
+     group are chained together in a singly-linked list.  To free the
+     reference group, we free each clump in this list.  */
+  struct ref_clump *next;
+
+  /* The references that live in this clump.  */
+  mn_ref refs[NUM_REFS_PER_CLUMP];
+};
+
+
+/* A reference group.  */
+struct ref_group
+{
+  /* All the clumps belonging to this reference group.  */
+  struct ref_clump *clumps;
+
+  /* The head of the free list.  This is a singly-linked list, chained
+     through the refs' 'next' fields.
+
+     Note that this only includes references that have been allocated
+     before, but are free at the moment.  There may also be some refs
+     that have never been allocated at all, at the end of the first
+     clump; see the 'first_never_used' field, below.  */
+  mn_ref *free;
+
+  /* If CLUMPS is non-zero, this is the index of the first ref in
+     CLUMPS->refs that has never been allocated.  All refs from this
+     point to the end of CLUMPS->refs are free.  If this is
+     NUM_REFS_PER_CLUMP, then every ref in CLUMPS was allocated at
+     some point.
+
+     In all clumps but the one at the front of the list, all refs must
+     have been allocated at some point.  (This is easy to ensure, by
+     simply never adding new clumps to the list until we have no free
+     references.)  */
+  short first_never_used;
+
+  /* The head of the allocated list.  */
+  mn_ref allocated;
+};
+
+
+
+/* Allocating and freeing reference groups.  */
+
+
+struct ref_group *
+mn__make_ref_group (void)
+{
+  struct ref_group *g = (struct ref_group *) mn__gc_xmalloc (sizeof (*g));
+
+  g->clumps = 0;
+  g->free = 0;
+  g->allocated.next = &g->allocated;
+  g->allocated.prev = &g->allocated;
+
+  return g;
+}
+
+
+void
+mn__free_ref_group (struct ref_group *g)
+{
+  struct ref_clump *c, *next;
+
+  for (c = g->clumps; c; c = next)
+    {
+      next = c->next;
+      mn__gc_xfree (c);
+    }
+
+  mn__gc_xfree (g);
+}
+
+
+
+/* Allocating and freeing references.  */
+
+mn_ref *
+mn__make_ref (struct ref_group *g, tagged_t obj)
+{
+  mn_ref *r;
+
+  /* Are there any refs on the free list?  */
+  if (g->free)
+    {
+      /* Yes, peel off the first one.  */
+      r = g->free;
+      g->free = r->next;
+    }
+
+  /* Does the first clump have any virgin refs?  */
+  else if (g->clumps && g->first_never_used < NUM_REFS_PER_CLUMP)
+    {
+      /* Yes, peel off the next virgin ref.  */
+      r = &g->clumps->refs[g->first_never_used++];
+    }
+  else
+    {
+      /* Allocate and initialize a new reference clump, add it to the
+         group's clump list, and grab the first reference from it.  */
+      struct ref_clump *new
+        = (struct ref_clump *) mn__gc_xmalloc (sizeof (*new));
+
+      new->next = g->clumps;
+      g->clumps = new;
+      r = &new->refs[0];
+      g->first_never_used = 1;
+    }      
+
+  /* Add R to the allocated list.  */
+  r->next = g->allocated.next;
+  r->prev = &g->allocated;
+  g->allocated.next->prev = r;
+  g->allocated.next = r;
+
+  r->group = g;
+  r->obj = obj;
+
+  return r;
+}
+
+
+void
+mn__free_ref (mn_ref *r)
+{
+  struct ref_group *g = r->group;
+
+  /* Remove R from its allocated list.  */
+  r->next->prev = r->prev;
+  r->prev->next = r->next;
+
+  /* Add R to its free list.  */
+  r->next = g->free;
+  g->free = r;
+}
+
+
+void
+mn__walk_ref_group (struct ref_group *g,
+                    void *closure,
+                    void (*rootf) (void *closure, mn_ref *ref))
+{
+  mn_ref *r;
+  mn_ref *head = &g->allocated;
+
+  for (r = head->next; r != head; r = r->next)
+    rootf (closure, r);
+}
+
+
+
+
+/* Outstanding issues.  */
+
+/* Are refs too large?
+
+   At the moment, the mn_ref structure is four words long.  Most of
+   the size is related to the need to be able to free individual refs
+   in constant time.
+
+   I came up with a way to make it fit in two words, but it's hairy,
+   and I don't know for sure that it's necessary, so I took it out.
+
+   The idea was to make the allocated and free lists per-clump; that
+   way, if NUM_REFS_PER_CLUMP is 1024, you only need ten bits for
+   'next' and 'prev'; they can be bitfields.  Then, you need a way to
+   find the appropriate allocated and free lists for a given ref.  So
+   you put 'obj' in a union with 'struct ref_clump *clump', and then
+   make each clump's refs[0] special: refs[0].u.clump points to the
+   'struct ref_clump' that contains it.  (refs[0] in each clump is
+   dedicated to holding this pointer; you don't use it as a normal
+   ref.)  Now, add another 10-bit field, 'self', which holds each
+   ref's index in the 'refs' array; this means that
+   R[-R->self].u.clump is the clump containing the reference R.  Three
+   ten-bit bitfields fit in a single 32-bit word.
+
+   Anyway, you can see why I took it out.  */
+
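The two-word layout described in the "Are refs too large?" note above can be sketched as standalone C. This is an illustration only, not code from the Minor tree: struct small_ref, struct small_clump, small_clump_init and small_ref_clump are hypothetical names, and uintptr_t stands in for tagged_t. It shows the three ten-bit bitfields and how the 'self' index leads from any ref back to its clump.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REFS_PER_CLUMP 1024

struct small_clump;

/* Hypothetical two-word reference.  uintptr_t stands in for tagged_t.  */
struct small_ref
{
  union
  {
    uintptr_t obj;              /* Ordinary refs: the tagged value.  */
    struct small_clump *clump;  /* refs[0] only: the containing clump.  */
  } u;

  /* Indices into the clump's 'refs' array, ten bits each.  */
  unsigned int next : 10;
  unsigned int prev : 10;
  unsigned int self : 10;
};

struct small_clump
{
  struct small_ref refs[NUM_REFS_PER_CLUMP];
};

/* Record each ref's own index, and dedicate refs[0] to the back
   pointer.  */
static void
small_clump_init (struct small_clump *c)
{
  unsigned int i;

  for (i = 0; i < NUM_REFS_PER_CLUMP; i++)
    {
      c->refs[i].next = 0;
      c->refs[i].prev = 0;
      c->refs[i].self = i;
    }
  c->refs[0].u.clump = c;
}

/* Return the clump containing R: step back 'self' entries to reach
   refs[0], then follow its back pointer.  */
static struct small_clump *
small_ref_clump (struct small_ref *r)
{
  struct small_ref *first = r - r->self;

  assert (first->self == 0);
  return first->u.clump;
}
```

On common 32-bit and 64-bit ABIs this struct should occupy two pointer-sized words, at the cost of losing one ref per clump and of the index arithmetic on every list operation, which is presumably the hairiness the note refers to.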

Added: trunk/gc/refs.h
===================================================================
--- trunk/gc/refs.h	2003-09-02 22:31:23 UTC (rev 48)
+++ trunk/gc/refs.h	2003-09-03 04:21:16 UTC (rev 49)
@@ -0,0 +1,87 @@
+/* refs.h --- interface to heap reference management  */
+
+#ifndef MN__GC_REFS_H
+#define MN__GC_REFS_H
+
+#include "minor/minor.h"
+#include "tagged.h"
+
+
+/* Thread safety rules.  */
+
+/* The functions declared in this header are not thread-safe, except
+   where noted otherwise; they don't do any mutual exclusion or
+   synchronization.  It is their users' responsibility to ensure that
+   only one thread operates on a given reference group at a time, and
+   that memory is properly synchronized on objects that are shared
+   between threads.
+
+   However, each reference group is completely independent from the
+   others.  If a reference group is used only by a single thread, no
+   synchronization is needed.  */
+
+
+
+/* Reference groups.  */
+
+/* A group of references, which can be quickly freed en masse.  */
+struct ref_group;
+
+/* Create a new, empty reference group.  This function is
+   thread-safe.  */
+struct ref_group *mn__make_ref_group (void);
+
+/* Free the reference group G.  */
+void mn__free_ref_group (struct ref_group *g);
+
+
+/* References.  */
+
+struct mn_ref
+{
+  /* The object we refer to, as a tagged value (assuming this
+     reference is allocated).
+
+     This is volatile, to keep compilers from moving reads and writes
+     to this around reads and writes to other volatile things, like
+     threads' 'incoherent' flags.  */
+  volatile tagged_t obj;
+
+  /* The following are pretty much for internal use by refs.c.  */
+
+  /* The group this reference belongs to.  We use this to distinguish
+     local and global refs, and to find the right allocated / free
+     lists.  */
+  struct ref_group *group;
+
+  /* All the allocated references in a ref group G are in a
+     doubly-linked list, headed by G->allocated and chained through
+     these fields.  This allows the GC to traverse only the live
+     references in a group, ignoring free references, while still
+     allowing a reference to be freed quickly.
+
+     All the references in a group that have been allocated before,
+     but are free at the moment, are in a singly-linked list, headed
+     by G->free and chained through the 'next' field.  This allows us
+     to find previously freed refs in constant time.
+
+     (There may also be some refs that have never been allocated at
+     all, at the end of the first clump; see the 'first_never_used'
+     field of 'struct ref_clump'.)  */
+  struct mn_ref *next, *prev;
+};
+
+
+/* Allocate a new reference to OBJ in reference group GROUP.  */
+mn_ref *mn__make_ref (struct ref_group *group, tagged_t obj);
+
+/* Free the reference REF.  */
+void mn__free_ref (mn_ref *ref);
+
+/* Apply ROOTF to CLOSURE and each reference in G.  */
+void mn__walk_ref_group (struct ref_group *g,
+                         void *closure,
+                         void (*rootf) (void *closure, mn_ref *ref));
+
+
+#endif /* MN__GC_REFS_H */
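The list discipline described in the comments on 'struct mn_ref' above can be sketched in isolation: a circular doubly-linked list with a sentinel head for allocated entries, plus a singly-linked free list chained through 'next'. This is a minimal sketch, not the refs.c implementation; struct node, struct node_group and the node_group_* functions are hypothetical stand-ins for mn_ref, struct ref_group and their operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mn_ref: just the links and a payload.  */
struct node
{
  struct node *next, *prev;
  int payload;
};

/* Hypothetical stand-in for struct ref_group.  */
struct node_group
{
  struct node allocated;  /* Sentinel head of the circular list.  */
  struct node *free;      /* Free list, chained through 'next'.  */
};

static void
node_group_init (struct node_group *g)
{
  g->allocated.next = &g->allocated;
  g->allocated.prev = &g->allocated;
  g->free = NULL;
}

/* Link N in at the front of G's allocated list, in constant time.
   N's predecessor is the sentinel itself.  */
static void
node_group_link (struct node_group *g, struct node *n)
{
  n->next = g->allocated.next;
  n->prev = &g->allocated;
  g->allocated.next->prev = n;
  g->allocated.next = n;
}

/* Unlink N from the allocated list and push it on the free list,
   also in constant time.  N must currently be linked.  */
static void
node_group_release (struct node_group *g, struct node *n)
{
  assert (n->next && n->prev);
  n->next->prev = n->prev;
  n->prev->next = n->next;
  n->next = g->free;
  g->free = n;
}

/* Count the entries on G's allocated list; free entries are never
   visited.  */
static int
node_group_count (struct node_group *g)
{
  struct node *n;
  int count = 0;

  for (n = g->allocated.next; n != &g->allocated; n = n->next)
    count++;
  return count;
}
```

The sentinel is what keeps unlinking free of special cases: every linked entry always has a non-null 'next' and 'prev', even when it is the only entry in the list.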

Modified: trunk/gc/roots.c
===================================================================
--- trunk/gc/roots.c	2003-09-02 22:31:23 UTC (rev 48)
+++ trunk/gc/roots.c	2003-09-03 04:21:16 UTC (rev 49)
@@ -504,246 +504,6 @@
 
 
 
-/* References, reference clumps, and reference groups.  */
-
-
-/* A Minor reference.
-
-   At the moment, this is four words long.  I came up with a way to
-   make it fit in two words, but it's hairy, and I don't know for sure
-   that it's necessary, so I took it out.
-
-   (The idea was to make the allocated and free lists per-clump; that
-   way, if NUM_REFS_PER_CLUMP is 1024, you only need ten bits for
-   'next' and 'prev'; they can be bitfields.  Then, you need a way to
-   find the appropriate allocated and free lists for a given ref.  So
-   you put 'obj' in a union with 'struct ref_clump *clump', and then
-   make each clump's refs[0] special: refs[0].u.clump points to the
-   'struct clump' that contains it.  (refs[0] in each clump is
-   dedicated to holding this pointer; you don't use it as a normal
-   ref.)  Now, add another 10-bit field, 'self', which holds each
-   ref's index in the 'refs' array; this means that
-   R[-R->self].u.clump is the clump containing the reference R.  Three
-   ten-bit bitfields fit in a single 32-bit word.
-
-   Anyway, you can see why I took it out.)  */
-
-struct mn_ref
-{
-  /* The object we refer to, as a tagged value (assuming this
-     reference is allocated).
-
-     You must have your 'incoherent' flag set while accessing this.
-     If this is a global ref, it's up to the user's code to ensure that
-     two threads don't read or write a ref at the same time.
-
-     This is volatile, to keep compilers from moving reads and
-     writes to this around reads and writes to the thread's
-     'incoherent' flag, which is also volatile.  */
-  volatile tagged_t obj;
-
-  /* The group this reference belongs to.  We use this to distinguish
-     local and global refs, and to find the right allocated / free
-     lists.
-
-     You may always read this, without holding any locks.  This should
-     only be changed when a ref is being allocated or deallocated, in
-     which case nobody else should be able to see it anyway.  */
-  struct ref_group *group;
-
-  /* All the allocated references in a ref group G are in a
-     doubly-linked list, headed by G->allocated and chained through
-     these fields.  This allows the GC to traverse only the live
-     references in a group, ignoring free references, while still
-     allowing a reference to be freed quickly.
-
-     All the references in a group that have been allocated before,
-     but are free at the moment, are in a singly-linked list, headed
-     by G->free and chained through the 'next' field.  This allows us
-     to find previously freed refs in constant time.
-
-     (There may also be some refs that have never been allocated at
-     all, at the end of the first clump; see the 'first_never_used'
-     field of 'struct ref_clump'.)
-
-     If this is a global ref, you must hold mn__gc_mutex to access
-     these fields.  If this is a local ref, then only the owning
-     thread should ever access it, unless it's been stopped for
-     GC.  */
-  struct mn_ref *next, *prev;
-};
-
-
-#define NUM_REFS_PER_CLUMP (1024)
-
-
-/* A clump of references.  We allocate references a clump at a time.
-   If this clump holds global references, then you must hold
-   mn__gc_mutex to access this structure.  Otherwise, it holds local
-   references, and you must set your 'incoherent' flag while accessing
-   it.  */
-struct ref_clump
-{
-  /* All the reference clumps belonging to a particular reference
-     group are chained together in a singly-linked list.  To free the
-     reference group, we free each clump in this list.  */
-  struct ref_clump *next;
-
-  /* The references that live in this clump.  */
-  mn_ref refs[NUM_REFS_PER_CLUMP];
-
-  /* The index of the first ref in REFS that has never been allocated.
-     All refs from this point to the end of REFS are free.  If this is
-     NUM_REFS_PER_CLUMP, then every ref in this clump was allocated at
-     some point.
-
-     This must be NUM_REFS_PER_CLUMP in all but the first clump in the
-     group's clump list.  We should never need to allocate a new clump
-     if there are any other clumps containing virgin refs, which means
-     there should only ever be one such clump, which means we can
-     stipulate that it stay at the front of the list.  */
-  short first_never_used;
-};
-
-
-/* A reference group.  */
-struct ref_group
-{
-  /* All the clumps belonging to this reference group.  */
-  struct ref_clump *clumps;
-
-  /* The head of the free list.  This is a singly-linked list, chained
-   through the refs' 'next' fields.
-
-     Note that this only includes references that have been allocated
-     before, but are free at the moment.  There may also be some refs
-     that have never been allocated at all, at the end of the first
-     clump; see the 'first_never_used' field of 'struct ref_clump'.  */
-  mn_ref *free;
-
-  /* The head of the allocated list.  */
-  mn_ref allocated;
-};
-
-
-/* The group containing all global references.  You must hold
-   mn__gc_mutex to access any of the structures this points to, except
-   for the refs' 'global' and 'u.obj' fields.  */
-static struct ref_group *global_refs;
-
-
-/* Allocate a fresh reference to OBJ in group G.
-
-   You must have your 'incoherent' flag set (since you're holding a
-   direct reference to the heap), and if G is the global reference
-   group, you must hold mn__gc_mutex.  */
-static mn_ref *
-make_ref (struct ref_group *g, tagged_t obj)
-{
-  mn_ref *r;
-
-  /* Are there any refs on the free list?  */
-  if (g->free.next)
-    {
-      /* Yes, peel off the one on the front.  */
-      r = g->free.next;
-      g->free.next = r->next;
-    }
-
-  /* Does the first clump have any virgin refs?  */
-  else if (g->clumps && g->clumps->first_never_used < NUM_REFS_PER_CLUMP)
-    {
-      /* Yes, peel off the next virgin ref.  */
-      r = &g->clumps->refs[g->clumps->first_never_used++];
-    }
-  else
-    {
-      /* Allocate and initialize a new reference clump, add it to the
-         group's clump list, and grab the first reference from it.  */
-      struct ref_clump *new
-        = (struct ref_clump *) mn__gc_xmalloc (sizeof (*new));
-
-      new->next = g->clumps;
-      g->clumps = new;
-      r = &new->refs[0];
-      new->first_never_used = 1;
-    }      
-
-  /* Add R to the allocated list.  */
-  r->next = g->allocated.next;
-  r->prev = g->allocated.prev;
-  g->allocated.next->prev = r;
-  g->allocated.next = r;
-
-  r->group = g;
-  r->obj = obj;
-}
-
-
-/* Free the reference R.  
-
-   You must have your 'incoherent' flag set, and if G is the global
-   reference group, you must hold mn__gc_mutex.  */
-static void
-free_ref (mn_ref *r)
-{
-  struct group *g = r->group;
-
-  /* Remove R from its allocated list.  */
-  r->next->prev = r->prev;
-  r->prev->next = r->next;
-
-  /* Add R to its free list.  */
-  r->next = g->free;
-  g->free = r;
-}
-
-
-/* Return a new, empty reference group.  */
-static struct ref_group *
-make_ref_group (void)
-{
-  struct group *g = (struct group *) mn__gc_xmalloc (sizeof (*g));
-
-  g->clumps = 0;
-  g->free = 0;
-  g->allocated.next = &g->allocated;
-  g->allocated.prev = &g->allocated;
-
-  return g;
-}
-
-
-/* Free the reference group G.  */
-static void
-free_ref_group (struct ref_group *g)
-{
-  struct ref_clump *c, *next;
-
-  for (c = g->clumps; c; c = next)
-    {
-      next = c->next;
-      mn__gc_xfree (c);
-    }
-
-  mn__gc_xfree (g);
-}
-
-
-/* Apply ROOTF to a pointer to the 'obj' field of every reference in
-   the reference group G.  */
-static void
-walk_ref_group_refs (struct ref_group *g, void (*rootf) (tagged_t *))
-{
-  mn_ref *r;
-  mn_ref *head = &g->allocated;
-
-  for (r = head->next; r != head; r = r->next)
-    rootf (&r->obj)
-}
-
-
-
 /* Calls.  */
 
 



From minor-owner@red-bean.com Sat Sep  6 03:20:46 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h868Kknd013036
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sat, 6 Sep 2003 03:20:46 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h868Kje1013034
	for minor-commits@red-bean.com; Sat, 6 Sep 2003 03:20:45 -0500
Date: Sat, 6 Sep 2003 03:20:45 -0500
Message-Id: <200309060820.h868Kje1013034@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 50 - trunk/include/minor
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-06 03:20:42 -0500 (Sat, 06 Sep 2003)
New Revision: 50

Modified:
   trunk/include/minor/minor.h
Log:
* include/minor/minor.h: Talk about synchronization issues a bit.


Modified: trunk/include/minor/minor.h
===================================================================
--- trunk/include/minor/minor.h	2003-09-03 04:21:16 UTC (rev 49)
+++ trunk/include/minor/minor.h	2003-09-06 08:20:42 UTC (rev 50)
@@ -103,6 +103,11 @@
      reclaimed, C code must take care of freeing them at the right
      time itself; global refs are more work to manage.
 
+     Like any other kind of object shared between multiple threads,
+     it's up to the user of this interface to ensure that one thread
+     isn't using a global reference while another thread is freeing
+     it.
+
    This interface provides functions to convert local mn_refs to
    global mn_refs and vice versa, and a function to explicitly free
    refs when necessary.
@@ -121,21 +126,44 @@
    obvious way, we don't bother to give it a name in the prototype,
    for (a tiny bit of) legibility.
 
-   Except where stated otherwise, all the functions in this interface
-   promise to free any local refs they allocate before they return
-   (other than the local ref(s) they return).  This allows these
-   functions to be used within long-running loops without accumulating
-   local refs the caller has no way to free.
+   Some subtleties:
 
-   You may notice that even functions that don't need to allocate or
-   return local references still expect a call argument --- if there's
-   no need to indicate who should own any new local refs, why does the
-   function need to know the current call?  The collector also uses
-   calls internally, as a cheap way to keep track of which threads
-   might be accessing heap objects.  So any function which touches the
-   collected heap at all benefits from having a call object handy.  */
+   - Except where we state otherwise, all the functions in this
+     interface promise to free any local refs they allocate before
+     they return (other than the local ref(s) they return).  This
+     allows these functions to be used within long-running loops
+     without accumulating local refs the caller has no way to free.
 
+   - You may notice that even functions that don't need to allocate or
+     return local references still expect a call argument --- if
+     there's no need to indicate who should own any new local refs,
+     why does the function need to know the current call?  The
+     collector also uses calls internally, as a cheap way to keep
+     track of which threads might be accessing heap objects.  If
+     you've ever touched a heap object, you must have a call.  So we
+     can take care of adding a thread to our list when we allocate
+     its first call --- instead of having every function in
+     this interface check to make sure the calling thread is
+     registered.
 
+   - You may notice that this interface doesn't provide any functions
+     that change the heap object a reference refers to.  This is an
+     important property, because it allows this interface to be
+     perfectly thread-safe without doing any memory or execution
+     synchronization while accessing references.  Where the user's
+     code shares references between threads (global references only,
+     please), it's the user's responsibility to do the right sorts of
+     mutual exclusion to make that sharing kosher --- and that takes
+     care of us, as well.  The users manage the same synchronization
+     burden they've always had; we don't add to it, in complexity or
+     run-time overhead.
+
+     (Functions like mn_to_car look a little like side-effecting
+     functions, but the specification actually says they destroy the
+     original reference and return you a fresh one.  And it's always
+     up to the client code to ensure that nobody destroys an object
+     while someone else is using it.)  */
+   
 /* The mn_ref and mn_call types are opaque to clients of this file.  */
 typedef struct mn_ref mn_ref;
 typedef struct mn_call mn_call;
@@ -294,6 +322,12 @@
 /* Allocate a new pair whose car is ELT and whose cdr is LIST.  Free
    LIST, and return a reference to the new pair.
 
+   You may be tempted to construct a list by starting with mn_nil and
+   then mn_push-ing things onto it.  But since mn_push frees LIST,
+   this would free the global reference mn_nil --- which would be
+   rude.  So start by calling mn_make_local_ref or mn_make_global_ref
+   to get your own reference to nil, and then push elements onto that.
+
    This is no different from calling mn_cons and then mn_free_ref,
    except that it's a little more readable, and the implementation can
    optimize the process.  */
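The ownership rule in the mn_push comment above (the function frees LIST and hands back a fresh reference, so the caller must start from a reference it owns) can be illustrated with a toy model. Nothing here is the Minor API: toy_pair, toy_ref, toy_make_ref, toy_free_ref, toy_push and toy_demo are hypothetical, a null pointer stands in for nil, and there is no collector, so the pairs are freed by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy heap object: a pair.  A null pointer plays the role of nil.  */
struct toy_pair
{
  int car;
  struct toy_pair *cdr;
};

/* Toy reference: a separately allocated handle on a heap object.  */
struct toy_ref
{
  struct toy_pair *obj;
};

/* Number of toy refs currently allocated, for leak checking.  */
static int toy_live_refs;

static struct toy_ref *
toy_make_ref (struct toy_pair *obj)
{
  struct toy_ref *r = malloc (sizeof (*r));

  if (! r)
    abort ();
  r->obj = obj;
  toy_live_refs++;
  return r;
}

static void
toy_free_ref (struct toy_ref *r)
{
  assert (toy_live_refs > 0);
  free (r);
  toy_live_refs--;
}

/* Allocate a pair whose car is ELT and whose cdr is LIST's object;
   free LIST, and return a fresh reference to the new pair.  */
static struct toy_ref *
toy_push (int elt, struct toy_ref *list)
{
  struct toy_pair *p = malloc (sizeof (*p));

  if (! p)
    abort ();
  p->car = elt;
  p->cdr = list->obj;
  toy_free_ref (list);
  return toy_make_ref (p);
}

/* Build the list (3 2 1), starting from our own reference to nil,
   and return its length.  */
static int
toy_demo (void)
{
  struct toy_ref *list = toy_make_ref (NULL);
  struct toy_pair *p, *next;
  int length = 0;

  list = toy_push (1, list);
  list = toy_push (2, list);
  list = toy_push (3, list);

  for (p = list->obj; p; p = next)
    {
      next = p->cdr;
      length++;
      free (p);
    }
  toy_free_ref (list);
  return length;
}
```

Because each push consumes its argument, exactly one reference is live at any point in the loop. Passing a shared reference as the starting list would free a reference the caller does not own, which is the mistake the comment warns about.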



From minor-owner@red-bean.com Sat Sep  6 03:21:46 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h868Lknd013123
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sat, 6 Sep 2003 03:21:46 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h868Lka6013121
	for minor-commits@red-bean.com; Sat, 6 Sep 2003 03:21:46 -0500
Date: Sat, 6 Sep 2003 03:21:46 -0500
Message-Id: <200309060821.h868Lka6013121@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 51 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-06 03:21:42 -0500 (Sat, 06 Sep 2003)
New Revision: 51

Modified:
   trunk/gc/roots.c
Log:
* gc/roots.c: guts moved elsewhere


Modified: trunk/gc/roots.c
===================================================================
--- trunk/gc/roots.c	2003-09-06 08:20:42 UTC (rev 50)
+++ trunk/gc/roots.c	2003-09-06 08:21:42 UTC (rev 51)
@@ -1,128 +1,7 @@
 /* roots.c --- tracking roots for garbage collection.  */
 
 
-/* To begin a collection, the first task is to find each thread's PC,
-   and see what it's executing.  We use the following protocol for
-   this:
 
-   - Every function in the C API requires an mn_call object, except
-     for two: mn_thread_first_call and mn_init.  This means that we
-     can use those functions to maintain a list of all the threads
-     that could possibly operate on references.  For each such thread,
-     we register handlers for two signals, gc_wait_signal and
-     gc_resume_signal.  We use the 'sigaction' system call to ensure
-     that gc_resume_signal is blocked when gc_wait_signal's handler is
-     called.
-
-   - A thread wishing to perform a collection acquires mn__gc_mutex,
-     the global GC mutex.  This is also the mutex that protects the
-     global thread list.
-
-   - The collecting thread walks the thread list, sending every thread
-     (other than itself) gc_wait_signal.
-
-   - Each signalled thread's handler for gc_wait_signal
-     receives a pointer to a sigcontext structure as one of its
-     arguments; this structure contains the register values of the
-     interrupted code.  The handler takes the following steps:
-
-     - It stores a pointer to this sigcontext in its per-thread
-       structure, where the collecting thread can find it.
-
-     - It does a sem_post on thread_waiting_semaphore, to tell the GC
-       that it has stored its sigcontext pointer in the thread
-       structure.
-
-     - It does a sigsuspend, with every signal but gc_resume_signal
-       blocked.  That signal has a trivial handler.
-
-     - When the sigsuspend returns, the gc_wait_signal handler
-       returns.
-
-     Note that, since we're in a handler for gc_wait_signal, we know
-     that gc_resume_signal must have been blocked from the moment we
-     entered the handler, so if the collecting thread finishes the
-     collection and sends us gc_resume_signal between the time
-     we post thread_waiting_semaphore and the time we do the
-     sigsuspend, we'll still receive it.
-
-   - The collecting thread does a sem_wait once for every thread it
-     signalled.  This process will complete only when every thread has
-     posted on the semaphore.  Now we know that every thread has
-     stored its sigcontext in its thread structure, and we can find its
-     PC.
-
-   After posting to thread_waiting_semaphore, but before it is sent
-   gc_resume_signal, a thread is considered to be "waiting for
-   collection".  We use this term in describing the rules for
-   accessing some of the fields in the structures below.
-
-   Now, in order to map out these blocks of code, find out which
-   global variables they refer to, which other code blocks they jump
-   to, and how the stack frames are laid out, we need extensive
-   annotations from the compiler.  The Minor compiler provides these
-   annotations, but the C compiler does not, so we need to handle C
-   code specially.
-
-   The only pointers to heap objects C code is allowed to have are
-   those in mn_ref objects.  This makes things simple.  There are no C
-   global variables pointing into the heap.  Registers, as used by C
-   functions, don't point into the heap either: they point at refs.
-   As do stack frames.  All our difficult problems are gone.
-
-   But since even the most trivial operation on those pointers gives
-   the C compiler freedom to load them into registers, make derived
-   values, exclusive-or them with 45, do magic, and then exclusive-or
-   them back, etc., this rule basically means that C code can't
-   operate on them at all.
-
-   That's a bit harsh, so we relax it a bit.  We allow code internal
-   to the Minor C library to set a flag in the thread structure,
-   "incoherent", indicating that it's operating on heap object
-   references directly.  This flag is volatile, and has type
-   sig_atomic_t, so the gc_wait_signal signal handler can check it,
-   before it does anything else.  If it is set, then the handler sets
-   the thread's collection_waiting flag (also volatile and
-   sig_atomic_t), and returns.  When the interrupted code is finished
-   operating on pointers to heap objects, and they are all safely
-   packed away in mn_ref objects again, it clears the incoherent flag.
-   Then, if the collection_waiting flag is set, then it sends
-   itself a gc_wait_signal.
-
-   This is nice, because it means that C code can work on heap
-   references without having to acquire and release a mutex each time
-   we become incoherent, or return to coherence.  The only
-   communication necessary in those cases is with our own signal
-   handler, which we can do cheaply with volatile sig_atomic_t flags.
-   Inter-thread synchronization only takes place:
-   - when a collection is actually needed, in the gc_wait_signal
-     handler;
-   - when we need to allocate or free global references, to protect
-     the shared data structures managing global references; and
-   - when a thread gets its initial mn_call object, or when it dies,
-     to protect the global thread list.
-
-   The rules for when the various structure fields may be accessed are
-   horribly complex.  All these 'volatile' annotations, rules for
-   where hoist and drop barriers need to go, and so on, are hard to
-   keep track of.
-
-   But I think it all follows from the decision to not require the
-   user's C code to call a safe-point "check for gc" function
-   periodically, with collections blocking indefinitely if they fail
-   to do so.  In most cases, the user probably doesn't even have
-   control over all the libraries their program will be using, so they
-   can't make the safe point calls that would be needed.  Furthermore,
-   requiring safe point calls is an ongoing maintenance burden:
-   keeping track of which loops need safe point calls, and whether
-   each modification changes the status of some existing loop, is too
-   hard.
-
-   But given the decision not to require safe points, I think it
-   follows that one needs to use signals to gather threads' state.
-   And given that, one needs to worry about code reordering --- thus
-   the 'volatile' qualifiers and the hoist/drop barriers.  */
-
 #include <assert.h>
 #include <stdbool.h>
 #include <signal.h>
@@ -135,88 +14,6 @@
 
 /* The thread list.  */
 
-/* A thread that has ever had an mn_call.  Since every function in the
-   C API that operates on mn_refs requires an mn_call argument, there
-   is one of these for every thread that could have any local
-   refs.  */
-struct mn_thread
-{
-  /* Forward and backward links in the doubly-linked list of all
-     mn_thread structures, headed by 'thread_list'.  You must hold
-     mn__gc_mutex to access these fields.  */
-  struct mn_thread *prev, *next;
-
-  /* The pthread that owns this structure.  You must hold mn__gc_mutex
-     to access this field.  */
-  pthread_t thread;
-
-  /* The following fields may only be accessed by THREAD itself ---
-     either by ordinary code, or the gc_wait_signal handler.  */
-
-  /* True if this thread has any live pointers to heap objects, other
-     than in mn_refs; false otherwise.  Ordinary code sets and clears
-     this; the signal handler reads it.  */
-  volatile sig_atomic_t incoherent;
-
-  /* True if this thread should send itself gc_wait_signal once it's
-     coherent again (i.e., after clearing 'incoherent').  The signal
-     handler sets this; ordinary code reads and clears it.  */
-  volatile sig_atomic_t collection_waiting;
-
-  /* When waiting for collection, this is a sigcontext structure
-     giving the values of thread's registers when it received the
-     gc_wait_signal.  In the collecting thread, this is zero.  At
-     other times, this is garbage.
-
-     This field may only be accessed by THREAD's gc_wait_signal
-     handler when the thread is not waiting for collection, and by the
-     collecting thread while it is.  */
-  struct sigcontext *regs;
-
-  /* The youngest call in this thread.
-
-     This field, and the structures it refers to, may only be modified
-     by the thread's ordinary code while the 'incoherent' flag is set.
-
-     If the thread is going to make any changes to non-volatile fields
-     of those structures, there must be hoist and drop barriers inside
-     the set and clear of 'incoherent', to prevent the compiler from
-     moving the instructions that modify the structure outside the
-     instructions that set and clear 'incoherent'.
-
-     Note that the ref's 'obj' field is volatile, so accesses to it
-     won't be reordered with respect to assignments to the (also
-     volatile) 'incoherent' flag.  So if you are just accessing that,
-     and not freeing or allocating refs, you don't need any hoist or
-     drop barriers.  */
-  mn_call *youngest_call;
-};
-
-
-/* The head of the global list of all threads.  */
-static struct mn_thread thread_list = {
-  &thread_list, &thread_list,
-  0, 0, 0, 0, 0
-};
-
-
-/* Two pointers to the current thread's structure --- one as a
-   __thread variable, and one as a POSIX thread-specific value.
-
-   This is kind of dumb, but we need them both.  pthread_getspecific
-   isn't async-safe, so we can't use it in handle_wait_signal.
-   __thread variables are async-safe, but don't have destructor
-   functions, so we can't use it to keep the thread list up to date.
-
-   Boehm has a hash table mapping pthread_t values onto his thread
-   structures, and he knows the hash table isn't being modified while
-   he's stopping threads, so his signal handler just calls
-   pthread_self (which isn't officially async-signal-safe) and looks
-   itself up.  */
-static __thread struct mn_thread *self;
-static pthread_key_t self_key;
-
-
 /* Create an entry for the calling thread, add it to the thread list,
    and return it.  You must *not* hold mn__gc_mutex while calling this
    function.  */
@@ -262,198 +59,6 @@
 
 
 
-/* Stopping the world, and getting its threads' registers.  */
-
-/* The signal the collecting thread sends to other mutator threads to
-   tell them to stop what they're doing, record their registers, and
-   wait for the GC to complete.  This is also the signal we send them
-   to indicate that the collection is complete, and they may
-   continue.  */
-static int gc_wait_signal;
-
-/* The signal the collecting thread sends to waiting mutator threads
-   to tell them they can continue.  */
-static int gc_resume_signal;
-
-/* Mutator threads post to this semaphore to indicate that they are
-   waiting.  The collecting thread waits once on this semaphore for
-   every mutator thread.  */
-static sem_t thread_waiting_semaphore;
-
-
-/* The handler for gc_wait_signal.  */
-static void
-handle_wait_signal (int signo, siginfo_t *info, void *context)
-{
-  /* Is this thread incoherent at the moment?  */
-  if (self->incoherent)
-    {
-      /* Request that it re-send the wait signal to itself once it's
-         coherent, and return.  */
-      self->collection_waiting = true;
-      return;
-    }
-
-  /* The 'context' argument is a pointer to a sigcontext structure,
-     which holds the values this thread's registers had before it
-     received the GC wait signal.  Save that pointer in our thread
-     structure, so the collecting thread can find it.
-
-     This is really what it's all about; everything else here is just
-     synchronization chit-chat.  */
-  self->regs = (struct sigcontext *) context;
-
-  /* We've provided the info the collecting thread needs, so post to
-     thread_waiting_semaphore to allow it to continue.
-
-     sem_post is async-safe, and is memory-synchronizing, so the
-     collecting thread will see all our writes.  */
-  assert (sem_post (&thread_waiting_semaphore) == 0);
-    
-  /* Wait for the collecting thread to re-awaken us, by sending us
-     gc_resume_signal.  */
-  {
-    /* You might think there would be a race condition here: what if
-       the collecting thread completes the collection and sends us
-       gc_resume_signal before we wait for it?  But it's okay: when we
-       established the handler for gc_wait_signal, we asked that
-       gc_resume_signal also be blocked while this handler is running.
-       So if the collecting thread sends us the signal early, it'll
-       just remain pending until we do the sigsuspend here.
-
-       In fact, collection could complete and a new collection could
-       start before we get to the sigsuspend.  But that's okay, too:
-       the first gc_wait_signal has definitely been delivered, and
-       it's blocked while the signal handler is running, so that'll
-       just remain pending across this sigsuspend call, and be
-       delivered as soon as this signal handler returns.
-
-       The sig.*set functions and sigsuspend are async-safe.  */
-    sigset_t wait_for_resume_set;
-    sigfillset (&wait_for_resume_set);
-    sigdelset (&wait_for_resume_set, gc_resume_signal);
-    assert (sigsuspend (&wait_for_resume_set) == -1
-            && errno == EINTR);
-  }
-
-  /* Make sure that we can see all the work the collecting thread has
-     done.  Ensure that no reads or writes can be moved across this
-     point, by either the compiler or the memory model.  */
-  mn__memory_barrier ();
-}
-
-
-static void
-handle_resume_signal (int signo)
-{
-  /* Nothing needs to be done here.  We only send gc_resume_signal to
-     make the call to sigsuspend in handle_wait_signal return.  */
-}
-
-
-void
-mn__pause_mutator_threads ()
-{
-  struct mn_thread *t;
-
-  /* First, send all other threads the wait signal.  */
-  for (t = thread_list.next; t != &thread_list; t = t->next)
-    if (t != self)
-      assert (pthread_kill (t->thread, gc_wait_signal) == 0);
-
-  /* Now, we wait for all other threads to post to the semaphore.
-     sem_wait is a memory-synchronizing operation, so we will see all
-     threads' changes to the heap, and to their references.  */
-  for (t = thread_list.next; t != &thread_list; t = t->next)
-    if (t != self)
-      assert (sem_wait (&thread_waiting_semaphore) == 0);
-}
-
-
-void
-mn__continue_mutator_threads ()
-{
-  /* Make sure all the mutator threads can see the collection work
-     we've just done.  Ensure that no reads or writes can be moved
-     across this point, by either the compiler or the memory
-     model.  */
-  mn__memory_barrier ();
-
-  for (t = thread_list.next; t != &thread_list; t = t->next)
-    if (t != self)
-      assert (pthread_kill (t->thread, gc_resume_signal) == 0);
-}
-
-
-/* Choose signals the collecting thread should use to stop other
-   threads before a collection and resume them when we're done, and
-   set up the appropriate handlers.  */
-static void
-init_gc_signals (void)
-{
-  /* Ideally, there'd be some sanctioned way to allocate two signals
-     for our use, so we could be sure that we're not stepping on some
-     other module.  But there isn't --- not even for real-time
-     signals.  So we just hard-code things.  *sigh*  */
-#ifdef SIGRTMIN
-  /* These are chosen not to conflict with the signals Boehm's GC
-     uses, which are, in turn, chosen not to conflict with the ones
-     LinuxThreads uses.  */
-  gc_wait_signal = SIGRTMIN + 7;
-  gc_resume_signal = SIGRTMIN + 8;
-#else
-#error "cannot find an appropriate set of GC signals"
-  /* If you send me an appropriate clause for your system, I'd be
-     happy to include it amongst the above.  */
-#endif
-  
-  /* Set up handlers for the signals.  */
-  {
-    struct sigaction action, old_action;
-
-    /* The wait signal has the real handler that does all the
-       suspension work.  To avoid a race condition (described in
-       handle_wait_signal), we arrange for gc_resume_signal to be
-       blocked while the handler runs.  */
-    action.sa_sigaction = handle_wait_signal;
-    sigemptyset (&action.sa_mask);
-    sigaddset (&action.sa_mask, gc_resume_signal);
-    action.sa_flags = SA_SIGINFO;
-    sigaction (gc_wait_signal, &action, &oldaction);
-
-    /* If there was already a handler established for this signal,
-       then someone must be already using it for something else, so
-       abort.  */
-    assert (oldaction.sa_sigaction == SIG_DFL
-            && oldaction.sa_handler == SIG_DFL);
-
-    /* The resume signal has a trivial handler.  The only function of
-       this signal is to make the call to sigsuspend return.  */
-    action.sa_sigaction = handle_resume_signal;
-    sigemptyset (&action.sa_mask);
-    action.sa_flags = 0;
-    sigaction (gc_resume_signal, &action, &oldaction);
-
-    /* As above.  */
-    assert (oldaction.sa_sigaction == SIG_DFL
-            && oldaction.sa_handler == SIG_DFL);
-  }
-}
-
-
-void
-mn__walk_threads (void (*threadf) (struct sigcontext *))
-{
-  struct mn_thread *head = &thread_list;
-  struct mn_thread *t;
-
-  for (t = head->next; t != head; t = t->next)
-    if (t->context)
-      threadf (t->context);
-}
-
-
-
 /* Incoherent sections.  */
 
 /* Most of the structures in this file are shared between ordinary
@@ -593,8 +198,5 @@
 void
 mn__init_roots ()
 {
-  init_gc_signals ();
   global_refs = make_ref_group ();
-  assert (sem_init (&thread_waiting_semaphore, 0, 0) == 0);
-  assert (pthread_key_create (&self_key, self_key_destroy) == 0);
 }
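
For readers following the handler code removed above: a minimal,
self-contained sketch of the same stop/resume handshake, under some
loudly stated assumptions.  SIGUSR1 and SIGUSR2 stand in for
gc_wait_signal and gc_resume_signal, there is a single worker thread
instead of a thread list, and every name here (handle_wait,
install_handlers, pause_and_resume_once, and so on) is hypothetical
and not part of Minor.  It is meant to illustrate why blocking the
resume signal inside the wait handler closes the race, not to
reproduce the real collector.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>

#define WAIT_SIGNAL   SIGUSR1   /* stand-in for gc_wait_signal */
#define RESUME_SIGNAL SIGUSR2   /* stand-in for gc_resume_signal */

static sem_t waiting_sem;       /* worker posts: "I am parked" */
static sem_t done_sem;          /* main posts: "worker may exit" */
static volatile sig_atomic_t times_stopped;

/* Runs in the worker.  RESUME_SIGNAL is blocked for the whole handler
   (see sa_mask below), so an early resume just stays pending until
   sigsuspend atomically unblocks it.  */
static void
handle_wait (int signo)
{
  sigset_t all_but_resume;
  int saved_errno = errno;
  (void) signo;

  times_stopped++;
  sem_post (&waiting_sem);              /* async-signal-safe */

  sigfillset (&all_but_resume);
  sigdelset (&all_but_resume, RESUME_SIGNAL);
  sigsuspend (&all_but_resume);         /* returns -1 with EINTR */
  errno = saved_errno;
}

/* Trivial: exists only so that sigsuspend returns.  */
static void
handle_resume (int signo)
{
  (void) signo;
}

static int
install_handlers (void)
{
  struct sigaction action;

  action.sa_handler = handle_wait;
  sigemptyset (&action.sa_mask);
  sigaddset (&action.sa_mask, RESUME_SIGNAL);
  action.sa_flags = 0;
  if (sigaction (WAIT_SIGNAL, &action, 0) != 0)
    return -1;

  action.sa_handler = handle_resume;
  sigemptyset (&action.sa_mask);
  action.sa_flags = 0;
  return sigaction (RESUME_SIGNAL, &action, 0);
}

/* The "mutator": blocks until main says it may exit, retrying when a
   signal interrupts the wait.  */
static void *
worker (void *arg)
{
  (void) arg;
  while (sem_wait (&done_sem) != 0 && errno == EINTR)
    continue;
  return 0;
}

/* Park the worker once, resume it, and return how many times its wait
   handler ran, or -1 on a setup error.  */
static int
pause_and_resume_once (void)
{
  pthread_t th;

  if (sem_init (&waiting_sem, 0, 0) != 0
      || sem_init (&done_sem, 0, 0) != 0
      || install_handlers () != 0
      || pthread_create (&th, 0, worker, 0) != 0)
    return -1;

  pthread_kill (th, WAIT_SIGNAL);
  while (sem_wait (&waiting_sem) != 0 && errno == EINTR)
    continue;

  /* The worker is now inside handle_wait; a collector would do its
     work here.  */
  pthread_kill (th, RESUME_SIGNAL);
  sem_post (&done_sem);
  pthread_join (th, 0);
  return times_stopped;
}
```

Calling pause_and_resume_once () from a main function should park the
worker exactly once.  Unlike the code above, the sketch does not wrap
calls with side effects in assert, so it behaves the same when built
with NDEBUG.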



From minor-owner@red-bean.com Sat Sep  6 03:24:10 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h868O9nd013285
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sat, 6 Sep 2003 03:24:09 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h868O9LH013283
	for minor-commits@red-bean.com; Sat, 6 Sep 2003 03:24:09 -0500
Date: Sat, 6 Sep 2003 03:24:09 -0500
Message-Id: <200309060824.h868O9LH013283@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 52 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-06 03:24:06 -0500 (Sat, 06 Sep 2003)
New Revision: 52

Added:
   trunk/gc/gc.h
   trunk/gc/threads.c
   trunk/gc/threads.h
Removed:
   trunk/gc/roots.c
   trunk/gc/roots.h
Modified:
   trunk/gc/refs.c
   trunk/gc/refs.h
Log:
* gc/roots.c, gc/roots.h: Delete; this is now broken down into...
* gc/refs.h, gc/refs.c, gc/threads.h, gc/threads.c: ... these.


Added: trunk/gc/gc.h
===================================================================
--- trunk/gc/gc.h	2003-09-06 08:21:42 UTC (rev 51)
+++ trunk/gc/gc.h	2003-09-06 08:24:06 UTC (rev 52)
@@ -0,0 +1,17 @@
+/* gc.h --- internal interface to garbage collector.  */
+
+#ifndef MINOR_GC_H
+#define MINOR_GC_H
+
+#include <pthread.h>
+
+/* This mutex is held by the thread doing a garbage collection.  It
+   also protects various other structures in the collector.  */
+extern pthread_mutex_t mn__gc_mutex;
+
+
+/* Request a garbage collection.  */
+void mn__collect (void);
+
+
+#endif /* MINOR_GC_H */

Modified: trunk/gc/refs.c
===================================================================
--- trunk/gc/refs.c	2003-09-06 08:21:42 UTC (rev 51)
+++ trunk/gc/refs.c	2003-09-06 08:24:06 UTC (rev 52)
@@ -3,6 +3,38 @@
 #include "refs.h"
 
 
+/* The full reference structure.  */
+
+
+struct mn__ref
+{
+  /* The externally-visible fields.  */
+  struct mn_ref x;
+
+  /* The group this reference belongs to.  We use this to distinguish
+     local and global refs, and to find the right allocated / free
+     lists.  */
+  struct ref_group *group;
+
+  /* All the allocated references in a ref group G are in a
+     doubly-linked list, headed by G->allocated and chained through
+     these fields.  This allows the GC to traverse only the live
+     references in a group, ignoring free references, while still
+     allowing a reference to be freed quickly.
+
+     All the references in a group that have been allocated before,
+     but are free at the moment, are in a singly-linked list, headed
+     by G->free and chained through the 'next' field.  This allows us
+     to find previously freed refs in constant time.
+
+     (There may also be some refs that have never been allocated at
+     all, at the end of the first clump; see the 'first_never_used'
+     field of 'struct ref_clump'.)  */
+  struct mn__ref *next, *prev;
+};
+
+
+
 /* The reference clump and reference group types.  */
 
 
@@ -22,7 +54,7 @@
   struct ref_clump *next;
 
   /* The references that live in this clump.  */
-  mn_ref refs[NUM_REFS_PER_CLUMP];
+  struct mn__ref refs[NUM_REFS_PER_CLUMP];
 };
 
 
@@ -39,7 +71,7 @@
      before, but are free at the moment.  There may also be some refs
      that have never been allocated at all, at the end of the first
      clump; see the 'first_never_used' field of 'struct ref_clump'.  */
-  mn_ref *free;
+  struct mn__ref *free;
 
   /* If CLUMPS is non-zero, this is the index of the first ref in
      CLUMPS->refs that has never been allocated.  All refs from this
@@ -54,7 +86,7 @@
   short first_never_used;
 
   /* The head of the allocated list.  */
-  mn_ref allocated;
+  struct mn__ref allocated;
 };
 
 
@@ -97,7 +129,7 @@
 mn_ref *
 mn__make_ref (struct ref_group *g, tagged_t obj)
 {
-  mn_ref *r;
+  struct mn__ref *r;
 
   /* Are there any refs on the free list?  */
   if (g->free.next)
@@ -133,15 +165,16 @@
   g->allocated.next = r;
 
   r->group = g;
-  r->obj = obj;
+  r->x.obj = obj;
 
-  return r;
+  return &r->x;
 }
 
 
 void
-mn__free_ref (mn_ref *r)
+mn__free_ref (mn_ref *x)
 {
+  struct mn__ref *r = (struct mn__ref *) x;
   struct group *g = r->group;
 
   /* Remove R from its allocated list.  */
@@ -156,14 +189,13 @@
 
 void
 mn__walk_ref_group (struct ref_group *g,
-                    void *closure,
-                    void (*rootf) (void *closure, mn_ref *ref))
+                    void (*rootf) (mn_ref *ref))
 {
-  mn_ref *r;
-  mn_ref *head = &g->allocated;
+  struct mn__ref *r;
+  struct mn__ref *head = &g->allocated;
 
   for (r = head->next; r != head; r = r->next)
-    rootf (closure, r)
+    rootf (&r->x);
 }
 
 
@@ -173,9 +205,9 @@
 
 /* Are refs too large?
 
-   At the moment, the mn_ref structure is four words long.  Most of
-   the size is related to the need to be able to free individual refs
-   in constant time.
+   At the moment, struct mn__ref is four words long.  Most of the size
+   is related to the need to be able to free individual refs in
+   constant time.
 
    I came up with a way to make it fit in two words, but it's hairy,
    and I don't know for sure that it's necessary, so I took it out.

Modified: trunk/gc/refs.h
===================================================================
--- trunk/gc/refs.h	2003-09-06 08:21:42 UTC (rev 51)
+++ trunk/gc/refs.h	2003-09-06 08:24:06 UTC (rev 52)
@@ -12,13 +12,14 @@
 /* The functions declared in this header are not thread-safe, except
    where noted otherwise; they don't do any mutual exclusion or
    synchronization.  It is their users' responsibility to ensure that
-   only one thread operates on a given reference group at a time, and
-   that memory is properly synchronized on objects that are shared
-   between threads.
+   only one thread operates on a given reference group at a time, that
+   memory is properly synchronized on objects that are shared between
+   threads, and that signal handlers don't have unpleasant effects.
 
-   However, each reference group is completely independent from the
-   others.  If a reference group is used only by a single thread, no
-   synchronization is needed.  */
+   However, we do promise that each reference group is completely
+   independent from the others.  If a reference group is used only by
+   a single thread, no inter-thread synchronization is needed to use
+   any of these functions.  */
 
 
 
@@ -37,38 +38,17 @@
 
 /* References.  */
 
+
+/* This is actually just the head of a larger structure, which we
+   define fully in refs.c.  */
 struct mn_ref
 {
-  /* The object we refer to, as a tagged value (assuming this
-     reference is allocated).
+  /* The object we refer to, as a tagged value.
 
      This is volatile, to keep compilers from moving reads and writes
      to this around reads and writes to other volatile things, like
      threads' 'incoherent' flags.  */
   volatile tagged_t obj;
-
-  /* The following are pretty much for internal use by refs.c.  */
-
-  /* The group this reference belongs to.  We use this to distinguish
-     local and global refs, and to find the right allocated / free
-     lists.  */
-  struct ref_group *group;
-
-  /* All the allocated references in a ref group G are in a
-     doubly-linked list, headed by G->allocated and chained through
-     these fields.  This allows the GC to traverse only the live
-     references in a group, ignoring free references, while still
-     allowing a reference to be freed quickly.
-
-     All the references in a group that have been allocated before,
-     but are free at the moment, are in a singly-linked list, headed
-     by G->free and chained through the 'next' field.  This allows us
-     to find previously freed refs in constant time.
-
-     (There may also be some refs that have never been allocated at
-     all, at the end of the first clump; see the 'first_never_used'
-     field of 'struct ref_clump'.)  */
-  struct mn_ref *next, *prev;
 };
 
 
@@ -80,8 +60,7 @@
 
 /* Apply ROOTF to CLOSURE and each reference in G.  */
 void mn__walk_ref_group (struct ref_group *g,
-                         void *closure,
-                         void (*rootf) (void *closure, mn_ref *ref));
+                         void (*rootf) (mn_ref *ref));
 
 
 #endif /* MN__GC_REFS_H */
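
The refs.h / refs.c change above hides the bookkeeping fields behind a
public "head" structure.  A minimal sketch of that layout follows,
assuming simplified stand-in names (pub_ref, full_ref, make_ref,
ref_group_of, free_ref) and a plain long in place of tagged_t; none of
these are Minor's own.  It relies on the C guarantee that a pointer to
a structure, suitably converted, points to its first member and vice
versa, which is what makes the cast in mn__free_ref valid.

```c
#include <assert.h>
#include <stdlib.h>

/* Public head: the only part callers can see (stand-in for
   struct mn_ref).  */
struct pub_ref
{
  volatile long obj;
};

/* Full structure, private to the implementation (stand-in for
   struct mn__ref).  The public head must be the first member.  */
struct full_ref
{
  struct pub_ref x;
  int group_id;
};

/* Allocate a full structure, but hand out only a pointer to its
   head.  Returns 0 if allocation fails.  */
static struct pub_ref *
make_ref (long obj, int group_id)
{
  struct full_ref *r = malloc (sizeof (*r));
  if (! r)
    return 0;
  r->x.obj = obj;
  r->group_id = group_id;
  return &r->x;
}

/* Recover a private field from a public pointer.  */
static int
ref_group_of (struct pub_ref *x)
{
  struct full_ref *r = (struct full_ref *) x;
  assert (x != 0);
  return r->group_id;
}

static void
free_ref (struct pub_ref *x)
{
  free ((struct full_ref *) x);
}
```

A caller holding only a struct pub_ref * can read obj directly, while
the implementation casts back to reach group_id.  The cast is only
safe for pointers that make_ref produced.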

Deleted: trunk/gc/roots.c
===================================================================
--- trunk/gc/roots.c	2003-09-06 08:21:42 UTC (rev 51)
+++ trunk/gc/roots.c	2003-09-06 08:24:06 UTC (rev 52)
@@ -1,202 +0,0 @@
-/* roots.c --- tracking roots for garbage collection.  */
-
-
-
-#include <assert.h>
-#include <stdbool.h>
-#include <signal.h>
-#include <pthread.h>
-#include <semaphore.h>
-#include "minor/minor.h"
-#include "roots.h"
-#include "tagged.h"
-
-
-/* The thread list.  */
-
-/* Create an entry for the calling thread, add it to the thread list,
-   and return it.  You must *not* hold mn__gc_mutex while calling this
-   function.  */
-static struct mn_thread *
-make_thread (void)
-{
-  struct mn_thread *t = (struct mn_thread *) mn__gc_xmalloc (sizeof (*t));
-
-  t->thread = pthread_self ();
-  t->incoherent = 0;
-  t->collection_waiting = 0;
-  t->youngest_call = 0;
-
-  self = t;
-  pthread_setspecific (self_key, (void *) t);
-
-  pthread_mutex_lock (&mn__gc_mutex);
-  t->next = thread_list.next;
-  t->prev = thread_list.prev;
-  thread_list.next->prev = t;
-  thread_list.next = t;
-  pthread_mutex_unlock (&mn__gc_mutex);
-
-  return t;
-}
-
-
-/* Destructor for self_key.  Remove ourselves from the thread list,
-   and free all the memory we hold.  */
-static void
-self_key_destroy (void *self_untyped)
-{
-  pthread_mutex_lock (&mn__gc_mutex);
-  self->next->prev = self->prev;
-  self->prev->next = self->next;
-  pthread_mutex_unlock (&mn__gc_mutex);
-
-  pop_calls_up_to (0);
-
-  mn__gc_xfree (self);
-  self = 0;
-}
-
-
-
-/* Incoherent sections.  */
-
-/* Most of the structures in this file are shared between ordinary
-   code and the handler for gc_wait_signal.  (Really, they're shared
-   with the collector, but the signal handler takes care of handing
-   them off to the collector, and waiting for them to be returned, so
-   we can
-
-
-If you are going to work with any data shared with the signal
-   handler, 
-
- you should call start_incoherent_barrier.  The GC wait
-   signal handler assumes that, if the incoherent flag is clear, it
-   can go ahead and access all the thread's call, ref group, ref
-   clump, and reference data structures.  For this to be safe, the
-   ordinary code must tell the compiler not to
-
-
-/* Call this function before working with any refs' obj pointers.  
-
-
-
-   If you need to access any non-volatile fields that the signal handler
-   uses, 
-static inline
-start_incoherent (void)
-{
-  self->incoherent = true;
-}
-
-
-static inline
-end_incoherent (void)
-{
-  self->incoherent = false;
-
-  /* If the caller is accessing 
-  mn_memory_barrier ();
-
-  /* Is someone else trying to get a collection started?  */
-  if (self->collection_waiting)
-    {
-      self->collection_waiting = false;
-      pthread_kill (pthread_self (), gc_wait_signal);
-    }
-}
-
-
-
-/* Calls.  */
-
-
-/* We create one of these structures for each Minor->C call, and one
-   for the "outermost" C code, outside of any Minor call.  You must
-   have your 'incoherent' flag set to access this structure.  */
-struct mn_call
-{
-  /* The next older call on the C stack.  */
-  struct mn_call *older_call;
-
-  /* The local references that belong to this call.  */
-  struct ref_group *local_refs;
-
-  /* There will eventually need to be something here to represent the
-     Scheme continuation waiting for this C call to return.  */
-};
-
-
-/* Create a new call object for the current thread, push it on the
-   stack as the youngest call, and return it.  */
-static mn_call *
-push_call (void)
-{
-  mn_call *c = (struct mn_call *) mn__gc_xmalloc (sizeof (*c));
-  c->local_refs = make_ref_group ();
-
-  start_incoherent ();
-  c->older_call = self->youngest_call;
-  self->youngest_call = c;
-  end_incoherent ();
-
-  return c;
-}
-
-
-/* Pop calls from the current thread's stack until YOUNGEST is the
-   youngest call.  */
-static void
-pop_calls_up_to (mn_call *youngest)
-{
-  struct mn_call *here, *next;
-
-  start_incoherent ();
-
-  for (here = self->youngest_call;
-       here != youngest;
-       here = next)
-    {
-      /* If we got through the whole list and never found YOUNGEST,
-         then our caller is confused.  */
-      assert (here);
-      next = here->older_call;
-      free_ref_group (here->local_refs);
-      mn__gc_xfree (here);
-    }
-
-  self->youngest_call = youngest;
-
-  end_incoherent ();
-}
-
-
-void
-mn__walk_ref_roots (void (*rootf) (tagged_t *),
-                    void (*threadf) (tagged_t *))
-{
-  struct mn_thread *t;
-
-  for (t = thread_list.next; t != &thread_list; t = t->next)
-    {
-      mn_call *c;
-
-      assert (! t->incoherent);
-
-      for (c = t->youngest_call; c; c = c->older_call)
-        walk_ref_group_refs (c->local_refs, rootf);
-    }
-
-  walk_ref_group_refs (global_refs, rootf);
-}
-
-
-
-/* Initialization.  */
-
-void
-mn__init_roots ()
-{
-  global_refs = make_ref_group ();
-}

Deleted: trunk/gc/roots.h
===================================================================
--- trunk/gc/roots.h	2003-09-06 08:21:42 UTC (rev 51)
+++ trunk/gc/roots.h	2003-09-06 08:24:06 UTC (rev 52)
@@ -1,38 +0,0 @@
-/* roots.h --- finding garbage collection roots.  */
-
-#ifndef MN__GC_ROOTS_H
-#define MN__GC_ROOTS_H
-
-#include <signal.h>
-#include <pthread.h>
-#include "tagged.h"
-
-/* You must hold this mutex to carry out a collection.  This protects
-   various collector structures; see each structure's
-   documentation.  */
-extern pthread_mutex_t mn__gc_mutex;
-
-/* Stop all mutator threads, other than the calling thread, and
-   synchronize memory with them.  You must hold mn__gc_mutex while
-   calling this function.  */
-void mn__pause_mutator_threads (void);
-
-/* Apply ROOTF to a pointer to the heap pointer in every live
-   reference in the system, both local and global.  You must hold
-   mn__gc_mutex while calling this function, and all mutator threads
-   must be stopped (other than the caller).  */
-void mn__walk_ref_roots (void (*rootf) (tagged_t *));
-
-/* Apply THREADF to a pointer to a sigcontext structure holding the
-   registers of every thread in the system that could be running
-   Scheme code; if THREADF changes register values, the thread will
-   resume with the changed values.  You must hold mn__gc_mutex to call
-   this function, and all mutator threads must be stopped (other than
-   the caller).  */
-void mn__walk_threads (void (*threadf) (struct sigcontext *));
-
-/* Allow all mutator threads to continue running.  You must hold
-   mn__gc_mutex while calling this function.  */
-void mn__continue_mutator_threads (void);
-   
-#endif /* MN__GC_ROOTS_H */
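
The start_incoherent / end_incoherent pair in the deleted roots.c was
left unfinished.  Here is a minimal single-threaded sketch of the idea
it was reaching for, assuming SIGUSR1 as a stand-in for the GC wait
signal and a hypothetical stops_handled counter in place of the real
suspension work; the names are illustrative and not Minor's API.
While the flag is set, the handler only records that a stop was
requested; clearing the flag re-sends the signal so that the stop
happens at a coherent point.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

static volatile sig_atomic_t incoherent;
static volatile sig_atomic_t collection_waiting;
static volatile sig_atomic_t stops_handled;  /* hypothetical counter */

/* Stand-in for the wait-signal handler.  */
static void
handle_wait (int signo)
{
  (void) signo;

  if (incoherent)
    {
      /* Not safe to stop now; ask the interrupted code to re-send
         the signal once it is coherent again.  */
      collection_waiting = 1;
      return;
    }

  /* The real handler would save its context and suspend here.  */
  stops_handled++;
}

static int
install_wait_handler (void)
{
  struct sigaction action;

  action.sa_handler = handle_wait;
  sigemptyset (&action.sa_mask);
  action.sa_flags = 0;
  return sigaction (SIGUSR1, &action, 0);
}

static void
start_incoherent (void)
{
  incoherent = 1;
}

static void
end_incoherent (void)
{
  incoherent = 0;

  /* Did a stop request arrive while we were incoherent?  */
  if (collection_waiting)
    {
      collection_waiting = 0;
      raise (SIGUSR1);
    }
}
```

If a signal arrives between start_incoherent () and end_incoherent (),
stops_handled stays unchanged until end_incoherent () runs.  The
sketch leaves out the hoist and drop barriers that code touching
non-volatile shared fields would also need.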

Added: trunk/gc/threads.c
===================================================================
--- trunk/gc/threads.c	2003-09-06 08:21:42 UTC (rev 51)
+++ trunk/gc/threads.c	2003-09-06 08:24:06 UTC (rev 52)
@@ -0,0 +1,543 @@
+/* threads.c --- implementation of GC thread structure.  */
+
+#include <pthread.h>
+#include "gc_xmalloc.h"
+#include "gc.h"
+#include "refs.h"
+#include "threads.h"
+
+
+/* The full internal thread structure.  */
+
+struct mn__thread
+{
+  /* The externally visible portion of the structure.  */
+  struct mn_thread x;
+
+  /* Forward and backward links in the doubly-linked list of all
+     mn__thread structures, headed by 'thread_list'.  You must hold
+     mn__gc_mutex to access these fields.  */
+  struct mn__thread *prev, *next;
+
+  /* The pthread that owns this structure.  You must hold mn__gc_mutex
+     to access this field.  */
+  pthread_t thread;
+
+  /* When waiting for collection, this is a sigcontext structure
+     giving the values of thread's registers when it received the
+     mn__gc_wait_signal.  In the collecting thread, this is zero.  At
+     other times, this is garbage.
+
+     This field may only be accessed by THREAD's mn__gc_wait_signal
+     handler when the thread is not waiting for collection, and by the
+     collecting thread while it is.  */
+  struct sigcontext *regs;
+
+  /* The youngest call in this thread.
+
+     This field, and the structures it refers to, may only be modified
+     by the thread's ordinary code while the 'incoherent' flag is set.
+
+     If the thread is going to make any changes to non-volatile fields
+     of those structures, there must be hoist and drop barriers inside
+     the set and clear of 'incoherent', to prevent the compiler from
+     moving the instructions that modify the structure outside the
+     instructions that set and clear 'incoherent'.
+
+     Note that the ref's 'obj' field is volatile, so accesses to it
+     won't be reordered with respect to assignments to the (also
+     volatile) 'incoherent' flag.  So if you are just accessing that,
+     and not freeing or allocating refs, you don't need any hoist or
+     drop barriers.  */
+  mn_call *youngest_call;
+};
+
+
+
+/* The thread list.  */
+
+
+/* The head of the global list of all threads.  */
+static struct mn__thread thread_list = {
+  { 0, 0 },
+  &thread_list, &thread_list,
+  0, /* but what if pthread_t isn't an int type?  */
+  0, 0
+};
+
+
+/* Two pointers to the current thread's structure --- one as a
+   __thread variable, and one as a POSIX thread-specific value.
+
+   This is kind of dumb, but we need them both.  pthread_getspecific
+   isn't async-safe, so we can't use it in handle_wait_signal.
+   __thread variables are async-safe, but don't have destructor
+   functions, so we can't use it to keep the thread list up to date.
+
+   Boehm has a hash table mapping pthread_t values onto his thread
+   structures, and he knows the hash table isn't being modified while
+   he's stopping threads, so his signal handler just calls
+   pthread_self (which isn't officially async-signal-safe) and looks
+   itself up.  */
+__thread struct mn_thread *mn__thread_self;
+static pthread_key_t thread_self_key;
+
+
+/* Create an entry for the calling thread, add it to the thread list,
+   and return it.  You must *not* hold mn__gc_mutex while calling this
+   function.  */
+static struct mn__thread *
+make_thread (void)
+{
+  struct mn__thread *t = (struct mn__thread *) mn__gc_xmalloc (sizeof (*t));
+
+  t->thread = pthread_self ();
+  t->x.incoherent = 0;
+  t->x.collection_waiting = 0;
+  t->regs = 0;
+  t->youngest_call = 0;
+
+  mn__thread_self = &t->x;
+  pthread_setspecific (thread_self_key, t);
+
+  pthread_mutex_lock (&mn__gc_mutex);
+  t->next = thread_list.next;
+  t->prev = &thread_list;
+  thread_list.next->prev = t;
+  thread_list.next = t;
+  pthread_mutex_unlock (&mn__gc_mutex);
+
+  return t;
+}
+
+
+/* Destructor for thread_self_key.  Remove ourselves from the thread list,
+   and free all the memory we hold.  */
+static void
+thread_self_key_destroy (void *self_untyped)
+{
+  struct mn__thread *t = (struct mn__thread *) mn__thread_self;
+
+  pthread_mutex_lock (&mn__gc_mutex);
+  t->next->prev = t->prev;
+  t->prev->next = t->next;
+  pthread_mutex_unlock (&mn__gc_mutex);
+
+  /* Free any active calls, and their reference groups.  */
+  mn__pop_to (0);
+
+  mn__gc_xfree (t);
+  mn__thread_self = 0;
+}
+
+
+
+/* Stopping and resuming the world.  */
+
+/* To begin a collection, the first task is to find each thread's PC,
+   and see what it's executing.  We use the following protocol for
+   this:
+
+   - Every function in the C API requires an mn_call object, except
+     for two: mn_thread_first_call and mn_init.  This means that we
+     can use those functions to maintain a list of all the threads
+     that could possibly operate on references.  For each such thread,
+     we register handlers for two signals, mn__gc_wait_signal and
+     gc_resume_signal.  We use the 'sigaction' system call to ensure
+     that gc_resume_signal is blocked when mn__gc_wait_signal's handler is
+     called.
+
+   - A thread wishing to perform a collection acquires mn__gc_mutex,
+     the global GC mutex.  This is also the mutex that protects the
+     global thread list.
+
+   - The collecting thread walks the thread list, sending every thread
+     (other than itself) mn__gc_wait_signal.
+
+   - Each signalled thread's handler for mn__gc_wait_signal
+     receives a pointer to a sigcontext structure as one of its
+     arguments; this structure contains the register values of the
+     interrupted code.  The handler takes the following steps:
+
+     - It stores a pointer to this sigcontext in its per-thread
+       structure, where the collecting thread can find it.
+
+     - It does a sem_post on thread_waiting_semaphore, to tell the GC
+       that it has stored its sigcontext pointer in the thread
+       structure.
+
+     - It does a sigsuspend, with every signal but gc_resume_signal
+       blocked.  That signal has a trivial handler.
+
+     - When the sigsuspend returns, the mn__gc_wait_signal handler
+       returns.
+
+     Note that, since we're in a handler for mn__gc_wait_signal, we know
+     that gc_resume_signal must have been blocked from the moment we
+     entered the handler, so if the collecting thread finishes the
+     collection and sends us gc_resume_signal between the time
+     we post thread_waiting_semaphore and the time we do the
+     sigsuspend, we'll still receive it.
+
+   - The collecting thread does a sem_wait once for every thread it
+     signalled.  This process will complete only when every thread has
+     posted on the semaphore.  Now we know that every thread has
+     stored its sigcontext in its thread structure, and we can find its
+     PC.
+
+   After posting to thread_waiting_semaphore, but before it is sent
+   gc_resume_signal, a thread is considered to be "waiting for
+   collection".  We use this term in describing the rules for
+   accessing some of the fields in the structures below.
+
+   Now, in order to map out these blocks of code, find out which
+   global variables they refer to, which other code blocks they jump
+   to, and how the stack frames are laid out, we need extensive
+   annotations from the compiler.  The Minor compiler provides these
+   annotations, but the C compiler does not, so we need to handle C
+   code specially.
+
+   The only pointers to heap objects C code is allowed to have are
+   those in mn_ref objects.  This makes things simple.  There are no C
+   global variables pointing into the heap.  Registers, as used by C
+   functions, don't point into the heap either: they point at refs.
+   As do stack frames.  All our difficult problems are gone.
+
+   But since even the most trivial operation on those pointers gives
+   the C compiler freedom to load them into registers, make derived
+   values, exclusive-or them with 45, do magic, and then exclusive-or
+   them back, etc., this rule basically means that C code can't
+   operate on them at all.
+
+   That's a bit harsh, so we relax it a bit.  We allow code internal
+   to the Minor C library to set a flag in the thread structure,
+   "incoherent", indicating that it's operating on heap object
+   references directly.  This flag is volatile, and has type
+   sig_atomic_t, so the mn__gc_wait_signal signal handler can check it,
+   before it does anything else.  If it is set, then the handler sets
+   the thread's collection_waiting flag (also volatile and
+   sig_atomic_t), and returns.  When the interrupted code is finished
+   operating on pointers to heap objects, and they are all safely
+   packed away in mn_ref objects again, it clears the incoherent flag.
+   Then, if the collection_waiting flag is set, it clears that flag
+   and sends itself mn__gc_wait_signal.
+
+   This is nice, because it means that C code can work on heap
+   references without having to acquire and release a mutex each time
+   we become incoherent, or return to coherence.  The only
+   communication necessary in those cases is with our own signal
+   handler, which we can do cheaply with volatile sig_atomic_t flags.
+   Inter-thread synchronization only takes place:
+   - when a collection is actually needed, in the mn__gc_wait_signal
+     handler;
+   - when we need to allocate or free global references, to protect
+     the shared data structures managing global references; and
+   - when a thread gets its initial mn_call object, or when it dies,
+     to protect the global thread list.
+
+   The rules for when the various structure fields may be accessed are
+   horribly complex.  All these 'volatile' annotations, rules for
+   where hoist and drop barriers need to go, and so on, are hard to
+   keep track of.
+
+   But I think it all follows from the decision to not require the
+   user's C code to call a safe-point "check for gc" function
+   periodically, with collections blocking indefinitely if they fail
+   to do so.  In most cases, the user probably doesn't even have
+   control over all the libraries their program will be using, so they
+   can't make the safe point calls that would be needed.  Furthermore,
+   requiring safe point calls is an ongoing maintenance burden:
+   keeping track of which loops need safe point calls, and whether
+   each modification changes the status of some existing loop, is too
+   hard.
+
+   But given the decision not to require safe points, I think it
+   follows that one needs to use signals to gather threads' state.
+   And given that, one needs to worry about code reordering --- thus
+   the 'volatile' qualifiers and the hoist/drop barriers.  */
+
+/* The signal the collecting thread sends to other mutator threads to
+   tell them to stop what they're doing, record their registers, and
+   wait for the GC to complete.  A separate signal, gc_resume_signal
+   (below), tells them that the collection is complete and they may
+   continue.  */
+int mn__gc_wait_signal;
+
+/* The signal the collecting thread sends to waiting mutator threads
+   to tell them they can continue.  */
+static int gc_resume_signal;
+
+/* Mutator threads post to this semaphore to indicate that they are
+   waiting.  The collecting thread waits once on this semaphore for
+   every mutator thread.  */
+static sem_t thread_waiting_semaphore;
+
+
+/* The handler for mn__gc_wait_signal.  */
+static void
+handle_wait_signal (int signo, siginfo_t *info, void *context)
+{
+  struct mn__thread *t;
+
+  assert (signo == mn__gc_wait_signal);
+
+  /* Is this thread incoherent at the moment?  */
+  if (mn__thread_self->incoherent)
+    {
+      /* Request that it re-send the wait signal to itself once it's
+         coherent, and return.  */
+      mn__thread_self->collection_waiting = true;
+      return;
+    }
+
+  t = (struct mn__thread *) mn__thread_self;
+
+  /* The 'context' argument is a pointer to a sigcontext structure,
+     which holds the values this thread's registers had before it
+     received the GC wait signal.  Save that pointer in our thread
+     structure, so the collecting thread can find it.
+
+     This is really what it's all about; everything else in this
+     function is just synchronization chit-chat.  */
+  t->regs = (struct sigcontext *) context;
+
+  /* We've provided the info the collecting thread needs, so post to
+     thread_waiting_semaphore to allow it to continue.
+
+     sem_post is async-safe, and is memory-synchronizing, so the
+     collecting thread will see all our writes.  */
+  assert (sem_post (&thread_waiting_semaphore) == 0);
+    
+  /* Wait for the collecting thread to re-awaken us, by sending us
+     gc_resume_signal.  */
+  {
+    /* You might think there would be a race condition here: what if
+       the collecting thread completes the collection and sends us
+       gc_resume_signal before we wait for it?  But it's okay: when we
+       established the handler for mn__gc_wait_signal, we asked that
+       gc_resume_signal also be blocked while this handler is running.
+       So if the collecting thread sends us the resume signal early,
+       it'll just remain pending until we do the sigsuspend here.
+
+       In fact, collection could complete and a new collection could
+       start before we get to the sigsuspend.  But that's okay, too:
+       the first mn__gc_wait_signal has definitely been delivered, and
+       it's blocked while the signal handler is running, so the second
+       sending will just remain pending across this sigsuspend call,
+       and be delivered as soon as this signal handler returns.
+
+       The sig.*set functions and sigsuspend are async-safe.  */
+    sigset_t wait_for_resume_set;
+    sigfillset (&wait_for_resume_set);
+    sigdelset (&wait_for_resume_set, gc_resume_signal);
+    assert (sigsuspend (&wait_for_resume_set) == -1
+            && errno == EINTR);
+  }
+
+  /* Make sure we can see all the work the collecting thread has done.
+     Ensure that no reads or writes can be moved across this point, by
+     either the compiler or the memory model.  */
+  mn__memory_barrier ();
+}
+
+
+static void
+handle_resume_signal (int signo)
+{
+  /* Nothing needs to be done here.  We only send gc_resume_signal to
+     make the call to sigsuspend in handle_wait_signal return.  */
+  assert (signo == gc_resume_signal);
+}
+
+
+void
+mn__pause_mutator_threads ()
+{
+  struct mn__thread *t;
+
+  /* First, send all other threads the wait signal.  */
+  for (t = thread_list.next; t != &thread_list; t = t->next)
+    if (&t->x != mn__thread_self)
+      assert (pthread_kill (t->thread, mn__gc_wait_signal) == 0);
+
+  /* Now, we wait for all other threads to post to the semaphore.
+     sem_wait is a memory-synchronizing operation, so we will see all
+     threads' changes to the heap, and to their references.  */
+  for (t = thread_list.next; t != &thread_list; t = t->next)
+    if (&t->x != mn__thread_self)
+      assert (sem_wait (&thread_waiting_semaphore) == 0);
+}
+
+
+void
+mn__walk_threads (void (*threadf) (struct sigcontext *))
+{
+  struct mn__thread *head = &thread_list;
+  struct mn__thread *t;
+
+  for (t = head->next; t != head; t = t->next)
+    if (t->regs)
+      threadf (t->regs);
+}
+
+
+void
+mn__continue_mutator_threads ()
+{
+  struct mn__thread *t;
+  /* Make sure all the mutator threads can see the collection work
+     we've just done.  Ensure that no reads or writes can be moved
+     across this point, by either the compiler or the memory
+     model.  */
+  mn__memory_barrier ();
+
+  for (t = thread_list.next; t != &thread_list; t = t->next)
+    if (&t->x != mn__thread_self)
+      assert (pthread_kill (t->thread, gc_resume_signal) == 0);
+}
+
+
+/* Choose signals the collecting thread should use to stop other
+   threads before a collection and resume them when we're done, and
+   set up the appropriate handlers.  */
+static void
+init_gc_signals (void)
+{
+  /* Ideally, there'd be some sanctioned way to allocate two signals
+     for our use, so we could be sure that we're not stepping on some
+     other module.  But there isn't --- not even for real-time
+     signals.  So we just hard-code things.  *sigh*  */
+#ifdef SIGRTMIN
+  /* These are chosen not to conflict with the signals Boehm's GC
+     uses, which are, in turn, chosen not to conflict with the ones
+     LinuxThreads uses.  */
+  mn__gc_wait_signal = SIGRTMIN + 7;
+  gc_resume_signal = SIGRTMIN + 8;
+#else
+#error "cannot find an appropriate set of GC signals"
+  /* If you send me an appropriate clause for your system, I'd be
+     happy to include it amongst the above.  */
+#endif
+  
+  /* Set up handlers for the signals.  */
+  {
+    struct sigaction action, oldaction;
+
+    /* The wait signal has the real handler that does all the
+       suspension work.  To avoid a race condition (described in
+       handle_wait_signal), we arrange for gc_resume_signal to be
+       blocked while the handler runs.  */
+    action.sa_sigaction = handle_wait_signal;
+    sigemptyset (&action.sa_mask);
+    sigaddset (&action.sa_mask, gc_resume_signal);
+    action.sa_flags = SA_SIGINFO;
+    sigaction (mn__gc_wait_signal, &action, &oldaction);
+
+    /* If there was already a handler established for this signal,
+       then someone must be already using it for something else, so
+       abort.  */
+    assert (oldaction.sa_sigaction == SIG_DFL
+            && oldaction.sa_handler == SIG_DFL);
+
+    /* The resume signal has a trivial handler.  The only function of
+       this signal is to make the call to sigsuspend return.  */
+    action.sa_handler = handle_resume_signal;
+    sigemptyset (&action.sa_mask);
+    action.sa_flags = 0;
+    sigaction (gc_resume_signal, &action, &oldaction);
+
+    /* As above.  */
+    assert (oldaction.sa_sigaction == SIG_DFL
+            && oldaction.sa_handler == SIG_DFL);
+  }
+}
+
+
+
+/* Calls and local references.  */
+
+
+/* We create one of these structures for each Minor->C call, and one
+   for the "outermost" C code, outside of any Minor call.  You must
+   have your 'incoherent' flag set to access this structure.  */
+struct mn_call
+{
+  /* The next older call on the C stack.  */
+  struct mn_call *older_call;
+
+  /* The local references that belong to this call.  */
+  struct ref_group *local_refs;
+
+  /* You'd think there would need to be something here to represent
+     the Scheme continuation waiting for this C call to return.  But
+     actually, that's just a local variable in the stack frame below
+     the C call.  (Of course, that variable's value is a ref, in the
+     ref_group above.)  */
+};
+
+
+
+mn_call *
+mn__push_call (void)
+{
+  struct mn__thread *t = (struct mn__thread *) mn__thread_self;
+  mn_call *c = (struct mn_call *) mn__gc_xmalloc (sizeof (*c));
+  c->local_refs = make_ref_group ();
+
+  mn__begin_incoherent ();
+  c->older_call = t->youngest_call;
+  t->youngest_call = c;
+  mn__end_incoherent ();
+
+  return c;
+}
+
+
+void
+mn__pop_to (mn_call *call)
+{
+  struct mn__thread *t = (struct mn__thread *) mn__thread_self;
+
+  mn__begin_incoherent ();
+
+  while (t->youngest_call != call)
+    {
+      mn_call *here = t->youngest_call;
+      t->youngest_call = here->older_call;
+      free_ref_group (here->local_refs);
+      mn__gc_xfree (here);
+    }
+
+  assert (t->youngest_call == call);
+
+  mn__end_incoherent ();
+}
+
+
+mn_ref *
+mn__make_local_ref (mn_call *call, tagged_t obj)
+{
+  return mn__make_ref (call->local_refs, obj);
+}
+
+
+void
+mn__walk_local_refs (void (*rootf) (tagged_t *))
+{
+  struct mn__thread *t = (struct mn__thread *) mn__thread_self;
+  mn_call *c;
+
+  for (c = t->youngest_call; c; c = c->older_call)
+    mn__walk_ref_group (c->local_refs, rootf);
+}
+
+
+/* Initialization.  */
+
+void
+mn__gc_threads_init (void)
+{
+  init_gc_signals ();
+  assert (sem_init (&thread_waiting_semaphore, 0, 0) == 0);
+  assert (pthread_key_create (&thread_self_key, thread_self_key_destroy) == 0);
+}

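A note on the race-condition argument in handle_wait_signal above: the
claim that an early resume signal simply stays pending until the
sigsuspend call can be checked in isolation.  The following is a minimal
single-threaded sketch, not code from Minor.  It assumes SIGUSR1 as a
stand-in for gc_resume_signal, uses sigprocmask where the real handler
relies on sa_mask, and the names resume_seen, on_resume and
early_resume_is_not_lost are invented for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>

/* Set by the handler; stands in for "the resume signal arrived".  */
volatile sig_atomic_t resume_seen;

void
on_resume (int signo)
{
  (void) signo;
  resume_seen = 1;
}

/* Send ourselves the "resume" signal while it is blocked, then wait
   for it with sigsuspend.  Returns 1 if the signal stayed pending
   and was delivered inside sigsuspend, 0 otherwise.  */
int
early_resume_is_not_lost (void)
{
  struct sigaction action;
  sigset_t block, old, wait_set;
  int delivered_early, result, ok;

  action.sa_handler = on_resume;
  sigemptyset (&action.sa_mask);
  action.sa_flags = 0;
  if (sigaction (SIGUSR1, &action, 0) != 0)
    return 0;

  /* Block the signal, as sa_mask does for gc_resume_signal while
     the wait handler runs.  */
  sigemptyset (&block);
  sigaddset (&block, SIGUSR1);
  if (sigprocmask (SIG_BLOCK, &block, &old) != 0)
    return 0;

  /* The "collector" resumes us before we have started waiting.  */
  resume_seen = 0;
  raise (SIGUSR1);
  delivered_early = resume_seen;

  /* Wait with every signal blocked except the resume signal.  */
  sigfillset (&wait_set);
  sigdelset (&wait_set, SIGUSR1);
  result = sigsuspend (&wait_set);

  /* Evaluate before restoring the mask, which may change errno.  */
  ok = (! delivered_early
        && result == -1 && errno == EINTR
        && resume_seen);
  sigprocmask (SIG_SETMASK, &old, 0);
  return ok;
}
```

Calling early_resume_is_not_lost () from a single-threaded main should
yield 1: the handler does not run when the signal is raised, only once
sigsuspend unblocks it.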
Added: trunk/gc/threads.h
===================================================================
--- trunk/gc/threads.h	2003-09-06 08:21:42 UTC (rev 51)
+++ trunk/gc/threads.h	2003-09-06 08:24:06 UTC (rev 52)
@@ -0,0 +1,149 @@
+/* threads.h --- interface to thread tracking code  */
+
+#ifndef MN__GC_THREADS_H
+#define MN__GC_THREADS_H
+
+#include <pthread.h>
+#include "refs.h"
+
+
+/* The thread structure.  */
+
+/* A thread that has ever had an mn_call.  Since every function in the
+   C API that operates on mn_refs requires an mn_call argument, every
+   thread that could ever have operated on a reference has a struct
+   mn_thread.
+
+   This is actually just the head of a larger structure, defined fully
+   in threads.c.  */
+struct mn_thread
+{
+  /* The following fields may only be accessed by THREAD itself ---
+     either by ordinary code, or the mn__gc_wait_signal handler.  */
+
+  /* Non-zero if this thread has any live pointers to heap objects,
+     other than in mn_refs; zero otherwise.  Ordinary code sets and
+     clears this; the signal handler reads it.  */
+  volatile sig_atomic_t incoherent;
+
+  /* Non-zero if this thread should send itself mn__gc_wait_signal
+     once it's coherent again (i.e., after clearing 'incoherent');
+     zero otherwise.  The signal handler sets this; ordinary code
+     reads and clears it.  */
+  volatile sig_atomic_t collection_waiting;
+};
+
+
+/* A pointer to the mn_thread structure for the current thread, if
+   we've ever allocated an mn_call (which we would have had to do
+   before referring to any heap objects).  */
+extern __thread struct mn_thread *mn__thread_self;
+
+
+/* The signal we receive when some other thread wants us to pause for
+   a garbage collection.  If our incoherent flag is set, this signal's
+   handler sets our 'collection_waiting' flag, and returns; we must
+   re-kill ourselves when we're coherent again (please don't tell our
+   therapist we said that.  Ahem).
+
+   More details in threads.c.  */
+extern int mn__gc_wait_signal;
+
+
+
+/* Incoherent sections.  */
+
+/* In order to operate on any tagged_t values directly, C code (local
+   to the Minor implementation) needs to first mark itself
+   "incoherent", meaning that garbage collection must not take place,
+   operate on the values, and then mark itself "coherent" again.  If
+   some other thread requests a collection while we are incoherent,
+   our signal handler sets our "collection_waiting" flag, which we
+   must check each time we become coherent again.
+
+   These functions take care of those details for us.
+
+   (What makes things vastly simpler is that we only need to
+   coordinate with our own signal handler, not with other threads.
+   Only our signal handler can pass our state off to the collecting
+   thread, and the nasty synchronization happens there.)  */
+static inline void
+mn__begin_incoherent (void)
+{
+  mn__thread_self->incoherent++;
+}
+
+
+static inline void
+mn__end_incoherent (void)
+{
+  mn__thread_self->incoherent--;
+  if (! mn__thread_self->incoherent
+      && mn__thread_self->collection_waiting)
+    pthread_kill (pthread_self (), mn__gc_wait_signal);
+}
+
+
+
+/* Pausing the world, grunging through its registers, and resuming it.  */
+
+
+/* Stop all mutator threads, other than the calling thread, and
+   synchronize memory with them.
+
+   You must hold mn__gc_mutex while calling this function.  */
+void mn__pause_mutator_threads (void);
+
+
+/* Apply ROOTF to every local reference held by any thread in the
+   system.
+
+   You must hold mn__gc_mutex while calling this function, and all
+   mutator threads must be stopped (other than the calling thread, of
+   course).  */
+void mn__walk_local_refs (void (*rootf) (tagged_t *));
+
+
+/* Apply THREADF to a pointer to a sigcontext structure holding the
+   registers of every thread in the system that could be running
+   Scheme code; if THREADF changes register values, the thread will
+   resume with the changed values.
+
+   You must hold mn__gc_mutex to call this function, and all mutator
+   threads must be stopped (other than the caller).  */
+void mn__walk_threads (void (*threadf) (struct sigcontext *));
+
+
+/* Allow all mutator threads to resume execution.
+
+   You must hold mn__gc_mutex while calling this function.  */
+void mn__continue_mutator_threads (void);
+
+
+
+/* Calls and local references.  */
+
+
+/* Push a new call object onto the current thread, and return it.  */
+mn_call *mn__push_call (void);
+
+
+/* Pop calls from the current thread until CALL is the top call on the
+   stack.  This frees all local references owned by the popped calls.  */
+void mn__pop_to (mn_call *call);
+
+
+/* Create a local reference to OBJ, owned by CALL.  You must be
+   incoherent while calling this, since you've obviously got a direct
+   reference to a heap object.  */
+mn_ref *mn__make_local_ref (mn_call *call, tagged_t obj);
+
+
+
+/* Module initialization.  */
+
+/* Initialize this module.  To be called from mn_init.  */
+void mn__gc_threads_init (void);
+
+
+#endif /* MN__GC_THREADS_H */

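The incoherent-section protocol described in threads.h above can be
tried out without any threads at all, since it only coordinates a
thread with its own signal handler.  What follows is a hedged sketch
under stated assumptions, not the Minor implementation: SIGUSR1 stands
in for mn__gc_wait_signal, plain globals stand in for the fields of
struct mn_thread, and the handler merely counts a "pause" instead of
suspending.  The sketch also clears collection_waiting before
re-sending the signal, following the field's comment ("ordinary code
reads and clears it"); that step is an assumption of the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-ins for the fields of struct mn_thread.  */
volatile sig_atomic_t incoherent;
volatile sig_atomic_t collection_waiting;

/* Counts how many times the handler actually "paused for GC".  */
volatile sig_atomic_t pauses;

void
on_wait_signal (int signo)
{
  (void) signo;
  if (incoherent)
    collection_waiting = 1;     /* Defer: ask for a re-send later.  */
  else
    pauses++;                   /* Coherent: safe to pause here.  */
}

void
begin_incoherent (void)
{
  incoherent++;
}

void
end_incoherent (void)
{
  incoherent--;
  if (! incoherent && collection_waiting)
    {
      /* Assumption of this sketch: clear the flag before re-sending,
         so that one request leads to exactly one pause.  */
      collection_waiting = 0;
      raise (SIGUSR1);
    }
}

/* Deliver the wait signal inside a nested incoherent section.
   Returns the number of pauses seen once fully coherent, or -1 if
   setup failed or a pause happened while still incoherent.  */
int
deferred_pause_demo (void)
{
  struct sigaction action;
  int pauses_while_incoherent;

  action.sa_handler = on_wait_signal;
  sigemptyset (&action.sa_mask);
  action.sa_flags = 0;
  if (sigaction (SIGUSR1, &action, 0) != 0)
    return -1;

  incoherent = 0;
  collection_waiting = 0;
  pauses = 0;

  begin_incoherent ();
  begin_incoherent ();          /* Sections nest.  */
  raise (SIGUSR1);              /* "Collector" asks us to pause.  */
  end_incoherent ();            /* Still incoherent: no pause yet.  */
  pauses_while_incoherent = pauses;
  end_incoherent ();            /* Coherent again: pause happens.  */

  return pauses_while_incoherent == 0 ? pauses : -1;
}
```

Using a counter rather than a boolean for 'incoherent' is what lets
sections nest: only the outermost end_incoherent re-sends the signal.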


From minor-owner@red-bean.com Sat Sep  6 15:57:40 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h86Kvdnd018885
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Sat, 6 Sep 2003 15:57:40 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h86Kvd7p018883
	for minor-commits@red-bean.com; Sat, 6 Sep 2003 15:57:39 -0500
Date: Sat, 6 Sep 2003 15:57:39 -0500
Message-Id: <200309062057.h86Kvd7p018883@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 53 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-06 15:57:36 -0500 (Sat, 06 Sep 2003)
New Revision: 53

Added:
   trunk/gc/map.h
Removed:
   trunk/gc/generic-map.h
Log:
* gc/map.h: Renamed from gc/generic-map.h.  We'll use a more traditional 
config file arrangement.


Deleted: trunk/gc/generic-map.h
===================================================================
--- trunk/gc/generic-map.h	2003-09-06 08:24:06 UTC (rev 52)
+++ trunk/gc/generic-map.h	2003-09-06 20:57:36 UTC (rev 53)
@@ -1,443 +0,0 @@
-/* generic-map.h --- tracking GC'd memory, given per-arch parameters.  */
-
-#ifndef MINOR_GC_GENERIC_MAP_H
-#define MINOR_GC_GENERIC_MAP_H
-
-/* The GC map is a table mapping every heap object's address onto a
-   gc_page structure describing the page the object lives in.
-   This structure says which generation the objects it contains belong
-   to, and is also where we record doting objects.
-
-   A "doting object" is an object in one generation that points to an
-   object in a younger generation.  A "doting page" is a page on which
-   a doting object starts.  Doting objects can be quite large, and
-   cover many pages, but only the page on which a doting object starts
-   is a doting page.
-
-   Function of the GC Map ============================================
-
-   In more detail, here are the jobs the GC map needs to do:
-
-   - The whole idea of generational garbage collection is to usually
-     collect only part of the heap.  Occasionally, you'll need to do a
-     full collection, but if you can focus your time on portions of
-     the heap that contain more garbage, then that time will be more
-     productive, and free up more memory for the mutator to waste.
-
-     But to restrict collection to a limited portion of the heap, the
-     collector needs to be able to find all pointers from the
-     uncollected portion into the collected portion: these act as
-     additional roots for the partial collection.
-
-     Since, in practice, pointers from older objects to younger
-     objects are rare, we can reduce the amount of bookkeeping needed
-     here by, when collecting generation G, always collecting all
-     generations younger than G as well.  This means we only need to
-     track pointers in objects in older generations to objects in
-     younger generations --- the rare kind.  These are the doting
-     objects.
-
-     How do we track such pointers?  Since a newly allocated object
-     can only be initialized with pointers to existing objects, an
-     object can become a doting object only by mutation.  Thus, every
-     bit of code that mutates a heap object in Minor needs to include
-     a "write barrier": code that allows the GC to check whether a
-     doting object has been created, and record it in the GC map, for
-     the collector to use in finding roots for partial collections.
-
-   - We also need to be able to quickly determine which generation an
-     object belongs to, to recognize when a pointer points out of the
-     portion of the heap we're collecting.
-
-   - When we've finished collecting, we need to be able to find all
-     the pages belonging to now-empty "from" spaces, to free them.
-
-
-   Dynamically vs. Statically Allocated Objects ======================
-
-   There are two ways objects can come into existence:
-
-   - The mutator can allocate them in the usual way, with 'cons',
-     'make-vector', etc.
-
-   - Executable files and shared libraries may contain objects,
-     constructed at compile-time, linked by the system linker, and
-     introduced into memory by the kernel doing an 'exec' or the
-     dynamic linker.
-
-   In the first case, code generated by Minor, or hand-written for
-   Minor, handles the allocation, so it can follow whatever
-   conventions we find useful.
-
-   But in the second case, Minor has only limited control over the
-   allocation.  Minor can ensure that all the heap objects in a
-   particular executable or shared library appear in one contiguous
-   chunk, not interleaved with other sorts of non-heap data --- from C
-   code, say.  But the GC has no way to find out at run time where
-   each executable/shared library's chunk of heap objects is.  (I
-   think we'd need a custom linker script, or some messy stuff based
-   on the C++ static initializer support, but, bleah.)  This means
-   that the GC can't reliably free up such memory for re-use; it can't
-   tell where Minor heap objects end and foreign non-heap objects
-   begin.  That, in turn, means that the GC might as well never
-   relocate such objects, or even bother to collect them at all ---
-   much better to simply ignore them, except to track old->young
-   pointers.
-
-   So, when we allocate fresh pages for a thread to allocate from, we
-   mark them in the GC map as belonging to generation zero, the
-   youngest generation.  And when we allocate pages to hold objects
-   the collector is promoting from one generation to the next, we
-   record the appropriate generation for them as well.  But we assume
-   that all other pages belong to generation seven, the "immortal
-   generation".  Any objects that we find here must have come from
-   executable files or shared libraries.  Other objects are never
-   promoted into the immortal generation --- they come to rest in
-   generation six.
-
-   Note that when we load a .o file ourselves --- say, when we load a
-   module previously compiled by the ahead-of-time compiler --- that's
-   Minor code turning that stream of bytes into objects and
-   procedures, not the kernel or the dynamic linker.  Since the
-   allocation is under our code's control, we can place the .o file's
-   objects (and procedures) in any generation we want.  So loading .o
-   files falls in the first category of allocation, not the second.
-
-
-   Mutators' Interface to the GC Map ===================================
-
-   The mutators' write barrier code does not access the GC map
-   directly.  Instead, mutator threads simply construct store lists
-   --- lists of every object they've ever mutated --- and hand them to
-   the collector when needed.  When a collection starts, the collector
-   records the potentially doting objects mentioned in each thread's
-   store list in the GC map, and then throws the store lists away.
-   This indirect arrangement has the following advantages:
-
-   - Mutators don't need to know about the GC map structure.  It's way
-     too complex to be part of a stable ABI.  The GC map remains
-     strictly internal to the GC.
-
-   - The overhead of the write barrier is the allocation of one pair.
-     The pair is allocated in the new object area, so the cache is
-     always hot.
-
-   - Since this map is only updated and consulted by the GC, it
-     doesn't compete for registers and cache with real mutator code at
-     every store operation; it only gets involved when a GC is about
-     to happen, which trashes both of those things anyway.
-
-   - Since store lists are per-thread, we never have to think about
-     synchronization when building them.
-
-   - Since mutator threads never access the GC map directly, we don't
-     have to worry about synchronization when accessing it, either.
-
-
-   Architecture Parameters ==========================================
-
-   The GC map data structure defined here is meant to be useable by
-   many different architectures.  Rather than #including this file
-   directly, you should first #include a header file (traditionally
-   named gc-map.h) from the appropriate arch/FOO/gc directory, which
-   will #define some parameters, and then #include this file for you.
-   Here are the parameters the arch-specific gc-map.h file should
-   #define:
-
-   GC_MAP_LOG_OBJECT_ALIGN --- the log base 2 of the minimum alignment
-   for every object.  For example, if every object must be aligned on
-   an eight-byte boundary, this would be 3.
-
-   GC_MAP_LOG_NUM_GENERATIONS --- the log base 2 of the number of
-   generations we support, including the nursery and the immortal
-   generation.  This must not be greater than GC_MAP_LOG_OBJECT_ALIGN,
-   for bit-packing reasons; see "premature optimization", below.
-
-   GC_MAP_LOG_PAGE_SIZE --- the log base 2 of the number of bytes per
-   page on the system.  By "page", what we really mean is the minimum
-   required alignment of the data and code segments, according to the
-   ABI.  Choosing that as the page size helps us ensure that the heap
-   areas of two different executable / shared library ELF files will
-   never fall in the purview of the same gc_page structure.
-
-   GC_MAP_FIRST_LEVEL_BITS --- the number of bits to take from the
-   most significant end of the address to use as the index into the
-   top-level array.
-
-   GC_MAP_SECOND_LEVEL_BITS --- the number of bits to take from the
-   most significant end of the address, after the chunk for the
-   top-level index, to use as the index into the second-level array.
-
-   GC_MAP_ADDRESS_BITS --- the total number of bits in an address.
-   This must be GC_MAP_LOG_PAGE_SIZE + GC_MAP_FIRST_LEVEL_BITS
-   + GC_MAP_SECOND_LEVEL_BITS; it's just present as a checksum.
-
-   (To support systems with 64-bit addresses, we could have optional
-   GC_MAP_{THIRD,FOURTH}_LEVEL_BITS macros, whose presence would
-   request the creation of a deeper tree.  Or perhaps someone can come
-   up with something more clever.)  */
-
-#ifndef GC_MAP_LOG_PAGE_SIZE
-#error "must #include a processor-specific gc-map.h before generic-map.h"
-#endif
-
-#if GC_MAP_LOG_NUM_GENERATIONS > GC_MAP_LOG_OBJECT_ALIGN
-#error "generation count too large for object alignment"
-#endif
-
-#if (GC_MAP_ADDRESS_BITS \
-     != (GC_MAP_FIRST_LEVEL_BITS \
-         + GC_MAP_SECOND_LEVEL_BITS \
-         + GC_MAP_LOG_PAGE_SIZE))
-#error "address not subdivided properly"
-#endif
-
-#define GC_MAP_NUM_GENERATIONS (1 << GC_MAP_LOG_NUM_GENERATIONS)
-
-
-/* For every page managed by the garbage collector, we have an
-   instance of the following structure.
-
-   (Since there is an instance of this structure for every page, it
-   needs to be kept small.  If GC_MAP_LOG_PAGE_SIZE is 12, then 8b :
-   4kb :: 1 : 512.)
-
-   From the "premature optimization is the root of all evil" dept:
-
-   At the machine code level, fetching an (unsigned) bit field
-   turns into:
-   - a memory reference to fetch the word containing the bit field,
-   - a mask, to get rid of bits that don't belong to the field, and
-   - a right shift, to put the bitfield's least significant bit at
-     the right end of the register.
-
-   But note that a lot of fields in this struct are indices within a
-   page, or portions of page addresses.  So the first thing we're
-   going to do with such values is shift them left again, to multiply
-   by 1 << GC_MAP_LOG_OBJECT_ALIGN (for first_doting_object and
-   last_doting_object) or by 1 << GC_MAP_LOG_PAGE_SIZE (for
-   next_doting_page and next_generation_page).  So the compiler could
-   combine the right shift of the field fetch and the left shift of
-   the multiply into a single operation, net left or net right.
-   
-   We can do even better: if we make sure that the right shift (the
-   bitfield's position within the word) and the left shift (the log2
-   of the factor we need to multiply it by to get a page offset or a
-   page address) are the *same*, then the shifts cancel each other
-   out, and all we need to do is fetch and mask.
-
-   So first_doting_object, last_doting_object, next_doting_page, and
-   next_generation_page are all aligned this way.  Since page
-   addresses and offsets within a page are disjoint portions of an
-   address word, things fit together pretty nicely.  */
-struct gc_page
-{
-  /* The following three fields should all pack into a single
-     address-sized word.  */
-
-  /* The generation to which the objects in this page belong.  Zero is
-     the youngest generation.  Seven is the "dummy generation", used
-     for memory areas we haven't allocated a separate gc_page
-     array for yet.  */
-  unsigned generation : GC_MAP_LOG_NUM_GENERATIONS;
-
-  /* Make sure next bitfield is nicely aligned.  */
-  int : GC_MAP_LOG_OBJECT_ALIGN - GC_MAP_LOG_NUM_GENERATIONS;
-
-  /* If this is a doting page, this is the offset within this page of
-     the start of the first doting object that begins on this page ---
-     divided by 1 << GC_MAP_LOG_OBJECT_ALIGN.  To find all the doting
-     pointers, we start here and scan until last_doting_object.  If
-     this is not a doting page, then last_doting_object == 0 and
-     first_doting_object > 0.  */
-  unsigned first_doting_object
-    : GC_MAP_LOG_PAGE_SIZE - GC_MAP_LOG_OBJECT_ALIGN;
-
-  /* All the pages that contain doting objects are kept in a
-     singly-linked list; there is one list per generation.  This field
-     is the link in that list: the address of the next such page in
-     this generation, divided by 1 << GC_MAP_LOG_PAGE_SIZE.  For the
-     last page in the chain, this field is zero.  */
-  unsigned next_doting_page : GC_MAP_ADDRESS_BITS - GC_MAP_LOG_PAGE_SIZE;
-
-  /* The following three fields should all pack into a single
-     address-sized word.  */
-
-  /* Unused bits!  */
-  unsigned : GC_MAP_LOG_OBJECT_ALIGN;
-
-  /* If this is a doting page, this is the offset within this page of
-     the start of the last doting object that begins on this page ---
-     divided by 1 << GC_MAP_LOG_OBJECT_ALIGN.  If this is not a doting
-     page, then this is zero.  */
-  unsigned last_doting_object
-    : GC_MAP_LOG_PAGE_SIZE - GC_MAP_LOG_OBJECT_ALIGN;
-
-  /* All the pages in a generation are kept in a singly-linked list.
-     All free pages are kept in a list, too.  This is the link in
-     those lists.  This is the address of the next page in the list
-     --- divided by 1 << GC_MAP_LOG_PAGE_SIZE.  If this is the last
-     page in the list, this is zero.  */
-  unsigned next_generation_page : GC_MAP_ADDRESS_BITS - GC_MAP_LOG_PAGE_SIZE;
-};
-
-
-#define GC_MAP_FIRST_LEVEL_SHIFT \
-  (GC_MAP_ADDRESS_BITS - GC_MAP_FIRST_LEVEL_BITS)
-#define GC_MAP_FIRST_LEVEL_MASK \
-  ((1 << GC_MAP_FIRST_LEVEL_BITS) - 1)
-
-#define GC_MAP_SECOND_LEVEL_SHIFT \
-  (GC_MAP_ADDRESS_BITS - GC_MAP_FIRST_LEVEL_BITS - GC_MAP_SECOND_LEVEL_BITS)
-#define GC_MAP_SECOND_LEVEL_MASK \
-  ((1 << GC_MAP_SECOND_LEVEL_BITS) - 1)
-
-
-/* The map of all pages is a two-level tree.  Given an address ADDR,
-   the 'struct gc_page' for that page is:
-
-      mn__gc_map
-        [(ADDR >> GC_MAP_FIRST_LEVEL_SHIFT) & GC_MAP_FIRST_LEVEL_MASK]
-        [(ADDR >> GC_MAP_SECOND_LEVEL_SHIFT) & GC_MAP_SECOND_LEVEL_MASK]
-
-   In other words, we use the top clump of bits of the object's
-   address to index the top-level array, yielding a pointer to a
-   second-level array; then we use the next clump of bits to index into
-   that array, yielding a gc_page structure for a particular page.
-
-   Initially, before we've allocated any heap pages at all, every
-   entry in mn__gc_map points to the same second-level array object:
-   mn__gc_immortal_pages.  This creates the appearance of a fully
-   populated tree, with a gc_page struct for every page in the address
-   space --- even though mn__gc_map and mn__gc_immortal_pages occupy
-   only ((1 << GC_MAP_FIRST_LEVEL_BITS) * sizeof (a pointer)
-         + (1 << GC_MAP_SECOND_LEVEL_BITS) * sizeof (struct gc_page))
-   which is 12kb on a typical 32-bit system.
-
-   As we allocate pages for newly allocated objects, or for to-spaces
-   during collection, we need to record these allocations in the map.
-   Since mn__gc_immortal_pages is (potentially) shared by many
-   top-level array entries, we handle things in a copy-on-write
-   fashion: when the gc_page struct we want to tweak is
-   actually an element of mn__gc_immortal_pages, we allocate a fresh
-   second-level table, initialize it to be a copy of
-   mn__gc_immortal_pages, change the appropriate entry in the
-   top-level array to point to it, and then tweak the appropriate
-   gc_page.  So as the program runs, we dedicate map memory
-   only to the interesting parts, without making any assumptions about
-   where in the address space malloc/mmap will give us pages from.
-
-   As described above, GNU/Linux doesn't tell us which regions of an
-   executable or shared library contain heap objects: we just
-   occasionally find heap references to objects on pages we've never
-   touched before.  So the initial state of a gc_page struct
-   has to be appropriate for such objects.  This tells us several
-   things:
-
-   - The pages' generation should be the immortal generation ---
-     generation seven.
-
-   - The pages (initially) contain no doting objects.  Objects in
-     executables or shared libraries may only point to objects in
-     other executables or shared libraries, since they were linked by
-     the static linker: otherwise, the static linker would have
-     complained about unresolved references.
-
-   - The pages will never be freed.  We don't scavenge objects from
-     executables or shared libraries: we can't be sure where the
-     regions of heap objects start and end, so we couldn't free the
-     area for reuse after the live objects have been copied out of
-     them anyway.  So next_generation_page and first_contiguous don't
-     need to be initialized to anything special.
-
-   Thus, the default gc_page struct looks like this:
-
-    {
-      generation = GC_MAP_NUM_GENERATIONS - 1,
-      first_doting_object = 1,
-      next_doting_page = 0,
-      last_doting_object = 0,
-      next_generation_page = 0
-    }
-
-   Every element of mn__gc_immortal_pages looks like that.
-
-   Of course, the mutator may create doting objects in executables and
-   shared libraries, so it's not the case that every executable or
-   shared object page will always look like this.  But initially, this
-   is fine.
-
-   Since the write barrier records the offsets of the first and last
-   doting objects in a page, and the GC never looks outside that
-   range, things will work correctly even if the initial or tail end
-   of a page holds non-heap objects.  So if the linker concatenates
-   the .minor.data section built by the Minor compiler with the .data
-   or .bss section built by the C compiler, for example, things will
-   be fine.
-
-   Problems would arise if non-heap objects were interleaved with heap
-   objects on a page: if first_doting_object happened to end up
-   pointing before some non-heap objects, and last_doting_object
-   happened to end up pointing after them, then the scan for doting
-   pointers would end up sweeping through non-heap objects.
-
-   However, that sort of interleaving can't happen:
-
-   - Within a single executable or shared library, the static linker's
-     normal behavior is to concatenate all the .minor.data sections,
-     without interleaving other sections.  So we don't have to worry
-     about intra-exec or intra-shared library interleavings.
-
-   - We choose GC_MAP_LOG_PAGE_SIZE so that the ABI requires that ELF
-     load segments be aligned at least on page boundaries.  This means
-     that two non-empty data segments can't appear on the same page.
-     So we don't have to worry about inter-executable or inter-shared
-     library interleavings, either.
-
-   Using the oldest generation as the "immortal" generation means that
-   the collector's test for whether to scavenge an object doesn't need
-   a special case to recognize immortal objects.  The obvious way to
-   write the test, "Is this object's generation less than or equal to
-   the oldest generation we're collecting?" will correctly decline to
-   traverse immortal objects.  Since the collector asks this of every
-   object it touches, it's important for this test to be fast.  */
-extern struct gc_page *mn__gc_map[1 << GC_MAP_FIRST_LEVEL_BITS];
-
-/* The array of immortal pages.  */
-extern struct gc_page mn__gc_immortal_pages[1 << GC_MAP_SECOND_LEVEL_BITS];
-
-
-/* The 'struct gc_page' object for ADDR.  */
-#define GC_PAGE(addr)                                           \
-  (mn__gc_map                                                   \
-   [((unsigned int) (addr) >> GC_MAP_FIRST_LEVEL_SHIFT)         \
-    & GC_MAP_FIRST_LEVEL_MASK]                                  \
-   [((unsigned int) (addr) >> GC_MAP_SECOND_LEVEL_SHIFT)        \
-    & GC_MAP_SECOND_LEVEL_MASK])
-
-
-/* A single heap generation.  */
-struct gc_generation
-{
-  /* The base address of the first page in this generation, or zero if
-     the generation contains no pages.  This is invalid in the
-     immortal generation.  */
-  void *first_generation_page;
-
-  /* The base address of the first doting page in this generation.
-     Zero if the generation contains no doting pages.  */
-  void *first_doting_page;
-
-  /* How many collections we've done since the last time we collected
-     any generations older than this.  */
-  int collections;
-};
-
-
-/* The table of all generations.  Generation zero is the youngest
-   generation.  Generation GC_MAP_NUM_GENERATIONS - 1 is the immortal
-   generation, for pages in executables and shared libraries
-   (actually, for any page we didn't allocate ourselves).  */
-extern struct gc_generation mn__gc_generations[GC_MAP_NUM_GENERATIONS];
-
-#endif /* MINOR_GC_GENERIC_MAP_H */

Copied: trunk/gc/map.h (from rev 52, trunk/gc/generic-map.h)
===================================================================
--- trunk/gc/generic-map.h	2003-09-06 08:24:06 UTC (rev 52)
+++ trunk/gc/map.h	2003-09-06 20:57:36 UTC (rev 53)
@@ -0,0 +1,441 @@
+/* map.h --- tracking GC'd memory, given per-arch parameters.  */
+
+#ifndef MINOR_GC_MAP_H
+#define MINOR_GC_MAP_H
+
+#include "arch.h"
+
+/* The GC map is a table mapping every heap object's address onto a
+   gc_page structure describing the page the object lives in.
+   This structure says which generation the objects it contains belong
+   to, and is also where we record doting objects.
+
+   A "doting object" is an object in one generation that points to an
+   object in a younger generation.  A "doting page" is a page on which
+   a doting object starts.  Doting objects can be quite large, and
+   cover many pages, but only the page on which a doting object starts
+   is a doting page.
+
+   Function of the GC Map ============================================
+
+   In more detail, here are the jobs the GC map needs to do:
+
+   - The whole idea of generational garbage collection is to usually
+     collect only part of the heap.  Occasionally, you'll need to do a
+     full collection, but if you can focus your time on portions of
+     the heap that contain more garbage, then that time will be more
+     productive, and free up more memory for the mutator to waste.
+
+     But to restrict collection to a limited portion of the heap, the
+     collector needs to be able to find all pointers from the
+     uncollected portion into the collected portion: these act as
+     additional roots for the partial collection.
+
+     Since, in practice, pointers from older objects to younger
+     objects are rare, we can reduce the amount of bookkeeping needed
+     here by, when collecting generation G, always collecting all
+     generations younger than G as well.  This means we only need to
+     track pointers in objects in older generations to objects in
+     younger generations --- the rare kind.  These are the doting
+     objects.
+
+     How do we track such pointers?  Since a newly allocated object
+     can only be initialized with pointers to existing objects, an
+     object can become a doting object only by mutation.  Thus, every
+     bit of code that mutates a heap object in Minor needs to include
+     a "write barrier": code that allows the GC to check whether a
+     doting object has been created, and record it in the GC map, for
+     the collector to use in finding roots for partial collections.
+
+   - We also need to be able to quickly determine which generation an
+     object belongs to, to recognize when a pointer points out of the
+     portion of the heap we're collecting.
+
+   - When we've finished collecting, we need to be able to find all
+     the pages belonging to now-empty "from" spaces, to free them.
+
+
+   Dynamically vs. Statically Allocated Objects ======================
+
+   There are two ways objects can come into existence:
+
+   - The mutator can allocate them in the usual way, with 'cons',
+     'make-vector', etc.
+
+   - Executable files and shared libraries may contain objects,
+     constructed at compile-time, linked by the system linker, and
+     introduced into memory by the kernel doing an 'exec' or the
+     dynamic linker.
+
+   In the first case, code generated by Minor, or hand-written for
+   Minor, handles the allocation, so it can follow whatever
+   conventions we find useful.
+
+   But in the second case, Minor has only limited control over the
+   allocation.  Minor can ensure that all the heap objects in a
+   particular executable or shared library appear in one contiguous
+   chunk, not interleaved with other sorts of non-heap data --- from C
+   code, say.  But the GC has no way to find out at run time where
+   each executable/shared library's chunk of heap objects is.  (I
+   think we'd need a custom linker script, or some messy stuff based
+   on the C++ static initializer support, but, bleah.)  This means
+   that the GC can't reliably free up such memory for re-use; it can't
+   tell where Minor heap objects end and foreign non-heap objects
+   begin.  That, in turn, means that the GC might as well never
+   relocate such objects, or even bother to collect them at all ---
+   much better to simply ignore them, except to track old->young
+   pointers.
+
+   So, when we allocate fresh pages for a thread to allocate from, we
+   mark them in the GC map as belonging to generation zero, the
+   youngest generation.  And when we allocate pages to hold objects
+   the collector is promoting from one generation to the next, we
+   record the appropriate generation for them as well.  But we assume
+   that all other pages belong to generation seven, the "immortal
+   generation".  Any objects that we find here must have come from
+   executable files or shared libraries.  Other objects are never
+   promoted into the immortal generation --- they come to rest in
+   generation six.
+
+   Note that when we load a .o file ourselves --- say, when we load a
+   module previously compiled by the ahead-of-time compiler --- that's
+   Minor code turning that stream of bytes into objects and
+   procedures, not the kernel or the dynamic linker.  Since the
+   allocation is under our code's control, we can place the .o file's
+   objects (and procedures) in any generation we want.  So loading .o
+   files falls in the first category of allocation, not the second.
+
+
+   Mutators' Interface to the GC Map ===================================
+
+   The mutators' write barrier code does not access the GC map
+   directly.  Instead, mutator threads simply construct store lists
+   --- lists of every object they've ever mutated --- and hand them to
+   the collector when needed.  When a collection starts, the collector
+   records the potentially doting objects mentioned in each thread's
+   store list in the GC map, and then throws the store lists away.
+   This indirect arrangement has the following advantages:
+
+   - Mutators don't need to know about the GC map structure.  It's way
+     too complex to be part of a stable ABI.  The GC map remains
+     strictly internal to the GC.
+
+   - The overhead of the write barrier is the allocation of one pair.
+     The pair is allocated in the new object area, so the cache is
+     always hot.
+
+   - Since this map is only updated and consulted by the GC, it
+     doesn't compete for registers and cache with real mutator code at
+     every store operation; it only gets involved when a GC is about
+     to happen, which trashes both of those things anyway.
+
+   - Since store lists are per-thread, we never have to think about
+     synchronization when building them.
+
+   - Since mutator threads never access the GC map directly, we don't
+     have to worry about synchronization when accessing it, either.
+
+
+   Architecture Parameters ==========================================
+
+   The GC map data structure defined here is meant to be useable by
+   many different architectures.  Rather than #including this file
+   directly, you should first #include a header file (traditionally
+   named gc-map.h) from the appropriate arch/FOO/gc directory, which
+   will #define some parameters, and then #include this file for you.
+   Here are the parameters the arch-specific gc-map.h file should
+   #define:
+
+   GC_MAP_LOG_OBJECT_ALIGN --- the log base 2 of the minimum alignment
+   for every object.  For example, if every object must be aligned on
+   an eight-byte boundary, this would be 3.
+
+   GC_MAP_LOG_NUM_GENERATIONS --- the log base 2 of the number of
+   generations we support, including the nursery and the immortal
+   generation.  This must not be greater than GC_MAP_LOG_OBJECT_ALIGN,
+   for bit-packing reasons; see "premature optimization", below.
+
+   GC_MAP_LOG_PAGE_SIZE --- the log base 2 of the number of bytes per
+   page on the system.  By "page", what we really mean is the minimum
+   required alignment of the data and code segments, according to the
+   ABI.  Choosing that as the page size helps us ensure that the heap
+   areas of two different executable / shared library ELF files will
+   never fall in the purview of the same gc_page structure.
+
+   GC_MAP_FIRST_LEVEL_BITS --- the number of bits to take from the
+   most significant end of the address to use as the index into the
+   top-level array.
+
+   GC_MAP_SECOND_LEVEL_BITS --- the number of bits to take from the
+   most significant end of the address, after the chunk for the
+   top-level index, to use as the index into the second-level array.
+
+   GC_MAP_ADDRESS_BITS --- the total number of bits in an address.
+   This must be GC_MAP_LOG_PAGE_SIZE + GC_MAP_FIRST_LEVEL_BITS
+   + GC_MAP_SECOND_LEVEL_BITS; it's just present as a checksum.
+
+   (To support systems with 64-bit addresses, we could have optional
+   GC_MAP_{THIRD,FOURTH}_LEVEL_BITS macros, whose presence would
+   request the creation of a deeper tree.  Or perhaps someone can come
+   up with something more clever.)  */
+
+#if GC_MAP_LOG_NUM_GENERATIONS > GC_MAP_LOG_OBJECT_ALIGN
+#error "generation count too large for object alignment"
+#endif
+
+#if (GC_MAP_ADDRESS_BITS \
+     != (GC_MAP_FIRST_LEVEL_BITS \
+         + GC_MAP_SECOND_LEVEL_BITS \
+         + GC_MAP_LOG_PAGE_SIZE))
+#error "address not subdivided properly"
+#endif
+
+#define GC_MAP_NUM_GENERATIONS (1 << GC_MAP_LOG_NUM_GENERATIONS)
+
+
+/* For every page managed by the garbage collector, we have an
+   instance of the following structure.
+
+   (Since there is an instance of this structure for every page, it
+   needs to be kept small.  If GC_MAP_LOG_PAGE_SIZE is 12, then 8b :
+   4kb :: 1 : 512.)
+
+   From the "premature optimization is the root of all evil" dept:
+
+   At the machine code level, fetching an (unsigned) bit field
+   turns into:
+   - a memory reference to fetch the word containing the bit field,
+   - a mask, to get rid of bits that don't belong to the field, and
+   - a right shift, to put the bitfield's least significant bit at
+     the right end of the register.
+
+   But note that a lot of fields in this struct are indices within a
+   page, or portions of page addresses.  So the first thing we're
+   going to do with such values is shift them left again, to multiply
+   by 1 << GC_MAP_LOG_OBJECT_ALIGN (for first_doting_object and
+   last_doting_object) or by 1 << GC_MAP_LOG_PAGE_SIZE (for
+   next_doting_page and next_generation_page).  So the compiler could
+   combine the right shift of the field fetch and the left shift of
+   the multiply into a single operation, net left or net right.
+   
+   We can do even better: if we make sure that the right shift (the
+   bitfield's position within the word) and the left shift (the log2
+   of the factor we need to multiply it by to get a page offset or a
+   page address) are the *same*, then the shifts cancel each other
+   out, and all we need to do is fetch and mask.
+
+   So first_doting_object, last_doting_object, next_doting_page, and
+   next_generation_page are all aligned this way.  Since page
+   addresses and offsets within a page are disjoint portions of an
+   address word, things fit together pretty nicely.  */
+struct gc_page
+{
+  /* The following three fields should all pack into a single
+     address-sized word.  */
+
+  /* The generation to which the objects in this page belong.  Zero is
+     the youngest generation.  Seven is the "dummy generation", used
+     for memory areas we haven't allocated a separate gc_page
+     array for yet.  */
+  unsigned generation : GC_MAP_LOG_NUM_GENERATIONS;
+
+  /* Make sure next bitfield is nicely aligned.  */
+  int : GC_MAP_LOG_OBJECT_ALIGN - GC_MAP_LOG_NUM_GENERATIONS;
+
+  /* If this is a doting page, this is the offset within this page of
+     the start of the first doting object that begins on this page ---
+     divided by 1 << GC_MAP_LOG_OBJECT_ALIGN.  To find all the doting
+     pointers, we start here and scan until last_doting_object.  If
+     this is not a doting page, then last_doting_object == 0 and
+     first_doting_object > 0.  */
+  unsigned first_doting_object
+    : GC_MAP_LOG_PAGE_SIZE - GC_MAP_LOG_OBJECT_ALIGN;
+
+  /* All the pages that contain doting objects are kept in a
+     singly-linked list; there is one list per generation.  This field
+     is the link in that list: the address of the next such page in
+     this generation, divided by 1 << GC_MAP_LOG_PAGE_SIZE.  For the
+     last page in the chain, this field is zero.  */
+  unsigned next_doting_page : GC_MAP_ADDRESS_BITS - GC_MAP_LOG_PAGE_SIZE;
+
+  /* The following three fields should all pack into a single
+     address-sized word.  */
+
+  /* Unused bits!  */
+  unsigned : GC_MAP_LOG_OBJECT_ALIGN;
+
+  /* If this is a doting page, this is the offset within this page of
+     the start of the last doting object that begins on this page ---
+     divided by 1 << GC_MAP_LOG_OBJECT_ALIGN.  If this is not a doting
+     page, then this is zero.  */
+  unsigned last_doting_object
+    : GC_MAP_LOG_PAGE_SIZE - GC_MAP_LOG_OBJECT_ALIGN;
+
+  /* All the pages in a generation are kept in a singly-linked list.
+     All free pages are kept in a list, too.  This is the link in
+     those lists.  This is the address of the next page in the list
+     --- divided by 1 << GC_MAP_LOG_PAGE_SIZE.  If this is the last
+     page in the list, this is zero.  */
+  unsigned next_generation_page : GC_MAP_ADDRESS_BITS - GC_MAP_LOG_PAGE_SIZE;
+};
+
+
+#define GC_MAP_FIRST_LEVEL_SHIFT \
+  (GC_MAP_ADDRESS_BITS - GC_MAP_FIRST_LEVEL_BITS)
+#define GC_MAP_FIRST_LEVEL_MASK \
+  ((1 << GC_MAP_FIRST_LEVEL_BITS) - 1)
+
+#define GC_MAP_SECOND_LEVEL_SHIFT \
+  (GC_MAP_ADDRESS_BITS - GC_MAP_FIRST_LEVEL_BITS - GC_MAP_SECOND_LEVEL_BITS)
+#define GC_MAP_SECOND_LEVEL_MASK \
+  ((1 << GC_MAP_SECOND_LEVEL_BITS) - 1)
+
+
+/* The map of all pages is a two-level tree.  Given an address ADDR,
+   the 'struct gc_page' for that page is:
+
+      mn__gc_map
+        [(ADDR >> GC_MAP_FIRST_LEVEL_SHIFT) & GC_MAP_FIRST_LEVEL_MASK]
+        [(ADDR >> GC_MAP_SECOND_LEVEL_SHIFT) & GC_MAP_SECOND_LEVEL_MASK]
+
+   In other words, we use the top clump of bits of the object's
+   address to index the top-level array, yielding a pointer to a
+   second-level array; then we use the next clump of bits to index into
+   that array, yielding a gc_page structure for a particular page.
+
+   Initially, before we've allocated any heap pages at all, every
+   entry in mn__gc_map points to the same second-level array object:
+   mn__gc_immortal_pages.  This creates the appearance of a fully
+   populated tree, with a gc_page struct for every page in the address
+   space --- even though mn__gc_map and mn__gc_immortal_pages occupy
+   only ((1 << GC_MAP_FIRST_LEVEL_BITS) * sizeof (a pointer)
+         + (1 << GC_MAP_SECOND_LEVEL_BITS) * sizeof (struct gc_page))
+   which is 12kb on a typical 32-bit system.
+
+   As we allocate pages for newly allocated objects, or for to-spaces
+   during collection, we need to record these allocations in the map.
+   Since mn__gc_immortal_pages is (potentially) shared by many
+   top-level array entries, we handle things in a copy-on-write
+   fashion: when the gc_page struct we want to tweak is
+   actually an element of mn__gc_immortal_pages, we allocate a fresh
+   second-level table, initialize it to be a copy of
+   mn__gc_immortal_pages, change the appropriate entry in the
+   top-level array to point to it, and then tweak the appropriate
+   gc_page.  So as the program runs, we dedicate map memory
+   only to the interesting parts, without making any assumptions about
+   where in the address space malloc/mmap will give us pages from.
+
+   As described above, GNU/Linux doesn't tell us which regions of an
+   executable or shared library contain heap objects: we just
+   occasionally find heap references to objects on pages we've never
+   touched before.  So the initial state of a gc_page struct
+   has to be appropriate for such objects.  This tells us several
+   things:
+
+   - The pages' generation should be the immortal generation ---
+     generation seven.
+
+   - The pages (initially) contain no doting objects.  Objects in
+     executables or shared libraries may only point to objects in
+     other executables or shared libraries, since they were linked by
+     the static linker: otherwise, the static linker would have
+     complained about unresolved references.
+
+   - The pages will never be freed.  We don't scavenge objects from
+     executables or shared libraries: we can't be sure where the
+     regions of heap objects start and end, so we couldn't free the
+     area for reuse after the live objects have been copied out of
+     them anyway.  So next_generation_page and first_contiguous don't
+     need to be initialized to anything special.
+
+   Thus, the default gc_page struct looks like this:
+
+    {
+      generation = GC_MAP_NUM_GENERATIONS - 1,
+      first_doting_object = 1,
+      next_doting_page = 0,
+      last_doting_object = 0,
+      next_generation_page = 0
+    }
+
+   Every element of mn__gc_immortal_pages looks like that.
+
+   Of course, the mutator may create doting objects in executables and
+   shared libraries, so it's not the case that every executable or
+   shared object page will always look like this.  But initially, this
+   is fine.
+
+   Since the write barrier records the offsets of the first and last
+   doting objects in a page, and the GC never looks outside that
+   range, things will work correctly even if the initial or tail end
+   of a page holds non-heap objects.  So if the linker concatenates
+   the .minor.data section built by the Minor compiler with the .data
+   or .bss section built by the C compiler, for example, things will
+   be fine.
+
+   Problems would arise if non-heap objects were interleaved with heap
+   objects on a page: if first_doting_object happened to end up
+   pointing before some non-heap objects, and last_doting_object
+   happened to end up pointing after them, then the scan for doting
+   pointers would end up sweeping through non-heap objects.
+
+   However, that sort of interleaving can't happen:
+
+   - Within a single executable or shared library, the static linker's
+     normal behavior is to concatenate all the .minor.data sections,
+     without interleaving other sections.  So we don't have to worry
+     about intra-exec or intra-shared library interleavings.
+
+   - We choose GC_MAP_LOG_PAGE_SIZE so that the ABI requires that ELF
+     load segments be aligned at least on page boundaries.  This means
+     that two non-empty data segments can't appear on the same page.
+     So we don't have to worry about inter-executable or inter-shared
+     library interleavings, either.
+
+   Using the oldest generation as the "immortal" generation means that
+   the collector's test for whether to scavenge an object doesn't need
+   a special case to recognize immortal objects.  The obvious way to
+   write the test, "Is this object's generation less than or equal to
+   the oldest generation we're collecting?" will correctly decline to
+   traverse immortal objects.  Since the collector asks this of every
+   object it touches, it's important for this test to be fast.  */
+extern struct gc_page *mn__gc_map[1 << GC_MAP_FIRST_LEVEL_BITS];
+
+/* The array of immortal pages.  */
+extern struct gc_page mn__gc_immortal_pages[1 << GC_MAP_SECOND_LEVEL_BITS];
+
+
+/* The 'struct gc_page' object for ADDR.  */
+#define GC_PAGE(addr)                                           \
+  (mn__gc_map                                                   \
+   [((unsigned int) (addr) >> GC_MAP_FIRST_LEVEL_SHIFT)         \
+    & GC_MAP_FIRST_LEVEL_MASK]                                  \
+   [((unsigned int) (addr) >> GC_MAP_SECOND_LEVEL_SHIFT)        \
+    & GC_MAP_SECOND_LEVEL_MASK])
+
+
+/* A single heap generation.  */
+struct gc_generation
+{
+  /* The base address of the first page in this generation, or zero if
+     the generation contains no pages.  This is invalid in the
+     immortal generation.  */
+  void *first_generation_page;
+
+  /* The base address of the first doting page in this generation.
+     Zero if the generation contains no doting pages.  */
+  void *first_doting_page;
+
+  /* How many collections we've done since the last time we collected
+     any generations older than this.  */
+  int collections;
+};
+
+
+/* The table of all generations.  Generation zero is the youngest
+   generation.  Generation GC_MAP_NUM_GENERATIONS - 1 is the immortal
+   generation, for pages in executables and shared libraries
+   (actually, for any page we didn't allocate ourselves).  */
+extern struct gc_generation mn__gc_generations[GC_MAP_NUM_GENERATIONS];
+
+#endif /* MINOR_GC_MAP_H */

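Editorial sketch of the two-level lookup and the copy-on-write update
described in the map.h comment above.  This is not code from the
repository: the sketch_ names, the simplified page struct (generation
only), the use of malloc, and the parameter values (32-bit addresses,
4kb pages, a 10/10 bit split) are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed parameters; a real port would get these from its
   arch-specific gc-map.h.  */
#define SKETCH_ADDRESS_BITS 32
#define SKETCH_LOG_PAGE_SIZE 12
#define SKETCH_FIRST_LEVEL_BITS 10
#define SKETCH_SECOND_LEVEL_BITS 10
#define SKETCH_IMMORTAL_GENERATION 7

#define SKETCH_FIRST_LEVEL_SHIFT \
  (SKETCH_ADDRESS_BITS - SKETCH_FIRST_LEVEL_BITS)
#define SKETCH_FIRST_LEVEL_MASK ((1u << SKETCH_FIRST_LEVEL_BITS) - 1)
#define SKETCH_SECOND_LEVEL_SHIFT SKETCH_LOG_PAGE_SIZE
#define SKETCH_SECOND_LEVEL_MASK ((1u << SKETCH_SECOND_LEVEL_BITS) - 1)

/* Simplified stand-in for struct gc_page: the generation only.  */
struct sketch_page
{
  unsigned generation;
};

/* The shared default second-level table, and the top-level array.  */
static struct sketch_page sketch_immortal_pages[1 << SKETCH_SECOND_LEVEL_BITS];
static struct sketch_page *sketch_map[1 << SKETCH_FIRST_LEVEL_BITS];

/* Make every top-level entry point at the shared immortal table, so
   the tree appears fully populated.  */
static void
sketch_map_init (void)
{
  size_t i;

  for (i = 0; i < (1u << SKETCH_SECOND_LEVEL_BITS); i++)
    sketch_immortal_pages[i].generation = SKETCH_IMMORTAL_GENERATION;
  for (i = 0; i < (1u << SKETCH_FIRST_LEVEL_BITS); i++)
    sketch_map[i] = sketch_immortal_pages;
}

/* Read-only lookup: shift and mask at each level.  */
static struct sketch_page *
sketch_page_for (uint32_t addr)
{
  return &sketch_map
    [(addr >> SKETCH_FIRST_LEVEL_SHIFT) & SKETCH_FIRST_LEVEL_MASK]
    [(addr >> SKETCH_SECOND_LEVEL_SHIFT) & SKETCH_SECOND_LEVEL_MASK];
}

/* Lookup for modification.  If the entry still lives in the shared
   immortal table, first give this top-level slot a private copy.
   Returns NULL if that copy cannot be allocated.  */
static struct sketch_page *
sketch_page_for_update (uint32_t addr)
{
  size_t top = (addr >> SKETCH_FIRST_LEVEL_SHIFT) & SKETCH_FIRST_LEVEL_MASK;

  if (sketch_map[top] == sketch_immortal_pages)
    {
      struct sketch_page *fresh = malloc (sizeof sketch_immortal_pages);

      if (! fresh)
        return NULL;
      memcpy (fresh, sketch_immortal_pages, sizeof sketch_immortal_pages);
      sketch_map[top] = fresh;
    }

  return &sketch_map[top]
    [(addr >> SKETCH_SECOND_LEVEL_SHIFT) & SKETCH_SECOND_LEVEL_MASK];
}
```

A caller would run sketch_map_init once, use sketch_page_for when it
only needs to read a page's generation, and use
sketch_page_for_update when recording a freshly allocated page.  Only
the top-level slots that actually cover allocated pages ever get a
private second-level table.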

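Editorial sketch of the "fetch and mask" packing that the struct
gc_page comment in map.h describes.  It uses explicit masks on a
32-bit word rather than C bit-fields, because bit-field layout is
left to the compiler; the sketch_ names and the parameter values
(eight-byte object alignment, 4kb pages) are assumptions, not code
from the repository.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed parameters.  */
#define SKETCH_LOG_OBJECT_ALIGN 3
#define SKETCH_LOG_PAGE_SIZE 12

/* Bits 0..2 hold the generation.  Bits 3..11 hold the byte offset of
   the first doting object, which always has its low three bits clear.
   Bits 12..31 hold the address of the next doting page, which always
   has its low twelve bits clear.  Each field already sits where its
   scaled value belongs, so no shift is needed to extract it.  */
#define SKETCH_GENERATION_MASK \
  ((uint32_t) ((UINT32_C (1) << SKETCH_LOG_OBJECT_ALIGN) - 1))
#define SKETCH_PAGE_MASK \
  ((uint32_t) ~((UINT32_C (1) << SKETCH_LOG_PAGE_SIZE) - 1))
#define SKETCH_OFFSET_MASK \
  ((uint32_t) ~(SKETCH_GENERATION_MASK | SKETCH_PAGE_MASK))

/* Build a packed word from a generation, a byte offset within the
   page, and the address of the next doting page.  */
static uint32_t
sketch_pack (unsigned generation, uint32_t first_offset, uint32_t next_page)
{
  return ((generation & SKETCH_GENERATION_MASK)
          | (first_offset & SKETCH_OFFSET_MASK)
          | (next_page & SKETCH_PAGE_MASK));
}

static unsigned
sketch_generation (uint32_t word)
{
  return word & SKETCH_GENERATION_MASK;
}

/* Byte offset of the first doting object: one mask, no shift.  */
static uint32_t
sketch_first_doting_offset (uint32_t word)
{
  return word & SKETCH_OFFSET_MASK;
}

/* Address of the next doting page: one mask, no shift.  */
static uint32_t
sketch_next_doting_page (uint32_t word)
{
  return word & SKETCH_PAGE_MASK;
}
```

With these assumed parameters, a page in generation 2 whose first
doting object starts at byte offset 0x128, and whose successor in the
doting list is the page at 0x40003000, packs into the single word
0x4000312a.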

From minor-owner@red-bean.com Tue Sep  9 00:12:55 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h895Ctnd028217
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 9 Sep 2003 00:12:55 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h895CsSg028215
	for minor-commits@red-bean.com; Tue, 9 Sep 2003 00:12:54 -0500
Date: Tue, 9 Sep 2003 00:12:54 -0500
Message-Id: <200309090512.h895CsSg028215@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 54 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-09 00:12:51 -0500 (Tue, 09 Sep 2003)
New Revision: 54

Added:
   trunk/gc/tagged.h
Log:
* gc/tagged.h: New file.


Added: trunk/gc/tagged.h
===================================================================
--- trunk/gc/tagged.h	2003-09-06 20:57:36 UTC (rev 53)
+++ trunk/gc/tagged.h	2003-09-09 05:12:51 UTC (rev 54)
@@ -0,0 +1,194 @@
+/* tagged.h --- the representation of tagged values
+   Jim Blandy <jimb@red-bean.com> --- July 2003  */
+
+#ifndef MINOR_GC_TAGGED_H
+#define MINOR_GC_TAGGED_H
+
+#include <stddef.h>
+#include <stdint.h>
+
+/* A tagged value (tagged_t) is an integral type large enough to hold a
+   pointer.  */
+typedef uintptr_t tagged_t;
+
+/* The low three bits of a tagged_t value give an initial hint at what
+   type of value it represents.  These bits are called the 'lowtag'.
+
+   Some types (small integers; characters; booleans) fit entirely in a
+   tagged_t value; these are called "immediate values".  For these,
+   the five bits above the lowtag give more type information; these
+   are called the 'midtag'.
+
+   For other types, the tagged_t value indicates the address in memory
+   at which the object lives; these are called "stored objects".
+   Stored objects store type information in their first word; this
+   word is called the 'stored tag'.
+
+   So, here is how to interpret a tagged_t value.  Suppose tagged_t is
+   T bits long (T = 32 on the IA-32 architecture, for example); assume
+   that the left-most field in the diagrams below extends to the full
+   width of a tagged_t.
+
+     vvvvvvvvvvvvvvvvvvvvvvvvvvvvvv00
+        A "fixnum" --- a small exact integer.  The "v" bits are the
+        actual value, as a T-2-bit two's complement signed integer.
+        Fixnums are immediate values.  Since the lowtag is three bits
+        long, fixnums actually use two lowtag values --- 000
+        (lowtag_even_fixnum) and 100 (lowtag_odd_fixnum).
+
+     vvvvvvvvvvvvvvvvvvvvvvvvmmmmm001
+        Some other type of immediate value --- a character, boolean,
+        etc.  The "m" bits are the midtag, and select from up to 32
+        other immediate types; see 'enum midtag', below.  The "v"
+        bits give the actual value --- their interpretation depends on
+        the type.
+
+     ppppppppppppppppppppppppppppp010
+        A pair.  Masking off the lowtag yields the pair's address.
+        Since all objects are allocated on eight-byte boundaries, we
+        don't need to preserve the low three bits of the address ---
+        they are zero.
+
+     ppppppppppppppppppppppppppppp011
+        Some other sort of stored object --- a procedure, vector,
+        string, etc.  Masking off the lowtag yields the object's
+        address; the first word of the object indicates its type.  See
+        'enum stored_tag' for the available types.
+
+   The remaining possible lowtags --- 101, 110, and 111 (100 is
+   lowtag_odd_fixnum, remember) --- are reserved for future use.  */
+
+enum lowtag
+  {
+    lowtag_even_fixnum = 0,
+    lowtag_immediate = 1,
+    lowtag_pair = 2,
+    lowtag_stored = 3,
+    lowtag_odd_fixnum = 4
+  };
+
+
+/* If a tagged_t's lowtag is lowtag_immediate, then the five bits
+   above the lowtag are the "midtag", giving further information about
+   the type.  */
+enum midtag
+  {
+    /* A character.  The 'v' bits are the ISO-10646 (Unicode)
+       character code.  */
+    midtag_character = 0,
+
+    /* A "unique value".  This is my broad term for things like #t,
+       #f, the empty list, and the end-of-file object.  The 'v' bits
+       are a value from 'enum unique_value', defined below.  */
+    midtag_unique = 1,
+
+    /* The type of a stored object.  If a tagged_t has a lowtag of
+       lowtag_stored, then masking off the lowtag yields the address
+       of the object, whose first word is always one of these (lowtag
+       == lowtag_immediate, midtag == midtag_stored_type).  The 'v'
+       bits are a value from 'enum stored_tag', defined below.
+
+       Objects of this type *only* occur as the first word of stored
+       objects.  They're not valid Scheme values; they can never
+       appear as the car of a pair, as the value of a variable, etc.
+       They're just used in the representation of stored values.
+
+       In a Cheney-style copying collector, to-space also functions as
+       the queue driving the breadth-first traversal.  As long as
+       every word in each object in to-space is a tagged_t, the
+       collector's job is simple: it simply examines each word in
+       to-space, without regard for the boundaries between individual
+       objects.  This is adequate for pairs, for example.  But for
+       objects that contain non-tagged_t data, like byte vectors,
+       objects containing pointers to C objects, and so on, the
+       collector must skip (or process specially) some words in the
+       object.
+
+       Giving stored object type tags a distinctive tag of their own
+       is what makes this possible: since these values only appear at
+       the start of a stored object, and can't be confused with any
+       normal Scheme value, seeing one tells the collector how to
+       treat subsequent to-space words.  */
+    midtag_stored_type = 2,
+  };
+    
+
+/* Unique values.  The bits of the value above the midtag_unique tag are
+   one of these.  */
+enum unique_value
+  {
+    unique_value_null = 0,      /* '() */
+    unique_value_true = 1,      /* #t */
+    unique_value_false = 2,     /* #f */
+    unique_value_eof = 3,       /* end-of-file object */
+  };
+
+
+/* Stored value type tags.  */
+enum stored_tag
+  {
+    stored_tag_vector = 0,      /* struct vector */
+    stored_tag_symbol = 1,      /* struct symbol */
+    stored_tag_procedure = 2,   /* struct procedure */
+    stored_tag_string = 3,      /* struct string */
+    stored_tag_port = 4,        /* struct port */
+    stored_tag_byte_vector = 5, /* struct byte_vector */
+  };
+
+
+
+/* Structure types describing the layout of various stored types.  */
+
+/* There's a problem here: in reality, these structures are shared
+   between the garbage collector and the Minor compiler.  Until Minor
+   has the ability to parse C code (which we do want it to have
+   eventually, to help with foreign function interface generation), 
+
+
+In the long
+   run, we should have the compiler actually parse this header file,
+   so it can be canonical.  */
+
+
+/* "And now, a type who needs no introduction..."  */
+struct pair
+{
+  tagged_t car, cdr;
+};
+
+
+struct vector
+{
+  /* Always stored_tag_vector, midtag_stored_type, lowtag_immediate.  */
+  tagged_t stored_tag;
+
+  /* The number of elements in the vector.  */
+  size_t length;
+
+  /* The elements of the vector.  I love ISO C.  */
+  tagged_t elements[];
+
+  /* At some point we may want to think about special handling for
+     large vectors.  */
+};
+
+
+/* A byte vector.  */
+struct byte_vector
+{
+  /* Always stored_tag_byte_vector, midtag_stored_type, lowtag_immediate.  */
+  tagged_t stored_tag;
+
+  /* The number of bytes in the vector.  */
+  size_t length;
+
+  /* The elements of the byte vector.  */
+  unsigned char bytes[];
+
+  /* At some point we will want to think about:
+     - byte vectors that refer to subranges of other byte vectors
+     - special handling for large byte vectors  */
+};
+
+
+#endif /* MINOR_GC_TAGGED_H */

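Editorial sketch of how the lowtag/midtag layout described in
tagged.h can be encoded and decoded.  The helper functions and the
sketch_ names are hypothetical; tagged.h as committed defines only the
enums and structs, and nothing here is taken from the repository
beyond the bit layout itself.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t sketch_tagged_t;

#define SKETCH_LOWTAG_BITS 3
#define SKETCH_LOWTAG_MASK ((sketch_tagged_t) 7)
#define SKETCH_MIDTAG_BITS 5
#define SKETCH_MIDTAG_MASK ((sketch_tagged_t) 31)

/* Values copied from enum lowtag and enum midtag in tagged.h.  */
enum
{
  SKETCH_LOWTAG_IMMEDIATE = 1,
  SKETCH_LOWTAG_PAIR = 2,
  SKETCH_MIDTAG_CHARACTER = 0,
  SKETCH_MIDTAG_UNIQUE = 1
};

static unsigned
sketch_lowtag (sketch_tagged_t v)
{
  return (unsigned) (v & SKETCH_LOWTAG_MASK);
}

/* A fixnum is its value shifted left by two; the low two bits stay
   zero, which is why both lowtags 000 and 100 mean "fixnum".  */
static sketch_tagged_t
sketch_make_fixnum (intptr_t n)
{
  return (sketch_tagged_t) n << 2;
}

static int
sketch_is_fixnum (sketch_tagged_t v)
{
  return (v & 3) == 0;
}

/* Dividing by four avoids right-shifting a negative number; it still
   assumes the usual two's complement conversion to intptr_t.  */
static intptr_t
sketch_fixnum_value (sketch_tagged_t v)
{
  return (intptr_t) v / 4;
}

/* Other immediates: value bits, then the midtag, then lowtag 001.  */
static sketch_tagged_t
sketch_make_immediate (unsigned midtag, sketch_tagged_t value)
{
  return ((value << (SKETCH_LOWTAG_BITS + SKETCH_MIDTAG_BITS))
          | ((sketch_tagged_t) midtag << SKETCH_LOWTAG_BITS)
          | SKETCH_LOWTAG_IMMEDIATE);
}

static unsigned
sketch_midtag (sketch_tagged_t v)
{
  return (unsigned) ((v >> SKETCH_LOWTAG_BITS) & SKETCH_MIDTAG_MASK);
}

static sketch_tagged_t
sketch_immediate_value (sketch_tagged_t v)
{
  return v >> (SKETCH_LOWTAG_BITS + SKETCH_MIDTAG_BITS);
}

/* Pairs: an eight-byte-aligned address with lowtag 010 or'd in.  */
static sketch_tagged_t
sketch_tag_pair (uintptr_t address)
{
  return address | SKETCH_LOWTAG_PAIR;
}

static uintptr_t
sketch_pair_address (sketch_tagged_t v)
{
  return v & ~SKETCH_LOWTAG_MASK;
}
```

Under this encoding the fixnum 1 is the word 4 (lowtag 100, the "odd
fixnum" case), the fixnum 2 is the word 8 (lowtag 000), and #t, which
is unique value 1 under midtag_unique, is the word 0x109.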


From minor-owner@red-bean.com Tue Sep  9 02:49:11 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h897nBnd005711
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 9 Sep 2003 02:49:11 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h897nAQO005709
	for minor-commits@red-bean.com; Tue, 9 Sep 2003 02:49:10 -0500
Date: Tue, 9 Sep 2003 02:49:10 -0500
Message-Id: <200309090749.h897nAQO005709@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 55 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-09 02:49:07 -0500 (Tue, 09 Sep 2003)
New Revision: 55

Modified:
   trunk/gc/tagged.h
Log:
* gc/tagged.h: Finished off basic types.


Modified: trunk/gc/tagged.h
===================================================================
--- trunk/gc/tagged.h	2003-09-09 05:12:51 UTC (rev 54)
+++ trunk/gc/tagged.h	2003-09-09 07:49:07 UTC (rev 55)
@@ -6,6 +6,7 @@
 
 #include <stddef.h>
 #include <stdint.h>
+#include <stdio.h>
 
 /* A tagged value (tagged_t) is an integral type large enough to hold a
    pointer.  */
@@ -99,10 +100,9 @@
        collector's job is simple: it simply examines each word in
        to-space, without regard for the boundaries between individual
        objects.  This is adequate for pairs, for example.  But for
-       objects that contain non-tagged_t data, like byte vectors,
-       objects containing pointers to C objects, and so on, the
-       collector must skip (or process specially) some words in the
-       object.
+       objects that contain non-tagged_t data, like strings, objects
+       containing pointers to C objects, and so on, the collector must
+       skip (or process specially) some words in the object.
 
        Giving stored object type tags a distinctive tag of their own
        is what makes this possible: since these values only appear at
@@ -132,24 +132,26 @@
     stored_tag_procedure = 2,   /* struct procedure */
     stored_tag_string = 3,      /* struct string */
     stored_tag_port = 4,        /* struct port */
-    stored_tag_byte_vector = 5, /* struct byte_vector */
   };
 
 
 
 /* Structure types describing the layout of various stored types.  */
 
-/* There's a problem here: in reality, these structures are shared
-   between the garbage collector and the Minor compiler.  Until Minor
-   has the ability to parse C code (which we do want it to have
-   eventually, to help with foreign function interface generation), 
+/* Eventually, I'd like to have a written ABI for Minor --- a document
+   that explains how to generate machine code that reads, modifies,
+   and creates Minor heap objects, calls and is called by Minor
+   procedures, plays nicely with the garbage collector, and so on.  It
+   should also document the interface for loading such code into
+   Minor, and how it is represented on disk.
 
+   Once that exists, then the definitions here will become subservient
+   to that: where they disagree, the written ABI will be right, and
+   this file wrong.  This will mirror the relationship between the
+   description of the ELF file format in the System V ABI, and the
+   <elf/common.h> file in the GNU toolchain sources.  */
 
-In the long
-   run, we should have the compiler actually parse this header file,
-   so it can be canonical.  */
 
-
 /* "And now, a type who needs no introduction..."  */
 struct pair
 {
@@ -173,22 +175,59 @@
 };
 
 
-/* A byte vector.  */
-struct byte_vector
+struct symbol
 {
-  /* Always stored_tag_byte_vector, midtag_stored_type, lowtag_immediate.  */
+  /* Always stored_tag_symbol, midtag_stored_type, lowtag_immediate.  */
   tagged_t stored_tag;
 
-  /* The number of bytes in the vector.  */
+  /* The name of the symbol --- a string.  */
+  tagged_t name;
+};
+
+
+struct procedure
+{
+  /* Always stored_tag_procedure, midtag_stored_type, lowtag_immediate.  */
+  tagged_t stored_tag;
+
+  /* ... need to think more about this, esp. how it refers to a particular
+     instantiation/phase of a module.  ... */
+};
+
+
+struct string
+{
+  /* Always stored_tag_string, midtag_stored_type, lowtag_immediate.  */
+  tagged_t stored_tag;
+
+  /* The number of characters in the string.  */
   size_t length;
 
-  /* The elements of the byte vector.  */
-  unsigned char bytes[];
+  /* The contents of the string.  We only support ISO-8859-1 at the
+     moment.  */
+  char contents[];
 
-  /* At some point we will want to think about:
-     - byte vectors that refer to subranges of other byte vectors
-     - special handling for large byte vectors  */
+  /* At some point we will want to:
+     - distinguish between strings and byte vectors
+     - support byte vectors that are subranges of other byte vectors,
+       and use this to implement shared substrings
+     - provide special handling for large string / byte vectors  */
 };
 
 
+struct port
+{
+  /* Always stored_tag_port, midtag_stored_type, lowtag_immediate.  */
+  tagged_t stored_tag;
+
+  /* The underlying stdio file object.  */
+  FILE *file;
+
+  /* In the long run, I think we should have our own I/O port type,
+     with exposed buffers for arbitrary look-ahead and efficient
+     layering.  I hear C++ does this right, but I don't know the
+     details.  */
+};
+
+
 #endif /* MINOR_GC_TAGGED_H */
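
Note on the flexible array member in 'struct string' above: since
'contents[]' has no declared size, each string is allocated as one
block of offsetof (struct string, contents) + length bytes.  The
sketch below shows only that arithmetic.  It makes several
assumptions: 'sketch_string' and 'sketch_make_string' are hypothetical
names, 'malloc' stands in for the Minor heap allocator (which this
revision does not define yet), and the tag value passed in is a
placeholder rather than a properly encoded stored tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for Minor's tagged_t: an integral type wide enough to hold
   a pointer.  */
typedef uintptr_t tagged_t;

/* Hypothetical mirror of 'struct string' from gc/tagged.h.  */
struct sketch_string
{
  tagged_t stored_tag;
  size_t length;
  char contents[];              /* flexible array member (ISO C99) */
};

/* Allocate a string object holding LENGTH bytes copied from BYTES.
   Return NULL if the allocation fails.  malloc is an assumption here;
   real objects would come from the garbage-collected heap.  */
static struct sketch_string *
sketch_make_string (tagged_t tag, const char *bytes, size_t length)
{
  struct sketch_string *s;

  assert (bytes != NULL);
  s = malloc (offsetof (struct sketch_string, contents) + length);
  if (! s)
    return NULL;

  s->stored_tag = tag;
  s->length = length;
  memcpy (s->contents, bytes, length);
  return s;
}
```

The contents are not NUL-terminated; the length field is the only
record of where the string ends, which is also what would let a
collector skip over the non-tagged_t bytes.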



From minor-owner@red-bean.com Tue Sep  9 11:27:46 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h89GRknd003338
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Tue, 9 Sep 2003 11:27:46 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h89GRjsa003336
	for minor-commits@red-bean.com; Tue, 9 Sep 2003 11:27:45 -0500
Date: Tue, 9 Sep 2003 11:27:45 -0500
Message-Id: <200309091627.h89GRjsa003336@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 56 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-09 11:27:42 -0500 (Tue, 09 Sep 2003)
New Revision: 56

Modified:
   trunk/gc/threads.h
Log:
* gc/threads.h: Doc fix.


Modified: trunk/gc/threads.h
===================================================================
--- trunk/gc/threads.h	2003-09-09 07:49:07 UTC (rev 55)
+++ trunk/gc/threads.h	2003-09-09 16:27:42 UTC (rev 56)
@@ -56,10 +56,10 @@
 /* In order to operate on any tagged_t values directly, C code (local
    to the Minor implementation) needs to first mark itself
    "incoherent", meaning that garbage collection must not take place,
-   operate on the values, and then mark itself "coherent" again.  If
-   some other thread requests a collection while we were incoherent,
-   it will set our "collection_waiting" flag, which we must check each
-   time we become coherent again.
+   then operate on the values, and then mark itself "coherent" again.
+   If some other thread requests a collection while we are
+   incoherent, it will set our "collection_waiting" flag, which we
+   must check each time we become coherent again.
 
    These functions take care of those details for us.
 

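The coherent/incoherent protocol described in the comment above can be
sketched as a small state machine.  This is a single-threaded
illustration under stated assumptions, not the code in gc/threads.h:
every name below is hypothetical, there is no locking or atomic
access, and the behavior of a request that arrives while the thread is
coherent (collect immediately) is a guess, since the comment only
describes the incoherent case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state; the real layout is not shown in this
   message.  */
struct sketch_thread
{
  bool coherent;            /* true when a collection may safely run */
  bool collection_waiting;  /* set when a collection was requested
                               while this thread was incoherent */
  int collections_run;      /* counts simulated collections */
};

/* Enter a region that operates on tagged_t values directly.  */
static void
sketch_become_incoherent (struct sketch_thread *t)
{
  assert (t->coherent);
  t->coherent = false;
}

/* Leave the region, and honor any collection requested meanwhile.  */
static void
sketch_become_coherent (struct sketch_thread *t)
{
  assert (! t->coherent);
  t->coherent = true;
  if (t->collection_waiting)
    {
      t->collection_waiting = false;
      t->collections_run++;     /* stand-in for the real collection */
    }
}

/* What a requesting thread does to T.  Collecting at once when T is
   coherent is an assumption made for this sketch.  */
static void
sketch_request_collection (struct sketch_thread *t)
{
  if (t->coherent)
    t->collections_run++;
  else
    t->collection_waiting = true;
}
```

The point of the flag is that the request is never lost: it is
deferred until the incoherent region ends, and checked on every
transition back to the coherent state.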


From minor-owner@red-bean.com Wed Sep 10 22:37:31 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h8B3bUAA011281
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Wed, 10 Sep 2003 22:37:31 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h8B3bUZk011279
	for minor-commits@red-bean.com; Wed, 10 Sep 2003 22:37:30 -0500
Date: Wed, 10 Sep 2003 22:37:30 -0500
Message-Id: <200309110337.h8B3bUZk011279@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 57 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-10 22:37:28 -0500 (Wed, 10 Sep 2003)
New Revision: 57

Added:
   trunk/gc/heap.h
Removed:
   trunk/gc/map.h
Log:
* gc/heap.h: Renamed from gc/map.h.


Copied: trunk/gc/heap.h (from rev 55, trunk/gc/map.h)

Deleted: trunk/gc/map.h
===================================================================
--- trunk/gc/map.h	2003-09-09 16:27:42 UTC (rev 56)
+++ trunk/gc/map.h	2003-09-11 03:37:28 UTC (rev 57)
@@ -1,441 +0,0 @@
-/* map.h --- tracking GC'd memory, given per-arch parameters.  */
-
-#ifndef MINOR_GC_MAP_H
-#define MINOR_GC_MAP_H
-
-#include "arch.h"
-
-/* The GC map is a table mapping every heap object's address onto a
-   gc_page structure describing the page the object lives in.
-   This structure says which generation the objects it contains belong
-   to, and is also where we record doting objects.
-
-   A "doting object" is an object in one generation that points to an
-   object in a younger generation.  A "doting page" is a page on which
-   a doting object starts.  Doting objects can be quite large, and
-   cover many pages, but only the page on which a doting object starts
-   is a doting page.
-
-   Function of the GC Map ============================================
-
-   In more detail, here are the jobs the GC map needs to do:
-
-   - The whole idea of generational garbage collection is to usually
-     collect only part of the heap.  Occasionally, you'll need to do a
-     full collection, but if you can focus your time on portions of
-     the heap that contain more garbage, then that time will be more
-     productive, and free up more memory for the mutator to waste.
-
-     But to restrict collection to a limited portion of the heap, the
-     collector needs to be able to find all pointers from the
-     uncollected portion into the collected portion: these act as
-     additional roots for the partial collection.
-
-     Since, in practice, pointers from older objects to younger
-     objects are rare, we can reduce the amount of bookkeeping needed
-     here by, when collecting generation G, always collecting all
-     generations younger than G as well.  This means we only need to
-     track pointers in objects in older generations to objects in
-     younger generations --- the rare kind.  These are the doting
-     objects.
-
-     How do we track such pointers?  Since a newly allocated object
-     can only be initialized with pointers to existing objects, an
-     object can become a doting object only by mutation.  Thus, every
-     bit of code that mutates a heap object in Minor needs to include
-     a "write barrier": code that allows the GC to check whether a
-     doting object has been created, and record it in the GC map, for
-     the collector to use in finding roots for partial collections.
-
-   - We also need to be able to quickly determine which generation an
-     object belongs to, to recognize when a pointer points out of the
-     portion of the heap we're collecting.
-
-   - When we've finished collecting, we need to be able to find all
-     the pages belonging to now-empty "from" spaces, to free them.
-
-
-   Dynamically vs. Statically Allocated Objects ======================
-
-   There are two ways objects can come into existence:
-
-   - The mutator can allocate them in the usual way, with 'cons',
-     'make-vector', etc.
-
-   - Executable files and shared libraries may contain objects,
-     constructed at compile-time, linked by the system linker, and
-     introduced into memory by the kernel doing an 'exec' or the
-     dynamic linker.
-
-   In the first case, code generated by Minor, or hand-written for
-   Minor, handles the allocation, so it can follow whatever
-   conventions we find useful.
-
-   But in the second case, Minor has only limited control over the
-   allocation.  Minor can ensure that all the heap objects in a
-   particular executable or shared library appear in one contiguous
-   chunk, not interleaved with other sorts of non-heap data --- from C
-   code, say.  But the GC has no way to find out at run time where
-   each executable/shared library's chunk of heap objects is.  (I
-   think we'd need a custom linker script, or some messy stuff based
-   on the C++ static initializer support, but, bleah.)  This means
-   that the GC can't reliably free up such memory for re-use; it can't
-   tell where Minor heap objects end and foreign non-heap objects
-   begin.  That, in turn, means that the GC might as well never
-   relocate such objects, or even bother to collect them at all ---
-   much better to simply ignore them, except to track old->young
-   pointers.
-
-   So, when we allocate fresh pages for a thread to allocate from, we
-   mark them in the GC map as belonging to generation zero, the
-   youngest generation.  And when we allocate pages to hold objects
-   the collector is promoting from one generation to the next, we
-   record the appropriate generation for them as well.  But we assume
-   that all other pages belong to generation seven, the "immortal
-   generation".  Any objects that we find here must have come from
-   executable files or shared libraries.  Other objects are never
-   promoted into the immortal generation --- they come to rest in
-   generation six.
-
-   Note that when we load a .o file ourselves --- say, when we load a
-   module previously compiled by the ahead-of-time compiler --- that's
-   Minor code turning that stream of bytes into objects and
-   procedures, not the kernel or the dynamic linker.  Since the
-   allocation is under our code's control, we can place the .o file's
-   objects (and procedures) in any generation we want.  So loading .o
-   files falls in the first category of allocation, not the second.
-
-
-   Mutators' Interface to the GC Map ===================================
-
-   The mutators' write barrier code does not access the GC map
-   directly.  Instead, mutator threads simply construct store lists
-   --- lists of every object they've ever mutated --- and hand them to
-   the collector when needed.  When a collection starts, the collector
-   records the potentially doting objects mentioned in each thread's
-   store list in the GC map, and then throws the store lists away.
-   This indirect arrangement has the following advantages:
-
-   - Mutators don't need to know about the GC map structure.  It's way
-     too complex to be part of a stable ABI.  The GC map remains
-     strictly internal to the GC.
-
-   - The overhead of the write barrier is the allocation of one pair.
-     The pair is allocated in the new object area, so the cache is
-     always hot.
-
-   - Since this map is only updated and consulted by the GC, it
-     doesn't compete for registers and cache with real mutator code at
-     every store operation; it only gets involved when a GC is about
-     to happen, which trashes both of those things anyway.
-
-   - Since store lists are per-thread, we never have to think about
-     synchronization when building them.
-
-   - Since mutator threads never access the GC map directly, we don't
-     have to worry about synchronization when accessing it, either.
-
-
-   Architecture Parameters ==========================================
-
-   The GC map data structure defined here is meant to be useable by
-   many different architectures.  Rather than #including this file
-   directly, you should first #include a header file (traditionally
-   named gc-map.h) from the appropriate arch/FOO/gc directory, which
-   will #define some parameters, and then #include this file for you.
-   Here are the parameters the arch-specific gc-map.h file should
-   #define:
-
-   GC_MAP_LOG_OBJECT_ALIGN --- the log base 2 of the minimum alignment
-   for every object.  For example, if every object must be aligned on
-   an eight-byte boundary, this would be 3.
-
-   GC_MAP_LOG_NUM_GENERATIONS --- the log base 2 of the number of
-   generations we support, including the nursery and the immortal
-   generation.  This must not be greater than GC_MAP_LOG_OBJECT_ALIGN,
-   for bit-packing reasons; see "premature optimization", below.
-
-   GC_MAP_LOG_PAGE_SIZE --- the log base 2 of the number of bytes per
-   page on the system.  By "page", what we really mean is the minimum
-   required alignment of the data and code segments, according to the
-   ABI.  Choosing that as the page size helps us ensure that the heap
-   areas of two different executable / shared library ELF files will
-   never fall in the purview of the same gc_page structure.
-
-   GC_MAP_FIRST_LEVEL_BITS --- the number of bits to take from the
-   most significant end of the address to use as the index into the
-   top-level array.
-
-   GC_MAP_SECOND_LEVEL_BITS --- the number of bits to take from the
-   most significant end of the address, after the chunk for the
-   top-level index, to use as the index into the second-level array.
-
-   GC_MAP_ADDRESS_BITS --- the total number of bits in an address.
-   This must be GC_MAP_LOG_PAGE_SIZE + GC_MAP_FIRST_LEVEL_BITS
-   + GC_MAP_SECOND_LEVEL_BITS; it's just present as a checksum.
-
-   (To support systems with 64-bit addresses, we could have optional
-   GC_MAP_{THIRD,FOURTH}_LEVEL_BITS macros, whose presence would
-   request the creation of a deeper tree.  Or perhaps someone can come
-   up with something more clever.)  */
-
-#if GC_MAP_LOG_NUM_GENERATIONS > GC_MAP_LOG_OBJECT_ALIGN
-#error "generation count too large for object alignment"
-#endif
-
-#if (GC_MAP_ADDRESS_BITS \
-     != (GC_MAP_FIRST_LEVEL_BITS \
-         + GC_MAP_SECOND_LEVEL_BITS \
-         + GC_MAP_LOG_PAGE_SIZE))
-#error "address not subdivided properly"
-#endif
-
-#define GC_MAP_NUM_GENERATIONS (1 << GC_MAP_LOG_NUM_GENERATIONS)
-
-
-/* For every page managed by the garbage collector, we have an
-   instance of the following structure.
-
-   (Since there is an instance of this structure for every page, it
-   needs to be kept small.  If GC_MAP_LOG_PAGE_SIZE is 12, then 8b :
-   4kb :: 1 : 512.)
-
-   From the "premature optimization is the root of all evil" dept:
-
-   At the machine code level, fetching an (unsigned) bit field
-   turns into:
-   - a memory reference to fetch the word containing the bit field,
-   - a mask, to get rid of bits that don't belong to the field, and
-   - a right shift, to put the bitfield's least significant bit at
-     the right end of the register.
-
-   But note that a lot of fields in this struct are indices within a
-   page, or portions of page addresses.  So the first thing we're
-   going to do with such values is shift them left again, to multiply
-   by 1 << GC_MAP_LOG_OBJECT_ALIGN (for first_doting_object and
-   last_doting_object) or by 1 << GC_MAP_LOG_PAGE_SIZE (for
-   next_doting_page and next_generation_page).  So the compiler could
-   combine the right shift of the field fetch and the left shift of
-   the multiply into a single operation, net left or net right.
-   
-   We can do even better: if we make sure that the right shift (the
-   bitfield's position within the word) and the left shift (the log2
-   of the factor we need to multiply it by to get a page offset or a
-   page address) are the *same*, then the shifts cancel each other
-   out, and all we need to do is fetch and mask.
-
-   So first_doting_object, last_doting_object, next_doting_page, and
-   next_generation_page are all aligned this way.  Since page
-   addresses and offsets within a page are disjoint portions of an
-   address word, things fit together pretty nicely.  */
-struct gc_page
-{
-  /* The following three fields should all pack into a single
-     address-sized word.  */
-
-  /* The generation to which the objects in this page belong.  Zero is
-     the youngest generation.  Seven is the "dummy generation", used
-     for memory areas we haven't allocated a separate gc_page
-     array for yet.  */
-  unsigned generation : GC_MAP_LOG_NUM_GENERATIONS;
-
-  /* Make sure next bitfield is nicely aligned.  */
-  int : GC_MAP_LOG_OBJECT_ALIGN - GC_MAP_LOG_NUM_GENERATIONS;
-
-  /* If this is a doting page, this is the offset within this page of
-     the start of the first doting object that begins on this page ---
-     divided by 1 << GC_MAP_LOG_OBJECT_ALIGN.  To find all the doting
-     pointers, we start here and scan until last_doting_object.  If
-     this is not a doting page, then last_doting_object == 0 and
-     first_doting_object > 0.  */
-  unsigned first_doting_object
-    : GC_MAP_LOG_PAGE_SIZE - GC_MAP_LOG_OBJECT_ALIGN;
-
-  /* All the pages that contain doting objects are kept in a
-     singly-linked list; there is one list per generation.  This field
-     is the link in that list: the address of the next such page in
-     this generation, divided by 1 << GC_MAP_LOG_PAGE_SIZE.  For the
-     last page in the chain, this field is zero.  */
-  unsigned next_doting_page : GC_MAP_ADDRESS_BITS - GC_MAP_LOG_PAGE_SIZE;
-
-  /* The following three fields should all pack into a single
-     address-sized word.  */
-
-  /* Unused bits!  */
-  unsigned : GC_MAP_LOG_OBJECT_ALIGN;
-
-  /* If this is a doting page, this is the offset within this page of
-     the start of the last doting object that begins on this page ---
-     divided by 1 << GC_MAP_LOG_OBJECT_ALIGN.  If this is not a doting
-     page, then this is zero.  */
-  unsigned last_doting_object
-    : GC_MAP_LOG_PAGE_SIZE - GC_MAP_LOG_OBJECT_ALIGN;
-
-  /* All the pages in a generation are kept in a singly-linked list.
-     All free pages are kept in a list, too.  This is the link in
-     those lists.  This is the address of the next page in the list
-     --- divided by 1 << GC_MAP_LOG_PAGE_SIZE.  If this is the last
-     page in the list, this is zero.  */
-  unsigned next_generation_page : GC_MAP_ADDRESS_BITS - GC_MAP_LOG_PAGE_SIZE;
-};
-
-
-#define GC_MAP_FIRST_LEVEL_SHIFT \
-  (GC_MAP_ADDRESS_BITS - GC_MAP_FIRST_LEVEL_BITS)
-#define GC_MAP_FIRST_LEVEL_MASK \
-  ((1 << GC_MAP_FIRST_LEVEL_BITS) - 1)
-
-#define GC_MAP_SECOND_LEVEL_SHIFT \
-  (GC_MAP_ADDRESS_BITS - GC_MAP_FIRST_LEVEL_BITS - GC_MAP_SECOND_LEVEL_BITS)
-#define GC_MAP_SECOND_LEVEL_MASK \
-  ((1 << GC_MAP_SECOND_LEVEL_BITS) - 1)
-
-
-/* The map of all pages is a two-level tree.  Given an address ADDR,
-   the 'struct gc_page' for that page is:
-
-      mn__gc_map
-        [(ADDR >> GC_MAP_FIRST_LEVEL_SHIFT) & GC_MAP_FIRST_LEVEL_MASK]
-        [(ADDR >> GC_MAP_SECOND_LEVEL_SHIFT) & GC_MAP_SECOND_LEVEL_MASK]
-
-   In other words, we use the top clump of bits of the object's
-   address to index the top-level array, yielding a pointer to a
-   second-level array; then we use the next clump of bits to index into
-   that array, yielding a gc_page structure for a particular page.
-
-   Initially, before we've allocated any heap pages at all, every
-   entry in mn__gc_map points to the same second-level array object:
-   mn__gc_immortal_pages.  This creates the appearance of a fully
-   populated tree, with a gc_page struct for every page in the address
-   space --- even though mn__gc_map and mn__gc_immortal_pages occupy
-   only ((1 << GC_MAP_FIRST_LEVEL_BITS) * sizeof (a pointer)
-         + (1 << GC_MAP_SECOND_LEVEL_BITS) * sizeof (struct gc_page))
-   which is 12kb on a typical 32-bit system.
-
-   As we allocate pages for newly allocated objects, or for to-spaces
-   during collection, we need to record these allocations in the map.
-   Since mn__gc_immortal_pages is (potentially) shared by many
-   top-level array entries, we handle things in a copy-on-write
-   fashion: when the gc_page struct we want to tweak is
-   actually an element of mn__gc_immortal_pages, we allocate a fresh
-   second-level table, initialize it to be a copy of
-   mn__gc_immortal_pages, change the appropriate entry in the
-   top-level array to point to it, and then tweak the appropriate
-   gc_page.  So as the program runs, we dedicate map memory
-   only to the interesting parts, without making any assumptions about
-   where in the address space malloc/mmap will give us pages from.
-
-   As described above, GNU/Linux doesn't tell us which regions of an
-   executable or shared library contain heap objects: we just
-   occasionally find heap references to objects on pages we've never
-   touched before.  So the initial state of a gc_page struct
-   has to be appropriate for such objects.  This tells us several
-   things:
-
-   - The pages' generation should be the immortal generation ---
-     generation seven.
-
-   - The pages (initially) contain no doting objects.  Objects in
-     executables or shared libraries may only point to objects in
-     other executables or shared libraries, since they were linked by
-     the static linker: otherwise, the static linker would have
-     complained about unresolved references.
-
-   - The pages will never be freed.  We don't scavenge objects from
-     executables or shared libraries: we can't be sure where the
-     regions of heap objects start and end, so we couldn't free the
-     area for reuse after the live objects have been copied out of
-     them anyway.  So next_generation_page and first_contiguous don't
-     need to be initialized to anything special.
-
-   Thus, the default gc_page struct looks like this:
-
-    {
-      generation = GC_MAP_NUM_GENERATIONS - 1,
-      first_doting_object = 1,
-      next_doting_page = 0,
-      last_doting_object = 0,
-      next_generation_page = 0
-    }
-
-   Every element of mn__gc_immortal_pages looks like that.
-
-   Of course, the mutator may create doting objects in executables and
-   shared libraries, so it's not the case that every executable or
-   shared object page will always look like this.  But initially, this
-   is fine.
-
-   Since the write barrier records the offsets of the first and last
-   doting objects in a page, and the GC never looks outside that
-   range, things will work correctly even if the initial or tail end
-   of a page holds non-heap objects.  So if the linker concatenates
-   the .minor.data section built by the Minor compiler with the .data
-   or .bss section built by the C compiler, for example, things will
-   be fine.
-
-   Problems would arise if non-heap objects were interleaved with heap
-   objects on a page: if first_doting_object happened to end up
-   pointing before some non-heap objects, and last_doting_object
-   happened to end up pointing after them, then the scan for doting
-   pointers would end up sweeping through non-heap objects.
-
-   However, that sort of interleaving can't happen:
-
-   - Within a single executable or shared library, the static linker's
-     normal behavior is to concatenate all the .minor.data sections,
-     without interleaving other sections.  So we don't have to worry
-     about intra-exec or intra-shared library interleavings.
-
-   - We choose GC_MAP_LOG_PAGE_SIZE so that the ABI requires that ELF
-     load segments be aligned at least on page boundaries.  This means
-     that two non-empty data segments can't appear on the same page.
-     So we don't have to worry about inter-executable or inter-shared
-     library interleavings, either.
-
-   Using the oldest generation as the "immortal" generation means that
-   the collector's test for whether to scavenge an object doesn't need
-   a special case to recognize immortal objects.  The obvious way to
-   write the test, "Is this object's generation less than or equal to
-   the oldest generation we're collecting?" will correctly decline to
-   traverse immortal objects.  Since the collector asks this of every
-   object it touches, it's important for this test to be fast.  */
-extern struct gc_page *mn__gc_map[1 << GC_MAP_FIRST_LEVEL_BITS];
-
-/* The array of immortal pages.  */
-extern struct gc_page mn__gc_immortal_pages[1 << GC_MAP_SECOND_LEVEL_BITS];
-
-
-/* The 'struct gc_page' object for ADDR.  */
-#define GC_PAGE(addr)                                           \
-  (mn__gc_map                                                   \
-   [((unsigned int) (addr) >> GC_MAP_FIRST_LEVEL_SHIFT)         \
-    & GC_MAP_FIRST_LEVEL_MASK]                                  \
-   [((unsigned int) (addr) >> GC_MAP_SECOND_LEVEL_SHIFT)        \
-    & GC_MAP_SECOND_LEVEL_MASK])
-
-
-/* A single heap generation.  */
-struct gc_generation
-{
-  /* The base address of the first page in this generation, or zero if
-     the generation contains no pages.  This is invalid in the
-     immortal generation.  */
-  void *first_generation_page;
-
-  /* The base address of the first doting page in this generation.
-     Zero if the generation contains no doting pages.  */
-  void *first_doting_page;
-
-  /* How many collections we've done since the last time we collected
-     any generations older than this.  */
-  int collections;
-};
-
-
-/* The table of all generations.  Generation zero is the youngest
-   generation.  Generation GC_MAP_NUM_GENERATIONS - 1 is the immortal
-   generation, for pages in executables and shared libraries
-   (actually, for any page we didn't allocate ourselves).  */
-extern struct gc_generation mn__gc_generations[GC_MAP_NUM_GENERATIONS];
-
-#endif /* MINOR_GC_MAP_H */
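
The two-level map and its copy-on-write second-level tables, as
described in the comment above mn__gc_map, can be sketched as follows.
This is a reduced illustration under stated assumptions: the
parameters are plausible values for a 32-bit system (12 + 10 + 10
bits), not values taken from any arch directory; the page structure
carries only a generation; 'malloc' is used for the fresh table; and
all 'sketch_' names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed parameters for a 32-bit address space.  */
#define LOG_PAGE_SIZE      12
#define FIRST_LEVEL_BITS   10
#define SECOND_LEVEL_BITS  10
#define ADDRESS_BITS       32
#define NUM_GENERATIONS    8

#define FIRST_LEVEL_SHIFT  (ADDRESS_BITS - FIRST_LEVEL_BITS)
#define FIRST_LEVEL_MASK   ((1u << FIRST_LEVEL_BITS) - 1)
#define SECOND_LEVEL_SHIFT \
  (ADDRESS_BITS - FIRST_LEVEL_BITS - SECOND_LEVEL_BITS)
#define SECOND_LEVEL_MASK  ((1u << SECOND_LEVEL_BITS) - 1)

/* Reduced stand-in for struct gc_page.  */
struct sketch_page
{
  unsigned char generation;
};

static struct sketch_page sketch_immortal_pages[1u << SECOND_LEVEL_BITS];
static struct sketch_page *sketch_map[1u << FIRST_LEVEL_BITS];

/* Make every top-level entry share the one immortal table.  */
static void
sketch_map_init (void)
{
  size_t i;

  for (i = 0; i < (1u << SECOND_LEVEL_BITS); i++)
    sketch_immortal_pages[i].generation = NUM_GENERATIONS - 1;
  for (i = 0; i < (1u << FIRST_LEVEL_BITS); i++)
    sketch_map[i] = sketch_immortal_pages;
}

/* The page structure for ADDR: index by the top clump of bits, then
   by the next clump.  */
static struct sketch_page *
sketch_page_for (uint32_t addr)
{
  return &sketch_map[(addr >> FIRST_LEVEL_SHIFT) & FIRST_LEVEL_MASK]
                    [(addr >> SECOND_LEVEL_SHIFT) & SECOND_LEVEL_MASK];
}

/* Record GENERATION for the page containing ADDR.  If that page's
   entry still lives in the shared immortal table, give this top-level
   slot a private copy first.  Return 0 on success, -1 if out of
   memory.  */
static int
sketch_set_generation (uint32_t addr, unsigned generation)
{
  size_t i = (addr >> FIRST_LEVEL_SHIFT) & FIRST_LEVEL_MASK;

  if (sketch_map[i] == sketch_immortal_pages)
    {
      struct sketch_page *fresh = malloc (sizeof sketch_immortal_pages);
      if (! fresh)
        return -1;
      memcpy (fresh, sketch_immortal_pages, sizeof sketch_immortal_pages);
      sketch_map[i] = fresh;
    }

  sketch_page_for (addr)->generation = (unsigned char) generation;
  return 0;
}
```

Only the top-level slots that actually receive heap pages ever get a
private table; every other address keeps resolving to the shared
immortal entries.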

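The "premature optimization" note in the deleted map.h says that a
field placed at the same bit position as the shift it would later need
can be read with a fetch and a mask alone.  The sketch below shows the
idea for the first word of a page entry.  It rests on assumptions: it
uses explicit masks on a uint32_t instead of C bit-fields (whose
layout is implementation-defined), it takes 3 and 12 as example values
for the object alignment and page size logs, and all names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example parameters: 8-byte objects, 4kb pages.  */
#define LOG_OBJECT_ALIGN 3
#define LOG_PAGE_SIZE    12

/* Low bits: generation.  Middle bits: offset within the page.
   High bits: page address.  The three masks are disjoint.  */
#define GENERATION_MASK ((UINT32_C (1) << LOG_OBJECT_ALIGN) - 1)
#define OFFSET_MASK \
  (((UINT32_C (1) << LOG_PAGE_SIZE) - 1) & ~GENERATION_MASK)
#define PAGE_MASK ((uint32_t) ~((UINT32_C (1) << LOG_PAGE_SIZE) - 1))

/* Pack a generation, the byte offset of the first doting object, and
   the address of the next doting page into one word.  The offset must
   be object-aligned and the address page-aligned.  */
static uint32_t
sketch_pack (unsigned generation, uint32_t first_doting_offset,
             uint32_t next_doting_page)
{
  assert (generation <= GENERATION_MASK);
  assert ((first_doting_offset & (uint32_t) ~OFFSET_MASK) == 0);
  assert ((next_doting_page & (uint32_t) ~PAGE_MASK) == 0);
  return generation | first_doting_offset | next_doting_page;
}

/* Each accessor is a single mask; no shift is needed, because the
   field already sits where its scaled value belongs.  */
static unsigned
sketch_generation (uint32_t word)
{
  return word & GENERATION_MASK;
}

static uint32_t
sketch_first_doting_offset (uint32_t word)
{
  return word & OFFSET_MASK;
}

static uint32_t
sketch_next_doting_page (uint32_t word)
{
  return word & PAGE_MASK;
}
```

The accessors return a byte offset and a page address directly, rather
than the divided values a plain bit-field read would produce.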


From minor-owner@red-bean.com Thu Sep 11 00:15:14 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h8B5FDAA014634
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Thu, 11 Sep 2003 00:15:13 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h8B5FDEC014632
	for minor-commits@red-bean.com; Thu, 11 Sep 2003 00:15:13 -0500
Date: Thu, 11 Sep 2003 00:15:13 -0500
Message-Id: <200309110515.h8B5FDEC014632@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 58 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-11 00:15:10 -0500 (Thu, 11 Sep 2003)
New Revision: 58

Modified:
   trunk/gc/heap.h
Log:
* gc/heap.h: Doc fixes.
(struct gc_generation): Expand to include the fields we'll need for
allocation and collection.


Modified: trunk/gc/heap.h
===================================================================
--- trunk/gc/heap.h	2003-09-11 03:37:28 UTC (rev 57)
+++ trunk/gc/heap.h	2003-09-11 05:15:10 UTC (rev 58)
@@ -1,4 +1,4 @@
-/* map.h --- tracking GC'd memory, given per-arch parameters.  */
+/* heap.h --- allocating and tracking GC'd memory.  */
 
 #ifndef MINOR_GC_MAP_H
 #define MINOR_GC_MAP_H
@@ -6,15 +6,18 @@
 #include "arch.h"
 
 /* The GC map is a table mapping every heap object's address onto a
-   gc_page structure describing the page the object lives in.
-   This structure says which generation the objects it contains belong
-   to, and is also where we record doting objects.
+   gc_page structure describing the page the object lives in.  (Or, in
+   the case of an object that crosses page boundaries, the page the
+   object starts on.)  This structure says which generation the
+   objects its page contains belong to, and it is also where we
+   record doting objects.
 
    A "doting object" is an object in one generation that points to an
    object in a younger generation.  A "doting page" is a page on which
-   a doting object starts.  Doting objects can be quite large, and
-   cover many pages, but only the page on which a doting object starts
-   is a doting page.
+   a doting object starts.  Doting objects can be large, and cover
+   many pages, but only the page on which a doting object starts is a
+   doting page.  The youngest generation, in which the mutator
+   allocates new objects, is called the "nursery".
 
    Function of the GC Map ============================================
 
@@ -24,7 +27,8 @@
      collect only part of the heap.  Occasionally, you'll need to do a
      full collection, but if you can focus your time on portions of
      the heap that contain more garbage, then that time will be more
-     productive, and free up more memory for the mutator to waste.
+     productive, and free up more memory for the mutator to fritter
+     away as if it grew on trees.
 
      But to restrict collection to a limited portion of the heap, the
      collector needs to be able to find all pointers from the
@@ -151,9 +155,10 @@
    an eight-byte boundary, this would be 3.
 
    GC_MAP_LOG_NUM_GENERATIONS --- the log base 2 of the number of
-   generations we support, including the nursery and the immortal
-   generation.  This must not be greater than GC_MAP_LOG_OBJECT_ALIGN,
-   for bit-packing reasons; see "premature optimization", below.
+   generations we support, including the nursery generation and the
+   immortal generation.  This must not be greater than
+   GC_MAP_LOG_OBJECT_ALIGN, for bit-packing reasons; see "premature
+   optimization", below.
 
    GC_MAP_LOG_PAGE_SIZE --- the log base 2 of the number of bytes per
    page on the system.  By "page", what we really mean is the minimum
@@ -417,15 +422,83 @@
 /* A single heap generation.  */
 struct gc_generation
 {
-  /* The base address of the first page in this generation, or zero if
-     the generation contains no pages.  This is invalid in the
-     immortal generation.  */
-  void *first_generation_page;
+  /* In describing these fields, when we say "during a collection",
+     "between collections", etc. what we really mean is, "during a
+     collection which includes this generation", "between collections
+     which include this generation", etc., unless we state
+     otherwise.  */
 
-  /* The base address of the first doting page in this generation.
-     Zero if the generation contains no doting pages.  */
-  void *first_doting_page;
+  /* During a collection, this is the base address of the first in the
+     chain of pages in this generation's to-space, or zero if no pages
+     have been allocated to the generation yet.
 
+     Between collections, this is the list of pages currently
+     allocated to the generation --- the last collection's to-space.
+     These will become this generation's from-space when the next
+     collection starts.
+
+     This field is invalid in the immortal generation, since we don't
+     know which pages really belong to it.  */
+  void *to_space_pages;
+
+  /* The size of to-space, in bytes.  This is just the number of pages
+     in the to_space_pages list, multiplied by the page size.  */
+  size_t to_space_size;
+
+  /* During a collection, this is the first in the chain of pages in
+     this generation's from-space, or zero if there are no such pages.
+     These will be freed at the end of the collection.
+
+     Between collections, this is invalid.  */
+  void *from_space_pages;
+
+  /* We allocate new pages to a generation in contiguous blocks.  We
+     don't allocate any more until the current block is full, so there
+     is only one block of free space at any given time.  */
+
+  /* The address of the next free byte in to-space.
+
+     In the nursery generation, this is only valid between
+     collections; it is where we allocate new objects.  In older
+     generations, this is only valid during collections, since that is
+     the only time we ever allocate objects in any place other than
+     the nursery.
+
+     In the future, we may want to have per-thread free blocks for the
+     nursery generation, to avoid SMP contention on these words.  For
+     the time being, all threads allocate out of the same block;
+     allocations use a compare-and-exchange sequence to update this
+     pointer.  */
+  void *next_free;
+
+  /* The address beyond the last free byte in to-space.
+
+     Like next_free, between collections this is only valid in the
+     nursery generation, and during collections this is only valid in
+     non-nursery generations.  */
+  void *free_end;
+
+  /* Between collections that include this generation, this is the
+     base address of the first in the chain of doting pages in this
+     generation, or zero if there are no such doting pages.  These
+     pages are a subset of the to-space pages.
+
+     During collections of only younger generations, we use this list
+     to scan this generation for references to objects in younger
+     generations; we treat any such references as roots for that
+     collection.
+
+     A collection that includes this generation will copy any live
+     doting objects in this generation's from-space to new addresses
+     in its to-space.  In doing so it may scatter the doting objects
+     on a given page across several entirely different pages, making
+     the doting information in the GC map entirely invalid.  So during
+     these collections, we ignore the from-space's doting list (we'll
+     free those pages at the end of the collection anyway) and rebuild
+     the to-space doting information as we copy in individual
+     objects.  */
+  void *doting_pages;
+
   /* How many collections we've done since the last time we collected
      any generations older than this.  */
   int collections;
@@ -438,4 +511,5 @@
    (actually, for any page we didn't allocate ourselves).  */
 extern struct gc_generation mn__gc_generations[GC_MAP_NUM_GENERATIONS];
 
+
 #endif /* MINOR_GC_MAP_H */
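
Note on next_free / free_end: the comment above says allocations
advance next_free with a compare-and-exchange sequence.  Here is a
minimal sketch of that idea, assuming C11 <stdatomic.h>.  The names
sketch_block and sketch_alloc are hypothetical, not part of Minor, and
the sketch ignores object alignment and what to do when the block is
full.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a generation's single free block.  In the
   patch above, the corresponding fields are plain 'void *' members of
   struct gc_generation.  */
struct sketch_block
{
  _Atomic uintptr_t next_free;  /* address of the next free byte */
  uintptr_t free_end;           /* address beyond the last free byte */
};

/* Try to carve SIZE bytes out of B.  Return the address of the new
   object, or zero if the block has too little room left; a real
   allocator would then fetch a new block or request a collection.  */
static uintptr_t
sketch_alloc (struct sketch_block *b, size_t size)
{
  uintptr_t old = atomic_load (&b->next_free);

  for (;;)
    {
      if (size > b->free_end - old)
        return 0;

      /* If another thread moved next_free first, the exchange fails,
         OLD is reloaded with the current value, and we retry.  */
      if (atomic_compare_exchange_weak (&b->next_free, &old, old + size))
        return old;
    }
}
```

Every thread that wins the exchange owns the bytes from the returned
address up to the new next_free, so no lock is needed on this path.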



From minor-owner@red-bean.com Thu Sep 11 00:16:04 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h8B5G3AA014689
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Thu, 11 Sep 2003 00:16:03 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h8B5G3hr014687
	for minor-commits@red-bean.com; Thu, 11 Sep 2003 00:16:03 -0500
Date: Thu, 11 Sep 2003 00:16:03 -0500
Message-Id: <200309110516.h8B5G3hr014687@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 59 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-11 00:16:00 -0500 (Thu, 11 Sep 2003)
New Revision: 59

Modified:
   trunk/gc/tagged.h
Log:
* gc/tagged.h: Doc fixes.


Modified: trunk/gc/tagged.h
===================================================================
--- trunk/gc/tagged.h	2003-09-11 05:15:10 UTC (rev 58)
+++ trunk/gc/tagged.h	2003-09-11 05:16:00 UTC (rev 59)
@@ -138,19 +138,29 @@
 
 /* Structure types describing the layout of various stored types.  */
 
-/* Eventually, I'd like to have a written ABI for Minor --- a document
-   that explains how to generate machine code that reads, modifies,
-   and creates Minor heap objects, calls and is called by Minor
-   procedures, plays nicely with the garbage collector, and so on.  It
-   should also document the interface for loading such code into
-   Minor, and how it is represented on disk.
+/* These structure definitions are not authoritative.  That is, you
+   can't change them and be sure everything will Just Work.  The Minor
+   compiler has its own idea how objects are laid out, and if the
+   compiler and this file disagree, then bad things will happen.
 
+   Eventually, I'd like to have a written ABI for Minor --- a document
+   that explains how to generate machine code that creates, reads, and
+   modifies heap objects, calls and is called by Scheme procedures,
+   plays nicely with the garbage collector, and so on.  It should also
+   document the interface for loading such code into Minor, and how it
+   is represented on disk.
+
    Once that exists, then the definitions here will become subservient
    to that: where they disagree, the written ABI will be right, and
    this file wrong.  This will mirror the relationship between the
    description of the ELF file format in the System V ABI, and the
-   <elf/common.h> file in the GNU toolchain sources.  */
+   <elf/common.h> file in the GNU toolchain sources.
 
+   It would be nice to have one commented, machine-readable
+   specification of the object layouts that both the collector and the
+   compiler could use.  Once Minor can parse C code, it might be
+   enough to simply have it parse this header file and check it for
+   consistency with its own specifications.  */
 
 /* "And now, a type who needs no introduction..."  */
 struct pair



From minor-owner@red-bean.com Thu Sep 11 00:32:16 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h8B5WFAA015301
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Thu, 11 Sep 2003 00:32:16 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h8B5WFk7015299
	for minor-commits@red-bean.com; Thu, 11 Sep 2003 00:32:15 -0500
Date: Thu, 11 Sep 2003 00:32:15 -0500
Message-Id: <200309110532.h8B5WFk7015299@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 60 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-11 00:32:11 -0500 (Thu, 11 Sep 2003)
New Revision: 60

Modified:
   trunk/gc/threads.c
   trunk/gc/threads.h
Log:
* gc/threads.h: Doc fixes.  Note need for incoherent begin/end
functions with barriers.
* gc/threads.c: Doc fixes.  Note missing barriers.


Modified: trunk/gc/threads.c
===================================================================
--- trunk/gc/threads.c	2003-09-11 05:16:00 UTC (rev 59)
+++ trunk/gc/threads.c	2003-09-11 05:32:11 UTC (rev 60)
@@ -50,7 +50,7 @@
      and not freeing or allocating refs, you don't need any hoist or
      drop barriers.  */
   mn_call *youngest_call;
-}
+};
 
 
 
@@ -485,8 +485,10 @@
   c->local_refs = make_ref_group ();
 
   mn__begin_incoherent ();
+  ... need barrier;
   c->older_call = t->youngest_call;
   t->youngest_call = c;
+  ... need barrier;
   mn__end_incoherent ();
 
   return c;
@@ -499,6 +501,7 @@
   struct mn__thread *t = (struct mn__thread *) mn__thread_self;
 
   mn__begin_incoherent ();
+  ... need barrier;
 
   while (t->youngest_call != call)
     {
@@ -510,6 +513,7 @@
 
   self->youngest_call = next;
 
+  ... need barrier;
   mn__end_incoherent ();
 }
 
@@ -517,6 +521,8 @@
 mn_ref *
 mn__make_local_ref (mn_call *call, tagged_t obj)
 {
+  /* No barrier needed here; the caller must already be incoherent, so
+     it's their responsibility to mark the barrier.  */
   return mn__make_ref (call->local_refs, obj);
 }
 

Modified: trunk/gc/threads.h
===================================================================
--- trunk/gc/threads.h	2003-09-11 05:16:00 UTC (rev 59)
+++ trunk/gc/threads.h	2003-09-11 05:32:11 UTC (rev 60)
@@ -73,6 +73,7 @@
   mn__thread_self->incoherent++;
 }
 
+... need versions of these with hoist and drop barriers, too;
 
 static inline void
 mn__end_incoherent (void)
@@ -133,9 +134,12 @@
 void mn__pop_to (mn_call *call);
 
 
-/* Create a local reference to OBJ, owned by CALL.  You must be
-   incoherent while calling this, since you've obviously got a direct
-   reference to a heap object.  */
+/* Create a local reference to OBJ, owned by CALL.
+
+   You must be incoherent while calling this, since you've obviously
+   got a direct reference to a heap object, and you'll need to use the
+   barrier versions of the incoherent section functions, since you're
+   touching more than just a ref's 'obj' field.  */
 mn_ref *mn__make_local_ref (mn_call *call, tagged_t obj);
 
 



From minor-owner@red-bean.com Fri Sep 12 01:48:04 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h8C6m4AA008903
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Fri, 12 Sep 2003 01:48:04 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h8C6m4qC008901
	for minor-commits@red-bean.com; Fri, 12 Sep 2003 01:48:04 -0500
Date: Fri, 12 Sep 2003 01:48:04 -0500
Message-Id: <200309120648.h8C6m4qC008901@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 61 - trunk/doc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-12 01:48:00 -0500 (Fri, 12 Sep 2003)
New Revision: 61

Added:
   trunk/doc/synchronization
Log:
* doc/synchronization: New file.


Added: trunk/doc/synchronization
===================================================================
--- trunk/doc/synchronization	2003-09-11 05:32:11 UTC (rev 60)
+++ trunk/doc/synchronization	2003-09-12 06:48:00 UTC (rev 61)
@@ -0,0 +1,155 @@
+This file contains an early draft of an explanation of how to use the
+incoherence barriers; but in the process of writing it, I realized
+that the only kind of barrier that could be used in any but the most
+restricted circumstances was, in fact, identical to the kind of
+barrier that would work in almost every circumstance...
+
+So the long, involved explanation became unnecessary.  But I think it
+would eventually be nice to have a file which helps people think
+about reordering by compilers and memory hierarchy reordering in SMP
+systems, and this explanation might be useful as part of that.  Thus
+we keep it around.
+
+
+/* In order to operate on any tagged_t values directly, C code (local
+   to the Minor implementation) needs to first mark itself
+   "incoherent", meaning that garbage collection must not take place,
+   then operate on the values, and then mark itself "coherent" again.
+   If some other thread requests a collection while we are
+   incoherent, it will set our "collection_waiting" flag, which we
+   must check each time we become coherent again.
+
+   The functions on this page take care of entering and leaving
+   incoherent sections.
+
+   To use these functions correctly, you must tell the C compiler not
+   to rearrange your code so that the tagged_t manipulation takes
+   place outside the incoherence barriers.  Here's what I mean:
+
+   (This basically amounts to an explanation of what 'volatile' really
+   means; if you understand 'volatile' already, you can skim this.)
+
+   Normally, C compilers are free to reorder statements --- including
+   statements that modify variables or data structures --- in any way
+   that doesn't affect the behavior of the program *in the absence of
+   signal handlers*.  For example, suppose you have two global 'int'
+   variables x and y, and you write:
+
+       x = 3;
+       y = 4;
+
+   And further suppose that both x and y will always be zero when we
+   begin executing these statements.
+
+   If these statements appear in your program right next to each
+   other, as they do here, the compiler has the freedom to (for
+   example) exchange them, so that y becomes 4 before x becomes 3.
+   Since neither statement uses the value assigned by the other,
+   swapping them can't affect the behavior of the program.  Since the
+   reordering is invisible, it's a permitted optimization.
+
+   However, if your program includes a signal handler that reads x and
+   y, then rearranging the assignments could, in fact, affect the
+   behavior of the program.  If the statements are executed as
+   written, the signal handler will never see y == 4 unless x == 3.
+   But if the compiler exchanges the two assignments, then the signal
+   handler might see x == 3 while y is still zero.
+
+   One could prohibit the compiler from rearranging the program in a
+   way that signal handlers could see.  Since signal handlers can only
+   see variables in memory, I think this amounts to simply prohibiting
+   the compiler from rearranging stores.  But the committees writing
+   the C standards felt that this was too restrictive, so compilers
+   are permitted to reorder statements as they like.  They can also
+   delete code entirely:
+
+      x = 3;                    // this could be optimized away
+      x = 4;
+
+   And so on.
+
+   However, these rules applied everywhere make signal handlers really
+   hard to use: you don't want to make assumptions about how clever
+   your compiler was, so it's extremely difficult to be sure what
+   values the variables your signal handler uses will have, and where
+   in the main-line code they will be visible.  For example, suppose
+   you have a loop like this:
+
+       interrupted = 0;
+       while (! interrupted)
+         do some work;
+
+   and you have a signal handler that sets the 'interrupted' variable
+   to 1 when the user hits C-c.  Since there are no assignments to
+   'interrupted' in the code, the compiler is free to transform this
+   code into:
+
+       interrupted = 0;
+       while (1)
+         do some work;
+
+   And things get more subtle.
+
+   So, in the case at hand, if you write code like this:
+
+      mn__thread_self->incoherent++;
+      work with some tagged_t values directly;
+      mn__thread_self->incoherent--;
+
+   and the middle statement doesn't actually refer to the thread's
+   'incoherent' flag, then the compiler is free to transform this
+   into:
+   
+      work with some tagged_t values directly;
+
+   After all, the increment and the decrement cancel each other out,
+   and nobody sees their effect.  And even if the middle statement did
+   use the incoherent flag, the compiler could always say:
+
+     work with some tagged_t (mn__thread_self->incoherent + 1) values directly;
+
+   and eliminate the increment and decrement.
+
+   Clearly what we want is a way to tighten up the rules in those
+   places that need it, while still letting the compiler rearrange
+   code as it pleases elsewhere.
+
+   So, the modified rule is that, where a program accesses or modifies
+   objects whose type has the 'volatile' qualifier, the compiler may
+   only reorder and rearrange those operations within the nearest
+   enclosing sequence points (statement boundaries; function calls;
+   &&, ||, and comma operators; etc.).
+
+   So, if we declare mn__thread_self->incoherent to be volatile, does
+   that help?
+
+      mn__thread_self->incoherent++;
+      work with some tagged_t values directly;
+      mn__thread_self->incoherent--;
+
+   The compiler is no longer allowed to eliminate the increments and
+   decrements, but it is still allowed to rearrange other code
+   relative to them, like so:
+
+      mn__thread_self->incoherent++;
+      mn__thread_self->incoherent--;
+      work with some tagged_t values directly;
+
+   In some cases, the work will involve accessing 
+
+ the compiler may only reorder and
+   delete operations on objects whose type has the 'volatile'
+   qualifier within sequence points.  (Things like function calls,
+   statement boundaries, the comma operator, and the && and ||
+   operators all introduce sequence points.)
+
+that operations on
+   objects of that type may only be reordered and  within
+   sequence points, but 
+
+   
+
+   (What makes things vastly simpler is that we only need to
+   coordinate with our own signal handler, not with other threads.
+   Only our signal handler can pass our state off to the collecting
+   thread, and the nasty synchronization happens there.)  */
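
Note on the 'interrupted' loop in the draft above: a small sketch of
the same loop with the flag declared 'volatile sig_atomic_t', so the
compiler has to reload it on every iteration instead of rewriting the
loop as 'while (1)'.  The function names are hypothetical, and the
loop raises SIGINT itself only so the example terminates without
anyone typing C-c.

```c
#include <signal.h>

/* Written by the handler, read by the main-line loop.  */
static volatile sig_atomic_t sketch_interrupted;

static void
sketch_on_sigint (int sig)
{
  (void) sig;
  sketch_interrupted = 1;
}

/* Loop until the handler sets the flag; return the number of
   iterations completed.  We send ourselves SIGINT after RAISE_AFTER
   iterations, standing in for the user's C-c.  */
static int
sketch_work_until_interrupted (int raise_after)
{
  void (*old_handler) (int) = signal (SIGINT, sketch_on_sigint);
  int iterations = 0;

  sketch_interrupted = 0;
  while (! sketch_interrupted)
    {
      iterations++;             /* "do some work" */
      if (iterations == raise_after)
        raise (SIGINT);         /* handler runs before raise returns */
    }

  signal (SIGINT, old_handler);
  return iterations;
}
```

This only pins down accesses to the flag itself.  As the draft goes on
to say, 'volatile' on the flag does nothing to stop the compiler from
moving other, non-volatile work across those accesses.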



From minor-owner@red-bean.com Fri Sep 12 02:09:34 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) with ESMTP id h8C79YAA009782
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <minor-commits@red-bean.com>; Fri, 12 Sep 2003 02:09:34 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.3/8.12.3/Debian -4) id h8C79XnZ009780
	for minor-commits@red-bean.com; Fri, 12 Sep 2003 02:09:33 -0500
Date: Fri, 12 Sep 2003 02:09:33 -0500
Message-Id: <200309120709.h8C79XnZ009780@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 62 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-12 02:09:30 -0500 (Fri, 12 Sep 2003)
New Revision: 62

Modified:
   trunk/gc/threads.h
Log:
* gc/threads.h: Doc fixes.
(mn__begin_incoherent, mn__end_incoherent): Add barriers. 


Modified: trunk/gc/threads.h
===================================================================
--- trunk/gc/threads.h	2003-09-12 06:48:00 UTC (rev 61)
+++ trunk/gc/threads.h	2003-09-12 07:09:30 UTC (rev 62)
@@ -53,31 +53,44 @@
 
 /* Incoherent sections.  */
 
-/* In order to operate on any tagged_t values directly, C code (local
-   to the Minor implementation) needs to first mark itself
-   "incoherent", meaning that garbage collection must not take place,
-   then operate on the values, and then mark itself "coherent" again.
-   If some other thread requests a collection while we were
-   incoherent, it will set our "collection_waiting" flag, which we
-   must check each time we become coherent again.
+/* All code which operates on tagged_t values directly must either be
+   annotated (like the code generated by the Minor compiler), or
+   appear between a call to mn__begin_incoherent and a call to
+   mn__end_incoherent.  The latter is the only option for C code.
 
-   These functions take care of those details for us.
+   Code between such a pair is called an "incoherent section": while
+   any thread is in an incoherent section, a collecting thread cannot
+   assume that it can find all the program's heap references via
+   annotations or in mn_ref objects --- and thus the collection cannot
+   take place.  In this case, the incoherent thread is allowed to
+   continue execution until it becomes coherent again, at which point
+   it will pause for the collection.
 
-   (What makes things vastly simpler is that we only need to
-   coordinate with our own signal handler, not with other threads.
-   Only our signal handler can pass our state off to the collecting
-   thread, and the nasty synchronization happens there.)  */
-static inline void
+   The same mechanism protects the mn_call stack and the local
+   reference groups: you may only push and pop calls and allocate and
+   free local references in incoherent sections.  */
+
+static MN__INLINE void
 mn__begin_incoherent (void)
 {
   mn__thread_self->incoherent++;
+
+  /* Tell the compiler that loads and stores must not be moved across
+     this point.  This ensures that, if all operations on tagged_t
+     values and the heap objects they refer to are written between the
+     begin/end_incoherent markers, they will actually stay that way in
+     the machine code.  */
+  mn__all_volatile_barrier ();
 }
 
-... need versions of these with hoist and drop barriers, too;
 
-static inline void
+static MN__INLINE void
 mn__end_incoherent (void)
 {
+  /* Tell the compiler that loads and stores must not be moved across
+     this point.  See the comment above in mn__begin_incoherent.  */
+  mn__all_volatile_barrier ();
+
   mn__thread_self->incoherent--;
   if (! mn__thread_self->incoherent
       && mn__thread_self->collection_waiting)
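
Note on the barriers added above: this patch does not show how
mn__all_volatile_barrier is defined.  One common way to get a compiler
code-motion barrier with GCC is an empty asm statement with a "memory"
clobber; the sketch below assumes that, and every sketch_ name in it
is hypothetical.  The 'pauses' counter stands in for the thread
sending itself mn__gc_wait_signal, and clearing collection_waiting at
that point is likewise an assumption of the sketch.

```c
#include <signal.h>

/* Assumed GCC-style compiler barrier: emits no instructions, but the
   compiler may not move loads or stores across it.  */
#define sketch_compiler_barrier() __asm__ __volatile__ ("" : : : "memory")

struct sketch_thread
{
  volatile sig_atomic_t incoherent;         /* nesting depth */
  volatile sig_atomic_t collection_waiting; /* set by the signal handler */
  int pauses;                               /* times we paused for a GC */
};

static struct sketch_thread sketch_self;

static void
sketch_begin_incoherent (void)
{
  sketch_self.incoherent++;
  sketch_compiler_barrier ();
}

static void
sketch_end_incoherent (void)
{
  sketch_compiler_barrier ();

  sketch_self.incoherent--;
  if (! sketch_self.incoherent && sketch_self.collection_waiting)
    {
      /* Only the outermost end notices a collection that was requested
         while we were incoherent.  */
      sketch_self.collection_waiting = 0;
      sketch_self.pauses++;
    }
}
```

The barrier comes after the increment on entry and before the
decrement on exit, so work written between the two calls stays between
the two flag updates in the generated code.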



From minor-owner@red-bean.com Sun Sep 14 02:18:52 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.9/8.12.9/Debian-5) with ESMTP id h8E7Iqxm027688
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <minor-commits@red-bean.com>; Sun, 14 Sep 2003 02:18:52 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.9/8.12.9/Debian-5) id h8E7IpIm027686
	for minor-commits@red-bean.com; Sun, 14 Sep 2003 02:18:51 -0500
Date: Sun, 14 Sep 2003 02:18:51 -0500
Message-Id: <200309140718.h8E7IpIm027686@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 63 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-14 02:18:48 -0500 (Sun, 14 Sep 2003)
New Revision: 63

Modified:
   trunk/gc/threads.c
   trunk/gc/threads.h
Log:
* gc/threads.c, gc/threads.h: Doc fixes.
Use POSIX ucontext_t, not who-knows-what 'struct sigcontext'.
(mn__begin_incoherent, mn__end_incoherent): Don't make these inline;
deal with that kind of stuff later.  Move definitions from threads.h
into threads.c.


Modified: trunk/gc/threads.c
===================================================================
--- trunk/gc/threads.c	2003-09-12 07:09:30 UTC (rev 62)
+++ trunk/gc/threads.c	2003-09-14 07:18:48 UTC (rev 63)
@@ -1,6 +1,7 @@
 /* threads.c --- implementation of GC thread structure.  */
 
 #include <pthread.h>
+#include <ucontext.h>
 #include "gc_xmalloc.h"
 #include "gc.h"
 #include "refs.h"
@@ -23,15 +24,15 @@
      to access this field.  */
   pthread_t thread;
 
-  /* When waiting for collection, this is a sigcontext structure
-     giving the values of thread's registers when it received the
+  /* When waiting for collection, this is a ucontext_t object giving
+     the values of thread's registers when it received the
      mn__gc_wait_signal.  In the collecting thread, this is zero.  At
      other times, this is garbage.
 
      This field may only be accessed by THREAD's mn__gc_wait_signal
      handler when the thread is not waiting for collection, and by the
      collecting thread while it is.  */
-  struct sigcontext *regs;
+  ucontext_t *regs;
 
   /* The youngest call in this thread.
 
@@ -138,14 +139,16 @@
    and see what it's executing.  We use the following protocol for
    this:
 
-   - Every function in the C API requires an mn_call object, except
-     for two: mn_thread_first_call and mn_init.  This means that we
-     can use those functions to maintain a list of all the threads
-     that could possibly operate on references.  For each such thread,
-     we register handlers for two signals, mn__gc_wait_signal and
-     gc_resume_signal.  We use the 'sigaction' system call to ensure
-     that gc_resume_signal is blocked when mn__gc_wait_signal's handler is
-     called.
+   - Every function in the Minor C API <minor/minor.h> requires an mn_call
+     object, except for two: mn_thread_first_call and mn_init.  This
+     means that we can use those functions to maintain a list of all
+     the threads that could possibly hold references to heap objects.
+     For each such thread, we register handlers for two signals:
+     mn__gc_wait_signal and gc_resume_signal.  We use the 'sigaction'
+     system call to ensure that gc_resume_signal is blocked while
+     mn__gc_wait_signal's handler is called.  (By default, a signal is
+     blocked while its own handler runs, so mn__gc_wait_signal is
+     blocked, too.)
 
    - A thread wishing to perform a collection acquires mn__gc_mutex,
      the global GC mutex.  This is also the mutex that protects the
@@ -154,17 +157,18 @@
    - The collecting thread walks the thread list, sending every thread
      (other than itself) mn__gc_wait_signal.
 
-   - Each signalled thread's handler for mn__gc_wait_signal
-     receives a pointer to a sigcontext structure as one of its
-     arguments; this structure contains the register values of the
-     interrupted code.  The handler takes the following steps:
+   - Since we register the handler for mn__gc_wait_signal with
+     SA_SIGINFO, each signalled thread's handler receives a pointer to
+     a ucontext_t object as one of its arguments; this structure
+     contains the register values of the interrupted code.  The
+     handler takes the following steps:
 
-     - It stores a pointer to this sigcontext in its per-thread
+     - It stores a pointer to this ucontext_t in its per-thread
        structure, where the collecting thread can find it.
 
-     - It does a sem_post on thread_waiting_semaphore, to tell the GC
-       that it has stored its sigcontext pointer in the thread
-       structure.
+     - It does a sem_post on thread_waiting_semaphore, to tell the
+       collecting thread that it has stored its ucontext_t pointer in
+       the thread structure.
 
      - It does a sigsuspend, with every signal but gc_resume_signal
        blocked.  That signal has a trivial handler.
@@ -182,51 +186,63 @@
    - The collecting thread does a sem_wait once for every thread it
      signalled.  This process will complete only when every thread has
      posted on the semaphore.  Now we know that every thread has
-     stored its sigcontext its thread structure, and we can find its
+     stored its ucontext_t in its thread structure, and we can find its
      PC.
 
+   - When collection is complete, the collecting thread sends the
+     other threads gc_resume_signal, causing their signal handlers to
+     return, and thus allowing the threads to resume execution.
+
    After posting to thread_waiting_semaphore, but before it is sent
    gc_resume_signal, a thread is considered to be "waiting for
    collection".  We use this term in describing the rules for
    accessing some of the fields in the structures below.
 
-   Now, in order to map out these blocks of code, find out which
-   global variables they refer to, which other code blocks they jump
-   to, and how the stack frames are laid out, we need extensive
-   annotations from the compiler.  The Minor compiler provides these
-   annotations, but the C compiler does not, so we need to handle C
-   code specially.
+   The Minor compiler provides annotations for all the code it
+   generates.  For each instruction, these annotations say which
+   registers and stack frame slots hold heap references, and which
+   portions of the allocation block have been used so far.
 
+   Given the ucontext_t structures for all the other threads, the
+   collecting thread looks up each waiting thread's PC in the code
+   annotations.  If the address is annotated, then that gives the
+   collecting thread the information it needs to find any heap objects
+   the thread refers to, and update the thread's state if any of those
+   objects are relocated by the collection.  If the address is not
+   annotated, then the collector assumes the thread is currently
+   executing C code.
+
    The only pointers to heap objects C code is allowed to have are
    those in mn_ref objects.  This makes things simple.  There are no C
    global variables pointing into the heap.  Registers, as used by C
    functions, don't point into the heap either: they point at refs.
    As do stack frames.  All our difficult problems are gone.
 
-   But since even the most trivial operation on those pointers gives
-   the C compiler freedom to load them into registers, make derived
-   values, exclusive-or them with 45, do magic, and then exclusive-or
-   them back, etc., this rule basically means that C code can't
-   operate on them at all.
+   But since even the most trivial operation on an mn_ref's heap
+   pointer gives the C compiler freedom to load the references into
+   registers, make derived values, exclusive-or them with 45, do
+   magic, and then exclusive-or them back, etc., this rule basically
+   means that C code can't operate on them at all.
 
    That's a bit harsh, so we relax it a bit.  We allow code internal
    to the Minor C library to set a flag in the thread structure,
    "incoherent", indicating that it's operating on heap object
    references directly.  This flag is volatile, and has type
-   sig_atomic_t, so the mn__gc_wait_signal signal handler can check it,
-   before it does anything else.  If it is set, then the handler sets
-   the thread's collection_waiting flag (also volatile and
+   sig_atomic_t, so the mn__gc_wait_signal signal handler can check
+   it, before it does anything else.  If it is set, then the handler
+   sets the thread's collection_waiting flag (also volatile and
    sig_atomic_t), and returns.  When the interrupted code is finished
    operating on pointers to heap objects, and they are all safely
    packed away in mn_ref objects again, it clears the incoherent flag.
-   Then, if the collection_waiting flag is set, then it sends
-   itself a mn__gc_wait_signal.
+   Then, if the collection_waiting flag is set, it sends itself a
+   mn__gc_wait_signal.
 
    This is nice, because it means that C code can work on heap
    references without having to acquire and release a mutex each time
-   we become incoherent, or return to coherence.  The only
-   communication necessary in those cases is with our own signal
-   handler, which we can do cheaply with volatile sig_atomic_t flags.
+   it becomes incoherent, or returns to coherence.  The only
+   communication necessary in those cases is between a thread and its
+   own signal handler, which it can do cheaply with volatile
+   sig_atomic_t flags, rather than inter-thread-synchronizing mutexes.
    Inter-thread synchronization only takes place:
    - when a collection is actually needed, in the mn__gc_wait_signal
      handler;
@@ -235,27 +251,53 @@
    - when a thread gets its initial mn_call object, or when it dies,
      to protect the global thread list.
 
-   The rules for when the various structure fields may be accessed are
-   horribly complex.  All these 'volatile' annotations, rules for
-   where hoist and drop barriers need to go, and so on, are hard to
-   keep track of.
+   We provide functions for becoming incoherent and returning to
+   coherence again; these functions include the appropriate compiler
+   code motion barriers to make sure the compiler doesn't move code
+   from inside an incoherent section out of it.
 
-   But I think it all follows from the decision to not require the
-   user's C code to call a safe-point "check for gc" function
-   periodically, with collections blocking indefinitely if they fail
-   to do so.  In most cases, the user is probably doesn't even have
-   control over all the libraries their program will be using, so they
-   can't make the safe point calls that would be needed.  Furthermore,
-   requiring safe point calls is an ongoing maintenance burden:
-   keeping track of which loops need safe point calls, and whether
-   each modification changes the status of some existing loop, is too
-   hard.
+   This is all a bit hairy.  But I think it all follows from the
+   decision to not require the user's C code to call a safe-point
+   "check for gc" function periodically, with collections blocking
+   indefinitely if they fail to do so.  In most cases, the user
+   probably doesn't even have control over all the libraries their
+   program will be using, so they can't make the safe point calls that
+   would be needed.  Furthermore, requiring safe point calls is an
+   ongoing maintenance burden: keeping track of which loops need safe
+   point calls, and whether each change to the program affects the
+   status of some loop elsewhere, is too hard.
 
-   But given the decision not to require safe points, I think it
-   follows that one needs to use signals to gather threads' state.
-   And given that, one needs to worry about code reordering --- thus
-   the 'volatile' qualifiers and the hoist/drop barriers.  */
+   But if one is not going to require safe points, then I think it
+   follows that one needs to use signals to gather threads' state.  */
 
+void
+mn__begin_incoherent (void)
+{
+  mn__thread_self->incoherent++;
+
+  /* Tell the compiler that loads and stores must not be moved across
+     this point.  This ensures that, if all operations on tagged_t
+     values and the heap objects they refer to are written between the
+     begin/end_incoherent markers, they will actually stay that way in
+     the machine code.  */
+  mn__cc_all_volatile_barrier ();
+}
+
+
+void
+mn__end_incoherent (void)
+{
+  /* Tell the compiler that loads and stores must not be moved across
+     this point.  See the comment above in mn__begin_incoherent.  */
+  mn__cc_all_volatile_barrier ();
+
+  mn__thread_self->incoherent--;
+  if (! mn__thread_self->incoherent
+      && mn__thread_self->collection_waiting)
+    pthread_kill (pthread_self (), mn__gc_wait_signal);
+}
+
+
 /* The signal the collecting thread sends to other mutator threads to
    tell them to stop what they're doing, record their registers, and
    wait for the GC to complete.  This is also the signal we send them
@@ -278,6 +320,7 @@
 handle_wait_signal (int signo, siginfo_t *info, void *context)
 {
   struct mn__thread *t;
+  int saved_errno = errno;
 
   assert (signo == mn__gc_wait_signal);
 
@@ -287,19 +330,20 @@
       /* Request that it re-send the wait signal to itself once it's
          coherent, and return.  */
       mn__thread_self->collection_waiting = true;
+      errno = saved_errno;
       return;
     }
 
   t = (struct mn__thread *) mn__thread_self;
 
-  /* The 'context' argument is a pointer to a sigcontext structure,
+  /* The 'context' argument is a pointer to a ucontext_t object,
      which holds the values this thread's registers had before it
      received the GC wait signal.  Save that pointer in our thread
      structure, so the collecting thread can find it.
 
      This is really what it's all about; everything else in this
      function is just synchronization chit-chat.  */
-  t->regs = (struct sigcontext *) context;
+  t->regs = (ucontext_t *) context;
 
   /* We've provided the info the collecting thread needs, so post to
      thread_waiting_semaphore to allow it to continue.
@@ -337,7 +381,9 @@
   /* Make sure we can see all the work the collecting thread has done.
      Ensure that no reads or writes can be moved across this point, by
      either the compiler or the memory model.  */
-  mn__memory_barrier ();
+  mn__arch_memory_barrier ();
+
+  errno = saved_errno;
 }
 
 
@@ -370,7 +416,7 @@
 
 
 void
-mn__walk_threads (void (*threadf) (struct sigcontext *))
+mn__walk_threads (void (*threadf) (ucontext_t *))
 {
   struct mn__thread *head = &thread_list;
   struct mn__thread *t;
@@ -389,7 +435,7 @@
      we've just done.  Ensure that no reads or writes can be moved
      across this point, by either the compiler or the memory
      model.  */
-  mn__memory_barrier ();
+  mn__arch_memory_barrier ();
 
   for (t = thread_list.next; t != &thread_list; t = t->next)
     if (&t.x != mn__thread_self)
@@ -485,10 +531,10 @@
   c->local_refs = make_ref_group ();
 
   mn__begin_incoherent ();
-  ... need barrier;
-  c->older_call = t->youngest_call;
-  t->youngest_call = c;
-  ... need barrier;
+  {
+    c->older_call = t->youngest_call;
+    t->youngest_call = c;
+  }
   mn__end_incoherent ();
 
   return c;
@@ -501,19 +547,17 @@
   struct mn__thread *t = (struct mn__thread *) mn__thread_self;
 
   mn__begin_incoherent ();
-  ... need barrier;
+  {
+    while (t->youngest_call != call)
+      {
+        mn_call *here = t->youngest_call;
+        t->youngest_call = here->older_call;
+        free_ref_group (here->local_refs);
+        mn__gc_xfree (here);
+      }
 
-  while (t->youngest_call != call)
-    {
-      mn_call *here = t->youngest_call;
-      t->youngest_call = here->older_call;
-      free_ref_group (here->local_refs);
-      mn__gc_xfree (here);
-    }
-
-  self->youngest_call = next;
-
-  ... need barrier;
+    self->youngest_call = next;
+  }
   mn__end_incoherent ();
 }
 
@@ -521,8 +565,9 @@
 mn_ref *
 mn__make_local_ref (mn_call *call, tagged_t obj)
 {
-  /* No barrier needed here; the caller must already be incoherent, so
-     it's their responsibility to mark the barrier.  */
+  /* No barrier needed here; the caller must already be incoherent,
+     since they're passing us a tagged value, so it's their
+     responsibility to mark the barrier.  */
   return mn__make_ref (call->local_refs, obj);
 }
 

Modified: trunk/gc/threads.h
===================================================================
--- trunk/gc/threads.h	2003-09-12 07:09:30 UTC (rev 62)
+++ trunk/gc/threads.h	2003-09-14 07:18:48 UTC (rev 63)
@@ -70,34 +70,10 @@
    reference groups: you may only push and pop calls and allocate and
    free local references in incoherent sections.  */
 
-static MN__INLINE void
-mn__begin_incoherent (void)
-{
-  mn__thread_self->incoherent++;
+void mn__begin_incoherent (void);
+void mn__end_incoherent (void);
 
-  /* Tell the compiler that loads and stores must not be moved across
-     this point.  This ensures that, if all operations on tagged_t
-     values and the heap objects they refer to are written between the
-     begin/end_incoherent markers, they will actually stay that way in
-     the machine code.  */
-  mn__all_volatile_barrier ();
-}
 
-
-static MN__INLINE void
-mn__end_incoherent (void)
-{
-  /* Tell the compiler that loads and stores must not be moved across
-     this point.  See the comment above in mn__begin_incoherent.  */
-  mn__all_volatile_barrier ();
-
-  mn__thread_self->incoherent--;
-  if (! mn__thread_self->incoherent
-      && mn__thread_self->collection_waiting)
-    pthread_kill (pthread_self (), mn__gc_wait_signal);
-}
-
-
 
 /* Pausing the world, grunging through its registers, and resuming it.  */
 

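The incoherent / collection_waiting handshake described in the
threads.c comment above can be tried out in isolation.  What follows is
only a minimal single-threaded sketch, not the Minor code: the demo_*
names and DEMO_GC_SIGNAL are hypothetical, plain globals stand in for
the fields of the thread structure, raise () stands in for
pthread_kill (pthread_self (), ...), and C11's atomic_signal_fence is
assumed to be an acceptable substitute for mn__cc_all_volatile_barrier.
Clearing collection_waiting in the handler is also an assumption; the
diff does not show where the real code clears it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-ins for the per-thread flags.  */
static volatile sig_atomic_t incoherent;
static volatile sig_atomic_t collection_waiting;

/* Counts how often the handler got as far as its "real work".  */
static volatile sig_atomic_t handled;

#define DEMO_GC_SIGNAL SIGUSR1

static void
demo_handler (int signo)
{
  /* Preserve errno for the interrupted code.  */
  int saved_errno = errno;

  (void) signo;

  if (incoherent)
    /* The thread is working on raw references: just leave a note.  */
    collection_waiting = 1;
  else
    {
      /* Assumption: the real handler would publish its registers and
         wait for the collector here.  */
      collection_waiting = 0;
      handled++;
    }

  errno = saved_errno;
}

static void
demo_begin_incoherent (void)
{
  incoherent++;
  /* Compiler-only barrier: no loads or stores move across this.  */
  atomic_signal_fence (memory_order_seq_cst);
}

static void
demo_end_incoherent (void)
{
  atomic_signal_fence (memory_order_seq_cst);
  incoherent--;
  if (! incoherent && collection_waiting)
    /* Re-send the signal we deferred while incoherent.  */
    raise (DEMO_GC_SIGNAL);
}

/* Deliver one signal inside an incoherent section and return the
   number of times the handler did its real work (-1 on setup error).  */
int
demo_deferred_signal (void)
{
  struct sigaction sa;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = demo_handler;
  sigemptyset (&sa.sa_mask);
  if (sigaction (DEMO_GC_SIGNAL, &sa, NULL) != 0)
    return -1;

  incoherent = 0;
  collection_waiting = 0;
  handled = 0;

  demo_begin_incoherent ();
  raise (DEMO_GC_SIGNAL);      /* Arrives while incoherent: deferred.  */
  assert (handled == 0 && collection_waiting == 1);
  demo_end_incoherent ();      /* Coherent again: signal is re-sent.  */

  return handled;
}
```

Calling demo_deferred_signal () from a single-threaded main should show
the handler doing its real work only once, after the incoherent section
has ended.  No mutex is involved; the thread and its own handler
communicate solely through the two sig_atomic_t flags.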


From minor-owner@red-bean.com Sun Sep 14 03:51:27 2003
Received: from sanpietro.red-bean.com (localhost [127.0.0.1])
	by sanpietro.red-bean.com (8.12.9/8.12.9/Debian-5) with ESMTP id h8E8pQxm030357
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <minor-commits@red-bean.com>; Sun, 14 Sep 2003 03:51:27 -0500
Received: (from www-data@localhost)
	by sanpietro.red-bean.com (8.12.9/8.12.9/Debian-5) id h8E8pQE9030355
	for minor-commits@red-bean.com; Sun, 14 Sep 2003 03:51:26 -0500
Date: Sun, 14 Sep 2003 03:51:26 -0500
Message-Id: <200309140851.h8E8pQE9030355@sanpietro.red-bean.com>
To: minor-commits@red-bean.com
From: jimb@sanpietro.red-bean.com
Subject: rev 64 - trunk/gc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: jimb
Date: 2003-09-14 03:51:23 -0500 (Sun, 14 Sep 2003)
New Revision: 64

Added:
   trunk/gc/gc.c
Modified:
   trunk/gc/gc.h
   trunk/gc/heap.h
   trunk/gc/tagged.h
Log:
* gc/gc.h, gc/tagged.h, gc/heap.h, gc/gc.c: Beginnings of the
collector itself.


Added: trunk/gc/gc.c
===================================================================
--- trunk/gc/gc.c	2003-09-14 07:18:48 UTC (rev 63)
+++ trunk/gc/gc.c	2003-09-14 08:51:23 UTC (rev 64)
@@ -0,0 +1,194 @@
+/* gc.c --- the Minor garbage collector
+   Jim Blandy <jimb@red-bean.com>  */
+
+#include <assert.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include "gc.h"
+#include "threads.h"
+#include "refs.h"
+#include "heap.h"
+
+
+/* The garbage collection lock.  */
+
+pthread_mutex_t mn__gc_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+
+/* Collecting!  */
+
+static bool to_space_label;
+
+/* Prepare the spaces for COLLECT_TO and younger generations for a
+   collection.  */
+static void
+prepare_spaces (int collect_to)
+{
+  int i;
+
+  for (i = 0; i <= collect_to; i++)
+    {
+      struct gc_generation *g = &mn__gc_generations[i];
+
+      g->from_space_pages = g->to_space_pages;
+      g->to_space_pages = 0;
+      g->next_free = 0;
+      g->free_end = 0;
+      g->promote_to = &mn__gc_generations[i + 1];
+    }
+
+  mn__gc_generations[collect_to].promote_to = &mn__gc_generations[collect_to];
+}
+
+
+/* Allocate SIZE bytes in generation G.  */
+static void *
+allocate (struct gc_generation *g, size_t size)
+{
+  void *addr;
+  
+  /* Round up size to the next multiple of eight.  */
+  size = (size + 7) & ~7;
+
+  if (g->next_free + size > g->free_end)
+    enlarge_generation (g);
+
+  addr = g->next_free;
+  g->next_free += size;
+  return addr;
+}
+
+
+/* Scavenge the object OBJ, and return a new value referring to the
+   scavenged copy.  OBJ must not already have been scavenged; it must
+   be in the from-space of a generation being collected.
+
+   By "scavenge", we mean: if OBJ refers to an object in from-space
+   in generation COLLECT_TO or younger:
+   - copy the object to the appropriate destination generation's
+     to-space,
+   - leave a forwarding pointer in its place, and
+   - return a value referring to the new copy.  */
+static tagged_t
+scavenge (tagged_t obj, int collect_to)
+{
+  if (mn__lowtag (obj) == lowtag_even_fixnum
+      || mn__lowtag (obj) == lowtag_odd_fixnum
+      || mn__lowtag (obj) == lowtag_immediate)
+    /* It's an immediate value, contained entirely in the tagged_t.
+       It doesn't need to be scavenged.  */
+    return obj;
+
+  else
+    {
+      struct generic_stored *original_generic
+
