You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
closos/Documentation/chap-garbage-collection.tex

90 lines
4.6 KiB
TeX

\chapter{Garbage collection}
\label{chap-garbage-collection}
\section{Introduction}
Contrary to traditional operating systems such as \unix{}, a Lisp
operating system will need a global \emph{tracing garbage collector}.
Traditional operating systems get away with a simpler technique,
because the file system in such an operating system can not contain
cycles. With this restriction, the simpler \texttt{reference
counting} mechanism is sufficient. Furthermore, although reference
counting is usually slower than a tracing garbage collector, the
additional overhead of reference counters is of no importance when
used for a file system in secondary memory.
There is a rich literature on automatic memory management.
(see e.g., \cite{Jones:2011:GCH:2025255})
For \sysname{}, we plan to have a two-level memory management
technique. The low level consists of a relatively small local heap
for each thread, and a per-thread garbage collector that manages that
heap. The higher-level consists of a global heap that contains
long-lived objects and objects that are shared between several
threads.
\section{Per-thread garbage collector}
Each thread has a local heap, roughly the size of the cache, say
around 4MiB. The thread-local heap is managed entirely by the thread
itself, so that the garbage collector for it is executed by the thread
itself. Experiments show that we will be able to run the thread-local
garbage collector in a few milliseconds, which is good enough for most
applications. We will use a sliding garbage collector in order to
maintain allocation order. This way, we have a precise measure of the
relative age of the objects, so that we can promote only the oldest
objects when required.
There can be no references between an object in one local heap to an
object in another local heap. And there can be no references from the
global heap to a local heap. Whenever a reference is about to be
created from an object in the global heap to an object in the local
heap, this attempt is caught by a \emph{write barrier} on the global
heap. As a result if this write barrier being tripped, the object in
the local heap being referred to (and its transitive closure) will
migrate to the global heap, thereby preserving the general invariant.
\section{Global garbage collector}
In addition to the thread-local heaps, there is a global heap. The
garbage collector for this heap will use a combination of the
traditional \emph{mark-and-sweep} collector and an ordinary memory
allocator, similar to the one used by the \clanguage{} functions
\texttt{malloc} and \texttt{free}.
Recall that that a heap-allocated object is either a \texttt{cons}
cell or a \emph{general instance}. A \texttt{cons} cell is
represented as two machine words. A general instance is represented
as a \emph{header} consisting of two machine words, and a \emph{rack}
which is a vector of words with a contents that depends on the exact
type of the object. In both cases, then, a reference to a
heap-allocated object is a reference to a double word.
Given this representation, we separate the headers from the racks, so
that \texttt{cons} cells and headers of general instances are
allocated from a separate part of the global heap. Since this part of
the global heap consists of only two-word objects, it can be managed
very efficiently with a \emph{mark-and-sweep} garbage collector, using
a simple free list. The advantage of this technique is that an object
is never moved as a result of a garbage collection. Therefore, any
reference to an object that is shared between several threads, remains
valid after a garbage collection of the global heap.
The marking phase is done by first requesting each thread to do a
garbage collection and to mark any object in the global heap that is
referred to by local objects. When all threads have responded, a
global collection is started. The global collection is done
concurrently with thread activity. For that reason, objects allocated
in the global heap during this phase are marked as being live.
The global collection traces the global heap starting with objects
marked by the mutator threads. This tracing uses a standard
three-color algorithm. Write operations to the global heap are caught
by a write barrier.
When tracing in the global heap is finished, the part of the global
heap that contains two-word headers and \texttt{cons} cells is scanned
and unmarked cells are collected into a free list. If an unmarked
cell is a \texttt{cons} cell, then no further action is needed. If an
unmarked cell is a header object, then the corresponding rack is
returned to the rack part of the global heap.