% -----------------------------------------  
% Autogenerated LaTeX file from XML DocBook  
% -----------------------------------------  
%%<params>
%%</params>
\documentclass{report}
\usepackage[T1]{fontenc}
\usepackage[bitstream-charter]{mathdesign}
\usepackage{inconsolata}
\usepackage[latin1]{inputenc}

%\usepackage{fancybox}
\usepackage{makeidx}
\usepackage{listings}
%\usepackage[hyperlink]{docbook}
%\renewcommand{\DBKreleaseinfo}{}
\setcounter{tocdepth}{5}
\setcounter{secnumdepth}{5}
\title{Understanding and Configuring the Garbage Collector}
\author{}

\def\CCL{Clozure CL}
\def\cf{\tt\frenchspacing}
\catcode`@=11   % Make it possible to refer to some LaTeX utility macros.
\def\@cd#1{\null#1\endgroup}
\def\cd{\leavevmode\begingroup\cf\@cd}
\catcode`@=12           % Restore character codes





\makeindex
\makeglossary
\input /Users/rme/Downloads/cltl_src/defun
\begin{document}

\chapter{Understanding and Configuring the Garbage Collector}

\section{Heap space allocation}

Release 0.10 or later of \CCL\ uses a different memory management
scheme than previous versions did. Those earlier versions would
allocate a block of memory (of specified size) at startup and would
allocate lisp objects within that block. When that block filled with
live (non-GCed) objects, the lisp would signal a ``heap full''
condition. The heap size imposed a limit on the size of the largest
object that could be allocated.

The new strategy involves reserving a very large (2GB on DarwinPPC32,
1GB on LinuxPPC, ``very large'' on 64-bit implementations) block at
startup and consuming (and relinquishing) its contents as the size of
the live lisp heap data grows and shrinks. After the initial heap
image loads and after each full GC, the lisp kernel will try to ensure
that a specified amount (the ``lisp-heap-gc-threshold'') of free
memory is available. The initial value of this kernel variable is 16MB
on 32-bit implementations and 32MB on 64-bit implementations; it can
be manipulated from Lisp (see below.)

The large reserved memory block consumes very little in the way of
system resources; memory that's actually committed to the lisp heap
(live data and the ``threshold'' area where allocation takes place)
consumes finite resources (physical memory and swap space). The lisp's
consumption of those resources is proportional to its actual memory
usage, which is generally a good thing.

This scheme is much more flexible than the old one, but it may also
increase the possibility that those resources can become
exhausted. Neither the new scheme nor the old handles that situation
gracefully; under the old scheme, a program that consumes lots of
memory may have run into an artificial limit on heap size before
exhausting virtual memory.

The \cd{-R} or \cd{-{}-heap-reserve} command-line option can be
use to limit the size of the reserved block and therefore bound heap
expansion. Running

\begin{verbatim}
> openmcl --heap-reserve 8M
\end{verbatim}

\noindent would provide an execution environment that's very similar to
that provided by earlier \CCL\ versions.

\begin{defun}[Function]
lisp-heap-gc-threshold

This function returns the value of the kernel variable that specifies
the amount of free space to leave in the heap after full GC.
\end{defun}

\begin{defun}[Function]
set-lisp-heap-gc-threshold new-threshold

This function sets the value of the kernel variable that specifies the
amount of free space to leave in the heap after a full GC to
{\it new-threshold}, which must be a non-negative fixnum. Returns the new
value of that kernel variable (which may be somewhat larger than what
was specified).
\end{defun}

\begin{defun}[Function]
use-lisp-heap-gc-threshold

Tries to grow or shrink lisp's heap space, so that the free space is
(approximately) equal to the current heap threshold.  Returns \cd{nil}.
\end{defun}

\section{The Ephemeral GC}

For many programs, the following observations are true to a very large
degree:
\begin{enumerate}
\item
  Most heap-allocated objects have very short lifetimes (``are
  ephemeral''): they become inaccessible soon after they're created.

\item
  Most non-ephemeral objects have very long lifetimes: it's rarely
  productive for the GC to consider reclaiming them, since it's rarely
  able to do so. (An object that has survived a large number of GCs is
  likely to survive the next one. That's not always true of course,
  but it's a reasonable heuristic.)

\item
  It's relatively rare for an old object to be destructively modified
  (via \cd{setf}) so that it points to a new one, therefore most references
  to newly-created objects can be found in the stacks and registers
  of active threads. It's not generally necessary to scan the entire
  heap to find references to new objects (or to prove that such
  references don't exist), though it is necessary to keep track of
  the (hopefully exceptional) cases where old objects are modified to
  point at new ones.
\end{enumerate}

Ephemeral (or generational) garbage collectors try to exploit
these observations: by concentrating on frequently reclaiming
newly-created objects quickly, it's less often necessary to do more
expensive GCs of the entire heap in order to reclaim unreferenced
memory.  In some environments, the pauses associated with such full
GCs can be noticeable and disruptive, and minimizing the frequency
(and sometimes the duration) of these pauses is probably the EGC's
primary goal (though there may be other benefits, such as increased
locality of reference and better paging behavior.) The EGC generally
leads to slightly longer execution times (and slightly higher,
amortized GC time), but there are cases where it can improve overall
performance as well; the nature and degree of its impact on
performance is highly application-dependent.

Most EGC strategies (including the one employed by \CCL) logically or
physically divide memory into one or more areas of relatively young
objects (``generations'') and one or more areas of old objects.  Objects
that have survived one or more GCs as members of a young generation
are promoted (or ``tenured'') into an older generation, where they may
or may not survive long enough to be promoted to the next generation
and eventually may become ``old'' objects that can only be reclaimed if
a full GC proves that there are no live references to them.  This
filtering process isn't perfect---a certain amount of premature
tenuring may take place---but it usually works very well in
practice.

It's important to note that a GC of the youngest generation is
typically very fast (perhaps a few milliseconds on a modern CPU,
depending on various factors), \CCL's EGC is not concurrent and
doesn't offer realtime guarantees.

\CCL's EGC maintains three ephemeral generations; all newly created
objects are created as members of the youngest generation. Each
generation has an associated \emph{threshold}, which indicates the
number of bytes in it and all younger generations that can be
allocated before a GC is triggered. These GCs will involve the target
generation and all younger ones (and may therefore cause some
premature tenuring); since the older generations have larger
thresholds, they're GCed less frequently and most short-lived
objects that make it into an older generation tend not to survive
there very long.

The EGC can be \emph{enabled} or \emph{disabled} under program
control; under some circumstances, it may be enabled but
\emph{inactive} (because a full GC is imminent.)  Since it may be hard
to know or predict the consing behavior of other threads, the
distinction between the ``active'' and ``inactive'' state isn't very
meaningful, especially when native threads are involved.

\begin{defun}[Function]
egc arg

Enables the EGC if {\it arg} is non-\cd{nil}, disables the EGC otherwise.
Returns the previous enabled status.

Although this function is thread-safe (in the sense that calls to it
are serialized), it doesn't make a whole lot of sense to be turning
the EGC on and off from multiple threads...
\end{defun}

\begin{defun}[Function]
egc-enabled-p

Returns \cd{t} if the EGC was enabled at the time of the call, and
\cd{nil} otherwise.
\end{defun}

\begin{defun}[Function]
egc-active-p

Returns \cd{t} if the EGC was active at the time of the call, and
\cd{nil} otherwise. Since this is generally a volatile piece of
information, it's not clear whether this function serves a useful
purpose when native threads are involved.
\end{defun}

\begin{defun}[Function]
egc-configuration

Returns, as multiple values, the sizes in kilobytes of the thresholds
associated with the youngest ephemeral generation, the middle
ephemeral generation, and the oldest ephemeral generation.
\end{defun}

\begin{defun}[Function]
configure-egc g0-size g1-size g2-size

Puts the indicated threshold sizes (specified in kilobytes) into
effect.  Each threshold indicates the total size that may be allocated
in that and all younger generations before a GC is triggered.
Disables EGC while setting the values.  The provided threshold sizes
are rounded up to a multiple of 64 Kbytes.
\end{defun}

\section{GC Page reclamation policy}

After a full GC finishes, it'll try to ensure that at least
(LISP-HEAP-GC-THRESHOLD) of virtual memory are available;
objects will be allocated in this block of memory until it fills up,
the GC is triggered, and the process repeats itself.

Many programs reach near stasis in terms of the amount of logical
memory that's in use after full GC (or run for long periods of time in
a nearly static state), so the logical address range used for consing
after the $N$th full GC is likely to be nearly or entirely identical to
the address range used by the $N+1$th full GC.

By default (and traditionally in \CCL), the GC's policy is to
``release'' the pages in this address range: to advise the virtual
memory system that the pages contain garbage and any physical pages
associated with them don't need to be swapped out to disk before being
reused and to (re-{})map the logical address range so that the pages
will be zero-filled by the virtual memory system when they're next
accessed.  This policy is intended to reduce the load on the VM system
and keep \CCL's working set to a minimum.

For some programs (especially those that cons at a very high rate),
the default policy may be less than ideal: releasing pages that are
going to be needed almost immediately---and zero-fill-faulting
them back in, lazily---incurs unnecessary overhead. (There's a false
economy associated with minimizing the size of the working set if it's
just going to shoot back up again until the next GC.) A policy of
``retaining'' pages between GCs might work better in such an
environment.

Functions described below give the user some control over this
behavior. An adaptive, feedback-mediated approach might yield a
better solution.

\begin{defun}[Function]
gc-retain-pages arg

Tries to influence the GC to retain/recycle the pages allocated
between GCs if {\it arg} is true, and to release them otherwise. This is
generally a tradeoff between paging and other VM considerations.
\end{defun}

\begin{defun}[Function]
gc-retaining-pages

Returns \cd{t} if the GC tries to retain pages between full GCs and
\cd{nil} if it's trying to release them to improve VM paging
performance.
\end{defun}

\section{"Pure" areas are read-only, paged from image file}

SAVE-APPLICATION identifies code vectors and the pnames of interned
symbols and copies these objects to a ``pure'' area of the image file it
creates. (The ``pure'' area accounts for most of what the ROOM function
reports as ``static'' space.)

When the resulting image file is loaded, the pure area of the file is
now memory-mapped with read-only access. Code and pure data are
paged in from the image file as needed (and don't compete for global
virtual memory resources with other memory areas.)

Code-vectors and interned symbol pnames are immutable : it is an
error to try to change the contents of such an object. Previously,
that error would have manifested itself in some random way. In the new
scheme, it'll manifest itself as an ``unhandled exception'' error in the
Lisp kernel. The kernel could probably be made to detect a spurious,
accidental write to read-only space and signal a lisp error in that
case, but it doesn't yet do so.

The image file should be opened and/or mapped in some mode which
disallows writing to the memory-mapped regions of the file from
other processes. I'm not sure of how to do that; writing to the file
when it's mapped by \CCL\ can have unpredictable and unpleasant
results.  SAVE-APPLICATION will delete its output file's directory
entry and create a new file; one may need to exercise care when using
file system utilities (like tar, for instance) that might overwrite an
existing image file.

\section{Weak References}

In general, a ``weak reference'' is a reference to an object which does
not prevent the object from being garbage-collected.  For example,
suppose that you want to keep a list of all the objects of a certain
type.  If you don't take special steps, the fact that you have a list
of them will mean that the objects are always ``live'', because you can
always reference them through the list.  Therefore, they will never be
garbage-collected, and their memory will never be reclaimed, even if
they are referenced nowhere else in the program.  If you don't want
this behavior, you need weak references.

\CCL\ supports weak references with two kinds of objects: weak hash
tables and populations.

Weak hash tables are created with the standard function
\cd{make-hash-table}, which is extended to accept the keyword argument
\cd{:weak}.  Hash tables may be weak with respect to either their keys
or their values.  To make a hash table with weak keys, invoke
\cd{make-hash-table} with the option \cd{:weak t}, or, equivalently,
\cd{:weak :key}.  To make one with weak values, use \cd{:weak :value}.
When the key is weak, the equality test must be \cd{\#'eq} (because it
wouldn't make sense otherwise).

When garbage collection occurs, key-value pairs are removed from
the hash table if there are no non-weak references to the weak
element of the pair (key or value).

In general, weak-key hash tables are useful when you want to use the
hash to store some extra information about the objects you look up in
it, while weak-value hash tables are useful when you want to use the
hash as an index for looking up objects.

A population encapsulates an object, causing certain reference from
the object to be considered weak.  \CCL\ supports two kinds of
populations: lists, in which case the encapsulated object is a list of
elements, which are spliced out of the list when there are no
non-weak references to the element; and alists, in which case the
encapsulated object is a list of conses which are spliced out of the
list if there are no non-weak references to the car of the cons.

If you are experimenting with weak references interactively, remember
that an object is not dead if it was returned by one of the last three
interactively-evaluated expressions, because of the variables
\cd{*}, \cd{**}, and \cd{***}.  The easy workaround
is to evaluate some meaningless expression before invoking
\cd{gc}, to get the object out of the REPL variables.

\section{Weak References Dictionary}

\begin{defun}[Function]
make-population &key :type :initial-contents

This function creates and returns a new population. A population
encapsulates an object, causing certain references from the object to
be considered weak.

The \cd{:type} argument determines what kind of population to create.  Its value
may be either \cd{:list} (the default) or \cd{:alist}.

The \cd{:initial-contents} argument is a sequence of elements (or conses, for
type \cd{:alist}) to be used to initialize the population.  The sequence
itself (including the conses in the case of an alist) is not stored in
the population;  a new list or alist is created to hold the elements.
\end{defun}

\begin{defun}[Function]
population-type population

This function returns the type of {\it population}, either
\cd{:list} or \cd{:alist}.
\end{defun}

\begin{defun}[Function]
population-contents population

This function returns the list encapsulated in {\it population}.

Note that as long as there is a direct (non-weak) reference to this
list, it will not be modified by the garbage collector.  Therefore it
is safe to traverse the list, and even modify it, just as with any
other list.  If you want the elements of the list to become subject to
garbage collection again, you must stop referring to the list
directly.

\cd{setf} may be used with \cd{population-contents} to replace the
list encapsulated by the population.  The new list is not copied; it
is used directly.
\end{defun}

\section{Garbage-{}Collection Dictionary}




\printindex

\end{document}
