Also available as: DOI, PDF (arxiv).
See also: Experiment (DOI).
Abstract Rust is a non-Garbage Collected (GCed) language, but the lack of GC makes expressing data-structures that require shared ownership awkward, inefficient, or both. In this paper we explore a new design for, and implementation of, GC in Rust, called Alloy. Unlike previous approaches to GC in Rust, Alloy allows existing Rust destructors to be automatically used as GC finalizers: this makes Alloy integrate better with existing Rust code than previous solutions but introduces surprising soundness and performance problems. Alloy provides novel solutions for the core problems: finalizer safety analysis rejects unsound destructors from automatically being reused as finalizers; finalizer elision optimises away unnecessary finalizers; and premature finalizer prevention ensures that finalizers are only run when it is provably safe to do so.
Amongst the ways one can classify programming languages are whether they are Garbage Collected
(GCed) or not: GCed languages enable implicit memory management; non-GCed languages require
explicit memory management (e.g C
's malloc
/ free
functions). Rust's use of affine types [25, p. 5] and
ownership does not fit within this classification: it is not GCed but it has implicit scope-based
memory management. Most portions of Rust programs are as succinct as a GCed equivalent, but
ownership is too inflexible to express shared ownership for data-structures that require multiple
owners (e.g. doubly linked lists). Workarounds such as reference counting impose an extra
burden on the programmer, make mistakes more likely, and often come with a performance
penalty.
In an attempt to avoid such problems, there are now a number of GCs for Rust (e.g. [2, 11, 14, 32, 33]).
Most introduce a user-visible type Gc<T>
which takes a value v of type T
and moves v to the 'GC
heap'. The Gc<T>
value itself is a wrapper around a pointer to v on the GC heap. Gc<T>
can be
cloned (i.e. duplicated) and dereferenced to a value of type &T
(i.e. a type-safe pointer) at
will by the user. When no Gc<T>
wrappers pointing to v can be found, indirectly or directly,
from the program's roots (e.g. variables on the stack), then the GC heap memory for v can be
reclaimed.
It has proven hard to find a satisfying design and implementation for a GC for Rust, as perhaps
suggested by the number of attempts to do so. We identify two fundamental challenges for
GC for Rust: how to give Gc<T>
an idiomatic and complete API; and how to make finalizers
(i.e. the code that is run just before a value is collected by the GC) safe, performant, and
ergonomic.
struct GcNode { value: u8, nbr: Option<Gc<RefCell<GcNode>>>}
impl Drop for GcNode { fn drop(&mut self) { println!("drop {}", self.value); } }
fn main() {
let mut gc1 = Gc::new(RefCell::new(GcNode { value: 1, nbr: None } ));
gc1.borrow_mut().nbr = Some(gc1);
let gc2 = Gc::new(RefCell::new(GcNode { value: 2, nbr: None } ));
gc2.borrow_mut().nbr = Some(gc2);
gc1 = gc2;
force_gc();
println!("{} {}", gc1.borrow().value, gc1.borrow().nbr.unwrap().borrow().value);
}
Gc<T>
and destructors as finalizers. We create a type GcNode
which
models a graph: it stores an 8 bit integer value and a possibly-null
reference (via Rust's standard Option
type) to a neighbouring node
(line 1). We add a normal Rust destructor which Alloy is able to use as a
finalizer when GcNode
is used inside Gc<T>
(line 2).
Inside main
we create the first GCed node in the graph (line 5).
We use Rust's normal RefCell
type to allow the node to be mutated
(using the RefCell::borrow_mut
method which dynamically checks
for mutation that would undermine Rust's static rules)
and a cycle created directly back to itself (line 6). We then create a second cyclic graph (lines 7
and 8), immediately assigning it to the gc1
variable (line 9):
this copies, rather than moves, the Gc<T>
.
This causes the first cyclic graph GcNode{value: 1, ..}
to no longer be reachable, so after forcing a collection (line 10) that node
can be collected. Its finalizer is then scheduled to be run, causing
drop 1
to be printed out at a later point; when it has completed the GC
heap memory can be reclaimed. The print statement outputs 2 2
(line
11).
In this paper we introduce Alloy, a new GC for Rust: an example of its use is shown in Listing 1. Alloy uses conservative garbage collection (i.e. treating each reachable machine word as a potential pointer), which naturally solves the API challenge. However, the finalization challenge is much more involved: the causes of this challenge, and our solutions to it, occupy the bulk of this paper.
Normal Rust code uses destructors (i.e. code which is run just before a value is reclaimed by Rust's
implicit memory management) extensively. Although finalizers and destructors may seem to be
synonyms, existing GCs for Rust cannot reuse destructors as finalizers: the latter must be manually
implemented for each type that needs it. Unfortunately, even this is trickier than it appears: it is not
possible to implement a finalizer for Gc<T>
if T
is an external library; some parts of destructors are
automatically created by the Rust compiler, but hand-written finalizers must duplicate those parts
manually; and users can accidentally cause a type's finalizer to be run more than once. In short,
finalization in existing GCs for Rust is unpalatable.
GCs for Rust are not alone in requiring manually written finalizers. In a close cousin to our work, a GC proposal for C++, the reuse of destructors as finalizers was ruled out due to seemingly insoluble problems [8, p. 32], which we divide into four categories: (1) some safe destructors are not safe finalizers; (2) finalizers can be run prematurely; (3) running finalizers on the same thread as a paused mutator can cause race conditions and deadlocks; (4) and finalizers are prohibitively slower than destructors. All are, at least to some degree, classical GC problems; all are exacerbated in some way by Rust; and none, with the partial exception of #2, has existing solutions.
We show that it is possible to reuse most Rust destructors as finalizers in a satisfying way. We introduce novel solutions to the long-standing problems this implies by making use of some of Rust's unusual static guarantees. We thus gain a better GC for Rust and solutions to open GC problems. Our solutions, in order, are: (1) finalizer safety analysis extends Rust's static analyses to reject programs whose destructors are not provably safe to be used as finalizers; (2) premature finalizer prevention automatically inserts fences to prevent the GC from being 'tricked' into collecting values before they are dead; (3) we run finalizers on a separate thread; and (4) and finalizer elision statically optimises away finalizers if the underlying destructor duplicates the GC's work.
Alloy as an implementation is necessarily tied to Rust, though most of the novel techniques in this paper rely on general properties of affine types and ownership. While we do not wish to claim generality without evidence, it seems likely that many of the techniques in this paper will generalise to other ownership-based languages, as and when such emerge.
Although Alloy is not production ready, its performance is already reasonable: when we control for the (admittedly somewhat slow) conservative GC (BDWGC) Alloy currently uses, the performance of Alloy varies from 0.74× to, in the worst case, 1.17× that of reference counting. Alloy is also sufficiently polished (e.g. good quality error messages) in other ways for it to: show a plausible path forwards for those who may wish to follow it; and to allow others to evaluate whether GC for Rust is a good idea or not.
This paper is divided into four main parts: GC and Rust background (Section 2); Alloy's basic design (Section 3); destructor and finalizer challenges and solutions (Sections 4 to 7); and evaluation (Section 8). The first three parts have the challenge that our work straddles two areas that can seem mutually exclusive: GC and Rust. We have tried to provide sufficient material for readers expert in one of these areas to gain adequate familiarity with the other, without boring either, but we encourage readers to skip material they are already comfortable with.
2.1 The Challenges of Shared Ownership in Rust
Rust uses affine types and ownership to statically guarantee that: a value has a single owner (e.g. a
variable); an owner can move (i.e. permanently transfer the ownership of) a value to another owner;
and when a value's owner goes out of scope, the value's destructor is run and its backing
memory reclaimed. An owner can pass references to a value to other code, subject to the
following static restrictions: there can be multiple immutable references ('&
') to a value or a single
mutable reference ('&mut
'); and references cannot outlast the owner. These rules allow many
Rust programs to be as succinct as their equivalents in GCed languages. This suggests that
the search for a good GC for Rust may be intellectually stimulating but of little practical
value.
However, there are many programs which need to express data structures which do not fit into the restrictions of affine types and ownership. These are often described as 'cyclic data-structures', but in this paper we use the more abstract term 'shared ownership', which includes, but is not limited to, cyclic data-structures.
A common way of expressing shared ownership is to use the reference counting type Rc<T>
from
Rust's standard library. For many data-structures, this is a reasonable solution, but some forms of shared
ownership require juggling strong and weak counts. This complicates programs (see Listing 2) and can
cause problems when values live for shorter or longer than intended.
struct RcNode { value: u8, nbr: Option<Weak<RefCell<RcNode>>> }
impl Drop for RcNode { fn drop(&mut self) { println!("drop {}", self.value); } }
fn main() {
let mut rc1 = Rc::new(RefCell::new(RcNode{value: 1, nbr: None}));
rc1.borrow_mut().nbr = Some(Rc::downgrade(&rc1));
let rc2 = Rc::new(RefCell::new(RcNode{value: 2, nbr: None}));
rc2.borrow_mut().nbr = Some(Rc::downgrade(&rc2));
rc1 = Rc::clone(&rc2);
println!("{} {}", rc1.borrow().value,
rc1.borrow().nbr.as_ref().unwrap().upgrade().unwrap().borrow().value);
}
Rc<T>
. To avoid
memory leaks we use weak references between nodes (line 1). We again create two cyclic
graphs (lines 5–8) using Rc::downgrade
to create weak references (lines 6 and 8). Since Rc<T>
is
not copyable, we must use a manual clone
call to have both the rc1
and rc2
variables point to
the same cyclic graph (line 9). Accessing a neighbour node becomes a delicate dance requiring
upgrading the weak reference (line 11). The need to downgrade Rc<T>
to Weak<T>
and upgrade
(which may fail, hence the unwrap
) back to Rc<T>
creates significant extra complexity relative
to Listing 1: compare line 11 in Listing 1 to lines 10-11 above.
A different solution is to store values in a vector and use integer indices into that vector. Such indices are morally closer to machine pointers than normal Rust references: the indices can become stale, dangle, or may never have been valid in the first place. The programmer must also manually deal with issues such as detecting unused values, compaction, and so on. In other words, the programmer ends up writing a partial GC themselves. A variant of this idea are arenas, which gradually accumulate multiple values but free all of them in one go: values can no longer be reclaimed too early, though arenas tend to unnecessarily increase the lifetime of values.
A type-based approach is GhostCell
[35], which uses branding to statically ensure that at any given
point only one part of a program can access a shared ownership data-structure. This necessarily excludes
common use cases where multiple owners (e.g. in different threads) need to simultaneously access
disjoint parts of a data-structure.
Although it is easily overlooked, some workarounds (e.g. Rc<T>
) rely on using unsafe Rust
(i.e. parts of the language, often involving pointers, that are not fully statically checked by the
compiler). Pragmatically, we assume that widely used code, even if technically unsafe, has
been pored over sufficiently that it is trustworthy in practise. However, 'new' solutions that a
programmer implements using unsafe Rust are unlikely to immediately reach the same level of
trustworthiness.
While we do not believe that every Rust program would be improved by GC, the variety of workarounds already present in Rust code, and the difficultly of creating new ones, suggests that there is a subset that would benefit from GC.
GC is a venerable field and has accumulated terminology that can seem unfamiliar or unintuitive. We mostly use the same terminology as Jones et al [19], the major parts of which we define here.A program which uses GC is split between the mutator (the user's program) and the collector (the GC itself). At any given point in time, a thread is either running as a mutator or a collector. In our context, all threads run as a collector at least sometimes (for reasons that will become apparent later, some threads always run as a collector). Tracing and reclamation is performed during a collection phase. Our collections always stop-the-world, where all threads running mutator code are paused while collection occurs.
A tracing GC is one that scans memory looking for reachable values from a program's roots: values, including cycles of values, that are not reachable from the roots can then be reclaimed. In contrast, a pure reference counting GC does not scan memory, and thus cannot free values that form a cycle. Increasingly, GC implementations make use of multiple techniques (see [3]) but, for simplicity's sake, we assume that implementations wholly use one technique or another except otherwise stated. For brevity, we use 'GC' as a short-hand for 'tracing GC'; when we deal with other kinds of GC (e.g. reference counting), we explicitly name them.
We refer to memory which is allocated via Gc<T>
as being on the GC heap. We use the term 'GC value'
to refer both to the pointer wrapped in a Gc<T>
and the underlying value on the GC heap, even though
multiple pointers / wrappers can refer to a single value on the GC heap, unless doing so would lead to
ambiguity.
We use 'Alloy' to refer to the combination of: our extension to the Rust language; our modifications to
the rustc
compiler; and our integration of the Boehm-Demers-Weiser GC (BDWGC) into the runtime of
programs compiled with our modified rustc
.
In this section we outline Alloy's basic design and implementation choices – the rest of the paper then goes into detail on the more advanced aspects.
Alloy provides a Gc<T>
type that exposes an API modelled on the Rc<T>
type from Rust's
standard library, because Rc<T>
: is conceptually similar to Gc<T>
; widely used in Rust code, and
its API familiar; and that API reflects long-term experience about what Rust programmers
need.
When a user calls Gc::new(v)
, the value v
is moved to the GC heap: the Gc<T>
value returned
to the user is a simple wrapper around a pointer to v
's new address. The same underlying
GCed value may thus have multiple, partly or wholly overlapping, references active at any
point. To avoid undermining Rust's ownership system, dereferencing a Gc<T>
produces an
immutable (i.e. '&
') reference to the underlying value. If the user wishes to mutate the underlying
value, they must use other Rust types that enable interior mutability (e.g. RefCell<T>
or
Mutex<T>
).
One feature that Alloy explicitly supports is the ability in Rust to cast references to raw pointers and
back again. This can occur in two main ways. Gc<T>
can be dereferenced to &T
which can
then, as with any other reference, be converted to *const T (i.e. a C-esque pointer to T).
Gc<T>
also supports the common Rust functions (into_raw
and from_raw
) which wrap the
value-to-pointer conversion in a slightly higher-level API. The ability to convert references to raw
pointers is used in many places (e.g. Rust's standard C Foreign Function Interface (FFI)).
We believe that a viable GC for Rust must allow the same conversions, but doing so has a
profound impact because Rust allows raw pointers to be converted to the integer type usize
and
back1.
Having acknowledged that pointers can be 'disguised' as integers, it is then inevitable that Alloy must be a conservative GC: if a machine word's integer value, when considered as a pointer, falls within a GCed block of memory, then that block itself is considered reachable (and is transitively scanned). Since a conservative GC cannot know if a word is really a pointer, or is a random sequence of bits that happens to be the same as a valid pointer, this over-approximates the live set (i.e. the blocks that the GC will not reclaim). Typically the false detection rate is very low (see e.g. a Java study which measures it at under 0.01% of the live set [28]).
Conservative GC occupies a grey zone in programming language semantics: in most languages, and most compiler's internal semantics, conservative GC is, formally speaking, unsound; and furthermore some languages (including Rust) allow arbitrary 'bit fiddling' on pointers, temporarily obscuring the address they are referring to. Despite this, conservative GC is widely used, including in the two most widespread web browsers: Chrome uses it in its Blink rendering engine [1] and Safari uses it in its JavaScript VM JavaScriptCore [26]. Even in 2025, we lack good alternatives to conservative GC: there is no cross-platform API for precise GC; and while some compilers such as LLVM provide some support for GC features [23], we have found them incomplete and buggy. Despite the potential soundness worries, conservative GC thus remains a widely used technique.
Conservative GC enables Alloy to make a useful ergonomic improvement over most other GCs
for Rust whose Gc<T>
is only cloneable. Such types can be duplicated, but doing so requires
executing arbitrary user code. To make the possible run-time cost of this clear, Rust has no
direct syntax for cloning: users must explicitly call Rc::clone(&v)
to duplicate a value v
. In
contrast, since Alloy's Gc<T>
is just a wrapper around a pointer it is not just cloneable but
also copyable: duplication only requires copying bytes (i.e. no arbitrary user code need be
executed). Copying is implied by assignment (i.e. w = v
), reducing the need for explicit
cloning2.
This is not just a syntactic convenience but also reflects an underlying semantic difference: duplicating a
Gc<T>
in Alloy is is a cheaper and simpler operation than most other GCs for Rust which which tend to
rely, at least in part, on reference counting.
There is one notable limitation of Gc<T>
's API relative to Rc<T>
. The latter, by definition, knows how
many references there are to the underlying data, allowing the value stored inside it to be mutably
borrowed at run-time if there is only a single reference to it (via get_mut
and make_mut
). In contrast,
Gc<T>
cannot know how many references there are to the underlying data. As we shall see in Section 8,
some Rust programs are built around the performance advantages of this API (e.g. turning 'copy on
write' into just 'write' in some important cases).
The most visible aspect of Alloy is its fork, and extension of, the standard Rust compiler rustc
. We
forked rustc
1.79.0, adding or changing approximately 5,500 Lines of Code (LoC) in the core compiler,
and adding approximately 2,250 LoC of tests.
Alloy uses BDWGC [9] as the underlying conservative GC, because it is the most widely ported
conservative GC we know of. We use BDWGC's GC_set_finalize_on_demand(1)
API, which causes
finalizers to be run on their own thread.
We had to make some minor changes to BDWGC to suit our situation. First, we disabled BDWGC's
parallel collector because it worsens Alloy's performance. It is unclear to us why this happens:
we observe significant lock contention within BDWGC during GC collections, but have not
correlated this with a cause. Second, BDWGC cannot scan pointers stored in thread locals
because these are platform dependent. Fortunately, rustc
uses LLVM's thread local storage
implementation, which stores such pointers in the PT_TLS
segment of the ELF binary: we modified
BDWGC to scan this ELF segment during each collection. Third, BDWGC dynamically intercepts
thread creation calls so that it can can scan their stacks, but (for bootstrapping reasons) is
unable to do so in our context: we explicitly changed Alloy to register new threads with
BDWGC.
In many GCed languages, 'destructor' and 'finalizer' are used as synonyms, as both terms refer to code run when a value's lifetime has ended. In existing GCs for Rust, these two terms refer to completely different hierarchies of code (i.e. destructors and finalizers are fundamentally different). In Alloy, in contrast, a reasonable first approximation is that finalizers are a strict subset of destructors. In this section we pick apart these differences, before describing the challenges of using destructors as finalizers.
When a value in Rust is dropped (i.e. at the point its owner goes out of lexical scope) its destructor is automatically run. Rust's destructors enable a style of programming that originated in C++ called RAII (Resource Acquisition Is Initialization) [30, Section 14.4]: when a value is dropped, the underlying resources it possesses (e.g. file handles or heap memory) are released. Destructors are used frequently in Rust code (to give a rough idea: approximately 15% of source-level types in our benchmark suite have destructors).
Rust destructors are formed of two parts, run in the following order: a user-defined drop method; and
automatically inserted drop glue. Drop methods are optional and users can provide one for a type by
implementing the Drop
trait's drop
method. Drop glue recursively calls destructors of contained types
(e.g. fields in a struct). Although it is common usage to conflate 'destructor' in Rust with drop methods,
drop glue is an integral part of a Rust destructor: we therefore use 'destructor' as the umbrella term for
both drop methods and drop glue.
When considering finalizers for a GC for Rust, there are several layers of design choices. We will
shortly see that finalizers cause a number of challenges (Section 4.1) and one choice would be to forbid
finalizers entirely. However, this would mean that one could not sensibly embed types that have
destructors in a Gc<T>
. While Rust does not always call destructors, the situations where this occurs are
best considered 'exceptional': not calling destructors from Gc<T>
would completely undermine
reasonable programmer expectations. Because of this, Alloy, and indeed virtually all GCs for Rust,
support finalizers in some form.
However, existing GCs force distinct notions of destructors and finalizers onto the programmer. Where
the former have the Drop
trait, the latter typically have a Finalize
trait. If a user type needs to be
finalized then the user must provide an implementation of the Finalize
trait. However, doing so
introduces a number of problems: (1) external libraries are unlikely to provide finalizers, so they must be
manually implemented by each consumer; (2) Rust's orphan rule [27, Section 6.12] prevents one
implementing traits for types defined in external libraries (i.e. unless a library's types were designed to
support Gc<T>
, those types cannot be directly GCed); (3) one cannot automatically replicate drop glue
for finalizers; and (4) one cannot replicate rustc
's refusal to allow calls to the equivalent of
Drop::drop
.
Programmers can work around problems #1 and #2 in various ways. For example, they can wrap external library types in newtypes (zero-cost wrappers) and implement finalizers on those instead [20, Section 19.3]. Doing so is tedious but not conceptually difficult.
Problem #3 has partial solutions: for example, [14] uses the Trace
macro to generate finalizer glue (the
finalizer equivalent of drop glue) for struct fields. This runs into an unsolvable variant of problem #2:
types in external libraries will not implement this trait and cannot be recursively scanned for finalizer
glue.
Problem #4 is impossible to solve in Rust as-is. One cannot define a function that can never be called —
what use would such a function have? A possible partial solution might seem to be for the finalize
method take ownership of the value, but Drop::drop
does not do so because that would not allow drop
glue to be run afterwards.
4.1 General Challenges When Using Destructors as Finalizers
We have stated as our aim that Alloy should use destructors as finalizers. Above we explained some Rust-specific challenges — but there are several non-Rust-specific challenges too! Fundamentally, finalizers and destructors have different, and sometimes incompatible, properties. The best guide to these differences, and the resulting problems, is Boehm [6], supplemented by later work on support for GC in the C++ specification [8]3.
An obvious difference between destructors and finalizers is when both are run. While C++ and Rust define precisely when a destructor will be run4, finalizers run at an unspecified point in time. This typically happens at some point after the equivalent destructor would run, though a program may exit before any given finalizer is run5. There are, however, two situations which invert this. First, if a thread exits due to an error, and the program is either not compiled with unwinding, or the thread has crossed a non-unwinding ABI boundary, then destructors might not be run at all, where a GC will naturally run the equivalent finalizers: we do not dwell on this, as both behaviours are reasonable in their different contexts. Second, and more surprisingly, it is possible for finalizers in non-error situations to run prematurely, that is before the equivalent destructor [6, section 3.4].
A less obvious difference relates to where destructors and finalizers are run. Destructors run in the same thread as the last owner of a value. However, running finalizers in the same thread as the last owner of the value can lead to race conditions [24] and deadlocks [6, section 3.3] if a finalizer tries to access a resource that the mutator expects to have exclusive access too. When such problems affect destructors in normal Rust code, it is the clear result of programmer error, since they should have taken into account the predictable execution point of destructors. However, since finalizers do not have a predictable execution point, there is no way to safely access shared resources if they are run on the same thread. The only way to avoid this is to run finalizers on a non-mutator thread — but not all Rust types / destructors are safe to run on another thread.
There are several additional differences such as: finalizers can reference other GCed values that are partly, or wholly, 'finalized' and may have had their backing memory reused; and finalizers can resurrect values by copying the reference passed to the finalizer and storing it somewhere.
Over time, finalizers have thus come to be viewed with increasing suspicion. Java, for example, has deprecated, and intends eventually removing, per-type finalizers: instead it has introduced deliberately less flexible per-object 'cleaners', whose API prevents problems such as object resurrection and per-class finalization [13].
4.2 The Challenge of Finalizers for Alloy
At this point we hope to have convinced the reader that: a viable GC for Rust needs to be able to use existing destructors as finalizers whenever possible; but that finalizers, even in existing GCs, cause various problems.
It is our belief that some problems with finalizers are fundamental. For example, finalizers inevitably introduce latency between the last use of a value and its finalization.
Some problems with finalizers are best considered the accidental artefacts of older designs.
Java's cleaners, for example, can be thought of as a more restrictive version of finalizers that
allow most common use-cases but forbid by design many dangerous use cases. For example,
per-class/struct finalization can easily be replaced by per-object/value finalization; and object
resurrection can be prevented if object access requires a level of indirection. Alloy benefits from
our better shared understanding of such problems and the potential solutions: it trivially
addresses per-object/value finalization (Gc::new_unfinalizable
function turns finalization off
for specific values) and, as we shall see later, via only slightly more involved means, object
resurrection.
However, that leaves many problems that are potentially in the middle: they are not obviously fundamental, but there are not obvious fixes for them either. In our context, where we wish to use destructors as finalizers, four problems have hitherto been thought insoluble [8, p. 32]: (1) finalizers are prohibitively slower than destructors; (2) finalizers can run prematurely; (3) running finalizers on the same thread as a paused mutator can cause race conditions and deadlocks; (4) some safe destructors are not safe finalizers.
Fortunately for us, Rust's unusual static guarantees, suitably expanded by Alloy, allow us to address each problem in novel, satisfying, ways. In the following section, we tackle these problems in the order above, noting that we tackle problems #1 and #2 separately, and #3 and #4 together.
As we shall see in Section 8, there is a correlation between the number of finalizers that are run and overhead from GC (with a worst case, albeit a definite outlier, in our experiment of 3.35× slowdown). In this section we show how to reduce the number of finalizers that are run, which helps reduce this overhead.
A variety of factors contribute to the finalizer performance overhead, including: a queue of finalizers must be maintained, whereas destructors can be run immediately; finalizers run some time after the last access of a value, making cache misses more likely; and finalizers can cause values (including values they own) to live for longer (e.g. leading to increased memory usage and marking overhead). Most of these factors are inherent to any GC and our experience of using and working on BDWGC– a mature, widely used GC – does not suggest that it is missing optimisations which would overcome all of this overhead.
Instead, whenever possible, Alloy elides finalizers so that they do not need to be run at all. We are able
to do this because: (1) BDWGC is responsible for all allocations and will, if necessary GC allocations even
if they are not directly wrapped in a Gc<T>
; and (2) many Rust destructors only free memory which
BDWGC would, albeit with some latency, do anyway.
Consider the standard Rust type Box<T>
which heap allocates space for a value; when a Box<T>
value
is dropped, the heap allocation will be freed. We can then make two observations. First, Box<T>
's drop
method solely consists of a deallocate
call. Second, while we informally say that Box<T>
allocates on
the 'heap' and Gc<T>
allocates on the 'GC heap', all allocations in Alloy are made through BDWGC and
stored in the same heap.
When used as a finalizer, Box<T>
's drop method is thus unneeded, as the underlying memory will be
freed by BDWGC anyway. This means that there is no need to run a finalizer for a type such as
Gc<Box<u8>>
at all, and the finalizer can be statically elided. However, we cannot elide a
finalizer for a type such as Gc<Box<Rc<u8>>>
because Rc<T>
's drop method must be run for the
reference count to be decremented. As this shows, we must consider the complete destructor, and
not just the top-level drop method, when deciding whether a corresponding finalizer can be
elided.
function NeedsFinalizer(T):
if Impls(T, Drop) and not Impls(T, DropMethodFinalizerElidable) then
return true;
for each field ∈ T do
if NeedsFinalizer(field) then
return true;
return false;
impl<T> Gc<T> {
pub fn new(value: T) -> Self {
if needs_finalizer::<T>() { Gc<T>::new_with_finalizer(value) }
else { Gc<T>::new_without_finalizer(value) }
...
}
}
needs_finalizer
returns true if a finalizer is required for a type. The Gc<T>
type uses this
intrinsic to ensure that the value is registered as requiring a finalizer. With optimisations turned
on, this seemingly dynamic, branching code will be turned into static, branchless code.5.1 Implementing Finalizer Elision
Finalizer elision statically determines which type's destructors do not require corresponding finalizers and elides them. It does so conservatively, and deals correctly with drop glue.
As shown in Algorithm 1, any type which implements the Drop
trait requires finalization unless it also
implements the new DropMethodFinalizerElidable
marker trait (i.e. a trait without methods). This
trait can be used by a programmer to signify that a type's drop method need not be called if the type is
placed inside a Gc<T>
. The 'Drop' part of the trait name is deliberate (i.e. it is not a simplification of
'destructor') as it allows the programmer to reason about a type locally (i.e. without considering drop
glue or concrete type paramaters). If the type has a transitively reachable field whose type implements
the Drop
trait but not the DropMethodFinalizerElidable
trait, then then the top-level type still
requires finalization.
Even though neither normal Rust destructors or Alloy finalizers are guaranteed to run, a program
whose destructors or finalizers never run would probably not be usable (leaking resources such as
memory, deadlocking, and so on). We therefore make DropMethodFinalizerElidable
an unsafe trait,
because implementing it inappropriately is likely to lead to undesired – though not incorrect! –
behaviour at run-time.
Alloy modifies the standard Rust library to implement DropMethodFinalizerElidable
on the following types:
Box<T>
, Vec<T>
, RawVec<T>
, VecDeque<T>
, LinkedList<T>
, BTreeMap<K, V>
, BTreeSet<T>
, HashMap<K, V>
, HashSet<T>
,
RawTable<K, V>
6,
and BinaryHeap<T>
. Fortunately, not only are these types' drop methods compatible with
DropMethodFinalizerElidable
, but they are extensively used in real Rust code: they enable significant
numbers of finalizers to be elided.
Listing 3 shows the new const compiler intrinsic needs_finalizer
we added to implement
Algorithm 1. The intrinsic is evaluated at compile-time: its result can be inlined into Gc::new
, allowing
the associated conditional to be removed too. In other words – compiler optimisations allowing – the
'does this specific type require a finalizer?' check has no run-time overhead.
Most of us assume that finalizers are always run later than the equivalent destructor would have run, but they can sometimes run before [6, section 3.4], undermining soundness. Such premature finalization is also possible in Alloy as described thus far (see Listing 4). In this section we show how to prevent premature finalization.
There are two aspects to premature finalization. First, language specifications often do not define, or do not precisely define, when the earliest point that a value can be finalized is. While this means that, formally, there is no 'premature' finalization, it seems unlikely that language designers anticipated some of the resulting implementation surprises (see e.g. this example in Java [29]). Second, compiler optimisations – at least in LLVM – are 'GC unaware', so optimisations such as scalar replacement can change the point in a program when GCed values appear to be finalizable.
struct S { value: u8 }
impl Drop for S { fn drop(&mut self) { self.value = 0; } }
fn main() {
let root = Gc::new(Box::new(S{ value: 1 }));
let inner: &u8 = &**root.value;
force_gc();
println!("{}", *inner);
}
S
(line 1) with
a drop method that sets the wrapped integer to zero (line 2). In the main method, we move an
instance of the struct into a Box<T>
, which we then move into a Gc<T>
(line 4). We obtain a Rust
reference to the inner integer (line 5), which at run-time will be a pointer to the Box<T>
. At this
point, the compiler can determine that the Gc<T>
is no longer used and overwrite root
's pointer
(which may be in a register). If a collection then occurs, a finalizer can run S
's drop method,
causing the program to print '0' instead of the expected '1' (line 7).
In our context, it is natural to define premature finalization as a (dynamic) finalizer for a Gc<T>
value
running before the (static) Gc<T>
owner has gone out of scope. Similar to the high-level proposal mooted
in [7, Solution 1], we must ensure that the dynamic lifetime of a reference derived from a Gc<T>
matches
or exceeds the lifetime of the Gc<T>
itself.
Our solution relies on adjusting Gc<T>
's drop method to keep alive a GCed value for at least the static
lifetime of the Gc<T>
itself. In other words, we ensure that the conservative GC will always see a pointer
to a GCed value while the corresponding Gc<T>
is in-scope.
However, there is a major problem to overcome: copyable types such as Gc<T>
are forbidden from
having destructors. The fundamental challenge we have to solve is that each copied value will have a
destructor called on it, which has the potential for any shared underlying value to be destructed more
than once. Alloy explicitly allows Gc<T>
– but no other copyable type – to have a destructor, but to
ensure it doesn't cause surprises in the face of arbitrary numbers of copies, the destructor must be
idempotent. Our task is made easier because Gc<T>
naturally has no drop glue from Rust's perspective:
Gc<T>
consists of a field with a pointer type, and such types are opaque from a destruction
perspective.
We therefore only need to make sure that Gc<T>
's drop method is idempotent. Fortunately, this is
sufficient for our purposes: we want the drop method to inhibit finalization but that does not
require run-time side effects. To achieve this, we use a fence. These come in various flavours.
What we need is a fence that prevents both: the compiler from reordering computations
around a particular syntactic point; and the CPU from reordering computations around a
particular address. We copy the platform specific code from the BDWGC GC_reachable_here
macro7
into Gc<T>
's drop method, which achieves the effect we require.
6.1 Optimising Premature Finalizer Prevention
The drop method we add to Gc<T>
fully prevents premature finalization. It also naturally solves a
performance problem with the suggested solution for C++ in [7, Solution 1], which requires keeping
alive all pointers, no matter their type, for their full scope. By definition, our solution only
keeps alive Gc<T>
values: the compiler is free to optimise values of other types as it so wishes.
However, on an artificial microbenchmark we observed a noticeable overhead from our fence
insertion.
We thus implemented a simple optimisation: we only insert fences for a Gc<T>
if it has a finalizer.
Intuitively, it seems that we should not generate drop methods in such cases, but this is difficult to do
directly inside rustc
. Instead, we suppress calls to the drop methods of such types: the two approaches
are functionally equivalent, though ours does put an extra burden on dead-code elimination in the
compiler tool-chain.
Alloy adds a new pass RemoveElidableDrops
to rustc
's Mid-Level Intermediate Representation
(MIR) processing. MIR is best thought of as the main IR inside rustc
: it contains the complete set of
functions in the program, where each function consists of a sequence of basic blocks. Simplifying
somewhat, function and drop method calls are represented as different kinds of terminators on basic
blocks. Terminators reference both a callee and a successor basic block.
The remove_elidable_drops
pass iterates over a program's MIR, identifies drop method
terminators which reference elidable finalizers, and turns them into 'goto' terminators to the
successor basic basic block. Algorithm 4 in the Appendix presents a more formal version of this
algorithm.
In this section we address two high-level problems: running finalizers on the same thread as a paused mutator can cause race conditions and deadlocks; and some safe destructors are not safe finalizers. Addressing the former problem is conceptually simple – finalizers must be run on a separate thread – but we must ensure that doing so is sound. We therefore consider this a specific instance of the latter problem, treating both equally in this section.
We therefore introduce Finalizer Safety Analysis (FSA), which prevents unsafe (in the sense of 'not safe Rust') destructors being used as finalizers. As a first approximation, FSA guarantees that finalizers are memory safe, cycle safe (i.e. do not access already finalized objects), and thread safe. We present the three main components of FSA individually before bringing them together.
Gc<T>
can store, directly or indirectly, normal Rust references (i.e. &
and &mut
), at which point it is
subject to Rust's normal borrow checker rules and cannot outlive the reference. However, finalizers
implicitly extend the lifetime of a GCed value, including any stored references: accessing a reference in a
finalizer could undermine Rust's borrow checking rules.
A simple way of avoiding this problem is to forbid Gc<T>
from storing, directly or indirectly,
references. This might seem to be no great loss: storing references in a Gc<T>
largely nullifies the
'GCness' of Gc<T>
. However, we found the result hard to use, as it can make simple tasks such as
gradually migrating existing code to use Gc<T>
painful.
A moderate, but in our experience insufficient, relaxation is to recognise that only types that
need a finalizer can possibly have problems with references, and to forbid such types from
storing references in Gc<T>
. For example, if there is no drop method for struct S{x: &u8}
then its destructor is safe to use as a finalizer, since its non-drop aspects will not use the &u8
reference.
The eventual rule we alighted upon for FSA is that a destructor for a type T
can be used as a
finalizer provided the destructor's drop methods do not obtain references derived from T
's
fields (including fields reachable from its attributes). Using Rust's terminology, we forbid
projections (which include a struct's fields, indexes into a vector, and so on) in destructors from
generating references. Any non-projection references that are used in a destructor are by
definition safe to use, as they either exist only for the duration of the drop method (references to
variables on the stack) or will exist for the remainder of the program (references to global
variables).
This rule over-approximates the safe set of destructors. For example, a drop method that creates a new value and tries to obtain a reference to a field in it (i.e. a projection) cannot be a destructor under FSA, even though the reference cannot outlast the drop method. We found that attempting to relax our rule further to deal with such cases rapidly complicates exposition and implementation.
struct GcNode { value: u8, nbr: Option>> }
impl Drop for GcNode {
fn drop(&mut self) { self.value = 0; println!("{}", self.nbr.unwrap().borrow().value); }
}
fn main() {
let mut gc1 = Gc::new(RefCell::new(GcNode{value: 1, nbr: None}));
let gc2 = Gc::new(RefCell::new(GcNode{value: 2, nbr: None}));
gc1.borrow_mut().nbr = Some(gc2);
gc2.borrow_mut().nbr = Some(gc1);
}
nbr
(line
3). Running this program on a strong memory model machine is likely to print either 2 0
or 1 0
depending on whether gc1
or gc2
is finalized first. The 'seemingly expected' output of 1 2
or 2 1
would never be printed: whichever GCed value is finalized first changes what the other GCed
value sees in its finalizer. As that implies, this example is unsound: whichever finalizer runs
second leads to undefined behaviour.
One of the main motivations for GCs is that they solve problems with cyclic data structures. However, finalizers can be unsound if they access state shared within members of a cycle. Listing 5 shows an example of undefined behaviour when two GCed values create a cycle and both their finalizers reference the other GCed value. Whichever order the finalizers are run in, at least one of the finalizers will see the other GCed value as partly or wholly 'finalized'.
Most languages and systems we are aware of assume that users either don't run into this problem (finalization cycles are considered rare in GCed languages [19, p. 229]) or know how to deal with it when they do (e.g. refactoring the types into parts that do and don't require finalization [6, p. 11]). There is no fully automatic solution to this problem. Some GCs offer weak references, which allow users to detect when finalization cycles have been broken, though they still have to deal with the consequences manually.
We wanted to provide users with static guarantees that their destructors will not behave unexpectedly
when used as finalizers in a cycle. A first attempt at enforcing such a property might seem to
be that a Gc<T>
cannot have, directly or indirectly, fields of type Gc<T>
. This would indeed
prevent the mistakes we want to catch but also disallow shared ownership! We therefore
check only that a type's destructor does not, directly or indirectly, access a Gc<T>
. This allows
GCed types to express shared ownership so long as their destructor(s) do not access other GC
types.
To make this check easier to implement, we introduce an auto trait [27, Section. 11], a kind of marker
trait that the compiler propagates automatically. An auto trait A
will be automatically implemented for a
type T
unless one of the following is true: there is an explicit negative implementation of A
for T
; or T
contains a field that is not itself A
. Informally, we say that a negative implementation of an auto-trait
pollutes containing types.
Our new auto trait is called FinalizerSafe
, and we provide a single negative implementation
impl<T> !FinalizerSafe for Gc<T>
. This naturally handles transitively reachable code, allowing FSA
itself to only check that a destructor's direct field accesses are FinalizerSafe
.
7.3 Destructors Need to be Runnable on a Finalizer Thread
impl Drop for GcNode {
fn drop(&mut self) { println!("drop {}", self.value.lock().unwrap()); }
}
fn main() {
let counter = Rc::new(Mutex::new(0));
{ let _ = Gc::new(GcNode { value: Rc::clone(&counter), nbr: None }); }
let r1 = counter.lock().unwrap();
force_gc();
assert_eq!(*r1, 0);
}
GcNode
that immediately goes out of scope
(line 7). The mutator then acquires the lock (line 8) but before it can release the lock a GC cycle
occurs and the GcNode
's finalizer run (line 9). If the finalizer is run on the same thread as the
mutator, then it will fail to grab the lock (line 2) and cause a deadlock.
Running finalizers on the same thread as a mutator can cause problems when the finalizer accesses state shared with the mutator (see Section 4.1 for a general description and Listing 6 for a concrete example). The most general solution to this problem is to run finalizers on a separate finalizer thread that never runs mutator code.
We must therefore ensure that it is safe to run a type's destructor on the finalizer thread. A
conservative definition is that Gc<T>
is safe to use if T
implements both of Rust's existing Send
(denoting
a type that can be permanently moved from one thread to another) and Sync
(denoting a type that can be
safely accessed simultaneously by multiple threads) auto traits. However, requiring that finalization be
restricted to types that implement both Send
and Sync
can be frustrating, particularly because more
types implement Send
than Sync
.
It may seem sufficient for T
to implement Send
alone so that the value can be safely sent to the finalizer
thread. However, this would not prevent a finalizer indirectly accessing state shared with a
non-GCed value via a mechanism such as Arc
, causing the very problems we are trying to
avoid.
FSA thus ignores whether a type implements Send
or Sync
(or not) and instead examines the
destructor directly. To pass FSA: the destructor must not access thread locals; and any types the
destructor accesses via projections must implement both Send
and Sync
. Intuitively, this allows a
non-Send
-or-Sync
type T
to have a safe finalizer provided that T
's destructor only access the Send
and
Sync
'subset' of T
.
This rule shows clearly that FSA is a form of abstract interpretation rather than a mere extension of the type
system8.
After careful examination we believe this is compatible with Rust's semantics (and rustc
and LLVM's
implementations) at the time of writing, but it is worth knowing that this rule would be unsafe in other
languages and implementations (for example our assumption would be unsafe in Java due to
synchronisation removal [31]). We leave it as an open question to others as to whether Rust should
deliberately permit or forbid such checks in its semantics.
The implementation of the finalization thread is fairly simple. For example, we do not need to explicitly synchronise memory between the mutator and finalization threads because BDWGC's stop-the-world collection phase already synchronises all memory between threads.
function FinalizerSafetyAnalysis(func):
for each basic_block ∈ func do
t ← basic_block.terminator;
if not IsGcConstructorCall(t) then
continue;
ty ← GetTyOfGcValue(t);
if isFinalizerUnchecked(ty) or not NeedsFinalizer(ty) then
continue;
for each drop_method ∈ GetDropGlue(ty) do
if not IsMIRAvailable(drop_method) then
EmitFinalizerUnsafeError();
CheckFunctionSafety(drop_method);
function CheckFunctionSafety(drop):
for each basic_block ∈ drop do
for each statement ∈ basic_block do
for each projection ∈ statement do
if not IsFinalizerSafe(projection.element) then
EmitFinalizerUnsafeError();
if IsFunctionCall(basic_block.terminator) then
CheckFunctionSafety(basic_block.terminator);
function IsFinalizerSafe(ty):
return Impls(ty, Send) and Impls(ty, Sync) and Impls(ty, FinalizerSafe);
FSA integrates the seemingly separate components presented above into one. It iterates over every
function in a Rust program analysing destructors of types that are used in Gc<T>
. Algorithm 2 shows the
essence of FSA (for example eliding details of caching which Alloy uses to speed up compile
times).
Because FSA is a form of abstract interpretation, we need to determine when to run FSA on a program.
In essence, whenever a previously unchecked type T
is used to create a new Gc<T>
, FSA is run. As well as
the Gc::new
constructor, Gc<T>
instances can be created with conversion traits such as From
. We
annotated each such entry point with a new rustc
-only attribute rustc_fsa_entry_point
: calls to
functions with this attribute lead to FSA checks.
A naive implementation of FSA would be a notable cost, so Alloy uses several optimisations. As
alluded to above, FSA caches the results of various checks to avoid pointlessly repeating work. We also
extend FinalizerSafe
with negative implementations for &T
, and &mut T
. If a type T
implements all of
FinalizerSafe
, Send
, and Sync
, we know that there can be no unsafe projections used in a destructor,
and can bypass most FSA checks entirely (though we still need to check for thread local accesses).
Across our benchmark suite, FSA increases compilation time in release mode by a modest
0.8–1.6%.
Algorithm 2 also captures Alloy's approach to error messages. Rather than just inform a user that
'your drop method has not passed FSA', Alloy pinpoints which field or line in a drop method caused FSA
to fail: EmitReferenceError
informs the user when a reference in a type is used in a way that violates
FSA (see Section 7.1); and EmitFinalizerUnsafeError
when a drop method contains code which is
unsafe (e.g. references a Gc<T>
type, an opaque function, etc.). Listing 7 shows an example of the errors
reported by Alloy: note that it pinpoints the line within a drop method that caused an FSA
error.
error: `RefCell::new(GcNode{value: 2, nbr: None})` cannot be safely finalized.
--> finalization_cycle.rs:11:21
|
7 | fn drop(&mut self) { self.value = 0; println!("{}", self.nbr.unwrap().borrow().value); }
| ^^^^^^^^
| a finalizer cannot safely dereference this
| because it might have already been finalized.
...
11 | let gc2 = Gc::new(RefCell::new(GcNode{value: 2, nbr: None}));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| has a drop method which cannot be safely finalized.
rustc
's normal error messages: note that both the use of a type with a FSA-incompatible drop
method ('has a drop method') and the line in the drop method ('caused by the expression') are
highlighted to the user.7.4.1 Awkward Kinds of Functions
FSA can encounter two kinds of 'awkward' functions.
First, some functions (e.g. due to use of trait objects, or FFIs) do not have a body available when FSA
runs: using such a function necessarily causes an FSA check to fail. One common class of functions
which causes this are Rust intrinsics (e.g. min_align_of
etc.): we audited the most frequently used of
these and annotated those which are FSA-safe with a new rustc_fsa_safe_fn
attribute. Other functions
whose bodies are unknown cause FSA to fail.
Second, in most cases, FSA runs on Rust functions whose generic types have been replaced with
concrete types (in Rust terminology, functions have been 'monomorphised'). Sometimes, however, FSA
encounters functions (e.g. intrinsics or functions with certain annotations) whose generic types have not
yet been replaced. FSA can still run on such functions, but will reject them unless all generic types imply
the FinalizerSafe
, Send
, and Sync
traits. Note that calling a method on a generically typed value will
lead to FSA finding a method without a body: as in the first case above, this will cause FSA to
fail.
The common theme to both is that we wish FSA to be sound, at which point we forego completeness.
This can cause users frustration when FSA raises an error on code they know is FSA safe. As is common
in Rust, we therefore provide an unsafe escape hatch which allows users to silence FSA errors when they
can prove to their satisfaction that doing so does undermine correctness. We experimented
with a per-type approach, but found that unduly restrictive: we therefore provide a per-value
escape hatch with the unsafe FinalizerUnchecked<T>
type. Values wrapped in this type are
considered safe to use at all points in FSA. Our aim is that users should rarely need to resort to this
escape hatch, but, as is not uncommon in Rust, there are valid idioms of use where we found it
necessary.
Version | Description | #benchmarks | |
---|---|---|---|
Binary Trees | Debian CLBG Rust#2 | Heap allocation microbenchmark | 1 |
Regex-Redux | Debian CLBG Rust#1 | Regular expression matching | 1 |
Alacritty | v0.15.0-dev | Terminal emulator | 10 |
fd | v9.0.0 | Unix find replacement | 7 |
grmtools | v0.13.4 | Lexer / parser library | 4 |
Ripgrep | v14.1.1 | Fast grep replacement | 13 |
som-rs-ast | git #35b780 | SOM AST VM | 26 |
som-rs-bc | git #35b780 | SOM bytecode VM | 26 |
Rc<T>
, etc.). Binary Trees and Regex-Redux are classic
stand-alone GC benchmarks; the other 'benchmarks' represent benchmark suites (e.g. Ripgrep contains 13
benchmarks). The middle portion of the table shows a variety of 'normal' Rust programs; the bottom portion
of the program shows three implementations of the SOM programming language.In this section we explain our methodology and our experimental results.
There is no existing benchmark suite for GCs for Rust. Even if such a suite did exist, it may not have
been suitable for our purposes because in experiment EGCvs we want to compare programs using
existing shared ownership approaches. We searched through roughly the 100 most popular Rust libraries
on crates.io
(the de facto standard Rust package system) looking for suitable candidates. In
practise this meant we looked for crates using reference counting. In the interests of brevity,
for the rest of this section we use 'Rc<T>
' to cover both Rc<T>
and its thread-safe cousin
Arc<T>
.
Gc<T> |
Rc<T> |
Weak<T> | |
---|---|---|---|
Alacritty | 107 | 9,450 | 1,970 |
Binary Trees | 2 | 0 | 0 |
fd | 7 | 421 | 1 |
grmtools | 299 | 1,825 | 23 |
Regex-Redux | 108 | 109 | 0 |
Ripgrep | 104 | 249 | 4 |
som-rs-ast | 206 | 35 | 0 |
som-rs-bc | 464 | 39 | 0 |
Rc<T>
, we also show
how many weak references are left in: this is a proxy both for partial porting, and also how the extent weak
references. This is a proxy for the extent of changes that cyclic data-structures impose upon source code.Table 1 shows the resulting suite: note that, except for Binary Trees and Regex-Redux, the 'benchmarks' are themselves benchmark suites. Collectively, our suite contains – depending on whether you count the SOM implementations' (identical) benchmark suites collectively or separately – 62 or 88 benchmarks. Table 2 shows how often relevant types are used after porting. Table 3 shows the distribution of heap data at run-time. This shows that our suite contains benchmarks with a variety of memory patterns.
Binary Trees is allocation intensive and sufficiently simple that it can be easily and meaningfully
ported to additional shared ownership strategies: Rust-GC, a user library for GC for Rust [14]; and
Arena<T>
, a non-GC memory arena [10]. Alacritty, fd, and Ripgrep are well known Rust programs, all
of which have their own benchmark suites. grmtools is a parsing library which uses Rc<T>
extensively
in error recovery: we benchmarked it using 28KLoC of real Java source code, which we mutated with
syntax errors.
SOM is a small, but complete, language in the Smalltalk mould, which has a wide variety of
implementations. Our suite includes two of these: som-rs-ast (which represents programs as ASTs); and
som-rs-bc (which represents programs as bytecode). Both are existing ports of a Java SOM VM into Rust
and use Rc<T>
. We use the same SOM core-lib
benchmarks for both, derived from git commit
#afd5a6.
We were not able to port all parts of all programs. In particular, some programs make extensive use of
the make_mut
and get_mut
functions in Rc<T>
, which allow a programmer to mutate their contents if, at
run-time, they only have a single owner. There is not, and cannot be, equivalent functionality with a
copyable Gc<T>
type. In some cases we were able to successfully use alternative mechanisms. In others
we judged the usages to either be rare at run-time (i.e. not worth porting), or too difficult to port (i.e. too
much of the program is built around the resulting assumptions). In a small number of cases we
ended up introducing bugs. Alacritty's UTF-8 support is an example, resulting in deadlocks.
Whenever we encountered a bug in our porting, we reverted back to Rc<T>
for that portion of the
port.
Allocated (#) | GC owned (%) | |||
---|---|---|---|---|
Rc<T> |
Box<T> | Gc<T> |
||
Alacritty | 125 | 8,770 | 2 | 2.70 |
Binary Trees | 0 | 3,222,201 | 3,222,190 | 100.00 |
fd | 17,821 | 306,902 | 61 | 1.23 |
grmtools | 2,283 | 19,859,431 | 4,038,605 | 44.19 |
Regex-Redux | 45 | 3,132 | 78 | 15.39 |
Ripgrep | 12,786 | 521,366 | 26,069 | 17.97 |
som-rs-ast | 15 | 8,586,976 | 1,533,728 | 76.95 |
som-rs-bc | 15 | 2,397,931 | 1,530,325 | 99.71 |
Gc<T>
values. For example, a program consisting of a single Gc<Box<T>>
would have a
'GC Owned' value of 100% because the Box<T>
is owned by the Gc<T>
. As we shall see later, there can be a
number of knock-on effects when a Gc<T>
owns other such values.8.1.2 What We Couldn't Include in the Benchmark Suite
We tried porting 10 other programs that are not included in our benchmark suite. To avoid readers wondering if we have 'cherry-picked' our eventual benchmark suite, we briefly report why those other programs have been excluded. All excluded benchmarks are shown in Table 4 in the Appendix.
Several programs (e.g. numbat, mini-moka, and salsa), once ported, turned out to be uninteresting from a GC benchmarking perspective. Irrespective of the number of source locations that reference memory allocation types, the benchmarks we could run from them allocated sufficiently little memory that there are no worthwhile differences between different allocation strategies. Put another way: these programs are in a sense 'the same' from our evaluation perspective.
Two programs (bevy and rust-analyzer) did not run correctly after porting. Both extensively use the
make_mut
or get_mut
functions in Rc<T>
and reverting those changes made the benchmarks
uninteresting.
We also ported RustPython, but were unable to adjust it to faithfully implement Python-level
destructors. In essence, in RustPython's default configuration, its representation of objects is
not compatible with FSA. This means that we can not run Python __del__
methods in the
finalizer thread. Although technically this is still compatible with Python's semantics, we
felt this would be a misleading comparison, as our port of RustPython would be doing less
work.
Our experiment can be seen as a comparison of Alloy against 'normal' Rust. Fortunately, Alloy is a strict superset of 'normal' Rust: only if users explicitly opt into GC does Alloy really become a 'GC for Rust'. This allows us to use the same compiler, standard library, and so on, removing several potential confounding factor in our results. We compile two binaries: one without logging features compiled and one with. We only use the latter when reporting collector related metrics.
A challenge in our experiment is that different allocation strategies can use different underlying
allocators. In particular, Alloy has to use BDWGC, but, for example, Rc<T>
can use a modern allocator
such as jemalloc. Much has changed in the performance of allocators since BDWGC's 1980s roots: in
Rc<T>
-only benchmarks, we observe an inherent overhead from BDWGC of 2–26% relative to jemalloc
(see Table 6 in the Appendix), which is a significant, and variable, confounding factor. Fortunately,
BDWGC can be used as a 'traditional' allocator that allocates and frees on demand (i.e. no
conservative GC occurs): in the main experiment, we thus use BDWGC as the allocator for all
benchmarks.
We want to understand the memory usage of different allocation strategies over the lifetime of a benchmark. However, there is no single metric which captures 'memory usage', nor even an agreed set of metrics [5]. We use two metrics to capture different facets: (1) heap footprint, the amount of live heap memory recorded by by Heaptrack [34] at every allocation and deallocation; and (2) Resident Set Size (RSS), the total physical memory in RAM used by the process (including memory-mapped files, stack, and code/text segments), sampled at 10Hz. The overhead of recording heap footprint is much greater than RSS, but it provides a more detailed view of memory usage.
Another pair of confounding factors are the initial and maximum sizes of the GC heap: too-small values can lead to frequent resizing and/or 'thrashing'; large values to unrealistically few collections. What 'small' and 'large' are varies by benchmark, and 'careful' (or thoughtless) choices can significantly distort one's view of performance. BDWGC uses an adaptive strategy by default, growing the heap size as it detects that it would benefit from doing so. To give some sense of whether a different strategy and/or heap size would make a difference, we ran our benchmarks with three different fixed heap sizes. Doing so either has little effect or speeds benchmarks up; when it does so, the impact is generally under 10% and is at most 28% (the detailed results are presented in Table 9 in the Appendix). Broadly speaking, this suggests that BDWGC's default heap sizing approach, at least in our context, is not significantly distorting our view of performance.
We ran each benchmark in our suite 30 times. We report wall-clock times as returned by the
standard Unix time
utility. The SOM benchmarks are run using its conventional rebench
tool; we adjusted rebench to use time
for consistency with our other benchmarks. We ran
all benchmarks on an AMD EPYC 7773X 64-Core 3.5GHz CPU with 128GiB RAM, running
Debian 12 ('bookworm'). We turned off turbo boost and hyper-threading, as both can colour
results.
Except where otherwise stated, we report means and 99% confidence intervals for all metrics. We use the arithmetic mean for individual benchmarks and the geometric mean for benchmark suites.
When plotting time-series (i.e. sampled) memory metrics, we face the challenge that different configurations of the same benchmark can execute at different speeds. We thus resample each benchmark's data to 1000 evenly spaced points using linear interpolation. We chose 1000 samples because it is considerably above the visual resolution of our plots. After normalization, we calculate the arithmetic mean of the memory footprint measurement at each grid point (and not the raw underlying data) across all runs of the same benchmark. We record 99% confidence intervals at each point and show the result as shaded regions around the mean.
gc<t>
and rc<t>
on wall-clock time; heap footprint (i.e. the size of the
live set); and rss. the baseline at 1 is rc<t>
; values less than 1 show gc<t>
as better than rc<t>
; and the blue
vertical line shows the geometric mean of ratios. the wall-clock times of gc<t>
and rc<t>
are similar; the rss
somewhat similar; and the average heap footprint often very different. broadly speaking, gc<t>
increases the
average heap footprint because gc, and especially finalization, causes values to live for longer. benchmarks
which allocate relatively little memory (particularly ripgrep as shown in table 3) can exaggerate this effect.
perhaps surprisingly, the heap footprint and rss do not correlate. this is partly because the sample rate for
rss is rather low (which notably affects fast running benchmarks such as those for fd) and partly because
rss necessarily includes headroom, that is memory beyond that needed for the live set (and which may later
be returned to the os).The main results for EGCvs can be seen in Fig. 1. Though there is variation, Alloy has an overhead on wall-clock time of 5% on our benchmark suite. The effect on memory is more variable though, unsurprisingly, Alloy typically has a larger average heap footprint (i.e. allocated memory lives for longer). This metric needs to treated with slight caution: benchmarks which allocate relatively small amounts of memory (see Table 3) can make the relative effect of average heap footprint seem much worse than it is in absolute terms.
Binary Trees is sufficiently simple that we also used it to compare against Arena<T>
and Rust-GC.
The time-series data in Fig. 2 is particularly illuminating (for completeness, Table 5 in the Appendix has
the raw timings). Alloy is around 3.5× slower than Arena<T>
. The time-series data for the latter shows it
going through distinct phases: a (relatively long) allocation phase, a (relatively moderate) 'work' phase,
and a (relatively short) deallocation phase. Put another way: these clear phases make Binary Trees a
perfect match for an arena. In the other approaches, the 'work' phase occupies a much greater proportion
of their execution, because it also incorporates allocator work. Alloy is around 1.3× faster than Rc<T>
,
but both have similar memory profiles. Alloy is around 3× faster than Rust-GC and has an
average heap footprint around 4× smaller, reflecting Alloy's advantage in not being a user-land
library that relies in part on Rc<T>
. Although we caution against over-interpreting a single
benchmark, this does give us at least some idea of the performance ceiling and floor for different
approaches.
Rc<T>
; Rust-GC's
heap footprint is around 4× greater. Binary Trees is a perfect fit to Arena<T>
: it frees memory in one batch at
the end, and because it is 3× faster than Alloy, this 'wind down' period is a substantial portion of the overall
(quick!) execution. Ripgrep Alternates may seem to be an example of a memory leak in Alloy, but it is really
the result of the inevitable delay that GC imposes on noticing that values are lived, which is exacerbated by
the presence of finalizers. The frequent plateaus and dips show that memory is being freed, but at a later
point than one might initially expect. In contrast, som-rs-bc JSON Small shows a real memory leak due to
cyclic objects in Rc<T>
, where Alloy's heap footprint remains steady.
The time-series data in Fig. 2 helps explain other factors. For example, it shows that som-rs-bc leaks
memory on the JSON Small benchmark (we suspect it also leaks in some other benchmarks, though
rarely as visibly). This is because Rc<T>
keeps alive values in cycles; Alloy does not leak memory on
som-rs-bc, as it naturally deals correctly with cycles.
We can see from the time-series data that Ripgrep has a complex heap footprint pattern.
This may suggest a memory leak, but in fact it is a consequence of the inevitable delay in
freeing memory in a GC. In general, GC notices that memory is unused later than reference
counting, but this is exacerbated further by finalizers. Surprisingly, finalizers can lengthen or
shorten an allocation's lifetime. GCed values with finalizers tend to have longer lifetimes,
because they have to wait in the finalizer queue. However, when a finalizer calls free
on
indirectly owned values, those are immediately marked as not live, rather than having to
wait until the next collection to be discovered as such. This, albeit indirectly, explains the
seemingly random peaks and troughs in memory usage one can observe in Ripgrep's time-series
data.
Gc<T>
values that: have had their finalizers elided; cannot have their finalizers elided; have no
finalizers to elide. This chart is best read in conjunction in Table 3 to (a) get a sense of the quantity of
run-time memory involved (b) how much indirectly owned memory the Gc<T>
values have. The other plots
use 'no finalizer elision' as the normalization base (i.e. values below 1 show that finalizer elision improves a
metric). Total GC pause time is the cumulative time spent in stop-the-world collections. User time captures
the time spent in all threads, including the finalizer thread. Broadly speaking, the more finalizers are elided,
and the greater the proportion of the overall heap the memory owned by Gc<T>
, the better the metrics
become.
The results of EElision are shown in Fig. 3. In general, there is a fairly clear correlation: the more
finalizers are removed, and the greater the proportion of the overall heap the memory owned by Gc<T>
is,
the better the metrics become. However, there are several caveats. First, when all finalizers are removed,
BDWGC does not start a finalizer thread or invoke locking related to it, unduly flattering the time-based
metrics. Second, the quantity of finalizers is only a partial proxy for cost: some finalizers free
up large graphs of indirectly owned values, which can take some time to run. Third, some
benchmarks change the work they do: grmtools speeds up so much that its error recovery
algorithm has time to do more work, so while finalizer hugely benefits its GC pause time,
its wall-clock time changes much less. Finally, since finalizers can cause indirectly owned
allocations to be freed earlier than the GC itself does naturally, removing them can cause
indirectly owned values to live for longer: Ripgrep's average heap footprint highlights this
issue.
The results for EPremOpt are shown in Fig. 4. We created three configurations of Alloy. None has no fences, and thus is unsound, but allows us to approximate (allowing for possible vagaries from running unsound code!) the best possible outcome. Naive inserts all possible fences. Optimised inserts only necessary fences. Once confidence intervals are taken into account, there are no statistically significant results for this experiment. Although it is possible that benchmarking 'noise' is hiding a meaningful result, our data suggests that any such differences are likely to be minimal. To make up for this disappointment, the fact that there is no difference between any of these suggests that, on non-artificial benchmarks, premature finalizer prevention is not a noticeable cost.
Any performance judgements we make are necessarily contingent on our methodology the benchmark
suite we chose, including the proportion of benchmarks that we ported, and the way we process and
present data. For example, we did not port external libraries to use Gc<T>
so many benchmarks use a
variety of allocation strategies. Even had we ported everything, we would not be able to say, for example,
that finalizer elision will always improve performance by exactly the factor we see in our experiment:
there undoubtedly exist reasonable, non-pathological, programs which will see performance changes
outside the ranges that our results suggest.
Using BDWGC as the allocator for all benchmarks has the advantage of removing 'pure' allocator performance as a confounding factor, but does mean that some of the performance characteristics of benchmarks will be changed (e.g due to the portion of time we spend in the allocator; or BDWGC's adaptive heap sizing strategy). A generic, modern conservative GC, using the insights of recent non-GC allocators, would almost certainly give different – though we suspect not profoundly different – results. To the best of our knowledge there is currently no production-quality modern, generic conservative, GC we could use instead, though we are aware of at least one attempt to create such an alternative: it will be interesting to rerun our experiments if and when that arrives.
The RSS memory metric we collect is at Linux's whim: if it does not update as frequently as we expect, we will see artificially 'smoothed' data that may miss out peaks and troughs. Similar, our interpolation of time-series data onto a normalised grid can also smooth data. We manually checked a large quantity of data to ensure this was not a significant effect; by running benchmarks 30 times means it is also less more likely that peaks and troughs are caught at least sometimes.
In this paper we hope to have given sufficient background on GC and the use of destructors and finalizers in general. In this section we mostly survey the major parts of the GC for Rust landscape more widely. Our survey is inevitably incomplete, in part because this is a rapidly evolving field (a number of changes have occurred since the most recent equivalent survey we are aware of [16]). We also cover some relevant non-Rust GC work not mentioned elsewhere.
Early versions of Rust had 'managed pointers' (using the @T
syntax) which were intended to
represent GC types [16]. The core implementation used reference counting though there
were several, sometimes short-lived, cycle detectors [17]. Managed pointer support was
removed9
around a year before the first stable release of Rust. This was not the end of the story for
'GC as a core part of Rust', with core Rust developers exploring the problem space in more
detail [15, 21, 22]. Over time these efforts dwindled, and those interested in GC for Rust largely
moved from anticipating rustc
support to expecting to have to do everything in user-level
libraries.
One of the earliest user-level GC for Rust libraries is Bacon-Rajan-CC [12]. This provides a type Cc<T>
which is similar in intention to Alloy's Gc<T>
. The mechanism by which objects are collected is rather
different: they have a naive reference count, which causes objects outside a cycle to have deterministic
destruction; and users can manually invoke a cycle detector, which uses trial deletion in the style of Bacon and
Rajan [4]10
to identify objects in unused cycles. Cycle detection requires users manually implementing a Trace
trait
which traverses a type's fields. Destructors are used as finalizers: to avoid the problems with Rust
references we solved in Section 7.1, Bacon-Rajan-CC imposes a T:'static
lifetime bound on the type
parameter passed to Cc<T>
. Simplifying slightly, this means that any references in such a type must be
valid for the remaining lifetime of the program, a severe restriction. Unlike our approach to the access of
already-finalized values (Section 7.2), it can only detect such accesses at runtime, leading to a (safe) Rust
panic
.
Probably the best known GC for Rust is Rust-GC [14] (partly covered in Section 4). Rust-GC's Gc<T>
provides a similar API to Alloy, with the notable exception that its Gc<T>
is not, and cannot be, copyable,
thus always requiring calls to Gc::clone
. Although, like Alloy, Rust-GC allows Gc<T>
values to be
converted into pointers, its lack of conservative GC means that users must ensure that a Gc<T>
wrapper
is kept alive for the entire lifetime of pointers derived from it. Similarly to Bacon-Rajan-CC,
GCed values are reference counted, with occasional tracing sweeps to identify cycles, though
Rust-GC performs cycle detection automatically (i.e. it doesn't require manual calls to a
function such as collect_cycles
). Drop methods are not used as finalizers: if a finalizer is
required, a manual implementation of the Finalize
trait must be provided; finalizer glue can be
largely, though not fully (see Section 4), automatically created by the provided Trace
macro.
Rust-GC detects accesses to already-finalized values dynamically at run-time, panicking
if they occur. Unlike Bacon-Rajan-CC, these accesses are detected by recording what the
collector's state is in: if the collector is in a 'sweep' phase, any access of a Gc<T>
leads to a
panic. We have not yet verified whether cross-thread collection / sweeping can evade this
check.
An example of moving beyond reference counting in a GC for Rust is Shifgrethor [2]. It requires Gc
values to be created by a Root<'root>
: the resulting Gc<'root, T>
is then tied to the lifetime of the
Root<'root>
. This allows roots to be precisely identified, but requires explicitly having access to a
Root<'root>
whenever a Gc<'root, T>
is used. As with Rust-GC, Shifgrethor requires users to
manually implement a Finalize
trait, though Shifgrethor's is more restrictive: not only can other
GCed values not be accessed (implicitly solving the same problem as Section 7.2) but any other
type without the same 'root
lifetime as the GCed value is forbidden. This means that many
seemingly safe finalizers require implementing the unsafe UnsafeFinalize
trait. We view
Shifgrethor as proof that accurately tracking GC roots in normal Rust without reference
counting is possible, though it cannot deal with references being converted into pointers and
usize
s.
A different means of tackling the root-finding problem is GcArena [32], which uses branding in a
similar way to GhostCell
s (see Section 2). In essence, users provide a special 'root' type which is the
only place where roots can be stored. Mutating the heap can only be done in the context
of functions that are passed a branded reference to the GCed heap. Once such a function
has completed, GcArena is in full control of the GC heap, and knows that only the root
type needs to be scanned for roots. This leads to a precise guarantee about GC reference
lifetimes. However, if code executes in an arena for too long, the system can find itself starved of
resources, with no way of recovering, even if much of the arena is no longer used. GcArena
was originally part of the Piccolo VM (which was itself previously called Luster), a Lua VM
written in Rust. Such VMs have a frequently executed main loop which is a natural point for a
program to relinquish references to the GCed heap, but this is not true of many other GCed
programs.
One attempt to improve upon Rust-GC is Bronze
[11], though it shows how challenging it can be to meaningfully improve GC for
Rust: both of its main advances have subsequently been disabled because they are not just unsound but
actively lead to crashes. First, Bronze tried to solve the root-finding problem by using LLVM's gc.root
intrinsic at function entries to generate stack-maps (a run-time mechanism for accurately tracking active
pointers). This rules out the false positives that are inevitable in conservative GC. However, Bronze
could not track nested references: if a Gc<T>
was used as a field in a struct, it was not tracked
by the GC. Second, Bronze tried to give GC in Rust similar semantics to non-ownership
languages such as Java. It did this by allowing shared mutation, undermining Rust's borrow
checker.
Chrome's rendering engine Blink uses the conservative GC Oilpan. It has the interesting property that it has two classes of finalizers. 'Full finalizers' are similar to finalizers in Alloy, running on a finalizer thread at an indeterminate future point, but with the difference that they can only reference parts of a GCed value. To mitigate this, 'pre-finalizers' are run by the collector on the same thread as mutator as soon as an object as recognised as unused, and can access all of a GCed value. Pre-finalizers are necessary, but not encouraged, because they implicitly pause the stop-the-world phase of the collector. This reflects the fact that latency is a fundamental concern for a rendering engine: Alloy currently makes no pretences to being low latency.
We introduced a novel design for GC in Rust that solves a number of outstanding challenges in GC for Rust, as well as – by taking advantage of Rust's unusual static guarantees – some classical GC finalizer problems. By making integration with existing Rust code easier than previous GCs for Rust, we hope to have shown a pragmatic route for partial or wholesale migration of Rust code that would benefit from GC.
Challenges and future opportunities remain. For example, Alloy is an 'all or nothing' cost: if you
want to use Gc<T>
in a single location, you must pay the costs of the GC runtime and so
on. Alloy's absolute speed is, we believe, limited by BDWGC: it is probable that using a
semi-precise GC and/or a faster conservative GC could change our view of the absolute performance
speed
In summary, we do not claim that Alloy is the ultimate design for GC in Rust – reasonable people
may, for example, disagree on whether the costs of conservative GC are worth the gains –
but it does show what can be achieved if one is willing to alter the language's design and
rustc
.
The accompanying artefact [18] contains: the source code necessary to run this paper's experiment (including generating figures etc.) from scratch; and data from a run of the experiment that we used in this paper.
This work was funded by an EPSRC PhD studentship and the Shopify / Royal Academy of Engineering Research Chair in Language Engineering. We thank Steve Klabnik and Andy Wingo for comments.
function RemoveElidableDrops(func):
for each basic_block ∈ func do
if IsDropTerminator(basic_block.terminator.kind) then
ty ← GetTypeOfDroppedValue(block.terminator);
if IsGcType(ty) then
if not RequiresFinalizer(ty) then
ReplaceTerminator(basic_block);
function ReplaceTerminator(basic_block):
drop_func ← GetDropFunc(basic_block.terminator);
last_block ← GetLastBasicBlock(drop_func);
block.terminator ← last_block.terminator;
Algorithm 3: Removing elidable drops
Benchmark | Description | Reason for exclusion |
---|---|---|
bevy | ECS game engine in Rust | Unable to port successfully (see Section 8.1.2) |
dyon | Scripting language in Rust | Unable to port successfully (see Section 8.1.2) |
jiff | A datetime library for Rust | Too few allocations to measure |
mini-moka | Concurrent in-memory cache library | Too few allocations to measure |
numbat | Math search engine | Too few allocations to measure |
rkyv | Zero-copy deserialization framework | Insufficient Gc<T> coverage in benchmarks |
RustPython | Python interpreter written in Rust | Difficulty retro-fitting __del__ semantics (see Section 8.1.2) |
rust-analyzer | Language server for Rust | Unable to port successfully (see Section 8.1.2) |
salsa | Incremental recomputation library | Too few allocations to measure |
WLambda | Scripting language written in Rust | Insufficient Gc<T> coverage in benchmarks |
Wall-clock time (s) | ||||
---|---|---|---|---|
Gc<T> |
Rc<T> |
Gc<T> (Rust-GC) |
Arena<T> |
|
Alacritty | 0.41 [0.39, 0.45] | 0.40 [0.38, 0.44] | – | – |
Binary Trees | 0.11 [0.11, 0.11] | 0.15 [0.14, 0.15] | 0.33 [0.32, 0.33] | 0.03 [0.03, 0.04] |
fd | 0.33 [0.29, 0.38] | 0.31 [0.26, 0.37] | – | – |
grmtools | 3.06 [3.00, 3.14] | 3.24 [3.17, 3.31] | – | – |
Regex-Redux | 0.47 [0.47, 0.47] | 0.45 [0.45, 0.46] | – | – |
Ripgrep | 1.61 [1.55, 1.69] | 1.52 [1.45, 1.59] | – | – |
som-rs-ast | 0.92 [0.88, 0.95] | 0.79 [0.76, 0.82] | – | – |
som-rs-bc | 0.28 [0.27, 0.29] | 0.29 [0.28, 0.30] | – | – |
Wall-clock time (s) | Ratio | ||
---|---|---|---|
jemalloc | BDWGC | ||
Alacritty | 0.36 [0.33, 0.40] | 0.40 [0.38, 0.44] | 1.11 |
Binary Trees | 0.12 [0.12, 0.12] | 0.15 [0.14, 0.15] | 1.26 |
fd | 0.30 [0.25, 0.36] | 0.31 [0.26, 0.37] | 1.02 |
grmtools | 3.09 [3.01, 3.17] | 3.24 [3.17, 3.31] | 1.05 |
Regex-Redux | 0.45 [0.44, 0.45] | 0.45 [0.45, 0.46] | 1.01 |
Ripgrep | 1.46 [1.40, 1.53] | 1.52 [1.45, 1.59] | 1.04 |
som-rs-ast | 0.77 [0.74, 0.80] | 0.79 [0.76, 0.82] | 1.02 |
som-rs-bc | 0.28 [0.27, 0.29] | 0.29 [0.28, 0.30] | 1.02 |
Rc<T>
-only code (i.e., no GC). The ratio column shows BDWGC time divided by jemalloc time.Wall-clock time (s) | |||
---|---|---|---|
jemalloc | BDWGC | ||
Rc<T> |
Gc<T> |
Rc<T> |
|
Alacritty ▶ | |||
fd ▶ | |||
som-rs-ast ▶ | |||
grmtools ▶ | |||
Ripgrep ▶ | |||
som-rs-bc ▶ |
User time (s) | |||
---|---|---|---|
jemalloc | BDWGC | ||
Rc<T> |
Gc<T> |
Rc<T> |
|
Alacritty ▶ | |||
fd ▶ | |||
som-rs-ast ▶ | |||
grmtools ▶ | |||
Ripgrep ▶ | |||
som-rs-bc ▶ |
Heap Size (MiB) | Relative wall-clock time | Benchmarks failed | |
---|---|---|---|
Alacritty | 16 | 0.96 [0.91, 0.99] | |
32 | 0.98 [0.95, 1.02] | ||
64 | 0.94 [0.89, 0.98] | ||
Binary Trees | 4 | 0.88 [0.82, 1.02] | |
8 | 0.90 [0.80, 1.01] | ||
16 | 0.87 [0.82, 0.94] | ||
fd | 16 | 0.94 [0.90, 0.99] | |
32 | 0.94 [0.88, 0.98] | ||
64 | 0.94 [0.89, 1.00] | ||
grmtools | 1024 | 1.01 [1.00, 1.02] | 2/4 (Eclipse, Jenkins) |
2048 | 1.00 [1.00, 1.01] | ||
4096 | 1.01 [1.00, 1.02] | ||
Regex-Redux | 256 | 0.94 [0.92, 0.95] | |
512 | 0.93 [0.90, 0.94] | ||
1024 | 0.96 [0.92, 1.07] | ||
Ripgrep | 32 | 0.96 [0.95, 0.96] | |
64 | 0.95 [0.94, 0.95] | ||
128 | 0.94 [0.93, 0.95] | ||
som-rs-ast | 64 | 0.72 [0.71, 0.74] | 2/4 (Fannkuch, TreeSort) |
96 | 0.74 [0.73, 0.75] | 2/4 (Fannkuch, TreeSort) | |
128 | 0.75 [0.74, 0.76] | ||
som-rs-bc | 32 | 0.79 [0.78, 0.80] | |
64 | 0.79 [0.77, 0.80] | ||
128 | 0.84 [0.83, 0.86] |
Fin. elided (%) | |
---|---|
Alacritty ▶ | |
fd ▶ | |
som-rs-ast ▶ | |
grmtools ▶ | |
Ripgrep ▶ | |
som-rs-bc ▶ |
Table 10. Percentage of finalizers Alloy was able to elide for each benchmark.
Wall-clock time (s) | ||
---|---|---|
Before elision | After elision | |
Alacritty ▶ | ||
fd ▶ | ||
som-rs-ast ▶ | ||
grmtools ▶ | ||
Ripgrep ▶ | ||
som-rs-bc ▶ |
User time (s) | ||
---|---|---|
Before elision | After elision | |
Alacritty ▶ | ||
fd ▶ | ||
som-rs-ast ▶ | ||
grmtools ▶ | ||
Ripgrep ▶ | ||
som-rs-bc ▶ |
Avg. heap footprint (MiB) | ||
---|---|---|
Before elision | After elision | |
Alacritty ▶ | ||
fd ▶ | ||
som-rs-ast ▶ | ||
grmtools ▶ | ||
Ripgrep ▶ | ||
som-rs-bc ▶ |
size_t
and uintptr_t
types respectively. Rust now has a provenance lint to nudge users in this general direction, but the as
keyword still allows arbitrary conversions.
y = Gc::clone(&v)
is available, since every copyable type is also cloneable.
RawTable
is contained in the separate hashbrown
crate which is then included in Rust's standard library. We previously maintained a fork of this, but synchronising it is painful. For now, at least, we have hacked explicit knowledge of RawTable
into the needs_finalize
function.