Cite this paper as: Jacob Hughes, Laurence Tratt, Garbage Collection for Rust: The Finalizer Frontier, OOPSLA 2025, Pages 3588 - 3614, October 2025.

Also available as: DOI, PDF (arxiv).

See also: Experiment (DOI).

Garbage Collection for Rust: The Finalizer Frontier

Jacob Hughes and Laurence Tratt

October, 2025

Abstract Rust is a non-Garbage Collected (GCed) language, but the lack of GC makes expressing data-structures that require shared ownership awkward, inefficient, or both. In this paper we explore a new design for, and implementation of, GC in Rust, called Alloy. Unlike previous approaches to GC in Rust, Alloy allows existing Rust destructors to be automatically used as GC finalizers: this makes Alloy integrate better with existing Rust code than previous solutions but introduces surprising soundness and performance problems. Alloy provides novel solutions for the core problems: finalizer safety analysis rejects unsound destructors from automatically being reused as finalizers; finalizer elision optimises away unnecessary finalizers; and premature finalizer prevention ensures that finalizers are only run when it is provably safe to do so.

1 Introduction

Amongst the ways one can classify programming languages are whether they are Garbage Collected (GCed) or not: GCed languages enable implicit memory management; non-GCed languages require explicit memory management (e.g C's malloc / free functions). Rust's use of affine types [25, p. 5] and ownership does not fit within this classification: it is not GCed but it has implicit scope-based memory management. Most portions of Rust programs are as succinct as a GCed equivalent, but ownership is too inflexible to express shared ownership for data-structures that require multiple owners (e.g. doubly linked lists). Workarounds such as reference counting impose an extra burden on the programmer, make mistakes more likely, and often come with a performance penalty.

In an attempt to avoid such problems, there are now a number of GCs for Rust (e.g. [2, 11, 14, 32, 33]). Most introduce a user-visible type Gc<T> which takes a value v of type T and moves v to the 'GC heap'. The Gc<T> value itself is a wrapper around a pointer to v on the GC heap. Gc<T> can be cloned (i.e. duplicated) and dereferenced to a value of type &T (i.e. a type-safe pointer) at will by the user. When no Gc<T> wrappers pointing to v can be found, indirectly or directly, from the program's roots (e.g. variables on the stack), then the GC heap memory for v can be reclaimed.

It has proven hard to find a satisfying design and implementation for a GC for Rust, as perhaps suggested by the number of attempts to do so. We identify two fundamental challenges for GC for Rust: how to give Gc<T> an idiomatic and complete API; and how to make finalizers (i.e. the code that is run just before a value is collected by the GC) safe, performant, and ergonomic.

struct GcNode { value: u8, nbr: Option<Gc<RefCell<GcNode>>>}
impl Drop for GcNode { fn drop(&mut self) { println!("drop {}", self.value); } }

fn main() {
  let mut gc1 = Gc::new(RefCell::new(GcNode { value: 1, nbr: None } ));
  gc1.borrow_mut().nbr = Some(gc1);
  let gc2 = Gc::new(RefCell::new(GcNode { value: 2, nbr: None } ));
  gc2.borrow_mut().nbr = Some(gc2);
  gc1 = gc2;
  force_gc();
  println!("{} {}", gc1.borrow().value, gc1.borrow().nbr.unwrap().borrow().value);
}

Listing 1. An Alloy example, showing Gc<T> and destructors as finalizers. We create a type GcNode which models a graph: it stores an 8 bit integer value and a possibly-null reference (via Rust's standard Option type) to a neighbouring node (line 1). We add a normal Rust destructor which Alloy is able to use as a finalizer when GcNode is used inside Gc<T> (line 2). Inside main we create the first GCed node in the graph (line 5). We use Rust's normal RefCell type to allow the node to be mutated (using the RefCell::borrow_mut method which dynamically checks for mutation that would undermine Rust's static rules) and a cycle created directly back to itself (line 6). We then create a second cyclic graph (lines 7 and 8), immediately assigning it to the gc1 variable (line 9): this copies, rather than moves, the Gc<T>. This causes the first cyclic graph GcNode{value: 1, ..} to no longer be reachable, so after forcing a collection (line 10) that node can be collected. Its finalizer is then scheduled to be run, causing drop 1 to be printed out at a later point; when it has completed the GC heap memory can be reclaimed. The print statement outputs 2 2 (line 11).

In this paper we introduce Alloy, a new GC for Rust: an example of its use is shown in Listing 1. Alloy uses conservative garbage collection (i.e. treating each reachable machine word as a potential pointer), which naturally solves the API challenge. However, the finalization challenge is much more involved: the causes of this challenge, and our solutions to it, occupy the bulk of this paper.

Normal Rust code uses destructors (i.e. code which is run just before a value is reclaimed by Rust's implicit memory management) extensively. Although finalizers and destructors may seem to be synonyms, existing GCs for Rust cannot reuse destructors as finalizers: the latter must be manually implemented for each type that needs it. Unfortunately, even this is trickier than it appears: it is not possible to implement a finalizer for Gc<T> if T is an external library; some parts of destructors are automatically created by the Rust compiler, but hand-written finalizers must duplicate those parts manually; and users can accidentally cause a type's finalizer to be run more than once. In short, finalization in existing GCs for Rust is unpalatable.

GCs for Rust are not alone in requiring manually written finalizers. In a close cousin to our work, a GC proposal for C++, the reuse of destructors as finalizers was ruled out due to seemingly insoluble problems [8, p. 32], which we divide into four categories: (1) some safe destructors are not safe finalizers; (2) finalizers can be run prematurely; (3) running finalizers on the same thread as a paused mutator can cause race conditions and deadlocks; (4) and finalizers are prohibitively slower than destructors. All are, at least to some degree, classical GC problems; all are exacerbated in some way by Rust; and none, with the partial exception of #2, has existing solutions.

We show that it is possible to reuse most Rust destructors as finalizers in a satisfying way. We introduce novel solutions to the long-standing problems this implies by making use of some of Rust's unusual static guarantees. We thus gain a better GC for Rust and solutions to open GC problems. Our solutions, in order, are: (1) finalizer safety analysis extends Rust's static analyses to reject programs whose destructors are not provably safe to be used as finalizers; (2) premature finalizer prevention automatically inserts fences to prevent the GC from being 'tricked' into collecting values before they are dead; (3) we run finalizers on a separate thread; and (4) and finalizer elision statically optimises away finalizers if the underlying destructor duplicates the GC's work.

Alloy as an implementation is necessarily tied to Rust, though most of the novel techniques in this paper rely on general properties of affine types and ownership. While we do not wish to claim generality without evidence, it seems likely that many of the techniques in this paper will generalise to other ownership-based languages, as and when such emerge.

Although Alloy is not production ready, its performance is already reasonable: when we control for the (admittedly somewhat slow) conservative GC (BDWGC) Alloy currently uses, the performance of Alloy varies from 0.74× to, in the worst case, 1.17× that of reference counting. Alloy is also sufficiently polished (e.g. good quality error messages) in other ways for it to: show a plausible path forwards for those who may wish to follow it; and to allow others to evaluate whether GC for Rust is a good idea or not.

This paper is divided into four main parts: GC and Rust background (Section 2); Alloy's basic design (Section 3); destructor and finalizer challenges and solutions (Sections 4 to 7); and evaluation (Section 8). The first three parts have the challenge that our work straddles two areas that can seem mutually exclusive: GC and Rust. We have tried to provide sufficient material for readers expert in one of these areas to gain adequate familiarity with the other, without boring either, but we encourage readers to skip material they are already comfortable with.

2 Background

2.1 The Challenges of Shared Ownership in Rust

Rust uses affine types and ownership to statically guarantee that: a value has a single owner (e.g. a variable); an owner can move (i.e. permanently transfer the ownership of) a value to another owner; and when a value's owner goes out of scope, the value's destructor is run and its backing memory reclaimed. An owner can pass references to a value to other code, subject to the following static restrictions: there can be multiple immutable references ('&') to a value or a single mutable reference ('&mut'); and references cannot outlast the owner. These rules allow many Rust programs to be as succinct as their equivalents in GCed languages. This suggests that the search for a good GC for Rust may be intellectually stimulating but of little practical value.

However, there are many programs which need to express data structures which do not fit into the restrictions of affine types and ownership. These are often described as 'cyclic data-structures', but in this paper we use the more abstract term 'shared ownership', which includes, but is not limited to, cyclic data-structures.

A common way of expressing shared ownership is to use the reference counting type Rc<T> from Rust's standard library. For many data-structures, this is a reasonable solution, but some forms of shared ownership require juggling strong and weak counts. This complicates programs (see Listing 2) and can cause problems when values live for shorter or longer than intended.

struct RcNode { value: u8, nbr: Option<Weak<RefCell<RcNode>>> }
impl Drop for RcNode { fn drop(&mut self) { println!("drop {}", self.value); } }

fn main() {
  let mut rc1 = Rc::new(RefCell::new(RcNode{value: 1, nbr: None}));
  rc1.borrow_mut().nbr = Some(Rc::downgrade(&rc1));
  let rc2 = Rc::new(RefCell::new(RcNode{value: 2, nbr: None}));
  rc2.borrow_mut().nbr = Some(Rc::downgrade(&rc2));
  rc1 = Rc::clone(&rc2);
  println!("{} {}", rc1.borrow().value,
                   rc1.borrow().nbr.as_ref().unwrap().upgrade().unwrap().borrow().value);
}

Listing 2. A version of Listing 1 using Rust's standard reference counting type Rc<T>. To avoid memory leaks we use weak references between nodes (line 1). We again create two cyclic graphs (lines 5–8) using Rc::downgrade to create weak references (lines 6 and 8). Since Rc<T> is not copyable, we must use a manual clone call to have both the rc1 and rc2 variables point to the same cyclic graph (line 9). Accessing a neighbour node becomes a delicate dance requiring upgrading the weak reference (line 11). The need to downgrade Rc<T> to Weak<T> and upgrade (which may fail, hence the unwrap) back to Rc<T> creates significant extra complexity relative to Listing 1: compare line 11 in Listing 1 to lines 10-11 above.

A different solution is to store values in a vector and use integer indices into that vector. Such indices are morally closer to machine pointers than normal Rust references: the indices can become stale, dangle, or may never have been valid in the first place. The programmer must also manually deal with issues such as detecting unused values, compaction, and so on. In other words, the programmer ends up writing a partial GC themselves. A variant of this idea are arenas, which gradually accumulate multiple values but free all of them in one go: values can no longer be reclaimed too early, though arenas tend to unnecessarily increase the lifetime of values.

A type-based approach is GhostCell [35], which uses branding to statically ensure that at any given point only one part of a program can access a shared ownership data-structure. This necessarily excludes common use cases where multiple owners (e.g. in different threads) need to simultaneously access disjoint parts of a data-structure.

Although it is easily overlooked, some workarounds (e.g. Rc<T>) rely on using unsafe Rust (i.e. parts of the language, often involving pointers, that are not fully statically checked by the compiler). Pragmatically, we assume that widely used code, even if technically unsafe, has been pored over sufficiently that it is trustworthy in practise. However, 'new' solutions that a programmer implements using unsafe Rust are unlikely to immediately reach the same level of trustworthiness.

While we do not believe that every Rust program would be improved by GC, the variety of workarounds already present in Rust code, and the difficultly of creating new ones, suggests that there is a subset that would benefit from GC.

2.2 GC Terminology

GC is a venerable field and has accumulated terminology that can seem unfamiliar or unintuitive. We mostly use the same terminology as Jones et al [19], the major parts of which we define here.

A program which uses GC is split between the mutator (the user's program) and the collector (the GC itself). At any given point in time, a thread is either running as a mutator or a collector. In our context, all threads run as a collector at least sometimes (for reasons that will become apparent later, some threads always run as a collector). Tracing and reclamation is performed during a collection phase. Our collections always stop-the-world, where all threads running mutator code are paused while collection occurs.

A tracing GC is one that scans memory looking for reachable values from a program's roots: values, including cycles of values, that are not reachable from the roots can then be reclaimed. In contrast, a pure reference counting GC does not scan memory, and thus cannot free values that form a cycle. Increasingly, GC implementations make use of multiple techniques (see [3]) but, for simplicity's sake, we assume that implementations wholly use one technique or another except otherwise stated. For brevity, we use 'GC' as a short-hand for 'tracing GC'; when we deal with other kinds of GC (e.g. reference counting), we explicitly name them.

We refer to memory which is allocated via Gc<T> as being on the GC heap. We use the term 'GC value' to refer both to the pointer wrapped in a Gc<T> and the underlying value on the GC heap, even though multiple pointers / wrappers can refer to a single value on the GC heap, unless doing so would lead to ambiguity.

We use 'Alloy' to refer to the combination of: our extension to the Rust language; our modifications to the rustc compiler; and our integration of the Boehm-Demers-Weiser GC (BDWGC) into the runtime of programs compiled with our modified rustc.

3 Alloy: Design and Implementation

In this section we outline Alloy's basic design and implementation choices – the rest of the paper then goes into detail on the more advanced aspects.

3.1 Basic Design

Alloy provides a Gc<T> type that exposes an API modelled on the Rc<T> type from Rust's standard library, because Rc<T>: is conceptually similar to Gc<T>; widely used in Rust code, and its API familiar; and that API reflects long-term experience about what Rust programmers need.

When a user calls Gc::new(v), the value v is moved to the GC heap: the Gc<T> value returned to the user is a simple wrapper around a pointer to v's new address. The same underlying GCed value may thus have multiple, partly or wholly overlapping, references active at any point. To avoid undermining Rust's ownership system, dereferencing a Gc<T> produces an immutable (i.e. '&') reference to the underlying value. If the user wishes to mutate the underlying value, they must use other Rust types that enable interior mutability (e.g. RefCell<T> or Mutex<T>).

One feature that Alloy explicitly supports is the ability in Rust to cast references to raw pointers and back again. This can occur in two main ways. Gc<T> can be dereferenced to &T which can then, as with any other reference, be converted to *const T (i.e. a C-esque pointer to T). Gc<T> also supports the common Rust functions (into_raw and from_raw) which wrap the value-to-pointer conversion in a slightly higher-level API. The ability to convert references to raw pointers is used in many places (e.g. Rust's standard C Foreign Function Interface (FFI)). We believe that a viable GC for Rust must allow the same conversions, but doing so has a profound impact because Rust allows raw pointers to be converted to the integer type usize and back¹.

Having acknowledged that pointers can be 'disguised' as integers, it is then inevitable that Alloy must be a conservative GC: if a machine word's integer value, when considered as a pointer, falls within a GCed block of memory, then that block itself is considered reachable (and is transitively scanned). Since a conservative GC cannot know if a word is really a pointer, or is a random sequence of bits that happens to be the same as a valid pointer, this over-approximates the live set (i.e. the blocks that the GC will not reclaim). Typically the false detection rate is very low (see e.g. a Java study which measures it at under 0.01% of the live set [28]).

Conservative GC occupies a grey zone in programming language semantics: in most languages, and most compiler's internal semantics, conservative GC is, formally speaking, unsound; and furthermore some languages (including Rust) allow arbitrary 'bit fiddling' on pointers, temporarily obscuring the address they are referring to. Despite this, conservative GC is widely used, including in the two most widespread web browsers: Chrome uses it in its Blink rendering engine [1] and Safari uses it in its JavaScript VM JavaScriptCore [26]. Even in 2025, we lack good alternatives to conservative GC: there is no cross-platform API for precise GC; and while some compilers such as LLVM provide some support for GC features [23], we have found them incomplete and buggy. Despite the potential soundness worries, conservative GC thus remains a widely used technique.

Conservative GC enables Alloy to make a useful ergonomic improvement over most other GCs for Rust whose Gc<T> is only cloneable. Such types can be duplicated, but doing so requires executing arbitrary user code. To make the possible run-time cost of this clear, Rust has no direct syntax for cloning: users must explicitly call Rc::clone(&v) to duplicate a value v. In contrast, since Alloy's Gc<T> is just a wrapper around a pointer it is not just cloneable but also copyable: duplication only requires copying bytes (i.e. no arbitrary user code need be executed). Copying is implied by assignment (i.e. w = v), reducing the need for explicit cloning². This is not just a syntactic convenience but also reflects an underlying semantic difference: duplicating a Gc<T> in Alloy is is a cheaper and simpler operation than most other GCs for Rust which which tend to rely, at least in part, on reference counting.

There is one notable limitation of Gc<T>'s API relative to Rc<T>. The latter, by definition, knows how many references there are to the underlying data, allowing the value stored inside it to be mutably borrowed at run-time if there is only a single reference to it (via get_mut and make_mut). In contrast, Gc<T> cannot know how many references there are to the underlying data. As we shall see in Section 8, some Rust programs are built around the performance advantages of this API (e.g. turning 'copy on write' into just 'write' in some important cases).

3.2 Basic Implementation

The most visible aspect of Alloy is its fork, and extension of, the standard Rust compiler rustc. We forked rustc 1.79.0, adding or changing approximately 5,500 Lines of Code (LoC) in the core compiler, and adding approximately 2,250 LoC of tests.

Alloy uses BDWGC [9] as the underlying conservative GC, because it is the most widely ported conservative GC we know of. We use BDWGC's GC_set_finalize_on_demand(1) API, which causes finalizers to be run on their own thread.

We had to make some minor changes to BDWGC to suit our situation. First, we disabled BDWGC's parallel collector because it worsens Alloy's performance. It is unclear to us why this happens: we observe significant lock contention within BDWGC during GC collections, but have not correlated this with a cause. Second, BDWGC cannot scan pointers stored in thread locals because these are platform dependent. Fortunately, rustc uses LLVM's thread local storage implementation, which stores such pointers in the PT_TLS segment of the ELF binary: we modified BDWGC to scan this ELF segment during each collection. Third, BDWGC dynamically intercepts thread creation calls so that it can can scan their stacks, but (for bootstrapping reasons) is unable to do so in our context: we explicitly changed Alloy to register new threads with BDWGC.

4 Destructors and Finalizers

In many GCed languages, 'destructor' and 'finalizer' are used as synonyms, as both terms refer to code run when a value's lifetime has ended. In existing GCs for Rust, these two terms refer to completely different hierarchies of code (i.e. destructors and finalizers are fundamentally different). In Alloy, in contrast, a reasonable first approximation is that finalizers are a strict subset of destructors. In this section we pick apart these differences, before describing the challenges of using destructors as finalizers.

When a value in Rust is dropped (i.e. at the point its owner goes out of lexical scope) its destructor is automatically run. Rust's destructors enable a style of programming that originated in C++ called RAII (Resource Acquisition Is Initialization) [30, Section 14.4]: when a value is dropped, the underlying resources it possesses (e.g. file handles or heap memory) are released. Destructors are used frequently in Rust code (to give a rough idea: approximately 15% of source-level types in our benchmark suite have destructors).

Rust destructors are formed of two parts, run in the following order: a user-defined drop method; and automatically inserted drop glue. Drop methods are optional and users can provide one for a type by implementing the Drop trait's drop method. Drop glue recursively calls destructors of contained types (e.g. fields in a struct). Although it is common usage to conflate 'destructor' in Rust with drop methods, drop glue is an integral part of a Rust destructor: we therefore use 'destructor' as the umbrella term for both drop methods and drop glue.

When considering finalizers for a GC for Rust, there are several layers of design choices. We will shortly see that finalizers cause a number of challenges (Section 4.1) and one choice would be to forbid finalizers entirely. However, this would mean that one could not sensibly embed types that have destructors in a Gc<T>. While Rust does not always call destructors, the situations where this occurs are best considered 'exceptional': not calling destructors from Gc<T> would completely undermine reasonable programmer expectations. Because of this, Alloy, and indeed virtually all GCs for Rust, support finalizers in some form.

However, existing GCs force distinct notions of destructors and finalizers onto the programmer. Where the former have the Drop trait, the latter typically have a Finalize trait. If a user type needs to be finalized then the user must provide an implementation of the Finalize trait. However, doing so introduces a number of problems: (1) external libraries are unlikely to provide finalizers, so they must be manually implemented by each consumer; (2) Rust's orphan rule [27, Section 6.12] prevents one implementing traits for types defined in external libraries (i.e. unless a library's types were designed to support Gc<T>, those types cannot be directly GCed); (3) one cannot automatically replicate drop glue for finalizers; and (4) one cannot replicate rustc's refusal to allow calls to the equivalent of Drop::drop.

Programmers can work around problems #1 and #2 in various ways. For example, they can wrap external library types in newtypes (zero-cost wrappers) and implement finalizers on those instead [20, Section 19.3]. Doing so is tedious but not conceptually difficult.

Problem #3 has partial solutions: for example, [14] uses the Trace macro to generate finalizer glue (the finalizer equivalent of drop glue) for struct fields. This runs into an unsolvable variant of problem #2: types in external libraries will not implement this trait and cannot be recursively scanned for finalizer glue.

Problem #4 is impossible to solve in Rust as-is. One cannot define a function that can never be called — what use would such a function have? A possible partial solution might seem to be for the finalize method take ownership of the value, but Drop::drop does not do so because that would not allow drop glue to be run afterwards.

4.1 General Challenges When Using Destructors as Finalizers

We have stated as our aim that Alloy should use destructors as finalizers. Above we explained some Rust-specific challenges — but there are several non-Rust-specific challenges too! Fundamentally, finalizers and destructors have different, and sometimes incompatible, properties. The best guide to these differences, and the resulting problems, is Boehm [6], supplemented by later work on support for GC in the C++ specification [8]³.

An obvious difference between destructors and finalizers is when both are run. While C++ and Rust define precisely when a destructor will be run⁴, finalizers run at an unspecified point in time. This typically happens at some point after the equivalent destructor would run, though a program may exit before any given finalizer is run⁵. There are, however, two situations which invert this. First, if a thread exits due to an error, and the program is either not compiled with unwinding, or the thread has crossed a non-unwinding ABI boundary, then destructors might not be run at all, where a GC will naturally run the equivalent finalizers: we do not dwell on this, as both behaviours are reasonable in their different contexts. Second, and more surprisingly, it is possible for finalizers in non-error situations to run prematurely, that is before the equivalent destructor [6, section 3.4].

A less obvious difference relates to where destructors and finalizers are run. Destructors run in the same thread as the last owner of a value. However, running finalizers in the same thread as the last owner of the value can lead to race conditions [24] and deadlocks [6, section 3.3] if a finalizer tries to access a resource that the mutator expects to have exclusive access too. When such problems affect destructors in normal Rust code, it is the clear result of programmer error, since they should have taken into account the predictable execution point of destructors. However, since finalizers do not have a predictable execution point, there is no way to safely access shared resources if they are run on the same thread. The only way to avoid this is to run finalizers on a non-mutator thread — but not all Rust types / destructors are safe to run on another thread.

There are several additional differences such as: finalizers can reference other GCed values that are partly, or wholly, 'finalized' and may have had their backing memory reused; and finalizers can resurrect values by copying the reference passed to the finalizer and storing it somewhere.

Over time, finalizers have thus come to be viewed with increasing suspicion. Java, for example, has deprecated, and intends eventually removing, per-type finalizers: instead it has introduced deliberately less flexible per-object 'cleaners', whose API prevents problems such as object resurrection and per-class finalization [13].

4.2 The Challenge of Finalizers for Alloy

At this point we hope to have convinced the reader that: a viable GC for Rust needs to be able to use existing destructors as finalizers whenever possible; but that finalizers, even in existing GCs, cause various problems.

It is our belief that some problems with finalizers are fundamental. For example, finalizers inevitably introduce latency between the last use of a value and its finalization.

Some problems with finalizers are best considered the accidental artefacts of older designs. Java's cleaners, for example, can be thought of as a more restrictive version of finalizers that allow most common use-cases but forbid by design many dangerous use cases. For example, per-class/struct finalization can easily be replaced by per-object/value finalization; and object resurrection can be prevented if object access requires a level of indirection. Alloy benefits from our better shared understanding of such problems and the potential solutions: it trivially addresses per-object/value finalization (Gc::new_unfinalizable function turns finalization off for specific values) and, as we shall see later, via only slightly more involved means, object resurrection.

However, that leaves many problems that are potentially in the middle: they are not obviously fundamental, but there are not obvious fixes for them either. In our context, where we wish to use destructors as finalizers, four problems have hitherto been thought insoluble [8, p. 32]: (1) finalizers are prohibitively slower than destructors; (2) finalizers can run prematurely; (3) running finalizers on the same thread as a paused mutator can cause race conditions and deadlocks; (4) some safe destructors are not safe finalizers.

Fortunately for us, Rust's unusual static guarantees, suitably expanded by Alloy, allow us to address each problem in novel, satisfying, ways. In the following section, we tackle these problems in the order above, noting that we tackle problems #1 and #2 separately, and #3 and #4 together.

5 Finalizer Elision

As we shall see in Section 8, there is a correlation between the number of finalizers that are run and overhead from GC (with a worst case, albeit a definite outlier, in our experiment of 3.35× slowdown). In this section we show how to reduce the number of finalizers that are run, which helps reduce this overhead.

A variety of factors contribute to the finalizer performance overhead, including: a queue of finalizers must be maintained, whereas destructors can be run immediately; finalizers run some time after the last access of a value, making cache misses more likely; and finalizers can cause values (including values they own) to live for longer (e.g. leading to increased memory usage and marking overhead). Most of these factors are inherent to any GC and our experience of using and working on BDWGC– a mature, widely used GC – does not suggest that it is missing optimisations which would overcome all of this overhead.

Instead, whenever possible, Alloy elides finalizers so that they do not need to be run at all. We are able to do this because: (1) BDWGC is responsible for all allocations and will, if necessary GC allocations even if they are not directly wrapped in a Gc<T>; and (2) many Rust destructors only free memory which BDWGC would, albeit with some latency, do anyway.

Consider the standard Rust type Box<T> which heap allocates space for a value; when a Box<T> value is dropped, the heap allocation will be freed. We can then make two observations. First, Box<T>'s drop method solely consists of a deallocate call. Second, while we informally say that Box<T> allocates on the 'heap' and Gc<T> allocates on the 'GC heap', all allocations in Alloy are made through BDWGC and stored in the same heap.

When used as a finalizer, Box<T>'s drop method is thus unneeded, as the underlying memory will be freed by BDWGC anyway. This means that there is no need to run a finalizer for a type such as Gc<Box<u8>> at all, and the finalizer can be statically elided. However, we cannot elide a finalizer for a type such as Gc<Box<Rc<u8>>> because Rc<T>'s drop method must be run for the reference count to be decremented. As this shows, we must consider the complete destructor, and not just the top-level drop method, when deciding whether a corresponding finalizer can be elided.

function NeedsFinalizer(T):
    if Impls(T, Drop) and not Impls(T, DropMethodFinalizerElidable) then
        return true;
    for each field ∈ T do
        if NeedsFinalizer(field) then
            return true;
    return false;

Algorithm 1. Finalizer Elision

impl<T> Gc<T> {
  pub fn new(value: T) -> Self {
    if needs_finalizer::<T>() { Gc<T>::new_with_finalizer(value) }
    else { Gc<T>::new_without_finalizer(value) }
    ...
  }
}

Listing 3. A simplified view of how finalizers are elided inside Alloy. The new compiler intrinsic needs_finalizer returns true if a finalizer is required for a type. The Gc<T> type uses this intrinsic to ensure that the value is registered as requiring a finalizer. With optimisations turned on, this seemingly dynamic, branching code will be turned into static, branchless code.

5.1 Implementing Finalizer Elision

Finalizer elision statically determines which type's destructors do not require corresponding finalizers and elides them. It does so conservatively, and deals correctly with drop glue.

As shown in Algorithm 1, any type which implements the Drop trait requires finalization unless it also implements the new DropMethodFinalizerElidable marker trait (i.e. a trait without methods). This trait can be used by a programmer to signify that a type's drop method need not be called if the type is placed inside a Gc<T>. The 'Drop' part of the trait name is deliberate (i.e. it is not a simplification of 'destructor') as it allows the programmer to reason about a type locally (i.e. without considering drop glue or concrete type paramaters). If the type has a transitively reachable field whose type implements the Drop trait but not the DropMethodFinalizerElidable trait, then then the top-level type still requires finalization.

Even though neither normal Rust destructors or Alloy finalizers are guaranteed to run, a program whose destructors or finalizers never run would probably not be usable (leaking resources such as memory, deadlocking, and so on). We therefore make DropMethodFinalizerElidable an unsafe trait, because implementing it inappropriately is likely to lead to undesired – though not incorrect! – behaviour at run-time.

Alloy modifies the standard Rust library to implement DropMethodFinalizerElidable on the following types: Box<T>, Vec<T>, RawVec<T>, VecDeque<T>, LinkedList<T>, BTreeMap<K, V>, BTreeSet<T>, HashMap<K, V>, HashSet<T>, RawTable<K, V>⁶, and BinaryHeap<T>. Fortunately, not only are these types' drop methods compatible with DropMethodFinalizerElidable, but they are extensively used in real Rust code: they enable significant numbers of finalizers to be elided.

Listing 3 shows the new const compiler intrinsic needs_finalizer we added to implement Algorithm 1. The intrinsic is evaluated at compile-time: its result can be inlined into Gc::new, allowing the associated conditional to be removed too. In other words – compiler optimisations allowing – the 'does this specific type require a finalizer?' check has no run-time overhead.

6 Premature Finalizer Prevention

Most of us assume that finalizers are always run later than the equivalent destructor would have run, but they can sometimes run before [6, section 3.4], undermining soundness. Such premature finalization is also possible in Alloy as described thus far (see Listing 4). In this section we show how to prevent premature finalization.

There are two aspects to premature finalization. First, language specifications often do not define, or do not precisely define, when the earliest point that a value can be finalized is. While this means that, formally, there is no 'premature' finalization, it seems unlikely that language designers anticipated some of the resulting implementation surprises (see e.g. this example in Java [29]). Second, compiler optimisations – at least in LLVM – are 'GC unaware', so optimisations such as scalar replacement can change the point in a program when GCed values appear to be finalizable.

struct S { value: u8 }
impl Drop for S { fn drop(&mut self) { self.value = 0; } }

fn main() {
  let root = Gc::new(Box::new(S{ value: 1 }));
  let inner: &u8 = &**root.value;
  force_gc();
  println!("{}", *inner);
}

Listing 4. An example of possible premature finalization. We create a new struct S (line 1) with a drop method that sets the wrapped integer to zero (line 2). In the main method, we move an instance of the struct into a Box<T>, which we then move into a Gc<T> (line 4). We obtain a Rust reference to the inner integer (line 5), which at run-time will be a pointer to the Box<T>. At this point, the compiler can determine that the Gc<T> is no longer used and overwrite root's pointer (which may be in a register). If a collection then occurs, a finalizer can run S's drop method, causing the program to print '0' instead of the expected '1' (line 7).

In our context, it is natural to define premature finalization as a (dynamic) finalizer for a Gc<T> value running before the (static) Gc<T> owner has gone out of scope. Similar to the high-level proposal mooted in [7, Solution 1], we must ensure that the dynamic lifetime of a reference derived from a Gc<T> matches or exceeds the lifetime of the Gc<T> itself.

Our solution relies on adjusting Gc<T>'s drop method to keep alive a GCed value for at least the static lifetime of the Gc<T> itself. In other words, we ensure that the conservative GC will always see a pointer to a GCed value while the corresponding Gc<T> is in-scope.

However, there is a major problem to overcome: copyable types such as Gc<T> are forbidden from having destructors. The fundamental challenge we have to solve is that each copied value will have a destructor called on it, which has the potential for any shared underlying value to be destructed more than once. Alloy explicitly allows Gc<T> – but no other copyable type – to have a destructor, but to ensure it doesn't cause surprises in the face of arbitrary numbers of copies, the destructor must be idempotent. Our task is made easier because Gc<T> naturally has no drop glue from Rust's perspective: Gc<T> consists of a field with a pointer type, and such types are opaque from a destruction perspective.

We therefore only need to make sure that Gc<T>'s drop method is idempotent. Fortunately, this is sufficient for our purposes: we want the drop method to inhibit finalization but that does not require run-time side effects. To achieve this, we use a fence. These come in various flavours. What we need is a fence that prevents both: the compiler from reordering computations around a particular syntactic point; and the CPU from reordering computations around a particular address. We copy the platform specific code from the BDWGC GC_reachable_here macro⁷ into Gc<T>'s drop method, which achieves the effect we require.

6.1 Optimising Premature Finalizer Prevention

The drop method we add to Gc<T> fully prevents premature finalization. It also naturally solves a performance problem with the suggested solution for C++ in [7, Solution 1], which requires keeping alive all pointers, no matter their type, for their full scope. By definition, our solution only keeps alive Gc<T> values: the compiler is free to optimise values of other types as it so wishes. However, on an artificial microbenchmark we observed a noticeable overhead from our fence insertion.

We thus implemented a simple optimisation: we only insert fences for a Gc<T> if it has a finalizer. Intuitively, it seems that we should not generate drop methods in such cases, but this is difficult to do directly inside rustc. Instead, we suppress calls to the drop methods of such types: the two approaches are functionally equivalent, though ours does put an extra burden on dead-code elimination in the compiler tool-chain.

Alloy adds a new pass RemoveElidableDrops to rustc's Mid-Level Intermediate Representation (MIR) processing. MIR is best thought of as the main IR inside rustc: it contains the complete set of functions in the program, where each function consists of a sequence of basic blocks. Simplifying somewhat, function and drop method calls are represented as different kinds of terminators on basic blocks. Terminators reference both a callee and a successor basic block.

The remove_elidable_drops pass iterates over a program's MIR, identifies drop method terminators which reference elidable finalizers, and turns them into 'goto' terminators to the successor basic basic block. Algorithm 4 in the Appendix presents a more formal version of this algorithm.

7 Finalizer Safety Analysis

In this section we address two high-level problems: running finalizers on the same thread as a paused mutator can cause race conditions and deadlocks; and some safe destructors are not safe finalizers. Addressing the former problem is conceptually simple – finalizers must be run on a separate thread – but we must ensure that doing so is sound. We therefore consider this a specific instance of the latter problem, treating both equally in this section.

We therefore introduce Finalizer Safety Analysis (FSA), which prevents unsafe (in the sense of 'not safe Rust') destructors being used as finalizers. As a first approximation, FSA guarantees that finalizers are memory safe, cycle safe (i.e. do not access already finalized objects), and thread safe. We present the three main components of FSA individually before bringing them together.

7.1 Rust References

Gc<T> can store, directly or indirectly, normal Rust references (i.e. & and &mut), at which point it is subject to Rust's normal borrow checker rules and cannot outlive the reference. However, finalizers implicitly extend the lifetime of a GCed value, including any stored references: accessing a reference in a finalizer could undermine Rust's borrow checking rules.

A simple way of avoiding this problem is to forbid Gc<T> from storing, directly or indirectly, references. This might seem to be no great loss: storing references in a Gc<T> largely nullifies the 'GCness' of Gc<T>. However, we found the result hard to use, as it can make simple tasks such as gradually migrating existing code to use Gc<T> painful.

A moderate, but in our experience insufficient, relaxation is to recognise that only types that need a finalizer can possibly have problems with references, and to forbid such types from storing references in Gc<T>. For example, if there is no drop method for struct S{x: &u8} then its destructor is safe to use as a finalizer, since its non-drop aspects will not use the &u8 reference.

The eventual rule we alighted upon for FSA is that a destructor for a type T can be used as a finalizer provided the destructor's drop methods do not obtain references derived from T's fields (including fields reachable from its attributes). Using Rust's terminology, we forbid projections (which include a struct's fields, indexes into a vector, and so on) in destructors from generating references. Any non-projection references that are used in a destructor are by definition safe to use, as they either exist only for the duration of the drop method (references to variables on the stack) or will exist for the remainder of the program (references to global variables).

This rule over-approximates the safe set of destructors. For example, a drop method that creates a new value and tries to obtain a reference to a field in it (i.e. a projection) cannot be a destructor under FSA, even though the reference cannot outlast the drop method. We found that attempting to relax our rule further to deal with such cases rapidly complicates exposition and implementation.

struct GcNode { value: u8, nbr: Option>> }
impl Drop for GcNode {
  fn drop(&mut self) { self.value = 0; println!("{}", self.nbr.unwrap().borrow().value); }
}

fn main() {
  let mut gc1 = Gc::new(RefCell::new(GcNode{value: 1, nbr: None}));
  let gc2 = Gc::new(RefCell::new(GcNode{value: 2, nbr: None}));
  gc1.borrow_mut().nbr = Some(gc2);
  gc2.borrow_mut().nbr = Some(gc1);
}

Listing 5. An example of the problems that come from mixing cycles and finalization. The salient difference from Listing 1 is that the drop method prints the value of a field inside nbr (line 3). Running this program on a strong memory model machine is likely to print either 2 0 or 1 0 depending on whether gc1 or gc2 is finalized first. The 'seemingly expected' output of 1 2 or 2 1 would never be printed: whichever GCed value is finalized first changes what the other GCed value sees in its finalizer. As that implies, this example is unsound: whichever finalizer runs second leads to undefined behaviour.

7.2 Cycles and Finalization

One of the main motivations for GCs is that they solve problems with cyclic data structures. However, finalizers can be unsound if they access state shared within members of a cycle. Listing 5 shows an example of undefined behaviour when two GCed values create a cycle and both their finalizers reference the other GCed value. Whichever order the finalizers are run in, at least one of the finalizers will see the other GCed value as partly or wholly 'finalized'.

Most languages and systems we are aware of assume that users either don't run into this problem (finalization cycles are considered rare in GCed languages [19, p. 229]) or know how to deal with it when they do (e.g. refactoring the types into parts that do and don't require finalization [6, p. 11]). There is no fully automatic solution to this problem. Some GCs offer weak references, which allow users to detect when finalization cycles have been broken, though they still have to deal with the consequences manually.

We wanted to provide users with static guarantees that their destructors will not behave unexpectedly when used as finalizers in a cycle. A first attempt at enforcing such a property might seem to be that a Gc<T> cannot have, directly or indirectly, fields of type Gc<T>. This would indeed prevent the mistakes we want to catch but also disallow shared ownership! We therefore check only that a type's destructor does not, directly or indirectly, access a Gc<T>. This allows GCed types to express shared ownership so long as their destructor(s) do not access other GC types.

To make this check easier to implement, we introduce an auto trait [27, Section. 11], a kind of marker trait that the compiler propagates automatically. An auto trait A will be automatically implemented for a type T unless one of the following is true: there is an explicit negative implementation of A for T; or T contains a field that is not itself A. Informally, we say that a negative implementation of an auto-trait pollutes containing types.

Our new auto trait is called FinalizerSafe, and we provide a single negative implementation impl<T> !FinalizerSafe for Gc<T>. This naturally handles transitively reachable code, allowing FSA itself to only check that a destructor's direct field accesses are FinalizerSafe.

7.3 Destructors Need to be Runnable on a Finalizer Thread

impl Drop for GcNode {
  fn drop(&mut self) { println!("drop {}", self.value.lock().unwrap()); }
}

fn main() {
  let counter = Rc::new(Mutex::new(0));
  { let _ = Gc::new(GcNode { value: Rc::clone(&counter), nbr: None }); }
  let r1 = counter.lock().unwrap();
  force_gc();
  assert_eq!(*r1, 0);
}

Listing 6. How destructors can cause deadlocks when used as finalizers. The mutator creates a reference-counted mutex (line 6), placing a copy in a GcNode that immediately goes out of scope (line 7). The mutator then acquires the lock (line 8) but before it can release the lock a GC cycle occurs and the GcNode's finalizer run (line 9). If the finalizer is run on the same thread as the mutator, then it will fail to grab the lock (line 2) and cause a deadlock.

Running finalizers on the same thread as a mutator can cause problems when the finalizer accesses state shared with the mutator (see Section 4.1 for a general description and Listing 6 for a concrete example). The most general solution to this problem is to run finalizers on a separate finalizer thread that never runs mutator code.

We must therefore ensure that it is safe to run a type's destructor on the finalizer thread. A conservative definition is that Gc<T> is safe to use if T implements both of Rust's existing Send (denoting a type that can be permanently moved from one thread to another) and Sync (denoting a type that can be safely accessed simultaneously by multiple threads) auto traits. However, requiring that finalization be restricted to types that implement both Send and Sync can be frustrating, particularly because more types implement Send than Sync.

It may seem sufficient for T to implement Send alone so that the value can be safely sent to the finalizer thread. However, this would not prevent a finalizer indirectly accessing state shared with a non-GCed value via a mechanism such as Arc, causing the very problems we are trying to avoid.

FSA thus ignores whether a type implements Send or Sync (or not) and instead examines the destructor directly. To pass FSA: the destructor must not access thread locals; and any types the destructor accesses via projections must implement both Send and Sync. Intuitively, this allows a non-Send-or-Sync type T to have a safe finalizer provided that T's destructor only access the Send and Sync 'subset' of T.

This rule shows clearly that FSA is a form of abstract interpretation rather than a mere extension of the type system⁸. After careful examination we believe this is compatible with Rust's semantics (and rustc and LLVM's implementations) at the time of writing, but it is worth knowing that this rule would be unsafe in other languages and implementations (for example our assumption would be unsafe in Java due to synchronisation removal [31]). We leave it as an open question to others as to whether Rust should deliberately permit or forbid such checks in its semantics.

The implementation of the finalization thread is fairly simple. For example, we do not need to explicitly synchronise memory between the mutator and finalization threads because BDWGC's stop-the-world collection phase already synchronises all memory between threads.

7.4 Putting it All Together

function FinalizerSafetyAnalysis(func):
    for each basic_block ∈ func do
        t ← basic_block.terminator;
        if not IsGcConstructorCall(t) then
            continue;
        ty ← GetTyOfGcValue(t);
        if isFinalizerUnchecked(ty) or not NeedsFinalizer(ty) then
            continue;
        for each drop_method ∈ GetDropGlue(ty) do
            if not IsMIRAvailable(drop_method) then
                EmitFinalizerUnsafeError();
            CheckFunctionSafety(drop_method);

function CheckFunctionSafety(drop):
    for each basic_block ∈ drop do
        for each statement ∈ basic_block do
            for each projection ∈ statement do
                if not IsFinalizerSafe(projection.element) then
                    EmitFinalizerUnsafeError();
        if IsFunctionCall(basic_block.terminator) then
            CheckFunctionSafety(basic_block.terminator);

function IsFinalizerSafe(ty):
    return Impls(ty, Send) and Impls(ty, Sync) and Impls(ty, FinalizerSafe);

Algorithm 2. Finalizer Safety Analysis

FSA integrates the seemingly separate components presented above into one. It iterates over every function in a Rust program analysing destructors of types that are used in Gc<T>. Algorithm 2 shows the essence of FSA (for example eliding details of caching which Alloy uses to speed up compile times).

Because FSA is a form of abstract interpretation, we need to determine when to run FSA on a program. In essence, whenever a previously unchecked type T is used to create a new Gc<T>, FSA is run. As well as the Gc::new constructor, Gc<T> instances can be created with conversion traits such as From. We annotated each such entry point with a new rustc-only attribute rustc_fsa_entry_point: calls to functions with this attribute lead to FSA checks.

A naive implementation of FSA would be a notable cost, so Alloy uses several optimisations. As alluded to above, FSA caches the results of various checks to avoid pointlessly repeating work. We also extend FinalizerSafe with negative implementations for &T, and &mut T. If a type T implements all of FinalizerSafe, Send, and Sync, we know that there can be no unsafe projections used in a destructor, and can bypass most FSA checks entirely (though we still need to check for thread local accesses). Across our benchmark suite, FSA increases compilation time in release mode by a modest 0.8–1.6%.

Algorithm 2 also captures Alloy's approach to error messages. Rather than just inform a user that 'your drop method has not passed FSA', Alloy pinpoints which field or line in a drop method caused FSA to fail: EmitReferenceError informs the user when a reference in a type is used in a way that violates FSA (see Section 7.1); and EmitFinalizerUnsafeError when a drop method contains code which is unsafe (e.g. references a Gc<T> type, an opaque function, etc.). Listing 7 shows an example of the errors reported by Alloy: note that it pinpoints the line within a drop method that caused an FSA error.

error: `RefCell::new(GcNode{value: 2, nbr: None})` cannot be safely finalized.
   --> finalization_cycle.rs:11:21
   |
7  |   fn drop(&mut self) { self.value = 0; println!("{}", self.nbr.unwrap().borrow().value); }
   |                                                       ^^^^^^^^
   |                                                       a finalizer cannot safely dereference this
   |                                                       because it might have already been finalized.
...
11 |   let gc2 = Gc::new(RefCell::new(GcNode{value: 2, nbr: None}));
   |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |       has a drop method which cannot be safely finalized.

Listing 7. The compiler error produced by Alloy for the example in Listing 5. This extends rustc's normal error messages: note that both the use of a type with a FSA-incompatible drop method ('has a drop method') and the line in the drop method ('caused by the expression') are highlighted to the user.

7.4.1 Awkward Kinds of Functions

FSA can encounter two kinds of 'awkward' functions.

First, some functions (e.g. due to use of trait objects, or FFIs) do not have a body available when FSA runs: using such a function necessarily causes an FSA check to fail. One common class of functions which causes this are Rust intrinsics (e.g. min_align_of etc.): we audited the most frequently used of these and annotated those which are FSA-safe with a new rustc_fsa_safe_fn attribute. Other functions whose bodies are unknown cause FSA to fail.

Second, in most cases, FSA runs on Rust functions whose generic types have been replaced with concrete types (in Rust terminology, functions have been 'monomorphised'). Sometimes, however, FSA encounters functions (e.g. intrinsics or functions with certain annotations) whose generic types have not yet been replaced. FSA can still run on such functions, but will reject them unless all generic types imply the FinalizerSafe, Send, and Sync traits. Note that calling a method on a generically typed value will lead to FSA finding a method without a body: as in the first case above, this will cause FSA to fail.

The common theme to both is that we wish FSA to be sound, at which point we forego completeness. This can cause users frustration when FSA raises an error on code they know is FSA safe. As is common in Rust, we therefore provide an unsafe escape hatch which allows users to silence FSA errors when they can prove to their satisfaction that doing so does undermine correctness. We experimented with a per-type approach, but found that unduly restrictive: we therefore provide a per-value escape hatch with the unsafe FinalizerUnchecked<T> type. Values wrapped in this type are considered safe to use at all points in FSA. Our aim is that users should rarely need to resort to this escape hatch, but, as is not uncommon in Rust, there are valid idioms of use where we found it necessary.

8 Evaluation

	Version	Description	#benchmarks
Binary Trees	Debian CLBG Rust#2	Heap allocation microbenchmark	1
Regex-Redux	Debian CLBG Rust#1	Regular expression matching	1
Alacritty	v0.15.0-dev	Terminal emulator	10
fd	v9.0.0	Unix find replacement	7
grmtools	v0.13.4	Lexer / parser library	4
Ripgrep	v14.1.1	Fast grep replacement	13
som-rs-ast	git #35b780	SOM AST VM	26
som-rs-bc	git #35b780	SOM bytecode VM	26

Table 1. The benchmarks (top) and benchmark suites (bottom) that form our experiment. We altered them to use different memory allocation strategies (Alloy, Rc<T>, etc.). Binary Trees and Regex-Redux are classic stand-alone GC benchmarks; the other 'benchmarks' represent benchmark suites (e.g. Ripgrep contains 13 benchmarks). The middle portion of the table shows a variety of 'normal' Rust programs; the bottom portion of the program shows three implementations of the SOM programming language.

In this section we explain our methodology and our experimental results.

8.1 Methodology

8.1.1 The Benchmark Suite

There is no existing benchmark suite for GCs for Rust. Even if such a suite did exist, it may not have been suitable for our purposes because in experiment E_GCvs we want to compare programs using existing shared ownership approaches. We searched through roughly the 100 most popular Rust libraries on crates.io (the de facto standard Rust package system) looking for suitable candidates. In practise this meant we looked for crates using reference counting. In the interests of brevity, for the rest of this section we use 'Rc<T>' to cover both Rc<T> and its thread-safe cousin Arc<T>.

	`Gc<T>`	`Rc<T>`	Weak<T>
Alacritty	107	9,450	1,970
Binary Trees	2	0	0
fd	7	421	1
grmtools	299	1,825	23
Regex-Redux	108	109	0
Ripgrep	104	249	4
som-rs-ast	206	35	0
som-rs-bc	464	39	0

Table 2. How often relevant types are referenced in source code after our porting. For Rc<T>, we also show how many weak references are left in: this is a proxy both for partial porting, and also how the extent weak references. This is a proxy for the extent of changes that cyclic data-structures impose upon source code.

Table 1 shows the resulting suite: note that, except for Binary Trees and Regex-Redux, the 'benchmarks' are themselves benchmark suites. Collectively, our suite contains – depending on whether you count the SOM implementations' (identical) benchmark suites collectively or separately – 62 or 88 benchmarks. Table 2 shows how often relevant types are used after porting. Table 3 shows the distribution of heap data at run-time. This shows that our suite contains benchmarks with a variety of memory patterns.

Binary Trees is allocation intensive and sufficiently simple that it can be easily and meaningfully ported to additional shared ownership strategies: Rust-GC, a user library for GC for Rust [14]; and Arena<T>, a non-GC memory arena [10]. Alacritty, fd, and Ripgrep are well known Rust programs, all of which have their own benchmark suites. grmtools is a parsing library which uses Rc<T> extensively in error recovery: we benchmarked it using 28KLoC of real Java source code, which we mutated with syntax errors.

SOM is a small, but complete, language in the Smalltalk mould, which has a wide variety of implementations. Our suite includes two of these: som-rs-ast (which represents programs as ASTs); and som-rs-bc (which represents programs as bytecode). Both are existing ports of a Java SOM VM into Rust and use Rc<T>. We use the same SOM core-lib benchmarks for both, derived from git commit #afd5a6.

We were not able to port all parts of all programs. In particular, some programs make extensive use of the make_mut and get_mut functions in Rc<T>, which allow a programmer to mutate their contents if, at run-time, they only have a single owner. There is not, and cannot be, equivalent functionality with a copyable Gc<T> type. In some cases we were able to successfully use alternative mechanisms. In others we judged the usages to either be rare at run-time (i.e. not worth porting), or too difficult to port (i.e. too much of the program is built around the resulting assumptions). In a small number of cases we ended up introducing bugs. Alacritty's UTF-8 support is an example, resulting in deadlocks. Whenever we encountered a bug in our porting, we reverted back to Rc<T> for that portion of the port.

	Allocated (#)			GC owned (%)
	`Rc<T>`	Box<T>	`Gc<T>`
Alacritty	125	8,770	2	2.70
Binary Trees	0	3,222,201	3,222,190	100.00
fd	17,821	306,902	61	1.23
grmtools	2,283	19,859,431	4,038,605	44.19
Regex-Redux	45	3,132	78	15.39
Ripgrep	12,786	521,366	26,069	17.97
som-rs-ast	15	8,586,976	1,533,728	76.95
som-rs-bc	15	2,397,931	1,530,325	99.71

Table 3. Run-time heap distributions. The 'Allocated (#)' columns shows the number of values of each type that are allocated (note that most programs also allocate values of other types, but those are not shown directly here). The 'GC Owned' columns shows the proportion of allocated values that are owned, directly and indirectly, by Gc<T> values. For example, a program consisting of a single Gc<Box<T>> would have a 'GC Owned' value of 100% because the Box<T> is owned by the Gc<T>. As we shall see later, there can be a number of knock-on effects when a Gc<T> owns other such values.

8.1.2 What We Couldn't Include in the Benchmark Suite

We tried porting 10 other programs that are not included in our benchmark suite. To avoid readers wondering if we have 'cherry-picked' our eventual benchmark suite, we briefly report why those other programs have been excluded. All excluded benchmarks are shown in Table 4 in the Appendix.

Several programs (e.g. numbat, mini-moka, and salsa), once ported, turned out to be uninteresting from a GC benchmarking perspective. Irrespective of the number of source locations that reference memory allocation types, the benchmarks we could run from them allocated sufficiently little memory that there are no worthwhile differences between different allocation strategies. Put another way: these programs are in a sense 'the same' from our evaluation perspective.

Two programs (bevy and rust-analyzer) did not run correctly after porting. Both extensively use the make_mut or get_mut functions in Rc<T> and reverting those changes made the benchmarks uninteresting.

We also ported RustPython, but were unable to adjust it to faithfully implement Python-level destructors. In essence, in RustPython's default configuration, its representation of objects is not compatible with FSA. This means that we can not run Python __del__ methods in the finalizer thread. Although technically this is still compatible with Python's semantics, we felt this would be a misleading comparison, as our port of RustPython would be doing less work.

8.1.3 Running the Experiment

Our experiment can be seen as a comparison of Alloy against 'normal' Rust. Fortunately, Alloy is a strict superset of 'normal' Rust: only if users explicitly opt into GC does Alloy really become a 'GC for Rust'. This allows us to use the same compiler, standard library, and so on, removing several potential confounding factor in our results. We compile two binaries: one without logging features compiled and one with. We only use the latter when reporting collector related metrics.

A challenge in our experiment is that different allocation strategies can use different underlying allocators. In particular, Alloy has to use BDWGC, but, for example, Rc<T> can use a modern allocator such as jemalloc. Much has changed in the performance of allocators since BDWGC's 1980s roots: in Rc<T>-only benchmarks, we observe an inherent overhead from BDWGC of 2–26% relative to jemalloc (see Table 6 in the Appendix), which is a significant, and variable, confounding factor. Fortunately, BDWGC can be used as a 'traditional' allocator that allocates and frees on demand (i.e. no conservative GC occurs): in the main experiment, we thus use BDWGC as the allocator for all benchmarks.

We want to understand the memory usage of different allocation strategies over the lifetime of a benchmark. However, there is no single metric which captures 'memory usage', nor even an agreed set of metrics [5]. We use two metrics to capture different facets: (1) heap footprint, the amount of live heap memory recorded by by Heaptrack [34] at every allocation and deallocation; and (2) Resident Set Size (RSS), the total physical memory in RAM used by the process (including memory-mapped files, stack, and code/text segments), sampled at 10Hz. The overhead of recording heap footprint is much greater than RSS, but it provides a more detailed view of memory usage.

Another pair of confounding factors are the initial and maximum sizes of the GC heap: too-small values can lead to frequent resizing and/or 'thrashing'; large values to unrealistically few collections. What 'small' and 'large' are varies by benchmark, and 'careful' (or thoughtless) choices can significantly distort one's view of performance. BDWGC uses an adaptive strategy by default, growing the heap size as it detects that it would benefit from doing so. To give some sense of whether a different strategy and/or heap size would make a difference, we ran our benchmarks with three different fixed heap sizes. Doing so either has little effect or speeds benchmarks up; when it does so, the impact is generally under 10% and is at most 28% (the detailed results are presented in Table 9 in the Appendix). Broadly speaking, this suggests that BDWGC's default heap sizing approach, at least in our context, is not significantly distorting our view of performance.

We ran each benchmark in our suite 30 times. We report wall-clock times as returned by the standard Unix time utility. The SOM benchmarks are run using its conventional rebench tool; we adjusted rebench to use time for consistency with our other benchmarks. We ran all benchmarks on an AMD EPYC 7773X 64-Core 3.5GHz CPU with 128GiB RAM, running Debian 12 ('bookworm'). We turned off turbo boost and hyper-threading, as both can colour results.

8.1.4 Data Presentation

Except where otherwise stated, we report means and 99% confidence intervals for all metrics. We use the arithmetic mean for individual benchmarks and the geometric mean for benchmark suites.

When plotting time-series (i.e. sampled) memory metrics, we face the challenge that different configurations of the same benchmark can execute at different speeds. We thus resample each benchmark's data to 1000 evenly spaced points using linear interpolation. We chose 1000 samples because it is considerably above the visual resolution of our plots. After normalization, we calculate the arithmetic mean of the memory footprint measurement at each grid point (and not the raw underlying data) across all runs of the same benchmark. We record 99% confidence intervals at each point and show the result as shaded regions around the mean.

performance comparison chart — Figure 1. comparing the effects of `gc<t>` and `rc<t>` on wall-clock time; heap footprint (i.e. the size of the live set); and rss. the baseline at 1 is `rc<t>`; values less than 1 show `gc<t>` as better than `rc<t>`; and the blue vertical line shows the geometric mean of ratios. the wall-clock times of `gc<t>` and `rc<t>` are similar; the rss somewhat similar; and the average heap footprint often very different. broadly speaking, `gc<t>` increases the average heap footprint because gc, and especially finalization, causes values to live for longer. benchmarks which allocate relatively little memory (particularly ripgrep as shown in table 3) can exaggerate this effect. perhaps surprisingly, the heap footprint and rss do not correlate. this is partly because the sample rate for rss is rather low (which notably affects fast running benchmarks such as those for fd) and partly because rss necessarily includes headroom, that is memory beyond that needed for the live set (and which may later be returned to the os).

8.2 Results

The main results for E_GCvs can be seen in Fig. 1. Though there is variation, Alloy has an overhead on wall-clock time of 5% on our benchmark suite. The effect on memory is more variable though, unsurprisingly, Alloy typically has a larger average heap footprint (i.e. allocated memory lives for longer). This metric needs to treated with slight caution: benchmarks which allocate relatively small amounts of memory (see Table 3) can make the relative effect of average heap footprint seem much worse than it is in absolute terms.

Binary Trees is sufficiently simple that we also used it to compare against Arena<T> and Rust-GC. The time-series data in Fig. 2 is particularly illuminating (for completeness, Table 5 in the Appendix has the raw timings). Alloy is around 3.5× slower than Arena<T>. The time-series data for the latter shows it going through distinct phases: a (relatively long) allocation phase, a (relatively moderate) 'work' phase, and a (relatively short) deallocation phase. Put another way: these clear phases make Binary Trees a perfect match for an arena. In the other approaches, the 'work' phase occupies a much greater proportion of their execution, because it also incorporates allocator work. Alloy is around 1.3× faster than Rc<T>, but both have similar memory profiles. Alloy is around 3× faster than Rust-GC and has an average heap footprint around 4× smaller, reflecting Alloy's advantage in not being a user-land library that relies in part on Rc<T>. Although we caution against over-interpreting a single benchmark, this does give us at least some idea of the performance ceiling and floor for different approaches.

Time-series comparison chart — Figure 2. A selection of time-series data with various GC approaches, showing normalised time on the x-axis and heap footprint (with 99% confidence intervals shaded) on the y-axis. (i.e. the amount of live memory) over time. Binary Trees shows an example of Alloy having a comparable heap footprint to `Rc<T>`; Rust-GC's heap footprint is around 4× greater. Binary Trees is a perfect fit to `Arena<T>`: it frees memory in one batch at the end, and because it is 3× faster than Alloy, this 'wind down' period is a substantial portion of the overall (quick!) execution. Ripgrep Alternates may seem to be an example of a memory leak in Alloy, but it is really the result of the inevitable delay that GC imposes on noticing that values are lived, which is exacerbated by the presence of finalizers. The frequent plateaus and dips show that memory is being freed, but at a later point than one might initially expect. In contrast, som-rs-bc JSON Small shows a real memory leak due to cyclic objects in `Rc<T>`, where Alloy's heap footprint remains steady.

The time-series data in Fig. 2 helps explain other factors. For example, it shows that som-rs-bc leaks memory on the JSON Small benchmark (we suspect it also leaks in some other benchmarks, though rarely as visibly). This is because Rc<T> keeps alive values in cycles; Alloy does not leak memory on som-rs-bc, as it naturally deals correctly with cycles.

We can see from the time-series data that Ripgrep has a complex heap footprint pattern. This may suggest a memory leak, but in fact it is a consequence of the inevitable delay in freeing memory in a GC. In general, GC notices that memory is unused later than reference counting, but this is exacerbated further by finalizers. Surprisingly, finalizers can lengthen or shorten an allocation's lifetime. GCed values with finalizers tend to have longer lifetimes, because they have to wait in the finalizer queue. However, when a finalizer calls free on indirectly owned values, those are immediately marked as not live, rather than having to wait until the next collection to be discovered as such. This, albeit indirectly, explains the seemingly random peaks and troughs in memory usage one can observe in Ripgrep's time-series data.

Finalizer elision effects chart — Figure 3. The effects of finalizer elision on various metrics. The top-left chart shows the proportion of run-time `Gc<T>` values that: have had their finalizers elided; cannot have their finalizers elided; have no finalizers to elide. This chart is best read in conjunction in Table 3 to (a) get a sense of the quantity of run-time memory involved (b) how much indirectly owned memory the `Gc<T>` values have. The other plots use 'no finalizer elision' as the normalization base (i.e. values below 1 show that finalizer elision improves a metric). Total GC pause time is the cumulative time spent in stop-the-world collections. User time captures the time spent in all threads, including the finalizer thread. Broadly speaking, the more finalizers are elided, and the greater the proportion of the overall heap the memory owned by `Gc<T>`, the better the metrics become.

The results of E_Elision are shown in Fig. 3. In general, there is a fairly clear correlation: the more finalizers are removed, and the greater the proportion of the overall heap the memory owned by Gc<T> is, the better the metrics become. However, there are several caveats. First, when all finalizers are removed, BDWGC does not start a finalizer thread or invoke locking related to it, unduly flattering the time-based metrics. Second, the quantity of finalizers is only a partial proxy for cost: some finalizers free up large graphs of indirectly owned values, which can take some time to run. Third, some benchmarks change the work they do: grmtools speeds up so much that its error recovery algorithm has time to do more work, so while finalizer hugely benefits its GC pause time, its wall-clock time changes much less. Finally, since finalizers can cause indirectly owned allocations to be freed earlier than the GC itself does naturally, removing them can cause indirectly owned values to live for longer: Ripgrep's average heap footprint highlights this issue.

The results for E_PremOpt are shown in Fig. 4. We created three configurations of Alloy. None has no fences, and thus is unsound, but allows us to approximate (allowing for possible vagaries from running unsound code!) the best possible outcome. Naive inserts all possible fences. Optimised inserts only necessary fences. Once confidence intervals are taken into account, there are no statistically significant results for this experiment. Although it is possible that benchmarking 'noise' is hiding a meaningful result, our data suggests that any such differences are likely to be minimal. To make up for this disappointment, the fact that there is no difference between any of these suggests that, on non-artificial benchmarks, premature finalizer prevention is not a noticeable cost.

Premature finalization optimization chart — Figure 4. The effect of premature finalization optimisation, normalised to *None* (i.e. no fences). Grey bars represent the ratio for *naive* (all possible fences) and blue bars *optimised* (obviously unnecessary fences removed). Unfortunately, this attempted optimisation has no statistically significant effects.

Any performance judgements we make are necessarily contingent on our methodology the benchmark suite we chose, including the proportion of benchmarks that we ported, and the way we process and present data. For example, we did not port external libraries to use Gc<T> so many benchmarks use a variety of allocation strategies. Even had we ported everything, we would not be able to say, for example, that finalizer elision will always improve performance by exactly the factor we see in our experiment: there undoubtedly exist reasonable, non-pathological, programs which will see performance changes outside the ranges that our results suggest.

Using BDWGC as the allocator for all benchmarks has the advantage of removing 'pure' allocator performance as a confounding factor, but does mean that some of the performance characteristics of benchmarks will be changed (e.g due to the portion of time we spend in the allocator; or BDWGC's adaptive heap sizing strategy). A generic, modern conservative GC, using the insights of recent non-GC allocators, would almost certainly give different – though we suspect not profoundly different – results. To the best of our knowledge there is currently no production-quality modern, generic conservative, GC we could use instead, though we are aware of at least one attempt to create such an alternative: it will be interesting to rerun our experiments if and when that arrives.

The RSS memory metric we collect is at Linux's whim: if it does not update as frequently as we expect, we will see artificially 'smoothed' data that may miss out peaks and troughs. Similar, our interpolation of time-series data onto a normalised grid can also smooth data. We manually checked a large quantity of data to ensure this was not a significant effect; by running benchmarks 30 times means it is also less more likely that peaks and troughs are caught at least sometimes.

10 Related Work

In this paper we hope to have given sufficient background on GC and the use of destructors and finalizers in general. In this section we mostly survey the major parts of the GC for Rust landscape more widely. Our survey is inevitably incomplete, in part because this is a rapidly evolving field (a number of changes have occurred since the most recent equivalent survey we are aware of [16]). We also cover some relevant non-Rust GC work not mentioned elsewhere.

Early versions of Rust had 'managed pointers' (using the @T syntax) which were intended to represent GC types [16]. The core implementation used reference counting though there were several, sometimes short-lived, cycle detectors [17]. Managed pointer support was removed⁹ around a year before the first stable release of Rust. This was not the end of the story for 'GC as a core part of Rust', with core Rust developers exploring the problem space in more detail [15, 21, 22]. Over time these efforts dwindled, and those interested in GC for Rust largely moved from anticipating rustc support to expecting to have to do everything in user-level libraries.

One of the earliest user-level GC for Rust libraries is Bacon-Rajan-CC [12]. This provides a type Cc<T> which is similar in intention to Alloy's Gc<T>. The mechanism by which objects are collected is rather different: they have a naive reference count, which causes objects outside a cycle to have deterministic destruction; and users can manually invoke a cycle detector, which uses trial deletion in the style of Bacon and Rajan [4]¹⁰ to identify objects in unused cycles. Cycle detection requires users manually implementing a Trace trait which traverses a type's fields. Destructors are used as finalizers: to avoid the problems with Rust references we solved in Section 7.1, Bacon-Rajan-CC imposes a T:'static lifetime bound on the type parameter passed to Cc<T>. Simplifying slightly, this means that any references in such a type must be valid for the remaining lifetime of the program, a severe restriction. Unlike our approach to the access of already-finalized values (Section 7.2), it can only detect such accesses at runtime, leading to a (safe) Rust panic.

Probably the best known GC for Rust is Rust-GC [14] (partly covered in Section 4). Rust-GC's Gc<T> provides a similar API to Alloy, with the notable exception that its Gc<T> is not, and cannot be, copyable, thus always requiring calls to Gc::clone. Although, like Alloy, Rust-GC allows Gc<T> values to be converted into pointers, its lack of conservative GC means that users must ensure that a Gc<T> wrapper is kept alive for the entire lifetime of pointers derived from it. Similarly to Bacon-Rajan-CC, GCed values are reference counted, with occasional tracing sweeps to identify cycles, though Rust-GC performs cycle detection automatically (i.e. it doesn't require manual calls to a function such as collect_cycles). Drop methods are not used as finalizers: if a finalizer is required, a manual implementation of the Finalize trait must be provided; finalizer glue can be largely, though not fully (see Section 4), automatically created by the provided Trace macro. Rust-GC detects accesses to already-finalized values dynamically at run-time, panicking if they occur. Unlike Bacon-Rajan-CC, these accesses are detected by recording what the collector's state is in: if the collector is in a 'sweep' phase, any access of a Gc<T> leads to a panic. We have not yet verified whether cross-thread collection / sweeping can evade this check.

An example of moving beyond reference counting in a GC for Rust is Shifgrethor [2]. It requires Gc values to be created by a Root<'root>: the resulting Gc<'root, T> is then tied to the lifetime of the Root<'root>. This allows roots to be precisely identified, but requires explicitly having access to a Root<'root> whenever a Gc<'root, T> is used. As with Rust-GC, Shifgrethor requires users to manually implement a Finalize trait, though Shifgrethor's is more restrictive: not only can other GCed values not be accessed (implicitly solving the same problem as Section 7.2) but any other type without the same 'root lifetime as the GCed value is forbidden. This means that many seemingly safe finalizers require implementing the unsafe UnsafeFinalize trait. We view Shifgrethor as proof that accurately tracking GC roots in normal Rust without reference counting is possible, though it cannot deal with references being converted into pointers and usizes.

A different means of tackling the root-finding problem is GcArena [32], which uses branding in a similar way to GhostCells (see Section 2). In essence, users provide a special 'root' type which is the only place where roots can be stored. Mutating the heap can only be done in the context of functions that are passed a branded reference to the GCed heap. Once such a function has completed, GcArena is in full control of the GC heap, and knows that only the root type needs to be scanned for roots. This leads to a precise guarantee about GC reference lifetimes. However, if code executes in an arena for too long, the system can find itself starved of resources, with no way of recovering, even if much of the arena is no longer used. GcArena was originally part of the Piccolo VM (which was itself previously called Luster), a Lua VM written in Rust. Such VMs have a frequently executed main loop which is a natural point for a program to relinquish references to the GCed heap, but this is not true of many other GCed programs.

One attempt to improve upon Rust-GC is Bronze [11], though it shows how challenging it can be to meaningfully improve GC for Rust: both of its main advances have subsequently been disabled because they are not just unsound but actively lead to crashes. First, Bronze tried to solve the root-finding problem by using LLVM's gc.root intrinsic at function entries to generate stack-maps (a run-time mechanism for accurately tracking active pointers). This rules out the false positives that are inevitable in conservative GC. However, Bronze could not track nested references: if a Gc<T> was used as a field in a struct, it was not tracked by the GC. Second, Bronze tried to give GC in Rust similar semantics to non-ownership languages such as Java. It did this by allowing shared mutation, undermining Rust's borrow checker.

Chrome's rendering engine Blink uses the conservative GC Oilpan. It has the interesting property that it has two classes of finalizers. 'Full finalizers' are similar to finalizers in Alloy, running on a finalizer thread at an indeterminate future point, but with the difference that they can only reference parts of a GCed value. To mitigate this, 'pre-finalizers' are run by the collector on the same thread as mutator as soon as an object as recognised as unused, and can access all of a GCed value. Pre-finalizers are necessary, but not encouraged, because they implicitly pause the stop-the-world phase of the collector. This reflects the fact that latency is a fundamental concern for a rendering engine: Alloy currently makes no pretences to being low latency.

11 Conclusions

We introduced a novel design for GC in Rust that solves a number of outstanding challenges in GC for Rust, as well as – by taking advantage of Rust's unusual static guarantees – some classical GC finalizer problems. By making integration with existing Rust code easier than previous GCs for Rust, we hope to have shown a pragmatic route for partial or wholesale migration of Rust code that would benefit from GC.

Challenges and future opportunities remain. For example, Alloy is an 'all or nothing' cost: if you want to use Gc<T> in a single location, you must pay the costs of the GC runtime and so on. Alloy's absolute speed is, we believe, limited by BDWGC: it is probable that using a semi-precise GC and/or a faster conservative GC could change our view of the absolute performance speed

In summary, we do not claim that Alloy is the ultimate design for GC in Rust – reasonable people may, for example, disagree on whether the costs of conservative GC are worth the gains – but it does show what can be achieved if one is willing to alter the language's design and rustc.

Data Availability Statement

The accompanying artefact [18] contains: the source code necessary to run this paper's experiment (including generating figures etc.) from scratch; and data from a run of the experiment that we used in this paper.

Acknowledgments

This work was funded by an EPSRC PhD studentship and the Shopify / Royal Academy of Engineering Research Chair in Language Engineering. We thank Steve Klabnik and Andy Wingo for comments.

References

[1] Mads Ager, Erik Corry, Vyacheslav Egorov, Kentaro Hara, Gustav Wibling, and Ian Zerny. 2013. Oilpan: tracing garbage collection for Blink. https://docs.google.com/document/d/1y7_0ni0E_kxvrah-QtnreMlzCDKN3QP4BN1Aw7eSLfY. Accessed on 2024-10-15.
[2] Saoirse Aronson. 2018. shifgrethor. https://github.com/withoutboats/shifgrethor/. Accessed: 2024-10-15.
[3] David F Bacon, Perry Cheng, and VT Rajan. 2004. A unified theory of garbage collection. In OOPSLA. 50–68.
[4] David F Bacon and Vadakkedathu T Rajan. 2001. Concurrent cycle collection in reference counted systems. In ECOOP. 207–235. doi:10.1007/3-540-45337-7_12
[5] Stephen M. Blackburn, Zixian Cai, Rui Chen, Xi Yang, John Zhang, and John Zigman. 2025. Rethinking Java Performance Analysis. In ASPLOS. 940–954. doi:10.1145/3669940.3707217
[6] Hans-J Boehm. 2003. Destructors, finalizers, and synchronization. In POPL. doi:10.1145/604131.604153
[7] Hans-J Boehm and Mike Spertus. 2007. Optimization-robust finalization. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2261.html. Accessed: 2024-08-08.
[8] Hans-J Boehm and Mike Spertus. 2009. Garbage collection in the next C++ standard. In ISMM. 30–38. doi:10.1145/1542431.1542437
[9] Hans-Juergen Boehm and Mark Weiser. 1988. Garbage collection in an uncooperative environment. SPE 18, 9 (Sept. 1988), 807–820. doi:10.1002/spe.4380180902
[10] Thom Chiovoloni. 2015. Typed Arena. https://github.com/thomcc/rust-typed-arena/. Accessed: 2025-03-25.
[11] Michael Coblenz, Michelle Mazurek, and Michael Hicks. 2022. Does the Bronze Garbage Collector Make Rust Easier to Use? A Controlled Experiment. In ICSE. doi:10.1145/3510003.3510107
[12] Nick Fitzgerald. 2015. bacon-rajan-cc: A reference counted type with cycle collection for Rust. https://github.com/fitzgen/bacon-rajan-cc. Accessed: 2024-10-15.
[13] Brian Goetz and Mikael Vidstedt. 2021. JEP 421: Deprecate Finalization for Removal. https://openjdk.org/jeps/421. Accessed: 2024-10-15.
[14] Manish Goregaokar. 2015. rust-gc. https://github.com/Manishearth/rust-gc/. Accessed: 2024-10-15.
[15] Manish Goregaokar. 2016. GC support in Rust: API design. https://manishearth.github.io/blog/2016/08/18/gc-support-in-rust-api-design/. Accessed: 2024-10-15.
[16] Manish Goregaokar. 2021. A tour of safe tracing GC designs in Rust. https://manishearth.github.io/blog/2021/03/15/arenas-in-rust/. Accessed: 2024-10-15.
[17] Graydon Hoare. 2022. Reply to 'What should be included in a history of the Rust language?'. https://www.reddit.com/r/rust/comments/za5lh5/comment/iyp0ptm/.
[18] Jacob Hughes and Laurence Tratt. 2025. Reproduction Package for Article 'Garbage Collection for Rust: The Finalizer Frontier'. Zenodo. doi:10.5281/zenodo.17013382
[19] Richard Jones, Antony Hosking, and Eliot Moss. 2023. The Garbage Collection Handbook: the Art of Automatic Memory Management (second ed.). Chapman and Hall/CRC. doi:10.1201/9781003276142
[20] Steve Klabnik and Carol Nichols. 2018. The Rust Programming Language. No Starch Press. doi:10.5555/3271463
[21] Felix S. Klock. 2015. GC and Rust: specifying the problem. http://blog.pnkfx.org/blog/2015/11/10/gc-and-rust-part-1-specing-the-problem/. Accessed: 2024-10-15.
[22] Felix S. Klock. 2016. GC and Rust: The roots of the problem. http://blog.pnkfx.org/blog/2016/01/01/gc-and-rust-part-2-roots-of-the-problem/. Accessed: 2024-10-15.
[23] LLVM. 2014. Garbage collection with LLVM. https://llvm.org/docs/GarbageCollection.html. Accessed: 2024-10-15.
[24] Niko Matsakis. 2013. Destructors and finalizers in Rust. http://smallcultfollowing.com/babysteps/blog/2013/01/17/destructors-and-finalizers-in-rust/. Accessed: 2024-10-15.
[25] Benjamin C Pierce. 2004. Advanced topics in Types and Programming Languages. MIT press. doi:10.7551/mitpress/1104.001.0001
[26] Filip Pizlo. 2017. Introducing Riptide: WebKit's retreating wavefront concurrent garbage collector. https://webkit.org/blog/7122/introducing-riptide-webkits-retreating-wavefront-concurrent-garbage-collector/. Accessed: 2024-10-15.
[27] Rust. 2014. The Rust language reference. https://doc.rust-lang.org/reference/. Accessed: 2024-10-01.
[28] Rifat Shahriyar, Stephen M. Blackburn, and Kathryn S. McKinley. 2014. Fast conservative garbage collection. In OOPSLA. 121–139. doi:10.1145/2660193.2660198
[29] Alekesy Shpilëv. 2020. JVM Anatomy Quark #8: Local Variable Reachability. https://shipilev.net/jvm/anatomy-quarks/8-local-var-reachability/. Accessed: 2024-10-08.
[30] Bjarne Stroustrup. 1997. The C++ Programming Language (third ed.). Addison-Wesley.
[31] Lei Wang and Xikun Sun. 2006. Escape analysis for synchronization removal. In SAC. 1419–1423. doi:10.1145/1141277.1141607
[32] Catherine West. 2019. gc-arena: An experimental system for Rust garbage collection. https://github.com/kyren/gc-arena/. Accessed: 2024-10-15.
[33] Jason Williams. 2018. Boa: an experimental JavaScript lexer, parser and interpreter written in Rust. https://github.com/boa-dev/boa. Accessed: 2024-10-15.
[34] Milian Wolff. 2014. Heaptrack - A Heap Memory Profiler for Linux. https://milianw.de/blog/heaptrack-a-heap-memory-profiler-for-linux.html. Accessed: 2025-03-25.
[35] Joshua Yanovski, Hoang-Hai Dang, Ralf Jung, and Derek Dreyer. 2021. GhostCell: separating permissions from data in Rust. 5 (Aug. 2021), 1–30. doi:10.1145/3473597

Appendix

function RemoveElidableDrops(func):

    for each basic_block ∈ func do
        if IsDropTerminator(basic_block.terminator.kind) then
            ty ← GetTypeOfDroppedValue(block.terminator);
            if IsGcType(ty) then
                if not RequiresFinalizer(ty) then
                    ReplaceTerminator(basic_block);

function ReplaceTerminator(basic_block):

    drop_func ← GetDropFunc(basic_block.terminator);
    last_block ← GetLastBasicBlock(drop_func);
    block.terminator ← last_block.terminator;

Algorithm 3: Removing elidable drops

Additional Experimental Data

Benchmark	Description	Reason for exclusion
bevy	ECS game engine in Rust	Unable to port successfully (see Section 8.1.2)
dyon	Scripting language in Rust	Unable to port successfully (see Section 8.1.2)
jiff	A datetime library for Rust	Too few allocations to measure
mini-moka	Concurrent in-memory cache library	Too few allocations to measure
numbat	Math search engine	Too few allocations to measure
rkyv	Zero-copy deserialization framework	Insufficient `Gc<T>` coverage in benchmarks
RustPython	Python interpreter written in Rust	Difficulty retro-fitting `__del__` semantics (see Section 8.1.2)
rust-analyzer	Language server for Rust	Unable to port successfully (see Section 8.1.2)
salsa	Incremental recomputation library	Too few allocations to measure
WLambda	Scripting language written in Rust	Insufficient `Gc<T>` coverage in benchmarks

Table 4. Rust programs excluded from our benchmark suite after attempted porting to Alloy.

	Wall-clock time (s)
	`Gc<T>`	`Rc<T>`	`Gc<T>` (Rust-GC)	`Arena<T>`
Alacritty	0.41 [0.39, 0.45]	0.40 [0.38, 0.44]	–	–
Binary Trees	0.11 [0.11, 0.11]	0.15 [0.14, 0.15]	0.33 [0.32, 0.33]	0.03 [0.03, 0.04]
fd	0.33 [0.29, 0.38]	0.31 [0.26, 0.37]	–	–
grmtools	3.06 [3.00, 3.14]	3.24 [3.17, 3.31]	–	–
Regex-Redux	0.47 [0.47, 0.47]	0.45 [0.45, 0.46]	–	–
Ripgrep	1.61 [1.55, 1.69]	1.52 [1.45, 1.59]	–	–
som-rs-ast	0.92 [0.88, 0.95]	0.79 [0.76, 0.82]	–	–
som-rs-bc	0.28 [0.27, 0.29]	0.29 [0.28, 0.30]	–	–

Table 5. Wall-clock times (in seconds) with 99% confidence intervals for the E_GCvs experiment comparing different memory management strategies. Strategies not supported by a given benchmark are marked with "–".

	Wall-clock time (s)		Ratio
	jemalloc	BDWGC
Alacritty	0.36 [0.33, 0.40]	0.40 [0.38, 0.44]	1.11
Binary Trees	0.12 [0.12, 0.12]	0.15 [0.14, 0.15]	1.26
fd	0.30 [0.25, 0.36]	0.31 [0.26, 0.37]	1.02
grmtools	3.09 [3.01, 3.17]	3.24 [3.17, 3.31]	1.05
Regex-Redux	0.45 [0.44, 0.45]	0.45 [0.45, 0.46]	1.01
Ripgrep	1.46 [1.40, 1.53]	1.52 [1.45, 1.59]	1.04
som-rs-ast	0.77 [0.74, 0.80]	0.79 [0.76, 0.82]	1.02
som-rs-bc	0.28 [0.27, 0.29]	0.29 [0.28, 0.30]	1.02

Table 6. Wall-clock times comparing jemalloc and BDWGC as allocators for Rc<T>-only code (i.e., no GC). The ratio column shows BDWGC time divided by jemalloc time.

	Wall-clock time (s)
	jemalloc	BDWGC
	`Rc<T>`	`Gc<T>`	`Rc<T>`
Alacritty ▶
Cur. Motion	0.65 ±0.02	0.66 ±0.01	0.66 ±0.01
Dense Cells	2.16 ±0.03	2.16 ±0.02	2.14 ±0.04
Light Cells	0.37 ±0.01	0.37 ±0.01	0.38 ±0.01
Scroll	0.25 ±0.01	0.26 ±0.02	0.26 ±0.01
Scroll (fullscreen)	0.38 ±0.01	0.39 ±0.01	0.38 ±0.01
Scroll Btm	0.32 ±0.00	0.33 ±0.00	0.32 ±0.01
Scroll Btm (small)	0.32 ±0.01	0.33 ±0.00	0.32 ±0.01
Scroll Top	0.25 ±0.02	0.26 ±0.02	0.24 ±0.02
Scroll Top (small)	0.32 ±0.01	0.32 ±0.01	0.33 ±0.00
Unicode	0.23 ±0.02	0.27 ±0.01	0.27 ±0.02
fd ▶
Cmd Exec.	1.29 ±0.01	1.28 ±0.01	1.29 ±0.02
Cmd Exec. (large)	1.24 ±0.01	1.25 ±0.03	1.26 ±0.01
File Extension	0.13 ±0.00	0.15 ±0.01	0.13 ±0.00
File Type	0.12 ±0.00	0.13 ±0.01	0.13 ±0.00
No Pattern	0.18 ±0.00	0.29 ±0.02	0.22 ±0.01
Simple	0.27 ±0.01	0.33 ±0.01	0.32 ±0.00
Simple (-HI)	0.12 ±0.00	0.15 ±0.00	0.13 ±0.00
som-rs-ast ▶
Bounce	0.62 ±0.00	0.96 ±0.02	0.80 ±0.00
BubbleSort	0.56 ±0.01	0.84 ±0.01	0.71 ±0.00
DeltaBlue	0.77 ±0.00	1.09 ±0.01	0.98 ±0.00
Dispatch	0.59 ±0.01	0.97 ±0.00	0.77 ±0.01
Fannkuch	0.65 ±0.00	1.01 ±0.06	0.81 ±0.00
Fibonacci	0.83 ±0.00	1.41 ±0.01	1.08 ±0.00
FieldLoop	0.94 ±0.00	1.20 ±0.01	1.07 ±0.01
GraphSearch	0.31 ±0.00	0.43 ±0.01	0.38 ±0.01
IntegerLoop	0.52 ±0.01	0.84 ±0.00	0.67 ±0.01
JsonSmall	0.86 ±0.00	1.22 ±0.01	1.10 ±0.00
List	0.38 ±0.00	0.53 ±0.01	0.48 ±0.00
Loop	0.53 ±0.01	0.86 ±0.01	0.70 ±0.00
Mandelbrot	0.46 ±0.00	0.69 ±0.01	0.57 ±0.00
NBody	0.31 ±0.00	0.38 ±0.01	0.34 ±0.00
PageRank	0.31 ±0.00	0.44 ±0.00	0.39 ±0.00
Permute	0.67 ±0.00	0.90 ±0.02	0.82 ±0.00
Queens	0.77 ±0.01	1.18 ±0.02	0.98 ±0.00
QuickSort	0.98 ±0.00	1.27 ±0.02	1.26 ±0.01
Recurse	0.60 ±0.00	0.92 ±0.00	0.79 ±0.00
Richards	2.48 ±0.02	3.72 ±0.11	3.13 ±0.01
Sieve	0.67 ±0.00	0.94 ±0.01	0.87 ±0.00
Storage	0.56 ±0.00	0.70 ±0.01	0.69 ±0.00
Sum	0.52 ±0.01	0.83 ±0.01	0.67 ±0.00
Towers	0.31 ±0.00	0.46 ±0.01	0.38 ±0.01
TreeSort	0.32 ±0.00	0.41 ±0.01	0.40 ±0.00
WhileLoop	0.48 ±0.00	0.72 ±0.00	0.60 ±0.00
grmtools ▶
Eclipse	3.48 ±0.03	3.44 ±0.03	3.74 ±0.04
Hadoop	2.86 ±0.01	2.92 ±0.01	3.09 ±0.01
Jenkins	2.90 ±0.00	2.99 ±0.05	3.02 ±0.01
Spring	2.43 ±0.00	2.51 ±0.01	2.63 ±0.02
Ripgrep ▶
Alternates	1.18 ±0.01	1.29 ±0.02	1.21 ±0.01
Alternates (-i)	1.31 ±0.01	1.42 ±0.01	1.33 ±0.01
Literal	1.12 ±0.01	1.23 ±0.01	1.15 ±0.01
Literal (-i)	1.19 ±0.00	1.27 ±0.01	1.20 ±0.01
Literal (default)	1.08 ±0.01	1.19 ±0.01	1.10 ±0.01
Literal (mmap)	1.64 ±0.01	1.75 ±0.02	1.67 ±0.01
Literal (mmap, -i)	1.67 ±0.01	1.80 ±0.01	1.72 ±0.01
Literal (regex)	1.13 ±0.01	1.25 ±0.01	1.15 ±0.01
UTF Greek	3.29 ±0.01	3.41 ±0.01	3.33 ±0.01
UTF Greek (-i)	3.30 ±0.01	3.42 ±0.03	3.33 ±0.01
UTF Word	1.09 ±0.01	1.21 ±0.01	1.14 ±0.00
UTF Word (alt.)	1.11 ±0.01	1.21 ±0.01	1.13 ±0.00
Word	1.12 ±0.00	1.22 ±0.02	1.14 ±0.01
som-rs-bc ▶
Bounce	0.25 ±0.00	0.28 ±0.00	0.30 ±0.00
BubbleSort	0.23 ±0.00	0.27 ±0.00	0.27 ±0.00
DeltaBlue	0.30 ±0.00	0.34 ±0.00	0.36 ±0.00
Dispatch	0.25 ±0.00	0.28 ±0.00	0.29 ±0.01
Fannkuch	0.25 ±0.00	0.29 ±0.00	0.29 ±0.00
Fibonacci	0.30 ±0.01	0.37 ±0.00	0.36 ±0.00
FieldLoop	0.49 ±0.00	0.43 ±0.00	0.54 ±0.01
GraphSearch	0.12 ±0.00	0.12 ±0.00	0.15 ±0.00
IntegerLoop	0.22 ±0.00	0.24 ±0.00	0.26 ±0.00
JsonSmall	0.37 ±0.00	0.39 ±0.01	0.44 ±0.00
List	0.18 ±0.00	0.22 ±0.00	0.21 ±0.01
Loop	0.23 ±0.00	0.26 ±0.00	0.27 ±0.00
Mandelbrot	0.20 ±0.00	0.22 ±0.01	0.23 ±0.00
NBody	0.11 ±0.00	0.12 ±0.00	0.13 ±0.00
PageRank	0.13 ±0.00	0.13 ±0.00	0.14 ±0.00
Permute	0.24 ±0.00	0.27 ±0.00	0.27 ±0.00
Queens	0.28 ±0.01	0.33 ±0.00	0.33 ±0.00
QuickSort	0.36 ±0.00	0.38 ±0.00	0.43 ±0.00
Recurse	0.25 ±0.00	0.31 ±0.00	0.30 ±0.01
Richards	0.92 ±0.00	1.20 ±0.01	1.10 ±0.01
Sieve	0.25 ±0.00	0.26 ±0.00	0.30 ±0.01
Storage	0.22 ±0.00	0.21 ±0.00	0.25 ±0.00
Sum	0.22 ±0.00	0.25 ±0.00	0.27 ±0.00
Towers	0.13 ±0.00	0.15 ±0.00	0.15 ±0.00
TreeSort	0.11 ±0.00	0.12 ±0.00	0.13 ±0.00
WhileLoop	0.22 ±0.00	0.23 ±0.00	0.25 ±0.00

Table 7. Wall-clock execution times (in seconds) for each benchmark in the E_GCvs experiment. Values show arithmetic means over 30 runs and include 99% confidence intervals. Because Binary Trees and Regex-Redux are stand-alone benchmarks they are omitted here; their timings appear in Table 5.

	User time (s)
	jemalloc	BDWGC
	`Rc<T>`	`Gc<T>`	`Rc<T>`
Alacritty ▶
Cur. Motion	1.08 ±0.05	1.09 ±0.04	1.09 ±0.04
Dense Cells	6.66 ±0.13	6.62 ±0.12	6.58 ±0.15
Light Cells	0.34 ±0.03	0.34 ±0.03	0.34 ±0.03
Scroll	0.11 ±0.01	0.11 ±0.01	0.12 ±0.02
Scroll (fullscreen)	0.24 ±0.02	0.28 ±0.03	0.22 ±0.02
Scroll Btm	0.14 ±0.01	0.14 ±0.01	0.14 ±0.01
Scroll Btm (small)	0.14 ±0.01	0.15 ±0.01	0.14 ±0.01
Scroll Top	0.12 ±0.01	0.12 ±0.02	0.11 ±0.01
Scroll Top (small)	0.14 ±0.01	0.14 ±0.01	0.15 ±0.01
Unicode	0.10 ±0.01	0.13 ±0.01	0.12 ±0.01
fd ▶
Cmd Exec.	1.05 ±0.01	1.12 ±0.02	1.12 ±0.02
Cmd Exec. (large)	1.02 ±0.01	1.10 ±0.03	1.11 ±0.02
File Extension	0.04 ±0.00	0.04 ±0.01	0.04 ±0.01
File Type	0.03 ±0.01	0.04 ±0.01	0.04 ±0.01
No Pattern	0.16 ±0.01	0.29 ±0.02	0.23 ±0.01
Simple	0.14 ±0.01	0.18 ±0.01	0.16 ±0.01
Simple (-HI)	0.03 ±0.01	0.04 ±0.01	0.04 ±0.01
som-rs-ast ▶
Bounce	0.62 ±0.00	0.96 ±0.02	0.79 ±0.00
BubbleSort	0.56 ±0.00	0.83 ±0.01	0.71 ±0.00
DeltaBlue	0.77 ±0.00	1.07 ±0.01	0.97 ±0.00
Dispatch	0.59 ±0.01	0.96 ±0.01	0.76 ±0.01
Fannkuch	0.64 ±0.00	0.98 ±0.05	0.80 ±0.00
Fibonacci	0.82 ±0.00	1.40 ±0.01	1.07 ±0.00
FieldLoop	0.94 ±0.00	1.20 ±0.01	1.07 ±0.01
GraphSearch	0.30 ±0.00	0.41 ±0.01	0.37 ±0.00
IntegerLoop	0.52 ±0.01	0.83 ±0.00	0.66 ±0.01
JsonSmall	0.85 ±0.00	1.21 ±0.01	1.10 ±0.00
List	0.38 ±0.00	0.52 ±0.01	0.47 ±0.00
Loop	0.52 ±0.01	0.85 ±0.01	0.70 ±0.00
Mandelbrot	0.45 ±0.00	0.69 ±0.01	0.57 ±0.00
NBody	0.31 ±0.00	0.38 ±0.01	0.34 ±0.00
PageRank	0.30 ±0.00	0.43 ±0.01	0.38 ±0.00
Permute	0.66 ±0.00	0.89 ±0.02	0.82 ±0.00
Queens	0.77 ±0.00	1.18 ±0.02	0.97 ±0.00
QuickSort	0.98 ±0.00	1.26 ±0.02	1.26 ±0.00
Recurse	0.60 ±0.00	0.91 ±0.00	0.79 ±0.00
Richards	2.47 ±0.02	3.72 ±0.11	3.13 ±0.01
Sieve	0.66 ±0.00	0.94 ±0.01	0.87 ±0.00
Storage	0.55 ±0.00	0.69 ±0.01	0.68 ±0.00
Sum	0.52 ±0.00	0.82 ±0.01	0.67 ±0.00
Towers	0.31 ±0.00	0.46 ±0.01	0.38 ±0.00
TreeSort	0.31 ±0.00	0.40 ±0.01	0.40 ±0.00
WhileLoop	0.47 ±0.00	0.72 ±0.00	0.59 ±0.00
grmtools ▶
Eclipse	3.32 ±0.03	3.28 ±0.03	3.59 ±0.04
Hadoop	2.77 ±0.01	2.80 ±0.01	2.99 ±0.01
Jenkins	2.74 ±0.01	2.81 ±0.04	2.90 ±0.01
Spring	2.35 ±0.01	2.40 ±0.01	2.54 ±0.02
Ripgrep ▶
Alternates	0.42 ±0.02	0.51 ±0.02	0.43 ±0.02
Alternates (-i)	0.54 ±0.02	0.65 ±0.02	0.57 ±0.02
Literal	0.34 ±0.02	0.44 ±0.01	0.36 ±0.01
Literal (-i)	0.41 ±0.02	0.50 ±0.02	0.44 ±0.02
Literal (default)	0.30 ±0.02	0.40 ±0.01	0.32 ±0.01
Literal (mmap)	0.43 ±0.02	0.52 ±0.02	0.46 ±0.02
Literal (mmap, -i)	0.48 ±0.02	0.58 ±0.02	0.50 ±0.02
Literal (regex)	0.35 ±0.02	0.44 ±0.02	0.36 ±0.01
UTF Greek	2.55 ±0.02	2.64 ±0.02	2.58 ±0.03
UTF Greek (-i)	2.56 ±0.03	2.66 ±0.03	2.56 ±0.02
UTF Word	0.35 ±0.02	0.42 ±0.02	0.37 ±0.02
UTF Word (alt.)	0.33 ±0.02	0.43 ±0.02	0.35 ±0.01
Word	0.35 ±0.02	0.43 ±0.02	0.36 ±0.01
som-rs-bc ▶
Bounce	0.25 ±0.00	0.28 ±0.00	0.29 ±0.00
BubbleSort	0.22 ±0.00	0.27 ±0.00	0.27 ±0.00
DeltaBlue	0.30 ±0.00	0.34 ±0.00	0.35 ±0.01
Dispatch	0.24 ±0.00	0.28 ±0.00	0.29 ±0.00
Fannkuch	0.24 ±0.00	0.28 ±0.00	0.28 ±0.00
Fibonacci	0.30 ±0.00	0.36 ±0.00	0.36 ±0.00
FieldLoop	0.48 ±0.00	0.42 ±0.01	0.53 ±0.01
GraphSearch	0.12 ±0.00	0.11 ±0.00	0.14 ±0.00
IntegerLoop	0.22 ±0.00	0.24 ±0.00	0.26 ±0.00
JsonSmall	0.36 ±0.00	0.39 ±0.01	0.44 ±0.00
List	0.17 ±0.00	0.22 ±0.00	0.21 ±0.01
Loop	0.22 ±0.00	0.25 ±0.00	0.27 ±0.00
Mandelbrot	0.19 ±0.00	0.21 ±0.01	0.23 ±0.00
NBody	0.11 ±0.00	0.12 ±0.00	0.13 ±0.00
PageRank	0.12 ±0.00	0.12 ±0.00	0.14 ±0.00
Permute	0.23 ±0.00	0.26 ±0.00	0.27 ±0.00
Queens	0.27 ±0.01	0.33 ±0.00	0.33 ±0.00
QuickSort	0.35 ±0.00	0.38 ±0.00	0.42 ±0.00
Recurse	0.25 ±0.00	0.31 ±0.00	0.30 ±0.01
Richards	0.92 ±0.00	1.20 ±0.01	1.10 ±0.01
Sieve	0.25 ±0.00	0.26 ±0.00	0.30 ±0.01
Storage	0.22 ±0.00	0.20 ±0.00	0.25 ±0.00
Sum	0.22 ±0.00	0.24 ±0.00	0.26 ±0.00
Towers	0.12 ±0.00	0.15 ±0.00	0.14 ±0.00
TreeSort	0.11 ±0.00	0.11 ±0.00	0.12 ±0.00
WhileLoop	0.21 ±0.00	0.22 ±0.00	0.25 ±0.00

Table 8. User times (in seconds) for each benchmark in the E_GCvs experiment. Values show arithmetic means over 30 runs and include 99% confidence intervals.

	Heap Size (MiB)	Relative wall-clock time	Benchmarks failed
Alacritty	16	0.96 [0.91, 0.99]
	32	0.98 [0.95, 1.02]
	64	0.94 [0.89, 0.98]
Binary Trees	4	0.88 [0.82, 1.02]
	8	0.90 [0.80, 1.01]
	16	0.87 [0.82, 0.94]
fd	16	0.94 [0.90, 0.99]
	32	0.94 [0.88, 0.98]
	64	0.94 [0.89, 1.00]
grmtools	1024	1.01 [1.00, 1.02]	2/4 (Eclipse, Jenkins)
	2048	1.00 [1.00, 1.01]
	4096	1.01 [1.00, 1.02]
Regex-Redux	256	0.94 [0.92, 0.95]
	512	0.93 [0.90, 0.94]
	1024	0.96 [0.92, 1.07]
Ripgrep	32	0.96 [0.95, 0.96]
	64	0.95 [0.94, 0.95]
	128	0.94 [0.93, 0.95]
som-rs-ast	64	0.72 [0.71, 0.74]	2/4 (Fannkuch, TreeSort)
	96	0.74 [0.73, 0.75]	2/4 (Fannkuch, TreeSort)
	128	0.75 [0.74, 0.76]
som-rs-bc	32	0.79 [0.78, 0.80]
	64	0.79 [0.77, 0.80]
	128	0.84 [0.83, 0.86]

Table 9. The effects of fixing the heap size (see Section 8.1.3) on wall-clock time, reported as ratios relative to BDWGC's default adaptive RSS strategy. We chose per-benchmark heap sizes based on our observations of their memory usage, though some benchmarks fail at smaller values. Broadly speaking, these results show that the performance differences due to different heap sizes are relatively minor, and that BDWGC's adaptive strategy has not unduly coloured our results.

	Fin. elided (%)
Alacritty ▶
Cur. Motion	0.00
Dense Cells	0.00
Light Cells	0.00
Scroll	0.00
Scroll Btm	0.00
Scroll Btm (small)	0.00
Scroll (fullscreen)	0.00
Scroll Top	0.00
Scroll Top (small)	0.00
Unicode	0.00
fd ▶
Cmd Exec.	8.93
Cmd Exec. (large)	9.09
File Extension	72.73
File Type	24.54
No Pattern	0.04
Simple	61.54
Simple (-HI)	61.54
som-rs-ast ▶
Bounce	100.00
BubbleSort	100.00
DeltaBlue	100.00
Dispatch	100.00
Fannkuch	100.00
Fibonacci	100.00
FieldLoop	100.00
GraphSearch	99.99
IntegerLoop	100.00
JsonSmall	100.00
List	100.00
Loop	100.00
Mandelbrot	100.00
NBody	99.99
PageRank	99.99
Permute	100.00
Queens	100.00
QuickSort	100.00
Recurse	100.00
Richards	100.00
Sieve	100.00
Storage	100.00
Sum	100.00
Towers	99.99
TreeSort	99.99
WhileLoop	100.00
grmtools ▶
Eclipse	11.79
Hadoop	32.99
Jenkins	26.74
Spring	37.25
Ripgrep ▶
Alternates	79.04
Alternates (-i)	79.04
Literal	79.04
Literal (-i)	79.04
Literal (mmap, -i)	79.04
Literal (default)	79.04
Literal (mmap)	79.04
Literal (regex)	79.04
UTF Greek	79.04
UTF Greek (-i)	79.04
UTF Word	79.04
UTF Word (alt.)	79.04
Word	79.04
som-rs-bc ▶
Bounce	72.50
BubbleSort	76.34
DeltaBlue	73.86
Dispatch	77.78
Fannkuch	72.74
Fibonacci	70.37
FieldLoop	83.33
GraphSearch	76.56
IntegerLoop	75.00
JsonSmall	71.85
List	79.41
Loop	74.82
Mandelbrot	70.93
NBody	86.17
PageRank	70.95
Permute	79.23
Queens	79.04
QuickSort	69.33
Recurse	82.61
Richards	71.17
Sieve	73.15
Storage	73.84
Sum	75.00
Towers	78.29
TreeSort	65.19
WhileLoop	74.98

Table 10. Percentage of finalizers Alloy was able to elide for each benchmark.

User time performance comparison — Figure 5. Wall-clock and user time performance comparison for finalizer elision on each benchmark. The bars show the relative performance of Alloy after applying our elision optimization, normalized against the baseline (solid black line). The vertical blue line marks the overall geometric mean (with shaded area for CIs). User time often shows greater improvement than wall-clock time, as elision reduces the CPU overhead of the finalization thread.

	Wall-clock time (s)
	Before elision	After elision
Alacritty ▶
Cur. Motion	0.66 ±0.01	0.66 ±0.01
Dense Cells	2.17 ±0.03	2.17 ±0.02
Light Cells	0.39 ±0.01	0.38 ±0.01
Scroll	0.27 ±0.01	0.27 ±0.01
Scroll (fullscreen)	0.39 ±0.01	0.39 ±0.01
Scroll Btm	0.33 ±0.00	0.33 ±0.00
Scroll Btm (small)	0.33 ±0.00	0.33 ±0.00
Scroll Top	0.25 ±0.01	0.24 ±0.04
Scroll Top (small)	0.33 ±0.00	0.33 ±0.01
Unicode	0.26 ±0.02	0.26 ±0.02
fd ▶
Cmd Exec.	1.29 ±0.02	1.29 ±0.02
Cmd Exec. (large)	1.25 ±0.01	1.25 ±0.02
File Extension	0.15 ±0.00	0.14 ±0.01
File Type	0.13 ±0.01	0.13 ±0.01
No Pattern	0.28 ±0.02	0.29 ±0.02
Simple	0.34 ±0.01	0.35 ±0.00
Simple (-HI)	0.14 ±0.01	0.14 ±0.01
som-rs-ast ▶
Bounce	7.68 ±0.40	1.03 ±0.01
BubbleSort	5.27 ±0.29	0.92 ±0.01
DeltaBlue	7.84 ±0.19	1.17 ±0.01
Dispatch	6.52 ±0.46	1.04 ±0.01
Fannkuch	6.76 ±0.09	1.15 ±0.03
Fibonacci	22.16 ±1.87	1.54 ±0.01
FieldLoop	5.14 ±0.16	1.25 ±0.01
GraphSearch	2.82 ±0.06	0.47 ±0.00
IntegerLoop	5.88 ±0.11	0.90 ±0.01
JsonSmall	11.37 ±0.77	1.33 ±0.01
List	4.06 ±0.06	0.58 ±0.00
Loop	6.02 ±0.14	0.90 ±0.01
Mandelbrot	4.67 ±0.11	0.74 ±0.01
NBody	2.91 ±0.16	0.40 ±0.00
PageRank	2.44 ±0.05	0.47 ±0.00
Permute	6.69 ±0.35	0.97 ±0.02
Queens	9.07 ±0.34	1.28 ±0.02
QuickSort	10.86 ±0.66	1.39 ±0.02
Recurse	7.66 ±0.24	0.98 ±0.00
Richards	97.40 ±10.46	3.95 ±0.08
Sieve	7.06 ±0.13	1.01 ±0.01
Storage	5.06 ±0.10	0.75 ±0.01
Sum	5.91 ±0.12	0.89 ±0.01
Towers	3.20 ±0.14	0.50 ±0.01
TreeSort	4.65 ±0.32	0.45 ±0.00
WhileLoop	5.03 ±0.21	0.76 ±0.01
grmtools ▶
Eclipse	5.09 ±0.05	3.59 ±0.06
Hadoop	4.49 ±0.03	3.07 ±0.05
Jenkins	4.45 ±0.02	3.02 ±0.01
Spring	4.11 ±0.03	2.65 ±0.01
Ripgrep ▶
Alternates	1.44 ±0.03	1.39 ±0.01
Alternates (-i)	1.55 ±0.02	1.50 ±0.01
Literal	1.38 ±0.03	1.32 ±0.01
Literal (-i)	1.45 ±0.02	1.38 ±0.01
Literal (default)	1.38 ±0.02	1.30 ±0.01
Literal (mmap)	1.90 ±0.02	1.81 ±0.01
Literal (mmap, -i)	1.90 ±0.03	1.84 ±0.01
Literal (regex)	1.36 ±0.03	1.33 ±0.01
UTF Greek	3.53 ±0.01	3.50 ±0.02
UTF Greek (-i)	3.51 ±0.02	3.52 ±0.02
UTF Word	1.39 ±0.02	1.30 ±0.01
UTF Word (alt.)	1.36 ±0.02	1.31 ±0.02
Word	1.36 ±0.02	1.31 ±0.01
som-rs-bc ▶
Bounce	3.03 ±0.04	0.31 ±0.00
BubbleSort	2.64 ±0.04	0.29 ±0.00
DeltaBlue	4.19 ±0.08	0.37 ±0.00
Dispatch	3.80 ±0.36	0.30 ±0.00
Fannkuch	2.86 ±0.23	0.31 ±0.00
Fibonacci	4.34 ±0.20	0.40 ±0.00
FieldLoop	2.45 ±0.16	0.45 ±0.01
GraphSearch	1.15 ±0.03	0.13 ±0.00
IntegerLoop	3.00 ±0.16	0.26 ±0.00
JsonSmall	4.96 ±0.80	0.42 ±0.01
List	2.69 ±0.09	0.24 ±0.01
Loop	3.07 ±0.18	0.28 ±0.00
Mandelbrot	2.21 ±0.15	0.23 ±0.00
NBody	1.23 ±0.06	0.13 ±0.00
PageRank	1.06 ±0.01	0.14 ±0.00
Permute	2.72 ±0.09	0.29 ±0.00
Queens	3.19 ±0.07	0.35 ±0.00
QuickSort	3.62 ±0.05	0.41 ±0.00
Recurse	4.27 ±1.91	0.33 ±0.00
Richards	15.51 ±0.89	1.31 ±0.01
Sieve	2.57 ±0.07	0.28 ±0.00
Storage	2.45 ±0.05	0.22 ±0.00
Sum	3.06 ±0.37	0.27 ±0.00
Towers	1.59 ±0.05	0.16 ±0.00
TreeSort	1.19 ±0.02	0.13
WhileLoop	3.26 ±0.08	0.24 ±0.01

Table 11. Wall-clock execution times (seconds) for each benchmark in the E_Elision experiment, shown before and after applying Alloy's finalizer elision optimisation. Values show arithmetic means over 30 runs, with 99% confidence intervals.

	User time (s)
	Before elision	After elision
Alacritty ▶
Cur. Motion	1.08 ±0.06	1.12 ±0.05
Dense Cells	6.70 ±0.12	6.68 ±0.13
Light Cells	0.34 ±0.04	0.33 ±0.04
Scroll	0.12 ±0.01	0.12 ±0.01
Scroll (fullscreen)	0.27 ±0.03	0.26 ±0.03
Scroll Btm	0.15 ±0.01	0.15 ±0.01
Scroll Btm (small)	0.15 ±0.01	0.14 ±0.01
Scroll Top	0.12 ±0.01	0.12 ±0.06
Scroll Top (small)	0.15 ±0.01	0.14 ±0.01
Unicode	0.13 ±0.02	0.13 ±0.02
fd ▶
Cmd Exec.	1.13 ±0.01	1.13 ±0.02
Cmd Exec. (large)	1.10 ±0.02	1.11 ±0.02
File Extension	0.05 ±0.01	0.04 ±0.00
File Type	0.04 ±0.00	0.04 ±0.01
No Pattern	0.27 ±0.02	0.30 ±0.02
Simple	0.19 ±0.01	0.20 ±0.01
Simple (-HI)	0.04 ±0.01	0.04 ±0.01
som-rs-ast ▶
Bounce	11.15 ±0.41	1.02 ±0.01
BubbleSort	7.88 ±0.40	0.90 ±0.01
DeltaBlue	11.72 ±0.21	1.15 ±0.01
Dispatch	9.15 ±0.63	1.03 ±0.01
Fannkuch	9.60 ±0.12	1.12 ±0.03
Fibonacci	26.41 ±1.82	1.53 ±0.01
FieldLoop	6.42 ±0.17	1.25 ±0.01
GraphSearch	3.92 ±0.07	0.45 ±0.01
IntegerLoop	8.23 ±0.13	0.90 ±0.01
JsonSmall	16.08 ±0.75	1.32 ±0.01
List	5.79 ±0.09	0.57 ±0.00
Loop	8.51 ±0.17	0.89 ±0.01
Mandelbrot	6.62 ±0.12	0.73 ±0.01
NBody	3.98 ±0.15	0.40 ±0.00
PageRank	3.73 ±0.07	0.47 ±0.00
Permute	9.96 ±0.34	0.96 ±0.02
Queens	13.04 ±0.33	1.27 ±0.02
QuickSort	16.43 ±0.65	1.38 ±0.02
Recurse	11.18 ±0.28	0.98 ±0.00
Richards	109.49 ±10.34	3.95 ±0.08
Sieve	10.49 ±0.18	1.00 ±0.01
Storage	7.70 ±0.12	0.74 ±0.01
Sum	8.27 ±0.14	0.89 ±0.01
Towers	4.61 ±0.14	0.49 ±0.01
TreeSort	5.90 ±0.35	0.44 ±0.00
WhileLoop	6.74 ±0.20	0.76 ±0.01
grmtools ▶
Eclipse	5.81 ±0.07	3.42 ±0.06
Hadoop	5.27 ±0.04	2.96 ±0.05
Jenkins	5.21 ±0.04	2.86 ±0.01
Spring	4.94 ±0.05	2.54 ±0.01
Ripgrep ▶
Alternates	0.58 ±0.02	0.50 ±0.02
Alternates (-i)	0.72 ±0.02	0.64 ±0.02
Literal	0.53 ±0.02	0.44 ±0.02
Literal (-i)	0.59 ±0.02	0.51 ±0.01
Literal (default)	0.48 ±0.02	0.40 ±0.02
Literal (mmap)	0.63 ±0.02	0.53 ±0.02
Literal (mmap, -i)	0.65 ±0.02	0.57 ±0.02
Literal (regex)	0.52 ±0.02	0.43 ±0.01
UTF Greek	2.70 ±0.02	2.66 ±0.03
UTF Greek (-i)	2.71 ±0.03	2.67 ±0.02
UTF Word	0.53 ±0.02	0.42 ±0.02
UTF Word (alt.)	0.52 ±0.02	0.43 ±0.02
Word	0.51 ±0.02	0.44 ±0.02
som-rs-bc ▶
Bounce	4.34 ±0.04	0.30 ±0.00
BubbleSort	3.76 ±0.05	0.29 ±0.00
DeltaBlue	5.76 ±0.11	0.36 ±0.00
Dispatch	4.91 ±0.35	0.29 ±0.00
Fannkuch	3.85 ±0.29	0.30 ±0.00
Fibonacci	6.15 ±0.17	0.40 ±0.00
FieldLoop	3.14 ±0.15	0.45 ±0.01
GraphSearch	1.54 ±0.03	0.12 ±0.00
IntegerLoop	3.94 ±0.15	0.26 ±0.00
JsonSmall	6.70 ±0.73	0.41 ±0.01
List	3.72 ±0.09	0.24 ±0.01
Loop	4.11 ±0.17	0.28 ±0.00
Mandelbrot	2.91 ±0.13	0.23 ±0.00
NBody	1.72 ±0.06	0.12 ±0.00
PageRank	1.51 ±0.02	0.13 ±0.00
Permute	3.84 ±0.09	0.28 ±0.00
Queens	4.53 ±0.07	0.35 ±0.00
QuickSort	5.28 ±0.09	0.40 ±0.00
Recurse	5.84 ±1.85	0.33 ±0.00
Richards	20.23 ±0.81	1.30 ±0.01
Sieve	3.76 ±0.12	0.27 ±0.01
Storage	3.23 ±0.05	0.22 ±0.00
Sum	4.03 ±0.36	0.26 ±0.00
Towers	2.20 ±0.05	0.16 ±0.00
TreeSort	1.71 ±0.03	0.13 ±0.00
WhileLoop	3.90 ±0.09	0.24 ±0.01

Table 12. User times (seconds) for each benchmark in the E_Elision experiment, shown before and after applying Alloy's finalizer elision optimisation. Values show arithmetic means over 30 runs, with 99% confidence intervals.

	Avg. heap footprint (MiB)
	Before elision	After elision
Alacritty ▶
Cur. Motion	3.76 ±0.06	3.79 ±0.06
Dense Cells	3.74 ±0.06	3.74 ±0.06
Light Cells	3.76 ±0.08	3.73 ±0.07
Scroll	3.88 ±0.13	3.87 ±0.12
Scroll (fullscreen)	6.46 ±0.28	6.46 ±0.34
Scroll Btm	3.72 ±0.07	3.73 ±0.07
Scroll Btm (small)	3.75 ±0.05	3.74 ±0.07
Scroll Top	3.78 ±0.08	3.75 ±0.10
Scroll Top (small)	3.76 ±0.06	3.71 ±0.08
Unicode	3.73 ±0.15	3.77 ±0.09
fd ▶
execution	17.18 ±0.33	17.89 ±0.87
extension	13.52 ±0.75	13.68 ±0.84
hi	15.14 ±0.36	14.99 ±0.59
output	17.54 ±0.65	17.47 ±0.53
pattern	37.79 ±5.22	38.03 ±5.25
type	15.80 ±0.40	15.84 ±0.38
som-rs-ast ▶
bounce	309.85 ±43.90	362.44 ±7.24
bubblesort	334.97 ±6.55	209.99 ±3.88
deltablue	305.24 ±31.74	310.83 ±5.32
dispatch	345.26 ±18.67	329.90 ±4.22
fannkuch	412.43 ±5.54	363.19 ±8.31
fibonacci	382.32 ±22.98	587.04 ±4.00
fieldloop	326.47 ±8.38	300.93 ±5.66
graphsearch	100.82 ±4.86	61.15 ±0.26
integerloop	323.50 ±16.42	320.62 ±0.44
jsonsmall	257.67 ±34.84	316.56 ±0.93
list	236.78 ±5.27	156.13 ±4.11
loop	332.20 ±14.75	326.89 ±7.76
mandelbrot	276.68 ±4.14	236.49 ±2.42
nbody	151.00 ±2.63	137.42 ±2.78
pagerank	193.25 ±1.12	168.02 ±0.95
permute	284.84 ±43.08	371.67 ±11.43
queens	278.89 ±12.36	288.03 ±2.63
quicksort	515.72 ±20.82	424.56 ±18.48
recurse	429.81 ±12.37	410.97 ±5.30
richards	1187.31 ±98.12	1472.67 ±31.17
sieve	355.11 ±17.79	360.95 ±6.06
storage	196.36 ±9.70	193.06 ±6.60
sum	320.61 ±11.55	298.24 ±2.27
towers	174.90 ±4.23	175.82 ±0.26
treesort	92.65 ±4.87	98.82 ±0.92
whileloop	272.66 ±10.05	253.07 ±3.60
grmtools ▶
Eclipse	1993.86 ±9.04	1069.95 ±28.40
Hadoop	1882.88 ±1.55	800.15 ±48.62
Jenkins	1876.35 ±14.92	962.16 ±29.20
Spring	1752.72 ±1.68	1036.27 ±19.48
Ripgrep ▶
Alternates	28.23 ±1.59	15.80 ±0.40
Alternates (-i)	29.52 ±1.66	17.50 ±0.66
Literal	25.14 ±1.52	15.26 ±0.33
Literal (-i)	26.69 ±1.56	18.17 ±0.41
Literal (default)	25.62 ±1.59	15.25 ±0.39
Literal (mmap)	26.91 ±1.78	16.03 ±0.39
Literal (mmap, -i)	28.72 ±1.66	17.34 ±0.24
Literal (regex)	25.87 ±2.01	15.05 ±0.46
UTF Greek	26.57 ±1.40	17.43 ±0.47
UTF Greek (-i)	26.85 ±1.54	18.04 ±0.65
UTF Word	28.03 ±1.57	18.13 ±0.14
UTF Word (alt.)	25.56 ±1.60	15.48 ±0.46
Word	29.35 ±2.05	15.82 ±0.51
som-rs-bc ▶
bounce	138.59 ±5.18	134.24 ±0.22
bubblesort	121.69 ±6.72	115.25 ±0.03
deltablue	186.12 ±1.32	135.10 ±2.09
dispatch	168.63 ±7.00	137.08 ±0.01
fannkuch	139.94 ±5.95	115.98 ±0.02
fibonacci	205.88 ±10.08	175.88 ±0.03
fieldloop	98.52 ±4.07	89.07 ±0.01
graphsearch	45.81 ±1.21	24.33 ±0.74
integerloop	133.10 ±6.16	117.24 ±0.01
jsonsmall	164.80 ±5.83	142.20 ±5.29
list	106.38 ±7.84	98.95 ±0.04
loop	144.73 ±6.09	120.89 ±0.00
mandelbrot	90.31 ±4.84	80.51 ±0.03
nbody	58.61 ±1.77	48.15 ±0.03
pagerank	69.21 ±0.08	51.90 ±0.46
permute	126.65 ±5.94	109.74 ±0.03
queens	157.93 ±5.89	134.31 ±0.02
quicksort	183.42 ±4.54	175.50 ±0.04
recurse	169.34 ±7.22	146.41 ±0.00
richards	582.34 ±41.87	487.95 ±0.03
sieve	131.11 ±5.83	114.14 ±3.04
storage	71.95 ±2.32	85.36 ±3.26
sum	136.99 ±5.40	117.56 ±0.00
towers	71.09 ±3.09	58.84 ±0.03
treesort	57.12 ±2.12	44.90 ±1.02
whileloop	132.37 ±2.45	96.28 ±0.02

Table 13. Average heap footprint (MiB) across benchmarks in the E_Elision experiment, measured before and after applying Alloy's finalizer elision optimisation. Values are arithmetic means over 30 runs with resampled traces from heaptrack, with 99% confidence intervals (see Section 8.1.4 for details).

¹ Although it is outside the scope of this paper, it would be preferable for Rust to have different types for 'data width' and 'address'. Modern C, for example, captures this difference with the size_t and uintptr_t types respectively. Rust now has a provenance lint to nudge users in this general direction, but the as keyword still allows arbitrary conversions.
² The lengthier syntax y = Gc::clone(&v) is available, since every copyable type is also cloneable.
³ These features were added to the C+11 specification but removed in C++23.
⁴ Mostly. Rust's 'temporary lifetime extension' delays destruction, but for how long is currently unspecified.
⁵ A program could pause after 'exit' while all queued finalizers are run. We are not aware of a system which does this.
⁶ This is a white lie, though the visible effect is the same. RawTable is contained in the separate hashbrown crate which is then included in Rust's standard library. We previously maintained a fork of this, but synchronising it is painful. For now, at least, we have hacked explicit knowledge of RawTable into the needs_finalize function.
⁷ Initially we wrapped this macro in a function. This led to surprising performance results: an analysis of LLVM IR suggests – but the sheer quantity of data makes it hard to confirm – that this may be due to the effect on inlining heuristics.
⁸ Our initial implementation of FSA was, in essence, an extension of Rust's type system and suffered from false positives. For example, porting Alacritty led to 110 errors — but our abstract interpretation approach raises just 2.
⁹ In commit https://github.com/rust-lang/rust/commit/ade807c6dcf6dc4454732c5e914ca06ebb429773.
¹⁰ While the term 'trial deletion' does not appear in the original paper, it is now widely used to describe part of the algorithm.

Internal

info@soft-dev.org