Reflecting on A Bad Stack
Oct 03, 2025

After finishing the first chapter of Too Many Lists, here are some of the things I’ve learned:
- Implementing the first iteration of our linked list.
- How the null pointer optimization can reduce memory overhead.
- Why the default `Drop` implementation on the `Box` type can cause a stack overflow for our `List`.
First iteration of our linked list
Here’s the good way to design “A Bad Stack”:

```rust
struct List {
    head: Link,
}

enum Link {
    Empty,
    More(Box<Node>),
}

struct Node {
    element: i32,
    next: Link,
}
```
Why not just use a simple List enum?
The bad way:

```rust
pub enum List {
    Empty,
    Element(i32, Box<List>),
}
```
This is problematic for two reasons:

- Non-uniform node layout: only the first element of the list is stack-allocated; the rest of the elements are allocated on the heap. This makes operations like splitting/merging lists inefficient because we have to copy data between the stack and the heap.
- Unnecessary heap allocation: the last node allocates an `Empty` variant on the heap, occupying 16 bytes: 8 bytes for `Box<List>`, 4 bytes for `i32`, and 4 bytes of padding.
Read the First Layout for a detailed explanation.
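The 16-byte claim is easy to check with `size_of`. Here’s a small sketch I added (the enum is renamed `BadList` so it doesn’t clash with the good `List` above; the number assumes a 64-bit target):

```rust
// Mirrors the "bad way" enum above, renamed to avoid clashing with List.
#[allow(dead_code)]
pub enum BadList {
    Empty,
    Element(i32, Box<BadList>),
}

fn main() {
    // 8 bytes for Box<BadList> + 4 bytes for i32 + 4 bytes of padding,
    // so even the heap-allocated Empty at the end of the list costs 16 bytes.
    assert_eq!(std::mem::size_of::<BadList>(), 16);
    println!("BadList: {} bytes", std::mem::size_of::<BadList>());
}
```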
Null pointer optimization
Every enum has to store a tag to specify which variant of the enum its bits represent. Rust can, however, figure out which variant you have without storing the tag if:

- You have an enum with two variants
- One variant is empty and the other one is of a type that can never be all zeros

Here are some types that can never be all zeros:

- References like `&T` and `&mut T`, because null references aren’t allowed in safe Rust
- Owned pointers like `Box<T>`
- Types that internally contain a pointer that cannot be null, like `String` and `Vec<T>`
- Reference-counted pointers like `Rc<T>` and `Arc<T>`
For our `Link` enum, this optimization works since we have two variants where one is empty and the other is `Box<Node>`, which can never be all zeros.
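The effect on `Link` can be verified directly (a small check I added; the types are copied from the layout above):

```rust
#[allow(dead_code)]
struct Node {
    element: i32,
    next: Link,
}

#[allow(dead_code)]
enum Link {
    Empty,
    More(Box<Node>),
}

fn main() {
    // Thanks to the null pointer optimization, Link needs no separate tag:
    // Empty is represented by the all-zero (null) pointer bit pattern.
    assert_eq!(std::mem::size_of::<Link>(), std::mem::size_of::<Box<Node>>());
    println!("Link: {} bytes", std::mem::size_of::<Link>()); // 8 on 64-bit targets
}
```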
I found another example which really solidified my understanding:
```rust
pub enum EnumOne {
    A,
    B(String),
}

pub enum EnumTwo {
    A,
    B(i32),
}

pub fn main() {
    println!("\nWith null pointer optimization:");
    println!("  String: {} bytes", std::mem::size_of::<String>());
    println!("  EnumOne: {} bytes", std::mem::size_of::<EnumOne>());

    println!("\nWithout null pointer optimization:");
    println!("  i32: {} bytes", std::mem::size_of::<i32>());
    println!("  EnumTwo: {} bytes", std::mem::size_of::<EnumTwo>());
}
```
If we try to run this, we’ll get:

```
❯ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/too-many-lists`

With null pointer optimization:
  String: 24 bytes
  EnumOne: 24 bytes

Without null pointer optimization:
  i32: 4 bytes
  EnumTwo: 8 bytes
```
Since an `i32` can be all zeros, Rust needs a separate tag to distinguish variant `A` from `B(0)`, which is why `EnumTwo` needs 8 bytes: 4 bytes for the `i32`, 1 byte for the tag, and 3 bytes of padding.
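The same rule explains the familiar `Option` sizes (a small check I added): `Option<&T>` is pointer-sized because the reference can never be null, while `Option<i32>` needs a tag.

```rust
fn main() {
    // &i32 can never be null, so Option<&i32> needs no extra tag:
    // None is represented by the all-zero bit pattern.
    assert_eq!(std::mem::size_of::<Option<&i32>>(), std::mem::size_of::<&i32>());

    // i32 can legitimately be zero, so Option<i32> stores a separate tag:
    // 4 bytes for the i32, 1 byte for the tag, 3 bytes of padding.
    assert_eq!(std::mem::size_of::<Option<i32>>(), 8);
}
```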
Overflowing boxes
The Box type recursively drops its contents before deallocating itself. This can cause stack overflow for large lists.
Here is Rust’s `Drop` implementation for the `Box` type:

```rust
#[stable(feature = "rust1", since = "1.0.0")]
unsafe impl<#[may_dangle] T: ?Sized, A: Allocator> Drop for Box<T, A> {
    #[inline]
    fn drop(&mut self) {
        // the T in the Box is dropped by the compiler before the destructor is run
        let ptr = self.0;

        unsafe {
            let layout = Layout::for_value_raw(ptr.as_ptr());
            if layout.size() != 0 {
                self.1.deallocate(From::from(ptr.cast()), layout);
            }
        }
    }
}
```
Consider this list:

```
Box<Node A> → Box<Node B> → Box<Node C> → Box<Node D>
```
The drop sequence for this list looks like:

- Drop `Box<Node A>` → compiler drops `Node A` → `Node A` drops its next field, which drops `Box<Node B>`
- Drop `Box<Node B>` → compiler drops `Node B` → `Node B` drops its next field, which drops `Box<Node C>`
- This goes on recursively for each node…
It’s problematic because the stack will look something like this:

```
drop Box<Node A>
  drop Node A                ← drops contents first
    drop Box<Node B>
      drop Node B            ← still waiting
        drop Box<Node C>
          drop Node C        ← still waiting
            drop Box<Node D>
              drop Node D    ← finally done!
              deallocate D
            return
          deallocate C
        return
      deallocate B
    return
  deallocate A
return
```
Each Box must stay on the stack until all subsequent boxes are dropped, because it still needs to deallocate its own memory afterward.
Now if we try to run it for a large list:

```rust
#[test]
fn test_stack_overflow() {
    let mut list = List::new();
    for i in 0..100_000 {
        list.push(i);
    }
    // the recursive drop kicks in when `list` goes out of scope here
}
```
This causes a stack overflow:

```
❯ cargo test
     Running unittests src/lib.rs (target/debug/deps/too_many_lists-108401a94152d12f)

running 1 test

thread 'first::test::test_stack_overflow' has overflowed its stack
fatal runtime error: stack overflow, aborting
```
To fix this, we’ll have to implement the `Drop` trait ourselves so that it works iteratively. For that we’ll use `mem::replace` to unlink each node before it is dropped, so that dropping one node never recurses into the next.
```rust
impl Drop for List {
    fn drop(&mut self) {
        // Replace the original value of self.head with Link::Empty and
        // take ownership of the original value. We get back the original
        // value of self.head WITHOUT dropping it.
        let mut link = mem::replace(&mut self.head, Link::Empty);

        // Iteratively drop each node
        while let Link::More(mut node) = link {
            // Take ownership of the next link, leaving Empty in its place
            link = mem::replace(&mut node.next, Link::Empty);
            // node goes out of scope and gets dropped here, but since
            // node.next is now Empty, it doesn't trigger a recursive drop
        }
    }
}
```
The stack looks much better now:

```
drop Box<Node A>
  drop Node A (next = Empty)
  deallocate A
return
drop Box<Node B>
  drop Node B (next = Empty)
  deallocate B
return
drop Box<Node C>
  drop Node C (next = Empty)
  deallocate C
return
drop Box<Node D>
  drop Node D (next = Empty)
  deallocate D
return
```
We don’t get a stack overflow anymore!

```
❯ cargo test
     Running unittests src/lib.rs (target/debug/deps/too_many_lists-108401a94152d12f)

running 1 test
test first::test::test_stack_overflow ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
```
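Putting it all together, here’s a self-contained sketch of the final state (my reconstruction: `new` and `push` aren’t shown above, so I’ve written them the same way the chapter does, with `mem::replace` swapping the old head into the new node):

```rust
use std::mem;

pub struct List {
    head: Link,
}

enum Link {
    Empty,
    More(Box<Node>),
}

struct Node {
    #[allow(dead_code)]
    element: i32,
    next: Link,
}

impl List {
    pub fn new() -> Self {
        List { head: Link::Empty }
    }

    // Push to the front: the old head becomes the new node's next.
    pub fn push(&mut self, element: i32) {
        let node = Box::new(Node {
            element,
            next: mem::replace(&mut self.head, Link::Empty),
        });
        self.head = Link::More(node);
    }
}

impl Drop for List {
    fn drop(&mut self) {
        // Unlink each node before it drops, so drops never recurse.
        let mut link = mem::replace(&mut self.head, Link::Empty);
        while let Link::More(mut node) = link {
            link = mem::replace(&mut node.next, Link::Empty);
        }
    }
}

fn main() {
    let mut list = List::new();
    for i in 0..100_000 {
        list.push(i);
    }
    // list is dropped here, one node at a time, without a deep stack
    println!("dropped 100_000 nodes iteratively");
}
```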
And that’s chapter one! Next up is An Ok Stack.