Mononym: Type-Level Named Values in Rust - Part 1: Demo and Implementation

Posted October 31, 2021 ‐ 16 min read

Announcement

I am happy to announce the release of mononym, a Rust library for creating unique type-level names for each value in Rust.

Mononym provides the core type Named<N, T>, which represents a named value of type T with a unique type N as its name. Mononym guarantees that there can be no two values with the same name. With that, the N type serves as a unique representation of a Rust value at the type level.

Mononym enables the use of the design pattern Ghosts of Departed Proofs in Rust. It provides macros that simplify the definition of dependent pairs and proof objects in Rust. Although there is still limited support for a full dependently-typed programming in Rust, Mononym helps us move a small step toward that direction by making it possible to refer to values in types.

You can find out more about Mononym at the project GitHub, and try it out by adding the mononym crate to your Rust project.

Demo Tutorial

As a quick demo, Mononym can be used to construct named values and create proofs that relate multiple values as follows:

pub mod less_than_eq
{
  use mononym::*;

  proof! {
    LessThanEq(x: u32, y: u32);
  }

  pub fn check_less_than_eq<XVal: HasType<u32>, YVal: HasType<u32>>(
    x: &Named<XVal, u32>,
    y: &Named<YVal, u32>,
  ) -> Option<LessThanEq<XVal, YVal>>
  {
    if x.value() <= y.value() {
      Some(LessThanEq::new())
    } else {
      None
    }
  }
}

We define a less_than_eq module that defines a LessThanEq proof type using the proof! macro provided by Mononym. The function check_less_than_eq is then defined to accept two named values x and y, with the type names XVal and YVal representing the values x and y at the type level. In the function body, it checks that if x (XVal) is indeed less than or equal to y (YVal), it would return the proof in the form of LessThanEq<XVal, YVal>.

We can then write a main program that use the less_than_eq module as follows:

use less_than_eq::*;
use mononym::*;

with_seed(|seed| {
  let (seed1, seed2) = seed.replicate();
  let x: Named<_, u32> = seed1.new_named(2);
  let y: Named<_, u32> = seed2.new_named(4);

  let x_is_less_than_y: LessThanEq<_, _> =
    check_less_than_eq(&x, &y).expect("should get proof that x <= y");

  assert!(check_less_than_eq(&y, &x).is_none());
});

We start by calling the with_seed function with a continuation closure. The with_seed function generates a fresh seed type which is then given to the closure as the seed variable. We then call seed.replicate() to create two new copies of seed, because it is an affine value in Rust that can be used at most once.

Each seed value can be used once to generate a named value. We then assign the variable x to seed1.new_named(2), which has the type Named<_, u32>. Mononym provides the Named type to represent Rust values with unique name types. In this case, the _ is used in the first position of the Named type for x, because the new_named method returns a Named type with an opaque type as the name.

Similarly, the variable y is assigned to seed2.new_named(4). Note that even though y also have the type Named<_, u32>, it is in fact a different type than the type of x. With Mononym guaranteeing the uniqueness of name types, we can conceptually fill in the _ and refer to the name type of x being XVal, and the name type of y being YVal which is different from XVal.

We call check_less_than_eq(&x, &y), and expect the function to return a proof that x with a value of 2 is indeed less than y with a value of 4. Similarly, if we call check_less_than_eq(&y, &x), we expect None to be returned and no proof should exist for 4 being less than 2.

Note that unlike in fully formalized languages like Coq, Mononym do not check that the proof LessThan<XVal, YVal> can really only be constructed if and only if x ≤ y. It is up to the implementer of functions such as check_less_than to ensure that the construction of proofs match the underlying invariant.

Nevertheless, proofs that are constructed using mononym are useful for encoding pre- and post-conditions for functions so that they can be composed in a declarative way. For example, we can define another non_zero module that produce proof that a number is non-zero, and together we can create a percentage module that converts the division of two numbers into percentage form if and only if x ≤ y and y is not zero:

pub mod non_zero
{
  use mononym::*;

  proof! {
    NonZero(num: u32);
  }

  pub fn check_non_zero<NumVal: HasType<u32>>(
    num: &Named<NumVal, u32>
  ) -> Option<NonZero<NumVal>>
  {
    if *num.value() != 0 {
      Some(NonZero::new())
    } else {
      None
    }
  }
}

pub mod percentage
{
  use mononym::*;

  use super::{
    less_than_eq::LessThanEq,
    non_zero::NonZero,
  };

  pub fn to_percentage<
    NumeratorVal: HasType<u32>,
    DenominatorVal: HasType<u32>,
  >(
    numerator: &Named<NumeratorVal, u32>,
    denominator: &Named<DenominatorVal, u32>,
    _numerator_lte_denom: &LessThanEq<NumeratorVal, DenominatorVal>,
    _denom_not_zero: &NonZero<DenominatorVal>,
  ) -> f64
  {
    let numerator: f64 = (*numerator.value()).into();
    let denominator: f64 = (*denominator.value()).into();
    numerator / denominator * 100.0
  }
}

Similar to the less_than_eq module, the check_non_zero function in the non_zero module checks and may return a proof NonZero<NumVal> for a value being non-zero.

The percentage module then defines the to_percentage function to make use of both the proofs of LessThanEq<XVal, YVal> and NonZero<YVal> before returning a f64 percentage value. Assuming that the proofs in less_than_eq and non_zero are constructed properly, we can guarantee that the f64 returned by to_percentage is always between 0 and 100.

Using Mononym, the to_percentage function can be defined as a total function, as in it does not need to return an Option<f64> or Result<f64, Error> to handle the cases where either the numerator or denominator values are invalid.

We can write a test function that verifies that the to_percentage function works as we intended:

use less_than_eq::*;
use mononym::*;
use non_zero::*;
use percentage::*;

with_seed(|seed| {
  let (seed1, seed2) = seed.replicate();
  let x: Named<_, u32> = seed1.new_named(2);
  let y: Named<_, u32> = seed2.new_named(4);

  let x_is_less_than_y: LessThanEq<_, _> =
    check_less_than_eq(&x, &y).expect("should get proof that x <= y");

  let y_not_zero =
    check_non_zero(&y).expect("should get proof that y is non zero");

  let percent = to_percentage(&x, &y, &x_is_less_than_y, &y_not_zero);

  assert_eq!(percent, 50.0);
})

It is worth noting that the types provided by mononym also help ensure that the proofs we obtained correspond to the correct value. For example, if we accidentally got the proof of x being non_zero instead of y in check_non_zero(&x), the call to to_percentage would result in a type error informing us that the non-zero proof we have for x is invalid.

Although we cannot use Rust to formally prove that to_percentage will always return a valid percentage value between 0 and 100, Mononym can help reduce significantly the surface area of code that can potentially violate such invariant.

Since we know that the proofs LessThanEq and NonZero can only be produced by less_than_eq and non_zero, we do not need to worry about any other potential cases that to_percentage can be given invalid arguments, no matter how large and how complex our code base become.

We can also use Mononym together with techniques such as property based testing to further ensure that the behavior of to_percentage is correct. In such property-based test, the random generator can attempt to randomly generate named values, and call to_percentage only when valid proofs can be constructed using less_than_eq and non_zero. This would significantly reduce the number of test cases needed, as compared to a brute force generator that have to test the function with the cartesian product of u32 * u32. (Note that integration with testing framework is still a future work planned for Mononym)

Technical Overview

In this section, I will go through the technical details of how Mononym enforce uniquness of name types using language features that are unique to Rust, such as impl Trait and higher-ranked trait bounds. These features allow Mononym to have a simpler implementation of name types, as compared to existing implementations in languages like Haskell.

This section will assume that the reader already have some familiarity to type-driven development, and understand the usefulness of representing values as type-level names for constructing proofs. In the future part of the series, I will walk through readers who are unfamiliar with these concepts with more examples similar to the one shown in the demo section.

`impl Trait` as Opaque Types

At its core, Mononym makes use of the idea that the use of impl Trait in return position produces a new abstract type that is unique to that function.

Consider the following example:

trait Name {}
impl Name for () {}

fn foo() -> impl Name {}
fn bar() -> impl Name {}

fn same<T>(_: T, _: T) {}
same(foo(), foo());
same(bar(), bar());
same(foo(), bar()); // error
same(foo(), ()); // error
same(bar(), ()); // error

We define a dummy trait Name that is implemented only for (). We then define two functions foo() and bar() that return impl Name by returning (). Although both foo() and bar() are in effect returning the same concrete type (), they are considered different types by the Rust compiler.

To test whether the types of two values are equal, we define the function same<T>(_: T, _: T) that pass the compilation check if two values have the same type. When we try to compile the code, we will find that both same(foo(), foo()) and same(bar(), bar()) pass the compilation, but same(foo(), bar()) would result in a type error. Similarly, the check on same(foo(), ()) fails, as Rust treats the returned impl Name different than the underlying type ().

Generic-dependent Uniqueness of `impl Trait`

The use of impl Name in return position provides the first step toward defining unique names for each value. However the above code shows an obvious issue of using returned impl Name as unique type, as the test same(foo(), foo()) pass the compilation. This means that two calls to the same function returning impl Trait will return values of the same opaque type. If we want the name type to be truly unique, the test same(foo(), foo()) should have failed.

We can try to force the opaque type returned in a function returning impl Name unique by making the function generic:

trait Name {}
impl Name for () {}

fn type_name<T>(_: T) -> impl Name {}

fn same<T>(_: T, _: T) {}
same(type_name(()), type_name(()));
same(type_name(0_u64), type_name(0_u64));
same(type_name(()), type_name(0_u64)); // error
same(type_name(0_u32), type_name(0_u64)); // error
same(type_name(||{}), type_name(||{})) // error

In the above example, we define a type_name function that accepts a dummy value of any generic type, and return an impl Name. From there we can see that when type_name is called with the same type, the returned impl Name is considered the same type. As a result, the tests to same(type_name(()), type_name(())) and same(type_name(0_u64), type_name(0_u64)) pass the compilation, but tests such as same(type_name(()), type_name(0_u64)) fail the compilation.

Closure Expressions as Unique Type

In the last test, we also see that the compilation test for same(type_name(||{}), type_name(||{})) has failed. This is because each new closure expression in Rust produces a unique anonymous type. With this, we know that as long as we are providing different closure expressions, the returned impl Name would be considered different type by Rust.

Moving one step further, we can instead define the implementation for Name using the anonymous closure type:

use core::marker::PhantomData;

trait Name {}
struct SomeName<N>(PhantomData<N>);
impl <N> Name for SomeName<N> {}

fn unsafe_new_name<N>(_: N) -> impl Name {
    SomeName::<N>(PhantomData)
}

fn same<T>(_: T, _: T) {}

let f = ||{};
same(unsafe_new_name(f), unsafe_new_name(f));

fn foo() -> impl Name {
    unsafe_new_name(||{})
}
same(foo(), foo())

same(unsafe_new_name(||{}), unsafe_new_name(||{})); // error

We define a struct SomeName<N> that implements Name based on the unique type N. We then rename our name generator function to unsafe_new_name and make it return SomeName based on the given generic type N. We keep the SomeName struct and unsafe_new_name private, so that external users cannot create new impl Name that may violate the uniqueness guarantee.

We can see that at this stage the name creation is still unsafe, as user can still bind the closure expression to a variable like let f = ||{}, and then the resulting test same(unsafe_new_name(f), unsafe_new_name(f)) would pass. Similarly, if we define a function like foo() -> impl Name that calls unsafe_new_name(||{}) inside the function, the resulting test same(foo(), foo()) would still pass the compilation.

Name Seed

To guarantee that every opaque type behind an impl Name is unique, we need to ensure that not only the inner function that returns impl Name is generic, but also all possible functions that return impl Name must be generic as well.

While it is possible to ask end users to provide a unique type using ||{} at each function definition and and call, doing so can be error prone and confusing. Instead, we want to define a name seed that is a unique type itself, and use it to generate new names.

We implement the Seed type as follows:

use core::marker::PhantomData;

pub trait Name {}
struct SomeName<N>(PhantomData<N>);
impl <N> Name for SomeName<N> {}

pub struct Seed<N>(PhantomData<N>);

impl <N: Name> Seed<N> {
  pub fn new_name(self) -> impl Name
  {
    SomeName::<N>(PhantomData)
  }

  pub fn replicate(self) -> (Seed<impl Name>, Seed<impl Name>)
  {
    (Seed(PhantomData::<N>), Seed(PhantomData::<N>))
  }
}

The type Seed<N> is parameterized by a name N and provides two methods. The first method new_name(self) consumes the seed and returns a new impl Name (notice the lack of & in self). Since the seed is consumed when generating the name, the same seed cannot be used to generate another new name of the same type.

Although the seed is consumed during name generation, the second method replicate(self) consumes the original seed, and returns two new seeds with unique names in the form of Seed<impl Name>. Notice that the two Seed<impl Name> returned by replicate are considered different types by Rust, even when they have the same underlying concrete type Seed<N>. By calling replicate one or more times, we will be able to generate multiple names with unique types.

Since there is no public constructor function to create new Seed value, the only way external users can create new seed is by replicating existing seeds. In this way, external functions would have to always accept Seed<impl Name> as an argument somewhere along the function calls to be able to generate new names.

fn test(seed: Seed<impl Name>) {
  fn same<T>(_: T, _: T) {}
  let (seed1, seed2) = seed.replicate();
  same(seed1, seed2); // error
  same(seed1.new_name(), seed2.new_name()); // error
}

By treating our test function as an external function, it is forced to accept a Seed<impl Name> in order to generate new impl Names. We first use seed.replicate() to create two new seeds seed1 and seed2. When we compile the code, we can find out that the test same(seed1, seed2) fails, indicating that the two replicated seeds have different types. Similarly, the test same(seed1.new_name(), seed2.new_name()) fails because the two names are generated from different seeds. It is also not possible to do something like same(seed.new_name(), seed.new_name()), because the affine type system of Rust consumes the seed during name generation and do not allow the seed to be reused.

Named Values

The Seed type we defined earlier provides a new_name method that returns unique impl Name. While having a unique name is not very useful on its own, it can be used to define a Named<N, T> struct to assign unique names to a given value of type T. The Named struct is defined and used as follows:

use core::marker::PhantomData;

pub struct Named<N, T>(T, PhantomData<N>);

impl <N, T> Named<N, T> {
  pub fn value(&self) -> &T { &self.0 }
  pub fn into_value(self) -> T { self.0 }
}

pub trait Name {}
pub struct Seed<N>(PhantomData<N>);

impl <N: Name> Seed<N> {
  pub fn new_named<T>(self, value: T) -> Named<impl Name, T>
  {
    Named::<N, _>(value, PhantomData)
  }

   pub fn replicate(self) -> (Seed<impl Name>, Seed<impl Name>)
  {
    (Seed(PhantomData::<N>), Seed(PhantomData::<N>))
  }
}

The struct Named<N, T> is essentially a newtype wrapper around T, with the underlying value kept private. The Named type provides two public methods, value for getting a reference to the underlying value, and into_value to convert the named value to the underlying value.

The Seed type now provides a new_named method that accepts an owned value of type T, and returns a Named<impl Name, T>. Because the impl Name is nested inside Named, we can guarantee that the new name given to the value is unique, provided that the Seed type is unique.

fn test(seed: Seed<impl Name>) {
  fn same<T>(_: T, _: T) {}
  let (seed1, seed2) = seed.replicate();
  same(seed1.new_named(1), seed2.new_named(1)); // error
}

Similar to earlier, we can test that two named values indeed have different names by writing a test function that accepts a Seed<impl Name>. After replicating the seed, we can verify that the test same(seed1.new_named(1), seed2.new_named(1)) fails with error during compilation. This shows that the Named<impl Name, i32> values returned by the two calls to new_name are indeed unique.

We can think of the type Named<N, T> as being a singleton type, that is, a type with only one possible value. With Rust's affine type system, the singleton guarantee is even stronger that we can never have two Rust values of type Named<N, T> with the same N type.

Unique Lifetime with Higher Ranked Trait Bounds

Our setup for generating uniquely named values is mostly complete, provided we are able to hand over the first unique Seed value to the main function to start generating new names. But we cannot simply expose a function like fn new_seed() -> Seed<impl Name>, as we know that two calls to the same function will return two values of the same type, thereby making them non-unique.

We know that in languages like Haskell, it is possible to generate unique types by using continuation-passing-style with higher-ranked continuations. While Rust do not currently support higher-ranked types, it instead supports higher-ranked trait bounds (HRTB) which can be used in similar way.

use core::marker::PhantomData;

pub trait Name {}
pub struct Life<'name>(PhantomData<*mut &'name ()>);
impl<'name> Name for Life<'name> {}

pub fn with_seed<R>(
  cont: impl for<'name> FnOnce(Seed<Life<'name>>) -> R
) -> R {
  cont(Seed(PhantomData))
}

pub struct Named<N, T>(T, PhantomData<N>);

impl <N, T> Named<N, T> {
  pub fn value(&self) -> &T { &self.0 }
  pub fn into_value(self) -> T { self.0 }
}

pub struct Seed<N>(PhantomData<N>);

impl <N: Name> Seed<N> {
  pub fn new_named<T>(self, value: T) -> Named<impl Name, T>
  {
    Named::<N, _>(value, PhantomData)
  }

   pub fn replicate(self) -> (Seed<impl Name>, Seed<impl Name>)
  {
    (Seed(PhantomData::<N>), Seed(PhantomData::<N>))
  }
}

We first come out with a different way of generating unique types, using the Life type that is parameterized by a unique lifetime. The struct is defined as struct Life<'name>(PhantomData<*mut &'name ()>). The inner type PhantomData<*mut &'name ()> makes Rust treats Life<'name> as if it is a raw pointer of type *mut &'name (). We specifically make it so that Rust treats 'name as an invariant phantom lifetime. This means that if we have two types Life<'name1> and Life<'name2>, Rust would consider them as different types even if there are partial overlaps such as 'name1: 'name2.

Using Life, we now simplify the problem of generating unique names to generating unique lifetimes. We then define the with_seed function, which accepts a continuation with a higher-ranked trait bound impl for<'name> FnOnce(Seed<Life<'name>>) -> R. The for<'name> part forces the continuation closure to work with all possible lifetimes. As a result, we can guarantee that the type Life<'name> is always unique inside the closure. By using Life<'name> as the unique type inside Seed<Life<'name>>, we ensure that the seed type given to the continuation closure is also unique.

fn same<T>(_: T, _: T) {}

with_seed(|seed| {
  let (seed1, seed2) = seed.replicate();
  same(seed1, seed2); // error
  same(seed1.new_named(1), seed2.new_named(1)); // error
});

with_seed(|seed1| {
  with_seed(|seed2| {
    same(seed1, seed2); // error
    same(seed1.new_named(1), seed2.new_named(1)); // error
  });
});

We can now repeat the same test we had earlier, but now with the tests running inside the closure given to with_seed. We can verify that after replicating the seed, the tests same(seed1, seed2) and same(seed1.new_named(1), seed2.new_named(1)) still fail with compilation error, indicating that the names generated are different.

We can also repeat the same test with two nested calls to with_seed, thereby getting two separate fresh seeds seed1 and seed2. Thanks to the magic of HRTB and invariant phantom lifetime, we can verify that even in this case the test same(seed1, seed2) still fails, indicating that Rust is treating the two underlying lifetimes differently. Similarly, the test same(seed1.new_named(1), seed2.new_named(1)) also fails, indicating that names generated by two different seeds are indeed different.

The techniques of using phantom lifetimes as names and using HRTB to generate new lifetimes is first explored in GhostCell. With that, we close the loop of unique name generation by requiring the top level program to generate the first seed using with_seed. Furthermore, since multiple calls to with_seed is safe, this means that there is no way for external users to do unsafe construction of two seeds or named values of the same type. From this, we can safely reason that assuming unsafe Rust is not used, whenever we get a value of type Named<N, T>, the type N is always uniquely assigned to the underlying value.

Up Next

This concludes the first part of the blog post series for Mononym. In the next part, we will go through how proof objects are defined using the proof! macro, and how newly named values can be returned together with proofs by defining dependent pair types using the exists! macro. We will also go through with more examples of how to use proof! and exists! to implement stronger guarantees to our code.