Learning Rust and Lemmy @lemmy.ml maegul (he/they) @lemmy.ml 3mo ago

So ... macros are fun!! (a bit of rant, maybe a kinda tutorial, and a quick hack)

Intro

Having read through the macros section of "The Book" (Chapter 19.6), I thought I would try to hack together a simple idea using macros as a way to get a proper feel for them.

The chapter was a little light, and declarative macros (using macro_rules!), which is what I'll be using below, seemed like a potentially very nice feature of the language ... the sort of thing that really makes the language malleable. Indeed, in poking around I've realised, perhaps naively, that macros are a pretty common tool for rust devs (or at least more common than I knew).

I'll rant for a bit first, which those new to rust macros may find interesting or informative (it's kinda a little tutorial) ... to see the implementation, go to "Implementation (without using a macro)" heading and what follows below.

Using a macro

Well, "declarative macros" (with macro_rules!) were pretty useful I found and easy to get going with (such that it makes perfect sense that they're used more frequently than I thought).

It's basically pattern matching on arbitrary code and then emitting new code through a templating-like mechanism (pretty intuitive).
The type system and rust-analyzer LSP understand what you're emitting perfectly well in my experience. It really felt properly native to rust.

The Elements of writing patterns with "Declarative macros"

Use macro_rules! to declare a new macro

Yep, it's also a macro!

Create a structure just like a match expression

Except the pattern will match on the code provided to the new macro
... And uses special syntax for matching on generic parts or fragments of the code
... And it returns new code (not an expression or value).

Write a pattern as just rust code with "generic code fragment" elements

You write the code you're going to match on, but for the parts that you want to capture as they will vary from call to call, you specify variables (or more technically, "metavariables").
- You can think of these as the "arguments" of the macro, as they're the parts that are operated on while the rest is literally just static text/code.
These variables will have a name and a type.
The name is prefixed with a dollar sign $ like so: $GENERIC_CODE.
And its type follows a colon as in ordinary rust: $GENERIC_CODE:expr
- These types are actually syntax specifiers. They specify what part of rust syntax will appear in the fragment.
- Presumably, they link right back into the rust parser and are part of how these macros integrate pretty seamlessly with the type system and borrow checker or compiler.
- Here's a decent list from rust-by-example (you can get a full list in the rust reference on macro "metavariables"):
  - block
  - expr is used for expressions
  - ident is used for variable/function names
  - item
  - literal is used for literal constants
  - pat (pattern)
  - path
  - stmt (statement)
  - tt (token tree)
  - ty (type)
  - vis (visibility qualifier)
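To make a few of these specifiers concrete, here's a minimal sketch using ident, ty and expr together. The macro name make_const_fn and the functions it generates are made up for illustration, and it peeks ahead a little at the emitting side (the => { ... } part) covered further down.

```rust
// Hypothetical macro (not from the post): generates a function called `$name`
// that returns `$value` as type `$t`.
// `ident` matches a name, `ty` matches a type, `expr` matches any expression.
macro_rules! make_const_fn {
    ($name:ident, $t:ty, $value:expr) => {
        fn $name() -> $t {
            $value
        }
    };
}

// Each call emits an ordinary function definition.
make_const_fn!(answer, i32, 40 + 2);
make_const_fn!(greeting, &'static str, "hello");
```

Passing something that isn't valid for a specifier (say, 40 + 2 where the ident goes) is rejected when the macro is matched, which is the "syntax specifier" part doing its job.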

So a basic pattern that matches on any struct while capturing the struct's name, its only field's name, and its type would be:

macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty
        }
    )
}

Now, $name, $field and $field_type will be captured for any single-field struct (and, presumably, the validity of the syntax enforced by the "fragment specifiers").

Capture any repeated patterns with + or *

Yea, just like regex
Wrap the repeated pattern in $( ... )
Place whatever separating code that will occur between the repeats after the wrapping parentheses:
- EG, a separating comma: $( ... ),
Place the repetition counter/operator after the separator: $( ... ),+

Example

So, to capture multiple fields in a struct (expanding from the example above):

macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty,
            $( $ff:ident : $ff_type: ty),*
        }
    )
}

This will capture the first field and then any additional fields.
- The way you use these repeats mirrors the way they're captured: they all get used in the same way and rust will simply repeat the new code for each repeated capture.
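As a small runnable sketch of a repeated capture on its own (field_names is a made-up macro, and the emitting side it uses is explained in the next section), this one matches any number of name: type pairs and emits the names as strings:

```rust
// Hypothetical macro (not from the post): captures a comma-separated list of
// `name: type` pairs and emits an array of the names as string slices.
// The types are matched by the pattern but simply not used in the output.
macro_rules! field_names {
    ( $( $field:ident : $field_type:ty ),* ) => {
        [ $( stringify!($field) ),* ]
    };
}

// Expands to ["a", "b", "c"].
fn names() -> [&'static str; 3] {
    field_names!(a: i32, b: String, c: Vec<String>)
}
```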

Writing the emitted or new code

Use => as with match expressions

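Actually, it's => { ... }, IE with braces (the emitted side has to be wrapped in delimiters; parentheses or square brackets are accepted too, braces are just the usual choice)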

Write the new emitted code

All the new code is simply written between the braces
Captured "variables" or "metavariables" can be used just as they were captured: $GENERIC_CODE.
Except types aren't needed here
Captured repeats are expressed within wrapped parentheses just as they were captured: $( ... ),*, including the separator (which can be different from the one used in the capture).
- The code inside the parentheses can differ from that captured (that's the point after all), but at least one of the variables from the captured fragment has to appear in the emitted fragment so that rust knows which set of repeats to use.
- A useful feature here is that the repeats can be used multiple times, in different ways in different parts of the emitted code (the example at the end will demonstrate this).
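A quick sketch of those last two points (a different separator on the emitting side, and the same repeat used more than once). The macro sum_and_list is hypothetical and only there to show the mechanics:

```rust
// Hypothetical macro (not from the post): captures comma-separated expressions,
// then reuses the same repeat twice in the emitted code.
macro_rules! sum_and_list {
    ( $( $x:expr ),* ) => {
        (
            // No separator here: each repeat emits `+ $x`, giving `0 + a + b + c`.
            0 $( + $x )*,
            // Comma separator here, just like in the capture.
            vec![ $( $x ),* ],
        )
    };
}

// Expands to `(0 + 1 + 2 + 3, vec![1, 2, 3],)`.
fn sum_and_list_demo() -> (i32, Vec<i32>) {
    sum_and_list!(1, 2, 3)
}
```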

Example

For example, we could convert the struct to an enum where each field becomes a variant with an enclosed value of the same type as the corresponding struct field:

macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty,
            $( $ff:ident : $ff_type: ty),*
        }
    ) => {
        enum $name {
            $field($field_type),
            $( $ff($ff_type) ),*
        }
    }
}

With the above macro defined ... this code ...

my_new_macro! {
    struct Test {
        a: i32,
        b: String,
        c: Vec<String>
    }
}

... will emit this code ...

enum Test {
    a(i32),
    b(String),
    c(Vec<String>)
}
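To check that the emitted enum really is an ordinary enum, here's a self-contained sketch that repeats the macro from above and then matches on the result. The describe function is a hypothetical helper, not something the macro generates. Expect a naming lint warning from the compiler, since the variants keep the lowercase field names.

```rust
macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty,
            $( $ff:ident : $ff_type: ty),*
        }
    ) => {
        enum $name {
            $field($field_type),
            $( $ff($ff_type) ),*
        }
    }
}

my_new_macro! {
    struct Test {
        a: i32,
        b: String,
        c: Vec<String>
    }
}

// Hypothetical helper: `Test` can be matched on like any hand-written enum.
fn describe(t: &Test) -> String {
    match t {
        Test::a(n) => format!("a holds {}", n),
        Test::b(s) => format!("b holds {}", s),
        Test::c(v) => format!("c holds {} strings", v.len()),
    }
}
```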

Application: "The code" before making it more efficient with a macro

Basically ... a simple system for custom types to represent physical units.

The Concept (and a rant)

A basic pattern I've sometimes implemented on my own (without bothering with dependencies that is) is creating some basic representation of physical units in the type system. Things like meters or centimetres and degrees or radians etc.

If your code relies on such and performs conversions at any point, it is way too easy to fuck up, and therefore worth, IMO, creating some safety around. NASA provides an obvious warning (the Mars Climate Orbiter was lost in 1999 after one piece of software produced pound-force seconds where another expected newton-seconds). As does, IMO, common sense and experience: most scientists and physical engineers learn the importance of "dimensional analysis" of their calculations.

In fact, it's the sort of thing that should arguably be built into any language that takes types seriously (like eg rust). I feel like there could be an argument that it'd be as reasonable as the numeric abstractions we've worked into programming??

At the bottom I'll link whatever crates I found for doing a better job of this in rust (one of which seemed particularly interesting).

Implementation (without using a macro)

The essential design is (again, this is basic):

A single type for a particular dimension (eg time or length)
Method(s) for converting between units of that dimension
Ideally, flags or constants of some sort for the units (thinking of enum variants here)
- These could be methods too

#[derive(Debug)]
pub enum TimeUnits {s, ms, us, }

#[derive(Debug)]
pub struct Time {
    pub value: f64,
    pub unit: TimeUnits,
}

impl Time {
    pub fn new<T: Into<f64>>(value: T, unit: TimeUnits) -> Self {
        Self {value: value.into(), unit}
    }

    fn unit_conv_val(unit: &TimeUnits) -> f64 {
        match unit {
            TimeUnits::s => 1.0,
            TimeUnits::ms => 0.001,
            TimeUnits::us => 0.000001,
        }
    }

    fn conversion_factor(&self, unit_b: &TimeUnits) -> f64 {
        Self::unit_conv_val(&self.unit) / Self::unit_conv_val(unit_b)
    }

    pub fn convert(&self, unit: TimeUnits) -> Self {
        Self {
            value: (self.value * self.conversion_factor(&unit)),
            unit
        }
    }
}

So, we've got:

An enum TimeUnits representing the various units of time we'll be using
A struct Time that will be any given value of "time" expressed in any given unit
With methods for converting from any unit to any other unit, the heart of which is a match expression on the unit that hardcodes the conversion values (relative to the base unit of seconds, see unit_conv_val()); the conversion_factor() method then generalises these into a factor between any two units.

Note: I'm using T: Into<f64> for the new() method and f64 for Time.value as that is the easiest way I know to accept either integers or floats as values. It works because i32 (along with the smaller integer types and f32) can be converted losslessly to f64; i64 and u64 can't, so they don't implement Into<f64>.
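A tiny sketch of that Into<f64> bound in isolation (as_f64 and into_f64_demo are hypothetical names, just mirroring what new() does with its value argument):

```rust
// Hypothetical helper mirroring the `T: Into<f64>` bound used by `Time::new`.
fn as_f64<T: Into<f64>>(value: T) -> f64 {
    value.into()
}

fn into_f64_demo() -> [f64; 3] {
    // i32, f32 and u8 all convert losslessly, so all three calls compile.
    // `as_f64(3_i64)` would be a compile error: i64 doesn't implement Into<f64>.
    [as_f64(3_i32), as_f64(2.5_f32), as_f64(7_u8)]
}
```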

Obviously you can go further than this. But the essential point is that each dimension (time, length etc) needs its own new type, with all the desired functionality implemented manually or through some handy use of blanket trait implementations.

Defining a macro instead

For something pretty basic, the above is an annoying amount of boilerplate!! May as well rely on a dependency!?

Well, we can write the boilerplate once in a macro and then only provide the informative parts!

In the case of the above, the only parts that matter are:

The name of the type/struct
The name of the units enum type we'll use (as they'll flag units throughout the codebase)
The names of the units we'll use and their value relative to the base unit.

IE, for the above, we only need to write something like:

struct Time {
    value: f64,
    unit: TimeUnits,
    s: 1.0,
    ms: 0.001,
    us: 0.000001
}

Note: this isn't valid rust! But that doesn't matter, so long as we can write a pattern that matches it and emit valid rust from the macro, it's all good! (Which means we can write our own little DSLs with native macros!!)

To capture this, all we need are what we've already done above: capture the first two fields and their types, then capture the remaining "field names" and their values in a repeating pattern.

Implementation of the macro

The pattern

macro_rules! unit_gen {
    (
        struct $name:ident {
            $v:ident: f64,
            $u:ident: $u_enum:ident,
            $( $un:ident : $value:expr ),+
        }
    )
}

Note the repeating fragment doesn't provide a type for the field, but instead captures an expression (expr) after it, despite that being invalid rust.

The Full Macro

macro_rules! unit_gen {
    (
        struct $name:ident {
            $v:ident: f64,
            $u:ident: $u_enum:ident,
            $( $un:ident : $value:expr ),+
        }
    ) => {
        #[derive(Debug)]
        pub struct $name {
            pub $v: f64,
            pub $u: $u_enum,
        }
        impl $name {
            fn unit_conv_val(unit: &$u_enum) -> f64 {
                match unit {
                $(
                    $u_enum::$un => $value
                ),+
                }
            }
            fn conversion_factor(&self, unit_b: &$u_enum) -> f64 {
                Self::unit_conv_val(&self.$u) / Self::unit_conv_val(unit_b)
            }
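            pub fn convert(&self, unit: $u_enum) -> Self {
                // Use the captured field names ($v, $u) rather than hardcoding
                // `value` and `unit`, so the macro works whatever the fields are called
                Self {
                    $v: (self.$v * self.conversion_factor(&unit)),
                    $u: unit
                }
            }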
        }
        #[derive(Debug)]
        pub enum $u_enum {
            $( $un ),+
        }
    }
}

Note the repeating capture is used twice here in different ways.

The capture is: $( $un:ident : $value:expr ),+

And in the emitted code:

It is used in the unit_conv_val method as: $( $u_enum::$un => $value ),+
- Here the ident $un is being used as the variant of the enum that is defined later in the emitted code
- Where $u_enum is also used without issue, as the name/type of the enum, despite not being part of the repeated capture but another variable captured outside of the repeated fragments.
It is then used in the definition of the variants of the enum: $( $un ),+
- Here, only one of the captured variables is used, which is perfectly fine.

Usage

Now all of the boilerplate above is unnecessary, and we can just write:

unit_gen!{
    struct Time {
        value: f64,
        unit: TimeUnits,
        s: 1.0,
        ms: 0.001,
        us: 0.000001
    }
}

Usage from main.rs:

use units::Time;
use units::TimeUnits::{s, ms, us};

fn main() {

    let x = Time{value: 1.0, unit: s};
    let y = x.convert(us);

    println!("{:?}", x);
    println!("{:?}", y);
}

Output:

Time { value: 1.0, unit: s }
Time { value: 1000000.0, unit: us }

Note how the struct and enum created by the emitted code are properly available from the module as though they were written manually or directly.
In fact, my LSP (rust-analyzer) was able to autocomplete these immediately once the macro was written and called.
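The real payoff is reusing the macro for more than one dimension. Below is a self-contained sketch you can paste into a single file: it repeats unit_gen! (with convert() written against the captured field names $v and $u, so the fields don't have to be called value and unit) and then generates both Time and a second type. Length, LengthUnits, the field names amount/in_unit and the helper functions are all made up for illustration. The lowercase enum variants will trigger naming lint warnings.

```rust
macro_rules! unit_gen {
    (
        struct $name:ident {
            $v:ident: f64,
            $u:ident: $u_enum:ident,
            $( $un:ident : $value:expr ),+
        }
    ) => {
        #[derive(Debug)]
        pub struct $name {
            pub $v: f64,
            pub $u: $u_enum,
        }
        impl $name {
            // Value of each unit relative to the base unit.
            fn unit_conv_val(unit: &$u_enum) -> f64 {
                match unit {
                    $( $u_enum::$un => $value ),+
                }
            }
            fn conversion_factor(&self, unit_b: &$u_enum) -> f64 {
                Self::unit_conv_val(&self.$u) / Self::unit_conv_val(unit_b)
            }
            pub fn convert(&self, unit: $u_enum) -> Self {
                // Captured field names, so any field naming works.
                Self {
                    $v: self.$v * self.conversion_factor(&unit),
                    $u: unit,
                }
            }
        }
        #[derive(Debug)]
        pub enum $u_enum {
            $( $un ),+
        }
    };
}

unit_gen! {
    struct Time { value: f64, unit: TimeUnits, s: 1.0, ms: 0.001, us: 0.000001 }
}

// Hypothetical second dimension, with different field names.
unit_gen! {
    struct Length { amount: f64, in_unit: LengthUnits, m: 1.0, cm: 0.01, km: 1000.0 }
}

// Hypothetical helpers exercising the generated types.
fn s_to_ms(s: f64) -> f64 {
    Time { value: s, unit: TimeUnits::s }.convert(TimeUnits::ms).value
}

fn km_to_m(km: f64) -> f64 {
    Length { amount: km, in_unit: LengthUnits::km }.convert(LengthUnits::m).amount
}
```

Each unit_gen! call emits its own struct, impl and enum, so the two dimensions stay separate types and can't be mixed up by accident.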

Crates for unit systems

I did a brief search for actual units systems and found the following

`dimensioned`

dimensioned documentation

Easily the most interesting to me (from my quick glance), as it seems to have created the most native and complete representation of physical units in the type system
It creates, through types, a 7-dimensional space, one for each SI base unit
This allows all possible units to be represented as a reduction to a point in this space.
- EG, if the dimensions are [seconds, meters, kgs, amperes, kelvins, moles, candelas], then the Newton, m.kg / s^2 would be [-2, 1, 1, 0, 0, 0, 0].
This allows all units to be mapped directly to this consistent representation (interesting!!), and all operations to then be done easily and systematically.
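To illustrate the "point in a 7-dimensional space" idea, here's a runtime sketch where a unit is just an array of exponents, and multiplying or dividing units adds or subtracts them. To be clear, this is not dimensioned's API or implementation (which, from the docs, works through types rather than runtime values); Dim and the constants are hypothetical names.

```rust
use std::ops::{Div, Mul};

// Exponents over [seconds, meters, kgs, amperes, kelvins, moles, candelas].
// Illustration only: a runtime stand-in for what the crate does in the type system.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Dim([i8; 7]);

const SECOND: Dim = Dim([1, 0, 0, 0, 0, 0, 0]);
const METER: Dim = Dim([0, 1, 0, 0, 0, 0, 0]);
const KILOGRAM: Dim = Dim([0, 0, 1, 0, 0, 0, 0]);

impl Mul for Dim {
    type Output = Dim;
    // Multiplying units adds their exponents.
    fn mul(self, rhs: Dim) -> Dim {
        let mut out = self.0;
        for (o, r) in out.iter_mut().zip(rhs.0.iter()) {
            *o += *r;
        }
        Dim(out)
    }
}

impl Div for Dim {
    type Output = Dim;
    // Dividing units subtracts their exponents.
    fn div(self, rhs: Dim) -> Dim {
        let mut out = self.0;
        for (o, r) in out.iter_mut().zip(rhs.0.iter()) {
            *o -= *r;
        }
        Dim(out)
    }
}

// The Newton, m.kg / s^2.
fn newton() -> Dim {
    METER * KILOGRAM / (SECOND * SECOND)
}
```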

Unfortunately, I'm not sure if the repository is still maintained.

uom

uom documentation

This might actually be good too, I just haven't looked into it much
It also seems to be currently maintained

F#

Interestingly, F# actually has a system built in!

See learning documentation on F# here
Also this older (2008) series of blogs on the feature here


14 comments

For ~~figuring out how to write macros~~ anyone wanting to learn about more advanced macros beyond macro_rules, I can recommend this: https://github.com/dtolnay/proc-macro-workshop

Basically, you clone that repo, pick one of the projects, uncomment the first test in the respective tests/progress.rs file and read the steps in the respective unit test file. Then you try to implement a macro to fulfill the test.

It should be said that it isn't spoon-feeding you, you will still need to read actual documentation for macros. But with its test harness, you get a quick feedback loop and it gives at least some pointers for where to start learning.
- Nice!!
  
  It seems that it covers mainly procedural macros, which for those who don’t know are different from what I cover here. They are more involved but more powerful.
  
  Ah, you're right. I've mainly worked through the sorted-chapter and thought the seq!()-macro would be a macro_rules thing, but apparently that's a proc_macro-thing with TokenStream parsing and such, too. I didn't even know that's an option, although it makes perfect sense. 🙃
  
  Yea, and proc_macro TokenStream macros definitely seem worthwhile knowing about without necessarily ever wanting to reach for them, at least not often.
  
  Declarative macros though (using macro_rules! as in the top post) surprised me in how straightforward and useful they are. Basically boilerplate machines built right into the language. I'd previously gotten the impression that all macros were like proc_macro.
  
  It'd be interesting to see some challenges with macro_rules!. I'm not sure there's much scope to challenge people though ... they're pretty simple. But there are some tricks in the system AFAICT I didn't touch on here.
  
  Multiple alternative patterns can be matched on in a single macro (just like match expressions)
  
  Patterns can match on invalid rust, where the tt syntax type, which stands for "Token Tree" and accepts, I think, any arbitrary series of tokens, can be powerful
  
  A macro can call itself recursively
  
  Together it seems you can put together a pseudo parser, with recursive calls passing in flags or markers to dictate which branch the call goes down (I found this suggestion on users.rust-lang to use a "switch" token along with the above tricks).
  
  Yeah, I'm only looking into proc_macros, because I'm working on a library. In application code, I do think they're essentially never going to be worth the complexity that they introduce. But in a library, I can deal with the complexity and hopefully my users don't have to think about it.
  
  Having said that, I actually don't think proc_macros are insanely complex. There's a bit of a learning curve to them; in particular, parsing with the syn-crate takes a moment to understand.
  But once you've parsed things, you can use the quote-crate to do templating in quite a similar fashion as macro_rules. The thing is just that all the simple cases are covered by the simpler macro_rules, so you just wouldn't reach for proc_macro most of the time in application code.
  
  yea, and it would probably be worth just a quick hack to get a feel for it (procedural macros) at least once so you know what you can reach for when the time comes. As you say, it seems involved, but not really that insanely complex ... and knowing the bits that make the language "your own" can be really valuable. Cheers for the workshop thing though, definitely worth knowing about!
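The macro_rules! tricks listed in the replies above (multiple patterns, tt, recursion, and a "switch" token) can be sketched roughly like this. Both macros and the demo function are hypothetical, and this is only one way the switch-token idea might look, not necessarily what the users.rust-lang suggestion had in mind:

```rust
// Hypothetical macro: counts token trees using several arms, `tt` and recursion.
macro_rules! count_tts {
    // Base case: nothing left to count.
    () => { 0usize };
    // Recursive case: peel off one token tree and count the rest.
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// Hypothetical "switch" macro: a leading marker token picks the arm.
macro_rules! op {
    (@add $a:expr, $b:expr) => { $a + $b };
    (@mul $a:expr, $b:expr) => { $a * $b };
    // Entry point that calls back into the macro with a marker.
    (double $a:expr) => { op!(@mul $a, 2) };
}

fn tricks_demo() -> (usize, i32, i32) {
    (
        // Not valid rust, but it is three token trees: `struct`, `Foo`, `{ ... }`.
        count_tts!(struct Foo { not: valid, rust }),
        op!(@add 1, 2),
        op!(double 21),
    )
}
```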
