Advent of Code 2020 in Rust day 4: parser error locations with nom

In day 02 we wrote a simple parser using nom.

day-02-parser.rs

rust

fn password_line(input: &str) -> IResult<&str, PasswordLine> {
    let (input, lower) = digit1(input)?;
    let (input, _) = tag("-")(input)?;
    let (input, upper) = digit1(input)?;
    let (input, _) = tag(" ")(input)?;
    let (input, parsed_character) = anychar(input)?;
    let (input, _) = tag(": ")(input)?;
    let (input, password) = alpha1(input)?;

    Ok((
        input,
        PasswordLine {
            lower: lower.parse::<u8>().unwrap(),
            upper: upper.parse::<u8>().unwrap(),
            character: parsed_character,
            password,
        },
    ))
}

In day 04, we write a more complicated parser that can handle a variety of attributes appearing in any order.

nom_locate and LocatedSpan

I went back through day 4's parser and replaced the &str input with nom_locate's LocatedSpan. This lets us grab positional data (byte offset and line number) for tokens at any point in the parse. I want to use this information in a custom error type for invalid documents (although in Advent of Code the input is usually not malformed).

span-example.rs

rust

fn cid(input: Span) -> IResult<Span, PassportParse> {
    let (input, _) = tag("cid:")(input)?;
    let (input, _) = digit1(input)?;
    Ok((input, CID(())))
}

Spans acquired via nom_locate's position parser have empty fragments:

rust

{
    offset: 21,
    line: 1,
    fragment: "",
    extra: (),
}

Whereas if we dbg! the input spans, they contain the remaining input as their fragment:

rust

Span {
    offset: 46,
    line: 2,
    fragment: "\npid:545766238 ecl:hzl\neyr:2022",
    extra: (),
}
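To make the difference between those two dumps concrete, here is a hand-rolled, std-only sketch of the bookkeeping a located span does. It is not nom_locate's implementation: `MiniSpan`, `take`, `position`, `LocatedError`, and `error_at` are hypothetical names invented for illustration, and only the field names mirror the dbg! output above.

```rust
/// Hypothetical stand-in for a located span: the remaining input plus where it starts.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MiniSpan<'a> {
    offset: usize, // byte offset from the start of the original input
    line: u32,     // 1-based line number
    fragment: &'a str,
}

impl<'a> MiniSpan<'a> {
    fn new(input: &'a str) -> Self {
        MiniSpan { offset: 0, line: 1, fragment: input }
    }

    /// Consume `n` bytes, returning the consumed span and the remaining span.
    /// Panics if `n` is past the end or not on a char boundary (fine for a sketch).
    fn take(self, n: usize) -> (MiniSpan<'a>, MiniSpan<'a>) {
        let (head, tail) = self.fragment.split_at(n);
        let newlines = head.matches('\n').count() as u32;
        let consumed = MiniSpan { fragment: head, ..self };
        let rest = MiniSpan {
            offset: self.offset + n,
            line: self.line + newlines,
            fragment: tail,
        };
        (consumed, rest)
    }

    /// A zero-width marker at the current location: same offset and line, empty fragment.
    fn position(self) -> MiniSpan<'a> {
        MiniSpan { fragment: "", ..self }
    }
}

/// Hypothetical custom error that records where parsing failed.
#[derive(Debug, PartialEq)]
struct LocatedError {
    line: u32,
    offset: usize,
    message: String,
}

fn error_at(span: MiniSpan, message: &str) -> LocatedError {
    LocatedError {
        line: span.line,
        offset: span.offset,
        message: message.to_string(),
    }
}
```

Consuming `"ecl:gry\n"` from the front of an input moves the remaining span to offset 8 on line 2, and `error_at` on that span would carry those two numbers into the error. That is the information we want the real LocatedSpan to give us.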

Sharing Functionality

One nice property of parser combinators is that we can build up our own parsing functions and reuse them. This example shows a year parser that is used to implement the parsers for byr, iyr, and eyr, which all have similar requirements: a prefix followed by a year within a given range.

composing-parsers.rs

rust

fn year<'a>(prefix: &str, lower: usize, upper: usize, input: Span<'a>) -> IResult<Span<'a>, usize> {
    let (input, _) = tag(prefix)(input)?;
    let (input, year) = digit1(input)?;
    match year.parse::<usize>() {
        Ok(digits) => {
            if digits >= lower && digits <= upper {
                Ok((input, digits))
            } else {
                Err(nom::Err::Error(nom::error::Error {
                    input,
                    code: nom::error::ErrorKind::Digit,
                }))
            }
        }
        _ => Err(nom::Err::Error(nom::error::Error {
            input,
            code: nom::error::ErrorKind::Digit,
        })),
    }
}
fn byr(input: Span) -> IResult<Span, PassportParse> {
    year("byr:", 1920, 2002, input).map(|(i, r)| (i, BYR(r)))
}
fn iyr(input: Span) -> IResult<Span, PassportParse> {
    year("iyr:", 2010, 2020, input).map(|(i, r)| (i, IYR(r)))
}
fn eyr(input: Span) -> IResult<Span, PassportParse> {
    year("eyr:", 2020, 2030, input).map(|(i, r)| (i, EYR(r)))
}

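We can .map over each result to turn the parsed year into the PassportParse type. Every parser passed to alt has to return the same output type, so each field gets wrapped in its own PassportParse variant.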
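The reason the output types have to line up is easier to see in a std-only analogue of "try each parser in order". This is a sketch under assumptions, not nom's code: `Field`, `FieldParser`, and `first_of` are hypothetical names, and `Option` stands in for `IResult`, so a failure here carries no error information. Real `alt` takes a tuple of parsers and reports an error when they all fail.

```rust
/// Hypothetical shared output type, playing the role of PassportParse.
#[derive(Debug, PartialEq)]
enum Field {
    Byr(usize),
    Iyr(usize),
}

/// Every field parser has this one signature, so they can live in one list.
type FieldParser = fn(&str) -> Option<(&str, Field)>;

/// Parse `prefix` followed by digits in `lower..=upper`.
/// Returns the remaining input and the year, or None on any mismatch.
fn year<'a>(prefix: &str, lower: usize, upper: usize, input: &'a str) -> Option<(&'a str, usize)> {
    let rest = input.strip_prefix(prefix)?;
    // Find where the run of ASCII digits ends.
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    let digits = rest[..end].parse::<usize>().ok()?;
    if (lower..=upper).contains(&digits) {
        Some((&rest[end..], digits))
    } else {
        None
    }
}

fn byr(input: &str) -> Option<(&str, Field)> {
    year("byr:", 1920, 2002, input).map(|(i, r)| (i, Field::Byr(r)))
}

fn iyr(input: &str) -> Option<(&str, Field)> {
    year("iyr:", 2010, 2020, input).map(|(i, r)| (i, Field::Iyr(r)))
}

/// Try each parser in order and return the first success.
fn first_of<'a>(parsers: &[FieldParser], input: &'a str) -> Option<(&'a str, Field)> {
    parsers.iter().find_map(|p| p(input))
}
```

With `let parsers: [FieldParser; 2] = [byr, iyr];`, calling `first_of(&parsers, "iyr:2015 rest")` falls through `byr` and succeeds with `iyr`. Without the `.map` into `Field`, the functions would have different return types and could not share a list, which is the same constraint `alt` places on its parsers.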

Performance

The runtime of the parser is about 580 us.

| example | lower bound | best guess | upper bound |
| ------- | ----------- | ---------- | ----------- |
| nom     | 578.63 us   | 581.37 us  | 584.33 us   |
