70 %
Chris Biscardi

Advent of Code 2020 in Rust day 2: regex vs parser combinators with nom

tldr;

Parsing the input with nom is faster than using the regex crate.

Setup

The input for today looks something like this:

1-3 a: abcde
1-3 b: cdefg
2-9 c: ccccccccc

I used the same scaffolding for each part and set up a process_password_line function to process each line.

rust
pub fn process_part1(input: &str) -> usize {
input
.lines()
.filter_map(|line| match process_password_line(line) {
Ok(true) => Some(true),
Ok(false) => None,
Err(_) => None,
})
.count()
}

process_password_line

My first attempt was using regex to parse each line. This resulted in some unwieldy code that had a lot of unwraps and additional parsing. The error handling would've been annoying to "get right" so as a result I mostly skipped it. There are also some issues with differing types. You can see that character is actually a str which means I have to use contains instead of equality. Overall kind of a messy way to go about solving the problem, but it definitely works.

regex.rs
rust
fn process_password_line(line: &str) -> Result<bool, std::io::Error> {
lazy_static! {
static ref RE: Regex = Regex::new(r"([\d]+)-([\d]+) ([a-z]): ([a-z]+)").unwrap();
}
let caps = RE.captures(line).unwrap();
let lower_bound = caps.get(1).unwrap().as_str().parse::<usize>().unwrap();
let upper_bound = caps.get(2).unwrap().as_str().parse::<usize>().unwrap();
let character = caps.get(3).unwrap().as_str();
let password = caps.get(4).unwrap().as_str();
let char_count = password.chars().filter(|c| character.contains(*c)).count();
if char_count < lower_bound || char_count > upper_bound {
Ok(false)
} else {
Ok(true)
}
}

Given how messy this was, I wanted to try my hand at using a parser written with nom, a parser combinator library. This felt to me more "fit for purpose" than the regex approach.

nom is a parser combinator library with a focus on safe parsing, streaming patterns, and as much as possible zero copy.

This line from the docs explains the motivation for using nom. My favorite answer to day 1 was a victim of too many allocations, so I wanted to see what nom could do compared to regex.

nom.rs
rust
fn process_password_line(line: &str) -> Result<bool, std::io::Error> {
let result = password_line(line).map(|(_, pass)| {
let char_count = pass
.password
.chars()
.filter(|c| pass.character == *c)
.count();
char_count >= pass.lower.into() && char_count <= pass.upper.into()
});
match result {
Ok(b) => Ok(b),
_ => Err(Error::new(ErrorKind::Other, "failed to parse")),
}
}

Given the parser, the actual logic for dealing with each line becomes significantly simpler. My only hangup was on converting nom's errors into a more "standard" error type.

The parser itself is fairly readable, even if you don't know what a parser combinator is. The important parts are that we're shadowing input each time, which returns the result minus the parsed piece for the next parser.

nom-parser.rs
rust
use nom::{bytes::complete::tag, character::complete::*, IResult};
struct PasswordLine<'a> {
lower: u8,
upper: u8,
character: char,
password: &'a str,
}
fn password_line(input: &str) -> IResult<&str, PasswordLine> {
let (input, lower) = digit1(input)?;
let (input, _) = tag("-")(input)?;
let (input, upper) = digit1(input)?;
let (input, _) = tag(" ")(input)?;
let (input, parsed_character) = anychar(input)?;
let (input, _) = tag(": ")(input)?;
let (input, password) = alpha1(input)?;
Ok((
input,
PasswordLine {
lower: lower.parse::<u8>().unwrap(),
upper: upper.parse::<u8>().unwrap(),
character: parsed_character,
password,
},
))
}

Performance

The regex approach benchmarked at about 1ms while the parser approach benchmarked at 145 nanoseconds. I attribute this to the zero-copy approach nom uses and the complexity of regex engines, but I did not check that assumption.

| example | lower bound | best guess | upper bound | | ------- | ----------- | ---------- | ----------- | | nom | 143.80 us | 144.83 us | 145.99 us | | regex | 984.72 us | 994.95 us | 1007.40 us |