Parsing the input with nom is faster than using the regex crate.
The input for today looks something like this:
I used the same scaffolding for each part and set up a `process_password_line` function to process each line.
My first attempt used regex to parse each line. This resulted in some unwieldy code with a lot of `unwrap` calls and additional parsing. The error handling would've been annoying to "get right", so I mostly skipped it. There are also some issues with differing types: `character` is actually a `str`, which means I have to use `contains` instead of equality. Overall it's kind of a messy way to go about solving the problem, but it definitely works.
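To make the type issue concrete, here is a minimal sketch of the two situations. The function names and the idea of counting occurrences are my own illustration, not the code from the original solution: when the character arrives as a `&str` (as a regex capture would), it can't be compared to a `char` with `==`, so the check goes through `contains`.

```rust
/// Counts matching characters when the target arrives as a `&str`,
/// for example from a regex capture. `char == &str` does not compile,
/// so the comparison goes through `str::contains`.
fn count_with_str(password: &str, character: &str) -> usize {
    password.chars().filter(|c| character.contains(*c)).count()
}

/// The same count when the target has already been parsed into a `char`,
/// where plain equality works.
fn count_with_char(password: &str, character: char) -> usize {
    password.chars().filter(|c| *c == character).count()
}
```

Both return the same count for the same inputs; the second just reads more like what it means.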
Given how messy this was, I wanted to try my hand at using a parser written with nom, a parser combinator library. This felt to me more "fit for purpose" than the regex approach.
> nom is a parser combinator library with a focus on safe parsing, streaming patterns, and as much as possible zero copy.
This line from the docs explains the motivation for using nom. My favorite answer to day 1 was a victim of too many allocations, so I wanted to see what nom could do compared to regex.
Given the parser, the actual logic for dealing with each line becomes significantly simpler. My only hangup was on converting nom's errors into a more "standard" error type.
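The error hangup comes from the fact that nom's errors over `&str` input hold on to a slice of that input, while a `Box<dyn Error>` wants something owned. The sketch below shows one way around that using only the standard library. `BorrowedParseError`, `LineError`, and `parse_min` are hypothetical stand-ins (not nom types), and the `process_password_line` signature is an assumption for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for a parser error that borrows from the input,
/// the way nom's errors over `&str` do. Not a nom type.
#[derive(Debug)]
struct BorrowedParseError<'a> {
    remaining: &'a str,
}

/// Owned error that can be boxed and passed up the call stack.
#[derive(Debug)]
struct LineError {
    remaining: String,
}

impl fmt::Display for LineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "could not parse line near {:?}", self.remaining)
    }
}

impl Error for LineError {}

impl From<BorrowedParseError<'_>> for LineError {
    fn from(err: BorrowedParseError<'_>) -> Self {
        // Copy the borrowed slice so the error no longer depends on the input.
        LineError { remaining: err.remaining.to_owned() }
    }
}

/// Hypothetical parser: succeeds only when the line starts with a digit.
fn parse_min(line: &str) -> Result<u32, BorrowedParseError<'_>> {
    line.chars()
        .next()
        .and_then(|c| c.to_digit(10))
        .ok_or(BorrowedParseError { remaining: line })
}

/// Assumed signature; the real function does more than return the minimum.
fn process_password_line(line: &str) -> Result<u32, Box<dyn Error>> {
    // Convert to the owned type so `?` can box it as `Box<dyn Error>`.
    let min = parse_min(line).map_err(LineError::from)?;
    Ok(min)
}
```

The conversion costs one allocation, but only on the failure path, so it shouldn't matter for the benchmark.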
The parser itself is fairly readable, even if you don't know what a parser combinator is. The important part is that we're shadowing `input` each time: every parser returns the input that remains after the piece it parsed, and that remainder is what the next parser receives.
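As a rough illustration of that shadowing pattern, here is a std-only sketch that mimics the shape of a nom parser without using nom itself. The helper names (`number`, `literal`, `any_char`), the `PasswordLine` struct, and the `min-max char: password` line format are all assumptions made for the example. Like nom's `IResult`, each helper returns the remaining input first and the parsed value second.

```rust
/// Hypothetical parsed form of one line such as "1-3 a: abcde".
#[derive(Debug, PartialEq)]
struct PasswordLine<'a> {
    min: usize,
    max: usize,
    character: char,
    password: &'a str,
}

/// Same shape as nom's `IResult`: remaining input first, parsed value second.
type ParseResult<'a, T> = Result<(&'a str, T), String>;

/// Parses a leading run of ASCII digits into a number.
fn number(input: &str) -> ParseResult<'_, usize> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let value = input[..end]
        .parse::<usize>()
        .map_err(|_| format!("expected a number at {:?}", input))?;
    Ok((&input[end..], value))
}

/// Consumes a fixed piece of text, similar in spirit to nom's `tag`.
fn literal<'a>(expected: &str, input: &'a str) -> ParseResult<'a, ()> {
    match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, ())),
        None => Err(format!("expected {:?} at {:?}", expected, input)),
    }
}

/// Consumes a single character.
fn any_char(input: &str) -> ParseResult<'_, char> {
    let mut chars = input.chars();
    match chars.next() {
        Some(c) => Ok((chars.as_str(), c)),
        None => Err("expected a character, found end of input".to_string()),
    }
}

fn password_line(input: &str) -> ParseResult<'_, PasswordLine<'_>> {
    // Each step shadows `input` with whatever is left after the previous parser.
    let (input, min) = number(input)?;
    let (input, _) = literal("-", input)?;
    let (input, max) = number(input)?;
    let (input, _) = literal(" ", input)?;
    let (input, character) = any_char(input)?;
    let (input, _) = literal(": ", input)?;
    // Whatever remains is the password, borrowed from the original line.
    Ok(("", PasswordLine { min, max, character, password: input }))
}
```

The `password` field is a slice of the original line rather than a new `String`, which is the same zero-copy idea nom is built around.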
The regex approach benchmarked at about 1 millisecond while the parser approach benchmarked at about 145 microseconds, roughly a 7x difference. I attribute this to the zero-copy approach nom uses and the complexity of regex engines, but I did not check that assumption.

| example | lower bound | best guess | upper bound |
| ------- | ----------- | ---------- | ----------- |
| nom     | 143.80 us   | 144.83 us  | 145.99 us   |
| regex   | 984.72 us   | 994.95 us  | 1007.40 us  |