Chris Biscardi

Iterating over a directory of files with the walkdir Rust crate

Given a directory layout like this tree, how do we get all of the .js files?

src
├── components
│   ├── header.js
│   └── thing.svg
└── pages
    ├── about.js
    └── index.js

TLDR, here's the code

```rust
use walkdir::{DirEntry, Error, WalkDir};

fn main() {
    let files: Vec<Result<DirEntry, Error>> = WalkDir::new("src")
        .into_iter()
        // Keep only entries whose file name ends in ".js".
        .filter(|e| {
            println!("testing: {:?}", e);
            // Errors and non-UTF-8 file names count as "not a match".
            e.as_ref().map_or(false, |f| {
                f.file_name()
                    .to_str()
                    .map(|s| s.ends_with(".js"))
                    .unwrap_or(false)
            })
        })
        // Log each entry that passed the filter, then pass it along unchanged.
        .map(|e| {
            println!("{:?}", e);
            e
        })
        .collect();
    println!("final: {:?}", files);
}
```

The console output is:

testing: Ok(DirEntry("src"))
testing: Ok(DirEntry("src/components"))
testing: Ok(DirEntry("src/components/thing.svg"))
testing: Ok(DirEntry("src/components/header.js"))
Ok(DirEntry("src/components/header.js"))
testing: Ok(DirEntry("src/pages"))
testing: Ok(DirEntry("src/pages/about.js"))
Ok(DirEntry("src/pages/about.js"))
testing: Ok(DirEntry("src/pages/index.js"))
Ok(DirEntry("src/pages/index.js"))
final: [Ok(DirEntry("src/components/header.js")), Ok(DirEntry("src/pages/about.js")), Ok(DirEntry("src/pages/index.js"))]

Explanation

We can use the walkdir crate to iterate over a directory with WalkDir::new("src").

We then turn this into an iterator with .into_iter() so that we can .filter out the entries that don't match what we want.

WalkDir::new gives us every file and directory under src (including src itself), each wrapped in a Result<DirEntry, Error>, so we need to filter for the files we want (.js files in this case). The closure we pass to filter receives e as a shared reference (&Result<DirEntry, Error>), so we can't move the value out of it. Calling .as_ref() gives us a Result<&DirEntry, &Error> that we can work with instead.
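To see what .as_ref() buys us without pulling in walkdir, here is a minimal sketch using only the standard library. The ok_len helper and the String-based Result are stand-ins I made up for illustration; they are not part of walkdir.

```rust
// Hypothetical helper: returns the length of the Ok value, if there is one.
fn ok_len(item: &Result<String, String>) -> Option<usize> {
    // `item.ok()` would not compile here: it tries to move the String out of
    // a shared reference. `as_ref` gives us `Result<&String, &String>` instead.
    item.as_ref().ok().map(|s| s.len())
}

fn main() {
    let entry: Result<String, String> = Ok("header.js".to_string());
    let failed: Result<String, String> = Err("permission denied".to_string());

    println!("{:?}", ok_len(&entry)); // Some(9)
    println!("{:?}", ok_len(&failed)); // None

    // `entry` was only borrowed, so it is still usable afterwards.
    println!("{:?}", entry);
}
```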

The filter closure needs to return a boolean, so we use .map_or. This is an interesting application of map that doesn't appear in some other languages. Here we are mapping over the Result type and accessing the value inside it. This lets us use that value without checking for the error variant ourselves, because our map function will not run if the Result is an Err. .map_or also takes a default value as its first argument, which is returned when the Result is an Err instead of an Ok. Our default is false. If we have an Ok, our function is called and we can use the DirEntry from the Ok directly.
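One way to read .map_or is as shorthand for a match with a fallback arm. The sketch below compares the two using a plain Result<String, String> as a stand-in for the walkdir types; both function names are hypothetical.

```rust
// Uses map_or: the default (false) is returned for Err, the closure runs for Ok.
fn with_map_or(r: &Result<String, String>) -> bool {
    r.as_ref().map_or(false, |name| name.ends_with(".js"))
}

// The same logic spelled out with an explicit match.
fn with_match(r: &Result<String, String>) -> bool {
    match r {
        Ok(name) => name.ends_with(".js"),
        Err(_) => false,
    }
}

fn main() {
    let cases: Vec<Result<String, String>> = vec![
        Ok("index.js".to_string()),
        Ok("thing.svg".to_string()),
        Err("permission denied".to_string()),
    ];

    for case in &cases {
        // Both versions agree on every input.
        println!("{:?}: {} {}", case, with_map_or(case), with_match(case));
    }
}
```

In the post's code, the function handed to map_or is the file-name check shown next.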

```rust
direntry
    .file_name()
    .to_str()
    .map(|s| s.ends_with(".js"))
    .unwrap_or(false)
```

Given the direntry, we can get the filename using file_name, which returns an &OsStr that we can convert to a string slice with to_str. to_str returns an Option<&str> (None if the name isn't valid Unicode), which we can map over in the same way we did with the Result earlier. In this case we map over the Some values and the Nones get passed through. Finally we use unwrap_or(false) to return false if we have a None, or the value contained in the Some if it exists.
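The same chain can be tried on its own with std::ffi::OsStr, no directory walking required. This is a sketch: is_js mirrors the chain above, while has_js_extension is an alternative I'm adding for comparison using Path::extension from the standard library. The two are not identical: a file named exactly ".js" ends with ".js" but has no extension according to Path::extension.

```rust
use std::ffi::OsStr;
use std::path::Path;

// Mirrors the chain from the post: OsStr -> Option<&str> -> Option<bool> -> bool.
fn is_js(name: &OsStr) -> bool {
    name.to_str().map(|s| s.ends_with(".js")).unwrap_or(false)
}

// Alternative (not from the post): compare the extension instead of the suffix.
fn has_js_extension(path: &Path) -> bool {
    path.extension().map_or(false, |ext| ext == "js")
}

fn main() {
    println!("{}", is_js(OsStr::new("header.js"))); // true
    println!("{}", is_js(OsStr::new("thing.svg"))); // false

    println!("{}", has_js_extension(Path::new("src/pages/about.js"))); // true
    println!("{}", has_js_extension(Path::new("src/components/thing.svg"))); // false
}
```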

Finally we .collect the iterator, which makes the whole thing execute (iterators are lazy, so nothing runs until something consumes them). The type annotation on files, Vec<Result<DirEntry, Error>>, tells collect what to build. We could also write the type as a turbofish instead: .collect::<Vec<Result<DirEntry, Error>>>().
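Here are the two spellings side by side, as a sketch that swaps walkdir for str::parse so it runs with the standard library alone. The function names are made up for the example.

```rust
use std::num::ParseIntError;

// The target type comes from the annotation on the binding.
fn parse_annotated(inputs: &[&str]) -> Vec<Result<i32, ParseIntError>> {
    let parsed: Vec<Result<i32, ParseIntError>> =
        inputs.iter().map(|s| s.parse::<i32>()).collect();
    parsed
}

// The target type comes from the turbofish on collect.
fn parse_turbofish(inputs: &[&str]) -> Vec<Result<i32, ParseIntError>> {
    inputs
        .iter()
        .map(|s| s.parse::<i32>())
        .collect::<Vec<Result<i32, ParseIntError>>>()
}

fn main() {
    let inputs = ["1", "2", "x"];
    println!("{:?}", parse_annotated(&inputs));
    println!("{:?}", parse_turbofish(&inputs));
}
```

Both forms produce the same Vec; which one to use is mostly a matter of taste and of whether the result is bound to a variable at all.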