Thanks to luxagen for getting me to actually set something up for this. Turned out to be mighty useful when I accidentally trashed half my home folder and wanted to know if syncthing had propagated any of the damage.
The use case is slightly different to mine, but RotKraken is worth a look. Its unique feature is storing file hashes in the extended attributes of the same file. This is very tidy but doesn’t help me with catching unwanted deletions, hence going back to md5deep.
You’d think that running md5deep aka hashdeep wouldn’t be worthy of a blog post, but what I found is that the primary use case for hashdeep is actually validating the integrity of an installation in order to detect rootkits etc. This is not the same as what I’m doing which is being able to spot if I’ve lost any files I care about in
/home/tim through carelessness or bit-rot. It turns out that md5deep does actually have what I needed, but the way the options are described means it’s not at all obvious that it would fulfill this need.
You can find my hash and verify scripts as a gist here
It turns out the magic is in the “audit” section of the docs.
The terminology of the verification output is more about what it did than why you care. The important one for spotting bit-rot when verifying is
Known file not used which means that you have a hash but you no longer have a matching file anywhere. Either you deleted it or modified it on purpose or you’ve just lost something you care about. Time to reach for the backups. I like back-in-time to usb disks for backup.
hashdeep -c md5 -of -r -l Music Documents > hash_file.txt
%%%% HASHDEEP-1.0 %%%% size,md5,filename ## Invoked from: /home/tim ## $ hashdeep -c md5 -of -r -l Music Downloads Documents Pictures Phone Dropbox repo ## 3425,3ecc5852703f3846298b381bc2510a39,Music/checksums-verification-Music.txt 461,456e92277eaf9de695bd1229d80f059b,Music/checksums-verification-Music.txt.bak 100663,0d9d53e95e5d80fa43a64f5d02f25b1e,Music/checksums-Music.txt 794926,95c1558e7c97200140c37ffb0d12669d,Music/flac/Mordecai Smyth - Dial M For Mordecai/cover.jpg 17618302,cfb11490aacfdcbb79fd4310cf834e01,Music/flac/Mordecai Smyth - Dial M For Mordecai/Mordecai Smyth - Dial M For Mordecai - 02 Psychedelic Sarah.flac ...
hashdeep -k hash_file.txt -rle -of Music Documents -avv
Documents/hashdeep-checksums-verification.txt.bak: Moved from Documents/hashdeep-checksums-verification.txt Documents/hashdeep-checksums-verification.txt: No match Documents/hashdeep-checksums.txt: No match repo/rust-kata/.idea/workspace.xml: No match repo/rust-kata/.git/logs/HEAD: No match ... Documents/hashdeep-checksums-verification.txt.bak: Known file not used Documents/hashdeep-checksums.txt: Known file not used repo/rust-kata/.idea/workspace.xml: Known file not used ...
I have a monthly calendar reminder to run backups. When that goes off I:
verify-hashes.shand search the output for “Known file not used” to find any rot or churn.
rehash.shto update the hashes.
- Plug a backup HDD in and run back-in-time to update the backup
- Sleep easy.
I run one hash file for all folders. I started with one per top level folder but that meant the verify couldn’t spot things moved between folders and it reported them as missing.
It would be nice to iterate on this but it’s a good start.