I'm a bit of a stickler when it comes to media organization. For music, I keep FLAC separate from mp3. I have a structure that looks like
music/mp3/. There are times when I want FLAC files (listening at home) and times when I want mp3s (on the go). The problem is that there is no one-to-one mapping between my two collections. That is the point where I began rethinking what I really wanted.
What I am after is a merged dataset where deduplication happens deterministically and with a purpose. There should be two resulting merged sets: one that prefers the FLAC copy (high quality) whenever an item exists in both sources, and one that prefers the mp3 copy (low quality). I wrote mmirror to handle this for me. It mirrors the merged folder structure using symbolic links.
I wrote mmirror with my music in mind, but in reality it could be used to deduplicate any pair of folders. A depth parameter tells the script at which level symbolic links should be created; every folder above that level is created as a real directory. My music is organized as follows:
music/FLAC/The Lawrence Arms/Metropole/. I run the script with
music/FLAC/ as the high quality input, and I want the individual albums, not the artists, to be the symlinks; thus, I run with a depth of 2.
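To make the depth behaviour concrete, here is a minimal sketch of the idea. It is not mmirror's actual code: the function name link_at_depth is made up for illustration, it handles a single source only, and linking plain files that sit above the link depth is a choice of this sketch rather than documented behaviour.

```python
import os
import tempfile

def link_at_depth(source, output, depth):
    """Hypothetical helper: mirror `source` into `output`, creating real
    directories above `depth` and symlinks for the entries at `depth`."""
    os.makedirs(output, exist_ok=True)
    for name in sorted(os.listdir(source)):
        src = os.path.join(source, name)
        dst = os.path.join(output, name)
        if depth > 1 and os.path.isdir(src):
            # Not deep enough yet: recreate the folder and descend one level.
            link_at_depth(src, dst, depth - 1)
        elif not os.path.lexists(dst):
            # At the link depth (or a plain file above it): create a symlink.
            os.symlink(os.path.abspath(src), dst)

# Demo in a throwaway directory, using the layout described above.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "FLAC", "The Lawrence Arms", "Metropole"))
    out = os.path.join(root, "high")
    link_at_depth(os.path.join(root, "FLAC"), out, depth=2)
    artist_out = os.path.join(out, "The Lawrence Arms")
    album_out = os.path.join(artist_out, "Metropole")
    artist_is_real_dir = os.path.isdir(artist_out) and not os.path.islink(artist_out)
    album_is_link = os.path.islink(album_out)

print(artist_is_real_dir, album_is_link)
```

With a depth of 2, the artist folder is a real directory in the output and the album inside it is a symlink back to the source. With a depth of 1, the artist folder itself would be the symlink.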
Find the code on GitHub.
Two input directories are provided. mmirror is most useful when they have overlapping subdirectories, but overlap is not required.
To simplify things, consider the following inputs as arrays instead of folders.
low: 0, 1, 2, 3, 4
high: 2, 3, 4, 5
That will yield the following output arrays, where the l and h prefixes indicate which source each element is taken from.
low merge: l0, l1, l2, l3, l4, h5
high merge: l0, l1, h2, h3, h4, h5
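The array example can be reproduced with a few lines of Python. This is only a sketch of the merge rule, not code from mmirror; the merge function and its prefer argument are names invented here.

```python
def merge(high, low, prefer):
    """Hypothetical helper: merge two collections of names and resolve
    duplicates in favour of the `prefer` source ("high" or "low")."""
    if prefer not in ("high", "low"):
        raise ValueError("prefer must be 'high' or 'low'")
    preferred, other = ("h", high), ("l", low)
    if prefer == "low":
        preferred, other = other, preferred
    chosen = {}
    # Record the non-preferred source first so the preferred one
    # overwrites any duplicate entries.
    for tag, names in (other, preferred):
        for name in names:
            chosen[name] = tag
    return [f"{chosen[name]}{name}" for name in sorted(chosen)]

low = [0, 1, 2, 3, 4]
high = [2, 3, 4, 5]

low_merge = merge(high, low, prefer="low")
high_merge = merge(high, low, prefer="high")
print(low_merge)   # ['l0', 'l1', 'l2', 'l3', 'l4', 'h5']
print(high_merge)  # ['l0', 'l1', 'h2', 'h3', 'h4', 'h5']
```

Elements that exist in only one source always come from that source; the preference only matters for the overlap (2, 3, 4 here).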
python mmirror.py [OPTIONS] SOURCE_HIGH SOURCE_LOW
  --output_high DIRECTORY    The output directory for the high merge.
  --output_low DIRECTORY     The output directory for the low merge.
  --depth INTEGER            Defines the depth at which symlinks will be created. 1 will link folders under source.
  --followsymlinks           Follow symbolic links in the source paths.
  --overwritesymlinks        Overwrite symlinks in the output directory.
  --simulate                 Simulation mode. Don't actually do anything.
  --verbose INTEGER RANGE    Logging verbosity, -vv for very verbose.
  --help                     Show this message and exit.
In my situation described above, I run the script from
python mmirror.py flac/ mp3/ --output_high high/ --output_low low/ --depth=2
mmirror requires click, a library for building command line interfaces. In retrospect, it probably wasn't a good idea to add a dependency, but I wanted to try it out. It's available via pip:
pip install click or
pip install -r requirements.txt if you clone the repository.