I'm a bit of a stickler when it comes to media organization. With music, I keep FLAC separate from mp3, in a structure that looks like music/flac/ and music/mp3/. There are times when I want FLAC files (listening at home) and times when I want mp3s (on the go). The problem is that there is no one-to-one mapping between the two collections. That mismatch made me rethink what I really wanted.
What I am after is a merged dataset where deduplication happens deterministically and with a purpose. There should be two resulting merged sets: one that prefers the FLAC copy (high quality) whenever an item exists in both sources, and one that prefers the mp3 copy (low quality). I wrote mmirror to handle this for me. It mirrors the merged folder structure using symbolic links.
I wrote mmirror with my music in mind, but it can be used to deduplicate any pair of folders. A depth parameter tells the script at which level symbolic links should be created; every folder above that level is created as a real directory. My music is organized as follows: music/FLAC/The Lawrence Arms/[2014] Metropole/. I run the script with music/FLAC/ as the high quality input and I want the individual albums - not the artists - to be the symlinks; thus, I run with a depth of 2.
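To make the depth parameter concrete, here is a minimal sketch of the idea, not the actual mmirror code. The function names entries_at_depth and link_entries are hypothetical, and the sketch assumes that only directories are linked and that existing entries in the output are left alone.

```python
import tempfile
from pathlib import Path


def entries_at_depth(root, depth):
    """Return relative paths of the directories exactly `depth` levels below root."""
    root = Path(root)
    level = [root]
    for _ in range(depth):
        # Descend one level: collect every subdirectory of the current level.
        level = [
            child
            for parent in level
            for child in sorted(parent.iterdir())
            if child.is_dir()
        ]
    return [path.relative_to(root) for path in level]


def link_entries(root, output, depth):
    """Mirror root into output: real directories above `depth`, symlinks at `depth`."""
    root, output = Path(root), Path(output)
    for rel in entries_at_depth(root, depth):
        link = output / rel
        # Everything above the link level becomes a real directory.
        link.parent.mkdir(parents=True, exist_ok=True)
        # Assumption: never touch something that already exists in the output.
        if not link.exists() and not link.is_symlink():
            link.symlink_to((root / rel).resolve(), target_is_directory=True)


# Demo in a throwaway directory, using the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "FLAC"
    (source / "The Lawrence Arms" / "[2014] Metropole").mkdir(parents=True)
    link_entries(source, Path(tmp) / "high", depth=2)
    artist = Path(tmp) / "high" / "The Lawrence Arms"
    print(artist.is_symlink())                         # artist folder is a real directory
    print((artist / "[2014] Metropole").is_symlink())  # album folder is a symlink
```

With a depth of 1 the artist folders themselves would become the symlinks. Creating symlinks may need extra privileges on Windows.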
Find the code on GitHub.
Details
Two input directories are provided. mmirror is useful when they have overlapping subdirectories, but they by no means must.
To simplify things, consider the following inputs as arrays instead of folders.
Low: 0, 1, 2, 3, 4
High: 2, 3, 4, 5
That will yield the following output arrays, where the l and h prefixes indicate which source each item is taken from.
Low: l0, l1, l2, l3, l4, h5
High: l0, l1, h2, h3, h4, h5
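The merge rule in this example can be sketched in a few lines of Python. This is only an illustration of the rule on plain lists, not the code mmirror uses; the function name merge and the "l"/"h" tags are made up for the sketch.

```python
def merge(low, high):
    """Return (low_merge, high_merge), each mapping item -> source tag ("l" or "h")."""
    low_set, high_set = set(low), set(high)
    names = sorted(low_set | high_set)
    # Low merge: take the low copy whenever one exists, otherwise fall back to high.
    low_merge = {name: ("l" if name in low_set else "h") for name in names}
    # High merge: take the high copy whenever one exists, otherwise fall back to low.
    high_merge = {name: ("h" if name in high_set else "l") for name in names}
    return low_merge, high_merge


low_merge, high_merge = merge([0, 1, 2, 3, 4], [2, 3, 4, 5])
print(", ".join(f"{tag}{name}" for name, tag in low_merge.items()))
# l0, l1, l2, l3, l4, h5
print(", ".join(f"{tag}{name}" for name, tag in high_merge.items()))
# l0, l1, h2, h3, h4, h5
```

Both outputs contain every item exactly once; the two merges differ only in which source wins for the items present in both inputs.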
Usage
python mmirror.py [OPTIONS] SOURCE_HIGH SOURCE_LOW
Options:
  --output_high DIRECTORY      The output directory for the high merge.
  --output_low DIRECTORY       The output directory for the low merge.
  -d, --depth INTEGER          Defines the depth at which symlinks will be created. 1 will link folders under source.
  --followsymlinks             Follow symbolic links in the source paths.
  --overwritesymlinks          Overwrite symlinks in the output directory.
  --simulate                   Simulation mode. Don't actually do anything.
  -v, --verbose INTEGER RANGE  Logging verbosity, -vv for very verbose.
  --help                       Show this message and exit.
Example
In the situation described above, I run the script from music/.
python mmirror.py flac/ mp3/ --output_high high/ --output_low low/ --depth=2
This populates music/high/ and music/low/.
Requirements
mmirror requires click, a library for building command line interfaces. In retrospect, it probably wasn't a good idea to add a dependency, but I wanted to try it out. It's available via pip: pip install click, or pip install -r requirements.txt if you clone the repository.