I'm a bit of a stickler when it comes to media organization. With regard to music, I keep FLAC separate from mp3. I have a structure that looks like music/flac/ and music/mp3/. There are times when I want FLAC files (listening at home) and times when I want mp3s (on the go). The problem is that there is no one-to-one mapping between the two collections. This is the point where I began rethinking what I really wanted.

What I am after is a merged dataset where deduplication happens deterministically and with a purpose. There should be two resulting merged sets: one that prefers the FLAC copy (high quality) whenever an album exists in both, and one that prefers the mp3 copy (low quality). I wrote mmirror to handle this for me. It mirrors the merged folder structure by using symbolic links.

I wrote mmirror with my music in mind, but in reality it could be used to deduplicate any pair of folders. A depth parameter tells the script at which level symbolic links should be created; any folder above that level is created as a real directory. My music is organized as follows: music/flac/The Lawrence Arms/[2014] Metropole/. I run the script with music/flac/ as the high quality input, and I want the individual albums, not the artists, to be the symlinks; thus, I run with a depth of 2.
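To make the depth idea concrete, here is a minimal sketch of how it could work for a single source folder. This is not mmirror's actual code: the function names dirs_at_depth and mirror are made up for illustration, and the sketch assumes only directories (not loose files) are linked.

```python
import os
import tempfile


def dirs_at_depth(source, depth):
    """Return relative paths of directories exactly `depth` levels below source."""
    found = []
    for name in sorted(os.listdir(source)):
        path = os.path.join(source, name)
        if not os.path.isdir(path):
            continue  # assumption: only directories are mirrored
        if depth == 1:
            found.append(name)
        else:
            for rest in dirs_at_depth(path, depth - 1):
                found.append(os.path.join(name, rest))
    return found


def mirror(source, output, depth):
    """Create real folders above `depth` and symlinks at `depth` (hypothetical helper)."""
    for rel in dirs_at_depth(source, depth):
        link = os.path.join(output, rel)
        # Everything above the link level becomes a real directory.
        os.makedirs(os.path.dirname(link), exist_ok=True)
        os.symlink(os.path.abspath(os.path.join(source, rel)), link)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "flac")
    output = os.path.join(root, "high")
    os.makedirs(os.path.join(source, "The Lawrence Arms", "[2014] Metropole"))

    mirror(source, output, depth=2)

    artist = os.path.join(output, "The Lawrence Arms")
    album = os.path.join(artist, "[2014] Metropole")
    artist_is_real_dir = os.path.isdir(artist) and not os.path.islink(artist)
    album_is_link = os.path.islink(album)

print(artist_is_real_dir, album_is_link)
```

With a depth of 2, the artist folder ends up as a real directory and the album inside it as a symlink back to the source.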

Find the code on GitHub.


Two input directories are provided. mmirror is useful when they have overlapping subdirectories, but overlap is by no means required.

To simplify things, consider the following inputs as arrays instead of folders.

Low: 0, 1, 2, 3, 4

High: 2, 3, 4, 5

That will yield the following output arrays, where the l and h prefixes indicate which source each entry is taken from.

Low: l0, l1, l2, l3, l4, h5

High: l0, l1, h2, h3, h4, h5
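The same merge can be sketched in a few lines of Python. This only illustrates the preference rule on plain lists; the merge function and its prefer argument are names I made up here, not part of mmirror.

```python
def merge(low, high, prefer):
    """Merge two collections; on duplicates, keep the copy from `prefer` ("l" or "h")."""
    merged = []
    for name in sorted(set(low) | set(high)):
        in_low, in_high = name in low, name in high
        if in_low and in_high:
            source = prefer  # duplicate: the preferred side wins
        else:
            source = "l" if in_low else "h"  # unique: take it from wherever it lives
        merged.append(f"{source}{name}")
    return merged


low = [0, 1, 2, 3, 4]
high = [2, 3, 4, 5]

low_merge = merge(low, high, prefer="l")
high_merge = merge(low, high, prefer="h")

print("Low: ", ", ".join(low_merge))
print("High:", ", ".join(high_merge))
```

Entries that exist in only one input always come from that input; the preference only matters for 2, 3 and 4, which exist in both.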




  • --output_high DIRECTORY The output directory for the high merge.
  • --output_low DIRECTORY The output directory for the low merge.
  • -d, --depth INTEGER Defines the depth at which symlinks will be created. 1 will link folders under source.
  • --followsymlinks Follow symbolic links in the source paths
  • --overwritesymlinks Overwrite symlinks in the output directory.
  • --simulate Simulation mode. Don't actually do anything.
  • -v, --verbose INTEGER RANGE Logging verbosity, -vv for very verbose.
  • --help Show this message and exit.


In my situation described above, I run the script from music/.

python flac/ mp3/ --output_high high/ --output_low low/ --depth=2

This populates music/high/ and music/low/.


mmirror requires click, a library for building command line interfaces. In retrospect, it probably wasn't a good idea to add a dependency, but I wanted to try it out. It's available via pip: pip install click, or pip install -r requirements.txt if you clone the repository.

Posted by kael on 2 June 2014