This is an ongoing project about taking a large data set of songs, converting them all to the same key and tempo, and then mixing and matching clips of different songs to generate new sounds and ideas for songwriting. This is the process so far:
1. Collect a data set of music.
I collected about 1,000 songs and made a separate copy of each, leaving all the original files untouched. I converted the copies to MP3 format and filtered out any files over 20 MB. I then used a piece of freeware called mp3tag to extract the metadata from every file and built an Excel spreadsheet of all the song metadata.
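For illustration, the conversion and size filter might look something like the sketch below. This is a minimal sketch rather than the exact tooling used: it assumes ffmpeg is on the PATH, uses the mutagen package as a stand-in for the mp3tag GUI, writes a CSV instead of an Excel file, and the folder names are made up.

```python
# Minimal sketch: re-encode the working copies to MP3, drop files over 20 MB,
# and dump basic tags to a CSV. Assumes ffmpeg is on the PATH; mutagen stands
# in for the mp3tag GUI. Folder and file names are illustrative.
import csv
import subprocess
from pathlib import Path

from mutagen.easyid3 import EasyID3
from mutagen.id3 import ID3NoHeaderError

SRC = Path("music_copy")          # the working copy; originals stay untouched
DST = Path("music_mp3")
DST.mkdir(exist_ok=True)
MAX_BYTES = 20 * 1024 * 1024      # 20 MB cutoff

rows = []
for src in SRC.rglob("*"):
    if not src.is_file():
        continue
    mp3 = DST / (src.stem + ".mp3")
    # Re-encode to MP3 (VBR quality 2); ffmpeg carries the tags across.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-codec:a", "libmp3lame", "-qscale:a", "2", str(mp3)],
        check=True)
    if mp3.stat().st_size > MAX_BYTES:
        mp3.unlink()              # filter out anything over 20 MB
        continue
    try:
        tags = EasyID3(str(mp3))
    except ID3NoHeaderError:
        tags = {}
    rows.append({"file": mp3.name,
                 "title": tags.get("title", [""])[0],
                 "artist": tags.get("artist", [""])[0]})

with open("metadata.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["file", "title", "artist"])
    writer.writeheader()
    writer.writerows(rows)
```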
2. Identify the key and tempo of each song.
I used a Python library called Madmom to identify the key of each song (major or minor, and the root note). This wasn’t perfectly accurate for every song, but it was close. Next, I used Madmom to extract the beats and downbeats of each song and estimate its tempo. A lot of songs have variable tempos, so I wrote a check that finds each song’s maximum and minimum tempo and filters out the songs that vary too much. Key and tempo then get added to the Excel spreadsheet.
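A rough sketch of this step with madmom is below. The processor classes are madmom’s own; the helper function, the 40 bpm max-minus-min cutoff, and the use of the median beat interval as the song’s tempo are my illustrative choices, not necessarily what the project uses.

```python
# Sketch: key, downbeats, and a tempo-stability check with madmom.
# The 40 bpm spread threshold is an arbitrary example cutoff.
import numpy as np
from madmom.features.key import CNNKeyRecognitionProcessor, key_prediction_to_label
from madmom.features.downbeats import RNNDownBeatProcessor, DBNDownBeatTrackingProcessor

def analyze(path, max_spread=40.0):
    # Key: returns a label such as "A major" or "F# minor".
    key = key_prediction_to_label(CNNKeyRecognitionProcessor()(path))

    # Beats/downbeats: each row is (time in seconds, position within the bar).
    act = RNNDownBeatProcessor()(path)
    beats = DBNDownBeatTrackingProcessor(beats_per_bar=[3, 4], fps=100)(act)
    downbeats = beats[beats[:, 1] == 1, 0]

    # Local tempo from consecutive beat intervals; flag songs that wander too much.
    tempos = 60.0 / np.diff(beats[:, 0])
    stable = (tempos.max() - tempos.min()) <= max_spread
    return key, float(np.median(tempos)), downbeats, stable
```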
3. Convert all songs to the same tempo and pitch.
I used ffmpeg to shift every song to the same pitch and a tempo of 100 bpm. This noticeably distorts some of the tracks; optionally, I can filter out the songs that end up too distorted.
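One common recipe for doing both changes in a single ffmpeg pass is the asetrate/atempo trick sketched below. I’m not claiming this is the exact filter chain used here; the semitone shift for each song would come from the key detected in step 2, and the function name is illustrative.

```python
# Sketch: shift pitch by n semitones and stretch to 100 bpm with ffmpeg.
# Assumes ffmpeg is on the PATH; asetrate/atempo is one common recipe,
# not necessarily the project's exact filter chain.
import subprocess

def transform(src, dst, source_bpm, semitones, target_bpm=100.0, sr=44100):
    pitch = 2.0 ** (semitones / 12.0)        # frequency ratio for the key shift
    # asetrate raises pitch AND speed by `pitch`; the atempo factor undoes the
    # speed change and then stretches to the target tempo.
    tempo = (target_bpm / source_bpm) / pitch
    # atempo only accepts factors in [0.5, 2.0], so split larger changes in two.
    if 0.5 <= tempo <= 2.0:
        chain = f"atempo={tempo:.6f}"
    else:
        chain = f"atempo={tempo ** 0.5:.6f},atempo={tempo ** 0.5:.6f}"
    filters = f"asetrate={int(sr * pitch)},aresample={sr},{chain}"
    subprocess.run(["ffmpeg", "-y", "-i", src, "-af", filters, dst], check=True)
```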
4. Split each song into 4-bar clips, each just under 10 seconds long.
Using the downbeat detection from earlier, I used ffmpeg to split each of the pitch/tempo-adjusted tracks into 4-bar clips. Each clip is 16 beats, which works out to 9.6 seconds, since every song now runs at 100 beats per minute (0.6 seconds per beat).
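Roughly, the splitting could look like the sketch below. It assumes ffmpeg on the PATH and downbeat times (in seconds) that already refer to the tempo-adjusted track, e.g. after rescaling the step-2 downbeats by the same stretch factor; the function and file names are illustrative.

```python
# Sketch: cut a tempo/pitch-adjusted track into 4-bar clips at its downbeats.
import subprocess

def split_clips(path, downbeats, out_prefix, bars_per_clip=4):
    clips = []
    for i in range(0, len(downbeats) - bars_per_clip, bars_per_clip):
        start = downbeats[i]
        duration = downbeats[i + bars_per_clip] - start
        out = f"{out_prefix}_{i // bars_per_clip:04d}.mp3"
        # Re-encode rather than stream-copy so the cut lands exactly on the downbeat.
        subprocess.run(
            ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-t", f"{duration:.3f}",
             "-i", path, "-codec:a", "libmp3lame", "-qscale:a", "2", out],
            check=True)
        clips.append(out)
    return clips
```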
5. Mix and match
Now I have thousands of four-bar clips from hundreds of songs, all of which are theoretically in the same key and at the same tempo. In practice, the earlier steps introduce plenty of inaccuracies, so quite a lot of the final clips aren’t transformed correctly. That’s why I start with so many songs: they inevitably get whittled down to the few that actually work.
At this point, I select several clips at random and drop them into Audacity, a simple, free audio editor. I loop several clips simultaneously, mix their volumes, highlight interesting combinations and moments, and cut clips that don’t work with the mix. Sometimes I get interesting combinations, such as this mix of two Borns clips (“Fool” and “Past Lives”) with a few other songs in the background.
It’s kind of muddy and chaotic, but the idea is to discover interesting moments, like the last couple of seconds of the clip, where the arrangement thins out and the few remaining parts sit well together.
What’s Next
I’ve also used a library called Demucs that isolates vocals, drums, bass, and everything else in a track, so I could further split these clips into separate stems. The ultimate goal of all of this is to build a large database of roughly 10-second “blocks” of 4 bars of music, separated by source (vocals, drums, bass, etc.). Each block can be converted into a spectrogram, a visual representation of frequencies over time. Those spectrograms could then be used to train a GAN to learn the properties of typical 4-bar phrases of vocals, bass, and so on, and to produce novel ones. The remaining trick is converting a generated spectrogram back to audio, and then we’re getting at artificially “composed” music. For now, it’s a fun trick for mixing music and discovering new ideas.
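As a sketch of where this could go, the snippet below runs a clip through the Demucs command line and then round-trips one stem through a mel spectrogram. Librosa, soundfile, and Griffin-Lim inversion are assumptions on my part for the spectrogram-to-audio step, not tools the project has settled on, and the Demucs output path depends on the model in use.

```python
# Sketch: stem separation, spectrogram, and a rough reconstruction back to audio.
# Assumes the demucs CLI plus the librosa and soundfile packages are installed;
# Griffin-Lim inversion is one possible (lossy) route back to audio.
import subprocess

import librosa
import soundfile as sf

# 1. Separate a clip into vocals / drums / bass / other with Demucs.
subprocess.run(["demucs", "-o", "stems", "clip_0001.mp3"], check=True)

# 2. Turn one stem into a mel spectrogram: frequencies over time, as an array
#    that could be treated like an image (e.g. for training a GAN).
#    The exact output path below depends on the Demucs model version.
y, sr = librosa.load("stems/htdemucs/clip_0001/vocals.wav", sr=22050)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_S = librosa.power_to_db(S)

# 3. Invert the spectrogram back to audio (phase is re-estimated, so quality drops).
y_rec = librosa.feature.inverse.mel_to_audio(S, sr=sr)
sf.write("vocals_reconstructed.wav", y_rec, sr)
```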