Category Archives: Research & Development

A small step into denoising autoencoders. Optimizers.

The following chart shows some of the results of creating a denoising autoencoder.

The idea is that a neural network is trained to map a 1482-dimensional input space to a 352-dimensional space in such a way that it can recover 30% of randomly removed data. Once that first stage is trained, its output is used to train the second stage, which maps the data to 84 dimensions. The last stage brings it further down to 21 dimensions. The advantage of this method is that such denoising autoencoders grab patterns in the input, which are then combined into higher-level patterns at the next stage.
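As a sketch of what one such stage might look like, here is a minimal tied-weight denoising autoencoder in NumPy. The layer sizes, tanh activation and learning rate are illustrative assumptions, not BpmDj's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_dae_stage(X, hidden, epochs=50, lr=0.01, corruption=0.3):
    """Train one denoising-autoencoder stage: randomly drop ~30% of
    the inputs, then learn to reconstruct the *clean* signal."""
    n, d = X.shape
    W = rng.normal(0, 0.01, (d, hidden))   # encoder weights (tied with decoder)
    b = np.zeros(hidden)                   # encoder bias
    c = np.zeros(d)                        # decoder bias
    for _ in range(epochs):
        for x in X:
            mask = rng.random(d) > corruption    # 1 = keep, 0 = drop
            h = np.tanh((x * mask) @ W + b)      # encode the corrupted input
            y = h @ W.T + c                      # decode with tied weights
            err = y - x                          # error against the clean x
            # plain SGD on the squared error; W gets both the decoder
            # and the encoder part of the gradient (tied weights)
            dh = (err @ W) * (1 - h ** 2)
            W -= lr * (np.outer(x * mask, dh) + np.outer(err, h))
            b -= lr * dh
            c -= lr * err
    return W, b, c
```

Stages are then stacked: the codes produced by one stage become the training data of the next (1482 → 352 → 84 → 21).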

I have been testing various optimizers. The results below show how much of the signal can be recovered. To measure that, we take the 1482-dimensional dataset, map it down to 21 dimensions, and then map it back to 1482 dimensions. After that we compare the original and recovered signals. The error we get is then compared against the simplest possible predictor; namely, the average of the signal.
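That comparison can be expressed as a single ratio: reconstruction error divided by the error of the mean predictor. A minimal sketch (the function name is mine, not BpmDj's):

```python
import numpy as np

def recovery_score(X, X_hat):
    """Reconstruction error relative to the simplest predictor:
    the per-dimension mean of the signal. Below 1 beats the mean;
    1.0 means the model merely reproduces the average."""
    mse_model = np.mean((X - X_hat) ** 2)
    mse_mean = np.mean((X - X.mean(axis=0)) ** 2)
    return mse_model / mse_mean
```

A perfect round trip scores 0; a decoder that only outputs the average (as the RMSprop runs below do) scores 1.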

Now, the first thing we noticed is that although RMSprop-style approaches go extremely fast, they merely produce an average signal (literally: they just decode the signal by producing the average). Secondly, stochastic data corruption should of course not be combined with an optimizer that compensates for such noise (which the RMSprop and momentum methods do to a certain extent).

In the end, SGD turns out to retain the most ‘local patterns’, yet converges too slowly. Using Adam improves the convergence speed. In this case, because mean-normalizing the data ruins the results, we actually modified Adam to calculate the variance correctly.
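For reference, one standard Adam update step looks like this (the variance modification mentioned above is not reproduced here; this is the textbook Kingma & Ba form):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam update for parameter w at timestep t (1-based).
    m and v are the running first- and second-moment estimates."""
    m = b1 * m + (1 - b1) * grad           # momentum on the gradient
    v = b2 * v + (1 - b2) * grad ** 2      # running estimate of grad^2
    m_hat = m / (1 - b1 ** t)              # bias correction (warm-up)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Note the second moment `v` tracks the raw squared gradient, not a mean-centred variance; that distinction is exactly where a modification for un-normalized data would go.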

This is of course all very beginner style stuff. Probably in a year or so I will look back at this and think: what the hell was I thinking when I did that?

[Chart: optimizer comparison]

How did we come to these particular values?

BpmDj represents each song with a 1482-dimensional vector. I already had ~200000 of those and wanted to play with them. Broken down: the rhythm patterns contain 384 ticks per frequency band and we have 3 frequency bands (thus 3*384). Aside from that we have 11 loudness quantiles for each of 30 frequency bands (11*30). Which sums to exactly 1482 dimensions.
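The arithmetic checks out (reading the 11*30 as 11 quantiles per band, which the post implies but does not spell out):

```python
# Breakdown of BpmDj's song vector, as described in the text:
rhythm = 3 * 384     # 3 frequency bands x 384 rhythm ticks = 1152
loudness = 11 * 30   # 11 loudness quantiles x 30 bands     =  330
total = rhythm + loudness   # 1482 dimensions
```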

Then the second decision was made to stick to three layers, mainly because Google's DeepDream already finds high-level patterns at the third level. No reason to go further than that, I thought.

Then I aimed for the same reduction factor in each stage (/~4, as you noticed). So I ballparked the 21. That was mainly because I initially started with an autoencoder that went 5 stages deep. It so happened that I was happy with the 21-dimensional stage, and I kept it like that so I could compare results between the two.

Now, that 21-dimensional space might still be way too large. Theoretically, we can represent 2^21 classes if each neuron simply said yes/no. However, there is also a certain amount of redundancy between neurons: they sometimes find the same patterns, and so on.

Latency calculations for Zathras

Because Zathras-1 now has to interact with linearly interpolated timestretches, we had to calculate the latency of our timestretcher.

Multiple results were new to me and interesting to see.

  1. A sliding-window Fourier transform has a latency of 0. I did not expect that at all.
  2. When an input frame is stretched, each sample of the input frame will have a different latency.
  3. The latencies of all samples, except the middle one, depend on the playback speed. Yes, you read that right: the playback speed affects the latency. And that is really something we do not want to compensate for. Therefore, use the middle of the frame as the reference.
  4. It is possible to have negative latencies. Part of the signal is already out before we believe we provided it to the timestretcher.
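Points 2–4 can be reproduced with a toy model: if synthesized frames are aligned at their centres, then an input sample's latency grows with its distance from the centre and with the playback speed. This is an illustrative sketch, not Zathras' actual latency formula (see the paper linked below for that):

```python
def sample_latency(i, frame_size, speed):
    """Latency (in samples) of input sample i within one frame, using
    the frame centre as the timing reference. Toy model: with frame
    centres aligned, input offset d from the centre lands at output
    offset d/speed, so its latency is d * (1/speed - 1)."""
    centre = frame_size / 2
    return (i - centre) * (1 / speed - 1)
```

The middle sample always has latency 0 regardless of speed (point 3), every other sample's latency scales with the stretch (point 2), and samples in the first half of a slowed-down frame get *negative* latencies (point 4).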

http://werner.yellowcouch.org/Papers/zathras15/#toc23

Brainwaves note combinations

Below is a table showing which note combinations generate which brainwave oscillations (the number is the resulting oscillation frequency in Hz). The idea is that each note is played in a separate channel.

Brainwave 1.9446332095341745 C1 and C#1
Brainwave 2.060267117566937 C#1 and D1
Brainwave 2.1827769755841757 D1 and D#1
Brainwave 2.3125716488486177 D#1 and E1
Brainwave 2.4500843150167455 E1 and F1
Brainwave 2.5957739098288073 F1 and F#1
Brainwave 2.7501266587643656 F#1 and G1
Brainwave 2.9136576997744896 G1 and G#1
Brainwave 3.086912802506866 G#1 and A1
Brainwave 3.2704701897611983 A1 and A#1
Brainwave 3.464942467254282 A#1 and B1
Brainwave 3.670978668134147 B1 and C2
Brainwave 3.8892664190683774 C2 and C#2
Brainwave 4.004900327101112 C1 and D1
Brainwave 4.1205342351338885 C#2 and D2
Brainwave 4.243044093151113 C#1 and D#1
Brainwave 4.365553951168337 D2 and D#2
Brainwave 4.495348624432793 D1 and E1
Brainwave 4.625143297697278 D#2 and E2
Brainwave 4.762655963865363 D#1 and F1
Brainwave 4.90016863003342 E2 and F2
Brainwave 5.045858224845553 E1 and F#1
Brainwave 5.191547819657643 F2 and F#2
Brainwave 5.345900568593173 F1 and G1
Brainwave 5.500253317528745 F#2 and G2
Brainwave 5.663784358538855 F#1 and G#1
Brainwave 5.827315399548979 G2 and G#2
Brainwave 6.000570502281356 G1 and A1
Brainwave 6.173825605013732 G#2 and A2
Brainwave 6.187677302685287 C1 and D#1
Brainwave 6.3573829922680645 G#1 and A#1
Brainwave 6.540940379522397 A2 and A#2
Brainwave 6.5556157419997305 C#1 and E1
Brainwave 6.73541265701548 A1 and B1
Brainwave 6.929884934508564 A#2 and B2
Brainwave 6.945432939449539 D1 and F1
Brainwave 7.135921135388429 A#1 and C2
Brainwave 7.341957336268308 B2 and C3
Brainwave 7.3584298736941705 D#1 and F#1
Brainwave 7.560245087202524 B1 and C#2
Brainwave 7.778532838136755 C3 and C#3
Brainwave 7.795984883609918 E1 and G1
Brainwave 8.009800654202266 C2 and D2
Brainwave 8.241068470267749 C#3 and D3
Brainwave 8.259558268367662 F1 and G#1
Brainwave 8.486088186302226 C#2 and D#2
Brainwave 8.500248951533905 C1 and E1
Brainwave 8.731107902336703 D3 and D#3
Brainwave 8.750697161045721 F#1 and A1
Brainwave 8.990697248865615 D2 and E2
Brainwave 9.005700057016476 C#1 and F1
Brainwave 9.250286595394527 D#3 and E3
Brainwave 9.271040692042554 G1 and A#1
Brainwave 9.525311927730698 D#2 and F2
Brainwave 9.541206849278346 D1 and F#1
Brainwave 9.800337260066868 E3 and F3
Brainwave 9.822325459522347 G#1 and B1
Brainwave 10.091716449691063 E2 and F#2
Brainwave 10.108556532458536 D#1 and G1
Brainwave 10.383095639315286 F3 and F#3
Brainwave 10.406391325149627 A1 and C2
Brainwave 10.691801137186388 F2 and G2
Brainwave 10.709642583384408 E1 and G#1
Brainwave 10.95033326655065 C1 and F1
Brainwave 11.000506635057462 F#3 and G3
Brainwave 11.025187554456807 A#1 and C#2
Brainwave 11.327568717077725 F#2 and G#2
Brainwave 11.346471070874529 F1 and A1
Brainwave 11.601473966845283 C#1 and F#1
Brainwave 11.654630799097788 G3 and G#3
Brainwave 11.680779322336413 B1 and D2
Brainwave 12.001141004562712 G2 and A2
Brainwave 12.02116735080692 F#1 and A#1
Brainwave 12.291333508042712 D1 and G1
Brainwave 12.347651210027436 G#3 and A3
Brainwave 12.375354605370603 C2 and D#2
Brainwave 12.714765984536129 G#2 and A#2
Brainwave 12.735983159296836 G1 and B1
Brainwave 13.022214232233026 D#1 and G#1
Brainwave 13.081880759044992 A3 and A#3
Brainwave 13.111231483999504 C#2 and E2
Brainwave 13.47082531403096 A2 and B2
Brainwave 13.493304127656494 G#1 and C2
Brainwave 13.546107176379458 C1 and F#1
Brainwave 13.796555385891274 E1 and A1
Brainwave 13.859769869017128 A#3 and B3
Brainwave 13.890865878899035 D2 and F2
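The table values are the beat frequencies between equal-tempered note pairs (A4 = 440 Hz); the first row, for instance, is the difference between C1 and C#1. A short sketch that reproduces them:

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq(name):
    """Equal-tempered frequency of a note like 'C1' or 'A#2', A4 = 440 Hz."""
    note, octave = name[:-1], int(name[-1])
    midi = 12 * (octave + 1) + NOTES.index(note)   # MIDI note number
    return 440.0 * 2 ** ((midi - 69) / 12)

def beat(a, b):
    """Beat ('brainwave') frequency of two simultaneously played notes."""
    return abs(freq(a) - freq(b))

print(beat("C1", "C#1"))   # ≈ 1.9446 Hz, the first table row
```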

Timestretcher

The timestretcher I’ve been working on is coming along quite well. I’m currently working on timestretcher parameter changes. When changing the tempo, the segments should still align correctly. In the image, each color is a newly synthesized segment. As can be seen, they overlap nicely. A wave from one segment is picked up by the next, and so on.


Speed requirements timestretcher

I’m currently optimizing my timestretcher. Just to give an idea of the speed we need to achieve: with a window size of 4096 samples and an overlap of 16 frames, we have about 5.8 milliseconds between frame calculations. Now, this is stereo, which means that we only have 2.9 milliseconds per frame. In each frame we have to detect sine waves, extract them and relocate them to other positions. Typically we have around ~600 peaks per frame, which leaves us about 4.8 microseconds per peak. Each peak must be detected, extracted, repositioned, resampled and added back into the final frame. Currently I manage to do this in 13 microseconds. And now the push is on: can we speed this up further by a factor of 2.7?
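The budget above can be checked with a few lines (assuming a 44.1 kHz sample rate, which the post does not state):

```python
sample_rate = 44100                 # assumed; not stated in the post
window = 4096                       # analysis window size in samples
overlap = 16                        # frames overlapping each window
hop = window / overlap              # 256 samples between frame starts
frame_ms = 1000 * hop / sample_rate       # ~5.8 ms between frames
per_channel_ms = frame_ms / 2             # stereo: ~2.9 ms per frame
per_peak_us = 1000 * per_channel_ms / 600 # ~600 peaks: ~4.8 us each
```

At 13 µs per peak today, 13 / 4.8 ≈ 2.7 is exactly the speedup factor still needed.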

BpmDj at Chaos Communication Camp 2015

This year we will again be at the Chaos Communication Camp 2015 (which takes place near Berlin).

At CCC 2011, we gave a talk on how BpmDj performs its audio analysis. Two years later, at OHM 2013, we explained how the nearest neighbor detection and associated weight matrix is created. This year, we won’t talk about the project anymore, but instead give you the opportunity to meet the developers.

Actually, we want you to come to us with ~100 tracks and an idea for a mix. We will then sit together and create that mix. As a reward you will receive one of our heat sensitive ‘stay-tuned-stay-sharp-keep-mixing’ cups.
