Category Archives: Research & Development

A new object cache for BpmDj

In BpmDj we load objects on demand: every time a particular object is accessed we load it from the database. This process happens automatically, and is implemented through a dictionary which maps an object id to a runtime representation.

In Java, this dictionary was a WeakDictionary: a dictionary from which values can be removed by the garbage collector. When they got removed and the program accessed that object again, we would load it fresh from the database. This poor man's caching is not particularly good, because any garbage collect will remove all loaded (but unreferenced) objects, forcing the program to reload those objects again, even if a particular object is often used.
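In Python terms the construction would look roughly like the sketch below (a hypothetical illustration, not the actual BpmDj code, which lives in Java; database.load stands in for whatever fetches an object by id):

 import weakref

 class ObjectStore:
     """Loads objects on demand and keeps only weak references to them."""
     def __init__(self, database):
         self.database = database
         self._loaded = weakref.WeakValueDictionary()  # object id -> runtime object

     def get(self, object_id):
         obj = self._loaded.get(object_id)
         if obj is None:
             # Never loaded, or already reclaimed by the garbage collector: fetch it again.
             obj = self.database.load(object_id)
             self._loaded[object_id] = obj
         return obj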

To solve that, we could force references to stay in memory by means of a round robin queue. Every time an object is accessed it is put in the next position in the buffer. As such, we ensure that the cache keeps X instances alive.

Sadly that strategy is unable to deal with a burst of requests. Any often used object will simply be pushed out of the buffer when a batch of new objects is loaded (like for instance when the song selector opens).

To alleviate this problem, we can, with each access, gradually increase the stickiness of a cache item. This idea turned out to be fairly efficient:

  • every entry has a position in the buffer. Whenever the entry is hit, it moves to half its original position.
  • every new element is placed in the middle of the buffer.

This strategy leads to a distribution where often used elements sit at the front of the buffer. Lesser used elements slowly walk their way out of the buffer until they are evicted. To avoid items becoming too sticky (e.g. items that have been accessed just often enough to never leave the buffer again), it is useful to add a random element (see the sketch after this list):

  • reposition an element to a random position between 0 and ratio * originalRank.
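A minimal sketch of this elevator strategy (reconstructed from the description above, not the BpmDj source; loader, capacity and ratio are placeholders):

 import random

 class ElevatorCache:
     """Keeps hard references to at most `capacity` objects.
     Index 0 is the sticky front; entries drift to the back and are evicted there."""
     def __init__(self, loader, capacity=100, ratio=0.5):
         self.loader = loader      # function: object id -> object
         self.capacity = capacity
         self.ratio = ratio        # how far forward a hit may jump
         self.buffer = []          # list of (object_id, object) pairs

     def get(self, object_id):
         for rank, (oid, obj) in enumerate(self.buffer):
             if oid == object_id:
                 # Hit: reposition to a random slot between 0 and ratio * original rank.
                 del self.buffer[rank]
                 self.buffer.insert(random.randint(0, int(self.ratio * rank)), (oid, obj))
                 return obj
         # Miss: load the object and place it in the middle of the buffer.
         obj = self.loader(object_id)
         self.buffer.insert(len(self.buffer) // 2, (object_id, obj))
         if len(self.buffer) > self.capacity:
             self.buffer.pop()     # evict the least sticky entry
         return obj

A real implementation would keep an id-to-position index next to the buffer so lookups are O(1); the linear scan above is only for readability.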

One could argue that having too many object ids and too few actual objects would be a cause for concern, and it clearly is. Nevertheless, there often is a space tradeoff between holding on to an object and merely holding on to its id.

The image shows the buffer of a cache of capacity 100, with 800 distinct elements randomly accessed. The access pattern was shaped according to a power law distribution. The front of the buffer contains the entries that are more sticky than those in the later part. The height of each entry indicates its priority in the emitter.

The following picture shows the difference between three types of cache. The first is the round robin mentioned earlier, the second is a cache which keeps back references, and the elevator cache is the one implemented here.

The data on which this was run was the retrieval of all startup objects BpmDj needs, including the opening of the song selector. The total object count was 133632, of which 70291 were unique.

Adadelta lovely optimizer

I just tested Adadelta on the superresolution example that is part of the pytorch examples. The results are quite nice, and I like the fact that LeCun's intuition to use a Hessian estimate actually got implemented in an optimizer (I tried doing it myself but couldn't get through the notation in the original paper).

Interestingly, a learning rate of 1 scatters throughout the entire space a bit more than you would expect. Eventually it does not reach the same minimum as a learning rate of 0.1.

In the above example we also delineated the run into epochs of 20 steps. That means, when step%20==0 we cleared all the gradients, which feels a bit odd to have to do. In any case, without the delineation into epochs the results are not that good, and I do not entirely understand why. It is clear that each epoch allows the optimizer to explore a 'new' direction by forgetting the garbage trail it was on, and in a certain way it regularizes how far each epoch can walk away from its original position. Yet _why_ the optimizer does not decide for itself that it might be time to ditch the gradients is something I find interesting.
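For reference, my reading of that setup in PyTorch looks roughly like this (a sketch with a stand-in model and stand-in data, not the literal superresolution training loop):

 import torch
 from torch import nn, optim

 model = nn.Linear(10, 1)                            # stand-in for the superresolution net
 optimizer = optim.Adadelta(model.parameters(), lr=1.0)
 criterion = nn.MSELoss()
 batches = [(torch.randn(16, 10), torch.randn(16, 1)) for _ in range(200)]  # stand-in data

 for step, (inputs, targets) in enumerate(batches):
     if step % 20 == 0:
         optimizer.zero_grad()                       # delineate an 'epoch': drop the gradient trail
     loss = criterion(model(inputs), targets)
     loss.backward()                                 # gradients keep accumulating within the epoch
     optimizer.step()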

Orthogonal weight initialization in PyTorch seems kinda weird

I recently gave deep learning another go. This time I looked into pytorch. At least the thing lets you program in a synchronous fashion. One of the examples however did not work as expected.

I was looking into the superresolution example (https://github.com/pytorch/examples) and printed out the weights of the second convolution layer. It turned out these were 'kinda weird' (similar to the attached picture). So I looked into them and found that the orthogonal weight initialization that was used would not initialize a large section of the weights of a 4-dimensional matrix. Yes, I know that the documentation states that 'dimensions beyond 2' are flattened. That does not mean, though, that the values of a large portion of the matrix should be empty.

[figure: weights_grmpf]

The orthogonal initialisation seems to have become a standard (for good reason; see the paper https://arxiv.org/pdf/1312.6120.pdf), yet it is one that does not work well together with convolution layers, where a simple input->output matrix is not straight away available. Better is to use the xavier_uniform initialisation. That is, in the file model.py you should have an _initialize_weights as follows:

 # in model.py; this relies on: from torch.nn import init
 def _initialize_weights(self):
     # Xavier/Glorot uniform initialisation, with the gain adapted to the ReLU layers
     init.xavier_uniform(self.conv1.weight, init.calculate_gain('relu'))
     init.xavier_uniform(self.conv2.weight, init.calculate_gain('relu'))
     init.xavier_uniform(self.conv3.weight, init.calculate_gain('relu'))
     init.xavier_uniform(self.conv4.weight)  # output layer: default gain of 1

With this, I trained a model on the BSDS300 dataset (for 256 epochs) and then tried to upsample a small image by a factor of 2. The upper image is the small image (upsampled using a bicubic filter). The bottom one is the small picture upsampled using the neural net.

The weights we now get at least use the full matrix.

The output when initialized with “orthogonal” weights has some sharp ugly edges:

A better crossfade

A demonstration on the difference between two crossfades. The first is a straightforward crossfade. The second one is a crossfade in which the partials of both tracks are detected, selected (the strongest wins) and then resynthesized.

The normal crossfade:

Crossfading the partials:

To hear the difference listen to the middle of the two tracks (around 16″). While the normal crossfade sounds muddier, the second one retains the same volume and clarity as either track (at least to my ears).
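A very rough sketch of the partial-selection idea, applied to the overlapping region of the two tracks (an illustration only: it picks the stronger STFT bin instead of properly detected partials, so it is not the actual BpmDj implementation):

 import numpy as np
 from scipy.signal import stft, istft

 def crossfade_strongest(a, b, fs=44100, nperseg=2048):
     """Mix two equal-length mono segments by keeping, per time-frequency bin,
     whichever track has the larger magnitude."""
     _, _, A = stft(a, fs=fs, nperseg=nperseg)
     _, _, B = stft(b, fs=fs, nperseg=nperseg)
     mixed = np.where(np.abs(A) >= np.abs(B), A, B)   # the strongest component wins
     _, y = istft(mixed, fs=fs, nperseg=nperseg)
     return y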

A talk on timestretching at a hackerscamp

Time stretching of audio tracks can be easily done by either interpolating missing samples (slowing down the track), or by throwing away samples (speeding up the track). A drawback is that this results in a pitch change. In order to overcome these issues, we created a time stretcher that would not alter the pitch when the playback speed changed. In this talk we discuss how we created a fast, high quality time stretcher, which is now an integral part of BpmDj. We explain how a sinusoidal model is extracted from the input track, its envelope modeled and then used to synthesize a new audio track. The synthesis timestretches the envelope of all participating sines, yet retains the original pitch. The resulting time stretcher uses only a frame overlap of 4, which reduces the amount of memory access and computation compared to other techniques.

We assume the listener will have a notion about Fourier analysis. We do however approach the topic equally from an educational as well as from a research perspective.

High resolution slides are available at http://werner.yellowcouch.org/Papers/sha2017/index.html


Sleep debt, Oxygen deprivation and Resmed

I apologize at the start of this post: I never wanted to sound like someone who wants to document his snoring. I do so anyway because I feel I have some important things to share. I will try to stick to some facts that might help you, without going too much into my personal situation.

For the past half year or so I have been trying to get my snoring under control. To that end I got a Resmed Airsense 10 from the doctors/insurers, together with a nasal mask.

Nostrils

The first big obstacle was learning to sleep with my mouth closed. Not much to do but to actually do it. This took some weeks.

Then getting up to speed was problematic, because my nostrils would be more closed than open during the night. That led to painful lungs and a not so optimal 'therapy'. Two things were necessary to resolve this:

a- got rid of the air filter in the machine. The filter that was installed would actually pollute the air coming in (it hadn't been changed in at least 6 months and the provider wasn't in a hurry to change it).

b- started using the humidifier at position 4. Every morning I would take it out of the machine and leave it open during the day. That would allow the water to breathe. Once in a while I would replace it completely.

With those two tricks I got my nostrils somewhat under control.

Lack of oxygen

Although the headaches during the day vanished immediately after starting to use the machine, I now got the symptoms of someone lacking oxygen: I felt really tired in the afternoon. Talking to my doctor did not help very much. He suggested that I could not be lacking oxygen because there was a positive pressure at the inlet.

It took me a couple of months to realize that he was wrong. To understand why, just imagine the mask with no breathing holes. If you exhale, you fill up the long tube, the humidifier and the rest of the machine with used air. The next breath you take will consist first of the old air, then the new. Now imagine only one or two holes: the machine will still be able to generate the necessary pressure, yet no real air exchange will take place.

To analyze this further I set up a simulation in which the patient would inhale all the air he just exhaled. Here is what happens to the oxygen level then:

[figure: reuse-same-air]

The above plot shows the initial oxygen concentration at about 21%. Each breath removes a quarter of the oxygen, leaving you after 4 breaths with only about a third of the necessary oxygen! That is quite staggering.
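The rebreathing model behind that plot is tiny (a sketch, assuming each breath consumes a quarter of the oxygen present and no fresh air comes in):

 oxygen = 0.21                      # normal atmospheric oxygen concentration
 for breath in range(1, 5):
     oxygen *= 0.75                 # each breath consumes a quarter of the remaining oxygen
     print(breath, round(oxygen / 0.21, 2))
 # after 4 breaths only (3/4)^4, about 32%, of the normal oxygen is left: roughly a third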

Of course, the resmed machines do not have closed holes. The positive pressure replaces some of the old air with new air. The question now is: how much? This can be expressed as the percentage of the air that is swapped per breath cycle. For each percentage, we can calculate the amount of oxygen (compared to normal air) that would be available to you during the night.

[figures: poison-yourself2, poison-yourself]

From the above plot we can see that if the machine is able to swap out 80% of the air (during one breath cycle), you will have 93% of normal-air oxygen. That is 7% less than you need.
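The second plot can be approximated by extending the same model with a swap fraction. This is my own reconstruction (the exact consumption and mixing model behind the original plot is not spelled out), so the numbers only roughly match the 80% to 93% figure quoted above:

 def steady_state_oxygen(swap, consume=0.25, fresh=0.21, breaths=200):
     """Fraction of normal oxygen left when `swap` of the air is replaced
     with fresh air every breath cycle."""
     o = fresh
     for _ in range(breaths):
         o *= (1.0 - consume)                 # breathe: consume part of the oxygen
         o = swap * fresh + (1.0 - swap) * o  # the machine swaps in fresh air
     return o / fresh

 for pct in (100, 90, 80, 70, 60, 50):
     print(pct, round(100 * steady_state_oxygen(pct / 100.0), 1))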

Clearly we had found the culprit. I must have had an air exchange percentage that was sufficiently low, leading to a low oxygen availability. The question now was: what to do about it?

Solution #1: turn off the 'autoset' feature of the resmed machine, or increase its minimum pressure substantially – Initially, my machine was set in autoset mode, which means that it would try to determine the best setting for your situation automatically. It navigates itself between the minimum and maximum boundaries to minimize the number of blockages. Of course, that minimum might lower the average pressure that sends air out of the mask. Thus: although it might minimize the apnoeas, it no longer vents properly. An easy way to solve that is to use a continuous positive pressure, or to increase the minimum pressure. That there is some truth to this can be read from online reports of people who went from a normal CPAP machine (resmed 8) to an automatic one (resmed 10) and complained that they felt worse with the new machine.

How does the mathematics look? Generally, the air velocity through a hole can be modeled as the square root of the pressure difference divided by the air density (ignoring friction and so on). We thus have sqrt(0.016666 P) describing the air velocity through the holes. Thus if we raise the minimum pressure by a factor x, we will push out roughly sqrt(x) times more air.
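As a back-of-the-envelope check (using the standard frictionless orifice relation v = sqrt(2*dP/rho); the exact constant does not matter for the ratio):

 from math import sqrt

 RHO_AIR = 1.2                      # kg/m^3, approximate density of air

 def vent_velocity(pressure_pa):
     """Ideal (frictionless) air velocity through the mask holes."""
     return sqrt(2.0 * pressure_pa / RHO_AIR)

 p = 600.0                                        # roughly 6 cmH2O expressed in pascal
 print(vent_velocity(4 * p) / vent_velocity(p))   # -> 2.0: quadrupling the pressure doubles the flow speed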

Solution #2: turn off Exhalation Pressure Release (the EPR setting) – online you often read that people sleep easier when the machine lets them exhale more easily (it does that by lowering the pressure when you exhale), yet in doing so it drastically reduces the amount of air swapped out.

Solution #3: use a larger mask – Try to close the main hole of the mask and exhale through the little holes. Measure how long it takes to exhale and compare that against a normal exhalation. If you cannot generate enough pressure to get all the air out within one breath cycle, then your mask is too small. This problem can be solved by using a larger mask.

Submissions to SHA2017 are out

Although SHA 2017 is too expensive for what it is (OHM 2013 was not particularly well organized), I decided that if they pay my entrance ticket I will still participate. I proposed two/three events.

First a talk on Zathras titled: “Time Stretching BpmDj – 8 Secrets the Audio Industry does not want you to know. Nr 5 will shock you.”

Time stretching of audio tracks can be easily done by either interpolating missing samples (slowing down the track), or by throwing away samples (speeding up the track). A drawback is that this results in a pitch change. In order to overcome these issues, we created a time stretcher that would not alter the pitch when the playback speed changed.

In this talk we discuss how we created a fast, high quality time stretcher, which is now an integral part of BpmDj. We explain how a sinusoidal model is extracted from the input track, its envelope modeled and then used to synthesize a new audio track. The synthesis timestretches the envelope of all participating sines, yet retains the original pitch. The resulting time stretcher uses only a frame overlap of 4, which reduces the amount of memory access and computation compared to other techniques.

Demos of the time stretcher can be heard at http://werner.yellowcouch.org/log/zathras/
The paper that accompanies this talk is at http://werner.yellowcouch.org/Papers/zathras15/

We assume the listener will have a notion about Fourier analysis. We do however approach the topic equally from an educational as well as from a research perspective.

Then I proposed to play two DJ sets with BpmDj. In itself interesting, because I have not played anything in the past 10 years. So I had to sell myself somehow.

Dr. Dj. Van Belle, a psyparty DJ who has his roots in the BSG Party Hall (Brussels/Belgium, 1998). After playing popular tunes for them students, he decided to throw in some psytrance… And absolutely no new style was born. He started his trend to be as inconspicuous as possible. In October 2006 he surfaced at the northern Norwegian Insomnia Festival. As an experiment, he played all songs at 85% of their normal speed. Every time he saw a camera, he inconspicuously hid behind the mixing desk. Since then he has done absolutely nothing. His career is as much a standstill as psytrance was between 2000 and 2016. And this makes him the perfect DJ. Bring in some of them good old beats. Some nostalgia for y’all. An academic approach to the real challenge on how to entertain them phone junkies.

Nowadays he plays anything he can get his hands on, mainly to test the DJ software he made. Some of his mixes can be found at https://www.mixcloud.com/5dbb/

I’m curious what they will accept (if anything).

Synthesizing waves

This image represents one of the remaining problems with the Zathras timestretcher I wrote. When synthesizing a new wave, we want to do that fast, so we use an FFT to generate on average 632 sines at the same time. The problem is that whenever a wave has a frequency that does not match any Fourier bin, we need to 'fix' it. That is done by applying a phase modulation to it. Yet because the Fourier synthesis requires circular waves, the endpoints must match (that is, the phase advance over the frame must be a multiple of 2Pi). When the wave is modulated with a non 2Pi multiple, this requirement is not satisfied. The result is that we set out a frequency path (the blue line, with only 8 points) and then assume that the final synthesized wave will be equally linear. The red line shows how this is not the case.
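The circularity constraint is easy to see numerically (a small illustration, not the Zathras code itself): a wave with a whole number of cycles per frame ends exactly where it started, while a wave whose frequency falls between two bins does not.

 import numpy as np

 for cycles in (5.0, 5.3):     # 5.0 sits exactly on a Fourier bin, 5.3 falls between two bins
     total_phase = 2 * np.pi * cycles          # phase advance over one synthesis frame
     mismatch = total_phase % (2 * np.pi)      # how far the endpoint misses a 2*pi multiple
     print(cycles, mismatch)
 # 5.0 -> 0.0       : the frame wraps around seamlessly, IFFT synthesis is exact
 # 5.3 -> ~1.88 rad : the wave ends 0.3 of a cycle out of phase, the error that must be hidden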

At the moment, to solve this we overlap a lot of these windows so the error fades away in the background. Yet a metallic ring remains.

[figure: phase-problem]

A second solution is the application of an appropriate window (e.g. Kaiser-Bessel), which will push the entire error into the endpoints.

[figure: phase-halfsolution]

Autoencoder identity mapping

Can an autoencoder learn the identity mapping? To test that, I went to the extreme: let an optimization algorithm (SGD) find the best mapping when the visible units are 0-dimensional (a scalar) and the hidden units are as well.

The first remarkable thing is that there is no solution that gives a perfect mapping! There simply does not exist a choice of parameters that maps the input straight to the output when tied weights and sigmoids are used: the composition of sigmoids is bounded and never exactly linear.

Anyway, because the problem is so low-dimensional, we can calculate the cost over an area and plot it as a surface (a sketch of that computation follows the list below). Two things are worth noting.

  1. The minimum can be found as soon as we get into a very narrow valley… from the right angle… If we were to enter it from the back (B>40) then the valley floor is not sufficiently steep to guide us quickly to the minimum.
  2. If we were dropped on this surface at (W:40; B:-20) then the search algorithm would go down from one plateau to the next, blissfully unaware of that nice crevasse that we laid out for it.
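A sketch of how such a surface can be computed. My assumptions: a single tied weight W, one bias B on the hidden unit, reconstruction sigmoid(W * sigmoid(W*x + B)), and a mean squared reconstruction error over inputs sampled in (0, 1); the original parameterization may differ in detail.

 import numpy as np

 def sigmoid(z):
     return 1.0 / (1.0 + np.exp(-z))

 def reconstruction_cost(W, B, xs):
     """Mean squared error of a one-unit autoencoder with tied weight W and hidden bias B."""
     hidden = sigmoid(W * xs + B)
     output = sigmoid(W * hidden)          # tied weight, no output bias assumed
     return np.mean((output - xs) ** 2)

 xs = np.linspace(0.0, 1.0, 50)            # input samples
 Ws = np.linspace(-20, 60, 200)
 Bs = np.linspace(-40, 60, 200)
 cost = np.array([[reconstruction_cost(W, B, xs) for W in Ws] for B in Bs])
 # `cost` can be fed to a 3D surface plot to reproduce a picture like the one below.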

[figure: learningidentity]