All posts by werner

Cache coherence and Java

You all know my good old friend Doug Lea? A moron who, instead of putting a sensible locking strategy into Java, has been putting the most nonsensical things into it? No? Well, let’s dig into his latest bullshit. Somewhere along the line he decided that cache coherence is really something you want the programmer to care about.

http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html demonstrates the kind of thing most people would expect to work, but which does not. Basically, if you have a synchronized write to a variable, then the (non-synchronized) reads from other threads can retrieve a stale version from their cache. That means that if you have to synchronize to write a variable, you likely have to synchronize all reads as well. If you don’t want to do that, or if you cannot, then you should use the volatile keyword.
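To make that concrete, here is the double-checked locking pattern from the linked page in minimal form (class and field names are mine; the broken variant discussed there is this exact code with the volatile dropped):

class Helper {
    int payload = 42; // stands in for some expensive initialization
}

class Foo {
    // without 'volatile', another thread may observe a non-null helper
    // whose fields are not yet visible: the stale-cache problem above
    private volatile Helper helper;

    Helper getHelper() {
        Helper h = helper;              // a single volatile read
        if (h == null) {
            synchronized (this) {
                h = helper;             // re-check under the lock
                if (h == null) {
                    helper = h = new Helper();
                }
            }
        }
        return h;
    }
}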

That means two things:

  1. synchronizing treemaps/hashmaps through a facade basically means also locking on read accesses (see the sketch after this list). A huge performance penalty.
  2. if you have an executor array of threads that will work their way through an ordered set of thunks to be executed, then each of these thunks has to have all its variables marked as volatile. Otherwise cache coherence cannot be guaranteed, and a thunk that works on thread 1 might just not work when executed on thread 2.
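A minimal sketch of point 1, using the standard Collections facade (the map contents are made up):

import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

class FacadeDemo {
    public static void main(String[] args) {
        // every call on the facade takes the same lock, reads included
        Map<String, Integer> counts =
                Collections.synchronizedMap(new TreeMap<String, Integer>());

        counts.put("bpm", 140);           // acquires the monitor
        Integer bpm = counts.get("bpm");  // acquires it again, just to read
        System.out.println(bpm);
    }
}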

It really hurts me to have to think through bullshit like this. You expect your cache to be coherent. That’s it. If it is not, then why the hell am I using a virtual machine? I might as well start programming kernels.

Android storage permissions

I’m an overall unhappy person. What else would you expect if you have to navigate that excuse for an OS called Android?
This time I’m talking about Android Storage Permissions.
With each new release they redefine the rules completely. So whatever BpmDj wrote to _persistent_ storage on your device in the past will no longer be persistent in newer versions. Even the idea that, when your app requests file write permissions, it should actually have them, seems far-fetched. No, before you create a directory you have to ask the OS for permission to do so. You have to notify the user (depending on the Android version) and request the permission at runtime (why do they have a manifest then, anyway?). Of course you are not allowed to ask for that permission in the application’s main routine. No no no no no. You can only ask for that permission once the first user interface element is shown. Until then you can gloriously fuck off with that nice idea of creating the necessary directories and database for your app.
 
Even worse, the user can withdraw that permission at any moment during the run of your application. That means you have to check EVERY BLOODY INSULTING file operation.
 
Now, you might think ‘checking permissions’ is an easy thing to do. There, too, you are completely wrong. It can only be done asynchronously. Yes. Fuck that nice function you had.
The correct answer at http://stackoverflow.com/questions/32225506/android-6m-permission-issue-create-directory-not-working gives an idea of exactly how much retardation is involved.
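For reference, the asynchronous dance looks roughly like this. A sketch against the API 23 runtime-permission calls (ContextCompat/ActivityCompat from the support library; needs android.Manifest and android.content.pm.PackageManager imports); the request code and the createDirectoriesAndDatabase helper are made up:

// inside your Activity, only once the UI is up
private static final int REQUEST_WRITE_STORAGE = 42; // arbitrary

void ensureStoragePermission() {
    if (ContextCompat.checkSelfPermission(this,
            Manifest.permission.WRITE_EXTERNAL_STORAGE)
            != PackageManager.PERMISSION_GRANTED) {
        // asynchronous: the answer arrives in the callback below, not here
        ActivityCompat.requestPermissions(this,
                new String[]{ Manifest.permission.WRITE_EXTERNAL_STORAGE },
                REQUEST_WRITE_STORAGE);
    } else {
        createDirectoriesAndDatabase(); // hypothetical helper
    }
}

@Override
public void onRequestPermissionsResult(int requestCode,
        String[] permissions, int[] grantResults) {
    if (requestCode == REQUEST_WRITE_STORAGE
            && grantResults.length > 0
            && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
        createDirectoriesAndDatabase(); // only now are we allowed to
    }
    // and remember: the user can revoke the permission again at any time
}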

Submissions to SHA2017 are out

Although SHA 2017 is too expensive for what it is (OHM 2013 was not particularly well organized), I decided that if they pay for my entrance ticket I will still participate. I proposed two or three events.

First a talk on Zathras titled: “Time Stretching BpmDj – 8 Secrets the Audio Industry does not want you to know. Nr 5 will shock you.”

Time stretching of audio tracks can easily be done by either interpolating missing samples (slowing down the track) or by throwing away samples (speeding up the track). A drawback is that this results in a pitch change. To overcome this, we created a time stretcher that does not alter the pitch when the playback speed changes.
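In symbols (my notation, not the talk’s), the coupling between duration and pitch under naive resampling is

\[ y(t) = x(t/\alpha). \]

The track becomes $\alpha$ times longer, but every component $\sin(2\pi f t)$ of $x$ turns into $\sin(2\pi (f/\alpha)\,t)$ in $y$: the pitch drops by the same factor $\alpha$. A time stretcher has to break exactly this coupling.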

In this talk we discuss how we created a fast, high quality time stretcher, which is now an integral part of BpmDj. We explain how a sinusoidal model is extracted from the input track, its envelope modeled and then used to synthesize a new audio track. The synthesis timestretches the envelope of all participating sines, yet retains the original pitch. The resulting time stretcher uses only a frame overlap of 4, which reduces the amount of memory access and computation compared to other techniques.

Demos of the time stretcher can be heard at http://werner.yellowcouch.org/log/zathras/
The paper that accompanies this talk is at http://werner.yellowcouch.org/Papers/zathras15/

We assume the listener has some notion of Fourier analysis. We do, however, approach the topic from an educational as well as from a research perspective.

Then I proposed to play 2 DJ sets with BpmDj. Interesting in itself, because I haven’t played anything in the past 10 years. So I had to sell myself somehow.

Dr. Dj. Van Belle, a psyparty DJ who has his roots in the BSG Party Hall (Brussels/Belgium, 1998). After playing popular tunes for them students, he decided to throw in some psytrance… And absolutely no new style was born. Thus started his trend of being as inconspicuous as possible. In October 2006 he surfaced at the northern Norwegian Insomnia Festival. As an experiment, he played all songs at 85% of their normal speed. Every time he saw a camera, he inconspicuously hid behind the mixing desk. Since then he has done absolutely nothing. His career is as much at a standstill as psytrance was between 2000 and 2016. And this makes him the perfect DJ. Bring in some of them good old beats. Some nostalgia for y’all. An academic approach to the real challenge of how to entertain them phone junkies.

Nowadays he plays anything he can get his hands on, mainly to test the DJ software he made. Some of his mixes can be found at https://www.mixcloud.com/5dbb/

I’m curious what they will accept (if anything).

Making a Desk

The Desk

Because my previous desk was two tables and a chair, my wife decided that I should make one from scratch. The biggest problem was cutting the two pieces of three-layered wood such that the errors in both cuts would compensate each other, both with regard to the angle and with regard to the horizontal variations.

The easiest solution is to cut them both at the same time, in the same direction.

IMG_20161112_112909 IMG_20161112_131252_1

Once that was done, we made the legs and fitted them.

IMG_20161112_204014_1

IMG_20161113_101729

IMG_20161113_095248

We added an organic form because we needed a place to actually sit. The two plates placed together left only 120 cm of free room. Cutting out 20 cm gave us a bit more leeway and opened up the desk. Furthermore, cutting out this form allowed us to cut ‘through’ the seam, which made it fit even better.

IMG_20161113_115617

Then we glued them in place. For a change we actually made them fit at a pi/2 angle. One of those things that requires a bit too much fiddling for me to really bother with, yet it does matter if you are going to look at it every day.

IMG_20161113_165956

Once done with that, the paint job started: 2 layers for the legs and bottom side, 3 layers for the main surface. With 10 hours between each layer, this adds up to quite a waiting time.

IMG_20161113_183856

IMG_20161114_083837

IMG_20161114_154021

The cabinets

The side walls of the three cabinets

IMG_20161115_213541_1

IMG_20161115_213603

IMG_20161115_213550

The drawers:

IMG_20161116_132133

Put together:

IMG_20161116_162627

IMG_20161116_212059 IMG_20161117_001649 IMG_20161117_001643 IMG_20161117_001632

Everything put together:

finished

 

Why I left Stack Overflow

I decided to leave Stack Overflow after about 6 years. Two realizations led to this decision.

A – moderators and editors are much more valued than people who actually create content. For every question I tried to ask or answer, I have spent more time talking with moderators than actually writing it. A grab bag from my experience:

  • you write a sensible answer, and some mega-moderator just deletes it because he believes it is link spam. After sending an email explaining that it is not link spam, the post gets reinstated.
  • a shitload of stupid questions coming in. Bad grammar, unclear what they are asking, and so on. With a signal/noise level of about -24 dB it is very, very bad.
  • there are so many disrespectful people who actually demand that you look at their question. “It has to be a completely self-standing solution before I will accept the answer” is not uncommon on the site.
  • but if you yourself ask a question, it is downvoted right away, without any explanation.
  • people who game the system: you answer someone’s question, someone comments on your answer and adds a detail. Then that same person takes your answer, reposts it 2 minutes later, and claims the reward.
  • people who think that downvoting an answer to a question they don’t like is the correct thing to do.
  • people who cannot read a question and keep insisting that they cannot give legal advice. This resulted in a close/reopen/close/reopen round for that particular question.
  • idiots who place a bounty on their question, yet after you answer them do not award the bounty.
  • and Stack Overflow will not return an unawarded bounty to the original poster.
  • and of course people who are not really interested in the question. You put a bounty on a question you have. About a day before the deadline expires, someone ‘answers’ it by nicely formulating some general things you already knew, clearly hoping to get more than 2 points so he can claim his non-answer.

Thus we end up in a situation where the teachers (those with high reputation) suck the lifeblood out of the content creators, who maybe don’t need endless discussions. By doing so, they create a new class that will tend to behave the same.

Because the above problems are systemic I decided to leave.

B – There is, however, a second reason: reputation is just a number; it doesn’t mean anything. I really do not need to “play the game”.


What do I lose by leaving?

The possibility to ask questions? Only one of my questions was ever answered sensibly.

The possibility to answer questions ? I will not do that anymore because it really has become too difficult.

The reputation of being this person at Stack Overflow? That doesn’t mean a thing either, unless you believe that a large number makes you a better programmer.

I realize that some people might like to play the S.O. game and that is fine. It is just not for me anymore.

Synthesizing waves

This image represents one of the remaining problems with the Zathras time stretcher I wrote. When synthesizing a new wave, we want to do that fast, so we use an FFT to generate on average 632 sines at the same time. The problem is that whenever a wave has a frequency that does not match any Fourier bin, we need to ‘fix’ it. That is done by applying a phase modulation to it. Yet because Fourier synthesis requires circular waves, the endpoints must match (that is, the total phase must be a multiple of 2π). When the wave is modulated with a non-2π multiple, this requirement is not satisfied. The result is that we set out a frequency path (the blue line, with only 8 points) and then assume that the final synthesized wave will be equally linear. The red line shows how this is not the case.
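In symbols (my notation, not the paper’s): over a synthesis frame of $N$ samples, a sinusoid at normalized frequency $f$ (measured in bins, i.e. cycles per frame) accumulates a total phase of

\[ \Delta\phi = 2\pi f. \]

FFT synthesis is circular, so the endpoints only line up when $f$ is an integer bin $k$. Modulating bin $k$ towards a non-integer frequency $f$ therefore leaves an endpoint mismatch of

\[ \varepsilon = 2\pi (f - k) \not\equiv 0 \pmod{2\pi}, \]

which is exactly the deviation the red line shows.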

At the moment, to solve this, we overlap a lot of these windows so that the error fades into the background. Yet a metallic ring remains.

phase-problem

A second solution is to apply an appropriate window (e.g. Kaiser–Bessel), which pushes the entire error into the endpoints.

phase-halfsolution

Autoencoder identity mapping

Can an autoencoder learn the identity mapping? To test that, I went to the extreme: let an optimization algorithm (SGD) find the best mapping when the visible layer is just a scalar and the hidden layer as well.

The first remarkable thing is that there is no solution that gives a perfect mapping! There simply does not exist a weight/bias combination that maps the input straight to the output when tied weights and sigmoids are used.
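To see why (my notation): with a scalar input $x$, tied weight $w$ and biases $b$, $b'$, the reconstruction is

\[ \hat{x} = \sigma\big(w\,\sigma(wx + b) + b'\big), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}. \]

This function is bounded in $(0,1)$ and nowhere linear, so $\hat{x} = x$ cannot hold for all $x$; the best the optimizer can do is approximate the identity on the region where the data lives.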

Anyway, because the problem is so low-dimensional, we can calculate the cost over an area and plot it as a surface. Two things are worth noting.

  1. The minimum can only be found once we get into a very narrow valley… from the right angle… If we were to enter it from the back (B>40), then the valley floor is not sufficiently steep to guide us quickly to the minimum.
  2. If we were dropped on this surface at (W:40; B:-20), then the search algorithm would go down from one plateau to the next, blissfully unaware of that nice crevasse that was laid out for it.

learningidentity

A reading of a Theano compiled graph

I’m trying to understand a compiled Theano function, which I printed with theano.printing.debugprint. The full monster is printed below, yet I only have problems with a couple of lines.

This code first computes a random yes/no vector in node 6. Node 5 is used to create the appropriate shape (matching that of x).

Gemm{inplace} [id A] <TensorType(float32, matrix)> '(dcost/dW)'   23
 |Dot22 [id B] <TensorType(float32, matrix)> ''   22
 | |InplaceDimShuffle{1,0} [id C] <TensorType(float32, matrix)> 'x_tilde.T'   12
 | | |Elemwise{Mul}[(0, 0)] [id D] <TensorType(float32, matrix)> 'x_tilde'   8
 | |   |RandomFunction{binomial}.1 [id E] <TensorType(float32, matrix)> ''   6
 | |   | |<RandomStateType> [id F] 
 | |   | |MakeVector{dtype='int64'} [id G] <TensorType(int64, vector)> ''   5
 | |   | | |Shape_i{0} [id H] <TensorType(int64, scalar)> ''   1
 | |   | | | |x [id I] <TensorType(float32, matrix)>
 | |   | | |Shape_i{1} [id J] <TensorType(int64, scalar)> ''   0
 | |   | |   |x [id I] <TensorType(float32, matrix)>
 | |   | |TensorConstant{1} [id K] <TensorType(int8, scalar)>
 | |   | |TensorConstant{0.75} [id L] <TensorType(float32, scalar)>
 | |   |x [id I] <TensorType(float32, matrix)>
 | |Elemwise{Composite{((i0 - i1) * i2 * i1)}}[(0, 2)] [id M] <TensorType(float32, matrix)> ''   21
 |   |TensorConstant{(1, 1) of 1.0} [id N] <TensorType(float32, (True, True))>
 |   |Elemwise{Composite{scalar_sigmoid((i0 + i1))}}[(0, 0)] [id O] <TensorType(float32, matrix)> 'reduced'   15
 |   | |Dot22 [id P] <TensorType(float32, matrix)> ''   11
 |   | | |Elemwise{Mul}[(0, 0)] [id D] <TensorType(float32, matrix)> 'x_tilde'   8
 |   | | |W [id Q] <TensorType(float32, matrix)>
 |   | |InplaceDimShuffle{x,0} [id R] <TensorType(float32, row)> ''   2
 |   |   |B [id S] <TensorType(float32, vector)>
 |   |Dot22 [id T] <TensorType(float32, matrix)> '(dcost/dreduced)'   20
 |     |Elemwise{Composite{((i0 * (i1 - Composite{scalar_sigmoid((i0 + i1))}(i2, i3)) * Composite{scalar_sigmoid((i0 + i1))}(i2, i3) * (i4 - Composite{scalar_sigmoid((i0 + i1))}(i2, i3))) / i5)}}[(0, 2)] [id U] <TensorType(float32, matrix)> ''   18
 |     | |TensorConstant{(1, 1) of -2.0} [id V] <TensorType(float32, (True, True))>
 |     | |x [id I] <TensorType(float32, matrix)>
 |     | |Dot22 [id W] <TensorType(float32, matrix)> ''   17
 |     | | |Elemwise{Composite{scalar_sigmoid((i0 + i1))}}[(0, 0)] [id O] <TensorType(float32, matrix)> 'reduced'   15
 |     | | |InplaceDimShuffle{1,0} [id X] <TensorType(float32, matrix)> 'W.T'   3
 |     | |   |W [id Q] <TensorType(float32, matrix)>
 |     | |InplaceDimShuffle{x,0} [id Y] <TensorType(float32, row)> ''   4
 |     | | |B_Prime [id Z] <TensorType(float32, vector)>
 |     | |TensorConstant{(1, 1) of 1.0} [id N] <TensorType(float32, (True, True))>
 |     | |Elemwise{mul,no_inplace} [id BA] <TensorType(float32, (True, True))> ''   16
 |     |   |InplaceDimShuffle{x,x} [id BB] <TensorType(float32, (True, True))> ''   14
 |     |   | |Subtensor{int64} [id BC] <TensorType(float32, scalar)> ''   10
 |     |   |   |Elemwise{Cast{float32}} [id BD] <TensorType(float32, vector)> ''   7
 |     |   |   | |MakeVector{dtype='int64'} [id G] <TensorType(int64, vector)> ''   5
 |     |   |   |Constant{1} [id BE] 
 |     |   |InplaceDimShuffle{x,x} [id BF] <TensorType(float32, (True, True))> ''   13
 |     |     |Subtensor{int64} [id BG] <TensorType(float32, scalar)> ''   9
 |     |       |Elemwise{Cast{float32}} [id BD] <TensorType(float32, vector)> ''   7
 |     |       |Constant{0} [id BH] 
 |     |W [id Q] <TensorType(float32, matrix)>
 |TensorConstant{1.0} [id BI] <TensorType(float32, scalar)>
 |InplaceDimShuffle{1,0} [id BJ] <TensorType(float32, matrix)> ''   19
 | |Elemwise{Composite{((i0 * (i1 - Composite{scalar_sigmoid((i0 + i1))}(i2, i3)) * Composite{scalar_sigmoid((i0 + i1))}(i2, i3) * (i4 - Composite{scalar_sigmoid((i0 + i1))}(i2, i3))) / i5)}}[(0, 2)] [id U] <TensorType(float32, matrix)> ''   18
 |Elemwise{Composite{scalar_sigmoid((i0 + i1))}}[(0, 0)] [id O] <TensorType(float32, matrix)> 'reduced'   15
 |TensorConstant{1.0} [id BI] <TensorType(float32, scalar)>
RandomFunction{binomial}.0 [id E]  ''   6

Then, in the second section of interest, we see that node 16 performs the multiplication of two subnodes, 14 and 13, which are very similar except for a constant 0 or 1.

The code for node 14 reuses the same vector as node 5, but casts it to float32 (which is node 7). And then the magic, the thing I do not understand, happens: from that casted vector, a subtensor with constant 0 is selected. What does this do?

The questions:

  1. Does the Subtensor (node 10 or node 9) select the first element of the tensor from node 7?
  2. Node 7 is merely a casted version of node 5. Does that vector contain the random data generated in node 6?
  3. Once the subtensors of node 7 are selected, they are allowed to broadcast (nodes 13 and 14) to finally be multiplied with each other in node 16. Is it correct to say that node 16 thus computes the multiplication of the first random element with the second random element (from a random vector that might be quite a bit larger)?
  4. When I print out the types of the nodes, we see that the output of the subtensor is indeed a scalar (as expected), yet the type of InplaceDimShuffle and Elemwise{mul} is (True,True). What kind of type is that?
  5. If Elemwise{Mul} does not specify ‘inplace’ (as happens in node 6), which of the two children is then modified? Is it the node associated with the RandomFunction (thus node 5), or does the RandomFunction (node 6) provide us with another copy that can be modified?

The image graph of the above function is given below. The yellow box is the one that confuses me because, if I read it right, its output is merely a broadcastable 0.

gradient_marked

After a day of bashing my head against this nonsense, I figured out the following:

MakeVector in node 5 takes the shapes as input parameters and concatenates them into a small vector (see http://deeplearning.net/software/theano/library/tensor/opt.html#theano.tensor.opt.MakeVector).

Thus its output is a vector with two values: the dimensions of x. The random generator then uses that as input to generate a correctly sized matrix. And the yellow box in the computation merely multiplies the two dimensions with each other to calculate the element count of the input. Answering each of the subquestions:

  1. Yes, the subtensors select respectively the 0th and 1st element of the input vector.
  2. Node 7 does indeed contain the casted data from node 5. However, node 5 does not contain the random data.
  3. It is wrong to say that node 16 computes the product of the first two random values. Correct is: node 16 computes the product of the dimension sizes of the input x.
  4. The (True,True) type merely tells us the broadcasting pattern. See http://deeplearning.net/software/theano/library/tensor/basic.html#all-fully-typed-constructors
  5. Without inplace, an elementwise multiplication destroys one of its inputs. The inputs that are destroyed are marked in red; the documentation on the color coding is at http://deeplearning.net/software/theano/library/printing.html#theano.printing.pydotprint

A small step into denoising autoencoders. Optimizers.

The following chart shows some of the results of creating a denoising autoencoder.

The idea is that a neural network is trained to map a 1482-dimensional input space to a 352-dimensional space in such a way that it can recover the 30% of the data that was randomly removed. Once that first stage is trained, its output is used to train the second stage, which maps the data down to 84 dimensions. The last stage brings it further down to 21 dimensions. The advantage of this method is that such denoising autoencoders pick up patterns in the input, which are then combined at a higher level into higher-level patterns.

I have been testing various optimizers. The results below show how much of the signal can be recovered. To do that, we take the 1482-dimensional dataset, map it down to 21 dimensions, and then map it back to 1482 dimensions. After that we compare the original and the recovered signal. The error we get is then compared against the simplest possible predictor, namely the average of the signal.
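The exact error measure is not spelled out above; my reading of “compared against the average” is a relative reconstruction error of the form

\[ E \;=\; \frac{\sum_i \lVert x_i - \hat{x}_i \rVert^2}{\sum_i \lVert x_i - \bar{x} \rVert^2}, \]

where $\hat{x}_i$ is the encode-to-21-dimensions-and-decode reconstruction of song vector $x_i$, and $\bar{x}$ is the mean vector the trivial predictor would output. $E = 1$ then means “no better than predicting the average”.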

Now, the first thing we noticed is that although rmsprop-style approaches converge extremely fast, they do result in an average signal (literally: they just decode the signal by producing the average). Secondly, stochastic data corruption should of course not be combined with an optimizer that compensates for such noise (which the rmsprop and momentum methods do to a certain extent).

In the end, sgd turns out to retain the most ‘local patterns’, yet converges too slowly. Using adam improves the convergence speed. In this case, because mean-normalizing the data fucks up the results, we actually modified adam to calculate the variance correctly.

This is of course all very beginner-style stuff. Probably in a year or so I will look back at this and think: what the hell was I thinking when I did that?

optimizers

How did we come to these particular values?

BpmDj represents each song with a 1482-dimensional vector. I already had ~200000 of those and wanted to play with them. Broken down: the rhythm patterns contain 384 ticks per frequency band, and we have 3 frequency bands (thus 3 × 384 = 1152). Aside from that we have 11 loudness quantiles for each of 30 frequency bands (11 × 30 = 330). That sums to exactly 1482 dimensions.

Then the second decision was made: to stick to three layers, mainly because Google DeepDream already finds high-level patterns at the 3rd level. No reason to go further than that, I thought.

Then I aimed for roughly the same reduction factor in each stage (÷~4, as you may have noticed). So I ballparked the 21. That was mainly because I initially started with an autoencoder that went 5 stages deep; it so happened that I was happy with the 21-dimensional stage, and I kept it like that so I could compare results between them.

Now, that 21-dimensional space might still be way too large. Theoretically, we can represent 2^21 classes with it if each neuron simply says yes/no. However, there is also a certain amount of redundancy between neurons. They sometimes find the same patterns, and so on.

How to solve memcpy error when compiling Caffe

If you try to use CUDA with gcc 5.0.3 on Ubuntu 16.04, you might run into the following problem:

Makefile:586: recipe for target '.build_release/cuda/src/caffe/layers/sigmoid_layer.o' failed
make: *** [.build_release/cuda/src/caffe/layers/sigmoid_layer.o] Error 1
/usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
 return (char *) memcpy (__dest, __src, __n) + __n;

To solve this (an incompatibility between the newer glibc headers and nvcc), add -D_FORCE_INLINES to the following line in the Makefile:

COMMON_FLAGS += -DCAFFE_VERSION=$(DYNAMIC_VERSION_MAJOR).$(DYNAMIC_VERSION_MINOR).$(DYNAMIC_VERSION_REVISION)

The result should be:

COMMON_FLAGS += -DCAFFE_VERSION=$(DYNAMIC_VERSION_MAJOR).$(DYNAMIC_VERSION_MINOR).$(DYNAMIC_VERSION_REVISION) -D_FORCE_INLINES