All posts by werner

Making a Desk

The Desk

Because my previous desk was two tables and a chair, my wife decided that I would make one from scratch. The biggest problem was cutting the two pieces of three layered wood such that the errors in both cuts would compensate. Both with regard to the angle as well as with regard to the horizontal variations.

The easiest solution is to cut them both at the same time, in the same direction.

IMG_20161112_112909 IMG_20161112_131252_1

Once that was done, we made the legs and fitted them.

IMG_20161112_204014_1

IMG_20161113_101729

IMG_20161113_095248

We added an organic form because we needed place to actually sit on. The two plates placed together only allowed for 120 cm free room. Cutting out 20 cm we gave us a bit more leaway and opened up the desk. Furthermore, cutting out this form allowed us to cut ‘through’ the seam which made it fit even better.

IMG_20161113_115617

Then we glued them in place. For a change we actually made them fit in a pi/2 angle. One of those things which requires a bit too much fiddling for me to really bother with it. Yet it does matter if you are going to look at this every day.

IMG_20161113_165956

Once done with that, the paint job started. 2 layers for legs and bottom side. 3 layers for the main surface. With 10 hours between each layer this adds up to quite a waiting time.

IMG_20161113_183856

IMG_20161114_083837

IMG_20161114_154021

The cabinets

The side walls of the three cabinets

IMG_20161115_213541_1

IMG_20161115_213603

IMG_20161115_213550

The drawers:

IMG_20161116_132133

Put togetherIMG_20161116_162627

IMG_20161116_212059 IMG_20161117_001649 IMG_20161117_001643 IMG_20161117_001632

Everything put together:

finished

 

Why I left Stack Overflow

I decided to leave stack overflow after about 6 years. Two realizations lead to this decision.

A – moderators and editors are much more valued than people who actually create content. For every question I tried to ask, or tried to answer, I have been talking more with moderators than I actually spend time writing it. Just a grab from my experience,

  • you write a sensible answer, some mega moderator just deletes it because he believes it is link spam. After sending an email that it is not link spam the post gets reinstated.
  • a shitload of stupid questions coming in. Bad grammar, unclear what they are asking and so on. With a signal/noise level of about -24 dB it is very very bad.
  • there are so many disrespectful people who actually demand that you look at their question. “It has to be a completely self standing solution before I will accept the answer” is not uncommon at the site.
  • but if you yourself ask a question, it is downvoted right away, without any explanation.
  • people who game the system: You answer someones question, someone comments on your answer and adds a detail. Then that same person takes your answer, reposts it 2 minutes later, and claims the reward.
  • people who think that downvoting an answer to a question they don’t like is the correct thing to do
  • people who cannot read a question and keep insisting that they cannot give legal advice. This resulted in a of close/reopen/close/reopen round for that particular question.
  • idiots who place a bounty on their question, yet after you answer them they do not award the bounty.
  • yet stack overflow that will not return the bounty to the original poster if not awarded.
  • and of course people who are not really interested in the question. You put a bounty on a question you have. About a day before the deadline expires someone ‘answers’ the question by well formulating some general things I already knew. Clearly hoping to get more than 2 points so he can claim his non-answer.

Thus we end up in a situation where the teachers (those with high reputation) suck the livelihood out of content creators, who maybe don’t need endless discussions. By doing so, they create a new class that will tend to behave the same.

Because the above problems are systemic I decided to leave.

B- There is however a second reason. Namely, reputation is just a number, it doesn’t mean anything.  I really do not need to “play the game”.


What do I loose by leaving ?

The possibility to ask questions ? Only 1 question has been answered sensibly.

The possibility to answer questions ? I will not do that anymore because it really has become too difficult.

The reputation that I am this person at stackoverflow ? Doesn’t mean a thing either except if you believe that a large number makes you a better programmer.

I realize that some people might like to play the S.O. game and that is fine. It is just not for me anymore.

Synthesizing waves

This image represents one of the remaining problems with the Zathras timestretcher I wrote. When synthesizing a new wave, we want to do that fast, so we use an FFT to generate on average 632 sines at the same time. The problem is that whenever a wave has a frequency that does not match any Fourier bin then we need to ‘fix’ it. That is done by applying a phase modulation to it. yet because the Fourier synthesis requires circular waves, the endpoints must match (that is be a multiple of 2Pi). When the wave is modulated with a non 2pi multiple this requirement is not satisfied. The result is that we set out an frequency path (the blue line, with only 8 points), and then assume that the final synthesized wave will be equally linear. The red line shows how this is not the case.

At the moment, to solve this we overlap a lot of these windows so the error fades away in the background. Yet a metallic ring remains.

phase-problem

A second solution is the application of an appropriate window (E.g: Kaiser Bessel); which will push the entire error into the endpoints.

phase-halfsolution

Autoencoder identity mapping

Can an autoencoder learn the identity mapping ? To test that, I went to the extreme: let an optimalisation algorithm (SGD) find the best mapping when then visible units are 0-dimensional (a scalar) and the hidden units as well.

the first remarkable thing is that there is no solution that will have the perfect mapping ! There simply does not exist a relation that will map the input straight to the output when tied weights and sigmoids are used.

Anyway, because the problem is so non-dimensional, we can calculate the cost over an area and plot it in a surface. Two things are worth noting.

  1. The minimum can be found as soon as we get into a very narrow valey… from the right angle… If we were to enter it from the back (B>40) then the value floor is not sufficiently steep to guide us quickly to the minimum.
  2. If we were dropped on this surface at (W:40;B:-20) then the search algorithm would go down from one plateau to the next, blissfully aware of that nice crevasse that we laid out for it.

learningidentity

A reading of a Theano compiled graph

I’m trying to understand a compiled theano function, which I printed with theano.printing.debugprint. The full monster is printed below, yet it is only with a couple of lines that I have some problems.

This code first computes a random yes/no vector in node 6. Node 5 is used to create the appropriate shape (resembling x)

Gemm{inplace} [id A] <TensorType(float32, matrix)> '(dcost/dW)'   23
 |Dot22 [id B] <TensorType(float32, matrix)> ''   22
 | |InplaceDimShuffle{1,0} [id C] <TensorType(float32, matrix)> 'x_tilde.T'   12
 | | |Elemwise{Mul}[(0, 0)] [id D] <TensorType(float32, matrix)> 'x_tilde'   8
 | |   |RandomFunction{binomial}.1 [id E] <TensorType(float32, matrix)> ''   6
 | |   | |<RandomStateType> [id F] 
 | |   | |MakeVector{dtype='int64'} [id G] <TensorType(int64, vector)> ''   5
 | |   | | |Shape_i{0} [id H] <TensorType(int64, scalar)> ''   1
 | |   | | | |x [id I] <TensorType(float32, matrix)>
 | |   | | |Shape_i{1} [id J] <TensorType(int64, scalar)> ''   0
 | |   | |   |x [id I] <TensorType(float32, matrix)>
 | |   | |TensorConstant{1} [id K] <TensorType(int8, scalar)>
 | |   | |TensorConstant{0.75} [id L] <TensorType(float32, scalar)>
 | |   |x [id I] <TensorType(float32, matrix)>
 | |Elemwise{Composite{((i0 - i1) * i2 * i1)}}[(0, 2)] [id M] <TensorType(float32, matrix)> ''   21
 |   |TensorConstant{(1, 1) of 1.0} [id N] <TensorType(float32, (True, True))>
 |   |Elemwise{Composite{scalar_sigmoid((i0 + i1))}}[(0, 0)] [id O] <TensorType(float32, matrix)> 'reduced'   15
 |   | |Dot22 [id P] <TensorType(float32, matrix)> ''   11
 |   | | |Elemwise{Mul}[(0, 0)] [id D] <TensorType(float32, matrix)> 'x_tilde'   8
 |   | | |W [id Q] <TensorType(float32, matrix)>
 |   | |InplaceDimShuffle{x,0} [id R] <TensorType(float32, row)> ''   2
 |   |   |B [id S] <TensorType(float32, vector)>
 |   |Dot22 [id T] <TensorType(float32, matrix)> '(dcost/dreduced)'   20
 |     |Elemwise{Composite{((i0 * (i1 - Composite{scalar_sigmoid((i0 + i1))}(i2, i3)) * Composite{scalar_sigmoid((i0 + i1))}(i2, i3) * (i4 - Composite{scalar_sigmoid((i0 + i1))}(i2, i3))) / i5)}}[(0, 2)] [id U] <TensorType(float32, matrix)> ''   18
 |     | |TensorConstant{(1, 1) of -2.0} [id V] <TensorType(float32, (True, True))>
 |     | |x [id I] <TensorType(float32, matrix)>
 |     | |Dot22 [id W] <TensorType(float32, matrix)> ''   17
 |     | | |Elemwise{Composite{scalar_sigmoid((i0 + i1))}}[(0, 0)] [id O] <TensorType(float32, matrix)> 'reduced'   15
 |     | | |InplaceDimShuffle{1,0} [id X] <TensorType(float32, matrix)> 'W.T'   3
 |     | |   |W [id Q] <TensorType(float32, matrix)>
 |     | |InplaceDimShuffle{x,0} [id Y] <TensorType(float32, row)> ''   4
 |     | | |B_Prime [id Z] <TensorType(float32, vector)>
 |     | |TensorConstant{(1, 1) of 1.0} [id N] <TensorType(float32, (True, True))>
 |     | |Elemwise{mul,no_inplace} [id BA] <TensorType(float32, (True, True))> ''   16
 |     |   |InplaceDimShuffle{x,x} [id BB] <TensorType(float32, (True, True))> ''   14
 |     |   | |Subtensor{int64} [id BC] <TensorType(float32, scalar)> ''   10
 |     |   |   |Elemwise{Cast{float32}} [id BD] <TensorType(float32, vector)> ''   7
 |     |   |   | |MakeVector{dtype='int64'} [id G] <TensorType(int64, vector)> ''   5
 |     |   |   |Constant{1} [id BE] 
 |     |   |InplaceDimShuffle{x,x} [id BF] <TensorType(float32, (True, True))> ''   13
 |     |     |Subtensor{int64} [id BG] <TensorType(float32, scalar)> ''   9
 |     |       |Elemwise{Cast{float32}} [id BD] <TensorType(float32, vector)> ''   7
 |     |       |Constant{0} [id BH] 
 |     |W [id Q] <TensorType(float32, matrix)>
 |TensorConstant{1.0} [id BI] <TensorType(float32, scalar)>
 |InplaceDimShuffle{1,0} [id BJ] <TensorType(float32, matrix)> ''   19
 | |Elemwise{Composite{((i0 * (i1 - Composite{scalar_sigmoid((i0 + i1))}(i2, i3)) * Composite{scalar_sigmoid((i0 + i1))}(i2, i3) * (i4 - Composite{scalar_sigmoid((i0 + i1))}(i2, i3))) / i5)}}[(0, 2)] [id U] <TensorType(float32, matrix)> ''   18
 |Elemwise{Composite{scalar_sigmoid((i0 + i1))}}[(0, 0)] [id O] <TensorType(float32, matrix)> 'reduced'   15
 |TensorConstant{1.0} [id BI] <TensorType(float32, scalar)>
RandomFunction{binomial}.0 [id E]  ''   6

Then at the second marked bold section, we see that node 16 performs the multiplication of two subnodes 14 and 13; which both are very similar except for a constant 0 or 1.

The code for node 14 reuses the same vector as node 5, but then casts it to float32 (which is node 7). And then the magic, the thing I do not understand happens. From that casted vector, a subtensor with constant 0 is selected. What does this do ?

The questions:

  1. Does the SubTensor (node 10 or node 9) select the first element of the tensor from node 7 ?
  2. Node 7 is merely a casted version of node 5. Does that vector contain the random data generated in node 6 ?
  3. Once the subtensors of node 7 are selected they are allowed to broadcast (node 13 and 14) to finally be multiplied with each other in node 16. Is it correct to say that node 16 thus computes the multiplication of the first random element with the second random element (from a random vector that might be quiet somewhat larger) ?
  4. When I print out the types of the nodes, we see that the output of the subtensor is indeed a scalar (as expected), yet the type of InPlaceDimensionShuffle and the ElementWise{mul} is (True,True). What for type is that ?
  5. If ElementWise{Mul} is not specifying ‘inplace’, (as happens in node 6), which of the two children is then modified ? Is it the node associated with the RandomFunction (thus node5) or does the randomFunction (node 6) provide us with another copy that can be modified ?

The image graph of the above function is given below. The yellow box is the one that confuses me because, if I read that right its output is merely a broadcastable 0.

gradient_marked

After a day bashing my head against this nonsense I fugerd out the following:

Makevector in node 5 uses the shape as input parameters and concatenates them into a small vector (See http://deeplearning.net/software/theano/library/tensor/opt.html#theano.tensor.opt.MakeVector).

Thus its output is a vector with two values: the dimensions of x. The random generator then uses that as input to generate a correctly sized vector. And the yellow box in the computation merely multiplies the two dimensions with each other to calculate the element-count of the input. Answering each of the subquestions:

  1. yes the subtensor selects respectively the 0th and 1st element of the intput vector.
  2. node 7, does indeed contain the casted data from node 5. However node 5 does not contain the random data.
  3. it is wrong to say that node 16 computes the multiple of the two first random values. Right is: node 16 computes the multiplication of the dimension sizes of the input vector x.
  4. The (True,True) type merely tells us the broadcasting pattern. See section http://deeplearning.net/software/theano/library/tensor/basic.html#all-fully-typed-constructors
  5. Without inplace an elementwise multiplication destroys one of its inputs. The inputs that are destroyed are marked in red. documentation on color coding http://deeplearning.net/software/theano/library/printing.html#theano.printing.pydotprint

A small step into denoising autoencoders. Optimizers.

The following chart are some of the results of creating a denoising autoencoder.

The idea is that a neural network is trained to map a 1482 dimensional input space to a 352 dimensional space in such a way that it will recover 30% of randomly removed data. Once that first stage is trained, its output is used to train the second stage which maps the data to 84 dimensions. The last stage bring it further down to 21 dimensions. The advantage of this method is that such denoising autoencoders grab patterns in the input, which are then at a higher level combined into higher level patterns.

I have been testing various optimizers. The results below show how much of a signal can be recovered. To do that, we take the 1482 dimensional dataset map it through to 21 dimensions and then map it back to 1482 dimensions. After that we compare the original and recovered signal. The error we get is then compared against the most simple predictor; namely the average of the signal.

Now, the first thing we noticed is that although rmsprop style approaches go extremely fast, they do result in an average signal (literally, they just decode the signal by producing the average). Secondly, stochastic data corruption should of course not be combined with an optimizer that compensates for such noise (which the rmsprop and momentum methods do to a certain extend).

In the end, sgd turns out to retain the most ‘local patterns’, yet converges too slowly. Using adam improves the convergence speed. In this case, because mean-normalizing the data fucks up the results we actually modified adam to calculate the variance correctly.

This is of course all very beginner style stuff. Probably in a year or so I will look back at this and think: what the hell was I thinking when I did that ?

optimizers

How did we come to these particular values ?

BpmDj represents each song with a 1482 dimensional vector. So I already had ~200000 entries of those and wanted to play with them. Broken down: the rhythm patterns contain 384 ticks per frequency band and we have 3 frequency bands. (thus 3*384). Aside from that we have loudness quantiles of 30 frequency bands (11*30). Which sums to about 1482 dimensions.

Then the second decission was made to stick to three layers. Mainly because google deepdream already finds high level patterns at the 3th level. No reason to go further than that I thought.

Then I aimed to reduce the same amount in each stage (/~4 as you noticed). So I ballparked the 21. And that was mainly because initially I started with an autoencoder that went 5 stages deep. It so happened I was happy at stage 21 and kept it like that so I could compare results between them.

Now, that 21 dimensional space might still be way too large. Theoretically, we can represent 2^21 classes with them if each neuron would simply say a yes/no. However, there is also a certain amount of redundancy between neurons. They sometimes find the same patterns back and so on.

How to solve memcpy error when compiling Caffe

If you try to use cuda with gcc 5.0.3 according on Ubuntu 16.4 you might have the following problem:

Makefile:586: die Regel für Ziel „.build_release/cuda/src/caffe/layers/sigmoid_layer.o“ scheiterte
make: *** [.build_release/cuda/src/caffe/layers/sigmoid_layer.o] Fehler 1
/usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
 return (char *) memcpy (__dest, __src, __n) + __n;

To solve this add -D_FORCE_INLINES to the following line in the Makefile:

COMMON_FLAGS += -DCAFFE_VERSION=$(DYNAMIC_VERSION_MAJOR).$(DYNAMIC_VERSION_MINOR).$(DYNAMIC_VERSION_REVISION)

The result should be:

COMMON_FLAGS += -DCAFFE_VERSION=$(DYNAMIC_VERSION_MAJOR).$(DYNAMIC_VERSION_MINOR).$(DYNAMIC_VERSION_REVISION) -D_FORCE_INLINES

Java join() slow

And there I was thinking that Java would offer an efficient implementation of join on threads. One that would be triggered when a thread signals it had come to an end.

Well… not so. join(0), which means, wait indefintely for the thread, results in a completely braindead while(isAlive()) wait() loop.

public final synchronized void join(long millis)
throws InterruptedException {
    long base = System.currentTimeMillis();
    long now = 0;

    if (millis < 0) {
        throw new IllegalArgumentException("timeout value is negative");
    }

    if (millis == 0) {
        while (isAlive()) {
            wait(0);
        }
    } else {
        while (isAlive()) {
            long delay = millis - now;
            if (delay <= 0) {
                break;
            }
            wait(delay);
            now = System.currentTimeMillis() - base;
        }
    }
}

To make this work as it should we had to wait for a threadcounter to reach 0 and let each thread lower the threadcounter and call notifyAll. A very tedious process. Why do we have a join if it doesn’t work in a performant manner ?

Latency calculations for Zathras

Because Zathras-1 now has to interact with linearly interpolated timestretches, we had to calculate the latency of our timestretcher.

Multiple results were new to me and interesting to see.

  1. An sliding window fourier transform has a latency of 0. I did not expect that at all.
  2. When an input frame is stretched, then each sample of the inputframe will have  a different latency.
  3. The latencies of all samples, except the middle one, depend on the playback speed. Yes, you read that right: the playback speed affects the latency. And that is really something we do not want to compensate for. Therefore, use as reference the middle of the frame.
  4. It is possible to have negative latencies. Part of the signal is already out before we believe we provided it to the timestretcher.

http://werner.yellowcouch.org/Papers/zathras15/#toc23