All posts by werner

Orthogonal weight initialization in PyTorch seems kinda weird

I recently gave deep learning another go. This time I looked into PyTorch. At least the thing lets you program in a synchronous fashion. One of the examples, however, did not work as expected.

I was looking into the superresolution example (https://github.com/pytorch/examples) and printed out the weights of the second convolution layer. It turned out these were ‘kinda weird’ (similar to the attached picture). So I looked into them and found that the orthogonal weight initialization that was used would not initialize a large section of the weights of a 4-dimensional weight tensor. Yes, I know that the documentation states that ‘dimensions beyond 2’ are flattened. That does not mean though that a large portion of the values should be left empty.

[Image: weights_grmpf]

The orthogonal initialisation seems to have become a standard (for good reason, see the paper https://arxiv.org/pdf/1312.6120.pdf), yet it is one that does not work well together with convolution layers, where a simple input->output matrix is not straight away available. It is better to use the xavier_uniform initialisation. That is, in the file model.py you should have an _initialize_weights as follows:

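 def _initialize_weights(self):
   # Assumes model.py already has: import torch.nn.init as init
   # Recent PyTorch versions name the in-place initializer xavier_uniform_;
   # the older xavier_uniform spelling is deprecated.
   init.xavier_uniform_(self.conv1.weight, gain=init.calculate_gain('relu'))
   init.xavier_uniform_(self.conv2.weight, gain=init.calculate_gain('relu'))
   init.xavier_uniform_(self.conv3.weight, gain=init.calculate_gain('relu'))
   # The last layer has no ReLU after it, so it keeps the default gain of 1.
   init.xavier_uniform_(self.conv4.weight)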

With this, I trained a model on the BSDS300 dataset (for 256 epochs) and then tried to upsample a small image by a factor of 2. The upper image is the small image (upsampled using a bicubic filter). The bottom one is the small picture upsampled using the neural net.

The weights we now get at least use the full matrix.

The output when initialized with “orthogonal” weights has some sharp ugly edges:

A better crossfade

A demonstration of the difference between two crossfades. The first is a straightforward crossfade. The second one is a crossfade in which the partials of both tracks are detected, selected (the strongest wins) and then resynthesized.

The normal crossfade:

Crossfading the partials:

To hear the difference listen to the middle of the two tracks (around 16″). While the normal crossfade sounds muddier, the second one retains the same volume and clarity as either track (at least to my ears).

A talk on timestretching at a hackerscamp

Time stretching of audio tracks can be easily done by either interpolating missing samples (slowing down the track), or by throwing away samples (speeding up the track). A drawback is that this results in a pitch change. In order to overcome these issues, we created a time stretcher that would not alter the pitch when the playback speed changed. In this talk we discuss how we created a fast, high quality time stretcher, which is now an integral part of BpmDj. We explain how a sinusoidal model is extracted from the input track, its envelope modeled and then used to synthesize a new audio track. The synthesis timestretches the envelope of all participating sines, yet retains the original pitch. The resulting time stretcher uses only a frame overlap of 4, which reduces the amount of memory access and computation compared to other techniques.

We assume the listener will have a notion about Fourier analysis. We do however approach the topic equally from an educational as well as from a research perspective.

High resolution slides are available at http://werner.yellowcouch.org/Papers/sha2017/index.html
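
To make the pitch problem from the abstract concrete, here is a minimal Python sketch of the naive approach only (interpolating or skipping samples). It is not the sinusoidal-model stretcher used in BpmDj. The helper names resample_linear and estimate_frequency, the 8000 Hz sample rate and the 440 Hz test tone are assumptions picked for illustration.

```python
import math

RATE = 8000    # samples per second (assumed for this sketch)
TONE = 440.0   # test tone in Hz (assumed for this sketch)


def resample_linear(samples, speed):
    """Naive stretch: step through the input `speed` samples at a time.

    speed < 1 interpolates extra samples (slower playback),
    speed > 1 skips samples (faster playback).
    """
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        # Linear interpolation between the two neighbouring input samples.
        out.append((1.0 - frac) * samples[i] + frac * samples[i + 1])
        pos += speed
    return out


def estimate_frequency(samples, rate):
    """Rough pitch estimate: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return crossings / (len(samples) / rate)


# One second of a pure tone.
tone = [math.sin(2.0 * math.pi * TONE * n / RATE) for n in range(RATE)]
slow = resample_linear(tone, 0.5)   # about twice as long
fast = resample_linear(tone, 2.0)   # about half as long

print(len(slow), round(estimate_frequency(slow, RATE)))
print(len(fast), round(estimate_frequency(fast, RATE)))
```

The slowed-down version comes out roughly an octave lower and the sped-up version roughly an octave higher, which is exactly the side effect a pitch-preserving time stretcher has to avoid.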

 

JavaFx 8 command line options

I have been struggling with an “interesting” JavaFX render problem. In BpmDj, given enough updates, the render tree would partially stop updating. It was clear this was a concurrency bug, yet one that was not my fault. As a workaround I would grab a screenshot every time I added a node. Recently however that workaround no longer worked, and the dirty logic would be so fucked up that elements which should be visible were not visible. It was even possible to have a font change its color halfway through its rendering.

So I started to look around for JavaFX options, and a few of them proved to be useful.

| What | Flag | Default |
|---|---|---|
| VSync | prism.vsync | true |
| Dirty region optimizations | prism.dirtyopts | true |
| Occlusion Culling | prism.occlusion.culling | true |
| dirtyRegionCount | prism.dirtyregioncount | 15 |
| Scrolling cache optimization | prism.scrollcacheopt | false |
| Dirty region optimizations | prism.threadcheck | false |
| Draws overlay rectangles showing where the dirty regions were | prism.showdirty | false |
| Draws overlay rectangles showing not only the dirty regions, but how many times each area within that dirty region was drawn (covered by bounds of a drawn object). | prism.showoverdraw | false |
| Prints out the render graph, annotated with dirty opts information | prism.printrendergraph | false |
| Force scene repaint on every frame | prism.forcerepaint | false |
| Disable fallback to another toolkit if prism couldn’t be init-ed | prism.noFallback | false |
| Shape caching optimizations | prism.cacheshapes | complex |
| New javafx-iio image loader | prism.newiio | true |
| Verbose output | prism.verbose | false |
| Prism statistics print frequency, <=0 means “do not print” | prism.printStats | 0 |
| Debug output | prism.debug | false |
| Trace output | prism.trace | false |
| Print texture allocation data | prism.printallocs | false |
| Disable bad driver check warning | prism.disableBadDriverWarning | false |
| Force GPU, if GPU is PS 3 capable, disable GPU qualification check. | prism.forceGPU | false |
| Skip mesh normal computation | prism.experimental.skipMeshNormalComputation | false |
| Which driver to use | prism.order | |
| | prism.forcepowerof2 | false |
| | prism.noclamptozero | false |
| Try -Dprism.maxvram=[kKmMgG] | prism.allowhidpi | true |
| | prism.maxvram | 512 * 1024 * 1024 |
| Try -Dprism.targetvram=[kKmMgG]\|<double(0,100)>% | prism.targetvram | |
| | prism.poolstats | false |
| | prism.pooldebug | false |
| | prism.maxTextureSize | |
| | prism.minrttsize | |
| | prism.disableRegionCaching | |
| | prism.disableD3D9Ex | false |
| | prism.disableEffects | false |
| | prism.glyphCacheWidth | 1024 |
| | prism.glyphCacheHeight | 1024 |
| Enable the performance logger, print on exit, print on first paint etc. | sun.perflog | |
| | sun.perflog.fx.exitflush | |
| | sun.perflog.fx.firstpaintflush | |
| | sun.perflog.fx.firstpaintexit | |
| | prism.supershader | true |
| Force uploading painter (e.g., to avoid Linux live-resize jittering) | prism.forceUploadingPainter | false |
| Force the use of fragment shader that does alpha testing (i.e. discard if alpha == 0.0) | prism.forceAlphaTestShader | false |
| Force non anti-aliasing (not smooth) shape rendering | prism.forceNonAntialiasedShape | false |
| Set Single GUI Threading | quantum.singlethreaded | false |
| Print quantum verbose | quantum.verbose | false |
| JavaFx framerate in FPS | javafx.animation.pulse | 60 |

FX thread collision with Render thread ?

When running with the prism.threadcheck option I got the following error:

ERROR: PrismPen / FX threads co-running: DIRTY: false
FX: java.lang.Thread.getStackTrace(Thread.java:1559)
FX: com.sun.javafx.tk.quantum.QuantumRenderer.checkRendererIdle(QuantumRenderer.java:247)
FX: com.sun.javafx.tk.quantum.QuantumToolkit.checkFxUserThread(QuantumToolkit.java:424)
FX: javafx.scene.Scene$MouseHandler.process(Scene.java:3680)
FX: javafx.scene.Scene$MouseHandler.access$1500(Scene.java:3485)
FX: javafx.scene.Scene$MouseHandler$1.run(Scene.java:3521)
FX: com.sun.javafx.application.PlatformImpl.lambda$null$173(PlatformImpl.java:295)
FX: java.security.AccessController.doPrivileged(Native Method)
FX: com.sun.javafx.application.PlatformImpl.lambda$runLater$174(PlatformImpl.java:294)
FX: com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
FX: com.sun.glass.ui.gtk.GtkApplication._runLoop(Native Method)
FX: com.sun.glass.ui.gtk.GtkApplication.lambda$null$49(GtkApplication.java:139)
FX: java.lang.Thread.run(Thread.java:748)
QR: com.sun.javafx.sg.prism.NGCanvas.getStroke(NGCanvas.java:777)
QR: com.sun.javafx.sg.prism.NGCanvas.setupStroke(NGCanvas.java:785)
QR: com.sun.javafx.sg.prism.NGCanvas.handleRenderOp(NGCanvas.java:1212)
QR: com.sun.javafx.sg.prism.NGCanvas.renderStream(NGCanvas.java:1097)
QR: com.sun.javafx.sg.prism.NGCanvas.renderContent(NGCanvas.java:606)
QR: com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2053)
QR: com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1945)
QR: com.sun.javafx.sg.prism.NGGroup.renderContent(NGGroup.java:235)
QR: com.sun.javafx.sg.prism.NGRegion.renderContent(NGRegion.java:576)
QR: com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2053)
QR: com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1945)
QR: com.sun.javafx.sg.prism.NGGroup.renderContent(NGGroup.java:235)
QR: com.sun.javafx.sg.prism.NGRegion.renderContent(NGRegion.java:576)
QR: com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2053)
QR: com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1945)
QR: com.sun.javafx.sg.prism.NGGroup.renderContent(NGGroup.java:235)
QR: com.sun.javafx.sg.prism.NGRegion.renderContent(NGRegion.java:576)
QR: com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2053)
QR: com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1945)
QR: com.sun.javafx.sg.prism.NGGroup.renderContent(NGGroup.java:235)
QR: com.sun.javafx.sg.prism.NGRegion.renderContent(NGRegion.java:576)
QR: com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2053)
QR: com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1945)
QR: com.sun.javafx.sg.prism.NGGroup.renderContent(NGGroup.java:235)
QR: com.sun.javafx.sg.prism.NGRegion.renderContent(NGRegion.java:576)
QR: com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2053)
QR: com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1945)
QR: com.sun.javafx.tk.quantum.ViewPainter.doPaint(ViewPainter.java:477)
QR: com.sun.javafx.tk.quantum.ViewPainter.paintImpl(ViewPainter.java:330)
QR: com.sun.javafx.tk.quantum.PresentingPainter.run(PresentingPainter.java:91)
QR: java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
QR: java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
QR: com.sun.javafx.tk.RenderJob.run(RenderJob.java:58)
QR: java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
QR: java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
QR: com.sun.javafx.tk.quantum.QuantumRenderer$PipelineRunnable.run(QuantumRenderer.java:125)
QR: java.lang.Thread.run(Thread.java:748)

Of course, how that related to my problem wasn’t clear, although maybe it could explain why the dirty flags were all ****ed up.

Overdraw and dirty draws

[Image: overdraw]

The option -Dprism.showoverdraw=true is interesting because it shows which areas were drawn once (red), which twice (green), and so on. This showed me that the areas that didn’t update were not even marked as ‘dirty’, suggesting that the problem was indeed with the setting and tracking of the dirty flag in the renderer. Therefore we tried the option -Dprism.dirtyopts=false and hooray, all problems disappeared. Of course the app ran slightly slower, but at least it worked now.

Java ReentrantReadWriteLock oddities

When I was hit with starvation of some of the update threads in BpmDj, I was a bit puzzled. After all, I did use a ReentrantReadWriteLock in fair mode. A simple profiling run showed that certain transactions were substantially heavier (a write lock being held for, let’s say, 10 seconds), while the database reads would merely take 1 second.

From that I concluded that because the write lock was held longer, other threads did not have the opportunity to get a fair amount of lock time themselves. E.g., the write lock is released and the longest waiting read lock is granted; that transaction is done within a second, and the write lock is granted again, and is not released for another 10 seconds.

To test this I set up a program that creates 10 reader threads and 1 writer thread. Each thread would acquire a lock, wait some time (to simulate the ‘work’ done in the locked section) and then release the lock. This would be performed in a loop for about 10 seconds. Afterwards, we could measure how much time each thread spent inside the lock and compare that with the amount of work the thread wanted to perform.

These were the results:

Unfair Lock

The unfair lock behaved, as expected, fairly unfair. If the writer had 10 times less work than the readerthreads, its locktime would be 8 times higher. If the writer had the same amount of work, it would have 16 times more locktime and if the writer had 10 times more work, then it would be granted 50 times more locktime.

Thus:
/10.0 => *7.944541604031417
1.0 => *15.917042652441687
*10.0 => *50.5366207048361

A fair lock

When we created the read/write lock in a fair fashion, the results were more in line with what we would expect:

/10.0 => /8.810361366979999
1.0 => /1.0021791947397343
*10.0 => *9.009623837637745

That is, when the writer has 10 times less work than the readers, it has roughly 9 times less locktime. If it has the same amount of work, it receives the same amount of locktime, and if it has 10 times more work, then it is granted 9 times more locktime.

This is completely as expected, yet not something we might want, because it allows heavy tasks to block the lighter tasks.

A fair lock, prefixed with tryLock()

tryLock allows an app to check for a lock and, if the lock is not granted, to continue with something else. There are two tryLock variants: the first without parameters (tryLock()), the second with a timeout.

tryLock() screws up any scheduling that might have been in place and just barges in on anything the algorithm might be planning to do.

/10.0 => /925.0224470413133
1.0 => /99.7994977890176
*10.0 => /11.036514077119893

In this scenario, the writer thread pretty much does not get anything done. Whether it is performing 10 times less or 10 times more work, its locktime ranges from ~ /1000 to /10. This is very bad, because you might expect the tryLock to make the lock unfair, yet the results of an unfair lock (see above) are completely the opposite.

A fair lock, prefixed with tryLock(timeout)

There is a second variant of tryLock: one with a timeout, which can indeed be 0. If we apply that, we get the following results:

/10.0 => /8.949334569534225
1.0 => /1.0091078687281538
*10.0 => *9.016406673440223

which is in line with the straightforward fair lock.

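In BpmDj, we used tryLock() instead of tryLock(0), assuming that a timeout of 0 would make the two behave the same. The measurements above show that this assumption was wrong: the untimed tryLock() barges past the fair queue, while the timed variant honours it.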

Sleep debt, Oxygen deprivation and Resmed

I apologize at the start of this post. I never wanted to sound like someone who wants to document his snoring. Anyway, I do so because I feel I have some important things to share. I will try to stick to some facts that might help you without going too much into my personal situation.

For half a year or so I have been trying to get my snoring under control. To that end I got a Resmed Airsense 10 from doctors/insurers together with a nasal mask.

Nostrils

The first big obstacle was learning to sleep with my mouth closed. Not much to do but to actually do it. This took some weeks.

Then getting up to speed was problematic because my nostrils would be more closed than open during the night. That led to painful lungs and a not so optimal ‘therapy’. Two things were necessary to resolve this:

a- I got rid of the air filter in the machine. The filter that was installed would actually pollute the air coming in (it hadn’t been changed in at least 6 months and the provider didn’t feel in a hurry to change it).

b- I started using the humidifier at position 4. Every morning I would take it out of the machine and leave it open during the day. That would allow the water to breathe. Once in a while I would replace it completely.

With those two tricks I got my nostrils somewhat under control.

Lack of oxygen

Although the headaches during the day vanished immediately after starting to use the machine, I now got symptoms of someone lacking oxygen. I felt really tired in the afternoon. Talking to my doctor did not help very much. He suggested that I could not be lacking oxygen because there was a positive pressure at the inlet.

It took me a couple of months to realize that he was wrong. To understand that, just imagine the mask with no breathing holes. If you exhale, you will fill up the long tube, the humidifier and the rest of the machine with used air. The next breath you take will be first the old air, then the new. Now imagine only one or two holes. The machine will be able to generate the necessary pressure, yet still no real air exchange will take place.

To analyze this further I set up a simulation in which the patient would inhale all the air he just exhaled. Here is what happens to the oxygen level then:

[Image: reuse-same-air]

The above plot shows the initial oxygen concentration at about 21%. Each breath removes 1/4th of the oxygen, leaving you after 4 breaths with only about 1/3rd of the necessary oxygen ! That is quite staggering.
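
The arithmetic behind that plot can be checked with a few lines of Python. This is a minimal sketch, not the original simulation: the function name rebreathe is made up here, and the only inputs are the two numbers from the text (21% oxygen in fresh air, a quarter of the oxygen removed per breath).

```python
NORMAL_O2 = 0.21        # oxygen fraction of fresh air, from the text
KEPT_PER_BREATH = 0.75  # each breath removes a quarter of the oxygen


def rebreathe(breaths):
    """Oxygen fraction before each breath when the same air is inhaled again."""
    levels = [NORMAL_O2]
    for _ in range(breaths):
        levels.append(levels[-1] * KEPT_PER_BREATH)
    return levels


for breath, level in enumerate(rebreathe(4)):
    print(breath, f"{level:.3f}", f"{level / NORMAL_O2:.0%} of normal")
```

After four breaths the level is 0.75 to the fourth power of the starting value, which is a little under a third.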

Of course, the resmed machines do not have closed holes. The positive pressure replaces some of the old air with new air. The question is now: how much ? This can be expressed as the percentage of the air that is ‘swapped’ per breath cycle. For each percentage, we can calculate the amount of oxygen (compared to normal air) that would be available to you during the night.

[Images: poison-yourself2, poison-yourself]

From the above plot we can see that if the machine is able to swap out 80% of the air (during one breath cycle), you will have 93% of the oxygen of normal air. That is 7% less than you need.
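
One way to model the swap percentage is sketched below in Python. The model is an assumption and not necessarily the one behind the plot: after each exhalation a fraction swap of the used air is replaced by fresh air, and the remainder is inhaled again. The function name relative_oxygen and the number of simulated breaths are also made up for this sketch.

```python
KEPT_PER_BREATH = 0.75  # each breath removes a quarter of the oxygen


def relative_oxygen(swap, breaths=200):
    """Oxygen in the inhaled air relative to fresh air (1.0 = normal).

    Assumed model: after exhaling, a fraction `swap` of the used air is
    replaced by fresh air and the rest is inhaled again.
    """
    level = 1.0
    for _ in range(breaths):
        exhaled = level * KEPT_PER_BREATH
        level = swap * 1.0 + (1.0 - swap) * exhaled
    return level


for percent in (20, 40, 60, 80, 100):
    print(percent, f"{relative_oxygen(percent / 100):.1%}")
```

Under this assumed model the level settles at swap / (1 - 0.75 * (1 - swap)). For an 80% swap that is about 94%, in the same range as, but not identical to, the 93% read from the plot, so the original simulation probably differs in some detail.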

Clearly we had found the culprit. I must have had an air exchange percentage that was sufficiently low, leading to a low oxygen availability. The question now was: what to do about it ?

Solution #1: turn off the ‘autoset’ feature of the resmed machine or increase its minimum pressure substantially. Initially, my machine was set in autoset mode, which means that it tries to determine the best setting for your situation automatically. It navigates between the minimum and maximum pressure boundaries to minimize the number of blockages. Of course, that minimum might lower the average pressure that sends air out of the mask. Thus, although it might minimize the apnoeas, it might no longer vent properly. An easy fix is to use a continuous positive pressure, or to increase the minimum pressure. That there is some truth to this can be read from online reports of people who went from a normal CPAP machine (resmed 8) to an automatic machine (resmed 10) and complained that they felt worse with the new machine.

How does the mathematics look ? Generally, the air speed through a hole can be modeled as the square root of the pressure difference divided by the air density (ignoring friction and so on). We thus have sqrt(0.016666 P) describing the air speed through the holes. Thus if we raise the minimum pressure by a factor x, we will push out around sqrt(x) times as much air.
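
The square-root relation can be illustrated with a short Python sketch. It uses the frictionless Bernoulli form v = sqrt(2 * dP / rho) with an assumed air density of 1.2 kg/m^3, rather than the constant 0.016666 from the text, whose units are not stated. The function name vent_speed and the example pressures of 4 and 8 cmH2O are assumptions chosen for illustration.

```python
import math

AIR_DENSITY = 1.2   # kg/m^3 near sea level (assumed)
CM_H2O = 98.0665    # pascal per cmH2O


def vent_speed(pressure_pa):
    """Ideal air speed (m/s) through a vent hole for a given overpressure.

    Bernoulli without friction: v = sqrt(2 * dP / rho).
    """
    return math.sqrt(2.0 * pressure_pa / AIR_DENSITY)


low = vent_speed(4 * CM_H2O)    # hypothetical minimum pressure
high = vent_speed(8 * CM_H2O)   # the same minimum, doubled
print(f"{low:.1f} m/s -> {high:.1f} m/s, ratio {high / low:.3f}")
```

Whatever constant sits inside the square root, it cancels in the ratio: doubling the pressure raises the vent speed by sqrt(2), so roughly 41% more air leaves through the same holes.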

Solution #2: turn off Exhalation Pressure Release (the EPR setting). Online you often read that people sleep easier when the machine allows you to exhale easier (it does that by lowering the pressure when you exhale), yet in doing so, it drastically reduces the amount of air swapped out.

Solution #3: use a larger mask. Try to close the main hole of the mask and exhale through the little holes. Measure how long it takes to exhale and compare that against a normal exhalation. If you cannot generate enough pressure to get all the air out within one breath cycle, then your mask is too small. This problem can be solved by using a larger mask.

And then it happened. I ditched Firefox

The latest installment of Firefox forces Linux users to use Pulseaudio. Pulseaudio is an excuse for a sound pipeline. It does not work. It is as simple as that.

BpmDj using pulseaudio skips through the audio stream because the Java audio driver apparently cannot keep track of the pulseaudio stream. You would say: yes, it is probably a Java problem. Not so: many applications, also non-Java apps, have this problem.

https://aliver.wordpress.com/2016/02/17/why-i-dislike-pulseaudio/ has some more ranting on the state of pulseaudio. A must read.

Aside from that, pulseaudio with firefox is a complete CPU hog. Doing nothing in firefox: 47% CPU usage for the Webcontent process. Throw pulseaudio off your system: a mere 9% CPU usage (which I already find too much but you can see how pulseaudio affects my fan life).

Then, there is the problem that the Mozilla developers ignore the pleas of their users with regard to this ‘feature/bug’. https://bugzilla.mozilla.org/show_bug.cgi?id=1247056 is a bloody painful read. None of the developers make any sense. It seems like they have all been lobotomized. Their arguments boil down to: ‘yeah, it was easier to program. Fuck you. And BTW, fuck you again. Please, send us your telemetry if you need more help.’ They won’t be getting much telemetry from me anymore. I ditched Firefox and switched to the iridium-browser. And my goodness, that thing works fast and well.

Cache coherence and Java

You all know my good old friend Dough Lea ? A moron who instead of putting any sensible locking strategy into Java has been putting the most nonsensical thing in it ? No ? Well, let’s dig into his latest bullshit. Somewhere along the lines he decided that cache coherence is really something you want the programmer to care about.

http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html gives a demonstration of something most people would expect to work, but that does not. Basically, if you have a synchronized write to a variable, the (non-synchronized) reads from other threads can retrieve a stale version from their cache. That means that if you have to synchronize to write a variable, you likely have to synchronize all reads as well. If you don’t want to do that, or if you cannot do that, then you should use the volatile keyword.

That means two things:

  1. synchronizing treemaps/hashmaps through a facade basically means also locking on read accesses. Huge performance penalty.
  2. if you have an executor with an array of threads that work their way through an ordered set of thunks to be executed, then each of these thunks has to have all its variables marked as volatile. Otherwise no cache coherence can be guaranteed and a thunk that would work on thread 1 might just not work when executed on thread 2.

It really hurts me to have to think through bullshit like this. You expect your cache to be coherent. That’s it. If it is not then why the hell am I using a virtual machine ? I could as well start programming kernels.

Android storage permissions

I’m an overall unhappy person. What else would you expect if you have to navigate that excuse for an OS, which is called Android ?
This time I’m talking Android Storage Permissions.
With each new release they redefine the rules completely. So whatever BpmDj wrote in the past on your device in a _persistent_ storage will no longer be persistent in newer versions. Even the idea that, when your app requests file write permissions, it should actually have them, seems far-fetched. No, before you create a directory you have to ask the OS for permission to do so. To do that you have to notify the user (depending on the Android version) and request the permission at runtime (why do they have a manifest then anyway). Of course you are not allowed to ask for that permission in the application’s main routine. No no no no no. You can only ask for that permission when the first user interface element is shown. Until then you can gloriously fuck off with that nice idea of creating the necessary directories and database for your app.
 
Even worse, the user can withdraw that permission at any moment during the run of your application. That means you have to check EVERY BLOODY INSULTING file operation.
 
Now, you think ‘checking permissions’ is an easy thing to do. Also there you are completely wrong. It can only be done in an asynchronous manner. Yes. Fuck that nice function you had.
The correct answer at http://stackoverflow.com/questions/32225506/android-6m-permission-issue-create-directory-not-working gives an idea on how much retardation is exactly involved.

Submissions to SHA2017 are out

Although SHA 2017 is too expensive for what it is (OHM 2013 was not particularly well organized), I decided that if they pay my entrance ticket I will still participate. I proposed two/three events.

First a talk on Zathras titled: “Time Stretching BpmDj – 8 Secrets the Audio Industry does not want you to know. Nr 5 will shock you.”

Time stretching of audio tracks can be easily done by either interpolating missing samples (slowing down the track), or by throwing away samples (speeding up the track). A drawback is that this results in a pitch change. In order to overcome this issue, we created a time stretcher that would not alter the pitch when the playback speed changed.

In this talk we discuss how we created a fast, high quality time stretcher, which is now an integral part of BpmDj. We explain how a sinusoidal model is extracted from the input track, its envelope modeled and then used to synthesize a new audio track. The synthesis timestretches the envelope of all participating sines, yet retains the original pitch. The resulting time stretcher uses only a frame overlap of 4, which reduces the amount of memory access and computation compared to other techniques.

Demos of the time stretcher can be heard at http://werner.yellowcouch.org/log/zathras/
The paper that accompanies this talk is at http://werner.yellowcouch.org/Papers/zathras15/

We assume the listener will have a notion about Fourier analysis. We do however approach the topic equally from an educational as well as from a research perspective.

Then I proposed to play 2 DJ-sets with BpmDj. In itself interesting because I did not play anything the past 10 years. So I had to sell myself somehow.

Dr. Dj. Van Belle, a psyparty DJ who has his roots in the BSG Party Hall (Brussels/Belgium, 1998). After playing popular tunes for them students, he decided to throw in some psytrance… And absolutely no new style was born. He started his trend to be as inconspicuous as possible. In October 2006 he surfaced at the northern Norwegian Insomnia Festival. As an experiment, he played all songs at 85% of their normal speed. Every time he saw a camera, he inconspicuously hid behind the mixing desk. Since then he has done absolutely nothing. His career is as much a standstill as psytrance was between 2000 and 2016. And this makes him the perfect DJ. Bring in some of them good old beats. Some nostalgia for y’all. An academic approach to the real challenge on how to entertain them phone junkies.

Nowadays he plays anything he can get his hands on, mainly to test the DJ software he made. Some of his mixes can be found at https://www.mixcloud.com/5dbb/

I’m curious what they will accept (if anything).