
Uncomfortable Truth

After reading my thesis, Meinhard Müller commented that my algorithms are cluttered with too much detail. He recommended throwing away everything that does not improve performance significantly, so that the most important ideas become more accessible to the reader. Of course he was right, but back then in 2016 my main concern was to FINISH that damn PhD. After 13 years, I did not have an ounce of energy left to make any major changes to the thesis.

Still, I took Meinhard's advice to heart, and it lingered in my brain for quite a while. After the thesis defence I was not able to touch my code for two and a half years. Then there was ISMIR in Paris, and I really missed the crowd, so I decided to go. The conference was a blast. I got a WIMIR mentor. I was fueled with motivation. Afterwards I had to search my computer for that old code. I actually wanted to tackle the clutter problem, but first I had to tidy up my codebase, move it to a git repository, and find a version that actually runs … decently.

The next step would be to verify whether the nitpicky additions to the code were actually worth it. That meant using my precious spare time to turn off algorithmic acrobatics, re-adjust all parameters that might be affected by the change, and note down the results in a list. This is not work that makes you feel like a giant of science; on the contrary, it is extremely boring.
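The boring loop itself is simple enough to sketch. Here is a minimal toy version of such an ablation harness; the tweak names, the `evaluate` function, and the single re-tuned threshold are all hypothetical stand-ins, not my actual code:

```python
# Hypothetical ablation harness: disable one tweak at a time,
# re-tune the one parameter it interacts with, and log the result.

def evaluate(tweaks_enabled, threshold):
    # Stand-in for running the real algorithm on a test set;
    # a toy score so the sketch is runnable.
    score = 0.80
    if "tweak_a" in tweaks_enabled:
        score += 0.01
    score -= abs(threshold - 0.5) * 0.1
    return score

ALL_TWEAKS = {"tweak_a", "tweak_b", "tweak_c"}
results = {}
for tweak in sorted(ALL_TWEAKS):
    enabled = ALL_TWEAKS - {tweak}
    # Re-adjust the threshold BEFORE judging the ablation, otherwise
    # a useful tweak can look useless (or vice versa).
    results[tweak] = max(
        (evaluate(enabled, t / 10), t / 10) for t in range(1, 10)
    )

for tweak, (score, threshold) in sorted(results.items()):
    print(f"without {tweak}: {score:.3f} (best threshold {threshold})")
```

The point of the inner `max` is exactly the re-adjustment step described above: every ablation gets its own best threshold before its score is written down.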

It also does not help that the algorithm you are working on is really not state-of-the-art any more. Old-school signal processing has been left behind; almost all state-of-the-art algorithms in melody extraction and multi-pitch estimation are deep, deep neural networks nowadays. One just has to accept that those artificial learning machines are incredibly good at the systematic optimization of their networks. And they can do it using HUGE datasets, tuning an incredible number of parameters/weights, while the idea of the network itself remains quite simple.

Although I would prefer a simple solution over a complex one, I never really fell in love with the process of machine learning. The results make me jealous, but for me there is too little fun and too little insight into why some things work better than others. Sure, my work process also includes much trial and error, but before every try there has been some glorious idea. This is extremely exciting. I love all that hope attached to the idea and the sweet dreams of the breakthrough that will surely come with the implementation … (which then sucks in 99 out of 100 cases).

First I had to settle which version of my algorithm I wanted to use for my next endeavour. Second, I needed to rethink my work process. To be honest, my algorithm has not improved significantly over the last 10 years, so something needed to change. The problem is: the more complex an algorithm becomes, the more difficult it is to make additions or changes. Most modules and parameters have been tuned to work best with the current version, so even changes that could mean an improvement will first come with a decrease in accuracy.

I decided that I would spend more time on testing and recombining distinct implementations of whole processing steps instead of just fine-tuning parameters at the end. For starters, I chose two modules (peaks and pitches) and compared versions from the years 2012, 2014 and 2016. After spending a huge amount of time streamlining the interfaces, the interesting result was that there was a significant improvement between the 2012 and 2014 versions and a worsening of the accuracy in the final implementation. (I was on heavy medication for my mental illness when I coded that last stuff, so I am almost relieved that I can turn back to the 2014 version, because the state of that last code is bad.) Well, at least the final peaks module was total crap; the pitches module did not change much.
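Streamlining the interfaces so that module versions become swappable might look roughly like this. Everything here is an illustrative assumption, not the actual thesis code: the list-based spectrum, the local-maximum peak test, and the extra magnitude floor in the "2014" version are invented for the sketch:

```python
# Hypothetical registry of peak-picking module versions behind one
# common interface: each function maps a spectrum (a list of
# magnitudes) to a list of peak indices.

def peaks_2012(spectrum):
    # Plain local maxima.
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] > spectrum[i - 1] and spectrum[i] > spectrum[i + 1]]

def peaks_2014(spectrum):
    # Same local-maximum test plus a crude magnitude floor.
    floor = 0.25 * max(spectrum)
    return [i for i in peaks_2012(spectrum) if spectrum[i] >= floor]

PEAKS = {"2012": peaks_2012, "2014": peaks_2014}

spectrum = [0.0, 0.2, 0.05, 0.9, 0.1, 0.3, 0.02]
for version in sorted(PEAKS):
    print(version, PEAKS[version](spectrum))
```

Once every version sits behind the same signature, recombining a 2012 peaks module with a 2014 pitches module is a one-line change instead of an archaeology project.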

Then I did as Meinhard suggested and randomly deleted "optimizations" from the pitch algorithm, some of them more than 10 years old. It turned out that most of them were useless if I just adjusted a magnitude threshold afterwards to find the new optimum value. The only tweak that really helped (a bit) was boosting sinusoids with a (deterministically) changing frequency. All the other stuff was just clutter. Yikes, it took me so much time to write down all those formulas.
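The surviving tweak could be sketched like this. It is a toy version only, and everything in it is assumed for illustration: the per-frame track representation, the drift measure, and the linear boost are mine, not the actual formula from the thesis:

```python
# Toy sketch: boost the magnitude of sinusoidal tracks whose frequency
# drifts steadily from frame to frame (vibrato, glides), since such
# tracks are more likely genuine partials than noise.

def frequency_drift(track):
    # Mean absolute frame-to-frame frequency change in Hz.
    freqs = [frame["freq"] for frame in track]
    if len(freqs) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(freqs, freqs[1:])) / (len(freqs) - 1)

def boosted_magnitude(track, boost=0.5, max_drift=5.0):
    # Scale the mean magnitude by up to (1 + boost) for drifting tracks.
    mags = [frame["mag"] for frame in track]
    mean_mag = sum(mags) / len(mags)
    drift = min(frequency_drift(track), max_drift)
    return mean_mag * (1.0 + boost * drift / max_drift)

steady = [{"freq": 440.0, "mag": 0.4}, {"freq": 440.0, "mag": 0.4}]
vibrato = [{"freq": 438.0, "mag": 0.4}, {"freq": 442.0, "mag": 0.4},
           {"freq": 438.0, "mag": 0.4}]

print(boosted_magnitude(steady))   # no drift, magnitude unchanged
print(boosted_magnitude(vibrato))  # drifting, magnitude boosted
```

A single magnitude threshold applied to the boosted values then plays the role of the re-tuned optimum mentioned above.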

Finally, I implemented an easier idea for the pitch estimation which performed just as well as the old one. HAHA.

The process was quite helpful, and there are modules still waiting to be examined, but I am aware that I need some automation in place to make that work less time-consuming. Let's see when I will have the time to tackle this.
