Category
Computer Science / Machine Learning / Signal Processing / Audio
Scope
2 students completing 30 credits (20 weeks) each.
Background
One approach to machine learning with audio data is to transform the audio into a spectrogram, which shows frequency content over time, and then treat it as an image. Common image analysis techniques can then be applied, for example using a Convolutional Neural Network (CNN) to classify the content of the image.
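As a rough illustration of the spectrogram step, the sketch below computes a magnitude spectrogram with a naive short-time DFT using only the Python standard library. The function name, frame length, hop size and test tone are assumptions chosen for the example, not part of the proposal; a real pipeline would most likely use an FFT library instead.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each holding magnitudes for bins 0..frame_len//2."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, O(N^2) per frame; fine for a sketch, too slow for real use.
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = spectrogram(tone)

# Strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

The result is a 2D grid (time frames by frequency bins), which is what allows it to be handed to an image classifier such as a CNN.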
This thesis will instead focus on sounds that are too long to fit into a reasonably sized spectrogram, and investigate other techniques suitable for audio.
Recurrent Neural Networks (RNN) keep state from previous predictions in memory, and use that to help determine the current prediction, exhibiting temporal dynamic behavior.
Rather than accepting a fixed-size input and producing a fixed-size output in a fixed number of computational steps, an RNN operates over sequences of data.
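The idea of carrying state across a sequence can be sketched in a few lines. The example below is a minimal Elman-style recurrent update in plain Python; the weights are hand-picked, hypothetical values rather than trained ones, and the function names are assumptions made for the illustration.

```python
import math

def rnn_step(x, h, w_in, w_rec, bias):
    """One update: h_new[i] = tanh(w_in[i]*x + sum_j w_rec[i][j]*h[j] + bias[i])."""
    size = len(h)
    return [
        math.tanh(w_in[i] * x
                  + sum(w_rec[i][j] * h[j] for j in range(size))
                  + bias[i])
        for i in range(size)
    ]

def run_rnn(samples, w_in, w_rec, bias):
    """Feed a sequence of any length through the cell; return the final state."""
    h = [0.0] * len(bias)  # initial state: no history yet
    for x in samples:
        h = rnn_step(x, h, w_in, w_rec, bias)
    return h

# Hypothetical weights for a 2-unit hidden state (not trained).
W_IN = [0.5, -0.3]
W_REC = [[0.8, 0.1], [-0.2, 0.6]]
BIAS = [0.0, 0.0]

# The same cell handles sequences of different lengths.
short_state = run_rnn([0.1, 0.4, -0.2], W_IN, W_REC, BIAS)
long_state = run_rnn([0.1, 0.4, -0.2, 0.3, 0.0, -0.5, 0.2], W_IN, W_REC, BIAS)

# The final state depends on earlier inputs, not only on the last one.
after_pulse = run_rnn([1.0, 0.0], W_IN, W_REC, BIAS)
after_silence = run_rnn([0.0, 0.0], W_IN, W_REC, BIAS)
```

Both calls end with the input 0.0, yet `after_pulse` differs from `after_silence` because the earlier sample is still reflected in the hidden state.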
Long Short-Term Memory (LSTM) networks are a particular type of RNN with a slightly more complicated structure but more expressive power.
This type of network architecture could prove very useful on audio signals.
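To make the extra structure of an LSTM concrete, here is a minimal sketch of a single-unit LSTM cell using the standard gate equations (input, forget and output gates plus a candidate value). Reducing the cell to one scalar unit is a simplification for readability, and the parameter values are hypothetical, hand-picked numbers rather than trained weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, params):
    """One step of a single-unit LSTM. params maps gate name -> (w_x, w_h, b)."""
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h + b

    i = sigmoid(pre("input"))         # how much new information to write
    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value to write
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# Hypothetical parameters; the large forget bias keeps the forget gate near 1.
PARAMS = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.0, 0.0, 5.0),
    "output": (0.3, 0.2, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

# A single pulse followed by 50 silent samples.
h, c = 0.0, 0.0
for x in [1.0] + [0.0] * 50:
    h, c = lstm_step(x, h, c, PARAMS)
```

With the forget gate close to 1, the cell state `c` still holds most of what the initial pulse wrote after 50 silent steps, which is the property that makes LSTMs attractive for long audio sequences.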
Goal
Investigate the training and validation of recurrent neural networks for audio analytics and classification, using a time domain signal as input.
Compare different network architectures.
Vary input pre-processing and measure the effect.
Build a prototype running in real time on an Axis camera.
Who are you?
For this thesis proposal we target students with a strong interest in machine learning and signal processing. Most likely you are studying a Master's program in Computer Science or Mathematics.
OK, I am interested! What do I do now?
You are valuable to us – how nice that you are interested in one of our proposals! There are a few things for you to keep in mind when applying.
Who to contact with questions about the position
Magnus Söderdahl, Engineering Manager
+46 46 272 1800, [email protected]