DeVocalizer
Uses a bidirectional LSTM and a convolutional network to try to suppress the vocals in
songs, so that only the instrumental music remains audible. We use pydub and scipy's wavfile module to edit the audio, and the
Short-Time Fourier Transform (STFT) to extract features and convert them into training data.
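
The feature-extraction step could look roughly like the sketch below. It is an illustration only, not the project's actual code: the function name `wav_to_stft_features` and the `n_fft` and `hop` values are assumptions, and it uses only scipy (pydub is left out to keep the example self-contained).

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft


def wav_to_stft_features(path, n_fft=1024, hop=256):
    """Load a WAV file and return (magnitude, phase, sample_rate).

    Hypothetical helper: n_fft and hop are illustrative defaults,
    not values taken from the project.
    """
    rate, samples = wavfile.read(path)
    samples = samples.astype(np.float32)

    # Mix stereo down to mono by averaging the channels.
    if samples.ndim == 2:
        samples = samples.mean(axis=1)

    # Scale to [-1, 1] so features do not depend on the file's bit depth.
    peak = np.max(np.abs(samples))
    if peak > 0:
        samples = samples / peak

    # scipy returns the spectrogram as (frequency bins, time frames).
    _, _, spec = stft(samples, fs=rate, nperseg=n_fft, noverlap=n_fft - hop)

    # Transpose so each row is one time step, the layout a sequence
    # model such as an LSTM would typically consume.
    magnitude = np.abs(spec).T
    phase = np.angle(spec).T
    return magnitude, phase, rate
```

Each row of `magnitude` is one time frame with `n_fft // 2 + 1` frequency bins. One plausible setup, assumed here rather than stated above, is to train on magnitude frames of the full mix against frames of the instrumental track, and to keep `phase` so the predicted magnitudes can be inverted back to audio.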