A human’s ability to recognize the emotion in someone’s voice is an analytical process. You associate the tone and loudness of a person’s voice with emotions and respond accordingly. In more scientific terms, the emotion behind a voice has a characteristic frequency and amplitude range. So what comes across as the essence of human interaction is something machines should be able to do as well.
Exotel is inviting all algorithmic programmers, data scientists and machine learning enthusiasts to build a system that will be able to detect the emotions in conversations.
You will be given a training data set (sample audio files). You must use speech recognition algorithms and machine learning to write code that can take this voice as an input and recognize the emotion behind it. The code you submit will be run across thousands of other voice samples. The algorithm with the most accurate results will be adjudged the winner.
This challenge will be open for 18 days. The training data set will be accessible to you from the start of the challenge. You must submit your code along with a brief description of the logic behind your algorithm. You are advised to use audio processing libraries in building your algorithm.
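As a starting point, the idea that a voice carries emotion in its frequency and amplitude can be sketched with two simple features. This is only an illustrative sketch, not the organizers' method: it assumes the audio has already been decoded into a list of float samples (decoding mp3 files is left to an audio library), and it uses the zero-crossing rate as a crude stand-in for pitch. The function names are hypothetical.

```python
import math


def rms_amplitude(samples):
    """Root-mean-square amplitude, a rough proxy for loudness."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency in Hz from the number of sign changes.

    A pure tone crosses zero twice per cycle, so the estimate is
    crossings / (2 * duration). Real speech needs a sturdier pitch tracker.
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = (len(samples) - 1) / sample_rate
    return crossings / (2.0 * duration)


# Synthetic one-second 200 Hz tone standing in for decoded audio (assumption).
sample_rate = 8000
tone = [
    0.5 * math.sin(2 * math.pi * 200 * n / sample_rate + 0.1)
    for n in range(sample_rate)
]
loudness = rms_amplitude(tone)
pitch = zero_crossing_frequency(tone, sample_rate)
```

Features like these, computed over short windows of each recording, could then be fed to whichever classifier you train on the sample audio files.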
GitHub Repository for Submission
The GitHub repository is available here - https://github.com/HackerEarth-Challenges/hack-the-talk-exotel.
You can download the zip file here - https://github.com/HackerEarth-Challenges/hack-the-talk-exotel/archive/master.zip.
Here are some useful resources -
http://cmusphinx.sourceforge.net/
https://wiki.python.org/moin/PythonInMusic
https://wiki.python.org/moin/Audio/
There are exciting prizes to be won -
- 1st Prize - Rs. 3 lakhs
- 2nd Prize - Rs. 1 lakh
- Top 10 teams get gift vouchers worth Rs. 5000/- each
This is a great opportunity for you to learn machine learning and work on a cool problem.
Rules for Submission
The submission must be in tar.gz format. It should have a run.sh in the top directory that takes care of compiling, creating the executable and running your code. Your code must read all the input files from the current directory. The input file will contain a list of audio files like:
one.mp3
two.mp3
three.mp3
four.mp3
... and so on. The files one.mp3, two.mp3, etc. will be in the current directory.
Your program must output the emotion detected for each file (one.mp3, two.mp3, etc.) as:
happy
unhappy
angry
neutral
The above output means one.mp3 was a "happy" conversation, two.mp3 was an "unhappy" conversation, and so on.
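The input and output handling described above might be sketched as follows. Several things here are assumptions rather than part of the rules: that the listing has one filename per line, that the labels are written one per line in the same order, and that the four labels shown are the complete set. The `classify` function is a hypothetical placeholder for your trained model, and how the listing is passed to your program (file name, argument or standard input) is not specified here, so the sketch reads from any file-like object.

```python
import io

# Labels taken from the example output; assumed to be the full set.
LABELS = ("happy", "unhappy", "angry", "neutral")


def classify(path):
    """Hypothetical placeholder: a real entry would load the audio at
    `path`, extract features and run a trained model."""
    return "neutral"


def label_files(listing, classifier=classify):
    """Read one filename per line and return the labels in the same order."""
    labels = []
    for line in listing:
        name = line.strip()
        if not name:
            continue  # skip blank lines in the listing
        label = classifier(name)
        if label not in LABELS:
            raise ValueError("unexpected label %r for %s" % (label, name))
        labels.append(label)
    return labels


# Demo with an in-memory listing instead of a real input file.
demo_listing = io.StringIO("one.mp3\ntwo.mp3\n\nthree.mp3\n")
demo_labels = label_files(demo_listing)
print("\n".join(demo_labels))
```

In a real submission, run.sh would invoke a script like this after any compilation step, with the listing opened from the current directory.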