Jaano India

1467 Registered
Allowed team size: 1 - 2

Winners are announced.

starts on:
Jul 15, 2018, 01:31 PM
ends on:
Aug 15, 2018, 01:29 PM



There is a lot of public data available on the internet, but none of it is centralized. Even if it were centralized, it would be of little use to most citizens of India, because few people are familiar with data manipulation and extraction. The objective of this problem is to empower the citizens of India to extract knowledge and insights without diving deep into enormous datasets.

We would like the participants to build a smart and intuitive interface that behaves as an optimized search tool for the database provided. You can think of it as Google for a database.

All candidates must first pass a preliminary screening round, which is a technology test. Qualified candidates will be invited by email to participate in the hackathon. If you qualify, a dataset collected from open-source websites on Indian data will be sent to you by email. This dataset contains district-level information on various sectors of India, such as health, education, and telecommunication.

The interface must implement the following basic features:

  1. It should return a column as a search result when asked for one.
  2. It should select a column and apply a filter to it, returning the subset of data or the column that addresses the request.
  3. It should return the values of certain standard functions, such as the mean or standard deviation, when asked for them.
  4. If a direct answer to a request is not found in the database, it should try to derive new columns from existing columns to address the request.
  5. If no direct or derived answer is found, it should display the most relevant data frame in which the answer might be found.
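As a rough illustration of features 1 to 3, the sketch below works on a plain Python list of dictionaries. The rows, the `literacy_rate` column, and the helper names are made up for illustration; the real dataset's schema is only shared with qualified participants.

```python
import statistics

# Toy rows standing in for the district-level dataset (all values are made up).
ROWS = [
    {"state": "State X", "district": "District A", "literacy_rate": 70.0},
    {"state": "State X", "district": "District B", "literacy_rate": 80.0},
    {"state": "State Y", "district": "District C", "literacy_rate": 90.0},
]


def get_column(rows, column):
    """Feature 1: return one column as a list of (district, value) pairs."""
    return [(r["district"], r[column]) for r in rows]


def filter_rows(rows, column, predicate):
    """Feature 2: keep only the rows whose value in `column` satisfies `predicate`."""
    return [r for r in rows if predicate(r[column])]


def summarize(rows, column):
    """Feature 3: mean and sample standard deviation of a column."""
    values = [r[column] for r in rows]
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # needs at least two values
    }


if __name__ == "__main__":
    print(get_column(ROWS, "literacy_rate"))
    print(filter_rows(ROWS, "literacy_rate", lambda v: v > 75))
    print(summarize(ROWS, "literacy_rate"))
```

Keeping each feature as a small function makes it easier to reuse the same logic from either a web form or an NLP front end.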

These features can be implemented in numerous ways, so you have the option to build the interface either by taking inputs through a web form (Version 1) or by implementing NLP (Version 2).

Either Version 1 or Version 2 will be considered a complete submission for the prize money. 30% extra marks will be added for successfully attempting Version 2.

Summary of steps for completing the problem:

  • Step 1: Register yourself on the platform and upload your resume.
  • Step 2: Complete the preliminary screening round for the challenge.
  • Step 3: Successful candidates will be invited via email. The required datasets will be shared in the invitation mail.
  • Step 4: Submit your solution for Version 1 or Version 2 of the problem. Extra points for Version 2.

Here is the link to the preliminary screening round:




A.1 In Version 1, the user asks questions through a web form made up of several drop-down menus:
  • Metric drop-down
  • State drop-down → the selection queries or aggregates data over the state
  • District drop-down → the selection queries or aggregates data over the district
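One possible way to resolve a form submission into an answer is sketched below. The function name `query`, the toy rows, and the choice of summing district values to get a state-level figure are all assumptions; summing suits counts such as `TOT_P`, while rates and percentages would need a weighted average instead.

```python
# Toy rows standing in for the district-level dataset (all values are made up).
ROWS = [
    {"state": "State X", "district": "District A", "TOT_P": 1000},
    {"state": "State X", "district": "District B", "TOT_P": 3000},
    {"state": "State Y", "district": "District C", "TOT_P": 500},
]


def query(rows, metric, state=None, district=None):
    """Answer one form submission.

    - district selected: return that district's value
    - only state selected: return the sum over the state's districts
    - nothing selected: return the sum over all rows
    """
    if district is not None:
        for r in rows:
            if r["district"] == district and (state is None or r["state"] == state):
                return r[metric]
        raise KeyError(f"unknown district: {district}")

    selected = [r for r in rows if state is None or r["state"] == state]
    if not selected:
        raise KeyError(f"unknown state: {state}")
    return sum(r[metric] for r in selected)


if __name__ == "__main__":
    print(query(ROWS, "TOT_P", state="State X"))
    print(query(ROWS, "TOT_P", district="District C"))
```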

A.2 The user can also create their own metric (ARTIFICIAL METRIC) by combining two metrics with basic multiplication, addition, or division. For example:
Density metric = Population metric / Area metric
The user can then select the ARTIFICIAL METRIC in a drop-down and select the geography as above. For example:

“Average number of persons in a household” must return a list of all the districts and the corresponding number of persons per household. (A new column is derived by dividing the total number of persons, TOT_P, by the number of households, No_HH.)
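A derived column of this kind could be computed along the following lines. `TOT_P` and `No_HH` are the column names mentioned above; the helper `derive_metric`, the new column name, and the sample values are hypothetical.

```python
import operator

# The three operations named for artificial metrics.
OPS = {"+": operator.add, "*": operator.mul, "/": operator.truediv}


def derive_metric(rows, name, left, op, right):
    """Return copies of `rows` with a new column `name` = left <op> right."""
    fn = OPS[op]
    out = []
    for r in rows:
        new_row = dict(r)  # copy so the input rows are left untouched
        try:
            new_row[name] = fn(r[left], r[right])
        except ZeroDivisionError:
            new_row[name] = None  # undefined where the denominator is zero
        out.append(new_row)
    return out


if __name__ == "__main__":
    # Made-up values, for illustration only.
    rows = [
        {"district": "District A", "TOT_P": 1000, "No_HH": 250},
        {"district": "District B", "TOT_P": 10, "No_HH": 0},
    ]
    for r in derive_metric(rows, "persons_per_household", "TOT_P", "/", "No_HH"):
        print(r["district"], r["persons_per_household"])
```

Returning `None` for a zero denominator is one design choice; an interface might instead hide such districts or flag them to the user.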

A.3 Correlations between any TWO metrics (calculated on demand from the data, based on the metrics compared):
“Relationship between % of girls enrolling in school and the % of female teachers” must return a correlation coefficient for India and, when it is clicked, the correlation within each state.
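The challenge does not say which correlation coefficient is expected; the sketch below assumes the Pearson coefficient, computed once over all districts and once per state. The column names `girls_enrol_pct` and `female_teacher_pct` and the sample values are hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # one of the metrics is constant
    return sxy / math.sqrt(sxx * syy)


def correlation_national(rows, a, b):
    """Correlation of metrics `a` and `b` across all districts."""
    return pearson([r[a] for r in rows], [r[b] for r in rows])


def correlation_by_state(rows, a, b):
    """Correlation of metrics `a` and `b` within each state (the drill-down view)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["state"]].append(r)
    return {s: correlation_national(g, a, b) for s, g in groups.items()}


if __name__ == "__main__":
    # Made-up values, for illustration only.
    rows = [
        {"state": "State X", "girls_enrol_pct": 40.0, "female_teacher_pct": 30.0},
        {"state": "State X", "girls_enrol_pct": 50.0, "female_teacher_pct": 45.0},
        {"state": "State Y", "girls_enrol_pct": 60.0, "female_teacher_pct": 50.0},
    ]
    print(correlation_national(rows, "girls_enrol_pct", "female_teacher_pct"))
    print(correlation_by_state(rows, "girls_enrol_pct", "female_teacher_pct"))
```

States with fewer than two districts, or with a constant metric, have no defined coefficient, so the sketch returns `None` for them rather than a number.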

VERSION 2: DATA CHATBOT For the “experimentalists”

Let’s make this more interesting for the experimentalists among you. Let’s do the same as above, but this time with NLP. We recommend that participants include NLP packages in the interface for better performance; relevant information on NLP can be found on Rasa.ai. There are numerous other methods and libraries that participants may use, and maintaining or building your own lookup table or dictionary should also suffice.
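As a minimal sketch of the lookup-table route (not of Rasa or any other NLP package), a question can be matched against known phrases and place names. The phrases, the column names they map to, and the function `parse_question` are assumptions made for illustration.

```python
# Hypothetical phrase-to-column table; the real column names come from the dataset.
LOOKUP = {
    "net enrollment rate": "net_enrollment_rate",
    "literacy rate": "literacy_rate",
    "working population": "working_pop_pct",
}


def parse_question(question, lookup, districts):
    """Map a free-text question to a metric column and, if present, a district."""
    text = " ".join(question.lower().split())  # normalize case and whitespace

    # Try longer phrases first so the most specific match wins.
    metric = None
    for phrase in sorted(lookup, key=len, reverse=True):
        if phrase in text:
            metric = lookup[phrase]
            break

    district = next((d for d in districts if d.lower() in text), None)
    return {"metric": metric, "district": district}


if __name__ == "__main__":
    print(parse_question(
        "Percentage of working population in Anantpur", LOOKUP, ["ANANTPUR"]
    ))
```

When `metric` comes back as `None`, the interface can fall back to showing the most relevant data frame, as the basic features require.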
A few sample questions and their answers or approaches are listed below:

● “Net enrollment rate of education” should return a list of states and the value for each state. When a state is clicked, it should show the districts of that state and their respective net enrollment rates.
ANDHRA PRADESH : 78.48% . . .

DAMAN AND DIU : 80.05%
DIU : 78.8%
DAMAN : 81.3%

And so on...

● “%houses in rural areas of Anantpur with mobile phones” must return:
ANANTPUR : 0.97%

● “Districts where literacy rate of girls is more than that of boys” must return: JAINTIA HILLS

● “Percentage of working population in Anantpur” must return:

ANANTPUR : 49.89%
Rural: 54.37%
Urban: 38.42%

While implementation is important, what we are particularly looking for is a sharp focus on overall approach and thinking, along with problem-solving skills.
Looking forward to some interactive and engaging interfaces.


Main Prizes
Winners (3)

Up to 3 people will be selected, based on their code, to collaborate with Swaniti-Ank Aha on a part-time or full-time basis to deliver the real version of the Jaano India chatbot to the country. Each will also be awarded ₹ 50,000 in prize money.
