Delivering a personalized experience to the users on our platform is one of the key metrics we follow at Hotstar. Recommendation engines play a vital role in this by surfacing the few items that a user is most likely to watch. In this challenge, we are looking for solutions that can improve the quality of movie recommendations.
The problem translates to predicting a ranked list of movies that a user is likely to watch next, given the set of movies they have already watched.
We are looking for a machine learning or deep learning based solution. Participants are free to use any open-source external data.
You are given four files to download: train.csv, test.csv, movies_meta.csv and sample_submission.json. You have to build a recommendation system for 6285 users given in the test file.
train.csv (field separator = '|' )
Variable | Description |
---|---|
user_id | unique id of viewer |
title | movie title |
watch_time | movie duration watched by viewer (seconds) |
movies_meta.csv (field separator = '|' )
Variable | Description |
---|---|
title | movie title |
Genre | movie genre |
Language | language of movie available (multiple languages possible) |
Duration | total duration of the movie |
test.csv (field separator = ',')
Variable | Description |
---|---|
user_id | for these users you have to recommend movies |
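The separators listed above matter when reading the files: train.csv and movies_meta.csv use '|', while test.csv uses ','. The following is a minimal sketch of parsing them with Python's standard `csv` module. It assumes each file starts with a header row matching the variable names in the tables; the in-memory sample rows and the `read_rows` helper are made up for illustration and are not part of the challenge data.

```python
import csv
import io

def read_rows(handle, delimiter):
    """Parse a delimited file that has a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter=delimiter))

# Tiny in-memory stand-ins for the real files (made-up values).
# With the real data, pass open("train.csv", newline="") and so on instead.
train_text = "user_id|title|watch_time\nu1|m1|1200\nu1|m2|300\nu2|m1|4500\n"
test_text = "user_id\nu1\nu2\n"

train_rows = read_rows(io.StringIO(train_text), delimiter="|")
test_rows = read_rows(io.StringIO(test_text), delimiter=",")

# Collect the set of titles each user has already watched.
watched = {}
for row in train_rows:
    watched.setdefault(row["user_id"], set()).add(row["title"])

# Users for whom recommendations are required.
test_users = [row["user_id"] for row in test_rows]
```

Note that `csv.DictReader` returns every field as a string, so `watch_time` would need an explicit conversion (for example `int(row["watch_time"])`) before any arithmetic.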
Each participant has to submit a .json file in which every key is a user_id and every value is a list of the top 20 recommended movies that the user has not watched before. Check the sample submission file for the format.
{"u1": ["m1","m2","m3",...], "u2": ["m5","m3",...]}
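One way to produce a file in this shape is sketched below. It assumes you already have, for each test user, a list of candidate titles ordered from best to worst; the helper name `build_submission` and the sample titles are hypothetical. The sketch drops titles the user has already watched, removes duplicates, and keeps at most 20 per user.

```python
import json

def build_submission(ranked_by_user, watched_by_user, k=20):
    """Keep the first k unseen, non-duplicate titles for every user."""
    submission = {}
    for user, ranked in ranked_by_user.items():
        seen = watched_by_user.get(user, set())
        picks = []
        for title in ranked:
            if title not in seen and title not in picks:
                picks.append(title)
            if len(picks) == k:
                break
        submission[user] = picks
    return submission

# Made-up example: u1 has already watched m1, so it is skipped.
ranked = {"u1": ["m1", "m3", "m2", "m4"], "u2": ["m5", "m3"]}
watched = {"u1": {"m1"}}
submission = build_submission(ranked, watched, k=2)

# Serialize to a JSON string; json.dump(submission, file) writes it to disk.
payload = json.dumps(submission)
```

If a user has fewer than 20 unseen candidates, this sketch returns a shorter list; whether shorter lists are accepted is not stated here, so padding with a fallback (such as popular titles) may be safer.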
Submissions will be evaluated based on NDCG@20 (Normalized Discounted Cumulative Gain) averaged across all users in the test set.
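For local validation, the metric can be approximated as follows. This is a sketch under an assumption: it uses binary relevance (1 if a recommended movie is among the movies the user actually watched next, 0 otherwise), and the exact relevance definition used by the official scorer is not specified here. The function names are illustrative.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance values."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(recommended, relevant, k=20):
    """NDCG@k with binary relevance (an assumption, see the text above)."""
    gains = [1.0 if title in relevant else 0.0 for title in recommended[:k]]
    # The ideal ranking places every relevant title at the top.
    ideal = [1.0] * min(len(relevant), k)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(gains, k) / ideal_dcg

def mean_ndcg(submission, truth, k=20):
    """Average NDCG@k over all users in truth; missing users score 0."""
    scores = [ndcg_at_k(submission.get(user, []), relevant, k)
              for user, relevant in truth.items()]
    return sum(scores) / len(scores) if scores else 0.0

# Made-up example: u1 ranks the relevant title first, u2 ranks it second.
example_score = mean_ndcg({"u1": ["a"], "u2": ["b", "a"]},
                          {"u1": {"a"}, "u2": {"a"}})
```

Holding out each user's most recent movies from train.csv as the `truth` set is one way to use this for offline comparison of models before submitting.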