Predict the Segment - Hotstar

Medium-Hard


Problem Statement

Determining the demographics of customers is one of the key tasks in the advertising domain. Advertisers usually want to target customers based on demographic attributes. However, it is difficult to collect demographic data from every customer, since asking for it adds friction to the user experience.

At Hotstar, we have detailed information on all the content that customers watch. Let's call this "watch patterns". We'd like to use this signal to fine-tune demographic targeting.

We are seeking a machine-learning solution that learns from customers whose watch patterns are already known. In this competition, the task is to build predictive models that best capture this behaviour. Participants are free to use any open-source external data.


Data Information

A zipped file containing the train, test and sample submission files is provided. The training dataset covers 200,000 customers and the test dataset covers 100,000 customers. Both the training and test data are JSON dictionaries in which each key is a masked user ID and each value is an aggregation of all records for that user, as described below.
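As a minimal sketch of reading this layout, the snippet below flattens a `{user_id: record}` JSON dictionary into a list of rows. The function name `load_users` and the inline sample string are illustrative assumptions; the real files would be read from disk, and their exact field values may differ.

```python
import json


def load_users(json_text):
    """Turn a {user_id: record} JSON dictionary into a list of flat rows."""
    raw = json.loads(json_text)
    rows = []
    for user_id, record in raw.items():
        row = {"ID": user_id}   # keep the masked user ID as its own column
        row.update(record)      # copy titles, genres, cities, tod, dow, ...
        rows.append(row)
    return rows


# Hypothetical one-user sample in the described shape.
sample = '{"train-1": {"titles": "JOLLY LLB:23,Ishqbaaz:40", "segment": "pos"}}'
rows = load_users(sample)
```

For the actual data, the text passed in would come from something like `open(path).read()`, where `path` points at the unzipped train or test file.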

Variable	Description
ID	unique identifier of the user
titles	titles of the shows watched by the user, with the watch time on each title, in the format “title:watch_time” separated by commas, e.g. “JOLLY LLB:23, Ishqbaaz:40”. watch_time is in seconds
genres	same format as titles
cities	same format as titles
tod	total watch time of the user spread across the hours of the day (24-hour format), in the format “time_of_day:watch_time” separated by commas, e.g. “1:454, 17:5444”
dow	total watch time of the user spread across the days of the week (7-day format), in the format “day_of_week:watch_time” separated by commas, e.g. “1:454, 6:5444”
segment	target variable; treat the values as interest segments. For modeling, encode pos = 1, neg = 0
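The field formats above can be parsed with a small helper. This is a sketch under assumptions: the names `parse_watch_field` and `encode_segment` are hypothetical, watch times are taken to be integers as in the examples, and keys are assumed not to contain commas. Splitting on the last colon keeps titles that themselves contain a colon intact.

```python
def parse_watch_field(field):
    """Parse 'key:watch_time' pairs separated by commas into a dict."""
    totals = {}
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the LAST colon so a title such as "A: B:10" still works.
        key, sep, seconds = item.rpartition(":")
        if not sep:
            continue  # no colon at all: skip the malformed entry
        key = key.strip().strip('"“”')  # drop stray quotes around the key
        totals[key] = totals.get(key, 0) + int(seconds)
    return totals


def encode_segment(label):
    """Encode the target as described: pos = 1, neg = 0."""
    return 1 if label == "pos" else 0
```

For example, `parse_watch_field("JOLLY LLB:23, Ishqbaaz:40")` gives a dictionary mapping each title to its watch time in seconds, and the same helper applies to genres, cities, tod and dow.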

Submission

Submit a CSV file containing each test user ID and the predicted probability for the segment. The sample submission file has the following format:

ID,segment
test-23855,0.2
test-23854,0.1
test-23857,0.6
test-23856,0.3
test-23851,0.7
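A file in this shape can be produced with the standard `csv` module. The sketch below writes to an in-memory buffer so it runs as-is; `write_submission` is a hypothetical helper name, and the probabilities are placeholders rather than real model output.

```python
import csv
import io


def write_submission(predictions, out):
    """Write (ID, probability) pairs as a CSV with the header 'ID,segment'."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "segment"])
    for user_id, probability in predictions:
        writer.writerow([user_id, probability])


# Placeholder predictions; a real run would use model probabilities.
buffer = io.StringIO()
write_submission([("test-23855", 0.2), ("test-23854", 0.1)], buffer)
```

To write to disk instead, pass a file opened with `open("submission.csv", "w", newline="")` in place of the buffer (the file name is an assumption).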

Evaluation Metric

Submissions will be evaluated on the AUC ROC score (area under the receiver operating characteristic curve). For more information about this metric, read here.

Note: This is Round 1 of India Hacks Machine Learning 2017. This round has two problems; make sure you solve both of them. Go to Problems (see above).
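To make the metric concrete, here is a small pure-Python sketch of AUC ROC computed through the rank-sum (Mann-Whitney) formulation, with tied scores given their average rank. It is an illustration for local validation, not the organisers' scoring code; the name `roc_auc` is an assumption.

```python
def roc_auc(labels, scores):
    """AUC ROC: probability that a random positive outranks a random negative."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the group of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    # Subtract the minimum possible rank sum, then normalise by the pair count.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

Because only the ordering of the scores matters, any monotonic rescaling of the predicted probabilities leaves the score unchanged.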

Scripts

Correct way to read files (in R) - Click Here
Simple way to read files (in Python) - Click Here
Simple Features + Random Forest Starter (0.79) - Click Here
Exploring Hotstar Data in R - Click Here
Python Random Forest Starter - Click Here

Update

Amazon AWS is offering credits worth $100 to all participants in this challenge. Please ensure you make at least one submission. Claim Here.
