Reddit is a social news aggregation website that receives more than 1 million unique users every month. Your task is to predict the score of a Reddit post.
You are given data on the all-time popular posts from the top 2500 subreddits, made between 2006 and 2013. The top subreddits were determined by subscriber count.
The data set includes the score given to every post made during the specified period. The magnitude of a score indicates how popular or unpopular a post was. The data set also comprises three files:
The master file contains the subscriber count for subreddits. You can use the information from the master file for model training.
The following variables are available in the data:
The following variables are available in the master file:
id, score
The evaluation metric is scaled Mean Absolute Error (MAE). The error is capped at 100, i.e., you will be assigned a score of zero if your MAE is 100 or more. The higher the score, the better the model.
The score is calculated as follows:
(100 - MAE), if MAE < 100
0, if MAE >= 100
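The scoring rule above can be sketched in a few lines of Python for checking predictions locally. This is only an illustrative sketch, not the official evaluation script: the function names mean_absolute_error and scaled_score and the cap parameter are assumptions made for this example, and MAE is taken to be the plain average of the absolute differences between actual and predicted scores.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all posts."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one prediction is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def scaled_score(actual, predicted, cap=100.0):
    """Return cap - MAE when MAE < cap, otherwise 0 (hypothetical helper)."""
    mae = mean_absolute_error(actual, predicted)
    return cap - mae if mae < cap else 0.0


if __name__ == "__main__":
    # Made-up scores for three posts, purely for illustration.
    actual = [10, 20, 30]
    predicted = [12, 18, 33]
    print(scaled_score(actual, predicted))
```

With these made-up values the absolute errors are 2, 2, and 3, so the MAE is 7/3 and the scaled score is 100 minus that. A model whose MAE reaches 100 or more would score 0 under this rule.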