Social Media analytics

Medium

Details

Problem

Problem statement

reddit is a social news aggregation website that receives more than 1 million unique users every month. You are required to predict the score of a reddit post.

Data

You are given data on the all-time most popular posts from the top 2,500 subreddits, made between 2006 and 2013. The top subreddits were determined by subscriber count.

Every post made during this period has a score, and the magnitude of the score indicates how popular or unpopular the post was. The data set consists of three files:

Train
Test
Master

The master file contains the subscriber count for subreddits. You can use the information from the master file for model training.

Variables in the data

The following variables are available in the data:

score: Variable that you need to predict
domain: Website (domain) that the link in the post comes from
id: Unique ID assigned to every post
title: Title of the post
author: Author of the post/user who posted
ups: Number of upvotes on the post
downs: Number of downvotes on the post
num_comments: Number of comments on the post
permalink: Unique post link generated by reddit
over_18: Indicates whether the post contains obscene content
subreddit_id: Unique subreddit ID
Is_self: Indicates whether the post is a discussion or not
name: Unique name given to a post
url: Link shared in the post
created_date: Date/time stamp

Variables in the master file

The following variables are available in the master file:

uri: subreddit URL
subscribers: Number of subscribers
name: Name of subreddit
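
One way to bring the master file's subscriber count into training is to look it up per post. The sketch below is only an illustration under stated assumptions: it assumes that `permalink` follows the usual `/r/<subreddit>/comments/...` pattern and that the master file's `name` matches that subreddit name case-insensitively. Neither is confirmed by the problem statement, and the helper names and sample values are hypothetical.

```python
import re

# Assumption: permalinks look like ".../r/<subreddit>/comments/<id>/<slug>/".
_SUBREDDIT_RE = re.compile(r"/r/([^/]+)/")


def subreddit_from_permalink(permalink):
    """Return the lower-cased subreddit name found in a permalink, or None."""
    match = _SUBREDDIT_RE.search(permalink)
    return match.group(1).lower() if match else None


def subscriber_feature(permalink, subscribers_by_name, default=0):
    """Look up the subscriber count for the subreddit a post belongs to.

    subscribers_by_name maps a lower-cased subreddit name (from the master
    file) to its subscriber count. Unknown subreddits get `default`.
    """
    name = subreddit_from_permalink(permalink)
    return subscribers_by_name.get(name, default)


# Hypothetical master data and permalink, for illustration only.
master = {"pics": 1000, "funny": 2000}
link = "http://www.reddit.com/r/Pics/comments/abc123/some_title/"
print(subscriber_feature(link, master))
```

If the real files share a cleaner key between the post data and the master file, joining on that key directly would be preferable to parsing the permalink.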

Submission

id, score
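
A submission with these two columns could be written as a CSV file along the following lines. This is a minimal sketch: the header row, the comma delimiter, and the sample IDs are assumptions, since the statement only names the columns `id` and `score`.

```python
import csv
import io


def write_submission(predictions, out):
    """Write (id, score) pairs as CSV rows to a file-like object.

    Assumption: the file starts with an "id,score" header row.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "score"])
    for post_id, score in predictions:
        writer.writerow([post_id, score])


# Hypothetical post IDs and predicted scores, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([("abc123", 12.5), ("def456", 40)], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the buffer.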

Evaluation

The evaluation metric is a scaled Mean Absolute Error (MAE). The error is capped at 100, i.e. you will be assigned a score of zero if your MAE is 100 or more. The higher the score, the better the model.

The score is calculated as follows:

100 - MAE, if MAE < 100
0, if MAE >= 100
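
The scoring rule above can be reproduced locally to check a model before submitting. The sketch below is not the judge's actual implementation. It assumes MAE is the plain mean of absolute differences over all posts, and the function names and sample values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over paired values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total / len(actual)


def scaled_score(actual, predicted):
    """Return 100 - MAE when MAE < 100, otherwise 0."""
    mae = mean_absolute_error(actual, predicted)
    return max(0.0, 100.0 - mae)


# Hypothetical true scores and predictions.
print(scaled_score([10, 20, 30], [12, 18, 33]))
```

Because the score bottoms out at zero, any model with an MAE of 100 or more is indistinguishable from any other such model on the leaderboard.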

Time Limit: 5

Memory Limit: 256

Source Limit: