Social Media analytics

Difficulty: Medium

Problem statement

Reddit is a social news aggregation website that receives more than 1 million unique users every month. Your task is to predict the score of a Reddit post.

Data

You are given data on the all-time most popular posts from the top 2500 subreddits between 2006 and 2013. The top subreddits were determined by subscriber count.

Every post made during the specified period has a score. The magnitude of the score indicates how popular or unpopular the post was. The data set consists of three files:

  • Train
  • Test
  • Master

The master file contains the subscriber count for subreddits. You can use the information from the master file for model training.

Variables in the data

The following variables are available in the data:

  • score: Variable that you need to predict
  • domain: Domain of the website that the post links to
  • id: Unique ID assigned to every post
  • title: Title of the post
  • author: Author of the post/user who posted
  • ups: Number of upvotes on the post
  • downs: Number of downvotes on the post
  • num_comments: Number of comments on the post
  • permalink: Unique post link generated by Reddit
  • over_18: Indicates whether the post contains obscene content
  • subreddit_id: Unique subreddit ID
  • Is_self: Indicates whether the post is a discussion or not
  • name: Unique name given to a post
  • url: Link shared in the post
  • created_date: Date/time stamp
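
As a rough illustration of how the variables above could be turned into model inputs, here is a minimal sketch that derives a few simple features from one post record. It assumes each record is available as a dict keyed by the variable names listed above and that numeric fields may arrive as strings. The function name `basic_features` and the feature names are hypothetical and not part of the data set.

```python
from urllib.parse import urlparse


def basic_features(row):
    """Derive a few simple features from one post record.

    `row` is assumed to be a dict keyed by the variable names in the list
    above; this layout is an assumption, not something the data set states.
    """
    title = row.get("title", "") or ""
    url = row.get("url", "") or ""
    return {
        # Length of the title in characters and in whitespace-separated words.
        "title_length": len(title),
        "title_words": len(title.split()),
        # Host part of the shared link, lower-cased for consistent grouping.
        "url_host": urlparse(url).netloc.lower(),
        # Numeric fields may be read as strings, so convert defensively.
        "num_comments": int(row.get("num_comments", 0) or 0),
    }


# Hypothetical record used only to show the call.
example = {
    "title": "Hello world post",
    "url": "https://Example.com/a/b",
    "num_comments": "12",
}
features = basic_features(example)
```

Which features actually help depends on the data, so treat this only as a starting point for exploration.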

Variables in the master file

The following variables are available in the master file:

  • uri: subreddit URL
  • subscribers: Number of subscribers
  • name: Name of subreddit

Download data

Submission

id, score
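
A submission with these two columns could be produced as in the sketch below. It assumes a comma-separated file with a header row, which the problem statement does not spell out. The function name `write_submission` and the example ids are hypothetical.

```python
import csv
import io


def write_submission(rows, handle):
    """Write (id, score) pairs to `handle` as CSV with an `id, score` header.

    The header row and the comma delimiter are assumptions about the
    expected format.
    """
    writer = csv.writer(handle)
    writer.writerow(["id", "score"])
    for post_id, score in rows:
        writer.writerow([post_id, score])


# Write to an in-memory buffer here; in practice, pass a file opened with
# open(path, "w", newline="") instead.
buffer = io.StringIO()
write_submission([("abc123", 42.0), ("def456", 7.5)], buffer)
submission_text = buffer.getvalue()
```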

Evaluation

The evaluation metric is scaled Mean Absolute Error (MAE), the average absolute difference between the predicted and actual scores. The error is capped at 100, i.e., you will be assigned a score of zero if your MAE is 100 or more. The higher the score, the better the model.

The score is calculated as follows:

(100 - MAE), if MAE < 100
0, if MAE >= 100
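
To check a model locally before submitting, the scoring rule can be sketched as follows. This assumes MAE is the plain mean of absolute differences between actual and predicted scores, and that an MAE of exactly 100 gives zero, following the "100 or more" wording above. The function name `scaled_mae_score` is hypothetical.

```python
def scaled_mae_score(actual, predicted):
    """Return 100 - MAE when MAE < 100, otherwise 0."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    # Mean Absolute Error: the average of |actual - predicted| over all posts.
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return 100.0 - mae if mae < 100 else 0.0


# Absolute errors are 2, 2 and 3, so MAE = 7/3 and the score is 100 - 7/3.
close_score = scaled_mae_score([10, 20, 30], [12, 18, 33])
# A single error of 500 gives MAE = 500, which is capped to a score of 0.
capped_score = scaled_mae_score([0], [500])
```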

Time Limit: 5
Memory Limit: 256