HackerEarth wants to improve its customer experience by suggesting tags for any idea submitted by a participant for a given hackathon. Currently, tags can only be added manually by a participant. HackerEarth wants to automate this process with the help of machine learning. To help the machine learning community grow and enhance its skills by working on real-world problems, HackerEarth challenges all machine learning developers to build a model that can predict or generate tags relevant to the idea/article submitted by a participant.
You are provided with approximately 1 million technology-related articles mapped to relevant tags. You need to build a model that can generate relevant tags from the given set of articles.
The dataset consists of 'train.csv', 'test.csv', and 'sample_submission.csv'. A description of the columns in the dataset is given below:
| Variable | Description |
| --- | --- |
| id | Unique id for each article |
| title | Title of the article |
| article | Description of the article (raw format) |
| tags | Tags associated with the respective article. If multiple tags are associated with an article, they are separated by '\|'. |
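The '|'-separated tags column is easiest to work with once split into lists. Below is a minimal sketch using pandas; the inline CSV is a tiny stand-in for the real `train.csv` (same columns, made-up article text), so the snippet runs on its own.

```python
import io

import pandas as pd

# A tiny stand-in for train.csv (the real file has the same columns).
csv_text = """id,title,article,tags
HE-efbc27d,Templating in Java,Using FreeMarker templates in a servlet,java|freemarker
HE-d3ab268,Dojo form widgets,Working with select inputs in Dojo forms,forms|select|dojo
"""

train = pd.read_csv(io.StringIO(csv_text))

# Multiple tags per article are '|'-separated; split them into Python lists.
train["tag_list"] = train["tags"].str.split("|")

print(train.loc[0, "tag_list"])  # ['java', 'freemarker']
```

In the real setting you would replace the `io.StringIO` buffer with `pd.read_csv("train.csv")`.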
The submission file submitted by the candidate for evaluation has to be in the given format. The submission file is in .csv format. Check sample_submission.csv for details. Remember, in case of multiple tags for a given article, they are separated by '|'.
id,tags
HE-efbc27d,java|freemarker
HE-d1fd267,phpunit|pear|osx-mountain-lion
HE-ffd4152,javascript|jquery|ajax|onclick
HE-d3ab268,forms|select|dojo
HE-ed2fa45,php|mysql|login|locking|ip-address
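A submission in this format can be assembled with pandas once you have predictions per article. The `predictions` dictionary below is purely illustrative (the ids are taken from the sample rows above, the tags are hypothetical model output):

```python
import pandas as pd

# Hypothetical model output: article id -> list of predicted tags.
predictions = {
    "HE-efbc27d": ["java", "freemarker"],
    "HE-ffd4152": ["javascript", "jquery", "ajax", "onclick"],
}

# Join multiple tags with '|' as the submission format requires.
submission = pd.DataFrame(
    {
        "id": list(predictions.keys()),
        "tags": ["|".join(tags) for tags in predictions.values()],
    }
)
submission.to_csv("submission.csv", index=False)
```

`index=False` keeps the file to the two required columns, `id` and `tags`.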
For challenge related queries, discussions and announcements join our Slack channel.
The predicted tags will be evaluated using the F1 score metric. For each article, the F1 score is calculated as

$$\text{F1 score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
where,
$$\text{Precision}(u) = \frac{|\text{Recommended}(u) \cap \text{Testing}(u)|}{|\text{Recommended}(u)|}$$

$$\text{Recall}(u) = \frac{|\text{Recommended}(u) \cap \text{Testing}(u)|}{|\text{Testing}(u)|}$$
The final score is calculated as:
$$\text{Leaderboard score} = \frac{1}{n} \sum_{i=1}^{n} (\text{F1 score})_i$$
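The metric above can be sketched directly from the definitions: per-article precision and recall over tag sets, their harmonic mean as the F1 score, and the leaderboard score as the mean over all articles. A minimal sketch (function and variable names are my own, not part of the official scorer):

```python
def f1_per_article(recommended, actual):
    """F1 between the set of predicted tags and the ground-truth tags."""
    recommended, actual = set(recommended), set(actual)
    overlap = len(recommended & actual)
    if overlap == 0:
        return 0.0  # precision and recall are both 0, so F1 is 0
    precision = overlap / len(recommended)
    recall = overlap / len(actual)
    return 2 * precision * recall / (precision + recall)


def leaderboard_score(pred_lists, true_lists):
    """Mean per-article F1 over all n articles."""
    scores = [f1_per_article(p, t) for p, t in zip(pred_lists, true_lists)]
    return sum(scores) / len(scores)


# Example with two articles:
preds = [["java", "freemarker"], ["php", "mysql"]]
truth = [["java", "freemarker"], ["php", "mysql", "login"]]
print(leaderboard_score(preds, truth))  # 0.9 (article F1s: 1.0 and 0.8)
```

The second article illustrates the trade-off: both predicted tags are correct (precision 1.0) but one true tag is missed (recall 2/3), giving F1 = 0.8.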
---
Update: 16th October 2018: Corrections to the dataset have been made, please re-download the dataset.
Update: 21st October 2018: The leaderboard metric has been updated to F1 score.
Update: 10th December 2018 : The final leaderboard has been updated. You can check your final standings.