Tag Recommendation System

Deep Learning, Natural Language Processing, Hard, Machine Learning
Problem Statement

HackerEarth wants to improve its customer experience by suggesting tags for any idea submitted by a participant in a given hackathon. Currently, tags can only be added manually by a participant. HackerEarth wants to automate this process with the help of machine learning. To help the machine learning community grow and enhance its skills by working on real-world problems, HackerEarth challenges all machine learning developers to build a model that can predict or generate tags relevant to the idea/article submitted by a participant.

You are provided with approximately 1 million technology-related articles mapped to relevant tags. You need to build a model that can generate relevant tags from the given set of articles.
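As a minimal illustration of the task (not a recommended solution), a simple word-tag co-occurrence baseline can be sketched in plain Python. All function and variable names here are hypothetical:

```python
from collections import Counter, defaultdict

def train_tag_baseline(articles, tag_lists):
    """Count how often each word in an article co-occurs with each tag."""
    word_tags = defaultdict(Counter)
    for text, tags in zip(articles, tag_lists):
        for word in set(text.lower().split()):
            for tag in tags:
                word_tags[word][tag] += 1
    return word_tags

def predict_tags(word_tags, text, k=3):
    """Score tags by summed word co-occurrence counts; return the top k."""
    scores = Counter()
    for word in set(text.lower().split()):
        scores.update(word_tags.get(word, Counter()))
    return [tag for tag, _ in scores.most_common(k)]
```

A real entry would replace this with a learned model (e.g. a neural multi-label classifier over the article text), but the input/output shape — article text in, a small set of tags out — stays the same.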

Data Description

The dataset consists of ‘train.csv’, ‘test.csv’ and ‘sample_submission.csv’. The columns in the dataset are described below:

Variable    Description
id          Unique id for each article
title       Title of the article
article     Description of the article (raw format)
tags        Tags associated with the respective article. If multiple tags are associated with an article, they are separated by '|'.
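When loading the data, the '|'-separated tags column can be split into a list. A sketch using only the standard library, assuming the column layout described above:

```python
import csv
import io

def load_articles(csv_text):
    """Parse rows with the train.csv layout: id, title, article, tags.

    The tags field is '|'-separated; an article with no tags yields [].
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["tags"] = row["tags"].split("|") if row["tags"] else []
        rows.append(row)
    return rows
```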


Submission

The submission file submitted for evaluation has to be in the given format. The submission file is in .csv format. Check sample_submission.csv for details. Remember, in case of multiple tags for a given article, they are separated by '|'.

id,tags
HE-efbc27d,java|freemarker
HE-d1fd267,phpunit|pear|osx-mountain-lion
HE-ffd4152,javascript|jquery|ajax|onclick
HE-d3ab268,forms|select|dojo
HE-ed2fa45,php|mysql|login|locking|ip-address
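A submission file in the format above can be produced by joining each article's predicted tags with '|'. A minimal sketch (function and variable names are illustrative):

```python
import csv

def write_submission(path, predictions):
    """Write id,tags rows; predictions maps article id -> list of tags."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "tags"])
        for article_id, tags in predictions.items():
            writer.writerow([article_id, "|".join(tags)])
```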

For challenge-related queries, discussions, and announcements, join our Slack channel.

Evaluation Metric

The predicted tags will be evaluated on the F1 score metric. For each article $u$, the F1 score is calculated as

$$\text{F1 score} = \frac{2 \cdot \text{Recall}(u) \cdot \text{Precision}(u)}{\text{Recall}(u) + \text{Precision}(u)}$$

where

$$\text{Precision}(u) = \frac{|\text{Recommended}(u) \cap \text{Testing}(u)|}{|\text{Recommended}(u)|}$$

$$\text{Recall}(u) = \frac{|\text{Recommended}(u) \cap \text{Testing}(u)|}{|\text{Testing}(u)|}$$

The final score is calculated as:

$$\text{Leaderboard score} = \frac{1}{n} \sum_{i=1}^{n} (\text{F1 score})_i$$
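The metric can be computed directly from the definitions above: per-article F1 from the recommended and ground-truth tag sets, then the mean over all articles. A self-contained sketch:

```python
def article_f1(recommended, testing):
    """Per-article F1 from the recommended and ground-truth tag sets."""
    recommended, testing = set(recommended), set(testing)
    hits = len(recommended & testing)
    if hits == 0:
        return 0.0  # no overlap: precision and recall are both zero
    precision = hits / len(recommended)
    recall = hits / len(testing)
    return 2 * recall * precision / (recall + precision)

def leaderboard_score(all_recommended, all_testing):
    """Mean of the per-article F1 scores over n articles."""
    scores = [article_f1(r, t) for r, t in zip(all_recommended, all_testing)]
    return sum(scores) / len(scores)
```

For example, recommending {java, php} when the true tags are {php, mysql} gives precision 1/2 and recall 1/2, hence F1 = 0.5.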

----------------------------------------------------------------

Announcements

Update: 16th October 2018: Corrections to the dataset have been made; please re-download the dataset.

Update: 21st October 2018: The leaderboard metric has been updated to F1 score.

Update: 10th December 2018: The final leaderboard has been updated. You can check your final standings.

Time Limit: 5
Memory Limit: 256