Sentiment analysis


Easy


Problem

The given dataset contains tweets directed at the Twitter handles of various airlines.

It contains 12 columns, one of which specifies the sentiment of the tweet. The other columns provide information about the tweet, such as its text, where and when it was posted, and its retweet count.

Your task is to build a machine learning or deep learning model that predicts the sentiment of a tweet using all or some of the other columns.
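As a starting point, a text-only baseline can be built from the text column alone. The sketch below is a multinomial Naive Bayes classifier with add-one smoothing, written with the Python standard library only. The model choice, the tokenizer and the toy tweets and labels are illustrative assumptions; the problem does not prescribe any particular model.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed tokenizer: lowercase, keep runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per label
        self.word_counts = defaultdict(Counter)      # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)             # log prior of the label
            denom = self.total_words[label] + v      # Laplace-smoothed denominator
            for tok in tokenize(text):
                if tok in self.vocab:                # skip words never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


# Toy, hypothetical training data (not taken from Train.csv).
train_texts = ["@united thanks, great flight", "@united lost my bag, terrible",
               "@united flight delayed again", "@united love the crew"]
train_labels = ["positive", "negative", "negative", "positive"]
model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict(["great crew", "terrible delay"]))  # -> ['positive', 'negative']
```

On the real data, `fit` would receive the text and airline_sentiment columns from Train.csv. Words that never occur in training are ignored at prediction time, so unseen vocabulary does not shift the scores.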

The submission should be a CSV file listing each tweet_id along with the predicted sentiment of that tweet.

Please check the sample_submissions.csv file and make sure that your submission file is in exactly the same format.
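A minimal sketch of writing the submission with Python's csv module. The header names tweet_id and airline_sentiment are assumptions taken from the column table; check them against the sample submission file before submitting. The function name `write_submission` is hypothetical.

```python
import csv
import io


def write_submission(tweet_ids, predictions, out_file):
    """Write a header row, then one (tweet_id, predicted sentiment) row per tweet.

    The header names are an assumption; copy the exact ones from the
    sample submission file.
    """
    if len(tweet_ids) != len(predictions):
        raise ValueError("tweet_ids and predictions must have the same length")
    writer = csv.writer(out_file)
    writer.writerow(["tweet_id", "airline_sentiment"])
    writer.writerows(zip(tweet_ids, predictions))


# Demonstration with an in-memory buffer and made-up IDs and labels.
buf = io.StringIO()
write_submission(["101", "102"], ["negative", "neutral"], buf)
print(buf.getvalue())
```

When writing to disk, open the file with `open("submission.csv", "w", newline="")` so that the csv module controls line endings; the output filename is hypothetical. Passing tweet_id through as the string read from Test.csv means it is written back exactly as given.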

You have to predict the column named “airline_sentiment”.

The columns of the dataset are described below:

Sr No	Column name	Description
1.	tweet_id	ID of the tweet
2.	airline_sentiment	Sentiment of the tweet
3.	airline_sentiment_confidence	Confidence with which the given sentiment was determined
4.	negativereason_confidence	Confidence with which the negative reason for the tweet was determined
5.	name	Name of the person who tweeted
6.	retweet_count	Number of retweets
7.	text	Text of the tweet whose sentiment has to be predicted
8.	tweet_created	Time at which the tweet was created
9.	tweet_location	Location from where the tweet was posted
10.	user_timezone	Time zone of the user who posted the tweet
11.	negativereason	Reason why the user posted a negative tweet
12.	airline	Airline for which the tweet was posted
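Since the task allows using columns other than text, one hedged way to feed them to a bag-of-words model is to append the categorical columns as extra pseudo-tokens such as `airline=united`. The three columns chosen here and the `<missing>` marker are assumptions; which columns actually help should be checked on Train.csv. For example, if negativereason turns out to be filled in only for negative tweets, it would be a very strong feature.

```python
import re

TOKEN_RE = re.compile(r"[a-z0-9']+")
# Assumed choice of categorical columns to append; verify usefulness on Train.csv.
CATEGORICAL_COLUMNS = ("airline", "negativereason", "user_timezone")


def row_to_tokens(row):
    """Turn one CSV row (a dict keyed by column name) into a list of tokens."""
    tokens = TOKEN_RE.findall((row.get("text") or "").lower())
    for col in CATEGORICAL_COLUMNS:
        value = (row.get(col) or "").strip()
        # An empty value can itself be informative, so it gets its own token.
        tokens.append(f"{col}={value.lower() or '<missing>'}")
    return tokens


# Hypothetical row for illustration.
row = {"text": "@united Late again!", "airline": "United",
       "negativereason": "Late Flight", "user_timezone": ""}
print(row_to_tokens(row))
# -> ['united', 'late', 'again', 'airline=united', 'negativereason=late flight', 'user_timezone=<missing>']
```

The resulting token lists can be counted exactly like words from the tweet text, so any bag-of-words classifier can use them without changes to its model code.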

Files provided in the dataset

The dataset consists of the following files:

Train.csv - This file contains all of the columns described above. You are expected to train your models on this file.
Test.csv - This file contains all of the columns described above except "airline_sentiment". You have to predict this column for each record in this file.
Sample_submission.csv - This file shows a sample submission. Your submission should be in exactly the same format.
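Because Train.csv and Test.csv share the same columns apart from airline_sentiment, a single reader can handle both. This sketch uses csv.DictReader and keeps only tweet_id, text and, for the training file, the label; the function name `load_split` and the toy CSV content are hypothetical.

```python
import csv
import io

LABEL_COLUMN = "airline_sentiment"


def load_split(csv_file, has_label=True):
    """Read a Train.csv / Test.csv style file and return (ids, texts, labels).

    Only tweet_id, text and (for training data) airline_sentiment are kept;
    all other columns are ignored in this sketch.
    """
    reader = csv.DictReader(csv_file)
    ids, texts, labels = [], [], []
    for row in reader:
        ids.append(row["tweet_id"])       # kept as a string, written back unchanged
        texts.append(row["text"] or "")  # tweet text, the main signal
        if has_label:
            labels.append(row[LABEL_COLUMN])
    return ids, texts, (labels if has_label else None)


# Toy CSV with a subset of the real columns; the quoted field contains a comma.
sample = ('tweet_id,airline_sentiment,text,airline\n'
          '1,negative,"@united bag lost, again",United\n'
          '2,positive,@united thanks!,United\n')
ids, texts, labels = load_split(io.StringIO(sample))
print(ids, texts, labels)
```

For the real files, something like `with open("Train.csv", newline="", encoding="utf-8") as f: ids, texts, labels = load_split(f)` would be used, and `load_split(f, has_label=False)` for Test.csv; the UTF-8 encoding is an assumption about the files. csv.DictReader handles quoted fields, so tweets containing commas or line breaks are read correctly.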


Time Limit: 5

Memory Limit: 256
