The given dataset contains tweets directed at the Twitter handles of various airlines.
It has 12 columns in total, one of which specifies the sentiment of the tweet. The other columns describe the tweet: its text, where it was posted from, when it was posted, its retweet count, and so on.
Your task is to build a machine learning / deep learning model that predicts the sentiment of each tweet using all or some of the other given columns.
The submission should be a CSV file listing each tweet_id along with the predicted sentiment of that tweet.
Please check the sample_submissions.csv file and make sure that your submission file is in exactly the same format.
You have to predict the column named “airline_sentiment”.
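As an illustration of the submission step, here is a minimal sketch that writes tweet ids and predicted sentiments to CSV using only the Python standard library. The header names `tweet_id` and `airline_sentiment`, the helper name `write_submission`, and the example ids and labels are assumptions made for illustration; the sample submission file remains the authority on the exact format.

```python
import csv
import io


def write_submission(out, tweet_ids, predictions):
    """Write one row per tweet: its id and the predicted sentiment."""
    if len(tweet_ids) != len(predictions):
        raise ValueError("need exactly one prediction per tweet_id")
    writer = csv.writer(out, lineterminator="\n")
    # Assumed header; copy the real one from the sample submission file.
    writer.writerow(["tweet_id", "airline_sentiment"])
    writer.writerows(zip(tweet_ids, predictions))


# Usage with made-up ids and labels, written to an in-memory buffer.
buffer = io.StringIO()
write_submission(buffer, ["1001", "1002"], ["negative", "positive"])
submission_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `newline=""`, which is what the `csv` module expects for output files.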
A description of the columns of the dataset is given below -
Sr No | Column name | Description |
1. | tweet_id | ID of the tweet |
2. | airline_sentiment | Sentiment of the tweet |
3. | airline_sentiment_confidence | Confidence with which the given sentiment was determined |
4. | negativereason_confidence | Confidence with which the negative reason of the tweet was predicted |
5. | name | Name of the person who tweeted |
6. | retweet_count | Number of retweets |
7. | text | Text of the tweet whose sentiment has to be predicted |
8. | tweet_created | Time at which the tweet was created |
9. | tweet_location | Location from which the tweet was posted |
10. | user_timezone | Time zone from which the tweet was posted |
11. | negativereason | Reason for which the user posted a negative tweet |
12. | airline | Airline for which the tweet was posted |
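The column list above can be turned into a quick sanity check on a CSV header row. The sketch below assumes that the header names in the files are spelled exactly as in the table; the helper name `check_header` is hypothetical.

```python
EXPECTED_COLUMNS = [
    "tweet_id", "airline_sentiment", "airline_sentiment_confidence",
    "negativereason_confidence", "name", "retweet_count", "text",
    "tweet_created", "tweet_location", "user_timezone",
    "negativereason", "airline",
]
TARGET = "airline_sentiment"


def check_header(header, has_target=True):
    """Return (missing, unexpected) column names for a CSV header row."""
    expected = set(EXPECTED_COLUMNS)
    if not has_target:
        # A file without labels should not carry the target column.
        expected.discard(TARGET)
    found = set(header)
    return sorted(expected - found), sorted(found - expected)


# Usage: a header that lacks the target, checked as an unlabeled file.
unlabeled_header = [c for c in EXPECTED_COLUMNS if c != TARGET]
missing, unexpected = check_header(unlabeled_header, has_target=False)
```

Both returned lists are empty when the header matches. A non-empty `missing` list usually points to a renamed or misspelled column.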
Files Provided in Dataset -
The following files are provided in the dataset -
Train.csv - This file contains all the above-mentioned columns. You are expected to train your models on this file.
Test.csv - This file contains all the above-mentioned columns except the “airline_sentiment” column. You have to predict this column for each record in this file.
Sample_submission.csv - This file contains a sample submission. Your submission should be in exactly the same format.
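To make the train-then-predict workflow concrete, here is a minimal baseline sketch: a multinomial Naive Bayes classifier with add-one smoothing that uses only the `text` column. It is one possible starting point, not a prescribed method. The tokenizer, the function names, and the four toy rows with their labels are assumptions made up for illustration; real rows would come from Train.csv.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep simple word-like tokens; a deliberately crude choice.
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(rows):
    """rows: iterable of dicts with 'text' and 'airline_sentiment' keys."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for row in rows:
        label = row["airline_sentiment"]
        tokens = tokenize(row["text"])
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    tokens = [t for t in tokenize(text) if t in vocabulary]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy rows standing in for Train.csv (made-up text and labels).
toy_rows = [
    {"text": "flight delayed again, terrible service", "airline_sentiment": "negative"},
    {"text": "lost my bag, awful", "airline_sentiment": "negative"},
    {"text": "great crew, thank you", "airline_sentiment": "positive"},
    {"text": "thank you for the smooth flight", "airline_sentiment": "positive"},
]
model = train_naive_bayes(toy_rows)
prediction = predict(model, "terrible delayed flight")
```

With real data, the rows could be read with `csv.DictReader` from Train.csv, since each row it yields is a dict keyed by column name. The sketch sticks to the standard library so that it runs anywhere; a dedicated machine learning library would be a natural replacement once the pipeline works end to end.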