Technology plays a significant role in people’s lives and has changed the way they interact and communicate. It has also changed the dating world: online dating is a trend that influences many people around the world.
As a data scientist, you are required to predict the match percentage between every pair of users, in a matrix format, based on the attributes that the users provided on a dating website.
Note
Based on the user’s sexual orientation, you are required to perform the following:
The dataset folder contains a data.csv file with the following structure:
Column name | Description |
user_id | Represents unique user IDs |
username | Represents the name of a user |
age | Represents the age of a user |
status | Represents the relationship status of a user (single, available, and so on) |
sex | Represents the gender of a user |
orientation | Represents the sexual orientation of a user (gay, bisexual, or straight) |
drinks | Represents if a user likes to drink or not |
drugs | Represents if a user consumes drugs or not |
height | Represents the height of a user in inches |
job | Represents the profession of a user |
location | Represents where a user resides |
pets | Represents if a user likes pets or not |
smokes | Represents if a user smokes or not |
language | Represents the languages spoken by a user |
new_languages | Represents if a user is interested in learning a new language |
body_profile | Represents the type of body a user has |
education_level | Represents the educational level of a user |
dropped_out | Represents if a user dropped out of school or college |
bio | Represents a user's description |
interests | Represents the interests of a user |
other_interests | Represents other interests of a user |
location_preference | Represents the preferred location to find a date |
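As a starting point, the table above can be turned into a quick sanity check when loading the data. The following is a minimal sketch, assuming pandas is available and that the CSV header uses exactly the column names listed above; the helper name load_users and the one-row stand-in for data.csv are hypothetical and exist only so the example runs on its own.

```python
import io

import pandas as pd

# Column names copied from the table above.
EXPECTED_COLUMNS = [
    "user_id", "username", "age", "status", "sex", "orientation", "drinks",
    "drugs", "height", "job", "location", "pets", "smokes", "language",
    "new_languages", "body_profile", "education_level", "dropped_out", "bio",
    "interests", "other_interests", "location_preference",
]


def load_users(source):
    """Read the user table and fail early if a documented column is missing."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Stand-in for data.csv (assumption): a header plus one made-up row.
header = ",".join(EXPECTED_COLUMNS)
row = ",".join(["x"] * len(EXPECTED_COLUMNS))
users = load_users(io.StringIO(header + "\n" + row + "\n"))
```

With the real dataset, the call would presumably be load_users("data.csv") with the path adjusted to wherever the dataset folder lives.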
The submission file is required to be in a matrix format. For example, if the number of users in the dataset provided is 1000, then the submission.csv file must contain a matrix of size 1000×1000.
For the expected format, refer to the 'sample submission.csv' file provided in the dataset folder.
Note: Ensure that 'user_id' of the users is mentioned correctly in the submission.csv file.
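The matrix requirement can be illustrated with a small sketch. It assumes that the submission labels both rows and columns with user_id; the sample submission file remains the authority on the exact layout. The helper name build_submission and the toy scores are hypothetical.

```python
import numpy as np
import pandas as pd


def build_submission(user_ids, scores):
    """Wrap an N x N score array in a DataFrame labelled by user_id on both axes."""
    user_ids = list(user_ids)
    scores = np.asarray(scores, dtype=float)
    n = len(user_ids)
    if scores.shape != (n, n):
        raise ValueError(f"expected a {n}x{n} matrix, got {scores.shape}")
    index = pd.Index(user_ids, name="user_id")
    return pd.DataFrame(scores, index=index, columns=user_ids)


# Hypothetical toy data: three users and placeholder match percentages.
ids = ["u1", "u2", "u3"]
toy_scores = np.array([
    [100.0, 40.0, 75.0],
    [40.0, 100.0, 10.0],
    [75.0, 10.0, 100.0],
])
submission = build_submission(ids, toy_scores)
# submission.to_csv("submission.csv")  # uncomment to write the file
```

The shape check is there because a matrix that is not N×N for N users cannot line up with the user IDs, which is the kind of mistake the note above warns about.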
The evaluation metric is the root mean squared error (RMSE). The score is calculated as follows:
score = max(0, 100 − root_mean_squared_error(actual, predicted))
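The scoring formula can be reproduced locally with a few lines of NumPy. This is a sketch under the assumption that the error is averaged over every cell of the two matrices; the function name challenge_score and the example values are hypothetical.

```python
import numpy as np


def challenge_score(actual, predicted):
    """Return max(0, 100 - RMSE), with RMSE taken over all matrix cells."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return max(0.0, 100.0 - float(rmse))


# Made-up 2 x 2 example: the errors are 0, 6, -8, 0, so RMSE = sqrt(100 / 4) = 5.
actual = [[100, 50], [50, 100]]
predicted = [[100, 44], [58, 100]]
example_score = challenge_score(actual, predicted)  # 95.0
```

Because of the max(0, ...) clamp, any prediction with an RMSE of 100 or more scores 0 rather than going negative.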