Predict the match percentage

Machine Learning
Problem

In an era where technology plays a significant role in people’s lives, one cannot deny that it changes the way people interact and communicate with others. Today, technology has caused some significant changes in the dating world as well. Online dating is a new trend that is influencing many people around the world.

As a data scientist, you are required to predict the match percentage between every pair of users, in matrix format, based on the attributes that the users provided on a dating website.

Note

Based on the user’s sexual orientation, you are required to perform the following:

  • If a user is heterosexual (prefers the opposite sex), then the match percentage between this user and other users of the same gender must be 0 if the other users have the same behavior.
  • If a user is homosexual (prefers the same sex), then the match percentage between this user and other users of the opposite gender must be 0 if the other users have the same behavior.
  • The match percentage of a user with her/himself must be zero.
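The three rules above can be sketched as a post-processing step on an already computed score matrix. This is a minimal sketch under stated assumptions, not the official checker logic: "same behavior" is read here as "same orientation", the orientation values are the ones listed for the orientation column (gay, bisexual, straight), and the names is_forced_zero and apply_rules are hypothetical.

```python
def is_forced_zero(i, j, users):
    """Return True when the rules force a match of 0 for users[i] vs users[j]."""
    if i == j:
        return True  # a user never matches with her/himself
    a, b = users[i], users[j]
    same_sex = a["sex"] == b["sex"]
    # Assumption: "same behavior" means both users share the same orientation.
    both_straight = a["orientation"] == b["orientation"] == "straight"
    both_gay = a["orientation"] == b["orientation"] == "gay"
    return (both_straight and same_sex) or (both_gay and not same_sex)


def apply_rules(scores, users):
    """Copy the score matrix, replacing every forced pair with 0.0."""
    n = len(users)
    return [
        [0.0 if is_forced_zero(i, j, users) else scores[i][j] for j in range(n)]
        for i in range(n)
    ]
```

Bisexual users are left untouched by this sketch, because none of the rules mention them.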

Dataset description

The dataset folder contains a data.csv file with the following columns:

Column name           Description
user_id               Represents unique user IDs
username              Represents the name of a user
age                   Represents the age of a user
status                Represents the relationship status of a user (single, available, and so on)
sex                   Represents the gender of a user
orientation           Represents the sexual orientation of a user (gay, bisexual, or straight)
drinks                Represents if a user likes to drink or not
drugs                 Represents if a user consumes drugs or not
height                Represents the height of a user in inches
job                   Represents the profession of a user
location              Represents where a user resides
pets                  Represents if a user likes pets or not
smokes                Represents if a user smokes or not
language              Represents the languages spoken by a user
new_languages         Represents if a user is interested in learning a new language
body_profile          Represents the type of body a user has
education_level       Represents the educational level of a user
dropped_out           Represents if a user dropped out of school or college
bio                   Represents a user's description
interests             Represents the interests of a user
other_interests       Represents other interests of a user
location_preference   Represents the preferred location to find a date

Submission file format

The submission file is required to be in a matrix format. For example, if the dataset provided contains 1000 users, then the submission.csv file must contain a matrix of size 1000×1000.

Refer to the 'sample submission.csv' file provided in the dataset folder for an example based on the sample dataset.

Note: Ensure that 'user_id' of the users is mentioned correctly in the submission.csv file.
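As an illustration of the matrix format, the sketch below writes an N×N matrix with the standard csv module. The exact layout is an assumption, since the sample file is not reproduced here: it places the user IDs in both a header row and the first column, with a "user_id" corner cell. The function name write_submission is hypothetical; check the layout against the sample file before submitting.

```python
import csv
import io


def write_submission(user_ids, matrix, out):
    """Write the match matrix as CSV: one header row, then one row per user."""
    writer = csv.writer(out, lineterminator="\n")
    # Assumed layout: corner cell "user_id", followed by every user ID.
    writer.writerow(["user_id"] + list(user_ids))
    for uid, row in zip(user_ids, matrix):
        writer.writerow([uid] + list(row))


# Usage with a toy 2x2 matrix; pass an open file instead of StringIO to save it.
buf = io.StringIO()
write_submission(["u1", "u2"], [[0, 42.5], [42.5, 0]], buf)
```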

Evaluation metric

The evaluation metric is the root mean squared error (RMSE). The score is calculated as follows:

score = max(0, 100 - root_mean_squared_error(actual, predicted))
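To check a prediction locally, the score can be approximated in a few lines. This sketch assumes the score is 100 minus the RMSE, floored at 0, and that the RMSE is taken over every cell of the matrix; the function names rmse and score are hypothetical, and the official evaluator may differ in detail.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over all cells of two equally sized matrices."""
    squared = [
        (a - p) ** 2
        for row_a, row_p in zip(actual, predicted)
        for a, p in zip(row_a, row_p)
    ]
    return math.sqrt(sum(squared) / len(squared))


def score(actual, predicted):
    """Assumed scoring rule: 100 minus the RMSE, never below 0."""
    return max(0.0, 100.0 - rmse(actual, predicted))
```

For example, with actual values [[0, 10], [10, 0]] and predictions [[0, 13], [14, 0]], the squared errors are 0, 9, 16, and 0, so the RMSE is 2.5 and the score is 97.5.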

 

Time Limit: 5
Memory Limit: 256