A data scientist’s job is one of the most sought-after jobs of the 21st century. But how do you hire a data scientist who fits the bill?
According to Firstround.com, in a competitive field like Data Science, strong candidates often receive 3 or more offers, so the success rates of hiring are typically below 50%. The key is to have prospective candidates go through the recruiting process quickly, thus helping recruiters close data scientist positions faster. This is possible only if the right objective is set before hiring starts.
Organizational use cases of Data Science
It is imperative for your organization to set the right expectations for the Data Science platform and for your hiring needs to align with it. You could have a large amount of data and no idea about what to do with it.
In most cases, organizations look at achieving the following using Data Science:
Solve optimization problems
Simply put, this means reshaping processes by analyzing data. For example, a logistics company can optimize its supply chain so that delivery drivers use less fuel and reach customers faster.
Using data to form predictive models for companies to better understand their target customers. E-commerce companies use this to recommend products based on the consumer buying behavior and also monitor stock levels in warehouses
Provide business intelligence
Business intelligence is all about data management: arranging data and producing information from it via dashboards. These business insights play an important role in the decision-making process of any organization.
Combination of the points mentioned above
Some organizations also look at combining various aspects from the areas discussed above to derive meaningful insights and also drive product decisions.
Data Science Project life cycle
Data Science job parameters
Now that the ultimate goal of data science within your organization has been set, every hiring manager needs to look at certain skills that are important for data scientists to have.
Statistics and linear algebra
This is a decision-making skill. Prospective candidates should be good at collecting, analyzing, and making inferences from data
This is the art of classifying or grouping data for prediction. An ideal data scientist should be able to use big data technologies to create pipelines that feed Machine Learning algorithms
This refers to handling and cleaning data. A data scientist should be able to visualize and mine raw data to derive meaningful insights from it
A data scientist should be able to maximize the outcome based on the factors that they can control
Every data scientist should be well versed in the following:
– Libraries such as pandas, NumPy, scikit-learn, OpenCV, and Matplotlib
– Data structures and algorithms, Excel, Tableau, Hadoop, SAS, etc.
Other skills that are good to have for a data scientist include natural language processing (NLP), image recognition, time series analysis, econometrics, etc.
Different types of data scientists
Let us now look at whom to hire. Data scientists are broadly classified into two types: researchers and engineers. For any organization, it is good to have a mix of both.
Things to look out for when hiring a researcher
Data researchers have a strong background in math or statistics. They should be skilled at developing custom algorithms to make the most of data, and inquisitive enough to look for solutions in the data. They should be well-versed in technical skills such as R, Python, and SQL. To pull data, candidates should understand relational databases. Using SQL to query data is a required skill, and experience storing data in NoSQL databases is a plus.
Things to look out for when hiring a data engineer
Data engineers typically have a stronger coding background. They should be capable of structuring things well and prototyping quickly. They should be well-versed in data tools and languages such as Python, Scala, Java, and MATLAB. For the extracted data to be used, engineers should be capable of creating a visualization or building a Machine Learning model.
Skills to assess in a Data Scientist
Finding the right candidate for the role of a data scientist can be tricky and challenging. This article will help you understand what Data Science is and what skill sets to look for in a candidate when hiring for a data scientist.
Data science is an interdisciplinary field that uses a blend of data inference and algorithm development to solve complex analytical problems. An ideal candidate has skills in the following 3 fields:
- Mathematics and statistics
- Machine Learning and programming
- Business/domain knowledge
Mathematics and statistics
A candidate applying for the role of a data scientist should have a good understanding of certain mathematical concepts. This includes topics like statistics (both descriptive and inferential), linear algebra, probability, and differential calculus.
Machine Learning and programming
Any candidate applying for the role of a data scientist should have strong programming skills. The candidate must have a good understanding of basic programming concepts, data structures such as trees and graphs, and the most commonly used algorithms. The candidate should be able to code in either Python or R, which are the most widely used languages in Data Science.
Apart from programming skills, the candidate should have a good understanding of Machine Learning concepts such as:
- Classification and regression
- Supervised learning and unsupervised learning
- Clustering algorithms such as k-means, and how they differ from k-nearest neighbors (a supervised method that is often mistaken for clustering)
- Decision trees and random forest classifiers
- Naive Bayes algorithm
- Boosting and bagging
- Bias-variance tradeoff
- Binary, multiclass, and multi-label classification
- Neural networks
- Knowledge of different metrics used to evaluate the performance of a model
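To make the last item concrete, here is a minimal sketch of four common evaluation metrics for a binary classifier, written in plain Python. The function name `binary_metrics` and the sample labels are illustrative assumptions, not part of any particular library; in practice a candidate would more likely reach for a library such as scikit-learn.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    # Count how often each (actual, predicted) pair occurs.
    counts = Counter(zip(y_true, y_pred))
    tp = sum(n for (t, p), n in counts.items() if t == positive and p == positive)
    fp = sum(n for (t, p), n in counts.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in counts.items() if t == positive and p != positive)
    correct = sum(n for (t, p), n in counts.items() if t == p)

    accuracy = correct / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    actual = [1, 0, 1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 0, 1, 1, 0]
    print(binary_metrics(actual, predicted))
```

A useful interview follow-up is to ask when accuracy alone is misleading, for example on a dataset where one class is rare.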
Business/domain knowledge
Candidates should have a basic understanding of the business or industry in which they are applying to work as data scientists. They should be able to understand the problem from the perspective of the company’s business, translate that problem into a Data Science problem, and solve it using the skill sets described above. Finally, they should be able to present insights from the solution effectively. However, it is important to keep in mind that the depth of business or domain knowledge will depend on the candidate’s experience.
Data Scientist salaries
- Experience – People with more experience in data science, engineering, or analytics are paid more than those with less experience. Data Scientists in managerial roles also tend to be paid more
- Academic achievement – Data Scientists with PhDs make more on average than those with bachelor’s degrees
- Company size – The salary of a Data Scientist also depends on the size of the hiring organization. Though many startups hire Data Scientists at competitive salaries, a lot of smaller startups pay less than the industry average
Sourcing Data Science talent
Tech communities are full of potential hires waiting to be discovered. Here are 3 such communities from where you can source talent for free.
Hiring Data Scientists from GitHub
GitHub is one of the world’s largest code hosts, with close to 31 million developers. It’s like a tech recruiter’s dream. A developer’s GitHub profile gives you a wealth of information.
- The first step is to create a profile on GitHub
- Once the profile is created, run a search using 3 parameters – language, location, and followers.
- By default, GitHub shows results for the list of repositories. You can change this to users by choosing it from the left hand side menu. You now have a list of developers you can reach out to.
- Check their repositories to familiarize yourself with their work. This would be mutually beneficial as you can filter out candidates who you think will not fit into the job role on offer.
- Cross-reference their profiles on either LinkedIn or Twitter to be doubly sure whether they would be a good fit.
- Don’t judge profiles on how active or complete they are. Developers sometimes do not share code publicly for security reasons. Also, the lack of a large social following says nothing about how good their tech skills are.
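The three-parameter search described above can be sketched as a small helper that builds a GitHub user-search URL. The `language:`, `location:`, and `followers:` qualifiers are part of GitHub's search syntax; the function name `github_user_search_url` and the example values are assumptions made for illustration.

```python
from urllib.parse import urlencode


def github_user_search_url(language, location, min_followers):
    """Build a GitHub user-search URL from language, location, and followers.

    Hypothetical helper: it only builds the URL and makes no network call.
    """
    # Quote multi-word locations so they are treated as a single term.
    loc = f'"{location}"' if " " in location else location
    query = f"language:{language} location:{loc} followers:>={min_followers}"
    # type=users switches the results from repositories to users.
    return "https://github.com/search?" + urlencode({"q": query, "type": "users"})


if __name__ == "__main__":
    print(github_user_search_url("python", "Bangalore", 50))
```

Opening the printed URL in a browser while signed in should show the same user list as choosing the users filter by hand. The follower threshold is a judgment call and, as noted above, a low count does not indicate weak skills.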
Hiring developers from StackOverflow
StackOverflow is a Q&A site for professional and enthusiast programmers. Just like GitHub, StackOverflow is also a great platform to hire amazing Data Science talent.
- Because developers post and answer technical questions here, look at candidates who answer questions in the specific areas you are hiring for to see if they fit your requirements.
- Developers are segregated based on their user badges and reputation scores. An ideal candidate ranks high for both.
- Every question which is posted has tags associated with it. You can use these tags to find users who fit the bill.
Hiring Data Scientists from Machine Learning challenges and hackathons
Hackathons and coding challenges are great ways for candidates to show their skills in action. When you are hiring top Data Science talent, testing candidates on real-time problem-solving skills can boost your recruitment efforts.
JDs for Data Science roles
Here are some sample Data Science job descriptions for hiring challenges at HackerEarth –
Data Scientist Job Description
HackerEarth provides enterprise software solutions that help organisations with their innovation management and technical recruitment needs. HackerEarth has conducted 1000+ hackathons and 10,000+ programming challenges to date. Since its inception, HackerEarth has built a developer base of over 2 million.
As a Data Science Engineer, you will contribute significantly to identifying best-fit architectural solutions for one or more projects, apply data science techniques to analyze large amounts of data, present data insights using high-impact visualizations, and provide regular support and guidance to project teams on complex coding, issue resolution, and execution. You will collaborate with some of the best talent in the industry to create and implement innovative, high-quality solutions. You will be part of a learning culture, where teamwork and collaboration are encouraged, excellence is rewarded, and diversity is respected and valued.
- Bachelor’s degree or foreign equivalent required. Master’s in Statistics, Mathematics, Computer Science or another quantitative field (Preferred)
- At least 4 years of experience and excellent understanding of: Machine learning techniques and algorithms for classification, clustering and prediction such as Neural Networks, Naive Bayes, SVM, Decision Forests, etc. NLP, text analytics technologies.
- Common data science toolkits such as Python Data Science Libraries, R, MATLAB, etc. Excellence in Python is highly desirable. The ability to enhance standard algorithms is expected.
- Developing the algorithms and testing on the real data sets and fine tuning the algorithms to ensure business objectives are met.
- Implementing the ML algorithms in the production instance and integrating with necessary data sources to address specific business problems, Extending to add custom algorithms
- Big data technology of HDFS, Hive, Spark, Scala etc. Data visualization tools such as Tableau, Query languages such as SQL, Hive.
- Good applied statistics skills, such as distributions, statistical testing, regression, etc.
Roles and Responsibilities
- You will be a core member of a team that does whatever it takes to delight customers, take an iterative and result oriented approach to software development. In this position you will provide best-fit architectural solutions for multi-product, multi-project, multi-industry portfolios providing technology consultation and assisting in defining scope and sizing of work.
- You will be responsible for delivering high-value next-generation products on aggressive deadlines and will be required to write high-quality, highly optimized/high-performance and maintainable code that your fellow developers love.
- You will be the anchor in Proof of Concept developments, support opportunity identification and pursuit processes, and evangelize the Infosys brand
- You will collaborate with some of the best talent in the industry to create and implement innovative high quality solutions, lead and participate in sales and pursuits focused on our clients’ business needs
- You will be part of a learning culture, where teamwork and collaboration are encouraged, excellence is rewarded, and diversity is respected and valued
- The role involves high end technology and hence would require you to be an expert in coding.
Recruiter email templates
Subject – Join our amazing Data Science team at <Company name>
I am <Name> and I work as a Recruiter for <Company name>. I came across your profile on <Social media or Job board> and I was very impressed with your skills especially <describe a project or a particular programming skill set>.
We are currently looking for a Data Scientist to join our amazing team and I think you would be a great fit. Here are some of the cool projects that we are working on currently – <provide a link to projects at your organization>
If this is something that interests you, please write back to me and I will be happy to explain more over a call.
Have a great day, and I hope to hear back from you soon!
Subject – Following up!
Hope you are doing great!
Have you had a chance to read my previous mail?
We are looking for some super talented Data Scientists to join our team at <Company name> and I thought you would be a great fit.
Our Data Science team has been working on some cool projects <link some of your work> and I thought you would find them interesting.
And if you are wondering what it is like to work for <Company name>, here is a short video of what our employees think – <Include an employer branding video>
If you are interested in this opportunity, do drop me an email so we can take this forward. Have a great day!
Assessing Data Scientists using developer assessment software
HackerEarth’s Developer Assessment platform
HackerEarth’s developer assessment platform can help you streamline your Data Science recruitment in two simple steps:
1. Testing Data Science skills within a short time frame using data science questions
Solving a real-world Machine Learning problem involves many tasks, such as data exploration, data analysis, data preprocessing, model creation, model training, and testing. Evaluating candidates on real-world problems can therefore take a long time. To assess candidates within a stipulated time frame, our platform offers a set of approximate questions in which large datasets are broken down into simpler ones. This also helps hiring managers shortlist candidates to work on more in-depth projects or even finalize candidates for entry-level positions.
2. Testing data science skills using elaborate data sets
The developer assessment platform also offers recruiters the opportunity to assess candidates’ skills on real-world Machine Learning problems. These questions typically take longer to solve and help to evaluate candidates better, before they are moved ahead to further interview rounds or before rolling out the final offer.
Candidates are given training and testing datasets. The candidates train their model based on the given training dataset and then use that model to predict the values of the testing dataset. The candidates finally upload a .csv file (containing the predictions of the testing dataset) along with the code file. The platform automatically assesses the predictions submitted by the candidates and generates an accuracy score. The platform provides an option for a leaderboard that sorts candidates based on the score they receive.
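The scoring flow described above can be illustrated with a short sketch. It is not HackerEarth's implementation: the `id,prediction` column layout, the function names, and the choice to count missing rows as wrong are all assumptions made for the example.

```python
import csv
import io


def read_predictions(csv_text):
    """Parse 'id,prediction' rows (with a header line) into a dict.

    The column names are an assumed submission format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["id"]: row["prediction"] for row in reader}


def accuracy_score(truth, predictions):
    """Fraction of test rows whose predicted label matches the true label.

    Rows missing from the submission count as wrong (an assumption).
    """
    if not truth:
        raise ValueError("truth must not be empty")
    correct = sum(1 for key, label in truth.items() if predictions.get(key) == label)
    return correct / len(truth)


def leaderboard(truth, submissions):
    """Return (candidate, score) pairs sorted from best to worst score."""
    scores = {
        name: accuracy_score(truth, read_predictions(text))
        for name, text in submissions.items()
    }
    # Sort by descending score, breaking ties alphabetically by name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    truth = {"1": "cat", "2": "dog", "3": "cat", "4": "dog"}
    submissions = {
        "asha": "id,prediction\n1,cat\n2,dog\n3,dog\n4,dog\n",
        "ben": "id,prediction\n1,cat\n2,cat\n3,dog\n",
    }
    for name, score in leaderboard(truth, submissions):
        print(f"{name}: {score:.2f}")
```

Accuracy suits balanced classification tasks; for regression or imbalanced data a different metric would be the more sensible choice.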
The platform also gives recruiters an overview of the test and lets them monitor the performance of all participants, including those currently active, with an option to shortlist candidates. Additionally, recruiters can request a detailed report on all participating candidates, which is emailed directly to them.
Data Science interview questions
We asked a couple of Data Scientists on Reddit what they would like to be quizzed on. This is what they said –