Inside The Mind Of A Data Scientist
There’s a port somewhere in the world that wants to maximize profits.
Said port hires a data scientist to look at the numerous variables affecting ship movement and operational efficiency – factors that affect profitability in the long run.
The data scientist looks at how many ships enter the port on a daily basis, where they are loaded and unloaded, the size of ships coming in versus the length of the docks where they are anchored, the time lost when a ship of the wrong size enters a dock and then has to re-dock correctly, the number of port employees required to unload a single ship by length and type of cargo, the future plans for the port and the predicted volume of ships entering.
Then they begin their analysis.
Our data hero announces that the port will have to hire at a rate of 3% every year to keep up with increasing volume. They also help the authorities set up a system that guides ships to the correct dock and alerts the authorities in advance when a ship is approaching. This improves communication between the docks and the ships, cuts the time lost in re-docking, and increases overall efficiency and profits for the port.
Accounting for seasonal variations in traffic, and the time and effort needed to train the staff in using the new navigation system, the data scientist predicts that the port can look at a probable profit increase of 20% in 3 years.
**The key word here is ‘probable’.**
Let’s read that first part again. The solution seems so simple, right? That simple solution, however, requires months of data crunching and historical analysis to create operational models for the future.
The end result in this scenario is a probability and not a number written in stone, because several factors (trade wars, a pandemic, oil prices, consumer demand) can affect the port’s operations. These are factors one cannot guarantee, or foresee, but a good data scientist is expected to account for all of these and still come up with a reliable prediction.
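To see why the answer comes out as a probability, here is a minimal simulation sketch. It is not the analysis described above: the baseline growth rate, its spread, the chance of a disruptive year, and the size of the hit are all made-up assumptions, and `simulate_profit_growth` is a hypothetical helper name. The only figures taken from the story are the 20% target and the 3-year horizon.

```python
import random


def simulate_profit_growth(n_trials=10_000, years=3, target=0.20,
                           shock_prob=0.10, seed=42):
    """Estimate the chance that cumulative profit growth reaches `target`.

    All distribution parameters below are illustrative assumptions,
    not figures from a real port.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    hits = 0
    for _ in range(n_trials):
        profit = 1.0
        for _ in range(years):
            # Assumed baseline: about 6.5% growth a year, with some noise.
            growth = rng.gauss(0.065, 0.03)
            # Assumed shock (trade war, pandemic, oil price spike):
            # in a disrupted year, growth drops by 8 percentage points.
            if rng.random() < shock_prob:
                growth -= 0.08
            profit *= 1 + growth
        if profit - 1 >= target:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    chance = simulate_profit_growth()
    print(f"Estimated chance of a 20% profit increase in 3 years: {chance:.0%}")
```

Changing `shock_prob` shows how sensitive the headline figure is to events nobody can foresee, which is exactly why the prediction is stated as "probable".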
This is why good data scientists are so in demand across the tech sector, and also why assessing and hiring them is so hard.
Data scientists are not the same as generalist programmers
Assessing a data scientist is not the same as assessing another developer. The example above should help you understand the difference between the problems that a data scientist works on and those that a programmer solves.
The skill sets required for a data scientist role also differ from those required of other developers, as illustrated below:
Traditional IDEs, therefore, don’t cut it for data scientists
Most IDEs include a source code editor, debugger, and compiler. They work perfectly for tech assessments for programmers and developers. Not for data science and machine learning assignments though.
In many data science problems, the solution can be a simple prediction or a ‘Yes/No’ answer. Or, if we go back to the question we started this blog with, it can be a prediction about the probability of achieving the desired goal. Is it going to rain in Atlanta tomorrow? Yes. Will my company grow 5X in the next two years? Ummm, there’s a 20% chance of doing that given you do these 10 other things well.
As we have already established, arriving at this answer requires hours of logical analysis. When assessing a data scientist for a job, therefore, recruiters and hiring managers need to be able to understand the logical choices the candidate made while arriving at the seemingly simple conclusion. A traditional IDE is not enough here.
Hence, Jupyter Notebooks
At HackerEarth, we have seen an increasing demand for Data Science and Machine Learning – a trend reflected in our year-end recruiter survey as well. To make data science assessments easier for recruiters, we have now integrated Jupyter Notebooks on our assessment platform, which helps recruiters get right inside the mind of the candidate they are trying to hire.
The Jupyter Notebook is an open-source web application that allows users to create and share documents containing live code, equations, visualizations, and narrative text. The easy-to-use, interactive data science environment provided by Jupyter works across several programming languages, such as Python and R. Jupyter Notebooks work not only as an IDE but also as a presentation or education tool, and are great for data science assessments where the candidate is required to answer questions in a visual format.
Here are some of the ways Jupyter Notebooks score over traditional IDEs:
- Individual cells for better analysis
Jupyter Notebooks allow candidates to code using separate units or ‘cells’ that can be used independently of each other while writing code (denoted by red arrows in the image below). This makes it easier for candidates to compute how various data parameters work with each other and to add notes, or to partially write and test code.
This is essential for recruiters to understand the analytical approach taken by the candidate when solving a problem.
- Interactive elements for better data visualization
The Notebook offers an interactive shell with embeddable graphics and tables, reusable cells, and other presentation features that are relevant to the job at hand. This enables candidates to present their output in a graphical format if needed, something that a traditional IDE does not support.
- Enhanced candidate experience
It is well known that candidates perform better when they are using a test environment they are familiar with. Notebooks are a preferred tool in the data science world. Using the Jupyter platform for an assessment ensures that your candidate is comfortable and ready, and is approaching a problem the way they would in real life.
Better data science assessments are made of these
When candidates start the assignment, they can choose between the Monaco editor (IDE) and Jupyter Notebooks. Each Notebook runs on a dedicated machine, so every candidate has enough resources, faces no restrictions, and feels completely at home. This translates directly to better candidate output in the test and an objective, skill-based assessment process.
The most interesting bit about the Jupyter Notebook integration is the output section, which not only captures the final submission in CSV format but also allows recruiters to review every step the candidate took while solving the data problem before them.
So, even if a candidate gets a Yes/No prediction wrong, you can still review their work to see how they analyzed the data – the most crucial part of a data scientist’s role.
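As a rough illustration of what a step-by-step submission might look like, the sketch below splits a tiny analysis into cells and ends with a CSV. Everything in it is hypothetical: the ship data, the 200-metre threshold, the column names, and the function names are invented for the example, and it does not reflect the actual HackerEarth submission format. The `# %%` comments are a common plain-text convention for marking notebook cells.

```python
import csv
import io

# %% Cell 1: load the data (hypothetical arrivals; redocked is 1 if the ship had to re-dock)
arrivals = [
    {"ship_id": "S1", "length_m": 120, "redocked": 0},
    {"ship_id": "S2", "length_m": 150, "redocked": 0},
    {"ship_id": "S3", "length_m": 210, "redocked": 1},
    {"ship_id": "S4", "length_m": 260, "redocked": 1},
    {"ship_id": "S5", "length_m": 180, "redocked": 1},
    {"ship_id": "S6", "length_m": 230, "redocked": 0},
]


# %% Cell 2: explore whether longer ships re-dock more often
def redock_rate(rows):
    """Share of ships in `rows` that had to re-dock."""
    return sum(r["redocked"] for r in rows) / len(rows)


long_ships = [r for r in arrivals if r["length_m"] >= 200]
short_ships = [r for r in arrivals if r["length_m"] < 200]
print("long:", redock_rate(long_ships), "short:", redock_rate(short_ships))


# %% Cell 3: turn the observation into a simple Yes/No prediction rule
def predict_redock(length_m, threshold=200):
    """Predict 'Yes' if a ship is long enough to be at risk of re-docking."""
    return "Yes" if length_m >= threshold else "No"


# %% Cell 4: write the final submission as CSV text
def build_submission(rows):
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ship_id", "prediction"])
    for r in rows:
        writer.writerow([r["ship_id"], predict_redock(r["length_m"])])
    return buffer.getvalue()


submission_csv = build_submission(arrivals)
print(submission_csv)
```

A reviewer who sees only the CSV learns whether the predictions were right. A reviewer who sees cells 1 to 3 learns why the candidate chose that rule, even if some predictions are wrong.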
Find better candidates with Jupyter Notebooks. Thank us later!
Data science as a field dates back to 1962, when mathematician John W. Tukey predicted the effect of modern-day electronic computing on data analysis as an empirical science. However, it entered the tech hiring lexicon only in recent years.
The trends we have seen tell us that tech jobs in AI (Artificial Intelligence), ML (Machine Learning), and Data Science will be the most in-demand roles in the future. With growing opportunities for AI and ML specialists in industries as diverse as banking, fintech, public safety, and healthcare, there will be a surge in these roles in the coming days. Today, every business, big or small, needs BIG DATA, and with the advent of various technologies that allow easy application of data science, all businesses are looking at using data to make their solutions smarter, their operations more efficient, and their user experiences more personalized.
This predicted surge in hiring also underlines the need to objectively assess and hire the best data scientists in the market. Traditional modes of evaluation do not do justice to the skills and expectations associated with this role. With the Jupyter notebook support on our HackerEarth Assessments platform, however, you can now assess and hire the best data scientists out there, and improve your business pipeline.
Try it out and let us know what you think. You can also email our product manager, Akash Bhat (email@example.com), to learn more about this feature.