How to become a data scientist
“Today’s world is drowning in data and starving for insights. Our digital lives have created an overwhelming flood of information. In the last 5 years data scientists have come to the rescue by trying to make sense of it all. The sexy job in the next 10 years will be statisticians, and I’m not kidding.” – Hal Varian, chief economist at Google
Until the end of the last decade, the term “data scientist” hardly existed. However, the huge volumes of data that keep piling up have opened up new frontiers. And irrevocable changes in the way businesses are run have spawned loads of analysts and number crunchers to “manage” data and predict successful future strategies and outcomes.
Organizations are still falling over themselves trying to hire data scientists who can harness the power of data to hasten the data-to-action process. Although fewer companies rely on data-driven decision making than should, by the turn of this decade, analytics will have taken over. Just ask early adopters such as Facebook, Amazon, and LinkedIn.
Rest assured, automated programs aren’t going to make data scientists obsolete anytime soon.
In this article, you’ll find the most recommended learning path to become a data scientist. In addition, we’ve added links to the best tutorials to get you started on that path.
Who is a Data Scientist?
Here’s an interesting definition of a data scientist on the web: “A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”
You can read two conflicting views in Larry Wasserman’s Data Science: The End of Statistics? and Andrew Gelman’s Statistics is the least important part of data science.
It is far “safer” to say a data scientist wears many hats!
Data scientists typically uncover commercially valuable information hidden in tons of unstructured and structured data. They apply a formidable skillset of programming, statistics, math, business acumen, communication, and some psychology to huge data sets to provide actionable insights.
These big data wranglers need to have contextual understanding and intuition to come up with magic. Identifying whether the data is meaningful requires an excellent blend of technical and business skills. And that’s what aspiring data scientists should build on.
Before you organize, package, and deliver data, you need to know how.
What skills do you need to become a consummate data scientist?
In social scientist Drew Conway’s famous Data Science Venn diagram, hacking skills, math and stats knowledge, and substantive expertise (commonly taken to mean domain knowledge) portray the interdisciplinary nature of a data scientist’s strengths.
- A data scientist needs hacking skills to collect and munge electronic data, math and stats knowledge to apply the right tools and techniques to glean key insights, and substantive expertise to ask motivating questions and make predictions. Conway says a major part of the data science cycle lies in hacking skills, which are focused on tools such as Python, R, and Hadoop.
- Modeling follows data exploration. This is where math and statistics come into play. The trick lies in finding the most suitable technique to apply to big data to identify the least error-prone predictive model.
- The final step would involve a data scientist knowing how to interpret the results and ask interesting questions.
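To make the three steps above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the tiny ad-spend data set, the helper names (`munge`, `fit_mean`, `fit_line`, `run_cycle`), and the choice of a simple hold-out split to compare models. Real projects would use far more data and libraries such as pandas or scikit-learn, but the shape of the cycle is the same: clean the data, fit candidate models, and keep the one with the lowest error on data it has not seen.

```python
import csv
import io
import statistics

# Hypothetical raw data: ad spend vs. sales, with one incomplete row to clean out.
RAW_CSV = """ad_spend,sales
1,3.1
2,4.9
3,7.2
4,
5,11.1
6,12.8
7,15.2
8,16.9
"""


def munge(raw):
    """Step 1 (hacking): parse the CSV and drop rows with missing values."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw)):
        if record["ad_spend"] and record["sales"]:
            rows.append((float(record["ad_spend"]), float(record["sales"])))
    return rows


def fit_mean(train):
    """Baseline model: always predict the average of the training targets."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y


def fit_line(train):
    """Least-squares straight line fitted to the training data."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_squared_error(model, data):
    """Average squared gap between predictions and actual values."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in data)


def run_cycle(raw=RAW_CSV):
    """Step 2 (modeling): fit candidates and score them on held-out rows."""
    rows = munge(raw)
    train, test = rows[::2], rows[1::2]  # alternate rows as a simple hold-out
    candidates = {
        "mean baseline": fit_mean(train),
        "straight line": fit_line(train),
    }
    errors = {name: mean_squared_error(model, test)
              for name, model in candidates.items()}
    best = min(errors, key=errors.get)  # least error-prone candidate
    return best, errors


if __name__ == "__main__":
    best, errors = run_cycle()
    # Step 3 (interpretation) starts here: why does this model win, and
    # what business question does the result raise next?
    print(best, errors)
```

The code only handles the first two steps. The third, deciding what the winning model means for the business and what to ask next, is still up to you.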
There is a series of Venn diagrams modeled on Conway’s version.
The 2016 version by Gartner perhaps makes more sense with its specific call-outs.
In the video below, Jeff Hammerbacher, Cloudera co-founder and a prominent data scientist, calls data scientists “data rats.” Hammerbacher, who is credited with coining the term “data scientist,” says there is no perfect background for becoming one. In practice, he believes there are five components you need to be trained in to do your job properly:
- Data collection and integration
- Data visualization (dashboard design)
- Large-scale experimentation
- Causal inference and observational studies
- Data products (fitting machine learning models, deploying in production, setting up a regular refresh cycle, and evaluating performance)
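As a small taste of the experimentation component in the list above, here is a hedged sketch of how an A/B test result might be checked in Python. The function name `two_proportion_z_test` and the visitor and conversion counts are made up for illustration, and a two-proportion z-test is only one common way to compare conversion rates; it assumes large, independent samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns the z statistic and the two-sided p-value under the
    null hypothesis that both variants convert at the same rate.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the best estimate of the shared rate if A and B do not differ.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment: 5,000 visitors per variant.
    z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
    print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be chance alone. It does not tell you whether the difference is big enough to matter, which is where business judgment comes in.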
So what’s in a data scientist’s technical toolkit?
A data scientist has to be more than proficient in the following tools and techniques. We’ve provided a few useful links to help you get an idea about the specific topics.
- Mathematics: linear algebra, calculus
- Statistics: probability, Bayesian stats, inferential and descriptive stats, game theory, optimization
- Programming languages (R, Python, C++, Java, etc.): algorithms and data structures, distributed computing
- Machine learning in R or Python: random forests, neural nets, k-nearest neighbors, decision trees, SVM, etc.
- Databases and query languages: MySQL, MongoDB, Cassandra
- Big data technologies: Apache Hadoop, Pig, Spark, Hive, Flume, Cloudera
- Cloud services such as Amazon S3
Check out Top Data Science Skills by Job Role here.
How can you become a data scientist?
Watch this great webinar by Jesse Steinweg-Woods on “How to become a data scientist in 2017” to get answers to some really specific questions.
For the professional:
If you want to switch to a career in data science, then taking free MOOCs (Coursera, Udacity, edX, Khan Academy) or enrolling in online classes could be your best bet. People from diverse backgrounds can do really well in analytical jobs because of their talent for problem-solving, curiosity, and communication skills.
For the student:
Universities the world over offer graduate courses in data science, business intelligence, analytics, and big data technologies. For math, statistics, or computer science undergraduates, this could be a fantastic option. If you want to study on your own, that’s fine too because there are lots of free e-books to help you master the skills you need.
Joining competitions, attending data science meet-ups, doing projects for experience, and updating your repertoire of skills will ensure that you are near-perfect for the job. It is really all about practice. And tons of it.
Before you go all out, you can get an internship or join a bootcamp with companies such as Amazon, Zipfian, and Twitter just to be sure that this is the right career choice for you.
Why become a Data Scientist?
You don’t need to be sold on the idea, do you?
You love numbers. You love data. But truthfully, aren’t the big bucks and the job security great incentives?
Data scientists make about $130,000 a year on average. Since 2013, job postings for data scientists have grown by 108%. Research says that job growth for data scientists is expected to reach almost 19% this decade. And Glassdoor says that data science jobs have great average scores for work-life balance. Data scientists are critical assets for any organization today.
With studies saying that demand is expected to outpace supply and top companies all over looking for the brightest analysts, you can figure out the answer for yourself quite easily.
By the way, did you know there are several job roles in data science?