How to hire a big data engineer, with Qubole co-founder, Joydeep Sen Sarma (also one half of the team behind Apache Hive)

Wikipedia defines big data as the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Big data engineers work to solve the problems of capture, curation, storage, search, sharing, transfer, analysis and visualisation of data.

Yet everyone, from engineers to entrepreneurs to investors to marketers, is talking about it as if it were the next big thing. Rest assured, more than 90% of them wouldn't know what it means. It has become a fad, so much so that behavioural economics guru Dan Ariely said this about big data:

"Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it."

The overuse of this phrase becomes particularly worrying when you look at the job descriptions for big data engineers. Adjectives are thrown around carelessly, giving the applicant everything

So what exactly should you be looking for when you're hiring a big data engineer? Speak to enough people who have actually worked on big data projects, and they'll tell you to look for a deep understanding of statistics: regressions, correlations, Bayesian modelling and probability theory. Along with this, you're also looking for the usual technology skills: being good at coding and at writing programs for scale. Big data engineers should also not be confused with data scientists, who, most of the time, work with comparatively smaller data sets.

I had the chance to speak with Joydeep Sen Sarma, co-founder of Qubole and one of the founders of Apache Hive, the popular data warehouse infrastructure built on top of Hadoop, about what he thought of hiring big data engineers. He said, "In my honest opinion, that's a sort of meaningless question. There's no such category called big data. A lot of people who work on data processing backends are just regular programmers with backend expertise."

"If you are talking about people who put things in production, then familiarity with the basic tooling that has emerged in this space helps. It is a standard list of software. That would be the closest to 'big-data engineer' I can think of."

I then asked him what a company is actually looking for when it says it wants to hire a big data engineer. He said, "Typically they are looking for people who can implement big data projects: collecting logs, designing large scale warehouses, building backends for real-time and batch reporting, integrating different backend systems and so on. These are people who work with big data tooling. They are in great demand. Very few of them have to be experts in data science or in actually writing code for large scale distributed systems. Their value lies in understanding and mastery of a complex and emerging basket of tools used for manipulating large data sets."

So the next time you're browsing a job site, don't be deterred by the phrase big data. You might be more than a fit for the role. And companies, for their part, should keep in mind that the term big data is very ambiguous in the context of engineering.


About the Author

Raghu Mohan
Raghu is an engineering grad who handles marketing at HackerEarth. Prior to this, he was an editor. When he's not working, you can find him at the nearest music shop having a jam session.