Sachin Gupta

Co-founder at HackerEarth
Bangalore, Karnataka, India
Django, CUDA, Node.js, jQuery
Tools & Libraries:
Matlab, Vim, Git, MySQL
Indian Institute of Technology - Roorkee
Technical Skills
Backend Development, Frontend Development
Django, CUDA, Node.js, jQuery
Tools & Libraries
Matlab, Vim, Git, MySQL, jMeter
Work Experience
HackerEarth, Bangalore, Karnataka, India
Oct 2012 - Present (4 years and 6 months)
HackerEarth is helping companies around the world hire better programmers, the smarter way. Finding the right kind of developers is becoming increasingly difficult for companies today. HackerEarth is a discovery platform where companies can find relevant talent for their tech teams, and likewise programmers can find companies where they would want to work.
Software Developer
Google, Bangalore, Karnataka, India
Jul 2012 - Oct 2012 (4 months)
Worked on the Confucius project of the Emerging Markets team, where my work was to use a push notification engine to convert all the feeds and notifications on the website to real-time updates.
Skills: C++ | Java | JavaScript
Microsoft, Hyderabad, Andhra Pradesh, India
May 2011 - Jul 2011 (3 months)
Developed a test framework consisting of a Load Distributor and a Log Parser to carry out scale testing of Microsoft Online Backup Service. The basic requirement was to develop a structure that could, as per the test specification, generate artificial clients to access the Backup service, record the success or failure of their actions, and generate consolidated metrics at the completion of the test input.
Motion Detection in Low Resolution Grayscale Videos Using Fast Normalized Cross Correlation on
17 Apr, 2010
Motion estimation (ME) has been widely used in many computer vision applications, such as object tracking, object detection, pattern recognition and video compression. The most popular block-based similarity measures are the sum of absolute differences (SAD), the sum of squared differences (SSD) and the normalized cross correlation (NCC). The similarity measure obtained using NCC is more robust under varying illumination than SAD and SSD. However, NCC is computationally expensive, and applying NCC with a full (exhaustive) search further increases the required computational time. A relatively efficient way of calculating the NCC is to pre-compute sum-tables to perform the normalization, an approach referred to as fast NCC (FCC). In this paper we propose a real-time implementation of the full-search FCC algorithm applied to grayscale videos using NVIDIA’s Compute Unified Device Architecture (CUDA). We present fine-grained optimization techniques for fully exploiting the computational capacity of CUDA. Novel parallelization strategies adopted for extracting data parallelism substantially reduce the computational time of exhaustive FCC. We show that by efficient utilization of the global, shared and texture memories available on CUDA, we can obtain a speedup of the order of 10x compared to the sequential implementation of FCC.
Skills: Cuda | Image Processing
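The three block similarity measures named in the abstract above can be illustrated with a small sketch. This is not the paper's CUDA implementation: it is plain sequential Python, the function names `sad`, `ssd` and `ncc` are hypothetical, and it assumes the zero-mean form of NCC. It only shows why NCC tolerates a brightness change that inflates SAD and SSD.

```python
import math

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def ncc(a, b):
    """Zero-mean normalized cross correlation (assumed variant), in [-1, 1]."""
    fa = [x for row in a for x in row]
    fb = [y for row in b for y in row]
    mean_a = sum(fa) / len(fa)
    mean_b = sum(fb) / len(fb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in fa) *
                    sum((y - mean_b) ** 2 for y in fb))
    # A flat block has zero variance; return 0.0 rather than divide by zero.
    return num / den if den else 0.0

# Illustrative data: the same block under a gain and offset change in brightness.
block = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
brighter = [[2 * v + 15 for v in row] for row in block]
```

Here `sad(block, brighter)` and `ssd(block, brighter)` are both large even though the two blocks show the same pattern, while `ncc(block, brighter)` stays at 1 up to floating-point rounding.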
Efficient Variable Size Template matching Using Fast Normalized Cross Correlation on Multicore
LNCS Springer
18 Dec, 2011
Normalized Cross Correlation (NCC) is an efficient and robust way of finding the location of a template in a given image. However, NCC is computationally expensive. Fast normalized cross correlation (FNCC) makes use of pre-computed sum-tables to improve the computational efficiency of NCC. In this paper we propose a strategy for a parallel implementation of the FNCC algorithm using NVIDIA’s Compute Unified Device Architecture (CUDA) for real-time template matching. We also present an approach to make the proposed method adaptable to variable-size templates, which is an important challenge to tackle. Efficient parallelization strategies adopted for pre-computing sum-tables and extracting data parallelism by dividing the image into a series of blocks substantially reduce the required computational time. We show that by optimal utilization of the different memories available on CUDA, and by using the idle time of the host CPU to perform independent tasks, we can obtain a speedup of the order of 17x compared to the sequential implementation.
Skills: Cuda | Image Processing
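The sum-table idea mentioned in the abstract above can be sketched as follows. This is a sequential Python illustration under stated assumptions, not the paper's parallel CUDA method: `build_sum_table` and `window_sum` are hypothetical names, and the table is padded with a leading row and column of zeros. Once the table is built, the sum over any rectangular window takes four lookups regardless of window size, which is what makes the normalization term cheap for variable-size templates.

```python
def build_sum_table(image):
    """Return a (h+1) x (w+1) table where table[y][x] is the sum of image[0:y][0:x]."""
    h, w = len(image), len(image[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table

def window_sum(table, top, left, height, width):
    """Sum of the window with the given top-left corner and size, in O(1)."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])

# Illustrative 3x3 image.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
table = build_sum_table(image)
```

A second table built over the squared pixel values would give window energies the same way, which is the quantity needed for the NCC denominator.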
Indian Institute of Technology - Roorkee
B.Tech, Computer Science
2008 - 2012
Skills: C++ | Python | Django | Cuda
Rainbow High School
10+2, Sciences, Commerce
1994 - 2008