 # Data Visualization for Beginners-Part 3


Bonjour! Welcome to another part of the series on data visualization techniques. In the previous two articles, we discussed different data visualization techniques that can be applied to visualize and gather insights from categorical and continuous variables. You can check out the first two articles here:

In this article, we’ll go through the implementation and use of a bunch of data visualization techniques such as heat maps, surface plots, correlation plots, etc. We will also look at different techniques that can be used to visualize unstructured data such as images, text, etc.

### Importing the required libraries

```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objs as go  # plotly.plotly has since moved to the chart_studio package
```

### Heatmaps

A heat map (or heatmap) is a two-dimensional graphical representation of data that uses colour to represent the value of each data point. It is useful for understanding underlying relationships between data values that would be much harder to spot if presented numerically in a table or matrix.
We can create a heatmap simply by using the seaborn library:

```python
sample_data = np.random.rand(8, 12)
ax = sns.heatmap(sample_data)
```

Fig 1. Heatmap using the seaborn library

Let's understand this using an example. We'll be using the metadata from the Deep Learning 3 challenge (link to the dataset), which challenged participants to predict the attributes of animals by looking at their images. The training metadata contains the name of each image and the attributes associated with the animal in the image.
We will analyze how often an attribute occurs together with the other attributes. To analyze this relationship, we will compute the co-occurrence matrix.

```python
# Extracting the attributes
cols = list(train.columns)
cols.remove('Image_name')
attributes = np.array(train[cols])
print('There are {} attributes associated with {} images.'.format(attributes.shape[1], attributes.shape[0]))
```

Out: There are 85 attributes associated with 12,600 images.
```python
# Compute the co-occurrence matrix
cooccurrence_matrix = np.dot(attributes.transpose(), attributes)
print('\n Co-occurrence matrix: \n', cooccurrence_matrix)
```

Out:

```
Co-occurrence matrix:
[[5091  728  797 ... 3797  728 2024]
 [ 728 1614    0 ...  669 1614 1003]
 [ 797    0 1188 ... 1188    0  359]
 ...
 [3797  669 1188 ... 8305  743 3629]
 [ 728 1614    0 ...  743 1933 1322]
 [2024 1003  359 ... 3629 1322 6227]]
```
```python
# Normalize the co-occurrence matrix by dividing each column by the
# corresponding diagonal entry, i.e. compute the co-occurrence matrix in percentage
# Reference: https://stackoverflow.com/questions/20574257/constructing-a-co-occurrence-matrix-in-python-pandas/20574460
cooccurrence_matrix_diagonal = np.diagonal(cooccurrence_matrix)
with np.errstate(divide='ignore', invalid='ignore'):
    cooccurrence_matrix_percentage = np.nan_to_num(np.true_divide(cooccurrence_matrix, cooccurrence_matrix_diagonal))
print('\n Co-occurrence matrix percentage: \n', cooccurrence_matrix_percentage)
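To make the normalization step concrete, here is a minimal sketch on a toy attribute matrix (the 4×3 binary values are made up for illustration). Dividing by the diagonal turns raw pair counts into per-attribute percentages:

```python
import numpy as np

# Toy binary attribute matrix: 4 images x 3 attributes (hypothetical values)
toy_attributes = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
])

cooccurrence = toy_attributes.T @ toy_attributes  # counts of attribute pairs
diagonal = np.diagonal(cooccurrence)              # how often each attribute occurs
with np.errstate(divide='ignore', invalid='ignore'):
    percentage = np.nan_to_num(np.true_divide(cooccurrence, diagonal))

# Entry [i, j] is now count(i and j) / count(j), so each column's
# diagonal entry is 1
print(percentage)
```

Broadcasting divides along the last axis, so column j is scaled by attribute j's total count.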
The values in the co-occurrence matrix represent how often each attribute occurs together with every other attribute. Although the matrix contains all the information, it is visually hard to interpret. To counter this problem, we will use a heat map, which helps relate the co-occurrences graphically:

```python
fig = plt.figure(figsize=(10, 10))
sns.set(style='white')
# Draw the heatmap with the correct aspect ratio
ax = sns.heatmap(cooccurrence_matrix_percentage, cmap='viridis', center=0, square=True,
                 linewidths=0.15, cbar_kws={"shrink": 0.5, "label": "Co-occurrence frequency"})
ax.set_title('Heatmap of the attributes')
ax.set_xlabel('Attributes')
ax.set_ylabel('Attributes')
plt.show()
```

Fig 2. Heatmap of the co-occurrence matrix indicating how frequently one attribute occurs with another

Since the frequency of co-occurrence is represented by a colour palette, we can now easily see which attributes appear together most often, and we can infer that those attributes are common to most of the animals.

### Choropleth

Choropleths are maps that show how a quantity varies across a geographical area, or the level of variability within a region. A heat map is similar but does not use geographical boundaries. Choropleth maps are well suited to showing differences in the distribution of data over an area, such as ownership or use of land, type of forest cover, or density information. We will use the geopandas library to implement a choropleth graph visualizing GDP across the globe. Link to the dataset.

```python
# Importing the required libraries
import geopandas as gpd
from shapely.geometry import Point
from matplotlib import cm
```
The GDP is mapped to the corresponding country and its three-letter code:

|   | COUNTRY        | GDP (BILLIONS) | CODE |
|---|----------------|----------------|------|
| 0 | Afghanistan    | 21.71          | AFG  |
| 1 | Albania        | 13.40          | ALB  |
| 2 | Algeria        | 227.80         | DZA  |
| 3 | American Samoa | 0.75           | ASM  |
| 4 | Andorra        | 4.80           | AND  |

Next, we import the geometry locations of each country on the world map and merge them with the GDP data:

```python
# geo holds each country code and its boundary geometry on the world map
geo.columns = ['CODE', 'Geometry']
# Mapping the country codes to the geometry locations
df = pd.merge(df, geo, left_on='CODE', right_on='CODE', how='inner')
# Converting the dataframe to a geo-dataframe
geometry = df['Geometry']
df.drop(['Geometry'], axis=1, inplace=True)
crs = {'init': 'epsg:4326'}
geo_gdp = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)
# Plotting the choropleth
cpleth = geo_gdp.plot(column='GDP (BILLIONS)', cmap=cm.Spectral_r, legend=True, figsize=(8, 8))
```python
cpleth.set_title('Choropleth Graph - GDP of different countries')
```

Fig 3. Choropleth graph indicating GDP by geographical location

### Surface plot

Surface plots are used for the three-dimensional representation of data. Rather than showing individual data points, surface plots show a functional relationship between a dependent variable (Z) and two independent variables (X and Y). They are useful for analyzing how the dependent variable responds to the two independent variables, which helps in establishing desirable responses and operating conditions.

```python
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.ticker import LinearLocator, FormatStrFormatter
# Creating a figure
# projection='3d' enables the third dimension in the plot
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') no longer works in newer matplotlib
# Initialize data
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
# Creating a meshgrid
X, Y = np.meshgrid(X, Y)
R = np.sqrt(np.abs(X**2 - Y**2))
Z = np.exp(R)
# Plot the surface
surf = ax.plot_surface(X, Y, Z, cmap=cm.GnBu, antialiased=False)
# Customize the z axis
ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
ax.set_title('Surface Plot')
# Add a colour bar which maps values to colours
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()
```
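The surface above is built on `np.meshgrid`, which expands the two 1-D axes into coordinate matrices so Z can be evaluated at every (x, y) pair. A quick sanity check of the shapes involved, using the same ranges as the plot:

```python
import numpy as np

# Same axes as the surface plot: 40 points each from -5 to 4.75
x = np.arange(-5, 5, 0.25)
y = np.arange(-5, 5, 0.25)
# meshgrid turns the two 1-D axes into 2-D coordinate matrices
X, Y = np.meshgrid(x, y)

R = np.sqrt(np.abs(X**2 - Y**2))
Z = np.exp(R)

# X, Y and Z all share one shape: one value per grid point
print(X.shape, Y.shape, Z.shape)
```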
One of the main applications of surface plots in machine learning and data science is the analysis of loss functions. From a surface plot, we can analyze how the hyperparameters affect the loss and thus help prevent overfitting of the model.

Fig 4. Surface plot visualizing the dependent variable w.r.t. the independent variables in three dimensions

### Visualizing high-dimensional datasets

Dimensionality refers to the number of attributes present in a dataset. For example, consumer-retail datasets can have a vast number of variables (e.g. sales, promos, products, open, etc.). As a result, visually exploring the dataset to find potential correlations between variables becomes extremely challenging. We therefore use a technique called dimensionality reduction to visualize higher-dimensional datasets. Here, we will focus on two such techniques:

- Principal Component Analysis (PCA)
- T-distributed Stochastic Neighbour Embedding (t-SNE)

### Principal Component Analysis (PCA)

Before we jump into understanding PCA, let's review some terms:

- Variance: Variance is simply a measure of the spread or extent of the data. Mathematically, it is the average squared deviation from the mean.
- Covariance: Covariance measures the extent to which corresponding elements from two sets of ordered data move in the same direction, i.e., how two random variables vary together. Where variance tells you the extent of one variable, covariance tells you the extent to which two variables vary together. Mathematically, it is defined as Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]. A positive covariance means X and Y are positively related (if X increases, Y increases), while a negative covariance means the opposite relation. A covariance of zero means X and Y are not (linearly) related.

Fig 5.
Different types of covariance

PCA is the orthogonal projection of data onto a lower-dimensional linear space that maximizes the variance of the projected data (green line) and minimizes the mean squared distance between each data point and its projection (blue line). The variance describes the direction of maximum information, while the mean squared distance describes the information lost when the data is projected onto the lower dimension. Thus, given a set of data points in a d-dimensional space, PCA projects these points onto a lower-dimensional space while preserving as much information as possible.

Fig 6. Illustration of principal component analysis

In the figure, the component along the direction of maximum variance is defined as the first principal component. Similarly, the component along the direction of second-largest variance is the second principal component, and so on. These principal components are the new dimensions carrying the maximum information.

```python
# We will use the breast cancer dataset as an example
# The dataset is a binary classification dataset
# Importing the dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = pd.DataFrame(data=data.data, columns=data.feature_names)  # Features
y = data.target  # Target variable
# Importing the PCA function
from sklearn.decomposition import PCA
pca = PCA(n_components=2)  # n_components = number of principal components to generate
# Generating the PCA components from the data
pca_result = pca.fit_transform(X)
print("Explained variance ratio : \n", pca.explained_variance_ratio_)
```

Out:

```
Explained variance ratio :
[0.98204467 0.01617649]
```
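Why does one component dominate? Under the hood, the explained variance ratios are the (normalized) eigenvalues of the data's covariance matrix. A hand-rolled numpy sketch of the idea, on made-up toy data rather than the breast cancer dataset, and not sklearn's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: the second column is a noisy multiple of the first,
# so a single direction carries almost all the variance
a = rng.normal(size=200)
data = np.column_stack([a, 3 * a + 0.1 * rng.normal(size=200)])

centred = data - data.mean(axis=0)
cov = np.cov(centred, rowvar=False)              # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
explained_ratio = eigvals[::-1] / eigvals.sum()  # largest first

print(explained_ratio)  # first component explains almost all the variance
```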
We can see that approximately 98% of the variance of the data lies along the first principal component, while the second component expresses only about 1.6%.

```python
# Creating a figure
fig = plt.figure(1, figsize=(10, 10))
# Enabling 3-dimensional projection
ax = fig.add_subplot(projection='3d')
for i, name in enumerate(data.target_names):
    ax.text3D(np.std(pca_result[:, 0][y == i]) - i * 500,
              np.std(pca_result[:, 1][y == i]), 0,
              s=name, horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
# Plotting the PCA components
ax.scatter(pca_result[:, 0], pca_result[:, 1], c=y, cmap=plt.cm.Spectral, s=20,
           label=data.target_names)
plt.show()
```

Fig 7. Visualizing the distribution of cancer across the data

Thus, with the help of PCA, we get a visual sense of how the labels are distributed across the data (see figure).

### T-distributed Stochastic Neighbour Embedding (t-SNE)

T-distributed Stochastic Neighbour Embedding (t-SNE) is a non-linear dimensionality reduction technique that is well suited to visualizing high-dimensional data. It was developed by Laurens van der Maaten and Geoffrey Hinton. In contrast to PCA, which is a linear-algebraic technique, t-SNE adopts a probabilistic approach. PCA can capture the global structure of high-dimensional data but fails to describe the local structure within it. t-SNE, on the other hand, captures the local structure of high-dimensional data very well while also revealing global structure, such as the presence of clusters at several scales. t-SNE converts the similarities between data points into joint probabilities and minimizes the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and those of the high-dimensional data. In doing so, it preserves the original structure of the data.

```python
# We will use scikit-learn to implement t-SNE
# Importing the t-SNE library
from sklearn.manifold import TSNE
```
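Before applying TSNE, it helps to see the probability conversion it performs. A toy numpy sketch of the first step only (a fixed Gaussian bandwidth stands in for the per-point, perplexity-tuned bandwidths real t-SNE uses):

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 4))  # 5 toy points in 4-D

# Squared Euclidean distances between all pairs of points
sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)

# Gaussian affinities; real t-SNE tunes sigma per point via perplexity
sigma = 1.0
affinities = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(affinities, 0.0)  # a point is not its own neighbour

# Symmetrize and normalize so all entries sum to 1: a joint probability
# distribution P over pairs of points
P = affinities + affinities.T
P /= P.sum()

print(P.sum())  # sums to 1 (up to floating point)
```

t-SNE then finds low-dimensional coordinates whose analogous distribution Q (built with a Student-t kernel) minimizes KL(P || Q).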
```python
# We will be using the iris dataset for this example
from sklearn.datasets import load_iris
data = load_iris()
# Extracting the features
X = data.data
# Extracting the labels
y = data.target
# There are four features in the iris dataset, with three different labels
print('Features in iris data:\n', data.feature_names)
print('Labels in iris data:\n', data.target_names)
```

Out:

```
Features in iris data:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Labels in iris data:
['setosa' 'versicolor' 'virginica']
```
```python
# n_components = number of resulting components
# n_iter = maximum number of iterations for the optimization
tsne_model = TSNE(n_components=3, n_iter=2500, random_state=47)
# Generating the new components
new_values = tsne_model.fit_transform(X)
labels = data.target_names
# Plotting the new dimensions/components
fig = plt.figure(figsize=(5, 5))
# Axes3D(fig, ...) is deprecated in newer matplotlib; add the 3-D axes explicitly
ax = fig.add_subplot(projection='3d')
ax.view_init(elev=48, azim=134)
for label, name in enumerate(labels):
    ax.text3D(new_values[y == label, 0].mean(),
              new_values[y == label, 1].mean() + 1.5,
              new_values[y == label, 2].mean(), name,
              horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
ax.scatter(new_values[:, 0], new_values[:, 1], new_values[:, 2], c=y)
ax.set_title('High-dimension data visualization using t-SNE', loc='right')
plt.show()
```

Fig 8. Visualizing the feature space of the iris dataset using t-SNE

Thus, by reducing the dimensions with t-SNE, we can visualize how the labels are distributed over the feature space. We can see in the figure that the labels are clustered into their own little groups, so if we were to run a clustering algorithm on these new components, we could accurately assign new points to a label.

### Conclusion

Let's quickly summarize the topics we covered. We started by generating heatmaps from random numbers and extended their application to a real-world example. Next, we implemented choropleth graphs to visualize data points with respect to geographical locations. We then implemented surface plots to see how data can be visualized on a three-dimensional surface. Finally, we used two dimensionality-reduction techniques, PCA and t-SNE, to visualize high-dimensional datasets. I encourage you to implement the examples described in this article to get hands-on experience. Hope you enjoyed the article. Do let me know if you have any feedback, suggestions, or thoughts in the comments below!
