Find out how HackerEarth can boost your tech recruiting

Learn more
piller_image

Building your own Lisp Parser Part I

While writing a full-blown compiler for a programming language is a difficult and frustrating task, writing a smaller and more specific parser can be surprisingly easy if you know a small trick.

On the other hand, parsing problems pops up at several places in modern-day programming. So, learning this useful trick can be rewarding.

Source: http://mox.ingenierotraductor.com/2015/12/translation-is-like.html

Prerequisites

You need to know the basics of Python, more specifically, you should know the concepts of recursion and flow of control.

Objectives

After reading and understanding this post, you will be able to create simple calculators, interactive interpreters, parsers, very limited and small programming languages, etc. In general, you should be able to take input, tokenize it, perform whatever actions you want to on the tokens, and output the result of the process.

At the end of this post, you will have created a simple, Lisp-like prefix calculator. Following is a demonstration of how it’s going to look:

> ( + 3 2 )
= 5
> ( / 2 0 )
DivisionByZero
> ( – -3 2 )
= -5
> -2
= -2
> ( + ( * 3 2 ) 5 )
= 11

Step 1: Writing the Grammar

The first step to writing a parser is to write a clear grammar for its syntax. The grammar determines what is and what is not right. Once you have written the grammar, translating it to Python code is a trivial chore. The grammar will serve as our first pseudocode.

For our tiny calculator, we know that the input can come in two forms: a Number (-2, .5, +8, 8.5, 9.) or a more complicated Expression begins with a (, followed by an operator, etc.).

For writing a grammar, we need to identify different elements of the syntax. So far, we have Expression, Number, and Operator. The next important thing to do is to structure the elements (known as terms) into a hierarchical form. This is shown below:

Expression:
Number

( Operator Expression Expression )

Number:
a floating-point number ([-+][0-9][*\.][0-9]*)

Operators:

            +
            –
            *
            /

You will notice that Operator and Expression have no parent; they are independent terms.

A grammar is read from the bottom up and different choices appear on distinct lines. Our grammar says that:

  • an Operator is one of +, -, *, /.
  • a Number is a floating-point number which matches the RegEx [-+][0-9]*[\.][0-9]*
  • an Expression is either a Number or a ( followed by an Operator, followed by two other Expressions, and finally ends in a ). Note that the definition of an Expression is recursive.

Step 2: Translating the Grammar into Pseudocode

Pseudocode is fake code resembling English which is supposed to be an intermediate code that can easily be converted into real code. Although writing pseudocode is optional, it is really helpful.

The trick here is to put each term from our grammar into a separate function. Whenever we need to apply the grammar of a certain term, we only have to call the function. Following is the pseudocode implementing the grammar above:

https://gist.github.com/HackerEarthBlog/f0a5a4304326936142da39b0d853f944

This is our rough pseudo-code that should be good enough for our purpose. In the next step, we will write the real code.

3. Writing the Code

It is said very profoundly about Python that reading and writing Python feels like doing pseudocode. The same applies here, but there is one small caveat— Python doesn’t provide any function for “unreading” or putting a character back in the input buffer.

For this, I have created a small class which extends the file object to include this feature. To keep things simple, I have avoided inheritance and my class is not compatible with the file object provided by Python. Treat it like a black-box if you don’t want to understand it.

https://gist.github.com/HackerEarthBlog/6465f93e1ca155ded5e8b0c8294f16ba

Here is the buffer.py file which handles buffered input:

https://gist.github.com/HackerEarthBlog/5330e5f11f96a22608b45affa61fa858

Explanation

expression():

expression() is our top-level function and maps the Expression grammar term. We first ignore all the whitespace. After that, it takes a single non-whitespace character as input and checks it against several possibilities.

If the input string starts with +, -, ., or a digit, it is a number. We put the character back and input the entire number.

If the input string starts with (, a complete expression is to follow. We input the operator, two more expressions which will serve as the operands, and finally the closing parenthesis. We then calculate the result and return it.

number():

The number function maps the Number grammar term and is very simple—just a wrapper around getword. We input a whole word and if it converts to a float, we return it, otherwise the function returns an error message.

operator():

The operator function inputs a single character and tests it for equality against several known operators. Like the above two functions, it also maps a grammar term, i.e., Operator. In case the given operator is not valid, an error message is returned.

calc():

The calc function is actually not necessary but makes the code substantially better. In an ideal program, each function should do only one logical task. calc removes some burden from expression.

UngetableInput

Although Python 3 supports buffered input through stdin.buffer, Python 2 has no such facility. Plus, Python 3’s stdin.buffer would still require us to create some wrapper of our own.

The UngetableInput class wraps Python’s basic input to go through a buffer. We take input into the buffer and put a character back into the buffer when ungetc is called. Unless the buffer is empty, all input comes from the buffer.

Homework

This code works and leaves a lot of cleaning as homework for the reader. 🙂 Following is a list of things you can do to improve and extend the rudimentary calculator:

Improve buffer.py to handle input whitespace more accurately. Hint: You might want to use a string as the buffer.
Implement a function to get a single character while skipping all whitespace and replace the whitespace skipping loop with it.
Add the ability to create variables. Following the Lisp syntax, it should look something like the following:

( define var_name 839457.892 )

What’s Coming Next?

One of the most important parts of our program is the input buffer we created. Unfortunately, it’s not general purpose and can break when used in something more complicated than our tiny calculator program. In the next article, we will examine a bigger module which does this chore better.

Hackerearth Subscribe

Get advanced recruiting insights delivered every month

Related reads

Best Interview Questions For Assessing Tech Culture Fit in 2024
Best Interview Questions For Assessing Tech Culture Fit in 2024

Best Interview Questions For Assessing Tech Culture Fit in 2024

Finding the right talent goes beyond technical skills and experience. Culture fit plays a crucial role in building successful teams and fostering long-term…

Best Hiring Platforms in 2024: Guide for All Recruiters
Best Hiring Platforms in 2024: Guide for All Recruiters

Best Hiring Platforms in 2024: Guide for All Recruiters

Looking to onboard a recruiting platform for your hiring needs/ This in-depth guide will teach you how to compare and evaluate hiring platforms…

Best Assessment Software in 2024 for Tech Recruiting
Best Assessment Software in 2024 for Tech Recruiting

Best Assessment Software in 2024 for Tech Recruiting

Assessment software has come a long way from its humble beginnings. In education, these tools are breaking down geographical barriers, enabling remote testing…

Top Video Interview Softwares for Tech and Non-Tech Recruiting in 2024: A Comprehensive Review
Top Video Interview Softwares for Tech and Non-Tech Recruiting in 2024: A Comprehensive Review

Top Video Interview Softwares for Tech and Non-Tech Recruiting in 2024: A Comprehensive Review

With a globalized workforce and the rise of remote work models, video interviews enable efficient and flexible candidate screening and evaluation. Video interviews…

8 Top Tech Skills to Hire For in 2024
8 Top Tech Skills to Hire For in 2024

8 Top Tech Skills to Hire For in 2024

Hiring is hard — no doubt. Identifying the top technical skills that you should hire for is even harder. But we’ve got your…

How HackerEarth and Olibr are Reshaping Tech Talent Discovery
How HackerEarth and Olibr are Reshaping Tech Talent Discovery

How HackerEarth and Olibr are Reshaping Tech Talent Discovery

In the fast-paced tech world, finding the right talent is paramount. For years, HackerEarth has empowered tech recruiters to identify top talent through…

Hackerearth Subscribe

Get advanced recruiting insights delivered every month

View More

Top Products

Hackathons

Engage global developers through innovation

Hackerearth Hackathons Learn more

Assessments

AI-driven advanced coding assessments

Hackerearth Assessments Learn more

FaceCode

Real-time code editor for effective coding interviews

Hackerearth FaceCode Learn more

L & D

Tailored learning paths for continuous assessments

Hackerearth Learning and Development Learn more