Big Data Interview Questions on Spark, SQL, Python, Data Modeling, and Data Warehouse, Data Structure & Algorithm

This is part four of the data engineer interview series. If you have not already, I highly recommend reading the introduction section of Part I to understand my career background.

In this blog, I will discuss two questions each on Python and Spark.

Python

1. What is a translator? What are the different types of translators? Why is Python called an interpreted language?

A translator…


Big Data Interview Questions on Spark, SQL, Python, Data Modeling, and Data Warehouse, Data Structure & Algorithm

This is part three of the data engineer interview series. If you have not already, I highly recommend reading the introduction section of Part one to understand my career background.

In this blog, I will discuss two questions on each of the following topics: Python, Spark, and SQL.

Python

Q. What is polymorphism and how do you implement it in Python?


Big Data Interview Questions Spark, SQL, Python, Data Modeling, and Data Warehouse, Data Structure & Algorithm

This is part two of the data engineer interview series. If you have not already, I highly recommend reading the introduction section of Part one to understand my career background.

In this blog, I will discuss two questions on each of the following topics: Python, Spark, and SQL.

Python

Q. What is self in Python?


Big Data Interview Questions Spark, SQL, Python, Data Modeling, and Data Warehouse, Data Structure & Algorithm

I am a data engineer with 2.4 years of experience. During the last 4 months, I have attended 75 interview sessions for data engineering roles at 26 different companies.

Amazon, ANZ, Apisero, Aviyel, Amagi, Busigence, BCG, BitClass, couture.ai, Fractal, Flipkart, Indeed, Healthplix, Lead School, Lumiq, Moveworks, Nagarro, Novo…


Query a full DynamoDB table beyond 1 MB with filters in place

In this blog, we will see examples of how to query a full DynamoDB table. We will also dig into applying filters and conditions to retrieve only the desired records.

Why?

The question is why do we need to scan the DB if we can…


Using the boto3 prefix in Python, we will extract all the keys of an S3 bucket at the subfolder level.

In this blog, we will see how to extract all the keys of an S3 bucket at the subfolder level, as well as keys with a specific extension. Along with this, we will also cover different examples with the boto3 client and resource. …


Wait, I will join in a minute!

There is nothing more to understand than looking at the two images below.


JSON CloudFormation template for SQS integrated with Lambda Triggers

SQS is a message queueing service by AWS that accepts messages from one service (say, S3) and passes them to another service (AWS Lambda in this case). …


Setup ODBC connection to Redshift to make SQL queries to Redshift

In this blog, we will see how to set up an ODBC connection to a Redshift database and use this connection to query tables from Power BI. …


Errors while running Glue job, crawler, or connection

Glue is a managed, serverless ETL offering from AWS. Often, while setting up Glue jobs, crawlers, or connections, you will encounter obscure errors that are hard to look up on the internet. In this blog, I have captured five different error scenarios. …

Aman Ranjan Verma

Engineer who loves forests, mountains, and general science. https://www.linkedin.com/in/ar-verma/
