By now you have some hands-on SQL experience: asking questions, slicing data, perhaps even optimizing a few queries. You will quickly learn, however, that querying is only part of the work. To gather data, clean it, and automate your tasks, you need something more flexible.

Let us look at Python.

If SQL is how you talk to data, Python is how you make things happen with it. When SQL reaches its limits, Python is the tool most data engineers turn to, a bridge between ideas and action.

Why Python?

Python is popular for good reason: it is readable, easy for beginners to pick up, and backed by a vast ecosystem of libraries designed for data.

Say your data sits behind an API, on a website, or in an Excel sheet. You cannot query that with SQL. Python can handle all of it, and more, in just a few lines of code.

Additionally, once Python has your data, you can clean it, reshape it, and automate whatever happens next.

Python is not a substitute for SQL; the two work as partners. SQL retrieves the information. Python reshapes, moves, and transforms it into something useful.

Start Simple: Your First Script

Here’s what a basic data engineering script might do:

  1. Load a CSV file of sales data
  2. Filter only the orders from the past week
  3. Calculate the total revenue
  4. Save the result in a new file or send it in an email

You’ll likely use the pandas library here — it’s the go-to for anything involving rows, columns, or tabular data.

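import pandas as pd

# Load the sales data and parse the date column as real dates
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Keep only orders on or after the start of the week
recent = df[df["date"] >= "2025-04-01"]

# Add up the revenue for those orders
total = recent["amount"].sum()

print(f"Total revenue this week: ${total:,.2f}")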

Simple, right? But powerful.
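
The script above hard-codes the start date and stops at printing. Here is a minimal sketch of how the four steps could fit together with a rolling seven-day window. The function name weekly_revenue, the sample rows, and the column names date and amount are assumptions for illustration, not part of any real dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for sales.csv so the sketch runs on its own
SAMPLE_CSV = """date,amount
2025-03-20,100.00
2025-04-02,250.50
2025-04-05,49.50
"""

def weekly_revenue(csv_source, today):
    """Return the orders from the 7 days ending on `today`, plus their total."""
    # Step 1: load the CSV and parse the date column
    df = pd.read_csv(csv_source, parse_dates=["date"])
    # Step 2: keep only orders inside the 7-day window
    end = pd.Timestamp(today)
    start = end - pd.Timedelta(days=7)
    recent = df[(df["date"] > start) & (df["date"] <= end)]
    # Step 3: calculate the total revenue
    return recent, float(recent["amount"].sum())

recent, total = weekly_revenue(io.StringIO(SAMPLE_CSV), today="2025-04-07")
print(f"Total revenue this week: ${total:,.2f}")

# Step 4: to_csv() with no path returns the CSV as text;
# pass a path such as "recent_sales.csv" to write a file instead
csv_text = recent.to_csv(index=False)
```

Passing today in as an argument, rather than reading the clock inside the function, keeps the result repeatable when you rerun the script for an earlier date.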

Handling Real-World Data: It Gets Messy

Data in the real world isn’t clean. Dates are in the wrong format. Some rows are missing. There are duplicates, typos, or entire columns you didn’t expect.

This is where Python really shines, with tools like pandas.

You can drop invalid rows, standardize formats, extract information, and handle edge cases. Learning to clean data in Python is like learning to use a good knife in the kitchen — it makes everything smoother.
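
As a rough illustration of those cleaning steps, the sketch below uses pandas on a small made-up table. The column names, the sample values, and the function name clean_sales are all assumptions; real data will need its own rules.

```python
import pandas as pd

# Hypothetical messy input: a duplicate row, an unparseable date,
# a missing amount, and inconsistent spacing and casing in names
raw = pd.DataFrame({
    "date": ["2025-04-01", "2025-04-01", "not a date", "2025-04-03", "2025-04-04"],
    "customer": [" Alice ", " Alice ", "BOB", "carol", "dave"],
    "amount": ["10.5", "10.5", "20", None, "7"],
})

def clean_sales(df):
    # Remove exact duplicate rows
    df = df.drop_duplicates().copy()
    # Values that cannot be parsed become NaT / NaN instead of raising
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    # Standardize text: trim whitespace and use one casing
    df["customer"] = df["customer"].str.strip().str.title()
    # Drop rows that still lack a valid date or amount
    return df.dropna(subset=["date", "amount"]).reset_index(drop=True)

clean = clean_sales(raw)
print(clean)
```

Coercing bad values to missing first and dropping them in one place makes it easy to count how many rows were rejected before you decide whether dropping is acceptable.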

Beyond Files: APIs, Databases, and the Web

Eventually you will want data from sources beyond your own computer. This is where APIs come in: they let you request data from websites and services.

Almost any API can be accessed by Python using the requests library.

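import requests

# Request the latest USD-based rates; fail fast on a timeout or HTTP error
response = requests.get("https://api.exchangerate-api.com/v4/latest/USD", timeout=10)
response.raise_for_status()

data = response.json()

print(data["rates"]["EUR"])  # Get the USD to EUR exchange rate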

Now imagine automating this daily. Storing it in a database. Creating a dashboard that tracks currency trends. That’s data engineering in action.

When you are ready, Python can also work with cloud services like AWS or GCP and establish direct connections to databases like Postgres or MySQL.
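
To make the "storing it in a database" idea concrete, here is a small sketch that uses SQLite from the standard library as a stand-in for Postgres or MySQL. The table layout, the function name store_rates, and the sample payload (shaped like the rates dictionary read above, with made-up numbers) are assumptions for illustration.

```python
import sqlite3
from datetime import date

# Hypothetical payload with the same "rates" shape as the API response;
# the numbers are invented for this example
sample = {"rates": {"EUR": 0.92, "GBP": 0.79}}

def store_rates(conn, payload, day):
    """Insert one row per currency; rerunning for the same day overwrites it."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS rates (
               day TEXT,
               currency TEXT,
               rate REAL,
               PRIMARY KEY (day, currency))"""
    )
    rows = [(day.isoformat(), cur, rate) for cur, rate in payload["rates"].items()]
    conn.executemany("INSERT OR REPLACE INTO rates VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # use a file path to keep the data
store_rates(conn, sample, date(2025, 4, 1))
store_rates(conn, sample, date(2025, 4, 1))  # repeating the run adds no duplicates

eur = conn.execute(
    "SELECT rate FROM rates WHERE currency = ?", ("EUR",)
).fetchone()[0]
print(eur)
```

The primary key on day and currency is a design choice: a daily job that is accidentally run twice replaces its own rows rather than piling up copies.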

Automation and Scheduling: Running on Autopilot

Once your scripts work, the next step is making them run without you.

You can, for example, run a script at a fixed time every day.

Here’s a simple example of scheduling with Python, using the third-party schedule library:

import schedule
import time

def job():
    print("Running daily job…")

# Register job() to run every day at 08:00 local time
schedule.every().day.at("08:00").do(job)

# Check once per second whether a job is due
while True:
    schedule.run_pending()
    time.sleep(1)

This might be your first step into pipelines — a core part of any data engineer’s work.
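
A script that runs without you also fails without you, so it helps to log what happened. The sketch below wraps each step so that an error is recorded instead of stopping the whole run. The names run_safely, load_sales, and send_report are hypothetical, and this is only one possible approach.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("daily_job")

def run_safely(step):
    """Run one step, log the outcome, and report success as a bool."""
    try:
        step()
    except Exception:
        # log.exception records the traceback for later diagnosis
        log.exception("%s failed", step.__name__)
        return False
    log.info("%s finished", step.__name__)
    return True

def load_sales():
    # Hypothetical step that succeeds
    pass

def send_report():
    # Hypothetical step that fails
    raise RuntimeError("mail server unreachable")

results = [run_safely(load_sales), run_safely(send_report)]
print(results)
```

With the schedule example above, the same wrapper could be registered as schedule.every().day.at("08:00").do(run_safely, job), so one bad day does not end the loop.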

A Quick Note on Jupyter Notebooks

Jupyter Notebooks are great for experimenting. They let you mix code, notes, and visuals in the same place, which is ideal while you are still exploring.

But for production workflows? Stick to .py scripts and version control.

Conclusion: You Do Not Have to Be an Expert in Everything

You do not have to master Python. You do not have to commit every library and function to memory. What you do need is enough understanding to gather data, clean it, and automate your tasks.

With practice, the rest will become clear.

Python is your toolkit. You will not use every tool daily, but knowing that they exist and how to reach for them gives you flexibility and independence.

Coming Up Next: Where Does Data Live?

In the next post, we’ll explore data storage — from relational databases to modern cloud warehouses and data lakes.

Because once you’ve got your data, you need a place to put it.

Stay curious — and keep building.
