Rodrigo 🐍🚀

🐍🚀 what's the point of generators?

published 2 months ago
4 min read

Hey Reader, how are you doing?

In this issue of the Mathspp Insider 🐍🚀 we will talk about what generators are and why they matter.
I will also say a few words about ChatGPT and how I have been using it to boost my productivity.


Filtering some transactions

Loading the CSV data

My free book “Pydon'ts – Write elegant Python” has been downloaded over 17,000 times from Gumroad (yay, me! 🤣).
Most people choose to download it for free (that is why I make it available for free), but occasionally some people pay for it.

Given the CSV file with all transactions, how could I find the paid ones?

Using the module csv from the standard library, I could write something like this:

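import csv

BUYER_COL = ...  # index of the buyer name column
PRICE_COL = ...  # index of the price paid column

def get_paid_sales(path):
    with open(path, "r", newline="") as csv_file:
        reader = csv.reader(csv_file)
        data = list(reader)  # loads every row into memory

    # The csv module gives back strings, so convert the price before comparing
    # (this assumes the file has no header row).
    paid_sales = [sale[BUYER_COL] for sale in data if float(sale[PRICE_COL]) > 0]
    return paid_sales

for paid_sale_user in get_paid_sales("pydont_sales.csv"):
    ...  # Do something.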

This code loads all the data, uses a list comprehension to get the names of the buyers who supported the development of the book, and then returns them.

What is an issue with this code, though?

Holding everything in memory

The suboptimal thing with the code above is that we are loading all of the CSV file into memory without needing to.
If we are going to discard all of the free downloads, we can do that directly when reading the file for the first time.

In a CSV file with 17,000 rows, this may not make a huge difference, but that does not mean we cannot be mindful of how we write our code.

So, an improvement we could make to our code would be to filter out the free downloads directly:

import csv

BUYER_COL = ...  # index of the buyer name column
PRICE_COL = ...  # index of the price paid column

def get_paid_sales(path):
    with open(path, "r", newline="") as csv_file:
        reader = csv.reader(csv_file)
        # Filter while reading; the csv module gives back strings, so convert
        # the price before comparing (this assumes there is no header row).
        paid_sales = [sale[BUYER_COL] for sale in reader if float(sale[PRICE_COL]) > 0]
    return paid_sales

for paid_sale_user in get_paid_sales("pydont_sales.csv"):
    ...  # Do something.

But we could make this code even more elegant.

Suppose that I only wanted the most recent paid sales, because I periodically check who paid for the book out of curiosity.
Now, my code is going through all the sales and returning all the paid ones, while I only need to look at some.
What is more, I only need to look at a single sale at a time...

Working with a single sale at a time

Instead of writing a function get_paid_sales that returns all paid sales, I could write a function that goes through the sales in the CSV and gives back one paid user at a time.
That is what generators do.

They are like for loops that build lists of results, but instead of returning everything all at once, they “return” a single item at a time.
However, we cannot use the keyword return because it already means “exit this function with this result”, so we use a new keyword yield to mean “temporarily pause this function with this intermediate result”:

import csv

BUYER_COL = ...  # index of the buyer name column
PRICE_COL = ...  # index of the price paid column

def get_paid_sales(path):
    with open(path, "r", newline="") as csv_file:
        reader = csv.reader(csv_file)
        for sale in reader:
            # The csv module gives back strings, so convert the price first
            # (this assumes there is no header row).
            if float(sale[PRICE_COL]) > 0:
                yield sale[BUYER_COL]  # <-- using `yield`

for paid_sale_user in get_paid_sales("pydont_sales.csv"):
    ...  # Do something.

Producing results lazily is one of the greatest advantages of generators!
Because they yield one item at a time instead of returning everything at once:

  • you get huge potential savings memory-wise – because you do not have to hold everything in memory all at once; and
  • you get huge potential savings in computation – because you do not need to compute everything upfront and you may end up quitting early.
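
Here is a minimal sketch of both savings, assuming a tiny made-up CSV held in memory instead of the real sales file. The sample rows, the column indices, and the stats counter are all hypothetical and only there for illustration. This variant of get_paid_sales takes an open file object rather than a path, and itertools.islice stops asking the generator for items after the first two.

```python
import csv
import io
import sys
from itertools import islice

# Made-up data standing in for the real sales file (an assumption).
SAMPLE_CSV = "alice,0\nbob,5\ncarol,0\ndave,12.5\nerin,3\n"
BUYER_COL = 0  # hypothetical column layout for this sketch
PRICE_COL = 1

stats = {"rows_read": 0}  # hypothetical counter of CSV rows consumed

def get_paid_sales(csv_file):
    for sale in csv.reader(csv_file):
        stats["rows_read"] += 1
        if float(sale[PRICE_COL]) > 0:
            yield sale[BUYER_COL]

paid = get_paid_sales(io.StringIO(SAMPLE_CSV))
rows_before = stats["rows_read"]  # nothing runs until we ask for an item

first_two = list(islice(paid, 2))  # stop after two paid buyers
print(rows_before, first_two, stats["rows_read"])

# Memory: a list holds every result, a generator only its paused state.
list_size = sys.getsizeof([i ** 2 for i in range(100_000)])
gen_size = sys.getsizeof(i ** 2 for i in range(100_000))
print(list_size > gen_size)
```

In this sketch the generator reads only four of the five rows, because nobody asks it for a third paid buyer.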

A simple generator

Generators can look intimidating, but they really are not.
As soon as you get used to the keyword yield, you will see how easy it is to work with them!

If you want a simple generator to play around with, take this one:

def squares(n):
    for i in range(n):
        yield i ** 2

This generator produces the squares of the integers from 0 up to, but not including, n.
Try adding calls to print inside that generator and then use it in a for loop, to see when things get printed.

Then, try using squares in a loop that exits early with a break statement.
How many print calls from within squares did you see?
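
If you want to compare notes, here is one way the experiment could look. This is only a sketch: the wording of the print calls and the stopping condition are my own choices.

```python
def squares(n):
    for i in range(n):
        print(f"squares: about to yield {i} ** 2")
        yield i ** 2
        print(f"squares: resumed after yielding {i ** 2}")

seen = []
for square in squares(5):
    print(f"loop: got {square}")
    seen.append(square)
    if square >= 4:
        break  # squares(5) is never asked for 9 or 16
```

The messages from squares and from the loop come out interleaved, and the generator never gets to print its “resumed” message for 4, because the break means it is never asked for another item.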

Generators everywhere

The Python language and the standard library are riddled with generators.
Some examples of generators and generator-like tools include:
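
A short sketch with a few picks of my own (an assumption about which tools are worth showing): the built-ins enumerate and map, a generator expression, and count and islice from itertools. Strictly speaking, only the generator expression creates a generator; the others are lazy iterators that behave in a generator-like way.

```python
from itertools import count, islice

names = ["Ana", "Bo", "Cy"]  # made-up sample data

pairs = enumerate(names)                # lazy (index, item) pairs
lengths = map(len, names)               # len is only called on demand
shouting = (n.upper() for n in names)   # generator expression

first_pair = next(pairs)                # ask for a single item
all_lengths = list(lengths)             # or consume everything
first_shout = next(shouting)

# count() never ends, so it only makes sense to take a slice of it.
first_evens = list(islice(count(0, 2), 4))

print(first_pair, all_lengths, first_shout, first_evens)
```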

Generators are being covered in my book “Comprehending Comprehensions” and I am adding exercises about them, so keep an eye out for that!


Using ChatGPT to boost my productivity

Yesterday, I wrote a short article on my blog about how I have been using ChatGPT for quick prototyping.
The article shows a couple of interactions I had with ChatGPT, where I used it to help me create new features for my blog.
The article also includes three clear tips that I have been using to get the best results out of ChatGPT.

I would be very interested in your feedback regarding the article, and I would also like to hear about your experiences with ChatGPT and other generative models.

Feel free to reply to this email with your feedback, stories, and usage examples!


A note about last week’s issue

Last week, I wrote about the statements try, except, else, and finally.
I want to clarify a sentence I wrote that was inaccurate.

Near the end of the issue, in the section “Mix and match statements”, I wrote:

The except, else, and finally, statements are all optional.
You can have some and not the others.

This has the caveat that the statement else can only be present if the statement except is also present.
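
A quick way to check this caveat is to ask Python to compile both forms. This is a sketch: the helper name compiles and the two snippets are made up for illustration.

```python
import textwrap

with_except = textwrap.dedent("""
    try:
        pass
    except ValueError:
        pass
    else:
        pass
""")

without_except = textwrap.dedent("""
    try:
        pass
    else:
        pass
    finally:
        pass
""")

def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True

print(compiles(with_except))     # else together with except
print(compiles(without_except))  # else without except
```

The first snippet compiles, while the second one is rejected with a SyntaxError.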

Thanks to the reader who pointed this issue out!


Thanks for reading, and I'll see you next time! 👋

Rodrigo.
