
Rodrigo Girão Serrão

🐍📝 working with generators and predicates

published 28 days ago
4 min read

Hey there, how are you doing?

In this issue of the Mathspp Insider, we will talk about three idioms you can use when manipulating iterators and, in particular, generators and generator expressions.


Why write about generators?

My book “Comprehending Comprehensions” was supposed to be a simple book with 100 exercises on list comprehensions.
However, when I started writing it, I realised I had much more to give than just a bland series of 100 exercises.
In the meantime, the book has grown to 240+ exercises and the chapter on generator expressions has been a very challenging chapter to write.

Today, to make up for the fact that I have been taking too long to finish the book, I thought I would share a section of the chapter about generator expressions.
In that section, we talk about three (more advanced) idioms that are useful when you are working with generators and generator expressions.

Generator idioms

Today, you will learn three idioms with generators and predicates:

  • count how many elements satisfy a predicate;
  • see if all elements satisfy a predicate and, if not, find an element that does not; and
  • see if any element satisfies a predicate and, if so, find an element that does.

Recall that a predicate is just a function that returns True or False; you can typically think of it as a function that asks a “yes” or “no” question.

To count the number of elements in an iterable that satisfy a predicate, we will use the built-in sum.
We will also use the fact that the type bool is a subclass of int:

>>> issubclass(bool, int)
True

This means that Boolean values can also be added together, both with each other and with other numbers:

>>> False + True
1
>>> True + True
2
>>> False + False
0

Previously, with list comprehensions, one could determine how many elements from an iterable satisfied a predicate pred by building the list with all the elements that satisfied the predicate and then checking the length of that list.
For example, the code below counts how many words in a file data/wordlist.txt have a length of 10 characters or more:

with open("data/wordlist.txt", "r") as f:
    words = [line.strip() for line in f if len(line.strip()) >= 10]

print(len(words))  # 67796

However, this builds a list of length 67796, which is unnecessary.
An alternative is to use the built-in sum and apply the predicate directly.
The values True and False returned by the predicate will be interpreted as 1 for elements that satisfy the predicate and 0 for those that do not, which means that the sum of everything will count how many elements satisfy the predicate:

with open("data/wordlist.txt", "r") as f:
    print(sum(len(line.strip()) >= 10 for line in f))  # 67796

I think this is a very elegant idiom.
However, my experience tells me that not everyone agrees about this idiom.
Be sure to test the waters before using this idiom in production code that others also maintain.
(How do you feel about this idiom?)

The idiom sum(pred(elem) for elem in iterable) counts how many elements of iterable satisfy the predicate pred.
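
If you want to try this without the word list file, here is a minimal sketch. The helper name count_if and the sample words are made up for illustration and are not from the book. The call to bool is an extra precaution I am adding in case the predicate returns a truthy value that is not a Boolean.

```python
def count_if(pred, iterable):
    """Count how many elements of iterable satisfy pred (hypothetical helper)."""
    # bool(...) makes sure each element contributes exactly 0 or 1,
    # even if pred returns a truthy value such as a non-empty string.
    return sum(bool(pred(elem)) for elem in iterable)


# Made-up sample data, standing in for the word list file.
words = ["generator", "comprehension", "list", "predicate", "set"]

long_words = count_if(lambda word: len(word) >= 9, words)
print(long_words)  # 3
```

No intermediate list is built here: sum consumes the generator expression one element at a time.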

The idiom with sum determines how many elements satisfy a given predicate.
Sometimes, it suffices to know if there are any elements that do satisfy the given predicate or if all of the elements satisfy the given predicate.
We know that the built-in functions any and all work well with generators because they stop early as soon as they find what they are looking for.
In particular:

  • the built-in any stops early if it finds True or a truthy value; and
  • the built-in all stops early if it finds False or a falsy value.
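
One way to see the early stopping in action is to record which values the generator actually produces. The sketch below does that with a hypothetical helper called tracked, which is only there for illustration:

```python
def tracked(values, log):
    """Yield each value after recording it in log (hypothetical helper)."""
    for value in values:
        log.append(value)
        yield value


seen_by_any = []
found = any(x > 2 for x in tracked(range(10), seen_by_any))
print(found, seen_by_any)  # True [0, 1, 2, 3]

seen_by_all = []
all_small = all(x < 2 for x in tracked(range(10), seen_by_all))
print(all_small, seen_by_all)  # False [0, 1, 2]
```

In both cases the values after the deciding element are never produced.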

We can exploit this early stopping even further in two other situations.
We will start by looking at any.

Assignment expressions (discussed in my free book Pydon'ts) can be used with generator expressions and functions that stop early to achieve an interesting effect.
Take a look at the code below:

>>> any((x ** 2) > 10 for x in range(10))
True

The generator expression above checks if there are any perfect squares above 10.
The result True says there are.
However, we are none the wiser about which square greater than 10 the generator expression found.
If we use an assignment expression, we can get insight into that:

>>> any((sq := x ** 2) > 10 for x in range(10))
True
>>> sq
16

By using an assignment expression inside the generator expression, we get access to the last element that the generator expression processes.
Because the built-in any stops processing elements as soon as it finds what it needs, we get access to the first element that satisfied the predicate we were working with.
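
One caveat: the variable only holds a match if any returned True. If any returns False, the variable holds the last element processed, and if the iterable is empty the variable is never assigned at all. The sketch below wraps the idiom in a hypothetical helper (first_square_above is a made-up name) that handles both cases, assuming it is acceptable to return None when nothing is found:

```python
def first_square_above(limit, numbers):
    """Return the first square greater than limit, or None (hypothetical helper)."""
    sq = None  # stays None if numbers is empty
    if any((sq := x ** 2) > limit for x in numbers):
        # any stopped early, so sq is the first square that passed the test.
        return sq
    # any returned False: sq is just the last square checked, not a match.
    return None


print(first_square_above(10, range(10)))   # 16
print(first_square_above(100, range(10)))  # None
print(first_square_above(10, []))          # None
```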

Similarly, an assignment expression inside a generator expression that is the argument to the built-in all will reveal the first element that does not satisfy the predicate.
For example, the code below checks if all of the words in data/wordlist.txt have a length under 20 characters:

with open("data/wordlist.txt", "r") as f:
    print(all(len(line.strip()) < 20 for line in f))  # False

We know there is at least one word with a length of 20 characters or more, but we have no idea what word that might be.
If we include an assignment expression, we get access to that word:

with open("data/wordlist.txt", "r") as f:
    print(all(len(w := line.strip()) < 20 for line in f))  # False

print(w) # 'acetylcholoinesterase'
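
The same pattern can be tried without the word list file. Below is a minimal sketch with a hypothetical helper, first_counterexample, and made-up sample words. Note that the captured variable is only meaningful when all returns False, so the helper reports that explicitly:

```python
def first_counterexample(pred, iterable):
    """Return (True, None) if every element satisfies pred,
    otherwise (False, first element that fails). Hypothetical helper."""
    failing = None
    if all(pred(failing := elem) for elem in iterable):
        # Every element passed, so there is no counterexample to report.
        return True, None
    # all stopped early, so failing is the first element that failed pred.
    return False, failing


# Made-up sample data, standing in for the word list file.
words = ["cat", "house", "extraordinarily", "dog"]

print(first_counterexample(lambda word: len(word) < 10, words))
# (False, 'extraordinarily')
print(first_counterexample(lambda word: len(word) < 20, words))
# (True, None)
```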

Conclusion

These are the three idioms I wanted to share with you:

  • using sum to count how many elements satisfy a predicate;
  • using any and an assignment expression to find an element that satisfies a predicate; and
  • using all and an assignment expression to find an element that does not satisfy a predicate.

Do you think these will be helpful in your Python endeavours?


Thanks for reading, and I'll see you next time!

Rodrigo.