Hey there, how are you doing?

In this issue of the **Mathspp Insider** we will talk about three idioms you can use when manipulating iterators and, in particular, generators and generator expressions.

# Why write about generators?

My book “Comprehending Comprehensions” was supposed to be a simple book with 100 exercises on list comprehensions.

However, when I started writing it, I realised I had much more to give than just a bland series of 100 exercises.

In the meantime, the book has grown to 240+ exercises and the chapter on generator expressions has been a very challenging chapter to write.

Today, to make up for the fact that I have been taking too long to finish the book, I thought I would share a section of the chapter about generators.

Today's newsletter, which corresponds to a section of the chapter about generator expressions of my book “Comprehending Comprehensions”, covers three (more advanced) idioms that are useful when you are working with generators and generator expressions.

# Generator idioms

Today, you will learn three idioms with generators and predicates:

- count how many elements satisfy a predicate;
- see if all elements satisfy a predicate and, if not, find an element that does not; and
- see if any element satisfies a predicate and, if so, find an element that does.

Recall that a predicate is just a function that returns `True` or `False`, and that can typically be interpreted as a function that asks a “yes” or “no” question.

To count the number of elements in an iterable that satisfy a predicate, we will use the built-in `sum`.

We will also use the fact that the type `bool` is a subclass of `int`:

```
>>> issubclass(bool, int)
True
```

This means that Boolean values can also be added together, both with each other and with other numbers:

```
>>> False + True
1
>>> True + True
2
>>> False + False
0
```
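As a small sketch of where this is heading (the list `flags` is made up for illustration), adding up a collection of Boolean values with `sum` counts how many of them are `True`:

```python
# Flags for five hypothetical elements: True where a predicate was satisfied.
flags = [True, False, True, True, False]

# sum starts at 0 and adds each flag; True counts as 1 and False as 0.
count = sum(flags)
print(count)  # 3
```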

Previously, with list comprehensions, one could determine how many elements from an iterable satisfied a predicate `pred` by building the list with all the elements that satisfied the predicate and then checking the length of that list.

For example, the code below counts how many words in a file `data/wordlist.txt` have a length of 10 characters or more:

```
with open("data/wordlist.txt", "r") as f:
    words = [line.strip() for line in f if len(line.strip()) >= 10]
print(len(words))  # 67796
```

However, this builds a list of length 67796, which is unnecessary.

An alternative is to use the built-in `sum` and use the predicate directly.

The values `True` and `False` returned by the predicate will be interpreted as `1` for elements that satisfy the predicate and `0` for those that do not, which means that the sum of everything will count how many elements satisfy the predicate:

```
with open("data/wordlist.txt", "r") as f:
    print(sum(len(line.strip()) >= 10 for line in f))  # 67796
```

I think this is a very elegant idiom.

However, my experience tells me that not everyone agrees on this idiom.

Be sure to test the waters before using this idiom in production code that others also maintain.

(How do **you** feel about this idiom?)
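If you want to try the idiom without a word list at hand, here is a minimal sketch that compares the two approaches on an in-memory list. The predicate `is_long` and the sample `words` are made up for illustration; they are not taken from the book.

```python
def is_long(word):
    """Hypothetical predicate: does the word have 10 or more characters?"""
    return len(word) >= 10

# Made-up sample data standing in for the lines of a word list.
words = ["generator", "comprehension", "sum", "predicate", "expression"]

# List-based count: builds an intermediate list first.
count_with_list = len([word for word in words if is_long(word)])

# sum-based count: adds up the Booleans returned by the predicate.
count_with_sum = sum(is_long(word) for word in words)

print(count_with_list, count_with_sum)  # 2 2
```

Both approaches give the same count, but the version with `sum` never stores the matching words.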

`sum(pred(elem) for elem in iterable)` counts how many elements of `iterable` satisfy the predicate `pred`.

The idiom with `sum` determines *how many* elements satisfy a given predicate.

Sometimes, it suffices to know if there are *any* elements that do satisfy the given predicate or if *all* of the elements satisfy the given predicate.

We know that the built-in functions `any` and `all` have a good synergy with generators because they stop early if they find what they are looking for.

In particular:

- the built-in `any` stops early if it finds a `True` or a Truthy value; and
- the built-in `all` stops early if it finds a `False` or a Falsy value.
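To see the early stopping in action, here is a small sketch. The helper generator `tracked` is hypothetical; its only job is to record which values were actually consumed.

```python
def tracked(values, seen):
    """Yield each value, recording it in `seen` as it is consumed."""
    for value in values:
        seen.append(value)
        yield value

# any stops at the first value for which the condition is True.
seen_by_any = []
found = any(value > 2 for value in tracked(range(10), seen_by_any))
print(found, seen_by_any)  # True [0, 1, 2, 3]

# all stops at the first value for which the condition is False.
seen_by_all = []
all_small = all(value < 2 for value in tracked(range(10), seen_by_all))
print(all_small, seen_by_all)  # False [0, 1, 2]
```

In both cases, the remaining values of `range(10)` are never produced.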

We can exploit this early stopping even further in two other situations.

We will start by looking at `any`.

Assignment expressions (discussed in my **free** book Pydon'ts) can be used with generator expressions and functions that stop early to achieve an interesting effect.

Take a look at the code below:

```
>>> any((x ** 2) > 10 for x in range(10))
True
```

The generator expression above checks whether any of the squares of the integers from 0 to 9 is greater than 10.

The result `True` says there is at least one.

However, we are none the wiser with regard to which square the generator expression found that *is* greater than 10.

If we use an assignment expression, we can get insight into that:

```
>>> any((sq := x ** 2) > 10 for x in range(10))
True
>>> sq
16
```

By using an assignment expression inside the generator expression, we get access to the last element that the generator expression processes.

Because the built-in `any` stops processing elements as soon as it finds what it needs, we get access to the first element that satisfied the predicate we were working with.
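One caveat worth keeping in mind: the variable only holds a match when `any` returns `True`. If nothing satisfies the predicate, the variable holds the last element processed, and if the iterable is empty the variable is never assigned. A minimal sketch of one way to guard against that is shown below; the function name `first_square_above` is made up for illustration.

```python
def first_square_above(limit, numbers):
    """Return the first square greater than `limit`, or None if there is none."""
    if any((sq := x ** 2) > limit for x in numbers):
        # any returned True, so sq is the square that made it stop.
        return sq
    # any returned False: sq is either the last square checked or, for an
    # empty iterable, not bound at all, so it is not a match.
    return None

print(first_square_above(10, range(10)))    # 16
print(first_square_above(1000, range(10)))  # None
print(first_square_above(10, []))           # None
```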

Similarly, an assignment expression inside a generator expression that is the argument to the built-in `all` will reveal the first element that does not satisfy the predicate.

For example, the code below checks if all of the words in `data/wordlist.txt` have a length under 20 characters:

```
with open("data/wordlist.txt", "r") as f:
    print(all(len(line.strip()) < 20 for line in f))  # False
```

We know there is at least one word with a length of 20 characters or more, but we have no idea what word that might be.

If we include an assignment expression, we get access to that word:

```
with open("data/wordlist.txt", "r") as f:
    print(all(len(w := line.strip()) < 20 for line in f))  # False
print(w)  # 'acetylcholoinesterase'
```
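The same pattern can be tried without the file. Below is a minimal sketch that wraps it in a function and only trusts the variable when `all` returns `False`; the name `first_word_at_least` and the sample `lines` are made up for illustration.

```python
def first_word_at_least(limit, lines):
    """Return the first stripped line with `limit` or more characters, or None."""
    if not all(len(w := line.strip()) < limit for line in lines):
        # all returned False, so w is the word that made it stop.
        return w
    # Every word was short enough (or there were no lines at all).
    return None

# Made-up sample data standing in for the lines of a word list.
lines = ["apple\n", "fig\n", "watermelon\n", "kiwi\n", "pomegranate\n"]

print(first_word_at_least(8, lines))   # watermelon
print(first_word_at_least(20, lines))  # None
```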

# Conclusion

These are the three idioms I wanted to share with you:

- using *sum* to count how many elements satisfy a predicate;
- using *any* and an assignment expression to find an element that satisfies a predicate; and
- using *all* and an assignment expression to find an element that does not satisfy a predicate.

Do you think these will be helpful in your Python endeavours?

Thanks for reading, and I'll see you next time!

Rodrigo.