Hey there, how are you doing?
In this issue of the Mathspp Insider we will see how you can find the longest word in a list of words...
But we will do that by going through the journey of self-improvement that a Python beginner would experience.
So, buckle up! It's time to learn.
Here is a small challenge for you:
Give it some thought before you keep on reading.
A beginner's solution
The problem that I posed is not what I would call a complicated problem.
Sure, it is also unfair if I say “this is really easy” and you just started out coding, but trust me when I say this is not a very complex task.
After giving it some thought, a typical beginner would probably write something like this:
def get_longest(words): longest = "" for idx in range(len(words)): if len(words[idx]) > len(longest): longest = words[idx] return longest
Now, this is an excellent solution already.
To be completely honest (and I'm basing this on my personal experience teaching Python), a beginner tends to not name variables very well and to play fast and loose with the whitespace.
This is understandable, but also something that is very much worth fighting against.
Using proper names makes programs much easier to understand and I also strongly believe that elegance matters in programming and being mindful of how you use whitespace plays an important part in that.
So, if you wrote something like this, how can you improve it?
Is there room for improvement?
In what follows, I will present successive suggestions of improvement.
You will not agree with everything I suggest, which is normal.
After you are done reading, reply with your own suggestions and your opinion on the things I did.
Save the length of the longest word
One “issue” with the code above is that we keep computing the length of the longest word over and over.
The function len is not terribly slow, so this is not a terrible issue, but it is good to pay attention to situations in which you can save computations by storing values that are relevant.
In this case, we can keep track of the length of the longest word so far and use that instead:
def get_longest(words): longest = "" length = 0 for idx in range(len(words)): if len(words[idx]) > length: longest = words[idx] length = len(longest) return longest
This will save us some computations... On average.
To be absolutely sure that we are saving computations, we actually need to save the length of the current word in an auxiliary variable and use that instead:
def get_longest(words): longest = "" length = 0 for idx in range(len(words)): this_length = len(words[idx]) if this_length > length: longest = words[idx] length = this_length return longest
With this version, it is guaranteed that we cut to half the number of times the function len is called.
Again, the function len isn't necessarily a function that consumes a lot of resources, but there are definitely cases in which you want to be mindful of all the function calls you make and knowing how to save important values for later use is quite important.
Python iterables and for loops
The next improvement ties with how for loops work in Python.
Take a look at the loop written above.
The loop starts with for idx in ... which lets you iterate over all the indices that cover all the words available in the list of words...
But what do you need the indices for?
The indices are only being used to index into the list and access the current word, so you do not really care about the indices...
What you care about are the words!
Loops in Python let you iterate directly over things like lists, strings, dictionaries, etc.
These things you can iterate over are called iterables, and that means you do not need to use indices if what you want are the indices.
This means we can rewrite the function as such:
def get_longest(words): longest = "" length = 0 for word in words: this_length = len(word) if this_length > length: longest = word length = this_length return longest
By iterating directly over the list of words we can immediately give a meaningful name to each element of the list (to each word) and that makes the program easier to follow.
There is another benefit to iterating directly over the list of words, which is that the function no longer needs the variable words to be a list.
As it is, the function get_longest can accept any iterable (instead of lists only) that contain words.
Grouping related assignments
The next refactoring I would like to suggest is the grouping of related assignments.
In two different moments (right at the start of the function and inside the if statement) we assign to the variables longest and length.
The two assignments could be happening at the same time by coincidence, or just because it was convenient to do both at the same time, but in this case the reason is stronger: the two assignments happen at the same time because the two variables are supposed to be related to one another.
Can you say what is the relationship that the variables longest and length satisfy?
During the execution of the function, we should always have len(longest) == length.
So, it is clear that the two variables are connected by something stronger than the coincidence of being changed close to one another.
When this happens, when two or more variables are strictly related to one another, I like to group their assignments on the same line.
Something like this:
def get_longest(words): longest, length = "", 0 for word in words: this_length = len(word) if this_length > length: longest, length = word, this_length return longest
By grouping these two assignemnts together, you make it easier for the reader to understand that the two assignments are related.
The fact that they are both on the same line also makes it easier to read them as a single unit that corresponds to “update the information about the longest word”.
Computing all lengths in batch
One thing that bothers me a bit is that the assignment this_length = len(word) inside the for loop looks bad.
I am not very good at describing what I feel, but I do feel it looks ugly, so I look for ways of removing it.
One thing that comes to mind is computing the lengths of all the words beforehand, and then iterating over those as well.
This could be done in several ways.
For example, one could use a list comprehension to compute all lengths before the loop and then use the built-in zip to traverse words and corresponding lengths at the same time:
def get_longest(words): longest, length = "", 0 word_lengths = [len(word) for word in words] for word, word_length in zip(words, word_lengths): if word_length > length: longest, length = word, word_length return longest
(If you do not know list comprehensions, let me tell you my book “Comprehending Comprehensions” teaches you list comprehensions from A to Z with 200+ practical exercises.)
By computing all lengths beforehand, I feel like the for loop now has more space to express its purpose, which is to find the longest word of all.
Another alternative way of computing all lengths beforehand is by using the built-in map:
def get_longest(words): longest, length = "", 0 for word, word_length in zip(words, map(len, words)): if word_length > length: longest, length = word, word_length return longest
In my eyes, the version with map has a couple of advantages:
- removes the need for another auxiliary variable; and
- does not compute all the lengths ahead of time because map is lazy.
Again, the laziness of map is also discussed in the book “Comprehending Comprehensions”, so I will not write a lot about it now.
Of course, some might argue against the version with map because you have the call to map inside zip, but I personally like it.
Getting clever with sorting tuples
The next version I am about to show is great, but at the same time it isn't.
The next version is great because it exploits the way tuples are ordered in Python to give us our answer with less work from our part, but it is not so great because it does so in a slightly obfuscated way.
Here is what I mean:
def get_longest(words): max_pair = max([(len(word), word) for word in words]) return max_pair
This version creates pairs with each word and its length, but the length is first because of the way tuples are ordered in Python.
In Python, two tuples are compared item by item.
So, for two tuples that start with integers, Python starts by comparing the two integers to determine which tuple is “greater” than the other.
If the two integers at the index 0 match, then Python looks at the index 1, and then index 2, etc.
In this case, we are always comparing pairs that have a length and then a word, so Python will determine that the “maximum” tuple is going to be the one that has the largest integer in the first position, which is the length of the longest word.
Thus, after finding the value of max_pair, we can return the second value because that will be the word we care about.
To make this slightly more idiomatic, we can use unpacking to give a name to the part of the tuple that we care about.
While we are at it, we can also use a generator expression instead of a list comprehension:
def get_longest(words): _, longest = max((len(word), word) for word in words) return longest
(Yes, I also explain how generator expressions work in “Comprehending Comprehensions” 🤣.)
I am sure some of you are squirming at the sight of this solution.
Some might argue we have abused the way tuples are sorted to find the longest word.
I will not argue with you, I'll just segue straight to the next solution...
The best solution
The best solution is just lovely and really makes good use of Python's built-ins.
I will write about the best solution to this problem in next week's issue.
Meanwhile, you will have time to think about it and try to come up with the best solution yourself.
If you find a good solution or if you have any comments regarding the different versions I showed here (for example, if you disagree or if you would have done things differently), just reply to this email and we can talk about it!
I am looking forward to your emails!
Thanks for reading, and I'll see you next time!