Advanced collections

Day 23: Generators and Generator Expressions


Welcome to day 23 of the 30 Days of Python series! Today we're going to be looking at some ways of creating our own iterators using generators and generator expressions.

We're also going to be looking at an important function called iter which returns an iterator for any iterable we pass to it. This is going to let us confirm a lot of the theory we discussed in yesterday's post, and we're also going to be able to use it to get a deeper understanding of for loops.

This post is going to build a great deal on what we discussed in yesterday's post, so if you haven't read it yet, I'd recommend you take a look before reading any further.

The iter function

Python has a built-in function called iter which returns an iterator for the iterable we provide as an argument.

For example, let's take a simple list of numbers like this:

numbers = [1, 2, 3, 4, 5]

If we pass this list of numbers to iter we'll get back an iterator for that list.

numbers = [1, 2, 3, 4, 5]
numbers_iter = iter(numbers)

print(numbers_iter)  # <list_iterator object at 0x7f57d138af70>

In this case we get a list_iterator object, which is going to let us access values in numbers. Different types have their own iterators which understand how to give us items from those iterables. Getting elements from a dictionary is somewhat different from getting items from a list, after all.
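As a quick sketch of that last point, here's iter applied to a few other built-in iterables. The variable names are just for illustration. Notice that the iterator for a dictionary gives us its keys.

```python
letters_iter = iter("abc")
point_iter = iter((10, 20))
ages_iter = iter({"rick": 70, "morty": 14})

# Each iterator knows how to get items from its own kind of iterable
print(next(letters_iter))  # a
print(next(point_iter))    # 10
print(next(ages_iter))     # rick
```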

We can use this list_iterator object just like any other iterator. We can pass it to next, for example.

numbers = [1, 2, 3, 4, 5]
numbers_iter = iter(numbers)

print(next(numbers_iter))  # 1
print(next(numbers_iter))  # 2

One interesting question is, what happens when we call iter on the list_iterator?

This is perfectly legal, because iter just expects an iterable, and all iterators are iterables. It also produces an interesting effect.

numbers = [1, 2, 3, 4, 5]
numbers_iter = iter(numbers)

print(numbers_iter is iter(numbers_iter))  # True

We find that passing numbers_iter to the iter function causes iter to return the very same iterator. This might seem odd at first, but it makes quite a lot of sense.

Yesterday we talked about iterators being the means by which we access items in an iterable. When we want to iterate over an iterable, we need to ask for an iterator that knows how to get those values.

If we ask the iterator to give us a way to access those values, it offers up itself, since it's already capable of doing what we want.
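One consequence worth seeing in a small sketch: because iter hands back the very same iterator, any progress we've made through the values is shared. Calling iter on the original list, on the other hand, gives us a fresh iterator each time. The names same_iter and fresh_iter are just illustrative.

```python
numbers = [1, 2, 3, 4, 5]
numbers_iter = iter(numbers)

print(next(numbers_iter))  # 1

same_iter = iter(numbers_iter)  # The very same iterator object
fresh_iter = iter(numbers)      # A brand new iterator for the list

print(next(same_iter))   # 2
print(next(fresh_iter))  # 1
```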

Replicating for loops with iter

One cool thing we can do with the iter function is replicate the behaviour of Python's for loop. This is going to give us a little peek at what for loops really do behind the scenes.

Let's use our list of numbers for this example again.

numbers = [1, 2, 3, 4, 5]
numbers_iter = iter(numbers)

We have an iterator, which is an important first step, but we need a couple of other tools to make this work. First, we need a while loop, because we want to loop a potentially infinite number of times. Second, we need a try statement so that we can look out for a StopIteration exception.

numbers = [1, 2, 3, 4, 5]
numbers_iter = iter(numbers)

while True:
    try:
        number = next(numbers_iter)
    except StopIteration:
        break
    else:
        print(number)

And just like that we have a for loop written with while.

We have our loop variable number defined inside the try, and we have the loop body inside the else clause. Once we run out of numbers, the loop is going to terminate, just like we see with a for loop.

This is actually extremely close to how an actual for loop works under the hood. It does request an iterator for whatever we want to iterate over, and it does call next to retrieve values from that iterator. When a StopIteration is raised, Python handles that error by breaking the loop.
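To make those steps a little more concrete, here's a rough sketch that wraps the same pattern in a function so it works for any iterable. The function name for_each and its action parameter are made up for this illustration. It's only an approximation of what Python does for us, not the real implementation.

```python
def for_each(iterable, action):
    iterator = iter(iterable)  # Step 1: request an iterator

    while True:
        try:
            item = next(iterator)  # Step 2: ask for the next value
        except StopIteration:
            break  # Step 3: no values left, so end the loop
        else:
            action(item)  # The "loop body"

for_each("abc", print)             # Prints a, b, and c on separate lines
for_each({"x": 1, "y": 2}, print)  # Prints the keys x and y
```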

This isn't something you should be doing in your production code, but it's an interesting peek behind the curtain that helps us better understand the structures we've been using since week 1.

Generators

Let's leave iter alone for a moment and turn to the topic of creating our own iterators using generators.

There are quite a few ways to create custom iterators in Python, but most of them are beyond the scope of this series. This isn't really much of a limitation though, and we can do a great deal of very complicated things using generators.

The generator syntax is going to be very familiar to us, because a generator is really just a function. The only thing which differentiates a generator from a regular function is a special keyword called yield.

Before we dive into this new yield keyword, let's look at a simple generator example.

def first_hundred():
    for number in range(1, 101):
        yield number

Here I've defined a generator, which is just a special function, and I've called it first_hundred.

We can see from the function body that it has something to do with the numbers 1 to 100 inclusive, and we can probably infer that it's going to give us the first hundred integers, starting with 1.

Let's call our function and see what happens.

def first_hundred():
    for number in range(1, 101):
        yield number

g = first_hundred()
print(g)

If we run this code, we certainly don't get anything like the numbers 1 to 100 printed to the console. We get this generator object:

<generator object first_hundred at 0x7faaa563fc80>

This is actually called a generator iterator, which is what gets returned when we call any function that contains the yield keyword.

As the name would imply, this is an iterator, and we can use it just like any other.

def first_hundred():
    for number in range(1, 101):
        yield number

g = first_hundred()

print(next(g))  # 1
print(next(g))  # 2
print(next(g))  # 3

Important

When we call a generator, it gives us back a new generator iterator. Each of these generator iterators is an independent iterator, so be careful you don't do something like this:

def first_hundred():
    for number in range(1, 101):
        yield number

print(next(first_hundred()))  # 1
print(next(first_hundred()))  # 1
print(next(first_hundred()))  # 1

Each call to first_hundred gave us a new iterator, so we're only getting the first value from each one. We also don't assign the iterators anywhere, so it's not possible for us to call next on the same iterator again.
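This independence is also useful when we really do want separate iterators. Here's a minimal sketch, where first and second are just illustrative names. Each one keeps track of its own position.

```python
def first_hundred():
    for number in range(1, 101):
        yield number

first = first_hundred()
second = first_hundred()

print(next(first))   # 1
print(next(first))   # 2
print(next(second))  # 1
print(next(first))   # 3
```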

The yield keyword

Now that we've seen a generator in action, it's time to talk about what this yield keyword is doing.

We know already that it signals to Python that we're defining a generator, but it also seems to have some role in actually providing the values we want from the resulting generator iterator.

What yield actually does is create a pause in the execution of the function body. When we call next and pass in our generator iterator, the code in the function body is going to run until we hit that yield keyword.

The value after the yield keyword is what we actually want to provide before we pause the execution of the function body. In this way we can think of yield as something like a non-terminating return statement.

We can see all this by adding a few print calls to first_hundred.

def first_hundred():
    print("First value requested\n")

    for number in range(1, 101):
        print("Starting new iteration")
        yield number
        print("Ending this iteration\n")

g = first_hundred()

At this point, nothing is printed. The generator iterator has been created, but we haven't actually tried to access any values. Now let's pass g to next a couple of times.

def first_hundred():
    print("First value requested\n")

    for number in range(1, 101):
        print("Starting new iteration")
        yield number
        print("Ending this iteration\n")

g = first_hundred()

print(next(g))
print(next(g))

Now our output looks like this:

First value requested

Starting new iteration
1
Ending this iteration

Starting new iteration
2

First we get the "First value requested\n" string, and then we enter the for loop. At this point we get a value from the range object, which is assigned to number, and we print the "Starting new iteration" string.

We then encounter the yield keyword which pauses the execution of the function body, and our generator iterator spits out 1, which is the current value of number. This value is returned by the call to next and we print it to the console.

We then call next again, and we continue from where we left off. This means we print the "Ending this iteration\n" string, and we move onto a new iteration of the for loop.

We call the print function at the start of the loop again, and then we hit the yield one more time. We yield the number, which is what next returns once again. This is then printed to the console, just as before.

For this second iteration, you'll note that we don't print the "Ending this iteration\n" string, because yield paused the execution before we reached that point.

If we were to call next again, we'd get this string printed first, before starting a third iteration of the loop.
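To see the same pausing behaviour from another angle, here's a small sketch using a hypothetical generator called two_steps, which has two separate yield statements. Instead of printing, it records each step in a list we pass in. It also shows that once the function body finishes, the generator iterator raises StopIteration, just like any other iterator that has run out of values.

```python
def two_steps(log):
    log.append("before first yield")
    yield 1
    log.append("between yields")
    yield 2
    log.append("after second yield")

log = []
g = two_steps(log)

print(log)      # []
print(next(g))  # 1
print(log)      # ['before first yield']
print(next(g))  # 2
print(log)      # ['before first yield', 'between yields']

try:
    next(g)  # Runs the rest of the body, then there's nothing left to yield
except StopIteration:
    print("No more values")

print(log[-1])  # after second yield
```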

Note

yield is actually a very complicated keyword, and it can do a great deal more than what we're using it for. We're not going to be covering this additional behaviour in this series, however, because it only has applications in much more advanced code.

I'm mentioning this only so that you know there is more to learn once you're a little further along in your Python career.

Generator Expressions

In addition to creating generator iterators through functions, we can also use generator expressions.

The generator expression syntax is also going to be very familiar to us, because it's exactly the same as the comprehension syntax we saw in day 15. The only difference is that we use regular parentheses, rather than square brackets or curly braces.

We can use them very much like comprehensions, but they come with all the benefits of iterators that map and filter provide. If you've wanted to have those benefits, but didn't like the syntax for map and filter, generator expressions are for you.
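The main benefit in question is that values are calculated lazily, only when we ask for them. Here's a rough sketch comparing a list comprehension to a generator expression. The helper function square and the calls list are hypothetical, and they're only here so we can see when the work actually happens.

```python
calls = []

def square(number):
    calls.append(number)  # Keep a record of every call
    return number ** 2

# The list comprehension does all of its work straight away
squares_list = [square(number) for number in range(1, 4)]
print(calls)  # [1, 2, 3]

calls.clear()

# The generator expression does nothing until we request a value
squares_gen = (square(number) for number in range(1, 4))
print(calls)  # []

print(next(squares_gen))  # 1
print(calls)  # [1]
```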

For example, let's create a simple generator expression that squares every number in a range.

squares = (number ** 2 for number in range(1, 11))

Since squares refers to an iterator, printing it directly doesn't give us anything too useful, but it does at least confirm we're working with a generator iterator.

<generator object <genexpr> at 0x7f33225a0c80>

If we want to get values out, we can use it in a for loop, we can unpack it with *, or we can use next to perform manual iteration.

squares = (number ** 2 for number in range(1, 11))

for square in squares:
    print(square)

squares = (number ** 2 for number in range(1, 11))

print(*squares, sep=", ")

squares = (number ** 2 for number in range(1, 11))

print(next(squares))  # 1
print(next(squares))  # 4
print(next(squares))  # 9

Remember that the values in squares get consumed when we iterate over the iterator, so you need to redefine squares if you want to iterate over it more than once.
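Here's a short sketch of that consumption in action. The second part uses a hypothetical generator called squares_up_to as one possible alternative to redefining the expression, since calling a generator gives us a fresh iterator every time.

```python
squares = (number ** 2 for number in range(1, 4))

print(list(squares))  # [1, 4, 9]
print(list(squares))  # []

# A generator lets us request a fresh iterator whenever we need one
def squares_up_to(limit):
    for number in range(1, limit + 1):
        yield number ** 2

print(list(squares_up_to(3)))  # [1, 4, 9]
print(list(squares_up_to(3)))  # [1, 4, 9]
```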

Style note

One nice thing about generator expressions is that we can forgo the parentheses when we use the generator expression as the sole argument in a function or method.

This is totally legal syntax, for example:

total = sum(number ** 2 for number in range(1, 11))
print(total)  # 385

This helps us reduce nested brackets when they would only hinder readability.
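The "sole argument" part matters. As a small sketch, here's what the call looks like if we also pass sum a starting value. The starting value of 100 and the name total_from_100 are just for illustration. Once there's a second argument, the generator expression needs its own parentheses again, and leaving them out is a syntax error.

```python
# Sole argument: the extra parentheses can be dropped
total = sum(number ** 2 for number in range(1, 11))
print(total)  # 385

# With a second argument, the generator expression must be parenthesised
total_from_100 = sum((number ** 2 for number in range(1, 11)), 100)
print(total_from_100)  # 485
```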

Exercises

1) Write a generator that generates prime numbers in a specified range. You can make use of your solution to exercise 3 from day 8 as a starting point.

2) Below we have an example where map is being used to process names in a list. Rewrite this code using a generator expression.

names = [" rick", " MORTY  ", "beth ", "Summer", "jerRy    "]
names = map(lambda name: name.strip().title(), names)

3) Write a small program to deal cards for a game of Texas Hold'em. The order of the deal is as follows:

  • The deck is shuffled.
  • One card is handed to each player in order.
  • A second card is handed to each player in order.

Then comes the more complicated part of the deal.

  • First, the top card of the deck is discarded. This is called the burn.
  • Three cards are then placed in the centre of the table, which is called the flop.
  • Another card is burned, meaning we discard another card from the top of the deck.
  • We add another card to the centre, which is called the turn.
  • We burn another card.
  • Finally, there's the river, where a fifth and final card is added to the centre.

The desired output for the program is something like this:

How many players are there? 2

Player 1 was dealt: (4, hearts), (4, clubs)
Player 2 was dealt: (9, clubs), (jack, diamonds)

The flop: (jack, clubs), (4, diamonds), (king, spades)
The turn: (8, hearts)
The river: (ace, hearts)

As the example would indicate, the program should accept a variable number of players. There must be at least 2 players, and no more than 10.

After the flop, the turn, and the river there's usually a round of betting, so if you want to extend this exercise, you may want to give the user the option to pause at each of these points.

Hint: We can shuffle cards using the random.shuffle function. This shuffles a sequence in-place, which means it modifies the original sequence. We can then create an iterator from that sequence using iter to make it easy for us to retrieve cards one at a time.

You can find documentation for random.shuffle in the official Python documentation for the random module.
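The pattern from the hint can be sketched with something simpler than a deck of cards. This example uses a hypothetical list of letter tiles, and it isn't a solution to the exercise. Since the shuffle is random, the output will be different on each run.

```python
import random

tiles = ["a", "b", "c", "d", "e"]
random.shuffle(tiles)  # Shuffles the list in place and returns None

tile_iter = iter(tiles)

# Each call to next takes the next tile from the "top" of the shuffled list
first_tile = next(tile_iter)
second_tile = next(tile_iter)

# Whatever we haven't taken yet is still waiting in the iterator
remaining = list(tile_iter)

print(first_tile, second_tile)
print(remaining)
```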