Generators in Python: a tutorial

What is a generator

A generator produces data on demand for use elsewhere in a program. For example, we can use a generator to produce random characters, the terms of a mathematical series, or an infinite sequence such as the even numbers. A generator uses little memory, because each value is computed only when it is requested.

To create a generator in Python, we can write a generator function, which uses the yield keyword to return data. We then call this generator function to create a generator object, from which we get the data.

We can also create a generator object directly, using a generator expression.

A generator object can only be used once. It is an iterator: when it has no more data, asking it for the next item raises a StopIteration exception.

Each generator object has a saved state, which consists of the variables defined inside its generator function or generator expression.
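
The infinite series of even numbers mentioned above can be sketched as follows. This is only a minimal illustration: the names even_numbers and first_five are made up for this example, and itertools.islice is used to take a few values from a generator that never ends on its own.

```python
from itertools import islice

def even_numbers():
    """Hypothetical generator function: yield 0, 2, 4, ... without ever stopping."""
    number = 0  # the saved state of the generator object
    while True:
        yield number
        number += 2

# islice stops after five values, so the infinite loop is never a problem
first_five = list(islice(even_numbers(), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```

Only the current value of number is kept in memory, however many even numbers are requested.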

Creating the generator function

A generator function in Python is simply a function that uses the yield keyword to return data.

def generateFlower():
    ''' generator function that creates a
        flower generator object
    '''
    yield 'water'
    yield 'seed'
    yield 'every day'
    yield 'quantity'
    yield [3, 1, 2]
    yield 30
    yield 'flower'

Creating the generator object

To create the generator object, we call the generator function that we have created.
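
One detail worth seeing in code is that calling a generator function does not run its body. The body starts only when the first item is requested. The sketch below uses a hypothetical generator function, generate_steps, and a list named log, both invented for this illustration, to record when each part of the body runs.

```python
log = []

def generate_steps():
    """Hypothetical generator function that records when its body runs."""
    log.append('body started')
    yield 'first'
    log.append('resumed after first yield')
    yield 'second'

steps = generate_steps()
# Calling the function only builds the generator object; nothing has run yet.
print(log)          # []

print(next(steps))  # first
print(log)          # ['body started']

print(next(steps))  # second
print(log)          # ['body started', 'resumed after first yield']
```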

We can access the data from the generator object in either of two ways:

  • by using the next function, which returns the next item from the generator object, or raises a StopIteration exception if there is no more data;
  • or by using the generator object in a loop, or anywhere else that takes an iterator, such as the list constructor.

def generateFlower():
    ''' generator function that creates a
        flower generator object
    '''
    yield 'water'
    yield 'seed'
    yield 'every day'
    yield 'quantity'
    yield [3, 1, 2]
    yield 30
    yield 'flower'

>>> flower = generateFlower()
>>> flower 
<generator object generateFlower at 0x106b7d750>
# we have created the generator object

>>> next(flower) 
'water'
# looping through our generator by calling next

>>> next(flower)
'seed'

>>> next(flower)
'every day'

>>> next(flower)
'quantity'

>>> next(flower)
[3, 1, 2]

>>> next(flower)
30

>>> next(flower)
'flower'

>>> next(flower) 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
# no more items are available ,
# so next raises a StopIteration

# ##### next  #######

>>> list(flower)
[]
# ##### a generator object can only be used once  #######

>>> list(generateFlower())
['water', 'seed', 'every day', 'quantity', [3, 1, 2], 30, 'flower']
# we created a new generator , since a generator 
# can only be used once and we passed it to 
# list

# ##### list  #######



>>> flower = generateFlower()
>>> next(flower) 
'water'
>>> next(flower)
'seed'
>>> list(flower)
['every day', 'quantity', [3, 1, 2], 30, 'flower']

# ##### next + list  #######
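
The session above uses next and list. The other way of reading a generator, a for loop, can be sketched as below, together with the optional default argument of next, which is returned instead of raising StopIteration. The generator function generate_colors is hypothetical and exists only for this illustration.

```python
def generate_colors():
    """Hypothetical generator function, used only for this illustration."""
    yield 'red'
    yield 'green'
    yield 'blue'

# A for loop calls next() for us and stops cleanly on StopIteration.
seen = []
for color in generate_colors():
    seen.append(color)
print(seen)  # ['red', 'green', 'blue']

# next() accepts a default that is returned instead of raising StopIteration.
colors = generate_colors()
for _ in range(3):
    next(colors)
print(next(colors, 'no more colors'))  # no more colors
```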

Generator Object state

The state of the variables defined inside the generator function is saved separately for each generator object created from that function.

def generateCartesianProductOfTwoSets(setOne, setTwo):
    '''
        yield the tuples of the cartesian product of setOne by setTwo
        e.g. [1,2,3] x [4,5,6] = (1,4) , (1,5) , (1,6) ,
                                 (2,4) , (2,5) , (2,6) ,
                                 (3,4) , (3,5) , (3,6)
    '''
    lengthSetOne = len(setOne)
    # get the length of set one

    lengthSetTwo = len(setTwo)
    # get the length of set two

    counter = 0
    # initialize the counter to zero

    numberOfTuples = lengthSetOne * lengthSetTwo
    # number of tuples of the cartesian product
    # of setOne by setTwo

    while counter < numberOfTuples:
        # loop while there are tuples left to generate

        indexOfElementInSetOne = counter // lengthSetTwo
        # the element of setOne is at the index
        # counter // lengthSetTwo
        # e.g. A = a , b ; B = c , d , e =>
        # the tuples starting with a come first , then those starting with b
        # (a,c) (a,d) (a,e)
        # (b,c) (b,d) (b,e)

        indexOfElementInSetTwo = counter % lengthSetTwo
        # the element of setTwo is at the index
        # counter % lengthSetTwo

        yield (setOne[indexOfElementInSetOne], setTwo[indexOfElementInSetTwo])
        # yield the tuple e.g. (a , c)

        counter += 1
        # increment the counter

>>> setOne = [1,2,3]
>>> setTwo = [4,5,6]

>>> lettersOne = 'lb'
>>> lettersTwo = 'aeo'

>>> setOneSetTwoProduct = generateCartesianProductOfTwoSets(setOne , setTwo)

>>> lettersOneLettersTwoProduct = generateCartesianProductOfTwoSets(lettersOne , lettersTwo)

>>> next(setOneSetTwoProduct)
(1, 4)

>>> next(lettersOneLettersTwoProduct)
('l', 'a')

>>> next(lettersOneLettersTwoProduct)
('l', 'e')

>>> next(setOneSetTwoProduct) 
(1, 5)

The state of lengthSetOne, lengthSetTwo, counter, numberOfTuples, setOne and setTwo is saved separately for each generator object created from the generateCartesianProductOfTwoSets generator function, which is why the two generator objects above advance independently.
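
One way to look at this saved state directly is the standard library function inspect.getgeneratorlocals, which returns the current local variables of a generator object. The sketch below uses a small hypothetical generator function, count_up_to, rather than the one above, to keep the output short.

```python
import inspect

def count_up_to(limit):
    """Hypothetical generator function that yields 1, 2, ..., limit."""
    counter = 1
    while counter <= limit:
        yield counter
        counter += 1

small = count_up_to(3)
large = count_up_to(100)

next(small)
next(small)   # small has now yielded 1 and 2
next(large)   # large has only yielded 1

# Each generator object keeps its own copy of the local variables.
small_locals = dict(inspect.getgeneratorlocals(small))
large_locals = dict(inspect.getgeneratorlocals(large))
print(small_locals['counter'])  # 2
print(large_locals['counter'])  # 1
```

Advancing small has no effect on the counter held by large, even though both were created from the same generator function.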

What is a generator expression

A generator expression creates a generator object using comprehension syntax, like that of a list comprehension, written inside parentheses instead of square brackets.

Instead of applying the comprehension to all the elements at once, it is applied to each element only as needed.

>>> opOnData = ( data * data for data in [1,2,3,4,5,6,7,])
# create a generator object using a generator expression :
# comprehension syntax written inside parentheses

>>> next(opOnData)
1

>>> next(opOnData)
4
# instead of applying the comprehension to all the
# data at once , we apply it only when needed

>>> [ data * data for data in [1,2,3,4,5,6,7,]]
[1, 4, 9, 16, 25, 36, 49]
# a list comprehension is applied to all the data at once :
# it calculates every square immediately
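
To make the difference visible, the sketch below records which elements have actually been processed. The helper square and the list calls are hypothetical names introduced only for this illustration.

```python
calls = []

def square(number):
    """Hypothetical helper that records every number it is asked to square."""
    calls.append(number)
    return number * number

lazy_squares = (square(n) for n in [1, 2, 3, 4])
print(calls)               # [] : nothing has been computed yet

print(next(lazy_squares))  # 1
print(calls)               # [1] : only the first element was processed

eager_squares = [square(n) for n in [1, 2, 3, 4]]
print(calls)               # [1, 1, 2, 3, 4] : the list comprehension ran fully
```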

The generator object created by a generator expression can only be used once.

>>> opData = ( data.upper() for data in ('a','b','c','d'))

>>> next(opData)
'A'
>>> next(opData)
'B'
>>> next(opData)
'C'
>>> next(opData)
'D'
>>> next(opData)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
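
The same single-use behaviour shows up when a generator expression is passed to a function that consumes it, such as sum. In the sketch below, the word list and the variable names are invented for this illustration.

```python
lengths = (len(word) for word in ['seed', 'water', 'flower'])

first_total = sum(lengths)   # consumes every value
second_total = sum(lengths)  # the generator is already exhausted

print(first_total)   # 15
print(second_total)  # 0

# To go through the values again, build a new generator object.
# When the expression is the only argument, the extra parentheses are optional.
fresh_total = sum(len(word) for word in ['seed', 'water', 'flower'])
print(fresh_total)   # 15
```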