Effective Python #4 | Comprehensions and Generators


Introduction

  • Python provides a special syntax, called comprehensions, for succinctly iterating through the common container types (list, dict, set) and creating derivative data structures.

  • This style of processing is extended to functions with generators, which enable a stream of values to be incrementally returned by a function.



Item 27: Use Comprehensions Instead of map and filter

  1. List comprehensions are clearer than the map and filter built-in functions because they don't require lambda expressions.

  2. List comprehensions make it easy to skip items from the input list, which map can't do without help from filter.


Item 28: Avoid More Than Two Control Subexpressions in Comprehensions

  1. Comprehensions support multiple levels of loops and multiple conditions per loop level.

  2. Comprehensions with more than two control subexpressions are very difficult to read and should be avoided. => Use normal if and for statements and write a helper function (Item 30).

# multiple levels of looping
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]

# multiple if conditions (b and c are equivalent)
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [x for x in a if x > 4 if x % 2 == 0]
c = [x for x in a if x > 4 and x % 2 == 0]


Item 20: Prefer Raising Exceptions to Returning None (from Chapter 3)

  1. Functions that return None to indicate special meaning are error prone because None and other values (e.g., zero, the empty string) all evaluate to False in conditional expressions.

  2. Raise exceptions to indicate special situations instead of returning None. Expect the calling code to handle exceptions properly when they're documented.

  3. Type annotations can be used to make it clear that a function will never return the value None, even in special situations.

# before
def careful_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None
# after
def careful_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs')
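A minimal sketch of the calling pattern (redefining careful_divide so the snippet stands alone): the caller handles the documented exception instead of testing for None.

```python
def careful_divide(a, b):
    # raising version from above
    try:
        return a / b
    except ZeroDivisionError:
        raise ValueError('Invalid inputs')

x, y = 5, 2
try:
    result = careful_divide(x, y)
except ValueError:
    print('Invalid inputs')
else:
    print(f'Result is {result:.1f}')  # Result is 2.5
```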


Item 29: Avoid Repeated Work in Comprehensions by Using Assignment Expressions (:=)

  1. Assignment expressions make it possible for comprehensions and generator expressions to reuse the value from one condition elsewhere in the same comprehension, which can improve readability and performance.

  2. Although it’s possible to use an assignment expression outside of a comprehension or generator expression’s condition, you should avoid doing so.

# the get_batches(stock.get(name, 0), 8) expression is repeated
found = {name: get_batches(stock.get(name, 0), 8)
         for name in order
         if get_batches(stock.get(name, 0), 8)}

# use the walrus operator
found = {name: batches for name in order
         if (batches := get_batches(stock.get(name, 0), 8))}
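A runnable version with sample data (the stock/order values here are illustrative): batches is computed once per item, then reused as the dict value.

```python
stock = {'nails': 125, 'screws': 35, 'wingnuts': 8, 'washers': 24}
order = ['screws', 'wingnuts', 'clips']

def get_batches(count, size):
    return count // size

# the walrus assigns batches in the condition and reuses it as the value
found = {name: batches for name in order
         if (batches := get_batches(stock.get(name, 0), 8))}
print(found)  # {'screws': 4, 'wingnuts': 1}
```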


Item 30: Consider Generators Instead of Returning Lists

  1. Using generators can be clearer than the alternative of having a function return a list of accumulated results.

  2. The iterator returned by a generator produces the set of values passed to yield expressions within the generator function’s body.

  3. Generators can produce a sequence of outputs for arbitrarily large inputs because their working memory doesn’t include all inputs and outputs.

# Returning Lists : can cause a program to run out of memory
def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result

# Generators
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1
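Because the generator returns an iterator, results can be consumed incrementally; itertools.islice caps how much is pulled (the definition is repeated here so the snippet stands alone).

```python
from itertools import islice

def index_words_iter(text):
    # generator version from above
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

address = 'Four score and seven years ago'
print(list(index_words_iter(address)))             # [0, 5, 11, 15, 21, 27]
print(list(islice(index_words_iter(address), 3)))  # only the first 3: [0, 5, 11]
```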


Item 31: Be Defensive When Iterating Over Arguments

  1. Beware of functions and methods that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values.

  2. Python’s iterator protocol defines how containers and iterators interact with the iter and next built-in functions, for loops, and related expressions.

  3. You can easily define your own iterable container type by implementing the __iter__ method as a generator.

  4. You can detect that a value is an iterator (instead of a container) if calling iter on it produces the same value as what you passed in. Alternatively, you can use the isinstance built-in function along with the collections.abc.Iterator class.

# Before : iterate over input iterator multiple times.
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

# Problem : generators (Item 30) return iterators, which can be exhausted only once
def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)
it = read_visits('my_numbers.txt')
print(list(it)) # >>> [15, 35, 80]
print(list(it)) # >>> [] # Already exhausted

## After1 : Defensively copy the input iterator => but the input could be extremely large (OOM)
def normalize_copy(numbers):
    numbers_copy = list(numbers)  # Copy the iterator
    total = sum(numbers_copy)
    result = []
    for value in numbers_copy:
        percent = 100 * value / total
        result.append(percent)
    return result

## After2 : accept a function that returns a new iterator, or write your own iterable container
def normalize_func(get_iter):
    total = sum(get_iter())   # New iterator
    result = []
    for value in get_iter():  # New iterator
        percent = 100 * value / total
        result.append(percent)
    return result

class ReadVisits:
    def __init__(self, data_path):
        self.data_path = data_path
    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)
visits = ReadVisits('my_numbers.txt')
percentages = normalize(visits)  # works: each loop calls __iter__ for a fresh iterator
# with normalize_func instead: normalize_func(lambda: read_visits('my_numbers.txt'))

## After3 : Be more defensive => require a container, reject bare iterators
from collections.abc import Iterator

def normalize_defensive(numbers):
    if isinstance(numbers, Iterator):  # alternative check: iter(numbers) is numbers
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
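A quick check of the defensive version (condensed here to be self-contained): containers pass, single-use iterators are rejected.

```python
from collections.abc import Iterator

def normalize_defensive(numbers):
    if isinstance(numbers, Iterator):
        raise TypeError('Must supply a container')
    total = sum(numbers)
    return [100 * value / total for value in numbers]

visits = [15, 35, 80]
print(normalize_defensive(visits))  # percentages that sum to 100

try:
    normalize_defensive(iter(visits))  # a bare iterator is rejected
except TypeError as e:
    print(e)  # Must supply a container
```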


Item 32: Consider Generator Expressions for Large List Comprehensions

  • List comprehensions may cause problems for large inputs by using too much memory.

  • Generator expressions avoid memory issues by producing outputs one at a time as iterators.

  • Generator expressions can be composed by passing the iterator from one generator expression into the for subexpression of another.

  • Generator expressions execute very quickly when chained together and are memory efficient

# before  
value = [len(x) for x in open('my_file.txt')] 
print(value) # >>> [100, 57, 15, 1, 12, 75, 5, 86, 89, 11]

# after
it = (len(x) for x in open('my_file.txt'))
print(it) # >>> <generator object <genexpr> at 0x108993dd0>
print(next(it)) # >>> 100
print(next(it)) # >>> 57

# generators can be composed together
roots = ((x, x**0.5) for x in it)
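The same chaining works without a file on disk (the input lines here are made up); each next on roots advances the inner generator expression by one step.

```python
lines = ['first line\n', 'second\n', 'third one\n']
it = (len(x) for x in lines)          # inner generator expression
roots = ((x, x**0.5) for x in it)     # composed: pulls from it on demand
print(next(roots))                    # (11, 3.3166...)
```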


Item 33: Compose Multiple Generators with yield from

  1. The yield from expression allows you to compose multiple nested generators together into a single combined generator

  2. yield from provides better performance than manually iterating nested generators and yielding their outputs

# example generator
def move(period, speed):
    for _ in range(period):
        yield speed
def pause(delay):
    for _ in range(delay):
        yield 0

# before
def animate():
    for delta in move(4, 5.0):
        yield delta
    for delta in pause(3):
        yield delta
    for delta in move(2, 3.0):
        yield delta

# after
def animate_composed():
    yield from move(4, 5.0)
    yield from pause(3)
    yield from move(2, 3.0)
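The performance claim (point 2) can be checked with timeit; the helpers here are illustrative, and exact numbers vary by machine.

```python
import timeit

def child():
    for i in range(10_000):
        yield i

def slow():
    for i in child():   # manual nesting
        yield i

def fast():
    yield from child()  # composed with yield from

assert list(slow()) == list(fast())  # identical output either way

t_slow = timeit.timeit(lambda: list(slow()), number=50)
t_fast = timeit.timeit(lambda: list(fast()), number=50)
print(f'manual: {t_slow:.3f}s  yield from: {t_fast:.3f}s')
# yield from is typically measurably faster on CPython; the ratio varies
```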


Item 34. Avoid Injecting Data into Generators with send

  1. The send method can be used to inject data into a generator by giving the yield expression a value that can be assigned to a variable.

  2. Using send with yield from expressions may cause surprising behavior, such as None values appearing at unexpected times in the generator output

  3. Providing an input iterator to a set of composed generators is a better approach than using the send method, which should be avoided
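A condensed sketch of the recommended alternative, simplified from the book's wave example: the composed generators pull amplitudes from a shared input iterator instead of having values pushed in with send.

```python
import math

def wave_cascading(amplitude_it, steps):
    step_size = 2 * math.pi / steps
    for step in range(steps):
        radians = step * step_size
        fraction = math.sin(radians)
        amplitude = next(amplitude_it)  # pull the next input value
        yield amplitude * fraction

def complex_wave_cascading(amplitude_it):
    # composition with yield from stays simple: no stray None values
    yield from wave_cascading(amplitude_it, 3)
    yield from wave_cascading(amplitude_it, 4)

amplitudes = [7, 7, 7, 2, 2, 2, 2]
outputs = list(complex_wave_cascading(iter(amplitudes)))
print(len(outputs))  # one output per input amplitude
```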



Item 35. Avoid Causing State Transitions in Generators with throw

  1. The throw method can be used to re-raise exceptions within generators at the position of the most recently executed yield expression.

  2. Using throw harms readability because it requires additional nesting and boilerplate in order to raise and catch exceptions.

  3. A better way to provide exceptional behavior in generators is to use a class that implements the __iter__ method along with methods to cause exceptional state transitions.
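A minimal sketch of the class-based alternative, modeled on the book's Timer example: state lives on the object, external code calls reset to cause the transition, and the generator in __iter__ stays simple.

```python
class Timer:
    def __init__(self, period):
        self.current = self.period = period

    def reset(self):
        # external code causes the state transition; no throw needed
        self.current = self.period

    def __iter__(self):
        while self.current:
            self.current -= 1
            yield self.current

timer = Timer(3)
print(list(timer))  # [2, 1, 0]
```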


Item 36. Consider itertools for Working with Iterators and Generators

  1. The itertools functions fall into three main categories for working with iterators and generators

    • linking iterators together

    • filtering items they output

    • producing combinations of items

  2. There are more advanced functions, additional parameters, and useful recipes available in the documentation at help(itertools).
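One standard-library example from each category:

```python
import itertools

# 1) linking iterators together
print(list(itertools.chain([1, 2], [3, 4])))  # [1, 2, 3, 4]

# 2) filtering items they output
print(list(itertools.takewhile(lambda x: x < 3, [1, 2, 3, 4])))  # [1, 2]

# 3) producing combinations of items
print(list(itertools.product([1, 2], ['a', 'b'])))
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```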