Effective Python #4 | Comprehensions and Generators


Introduction

  • Python provides a special syntax, called comprehensions, for succinctly iterating through lists, dictionaries, and sets and creating derivative data structures.

  • This style of processing is extended to functions with generators, which enable a stream of values to be incrementally returned by a function.



Item 27. Use Comprehensions Instead of map and filter

  1. List comprehensions are clearer than the map and filter built-in functions because they don’t require lambda expressions.

  2. List comprehensions allow you to easily skip items from the input list, a behavior that map doesn’t support without help from filter.

  3. Dictionaries and sets may also be created using comprehensions.
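
A minimal sketch of the three points above. The list `a` and the variable names are illustrative only and are not taken from the notes.

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# map needs a lambda; the comprehension states the expression directly.
squares_map = list(map(lambda x: x ** 2, a))
squares = [x ** 2 for x in a]

# Skipping items: map needs help from filter, the comprehension just adds an if.
even_squares_map = list(map(lambda x: x ** 2, filter(lambda x: x % 2 == 0, a)))
even_squares = [x ** 2 for x in a if x % 2 == 0]

# Dictionary and set comprehensions use the same syntax inside braces.
even_squares_dict = {x: x ** 2 for x in a if x % 2 == 0}
threes_cubed_set = {x ** 3 for x in a if x % 3 == 0}
```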



Item 28. Avoid More Than Two Control Subexpressions in Comprehensions

  • Comprehensions support multiple levels of loops and multiple conditions per loop level.

  • Comprehensions with more than two control subexpressions are very difficult to read and should be avoided.

    • The practical limit is two conditions, two loops, or one condition and one loop.

    • Beyond that limit, use normal if and for statements and write a helper function (see Item 30).

# multiple levels of looping
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]

# multiple if conditions 
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [x for x in a if x > 4 if x % 2 == 0]
c = [x for x in a if x > 4 and x % 2 == 0]
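
A rough sketch of what "too many subexpressions" can look like, and one way to unpack it. The helper name `filter_rows` and the filtering rule (keep values divisible by 3 from rows whose sum is at least 10) are assumptions made for illustration.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Hard to read: two loops and two conditions packed into one expression.
filtered = [[x for x in row if x % 3 == 0]
            for row in matrix if sum(row) >= 10]

# Same logic as a hypothetical helper with ordinary for and if statements.
def filter_rows(rows):
    result = []
    for row in rows:
        if sum(row) >= 10:
            kept = []
            for x in row:
                if x % 3 == 0:
                    kept.append(x)
            result.append(kept)
    return result

print(filtered)             # [[6], [9]]
print(filter_rows(matrix))  # [[6], [9]]
```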


Item 29. Avoid Repeated Work in Comprehensions by Using Assignment Expressions (:=)

  1. Assignment expressions make it possible for comprehensions and generator expressions to reuse the value from one condition elsewhere in the same comprehension, which can improve readability and performance.

  2. Although it’s possible to use an assignment expression outside of a comprehension or generator expression’s condition, you should avoid doing so.

# The get_batches(stock.get(name, 0), 8) expression is repeated.
found = {name: get_batches(stock.get(name, 0), 8)
         for name in order
         if get_batches(stock.get(name, 0), 8)}

# use the walrus operator
found = {name: batches for name in order
         if (batches := get_batches(stock.get(name, 0), 8))}
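
The snippets above leave `stock`, `order`, and `get_batches` undefined. A self-contained sketch follows, assuming `get_batches(count, size)` returns the number of full batches; the data values are made up for illustration. The last part shows why point 2 matters: an assignment expression placed in the value part is evaluated after the condition, so the condition cannot see it yet.

```python
# Illustrative data; not taken from the notes.
stock = {'nails': 125, 'screws': 35, 'wingnuts': 8, 'washers': 24}
order = ['screws', 'wingnuts', 'clips']

def get_batches(count, size):
    # Assumed behavior: number of full batches of `size` in `count` items.
    return count // size

# The walrus operator computes the value once, in the condition, and reuses it.
found = {name: batches for name in order
         if (batches := get_batches(stock.get(name, 0), 8))}
print(found)  # {'screws': 4, 'wingnuts': 1}

# Assigning in the value expression instead of the condition fails,
# because the condition runs first and `tenth` does not exist yet.
try:
    result = {name: (tenth := count // 10)
              for name, count in stock.items() if tenth > 0}
except NameError:
    result = None
```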


Item 30. Consider Generators Instead of Returning Lists

  1. Using generators can be clearer than the alternative of having a function return a list of accumulated results.

  2. The iterator returned by a generator produces the set of values passed to yield expressions within the generator function’s body.

  3. Generators can produce a sequence of outputs for arbitrarily large inputs because their working memory doesn’t include all inputs and outputs.

# Returning Lists : can cause a program to run out of memory
def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result

# Generators
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1
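
A short usage sketch for the generator version. The sample string is an assumption; `itertools.islice` is used here to pull only the first few results without building the full list.

```python
import itertools

def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

address = 'Four score and seven years ago'  # sample input, illustrative only
it = index_words_iter(address)

# Take the first three word offsets; the rest are not computed yet.
first_three = list(itertools.islice(it, 0, 3))
print(first_three)  # [0, 5, 11]

# The same iterator continues where it left off.
remaining = list(it)
print(remaining)    # [15, 21, 27]
```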


Item 31. Be Defensive When Iterating Over Arguments (list)

  1. Beware of functions and methods that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values.

  2. Python’s iterator protocol defines how containers and iterators interact with the iter and next built-in functions, for loops, and related expressions.

  3. You can easily define your own iterable container type by implementing the __iter__ method as a generator.

  4. You can detect that a value is an iterator (instead of a container) if calling iter on it produces the same value as what you passed in. Alternatively, you can use the isinstance built-in function along with the collections.abc.Iterator class.
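
Points 3 and 4 can be sketched as follows. The `Countdown` class is a hypothetical container written only for this example.

```python
from collections.abc import Iterator

class Countdown:
    """Hypothetical container: __iter__ is a generator, so every call
    to iter() returns a new, independent iterator."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        for value in range(self.start, 0, -1):
            yield value

container = Countdown(3)
iterator = iter(container)

# Check 1: calling iter() on an iterator returns the iterator itself.
container_same = iter(container) is container   # False
iterator_same = iter(iterator) is iterator      # True

# Check 2: isinstance with collections.abc.Iterator.
container_is_iter = isinstance(container, Iterator)  # False
iterator_is_iter = isinstance(iterator, Iterator)    # True

# A container can be iterated repeatedly; an iterator is exhausted after one pass.
first_pass = list(container)   # [3, 2, 1]
second_pass = list(container)  # [3, 2, 1]
once = list(iterator)          # [3, 2, 1]
again = list(iterator)         # []
```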

# Before : iterate over input iterator multiple times.
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

# Problem : create iterator (Item30)
def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)
it = read_visits('my_numbers.txt')
print(list(it)) # >>> [15, 35, 80]
print(list(it)) # >>> [] # Already exhausted

## After1 : Defensively copies the input iterator => input could be extremely large (OOM)
def normalize_copy(numbers):
    numbers_copy = list(numbers)  # Copy the iterator
    total = sum(numbers_copy)
    result = []
    for value in numbers_copy:
        percent = 100 * value / total
        result.append(percent)
    return result

## After2 : accept a function that returns a new iterator, or write your own iterable container
def normalize_func(get_iter):
    total = sum(get_iter())   # New iterator
    result = []
    for value in get_iter():  # New iterator
        percent = 100 * value / total
        result.append(percent)
    return result

class ReadVisits:
    def __init__(self, data_path):
        self.data_path = data_path
    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)
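path = 'my_numbers.txt'
# normalize_func expects a callable that returns a fresh iterator on each call
percentages = normalize_func(lambda: read_visits(path))
# The container works with the original normalize: sum and the for loop each call __iter__
visits = ReadVisits(path)
percentages = normalize(visits)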

# After3 : Be more defensive
from collections.abc import Iterator

def normalize_defensive(numbers):
    if isinstance(numbers, Iterator):  # Equivalent check: iter(numbers) is numbers
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result


Item 32. Consider Generator Expressions for Large List Comprehensions

  • List comprehensions can cause problems for large inputs by using too much memory.

  • Generator expressions avoid memory issues by producing outputs one at a time as iterators.

  • Generator expressions can be composed by passing the iterator from one generator expression into the for subexpression of another.

  • Generator expressions execute very quickly when chained together and are memory efficient.

# before  
value = [len(x) for x in open('my_file.txt')] 
print(value) # >>> [100, 57, 15, 1, 12, 75, 5, 86, 89, 11]

# after : generator expressions using ()
it = (len(x) for x in open('my_file.txt'))
print(it) # >>> <generator object <genexpr> at 0x108993dd0>
print(next(it)) # >>> 100
print(next(it)) # >>> 57

# generators can be composed together
roots = ((x, x**0.5) for x in it)
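
A self-contained sketch of the composition above. Since `my_file.txt` is not available, an in-memory list of lines stands in for the file; that substitution is an assumption for the sake of a runnable example.

```python
# Stand-in for open('my_file.txt'): three lines of length 4, 9, and 16.
lines = ['x' * 3 + '\n', 'x' * 8 + '\n', 'x' * 15 + '\n']

it = (len(x) for x in lines)         # nothing is computed yet
roots = ((x, x ** 0.5) for x in it)  # chained onto the first generator

# Advancing the outer generator also advances the inner one.
first = next(roots)
second = next(roots)
print(first)   # (4, 2.0)
print(second)  # (9, 3.0)

# The inner iterator has already moved past the first two lines.
leftover = next(it)
print(leftover)  # 16
```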


Item 33. Compose Multiple Generators with yield from

  1. The yield from expression allows you to compose multiple nested generators together into a single combined generator

  2. yield from provides better performance than manually iterating nested generators and yielding their outputs

# example generator
def move(period, speed):
    for _ in range(period):
        yield speed
def pause(delay):
    for _ in range(delay):
        yield 0

# before
def animate():
    for delta in move(4, 5.0):
        yield delta
    for delta in pause(3):
        yield delta
    for delta in move(2, 3.0):
        yield delta

# after
def animate_composed():
    yield from move(4, 5.0)
    yield from pause(3)
    yield from move(2, 3.0)


Item 34. Avoid Injecting Data into Generators with send

  1. The send method can be used to inject data into a generator by giving the yield expression a value that can be assigned to a variable.

  2. Using send with yield from expressions may cause surprising behavior, such as None values appearing at unexpected times in the generator output

  3. Providing an input iterator to a set of composed generators is a better approach than using the send method, which should be avoided
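
A minimal sketch of point 1 and point 3. The generator names `echo`, `scale`, and `cascade` are hypothetical. The first part shows how send delivers a value to a yield expression (and how a plain next delivers None); the second part shows the suggested alternative of passing one shared input iterator through composed generators.

```python
import itertools

def echo():
    received = yield 'ready'
    while True:
        received = yield f'got {received}'

it = echo()
first = next(it)            # run to the first yield before sending anything
second = it.send('hello')   # the yield expression evaluates to 'hello'
third = next(it)            # next() is equivalent to send(None)
print(first, second, third, sep=' | ')  # ready | got hello | got None

# Alternative: hand the inputs to the generators as an iterator.
def scale(inputs, count, factor):
    # Consume `count` values from the shared iterator and scale them.
    for value in itertools.islice(inputs, count):
        yield value * factor

def cascade(inputs):
    yield from scale(inputs, 2, 1)
    yield from scale(inputs, 2, 10)

outputs = list(cascade(iter([1, 2, 3, 4])))
print(outputs)  # [1, 2, 30, 40]
```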



Item 35. Avoid Causing State Transitions in Generators with throw

  1. The throw method can be used to re-raise exceptions within generators at the position of the most recently executed yield expression.
  2. Using throw harms readability because it requires additional nesting and boilerplate in order to raise and catch exceptions.
  3. A better way to provide exceptional behavior in generators is to use a class that implements the __iter__ method along with methods to cause exceptional state transitions.
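
A small sketch contrasting the two approaches. The `Reset` exception, the `countdown` generator, and the `Timer` class are illustrative names, not code from the notes.

```python
class Reset(Exception):
    pass

# With throw: the generator needs try/except around its yield.
def countdown(start):
    current = start
    while current > 0:
        try:
            yield current
        except Reset:
            current = start   # state transition triggered from outside
        else:
            current -= 1

it = countdown(3)
a = next(it)             # 3
b = next(it)             # 2
c = it.throw(Reset())    # raised at the paused yield; generator restarts at 3

# With a class: the state transition is an ordinary method call.
class Timer:
    def __init__(self, period):
        self.period = period
        self.current = period

    def reset(self):
        self.current = self.period

    def __iter__(self):
        while self.current:
            self.current -= 1
            yield self.current

timer = Timer(3)
values = []
did_reset = False
for value in timer:
    values.append(value)
    if value == 1 and not did_reset:
        timer.reset()
        did_reset = True
print(values)  # [2, 1, 2, 1, 0]
```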


Item 36. Consider itertools for Working with Iterators and Generators

  1. The itertools functions fall into three main categories for working with iterators and generators

    • Linking iterators together

      • chain : Use chain to combine multiple iterators into a single sequential iterator

      • repeat: Use repeat to output a single value forever, or use the second parameter to specify a maximum number of times

      • tee: Use tee to split a single iterator into the number of parallel iterators specified by the second parameter.

      • zip_longest: This variant of the zip built-in function returns a placeholder value when an iterator is exhausted, which may happen if iterators have different lengths

    • Filtering items they output

      • takewhile : returns items from an iterator until a predicate function returns False for an item

      • dropwhile: skips items from an iterator until a predicate function returns False for the first time, then returns that item and all remaining items

    • Producing combinations of items

      • accumulate: folds an item from the iterator into a running value by applying a function that takes two parameters.

      • product: returns the Cartesian product of items from one or more iterators

        • list(itertools.product([1, 2], ['a', 'b'])) => [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
      • permutations : returns the unique ordered permutations of length N with items from an iterator

      • combinations: returns the unordered combinations of length N with unrepeated items from an iterator

  2. There are more advanced functions, additional parameters, and useful recipes available in the documentation via help(itertools).
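
One short call per function listed above, as a quick reference sketch. The input values are arbitrary and chosen only to keep the results small.

```python
import itertools
import operator

# Linking iterators together
chained = list(itertools.chain([1, 2, 3], [4, 5, 6]))
repeated = list(itertools.repeat('hello', 3))
it1, it2 = itertools.tee(['first', 'second'], 2)
zipped = list(itertools.zip_longest(['one', 'two', 'three'], [1, 2],
                                    fillvalue='nope'))

# Filtering items
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
less_than_seven = lambda x: x < 7
taken = list(itertools.takewhile(less_than_seven, values))    # stops at 7
dropped = list(itertools.dropwhile(less_than_seven, values))  # starts at 7

# Producing combinations of items
running_product = list(itertools.accumulate([1, 2, 3, 4], operator.mul))
pairs = list(itertools.product([1, 2], ['a', 'b']))
perms = list(itertools.permutations([1, 2, 3], 2))
combos = list(itertools.combinations([1, 2, 3], 2))

print(chained)          # [1, 2, 3, 4, 5, 6]
print(zipped)           # [('one', 1), ('two', 2), ('three', 'nope')]
print(taken, dropped)   # [1, 2, 3, 4, 5, 6] [7, 8, 9, 10]
print(running_product)  # [1, 2, 6, 24]
print(combos)           # [(1, 2), (1, 3), (2, 3)]
```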