Introduction
- Python provides a special syntax, called comprehensions, for succinctly iterating through the built-in container types (list, dict, set) and creating derivative data structures.
- This style of processing is extended to functions with generators, which enable a stream of values to be incrementally returned by a function.
Item 27. Use Comprehensions Instead of map and filter
- Comprehensions support multiple levels of loops and multiple conditions per loop level.
- Comprehensions with more than two control subexpressions are very difficult to read and should be avoided. Use normal if and for statements and write a helper function instead (Item 30).
# multiple levels of looping
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
# multiple if conditions (consecutive ifs are an implicit and, so b == c)
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [x for x in a if x > 4 if x % 2 == 0]
c = [x for x in a if x > 4 and x % 2 == 0]
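As a rough sketch of the advice to fall back on normal loops, the snippet below unrolls a comprehension with three control subexpressions into a helper function. The sample data and the helper name flatten_filtered are assumptions made up for illustration.

```python
# Hypothetical data: keep values divisible by 3 from rows whose sum is at least 10.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Hard to read: two loops plus two conditions packed into one expression.
dense = [x for row in matrix if sum(row) >= 10 for x in row if x % 3 == 0]


def flatten_filtered(rows):
    """Same logic written with normal for and if statements."""
    result = []
    for row in rows:
        if sum(row) >= 10:
            for x in row:
                if x % 3 == 0:
                    result.append(x)
    return result


print(dense)                     # [6, 9]
print(flatten_filtered(matrix))  # [6, 9]
```

The helper is longer, but each loop and condition sits on its own line, which makes the order of evaluation obvious.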
Item 28: Avoid More Than Two Control Subexpressions in Comprehensions
- Functions that return None to indicate special meaning are error prone because None and other values (e.g., zero, the empty string) all evaluate to False in conditional expressions.
- Raise exceptions to indicate special situations instead of returning None. Expect the calling code to handle exceptions properly when they’re documented.
- Type annotations can be used to make it clear that a function will never return the value None, even in special situations.
# before
def careful_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None

# after
def careful_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        # Chain the original error so the traceback shows the root cause
        raise ValueError('Invalid inputs') from e
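The type annotation point can be sketched as follows. The docstring wording and the caller describe are assumptions added for illustration; only careful_divide comes from the notes above.

```python
def careful_divide(a: float, b: float) -> float:
    """Divide a by b.

    Raises:
        ValueError: When the inputs cannot be divided.
    """
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e


def describe(x: float, y: float) -> str:
    # Hypothetical caller: the error path is explicit, so a result of 0
    # is never confused with a failure.
    try:
        result = careful_divide(x, y)
    except ValueError:
        return 'Invalid inputs'
    return f'Result is {result:.1f}'


print(describe(0, 5))  # Result is 0.0
print(describe(1, 0))  # Invalid inputs
```

With the return type declared as float, a static type checker can flag any code path that would return None.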
Item 29: Avoid Repeated Work in Comprehensions by Using Assignment Expressions (:=)
- Assignment expressions make it possible for comprehensions and generator expressions to reuse the value from one condition elsewhere in the same comprehension, which can improve readability and performance.
- Although it’s possible to use an assignment expression outside of a comprehension or generator expression’s condition, you should avoid doing so.
# the get_batches(stock.get(name, 0), 8) expression is repeated
found = {name: get_batches(stock.get(name, 0), 8)
         for name in order
         if get_batches(stock.get(name, 0), 8)}

# use the walrus operator: compute once in the condition, reuse as the value
found = {name: batches for name in order
         if (batches := get_batches(stock.get(name, 0), 8))}
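To make the walrus version runnable, the sketch below fills in stock, order, and get_batches with made-up values; all three definitions are assumptions, not part of the notes. It needs Python 3.8 or later.

```python
# Hypothetical inventory data and helper, chosen only to make the snippet runnable.
stock = {'nails': 125, 'screws': 35, 'wingnuts': 8, 'washers': 24}
order = ['screws', 'wingnuts', 'clips']


def get_batches(count, size):
    """Number of whole batches of the given size available in count items."""
    return count // size


# batches is computed once per name and reused as the dictionary value.
found = {name: batches for name in order
         if (batches := get_batches(stock.get(name, 0), 8))}

print(found)  # {'screws': 4, 'wingnuts': 1}
```

The item 'clips' is missing from stock, so its batch count is 0 and the condition filters it out.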
Item 30: Consider Generators Instead of Returning Lists
- Using generators can be clearer than the alternative of having a function return a list of accumulated results.
- The iterator returned by a generator produces the set of values passed to yield expressions within the generator function’s body.
- Generators can produce a sequence of outputs for arbitrarily large inputs because their working memory doesn’t include all inputs and outputs.
# Returning a list: can cause a program to run out of memory for large inputs
def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result

# Generator: yields each index as soon as it is found
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1
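A short usage sketch may help show the laziness of the generator version. The sample sentence is an assumption, and itertools.islice is used here only to stop early.

```python
import itertools


def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1


address = 'Four score and seven years ago'

# Take only the first three results; the rest of the text is never scanned.
first_three = list(itertools.islice(index_words_iter(address), 3))
print(first_three)  # [0, 5, 11]

# Wrap the generator in list() when every result is needed at once.
all_indexes = list(index_words_iter(address))
print(all_indexes)  # [0, 5, 11, 15, 21, 27]
```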
Item 31. Be Defensive When Iterating Over Arguments (list)
- Beware of functions and methods that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values.
- Python’s iterator protocol defines how containers and iterators interact with the iter and next built-in functions, for loops, and related expressions.
- You can easily define your own iterable container type by implementing the __iter__ method as a generator.
- You can detect that a value is an iterator (instead of a container) if calling iter on it produces the same value as what you passed in. Alternatively, you can use the isinstance built-in function along with the collections.abc.Iterator class.
# Before: iterates over the input multiple times (sum, then the for loop)
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

# Problem: a generator (Item 30) returns an iterator, which can be consumed only once
def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)

it = read_visits('my_numbers.txt')
print(list(it))  # >>> [15, 35, 80]
print(list(it))  # >>> []  # Already exhausted

# After 1: defensively copy the input iterator (the input could be extremely large => OOM)
def normalize_copy(numbers):
    numbers_copy = list(numbers)  # Copy the iterator
    total = sum(numbers_copy)
    result = []
    for value in numbers_copy:
        percent = 100 * value / total
        result.append(percent)
    return result

# After 2: accept a function that returns a new iterator, or write your own iterable container
def normalize_func(get_iter):
    total = sum(get_iter())    # New iterator
    result = []
    for value in get_iter():   # New iterator
        percent = 100 * value / total
        result.append(percent)
    return result

path = 'my_numbers.txt'
percentages = normalize_func(lambda: read_visits(path))

class ReadVisits:
    def __init__(self, data_path):
        self.data_path = data_path

    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

# The container works with the original normalize: each pass calls __iter__ again
visits = ReadVisits(path)
percentages = normalize(visits)

# After 3: be more defensive and reject iterators outright
from collections.abc import Iterator

def normalize_defensive(numbers):
    if isinstance(numbers, Iterator):  # Another way to check
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
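The two ways of detecting an iterator mentioned in the bullets can be compared side by side. This is a minimal sketch; the helper name is_iterator and the sample list are assumptions.

```python
from collections.abc import Iterator


def is_iterator(value):
    """True when iter() hands back the object itself, as iterators do."""
    return iter(value) is value


visits = [15, 35, 80]        # a container
visits_iter = iter(visits)   # an iterator over that container

print(is_iterator(visits))                # False
print(is_iterator(visits_iter))           # True
print(isinstance(visits, Iterator))       # False
print(isinstance(visits_iter, Iterator))  # True
```

Both checks agree here; the isinstance form reads more clearly and does not call iter on the argument.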
Item 32. Consider Generator Expressions for Large List Comprehensions
- List comprehensions may cause problems for large inputs by using too much memory.
- Generator expressions avoid memory issues by producing outputs one at a time as iterators.
- Generator expressions can be composed by passing the iterator from one generator expression into the for subexpression of another.
- Generator expressions execute very quickly when chained together and are memory efficient.
# before
value = [len(x) for x in open('my_file.txt')]
print(value) # >>> [100, 57, 15, 1, 12, 75, 5, 86, 89, 11]
# after
it = (len(x) for x in open('my_file.txt'))
print(it) # >>> <generator object <genexpr> at 0x108993dd0>
print(next(it)) # >>> 100
print(next(it)) # >>> 57
# generators can be composed together
roots = ((x, x**0.5) for x in it)
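The chaining behavior can be tried without a file on disk. In this sketch the list lines stands in for the lines of a file, which is an assumption made so the snippet is self-contained.

```python
# Stand-in for the lines of a file (hypothetical data with lengths 4, 9, 2, 16).
lines = ['abcd', 'abcdefghi', 'ab', 'abcdefghijklmnop']

lengths = (len(x) for x in lines)         # nothing is computed yet
roots = ((x, x ** 0.5) for x in lengths)  # pulls from lengths lazily

first = next(roots)
second = next(roots)
print(first)   # (4, 2.0)
print(second)  # (9, 3.0)

# Advancing roots also advanced lengths, because they share one stream.
third_length = next(lengths)
print(third_length)  # 2

remaining = list(roots)
print(remaining)  # [(16, 4.0)]
```

Because the two generators share state, an iterator produced this way should be consumed from one place only.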
Item 33. Compose Multiple Generators with yield from
- The yield from expression allows you to compose multiple nested generators together into a single combined generator.
- yield from provides better performance than manually iterating nested generators and yielding their outputs.
# example generators
def move(period, speed):
    for _ in range(period):
        yield speed

def pause(delay):
    for _ in range(delay):
        yield 0

# before
def animate():
    for delta in move(4, 5.0):
        yield delta
    for delta in pause(3):
        yield delta
    for delta in move(2, 3.0):
        yield delta

# after
def animate_composed():
    yield from move(4, 5.0)
    yield from pause(3)
    yield from move(2, 3.0)
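A quick check, sketched below, suggests that the two versions produce the same stream of values. The functions are repeated so the snippet runs on its own; it does not measure the performance difference.

```python
def move(period, speed):
    for _ in range(period):
        yield speed


def pause(delay):
    for _ in range(delay):
        yield 0


def animate():
    for delta in move(4, 5.0):
        yield delta
    for delta in pause(3):
        yield delta
    for delta in move(2, 3.0):
        yield delta


def animate_composed():
    yield from move(4, 5.0)
    yield from pause(3)
    yield from move(2, 3.0)


manual = list(animate())
composed = list(animate_composed())
print(composed)            # [5.0, 5.0, 5.0, 5.0, 0, 0, 0, 3.0, 3.0]
print(manual == composed)  # True
```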
Item 34. Avoid Injecting Data into Generators with send
- The send method can be used to inject data into a generator by giving the yield expression a value that can be assigned to a variable.
- Using send with yield from expressions may cause surprising behavior, such as None values appearing at unexpected times in the generator output.
- Providing an input iterator to a set of composed generators is a better approach than using the send method, which should be avoided.
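The contrast between send and an input iterator can be sketched with a toy generator. Both function names and the multiply-by-10 logic are assumptions chosen to keep the example small.

```python
def scale_with_send():
    """Receives a factor through send() and yields factor * 10."""
    factor = yield  # the first next() stops here and yields None
    while True:
        factor = yield factor * 10


gen = scale_with_send()
primed = next(gen)         # priming step required before the first send
sent_first = gen.send(2)
sent_second = gen.send(3)
print(primed)       # None
print(sent_first)   # 20
print(sent_second)  # 30


def scale_with_iterator(factors):
    """Same computation, with the inputs supplied as an iterator."""
    for factor in factors:
        yield factor * 10


from_iterator = list(scale_with_iterator(iter([2, 3])))
print(from_iterator)  # [20, 30]
```

The iterator version needs no priming call and never emits a stray None, which is the main reason to prefer it.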
Item 35. Avoid Causing State Transitions in Generators with throw
- The throw method can be used to re-raise exceptions within generators at the position of the most recently executed yield expression.
- Using throw harms readability because it requires additional nesting and boilerplate in order to raise and catch exceptions.
- A better way to provide exceptional behavior in generators is to use a class that implements the __iter__ method along with methods to cause exceptional state transitions.
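The two approaches can be put side by side with a countdown timer. This is a minimal sketch; the names Reset, timer, and Timer and the period of 3 are assumptions for illustration.

```python
class Reset(Exception):
    pass


def timer(period):
    """Countdown generator that restarts when Reset is thrown into it."""
    current = period
    while current:
        current -= 1
        try:
            yield current
        except Reset:
            current = period


class Timer:
    """Same countdown as an iterable class with an explicit reset method."""

    def __init__(self, period):
        self.current = period
        self.period = period

    def reset(self):
        self.current = self.period

    def __iter__(self):
        while self.current:
            self.current -= 1
            yield self.current


it = timer(3)
print(next(it))           # 2
print(it.throw(Reset()))  # 2 (countdown restarted)

countdown = Timer(3)
ticks = []
for tick in countdown:
    ticks.append(tick)
    if len(ticks) == 2:
        countdown.reset()  # plain method call instead of throw
print(ticks)  # [2, 1, 2, 1, 0]
```

With the class, the caller can keep using a normal for loop, whereas the throw version forces manual next and throw calls.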
Item 36. Consider itertools for Working with Iterators and Generators
- The itertools functions fall into three main categories for working with iterators and generators:
  - linking iterators together
  - filtering items they output
  - producing combinations of items
- There are more advanced functions, additional parameters, and useful recipes available in the documentation at help(itertools).
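One or two functions from each category are sketched below. The selection is only a sample and the input values are made up.

```python
import itertools

# Linking iterators together
linked = list(itertools.chain([1, 2, 3], [4, 5, 6]))
print(linked)  # [1, 2, 3, 4, 5, 6]

# Filtering items from an iterator
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
head = list(itertools.takewhile(lambda x: x < 7, values))
odds = list(itertools.filterfalse(lambda x: x % 2 == 0, values))
print(head)  # [1, 2, 3, 4, 5, 6]
print(odds)  # [1, 3, 5, 7, 9]

# Producing combinations of items
pairs = list(itertools.product([1, 2], ['a', 'b']))
combos = list(itertools.combinations([1, 2, 3], 2))
print(pairs)   # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
print(combos)  # [(1, 2), (1, 3), (2, 3)]
```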