Saturday 5:10 p.m.–5:40 p.m.
A Winning Strategy with The Weakest Link: how to use weak references to make your code more robust
Jim Baker
- Audience level: Intermediate
- Category: Best Practices & Patterns
Description
Working with weak references should not be just for Python wizards. Whether you are building a cache, memoizing a function, tracking objects, or handling various other bookkeeping needs, you definitely do not want your code leaking memory or resources. In this talk, we will look at illuminating examples, drawn from a variety of sources, of how to use weak references to prevent such bugs.
Abstract
What exactly are weak references anyway, especially when compared to
normal (hard) references? And do they really matter for the code we
write?
That second question is easy to answer. You need to think about weak
references if your code involves either of the following and you want
to avoid resource leaks:
* Objects with reference cycles on CPython, such as a parent
referring to its children, and each child referring back to its parent.
* Caches and lookup tables, which often come up in Python
metaprogramming. Weak references apply to this scenario regardless of
the Python implementation. For example, you might want to track all
live instances of a given class. With only hard references, this
isn't possible without causing memory leaks, because the tracking
structure itself keeps every instance alive.
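The class-tracking case from the last bullet can be sketched with a class-level `WeakSet`. This is a minimal illustration, assuming a hypothetical class `Tracked` with a made-up `live_instances` helper. Instances register themselves on creation, and the set forgets them once nothing else strongly references them:

```python
import gc
from weakref import WeakSet


class Tracked:
    """Hypothetical class that records every live instance of itself."""

    # Class-level WeakSet: it holds only weak references, so being
    # tracked here never keeps an instance alive.
    _instances = WeakSet()

    def __init__(self, name):
        self.name = name
        Tracked._instances.add(self)

    @classmethod
    def live_instances(cls):
        # Snapshot the names of the instances that are still alive.
        return sorted(obj.name for obj in cls._instances)


a = Tracked("a")
b = Tracked("b")
print(Tracked.live_instances())  # ['a', 'b']

del b
gc.collect()  # not needed on CPython, but keeps the example portable
print(Tracked.live_instances())  # ['a']
```

A plain class-level `set` would work the same way until the `del b`, after which it would keep `b` alive forever.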
Perhaps someone has already done this hard work for you, but it's
still best to know these scenarios so you can avoid resource leak bugs
in the framework, module, or recipe you might be using, especially in
production!
But first, what exactly are weak references? Although weak references
were initially proposed in [PEP 205](http://legacy.python.org/dev/peps/pep-0205/) and implemented
in Python 2.1 (released April 2001), they are still not widely known.
So it helps to try out some examples to build our intuition.
First, let's import `WeakSet`, since many uses of weak references
involve the collections provided by the `weakref` module. We'll also
import `gc` so we can trigger a garbage collection explicitly:

```python
import gc
from weakref import WeakSet
```
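Define a class `MyStr` like so. (A subclass is needed because instances
of the built-in `str` cannot be weakly referenced; instances of a
user-defined subclass can.)

```python
class MyStr(str):
    pass
```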
Next, let's construct a weak set, add an element to it, and then list
the set:

```python
s = WeakSet()
s.add(MyStr('foo'))
list(s)
```
Most likely we will see that it is empty: we get `[]`. If not, we will
at the very least after a garbage collection with `gc.collect()`.
With a bit of thought, we realize that nothing was holding on to the
instance `MyStr('foo')`. The point of a weak reference, including
collections like `WeakSet` that hold only weak references to their
contents, is that only strong references keep objects from being
collected. We can see this by adding a strong reference:
```python
a = MyStr('fum')
s.add(a)
list(s)
```
This returns:

```
['fum']
```
We now have a variable `a` in our global namespace that strongly
references this value. If we delete this name, we might expect the
value to disappear, and it is indeed guaranteed to, again at least
after a garbage collection:
```python
del a
list(s)
```
probably still returns:

```
['fum']
```
But after a collection:

```python
gc.collect()
list(s)
```
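the resulting list is empty:

```
[]
```

In CPython's interactive interpreter, one mundane reason the value
lingers is that the REPL keeps the most recently displayed result in
the name `_`, so the last `['fum']` list still strongly references
`MyStr('fum')`. Evaluating another expression, such as `gc.collect()`,
rebinds `_` and releases it.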
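The same mechanics apply to a single `weakref.ref`, which `WeakSet` uses internally. A minimal sketch, with a hypothetical `Node` class and `on_collect` callback: calling the reference returns the object while it is alive and `None` afterward, and the optional callback runs when the object is about to be finalized. Running it as a script rather than in the REPL avoids the `_` effect entirely:

```python
import gc
import weakref


class Node:
    """Hypothetical example class; ordinary instances are weakly referenceable."""


collected = []


def on_collect(ref):
    # Called with the (now dead) weak reference when its referent is finalized.
    collected.append(ref)


obj = Node()
r = weakref.ref(obj, on_collect)
print(r() is obj)           # True: the referent is still alive

del obj                     # drop the only strong reference
gc.collect()                # harmless on CPython, which frees obj immediately
print(r())                  # None: the referent is gone
print(collected[0] is r)    # True: the callback received the weak reference
```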
So now we have some understanding of the behavior, but why would this
sort of thing even be useful?
Perhaps the most important thing to know is that once we master the
mechanics, we can readily apply weak references, with minimal fuss
and just a few lines of code, to the problems they solve. In
particular, this means supporting caches and lookup tables without
introducing memory or resource leaks, as well as eliminating reference
cycles on CPython.
Django uses weak references in the implementation of its
[signal mechanism](https://docs.djangoproject.com/en/dev/topics/signals/):
> Django includes a “signal dispatcher” which helps allow decoupled
> applications get notified when actions occur elsewhere in the
> framework. In a nutshell, signals allow certain senders to notify a
> set of receivers that some action has taken place. They’re
> especially useful when many pieces of code may be interested in the
> same events.
Such decoupling is a natural fit for weak references. Although it is
certainly possible to compute the coupling between senders and
receivers on the fly, doing so is expensive, so caching is preferred.
In Django's case, the cache is implemented with a `WeakKeyDictionary`,
so cleanup is straightforward: a cached entry disappears once its
sender is garbage collected.
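The caching idea can be sketched with a hypothetical, heavily simplified dispatcher. It is not Django's implementation, only an illustration of a `WeakKeyDictionary` keyed by sender. The names `Dispatcher`, `connect`, `send`, and `Model` are made up for this example:

```python
import gc
import weakref


class Dispatcher:
    """Hypothetical, simplified signal dispatcher (not Django's code)."""

    def __init__(self):
        self.receivers = []  # (sender_type, callback) pairs
        # sender -> matching callbacks. Weak keys mean a cached entry
        # disappears once its sender is garbage collected.
        self._cache = weakref.WeakKeyDictionary()

    def connect(self, sender_type, callback):
        self.receivers.append((sender_type, callback))
        self._cache.clear()  # receivers changed, so invalidate the cache

    def send(self, sender):
        callbacks = self._cache.get(sender)
        if callbacks is None:
            # The "expensive" matching step, done once per live sender.
            callbacks = [cb for t, cb in self.receivers if isinstance(sender, t)]
            self._cache[sender] = callbacks
        return [cb(sender) for cb in callbacks]


class Model:
    """Hypothetical sender type."""


dispatcher = Dispatcher()
dispatcher.connect(Model, lambda sender: "saved")

m = Model()
print(dispatcher.send(m))      # ['saved']
print(len(dispatcher._cache))  # 1

del m
gc.collect()
print(len(dispatcher._cache))  # 0: the entry left with its sender
```

One design constraint: cached values must not refer back to their keys, or each entry would keep its own sender alive and never be dropped.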
But there's at least one other usage to consider. Although CPython
does support the collection of reference cycles, there are important
caveats:
1. Such cycles can only be collected by the stop-the-world cycle
collector. Unless you call `gc.collect` directly, collections are
triggered when allocations outpace deallocations by the thresholds set
with `gc.set_threshold`, which has been further enhanced since 2.5 to
support generations. Suffice it to say, collection does not
necessarily occur when you need it to, and certainly not because
memory is running out, nor on a periodic basis.
2. Before Python 3.4 (which implemented PEP 442), objects that define
`__del__` and are part of such a cycle become uncollectable garbage:
the collector leaves them in `gc.garbage` instead of freeing them.
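The first caveat is easy to observe on CPython. A minimal sketch, assuming hypothetical `Parent` and `Child` classes: with automatic collection switched off for the demonstration, dropping the last outside reference to a parent in a parent/child cycle does not free it. Only an explicit `gc.collect()` does:

```python
import gc
import weakref


class Parent:
    """Hypothetical parent holding strong references to its children."""

    def __init__(self):
        self.children = []


class Child:
    """Hypothetical child holding a strong reference back to its parent."""

    def __init__(self, parent):
        self.parent = parent


gc.disable()  # stop automatic collection so only gc.collect() runs it
try:
    p = Parent()
    p.children.append(Child(p))  # parent -> child -> parent: a cycle
    probe = weakref.ref(p)       # observe p without keeping it alive
    del p
    print(probe() is None)       # False: the cycle keeps the parent alive
    gc.collect()
    print(probe() is None)       # True: the cycle collector freed it
finally:
    gc.enable()
```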
As a consequence, when creating numerous parent-child relationships,
as seen in `xml.sax.expatreader`, or previous/next links, as seen in
`OrderedDict`, it is important to use weak references for one side of
the relationship; otherwise every such relationship is a cycle that
lingers until the next collection. In particular, `OrderedDict` (in its
pure-Python implementation) uses `weakref.proxy` for its previous
links. Because a proxy forwards attribute access to the object it
refers to, code walking the list follows the extra level of
indirection to the previous object automatically. Meanwhile, each link
is still held by a hard reference, the next link of its predecessor,
so the object doesn't disappear.
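The previous/next arrangement can be sketched with hypothetical `Link` and `LinkedList` classes. This is a simplification in the spirit of the description above, not `OrderedDict`'s actual code. Each `next` link is a hard reference and each `prev` link is a `weakref.proxy`, so the structure contains no reference cycles and CPython's reference counting alone can free it:

```python
import weakref


class Link:
    """Hypothetical doubly linked list node."""

    # "__weakref__" must be listed for slotted instances to be proxied.
    __slots__ = ("value", "next", "prev", "__weakref__")

    def __init__(self, value):
        self.value = value
        self.next = None  # hard reference: keeps the successor alive
        self.prev = None  # becomes a weakref.proxy to the predecessor


class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        link = Link(value)
        if self.tail is None:
            self.head = link
        else:
            self.tail.next = link                 # strong: predecessor -> link
            link.prev = weakref.proxy(self.tail)  # weak: link -> predecessor
        self.tail = link

    def backwards(self):
        # Following prev goes through the proxies transparently.
        values = []
        link = self.tail
        while link is not None:
            values.append(link.value)
            link = link.prev
        return values


items = LinkedList()
for v in "abc":
    items.append(v)
order = items.backwards()
print(order)            # ['c', 'b', 'a']

probe = weakref.ref(items.head)
del items
print(probe() is None)  # True on CPython: no cycle, so no gc.collect() needed
```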