Saturday 5:10 p.m.–5:40 p.m.
A Winning Strategy with The Weakest Link: how to use weak references to make your code more robust
- Audience level:
- Best Practices & Patterns
Working with weak references should not just be for Python wizards. Whether you have a cache, memoizing a function, tracking objects, or various other bookkeeping needs, you definitely do not want code leaking memory or resources. In this talk, we will look at illuminating examples drawn from a variety of sources on how to use weak references to prevent such bugs.
What exactly are weak references anyway, especially when compared to normal (hard) references? And do they really matter for the code we write? That second question is easy to answer. You need to think about weak references if you are writing the following and want to avoid resource leaks: * Objects having reference cycles on CPython, such as a parent referring to its children, and the child to its parent. * Caches and lookup tables, which often come up in Python metaprogramming. Using weak references for this scenario may be applicable, regardless of the Python implementation. For example, you might want to track the membership of all objects for a given class. If you only have hard references, this isn't possible without causing memory leaks. Now maybe someone has already done this hard coding work, but it's best to know these scenarios to avoid resource leak bugs in the framework, module, or recipe you might be using. Especially in production! But first, what exactly are weak references? Although weak references were initially proposed in [PEP 205](http://legacy.python.org/dev/peps/pep-0205/) and implemented in Python 2.1 (released April 2001), they are still not widely known. So it helps to try out some examples to build our intuition. First, let's import `WeakSet`. Many uses of weak references are with respect to the collection provided by the `weakref` module: from weakref import WeakSet Define a class `MyStr` like so: MyStr(str): pass Next, let's construct a weak set and add an element to it. We then list the set: s = WeakSet() s.add(MyStr('foo')) list(s) And most likely we will see that it is empty - we get ``. Or at the very least after a garbage collection with `gc.collect()`. With a bit of thought, we realize that nothing was holding on to the instance `MyStr("foo")` - the point of a weak reference, including collections that maintain only a weak reference to their contents like `WeakSet`, is that only strong references keep objects from being collected. We can see this by adding a strong reference: a = MyStr('fum') s.add(a) list(s) This returns ['fum'] We now have a variable `a` in our global namespace that's strongly referencing this value. Of course if we delete this name, we might expect the value to disappear, as it in fact is guaranteed to do, again at least after a garbage collection: del a list(s) probably still returns ['fum'] But after a collection gc.collect() list(s) the resulting list is empty:  So now we have some understanding of the behavior, but why would this sort of thing even be useful? Perhaps the most important thing to know is that once we master the mechanics, with minimal fuss and just a few lines of code we can readily apply to the problems they can solve. In particular this means supporting caching/lookup tables without introducing memory/resource leaks, as well as reference cycle elimination on CPython. Django uses weak references in the implementation of its [signal mechanism](https://docs.djangoproject.com/en/dev/topics/signals/): > Django includes a “signal dispatcher” which helps allow decoupled > applications get notified when actions occur elsewhere in the > framework. In a nutshell, signals allow certain senders to notify a > set of receivers that some action has taken place. They’re > especially useful when many pieces of code may be interested in the > same events. Such decoupling is a perfect usage of weak references. Although it is certainly possible to compute this coupling between senders and receivers on the fly, it's expensive to do, so caching is preferred. In Django's case, because the caching is implemented with a `WeakKeyDictionary`, cleanup is straightforward. But there's at least one other usage to consider. Although CPython does support the collection of reference cycles, there are important caveats: 1. Such cycles can be only be collected upon the stop-the-world GC collection. Without calling `gc.collect` directly, such collections are run per the decision criteria in the `gc.set_threshold` function, which has been further enhanced since 2.5 to support generations. Suffice to say, it doesn't occur necessarily when you would need it to, and certainly not on the basis of running out of memory, or on a periodic basis. 2. Using `__del__` with such cycles creates uncollectable garbage. As a consequence, when creating numerous parent-child relationships, as seen in `xml.sax.expatreader` or previous/next links, as seen in `OrderedDict`, it is important to use weak references for one side of the relationship, or else it's very easy to see problems arise. In particular, `OrderedDict` uses `weakref.proxy` for previous links. By doing so, using code automatically chases the extra level of indirection to the previous object; but the next link ensures that a hard reference is present, so the object doesn't disappear.