Python for SRE. Slots, should you bother?
I’m not a programmer, but my job involves programming, mainly for our Azure in-house tool and also to automate repetitive tasks (away with toil!). The resulting software has a fairly small number of users: the rest of my team or other teams in the Operations and Engineering department. Sometimes the only user is myself. Given the usual size of the software I handle (a couple of hundred lines if we exclude the in-house tool), it doesn’t seem worth worrying about efficiency as much as about it working and getting the job done. However, as I’ve already hinted, I like what I do to be efficient, or at least not too inefficient. I believe that if a piece of software can take 10ms and 500 bytes, it has no excuse to take 100ms and 1000 bytes, even if we have spare time and memory.
So I’ve been wondering about the impact of __slots__ in Python. Would it be significant? Would it be worth it? The theory, online discussions, and the official documentation clearly say yes. But I wanted to see it for myself and try it out to answer the question that gives this post its title: “Is it worth worrying about it?”. Here is the resulting test, which has served as an excuse for me to get started in Python memory profiling.
Slots, the theory
__slots__ lets us explicitly declare the attributes of a class in Python. This makes Python behave a bit like C-like languages, where the compiler complains with some kind of “UndefinedVariable” error if we use a variable we haven’t declared beforehand. Normally, however, this is entirely legal in Python:
class Example:
    def __init__(self, number):
        self.number = number

    def do_stuff(self):
        if self.number % 2 == 0:
            return self.number
        else:
            self.new_num = 1 + self.number
            return self.new_num
The example above works perfectly in Python. The variable new_num is not declared as an instance attribute in the constructor but directly in the else branch. There is no problem, because Python objects have a __dict__ dictionary that maps their attributes dynamically. This gives flexibility when writing code, at the cost of overhead in memory and access time.
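To make the role of __dict__ visible, here is a minimal sketch that inspects it before and after the call. The names odd, before and after are only illustrative; the class is the same Example as above.

```python
class Example:
    def __init__(self, number):
        self.number = number

    def do_stuff(self):
        if self.number % 2 == 0:
            return self.number
        self.new_num = 1 + self.number
        return self.new_num


odd = Example(3)
before = dict(odd.__dict__)  # snapshot of the attributes after __init__
result = odd.do_stuff()      # the else branch creates new_num at runtime
after = dict(odd.__dict__)

print(before)  # {'number': 3}
print(after)   # {'number': 3, 'new_num': 4}
```

The attribute simply appears in the dictionary when it is first assigned, which is exactly the flexibility (and the overhead) described above.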
Using __slots__ in a class lets us prevent the creation of that __dict__, forcing us to declare all the attributes that each object of the class will have. By the way, this also means that we won’t have objects of the same class with different attributes depending on what happens at runtime.
class Example:
    # The trailing comma makes this a one-element tuple; ('number') is just a string.
    __slots__ = ('number',)

    def __init__(self, number):
        self.number = number

    def do_stuff(self):
        if self.number % 2 == 0:
            return self.number
        else:
            self.new_num = 1 + self.number
            return self.new_num
This will no longer work in Python. new_num is not declared in __slots__, so the object cannot have any attribute other than number. We will encounter an error:
AttributeError: 'Example' object has no attribute 'new_num'.
To fix it, you just declare the variable in __slots__. Isn’t it C-like?
class Example:
    __slots__ = ('number', 'new_num')

    def __init__(self, number):
        self.number = number

    def do_stuff(self):
        if self.number % 2 == 0:
            return self.number
        else:
            self.new_num = 1 + self.number
            return self.new_num
BTW, we could also simply make new_num a local variable of the method instead of an instance attribute (we write it without self.) and it would work, because __slots__ only “restricts” the attributes of an instance, not local variables. I just wanted to mention it, although this example doesn’t deal with that.
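As a quick sketch of that alternative, the version below keeps only number in __slots__ and uses a local variable in the method. The names example, has_dict and blocked are just illustrative helpers to show what the instance does and does not allow.

```python
class Example:
    __slots__ = ('number',)

    def __init__(self, number):
        self.number = number

    def do_stuff(self):
        if self.number % 2 == 0:
            return self.number
        new_num = 1 + self.number  # local variable, not an instance attribute
        return new_num


example = Example(3)
result = example.do_stuff()
has_dict = hasattr(example, '__dict__')  # slotted instances get no __dict__

# Assigning an attribute that is not listed in __slots__ still fails.
try:
    example.other = 1
    blocked = False
except AttributeError:
    blocked = True

print(result)    # 4
print(has_dict)  # False
print(blocked)   # True
```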
What’s the purpose of this change in behavior brought by __slots__? It’s not about restricting anything, as the BDFL Guido van Rossum clarified long ago:
Some people mistakenly assume that the intended purpose of slots is to increase code safety (by restricting the attribute names). In reality, my ultimate goal was performance.
It’s a matter of performance. Accessing attributes stored in __dict__ is slower and, moreover, __dict__ itself has a larger memory footprint. I don’t want to delve too much into this, to keep the post concise. In the explanation linked above, Guido goes into more detail and with firsthand knowledge.
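This post focuses on memory, but the access-time side can be checked too. The following is only a rough sketch using the standard library's timeit module; the classes WithDict and WithSlots are hypothetical stand-ins, and the numbers depend heavily on the Python version and machine, so no expected output is shown. On recent CPython releases the gap may turn out to be small.

```python
import timeit


class WithDict:
    def __init__(self):
        self.number = 1


class WithSlots:
    __slots__ = ('number',)

    def __init__(self):
        self.number = 1


with_dict = WithDict()
with_slots = WithSlots()

# Time the same attribute read on each kind of object.
reads = 200_000
dict_time = timeit.timeit('obj.number', globals={'obj': with_dict}, number=reads)
slots_time = timeit.timeit('obj.number', globals={'obj': with_slots}, number=reads)

print(f"__dict__ access:  {dict_time:.4f}s for {reads} reads")
print(f"__slots__ access: {slots_time:.4f}s for {reads} reads")
```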
The test
The way I’ve set out to compare performance is intuitive:
- An example class. Two versions, one using __slots__ and one without it.
- Measure the memory usage of objects from both versions.
- Compare the usage of each to answer: Is there a gain? In what proportion?
For the test I designed a basic class that multiplies a range of numbers from 0 to max by a multiplier, and also stores the name of the “caller” because I wanted the class to store a string. I know, I’m not too creative with examples. The second one is the same class but leveraging __slots__ to see the differences.
class MyTestClass:
    def __init__(self, mult, rang, caller):
        self.multiplier = mult
        self.list = []
        self.caller = caller
        for number in range(0, rang):
            print(f"Adding {number} x {self.multiplier}")
            self.list.append(number*self.multiplier)


class MySTestClass:
    __slots__ = ('multiplier', 'list', 'caller')

    def __init__(self, mult, rang, caller):
        self.multiplier = mult
        self.list = []
        self.caller = caller
        for number in range(0, rang):
            print(f"Adding {number} x {self.multiplier}")
            self.list.append(number*self.multiplier)
The first important question is, how do we measure the memory usage of an object in Python? We can’t simply use sys.getsizeof(), since it returns the size of the object itself but not of the objects it refers to. That is, since my object contains references to other objects, it won’t return reliable results. The demonstration of this is trivial: create a class that contains a list, instantiate two objects of the class, add 10 elements to one and 400 to the other, and getsizeof() will return the same value for both of them.
import sys


class testClass:
    def __init__(self, ran):
        self.ran = ran
        self.lst = []
        for element in range(0, self.ran):
            self.lst.append(element)


small = testClass(10)
big = testClass(400)
print(f"Size of small is {sys.getsizeof(small)}")
print(f"Size of big is {sys.getsizeof(big)}")
The output shows this limitation of getsizeof():
Size of small is 48
Size of big is 48
The documentation itself references a recursive sizeof recipe, but I turned to an easy-to-use third-party memory profiler called Pympler that serves my purpose. The level of memory profiling I want to achieve is quite basic, so I didn’t take the time to “reinvent the wheel,” fun as it sounded.
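For the curious, the recursive idea can be sketched with the standard library alone. This is not the recipe the documentation links to, nor how Pympler works internally; it is a simplified illustration in which total_size and Holder are hypothetical names, and it only follows the common containers, __dict__ and __slots__.

```python
import sys


def total_size(obj, seen=None):
    """Roughly sum sys.getsizeof() over obj and the objects it references."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # do not count shared objects twice
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)

    # Follow instance attributes stored in __dict__ and in __slots__.
    if hasattr(obj, '__dict__'):
        size += total_size(vars(obj), seen)
    for cls in type(obj).__mro__:
        slots = getattr(cls, '__slots__', ())
        if isinstance(slots, str):  # a bare string declares a single slot
            slots = (slots,)
        for name in slots:
            if hasattr(obj, name):
                size += total_size(getattr(obj, name), seen)
    return size


class Holder:
    def __init__(self, count):
        self.lst = list(range(count))


small = Holder(10)
big = Holder(400)
# Shallow sizes ignore the list contents; the recursive sizes do not.
print(f"getsizeof:  small={sys.getsizeof(small)} big={sys.getsizeof(big)}")
print(f"total_size: small={total_size(small)} big={total_size(big)}")
```

A real profiler handles many more cases (custom containers, alignment, interpreter internals), which is a good reason to rely on an existing tool.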
So here’s the example usage with Pympler.
import pympler.asizeof as pasizeof


class MyTestClass:
    def __init__(self, mult, rang, caller):
        self.multiplier = mult
        self.list = []
        self.caller = caller
        for number in range(0, rang):
            print(f"Adding {number} x {self.multiplier}")
            self.list.append(number*self.multiplier)


class MySTestClass:
    __slots__ = ('multiplier', 'list', 'caller')

    def __init__(self, mult, rang, caller):
        self.multiplier = mult
        self.list = []
        self.caller = caller
        for number in range(0, rang):
            print(f"Adding {number} x {self.multiplier}")
            self.list.append(number*self.multiplier)
Once everything is ready for testing, it’s time to try out cases. And this is where the influence of __slots__ can shine. The value of rang is the only thing that will vary, as it represents the amount of dynamic memory I want the object to consume. To prepare the output, I’ve decided to refer to objects using __slots__ as “sobjects”. It’s not a convention, it’s just a quick way to differentiate them when printing their memory footprint.
Test 1: One small and one large:
test_subject = MyTestClass(2.5, 5, "John")
test_subject2 = MyTestClass(2.5, 1000, "Kelly")
stest_subject = MySTestClass(2.5, 5, "John")
stest_subject2 = MySTestClass(2.5, 1000, "Kelly")
print(f"First object size is {pasizeof.asizeof(test_subject)}B")
print(f"Second object size is {pasizeof.asizeof(test_subject2)}B")
print(f"First sobject size is {pasizeof.asizeof(stest_subject)}B")
print(f"Second sobject size is {pasizeof.asizeof(stest_subject2)}B")
print(f"First object takes {pasizeof.asizeof(test_subject) / pasizeof.asizeof(stest_subject)} times the memory of First sobject")
print(f"Second object takes {pasizeof.asizeof(test_subject2) / pasizeof.asizeof(stest_subject2)} times the memory of Second sobject")
The results (the verbose output from print(f"Adding {number} x {self.multiplier}") is clipped):
[...]
Adding 998 x 2.5
Adding 999 x 2.5
First object size is 648B
Second object size is 33264B
First sobject size is 376B
Second sobject size is 32992B
First object takes 1.7234042553191489 times the memory of First sobject
Second object takes 1.0082444228903977 times the memory of Second sobject
In the first case the memory gains are significant: the object that doesn’t use __slots__ takes about 1.72 times the memory of the one that does. In the second case the difference is much smaller, a ratio of about 1.008. Okay, not as impressive, although we’re still shaving off 272 bytes, which is never bad. The absolute saving is the same 272 bytes in both cases, so the relative gain from __slots__ shrinks as the list grows. Using slots will have more impact if we instantiate many objects with a small memory footprint than if we have just one big object containing collections.
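The "many small objects" scenario was not part of my test, but it can be sketched with the standard library's tracemalloc module. Point, SlottedPoint and allocated_bytes are hypothetical names made up for this illustration, and the figures will vary with the Python version, so no output is shown.

```python
import tracemalloc


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


def allocated_bytes(cls, count):
    """Return the bytes allocated while building `count` instances of cls."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    instances = [cls(i, i) for i in range(count)]  # kept alive while measuring
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


plain = allocated_bytes(Point, 50_000)
slotted = allocated_bytes(SlottedPoint, 50_000)
print(f"Without __slots__: {plain}B")
print(f"With __slots__:    {slotted}B")
print(f"Ratio: {plain / slotted:.2f}")
```

Here the per-object saving is multiplied by the number of instances instead of being diluted by one large list.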
Let’s see the output if we change the rang when instantiating the objects from 5 and 1000 to 100 and 10000:
[...]
Adding 9998 x 2.5
Adding 9999 x 2.5
First object size is 3728B
Second object size is 325584B
First sobject size is 3456B
Second sobject size is 325312B
First object takes 1.0787037037037037 times the memory of First sobject
Second object takes 1.0008361204013378 times the memory of Second sobject
No surprises there. There are still some gains, but they are less significant. So…
Then, is it worth worrying about?
For me, the answer is: Yes, BUT.
We’ve seen that memory savings can be important, just as the official documentation promises. Gaining efficiency in memory and access speed in such a simple way is not something to overlook. However, I believe it’s something that should be approached differently depending on how we’re working:
- If we have time for an orderly design phase (which is not often the case for us SREs, due to time and workload constraints), we can start leveraging __slots__ from the beginning.
- If it’s part of a more “ad hoc” development process, I would wait until the script or software is at least a bit mature. In other words, while we’re testing and creating attributes, it’s not worth going crazy adding __slots__ as it slows us down. But once we’re clear on what we need, why not?
In both cases, we’ll minimize access time and shed a few (potentially quite a few) bytes. That’s always a good thing.