If you’re used to languages such as C++, C# or Java, you’ll know that in these languages, variables must be declared before they can be used. Declaring a variable in these languages means that its data type must be specified. This allows the program to allocate space in memory for the declared variable and, in some cases, initialize its value.
In Python, variables are never declared, and must be assigned a value before they can be used anywhere else. To use Python variables properly, you need to understand how they are handled.
A Python variable name such as x is used only as a label. In order for x to have any meaning, it must be attached to some object. An object in Python can be as simple as a single integer, or as complex as a user-defined class with multiple data fields. When a variable is assigned to a data object, the data object is created and the variable name is attached to that object. One way of thinking of it is to imagine the data object is constructed separately and placed in a box, and then the variable name is like a label that is attached to the box. The variable is just one way of referring to this independent object. The properties of the object are contained in the object itself (inside the box), so they don’t need to be stored with the variable name (the label attached to the box).
One effect of this method of data storage is that a variable name such as x can be removed from one object and attached to another. The new object need not be the same data type as the first object, since all the attributes of the object are stored within the object and not with the variable name used to refer to it. In the box analogy, we can peel a label off one box and stick it onto another box, even if the boxes have completely different contents.
Let’s see how this works in practice. We’ll consider the following code:
    x = 42
    x
    y = x
    y
    x = 51
    x
    y
In memory, this sequence of assignments looks like this:
x = 42 creates an integer object with the value 42 and attaches x as a label that refers to this object. Printing x after this assignment just echoes the value 42.
y = x is executed in two stages. First, the variable x on the right-hand side is used to locate the object to which it refers, which is the integer 42. Then y is attached as a label for this object. The result is that the integer object 42 now has two labels (x and y) attached to it. Note that there is no direct link between x and y; they are just independent labels that happen to refer to the same object. As a result, echoing x and then y after this assignment returns the value 42 in both cases.
Now consider the final assignment, x = 51. This creates a new integer object (independent of the original 42 object) with the value 51. The label x is then removed from the 42 object and attached to the new 51 object. This has no effect on y, which still refers to 42. Thus printing x gives 51 and printing y gives 42.
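The label picture can be checked with Python’s is operator, which tests whether two names refer to the very same object. The following is a minimal sketch of the sequence above; the names same_before and same_after are introduced here purely for illustration.

```python
x = 42
y = x                    # y becomes a second label on the object x refers to

same_before = x is y     # both labels are attached to the same 42 object

x = 51                   # x is moved to a different object; y is untouched

same_after = x is y      # the two labels now sit on different objects

print(same_before, same_after)   # True False
print(x, y)                      # 51 42
```

Note that nothing ever links x to y directly: each assignment only decides which object a single label is attached to.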
Note that we could have replaced the statement x = 51 with a statement like x = 'wibble', in which a string object with the value 'wibble' is created and then x is attached as a label for this object. The same label x is allowed to refer to any data type, since the properties of the data type are always contained within the object, and not within the label applied to that object.
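One way to see that the type belongs to the object rather than to the label is to ask for the type before and after reattaching the label. This is a small sketch using the built-in type() function; first_type and second_type are illustrative names only.

```python
x = 51
first_type = type(x).__name__     # the type is looked up on the object x refers to

x = 'wibble'                      # same label, now attached to a string object
second_type = type(x).__name__

print(first_type, second_type)    # int str
```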
The same process applies to more complex expressions. The statement z = x + y, for example, first evaluates the right-hand side by looking up the objects referred to by x and y (which we’re assuming are integers), adding them, and creating a new integer object holding the sum. Finally, z is attached as the label for this new object.
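As a quick sketch of this evaluation order, assuming x and y are the integers 42 and 51 (values chosen here only for illustration):

```python
x = 42
y = 51
z = x + y                  # look up x and y, add them, attach z to the result

print(z)                   # 93
print(x, y)                # 42 51 (the operands are left unchanged)
print(z is x, z is y)      # False False (z labels a separate object)
```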
At the level of simple expressions involving primitive data types, this is really all you need to know to understand dynamic typing. When we use more complex data types, however, you do need to be more careful. Although we haven’t yet examined lists, we can use a simple list here to illustrate the point.
A list in Python is just an ordered collection of objects, enclosed in square brackets. Consider the following code:
    L = [42, 51, 60]
    M = L
    L[0]
    M[0]
    L[0] = 73
    L
    M
This code is illustrated as follows.
The first line creates a list with 3 elements, and the label L is assigned to this list. The important point here is that each of the three elements within the list is itself a label which refers to the integer object stored at that point in the list. Lists are indexed with the initial element given the index 0, so the first list element is L[0], which here refers to the integer object 42.
The second line, M = L, first looks up the object to which L refers (which is the original list) and then assigns M as a second label for the same list. Note that this has no effect on the elements of the list; M is just a second label for the top-level object, which is the list itself. We can therefore access the list’s elements using either label, L or M; thus L[0] and M[0] both print out 42.
Now suppose we give the command L[0] = 73. This creates a new integer object with the value 73 and assigns the label L[0] to refer to it. However, since this label is inside the list, and L and M both refer to that one list, the contents of the list are changed for both L and M. Thus if we print out the list using the statement L (on a line by itself) or M, we get the same list values in both cases.
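The difference between changing a label inside the list and moving the top-level label can be sketched as follows. The final rebinding of L and the name shared are additions made here for contrast; they are not part of the session above.

```python
L = [42, 51, 60]
M = L                 # second label on the same list object
shared = M is L       # True: one list, two labels

L[0] = 73             # reattach the label *inside* the list
print(L)              # [73, 51, 60]
print(M)              # [73, 51, 60] (same list, seen through the other label)

L = 73                # move the top-level label L to an integer instead
print(L)              # 73
print(M)              # [73, 51, 60] (the list is unaffected and still labelled M)
```

Assigning to L[0] modifies the shared list, whereas assigning to L only moves one label and leaves the list alone.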
Finally, we consider this code:
    L = [42, 51, 60]
    M = L[:]
    L
    M
    L[0] = 73
    L[0]
    M[0]
In the first line, we create the same list as before and give it the label L. The second line, M = L[:], is the syntax for copying a list (we’ll get to list syntax a bit later, so you can just accept this for now). If we print out L and M, we get what appears to be the same list. However, in this case, M refers to an independent copy of the list referred to by L, as in the diagram:
This time, if we change list L with the statement L[0] = 73, this affects only the list referred to by L and not the copy referred to by M. Thus L[0] now refers to the value 73 while M[0] still refers to 42.
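A short sketch of the copy case, using == to compare contents and is to compare identity (the names equal_contents and same_object are illustrative):

```python
L = [42, 51, 60]
M = L[:]                   # slice copy: a new list with the same elements

equal_contents = M == L    # True: the two lists hold equal values
same_object = M is L       # False: they are two separate list objects

L[0] = 73                  # changes only the list labelled L
print(L)                   # [73, 51, 60]
print(M)                   # [42, 51, 60]
```

The slice makes a shallow copy: the new list gets its own set of element labels, but those labels initially point at the same element objects as the original. For immutable elements such as integers this makes no practical difference.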
References and garbage collection
If you read through the above carefully, one thing might bother you. In a series of statements such as
    x = 42
    x = 73
    x = 61
we create an integer object with value 42, then another integer object with value 73 and finally another object with value 61. The same label x is successively applied to the three objects. After the final assignment, however, there is no label that refers to either the 42 or the 73, so what happens to them? As they were allocated memory but now have no references, do they just pile up until eventually the computer runs out of memory?
Fortunately not, as Python contains a built-in garbage collection system. Each object that is created contains a counter of how many labels (variable names) refer to that object. When the 42 object is created and x applied to it, the counter is incremented by 1. With the next assignment, a 73 object is created and its counter incremented, but since the label x was removed from the 42 object, the 42 object’s counter is decremented and returns to zero.
Whenever an object’s reference count becomes zero, Python (more precisely CPython, the standard implementation) automatically deletes it, thus freeing up its memory. If you’re curious about how many references an object has, you can use the getrefcount() function in the sys module:
    from sys import getrefcount
    getrefcount(M)
    getrefcount(42)
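Because the absolute numbers returned by getrefcount() vary between Python versions (and generally include a temporary reference created by passing the object to the function), it is more instructive to watch how the count changes. The sketch below assumes CPython; the names data, alias, before, after and final are illustrative.

```python
import sys

data = [42, 51, 60]               # a fresh list with one label attached
before = sys.getrefcount(data)

alias = data                      # attach a second label to the same list
after = sys.getrefcount(data)

del alias                         # peel the second label off again
final = sys.getrefcount(data)

# Compare differences rather than raw counts, which are version-dependent.
print(after - before)             # 1
print(final - before)             # 0
```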
In many cases, the count will probably be higher than you’d expect, as a lot of Python’s internal code creates and maintains objects that may be the same as those you define in your program. To save space, Python will often create a single instance of an integer with a given value, such as 42, and use this single instance in every place where the number 42 is needed. This doesn’t cause any problems, because integers are immutable, meaning that once they are created they can’t be changed. If a variable is to be assigned a new integer value, a new integer object is created and the variable assigned to it.
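Immutability can be seen by keeping a second label on an integer while "changing" the first. In this minimal sketch (original is an illustrative name), the += statement does not alter the 42 object; it attaches x to a different object.

```python
x = 42
original = x            # second label on the same 42 object

x += 1                  # computes 43 and moves the label x to that object

print(x, original)      # 43 42 (the 42 object itself was never modified)
print(x is original)    # False
```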