The set data type in Python implements most of the operations from mathematical sets. The most common representation of sets in mathematics uses Venn diagrams (overlapping circles), which I’m assuming are familiar.
In Python, a set is a collection data type which is mutable (can be altered), but whose elements must be hashable, which in practice means immutable data types such as numbers, strings and tuples. A set cannot contain duplicate values.
The elements in a set are unordered, which means that iterating over a set, for example in a for loop, may not produce the elements in the order in which they were added, and the order can differ from one run of the program to the next. Because of this, it is not possible to access members of sets using numerical indexes or slicing (such as you would use to access members of lists and tuples).
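As a quick illustration of these properties, here is a minimal sketch. The set letters and its contents are made-up example values, not something defined elsewhere in this post.

```python
# Hypothetical example values, chosen only for illustration.
letters = {'a', 'b', 'a', 'c', 'b'}

# Duplicates are dropped when the set is built.
size = len(letters)

# Membership tests work on sets...
has_a = 'a' in letters

# ...but indexing does not: a set has no positions.
try:
    letters[0]
    index_error = None
except TypeError as err:
    index_error = str(err)

# If a definite order is needed, build a sorted list from the set.
ordered = sorted(letters)
print(size, has_a, ordered)
```

If you need the elements in a particular order, converting to a list with sorted() (as in the last step) is one simple option.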
Creating sets
A set can be created by listing its elements in braces (curly brackets):
mySet = {1, 2, 'wibble', 3.14}
As usual in Python, we can mix data types in a set.
A set may also be created from a preexisting list or tuple by using the set constructor, as in:
a = [1, 2, 3, 4]
setA = set(a)
b = 'x', 'y', 'z'
setB = set(b)
If you print out setA and setB, you may find that the order of the elements is not the same as in the original list or tuple; this is because sets are unordered.
As mentioned above, the elements of a set must be immutable data types. This means we can create a set of basic data types like ints, floats, strings and complex numbers. As tuples are immutable, we can also create a set of tuples (provided the tuples themselves contain only immutable items). However, we cannot create a set of lists, as lists are mutable (their elements can be changed, and they can be extended or contracted by adding or deleting elements).
As we saw in the last example, however, you can create a set from the elements in a list, provided that the list elements are immutable. The list elements are extracted and inserted into the set.
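The difference between tuples and lists as set elements can be sketched as follows. The names points, rows and row_set are hypothetical, used only for this example.

```python
# Tuples are immutable, so they can be set elements.
points = {(0, 0), (1, 2), (1, 2)}
n_points = len(points)  # the duplicate tuple is stored only once

# Lists are mutable, so Python refuses to put them in a set.
try:
    bad = {[0, 0], [1, 2]}
    list_error = None
except TypeError as err:
    # The message mentions that the list type is unhashable.
    list_error = str(err)

# One workaround: convert each inner list to a tuple first.
rows = [[0, 0], [1, 2]]
row_set = {tuple(r) for r in rows}
print(n_points, list_error, row_set)
```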
Adding and deleting set elements
You can add an element to a set using the add() method, as in setA.add(44). For setA as defined above, the contents of setA are now {1, 2, 3, 4, 44}.
You can add multiple elements to a set using update(). The argument to update() must be an iterable such as a list or tuple; it cannot be a single non-iterable value such as an int. Thus setA.update([45, 53, 77]) (adding a list) and setA.update((45, 53, 77)) (adding a tuple; note the double parentheses) are allowed, but setA.update(45, 53, 77) is not. Be careful with strings: a string is itself iterable, so setA.update('abc') adds the three separate characters 'a', 'b' and 'c' rather than the string 'abc'. The argument to update() can also be another set, so we can have setA.update({45, 53, 77}). This last option is equivalent to forming the union of setA with the set passed to update(), and storing the result back in setA.
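Here is a short sketch of add() and update() in action, starting from the same setA as above. The extra values (99, the chars and whole sets) are assumptions added for illustration.

```python
setA = {1, 2, 3, 4}

# add() inserts a single element.
setA.add(44)
after_add = set(setA)  # keep a copy to inspect later

# update() takes an iterable and adds each of its items.
setA.update([45, 53])    # from a list
setA.update((77,))       # from a one-element tuple
setA.update({45, 99})    # from another set; 45 is already present

# A bare int is not iterable, so update() rejects it.
try:
    setA.update(45)
    update_error = None
except TypeError as err:
    update_error = str(err)

# Strings are iterable: update() adds characters, add() adds the string.
chars = set()
chars.update('abc')
whole = set()
whole.add('abc')
print(setA, chars, whole)
```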
An element can be removed from a set using either discard() or remove(), both of which take a single argument which is the element to be discarded or removed. The two methods are equivalent if their argument is present in the set. If the argument is not present, discard() will silently do nothing, while remove() will raise a KeyError.
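The difference between the two methods shows up only when the element is missing, as this small sketch illustrates (the value 99 is just an example of something not in the set):

```python
setA = {1, 2, 3, 4}

# Both methods remove an element that is present.
setA.discard(3)
setA.remove(4)

# discard() ignores a missing element...
setA.discard(99)

# ...while remove() raises KeyError for it.
try:
    setA.remove(99)
    raised = False
except KeyError:
    raised = True

print(setA, raised)
```

As a rule of thumb, discard() suits cases where you don't care whether the element was there, and remove() suits cases where a missing element indicates a bug.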
Set operations
The usual mathematical set operations are supported. Set union can be done using either the | (bitwise OR) operator or with the method union(), as in setA.union(setB). If we want the union of setA and setB above, we can write either setC = setA | setB or setC = setA.union(setB). Note that if the same element occurs in both setA and setB, it appears only once in setC because sets do not store duplicates. The union operation returns a new set that is the union of its two arguments, so the two original sets are not changed.
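A minimal sketch of union follows. To make the overlap visible, it uses a different, made-up setB that shares some elements with setA; these values are assumptions for the example only.

```python
setA = {1, 2, 3, 4}
setB = {3, 4, 5, 6}   # hypothetical values overlapping with setA

# The operator and the method give the same result.
setC = setA | setB
setC2 = setA.union(setB)

# The method form accepts any iterable, not just a set.
setE = setA.union([10, 11])

# The operator form requires both operands to be sets.
try:
    setA | [10, 11]
    operator_error = None
except TypeError as err:
    operator_error = str(err)

print(setC, setE)
print(setA, setB)  # the originals are unchanged
```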
Intersection can be done using either the & (bitwise AND) operator or with the intersection() method. Thus we can have either setD = setA & setB or setD = setA.intersection(setB).
The difference between setA and setB (that is, the set of elements in setA but not in setB) is done using either the - (minus) operator or with the method difference().
The symmetric difference (elements in setA or setB but not in both) can be done using the ^ (bitwise XOR) operator or with the method symmetric_difference().
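Intersection, difference and symmetric difference can be compared side by side in a short sketch. The two sets below are made-up values chosen so that they overlap.

```python
setA = {1, 2, 3, 4}
setB = {3, 4, 5, 6}   # hypothetical values overlapping with setA

common = setA & setB    # elements in both sets
only_a = setA - setB    # elements in setA but not in setB
only_b = setB - setA    # difference is not symmetric
either = setA ^ setB    # elements in exactly one of the two sets

# Each operator has an equivalent method.
same_common = common == setA.intersection(setB)
same_only_a = only_a == setA.difference(setB)
same_either = either == setA.symmetric_difference(setB)

print(common, only_a, only_b, either)
```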
There are also methods that return a boolean value. The methods isdisjoint(), issubset() and issuperset() test whether a set is disjoint from another set (as in setA.isdisjoint(setB), which is True if the two sets have no elements in common), is a subset (all the elements of the first set are also in the second set) or is a superset (all the elements of the second set are also in the first set).
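These three tests can be tried out on a few small sets. The names small, big and other are hypothetical, introduced only for this sketch.

```python
small = {1, 2}
big = {1, 2, 3, 4}
other = {7, 8}

is_sub = small.issubset(big)        # every element of small is in big
is_super = big.issuperset(small)    # big contains every element of small
apart = small.isdisjoint(other)     # no elements in common
overlap = big.isdisjoint(small)     # False: they share 1 and 2

# Comparison operators express the same subset and superset tests.
op_sub = small <= big
op_super = big >= small

print(is_sub, is_super, apart, overlap)
```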
Exercise
Write a program that asks the user to enter some details about several people, including their name, age, salary and gender, where name and gender are strings, age is an int and salary is a decimal. Store the details for each person in a namedtuple, and add each namedtuple to a set.
From this master set, construct sets containing each of the following. Use list comprehension where appropriate to specify the elements of some of the sets.
- a set of all the males
- a set of all the females (feel free to expand these categories to allow for more genders if you like)
- a set of everyone under age 40
- a set of everyone age 40 or over
- a set of everyone with a salary under 10,000
- a set of everyone with a salary of 10,000 or more
- a set of all the males with a salary of 10,000 or more
- a set of all the females and everyone age 40 or over
Print out each set. If you just use a print command on a set of namedtuples, the output isn’t exactly pretty, but it will do for now. Feel free to clean it up if you like.
Answer
from collections import namedtuple
from decimal import Decimal
Person = namedtuple('Person', ['name', 'age', 'salary', 'gender'])
personSet = set()
while True:
    data = input('Enter name, age, salary, gender (comma separators), or \'quit\':')
    if data == 'quit':
        break
    data = data.split(',')
    personSet.add(Person(data[0], int(data[1]), Decimal(data[2]), data[3]))

males = set([x for x in personSet if x.gender == 'm'])
females = personSet - males
under40 = set([x for x in personSet if x.age < 40])
over40 = personSet - under40
under10k = set([x for x in personSet if x.salary < 10000])
over10k = personSet - under10k
malesOver10k = males & over10k
femalesPlusOver40 = females | over40

print('Males:',males)
print('Females:',females)
print('Under age 40:',under40)
print('Over age 40:',over40)
print('Salary under 10k:',under10k)
print('Salary over 10k:',over10k)
print('Males over 10k:',malesOver10k)
print('All females & everyone over age 40:',femalesPlusOver40)
We define the namedtuple on line 3 and initialize the master set personSet on line 4.
The while loop on line 5 reads in the data for each person, with the fields separated by commas. Note that you should enter the data separated only by commas, with no additional whitespace: the split(',') method on line 9 keeps any spaces as part of each field, so a gender entered as ' m' would not match 'm' in the comparisons later on. On line 10, we add a namedtuple to the set personSet.
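If you would rather allow spaces after the commas, one possible approach is to strip each field before using it. The helper parse_person below is a hypothetical addition, not part of the answer above, and the sample input line is made up.

```python
from collections import namedtuple
from decimal import Decimal

Person = namedtuple('Person', ['name', 'age', 'salary', 'gender'])

def parse_person(line):
    """Build a Person from a comma-separated line, ignoring stray spaces."""
    # strip() removes leading and trailing whitespace from each field.
    fields = [field.strip() for field in line.split(',')]
    name, age, salary, gender = fields
    return Person(name, int(age), Decimal(salary), gender)

# Hypothetical input with spaces after the commas.
person = parse_person('Ann, 35, 12000.50, f')
print(person)
```

This sketch still assumes exactly four fields per line; a line with more or fewer fields raises a ValueError when the list is unpacked.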
After the user types 'quit', we build the sets starting on line 12. We use a list comprehension to select all the namedtuples with male gender. The females set is then obtained by taking the difference between the master set and the males set.
Similarly, we construct the sets for the age groups and salary groups on lines 14 through 17.
The set of males with a salary of 10,000 or more is formed by the intersection of the males set and the over10k set. The set consisting of all females and everyone age 40 or over is the union of females and over40.