Sets

The set data type in Python implements most of the operations defined on mathematical sets. Sets in mathematics are most often pictured using Venn diagrams (overlapping circles), which I’m assuming are familiar.

In Python, a set is a collection data type which is mutable (can be altered), but whose elements must be hashable. In practice this means immutable data types such as numbers, strings and tuples (provided the tuples themselves contain only hashable items). A set cannot contain duplicate values.
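As a small illustration of these two properties, the sketch below (with made-up values) shows a duplicate being dropped and the set being modified after creation:

```python
# The repeated 'red' is stored only once.
colours = {'red', 'green', 'red', 'blue'}
size_before = len(colours)   # 3

colours.add('yellow')        # sets are mutable, so elements can be added
colours.add('green')         # adding an existing element changes nothing
size_after = len(colours)    # 4

print(size_before, size_after)
```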

The elements in a set are unordered, which means they may not appear in the same order every time a set is used as an iterable, for example, in a for loop. Because of this, it is not possible to access members of sets using numerical indexes or slicing (such as you would use to access members of lists and tuples).
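The following sketch, using an arbitrary set of letters, shows what happens if you try to index a set, and what you can do instead:

```python
letters = {'a', 'b', 'c'}

indexing_failed = False
try:
    first = letters[0]          # sets do not support indexing
except TypeError as err:
    indexing_failed = True
    print('Indexing failed:', err)

# Membership tests and iteration both work.
has_b = 'b' in letters
for letter in letters:          # iteration order is not guaranteed
    print(letter)

# If a predictable order is needed, sort the elements into a list.
ordered = sorted(letters)
```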

Creating sets

A set can be created by listing its elements in braces (curly brackets):

mySet = {1, 2, 'wibble', 3.14}

As usual in Python, we can mix data types in a set.

A set may also be created from a preexisting list or tuple by using the set constructor, as in:

a = [1, 2, 3, 4]
setA = set(a)
b = 'x', 'y', 'z'
setB = set(b)

If you print out setA and setB, you may find that the order of the elements is not the same as in the original list or tuple; this is because sets are unordered.
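One detail worth knowing when creating sets: an empty pair of braces creates a dictionary, not a set, so an empty set has to be created with the constructor. A minimal sketch (the list contents are arbitrary):

```python
a = [1, 2, 2, 3, 3, 3]
setA = set(a)        # duplicates in the list are dropped

b = 'x', 'y', 'z'
setB = set(b)

empty = set()        # an empty set
not_a_set = {}       # this is an empty dict, not a set

print(setA, setB, type(empty), type(not_a_set))
```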

As mentioned above, the elements of a set must be immutable data types. This means we can create a set of primitive data types like ints, floats, strings and complex numbers. As tuples are immutable, we can also create a set of tuples. However, we cannot create a set of lists, as lists are mutable (their elements can be changed, and they can be extended or contracted by adding or deleting elements).

As we saw in the last example, however, you can create a set from the elements in a list, provided that the list elements are immutable. The list elements are extracted and inserted into the set.
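To make the distinction concrete, here is a short sketch with arbitrary values: a set of tuples works, a set of lists fails, and a set built from the elements of a list works.

```python
# Tuples are immutable, so they can be set elements.
points = {(0, 0), (1, 2), (1, 2)}    # the repeated tuple is stored once

# Lists are mutable, so they cannot be set elements.
list_set_failed = False
try:
    bad = {[0, 0], [1, 2]}
except TypeError as err:
    list_set_failed = True
    print('Cannot build a set of lists:', err)

# A list of immutable items can still be converted into a set.
from_list = set([0, 0, 1, 2])
```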

Adding and deleting set elements

You can add an element to a set using the add() method, as in setA.add(44). For setA as defined above, the contents of setA are now {1, 2, 3, 4, 44}.

You can add multiple elements to a set using update(). The argument to update() must be an iterable such as a list or tuple; it cannot be a non-iterable value such as an int. Thus setA.update([45, 53, 77]) (adding a list) and setA.update((45, 53, 77)) (adding a tuple; note the double parentheses) are allowed, but setA.update(45, 53, 77) is not. Note that a string is itself iterable, so setA.update('ab') adds the characters 'a' and 'b' rather than the string 'ab'. The argument to update() can also be another set, so we can have setA.update({45, 53, 77}). This last option is equivalent to forming the union of setA with the argument set, and storing the result back in setA.
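The sketch below runs through add() and update() on a fresh copy of setA. The values are arbitrary; the call with several arguments relies on update() accepting more than one iterable at a time.

```python
setA = {1, 2, 3, 4}

setA.add(44)                    # add a single element
setA.update([45, 53])           # add the elements of a list
setA.update((77,), {53, 99})    # several iterables can be passed at once
setA.update('ab')               # a string is iterable: adds 'a' and 'b'

update_failed = False
try:
    setA.update(100)            # an int is not iterable
except TypeError as err:
    update_failed = True
    print('update(100) failed:', err)

print(setA)
```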

An element can be removed from a set using either discard() or remove(), both of which take a single argument which is the element to be discarded or removed. The two methods are equivalent if their argument is present in the set. If the argument is not present, discard() will silently do nothing, while remove() will raise a KeyError.
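A brief sketch of the difference between the two methods, using an element (99) that is deliberately absent from the set:

```python
setA = {1, 2, 3, 4}

setA.discard(3)      # 3 is present, so it is removed
setA.discard(99)     # 99 is absent: nothing happens, no error

missing = None
try:
    setA.remove(99)  # 99 is absent: raises KeyError
except KeyError as err:
    missing = err.args[0]
    print('remove() failed for element', missing)

print(setA)
```

If you are not sure whether an element is present, discard() avoids the need for a try block or a prior membership test.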

Set operations

The usual mathematical set operations are supported. Set union can be done using either the | (bitwise OR) operator or the union() method. If we want the union of setA and setB above, we can write either setC = setA | setB or setC = setA.union(setB). Note that if the same element occurs in both setA and setB, it appears only once in setC because sets do not store duplicates. The union operation returns a new set that is the union of its two arguments, so the two original sets are not changed.
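As a sketch with two small example sets, both forms of union give the same result and leave the originals untouched. One difference between them is that the union() method accepts any iterable as its argument, whereas the | operator requires both operands to be sets.

```python
setA = {1, 2, 3, 4}
setB = {3, 4, 5}

setC = setA | setB             # operator form
setC2 = setA.union(setB)       # method form, same result

# The method form also accepts a non-set iterable such as a list.
setE = setA.union([10, 11])

print(setC, setC2, setE)
print(setA, setB)              # the original sets are unchanged
```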

Intersection can be done using either the & (bitwise AND) operator or the intersection() method. Thus we can have either setD = setA & setB or setD = setA.intersection(setB).

The difference between setA and setB (that is, the set of elements in setA but not in setB) is found using either the - (minus) operator or the difference() method.

The symmetric difference (elements in setA or setB but not in both) can be found using the ^ (bitwise XOR) operator or the symmetric_difference() method.
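The three operations can be compared side by side on a pair of example sets (the values are arbitrary). Note that difference, unlike the other two, depends on the order of its operands.

```python
setA = {1, 2, 3, 4}
setB = {3, 4, 5, 6}

common = setA & setB                      # intersection
common2 = setA.intersection(setB)

only_in_a = setA - setB                   # difference: in setA but not setB
only_in_b = setB.difference(setA)         # difference the other way round

in_one_only = setA ^ setB                 # symmetric difference
in_one_only2 = setA.symmetric_difference(setB)

print(common, only_in_a, only_in_b, in_one_only)
```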

Several methods return a boolean value. setA.isdisjoint(setB) tests whether the two sets have no elements in common, setA.issubset(setB) tests whether all the elements of setA are also in setB, and setA.issuperset(setB) tests whether all the elements of setB are also in setA.
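A short sketch of these tests on three example sets follows. The set names are just for illustration; the <= and >= operators shown at the end are equivalent to issubset() and issuperset() when both operands are sets.

```python
small = {1, 2}
big = {1, 2, 3}
other = {7, 8}

is_sub = small.issubset(big)        # True: 1 and 2 are both in big
is_super = big.issuperset(small)    # True: big contains all of small
disjoint = small.isdisjoint(other)  # True: no elements in common
overlap = small.isdisjoint(big)     # False: they share 1 and 2

# Operator forms of the subset and superset tests.
is_sub_op = small <= big
is_super_op = big >= small

print(is_sub, is_super, disjoint, overlap)
```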

Exercise

Write a program that asks the user to enter some details about several people, including their name, age, salary and gender, where name and gender are strings, age is an int and salary is a decimal. Store the details for each person in a namedtuple, and add each namedtuple to a set.

From this master set, construct sets containing each of the following. Use list comprehensions where appropriate to select the elements of some of the sets.

  • a set of all the males
  • a set of all the females (feel free to expand these categories to allow for more genders if you like)
  • a set of everyone under age 40
  • a set of everyone age 40 or over
  • a set of everyone with a salary under 10,000
  • a set of everyone with a salary of 10,000 or more
  • a set of all the males with a salary of 10,000 or more
  • a set of all the females and everyone age 40 or over

Print out each set. If you just use a print command on a set of namedtuples, the output isn’t exactly pretty, but it will do for now. Feel free to clean it up if you like.

See answer
from collections import namedtuple
from decimal import Decimal
Person = namedtuple('Person', ['name', 'age', 'salary', 'gender'])
personSet = set()
while True:
    data = input('Enter name, age, salary, gender (comma separators), or \'quit\':')
    if data == 'quit':
        break
    data = data.split(',')
    personSet.add(Person(data[0], int(data[1]), Decimal(data[2]), data[3]))

males = set([x for x in personSet if x.gender == 'm'])
females = personSet - males
under40 = set([x for x in personSet if x.age < 40])
over40 = personSet - under40
under10k = set([x for x in personSet if x.salary < 10000])
over10k = personSet - under10k
malesOver10k = males & over10k
femalesPlusOver40 = females | over40

print('Males:', males)
print('Females:', females)
print('Under age 40:', under40)
print('Age 40 or over:', over40)
print('Salary under 10k:', under10k)
print('Salary 10k or more:', over10k)
print('Males with salary 10k or more:', malesOver10k)
print('All females & everyone age 40 or over:', femalesPlusOver40)

    

We define the namedtuple on line 3 and initialize the master set personSet on line 4.

The while loop on line 5 reads in the data for each person, with the fields separated by commas. Note that you will need to input the data separated only by commas, with no additional whitespace, in order for the split(',') method on line 9 to produce clean fields. On line 10, we add a namedtuple to the set personSet.

After the user types 'quit', we build the sets starting on line 12. We use a list comprehension to select all the namedtuples with male gender. The females set is then the difference between the master set and the males set, so it contains everyone whose gender was not entered as 'm'.

Similarly, we construct the sets for the age groups and salary groups on lines 14 through 17.

The set of males with a salary of 10,000 or more is formed by the intersection of the males set and the over10k set. The set consisting of all females and everyone age 40 or over is the union of females and over40.
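If you want to experiment with the set operations without typing in data each time, the following is one possible non-interactive sketch. The three sample records are invented purely for illustration, and it uses set comprehensions (braces instead of set([...])) as an alternative to the list comprehensions in the answer above.

```python
from collections import namedtuple
from decimal import Decimal

Person = namedtuple('Person', ['name', 'age', 'salary', 'gender'])

# Hypothetical sample records, invented for illustration only.
people = {
    Person('Ann', 45, Decimal('12000'), 'f'),
    Person('Bob', 30, Decimal('8000'), 'm'),
    Person('Cy', 52, Decimal('15000'), 'm'),
}

# Set comprehensions build each subset directly.
males = {p for p in people if p.gender == 'm'}
females = {p for p in people if p.gender == 'f'}
age40_plus = {p for p in people if p.age >= 40}
salary10k_plus = {p for p in people if p.salary >= 10000}

# Combine the subsets with intersection and union.
males_10k_plus = males & salary10k_plus
females_or_40_plus = females | age40_plus

# Print one person per line, sorted so the output order is predictable.
for p in sorted(females_or_40_plus):
    print(f'{p.name}: age {p.age}, salary {p.salary}, gender {p.gender}')
```

Selecting females explicitly, rather than as everyone who is not male, keeps the two sets correct if further gender categories are added.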
