## Sets

The `set` data type in Python implements most of the operations from mathematical sets. The most common representation of sets in mathematics uses Venn diagrams (overlapping circles) which I’m assuming are familiar.

In Python, a `set` is a mutable collection (it can be altered), but its elements must be hashable, which in practice means immutable values such as numbers, strings and tuples of such values. A set cannot contain duplicate values.

The elements in a set are unordered, which means you cannot rely on them appearing in any particular order when you iterate over a set, for example in a for loop. Because of this, it is not possible to access members of sets using numerical indexes or slicing (such as you would use to access members of lists and tuples).
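A quick sketch of these properties may help. The sample values are invented for illustration:

```python
# Duplicates collapse: a set keeps one copy of each value.
letters = {'a', 'b', 'a', 'c', 'b'}
print(len(letters))        # 3

# Sets do not support indexing, so letters[0] raises a TypeError.
try:
    letters[0]
    indexing_works = True
except TypeError:
    indexing_works = False
print(indexing_works)      # False

# Membership tests with `in` are the usual way to query a set.
print('a' in letters)      # True
```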

## Creating sets

A set can be created by listing its elements in braces (curly brackets):

`mySet = {1, 2, 'wibble', 3.14}`

As usual in Python, we can mix data types in a set.
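One thing to watch for when using braces: empty braces create a dictionary, not a set, so an empty set has to be made with `set()`. A minimal sketch (the variable names are just for illustration):

```python
empty_set = set()      # set() with no arguments gives an empty set
empty_dict = {}        # empty braces give an empty dict instead

print(type(empty_set).__name__)    # set
print(type(empty_dict).__name__)   # dict

mixed = {1, 2, 'wibble', 3.14}     # mixed element types are fine
print(len(mixed))                  # 4
```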

A set may also be created from a preexisting list or tuple by using the set constructor, as in:

```
a = [1, 2, 3, 4]
setA = set(a)
b = 'x', 'y', 'z'
setB = set(b)
```

If you print out `setA` and `setB`, you may find that the order of the elements is not the same as in the original list or tuple; this is because sets are unordered.

As mentioned above, the elements of a set must be immutable data types. This means we can create a set of primitive data types like ints, floats, strings and complex numbers. As tuples are immutable, we can also create a set of tuples. However, we cannot create a set of lists, as lists are mutable (their elements can be changed, and they can be extended or contracted by adding or deleting elements).

As we saw in the last example, however, you can create a set from the elements in a list, provided that the list elements are immutable. The list elements are extracted and inserted into the set.
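The difference between a set of tuples, a set of lists and a set built from a list can be sketched like this (sample values are made up):

```python
# Tuples are immutable, so they can be set elements.
points = {(0, 0), (1, 2), (1, 2)}
print(len(points))                  # 2

# Lists are mutable, so using them as elements raises a TypeError.
try:
    bad = {[1, 2], [3, 4]}
    lists_allowed = True
except TypeError:
    lists_allowed = False
print(lists_allowed)                # False

# Building a set from the elements of a list is fine.
from_list = set([3, 1, 3, 2])
print(from_list == {1, 2, 3})       # True
```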

## Adding and deleting set elements

You can add an element to a set using the `add()` method, as in `setA.add(44)`. For `setA` as defined above, the new contents of `setA` are now `{1, 2, 3, 4, 44}`.

You can add multiple elements to a set using `update()`. Each argument to `update()` must be an iterable such as a list or tuple; a non-iterable value such as an int is not allowed. Thus `setA.update([45, 53, 77])` (adding a list) and `setA.update((45, 53, 77))` (adding a tuple; note the double parentheses) are allowed, but `setA.update(45, 53, 77)` is not. The argument to `update()` can also be another set, so we can have `setA.update({45, 53, 77})`. This last option is equivalent to forming the union of `setA` with the set passed to `update()`, and storing the result back in `setA`.
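Here is a short sketch of `add()` and `update()` in action. It redefines `setA` so that it runs on its own, and the extra values are arbitrary:

```python
setA = {1, 2, 3, 4}
setA.add(44)
setA.add(44)                       # adding an existing element changes nothing
print(setA == {1, 2, 3, 4, 44})    # True

setA.update([45, 53])              # elements taken from a list
setA.update((77,), {45, 99})       # several iterables in one call
print(sorted(setA))                # [1, 2, 3, 4, 44, 45, 53, 77, 99]

try:
    setA.update(45, 53, 77)        # ints are not iterable
    ints_accepted = True
except TypeError:
    ints_accepted = False
print(ints_accepted)               # False

setA.update('hi')                  # a string is iterable: adds 'h' and 'i'
print('h' in setA, 'hi' in setA)   # True False
```

Note the last call: because a string is iterable, `update()` adds its characters one by one. Use `add('hi')` if you want the whole string as a single element.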

An element can be removed from a set using either `discard()` or `remove()`, both of which take a single argument which is the element to be discarded or removed. The two methods are equivalent if their argument is present in the set. If the argument is not present, `discard()` will silently do nothing, while `remove()` will raise a `KeyError`.
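The difference between the two methods can be sketched as follows, again with a fresh `setA` so the snippet is self-contained:

```python
setA = {1, 2, 3, 4}

setA.discard(2)        # present, so it is removed
setA.discard(99)       # absent, so nothing happens

try:
    setA.remove(99)    # absent, so a KeyError is raised
    removed = True
except KeyError:
    removed = False

print(setA == {1, 3, 4})   # True
print(removed)             # False
```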

## Set operations

The usual mathematical set operations are supported. Set union can be done using either the `|` operator (the same symbol Python uses for bitwise OR) or with the method `union()`. If we want the union of `setA` and `setB` above, we can write either `setC = setA | setB` or `setC = setA.union(setB)`. Note that if the same element occurs in both `setA` and `setB`, it appears only once in `setC` because sets do not store duplicates. The union operation returns a new set that is the union of its two arguments, so the two original sets are not changed.
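A small sketch of union, using two made-up sets that share one element so that the duplicate handling is visible:

```python
left = {1, 2, 3}
right = {3, 4, 5}

both = left | right
print(both == {1, 2, 3, 4, 5})            # True: the shared 3 appears once
print(both == left.union(right))          # True: operator and method agree
print(left == {1, 2, 3})                  # True: the originals are unchanged

# The method accepts any iterable; the operator needs a set on both sides.
print(left.union([5, 6]) == {1, 2, 3, 5, 6})   # True
try:
    left | [5, 6]
    operator_takes_list = True
except TypeError:
    operator_takes_list = False
print(operator_takes_list)                # False
```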

Intersection can be done using either the `&` operator (the same symbol as bitwise AND) or with the `intersection()` method. Thus we can have either `setD = setA & setB` or `setD = setA.intersection(setB)`.
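As a quick illustration with invented values:

```python
left = {1, 2, 3, 4}
right = {3, 4, 5}

common = left & right
print(common == {3, 4})                    # True
print(common == left.intersection(right))  # True

# Sets with nothing in common intersect to the empty set.
nothing = left & {10, 20}
print(nothing == set())                    # True
```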

The difference between `setA` and `setB` (that is, the set of elements in `setA` but not in `setB`) is found using either the `-` (minus) operator or the method `difference()`.

The symmetric difference (elements in `setA` or `setB` but not in both) can be found using the `^` operator (the same symbol as bitwise XOR) or the method `symmetric_difference()`.
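Difference depends on the order of its operands, while symmetric difference does not. A sketch with made-up values:

```python
left = {1, 2, 3, 4}
right = {3, 4, 5}

only_left = left - right
only_right = right - left
print(only_left == {1, 2})      # True
print(only_right == {5})        # True

either = left ^ right
print(either == {1, 2, 5})      # True
print(either == right ^ left)   # True: the order does not matter

# Symmetric difference is the union minus the intersection.
print(either == (left | right) - (left & right))   # True
```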

Several methods return a boolean value. `setA.isdisjoint(setB)` is true if the two sets have no elements in common, `setA.issubset(setB)` is true if all the elements of `setA` are also in `setB`, and `setA.issuperset(setB)` is true if all the elements of `setB` are also in `setA`.
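These tests can be tried out on a few small sets. The names and values here are only for illustration:

```python
small = {1, 2}
big = {1, 2, 3}
other = {7, 8}

print(small.issubset(big))      # True
print(big.issuperset(small))    # True
print(small.isdisjoint(other))  # True
print(small.isdisjoint(big))    # False: they share 1 and 2

# The comparison operators do the same job as the subset/superset methods.
print(small <= big, big >= small)   # True True
print(big <= big, big < big)        # True False: < means a strict subset
```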

## Exercise

Write a program that asks the user to enter some details about several people, including their name, age, salary and gender, where name and gender are strings, age is an int and salary is a decimal. Store the details for each person in a namedtuple, and add each namedtuple to a set.

From this master set, construct sets containing each of the following. Use list comprehension where appropriate to specify the elements of some of the sets.

• a set of all the males
• a set of all the females (feel free to expand these categories to allow for more genders if you like)
• a set of everyone under age 40
• a set of everyone age 40 or over
• a set of everyone with a salary under 10,000
• a set of everyone with a salary of 10,000 or more
• a set of all the males with a salary of 10,000 or more
• a set of all the females and everyone age 40 or over

Print out each set. If you just use a print command on a set of namedtuples, the output isn’t exactly pretty, but it will do for now. Feel free to clean it up if you like.

```
from collections import namedtuple
from decimal import Decimal
Person = namedtuple('Person', ['name', 'age', 'salary', 'gender'])
personSet = set()
while True:
    data = input('Enter name, age, salary, gender (comma separators), or \'quit\':')
    if data == 'quit':
        break
    data = data.split(',')
    personSet.add(Person(data[0], int(data[1]), Decimal(data[2]), data[3]))  # age as int, salary as Decimal

males = set([x for x in personSet if x.gender == 'm'])
females = personSet - males  # everyone not entered as 'm'
under40 = set([x for x in personSet if x.age < 40])
over40 = personSet - under40  # age 40 or over
under10k = set([x for x in personSet if x.salary < 10000])
over10k = personSet - under10k  # salary of 10,000 or more
malesOver10k = males & over10k
femalesPlusOver40 = females | over40

print('Males:', males)
print('Females:', females)
print('Under age 40:', under40)
print('Age 40 or over:', over40)
print('Salary under 10k:', under10k)
print('Salary 10k or more:', over10k)
print('Males with 10k or more:', malesOver10k)
print('All females & everyone age 40 or over:', femalesPlusOver40)

```

We define the namedtuple on line 3 and initialize the master set `personSet` on line 4.

The while loop on line 5 reads in the data for each person, with the fields separated by commas. Note that you will need to input the data separated only by commas, with no additional whitespace, in order for the `split(',')` call on line 9 to give clean fields. On line 10, we add a namedtuple to the set `personSet`.

After the user types `'quit'`, we build the sets starting on line 12. We use list comprehension to select all the namedtuples with male gender. The females set is then obtained by taking the difference between the master set and the males set.

Similarly, we construct the sets for the age groups and salary groups on lines 14 through 17.

The set of males with a salary of 10,000 or more is formed by the intersection of the `males` and `over10k` sets. The set consisting of all females and everyone age 40 or over is the union of `females` and `over40`.
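If you want to try the set logic without typing in data each time, here is a cut-down sketch that is not part of the solution above. It assumes a few made-up people in place of user input, and it uses set comprehensions (braces instead of `set([...])`), which build the set directly:

```python
from collections import namedtuple
from decimal import Decimal

Person = namedtuple('Person', ['name', 'age', 'salary', 'gender'])

# Hypothetical sample data standing in for the user's input.
people = {
    Person('Ann', 35, Decimal('12000'), 'f'),
    Person('Bob', 52, Decimal('8000'), 'm'),
    Person('Cat', 41, Decimal('15000'), 'f'),
    Person('Dan', 28, Decimal('11000'), 'm'),
}

# Set comprehensions select the matching namedtuples.
males = {p for p in people if p.gender == 'm'}
over10k = {p for p in people if p.salary >= 10000}
males_over10k = males & over10k

print(sorted(p.name for p in males_over10k))   # ['Dan']
```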
