Strings and formatting

A string in Python is a sequence of Unicode characters.

Defining strings

A string is an immutable object, defined by giving the character sequence between a pair of either single or double quotes. For example:

str1 = 'This is a test string between single quotes.'
str2 = "This is another string, between double quotes."
str3 = 'She said: "How\'s it going?"'
str4 = "He answered: \"I'm fine.\""

str1 and str2 show how a simple string can be defined using either type of quote. If we want a string to contain the same kind of quote mark that encloses it, we can use the escape (backslash) character to include it. Thus str3 is enclosed in single quotes, but we also want a single quote within the string, so we add a \ before the ' in How's. Similarly, in str4, since we use double quotes to enclose the string, we must escape any double quotes within the string.
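Escaping is only needed when the quote inside the string matches the quotes around it. As a small sketch (the variable names escaped and unescaped are just for illustration), the same text can be written both ways:

```python
# The same text written with and without a backslash escape.
escaped = 'How\'s it going?'      # single quotes outside, so ' must be escaped
unescaped = "How's it going?"     # double quotes outside, so ' needs no escape

print(escaped)
print(escaped == unescaped)       # the two strings hold identical characters
```

The backslash is not stored in the string; it only tells Python how to read the literal.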

Using a pair of single or double quotes is sufficient for a string that is on a single line. If we want a string to extend over more than one line, we can enclose it in a pair of triple quotes, as in:

""" A test
string on
several
lines. """
""" for x in range(15):
        print(x ** 2)
"""

The first example shows an ordinary string on 4 lines. The second example shows that the triple-quote technique can be used to comment out sections of code: a string that is not assigned to anything is evaluated and then discarded, so the code inside it never runs. Triple-quoted strings are often used to add multi-line comments to code as well.
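A triple-quoted string keeps the line breaks typed inside it as newline characters. A minimal sketch, with the illustrative names text and lines:

```python
# A triple-quoted string stores each line break as a '\n' character.
text = """A test
string on
several
lines."""

print(repr(text))           # repr() makes the embedded \n characters visible
lines = text.split('\n')    # one list entry per line of the original string
print(len(lines))
```

Here the string has 4 lines and therefore 3 newline characters between them.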

Accessing characters within strings

As mentioned above, a string is immutable, meaning that once it has been defined, it cannot be altered. However, individual characters or substrings within a string can be accessed on a read-only basis. We can use the same notation as that for lists to access single characters or combinations of characters from a string. For example:

str1 = 'This is a test string between single quotes.'
print(str1[5])
print(str1[3:8])
print(str1[-5:-1])
print(str1[2:8:3])
print(str1[1::2])

This gives the output (review the discussion of indexes and slicing in the post on lists):

i
s is
otes
ii
hsi  etsrn ewe igeqoe.

str1[5] returns the 6th character (since indexing starts at 0), which is i. str1[3:8] is the substring from index 3 to 7 (remember that slicing stops at the index before the number in the second slot), which is 's is ' (the final character is a space). str1[-5:-1] starts at the 5th character from the end and stops just before the last character, which gives 'otes'. str1[2:8:3] starts at index 2 (the third character) and goes up to index 7 in steps of 3, so it picks out indexes 2 and 5 and we get 'ii'. Finally str1[1::2] starts at the 2nd character and goes up to the end in steps of 2, so we get every second character.

Because of the immutability, an attempt to reassign a character or substring causes an error. Thus we cannot say str1[4] = 'x'.
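To see what happens in practice, here is a small sketch that attempts the assignment, catches the resulting TypeError, and then builds a modified copy from slices instead (str2 is an illustrative name):

```python
str1 = 'This is a test string between single quotes.'

try:
    str1[4] = 'x'             # item assignment is not allowed on a string
except TypeError as err:
    print('TypeError:', err)

# Build a new string from slices instead of changing the old one.
str2 = str1[:4] + 'x' + str1[5:]
print(str2)
print(str1)                   # the original string is untouched
```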

A string can also be iterated over in a for loop, just like a list. For example:

str1 = 'This is a test string between single quotes.'
count = 0
vowels = ['a','e','i','o','u']
for c in str1:
    if c in vowels:
        count += 1
print('Number of vowels:', count)

The for loop on line 4 iterates over each character in str1 and tests if that character is in the vowels list.
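The same count can be written more compactly. This is only one possible variant, assuming we also want uppercase vowels to be counted:

```python
str1 = 'This is a test string between single quotes.'

# A string is itself a sequence, so 'in' can test membership in 'aeiou' directly.
# lower() makes the test case-insensitive, so 'A' would count as well as 'a'.
count = sum(1 for c in str1.lower() if c in 'aeiou')
print('Number of vowels:', count)
```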

String operators

There are three arithmetic operators that are defined to work with strings. The + operator concatenates (joins) two strings and returns a new string with the result. Thus we have

hello = 'Hello!'
ask = 'How are things?'
print(hello + ' ' + ask)

This joins the strings hello and ask with a blank between them, so we see the output Hello! How are things?

The += operator generates a new string that is the concatenation of the string on the left with the string on the right. Thus hello += ask produces a new string with the name hello and value Hello!How are things?. Note that this reassigns the variable hello to point to a new string; it doesn’t change the old string that was originally assigned to hello.

The multiplication operator * takes a string and an int and duplicates the string the given number of times. For example, print((hello + ' ') * 5) produces the output Hello! Hello! Hello! Hello! Hello!.
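The following sketch puts the operators together and shows that += leaves the original string object unchanged. The extra name original is introduced here only to keep hold of the old string:

```python
hello = 'Hello!'
ask = 'How are things?'

original = hello        # a second name for the same string object
hello += ask            # rebinds hello to a brand-new string

print(original)         # Hello!
print(hello)            # Hello!How are things?
print('-' * 20)         # a row of 20 dashes
```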

Exercise

Keeping in mind that strings are immutable, write a program that takes a string as input and then prints out that string with every second character replaced by ‘x’.

See answer
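str1 = input('Enter a string: ')
str2nd = str1[::2]      # characters at even indexes (0, 2, 4, ...)
strx = ''
for c in str2nd:
    strx += c + 'x'     # keep one character, then put 'x' in place of the next
# For odd-length input the loop adds one 'x' too many, so trim to the original length
print(strx[:len(str1)])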

String methods

There are a large number of built-in methods for dealing with strings, so we don’t have room to go into them all here. To see a complete list, type help(str) at a prompt in the Python console, or look up a reference page on the web.

Two tools that we will use often are the built-in function len() and the string method split(). len() just returns the length (number of characters) of a string, so len(str1) gives us 44 for the string str1 above.

split(<delimiter>) splits the string it is called on into a list of substrings by chopping the string at each occurrence of the delimiter. It's useful for extracting separate bits of data from an input statement, for example. The <delimiter> can be either a single character such as a blank or a comma, or it can be a longer string. If no delimiter is specified, split() splits a string at any run of whitespace (blank, tab, return, etc.). split() does not accept a list of delimiters, so you can't, for example, specify that a string should be split at either a blank or a comma. To do that requires a regular expression, which we haven't covered yet.

A simple example is:

data = input('First name, last name, age: ')
dataList = data.split()
print(dataList)

If we enter Otis Wibble 43, the split() method produces the list ['Otis', 'Wibble', '43'] as output. Note that the '43' is a string, not an int, so if we want to treat it as an int, we need to convert it, as in age = int('43').
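As a further sketch, the code below uses an explicit multi-character delimiter and converts the age field to an int. The string assigned to data stands in for what a user might type at the input() prompt, and the variable names are illustrative:

```python
# Stand-in for the text a user might type at the input() prompt.
data = 'Otis, Wibble, 43'

fields = data.split(', ')         # multi-character delimiter: comma plus blank
first, last, age_text = fields    # unpack the three substrings
age = int(age_text)               # '43' (a str) becomes 43 (an int)
print(first, last, age + 1)

# With no argument, a run of whitespace counts as a single separator...
loose = 'a  b \t c'.split()
# ...but an explicit delimiter splits at every occurrence, leaving empty strings.
strict = 'a  b'.split(' ')
print(loose, strict)
```

Note that int() raises a ValueError if the text is not a valid integer, so real input usually needs some checking first.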

String formatting

One other string method deserves some attention. Frequently we wish to print out a string consisting of a combination of static text, variable text and variable numbers. The format() method is a powerful tool for doing this.

The simplest use of format() is to use a pair of empty braces {} as a placeholder for a value to be inserted at that point in the string. For example, here's a table of the squares from 1 to 10:

for i in range(1,11):
    print('{}**2 = {}'.format(i, i**2))

#Output
1**2 = 1
2**2 = 4
3**2 = 9
4**2 = 16
5**2 = 25
6**2 = 36
7**2 = 49
8**2 = 64
9**2 = 81
10**2 = 100

We specify a string to be printed ('{}**2 = {}') with two placeholders. We call the format() method on this string, where format() has two arguments, one for each placeholder.

If we don’t want the arguments to be printed out in the same order as they are given, we can use either an index number or a keyword in the placeholders. For example:

for i in range(1,11):
    print('{1} is {0} squared'.format(i, i**2))

for i in range(1,11):
    print('{square} is {num} squared'.format(num = i, square = i**2))

In the first example, index 0 corresponds to i and index 1 to i**2. In the second example, we’ve named the arguments so we can use these names in the placeholders.

Formatting numbers

When calculating with floating point numbers, the answers we get are frequently not in the form we want to display. A float may have too many digits after the decimal point, or we may want to see the result in scientific (exponential) notation and so on. format() allows us to specify the printed format of these numbers.

For example, suppose we want a table of the square roots of the numbers from 1 to 10, but we’d like only 3 decimal places in the answers. We can do this as follows:

from math import sqrt
for i in range(1,11):
    print('\u221a{0} = {1:8.3f}'.format(i, sqrt(i)))

The output is:

√1 =    1.000
√2 =    1.414
√3 =    1.732
√4 =    2.000
√5 =    2.236
√6 =    2.449
√7 =    2.646
√8 =    2.828
√9 =    3.000
√10 =    3.162

In the print() call, we've used \u221a to print the square root sign, since its Unicode code point is U+221A. The first placeholder is just {0}, indicating that the argument with index 0 in the format() call is to be placed here, with no modifications.

The format of the square root value is given as {1:8.3f}. The 1 indicates that the value with index 1 in the format argument list is to be inserted here. The 8 means that a minimum of 8 spaces is allocated to the value, and the .3f indicates that the value is a float (the ‘f’) and should be displayed with 3 decimal places.

For large numbers, exponential notation is often more convenient. Here’s a table of the 10th powers of some numbers:

for i in range(10,21):
    print('{0}^10 = {1:10.5e}'.format(i, i**10))

The output is

10^10 = 1.00000e+10
11^10 = 2.59374e+10
12^10 = 6.19174e+10
13^10 = 1.37858e+11
14^10 = 2.89255e+11
15^10 = 5.76650e+11
16^10 = 1.09951e+12
17^10 = 2.01599e+12
18^10 = 3.57047e+12
19^10 = 6.13107e+12
20^10 = 1.02400e+13

The placeholder {1:10.5e} says to insert the value with index 1, using a minimum width of 10 characters and 5 decimal places; the 'e' says to use exponential notation. Each value in this table needs 11 characters, so the minimum width has no visible effect here.

There are several other options for outputting strings and numbers, including instructions for aligning the output. They are described in the "Format Specification Mini-Language" section of the Python documentation.
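As a brief sketch of a few of these options (the variable names are illustrative), the alignment codes <, > and ^ go after the colon, optionally preceded by a fill character:

```python
# Alignment codes: < left, > right, ^ centered; the number sets the minimum width.
left = '{:<8}|'.format('tea')         # 'tea     |'
right = '{:>8}|'.format('tea')        # '     tea|'
centered = '{:*^9}'.format('tea')     # '***tea***' (fill character * before ^)
grouped = '{:,}'.format(1234567)      # '1,234,567' (thousands separator)

for text in (left, right, centered, grouped):
    print(text)
```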

Formatted string literals

A quick method of formatting output without calling the format() method is the formatted string literal, or f-string (available from Python 3.6). The syntax is as shown:

x = 42
message = 'everything'
print(f'The answer to life, the universe and {message} is {x}')

#Output
The answer to life, the universe and everything is 42

In the print() call, we prefix the string to print with either f or F. Wherever we want to print the value of some Python expression, we enclose that expression in braces {}. Thus the value of message ('everything') is printed instead of {message} and the value of x (42) is printed instead of {x}.

For numerical output, we can also use the formatting codes described above. For example

print(f'The value of 1/7 to 4 decimal places is {1/7:.4f}')

The expression 1/7 inside the braces is calculated, and the result is printed with 4 decimal places, so we get ‘The value of 1/7 to 4 decimal places is 0.1429’.
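As a sketch of how the two styles correspond, here are the first few rows of a square root table written with f-strings. The list name rows is illustrative; the format code after the colon is 8.3f, a minimum width of 8 with 3 decimal places:

```python
from math import sqrt

# The expression goes before the colon, the format code after it.
rows = [f'\u221a{i} = {sqrt(i):8.3f}' for i in range(1, 4)]
for row in rows:
    print(row)
```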

Old style string formatting with the % operator

For quick formatting in a style that will be familiar to C and C++ programmers, we can use the % operator. Using this format, we can rewrite the previous example:

x = 42
message = 'everything'
print('The answer to life, the universe and %s is %d' % (message, x))

The %s and %d serve as placeholders for the quantities specified after the % operator that separates the string from the tuple (message, x). The %s indicates that a string will be placed at that location, and %d indicates that an int will be placed there. Outputting a float with %d truncates the number to an int.

Other codes include %f for a float and %x for a hexadecimal representation of an int. In practice, the %s code will accept pretty much any object, as long as it can be converted to a string.
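A few of these codes in action, as a minimal sketch with illustrative variable names:

```python
pi_text = '%.3f' % 3.14159     # '3.142'  (float with 3 decimal places)
hex_text = '%x' % 255          # 'ff'     (hexadecimal form of an int)
truncated = '%d' % 3.99        # '3'      (the float is truncated, not rounded)
any_obj = '%s' % [1, 2]        # '[1, 2]' (any object that converts to a string)

print(pi_text, hex_text, truncated, any_obj)
```

When only one value is being inserted, it can follow the % operator directly rather than being wrapped in a tuple.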

Printing strings

Apart from the formatting options for printing out values, there are a couple of useful tweaks we can apply in a print() statement. If we print out a sequence of values as in print('The value of x is', x), by default the print statement inserts a blank between ‘The value of x is’ and the value of x. We can change this by specifying the sep argument, as in print('The value of x is', x, sep = '...'). Instead of a single blank, the two outputs are now separated by 3 dots.

By default, a print statement prints an end of line character at the end. This can be overridden by specifying the end argument. So if we say print('The value of x is', x, end = ';') the output will end with a semicolon and without a newline, so we can then give another print command to print more on the same line.
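To see exactly which characters print() writes, the sketch below sends the output to an in-memory buffer through print()'s file argument and then displays it with repr(). The buffer is used only to make the separators and line endings visible:

```python
import io

x, y = 7, 8
buf = io.StringIO()                                   # in-memory text buffer

print('The value of x is', x, sep='...', file=buf)    # '...' instead of a blank
print('x =', x, end='; ', file=buf)                   # no newline after this one
print('y =', y, file=buf)                             # continues on the same line

print(repr(buf.getvalue()))
```

The displayed text is 'The value of x is...7\nx = 7; y = 8\n', which shows that the second and third calls share one line.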

Exercise

Write a program that generates a table of cube roots of the integers from 1 to 20. The output should use the Unicode symbol ∛ for the cube root, and should display the results in the following format:

  ∛1 = 1.000
  ∛2 = 1.260
  ∛3 = 1.442
  ∛4 = 1.587
  ∛5 = 1.710
  ∛6 = 1.817
  ∛7 = 1.913
  ∛8 = 2.000
  ∛9 = 2.080
 ∛10 = 2.154
 ∛11 = 2.224
 ∛12 = 2.289
 ∛13 = 2.351
 ∛14 = 2.410
 ∛15 = 2.466
 ∛16 = 2.520
 ∛17 = 2.571
 ∛18 = 2.621
 ∛19 = 2.668
 ∛20 = 2.714

Note that the equals signs are all aligned, and the values in the left-most column are right-aligned. Hint: look up options for aligning text in the format() statement.

See answer

The Unicode values for mathematical symbols are listed in the Unicode code charts. The code point for the cube root sign is U+221B, written '\u221b' in a Python string.

One program which prints out the required table is:

for i in range(1,21):
    print('{0:>4} = {1:5.3f}'.format('\u221b' + str(i), i ** (1/3)))

In the first placeholder {0:>4} we use the > symbol to indicate that the value should be right-aligned within a minimum of 4 spaces. The other placeholder {1:5.3f} outputs index 1 in a minimum of 5 spaces with 3 decimal places in float format.
