1 Working with Data

Questions:

How do I deal with data in Python?

Objectives:

Defining variables and using them in functions
Using simple data objects: strings and numbers
Using package functions and object methods

Keypoints:

variables are handy names for data objects
data objects are used by functions
functions can be stored in packages
methods available depend on the object we’re talking about

In any Python program we will have some data and some objective to achieve - something to do with that data. Python provides many data types and many ways of working with that data.

Manipulating data is done with functions. Data is stored in objects. In this section we’ll look at functions and objects and their interaction.

Let’s start with the simplest example of this workflow: take a string (an object that holds text) and pass it to a function.

print("Hello, world!")

Hello, world!

In this example the string "Hello, world!" is given to the print() function. Functions are bits of code that do something to data. The print() function just prints the data that you pass it to the screen. We pass data to functions by putting the data in the brackets after the function name.

Tip!

Data is a very big word in programming and it carries a lot of meaning. It can mean very small bits of stuff, like a single text character, or huge amounts of stuff, like the images collected by the Hubble telescope. It's all the same to the computer and programming language.

1.1 Using variables as names for data

We don’t usually use data directly. Instead we use a name that refers to a piece of data - a variable. Variables are just names that represent a bit of data. They are called variables because the data a name is associated with can change. We assign a name to data using the assignment symbol, the = sign. The data associated with a variable can be changed by re-assignment, allowing us to reuse the name.

We can use variables as if they are the data they point to:

x = "Hello, world!"
print(x)

x = 100
print(x)

Hello, world!
100

Variables are also independent of one another: each holds its own value, so actions on one don’t affect another.

weight_kg = 65

weight_pounds = 2.2 * weight_kg

print(weight_kg)

print(weight_pounds)

65
143.0
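To make the independence visible, here is a small sketch that extends the example above. The re-assignment of weight_kg to 100 is an illustrative addition and is not part of the original example.

```python
weight_kg = 65
weight_pounds = 2.2 * weight_kg   # calculated once, using the value 65

weight_kg = 100                   # re-assign the first variable

print(weight_kg)                  # 100
print(weight_pounds)              # still 143.0, it is not recalculated
```

weight_pounds keeps the value it was given when it was created. To update it we would have to run the calculation again.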

1.2 Python has three main types of function

Python is full of functions. So many in fact that coming up with names can be a problem! This is a serious issue as you have to refer to functions by name. To resolve this problem Python keeps its functions in different places in its library, and the way we call the function changes depending on where the function ‘lives’. There are three basic function types:

Built-In Functions and Operators
Package Functions
Object Methods

1.2.1 Built-In Functions and Operators

Python has some functions that can be called directly from anywhere in a program. These are listed here https://docs.python.org/3.3/library/functions.html. We’ve already seen print() and there are some common ones we’ll come across later. You can spot a built-in function because it is called by its bare name followed by brackets, with no package or variable name and dot in front.

Related to built-in functions are operators. The ones you’ll use most are the mathematical operators, which work as you would expect from your knowledge of maths. They include +, -, * and /.

print(1 + 1)
print(2 * 2)
print(3 - 3)
print(4 / 4)

Some operators change what they do depending on the things you ask them to operate on. For instance, we can add strings?!

print( "Hello" + "World!")

HelloWorld!

This is supposed to be a way of making the language more intuitive and readable at the user end of things. Most times you see operators, they should be pretty obvious.
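As a short sketch of how the same operators behave with different kinds of data, compare the lines below. The values are made up for illustration.

```python
print(1 + 1)        # adding integers gives 2
print("1" + "1")    # adding strings joins them, giving 11
print(4 / 4)        # division gives a number with a decimal part: 1.0
print("ab" * 3)     # a string times a whole number repeats it: ababab
```

Note that / returns 1.0 rather than 1. In Python 3 division always produces a floating point number.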

1.2.2 Functions from packages

Another source of functions is packages - extensions to Python for use in particular problem domains. We can load a package using import. Let’s import the random package, which comes with Python and provides functions for generating random numbers.

import random

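We can access the functions in this package by using the package name, the dot (.) syntax and the function name. Let’s call the random function randrange(), which gives us a random whole number between two limits. The lower limit is included and the upper limit is not, so the call below returns a number from 1 to 9.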

number = random.randrange(1, 10)

print( number )
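One way to check the limits of randrange() is to draw many numbers and look at the smallest and largest. This is only a sketch: the name draws, the count of 1000 and the seed value 42 are arbitrary choices, and the list syntax used here goes a little beyond what this section has covered.

```python
import random

random.seed(42)   # optional: makes the same numbers come out on every run

# Draw 1000 random numbers and keep them all in a list
draws = [random.randrange(1, 10) for _ in range(1000)]

print(min(draws))   # smallest number drawn, never below 1
print(max(draws))   # largest number drawn, never above 9
```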

1.2.3 Object Methods

Lumps of data are represented in the computer in things called objects. An object is basically a bit of data with some functions attached. This means each piece of data comes with the code to manipulate it. These attached functions are called methods.

We access an object’s methods using the . syntax again, so this time we write variable_name.method(). Read this as telling the object the variable points at to do method() to itself. This is simpler than it sounds.

Consider a variable x pointing to a string object. We might use it with .capitalize() as follows:

x = "hello, world!"

print( x.capitalize() )

Hello, world!

This does mean that the methods are closely tied to the data. Look what happens when we try to use .capitalize() on a number:

y = 100

print( y.capitalize() )

AttributeError: 'int' object has no attribute 'capitalize'

Python just throws an error. Basically this error is saying that int (integer, a number object) doesn’t have a method called capitalize.

Loosely, methods are functions that only apply to particular object types.

We need to know what methods an object has before we can use them. We can find this out by reading the documentation for the object type, and Python will tell us the type with the type() function.

print( type(x) )

<class 'str'>

We can see that x contains a str - a string. The easiest place to find the Python documentation is online. Googling Python 3 str shows us this page https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str, which lists all the string methods. The same approach works for finding the methods of other object types.
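Alongside the online documentation, Python can report an object's methods itself. The sketch below uses the built-in functions dir() and hasattr(), which this section has not introduced, so treat it as an optional extra.

```python
x = "hello, world!"

# dir() lists the names of everything attached to the object
print(dir(x))

# hasattr() checks for one method by name
print(hasattr(x, "capitalize"))     # True
print(hasattr(100, "capitalize"))   # False
```

The list from dir() is long and includes names that start and end with double underscores. Those are used internally by Python and can be ignored for now.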

Roundup

We have discussed three types of function. The basic type is Python’s built-in functions, which are available anywhere in a Python program. The second, package functions, must be imported with import and accessed using the package name and the dot. The third is the method, which is a function tied to an object and is accessed using the variable name and the dot.
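The three call styles can be put side by side in a short sketch. The variable word and the choice of len(), math.sqrt() and .upper() are illustrative and are not taken from the examples above.

```python
import math

word = "python"

print(len(word))       # built-in function, called by name alone: 6
print(math.sqrt(16))   # package function, package name then dot: 4.0
print(word.upper())    # object method, variable name then dot: PYTHON
```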

1.3 Objects and types in Python

Let’s examine some object types.

Python knows many types of object, and every piece of data is an object of some type. Three common object types are:

strings
integer numbers
floating point numbers

1.3.1 String Objects

The term ‘string’ is computer jargon for text data, usually a single lump of text data treated as a whole. To create a string we simply have to add single or double quotes around some text, for example:

weight_kg_text = 'weight in kilograms'

Having digits in there doesn’t make a string a number type - the following is still a string - it just happens to be made up of number-like characters. Let’s see what happens if we try to treat it like a number.

phone_number = "01818118181"

print( phone_number * 2 )

0181811818101818118181

Here the * operator works differently on a string: it has repeated the string! This only works because * is defined for a string and a whole number. Usually, giving an operator data types it cannot combine makes Python stop with an error.
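As a sketch of the more usual outcome, adding a number to a string stops the program with a TypeError. The try and except lines are only there so the example keeps running after the error; they are not covered in this section. The last line shows one possible fix, converting the text to a number with int() first.

```python
phone_number = "01818118181"

try:
    print(phone_number + 2)        # a string plus a number is not allowed
except TypeError as error:
    print("TypeError:", error)     # the exact message depends on the Python version

print(int(phone_number) * 2)       # as a number: 3636236362
```

Converting with int() drops the leading zero, which is one reason phone numbers are normally kept as strings.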

1.3.1.1 Indexing a string (Slicing)

We can access a single character in a string using indexing - basically asking for a character at a position. The syntax uses the square brackets.

print( phone_number[2] )

Note that using the index [2] gives us the third character - computer languages tend to count from 0.

We can get a longer subsection of a string using slicing - a technique that accesses a part of the data given a start and end point along the string.

dialling_code = phone_number[0:3]
print(dialling_code)

We just use the square brackets to indicate the start and stop points of the slice we want to extract, literally [start:end]. The start here is 0, meaning the first character. The end here is 3, which means up to but not including position 3.

print( dialling_code )

That’s why we only get the first three characters from the string (positions 0, 1 and 2) and not four (positions 0, 1, 2 and 3). The way to remember this is that the length of the resulting slice is end - start.

There are many string operations and methods; you can see them in the documentation at https://docs.python.org/3/library/stdtypes.html#textseq
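The end - start rule can be checked directly with the built-in len() function. In this sketch the names start, end and piece are illustrative.

```python
phone_number = "01818118181"

start = 0
end = 3
piece = phone_number[start:end]

print(piece)                        # 018
print(len(piece) == end - start)    # True

print(phone_number[3:7])            # 1811, four characters because 7 - 3 = 4
```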

1.3.2 Number Objects

Numbers in Python come in two main types: whole numbers (called integers) and numbers with a decimal part (called floating point numbers).

In the example above, variable weight_kg has an integer value of 65. To create a variable with a floating point value, we can execute:

weight_kg = 65.0

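The difference is important in some cases. For example, dividing with / always gives a floating point number, and int() drops the decimal part rather than rounding. You can convert between types explicitly using the int() and float() functions.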

print( float(1) + 3 )
print( int( 10.0 / 3.0 ) )

4.0
3
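A few more conversions, as a sketch of where the difference between the two number types shows up. The values are arbitrary, and round() and the // operator are extras not used elsewhere in this section.

```python
print(int(3.9))        # 3, the decimal part is dropped, not rounded
print(round(3.9))      # 4, round() goes to the nearest whole number
print(float("2.5"))    # 2.5, text converted to a floating point number
print(7 / 2)           # 3.5, / gives a floating point number
print(7 // 2)          # 3, // keeps only the whole part of the division
```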

1.4 Quiz

What values do the variables mass and age have after each statement in the following program? Test your answers by executing the commands.

 mass = 47.5
 age = 122
 mass = mass * 2.0
 age = age - 20

What does the following program print out?

 first, second = 'Grace', 'Hopper'
 third, fourth = second, first
 print(third, fourth)

Recall that a section of a string is called a slice. Work on the slices below.

 element = 'oxygen'
 print('first three characters:', element[0:3])
 print('last three characters:', element[3:6])

What is the value of element[:4]?
What about element[4:]?
Or element[:]?
What is element[-1]?
What is element[-2]?
Given those answers, explain what element[1:-1] does.

Fix the capitalisation in messy_string

messy_string = "OH mY, ThESE LEtters Are ALL OVER The PLace!"

Hint: Think about standardising the letters by e.g. making them all one case, then fixing the capitalisation from there.

Check out the math package at https://docs.python.org/3/library/math.html and import it.

What is the arc sine of -1, 0, 1 in radians?
How many degrees in \(\arcsin(-1)\) radians?