Python Tutorial

The python tutorial of ‘start science here!’ certainly is the longest and most exhaustive part. Python is for a computer scientist (computational chemists to be exact) the toolkit with which they do their work. Just like a carpenter needs their hammers or an organic chemist needs their flasks, we use python for our work.

First things first: You don’t have to work through the python tutorial from start to finish. For a start the following Quick Reference might suffice. But if you find the time you can head over to binder and start with the notebooks in the python_tutorial folder. You can stop at any point and do some other tutorials or work on your own project. But let me tell you, that I learned new stuff while composing the tutorial and I was using python daily for 3 years at this point.

Introduction to Jupyter Notebooks

Before clicking on that binder link and starting a python session read this short introduction about jupyter notebooks, how to work with them and how to execute code inside of them.

Interactive Tutorial Notebooks

Available on binder:

Binder link to repo.

Non-interactive static notebooks

An offline, non-interactive version of the notebooks can be found via this menu:

Quick Reference

The python quick reference or cheat sheet was taken from https://github.com/justmarkham/python-reference and adjusted for start science here. The remainder of this page will contain this quick reference. Each topic has a direct link to the examples in binder and an offline version of the examples.

Imports

Import modules with or without alias

import math
import numpy as np
print(math.pi)
print(np.pi)

Multiple module imports and multiple imports from a module

import os, sys
from numpy.random import random, randint
print(os.getcwd())
print(sys.version)
print(random())
print(randint(100))

Import all objects from a module (this is generally discouraged, as it makes your namespace messy)

from math import *
print(pi)
print(__doc__) # example of messy namespace

Check the attributes and methods of any object.

import math
from numpy import ndarray
print(dir(math))
print(dir(ndarray))

Data Types

To quickly start into an interactive environment with infos about data types:

Binder link to qr_01_data_types.ipynb Link to offline 01_data_types.ipynb

Determine the type of an object:

import math
print(type(math))    # module
from numpy import ndarray
print(type(ndarray)) # classes will print `type`
from numpy.random import random
print(type(random))  # builtin_function_or_method
print(type(1))       # int (integer)
print(type(1.1))     # float (floating point number)
print(type('hello')) # str (string)
print(type(True))    # bool (boolean, True/False values)
print(type(None))    # NoneType (NoneType is a singleton)

Check the type of an object:

isinstance(2, int) # True
isinstance(2.2, (float, int)) # True

Converting between types:

float(2) # 2.0
int(2.9) # 2
int(3.9) # 3
str(2.2) # '2.2'

Boolean values of built-in types:

bool(0)    # False
bool(None) # False
bool('')   # False
bool([])   # False (empty list)
bool({})   # False (empty dict)

True for variables with values and non-empty containers:

bool(2)     # True
bool([2])   # True
bool('two') # True

Math

Arithmetic operations

10 + 4        # addition
10 - 4        # subtraction
10 * 4        # multiplication
10 / 4        # division (python 2.7 has a quirk here and wont return the expected result)
10 ** 4       # exponent
10 // 4       # floor division
10 % 4        # modulo
divmod(10, 4) # returns tuple of floor division and modulo

Comparsions and boolean operations

Assigning objects to variables:

a = 5

Arithmetic comparisons:

x > 3  # True
x >= 3 # True
x != 3 # True
x == 3 # False

Difference between == and is:

a = []
b = []
a == b # True
a is b # False
a is a # True

Boolean operations and combinations:

5 > 3 and 6 > 3 # True
5 > 3 or 5 < 3  # True
not False       # True
False or not False and True # ?

Control Flow

If/elif/else:

if x > 0:
    print('positive')
elif x == 0:
    print('zero')
else:
    print('negative')

For loops (can contain else to do something after the loop is complete):

for i in range(5):
    print(i ** 2)
else:
    print("for loop closed")

Continue and break:

for i in range(10):
    if i == 0:
        # don't print 0
        continue
    if i == 5:
        # break loop after 5
        break
    print(i)

While loops continue running until either break occurs or the statement becomes false.

max_square = 50
i = 0
while i ** 2 <= max_square:
    print(i ** 2)
    i += 1
    if i > 100:
        break # security break

Lists

Properties of lists: ordered, iterable, mutable

Empty lists (bool([]) = False):

empty_list = []
empty_list = list()

Create a list:

simpsons = ['homer', 'marge', 'bart']

Examine a list:

Access an element in the list with indexing ([index]) and multiple elements with slicing ([start:stop:step]). Get length with the built-in len() function.

print(simpsons[0])
print(len(simpsons))

Modify lists (does not return lists, but rather changes the list in-place).

simpsons.append('lisa')                # add a single element
simpsons.extend(['itchy', 'scratchy']) # add all elements from another list
simpsons.insert(0, 'maggie')           # add element at index
simpsons.remove('bart')                # remove this specific element
simpsons.pop(0)                        # remove element at index and return that element
del simpsons[0]                        # remove that element and not return it
simpsons[0] = 'krusty'                 # overwrite indexed element

Addition of lists returns new lists:

neighbors = simpsons + ['ned', 'rod', 'todd']

Count element in list.

simpsons.count('lisa')      # counts the number of instances
simpsons.index('itchy')     # returns index of first instance

List slicing:

weekdays = ['mon', 'tues', 'wed', 'thurs', 'fri']
weekdays[0]         # element 0
weekdays[0:3]       # elements 0, 1, 2
weekdays[:3]        # elements 0, 1, 2
weekdays[3:]        # elements 3, 4
weekdays[-1]        # last element (element 4)
weekdays[::2]       # every 2nd element (0, 2, 4)
weekdays[::-1]      # backwards (4, 3, 2, 1, 0)

Alternative to [::-1]: list(reversed(…))

list(reversed(weekdays))

Sorting lists:

Sorts a list (alphanumerical) in-place. This overwrites the original list and changes the order of the elements:

simpsons.sort()
simpsons.sort(reverse=True)     # sort in reverse
simpsons.sort(key=len)          # sort by a key

To get a new sorted list, without changing the original:

sorted(simpsons)
sorted(simpsons, reverse=True)
sorted(simpsons, key=len)

Insert into already sorted list:

num = [10, 20, 40, 50]
from bisect import insort
insort(num, 30)

Creating a variable from a new list does not copy it. Instead you are creating a reference to the same list:

same_num = num
print(same_num)
same_mun[0] = 0
print(num)
print(same_num)

If you want to keep num and copy it to same_num, here are three ways:

same_num = num[:]
same_num = list(num)
import copy
same_num = copy.deepcopy(num)

Copying elements more thorough:

import copy

a = [1, 2, 3]
b = [4, 5, 6]
c = [a, b]

Normal assignment:

d = c

print(c == d)          # True - equality == is not identity is
print(c is d)          # True - d is the same object as c
print(c[0] is d[0])    # True - d[0] is the same object as c[0]

Shallow (normal) copy constructs a new compound object but keeps the underlying references to already existing objects:

d = copy.copy(c)

print(c == d)          # True - equality == is not identity is
print(c is d)          # False - d is now a new object
print(c[0] is d[0])    # True - d[0] is the same object as c[0]

Deepcopy constructs a new compound object and then, recursively, inserts copies into the new object found in the original object:

d = copy.deepcopy(c)

print(c == d)          # True - equality == is not identity is
print(c is d)          # False - d is now a new object
print(c[0] is d[0])    # False - d[0] is now a new object

Tuples

Properties of lists: ordered, iterable, immutable

The main difference between tuples and lists is, that tuples are immutable. You can not change an element inside a tuple with assignment (1, 2, 3)[2] = 4.

Empty tuples:

a = tuple()
b = ()
print(bool(a), bool(b))

Tuples with values:

digits = (0, 1, 'two')          # create a tuple directly
digits = tuple([0, 1, 'two'])   # create a tuple from a list
zero = (0,)                     # trailing comma is required to indicate it's a tuple

Getting values from a tuple:

digits[2]           # returns 'two'
len(digits)         # returns 3
digits.count(0)     # counts the number of instances of that value (1)
digits.index(1)     # returns the index of the first instance of that value (1)

Assigning values fails:

digits[2] = 2       # throws an error

You need to compose new tuples to make that work:

new_digits = (*digits[:1], 2) # more on the asterisk in Packing and Unpacking

Or concatenate tuples:

digits = digits + (3, 4)

Multiplication (also works with lists):

(3, 4) * 2          # returns (3, 4, 3, 4)

Sort a list of tuples:

tens = [(20, 60), (10, 40), (20, 30)]
sorted(tens)        # sorts by first element in tuple, then second element
                    #   returns [(10, 40), (20, 30), (20, 60)]

Unpack tuples:

bart = ('male', 10, 'simpson')  # create a tuple
(sex, age, surname) = bart      # assign three values at once
print(sex)
print(age)
print(surname)

Strings

Properties of lists: iterable, immutable

Creating strings:

Unpack tuples:

s = str(42)
s = str(1.2)
s = str(True)
s = 'Hello World!'

String slicing is like list slicing:

s[:6] # returns 'Hello '
s[7:] # returns 'orld!'
s[-1] # returns '!'

String methods. A string is immutable. It can not be changed in place (just like sets):

s.lower()           # returns 'hello world!'
s.upper()           # returns 'HELLO WORLD!'
s.startswith('H')   # returns True
s.endswith('orld!') # returns True
s.isdigit()         # returns False (returns True if every character in the string is a digit)
str.isdigit('1.23') # Is also False, because '.' is not a digit, although that string could be a float.
s.find('World')     # returns index of first occurrence (6), but doesn't support regex
s.find('Planet')    # returns -1 since not found
s.replace('Hello', 'Goodbye') # returns a string where all instances of 'Hello' are replaced with 'Goodbye'

A string can be split into lists. Default delimiter is space (’ ‘). But different delimiters can be passed. This functionality can come in useful when parsing text files:

s = 'I like you'
s.split(' ')        # returns ['I', 'like', 'you']
s.split()           # equivalent (since space is the default delimiter)
s2 = 'a, an, the'
s2.split(',')       # returns ['a', ' an', ' the']

A list of strings can be joined into a single string:

stooges = ['larry', 'curly', 'moe']
' and '.join(stooges)   # returns 'larry and curly and moe'

Arithmetic operations on strings. Some arithmetic operations work on strings. Addition, for example, concatenates strings:

s3 = 'The meaning of life is'
s4 = '42'
s3 + ' ' + s4       # returns 'The meaning of life is 42'
s5 = 'kartoffe' + 'lauf' * 2
print(s5)

Removing whitespaces and unwanted leading/trailing characters:

s6 = '   spam    '
s6.strip()         # returns 'spam'
s7 = '$ ls && pwd'
s7.lstrip('$ ')    # returns 'ls && pwd'

String substitutions:

'raining %s and %s' % ('cats', 'dogs')                       # old way
'raining {} and {}'.format('cats', 'dogs')                   # newer way
'raining {arg1} and {arg2}'.format(arg1='cats', arg2='dogs') # named arguments
arg1 = 'cats'
arg2 = 'dogs'
f'raining {arg1} and {arg2}'                                 # newest way

String formatting:

# whitespace alignment
animals = ['cat', 'dog', 'horse', 'crocodile', 'pidgeon', 'tortoise']
for a in animals:
    print(f'The {a:<10} is an animal')

numbers = [1, 200, -3, -5.234, 3.14159, -3.14159]
for n in numbers:
    print(f'{n:-.2f}')
for n in numbers:
    print(f'{n:5.3f}')
for n in numbers:
    print(f'{n:07.2f')

Normal strings vs raw strings:

print('first line\nsecond line')    # normal strings allow for escaped characters
print(r'first line\nfirst line')    # raw strings treat backslashes as literal characters

Dictionaries

Properties of unordered, iterable, mutable

made of key-value pairs

keys can be str, numbers, tuples

values can be anything

Create empty dictionaries:

empty_dict = {}
empty_dict = dict()
bool(empty_dict) # is False

create a dictionary (two ways):

family = {'dad':'homer', 'mom':'marge', 'size':6}
family = dict(dad='homer', mom='marge', size=6)

Convert a list of tuples into a dict:

list_of_tuples = [('dad', 'homer'), ('mom', 'marge'), ('size', 6)]
family = dict(list_of_tuples)

Convert two lists into a dict:

keys = ['dad', 'mom', 'size']
values = ['homer', 'marge', 6]

family = dict(zip(keys, values))

Examing a dict:

family['dad']       # returns 'homer'
len(family)         # returns 3
'mom' in family     # returns True
'marge' in family   # returns False (only checks keys)

Accessing keys, value and items:

list(family.keys())       # keys: ['dad', 'mom', 'size']
list(family.values())     # values: ['homer', 'marge', 6]
list(family.items())      # key-value pairs: [('dad', 'homer'), ('mom', 'marge'), ('size', 6)]

Why the extra list()?

Calling family.keys() and the other dict methods doesn’t return a list. It returns what is called a view object.

Dictionaries are mutable, so they can be changed in-place.

family['cat'] = 'snowball'              # add a new entry
family['cat'] = 'snowball ii'           # edit an existing entry
del family['cat']                       # delete an entry
family['kids'] = ['bart', 'lisa']       # dictionary value can be a list
family.pop('dad')                       # remove an entry and return the value ('homer')
family.update({'baby':'maggie', 'grandpa':'abe'})   # add multiple entries

Accessing values via indexing and using the get() method:

family['mom']                       # returns 'marge'
family.get('mom')                   # equivalent
family['grandma']                   # throws an error since the key does not exist
family.get('grandma')               # returns None instead
family.get('grandma', 'not found')  # returns 'not found' (the default)

Accessing lists in dictionaries:

family['kids'][0]                   # returns 'bart'
family['kids'].remove('lisa')       # removes 'lisa'

String substitution usind dicts:

'youngest child is %(baby)s' % family   # returns 'youngest child is maggie'

Sets

properties: unordered, iterable, mutable, can contain multiple data types

Sets are like dictionaries with just keys. They also use the curly braces in their definition. That’s why you can’t create an empty set with {}, which would return an emtpy dict.

empty_set = set()
bool(empty_set)   # is False

Sets with values in them:

languages = {'python', 'r', 'java'}         # create a set directly
snakes = set(['cobra', 'viper', 'python'])  # create a set from a list

Examine a set:

len(languages)              # returns 3
'python' in languages       # returns True

Set oprations:

languages & snakes          # returns intersection: {'python'}
languages | snakes          # returns union: {'cobra', 'r', 'java', 'viper', 'python'}
languages - snakes          # returns set difference: {'r', 'java'}
snakes - languages          # returns set difference: {'cobra', 'viper'}

Modify a set. Sets are mutable and thus can be altered in-place:

languages.add('sql')        # add a new element
languages.add('r')          # try to add an existing element (ignored, no error)
languages.remove('java')    # remove an element
languages.remove('c')       # try to remove a non-existing element (throws an error)
languages.discard('c')      # remove an element if present, but ignored otherwise
languages.pop()             # remove and return an arbitrary element
languages.clear()           # remove all elements
languages.update(['go', 'spark'])  # add multiple elements (can also pass a set)

Get a sorted list of unique elements from a set:

sorted(set([9, 0, 2, 1, 0]))    # returns [0, 1, 2, 9]

Beware: Sometimes the list(set(list(...))) construction is used to remove duplcate entries from lists. However, because sets are inherently unordered the resulting list might have the wrong order of elements.

Defining functions

Define a function with def:

def print_text():
    print('this is text')

Calling the function. Calls are always executed, when an object is followed by parentheses. That’s why a = 3; a() will raise the Error: TypeError: int is not callable. But a function can be called with:

print_text()

Create a function with one positional argument:

def print_this(x):
    print(x)

Calling this function. If a function has a return statement, it can return objects. However, the print_this() function has no return statement and, thus, returns the default None.

print_this(3)       # prints 3
n = print_this(4)   # prints 4 and assigns None to n
print(n)            # prints None

Return statements are written like this:

def square_this(x):
    return x**2

Adding documentation to your function. Some people might want to use your function but don’t want to read the whole code of the function. You can help them with a so-called docstring (three consecutive doulbe quotes or single quotes """ or ''') and give a summray of a functions code. That way it’s easier to reuse the function.

def square_this(x):
    """Returns the square of the provided argument."""
    return x**2

Call the function or assign the return value to a new variable:

square_this(3)          # prints 9
var = square_this(4)    # assigns 16 to var, but does not print 16

Positional arguments and Keyword arguments. Keyword arguments have default values and are defined like so:

def calc(a, b, op='add'):
    if op == 'add':
        return a + b
    elif op == 'sub':
        return a - b
    else:
        print('valid operations are add and sub')

Calling this function:

calc(10, 4, op='add')   # returns 14
calc(10, 4, 'add')      # also returns 14: unnamed arguments are inferred by position
calc(10, 4)             # also returns 14: default for 'op' is 'add'
calc(10, 4, 'sub')      # returns 6
calc(10, 4, 'div')      # prints 'valid operations are add and sub'

Placeholder functions. If you know you want to implement some functions, you can use pass as a placeholder (a function with an empty body is not allowed):

def stub():
    pass

Multiple vlues are returned as tuples:

def min_max(nums):
   return min(nums), max(nums)

Call this function:

nums = [1, 2, 3]
min_max_num = min_max(nums)      # min_max_num = (1, 3)
print(type(min_max_num))         # tuple
min_num, max_num = min_max(nums) # direct unpacking

Packing and Unpacking

Use * to unpack lists and other iterables (like tuples) into multiple arguments:

def addition_of_four(a, b, c, d):
    return a + b + c + d

mylist = [1, 2, 3, 4]

added_numbers = addition_of_four(*mylist) # works
added_numbers = addition_of_four(mylist)  # won't work.

Instead of fixing the numbers of arguments to a function, you can let the function take an arbitrary amount of arguments (this is called packing). The *args is a convention. You don’t need to call it *args, you can also call it *numbers.

def addition(*args):
    return sum(args)

addition(1, 5, 10, 20)
additon(1, -1)

That’s it for unnamed (positional) arguments. But what about named arguments? For that we have **:

def subtraction(minuend=0, subtrahend=0):
    return minuend - subtrahend

mydict = {'minuend': 10, 'subtrahend': 5}

subrtaction(**mydict)

Of course, when mydict has more or different keys, the function ceases to work:

mydict.update({'another_key': 'Hello World!'})
subtraction(**mydict)

That’s why we can also pack an arbitrary number of keyword arguments. The convention here is to use **kwargs:

def subtraction(**kwargs):
    return kwargs['minuend'] - kwargs['subtrahend']
subtraction(**mydict)

Lambda functions

Used to temporarily define small throwaway-functions.

def squared(x):
    return x**2

squared = lambda x: x**2

Instead of this:

simpsons = ['homer', 'marge', 'bart']
def last_letter(word):
    return word[-1]
simpsons_sorted = sorted(simpsons, key=last_letter)

You can use lambda to abbreviate this to:

simpsons = ['homer', 'marge', 'bart']
simpsons_sorted = sorted(simpsons, key=lambda word: word[-1])

Comprehensions

Comprehensions, like the lambda functions, are ways to make code cleaner and oftentimes easier to read. Consider this example, where a list of cubes is created:

nums = [1, 2, 3, 4, 5]
cubes = []
for num in nums:
    cubes.append(num**3)

Instead, you can also do:

cubes = [num**3 for num in nums]

Comprehensions can contain if-else. Instead of:

cubes_of_even = []
for num in nums:
    if num % 2 == 0:
        cubes_of_even.append(num**3)

You could also do:

cubes_of_even = [num**3 for num in nums if num % 2 == 0]

You can do similar things with dictionaries:

fruits = ['apple', 'banana', 'cherry']
fruit_lengths = {fruit:len(fruit) for fruit in fruits}              # {'apple': 5, 'banana': 6, 'cherry': 6}
fruit_indices = {fruit:index for index, fruit in enumerate(fruits)} # {'apple': 0, 'banana': 1, 'cherry': 2}

Map and Filter

map applies a function to every element in an sequence (list, tuple, set), whereas filter uses a function that returns True or False to remove the False elements from a sequence:

simpsons = ['homer', 'marge', 'bart']
list(map(len, simpsons))                      # returns [5, 5, 4]
list(map(lambda word: word[-1], simpsons))    # returns ['r', 'e', 't']

# equivalent list comprehensions
[len(word) for word in simpsons]
[word[-1] for word in simpsons]

Again, the list() is because map returns a iterator which can not directly be printed. (Why does python do this? It postpones the actual execution of the mapping function until it is needed. This saves computation time if you don’t need the map applied to the whole list but, for example, use it in a for loop, that stops early). Finally, here’s filter:

nums = range(5)
filter(lambda x: x % 2 == 0, nums)      # returns [0, 2, 4]

# equivalent list comprehension
[num for num in nums if num % 2 == 0]

Static quick reference notebooks