CHAPTER 4

NumPy: creating and manipulating numerical data

Authors: Emmanuelle Gouillart, Didrik Pinte, Gaël Varoquaux, and Pauli Virtanen
This chapter gives an overview of NumPy, the core tool for performant numerical computing with Python.

4.1 The NumPy array object

Section contents

• What are NumPy and NumPy arrays?


• Creating arrays
• Basic data types
• Basic visualization
• Indexing and slicing
• Copies and views
• Fancy indexing

4.1.1 What are NumPy and NumPy arrays?


NumPy arrays
Python objects

Scipy lecture notes, Edition 2020.2

• high-level number objects: integers, floating point


• containers: lists (costless insertion and append), dictionaries (fast lookup)
NumPy provides
• extension package to Python for multi-dimensional arrays
• closer to hardware (efficiency)
• designed for scientific computation (convenience)
• Also known as array oriented computing

>>> import numpy as np


>>> a = np.array([0, 1, 2, 3])
>>> a
array([0, 1, 2, 3])

Tip: For example, an array containing:


• values of an experiment/simulation at discrete time steps
• signal recorded by a measurement device, e.g. sound wave
• pixels of an image, grey-level or colour
• 3-D data measured at different X-Y-Z positions, e.g. MRI scan
• ...

Why it is useful: Memory-efficient container that provides fast numerical operations.


In [1]: L = range(1000)

In [2]: %timeit [i**2 for i in L]


1000 loops, best of 3: 403 us per loop

In [3]: a = np.arange(1000)

In [4]: %timeit a**2


100000 loops, best of 3: 12.7 us per loop
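Outside IPython, the same comparison can be sketched with the standard timeit module. The variable names and the repetition count below are illustrative choices, not part of the original example, and the absolute timings depend on the machine; only the ratio between the two is of interest.

```python
import timeit

import numpy as np

L = range(1000)
a = np.arange(1000)

# Both approaches compute the same squares.
squares_list = [i**2 for i in L]
squares_array = a**2

# Time 1000 repetitions of each (seconds for the whole batch).
t_list = timeit.timeit(lambda: [i**2 for i in L], number=1000)
t_array = timeit.timeit(lambda: a**2, number=1000)
print(f"list comprehension: {t_list:.4f} s, NumPy: {t_array:.4f} s")
```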

NumPy Reference documentation


• On the web: https://numpy.org/doc/
• Interactive help:
In [5]: np.array?
String Form:<built-in function array>
Docstring:
array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0, ...

• Looking for something:


>>> np.lookfor('create array')
Search results for 'create array'
---------------------------------
numpy.array
Create an array.
numpy.memmap
Create a memory-map to an array stored in a *binary* file on disk.

In [6]: np.con*?
np.concatenate
np.conj
np.conjugate
np.convolve

Import conventions
The recommended convention to import numpy is:

>>> import numpy as np

4.1.2 Creating arrays


Manual construction of arrays
• 1-D:

>>> a = np.array([0, 1, 2, 3])


>>> a
array([0, 1, 2, 3])
>>> a.ndim
1
>>> a.shape
(4,)
>>> len(a)
4

• 2-D, 3-D, . . . :

>>> b = np.array([[0, 1, 2], [3, 4, 5]]) # 2 x 3 array


>>> b
array([[0, 1, 2],
[3, 4, 5]])
>>> b.ndim
2
>>> b.shape
(2, 3)
>>> len(b) # returns the size of the first dimension
2

>>> c = np.array([[[1], [2]], [[3], [4]]])


>>> c
array([[[1],
[2]],

[[3],
[4]]])
>>> c.shape
(2, 2, 1)

Exercise: Simple arrays

• Create a simple two dimensional array. First, redo the examples from above. And then create
your own: how about odd numbers counting backwards on the first row, and even numbers on
the second?
• Use the functions len(), numpy.shape() on these arrays. How do they relate to each other?
And to the ndim attribute of the arrays?

Functions for creating arrays

Tip: In practice, we rarely enter items one by one. . .

• Evenly spaced:

>>> a = np.arange(10) # 0 .. n-1 (!)


>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.arange(1, 9, 2) # start, end (exclusive), step
>>> b
array([1, 3, 5, 7])

• or by number of points:

>>> c = np.linspace(0, 1, 6) # start, end, num-points


>>> c
array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
>>> d = np.linspace(0, 1, 5, endpoint=False)
>>> d
array([0. , 0.2, 0.4, 0.6, 0.8])
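As a small sketch of the difference between the two functions: arange is driven by a step and excludes the end point, while linspace is driven by a number of points and includes the end point unless endpoint=False is given. The variable names are illustrative; retstep=True asks linspace to also return the spacing it used.

```python
import numpy as np

# arange: fixed step, end point excluded
by_step = np.arange(0, 1, 0.25)

# linspace: fixed number of points, end point included by default;
# retstep=True also returns the spacing that was used
by_count, step = np.linspace(0, 1, 5, retstep=True)

print(by_step)    # four points, 1.0 is not included
print(by_count)   # five points, 1.0 is included
print(step)
```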

• Common arrays:

>>> a = np.ones((3, 3)) # reminder: (3, 3) is a tuple


>>> a
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
>>> b = np.zeros((2, 2))
>>> b
array([[0., 0.],
[0., 0.]])
>>> c = np.eye(3)
>>> c
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
>>> d = np.diag(np.array([1, 2, 3, 4]))
>>> d
array([[1, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 4]])

• np.random: random numbers (Mersenne Twister PRNG):

>>> a = np.random.rand(4) # uniform in [0, 1)


>>> a
array([ 0.95799151, 0.14222247, 0.08777354, 0.51887998])

>>> b = np.random.randn(4) # Gaussian


>>> b
array([ 0.37544699, -0.11425369, -0.47616538, 1.79664113])

>>> np.random.seed(1234) # Setting the random seed
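A minimal sketch of what setting the seed buys: reseeding with the same value reproduces the same draws. The last lines use the Generator interface, which is an assumption about the installed version (NumPy 1.17 or later) and is not used elsewhere in this chapter.

```python
import numpy as np

np.random.seed(1234)
first = np.random.rand(4)

np.random.seed(1234)      # same seed again
second = np.random.rand(4)

print(np.array_equal(first, second))   # the two draws are identical

# Assumption: NumPy >= 1.17, which provides the Generator interface.
rng = np.random.default_rng(1234)
sample = rng.random(4)    # uniform in [0, 1)
print(sample)
```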

Exercise: Creating arrays using functions

• Experiment with arange, linspace, ones, zeros, eye and diag.


• Create different kinds of arrays with random numbers.
• Try setting the seed before creating an array with random values.
• Look at the function np.empty. What does it do? When might this be useful?

4.1.3 Basic data types


You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. 2.
vs 2). This is due to a difference in the data-type used:
>>> a = np.array([1, 2, 3])
>>> a.dtype
dtype('int64')

>>> b = np.array([1., 2., 3.])


>>> b.dtype
dtype('float64')

Tip: Different data-types allow us to store data more compactly in memory, but most of the time we
simply work with floating point numbers. Note that, in the example above, NumPy auto-detects the
data-type from the input.

You can explicitly specify which data-type you want:


>>> c = np.array([1, 2, 3], dtype=float)
>>> c.dtype
dtype('float64')
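To make the remark about compact storage concrete, the sketch below stores the same three values with two different data types and compares their memory footprint. The variable names are illustrative.

```python
import numpy as np

values = [1, 2, 3]
as_int8 = np.array(values, dtype=np.int8)
as_float64 = np.array(values, dtype=np.float64)

# itemsize is the number of bytes per element, nbytes the total.
print(as_int8.itemsize, as_int8.nbytes)        # 1 byte each, 3 in total
print(as_float64.itemsize, as_float64.nbytes)  # 8 bytes each, 24 in total
```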

The default data type is floating point:


>>> a = np.ones((3, 3))
>>> a.dtype
dtype('float64')

There are also other types:


Complex
>>> d = np.array([1+2j, 3+4j, 5+6*1j])
>>> d.dtype
dtype('complex128')

Bool
>>> e = np.array([True, False, False, True])
>>> e.dtype
dtype('bool')

Strings

>>> f = np.array(['Bonjour', 'Hello', 'Hallo'])


>>> f.dtype # <--- Unicode strings containing max. 7 characters (Python 3)
dtype('<U7')

Much more
• int32
• int64
• uint32
• uint64

4.1.4 Basic visualization


Now that we have our first data arrays, we are going to visualize them.
Start by launching IPython:

$ ipython # or ipython3 depending on your install

Or the notebook:

$ jupyter notebook

Once IPython has started, enable interactive plots:

>>> %matplotlib

Or, from the notebook, enable plots in the notebook:

>>> %matplotlib inline

The inline is important for the notebook, so that plots are displayed in the notebook and not in a new
window.
Matplotlib is a 2D plotting package. We can import its functions as below:

>>> import matplotlib.pyplot as plt # the tidy way

And then use (note that you have to use show explicitly if you have not enabled interactive plots with
%matplotlib):

>>> plt.plot(x, y) # line plot


>>> plt.show() # <-- shows the plot (not needed with interactive plots)

Or, if you have enabled interactive plots with %matplotlib:

>>> plt.plot(x, y) # line plot

• 1D plotting:

>>> x = np.linspace(0, 3, 20)


>>> y = np.linspace(0, 9, 20)
>>> plt.plot(x, y) # line plot
[<matplotlib.lines.Line2D object at ...>]
>>> plt.plot(x, y, 'o') # dot plot
[<matplotlib.lines.Line2D object at ...>]

• 2D arrays (such as images):

>>> image = np.random.rand(30, 30)


>>> plt.imshow(image, cmap=plt.cm.hot)
<matplotlib.image.AxesImage object at ...>
>>> plt.colorbar()
<matplotlib.colorbar.Colorbar object at ...>

See also:
More in the: matplotlib chapter

Exercise: Simple visualizations

• Plot some simple arrays: a cosine as a function of time and a 2D matrix.


• Try using the gray colormap on the 2D matrix.

4.1.5 Indexing and slicing


The items of an array can be accessed and assigned to the same way as other Python sequences (e.g.
lists):

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[0], a[2], a[-1]
(0, 2, 9)

Warning: Indices begin at 0, like other Python sequences (and C/C++). In contrast, in Fortran
or Matlab, indices begin at 1.

The usual Python idiom for reversing a sequence is supported:

>>> a[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

For multidimensional arrays, indices are tuples of integers:

>>> a = np.diag(np.arange(3))
>>> a
array([[0, 0, 0],
[0, 1, 0],
[0, 0, 2]])
>>> a[1, 1]
1
>>> a[2, 1] = 10 # third row, second column
>>> a
array([[ 0, 0, 0],
[ 0, 1, 0],
[ 0, 10, 2]])
>>> a[1]
array([0, 1, 0])

Note:
• In 2D, the first dimension corresponds to rows, the second to columns.
• For a multidimensional array a, a[0] selects index 0 along the first dimension and takes all elements in the unspecified dimensions.

Slicing: Arrays, like other Python sequences, can also be sliced:

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[2:9:3] # [start:end:step]
array([2, 5, 8])

Note that the last index is not included!

>>> a[:4]
array([0, 1, 2, 3])

None of the three slice components is required: by default, start is 0, end is the last and step is 1:

>>> a[1:3]
array([1, 2])
>>> a[::2]
array([0, 2, 4, 6, 8])
>>> a[3:]
array([3, 4, 5, 6, 7, 8, 9])
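The start:end:step notation can also be spelled out with Python's built-in slice objects, which makes the defaults explicit. This is only a sketch with illustrative variable names; it also shows that the defaults flip when the step is negative.

```python
import numpy as np

a = np.arange(10)

# a[start:end:step] is shorthand for indexing with a slice object;
# None stands for an omitted component.
same = np.array_equal(a[2:9:3], a[slice(2, 9, 3)])
tail = a[slice(3, None)]          # same as a[3:]

# With a negative step the defaults flip: start is the last element
# and the slice runs down to (and includes) the first one.
backwards = a[::-2]
print(same, tail, backwards)
```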

A small illustrated summary of NumPy indexing and slicing. . .

(Figure: a 6 x 6 array a with a[i, j] = 10*i + j, i.e. rows 0 1 2 3 4 5, 10 11 12 13 14 15, and so on up to 50 51 52 53 54 55, with the selections below highlighted.)

>>> a[0, 3:5]
array([3, 4])
>>> a[4:, 4:]
array([[44, 45],
       [54, 55]])
>>> a[:, 2]
array([ 2, 12, 22, 32, 42, 52])
>>> a[2::2, ::2]
array([[20, 22, 24],
       [40, 42, 44]])

You can also combine assignment and slicing:

>>> a = np.arange(10)
>>> a[5:] = 10
>>> a
array([ 0, 1, 2, 3, 4, 10, 10, 10, 10, 10])
>>> b = np.arange(5)
>>> a[5:] = b[::-1]
>>> a
array([0, 1, 2, 3, 4, 4, 3, 2, 1, 0])

Exercise: Indexing and slicing

• Try the different flavours of slicing, using start, end and step: starting from a linspace, try to
obtain odd numbers counting backwards, and even numbers counting forwards.
• Reproduce the slices in the diagram above. You may use the following expression to create the
array:
>>> np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]
array([[ 0, 1, 2, 3, 4, 5],
[10, 11, 12, 13, 14, 15],
[20, 21, 22, 23, 24, 25],
[30, 31, 32, 33, 34, 35],
[40, 41, 42, 43, 44, 45],
[50, 51, 52, 53, 54, 55]])

Exercise: Array creation

Create the following arrays (with correct data types):


[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 2],
[1, 6, 1, 1]]

[[0., 0., 0., 0., 0.],
 [2., 0., 0., 0., 0.],
 [0., 3., 0., 0., 0.],
 [0., 0., 4., 0., 0.],
 [0., 0., 0., 5., 0.],
 [0., 0., 0., 0., 6.]]

Par on course: 3 statements for each


Hint: Individual array elements can be accessed similarly to a list, e.g. a[1] or a[1, 2].
Hint: Examine the docstring for diag.

Exercise: Tiling for array creation

Skim through the documentation for np.tile, and use this function to construct the array:
[[4, 3, 4, 3, 4, 3],
[2, 1, 2, 1, 2, 1],
[4, 3, 4, 3, 4, 3],
[2, 1, 2, 1, 2, 1]]

4.1.6 Copies and views


A slicing operation creates a view on the original array, which is just a way of accessing array data.
Thus the original array is not copied in memory. You can use np.may_share_memory() to check if two
arrays share the same memory block. Note however, that this uses heuristics and may give you false
positives.
When modifying the view, the original array is modified as well:

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = a[::2]
>>> b
array([0, 2, 4, 6, 8])
>>> np.may_share_memory(a, b)
True
>>> b[0] = 12
>>> b
array([12, 2, 4, 6, 8])
>>> a # (!)
array([12, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> a = np.arange(10)
>>> c = a[::2].copy() # force a copy
>>> c[0] = 12
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> np.may_share_memory(a, c)
False

This behavior can be surprising at first sight, but it saves both memory and time.
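The false positives of np.may_share_memory() mentioned above can be seen in a small sketch with two interleaved views. It uses the base attribute and np.shares_memory(), neither of which appears in the examples above; np.shares_memory() performs an exact check and can be slower on complicated layouts.

```python
import numpy as np

a = np.arange(10)
evens = a[::2]     # view on elements 0, 2, 4, ...
odds = a[1::2]     # view on elements 1, 3, 5, ...

# A view keeps a reference to the array that owns the data.
print(evens.base is a)                    # True

# The two views interleave in memory without sharing any element.
print(np.may_share_memory(evens, odds))   # True: only memory bounds are compared
print(np.shares_memory(evens, odds))      # False: exact check
```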

Worked example: Prime number sieve

Compute prime numbers in 0–99, with a sieve


• Construct a shape (100,) boolean array is_prime, filled with True in the beginning:
>>> is_prime = np.ones((100,), dtype=bool)

• Cross out 0 and 1 which are not primes:


>>> is_prime[:2] = 0

• For each integer j starting from 2, cross out its higher multiples:
>>> N_max = int(np.sqrt(len(is_prime) - 1))
>>> for j in range(2, N_max + 1):
...     is_prime[2*j::j] = False

• Skim through help(np.nonzero), and print the prime numbers


• Follow-up:
– Move the above code into a script file named prime_sieve.py
– Run it to check it works
– Use the optimization suggested in the sieve of Eratosthenes:
1. Skip j which are already known to not be primes
2. The first number to cross out is j^2
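One possible way to put the steps and the two suggested optimizations together is sketched below. The function name prime_sieve and its argument n are hypothetical choices made for this illustration, and the function assumes n >= 2.

```python
import numpy as np


def prime_sieve(n):
    """Return the primes below n (hypothetical helper, assumes n >= 2)."""
    is_prime = np.ones(n, dtype=bool)
    is_prime[:2] = False                      # 0 and 1 are not primes
    for j in range(2, int(np.sqrt(n - 1)) + 1):
        if is_prime[j]:                       # skip j already crossed out
            is_prime[j * j::j] = False        # first multiple to cross out is j^2
    # np.nonzero returns a tuple with one index array per dimension.
    return np.nonzero(is_prime)[0]


primes = prime_sieve(100)
print(primes)
```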

4.1.7 Fancy indexing

Tip: NumPy arrays can be indexed with slices, but also with boolean or integer arrays (masks). This
method is called fancy indexing. It creates copies, not views.

Using boolean masks

>>> np.random.seed(3)
>>> a = np.random.randint(0, 21, 15)
>>> a
array([10, 3, 8, 0, 19, 10, 11, 9, 10, 6, 0, 20, 12, 7, 14])
>>> (a % 3 == 0)
array([False, True, False, True, False, False, False, True, False,
True, True, False, True, False, False])
>>> mask = (a % 3 == 0)
>>> extract_from_a = a[mask] # or, a[a%3==0]
>>> extract_from_a # extract a sub-array with the mask
array([ 3, 0, 9, 6, 0, 12])

Indexing with a mask can be very useful to assign a new value to a sub-array:

>>> a[a % 3 == 0] = -1
>>> a
array([10, -1, 8, -1, 19, 10, 11, -1, 10, -1, -1, 20, -1, 7, 14])
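The two uses of a mask behave differently with respect to copies, which the following sketch tries to make explicit: extracting with a mask yields an independent array, whereas a mask on the left-hand side of an assignment writes into the original. The variable names are illustrative.

```python
import numpy as np

a = np.arange(10)
mask = (a % 3 == 0)

picked = a[mask]      # fancy indexing: a new array, not a view
picked[:] = -1        # modifies the copy only
unchanged = a.copy()  # snapshot showing that a still holds 0..9
print(np.shares_memory(a, picked))   # False

a[mask] = -1          # indexing on the left-hand side assigns into a itself
print(a)
```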

Indexing with an array of integers

>>> a = np.arange(0, 100, 10)


>>> a
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

Indexing can be done with an array of integers, where the same index is repeated several times:

>>> a[[2, 3, 2, 4, 2]] # note: [2, 3, 2, 4, 2] is a Python list


array([20, 30, 20, 40, 20])

New values can be assigned with this kind of indexing:

>>> a[[9, 7]] = -100


>>> a
array([ 0, 10, 20, 30, 40, 50, 60, -100, 80, -100])

Tip: When a new array is created by indexing with an array of integers, the new array has the same
shape as the array of integers:

>>> a = np.arange(10)
>>> idx = np.array([[3, 4], [9, 7]])
>>> idx.shape
(2, 2)
>>> a[idx]
array([[3, 4],
[9, 7]])

The image below illustrates various fancy indexing applications

(Figure: the 6 x 6 array a with a[i, j] = 10*i + j, with the selections below highlighted.)

>>> a[(0,1,2,3,4), (1,2,3,4,5)]
array([ 1, 12, 23, 34, 45])
>>> a[3:, [0,2,5]]
array([[30, 32, 35],
       [40, 42, 45],
       [50, 52, 55]])
>>> mask = np.array([1,0,1,0,0,1], dtype=bool)
>>> a[mask, 2]
array([ 2, 22, 52])

Exercise: Fancy indexing

• Again, reproduce the fancy indexing shown in the diagram above.


• Use fancy indexing on the left and array creation on the right to assign values into an array, for
instance by setting parts of the array in the diagram above to zero.

4.2 Numerical operations on arrays

Section contents

• Elementwise operations
• Basic reductions
• Broadcasting
• Array shape manipulation
• Sorting data
• Summary

4.2.1 Elementwise operations


Basic operations
With scalars:

>>> a = np.array([1, 2, 3, 4])


>>> a + 1
array([2, 3, 4, 5])
>>> 2**a
array([ 2, 4, 8, 16])

All arithmetic operates elementwise:

>>> b = np.ones(4) + 1
>>> a - b
array([-1., 0., 1., 2.])
>>> a * b
array([2., 4., 6., 8.])

>>> j = np.arange(5)
>>> 2**(j + 1) - j
array([ 2, 3, 6, 13, 28])

These operations are of course much faster than if you did them in pure Python:

>>> a = np.arange(10000)
>>> %timeit a + 1
10000 loops, best of 3: 24.3 us per loop
>>> l = range(10000)
>>> %timeit [i+1 for i in l]
1000 loops, best of 3: 861 us per loop

Warning: Array multiplication is not matrix multiplication:

>>> c = np.ones((3, 3))


>>> c * c # NOT matrix multiplication!
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])

Note: Matrix multiplication:


>>> c.dot(c)
array([[3., 3., 3.],
[3., 3., 3.],
[3., 3., 3.]])
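The matrix product can also be written with the @ operator, which is assumed here to be available (Python 3.5 or later). The sketch below only checks that the three spellings agree.

```python
import numpy as np

c = np.ones((3, 3))

elementwise = c * c    # product of matching entries
matrix = c @ c         # matrix product, same result as c.dot(c)

print(np.array_equal(matrix, c.dot(c)))
print(np.array_equal(matrix, np.dot(c, c)))
```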

Exercise: Elementwise operations

• Try simple arithmetic elementwise operations: add even elements with odd elements
• Time them against their pure python counterparts using %timeit.
• Generate:
– [2**0, 2**1, 2**2, 2**3, 2**4]
– a_j = 2^(3*j) - j

Other operations
Comparisons:
>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([4, 2, 2, 4])
>>> a == b
array([False, True, False, True])
>>> a > b
array([False, False, True, False])

Tip: Array-wise comparisons:


>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([4, 2, 2, 4])
>>> c = np.array([1, 2, 3, 4])
>>> np.array_equal(a, b)
False
>>> np.array_equal(a, c)
True

Logical operations:
>>> a = np.array([1, 1, 0, 0], dtype=bool)
>>> b = np.array([1, 0, 1, 0], dtype=bool)
>>> np.logical_or(a, b)
array([ True, True, True, False])
>>> np.logical_and(a, b)
array([ True, False, False, False])

Transcendental functions:

>>> a = np.arange(5)
>>> np.sin(a)
array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])
>>> np.log(a)
array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436])
>>> np.exp(a)
array([ 1. , 2.71828183, 7.3890561 , 20.08553692, 54.59815003])
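Taking the logarithm of 0 above gives -inf and, in a typical setup, also emits a RuntimeWarning. As a sketch, np.errstate (not used elsewhere in this chapter) can silence that warning locally when the infinite value is expected.

```python
import numpy as np

a = np.arange(5)

# np.log(0) evaluates to -inf and emits a RuntimeWarning;
# np.errstate silences it for the enclosed block only.
with np.errstate(divide="ignore"):
    logs = np.log(a)

print(logs[0])                  # -inf
print(np.isfinite(logs))        # False for the first entry only
```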

Shape mismatches

>>> a = np.arange(4)
>>> a + np.array([1, 2])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,) (2,)

Broadcasting? We’ll return to that later.


Transposition:

>>> a = np.triu(np.ones((3, 3)), 1) # see help(np.triu)


>>> a
array([[0., 1., 1.],
[0., 0., 1.],
[0., 0., 0.]])
>>> a.T
array([[0., 0., 0.],
[1., 0., 0.],
[1., 1., 0.]])

Note: The transposition is a view


The transpose returns a view of the original array:

>>> a = np.arange(9).reshape(3, 3)
>>> a.T[0, 2] = 999
>>> a.T
array([[ 0, 3, 999],
[ 1, 4, 7],
[ 2, 5, 8]])
>>> a
array([[ 0, 1, 2],
[ 3, 4, 5],
[999, 7, 8]])

Note: Linear algebra


The sub-module numpy.linalg implements basic linear algebra, such as solving linear systems, singular
value decomposition, etc. However, it is not guaranteed to be compiled using efficient routines, and thus
we recommend the use of scipy.linalg, as detailed in section Linear algebra operations: scipy.linalg

Exercise: Other operations

• Look at the help for np.allclose. When might this be useful?


• Look at the help for np.triu and np.tril.

4.2.2 Basic reductions


Computing sums

>>> x = np.array([1, 2, 3, 4])


>>> np.sum(x)
10
>>> x.sum()
10

Sum by rows and by columns:


>>> x = np.array([[1, 1], [2, 2]])
>>> x
array([[1, 1],
[2, 2]])
>>> x.sum(axis=0) # columns (first dimension)
array([3, 3])
>>> x[:, 0].sum(), x[:, 1].sum()
(3, 3)
>>> x.sum(axis=1) # rows (second dimension)
array([2, 4])
>>> x[0, :].sum(), x[1, :].sum()
(2, 4)

Tip: Same idea in higher dimensions:


>>> x = np.random.rand(2, 2, 2)
>>> x.sum(axis=2)[0, 1]
1.14764...
>>> x[0, 1, :].sum()
1.14764...
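A way to keep track of the axis argument is to look at shapes: reducing along an axis removes that axis. The sketch below uses an illustrative array of shape (2, 3, 4) and the keepdims option, which does not appear in the examples above.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Summing along an axis removes that axis from the shape.
print(x.sum(axis=0).shape)   # (3, 4)
print(x.sum(axis=1).shape)   # (2, 4)
print(x.sum(axis=2).shape)   # (2, 3)

# keepdims=True keeps the reduced axis with length 1.
kept = x.sum(axis=2, keepdims=True)
print(kept.shape)            # (2, 3, 1)

# Each entry of the reduced array is the sum over the removed axis.
print(x.sum(axis=2)[0, 1] == x[0, 1, :].sum())
```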

Other reductions
They work the same way as sum (and take axis=)
Extrema:
>>> x = np.array([1, 3, 2])
>>> x.min()
1
>>> x.max()
3

>>> x.argmin() # index of minimum


0
>>> x.argmax() # index of maximum
1

Logical operations:

>>> np.all([True, True, False])


False
>>> np.any([True, True, False])
True

Note: Can be used for array comparisons:

>>> a = np.zeros((100, 100))


>>> np.any(a != 0)
False
>>> np.all(a == a)
True

>>> a = np.array([1, 2, 3, 2])


>>> b = np.array([2, 2, 3, 2])
>>> c = np.array([6, 4, 4, 5])
>>> ((a <= b) & (b <= c)).all()
True

Statistics:

>>> x = np.array([1, 2, 3, 1])


>>> y = np.array([[1, 2, 3], [5, 6, 1]])
>>> x.mean()
1.75
>>> np.median(x)
1.5
>>> np.median(y, axis=-1) # last axis
array([2., 5.])

>>> x.std() # full population standard dev.


0.82915619758884995
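The comment "full population" refers to the normalization: by default std divides by the number of elements N. As a sketch, the ddof argument switches to the sample estimate, which divides by N - 1. The variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 1])

population_std = x.std()        # divides by N (the default, ddof=0)
sample_std = x.std(ddof=1)      # divides by N - 1

print(population_std, sample_std)
```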

. . . and many more (best to learn as you go).

Exercise: Reductions

• Given there is a sum, what other function might you expect to see?
• What is the difference between sum and cumsum?

Worked Example: diffusion using a random walk algorithm

Tip: Let us consider a simple 1D random walk process: at each time step a walker jumps right or
left with equal probability.

We are interested in finding the typical distance from the origin of a random walker after t left or
right jumps. We are going to simulate many “walkers” to find this law, and we are going to do so
using array computing tricks: we are going to create a 2D array with the “stories” (each walker has a
story) in one direction, and the time in the other:

>>> n_stories = 1000 # number of walkers


>>> t_max = 200 # time during which we follow the walker

We randomly choose all the steps 1 or -1 of the walk:

>>> t = np.arange(t_max)
>>> steps = 2 * np.random.randint(0, 1 + 1, (n_stories, t_max)) - 1 # +1 because the high value is exclusive

>>> np.unique(steps) # Verification: all steps are 1 or -1


array([-1, 1])

We build the walks by summing steps along the time:

>>> positions = np.cumsum(steps, axis=1) # axis = 1: dimension of time


>>> sq_distance = positions**2

We take the mean over the stories (axis 0):

>>> mean_sq_distance = np.mean(sq_distance, axis=0)

Plot the results:

>>> plt.figure(figsize=(4, 3))


<Figure size ... with 0 Axes>
>>> plt.plot(t, np.sqrt(mean_sq_distance), 'g.', t, np.sqrt(t), 'y-')
[<matplotlib.lines.Line2D object at ...>, <matplotlib.lines.Line2D object at ...>]
>>> plt.xlabel(r"$t$")
Text(...'$t$')
>>> plt.ylabel(r"$\sqrt{\langle (\delta x)^2 \rangle}$")
Text(...'$\\sqrt{\\langle (\\delta x)^2 \\rangle}$')
>>> plt.tight_layout() # provide sufficient space for labels

We find a well-known result in physics: the RMS distance grows as the square root of the time!
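The square-root law can also be checked numerically, without a plot. The sketch below repeats the simulation with a seeded Generator (an assumption: NumPy 1.17 or later; the seed value is arbitrary) and compares the RMS distance with the square root of the number of jumps. The names n_jumps and ratio are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)     # assumption: NumPy >= 1.17
n_stories, t_max = 1000, 200

# Steps of +1 or -1, one row per walker and one column per time step.
steps = 2 * rng.integers(0, 2, size=(n_stories, t_max)) - 1
positions = np.cumsum(steps, axis=1)
mean_sq_distance = np.mean(positions**2, axis=0)

# Column k holds the position after k + 1 jumps.
n_jumps = np.arange(1, t_max + 1)
ratio = np.sqrt(mean_sq_distance) / np.sqrt(n_jumps)
print(ratio[-1])    # close to 1, up to statistical noise
```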

4.2.3 Broadcasting
• Basic operations on numpy arrays (addition, etc.) are elementwise
• This works on arrays of the same size.
Nevertheless, it’s also possible to do operations on arrays of different
sizes if NumPy can transform these arrays so that they all have
the same size: this conversion is called broadcasting.
The image below gives an example of broadcasting:

Let’s verify:

>>> a = np.tile(np.arange(0, 40, 10), (3, 1)).T


>>> a
array([[ 0, 0, 0],
[10, 10, 10],
[20, 20, 20],
[30, 30, 30]])
>>> b = np.array([0, 1, 2])
>>> a + b
array([[ 0, 1, 2],
[10, 11, 12],
[20, 21, 22],
[30, 31, 32]])

We have already used broadcasting without knowing it!:


>>> a = np.ones((4, 5))
>>> a[0] = 2 # we assign an array of dimension 0 to an array of dimension 1
>>> a
array([[2., 2., 2., 2., 2.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])

A useful trick:
>>> a = np.arange(0, 40, 10)
>>> a.shape
(4,)
>>> a = a[:, np.newaxis] # adds a new axis -> 2D array
>>> a.shape
(4, 1)
>>> a
array([[ 0],
[10],
[20],
[30]])
>>> a + b
array([[ 0, 1, 2],
[10, 11, 12],
[20, 21, 22],
[30, 31, 32]])

Tip: Broadcasting seems a bit magical, but it is actually quite natural to use it when we want to solve
a problem whose output data is an array with more dimensions than input data.
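To see which shape a broadcast operation will produce before running it, one option is the np.broadcast object, sketched below with illustrative variable names. The second part reproduces the shape mismatch shown earlier and catches the resulting error.

```python
import numpy as np

column = np.arange(0, 40, 10)[:, np.newaxis]   # shape (4, 1)
row = np.array([0, 1, 2])                      # shape (3,)

# np.broadcast reports the shape the operands are stretched to.
result_shape = np.broadcast(column, row).shape
print(result_shape)                            # (4, 3)

# Shapes (4,) and (2,) cannot be matched, so the addition fails.
message = None
try:
    np.arange(4) + np.array([1, 2])
except ValueError as err:
    message = str(err)
    print(message)
```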

Worked Example: Broadcasting

Let’s construct an array of distances (in miles) between cities of Route 66: Chicago, Springfield,
Saint-Louis, Tulsa, Oklahoma City, Amarillo, Santa Fe, Albuquerque, Flagstaff and Los Angeles.
>>> mileposts = np.array([0, 198, 303, 736, 871, 1175, 1475, 1544,
... 1913, 2448])
>>> distance_array = np.abs(mileposts - mileposts[:, np.newaxis])
>>> distance_array
array([[ 0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448],
[ 198, 0, 105, 538, 673, 977, 1277, 1346, 1715, 2250],
[ 303, 105, 0, 433, 568, 872, 1172, 1241, 1610, 2145],
[ 736, 538, 433, 0, 135, 439, 739, 808, 1177, 1712],
[ 871, 673, 568, 135, 0, 304, 604, 673, 1042, 1577],
[1175, 977, 872, 439, 304, 0, 300, 369, 738, 1273],
[1475, 1277, 1172, 739, 604, 300, 0, 69, 438, 973],
[1544, 1346, 1241, 808, 673, 369, 69, 0, 369, 904],
[1913, 1715, 1610, 1177, 1042, 738, 438, 369, 0, 535],
[2448, 2250, 2145, 1712, 1577, 1273, 973, 904, 535, 0]])

A lot of grid-based or network-based problems can also use broadcasting. For instance, if we want to
compute the distance from the origin of points on a 5x5 grid, we can do
>>> x, y = np.arange(5), np.arange(5)[:, np.newaxis]
>>> distance = np.sqrt(x ** 2 + y ** 2)
>>> distance
array([[0. , 1. , 2. , 3. , 4. ],
[1. , 1.41421356, 2.23606798, 3.16227766, 4.12310563],
[2. , 2.23606798, 2.82842712, 3.60555128, 4.47213595],
[3. , 3.16227766, 3.60555128, 4.24264069, 5. ],
[4. , 4.12310563, 4.47213595, 5. , 5.65685425]])

Or in color:
>>> plt.pcolor(distance)
>>> plt.colorbar()

Remark: numpy.ogrid directly creates the vectors x and y of the previous example, with two “significant dimensions”. It is indexed with slices rather than called like a function:
>>> x, y = np.ogrid[0:5, 0:5]
>>> x, y
(array([[0],
[1],
[2],
[3],
[4]]), array([[0, 1, 2, 3, 4]]))
>>> x.shape, y.shape
((5, 1), (1, 5))
>>> distance = np.sqrt(x ** 2 + y ** 2)

Tip: So, np.ogrid is very useful as soon as we have to handle computations on a grid. On the other
hand, np.mgrid directly provides matrices full of indices for cases where we can’t (or don’t want to)
benefit from broadcasting:

>>> x, y = np.mgrid[0:4, 0:4]


>>> x
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3]])
>>> y
array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
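As a sketch of how the two relate, the distance computation from above gives the same result with either object; the difference is only whether the full index grids are stored or produced on the fly by broadcasting. The variable names are illustrative.

```python
import numpy as np

x_open, y_open = np.ogrid[0:5, 0:5]    # shapes (5, 1) and (1, 5)
x_full, y_full = np.mgrid[0:5, 0:5]    # both of shape (5, 5)

d_open = np.sqrt(x_open**2 + y_open**2)   # relies on broadcasting
d_full = np.sqrt(x_full**2 + y_full**2)   # plain elementwise arithmetic

print(x_open.shape, y_open.shape, x_full.shape)
print(np.array_equal(d_open, d_full))
```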

See also:
Broadcasting: discussion of broadcasting in the Advanced NumPy chapter.

4.2.4 Array shape manipulation


Flattening

>>> a = np.array([[1, 2, 3], [4, 5, 6]])


>>> a.ravel()
array([1, 2, 3, 4, 5, 6])
>>> a.T
array([[1, 4],
[2, 5],
[3, 6]])
>>> a.T.ravel()
array([1, 4, 2, 5, 3, 6])

Higher dimensions: last dimensions ravel out “first”.

Reshaping
The inverse operation to flattening:

>>> a.shape
(2, 3)
>>> b = a.ravel()
>>> b = b.reshape((2, 3))
>>> b
array([[1, 2, 3],
[4, 5, 6]])

Or,

>>> a.reshape((2, -1)) # unspecified (-1) value is inferred


array([[1, 2, 3],
[4, 5, 6]])

Warning: ndarray.reshape may return a view (cf. help(np.reshape)) or a copy.

Tip:

>>> b[0, 0] = 99
>>> a
array([[99, 2, 3],
[ 4, 5, 6]])

Beware: reshape may also return a copy!:

>>> a = np.zeros((3, 2))


>>> b = a.T.reshape(3*2)
>>> b[0] = 9
>>> a
array([[0., 0.],
[0., 0.],
[0., 0.]])

To understand this you need to learn more about the memory layout of a numpy array.
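Without going into the memory layout in detail, the two cases can be told apart programmatically. The sketch below uses np.shares_memory() and the flags attribute, neither of which appears in the examples above, and illustrative variable names.

```python
import numpy as np

a = np.zeros((3, 2))

flat_view = a.reshape(6)      # contiguous data: a view is possible
flat_copy = a.T.reshape(6)    # transposed data: reshape has to copy

print(np.shares_memory(a, flat_view))   # True
print(np.shares_memory(a, flat_copy))   # False

# The flags describe the memory layout that decides between the two cases.
print(a.flags["C_CONTIGUOUS"], a.T.flags["C_CONTIGUOUS"])
```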

Adding a dimension
Indexing with the np.newaxis object allows us to add an axis to an array (you have seen this already
above in the broadcasting section):

>>> z = np.array([1, 2, 3])


>>> z
array([1, 2, 3])

>>> z[:, np.newaxis]


array([[1],
[2],
[3]])

>>> z[np.newaxis, :]
array([[1, 2, 3]])

Dimension shuffling

>>> a = np.arange(4*3*2).reshape(4, 3, 2)
>>> a.shape
(4, 3, 2)
>>> a[0, 2, 1]
5
>>> b = a.transpose(1, 2, 0)
>>> b.shape
(3, 2, 4)
>>> b[2, 1, 0]
5

Also creates a view:

>>> b[2, 1, 0] = -1
>>> a[0, 2, 1]
-1

Resizing
Size of an array can be changed with ndarray.resize:

>>> a = np.arange(4)
>>> a.resize((8,))
>>> a
array([0, 1, 2, 3, 0, 0, 0, 0])

However, it must not be referred to somewhere else:

>>> b = a
>>> a.resize((4,))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: cannot resize an array that has been referenced or is
referencing another array in this way. Use the resize function
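The error message points to the np.resize function. As a sketch of how it differs from the ndarray.resize method: it returns a new array instead of working in place, so other references are not a problem, and it pads with repeated copies of the input rather than with zeros.

```python
import numpy as np

a = np.arange(4)
b = a                      # a second reference to the same array

# np.resize returns a new array and leaves its input untouched.
# Unlike ndarray.resize, it fills the extra room with repeated
# copies of the input rather than with zeros.
bigger = np.resize(a, (8,))
print(bigger)              # [0 1 2 3 0 1 2 3]
print(a)                   # [0 1 2 3]
```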

Exercise: Shape manipulations

• Look at the docstring for reshape, especially the notes section which has some more information
about copies and views.
• Use flatten as an alternative to ravel. What is the difference? (Hint: check which one returns
a view and which a copy)
• Experiment with transpose for dimension shuffling.

4.2.5 Sorting data


Sorting along an axis:

>>> a = np.array([[4, 3, 5], [1, 2, 1]])


>>> b = np.sort(a, axis=1)
>>> b
array([[3, 4, 5],
[1, 1, 2]])

Note: Sorts each row separately!

In-place sort:

>>> a.sort(axis=1)
>>> a
array([[3, 4, 5],
[1, 1, 2]])

Sorting with fancy indexing:

>>> a = np.array([4, 3, 1, 2])


>>> j = np.argsort(a)
>>> j
array([2, 3, 1, 0])
>>> a[j]
array([1, 2, 3, 4])
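The indices returned by argsort are mostly useful for reordering a second array in step with the first. The sketch below uses made-up data (the names and scores arrays are hypothetical) to show this, along with a descending sort obtained by reversing the index array.

```python
import numpy as np

# Hypothetical data: two arrays that belong together element by element.
names = np.array(["d", "c", "a", "b"])
scores = np.array([4, 3, 1, 2])

order = np.argsort(scores)          # indices that sort the scores
print(scores[order])                # [1 2 3 4]
print(names[order])                 # ['a' 'b' 'c' 'd']

descending = order[::-1]            # reverse the order for a descending sort
print(scores[descending])           # [4 3 2 1]
```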

Finding minima and maxima:

>>> a = np.array([4, 3, 1, 2])


>>> j_max = np.argmax(a)
>>> j_min = np.argmin(a)
>>> j_max, j_min
(0, 2)

Exercise: Sorting

• Try both in-place and out-of-place sorting.


• Try creating arrays with different dtypes and sorting them.
• Use all or array_equal to check the results.
• Look at np.random.shuffle for a way to create sortable input quicker.
• Combine ravel, sort and reshape.
• Look at the axis keyword for sort and rewrite the previous exercise.

4.2.6 Summary
What do you need to know to get started?
• Know how to create arrays: array, arange, ones, zeros.
• Know the shape of the array with array.shape, then use slicing to obtain different views of the
array: array[::2], etc. Adjust the shape of the array using reshape or flatten it with ravel.
• Obtain a subset of the elements of an array and/or modify their values with masks

>>> a[a < 0] = 0

• Know miscellaneous operations on arrays, such as finding the mean or max (array.max(), array.
mean()). No need to retain everything, but have the reflex to search in the documentation (online
docs, help(), lookfor())!!
• For advanced use: master the indexing with arrays of integers, as well as broadcasting. Know more
NumPy functions to handle various array operations.

Quick read

If you want to do a first quick pass through the Scipy lectures to learn the ecosystem, you can directly
skip to the next chapter: Matplotlib: plotting.
The remainder of this chapter is not necessary to follow the rest of the intro part. But be sure to
come back and finish this chapter, as well as to do some more exercises.

4.3 More elaborate arrays

Section contents

• More data types


• Structured data types
• maskedarray: dealing with (propagation of) missing data


4.3.1 More data types


Casting
“Bigger” type wins in mixed-type operations:

>>> np.array([1, 2, 3]) + 1.5


array([2.5, 3.5, 4.5])

Assignment never changes the type!

>>> a = np.array([1, 2, 3])


>>> a.dtype
dtype('int64')
>>> a[0] = 1.9 # <-- float is truncated to integer
>>> a
array([1, 2, 3])

Forced casts:

>>> a = np.array([1.7, 1.2, 1.6])


>>> b = a.astype(int) # <-- truncates to integer
>>> b
array([1, 1, 1])

Rounding:

>>> a = np.array([1.2, 1.5, 1.6, 2.5, 3.5, 4.5])


>>> b = np.around(a)
>>> b # still floating-point
array([1., 2., 2., 2., 4., 4.])
>>> c = np.around(a).astype(int)
>>> c
array([1, 2, 2, 2, 4, 4])
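
The result above shows 2.5 rounded to 2 but 3.5 rounded to 4: np.around rounds exact halves to the nearest even value. A short sketch contrasting this with the truncation done by astype(int), on values chosen here purely for illustration:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

rounded = np.around(halves)      # halves go to the nearest even value
truncated = halves.astype(int)   # astype(int) simply drops the fractional part
```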

Different data type sizes


Integers (signed):

int8 8 bits
int16 16 bits
int32 32 bits (same as int on 32-bit platform)
int64 64 bits (same as int on 64-bit platform)

>>> np.array([1], dtype=int).dtype


dtype('int64')
>>> np.iinfo(np.int32).max, 2**31 - 1
(2147483647, 2147483647)

Unsigned integers:

uint8 8 bits
uint16 16 bits
uint32 32 bits
uint64 64 bits

>>> np.iinfo(np.uint32).max, 2**32 - 1


(4294967295, 4294967295)


Long integers

Python 2 had a specific type for ‘long’ integers, which cannot overflow, written with an ‘L’ at the
end. In Python 3, all integers are long, and thus cannot overflow.
>>> np.iinfo(np.int64).max, 2**63 - 1
(9223372036854775807, 9223372036854775807)
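
Unlike Python integers, fixed-size NumPy integers wrap around when a result does not fit. A minimal sketch with small types, where the effect is easy to see (the variable names are illustrative):

```python
import numpy as np

big = np.array([127], dtype=np.int8)     # largest int8 value
one = np.array([1], dtype=np.int8)
wrapped = big + one                      # wraps around to the smallest int8 value

zero = np.array([0], dtype=np.uint8)
wrapped_unsigned = zero - np.array([1], dtype=np.uint8)  # wraps to the largest uint8 value

exact = 127 + 1                          # plain Python integers do not overflow
```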

Floating-point numbers:

float16 16 bits
float32 32 bits
float64 64 bits (same as float)
float96 96 bits, platform-dependent (same as np.longdouble)
float128 128 bits, platform-dependent (same as np.longdouble)

>>> np.finfo(np.float32).eps
1.1920929e-07
>>> np.finfo(np.float64).eps
2.2204460492503131e-16

>>> np.float32(1e-8) + np.float32(1) == 1


True
>>> np.float64(1e-8) + np.float64(1) == 1
False
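
The eps value reported by np.finfo is the gap between 1 and the next larger representable number, which explains the comparison above. A small sketch in float32, with explicit casts so that every operand stays in single precision:

```python
import numpy as np

eps32 = np.finfo(np.float32).eps       # spacing between 1.0 and the next float32
one = np.float32(1)
half_eps = np.float32(eps32 / 2)

absorbed = (one + half_eps) == one     # an increment below eps is lost
visible = (one + eps32) != one         # an increment of eps is kept
```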

Complex floating-point numbers:

complex64 two 32-bit floats


complex128 two 64-bit floats
complex192 two 96-bit floats, platform-dependent
complex256 two 128-bit floats, platform-dependent

Smaller data types

If you don’t know you need special data types, then you probably don’t.
Comparison on using float32 instead of float64:
• Half the size in memory and on disk
• Half the memory bandwidth required (may be a bit faster in some operations)
In [1]: a = np.zeros((int(1e6),), dtype=np.float64)

In [2]: b = np.zeros((int(1e6),), dtype=np.float32)

In [3]: %timeit a*a


1000 loops, best of 3: 1.78 ms per loop

In [4]: %timeit b*b


1000 loops, best of 3: 1.07 ms per loop

• But: bigger rounding errors — sometimes in surprising places (i.e., don’t use them unless you
really need them)
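
Both points can be checked directly. The sketch below compares memory footprints through the nbytes attribute and shows one of the surprising places where float32 loses information: integers above 2**24 are no longer all representable.

```python
import numpy as np

a = np.zeros(1_000_000, dtype=np.float64)
b = np.zeros(1_000_000, dtype=np.float32)
size_ratio = a.nbytes / b.nbytes          # float64 takes twice the memory

# 2**24 + 1 cannot be stored exactly in float32, so adding 1 has no effect
big = np.float32(2**24)
lost_in_float32 = (big + np.float32(1)) == big
kept_in_float64 = (np.float64(2**24) + np.float64(1)) != np.float64(2**24)
```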


4.3.2 Structured data types

sensor_code   (4-character string)
position      (float)
value         (float)

>>> samples = np.zeros((6,), dtype=[('sensor_code', 'S4'),
...                                 ('position', float), ('value', float)])
>>> samples.ndim
1
>>> samples.shape
(6,)
>>> samples.dtype.names
('sensor_code', 'position', 'value')

>>> samples[:] = [('ALFA', 1, 0.37), ('BETA', 1, 0.11), ('TAU', 1, 0.13),
...               ('ALFA', 1.5, 0.37), ('ALFA', 3, 0.11), ('TAU', 1.2, 0.13)]
>>> samples
array([('ALFA', 1.0, 0.37), ('BETA', 1.0, 0.11), ('TAU', 1.0, 0.13),
       ('ALFA', 1.5, 0.37), ('ALFA', 3.0, 0.11), ('TAU', 1.2, 0.13)],
      dtype=[('sensor_code', 'S4'), ('position', '<f8'), ('value', '<f8')])

Field access works by indexing with field names:

>>> samples['sensor_code']
array(['ALFA', 'BETA', 'TAU', 'ALFA', 'ALFA', 'TAU'],
dtype='|S4')
>>> samples['value']
array([0.37, 0.11, 0.13, 0.37, 0.11, 0.13])
>>> samples[0]
('ALFA', 1.0, 0.37)

>>> samples[0]['sensor_code'] = 'TAU'


>>> samples[0]
('TAU', 1.0, 0.37)

Multiple fields at once:

>>> samples[['position', 'value']]
array([(1. , 0.37), (1. , 0.11), (1. , 0.13), (1.5, 0.37),
       (3. , 0.11), (1.2, 0.13)],
      dtype=[('position', '<f8'), ('value', '<f8')])

Fancy indexing works, as usual:

>>> samples[samples['sensor_code'] == b'ALFA']
array([(b'ALFA', 1.5, 0.37), (b'ALFA', 3. , 0.11)],
      dtype=[('sensor_code', 'S4'), ('position', '<f8'), ('value', '<f8')])

Note: There are a bunch of other syntaxes for constructing structured arrays, see here and here.
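
Field access combines naturally with masks and sorting. A minimal sketch that rebuilds an array like the one above (sensor codes written as bytes to match the 'S4' field) and then computes a per-sensor mean and sorts the records by one field:

```python
import numpy as np

samples = np.array(
    [(b"ALFA", 1.0, 0.37), (b"BETA", 1.0, 0.11), (b"TAU", 1.0, 0.13),
     (b"ALFA", 1.5, 0.37), (b"ALFA", 3.0, 0.11), (b"TAU", 1.2, 0.13)],
    dtype=[("sensor_code", "S4"), ("position", float), ("value", float)],
)

# Boolean mask on one field, then a reduction on another field
is_alfa = samples["sensor_code"] == b"ALFA"
alfa_mean = samples["value"][is_alfa].mean()

# Sort whole records according to the 'position' field
by_position = np.sort(samples, order="position")
```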

4.3.3 maskedarray: dealing with (propagation of) missing data


• For floats one could use NaN’s, but masks work for all types:

>>> x = np.ma.array([1, 2, 3, 4], mask=[0, 1, 0, 1])
>>> x
masked_array(data=[1, --, 3, --],
             mask=[False, True, False, True],
       fill_value=999999)

>>> y = np.ma.array([1, 2, 3, 4], mask=[0, 1, 1, 1])
>>> x + y
masked_array(data=[2, --, --, --],
             mask=[False, True, True, True],
       fill_value=999999)

• Masking versions of common functions:


>>> np.ma.sqrt([1, -1, 2, -2])
masked_array(data=[1.0, --, 1.41421356237... --],
mask=[False, True, False, True],
fill_value=1e+20)
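
Reductions on masked arrays skip the masked entries. A short sketch, reusing the x array defined above plus a made-up `measurements` array, of counting valid entries, averaging them, and converting back to a plain array:

```python
import numpy as np

x = np.ma.array([1, 2, 3, 4], mask=[0, 1, 0, 1])

n_valid = x.count()        # number of unmasked entries
mean_valid = x.mean()      # mean over unmasked entries only
filled = x.filled(0)       # plain ndarray, masked entries replaced by 0

# Masking NaN values in floating-point data (illustrative values)
measurements = np.ma.masked_invalid([1.0, np.nan, 3.0])
mean_measurements = measurements.mean()
```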

Note: There are other useful array siblings

While it is off topic in a chapter on numpy, let’s take a moment to recall good coding practices, which
really do pay off in the long run:

Good practices

• Explicit variable names (no need of a comment to explain what is in the variable)
• Style: spaces after commas, around =, etc.
A certain number of rules for writing “beautiful” code (and, more importantly, using the same
conventions as everybody else!) are given in the Style Guide for Python Code and the Docstring
Conventions page (to manage help strings).
• Except some rare cases, variable names and comments in English.

4.4 Advanced operations

Section contents

• Polynomials
• Loading data files

4.4.1 Polynomials
NumPy also contains polynomials in different bases:
For example, $3x^2 + 2x - 1$:
>>> p = np.poly1d([3, 2, -1])
>>> p(0)
-1
>>> p.roots
array([-1. , 0.33333333])
>>> p.order
2
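
poly1d objects also support calculus and evaluation on arrays. A brief sketch on the same polynomial; the evaluation points are chosen only for illustration:

```python
import numpy as np

p = np.poly1d([3, 2, -1])              # 3x^2 + 2x - 1

dp = p.deriv()                         # derivative: 6x + 2
antiderivative = dp.integ()            # integrates back to 3x^2 + 2x (constant 0)
values = p(np.array([0, 1, 2]))        # evaluate on several points at once
```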


>>> x = np.linspace(0, 1, 20)


>>> y = np.cos(x) + 0.3*np.random.rand(20)
>>> p = np.poly1d(np.polyfit(x, y, 3))

>>> t = np.linspace(0, 1, 200) # use a larger number of points for smoother plotting
>>> plt.plot(x, y, 'o', t, p(t), '-')
[<matplotlib.lines.Line2D object at ...>, <matplotlib.lines.Line2D object at ...>]

See http://numpy.org/doc/stable/reference/routines.polynomials.poly1d.html for more.

More polynomials (with more bases)


NumPy also has a more sophisticated polynomial interface, which supports e.g. the Chebyshev basis.
$3x^2 + 2x - 1$:

>>> p = np.polynomial.Polynomial([-1, 2, 3]) # coefs in different order!


>>> p(0)
-1.0
>>> p.roots()
array([-1. , 0.33333333])
>>> p.degree() # In general polynomials do not always expose 'order'
2
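
One point to keep in mind with this interface: the fit class method works in a rescaled domain, so the coefficients it stores are not directly those of the polynomial in x. A small sketch, fitting noise-free samples of the same polynomial (the sample points are an arbitrary choice), shows how convert() recovers them:

```python
import numpy as np

x = np.linspace(0, 4, 9)
y = 3 * x**2 + 2 * x - 1               # exact samples of 3x^2 + 2x - 1

fitted = np.polynomial.Polynomial.fit(x, y, 2)
coef_scaled = fitted.coef              # coefficients in the rescaled variable
coef_plain = fitted.convert().coef     # coefficients in x, lowest degree first
```

Evaluating `fitted(x)` works either way, since the object applies the domain mapping itself.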

Example using polynomials in Chebyshev basis, for polynomials in range [-1, 1]:

>>> x = np.linspace(-1, 1, 2000)


>>> y = np.cos(x) + 0.3*np.random.rand(2000)
>>> p = np.polynomial.Chebyshev.fit(x, y, 90)

>>> plt.plot(x, y, 'r.')


[<matplotlib.lines.Line2D object at ...>]
>>> plt.plot(x, p(x), 'k-', lw=3)
[<matplotlib.lines.Line2D object at ...>]


The Chebyshev polynomials have some advantages in interpolation.

4.4.2 Loading data files


Text files
Example: populations.txt:
# year hare lynx carrot
1900 30e3 4e3 48300
1901 47.2e3 6.1e3 48200
1902 70.2e3 9.8e3 41500
1903 77.4e3 35.2e3 38200

>>> data = np.loadtxt('data/populations.txt')
>>> data
array([[ 1900., 30000., 4000., 48300.],
       [ 1901., 47200., 6100., 48200.],
       [ 1902., 70200., 9800., 41500.],
...

>>> np.savetxt('pop2.txt', data)


>>> data2 = np.loadtxt('pop2.txt')
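
The round trip can be tried without the data file. The sketch below feeds the first lines of populations.txt to np.loadtxt through a StringIO object and writes to a temporary directory; the fmt and header arguments are optional choices made here, not something the lecture requires:

```python
import os
import tempfile
from io import StringIO

import numpy as np

text = """# year hare lynx carrot
1900 30e3 4e3 48300
1901 47.2e3 6.1e3 48200
"""

# Lines starting with '#' are treated as comments and skipped
data = np.loadtxt(StringIO(text))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pop2.txt")
    # The header is written as a comment line, so loadtxt ignores it again
    np.savetxt(path, data, fmt="%.1f", header="year hare lynx carrot")
    data2 = np.loadtxt(path)
```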

Note: If you have a complicated text file, you can try:
• np.genfromtxt
• Using Python’s I/O functions and e.g. regexps for parsing (Python is quite well suited for this)
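
As an illustration of the first option, np.genfromtxt can cope with missing fields, which np.loadtxt rejects. A minimal sketch on made-up comma-separated text with one empty field:

```python
from io import StringIO

import numpy as np

# Hypothetical file content: the hare count for 1901 is missing
text = "1900,30e3,4e3\n1901,,6.1e3\n"

table = np.genfromtxt(StringIO(text), delimiter=",")
missing = np.isnan(table)     # empty fields are read as NaN
```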

Reminder: Navigating the filesystem with IPython

In [1]: pwd # show current directory


'/home/user/stuff/2011-numpy-tutorial'
In [2]: cd ex
'/home/user/stuff/2011-numpy-tutorial/ex'
In [3]: ls
populations.txt species.txt

Images
Using Matplotlib:


>>> img = plt.imread('data/elephant.png')


>>> img.shape, img.dtype
((200, 300, 3), dtype('float32'))
>>> plt.imshow(img)
<matplotlib.image.AxesImage object at ...>
>>> plt.savefig('plot.png')

>>> plt.imsave('red_elephant.png', img[:,:,0], cmap=plt.cm.gray)

This saved only one channel (of RGB):

>>> plt.imshow(plt.imread('red_elephant.png'))
<matplotlib.image.AxesImage object at ...>

Other libraries:

>>> import imageio


>>> imageio.imsave('tiny_elephant.png', img[::6,::6])
>>> plt.imshow(plt.imread('tiny_elephant.png'), interpolation='nearest')
<matplotlib.image.AxesImage object at ...>


NumPy’s own format


NumPy has its own binary format, not portable but with efficient I/O:

>>> data = np.ones((3, 3))


>>> np.save('pop.npy', data)
>>> data3 = np.load('pop.npy')
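
Several arrays can be stored together in one file with np.savez, which writes an .npz archive. A short sketch using a temporary directory and made-up array names:

```python
import os
import tempfile

import numpy as np

year = np.arange(1900, 1904)
hares = np.array([30e3, 47.2e3, 70.2e3, 77.4e3])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pop.npz")
    np.savez(path, year=year, hares=hares)   # keyword names become archive keys
    with np.load(path) as archive:
        names = sorted(archive.files)
        year2 = archive["year"]              # arrays are read when accessed
        hares2 = archive["hares"]
```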

Well-known (& more obscure) file formats


• HDF5: h5py, PyTables
• NetCDF: scipy.io.netcdf_file, netcdf4-python, . . .
• Matlab: scipy.io.loadmat, scipy.io.savemat
• MatrixMarket: scipy.io.mmread, scipy.io.mmwrite
• IDL: scipy.io.readsav
. . . if somebody uses it, there’s probably also a Python library for it.

Exercise: Text data files

Write a Python script that loads data from populations.txt, drops the last column and the first
5 rows, and saves the smaller dataset to pop2.txt.

NumPy internals

If you are interested in the NumPy internals, there is a good discussion in Advanced NumPy.

4.5 Some exercises


4.5.1 Array manipulations
1. Form the 2-D array (without typing it in explicitly):

[[1, 6, 11],
[2, 7, 12],
[3, 8, 13],
[4, 9, 14],
[5, 10, 15]]


and generate a new array containing its 2nd and 4th rows.
2. Divide each column of the array:

>>> import numpy as np


>>> a = np.arange(25).reshape(5, 5)

elementwise with the array b = np.array([1., 5, 10, 15, 20]). (Hint: np.newaxis).
3. Harder one: Generate a 10 x 3 array of random numbers (in range [0,1]). For each row, pick the
number closest to 0.5.
• Use abs and argsort to find the column j closest for each row.
• Use fancy indexing to extract the numbers. (Hint: a[i,j] – the array i must contain the
row numbers corresponding to stuff in j.)

4.5.2 Picture manipulation: Framing a Face


Let’s do some manipulations on numpy arrays by starting with an image of a raccoon. scipy provides a
2D array of this image with the scipy.misc.face function:

>>> from scipy import misc


>>> face = misc.face(gray=True) # 2D grayscale image

Here are a few images we will be able to obtain with our manipulations: use different colormaps, crop
the image, change some parts of the image.

• Let’s use the imshow function of matplotlib to display the image.

>>> import matplotlib.pyplot as plt


>>> face = misc.face(gray=True)
>>> plt.imshow(face)
<matplotlib.image.AxesImage object at 0x...>

• The face is displayed in false colors. A colormap must be specified for it to be displayed
in grey.

>>> plt.imshow(face, cmap=plt.cm.gray)


<matplotlib.image.AxesImage object at 0x...>

• Create an array of the image with a narrower centering: for example, remove 100 pixels
from all the borders of the image. To check the result, display this new array with imshow.

>>> crop_face = face[100:-100, 100:-100]

• We will now frame the face with a black locket. For this, we need to create a mask
corresponding to the pixels we want to be black. The center of the face is around (660, 300), so
we define the mask by the condition (y-300)**2 + (x-660)**2 > 230**2:

>>> sy, sx = face.shape
>>> y, x = np.ogrid[0:sy, 0:sx] # x and y indices of pixels
>>> y.shape, x.shape
((768, 1), (1, 1024))
>>> centerx, centery = (660, 300) # center of the image
>>> mask = ((y - centery)**2 + (x - centerx)**2) > 230**2 # circle

then we assign the value 0 to the pixels of the image corresponding to the mask. The syntax
is extremely simple and intuitive:

>>> face[mask] = 0
>>> plt.imshow(face)
<matplotlib.image.AxesImage object at 0x...>

• Follow-up: copy all instructions of this exercise in a script called face_locket.py then
execute this script in IPython with %run face_locket.py.
Change the circle to an ellipse.
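
The masking technique does not depend on the raccoon image. The sketch below applies it to a small synthetic array, with an elliptical region whose center and radii are arbitrary values picked for the example:

```python
import numpy as np

# Synthetic white image standing in for the real picture
image = np.full((60, 80), 255, dtype=np.uint8)

sy, sx = image.shape
y, x = np.ogrid[0:sy, 0:sx]            # column of row indices, row of column indices

centery, centerx = 30, 40              # hypothetical center of the region to keep
radius_y, radius_x = 25, 15            # hypothetical semi-axes of the ellipse

# True for pixels lying outside the ellipse
outside = ((y - centery) / radius_y) ** 2 + ((x - centerx) / radius_x) ** 2 > 1
image[outside] = 0                     # paint everything outside black
```

Broadcasting the (60, 1) and (1, 80) index arrays produces the full 2D mask without building two full-size coordinate arrays.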

4.5.3 Data statistics


The data in populations.txt describes the populations of hares and lynxes (and carrots) in northern
Canada during 20 years:

>>> data = np.loadtxt('data/populations.txt')


>>> year, hares, lynxes, carrots = data.T # trick: columns to variables

>>> import matplotlib.pyplot as plt


>>> plt.axes([0.2, 0.1, 0.5, 0.8])
<matplotlib.axes...Axes object at ...>
>>> plt.plot(year, hares, year, lynxes, year, carrots)
[<matplotlib.lines.Line2D object at ...>, ...]
>>> plt.legend(('Hare', 'Lynx', 'Carrot'), loc=(1.05, 0.5))
<matplotlib.legend.Legend object at ...>

Compute and print, based on the data in populations.txt. . .
1. The mean and std of the populations of each species for the years in the period.
2. Which year each species had the largest population.
3. Which species has the largest population for each year. (Hint: argsort & fancy indexing of
np.array(['H', 'L', 'C']))
4. Which years any of the populations is above 50000. (Hint: comparisons and np.any)
5. The top 2 years for each species when they had the lowest populations. (Hint: argsort, fancy
indexing)
6. Compare (plot) the change in hare population (see help(np.gradient)) and the number of lynxes.
Check correlation (see help(np.corrcoef)).


. . . all without for-loops.


Solution: Python source file

4.5.4 Crude integral approximations


Write a function f(a, b, c) that returns $a^b - c$. Form a 24x12x6 array containing its values in parameter
ranges [0,1] x [0,1] x [0,1].
Approximate the 3-d integral
$$\int_0^1 \int_0^1 \int_0^1 (a^b - c) \, da \, db \, dc$$
over this volume with the mean. The exact result is $\ln 2 - \frac{1}{2} \approx 0.1931\ldots$ What is your relative error?
(Hints: use elementwise operations and broadcasting. You can make np.ogrid give a number of points
in given range with np.ogrid[0:1:20j].)
Reminder Python functions:
def f(a, b, c):
    return some_result
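
To see what the np.ogrid hint produces, here is a small sketch in two dimensions with a deliberately different expression, so it does not give the exercise away. An imaginary step such as 3j means "this many points, endpoints included":

```python
import numpy as np

a, b = np.ogrid[0:1:3j, 0:1:5j]   # 3 points along axis 0, 5 points along axis 1

# a has shape (3, 1) and b has shape (1, 5); broadcasting combines them
grid = a + 2 * b                  # shape (3, 5)
grid_mean = grid.mean()
```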

Solution: Python source file

4.5.5 Mandelbrot set

Write a script that computes the Mandelbrot fractal. The Mandelbrot iteration:
N_max = 50
some_threshold = 50

c = x + 1j*y

z = 0
for j in range(N_max):
    z = z**2 + c

Point (x, y) belongs to the Mandelbrot set if |z| < some_threshold.


Do this computation by:
1. Construct a grid of c = x + 1j*y values in range [-2, 1] x [-1.5, 1.5]
2. Do the iteration
3. Form the 2-d boolean mask indicating which points are in the set
4. Save the result to an image with:


>>> import matplotlib.pyplot as plt


>>> plt.imshow(mask.T, extent=[-2, 1, -1.5, 1.5])
<matplotlib.image.AxesImage object at ...>
>>> plt.gray()
>>> plt.savefig('mandelbrot.png')

Solution: Python source file

4.5.6 Markov chain

Markov chain transition matrix P, and probability distribution on the states p:


1. 0 <= P[i,j] <= 1: probability to go from state i to state j
2. Transition rule: $p_{new} = P^T p_{old}$
3. all(sum(P, axis=1) == 1), p.sum() == 1: normalization
Write a script that works with 5 states, and:
• Constructs a random matrix, and normalizes each row so that it is a transition matrix.
• Starts from a random (normalized) probability distribution p and takes 50 steps => p_50
• Computes the stationary distribution: the eigenvector of P.T with eigenvalue 1 (numerically: closest
to 1) => p_stationary
Remember to normalize the eigenvector — I didn’t. . .
• Checks if p_50 and p_stationary are equal to tolerance 1e-5
Toolbox: np.random.rand, .dot(), np.linalg.eig, reductions, abs(), argmin, comparisons, all, np.
linalg.norm, etc.
Solution: Python source file

4.6 Full code examples


4.6.1 Full code examples for the numpy chapter


2D plotting
Plot a basic 2D figure

import numpy as np
import matplotlib.pyplot as plt

image = np.random.rand(30, 30)


plt.imshow(image, cmap=plt.cm.hot)
plt.colorbar()
plt.show()


1D plotting
Plot a basic 1D figure


import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
plt.plot(x, y)
plt.plot(x, y, 'o')
plt.show()


Distances exercise
Plot distances in a grid


import numpy as np
import matplotlib.pyplot as plt

x, y = np.arange(5), np.arange(5)[:, np.newaxis]


distance = np.sqrt(x ** 2 + y ** 2)
plt.pcolor(distance)
plt.colorbar()
plt.show()


Fitting to polynomial
Plot noisy data and their polynomial fit


import numpy as np
import matplotlib.pyplot as plt

np.random.seed(12)

x = np.linspace(0, 1, 20)
y = np.cos(x) + 0.3*np.random.rand(20)
p = np.poly1d(np.polyfit(x, y, 3))

t = np.linspace(0, 1, 200)
plt.plot(x, y, 'o', t, p(t), '-')
plt.show()


Fitting in Chebyshev basis


Plot noisy data and their polynomial fit in a Chebyshev basis


import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x = np.linspace(-1, 1, 2000)
y = np.cos(x) + 0.3*np.random.rand(2000)
p = np.polynomial.Chebyshev.fit(x, y, 90)

plt.plot(x, y, 'r.')
plt.plot(x, p(x), 'k-', lw=3)
plt.show()


Population exercise
Plot populations of hares, lynxes, and carrots


import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt('../data/populations.txt')
year, hares, lynxes, carrots = data.T

plt.axes([0.2, 0.1, 0.5, 0.8])


plt.plot(year, hares, year, lynxes, year, carrots)
plt.legend(('Hare', 'Lynx', 'Carrot'), loc=(1.05, 0.5))
plt.show()


Reading and writing an elephant


Read and write images

import numpy as np
import matplotlib.pyplot as plt

original figure

plt.figure()
img = plt.imread('../data/elephant.png')
plt.imshow(img)


red channel displayed in grey

plt.figure()
img_red = img[:, :, 0]
plt.imshow(img_red, cmap=plt.cm.gray)


lower resolution

plt.figure()
img_tiny = img[::6, ::6]
plt.imshow(img_tiny, interpolation='nearest')

plt.show()


Mandelbrot set
Compute the Mandelbrot fractal and plot it


import warnings

import numpy as np
import matplotlib.pyplot as plt
from numpy import newaxis

def compute_mandelbrot(N_max, some_threshold, nx, ny):
    # A grid of c-values
    x = np.linspace(-2, 1, nx)
    y = np.linspace(-1.5, 1.5, ny)

    c = x[:,newaxis] + 1j*y[newaxis,:]

    # Mandelbrot iteration

    z = c

    # The code below overflows in many regions of the x-y grid, suppress
    # warnings temporarily (np.warnings is not available in recent NumPy
    # releases, so the standard warnings module is used instead)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        for j in range(N_max):
            z = z**2 + c
        mandelbrot_set = (abs(z) < some_threshold)

    return mandelbrot_set

mandelbrot_set = compute_mandelbrot(50, 50., 601, 401)

plt.imshow(mandelbrot_set.T, extent=[-2, 1, -1.5, 1.5])


plt.gray()
plt.show()


Random walk exercise


Plot distance as a function of time for a random walk together with the theoretical result

import numpy as np
import matplotlib.pyplot as plt

# We create 1000 realizations with 200 steps each


n_stories = 1000
t_max = 200

t = np.arange(t_max)
# Steps can be -1 or 1 (note that randint excludes the upper limit)
steps = 2 * np.random.randint(0, 1 + 1, (n_stories, t_max)) - 1

# The time evolution of the position is obtained by successively


# summing up individual steps. This is done for each of the
# realizations, i.e. along axis 1.
positions = np.cumsum(steps, axis=1)

# Determine the time evolution of the mean square distance.


sq_distance = positions**2
mean_sq_distance = np.mean(sq_distance, axis=0)

# Plot the distance d from the origin as a function of time and


# compare with the theoretically expected result where d(t)
# grows as a square root of time t.
plt.figure(figsize=(4, 3))
plt.plot(t, np.sqrt(mean_sq_distance), 'g.', t, np.sqrt(t), 'y-')
plt.xlabel(r"$t$")
plt.ylabel(r"$\sqrt{\langle (\delta x)^2 \rangle}$")
plt.tight_layout()
plt.show()
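
The agreement with the theoretical curve can also be checked numerically, without a plot. The sketch below is a variation on the script above, not the authors' code: it uses a seeded np.random.default_rng generator and more realizations so the result is reproducible, and it compares against the number of steps taken (column k of positions holds the position after k + 1 steps).

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator, an assumption for reproducibility
n_stories, t_max = 5000, 200

# Steps are -1 or +1 (integers() excludes the upper limit, like randint)
steps = 2 * rng.integers(0, 2, size=(n_stories, t_max)) - 1
positions = np.cumsum(steps, axis=1)

# Mean square distance over all realizations, one value per time step
mean_sq_distance = np.mean(positions**2, axis=0)

# Theory: the mean square distance equals the number of steps taken
n_steps = np.arange(1, t_max + 1)
relative_error = np.abs(mean_sq_distance / n_steps - 1)
worst_error = relative_error.max()
```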
