This chapter provides an overview of NumPy, a core package for numerical computing in Python. NumPy introduces NumPy arrays as a high-level numerical data type that allows for efficient storage and manipulation of multi-dimensional arrays. NumPy arrays can be created manually or using functions like arange(), linspace(), ones(), zeros(), and random numbers. NumPy arrays have attributes like shape and dtype that provide information about the array dimensions and data type. NumPy also provides functions for basic operations on arrays like indexing, slicing, copying, and fancy indexing. Matplotlib can be used to visualize 1D and 2D NumPy arrays.


CHAPTER 4

NumPy: creating and manipulating numerical data

Authors: Emmanuelle Gouillart, Didrik Pinte, Gaël Varoquaux, and Pauli Virtanen
This chapter gives an overview of NumPy, the core tool for performant numerical computing with Python.

4.1 The NumPy array object

Section contents

• What are NumPy and NumPy arrays?

• Creating arrays
• Basic data types
• Basic visualization
• Indexing and slicing
• Copies and views
• Fancy indexing

4.1.1 What are NumPy and NumPy arrays?

Scipy lecture notes, Edition 2020.2

NumPy arrays

Python objects:
• high-level number objects: integers, floating point
• containers: lists (costless insertion and append), dictionaries (fast lookup)

NumPy provides:
• an extension package to Python for multi-dimensional arrays
• closer to hardware (efficiency)
• designed for scientific computation (convenience)
• also known as array-oriented computing

>>> import numpy as np

>>> a = np.array([0, 1, 2, 3])
>>> a
array([0, 1, 2, 3])

Tip: For example, an array containing:

• values of an experiment/simulation at discrete time steps
• signal recorded by a measurement device, e.g. sound wave
• pixels of an image, grey-level or colour
• 3-D data measured at different X-Y-Z positions, e.g. MRI scan
• ...

Why it is useful: Memory-efficient container that provides fast numerical operations.

In [1]: L = range(1000)

In [2]: %timeit [i**2 for i in L]

1000 loops, best of 3: 403 us per loop

In [3]: a = np.arange(1000)

In [4]: %timeit a**2

100000 loops, best of 3: 12.7 us per loop
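The %timeit magic above only works inside IPython. As a rough sketch of the same comparison in a plain script, the standard-library timeit module can be used instead; the variable names are illustrative and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 1000
values = list(range(n))      # plain Python list
arr = np.arange(n)           # NumPy array holding the same values

# Both approaches compute the same squares.
squares_list = [i**2 for i in values]
squares_arr = arr**2

# Time 1000 repetitions of each approach (results vary by machine).
t_list = timeit.timeit(lambda: [i**2 for i in values], number=1000)
t_arr = timeit.timeit(lambda: arr**2, number=1000)
print(f"list comprehension: {t_list:.4f} s, NumPy: {t_arr:.4f} s")
```

On most machines the array version should be markedly faster, in line with the IPython timings shown above.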

NumPy Reference documentation

• On the web: https://numpy.org/doc/
• Interactive help:
In [5]: np.array?
String Form:<built-in function array>
Docstring:
array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0, ...

• Looking for something:

>>> np.lookfor('create array')
Search results for 'create array'
---------------------------------
numpy.array

Create an array.
numpy.memmap
Create a memory-map to an array stored in a *binary* file on disk.

In [6]: np.con*?
np.concatenate
np.conj
np.conjugate
np.convolve

Import conventions
The recommended convention to import numpy is:

>>> import numpy as np

4.1.2 Creating arrays

Manual construction of arrays
• 1-D:

>>> a = np.array([0, 1, 2, 3])

>>> a
array([0, 1, 2, 3])
>>> a.ndim
1
>>> a.shape
(4,)
>>> len(a)
4

• 2-D, 3-D, . . . :

>>> b = np.array([[0, 1, 2], [3, 4, 5]]) # 2 x 3 array

>>> b
array([[0, 1, 2],
[3, 4, 5]])
>>> b.ndim
2
>>> b.shape
(2, 3)
>>> len(b) # returns the size of the first dimension
2

>>> c = np.array([[[1], [2]], [[3], [4]]])

>>> c
array([[[1],
[2]],

[[3],
[4]]])
>>> c.shape
(2, 2, 1)

Exercise: Simple arrays


• Create a simple two dimensional array. First, redo the examples from above. And then create
your own: how about odd numbers counting backwards on the first row, and even numbers on
the second?
• Use the functions len(), numpy.shape() on these arrays. How do they relate to each other?
And to the ndim attribute of the arrays?

Functions for creating arrays

Tip: In practice, we rarely enter items one by one. . .

• Evenly spaced:

>>> a = np.arange(10) # 0 .. n-1 (!)

>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.arange(1, 9, 2) # start, end (exclusive), step
>>> b
array([1, 3, 5, 7])

• or by number of points:

>>> c = np.linspace(0, 1, 6) # start, end, num-points

>>> c
array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
>>> d = np.linspace(0, 1, 5, endpoint=False)
>>> d
array([0. , 0.2, 0.4, 0.6, 0.8])

• Common arrays:

>>> a = np.ones((3, 3)) # reminder: (3, 3) is a tuple

>>> a
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
>>> b = np.zeros((2, 2))
>>> b
array([[0., 0.],
[0., 0.]])
>>> c = np.eye(3)
>>> c
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
>>> d = np.diag(np.array([1, 2, 3, 4]))
>>> d
array([[1, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 4]])

• np.random: random numbers (Mersenne Twister PRNG):

>>> a = np.random.rand(4) # uniform in [0, 1)

>>> a
array([ 0.95799151, 0.14222247, 0.08777354, 0.51887998])

>>> b = np.random.randn(4) # Gaussian


>>> b
array([ 0.37544699, -0.11425369, -0.47616538, 1.79664113])

>>> np.random.seed(1234) # Setting the random seed
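Setting the seed makes the sequence of random numbers reproducible. A minimal sketch of what this means in practice: seeding twice with the same value yields the same array.

```python
import numpy as np

np.random.seed(1234)
first = np.random.rand(4)

np.random.seed(1234)         # reset the generator to the same state
second = np.random.rand(4)

print(np.array_equal(first, second))   # the two draws are identical
```

This is mostly useful for tests and for examples whose output should not change from run to run.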

Exercise: Creating arrays using functions

• Experiment with arange, linspace, ones, zeros, eye and diag.

• Create different kinds of arrays with random numbers.
• Try setting the seed before creating an array with random values.
• Look at the function np.empty. What does it do? When might this be useful?

4.1.3 Basic data types

You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. 2.
vs 2). This is due to a difference in the data-type used:
>>> a = np.array([1, 2, 3])
>>> a.dtype
dtype('int64')

>>> b = np.array([1., 2., 3.])

>>> b.dtype
dtype('float64')

Tip: Different data-types allow us to store data more compactly in memory, but most of the time we
simply work with floating point numbers. Note that, in the example above, NumPy auto-detects the
data-type from the input.

You can explicitly specify which data-type you want:

>>> c = np.array([1, 2, 3], dtype=float)
>>> c.dtype
dtype('float64')

The default data type is floating point:

>>> a = np.ones((3, 3))
>>> a.dtype
dtype('float64')

There are also other types:

Complex
>>> d = np.array([1+2j, 3+4j, 5+6*1j])
>>> d.dtype
dtype('complex128')

Bool
>>> e = np.array([True, False, False, True])
>>> e.dtype
dtype('bool')


Strings

>>> f = np.array(['Bonjour', 'Hello', 'Hallo'])

>>> f.dtype # <--- strings containing max. 7 letters (Unicode under Python 3)
dtype('<U7')

Much more
• int32
• int64
• uint32
• uint64
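To see how the choice of data type affects memory use, the itemsize and nbytes attributes can be inspected. The following is a small sketch; the array length of 1000 is arbitrary.

```python
import numpy as np

# Bytes per element and total bytes for a few integer types.
for dtype in (np.int32, np.int64, np.uint32, np.uint64):
    a = np.zeros(1000, dtype=dtype)
    print(a.dtype, a.itemsize, a.nbytes)

small = np.zeros(1000, dtype=np.int32)   # 4 bytes per element
large = np.zeros(1000, dtype=np.int64)   # 8 bytes per element
```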

4.1.4 Basic visualization

Now that we have our first data arrays, we are going to visualize them.
Start by launching IPython:

$ ipython # or ipython3 depending on your install

Or the notebook:

$ jupyter notebook

Once IPython has started, enable interactive plots:

>>> %matplotlib

Or, from the notebook, enable plots in the notebook:

>>> %matplotlib inline

The inline is important for the notebook, so that plots are displayed in the notebook and not in a new
window.
Matplotlib is a 2D plotting package. We can import its functions as below:

>>> import matplotlib.pyplot as plt # the tidy way

And then use (note that you have to use show explicitly if you have not enabled interactive plots with
%matplotlib):

>>> plt.plot(x, y) # line plot

>>> plt.show() # <-- shows the plot (not needed with interactive plots)

Or, if you have enabled interactive plots with %matplotlib:

>>> plt.plot(x, y) # line plot

• 1D plotting:

>>> x = np.linspace(0, 3, 20)

>>> y = np.linspace(0, 9, 20)
>>> plt.plot(x, y) # line plot
[<matplotlib.lines.Line2D object at ...>]
>>> plt.plot(x, y, 'o') # dot plot
[<matplotlib.lines.Line2D object at ...>]


• 2D arrays (such as images):

>>> image = np.random.rand(30, 30)

>>> plt.imshow(image, cmap=plt.cm.hot)
<matplotlib.image.AxesImage object at ...>
>>> plt.colorbar()
<matplotlib.colorbar.Colorbar object at ...>

See also:
More in the: matplotlib chapter

Exercise: Simple visualizations

• Plot some simple arrays: a cosine as a function of time and a 2D matrix.

• Try using the gray colormap on the 2D matrix.

4.1.5 Indexing and slicing

The items of an array can be accessed and assigned to the same way as other Python sequences (e.g.
lists):

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[0], a[2], a[-1]
(0, 2, 9)


Warning: Indices begin at 0, like other Python sequences (and C/C++). In contrast, in Fortran
or Matlab, indices begin at 1.

The usual Python idiom for reversing a sequence is supported:

>>> a[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

For multidimensional arrays, indices are tuples of integers:

>>> a = np.diag(np.arange(3))
>>> a
array([[0, 0, 0],
[0, 1, 0],
[0, 0, 2]])
>>> a[1, 1]
1
>>> a[2, 1] = 10 # third line, second column
>>> a
array([[ 0, 0, 0],
[ 0, 1, 0],
[ 0, 10, 2]])
>>> a[1]
array([0, 1, 0])

Note:
• In 2D, the first dimension corresponds to rows, the second to columns.
• For a multidimensional array a, a[0] is interpreted by taking all elements in the unspecified dimensions.

Slicing: Arrays, like other Python sequences, can also be sliced:

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[2:9:3] # [start:end:step]
array([2, 5, 8])

Note that the end index is not included:

>>> a[:4]
array([0, 1, 2, 3])

None of the three slice components is required: by default, start is 0, end is the length of the array and step is 1:

>>> a[1:3]
array([1, 2])
>>> a[::2]
array([0, 2, 4, 6, 8])
>>> a[3:]
array([3, 4, 5, 6, 7, 8, 9])
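As a quick check of these defaults, each shortened slice can be compared with its fully spelled-out form. This is only a sketch; the variable names are illustrative.

```python
import numpy as np

a = np.arange(10)

# Omitted components fall back to their defaults.
same_start = np.array_equal(a[:4], a[0:4:1])
same_end = np.array_equal(a[3:], a[3:len(a):1])
same_step = np.array_equal(a[::2], a[0:len(a):2])

# The built-in slice object spells the same thing out explicitly,
# with None standing for "use the default".
every_other = a[slice(None, None, 2)]
print(same_start, same_end, same_step, every_other)
```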

A small illustrated summary of NumPy indexing and slicing. . .

In the figure, a is the 6 x 6 array:

 0  1  2  3  4  5
10 11 12 13 14 15
20 21 22 23 24 25
30 31 32 33 34 35
40 41 42 43 44 45
50 51 52 53 54 55

>>> a[0, 3:5]
array([3, 4])
>>> a[4:, 4:]
array([[44, 45],
       [54, 55]])
>>> a[:, 2]
array([ 2, 12, 22, 32, 42, 52])
>>> a[2::2, ::2]
array([[20, 22, 24],
       [40, 42, 44]])

You can also combine assignment and slicing:

>>> a = np.arange(10)
>>> a[5:] = 10
>>> a
array([ 0, 1, 2, 3, 4, 10, 10, 10, 10, 10])
>>> b = np.arange(5)
>>> a[5:] = b[::-1]
>>> a
array([0, 1, 2, 3, 4, 4, 3, 2, 1, 0])

Exercise: Indexing and slicing

• Try the different flavours of slicing, using start, end and step: starting from a linspace, try to
obtain odd numbers counting backwards, and even numbers counting forwards.
• Reproduce the slices in the diagram above. You may use the following expression to create the
array:
>>> np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]
array([[ 0, 1, 2, 3, 4, 5],
[10, 11, 12, 13, 14, 15],
[20, 21, 22, 23, 24, 25],
[30, 31, 32, 33, 34, 35],
[40, 41, 42, 43, 44, 45],
[50, 51, 52, 53, 54, 55]])

Exercise: Array creation

Create the following arrays (with correct data types):

[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 2],
[1, 6, 1, 1]]

[[0., 0., 0., 0., 0.],

[2., 0., 0., 0., 0.],
[0., 3., 0., 0., 0.],
[0., 0., 4., 0., 0.],
[0., 0., 0., 5., 0.],
[0., 0., 0., 0., 6.]]


Par on course: 3 statements for each

Hint: Individual array elements can be accessed similarly to a list, e.g. a[1] or a[1, 2].
Hint: Examine the docstring for diag.

Exercise: Tiling for array creation

Skim through the documentation for np.tile, and use this function to construct the array:
[[4, 3, 4, 3, 4, 3],
[2, 1, 2, 1, 2, 1],
[4, 3, 4, 3, 4, 3],
[2, 1, 2, 1, 2, 1]]

4.1.6 Copies and views

A slicing operation creates a view on the original array, which is just a way of accessing array data.
Thus the original array is not copied in memory. You can use np.may_share_memory() to check if two
arrays share the same memory block. Note however, that this uses heuristics and may give you false
positives.
When modifying the view, the original array is modified as well:

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = a[::2]
>>> b
array([0, 2, 4, 6, 8])
>>> np.may_share_memory(a, b)
True
>>> b[0] = 12
>>> b
array([12, 2, 4, 6, 8])
>>> a # (!)
array([12, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> a = np.arange(10)
>>> c = a[::2].copy() # force a copy
>>> c[0] = 12
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> np.may_share_memory(a, c)
False

This behavior can be surprising at first sight... but it saves both memory and time.
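Because np.may_share_memory() relies on heuristics, it can help to cross-check with two other tools. The sketch below uses the base attribute of an array and np.shares_memory(), which performs an exact check (and may be slower on complicated cases).

```python
import numpy as np

a = np.arange(10)
view = a[::2]             # slicing: a view on a's data
copy = a[::2].copy()      # explicit copy: new memory

print(view.base is a)     # the view keeps a reference to the original
print(copy.base is None)  # the copy owns its data
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```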

Worked example: Prime number sieve


Compute prime numbers in 0–99, with a sieve

• Construct a shape (100,) boolean array is_prime, filled with True in the beginning:
>>> is_prime = np.ones((100,), dtype=bool)

• Cross out 0 and 1 which are not primes:

>>> is_prime[:2] = 0

• For each integer j starting from 2, cross out its higher multiples:
>>> N_max = int(np.sqrt(len(is_prime) - 1))
>>> for j in range(2, N_max + 1):
...     is_prime[2*j::j] = False

• Skim through help(np.nonzero), and print the prime numbers

• Follow-up:
– Move the above code into a script file named prime_sieve.py
– Run it to check it works
– Use the optimization suggested in the sieve of Eratosthenes:
1. Skip j which are already known to not be primes
2. The first number to cross out is j^2
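One possible version of prime_sieve.py with both optimizations applied is sketched below. The function name prime_sieve is a hypothetical choice, and the code assumes n >= 2.

```python
import numpy as np


def prime_sieve(n):
    """Return the primes strictly below n (hypothetical helper, assumes n >= 2)."""
    is_prime = np.ones((n,), dtype=bool)
    is_prime[:2] = False                  # 0 and 1 are not primes
    n_max = int(np.sqrt(n - 1))
    for j in range(2, n_max + 1):
        if is_prime[j]:                   # 1. skip j already crossed out
            is_prime[j * j::j] = False    # 2. start crossing out at j**2
    return np.nonzero(is_prime)[0]        # indices of the True entries


primes = prime_sieve(100)
print(primes)
```

Starting at j**2 is safe because every smaller multiple of j has a smaller factor and has therefore already been crossed out.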

4.1.7 Fancy indexing

Tip: NumPy arrays can be indexed with slices, but also with boolean arrays (masks) or integer arrays. This
method is called fancy indexing. It creates copies, not views.

Using boolean masks

>>> np.random.seed(3)
>>> a = np.random.randint(0, 21, 15)
>>> a
array([10, 3, 8, 0, 19, 10, 11, 9, 10, 6, 0, 20, 12, 7, 14])
>>> (a % 3 == 0)
array([False, True, False, True, False, False, False, True, False,

True, True, False, True, False, False])
>>> mask = (a % 3 == 0)
>>> extract_from_a = a[mask] # or, a[a%3==0]
>>> extract_from_a # extract a sub-array with the mask
array([ 3, 0, 9, 6, 0, 12])

Indexing with a mask can be very useful to assign a new value to a sub-array:

>>> a[a % 3 == 0] = -1
>>> a
array([10, -1, 8, -1, 19, 10, 11, -1, 10, -1, -1, 20, -1, 7, 14])

Indexing with an array of integers

>>> a = np.arange(0, 100, 10)

>>> a
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

Indexing can be done with an array of integers, where the same index is repeated several times:

>>> a[[2, 3, 2, 4, 2]] # note: [2, 3, 2, 4, 2] is a Python list

array([20, 30, 20, 40, 20])

New values can be assigned with this kind of indexing:

>>> a[[9, 7]] = -100

>>> a
array([ 0, 10, 20, 30, 40, 50, 60, -100, 80, -100])

Tip: When a new array is created by indexing with an array of integers, the new array has the same
shape as the array of integers:

>>> a = np.arange(10)
>>> idx = np.array([[3, 4], [9, 7]])
>>> idx.shape
(2, 2)
>>> a[idx]
array([[3, 4],
[9, 7]])
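The fact that fancy indexing returns a copy, while slicing returns a view, can be verified directly. A minimal sketch (variable names are illustrative):

```python
import numpy as np

a = np.arange(0, 100, 10)

picked = a[[2, 3, 2]]        # fancy indexing: a new array
picked[0] = -1               # does not touch a
a_after_fancy = a.copy()     # snapshot of a at this point

sliced = a[2:4]              # slicing: a view
sliced[0] = -1               # modifies a[2]

print(a_after_fancy[2], a[2])
```

Note that assigning through a fancy index, as in a[[9, 7]] = -100 above, still modifies a itself: the copy is only made when the indexed result is read into a new array.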

The figure illustrates various fancy indexing applications on the 6 x 6 array a:

 0  1  2  3  4  5
10 11 12 13 14 15
20 21 22 23 24 25
30 31 32 33 34 35
40 41 42 43 44 45
50 51 52 53 54 55

>>> a[(0,1,2,3,4), (1,2,3,4,5)]
array([ 1, 12, 23, 34, 45])
>>> a[3:, [0,2,5]]
array([[30, 32, 35],
       [40, 42, 45],
       [50, 52, 55]])
>>> mask = np.array([1,0,1,0,0,1], dtype=bool)
>>> a[mask, 2]
array([ 2, 22, 52])


Exercise: Fancy indexing

• Again, reproduce the fancy indexing shown in the diagram above.

• Use fancy indexing on the left and array creation on the right to assign values into an array, for
instance by setting parts of the array in the diagram above to zero.

4.2 Numerical operations on arrays

Section contents

• Elementwise operations
• Basic reductions
• Broadcasting
• Array shape manipulation
• Sorting data
• Summary

4.2.1 Elementwise operations

Basic operations
With scalars:

>>> a = np.array([1, 2, 3, 4])

>>> a + 1
array([2, 3, 4, 5])
>>> 2**a
array([ 2, 4, 8, 16])

All arithmetic operates elementwise:

>>> b = np.ones(4) + 1
>>> a - b
array([-1., 0., 1., 2.])
>>> a * b
array([2., 4., 6., 8.])

>>> j = np.arange(5)
>>> 2**(j + 1) - j
array([ 2, 3, 6, 13, 28])

These operations are of course much faster than if you did them in pure Python:

>>> a = np.arange(10000)
>>> %timeit a + 1
10000 loops, best of 3: 24.3 us per loop
>>> l = range(10000)
>>> %timeit [i+1 for i in l]
1000 loops, best of 3: 861 us per loop

Warning: Array multiplication is not matrix multiplication:


>>> c = np.ones((3, 3))

>>> c * c # NOT matrix multiplication!
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])

Note: Matrix multiplication:

>>> c.dot(c)
array([[3., 3., 3.],
[3., 3., 3.],
[3., 3., 3.]])
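In Python 3.5 and later, the @ operator is another way to write the matrix product. The sketch below contrasts it with the elementwise product:

```python
import numpy as np

c = np.ones((3, 3))

elementwise = c * c       # elementwise product: all ones
matrix = c @ c            # matrix product, same result as c.dot(c)

print(elementwise)
print(matrix)
```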

Exercise: Elementwise operations

• Try simple arithmetic elementwise operations: add even elements with odd elements
• Time them against their pure python counterparts using %timeit.
• Generate:
– [2**0, 2**1, 2**2, 2**3, 2**4]
– a_j = 2^(3*j) - j

Other operations
Comparisons:
>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([4, 2, 2, 4])
>>> a == b
array([False, True, False, True])
>>> a > b
array([False, False, True, False])

Tip: Array-wise comparisons:

>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([4, 2, 2, 4])
>>> c = np.array([1, 2, 3, 4])
>>> np.array_equal(a, b)
False
>>> np.array_equal(a, c)
True

Logical operations:
>>> a = np.array([1, 1, 0, 0], dtype=bool)
>>> b = np.array([1, 0, 1, 0], dtype=bool)
>>> np.logical_or(a, b)
array([ True, True, True, False])
>>> np.logical_and(a, b)
array([ True, False, False, False])

Transcendental functions:


>>> a = np.arange(5)
>>> np.sin(a)
array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])
>>> np.log(a)
array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436])
>>> np.exp(a)
array([ 1. , 2.71828183, 7.3890561 , 20.08553692, 54.59815003])

Shape mismatches

>>> a = np.arange(4)
>>> a + np.array([1, 2])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,) (2,)

Broadcasting? We’ll return to that later.

Transposition:

>>> a = np.triu(np.ones((3, 3)), 1) # see help(np.triu)

>>> a
array([[0., 1., 1.],
[0., 0., 1.],
[0., 0., 0.]])
>>> a.T
array([[0., 0., 0.],
[1., 0., 0.],
[1., 1., 0.]])

Note: The transposition is a view

The transpose returns a view of the original array:

>>> a = np.arange(9).reshape(3, 3)
>>> a.T[0, 2] = 999
>>> a.T
array([[ 0, 3, 999],
[ 1, 4, 7],
[ 2, 5, 8]])
>>> a
array([[ 0, 1, 2],
[ 3, 4, 5],
[999, 7, 8]])

Note: Linear algebra

The sub-module numpy.linalg implements basic linear algebra, such as solving linear systems, singular
value decomposition, etc. However, it is not guaranteed to be compiled using efficient routines, and thus
we recommend the use of scipy.linalg, as detailed in section Linear algebra operations: scipy.linalg

Exercise: Other operations

• Look at the help for np.allclose. When might this be useful?

• Look at the help for np.triu and np.tril.


4.2.2 Basic reductions

Computing sums

>>> x = np.array([1, 2, 3, 4])

>>> np.sum(x)
10
>>> x.sum()
10

Sum by rows and by columns:

>>> x = np.array([[1, 1], [2, 2]])
>>> x
array([[1, 1],
[2, 2]])
>>> x.sum(axis=0) # columns (first dimension)
array([3, 3])
>>> x[:, 0].sum(), x[:, 1].sum()
(3, 3)
>>> x.sum(axis=1) # rows (second dimension)
array([2, 4])
>>> x[0, :].sum(), x[1, :].sum()
(2, 4)

Tip: Same idea in higher dimensions:

>>> x = np.random.rand(2, 2, 2)
>>> x.sum(axis=2)[0, 1]
1.14764...
>>> x[0, 1, :].sum()
1.14764...
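One way to keep track of what axis= does is to look at the shape of the result: the summed axis disappears. A short sketch, using an arbitrary 2 x 3 x 4 array:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Summing over an axis removes that axis from the shape.
shapes = [x.sum(axis=axis).shape for axis in range(x.ndim)]
print(shapes)

# keepdims=True keeps the summed axis with length 1,
# which is convenient for later broadcasting against x.
kept = x.sum(axis=2, keepdims=True)
print(kept.shape)
```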

Other reductions
These work the same way (and take axis=).
Extrema:
>>> x = np.array([1, 3, 2])
>>> x.min()
1
>>> x.max()
3
>>> x.argmin() # index of minimum
0
>>> x.argmax() # index of maximum
1

Logical operations:

>>> np.all([True, True, False])

False
>>> np.any([True, True, False])
True

Note: Can be used for array comparisons:

>>> a = np.zeros((100, 100))

>>> np.any(a != 0)
False
>>> np.all(a == a)
True

>>> a = np.array([1, 2, 3, 2])

>>> b = np.array([2, 2, 3, 2])
>>> c = np.array([6, 4, 4, 5])
>>> ((a <= b) & (b <= c)).all()
True

Statistics:

>>> x = np.array([1, 2, 3, 1])

>>> y = np.array([[1, 2, 3], [5, 6, 1]])
>>> x.mean()
1.75
>>> np.median(x)
1.5
>>> np.median(y, axis=-1) # last axis
array([2., 5.])

>>> x.std() # full population standard dev.

0.82915619758884995

. . . and many more (best to learn as you go).
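The comment "full population standard dev." above refers to the default normalization by N. If the sample standard deviation (normalization by N - 1) is wanted, the ddof argument can be used, as in this sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 1])

population_std = x.std()          # divides by N (default, ddof=0)
sample_std = x.std(ddof=1)        # divides by N - 1

print(population_std, sample_std)
```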

Exercise: Reductions

• Given there is a sum, what other function might you expect to see?
• What is the difference between sum and cumsum?

Worked Example: diffusion using a random walk algorithm


Tip: Let us consider a simple 1D random walk process: at each time step a walker jumps right or
left with equal probability.

We are interested in finding the typical distance from the origin of a random walker after t left or
right jumps. We are going to simulate many “walkers” to find this law, and we are going to do so
using array computing tricks: we are going to create a 2D array with the “stories” (each walker has a
story) in one direction, and the time in the other:

>>> n_stories = 1000 # number of walkers

>>> t_max = 200 # time during which we follow the walker

We randomly choose all the steps 1 or -1 of the walk:

>>> t = np.arange(t_max)
>>> steps = 2 * np.random.randint(0, 1 + 1, (n_stories, t_max)) - 1 # +1 because the high value is exclusive

>>> np.unique(steps) # Verification: all steps are 1 or -1

array([-1, 1])

We build the walks by summing steps along the time:

>>> positions = np.cumsum(steps, axis=1) # axis = 1: dimension of time

>>> sq_distance = positions**2

We get the mean in the axis of the stories:

>>> mean_sq_distance = np.mean(sq_distance, axis=0)

Plot the results:

>>> plt.figure(figsize=(4, 3))

<Figure size ... with 0 Axes>
>>> plt.plot(t, np.sqrt(mean_sq_distance), 'g.', t, np.sqrt(t), 'y-')
[<matplotlib.lines.Line2D object at ...>, <matplotlib.lines.Line2D object at ...>]
>>> plt.xlabel(r"$t$")
Text(...'$t$')
>>> plt.ylabel(r"$\sqrt{\langle (\delta x)^2 \rangle}$")
Text(...'$\\sqrt{\\langle (\\delta x)^2 \\rangle}$')
>>> plt.tight_layout() # provide sufficient space for labels


We find a well-known result in physics: the RMS distance grows as the square root of the time!
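The square-root law can also be checked numerically, without a plot. The sketch below is not the code of the worked example: it uses the newer np.random.default_rng generator with an arbitrary seed, and 2000 walkers instead of 1000, purely as assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator; the seed value is arbitrary
n_stories, t_max = 2000, 200

# Steps of +1 or -1 (the upper bound of integers() is exclusive).
steps = 2 * rng.integers(0, 2, size=(n_stories, t_max)) - 1
positions = np.cumsum(steps, axis=1)
rms = np.sqrt(np.mean(positions**2, axis=0))

# positions[:, k] is the position reached after k + 1 steps.
n_steps = np.arange(1, t_max + 1)
rel_error = np.abs(rms - np.sqrt(n_steps)) / np.sqrt(n_steps)
print(rel_error.max())
```

With a few thousand walkers the relative deviation from the square root of the number of steps should stay at the level of a few percent.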

4.2.3 Broadcasting
• Basic operations on numpy arrays (addition, etc.) are elementwise.
• This works on arrays of the same size.
Nevertheless, it is also possible to do operations on arrays of different
sizes if NumPy can transform these arrays so that they all have
the same size: this conversion is called broadcasting.
The image below gives an example of broadcasting:

Let’s verify:


>>> a = np.tile(np.arange(0, 40, 10), (3, 1)).T

>>> a
array([[ 0, 0, 0],
[10, 10, 10],
[20, 20, 20],
[30, 30, 30]])
>>> b = np.array([0, 1, 2])
>>> a + b
array([[ 0, 1, 2],
[10, 11, 12],
[20, 21, 22],
[30, 31, 32]])

We have already used broadcasting without knowing it!:

>>> a = np.ones((4, 5))
>>> a[0] = 2 # we assign an array of dimension 0 to an array of dimension 1
>>> a
array([[2., 2., 2., 2., 2.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])

A useful trick:
>>> a = np.arange(0, 40, 10)
>>> a.shape
(4,)
>>> a = a[:, np.newaxis] # adds a new axis -> 2D array
>>> a.shape
(4, 1)
>>> a
array([[ 0],
[10],
[20],
[30]])
>>> a + b
array([[ 0, 1, 2],
[10, 11, 12],
[20, 21, 22],
[30, 31, 32]])

Tip: Broadcasting seems a bit magical, but it is actually quite natural to use it when we want to solve
a problem whose output data is an array with more dimensions than input data.
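To see which shape two operands will be broadcast to without computing anything, np.broadcast can be used. The sketch below also shows the failure case, where the trailing dimensions are incompatible:

```python
import numpy as np

a = np.arange(0, 40, 10)[:, np.newaxis]   # shape (4, 1)
b = np.array([0, 1, 2])                   # shape (3,)

result_shape = np.broadcast(a, b).shape   # shape the operands broadcast to
print(result_shape)

# Incompatible trailing dimensions (4 versus 2) cannot be broadcast.
try:
    np.arange(4) + np.array([1, 2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```

Dimensions are compared from the last one backwards, and two dimensions are compatible when they are equal or when one of them is 1.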

Worked Example: Broadcasting

Let’s construct an array of distances (in miles) between cities of Route 66: Chicago, Springfield,
Saint-Louis, Tulsa, Oklahoma City, Amarillo, Santa Fe, Albuquerque, Flagstaff and Los Angeles.
>>> mileposts = np.array([0, 198, 303, 736, 871, 1175, 1475, 1544,
... 1913, 2448])
>>> distance_array = np.abs(mileposts - mileposts[:, np.newaxis])
>>> distance_array
array([[   0,  198,  303,  736,  871, 1175, 1475, 1544, 1913, 2448],
       [ 198,    0,  105,  538,  673,  977, 1277, 1346, 1715, 2250],
       [ 303,  105,    0,  433,  568,  872, 1172, 1241, 1610, 2145],
       [ 736,  538,  433,    0,  135,  439,  739,  808, 1177, 1712],
       [ 871,  673,  568,  135,    0,  304,  604,  673, 1042, 1577],
       [1175,  977,  872,  439,  304,    0,  300,  369,  738, 1273],
       [1475, 1277, 1172,  739,  604,  300,    0,   69,  438,  973],
       [1544, 1346, 1241,  808,  673,  369,   69,    0,  369,  904],
       [1913, 1715, 1610, 1177, 1042,  738,  438,  369,    0,  535],
       [2448, 2250, 2145, 1712, 1577, 1273,  973,  904,  535,    0]])

A lot of grid-based or network-based problems can also use broadcasting. For instance, if we want to
compute the distance from the origin of points on a 5x5 grid, we can do
>>> x, y = np.arange(5), np.arange(5)[:, np.newaxis]
>>> distance = np.sqrt(x ** 2 + y ** 2)
>>> distance
array([[0. , 1. , 2. , 3. , 4. ],
[1. , 1.41421356, 2.23606798, 3.16227766, 4.12310563],
[2. , 2.23606798, 2.82842712, 3.60555128, 4.47213595],
[3. , 3.16227766, 3.60555128, 4.24264069, 5. ],
[4. , 4.12310563, 4.47213595, 5. , 5.65685425]])

Or in color:
>>> plt.pcolor(distance)
>>> plt.colorbar()

Remark: the numpy.ogrid object (indexed with square brackets rather than called) directly creates the vectors x and y of the previous example, with two “significant dimensions”:
>>> x, y = np.ogrid[0:5, 0:5]
>>> x, y
(array([[0],
[1],
[2],
[3],
[4]]), array([[0, 1, 2, 3, 4]]))

>>> x.shape, y.shape
((5, 1), (1, 5))
>>> distance = np.sqrt(x ** 2 + y ** 2)

Tip: So, np.ogrid is very useful as soon as we have to handle computations on a grid. On the other
hand, np.mgrid directly provides matrices full of indices for cases where we can’t (or don’t want to)
benefit from broadcasting:

>>> x, y = np.mgrid[0:4, 0:4]

>>> x
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3]])
>>> y
array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
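The two approaches give the same result; they differ only in how much memory the index arrays take. A small sketch comparing them on a 4 x 4 grid:

```python
import numpy as np

xo, yo = np.ogrid[0:4, 0:4]       # "open" grid: shapes (4, 1) and (1, 4)
xm, ym = np.mgrid[0:4, 0:4]       # "full" grid: both of shape (4, 4)

d_open = np.sqrt(xo**2 + yo**2)   # relies on broadcasting
d_full = np.sqrt(xm**2 + ym**2)   # operands already have the full shape

print(np.array_equal(d_open, d_full))
```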

See also:
Broadcasting: discussion of broadcasting in the Advanced NumPy chapter.

4.2.4 Array shape manipulation

Flattening

>>> a = np.array([[1, 2, 3], [4, 5, 6]])

>>> a.ravel()
array([1, 2, 3, 4, 5, 6])
>>> a.T
array([[1, 4],
[2, 5],
[3, 6]])
>>> a.T.ravel()
array([1, 4, 2, 5, 3, 6])

Higher dimensions: last dimensions ravel out “first”.

Reshaping
The inverse operation to flattening:

>>> a.shape
(2, 3)
>>> b = a.ravel()
>>> b = b.reshape((2, 3))
>>> b
array([[1, 2, 3],
[4, 5, 6]])

Or,

>>> a.reshape((2, -1)) # unspecified (-1) value is inferred

array([[1, 2, 3],
[4, 5, 6]])


Warning: ndarray.reshape may return a view (cf. help(np.reshape)), or a copy.

Tip:

>>> b[0, 0] = 99
>>> a
array([[99, 2, 3],
[ 4, 5, 6]])

Beware: reshape may also return a copy!

>>> a = np.zeros((3, 2))

>>> b = a.T.reshape(3*2)
>>> b[0] = 9
>>> a
array([[0., 0.],
[0., 0.],
[0., 0.]])

To understand this you need to learn more about the memory layout of a numpy array.
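As a first hint about the role of memory layout, the flags attribute tells whether an array is stored contiguously in C (row-major) order, and np.shares_memory tells whether reshape had to copy. A sketch under those assumptions:

```python
import numpy as np

a = np.zeros((3, 2))

flat_view = a.reshape(6)      # a is C-contiguous: no copy needed
flat_copy = a.T.reshape(6)    # a.T is not C-contiguous: data must be copied

print(a.flags['C_CONTIGUOUS'], a.T.flags['C_CONTIGUOUS'])
print(np.shares_memory(a, flat_view), np.shares_memory(a, flat_copy))
```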

Adding a dimension
Indexing with the np.newaxis object allows us to add an axis to an array (you have seen this already
above in the broadcasting section):

>>> z = np.array([1, 2, 3])

>>> z
array([1, 2, 3])

>>> z[:, np.newaxis]

array([[1],
[2],
[3]])

>>> z[np.newaxis, :]
array([[1, 2, 3]])

Dimension shuffling

>>> a = np.arange(4*3*2).reshape(4, 3, 2)
>>> a.shape
(4, 3, 2)
>>> a[0, 2, 1]
5
>>> b = a.transpose(1, 2, 0)
>>> b.shape
(3, 2, 4)
>>> b[2, 1, 0]
5

Also creates a view:

>>> b[2, 1, 0] = -1
>>> a[0, 2, 1]
-1


Resizing
The size of an array can be changed with ndarray.resize:

>>> a = np.arange(4)
>>> a.resize((8,))
>>> a
array([0, 1, 2, 3, 0, 0, 0, 0])

However, it must not be referred to somewhere else:

>>> b = a
>>> a.resize((4,))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: cannot resize an array that has been referenced or is
referencing another array in this way. Use the resize function
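The np.resize function mentioned in the error message returns a new array and therefore works even when other references exist. Note, as the sketch below shows, that it fills the extra space with repeated copies of the data rather than with zeros:

```python
import numpy as np

a = np.arange(4)
b = a                          # a second reference to the same array

grown = np.resize(a, (8,))     # returns a new array; a is left untouched
print(grown)
print(a.shape)
```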

Exercise: Shape manipulations

• Look at the docstring for reshape, especially the notes section which has some more information
about copies and views.
• Use flatten as an alternative to ravel. What is the difference? (Hint: check which one returns
a view and which a copy)
• Experiment with transpose for dimension shuffling.

4.2.5 Sorting data

Sorting along an axis:

>>> a = np.array([[4, 3, 5], [1, 2, 1]])

>>> b = np.sort(a, axis=1)
>>> b
array([[3, 4, 5],
[1, 1, 2]])

Note: Sorts each row separately!

In-place sort:

>>> a.sort(axis=1)
>>> a
array([[3, 4, 5],
[1, 1, 2]])

Sorting with fancy indexing:

>>> a = np.array([4, 3, 1, 2])

>>> j = np.argsort(a)
>>> j
array([2, 3, 1, 0])
>>> a[j]
array([1, 2, 3, 4])
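A common use of argsort is to reorder one array according to the values of another. The data below are made up for illustration:

```python
import numpy as np

ages = np.array([31, 25, 40, 18])                  # hypothetical data
names = np.array(['Ann', 'Bob', 'Cid', 'Dee'])     # hypothetical data

order = np.argsort(ages)       # indices that sort ages
sorted_ages = ages[order]
sorted_names = names[order]    # the same permutation applied to names

print(sorted_ages, sorted_names)
```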

Finding minima and maxima:


>>> a = np.array([4, 3, 1, 2])

>>> j_max = np.argmax(a)
>>> j_min = np.argmin(a)
>>> j_max, j_min
(0, 2)

Exercise: Sorting

• Try both in-place and out-of-place sorting.

• Try creating arrays with different dtypes and sorting them.
• Use all or array_equal to check the results.
• Look at np.random.shuffle for a way to create sortable input quicker.
• Combine ravel, sort and reshape.
• Look at the axis keyword for sort and rewrite the previous exercise.

4.2.6 Summary
What do you need to know to get started?
• Know how to create arrays: array, arange, ones, zeros.
• Know the shape of the array with array.shape, then use slicing to obtain different views of the
array: array[::2], etc. Adjust the shape of the array using reshape or flatten it with ravel.
• Obtain a subset of the elements of an array and/or modify their values with masks

>>> a[a < 0] = 0

• Know miscellaneous operations on arrays, such as finding the mean or max (array.max(),
array.mean()). No need to retain everything, but have the reflex to search in the documentation (online
docs, help(), lookfor())!
• For advanced use: master the indexing with arrays of integers, as well as broadcasting. Know more
NumPy functions to handle various array operations.

Quick read

If you want to do a first quick pass through the Scipy lectures to learn the ecosystem, you can directly
skip to the next chapter: Matplotlib: plotting.
The remainder of this chapter is not necessary to follow the rest of the intro part. But be sure to
come back and finish this chapter, as well as to do some more exercises.

4.3 More elaborate arrays

Section contents

• More data types

• Structured data types
• maskedarray: dealing with (propagation of) missing data


4.3.1 More data types

Casting
“Bigger” type wins in mixed-type operations:

>>> np.array([1, 2, 3]) + 1.5

array([2.5, 3.5, 4.5])

Assignment never changes the type!

>>> a = np.array([1, 2, 3])

>>> a.dtype
dtype('int64')
>>> a[0] = 1.9 # <-- float is truncated to integer
>>> a
array([1, 2, 3])

Forced casts:

>>> a = np.array([1.7, 1.2, 1.6])

>>> b = a.astype(int) # <-- truncates to integer
>>> b
array([1, 1, 1])

Rounding:

>>> a = np.array([1.2, 1.5, 1.6, 2.5, 3.5, 4.5])

>>> b = np.around(a)
>>> b # still floating-point
array([1., 2., 2., 2., 4., 4.])
>>> c = np.around(a).astype(int)
>>> c
array([1, 2, 2, 2, 4, 4])

Different data type sizes

Integers (signed):

int8 8 bits
int16 16 bits
int32 32 bits (same as int on 32-bit platform)
int64 64 bits (same as int on 64-bit platform)

>>> np.array([1], dtype=int).dtype

dtype('int64')
>>> np.iinfo(np.int32).max, 2**31 - 1
(2147483647, 2147483647)

Unsigned integers:

uint8 8 bits
uint16 16 bits
uint32 32 bits
uint64 64 bits

>>> np.iinfo(np.uint32).max, 2**32 - 1

(4294967295, 4294967295)


Long integers

Python 2 has a specific type for ‘long’ integers, which cannot overflow, represented with an ‘L’ at the
end. In Python 3, all integers are long, and thus cannot overflow.
>>> np.iinfo(np.int64).max, 2**63 - 1
(9223372036854775807, 9223372036854775807)
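
Unlike Python integers, NumPy integers have a fixed width, so array arithmetic can wrap around silently. A minimal sketch using int8 (chosen only because its range is small):

```python
import numpy as np

# int8 can hold values from -128 to 127.
info = np.iinfo(np.int8)
print(info.min, info.max)

a = np.array([127], dtype=np.int8)
one = np.array([1], dtype=np.int8)

# Array arithmetic stays in int8, so 127 + 1 wraps around to -128.
wrapped = a + one
print(wrapped)

# Casting to a wider type before the operation avoids the wrap-around.
safe = a.astype(np.int64) + one
print(safe)
```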

Floating-point numbers:

float16 16 bits
float32 32 bits
float64 64 bits (same as float)
float96 96 bits, platform-dependent (same as np.longdouble)
float128 128 bits, platform-dependent (same as np.longdouble)

>>> np.finfo(np.float32).eps
1.1920929e-07
>>> np.finfo(np.float64).eps
2.2204460492503131e-16

>>> np.float32(1e-8) + np.float32(1) == 1

True
>>> np.float64(1e-8) + np.float64(1) == 1
False

Complex floating-point numbers:

complex64 two 32-bit floats

complex128 two 64-bit floats
complex192 two 96-bit floats, platform-dependent
complex256 two 128-bit floats, platform-dependent

Smaller data types

If you don’t know you need special data types, then you probably don’t.
Comparison on using float32 instead of float64:
• Half the size in memory and on disk
• Half the memory bandwidth required (may be a bit faster in some operations)
In [1]: a = np.zeros((int(1e6),), dtype=np.float64)

In [2]: b = np.zeros((int(1e6),), dtype=np.float32)

In [3]: %timeit a*a

1000 loops, best of 3: 1.78 ms per loop

In [4]: %timeit b*b

1000 loops, best of 3: 1.07 ms per loop

• But: bigger rounding errors — sometimes in surprising places (i.e., don’t use them unless you
really need them)
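
Both sides of the trade-off can be seen in a short sketch. The value 2**24 is picked because it is where float32 stops representing every integer; the array size is arbitrary.

```python
import numpy as np

a = np.zeros(1_000_000, dtype=np.float64)
b = np.zeros(1_000_000, dtype=np.float32)
print(a.nbytes, b.nbytes)     # float32 needs half the bytes

# float32 has a 24-bit significand: above 2**24 not every integer fits.
big = np.float32(2**24)
lost = big + np.float32(1) == big       # the increment is rounded away

# The same computation in float64 keeps the increment.
big64 = np.float64(2**24)
kept = big64 + np.float64(1) != big64
print(lost, kept)
```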


4.3.2 Structured data types

sensor_code (4-character string)

position (float)
value (float)

>>> samples = np.zeros((6,), dtype=[('sensor_code', 'S4'),

... ('position', float), ('value', float)])
>>> samples.ndim
1
>>> samples.shape
(6,)
>>> samples.dtype.names
('sensor_code', 'position', 'value')

>>> samples[:] = [('ALFA', 1, 0.37), ('BETA', 1, 0.11), ('TAU', 1, 0.13),

... ('ALFA', 1.5, 0.37), ('ALFA', 3, 0.11), ('TAU', 1.2, 0.13)]
>>> samples
array([(b'ALFA', 1. , 0.37), (b'BETA', 1. , 0.11), (b'TAU', 1. , 0.13),
(b'ALFA', 1.5, 0.37), (b'ALFA', 3. , 0.11), (b'TAU', 1.2, 0.13)],
dtype=[('sensor_code', 'S4'), ('position', '<f8'), ('value', '<f8')])

Field access works by indexing with field names:

>>> samples['sensor_code']
array([b'ALFA', b'BETA', b'TAU', b'ALFA', b'ALFA', b'TAU'],
dtype='|S4')
>>> samples['value']
array([0.37, 0.11, 0.13, 0.37, 0.11, 0.13])
>>> samples[0]
(b'ALFA', 1.0, 0.37)

>>> samples[0]['sensor_code'] = 'TAU'

>>> samples[0]
(b'TAU', 1.0, 0.37)

Multiple fields at once:

>>> samples[['position', 'value']]

array([(1. , 0.37), (1. , 0.11), (1. , 0.13), (1.5, 0.37),
(3. , 0.11), (1.2, 0.13)],
dtype=[('position', '<f8'), ('value', '<f8')])

Fancy indexing works, as usual:

>>> samples[samples['sensor_code'] == b'ALFA']

array([(b'ALFA', 1.5, 0.37), (b'ALFA', 3. , 0.11)],
dtype=[('sensor_code', 'S4'), ('position', '<f8'), ('value', '<f8')])

Note: There are a bunch of other syntaxes for constructing structured arrays, see here and here.
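
Fields combine naturally with masks, reductions and sorting. The sketch below rebuilds the same records with a unicode field ('U4' instead of 'S4', an assumption made here to avoid bytes literals) and uses np.sort with its order keyword.

```python
import numpy as np

# Field layout mirroring the example above, with a unicode sensor code.
dtype = [('sensor_code', 'U4'), ('position', float), ('value', float)]
samples = np.array([('ALFA', 1.0, 0.37), ('BETA', 1.0, 0.11),
                    ('TAU', 1.0, 0.13), ('ALFA', 1.5, 0.37),
                    ('ALFA', 3.0, 0.11), ('TAU', 1.2, 0.13)], dtype=dtype)

# Boolean mask on one field, then a reduction on another field.
is_alfa = samples['sensor_code'] == 'ALFA'
alfa_mean = samples['value'][is_alfa].mean()

# Sort records by position; ties are broken by value.
by_position = np.sort(samples, order=['position', 'value'])
print(alfa_mean)
print(by_position['sensor_code'])
```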

4.3.3 maskedarray: dealing with (propagation of) missing data

• For floats one could use NaN’s, but masks work for all types:

>>> x = np.ma.array([1, 2, 3, 4], mask=[0, 1, 0, 1])

>>> x
masked_array(data=[1, --, 3, --],
mask=[False, True, False, True],
fill_value=999999)

>>> y = np.ma.array([1, 2, 3, 4], mask=[0, 1, 1, 1])

>>> x + y
masked_array(data=[2, --, --, --],
mask=[False, True, True, True],
fill_value=999999)

• Masking versions of common functions:

>>> np.ma.sqrt([1, -1, 2, -2])
masked_array(data=[1.0, --, 1.41421356237... --],
mask=[False, True, False, True],
fill_value=1e+20)
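
Reductions on masked arrays ignore the masked entries. A short sketch of a few commonly used helpers (mean, count, filled, masked_invalid), reusing the array x from above:

```python
import numpy as np

x = np.ma.array([1, 2, 3, 4], mask=[0, 1, 0, 1])

# Reductions skip masked entries: only 1 and 3 contribute.
x_mean = x.mean()
n_valid = x.count()

# filled() turns the masked array back into a plain ndarray.
plain = x.filled(0)

# masked_invalid masks NaN and inf in floating-point data.
y = np.ma.masked_invalid([1.0, np.nan, 3.0])
y_mean = y.mean()

print(x_mean, n_valid, plain, y_mean)
```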

Note: There are other useful array siblings

While it is off topic in a chapter on NumPy, let’s take a moment to recall good coding practices, which
really do pay off in the long run:

Good practices

• Explicit variable names (no need for a comment to explain what is in the variable)
• Style: spaces after commas, around =, etc.
A certain number of rules for writing “beautiful” code (and, more importantly, using the same
conventions as everybody else!) are given in the Style Guide for Python Code and the Docstring
Conventions page (to manage help strings).
• Except in some rare cases, variable names and comments should be in English.

4.4 Advanced operations

Section contents

• Polynomials
• Loading data files

4.4.1 Polynomials
NumPy also contains polynomials in different bases:
For example, 3x^2 + 2x − 1:
>>> p = np.poly1d([3, 2, -1])
>>> p(0)
-1
>>> p.roots
array([-1. , 0.33333333])
>>> p.order
2


>>> x = np.linspace(0, 1, 20)

>>> y = np.cos(x) + 0.3*np.random.rand(20)
>>> p = np.poly1d(np.polyfit(x, y, 3))

>>> t = np.linspace(0, 1, 200) # use a larger number of points for smoother plotting
>>> plt.plot(x, y, 'o', t, p(t), '-')
[<matplotlib.lines.Line2D object at ...>, <matplotlib.lines.Line2D object at ...>]

See http://numpy.org/doc/stable/reference/routines.polynomials.poly1d.html for more.

More polynomials (with more bases)

NumPy also has a more sophisticated polynomial interface, which supports e.g. the Chebyshev basis.
3x^2 + 2x − 1:

>>> p = np.polynomial.Polynomial([-1, 2, 3]) # coefs in different order!

>>> p(0)
-1.0
>>> p.roots()
array([-1. , 0.33333333])
>>> p.degree() # In general polynomials do not always expose 'order'
2
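
The two interfaces order their coefficients in opposite directions, which is easy to trip over. A small sketch comparing them on the same polynomial (the evaluation point 2 is arbitrary):

```python
import numpy as np

# Same polynomial 3x^2 + 2x - 1 in both interfaces.
old = np.poly1d([3, 2, -1])                    # highest degree first
new = np.polynomial.Polynomial([-1, 2, 3])     # lowest degree first

# Both evaluate to the same value: 3*4 + 2*2 - 1 = 15.
print(old(2), new(2))

# Derivative 6x + 2: coefficients come back in each interface's own order.
old_deriv = old.deriv().coeffs
new_deriv = new.deriv().coef

# Reversing the coefficient order converts between the two conventions.
converted = np.polynomial.Polynomial(old.coeffs[::-1])
print(np.allclose(converted.coef, new.coef))
```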

Example using polynomials in Chebyshev basis, for polynomials in range [-1, 1]:

>>> x = np.linspace(-1, 1, 2000)

>>> y = np.cos(x) + 0.3*np.random.rand(2000)
>>> p = np.polynomial.Chebyshev.fit(x, y, 90)

>>> plt.plot(x, y, 'r.')

[<matplotlib.lines.Line2D object at ...>]
>>> plt.plot(x, p(x), 'k-', lw=3)
[<matplotlib.lines.Line2D object at ...>]


The Chebyshev polynomials have some advantages in interpolation.
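
As a rough illustration rather than a proof of that claim, the sketch below fits a noise-free cosine on [-1, 1] in the Chebyshev basis and measures the worst-case error on the sample points. The degree 10 and the 200 sample points are arbitrary choices.

```python
import numpy as np

x = np.linspace(-1, 1, 200)
y = np.cos(x)                       # noise-free target

# Least-squares fit in the Chebyshev basis; degree 10 is an arbitrary choice.
p = np.polynomial.Chebyshev.fit(x, y, 10)

max_error = np.max(np.abs(p(x) - y))
print(max_error)

# The fitted series can be converted to the power basis when needed.
q = p.convert(kind=np.polynomial.Polynomial)
print(np.allclose(q(x), y))
```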

4.4.2 Loading data files

Text files
Example: populations.txt:
# year hare lynx carrot
1900 30e3 4e3 48300
1901 47.2e3 6.1e3 48200
1902 70.2e3 9.8e3 41500
1903 77.4e3 35.2e3 38200

>>> data = np.loadtxt('data/populations.txt')

>>> data
array([[ 1900., 30000., 4000., 48300.],
[ 1901., 47200., 6100., 48200.],
[ 1902., 70200., 9800., 41500.],
...

>>> np.savetxt('pop2.txt', data)

>>> data2 = np.loadtxt('pop2.txt')

Note: If you have a complicated text file, you can try:
• np.genfromtxt
• Using Python’s I/O functions and e.g. regexps for parsing (Python is quite well suited for this)
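
For instance, np.genfromtxt copes with missing entries. The sketch below reads from an in-memory buffer instead of a file; the comma-separated contents are made up for illustration.

```python
import io
import numpy as np

# Hypothetical file contents with a header line and one missing entry.
text = io.StringIO(
    "# year,hare,lynx\n"
    "1900,30e3,4e3\n"
    "1901,,6.1e3\n"
)

# genfromtxt skips the comment line and fills the missing field with nan.
data = np.genfromtxt(text, delimiter=",")
print(data.shape)
print(np.isnan(data[1, 1]))
```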

Reminder: Navigating the filesystem with IPython

In [1]: pwd # show current directory

'/home/user/stuff/2011-numpy-tutorial'
In [2]: cd ex
'/home/user/stuff/2011-numpy-tutorial/ex'
In [3]: ls
populations.txt species.txt

Images
Using Matplotlib:


>>> img = plt.imread('data/elephant.png')

>>> img.shape, img.dtype
((200, 300, 3), dtype('float32'))
>>> plt.imshow(img)
<matplotlib.image.AxesImage object at ...>
>>> plt.savefig('plot.png')

>>> plt.imsave('red_elephant.png', img[:,:,0], cmap=plt.cm.gray)

This saved only one channel (of RGB):

>>> plt.imshow(plt.imread('red_elephant.png'))
<matplotlib.image.AxesImage object at ...>

Other libraries:

>>> import imageio

>>> imageio.imsave('tiny_elephant.png', img[::6,::6])
>>> plt.imshow(plt.imread('tiny_elephant.png'), interpolation='nearest')
<matplotlib.image.AxesImage object at ...>


NumPy’s own format

NumPy has its own binary format, not portable but with efficient I/O:

>>> data = np.ones((3, 3))

>>> np.save('pop.npy', data)
>>> data3 = np.load('pop.npy')
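
Several arrays can be stored together with np.savez. A sketch that writes into a temporary directory so it leaves no files behind; the file names and the years array are illustrative.

```python
import os
import tempfile
import numpy as np

data = np.ones((3, 3))
years = np.arange(1900, 1905)

with tempfile.TemporaryDirectory() as tmp:
    # A .npy file stores one array together with its dtype and shape.
    single = os.path.join(tmp, 'pop.npy')
    np.save(single, data)
    restored = np.load(single)

    # A .npz archive stores several named arrays in one file.
    archive = os.path.join(tmp, 'pop.npz')
    np.savez(archive, data=data, years=years)
    with np.load(archive) as npz:
        restored_years = npz['years']

print(np.array_equal(restored, data), restored_years.dtype == years.dtype)
```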

Well-known (& more obscure) file formats

• HDF5: h5py, PyTables
• NetCDF: scipy.io.netcdf_file, netcdf4-python, . . .
• Matlab: scipy.io.loadmat, scipy.io.savemat
• MatrixMarket: scipy.io.mmread, scipy.io.mmwrite
• IDL: scipy.io.readsav
. . . if somebody uses it, there’s probably also a Python library for it.

Exercise: Text data files

Write a Python script that loads data from populations.txt and drops the last column and the first
5 rows. Save the smaller dataset to pop2.txt.

NumPy internals

If you are interested in the NumPy internals, there is a good discussion in Advanced NumPy.

4.5 Some exercises

4.5.1 Array manipulations
1. Form the 2-D array (without typing it in explicitly):

[[1, 6, 11],
[2, 7, 12],
[3, 8, 13],
[4, 9, 14],
[5, 10, 15]]


and generate a new array containing its 2nd and 4th rows.
2. Divide each column of the array:

>>> import numpy as np

>>> a = np.arange(25).reshape(5, 5)

elementwise with the array b = np.array([1., 5, 10, 15, 20]). (Hint: np.newaxis).
3. Harder one: Generate a 10 x 3 array of random numbers (in range [0,1]). For each row, pick the
number closest to 0.5.
• Use abs and argsort to find the column j closest for each row.
• Use fancy indexing to extract the numbers. (Hint: a[i,j] – the array i must contain the
row numbers corresponding to stuff in j.)
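
One possible approach to the three tasks, given as a sketch rather than the reference solution. The random seed and the variable names are assumptions.

```python
import numpy as np

# 1. Columns 1..5, 6..10, 11..15: build the rows first, then transpose.
grid = np.arange(1, 16).reshape(3, 5).T
rows_2_and_4 = grid[[1, 3]]            # fancy indexing, zero-based

# 2. Divide each column of a elementwise by b: b must vary along rows.
a = np.arange(25).reshape(5, 5)
b = np.array([1., 5, 10, 15, 20])
divided = a / b[:, np.newaxis]

# 3. For each row, pick the entry closest to 0.5.
rng = np.random.default_rng(0)         # seed chosen arbitrarily
r = rng.random((10, 3))
j = np.argsort(np.abs(r - 0.5), axis=1)[:, 0]   # column of the closest entry
i = np.arange(r.shape[0])                        # matching row numbers
closest = r[i, j]
print(rows_2_and_4, closest.shape)
```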

4.5.2 Picture manipulation: Framing a Face

Let’s do some manipulations on NumPy arrays by starting with an image of a raccoon. scipy provides a
2D array of this image with the scipy.misc.face function:

>>> from scipy import misc

>>> face = misc.face(gray=True) # 2D grayscale image

Here are a few images we will be able to obtain with our manipulations: use different colormaps, crop
the image, change some parts of the image.

• Let’s use the imshow function of matplotlib to display the image.

>>> import matplotlib.pyplot as plt

>>> face = misc.face(gray=True)
>>> plt.imshow(face)
<matplotlib.image.AxesImage object at 0x...>

• The face is displayed in false colors. A colormap must be specified for it to be displayed
in grey.

>>> plt.imshow(face, cmap=plt.cm.gray)

<matplotlib.image.AxesImage object at 0x...>

• Create an array of the image with a narrower centering: for example, remove 100 pixels
from all the borders of the image. To check the result, display this new array with imshow.

>>> crop_face = face[100:-100, 100:-100]

• We will now frame the face with a black locket. For this, we need to create a mask corresponding
to the pixels we want to be black. The center of the face is around (660, 300), so
we define the mask by the condition (y-300)**2 + (x-660)**2 > 230**2:

>>> sy, sx = face.shape

>>> y, x = np.ogrid[0:sy, 0:sx] # x and y indices of pixels
>>> y.shape, x.shape
((768, 1), (1, 1024))
>>> centerx, centery = (660, 300) # center of the image
>>> mask = ((y - centery)**2 + (x - centerx)**2) > 230**2 # circle

then we assign the value 0 to the pixels of the image corresponding to the mask. The syntax
is extremely simple and intuitive:

>>> face[mask] = 0
>>> plt.imshow(face)
<matplotlib.image.AxesImage object at 0x...>

• Follow-up: copy all instructions of this exercise into a script called face_locket.py then
execute this script in IPython with %run face_locket.py.
Change the circle to an ellipse.
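
For the follow-up, an elliptical mask only needs each axis to be scaled by its own radius. The sketch below uses a small synthetic array instead of the face image, and the center and radii are hypothetical values.

```python
import numpy as np

# Synthetic stand-in for the image: a 60 x 80 array filled with ones.
image = np.ones((60, 80))
sy, sx = image.shape
y, x = np.ogrid[0:sy, 0:sx]          # y has shape (60, 1), x has shape (1, 80)

centery, centerx = 30, 40
# Semi-axes of the ellipse along y and x (hypothetical values).
ry, rx = 20, 35

# True outside the ellipse; broadcasting yields a (60, 80) boolean mask.
mask = ((y - centery) / ry)**2 + ((x - centerx) / rx)**2 > 1

framed = image.copy()
framed[mask] = 0
print(framed[centery, centerx], framed[0, 0])
```

Setting ry equal to rx recovers the circular mask used above.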

4.5.3 Data statistics

The data in populations.txt describes the populations of hares and lynxes (and carrots) in northern
Canada over 20 years:

>>> data = np.loadtxt('data/populations.txt')

>>> year, hares, lynxes, carrots = data.T # trick: columns to variables

>>> import matplotlib.pyplot as plt

>>> plt.axes([0.2, 0.1, 0.5, 0.8])
<matplotlib.axes...Axes object at ...>
>>> plt.plot(year, hares, year, lynxes, year, carrots)
[<matplotlib.lines.Line2D object at ...>, ...]
>>> plt.legend(('Hare', 'Lynx', 'Carrot'), loc=(1.05, 0.5))
<matplotlib.legend.Legend object at ...>

Compute and print, based on the data in populations.txt...
1. The mean and std of the populations of each species for the years in the period.
2. Which year each species had the largest population.
3. Which species has the largest population for each year. (Hint: argsort & fancy indexing of
np.array(['H', 'L', 'C']))
4. In which years any of the populations is above 50000. (Hint: comparisons and np.any)
5. The top 2 years for each species when they had the lowest populations. (Hint: argsort, fancy
indexing)
6. Compare (plot) the change in hare population (see help(np.gradient)) and the number of lynxes.
Check correlation (see help(np.corrcoef)).


. . . all without for-loops.
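
A sketch of how items 2 to 4 can be approached without loops. It uses only the four rows of populations.txt shown in the text files section, typed in by hand, so the numbers it prints are not the answers for the full dataset.

```python
import numpy as np

# First four rows of populations.txt as listed in the text files section.
data = np.array([[1900, 30e3, 4e3, 48300],
                 [1901, 47.2e3, 6.1e3, 48200],
                 [1902, 70.2e3, 9.8e3, 41500],
                 [1903, 77.4e3, 35.2e3, 38200]])
year = data[:, 0]
populations = data[:, 1:]            # columns: hare, lynx, carrot

# Year of the largest population for each species.
peak_years = year[np.argmax(populations, axis=0)]

# Species with the largest population in each year.
species = np.array(['H', 'L', 'C'])
dominant = species[np.argmax(populations, axis=1)]

# Years in which any population exceeds 50000.
above = year[np.any(populations > 50000, axis=1)]
print(peak_years, dominant, above)
```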

Solution: Python source file

4.5.4 Crude integral approximations

Write a function f(a, b, c) that returns $a^b - c$. Form a 24x12x6 array containing its values in parameter
ranges [0,1] x [0,1] x [0,1].
Approximate the 3-d integral

$$\int_0^1 \int_0^1 \int_0^1 (a^b - c) \, da \, db \, dc$$

over this volume with the mean. The exact result is $\ln 2 - \frac{1}{2} \approx 0.1931\ldots$. What is your relative error?
(Hints: use elementwise operations and broadcasting. You can make np.ogrid give a number of points
in given range with np.ogrid[0:1:20j].)
Reminder on Python functions:
def f(a, b, c):
    return some_result

Solution: Python source file
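
One way to set this up, as a sketch that follows the hints (np.ogrid with complex steps, broadcasting, mean). It is not necessarily identical to the linked solution file.

```python
import numpy as np

def f(a, b, c):
    """Integrand a**b - c, evaluated elementwise."""
    return a**b - c

# Open grids of shapes (24, 1, 1), (1, 12, 1) and (1, 1, 6).
a, b, c = np.ogrid[0:1:24j, 0:1:12j, 0:1:6j]

samples = f(a, b, c)               # broadcasts to shape (24, 12, 6)
approx = samples.mean()            # the volume of the unit cube is 1
exact = np.log(2) - 0.5
relative_error = abs(approx - exact) / exact
print(samples.shape, approx, relative_error)
```

Because the integration volume is the unit cube, the mean of the samples is itself the estimate of the integral.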

4.5.5 Mandelbrot set

Write a script that computes the Mandelbrot fractal. The Mandelbrot iteration:
N_max = 50
some_threshold = 50

c = x + 1j*y

z = 0
for j in range(N_max):
    z = z**2 + c

Point (x, y) belongs to the Mandelbrot set if |𝑧| < some_threshold.

Do this computation by:
1. Construct a grid of c = x + 1j*y values in range [-2, 1] x [-1.5, 1.5]
2. Do the iteration
3. Form the 2-d boolean mask indicating which points are in the set
4. Save the result to an image with:


>>> import matplotlib.pyplot as plt

>>> plt.imshow(mask.T, extent=[-2, 1, -1.5, 1.5])
<matplotlib.image.AxesImage object at ...>
>>> plt.gray()
>>> plt.savefig('mandelbrot.png')

Solution: Python source file

4.5.6 Markov chain

Markov chain transition matrix P, and probability distribution on the states p:

1. 0 <= P[i,j] <= 1: probability to go from state i to state j
2. Transition rule: $p_{new} = P^T p_{old}$
3. all(sum(P, axis=1) == 1), p.sum() == 1: normalization
Write a script that works with 5 states, and:
• Constructs a random matrix, and normalizes each row so that it is a transition matrix.
• Starts from a random (normalized) probability distribution p and takes 50 steps => p_50
• Computes the stationary distribution: the eigenvector of P.T with eigenvalue 1 (numerically: closest
to 1) => p_stationary
Remember to normalize the eigenvector — I didn’t. . .
• Checks if p_50 and p_stationary are equal to tolerance 1e-5
Toolbox: np.random.rand, .dot(), np.linalg.eig, reductions, abs(), argmin, comparisons, all,
np.linalg.norm, etc.
Solution: Python source file
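
A sketch of one way to carry out these steps. It uses np.random.default_rng and the @ operator instead of np.random.rand and .dot(), and the seed is arbitrary; the linked solution may differ.

```python
import numpy as np

rng = np.random.default_rng(0)     # seed chosen arbitrarily
n_states = 5

# Random transition matrix: each row is normalized to sum to 1.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)

# Random initial distribution, normalized, then 50 transition steps.
p = rng.random(n_states)
p /= p.sum()
for _ in range(50):
    p = P.T @ p
p_50 = p

# Stationary distribution: eigenvector of P.T whose eigenvalue is closest to 1.
eigenvalues, eigenvectors = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigenvalues - 1))
p_stationary = eigenvectors[:, k].real
p_stationary /= p_stationary.sum()   # normalization also fixes the sign

print(np.all(np.abs(p_50 - p_stationary) < 1e-5))
```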

4.6 Full code examples

4.6.1 Full code examples for the numpy chapter


2D plotting
Plot a basic 2D figure

import numpy as np
import matplotlib.pyplot as plt

image = np.random.rand(30, 30)

plt.imshow(image, cmap=plt.cm.hot)
plt.colorbar()
plt.show()


1D plotting
Plot a basic 1D figure


import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
plt.plot(x, y)
plt.plot(x, y, 'o')
plt.show()


Distances exercise
Plot distances in a grid


import numpy as np
import matplotlib.pyplot as plt

x, y = np.arange(5), np.arange(5)[:, np.newaxis]

distance = np.sqrt(x ** 2 + y ** 2)
plt.pcolor(distance)
plt.colorbar()
plt.show()


Fitting to polynomial
Plot noisy data and their polynomial fit


import numpy as np
import matplotlib.pyplot as plt

np.random.seed(12)

x = np.linspace(0, 1, 20)
y = np.cos(x) + 0.3*np.random.rand(20)
p = np.poly1d(np.polyfit(x, y, 3))

t = np.linspace(0, 1, 200)
plt.plot(x, y, 'o', t, p(t), '-')
plt.show()


Fitting in Chebyshev basis

Plot noisy data and their polynomial fit in a Chebyshev basis


import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x = np.linspace(-1, 1, 2000)
y = np.cos(x) + 0.3*np.random.rand(2000)
p = np.polynomial.Chebyshev.fit(x, y, 90)

plt.plot(x, y, 'r.')
plt.plot(x, p(x), 'k-', lw=3)
plt.show()


Population exercise
Plot populations of hares, lynxes, and carrots


import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt('../data/populations.txt')
year, hares, lynxes, carrots = data.T

plt.axes([0.2, 0.1, 0.5, 0.8])

plt.plot(year, hares, year, lynxes, year, carrots)
plt.legend(('Hare', 'Lynx', 'Carrot'), loc=(1.05, 0.5))
plt.show()


Reading and writing an elephant

Read and write images

import numpy as np
import matplotlib.pyplot as plt

original figure

plt.figure()
img = plt.imread('../data/elephant.png')
plt.imshow(img)


red channel displayed in grey

plt.figure()
img_red = img[:, :, 0]
plt.imshow(img_red, cmap=plt.cm.gray)


lower resolution

plt.figure()
img_tiny = img[::6, ::6]
plt.imshow(img_tiny, interpolation='nearest')

plt.show()


Mandelbrot set
Compute the Mandelbrot fractal and plot it


import warnings

import numpy as np
import matplotlib.pyplot as plt
from numpy import newaxis

def compute_mandelbrot(N_max, some_threshold, nx, ny):
    # A grid of c-values
    x = np.linspace(-2, 1, nx)
    y = np.linspace(-1.5, 1.5, ny)

    c = x[:,newaxis] + 1j*y[newaxis,:]

    # Mandelbrot iteration
    z = c

    # The code below overflows in many regions of the x-y grid, suppress
    # warnings temporarily (np.warnings is gone from recent NumPy releases,
    # so the standard-library warnings module is used instead)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        for j in range(N_max):
            z = z**2 + c
        mandelbrot_set = (abs(z) < some_threshold)

    return mandelbrot_set

mandelbrot_set = compute_mandelbrot(50, 50., 601, 401)

plt.imshow(mandelbrot_set.T, extent=[-2, 1, -1.5, 1.5])
plt.gray()
plt.show()


Random walk exercise

Plot distance as a function of time for a random walk together with the theoretical result

import numpy as np
import matplotlib.pyplot as plt

# We create 1000 realizations with 200 steps each

n_stories = 1000
t_max = 200

t = np.arange(t_max)
# Steps can be -1 or 1 (note that randint excludes the upper limit)
steps = 2 * np.random.randint(0, 1 + 1, (n_stories, t_max)) - 1

# The time evolution of the position is obtained by successively

# summing up individual steps. This is done for each of the
# realizations, i.e. along axis 1.
positions = np.cumsum(steps, axis=1)

# Determine the time evolution of the mean square distance.

sq_distance = positions**2
mean_sq_distance = np.mean(sq_distance, axis=0)

# Plot the distance d from the origin as a function of time and

# compare with the theoretically expected result where d(t)
# grows as a square root of time t.
plt.figure(figsize=(4, 3))
plt.plot(t, np.sqrt(mean_sq_distance), 'g.', t, np.sqrt(t), 'y-')
plt.xlabel(r"$t$")
plt.ylabel(r"$\sqrt{\langle (\delta x)^2 \rangle}$")
plt.tight_layout()
plt.show()
