Introduction to NumPy
Follow the installation instructions found there. Once you do, you can import
NumPy and double-check the version:
In[1]: import numpy
numpy.__version__
Out[1]: '1.11.1'
For the pieces of the package discussed here, I’d recommend NumPy version
1.8 or later. By convention, you’ll find that most people in the SciPy/PyData world
will import NumPy using np as an alias:
In[2]: import numpy as np
In C, for example, a loop that sums the integers from 0 to 99 might be written like this:
/* C code */
int result = 0;
for (int i = 0; i < 100; i++) {
    result += i;
}
While in Python the equivalent operation could be written this way:
# Python code
result = 0
for i in range(100):
    result += i
Creating Arrays from Python Lists
First, we can use np.array to create arrays from Python lists:
In[8]: # integer array:
np.array([1, 4, 2, 5, 3])
Out[8]: array([1, 4, 2, 5, 3])
Remember that unlike a Python list, a NumPy array is constrained to hold elements
of a single type. If the types do not match, NumPy will upcast if possible (here,
integers are upcast to floating point):
In[9]: np.array([3.14, 4, 2, 3])
Out[9]: array([3.14, 4., 2., 3.])
If we want to explicitly set the data type of the resulting array, we can use the
dtype keyword:
In[10]: np.array([1, 2, 3, 4], dtype='float32')
Out[10]: array([1., 2., 3., 4.], dtype=float32)
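One way to confirm the upcasting and dtype rules described above is to inspect the dtype attribute of each result. This is only an illustrative sketch; the variable names are invented for the example.

```python
import numpy as np

# Mixed ints and floats: NumPy upcasts everything to floating point.
mixed = np.array([3.14, 4, 2, 3])

# The dtype keyword overrides the type NumPy would otherwise infer.
forced = np.array([1, 2, 3, 4], dtype='float32')

print(mixed.dtype)
print(forced.dtype)
```

The first array ends up as float64 because that is the smallest type here that can hold every input value without losing information.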
Creating Arrays from Scratch
Especially for larger arrays, it is more efficient to create arrays from scratch using
routines built in to NumPy. Here are several examples:
In[12]: # Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)
Out[12]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
In[13]: # Create a 3x5 floating-point array filled with 1s
np.ones((3, 5), dtype=float)
Out[13]: array([[ 1., 1., 1., 1., 1.],
                [ 1., 1., 1., 1., 1.],
                [ 1., 1., 1., 1., 1.]])
In[14]: # Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)
Out[14]: array([[ 3.14, 3.14, 3.14, 3.14, 3.14],
                [ 3.14, 3.14, 3.14, 3.14, 3.14],
                [ 3.14, 3.14, 3.14, 3.14, 3.14]])
In[15]: # Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)
Out[15]: array([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
In[16]: # Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)
Out[16]: array([0., 0.25, 0.5, 0.75, 1.])
In[17]: # Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))
Out[17]: array([[ 0.99844933, 0.52183819, 0.22421193],
                [ 0.08007488, 0.45429293, 0.20941444],
                [ 0.14360941, 0.96910973, 0.946117]])
In[19]: # Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))
Out[19]: array([[2, 3, 4],
                [5, 7, 8],
                [0, 5, 0]])
In[20]: # Create a 3x3 identity matrix
np.eye(3)
Out[20]: array([[ 1., 0., 0.],
                [ 0., 1., 0.],
                [ 0., 0., 1.]])
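The difference between np.arange and np.linspace is easy to miss: the first fixes the step and excludes the stop value, while the second fixes the number of points and includes both endpoints. A small sketch, with variable names chosen only for illustration:

```python
import numpy as np

# Fixed step of 2; the stop value 20 is not included.
stepped = np.arange(0, 20, 2)

# Fixed count of 11 points; both 0 and 20 are included.
spaced = np.linspace(0, 20, 11)

# Without a dtype argument, np.zeros produces floating-point values.
default_zeros = np.zeros((2, 3))

print(stepped)
print(spaced)
print(default_zeros.dtype)
```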
Table 2-1. Standard NumPy data types
Data type     Description
bool_         Boolean (True or False) stored as a byte
int_          Default integer type (same as C long; normally either int64 or int32)
intc          Identical to C int (normally int32 or int64)
intp          Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8          Byte (-128 to 127)
int16         Integer (-32768 to 32767)
int32         Integer (-2147483648 to 2147483647)
int64         Integer (-9223372036854775808 to 9223372036854775807)
uint8         Unsigned integer (0 to 255)
uint16        Unsigned integer (0 to 65535)
uint32        Unsigned integer (0 to 4294967295)
uint64        Unsigned integer (0 to 18446744073709551615)
float_        Shorthand for float64
float16       Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32       Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64       Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_      Shorthand for complex128
complex64     Complex number, represented by two 32-bit floats
complex128    Complex number, represented by two 64-bit floats
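The integer ranges in the table do not need to be memorized; NumPy can report them through np.iinfo, and the storage size of any type is available as itemsize. The following sketch checks a few rows of the table (the dictionary name ranges is just for the example):

```python
import numpy as np

# Collect (min, max) for a few fixed-width integer types.
ranges = {}
for int_type in (np.int8, np.int16, np.uint8):
    info = np.iinfo(int_type)
    ranges[int_type.__name__] = (int(info.min), int(info.max))
    print(int_type.__name__, info.min, info.max)

# Size in bytes of one element of each type.
float32_bytes = np.dtype(np.float32).itemsize
complex128_bytes = np.dtype(np.complex128).itemsize
print(float32_bytes, complex128_bytes)
```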
The Basics of NumPy Arrays
We’ll cover a few categories of basic array manipulations here:
Attributes of arrays
    Determining the size, shape, memory consumption, and data types of arrays
Indexing of arrays
    Getting and setting the value of individual array elements
Slicing of arrays
    Getting and setting smaller subarrays within a larger array
Reshaping of arrays
    Changing the shape of a given array
Joining and splitting of arrays
    Combining multiple arrays into one, and splitting one array into many
NumPy Array Attributes
First let’s discuss some useful array attributes. We’ll start by defining three
random arrays: a one-dimensional, two-dimensional, and three-dimensional
array. We’ll use NumPy’s random number generator, which we will seed with a
set value in order to ensure that the same random arrays are generated each
time this code is run:
In[1]: import numpy as np
np.random.seed(0)  # seed for reproducibility
x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
Each array has attributes ndim (the number of dimensions), shape (the size of each
dimension), and size (the total size of the array):
In[2]: print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
Another useful attribute is the dtype, the data type of the array:
In[3]: print("dtype:", x3.dtype)
dtype: int64
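For the memory consumption mentioned earlier, arrays also expose itemsize (bytes per element) and nbytes (total bytes). A minimal sketch, assuming an array built the same way as x3 above; the exact byte counts depend on the platform's default integer type, so none are quoted here:

```python
import numpy as np

x3 = np.random.randint(10, size=(3, 4, 5))

print("itemsize:", x3.itemsize, "bytes")
print("nbytes:  ", x3.nbytes, "bytes")

# nbytes is simply the element count times the per-element size.
print(x3.nbytes == x3.size * x3.itemsize)
```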
Array Indexing: Accessing Single Elements
If you are familiar with Python’s standard list indexing, indexing in NumPy will
feel quite familiar. In a one-dimensional array, you can access the ith
value (counting from zero) by specifying the desired index in square brackets, just
as with Python lists:
In[5]: x1
Out[5]: array([5, 0, 3, 3, 7, 9])
In[6]: x1[0]
Out[6]: 5
In[7]: x1[4]
Out[7]: 7
To index from the end of the array, you can use negative indices:
In[8]: x1[-1]
Out[8]: 9
In[9]: x1[-2]
Out[9]: 7
In a multidimensional array, you access items using a comma-separated tuple
of indices:
In[10]: x2
Out[10]: array([[3, 5, 2, 4],
                [7, 6, 8, 8],
                [1, 6, 7, 7]])
In[11]: x2[0, 0]
Out[11]: 3
In[12]: x2[2, 0]
Out[12]: 1
In[13]: x2[2, -1]
Out[13]: 7
You can also modify values using any of the above index notation:
In[14]: x2[0, 0] = 12
x2
Out[14]: array([[12,  5,  2,  4],
                [ 7,  6,  8,  8],
                [ 1,  6,  7,  7]])
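One thing to watch when assigning by index: because an array has a single fixed type, a value of another type is converted to fit. The sketch below, using a copy of the x1 values shown earlier, illustrates how a floating-point value assigned into an integer array is silently truncated.

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])

# The array stays integer, so the fractional part is dropped without warning.
x1[0] = 3.14159

print(x1)
print(x1.dtype.kind)   # 'i' means a signed integer type
```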
Array Slicing: Accessing Subarrays
Just as we can use square brackets to access individual array elements, we can
also use them to access subarrays with the slice notation, marked by the colon
(:) character. The NumPy slicing syntax follows that of the standard Python list;
to access a slice of an array x, use this:
x[start:stop:step]
If any of these are unspecified, they default to the values start=0,
stop=size of dimension, step=1. We’ll take a look at accessing subarrays in one
dimension and in multiple dimensions.
One-dimensional subarrays
In[16]: x = np.arange(10)
x
Out[16]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In[17]: x[:5]  # first five elements
Out[17]: array([0, 1, 2, 3, 4])
In[18]: x[5:]  # elements from index 5 onward
Out[18]: array([5, 6, 7, 8, 9])
In[19]: x[4:7]  # middle subarray
Out[19]: array([4, 5, 6])
In[20]: x[::2]  # every other element
Out[20]: array([0, 2, 4, 6, 8])
In[21]: x[1::2]  # every other element, starting at index 1
Out[21]: array([1, 3, 5, 7, 9])
A potentially confusing case is when the step value is negative. In this
case, the defaults for start and stop are swapped. This becomes a convenient way
to reverse an array:
In[22]: x[::-1]  # all elements, reversed
Out[22]: array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
In[23]: x[5::-2]  # reversed every other from index 5
Out[23]: array([5, 3, 1])
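Since one-dimensional array slicing follows the list rules, the two can be checked against each other directly. This sketch uses Python's built-in slice objects to apply the same start:stop:step triples to a list and to an array; the names values and matches are only illustrative.

```python
import numpy as np

x = np.arange(10)
values = list(range(10))

# slice(start, stop, step) is what the x[start:stop:step] syntax builds;
# None stands for an omitted entry.
slices = [slice(None, 5), slice(4, 7), slice(1, None, 2),
          slice(None, None, -1), slice(5, None, -2)]

# Each slice should select the same elements from the list and the array.
matches = [x[s].tolist() == values[s] for s in slices]
print(all(matches))
```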
Multidimensional subarrays
Multidimensional slices work in the same way, with multiple slices separated by
commas. For example:
In[24]: x2
Out[24]: array([[12,  5,  2,  4],
                [ 7,  6,  8,  8],
                [ 1,  6,  7,  7]])
In[25]: x2[:2, :3]  # two rows, three columns
Out[25]: array([[12,  5,  2],
                [ 7,  6,  8]])
In[26]: x2[:3, ::2]  # all rows, every other column
Out[26]: array([[12,  2],
                [ 7,  8],
                [ 1,  7]])
Finally, subarray dimensions can even be reversed together:
In[27]: x2[::-1, ::-1]
Out[27]: array([[ 7,  7,  6,  1],
                [ 8,  8,  6,  7],
                [ 4,  2,  5, 12]])
Accessing array rows and columns
One commonly needed routine is accessing single rows or columns of an
array. You can do this by combining indexing and slicing, using an empty slice marked
by a single colon (:):
In[28]: print(x2[:, 0])  # first column of x2
[12  7  1]
In[29]: print(x2[0, :])  # first row of x2
[12  5  2  4]
In the case of row access, the empty slice can be omitted for a more compact
syntax:
In[30]: print(x2[0])  # equivalent to x2[0, :]
[12  5  2  4]
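Note that indexing with an integer drops that axis, so a column extracted this way comes back as a one-dimensional array. If the two-dimensional shape should be kept, a length-one slice can be used instead. A short sketch with the same x2 values:

```python
import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

first_col = x2[:, 0]       # integer index: the column axis disappears
first_col_2d = x2[:, :1]   # length-1 slice: the column axis is kept

print(first_col.shape)
print(first_col_2d.shape)
```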
Subarrays as no-copy views
One important, and extremely useful, thing to know about array slices is that
they return views rather than copies of the array data. This is one
area in which NumPy array slicing differs from Python list slicing: in lists, slices will be
copies. Consider our two-dimensional array from before:
In[31]: print(x2)
[[12  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
Let’s extract a 2×2 subarray from this:
In[32]: x2_sub = x2[:2, :2]
print(x2_sub)
[[12  5]
 [ 7  6]]
Now if we modify this subarray, we’ll see that the original array is changed! Observe:
In[33]: x2_sub[0, 0] = 99
print(x2_sub)
[[99  5]
 [ 7  6]]
In[34]: print(x2)
[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
This default behavior is actually quite useful: it means that when we work with
large datasets, we can access and process pieces of these datasets without the
need to copy the underlying data buffer.
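The contrast with list slicing can be made concrete. The sketch below uses np.shares_memory to confirm that an array slice shares its buffer with the original, and shows that the equivalent list slice does not; the names view, items, and items_slice are only for illustration.

```python
import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

# An array slice is a view onto the same data buffer.
view = x2[:2, :2]
print(np.shares_memory(x2, view))

# A list slice is an independent copy: editing it leaves the list alone.
items = [12, 5, 2, 4]
items_slice = items[:2]
items_slice[0] = 99
print(items[0])

# Editing the array view changes the original array.
view[0, 0] = 99
print(x2[0, 0])
```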
Creating copies of arrays
Despite the nice features of array views, it is sometimes useful to instead
explicitly copy the data within an array or a subarray. This can be most easily
done with the copy() method:
In[35]: x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)
[[99  5]
 [ 7  6]]
If we now modify this subarray, the original array is not touched:
In[36]: x2_sub_copy[0, 0] = 42
print(x2_sub_copy)
[[42  5]
 [ 7  6]]
In[37]: print(x2)
[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
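When it is unclear whether a given array is a view or a copy, the base attribute offers one way to tell: a view refers back to the array that owns its data, while an array that owns its data has a base of None. A sketch, assuming x2 was created directly with np.array so that it owns its buffer:

```python
import numpy as np

x2 = np.array([[99, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

view = x2[:2, :2]
independent = x2[:2, :2].copy()

print(view.base is x2)            # the view borrows x2's buffer
print(independent.base is None)   # the copy owns its own data
```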
Reshaping of Arrays
Another useful type of operation is reshaping of arrays. The most flexible way of
doing this is with the reshape() method. For example, if you want to put the
numbers 1 through 9 in a 3×3 grid, you can do the following:
In[38]: grid = np.arange(1, 10).reshape((3, 3))
print(grid)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Note that for this to work, the size of the initial array must match the size
of the reshaped array. Where possible, the reshape method will use a no-copy view of
the initial array, but with noncontiguous memory buffers this is not always the
case.
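Both points can be checked with a small experiment. The sketch below reshapes a contiguous array (which yields a view), reshapes a transposed array (which here forces a copy, since the transpose is not laid out contiguously in row-major order), and shows the error raised when the sizes do not match. The variable names are illustrative only.

```python
import numpy as np

a = np.arange(6)
grid = a.reshape((2, 3))
print(np.shares_memory(a, grid))      # contiguous input, so this is a view

# Flattening the transpose needs the elements in a new order: a copy is made.
flat_t = grid.T.reshape(6)
print(np.shares_memory(a, flat_t))

try:
    np.arange(10).reshape((3, 3))     # 10 elements cannot fill a 3x3 grid
except ValueError as err:
    print("ValueError:", err)
```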
Another common reshaping pattern is the conversion of a one-dimensional
array into a two-dimensional row or column matrix. You can do this with
the reshape method, or more easily by making use of the newaxis keyword
within a slice operation:
In[39]: x = np.array([1, 2, 3])
# row vector via reshape
x.reshape((1, 3))
Out[39]: array([[1, 2, 3]])
In[40]: # row vector via newaxis
x[np.newaxis, :]
Out[40]: array([[1, 2, 3]])
In[41]: # column vector via reshape
x.reshape((3, 1))
Out[41]: array([[1],
                [2],
                [3]])
In[42]: # column vector via newaxis
x[:, np.newaxis]
Out[42]: array([[1],
                [2],
                [3]])
We will see this type of transformation often throughout the remainder of the
book.
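Comparing the shape attribute before and after makes the effect of newaxis explicit: it inserts a new axis of length one at the position where it appears. A brief sketch:

```python
import numpy as np

x = np.array([1, 2, 3])
row = x[np.newaxis, :]   # new axis in front: one row, three columns
col = x[:, np.newaxis]   # new axis at the end: three rows, one column

print(x.shape, row.shape, col.shape)

# Both spellings produce the same arrays as the reshape calls.
print(np.array_equal(row, x.reshape((1, 3))))
print(np.array_equal(col, x.reshape((3, 1))))
```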
Array Concatenation and Splitting
All of the preceding routines worked on single arrays. It’s also possible to combine
multiple arrays into one, and to conversely split a single array into multiple
arrays. We’ll take a look at those operations here.
Concatenation of arrays
Concatenation, or joining of two arrays in NumPy, is primarily accomplished
through the routines np.concatenate, np.vstack, and
np.hstack. np.concatenate takes a tuple or list of arrays as its first argument, as we
can see here:
In[43]: x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
Out[43]: array([1, 2, 3, 3, 2, 1])
You can also concatenate more than two arrays at once:
In[44]: z = [99, 99, 99]
print(np.concatenate([x, y, z]))
[ 1  2  3  3  2  1 99 99 99]
np.concatenate can also be used for two-dimensional arrays:
In[45]: grid = np.array([[1, 2, 3],
                         [4, 5, 6]])
In[46]: # concatenate along the first axis
np.concatenate([grid, grid])
Out[46]: array([[1, 2, 3],
                [4, 5, 6],
                [1, 2, 3],
                [4, 5, 6]])
In[47]: # concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)
Out[47]: array([[1, 2, 3, 1, 2, 3],
                [4, 5, 6, 4, 5, 6]])
For working with arrays of mixed dimensions, it can be clearer to use the
np.vstack (vertical stack) and np.hstack (horizontal stack) functions:
In[48]: x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
# vertically stack the arrays
np.vstack([x, grid])
Out[48]: array([[1, 2, 3],
                [9, 8, 7],
                [6, 5, 4]])
In[49]: # horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])
Out[49]: array([[ 9,  8,  7, 99],
                [ 6,  5,  4, 99]])
Similarly, np.dstack will stack arrays along the third axis.
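To see why the stacking helpers are convenient for mixed dimensions, it helps to try np.concatenate on the same inputs. In the sketch below, concatenating a one-dimensional array with a two-dimensional one fails, while np.vstack treats the one-dimensional array as a single row. The last lines show the shape np.dstack produces. Variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

try:
    np.concatenate([x, grid])   # 1-D and 2-D inputs do not line up
except ValueError as err:
    print("ValueError:", err)

stacked = np.vstack([x, grid])                    # x becomes a (1, 3) row
by_hand = np.concatenate([x[np.newaxis, :], grid])
print(np.array_equal(stacked, by_hand))

depth = np.dstack([grid, grid])   # stacks along a new third axis
print(depth.shape)
```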
Splitting of arrays
The opposite of concatenation is splitting, which is implemented by the
functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a
list of indices giving the split points:
In[50]: x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
[1 2 3] [99 99] [3 2 1]
Notice that N split points lead to N + 1 subarrays. The related functions
np.hsplit and np.vsplit are similar:
In[51]: grid = np.arange(16).reshape((4, 4))
grid
Out[51]: array([[ 0,  1,  2,  3],
                [ 4,  5,  6,  7],
                [ 8,  9, 10, 11],
                [12, 13, 14, 15]])
In[52]: upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)
[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]
In[53]: left, right = np.hsplit(grid, [2])
print(left)
print(right)
[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
Similarly, np.dsplit will split arrays along the third axis.
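Splitting and concatenation are inverses, which gives a simple sanity check. The sketch below splits at two points, rejoins the pieces, and also shows the alternative form of np.split in which an integer requests that many equal-sized sections (this form is not used in the text above).

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])

parts = np.split(x, [3, 5])        # two split points give three pieces
restored = np.concatenate(parts)   # joining the pieces recovers x
print(len(parts), np.array_equal(restored, x))

halves = np.split(np.arange(8), 2)  # an integer means "this many equal parts"
print(halves)
```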
Computation on NumPy Arrays: Universal Functions
Up until now, we have been discussing some of the basic nuts and bolts of
NumPy; in the next few sections, we will dive into the reasons that NumPy is so
important in the Python data science world. Namely, it provides an easy and
flexible interface to optimized computation with arrays of data.
Computation on NumPy arrays can be very fast, or it can be very slow. The key
to making it fast is to use vectorized operations, generally implemented through
NumPy’s universal functions (ufuncs). This section motivates the need for
NumPy’s ufuncs, which can be used to make repeated calculations on array
elements much more efficient. It then introduces many of the most common and
useful arithmetic ufuncs available in the NumPy package.
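As a rough illustration of what "vectorized" means, compare an explicit Python loop with the equivalent single array expression. The helper reciprocals_loop is a hypothetical function written for this sketch; both approaches give the same values, but the vectorized form pushes the loop into compiled code, which is where the speed difference on large arrays comes from.

```python
import numpy as np

def reciprocals_loop(values):
    """Compute 1/value element by element with a Python-level loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.arange(1, 6)

looped = reciprocals_loop(values)
vectorized = 1.0 / values   # one expression applied to every element

print(np.allclose(looped, vectorized))
```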
Table 2-2 lists the arithmetic operators implemented in NumPy.
Table 2-2. Arithmetic operators implemented in NumPy
Operator    Equivalent ufunc    Description
+           np.add              Addition (e.g., 1 + 1 = 2)
-           np.subtract         Subtraction (e.g., 3 - 2 = 1)
-           np.negative         Unary negation (e.g., -2)
*           np.multiply         Multiplication (e.g., 2 * 3 = 6)
/           np.divide           Division (e.g., 3 / 2 = 1.5)
//          np.floor_divide     Floor division (e.g., 3 // 2 = 1)
**          np.power            Exponentiation (e.g., 2 ** 3 = 8)
%           np.mod              Modulus/remainder (e.g., 9 % 4 = 1)
Additionally there are Boolean/bitwise operators; we will explore these in
“Comparisons, Masks, and Boolean Logic”.
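Each operator in the table is a thin wrapper around the named ufunc, so the two spellings can be compared directly. A short sketch, with combined as an illustrative name for an expression that chains several operators:

```python
import numpy as np

x = np.arange(4)

# Operator form and ufunc form give identical results.
print(np.array_equal(x + 2, np.add(x, 2)))
print(np.array_equal(x // 2, np.floor_divide(x, 2)))
print(np.array_equal(x ** 2, np.power(x, 2)))
print(np.array_equal(-x, np.negative(x)))

# Operators can be chained; the usual precedence rules apply.
combined = -(0.5 * x + 1) ** 2
print(combined)
```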
Absolute value
Just as NumPy understands Python’s built-in arithmetic operators, it also understands
Python’s built-in absolute value function:
In[11]: x = np.array([-2, -1, 0, 1, 2])
abs(x)
Out[11]: array([2, 1, 0, 1, 2])
The corresponding NumPy ufunc is np.absolute, which is also available under
the alias np.abs:
In[12]: np.absolute(x)
Out[12]: array([2, 1, 0, 1, 2])
In[13]: np.abs(x)
Out[13]: array([2, 1, 0, 1, 2])
This ufunc can also handle complex data, in which the absolute value returns
the magnitude:
In[14]: x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)
Out[14]: array([5., 5., 2., 1.])
Out[12]: array([0.8967576, 0.99196818, 0.6687194])
Other aggregation functions
NumPy provides many other aggregation functions, but we won’t discuss them in
detail here. Additionally, most aggregates have a NaN-safe counterpart that
computes the result while ignoring missing values, which are marked by the
special IEEE floating-point NaN value (for a fuller discussion of missing data, see
“Handling Missing Data”). Some of these NaN-safe functions were not added until
NumPy 1.8, so they will not be available in older NumPy versions.
Table 2-3 provides a list of useful aggregation functions available in NumPy.
Table 2-3. Aggregation functions available in NumPy
Function name    NaN-safe version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true
We will see these aggregates often throughout the rest of the book.
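The difference between an aggregate and its NaN-safe counterpart shows up as soon as a single missing value is present. In this sketch the sample array data is made up for illustration: the plain mean is contaminated by the NaN, while the NaN-safe version averages only the three valid entries.

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0])

plain_mean = np.mean(data)      # any NaN in the input makes the result NaN
safe_mean = np.nanmean(data)    # NaN entries are ignored

print(plain_mean)
print(safe_mean)
print(np.nanmax(data), np.nanargmin(data))
```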
In[37]: A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)
A | B
Out[37]: array([ True, True, True, False, True, True], dtype=bool)
Using or on these arrays will try to evaluate the truth or falsehood of the entire array
object, which is not a well-defined value:
In[38]: A or B
ValueError                                Traceback (most recent call last)
<ipython-input-38-5d8e4f2e21c0> in <module>()
----> 1 A or B
ValueError: The truth value of an array with more than one element is...
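The two behaviors can be reproduced side by side. The sketch below computes the element-wise result with the | operator and with the np.logical_or ufunc, then catches the error that the or keyword raises on multi-element arrays.

```python
import numpy as np

A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)

elementwise = A | B                 # operates on each pair of elements
via_ufunc = np.logical_or(A, B)     # ufunc spelling of the same operation
print(elementwise)
print(np.array_equal(elementwise, via_ufunc))

try:
    A or B   # asks for the truth value of the whole array
except ValueError as err:
    print("ValueError:", err)
```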