Introduction to numpy

numpy is one of the most widely used tools among data scientists coding in Python. Because its core is written largely in C, array operations run much faster than the equivalent pure-Python loops. Among other things, the package gives the user access to a new data type called an array, which allows fast computations on large amounts of data.

Install numpy

numpy should be included in your base conda environment, but it is good practice to create a new environment to work in.

Running conda create -n science numpy ipython will create a new conda environment called science and install numpy and ipython in it. Then we can run conda activate science to enter the new environment.

import numpy

The first thing we need to do is import the numpy module so we can access its code.

Any time you want to use this library, you have to import it first. To make our lives easier, we can also give the numpy name a shorter alias.
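As a minimal sketch, the two import forms look like this. The alias np is only a convention, not a requirement, and both names end up pointing at the same module.

```python
import numpy          # import under the full module name
import numpy as np    # import the same module under the shorter alias np

# Both names refer to the same module object.
print(np is numpy)    # True
```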

Otherwise we would have to type numpy many times; now we only have to type np. This is the convention among numpy users, so if you see np in other people's code, it more than likely refers to numpy.

Congratulations! You are now part of a global community accessing the code written by the numpy team. Let’s learn what we can do with it.

numpy array

Now we have access to all of the great tools the numpy team has created! There are many useful functions and classes included. The first and most basic is the numpy array, which is created with np.array.
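A small example of creating an array might look like the following. The variable name and the values are arbitrary and chosen only for illustration.

```python
import numpy as np

# Build an array from a plain Python list of integers.
numbers = np.array([1, 2, 3, 4, 5])

print(numbers)        # [1 2 3 4 5]
print(type(numbers))  # <class 'numpy.ndarray'>
```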

At first glance this might look very similar to a list, and you would be correct! Numpy arrays are like lists, but every element must share a single data type. In this case, integers. Arrays can also hold floats, strings, and booleans.
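To illustrate the single-data-type rule, the sketch below builds a few arrays and inspects their dtype attribute. The example values are made up; note how a list that mixes integers and floats is converted to an all-float array.

```python
import numpy as np

floats = np.array([1.5, 2.0, 3.25])
flags = np.array([True, False, True])
words = np.array(["a", "bc", "def"])
mixed = np.array([1, 2.5])  # the integer 1 is converted to the float 1.0

print(floats.dtype)      # float64
print(flags.dtype)       # bool
print(words.dtype.kind)  # U (a Unicode string type)
print(mixed.dtype)       # float64
```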

Now we can work with large amounts of data very easily.

Math

The awesome thing about numpy is the ability to apply a function or operator to an entire array at once, element by element. This is called vectorization. In this case, we are doing basic math operations.
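A few basic operations, sketched with made-up values, show the idea. Each operator or function is applied to every element without writing a loop.

```python
import numpy as np

values = np.array([1, 2, 3, 4])

added = values + 10       # add 10 to every element
doubled = values * 2      # multiply every element by 2
squared = values ** 2     # square every element
roots = np.sqrt(values)   # element-wise square root (returns floats)

print(added)    # [11 12 13 14]
print(doubled)  # [2 4 6 8]
print(squared)  # [ 1  4  9 16]
print(roots)
```

By contrast, multiplying a plain Python list by 2 repeats the list rather than doubling its elements.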

Indexing

Numpy arrays can be indexed just like Python lists. However, there are some fancy ways to index that are exclusive to numpy arrays.
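The sketch below, using arbitrary example values, shows list-style indexing first and then two numpy-specific forms: indexing with a list of positions and indexing with a Boolean array.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# List-style indexing and slicing.
print(data[0])     # 10
print(data[-1])    # 50
print(data[1:4])   # [20 30 40]

# Index with a list of positions.
picked = data[[0, 2, 4]]
print(picked)      # [10 30 50]

# Index with a Boolean array of the same length.
mask = np.array([True, False, True, False, True])
masked = data[mask]
print(masked)      # [10 30 50]
```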

Notice the use of a Boolean (True/False) array to index the numpy array. Wherever the Boolean array holds True, the corresponding value in the numpy array is selected. This opens the door to using conditional statements to index numpy arrays.

Indexing in these fancy ways allows us to select different sections of the data in order to operate on or change them.
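As one possible illustration, a comparison produces a Boolean array that can then be used to select or overwrite values. The threshold of 25 and the data are arbitrary.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# A comparison returns a Boolean array of the same shape.
is_large = data > 25
print(is_large)         # [False False  True  True  True]

# Select only the elements where the condition holds.
selected = data[is_large]
print(selected)         # [30 40 50]

# Assign through the same kind of index to change those elements in place.
data[data > 25] = 0
print(data)             # [10 20  0  0  0]
```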

n-dimensional arrays

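In addition to the one-dimensional arrays we have been working with, numpy can create multi-dimensional arrays. These are called n-dimensional arrays, which is where the name of the array class, ndarray, comes from. The one-dimensional arrays above are ndarrays too.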
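A two-dimensional array can be sketched by passing a list of lists to np.array. The values here are arbitrary; ndim and shape report the number of dimensions and the size along each one.

```python
import numpy as np

# Two rows, three columns.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.ndim)    # 2
print(matrix.shape)   # (2, 3)

# Index with one position per dimension: row 1, column 2.
print(matrix[1, 2])   # 6

# A colon selects everything along that dimension: the whole first column.
first_column = matrix[:, 0]
print(first_column)   # [1 4]
```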

This allows us to start doing matrix math.
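A brief sketch with two small, made-up matrices follows. Note that * multiplies element by element, while the @ operator performs matrix multiplication.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = a * b   # multiplies matching elements: [[5, 12], [21, 32]]
product = a @ b       # matrix product: [[19, 22], [43, 50]]
transposed = a.T      # rows and columns swapped: [[1, 3], [2, 4]]

print(elementwise)
print(product)
print(transposed)
```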

Useful functions

numpy has a large number of predefined functions that can be run on arrays as well. These make for easy and efficient ways to calculate things like mean and standard deviation.
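A few of these functions are sketched below on made-up data. By default np.std computes the population standard deviation (it divides by the number of elements).

```python
import numpy as np

scores = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

average = np.mean(scores)   # arithmetic mean
spread = np.std(scores)     # population standard deviation
total = np.sum(scores)      # sum of all elements

print(average)                          # 5.0
print(spread)                           # 2.0
print(total)                            # 40.0
print(np.min(scores), np.max(scores))   # 2.0 9.0
```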

These are just a few of the great functions the numpy team have provided us with! Now we can do a wide variety of data science related tasks.