Pandas DataFrame To NumPy Array – df.to_numpy()

Pandas is great for working with tables, but sometimes you need the full power of a numerical package like NumPy to get the job done. That's where converting your DataFrame to a NumPy array comes in.

Converting your DataFrame to a NumPy array means dropping the DataFrame's labels (the index and column names) and keeping only the values, as a 2D array (an array of rows).
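Here's a minimal sketch of what "dropping the labels" looks like, using a small made-up DataFrame (the lunch/dinner numbers and the Mon/Tue index are hypothetical). The values and the shape survive, but the row and column labels don't. If you still need them, you can grab them separately from df.index and df.columns before converting.

```python
import numpy as np
import pandas as pd

# A small, made-up DataFrame with custom row labels
df = pd.DataFrame(
    {"lunch": [12.5, 18.0], "dinner": [30.0, 42.0]},
    index=["Mon", "Tue"],
)

arr = df.to_numpy()

print(type(arr))   # <class 'numpy.ndarray'>
print(arr.shape)   # (2, 2): same rows x columns as the DataFrame
print(arr)         # only the values; "Mon", "Tue", "lunch", "dinner" are gone
```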

pandas.DataFrame.to_numpy()

NumPy is a very powerful, very fast package for numerical computing in Python (its core is written in C). In fact, NumPy is a dependency of Pandas: Pandas is built on top of NumPy arrays, and installing Pandas with pip will also install NumPy if you don't already have it.

DataFrame To NumPy Array

This one is pretty simple, but let's take a look at the parameters for .to_numpy():

  • dtype – The data type of the returned NumPy array (it's passed on to numpy.asarray()). You likely won't need to set this parameter.
  • copy (Default: False) – This parameter isn't widely used either. Setting copy=True guarantees that you get a fresh copy of the data, not a view on the DataFrame's memory. copy=False only avoids an unnecessary copy: the result may be a view that shares memory with the DataFrame, but that isn't guaranteed. If you don't know the difference between a copy and a view, it's OK not to worry about it.
  • na_value – The value to use for missing values (NAs). By default, the missing-value marker depends on the column data types (for example NaN for float columns). If you want a different value, pass it here.
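A quick sketch of all three parameters on a small numeric DataFrame (the prices data is made up for illustration). dtype asks for float32 instead of the default float64, na_value fills the NaNs in the returned array, and copy=True gives an array you can edit without touching the DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with a couple of missing values (NaN)
prices = pd.DataFrame(
    {"lunch": [12.5, 18.0, np.nan], "dinner": [30.0, np.nan, 45.25]}
)

# dtype: return a float32 array instead of the default float64
# na_value: replace missing values with 0.0 in the returned array
arr = prices.to_numpy(dtype="float32", na_value=0.0)
print(arr.dtype)  # float32
print(arr)

# copy=True: the result is a fresh copy, so editing it leaves the DataFrame alone
arr_copy = prices.to_numpy(copy=True)
arr_copy[0, 0] = 999.0
print(prices.loc[0, "lunch"])  # still 12.5
```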
In [1]:
import pandas as pd

Pandas To NumPy

Pandas is wonderful for handling your datasets, but you may find it lacks the numerical and statistical power you need. This is where converting your DataFrame to a NumPy array comes in handy.

Let's run through 2 examples:

  1. Converting a DataFrame to a NumPy Array
  2. Converting a DataFrame to a NumPy Array and setting NA values

First, let's create a DataFrame:

In [10]:
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', pd.NA),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', pd.NA, 25.30)],
                  columns=['name', 'type', 'AvgBill'])
df
Out[10]:
   name            type        AvgBill
0  Foreign Cinema  Restaurant  <NA>
1  Liho Liho       Restaurant  224
2  500 Club        bar         80.5
3  The Square      <NA>        25.3

1. Converting a DataFrame to a NumPy Array

To turn your DataFrame into a NumPy array, simply call .to_numpy().

To confirm the result is a NumPy array, call type() on it. Notice that x is a numpy.ndarray.

In [12]:
x = df.to_numpy()
x
Out[12]:
array([['Foreign Cinema', 'Restaurant', <NA>],
       ['Liho Liho', 'Restaurant', 224.0],
       ['500 Club', 'bar', 80.5],
       ['The Square', <NA>, 25.3]], dtype=object)
In [13]:
type(x)
Out[13]:
numpy.ndarray
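Notice the dtype=object in the output: because the columns mix strings, floats and missing values, NumPy stores everything as generic Python objects, which isn't great for math. A sketch of one way to get a proper float array for just the AvgBill column, assuming pandas 1.2 or newer (where the nullable "Float64" dtype exists):

```python
import numpy as np
import pandas as pd

# Recreate the restaurant DataFrame from the example above
df = pd.DataFrame(
    [
        ("Foreign Cinema", "Restaurant", pd.NA),
        ("Liho Liho", "Restaurant", 224.0),
        ("500 Club", "bar", 80.5),
        ("The Square", pd.NA, 25.30),
    ],
    columns=["name", "type", "AvgBill"],
)

x = df.to_numpy()
print(x.dtype)  # object: strings, floats and missing values share one array

# Convert only the numeric column. "Float64" is pandas' nullable float dtype;
# to_numpy then turns its pd.NA into NaN in a regular float64 array.
bills = df["AvgBill"].astype("Float64").to_numpy(dtype="float64", na_value=np.nan)
print(bills.dtype)         # float64
print(np.nanmean(bills))   # mean of 224.0, 80.5 and 25.3, ignoring the NaN
```

np.nanmean is used here because a plain np.mean would return NaN as soon as one value is missing.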

2. Converting a DataFrame to a NumPy Array and setting NA values

If your data has NA values in it, you can choose what to fill them with using the na_value parameter.

Here I'll set my NAs to "SF":

In [15]:
y = df.to_numpy(na_value='SF')
y
Out[15]:
array([['Foreign Cinema', 'Restaurant', 'SF'],
       ['Liho Liho', 'Restaurant', 224.0],
       ['500 Club', 'bar', 80.5],
       ['The Square', 'SF', 25.3]], dtype=object)
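One thing worth checking: na_value only fills the returned array, and the DataFrame itself keeps its NAs. A small sketch to verify that, assuming a recent pandas release:

```python
import pandas as pd

# Same restaurant data as above
df = pd.DataFrame(
    [
        ("Foreign Cinema", "Restaurant", pd.NA),
        ("Liho Liho", "Restaurant", 224.0),
        ("500 Club", "bar", 80.5),
        ("The Square", pd.NA, 25.30),
    ],
    columns=["name", "type", "AvgBill"],
)

y = df.to_numpy(na_value="SF")

# The fill only happens in the returned array...
print((y == "SF").sum())      # 2 cells were filled
# ...while the DataFrame still has its missing values
print(df.isna().sum().sum())  # 2
```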
