Pandas Sample is used when you need to pull random rows or columns from a DataFrame.

Why would you ever want random rows? Say you’re building a data science model and you want to test it on a subset of your data. If you’re not using train test split, you can use DataFrame.sample() to pull a random subset of rows.
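As a rough sketch of that idea, the snippet below holds out a random 25% of rows as a test set and keeps the rest for training. The toy DataFrame, the 25% fraction, and the names `data`, `train` and `test` are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: 8 rows with one feature and one label
data = pd.DataFrame({'feature': range(8),
                     'label': [0, 1, 0, 1, 0, 1, 0, 1]})

# Randomly pick 25% of the rows for testing (fixed seed so the split repeats)
test = data.sample(frac=0.25, random_state=0)

# Every row that was not sampled becomes the training set
train = data.drop(test.index)

print(len(train), len(test))
```

Dropping by `test.index` works here because the index labels are unique. With a duplicated index, this approach would remove more rows than intended.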

I use Pandas Sample mostly when I want to view a small section of data, but DataFrame.head() shows me data that is too homogeneous. I want some variability!

df.sample(n=number_of_samples, axis=rows_or_columns)

**Pseudo Code: With your DataFrame, return random rows or columns.**

## Pandas Sample

### Sample Parameters

Sample has some of my favorite parameters of any Pandas function. Each one is packed with dense functionality.

- **n** – The number of samples you want to return. Specify either n or frac (below), not both. If you set neither, pandas returns a single row. Unless replace=True, ‘n’ cannot be larger than the number of rows in your DataFrame.
- **frac** – If you did not specify an ‘n’ (above), you can specify ‘frac’, or fraction. As in, what fraction of your dataset do you want returned? Ex: “Return me 10% of my DataFrame” is frac=.1.
- **replace (Default: False)** – Do you want your rows to be able to be randomly picked twice? By default, once pandas has picked a row it will not pick it again. However, if replace=True, then pandas *can* pick the same row again.
- **weights (Optional)** – Super awesome parameter! By default, pandas applies the same weight to all of your rows, meaning each row has an *equal chance* of being randomly picked. But what if you wanted some rows to have a higher chance of being picked than others? You can set a *weight per row*, which causes pandas to pick some rows more often than others. Check out the example for details.
- **random_state (Optional)** – By default, pandas will pick *different* random rows each time you sample. However, what if you wanted to pick the *same* random rows each time? By setting random_state to an int, you’ll ensure consistency.
- **axis (Default: 0 or ‘index’)** – Did you know you could also select random columns from your DataFrame? If you want to, set axis=1 or ‘columns’.

Now the fun part: let’s take a look at a code sample.

```
import pandas as pd
```

### Pandas Sample

Pandas Sample is a great way to pull a random sample of rows from your DataFrame. I use this most often when I need to subset my data, but I want to do it randomly.

Examples we'll run through:

- Simple sample setting 'n'
- Simple sample setting 'frac'
- Sample setting 'n' and replace
- Sample with weights
- Sample random columns

But first, let's start with a small DataFrame of restaurants and bars in San Francisco:

```
# Each tuple is one row: (name, type, average bill)
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', 'bar', 25.30),
                   ('Page', 'bar', 80.34),
                   ('Tompkins', 'bar', 34.2),
                   ('Als Place', 'Restaurant', 56.52)],
                  columns=('name', 'type', 'AvgBill'))
df
```

### 1. Simple sample setting 'n'

'n' is the number of random rows you want returned.

Notice how I specify n=2 and I get two random rows back.

```
df.sample(n=2)
```

If I do it again, I get another set of random rows.

```
df.sample(n=2)
```
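If you want the same rows back on every run, a minimal sketch is to pass random_state. The seed value 1 is arbitrary, and the DataFrame is a trimmed-down stand-in for the one above so the snippet runs on its own.

```python
import pandas as pd

# Trimmed-down stand-in for the restaurant DataFrame
df = pd.DataFrame({'name': ['Foreign Cinema', 'Liho Liho', '500 Club', 'The Square'],
                   'type': ['Restaurant', 'Restaurant', 'bar', 'bar']})

# Same seed, same rows
first = df.sample(n=2, random_state=1)
second = df.sample(n=2, random_state=1)

print(first.equals(second))  # True
```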

### 2. Simple sample setting 'frac'

Instead of setting 'n', you could specify 'frac', which tells pandas what fraction of your DataFrame you want randomly returned.

Here I'm setting frac=.4, or 40%. Since I have 7 rows, 40% is 2.8 rows, which pandas rounds to 3.

```
df.sample(frac=.4)
```
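To check the row count for yourself, here is a small sketch on a 7-row stand-in DataFrame. It also shows a common use of frac that is not covered above: frac=1 returns every row exactly once in shuffled order. The seed value is arbitrary.

```python
import pandas as pd

# 7-row stand-in for the restaurant DataFrame
df = pd.DataFrame({'name': ['Foreign Cinema', 'Liho Liho', '500 Club', 'The Square',
                            'Page', 'Tompkins', 'Als Place']})

# 7 rows * 0.4 = 2.8, which pandas rounds to 3 rows
subset = df.sample(frac=0.4, random_state=0)
print(len(subset))  # 3

# frac=1 keeps every row but shuffles the order
shuffled = df.sample(frac=1, random_state=0)
print(sorted(shuffled.index) == list(df.index))  # True
```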

### 3. Sample setting 'n' and replace

By default, pandas will only select a random row once. However, if you want to be able to select the same row more than once, you can set replace=True. Each picked row is 'replaced' back into the DataFrame so it can be sampled again.

In this case, you can also set n greater than the number of rows in your DataFrame.

With replacement, the same row can show up more than once in the result below.

```
df.sample(n=5, replace=True)
```
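The sketch below contrasts the two behaviors on a 7-row stand-in DataFrame: asking for 10 rows without replacement fails, while the same request with replace=True succeeds and necessarily repeats at least one row. The variable names are made up for illustration.

```python
import pandas as pd

# 7-row stand-in for the restaurant DataFrame
df = pd.DataFrame({'name': ['Foreign Cinema', 'Liho Liho', '500 Club', 'The Square',
                            'Page', 'Tompkins', 'Als Place']})

# Without replacement, asking for more rows than exist raises a ValueError
try:
    df.sample(n=10)
    oversample_failed = False
except ValueError:
    oversample_failed = True

# With replacement, 10 draws from 7 rows must repeat at least one row
big = df.sample(n=10, replace=True, random_state=0)
has_repeats = big.index.duplicated().any()

print(oversample_failed, len(big), has_repeats)
```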

### 4. Sample with weights

By default, pandas gives each row an equal chance to be selected. However, what if you wanted to select restaurants more often than bars? You could give restaurants a higher chance (higher weights) to be picked.

First let me add weights to my DataFrame. I want restaurants to be 5x more likely to be picked than bars, so I'll give each restaurant weights=5 and each bar weights=1.

```
# Relative weight for each venue type
weights = {'Restaurant': 5,
           'bar': 1}
# Look up each row's weight from its type
df['weights'] = df['type'].map(weights)
df
```

Here I'll pull a random sample of 3 rows from my DataFrame and pass the name of my weights column. I set random_state to make sure I get the same random rows each time. Notice how 2 restaurants pop up out of the 3 rows. That is because they had higher weights and therefore a bigger chance of being picked.

```
df.sample(n=3, weights='weights', random_state=42)
```
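Three rows are too few to see the effect of the weights clearly, so here is a rough sketch that draws many rows with replacement and measures how often restaurants come up. The 10,000 draws, the seed, and the stand-in DataFrame are assumptions for illustration. pandas normalizes the weights for you, so they do not need to sum to 1.

```python
import pandas as pd

# Stand-in with the same mix as above: 3 restaurants and 4 bars
df = pd.DataFrame({'type': ['Restaurant', 'Restaurant', 'bar', 'bar',
                            'bar', 'bar', 'Restaurant']})
df['weights'] = df['type'].map({'Restaurant': 5, 'bar': 1})

# Draw with replacement so every draw uses the same probabilities
draws = df.sample(n=10_000, replace=True, weights='weights', random_state=0)

# Restaurants hold 15 of the 19 total weight units, so expect a share near 15/19
restaurant_share = (draws['type'] == 'Restaurant').mean()
print(round(restaurant_share, 2), round(15 / 19, 2))
```

Without replacement, the odds shift after each pick because the chosen row leaves the pool, so the 5-to-1 ratio only holds exactly for the first draw.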

### 5. Sample random columns

Say you wanted to randomly select columns instead of rows. Just set axis=1.

```
df.sample(n=2, axis=1)
```

Remember, you'll get different random items each time you run your code unless you set a random_state.

```
df.sample(n=2, axis=1)
```
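random_state works for columns too. A minimal sketch, using a small stand-in DataFrame and an arbitrary seed of 7:

```python
import pandas as pd

# Small stand-in with three columns
df = pd.DataFrame({'name': ['Foreign Cinema', '500 Club'],
                   'type': ['Restaurant', 'bar'],
                   'AvgBill': [289.0, 80.5]})

# Same seed, same two columns
cols_a = df.sample(n=2, axis=1, random_state=7)
cols_b = df.sample(n=2, axis=1, random_state=7)

print(list(cols_a.columns) == list(cols_b.columns))  # True
```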
