Blog Post 0

The goal of this blog post is to create a tutorial on generating a visualization for the penguins dataset

Reading the Dataset

First let’s read in our dataset and take a look at the first 5 rows to get a better sense of what it looks like

import pandas as pd
url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)
penguins.head()
studyName Sample Number Species Region Island Stage Individual ID Clutch Completion Date Egg Culmen Length (mm) Culmen Depth (mm) Flipper Length (mm) Body Mass (g) Sex Delta 15 N (o/oo) Delta 13 C (o/oo) Comments
0 PAL0708 1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N1A1 Yes 11/11/07 39.1 18.7 181.0 3750.0 MALE NaN NaN Not enough blood for isotopes.
1 PAL0708 2 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N1A2 Yes 11/11/07 39.5 17.4 186.0 3800.0 FEMALE 8.94956 -24.69454 NaN
2 PAL0708 3 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N2A1 Yes 11/16/07 40.3 18.0 195.0 3250.0 FEMALE 8.36821 -25.33302 NaN
3 PAL0708 4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N2A2 Yes 11/16/07 NaN NaN NaN NaN NaN NaN NaN Adult not sampled.
4 PAL0708 5 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N3A1 Yes 11/16/07 36.7 19.3 193.0 3450.0 FEMALE 8.76651 -25.32426 NaN

Generating the Visualization

It seems like we have some descriptive data for our penguins (Culmen Lenghth, Culmen Depth …). Something interesting we can visualize is the distribution of weights for our penguins. We can visualize this with a histogram of our Body Mass (g) variable. To create this we will use seaborn’s histplot function.

import seaborn as sns
sns.histplot(data = penguins, 
             x = 'Body Mass (g)')

base_hist.png

This is a great plot of getting a quick introduction to the distribution of weights for our penguins. We can change some parameters to adjust the bin size, title and color of our plot.

new_hist = sns.histplot(data = penguins, 
             x = 'Body Mass (g)', 
             color = 'orange', 
             bins = 20)
new_hist.set(title = 'Distribution of Penguin Weights')
new_hist.get_figure().savefig('new_hist.png')

new_hist.png

In this new plot, we have a different shade, smaller bin sizes (more total bins) to show a more refined distribution of our data and a title that describes the overall goal of our plot. This completes our short tutorial on creating a visualization in python on the penguins dataset and uploading it to a blog post on Github through Jekyll.

Written on October 4, 2021