# Plotting with Matplotlib.pyplot

## 1. Basic Plots

• Matplotlib is widely regarded as the foundational plotting library in Python.
• It offers both static and interactive visualizations.
• Plotting functions in libraries such as pandas are built on top of Matplotlib, which makes it fundamental for data scientists programming in Python.
• In this lecture, we will mainly look at Pyplot, a submodule of Matplotlib that contains all the basic plotting functions.

### 1.1 Installing the library

• The first step is to install the Matplotlib library.
• Run the following command in your command line prompt:
conda install matplotlib
• Depending on how you installed Python, the command above might not work. In that case, use pip instead:
pip install matplotlib
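• Once either command finishes, a quick way to confirm the installation is to import the package from Python and print its version. This is only a sanity check and an addition to the lecture material:

```python
import matplotlib

# If this import succeeds, Matplotlib is installed in the active environment.
print(matplotlib.__version__)
```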

### 1.2 Histograms

• Before we can call the plotting functions, we need to import the library into our current working environment (kernel).
• In this example, we will take a look at the famous Old Faithful Geyser Dataset.
• This dataset contains the waiting time between eruptions and the duration of each eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.
• There are two columns:
  • 'eruptions': eruption time (in mins)
  • 'waiting': waiting time to next eruption (in mins)
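
• The original code cells for this step are not reproduced here, so what follows is a minimal sketch rather than the lecture's own code. It imports Pyplot under the conventional alias plt and draws a histogram of the waiting times with plt.hist(). The faithful DataFrame is a synthetic stand-in generated with NumPy (an assumption, so that the example runs on its own); in practice you would load the real dataset, for example with pd.read_csv().

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # conventional alias for Pyplot

# Synthetic stand-in for the Old Faithful data (NOT the real measurements).
rng = np.random.default_rng(0)
faithful = pd.DataFrame({
    "eruptions": np.concatenate([rng.normal(2.0, 0.3, 100), rng.normal(4.3, 0.4, 170)]),
    "waiting": np.concatenate([rng.normal(55, 6, 100), rng.normal(80, 6, 170)]),
})

# plt.hist() returns the bin counts, the bin edges, and the drawn bars.
counts, edges, bars = plt.hist(faithful["waiting"])
# In a script (outside a notebook), call plt.show() to display the figure.
```

• By default plt.hist() uses 10 bins, so counts has 10 entries and edges has 11.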

• Let's change the color of the plot!
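
• A minimal sketch of changing the color, assuming a synthetic stand-in for the 'waiting' column. The color names are arbitrary choices; any color Matplotlib recognizes would do.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the 'waiting' column (NOT the real measurements).
rng = np.random.default_rng(0)
waiting = np.concatenate([rng.normal(55, 6, 100), rng.normal(80, 6, 170)])

# 'color' sets the fill of the bars; 'edgecolor' outlines each bar.
counts, edges, bars = plt.hist(waiting, color="seagreen", edgecolor="black")
```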

• A plot is hard to interpret without axis labels!
• To add an x-axis label, use plt.xlabel().
• Similarly, use plt.ylabel() to add a y-axis label.
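
• The two labeling calls can be sketched as follows. The data are a synthetic stand-in for the 'waiting' column, and the label texts are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the 'waiting' column (NOT the real measurements).
rng = np.random.default_rng(0)
waiting = np.concatenate([rng.normal(55, 6, 100), rng.normal(80, 6, 170)])

plt.hist(waiting)
plt.xlabel("Waiting time to next eruption (mins)")  # x-axis label
plt.ylabel("Frequency")                             # y-axis label
```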

#### c. Changing the number of bins

• A histogram can look very different depending on the number of bins used to draw it!
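
• To see the effect, the sketch below draws the same data twice with different values of the bins argument of plt.hist(). The data are a synthetic stand-in, and the bin counts 5 and 30 are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the 'waiting' column (NOT the real measurements).
rng = np.random.default_rng(0)
waiting = np.concatenate([rng.normal(55, 6, 100), rng.normal(80, 6, 170)])

plt.figure()  # first figure: few, wide bins
counts_few, edges_few, _ = plt.hist(waiting, bins=5)

plt.figure()  # second figure: many, narrow bins
counts_many, edges_many, _ = plt.hist(waiting, bins=30)
```

• With only a few bins the two clusters of waiting times can blur together, while many bins show more detail at the cost of a noisier picture.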

#### d. Adding grid to the plot

• Sometimes, adding a background grid makes it a lot easier to "read" the plot.
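
• A background grid can be switched on with plt.grid(). A minimal sketch, again on a synthetic stand-in for the 'waiting' column:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the 'waiting' column (NOT the real measurements).
rng = np.random.default_rng(0)
waiting = np.concatenate([rng.normal(55, 6, 100), rng.normal(80, 6, 170)])

plt.hist(waiting)
plt.grid(True)  # draw grid lines at the major ticks of both axes
```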

### 1.3 Boxplots

• Use plt.boxplot() to plot a boxplot:
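
• A minimal sketch, assuming a synthetic stand-in for the 'waiting' column. plt.boxplot() returns a dictionary of the drawn elements (boxes, medians, whiskers, and so on), which is handy for inspecting the result.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the 'waiting' column (NOT the real measurements).
rng = np.random.default_rng(0)
waiting = np.concatenate([rng.normal(55, 6, 100), rng.normal(80, 6, 170)])

# One array in, one (vertical) box out.
box = plt.boxplot(waiting)
```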

#### a. Horizontal boxplot

• What if you want the boxplot to be horizontal instead? Pass vert=False to plt.boxplot().
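
• One way to get a horizontal boxplot is the vert keyword of plt.boxplot(). The sketch below uses a synthetic stand-in for the 'waiting' column. Note that Matplotlib 3.10 and later also accept orientation="horizontal" and mark vert as pending deprecation, so the preferred spelling depends on your installed version.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the 'waiting' column (NOT the real measurements).
rng = np.random.default_rng(0)
waiting = np.concatenate([rng.normal(55, 6, 100), rng.normal(80, 6, 170)])

# vert=False lays the box along the x-axis instead of the y-axis.
box = plt.boxplot(waiting, vert=False)
```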

#### b. Add variable name & axis label

• Just as with histograms, we can add axis labels to a boxplot. It is also a good idea to add the variable name (also called the label) to the boxplot.
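
• A minimal sketch of both additions, assuming a synthetic stand-in for the 'waiting' column. The variable name is set here with plt.xticks(), which works the same way across Matplotlib versions. plt.boxplot() also has a keyword for this, but its name differs between versions (labels in older releases, tick_labels from 3.9 on).

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the 'waiting' column (NOT the real measurements).
rng = np.random.default_rng(0)
waiting = np.concatenate([rng.normal(55, 6, 100), rng.normal(80, 6, 170)])

plt.boxplot(waiting)
# A single box sits at x-position 1; replace the tick "1" with the variable name.
locs, tick_texts = plt.xticks([1], ["waiting"])
plt.ylabel("Waiting time to next eruption (mins)")
```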

#### c. Multiple boxplots in one plot

• It's often the case that you want to plot boxplots for multiple variables in the dataset.
• For example, let's examine the famous Iris Dataset.
• This dataset contains measurements of 3 different Iris species: Setosa, Versicolor, and Virginica.
• There are five columns:
  • 'Sepal.Length': the sepal length in cm.
  • 'Sepal.Width': the sepal width in cm.
  • 'Petal.Length': the petal length in cm.
  • 'Petal.Width': the petal width in cm.
  • 'Species': the Iris species ('setosa', 'versicolor', 'virginica').
• Doing this with plain Matplotlib is not very convenient, since the columns have to be selected and labeled by hand. Later, we will discuss the boxplot() function provided by pandas, which simplifies the syntax significantly.
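
• To illustrate why the plain Matplotlib route is cumbersome, here is a sketch that draws one box per numeric column. The iris DataFrame is a synthetic stand-in with the same column names (the values are random placeholders, not the real measurements).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the Iris data (NOT the real measurements).
rng = np.random.default_rng(0)
iris = pd.DataFrame({
    "Sepal.Length": rng.normal(5.8, 0.8, 150),
    "Sepal.Width": rng.normal(3.1, 0.4, 150),
    "Petal.Length": rng.normal(3.8, 1.7, 150),
    "Petal.Width": rng.normal(1.2, 0.7, 150),
    "Species": np.repeat(["setosa", "versicolor", "virginica"], 50),
})

# Plain Matplotlib: collect the columns into a list of arrays...
columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]
box = plt.boxplot([iris[c].to_numpy() for c in columns])
# ...and put the variable names on the ticks by hand.
plt.xticks(range(1, len(columns) + 1), columns)
plt.ylabel("Measurement (cm)")
```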

### 1.4 Scatterplots

• The scatterplot is one of the most important plots in statistics! In Matplotlib, we draw one using plt.scatter().
• Now, let's add axis labels, change the color, and add a plot title and a background grid!
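
• Both steps can be sketched in one go: plt.scatter() draws the points, and the functions from the histogram section add the decorations. The faithful DataFrame is a synthetic stand-in (not the real measurements), and the color, labels, and title are illustrative choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the Old Faithful data (NOT the real measurements).
rng = np.random.default_rng(0)
faithful = pd.DataFrame({
    "eruptions": np.concatenate([rng.normal(2.0, 0.3, 100), rng.normal(4.3, 0.4, 170)]),
    "waiting": np.concatenate([rng.normal(55, 6, 100), rng.normal(80, 6, 170)]),
})

# First argument goes on the x-axis, second on the y-axis.
points = plt.scatter(faithful["eruptions"], faithful["waiting"], color="darkorange")
plt.xlabel("Eruption time (mins)")
plt.ylabel("Waiting time to next eruption (mins)")
plt.title("Waiting time vs. eruption time")
plt.grid(True)
```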

## 2. Matplotlib Inside pandas

• As discussed earlier, the plotting functions provided in the pandas library are built upon the functions provided by the Matplotlib library.
• You will soon find that plotting through pandas is much easier when the data are stored in a DataFrame, which is the case most of the time.

### 2.1 Histograms
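
• The original code cells are not reproduced here. Judging by the boxplot section that follows, the two calls are presumably plot.hist() and hist(); the sketch below is written under that assumption, on a synthetic stand-in for the Old Faithful data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the Old Faithful data (NOT the real measurements).
rng = np.random.default_rng(0)
faithful = pd.DataFrame({
    "eruptions": np.concatenate([rng.normal(2.0, 0.3, 100), rng.normal(4.3, 0.4, 170)]),
    "waiting": np.concatenate([rng.normal(55, 6, 100), rng.normal(80, 6, 170)]),
})

# Option 1: the .plot accessor.
ax1 = faithful["waiting"].plot.hist()

plt.figure()  # start a new figure so the two histograms do not overlap
# Option 2: the dedicated hist() method (it also draws a grid by default).
ax2 = faithful["waiting"].hist()
```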

• Both function calls, plot.hist() and hist(), do essentially the same thing.
• The pandas plotting functions are especially useful when we want to overlay several plots (histograms, in this case).
• Note, however, that pandas plots each column of the DataFrame as its own histogram.
• In the case of the Iris dataset, we will have to do some data manipulation first in order to plot a histogram for each species.
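
• A sketch of both ideas, on a synthetic stand-in for the Iris data (random placeholder values, not the real measurements). The first call overlays one histogram per selected column. The second reshapes the data with pivot() so that each species becomes its own column, which is one possible form of the data manipulation mentioned above (a groupby loop would be another).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the Iris data (NOT the real measurements).
rng = np.random.default_rng(0)
iris = pd.DataFrame({
    "Sepal.Length": np.concatenate([rng.normal(m, 0.4, 50) for m in (5.0, 5.9, 6.6)]),
    "Petal.Length": np.concatenate([rng.normal(m, 0.4, 50) for m in (1.5, 4.3, 5.6)]),
    "Species": np.repeat(["setosa", "versicolor", "virginica"], 50),
})

# Each selected column becomes its own histogram, overlaid on shared axes.
ax_cols = iris[["Sepal.Length", "Petal.Length"]].plot.hist(alpha=0.5, bins=15)

# One histogram per species: make each species a column first.
# Rows belonging to another species become NaN and are ignored when plotting.
wide = iris.pivot(columns="Species", values="Petal.Length")
ax_species = wide.plot.hist(alpha=0.5, bins=15)
```

• Setting alpha below 1 makes the bars semi-transparent, so overlapping histograms stay visible.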

### 2.2 Boxplots

• As with histograms, we can use plot.box() or boxplot() to draw boxplots of one or more columns of a DataFrame.
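
• A minimal sketch of both calls, on a synthetic stand-in for the Iris data (random placeholder values). The by argument of boxplot() splits one column into a box per group, which is where the pandas syntax pays off.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the Iris data (NOT the real measurements).
rng = np.random.default_rng(0)
iris = pd.DataFrame({
    "Sepal.Length": rng.normal(5.8, 0.8, 150),
    "Sepal.Width": rng.normal(3.1, 0.4, 150),
    "Petal.Length": np.concatenate([rng.normal(m, 0.4, 50) for m in (1.5, 4.3, 5.6)]),
    "Petal.Width": rng.normal(1.2, 0.7, 150),
    "Species": np.repeat(["setosa", "versicolor", "virginica"], 50),
})

# One box per numeric column; the column names become the tick labels.
numeric = iris.drop(columns="Species")
ax_all = numeric.plot.box()

# One box per species for a single column.
ax_by = iris.boxplot(column="Petal.Length", by="Species")
```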

### 2.3 Scatterplots

• To draw a scatter plot from a DataFrame, use plot.scatter(). Alternatively, you can call plot() and set the kind keyword to 'scatter'.
• As before, we can customize the plot by importing matplotlib.pyplot and using the functions Pyplot provides.
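
• A sketch of both spellings, followed by the Pyplot customizations, on a synthetic stand-in for the Old Faithful data. The color, labels, and title are illustrative choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the Old Faithful data (NOT the real measurements).
rng = np.random.default_rng(0)
faithful = pd.DataFrame({
    "eruptions": np.concatenate([rng.normal(2.0, 0.3, 100), rng.normal(4.3, 0.4, 170)]),
    "waiting": np.concatenate([rng.normal(55, 6, 100), rng.normal(80, 6, 170)]),
})

# Option 1: the .plot accessor; x and y are column names.
ax1 = faithful.plot.scatter(x="eruptions", y="waiting")

# Option 2: plot() with kind="scatter".
ax2 = faithful.plot(kind="scatter", x="eruptions", y="waiting", color="darkorange")

# Pyplot functions act on the current axes, here the one created last (ax2).
plt.xlabel("Eruption time (mins)")
plt.ylabel("Waiting time to next eruption (mins)")
plt.title("Waiting time vs. eruption time")
plt.grid(True)
```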