Data Visualization Using Matplotlib

Amir
7 min read · Mar 21, 2022


This notebook covers the following topics: basic Matplotlib, different types of plots, images with Matplotlib, and animation using Matplotlib.

The Jupyter notebook of this story is available here in the visualization part.

Introduction

Visualizing data effectively is essential for gaining insight into a problem, and a good visualization can lead to a much better understanding of it. Although seaborn is increasingly popular, it is built on top of Matplotlib. So, we first explore the functionality of Matplotlib and will work with seaborn later.

1. Basic Matplotlib

Let’s import the required packages for this notebook.
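The import cell itself is not shown here. A minimal version, assuming that only NumPy and Matplotlib's pyplot interface are needed (which matches the code used below), looks like this:

```python
import numpy as np               # numerical arrays and random data
import matplotlib.pyplot as plt  # the pyplot (state-based) interface of Matplotlib
```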

The package versions I used to prepare this notebook are:

Matplotlib:  3.5.1
NumPy: 1.22.2
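To compare your own environment with these versions, you can run a quick check like the following. This is a small sketch and not part of the original notebook.

```python
import matplotlib
import numpy as np

# Print the installed versions to compare with the ones listed above
print("Matplotlib:", matplotlib.__version__)
print("NumPy:", np.__version__)
```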

Note: In the early part of this notebook, the plots are not very well organized. This is on purpose, to show the value of the object-oriented use of Matplotlib. After we introduce the object-oriented method, the figures get awesome! 🙂

1.1 Simple Matplotlib

plt.plot()

This is the simplest thing we can do with Matplotlib. To get the best out of Matplotlib, though, we should know the package better. First, let's look at the hierarchical relationship between Matplotlib's three main objects: the Figure, the Axes, and the Axis.
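This hierarchy can also be checked directly in code. The following is a minimal sketch: a Figure contains one or more Axes, and each 2D Axes contains an x Axis and a y Axis.

```python
import matplotlib.pyplot as plt
from matplotlib.axis import XAxis, YAxis

# A Figure is the top-level container; here it holds a single Axes
fig, ax = plt.subplots()

# Each Axes knows the Figure it belongs to ...
print(ax.figure is fig)          # True
# ... and the Figure keeps a list of its Axes
print(fig.axes == [ax])          # True
# Each 2D Axes holds two Axis objects: the x-axis and the y-axis
print(isinstance(ax.xaxis, XAxis), isinstance(ax.yaxis, YAxis))  # True True
```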

Now, we should familiarize ourselves with the anatomy of axes and axis in Matplotlib.

Let’s have our first plot.
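The original plotting cell is not shown. A minimal first plot might look like this, using a hypothetical sine-wave dataset:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data: one period of a sine wave
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

line, = plt.plot(x, y)   # draw y against x as a line; plot() returns a list of lines
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("My first plot")
```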


Let’s set the color of lines, markers, line style, and line width.
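All four properties are keyword arguments of plot(). A short format string can also combine color, line style, and marker in one argument. The data below is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 20)  # hypothetical sample data

# color, marker, line style and line width as keyword arguments
line, = plt.plot(x, x ** 2, color="red", marker="o", linestyle="--", linewidth=2)

# Format string "g-.s": green, dash-dot line, square markers
line2, = plt.plot(x, 2 * x ** 2, "g-.s", linewidth=1)
```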


Here is the list of available colors in Matplotlib, and the list of available markers.
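The lists themselves are not reproduced here. As a sketch, the same information can also be printed from Matplotlib's own dictionaries of named colors and marker codes:

```python
import matplotlib.colors as mcolors
from matplotlib.markers import MarkerStyle

# Named colors: single-letter base colors, the Tableau palette and CSS4 names
print(list(mcolors.BASE_COLORS))      # single letters such as 'r', 'g', 'b'
print(list(mcolors.TABLEAU_COLORS))   # 'tab:blue', 'tab:orange', ...
print(sorted(mcolors.CSS4_COLORS))    # names such as 'salmon', 'teal', ...

# Markers: a dict mapping each marker code to a description
for code, name in MarkerStyle.markers.items():
    print(repr(code), name)
```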

We can also use an emoji as a marker.

1.2 Subplots

We can create a figure with subplots using plt.subplot(n_row, n_col, number).
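For example, a 2 x 2 grid can be filled in a loop. The position number counts from 1, row by row. The data is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)          # hypothetical sample data
funcs = [np.sin, np.cos, np.tanh, np.exp]

fig = plt.figure()
for number, f in enumerate(funcs, start=1):
    # plt.subplot(n_row, n_col, number): number runs from 1 to n_row * n_col
    plt.subplot(2, 2, number)
    plt.plot(x, f(x))
    plt.title(f.__name__)
```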


We can also have one plot over multiple subplots!
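One way to do this, sketched with hypothetical data, is to mix grid shapes. The top row uses positions 1 and 2 of a 2 x 2 grid, and the bottom plot is position 2 of a 2 x 1 grid, so it spans both columns.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)  # hypothetical sample data

fig = plt.figure()
# Top row: two small plots from a 2 x 2 grid
ax1 = plt.subplot(2, 2, 1)
ax1.plot(x, np.sin(x))
ax2 = plt.subplot(2, 2, 2)
ax2.plot(x, np.cos(x))
# Bottom row: one wide plot (second cell of a 2 x 1 grid) spanning both columns
ax3 = plt.subplot(2, 1, 2)
ax3.plot(x, np.sin(x) * np.cos(x))
```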


Text and annotation

To add more detail to your plots, you can add text and annotations.
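A minimal sketch with hypothetical data: plt.text() places a string at data coordinates, and plt.annotate() adds text with an arrow pointing from xytext to xy.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)  # hypothetical sample data
plt.plot(x, np.sin(x))

# Plain text at position (0.5, -0.5) in data coordinates
txt = plt.text(0.5, -0.5, "a sine wave", fontsize=12)

# Text at xytext with an arrow pointing to the maximum at xy
ann = plt.annotate("maximum", xy=(np.pi / 2, 1), xytext=(3, 0.6),
                   arrowprops=dict(arrowstyle="->"))
```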

You can see the different types of connection styles for annotation arrows here.
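As a sketch, the same arrow can be drawn with several connectionstyle strings, passed through arrowprops, to compare them side by side. The style strings below are standard examples, and the selection is my own.

```python
import matplotlib.pyplot as plt

styles = ["arc3,rad=0", "arc3,rad=0.3", "angle3,angleA=90,angleB=0",
          "angle,angleA=-90,angleB=180,rad=5",
          "arc,angleA=-90,angleB=0,armA=30,armB=30,rad=0"]

fig, axes = plt.subplots(1, len(styles), figsize=(15, 3))
for ax, style in zip(axes, styles):
    # Empty text: only the arrow from xytext to xy is drawn
    ax.annotate("", xy=(0.2, 0.2), xytext=(0.8, 0.8),
                arrowprops=dict(arrowstyle="->", connectionstyle=style))
    ax.set_title(style, fontsize=7)
    ax.set_xticks([])
    ax.set_yticks([])
```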

Changing the font and color between plots
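A minimal sketch with hypothetical data. Font size, weight, style, family, and color are keyword arguments of text functions such as plt.title(), so each subplot can use its own settings.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)  # hypothetical sample data

fig = plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(x, np.sin(x), color="tab:blue")
plt.title("Left plot", fontsize=14, fontweight="bold", color="tab:blue")

plt.subplot(1, 2, 2)
plt.plot(x, np.cos(x), color="tab:red")
plt.title("Right plot", fontsize=10, fontstyle="italic", color="tab:red",
          fontfamily="serif")
```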

Figure size

An important option in Matplotlib is the figure size, given in inches:

plt.figure(figsize=(width, height))

The default size is [6.4, 4.8] inches.

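x1 = np.linspace(0, 10, 100)  # hypothetical sample data; x1 and y1 are not defined in this excerpt
y1 = np.sin(x1)
plt.figure(figsize=(8, 5))
plt.plot(x1, y1)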

We can also specify the dots per inch (DPI), the background color, and the edge color of the figure.
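A sketch with hypothetical data: dpi, facecolor, and edgecolor are arguments of plt.figure(). The edge color is only visible when the figure's linewidth is greater than zero.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)  # hypothetical sample data

# dpi: resolution; facecolor: figure background; edgecolor + linewidth: figure border
fig = plt.figure(figsize=(6, 4), dpi=100, facecolor="lightgray",
                 edgecolor="navy", linewidth=4)
plt.plot(x, np.sin(x))
```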


However, if you want to set the background color of the plotting area itself, you need more control, which is easier to get if you use Matplotlib in an object-oriented manner.

1.3 Object-oriented method

Let's level up the quality of our figures by using Matplotlib's object-oriented interface.
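A minimal sketch of the object-oriented style with hypothetical data. plt.subplots() returns the Figure and Axes objects, and everything is then configured through their methods. This includes the background of the plotting area via ax.set_facecolor().

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)  # hypothetical sample data

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")

ax.set_title("Object-oriented Matplotlib", fontsize=14)
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_facecolor("whitesmoke")   # background of the plotting area (the Axes)
ax.grid(True)
ax.legend()
fig.tight_layout()
```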

Reference for Text properties

We can also use ax.set_xlim() and ax.set_ylim() to set the axis limits.
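For example, to zoom into part of a curve (hypothetical data):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)  # hypothetical sample data
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Fix the visible range: (left, right) for x and (bottom, top) for y
ax.set_xlim(0, 5)
ax.set_ylim(-1.5, 1.5)
```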

We can use the alpha parameter, from 0 (fully transparent) to 1 (fully opaque), to make a line transparent.
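A quick comparison with hypothetical data. Thick lines make the difference easy to see.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)  # hypothetical sample data
fig, ax = plt.subplots()

solid, = ax.plot(x, np.sin(x), linewidth=8, alpha=1.0, label="alpha=1.0")
faded, = ax.plot(x, np.sin(x + 0.5), linewidth=8, alpha=0.3, label="alpha=0.3")
ax.legend()
```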

Save figures

Figures can be saved as shown in this gist:

https://gist.github.com/da06dcf4c1334ed2b38e78573da3aa4a
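The gist's content is not reproduced here. As a separate sketch, fig.savefig() infers the file format from the extension. The output folder below is a temporary directory chosen only to keep the example self-contained, and the data is hypothetical.

```python
import os
import tempfile
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)  # hypothetical sample data
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

out_dir = tempfile.mkdtemp()  # any writable folder works
png_path = os.path.join(out_dir, "sine.png")
pdf_path = os.path.join(out_dir, "sine.pdf")

fig.savefig(png_path, dpi=300, bbox_inches="tight")  # raster image, high resolution
fig.savefig(pdf_path, bbox_inches="tight")           # vector format
```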

2. Different types of plot

Here we learn about different types of plots in Matplotlib.

We can also change the style of plots using the following syntax:

plt.style.use(style_name)
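To see which style names are available, and to try one without changing the global setting, here is a sketch using plt.style.context(). The style only applies inside the with-block. The data is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Built-in style names (plus any installed user styles)
print(plt.style.available)

x = np.linspace(0, 10, 100)  # hypothetical sample data

# Temporary style: the global style set by plt.style.use() is unchanged afterwards
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
```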

I use the default style for the rest of this notebook, but you can find your favorite style here.

plt.style.use('default')

2.1 Scatter plot

Scatter plots are used to see the relationship between two variables.

Syntax

matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, *, edgecolors=None, plotnonfinite=False, data=None, **kwargs)

We can set a color for each sample based on its category.
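A sketch with hypothetical data: three groups of points, where c= receives one category value per sample and cmap maps each value to a color.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Hypothetical data: 3 categories with 30 samples each, around different centers
centers = np.array([[0, 0], [3, 3], [0, 4]])
category = np.repeat([0, 1, 2], 30)
points = centers[category] + rng.normal(scale=0.7, size=(90, 2))

fig, ax = plt.subplots()
sc = ax.scatter(points[:, 0], points[:, 1], c=category, cmap="viridis", s=40)
```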

We can change the labels in the legend using set_text().
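A minimal sketch, assuming the legend is built with legend_elements() from the scatter plot. Its automatic labels are the category numbers, which are then renamed with set_text(). The data and the names "class A/B/C" are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
category = np.repeat([0, 1, 2], 20)        # hypothetical labels 0, 1, 2
x = rng.normal(size=60) + category
y = rng.normal(size=60)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=category, cmap="viridis")

# One legend entry per distinct value of c
handles, labels = sc.legend_elements()
legend = ax.legend(handles, labels, title="Category")

# Replace the numeric labels with names
for text, name in zip(legend.get_texts(), ["class A", "class B", "class C"]):
    text.set_text(name)
```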

Plot XKCD

Sometimes you might be interested in trying out the XKCD style!
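plt.xkcd() can be used as a context manager, so the hand-drawn look only applies to figures created inside the block. Matplotlib may warn if the xkcd fonts are not installed. The data below is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)  # hypothetical sample data

with plt.xkcd():
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
    ax.set_title("Hand-drawn look")
```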

2.2 Bar plot

Bar plots are used to compare a numerical variable across the categories of a categorical variable.

Syntax

matplotlib.pyplot.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
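A sketch with hypothetical data. String x values are treated as categories, and ax.bar_label() (available since Matplotlib 3.4) writes each bar's height above it.

```python
import matplotlib.pyplot as plt

# Hypothetical data: one numerical value per category
fruits = ["apple", "banana", "cherry", "date"]
sales = [25, 40, 15, 30]

fig, ax = plt.subplots()
bars = ax.bar(fruits, sales, width=0.6, color="tab:orange")
ax.set_ylabel("sales")
ax.bar_label(bars)   # annotate each bar with its height
```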

2.3 Histogram

Histograms show how often values occur in the data, that is, the frequency distribution of the data.

Syntax

matplotlib.pyplot.hist(x, bins=None, range=None, density=False, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, *, data=None, **kwargs)
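A minimal numeric example with a hypothetical normal sample. hist() returns the counts per bin, the bin edges (one more than the number of bins), and the drawn patches.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(loc=0, scale=1, size=1000)   # hypothetical sample

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(data, bins=30, color="tab:green", edgecolor="black")
```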

Let's generate some fake data:

letter = 'a b c d'
random_letter1 = np.random.choice(letter.split(), 70)
random_letter2 = np.random.choice(letter.split(), 50)

Hist for more than one variable

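In this case, the histtype parameter is important: 'bar' puts the bars of the different variables side by side, 'barstacked' stacks them on top of each other, and 'step' and 'stepfilled' draw unfilled and filled outlines.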
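A sketch comparing three histtype values. It uses two hypothetical numeric samples rather than the letter data above. Passing a list of arrays draws one histogram per array on shared bins.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
a = rng.normal(0, 1, 200)     # hypothetical sample 1
b = rng.normal(1.5, 1, 200)   # hypothetical sample 2

fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, histtype in zip(axes, ["bar", "barstacked", "step"]):
    counts, edges, _ = ax.hist([a, b], bins=15, histtype=histtype, label=["a", "b"])
    ax.set_title(f"histtype='{histtype}'")
    ax.legend()
```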


2D histogram
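A sketch with hypothetical correlated data. hist2d() counts the samples in a grid of rectangular bins and colors each bin by its count.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.normal(size=5000)                        # hypothetical data
y = 0.5 * x + rng.normal(scale=0.5, size=5000)

fig, ax = plt.subplots()
counts, xedges, yedges, image = ax.hist2d(x, y, bins=40, cmap="viridis")
fig.colorbar(image, ax=ax, label="count")
```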


2.4 Pie chart

Pie charts show the proportion of each category as a slice of a circle.

Syntax

matplotlib.pyplot.pie(x, explode=None, labels=None, colors=None, autopct=None, pctdistance=0.6, shadow=False, labeldistance=1.1, startangle=0, radius=1, counterclock=True, wedgeprops=None, textprops=None, center=(0, 0), frame=False, rotatelabels=False, *, normalize=True, data=None)
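A sketch with hypothetical shares. autopct writes the percentage on each wedge, and explode pulls the first wedge outwards.

```python
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D"]
sizes = [40, 30, 20, 10]      # hypothetical shares

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct="%1.1f%%",
                                  explode=[0.1, 0, 0, 0], startangle=90)
ax.set_aspect("equal")        # keep the pie circular
```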

2.5 Box Plot

A boxplot is used to display the distribution of data based on a five-number summary:

  • minimum
  • the first quartile (Q1)
  • median
  • the third quartile (Q3)
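  • maximum

Box plots can help us find outliers. Note that in Matplotlib the whiskers do not necessarily reach the minimum and maximum: by default (whis=1.5) they extend to the most extreme data points within 1.5 times the interquartile range (IQR) of the box, and points beyond them are drawn individually as outliers (fliers).

Syntax

matplotlib.pyplot.boxplot(x, notch=None, sym=None, vert=None, whis=None, positions=None, widths=None, patch_artist=None, bootstrap=None, usermedians=None, conf_intervals=None, meanline=None, showmeans=None, showcaps=None, showbox=None, showfliers=None, boxprops=None, labels=None, flierprops=None, medianprops=None, meanprops=None, capprops=None, whiskerprops=None, manage_ticks=True, autorange=False, zorder=None, *, data=None)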
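A small example with hypothetical data containing one obvious outlier. boxplot() returns a dict of the drawn artists, so the median line and the flier points can be inspected directly.

```python
import matplotlib.pyplot as plt

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]   # hypothetical data; 50 is an outlier

fig, ax = plt.subplots()
bp = ax.boxplot(data)   # keys include 'boxes', 'medians', 'whiskers', 'fliers'
print(bp["medians"][0].get_ydata())   # median line drawn at 5.5
print(bp["fliers"][0].get_ydata())    # points beyond the whiskers: only 50
```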

2.6 Violin plot

A violin plot is more informative than a plain box plot. While a box plot only shows summary statistics such as the median and interquartile range, a violin plot also shows the full distribution of the data as a kernel density estimate.

Syntax

matplotlib.pyplot.violinplot(dataset, positions=None, vert=True, widths=0.5, showmeans=False, showextrema=True, showmedians=False, quantiles=None, points=100, bw_method=None, *, data=None)

As a simple example, I put violin and box plots next to each other for a better comparison.
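A sketch of such a comparison with hypothetical data. The third group is bimodal: the violin plot reveals the two peaks, while the box plot cannot.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
data = [rng.normal(0, 1, 300),
        rng.normal(2, 0.5, 300),
        np.concatenate([rng.normal(-1, 0.3, 150), rng.normal(1.5, 0.3, 150)])]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
parts = ax1.violinplot(data, showmedians=True)
ax1.set_title("Violin plot")
ax2.boxplot(data)
ax2.set_title("Box plot")
```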

3. Images with Matplotlib

We can read an image using the imread() function and plot it using the imshow() function.

# Read an image (reading JPEG files requires the Pillow package)
image = plt.imread('./img/cat.jpeg')
# Show the image
plt.imshow(image)
plt.axis('off');  # Turn off the axis

A loaded color image is a 3D array of shape (height, width, channels), where the third dimension holds the values for Red, Green, and Blue (RGB), plus an alpha channel for some PNG files.
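To see the layout without the cat photo, here is a sketch with a tiny synthetic RGB image. Channel 0 is red, 1 is green, and 2 is blue. Note that a JPEG read through Pillow is usually uint8 in 0-255, whereas this example uses floats in [0, 1].

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic RGB image: 2 rows, 3 columns, 3 channels
image = np.zeros((2, 3, 3))
image[:, :, 0] = 1.0          # red channel on everywhere
image[0, :, 1] = 1.0          # green on in the top row (red + green = yellow)

print(image.shape)            # (2, 3, 3): rows, columns, channels
red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]

plt.imshow(image)             # float RGB values in [0, 1] are shown in true color
plt.axis("off")
```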

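# Show the green channel (index 1). A single channel is a 2D array, so imshow
# colors it with the default colormap (viridis), not in green
plt.imshow(image[:, :, 1])
plt.axis('off');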

We can pick different color scales (colormaps) for images using cmap. We can also set the minimum and maximum values of the color scale using vmin and vmax.
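A sketch with a hypothetical 2D field. The left panel maps the full data range onto 'gray'. The right panel uses 'magma' with the color scale clipped to [0, 0.5], so every value above 0.5 gets the top color.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 2D field with values between 0 and 1
yy, xx = np.mgrid[0:1:100j, 0:1:100j]
field = xx * yy

fig, axes = plt.subplots(1, 2, figsize=(8, 3.5))
im0 = axes[0].imshow(field, cmap="gray")
im1 = axes[1].imshow(field, cmap="magma", vmin=0, vmax=0.5)
fig.colorbar(im0, ax=axes[0])
fig.colorbar(im1, ax=axes[1])
```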


So picking a good colormap and color limits is very important if you need to compare the values of two 2D variables: with the same vmin and vmax on both plots, the same color means the same value.
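A sketch with two hypothetical 2D variables, where the second is the first at half the amplitude. With separate automatic limits the two panels would look identical. Shared vmin/vmax and a single colorbar make the difference visible.

```python
import numpy as np
import matplotlib.pyplot as plt

yy, xx = np.mgrid[0:1:50j, 0:1:50j]
var_a = np.sin(3 * xx) * np.cos(3 * yy)   # hypothetical variable
var_b = 0.5 * var_a                        # same pattern, half the amplitude

# One pair of limits for both panels
vmin = min(var_a.min(), var_b.min())
vmax = max(var_a.max(), var_b.max())

fig, axes = plt.subplots(1, 2, figsize=(8, 3.5))
for ax, var, name in zip(axes, [var_a, var_b], ["var_a", "var_b"]):
    im = ax.imshow(var, cmap="RdBu_r", vmin=vmin, vmax=vmax)
    ax.set_title(name)
fig.colorbar(im, ax=list(axes))   # a single shared colorbar
```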

4. Animation using Matplotlib

To make animations with Matplotlib, you need to import the animation module of this package, for example:

from matplotlib.animation import FuncAnimation
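A minimal FuncAnimation sketch. This is my own example, not the notebook's code. FuncAnimation calls update() for every frame, and update() changes the line's data.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
line, = ax.plot(x, np.sin(x))
ax.set_ylim(-1.1, 1.1)

def update(frame):
    """Shift the sine wave a little for each frame and return the changed artists."""
    line.set_ydata(np.sin(x + frame / 10))
    return (line,)

# Call update() every 50 ms for frames 0..99
anim = FuncAnimation(fig, update, frames=100, interval=50, blit=False)
# In a notebook, IPython.display.HTML(anim.to_jshtml()) shows it inline;
# anim.save("wave.gif", writer="pillow") writes a GIF if Pillow is installed.
```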

4.1 Live graph with Matplotlib

Here is an example of live plotting. There is a text file called stock.txt in the repository; when you edit and save this file, the plot updates automatically.
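The original code and the format of stock.txt are not shown here. The sketch below assumes a hypothetical format of one "x,y" pair per line. FuncAnimation re-reads the file every second and redraws the plot. Missing files and malformed lines are skipped, so a half-written file does not crash the plot.

```python
import os
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

DATA_FILE = "stock.txt"  # assumed format (hypothetical): one "x,y" pair per line

def read_points(path):
    """Read x,y pairs from a text file; return empty lists if the file is missing."""
    xs, ys = [], []
    if not os.path.exists(path):
        return xs, ys
    with open(path) as f:
        for line in f:
            try:
                x, y = (float(v) for v in line.strip().split(","))
            except ValueError:
                continue  # skip empty or malformed lines
            xs.append(x)
            ys.append(y)
    return xs, ys

fig, ax = plt.subplots()

def refresh(frame):
    """Re-read the file and redraw the plot from scratch."""
    xs, ys = read_points(DATA_FILE)
    ax.clear()
    ax.plot(xs, ys)
    ax.set_title("Live data from " + DATA_FILE)

# Re-run refresh() every 1000 ms; frames are not cached because the stream is endless
anim = FuncAnimation(fig, refresh, interval=1000, cache_frame_data=False)
```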

The file stock.txt and more notebooks about Python are available in the following repository.

Thanks for reading, and I look forward to your feedback!
