How to Make a Histogram in R - Programming R Tutorials (2024)

Statisticians and researchers often need a histogram to study a dataset that holds continuous values. It shows the frequency distribution of the data and helps you understand features such as skew and any outliers present in a dataset.

You can easily create a histogram in R using the hist() function in base R. The function has many options that give you control over bin sizes, range, and so on. You can also use the ggplot2 package.

In this tutorial, I will explain what histograms are and what you can do with them, along with some basic methods for plotting histograms in R.

What is a Histogram?

A histogram shows the distribution of data in terms of frequency counts. Although some may see a close similarity between bar charts and histograms, there is one subtle but very important difference. While a bar chart shows the frequency of categorical (discrete) variables, a histogram shows the distribution of continuous data. Therefore, you may find gaps between the bars of a bar chart, but a histogram represents a continuous distribution, so adjacent bars touch with no gaps between them.

To explain how a histogram is used, I will start with an example. Below you can see a histogram of a built-in R dataset, “AirPassengers”. It records the monthly totals of international airline passengers, in thousands, from 1949 to 1960 (144 months in all).

[Figure 1: Histogram of the AirPassengers dataset]

The x-axis shows the monthly number of passengers (in thousands), and the y-axis shows how many months had a value within each range on the x-axis. The x-axis has been divided into intervals of x values; these intervals are called bins.
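To make the idea of a bin concrete, here is a minimal sketch that counts months per interval by hand with cut() and table(). The bin edges (100 to 650 in steps of 50) are an assumption chosen to cover the data. Because hist() counts right-closed intervals and includes the lowest edge, cut() is called with matching settings, so its counts should agree with hist() given the same edges.

```r
# Load the built-in monthly airline passenger totals (in thousands)
data(AirPassengers)
passengers <- as.numeric(AirPassengers)

# Assumed bin edges: 50 wide, covering the range of the data
bin_edges <- seq(100, 650, by = 50)

# hist() counts right-closed intervals (a, b] and includes the lowest edge,
# so cut() is called with the matching settings
bins <- cut(passengers, breaks = bin_edges, right = TRUE, include.lowest = TRUE)

# Number of months that fall into each bin
bin_counts <- table(bins)
print(bin_counts)
```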

In the plot you can see that each of the bins between 100 and 200 (thousand passengers) holds more than 20 months, whereas the 500 to 550 bin holds a little less than 5. Notice that the chart doesn't show data for precisely 100 or 550 passengers. Instead, it groups the x-axis into ranges of continuous values. This is precisely why a histogram does not have gaps like a bar chart.

Moreover, you can also identify the outliers on the extreme right: months with more than 550 thousand passengers occurred only around 2 to 3 times in the 12 years.
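The same reading can be checked numerically. This sketch counts the months above a threshold; the 550 (thousand) cut-off is an assumption picked to match the rightmost bars, and any other value works the same way.

```r
data(AirPassengers)
passengers <- as.numeric(AirPassengers)

# Number of months in which the total exceeded 550 thousand passengers
high_months <- sum(passengers > 550)
print(high_months)  # 3

# Which months those were, as decimal years from the series' time index
as.numeric(time(AirPassengers))[passengers > 550]
```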

Why Do You Need a Histogram?

Now you may still be wondering why exactly we need a histogram when there are other ways to obtain similar information. I have listed some of the most common uses of histograms below.

Find Commonly Occurring Events

A researcher may have spent a while collecting data and may now be wondering what the most frequently occurring values in the data are. A histogram shows frequencies across continuous ranges, which helps us see the range where the observations are densest.
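One way to find the most crowded range without reading it off the plot is to compute the histogram and take the bin with the largest count. This is a minimal sketch using hist()'s default bins; note that which.max() returns the first bin when several tie for the maximum.

```r
data(AirPassengers)

# Compute the histogram without drawing it; h holds breaks and counts
h <- hist(AirPassengers, plot = FALSE)

# Index of the tallest bar (the first one if there is a tie)
peak <- which.max(h$counts)

# Report the interval that contains the most observations
cat(sprintf("Most common range: (%g, %g] with %d months\n",
            h$breaks[peak], h$breaks[peak + 1], h$counts[peak]))
```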

Understand the Pattern of Your Data

Your data may sometimes follow a normal distribution and sometimes it may not. Moreover, if the data looks roughly normal (and therefore roughly symmetric), you may want a visual tool to judge how symmetric it really is.

A histogram neatly displays the distribution of the data hence helping you identify whether your data follows a pattern and, if so, the kind of pattern that it follows.
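A common visual check is to draw the histogram on a density scale and overlay a normal curve with the same mean and standard deviation. The sketch below assumes the normal distribution as the reference pattern; the mean-versus-median comparison at the end is a rough hint of skew, not a formal test.

```r
data(AirPassengers)
passengers <- as.numeric(AirPassengers)

# Draw the histogram on the density scale so it is comparable to a pdf
hist(passengers, freq = FALSE, main = "AirPassengers with normal overlay",
     xlab = "Monthly passengers (thousands)")

# Overlay a normal curve using the sample mean and standard deviation;
# visible gaps between the bars and the curve hint at skew or other departures
curve(dnorm(x, mean = mean(passengers), sd = sd(passengers)),
      add = TRUE, lwd = 2)

# A mean above the median is a rough sign of right (positive) skew
mean(passengers) > median(passengers)
```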

Identify Deviations

Someone working with data won't always see everything aligned perfectly. When studying trends in data, a histogram can easily tell you whether your data deviates from the expected values in any range.

Suppose you had expected a specific result from an experiment, but when you conducted it, it gave you a different distribution. This is a strong sign that something may be wrong, and you need to go back and re-check things.
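To put numbers on "deviates from expected values", you can compare the observed count in each bin with the count an assumed model would predict. The sketch below assumes, purely for illustration, that the expected distribution is normal with the sample's mean and standard deviation; any other model's cumulative distribution function could be swapped in for pnorm().

```r
data(AirPassengers)
passengers <- as.numeric(AirPassengers)

# Observed counts per bin, using hist()'s default break points
h <- hist(passengers, plot = FALSE)

# Expected counts under the assumed normal model: n times the probability
# mass the normal puts in each bin (tail mass outside the edges is ignored)
probs <- diff(pnorm(h$breaks, mean = mean(passengers), sd = sd(passengers)))
expected <- length(passengers) * probs

# Side-by-side comparison; large gaps flag ranges that deviate from the model
comparison <- data.frame(
  from = head(h$breaks, -1),
  to = tail(h$breaks, -1),
  observed = h$counts,
  expected = round(expected, 1)
)
print(comparison)
```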

Plotting a Histogram in R

Now that you have some working knowledge of histograms and what you can do with them, I can show how to make one in R. I'll continue working with “AirPassengers”, a built-in dataset of R. First, we'll load the data.

# r histogram example - load dataset
data(AirPassengers)

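You can now plot a histogram using the “hist()” function. The function takes a numeric vector (here, the AirPassengers time series) as input and draws a histogram of those values; it also invisibly returns an object of class “histogram” that holds the break points and counts.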

# r histogram example - hist function in r
hist(AirPassengers)
[Figure 2: Histogram produced by hist(AirPassengers)]
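Because hist() returns its results, you can also compute a histogram without drawing it and inspect what R calculated. This short sketch uses plot = FALSE; calling plot(h) afterwards would draw the same histogram.

```r
data(AirPassengers)

# plot = FALSE computes the histogram without drawing it
h <- hist(AirPassengers, plot = FALSE)

class(h)   # "histogram"
names(h)   # includes "breaks", "counts", "density" and "mids"

# One count per bin, so there is one fewer count than break points
length(h$counts) == length(h$breaks) - 1
```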

You can get more out of “hist()” by adding parameters that set the title and axis labels and change the bin width. In the code below, I set a main title and an x label, and ask for 5 intervals through the “breaks” parameter. I have also limited the x-axis (number of passengers) to the range 100 to 500 with “xlim”; note that this only changes the visible range of the axis and does not remove any data from the calculation.

# Frequency histogram in r (Formatting Options)
hist(AirPassengers, main = "My hist() Plot", xlab = "# of Passengers", xlim = c(100, 500), breaks = 5)
[Figure 3: Histogram with a custom title, x label, xlim and breaks = 5]
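If you actually want to restrict the data to a range rather than just the visible axis, subset the vector before calling hist(). This is a minimal sketch; the 100 to 500 range simply mirrors the xlim used above.

```r
data(AirPassengers)
passengers <- as.numeric(AirPassengers)

# Keep only months with 100 to 500 thousand passengers before binning
in_range <- passengers[passengers >= 100 & passengers <= 500]

hist(in_range, main = "Months with 100-500 thousand passengers",
     xlab = "# of Passengers (thousands)")

length(passengers)  # all months
length(in_range)    # months left after filtering
```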

Something you may have noticed here is that although I asked for 5 bins, the plot does not show 5 bins. When “breaks” is a single number, “hist()” treats it only as a suggestion: R passes it to “pretty()”, which rounds the break points to convenient values (here, bins 100 passengers wide), so the actual number of bins can differ from the one you asked for. On top of that, xlim = c(100, 500) hides any bars above 500, which is why only 4 full bars appear. To get exactly the bins you want, pass a vector of break points instead.
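The difference between a suggested bin count and explicit break points can be seen directly. In this sketch the explicit edges (six evenly spaced points from 100 to 650) are an assumption chosen to cover the data and produce exactly five bins.

```r
data(AirPassengers)
passengers <- as.numeric(AirPassengers)

# breaks = 5 is only a suggestion; pretty() decides the final edges,
# so the number of bins is not guaranteed to be 5
suggested <- hist(passengers, breaks = 5, plot = FALSE)
length(suggested$counts)

# Supplying the break points yourself gives exactly the bins you ask for:
# six edges from 100 to 650 make five bins, each 110 wide
exact_edges <- seq(100, 650, length.out = 6)
exact <- hist(passengers, breaks = exact_edges,
              main = "Exactly 5 bins", xlab = "# of Passengers (thousands)")
length(exact$counts)  # 5
```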

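Another very interesting tweak is to use unequal bin widths for different intervals. In the code below, the break points are the deciles of the data (the 0%, 10%, ..., 100% quantiles), so each bin holds roughly 10% of the observations. You can try other schemes by passing any increasing vector of break points to “breaks”. Note that with unequal bins, “hist()” plots densities rather than counts by default, so it is the area of each bar, not its height, that reflects how many observations it contains.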

# how to generate a histogram in r - unequal bins
hist(AirPassengers, breaks = quantile(AirPassengers, 0:10 / 10))
[Figure 4: Histogram with unequal, decile-based bins]
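You can confirm the density behaviour by computing the decile histogram without plotting it. In this sketch, multiplying each bar's density by its width gives the proportion of data in that bin, and those areas should add up to 1.

```r
data(AirPassengers)
passengers <- as.numeric(AirPassengers)

# Decile break points: each bin holds roughly 10% of the months
decile_edges <- quantile(passengers, probs = 0:10 / 10)

# With unequal widths, hist() defaults to a density scale
h <- hist(passengers, breaks = decile_edges, plot = FALSE)
h$equidist  # FALSE: the bins are not equally wide

# Bar areas (density x width) add up to 1
total_area <- sum(h$density * diff(h$breaks))
print(total_area)
```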

Other Methods for Plotting Histograms in R

R usually offers several ways to perform the same basic task, and each has its pros and cons. An additional method that I find very interesting is the “qplot()” function in the “ggplot2” package. Start by installing the package if you haven't done so already.

# histogram in R ggplot2 example
install.packages("ggplot2")  # only needed once
library(ggplot2)
qplot(AirPassengers, geom = "histogram")
[Figure 5: Histogram produced with ggplot2's qplot()]
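qplot() has been deprecated since ggplot2 3.4.0, so in newer code you may prefer the full ggplot() syntax with geom_histogram(). This sketch converts the time series to a plain data frame column first; the bin width of 50, the boundary at 100 and the colours are assumptions chosen to resemble the base R plot, not required settings.

```r
# Requires ggplot2 (install.packages("ggplot2") once)
library(ggplot2)

data(AirPassengers)

# ggplot() works on data frames, so store the series as a plain numeric column
df <- data.frame(passengers = as.numeric(AirPassengers))

# Bins 50 wide with edges aligned to 100, 150, 200, ...
p <- ggplot(df, aes(x = passengers)) +
  geom_histogram(binwidth = 50, boundary = 100,
                 colour = "black", fill = "grey70") +
  labs(title = "AirPassengers", x = "# of Passengers (thousands)", y = "Count")

print(p)
```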

Conclusion

Histograms are very commonly used for analysis in data science because of how much information they pack into a few bars. This tutorial aimed to give you some insight into how histograms are created in R. If you are interested in going a few steps further, I encourage you to read the R documentation on the “hist()” function (run ?hist) and try out a few more tweaks. This should give you more clarity on how the function really works and what you can use it for.


The Author:

Syed Abdul Hadi is an aspiring undergrad with a keen interest in data analytics using mathematical models and data processing software. His expertise lies in predictive analysis and interactive visualization techniques. Reading, travelling and horseback riding are among his downtime activities. Visit him on LinkedIn for updates on his work.
