Collection Analysis: Median vs. Average

There is more to understanding a collection's age than the average, and thanks to Emma, who commented on my last collection analysis post, I thought it would help to discuss the median age of a collection as well. My experience has been that “average” and “median” are often used interchangeably (which is so very wrong!). Median age really has some serious power in helping librarians talk about collection age.

First, let us get clear on the difference between the median age and the average age of a collection. (Again, as I have done in previous posts, the best way to get a handle on the process is to use a small set of numbers until you feel comfortable.) The average is the sum of all the dates in the set divided by the number of items in the set. (If you are using Excel, this is the AVERAGE function.)

Here is the example of some publication dates:


The average is calculated by adding up all the publication years, which comes to 19,925, and then dividing by the 10 books in the set. The average age is 1992.5.
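If you would rather check the arithmetic outside of Excel, here is a minimal Python sketch of the same sum-and-divide step. The list of years is made up for illustration (it is not the set of dates from this post), and the `average_age` helper and the 2014 reference year are assumptions added only to show how an average publication year converts into an average age.

```python
from statistics import mean

# Made-up publication years for ten books (NOT the dates from the post).
publication_years = [1975, 1978, 1980, 1981, 1982, 1984, 1999, 2010, 2012, 2013]

# Average = sum of the years divided by the number of books.
total = sum(publication_years)
average_year = total / len(publication_years)


def average_age(years, as_of_year):
    """Average age in years, measured from an assumed reference year."""
    return as_of_year - mean(years)


print(total)         # 19914
print(average_year)  # 1991.4
print(round(average_age(publication_years, 2014), 1))  # 22.6
```

Note that the first result is an average publication year; subtracting it from whatever year you are measuring from gives the average age in years.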

The median age is similar to the average in that it can indicate an overall age of the collection, but it gives us more information because it reflects how the dates are distributed across the range. Taking the same 10 books, the oldest in the collection is 1975 and the newest is 2013. The median is found by lining the dates up from oldest to newest and taking the middle value (with an even number of items, the average of the two middle values). In our example, the median age is 1982. Of course, the wonderful Excel will also calculate this for you with the MEDIAN function. You can find it on the Formulas tab in the statistical functions group.
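For anyone who wants to see the median worked out by hand, here is a small Python sketch using the standard statistical definition, which is also what Excel's MEDIAN function computes: sort the values and take the middle one. The `median_year` helper and the list of years are hypothetical (the years are not the dates from this post). The second call uses the extreme case raised in the comments, 25 books from 2014 and one from 1914.

```python
from statistics import mean, median


def median_year(years):
    """Middle value of the sorted years (mean of the two middle values if the count is even)."""
    ordered = sorted(years)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Made-up publication years for ten books (NOT the dates from the post).
publication_years = [1975, 1978, 1980, 1981, 1982, 1984, 1999, 2010, 2012, 2013]

print(median_year(publication_years))  # 1983.0
print(mean(publication_years))         # 1991.4, pulled up by the few recent titles

# One very old book barely matters when most of the set is new.
mostly_new = [2014] * 25 + [1914]
print(median_year(mostly_new))         # 2014.0

# The hand-rolled helper agrees with the standard library.
assert median_year(publication_years) == median(publication_years)
```

The point of the comparison is that the median depends on how many items sit on each side of the middle, not on how far apart the oldest and newest dates are.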

In our little example, you can already see the dramatic difference in the resulting statistics. The average indicates an age around 1993 and the median is 1982. So what does this mean? It means that a large share of the books are older, and not just slightly older, but REALLY old. Remember, 1982 is the middle of the set: half of the books were published in or before 1982, and a few newer titles are pulling the average up.

The difference between median and average is pretty significant in this small set. Over a large set of data, the comparison can be very helpful in getting a sense of the collection's age. As with average, it is unfair to include items in special collections such as a genealogy or local history collection, and archival material should be excluded as well. Like average, this statistic can be run on the overall library collection, but distinct collections or subjects benefit from it most, especially where currency is an issue. (Think legal, medical, etc.) I like using it with the teen fiction collection, where I really want the latest and greatest. Practice by comparing the median and average ages of your own collection.
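As a rough illustration of running both statistics collection by collection while leaving special collections out, here is a Python sketch. Everything in it is hypothetical: the collection names, the publication years, the `EXCLUDED` set, and the `age_stats_by_collection` helper are assumptions for the example, not output from any particular library system.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical catalog export: (collection name, publication year) pairs.
items = [
    ("Teen Fiction", 2012), ("Teen Fiction", 2013),
    ("Teen Fiction", 2009), ("Teen Fiction", 1998),
    ("Medical", 2011), ("Medical", 2005), ("Medical", 1989),
    ("Local History", 1921), ("Local History", 1954),
]

# Collections kept for their age rather than their currency (assumed names).
EXCLUDED = {"Local History", "Genealogy"}


def age_stats_by_collection(items, excluded=EXCLUDED):
    """Return the average and median publication year for each non-excluded collection."""
    years_by_collection = defaultdict(list)
    for collection, year in items:
        if collection not in excluded:
            years_by_collection[collection].append(year)
    return {
        name: {"average": mean(years), "median": median(years)}
        for name, years in years_by_collection.items()
    }


stats = age_stats_by_collection(items)
for name, values in stats.items():
    print(f"{name}: average {values['average']:.1f}, median {values['median']}")
```

A median that sits well above or below the average for a collection is a hint that a handful of very old or very new titles is skewing the picture.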

Statistics and analysis of your collection should be a regular part of your management and decision-making. Boards, directors and other assorted folks in the money part of the collection equation will appreciate this analysis. It shows consideration and care for your collection.



Originally published on April 8, 2013.



  1. I need to start by saying that I’m lousy at math, so maybe I’m just not following well.

    But, when I read this piece, my first thought was “well, gee, that’s not an average AGE, that’s an average date of publication.” So, I tried adding up the ages, and I got (assuming we’re calculating from 2014):

    1 + 1 + 4 + 4 + 22 + 22 + 24 + 24 + 24 + 29 = 155 / 10 = 15.5

    Giving us 1999/2000 as the average date of publication. Where did I go wrong?

    The other thing that has me scratching my head is your explanation of median, which is much simpler than, but also different from, what they say at wikipedia. Let me make sure I understand correctly: you’re saying that if I have 25 books published in 2014 and one published in 1914, the median is still 1954? The number of new vs old items in no way moves the median?

    1. You are right. I am talking about the average or median DATE of publication, not necessarily the age. You are also right that the number of items does move the median. In your example of 25 books published in 2014 and 1 in 1914, the median is still 2014. The key is how many items sit at each point along the range. Obviously, the substantial number of items from 2014 outweighs the one item from 1914.
      Your data is mathematically correct, but I am not sure that we are measuring the same thing. Your 1999/2000 is an average of the differences and my numbers are an average of the dates themselves.
      I will try and read more on the mean and see if I can better understand the math. For purposes of getting people to talk about collection age, I think I am enough in the ballpark to make some conclusions about the collection.
      The median number is absolutely driven by the number of items. Maybe I was simplistic in just calling it a range of numbers. I fully intend to exploit my statistic nerd friends and get some clarification on your methodology. FWIW: I couldn’t make it through the wikipedia article without a headache. Also I am just impressed someone read this article.

      1. I’m reading this series pretty closely; I need the help since I’m an inexperienced collection development librarian in a system with a rapidly aging collection and minimal funding. Thanks for the clarification!
