If you ever plan to cheat on your taxes, here’s something to consider (besides prison): Make sure that more of the numbers you fabricate start with the digit 1 (one) than with any other digit. The second-most common leading digit should be 2, then 3, continuing on that pattern to leave 9 as the least common leading digit. This distribution is called Benford’s Law, and it’s a lot more straightforward than tax law… though why it exists is nearly as mysterious.

In a highly variable set of numbers such as those found in taxes, one would think that every leading digit would be equally common, with roughly as many numbers starting with a 1 as with, say, an 8. In a set of truly random numbers, such as integers drawn evenly from 1 to 999,999, that is what one would find; but in non-random real-life data, unless the data set is too constrained, far more numbers start with a 1 than with any other digit. This quirk turns out to be useful in many ways.
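
To see the contrast for yourself, here is a minimal Python simulation. It is a sketch under stated assumptions: the range 1 to 999,999, the sample size, the seed, and the use of products of ten random factors as a stand-in for real-life quantities are all illustrative choices, not drawn from any particular data set. The uniform integers should come out near 11.1% for every digit, while the products should land close to Benford’s percentages (about 30.1% for a leading 1).

```python
import random
from collections import Counter


def leading_digit(x):
    """First nonzero digit of a positive number, read from its decimal text."""
    # lstrip("0.") drops leading zeros and the decimal point, so 0.042 -> "42".
    return int(str(x).lstrip("0.")[0])


def digit_shares(values):
    """Fraction of the positive values whose leading digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


rng = random.Random(2024)  # fixed seed so repeated runs give the same numbers
n = 100_000

# Uniform integers from 1 to 999,999: each leading digit owns exactly 111,111 of them.
uniform = [rng.randint(1, 999_999) for _ in range(n)]


def product_of_factors(k=10):
    """Multiply k random factors between 0 and 10 (an assumed stand-in for real-life data)."""
    result = 1.0
    for _ in range(k):
        result *= rng.uniform(0.0, 10.0)
    return result


multiplicative = [product_of_factors() for _ in range(n)]

uniform_shares = digit_shares(uniform)
mult_shares = digit_shares(multiplicative)
for d in range(1, 10):
    print(f"{d}: uniform {uniform_shares[d]:6.1%}   multiplicative {mult_shares[d]:6.1%}")
```

Products are used because multiplying many random factors spreads the results across several orders of magnitude, which is the kind of spread Benford’s Law needs; adding the same factors together would instead cluster the results around one value.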

The Internal Revenue Service reportedly runs tax returns through software that checks whether the numbers follow Benford’s Law. Any time a return strays too far from that distribution, there’s a pretty good chance somebody’s pulling a fast one, and the software raises a flag indicating that the return should be scrutinized further.
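
The article doesn’t describe how the IRS software actually works, so what follows is only a generic Python sketch of the idea, not the agency’s method. It measures how far a set of leading-digit counts strays from Benford’s predicted shares, log10(1 + 1/d) for digit d, using the mean absolute deviation (MAD), and flags the set when that average gap exceeds a cutoff. The cutoff of 0.015 is an assumption, borrowed from a threshold commonly used for first digits in forensic-accounting practice, and the function names are hypothetical.

```python
import math

# Benford's predicted share for each leading digit d: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Assumed cutoff: a first-digit MAD above 0.015 is commonly read as nonconformity.
MAD_CUTOFF = 0.015


def benford_mad(counts):
    """Mean absolute deviation between observed leading-digit shares and Benford's shares."""
    total = sum(counts.get(d, 0) for d in range(1, 10))
    if total == 0:
        raise ValueError("need at least one number with a leading digit 1-9")
    return sum(abs(counts.get(d, 0) / total - BENFORD[d]) for d in range(1, 10)) / 9


def looks_suspicious(counts):
    """Hypothetical screening rule: flag counts that stray too far from Benford's Law."""
    return benford_mad(counts) > MAD_CUTOFF


# Hypothetical example: 180 invented figures whose leading digits are spread evenly.
invented = {d: 20 for d in range(1, 10)}
print(f"MAD = {benford_mad(invented):.4f}, flagged: {looks_suspicious(invented)}")
```

MAD is used here rather than a chi-square test because chi-square grows more sensitive as the number of records grows, so in large data sets it tends to flag even small, harmless deviations.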

You can test Benford’s Law yourself with almost any non-random list of numbers that isn’t too narrowly constrained. Good examples would be the lengths of all the major rivers in the U.S., or the sizes of all the files on your computer (but make sure you use exact file sizes in bytes, not Windows’ rounded-off values). An example of a set that is too constrained would be the ages of all of your friends: they probably fall within a narrow band (say, 20 to 49), so a handful of leading digits crowd out the rest. The threshold where a set becomes too constrained is a bit fuzzy, but as a rule of thumb the data should span several orders of magnitude; as long as it does, Benford’s Law predicts the frequency of leading digits with very reasonable accuracy. The larger the data set, the more closely it should match.
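
For the file-size version of the experiment, a minimal Python sketch might look like the following. It walks a folder with `os.walk`, reads each file’s exact size in bytes with `os.path.getsize` (so no rounded display values sneak in), skips empty files, and prints the observed share of each leading digit beside Benford’s prediction. The function names are hypothetical, and files that can’t be read are simply skipped.

```python
import math
import os
from collections import Counter


def file_size_digit_counts(root):
    """Tally the leading digits of the exact byte sizes of all files under root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # Exact size in bytes, not a rounded "12 KB"-style display value.
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or cannot be read
            if size > 0:  # an empty file has no leading digit
                counts[int(str(size)[0])] += 1
    return counts


def report(counts):
    """Print each digit's observed share next to Benford's prediction log10(1 + 1/d)."""
    total = sum(counts.values())
    print("Digit  Count  Observed  Benford")
    for d in range(1, 10):
        observed = counts[d] / total if total else 0.0
        print(f"{d:>5}  {counts[d]:>5}  {observed:>8.1%}  {math.log10(1 + 1 / d):>7.1%}")
```

For example, `report(file_size_digit_counts("Documents"))` would tally every file under a folder named Documents; the folder name is only a placeholder.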

As a test, I checked the sizes of all 3,124 items in a particular directory on my computer. Here is the distribution I found, alongside the percentage Benford’s Law predicts for each digit:

Leading digit   Occurrences   Observed   Benford’s Law
1               854           27.3%      30.1%
2               619           19.8%      17.6%
3               417           13.3%      12.5%
4               324           10.4%      9.7%
5               261           8.4%       7.9%
6               195           6.2%       6.7%
7               158           5.1%       5.8%
8               154           4.9%       5.1%
9               142           4.5%       4.6%
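
The Benford’s Law column comes from the formula P(d) = log10(1 + 1/d), the probability that a number’s leading digit is d. The short Python sketch below recomputes both percentage columns from the counts in the table. Because the nine probabilities telescope, log10(2/1) + log10(3/2) + … + log10(10/9) = log10(10) = 1, they add up to exactly 100%.

```python
import math

# Leading-digit counts for the 3,124 file sizes in the article's table.
observed_counts = {1: 854, 2: 619, 3: 417, 4: 324, 5: 261, 6: 195, 7: 158, 8: 154, 9: 142}
total = sum(observed_counts.values())

print("Digit  Observed  Benford")
for d, count in observed_counts.items():
    observed = count / total
    predicted = math.log10(1 + 1 / d)  # Benford's Law: P(d) = log10(1 + 1/d)
    print(f"{d:>5}  {observed:>8.1%}  {predicted:>7.1%}")

# The predictions telescope: log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10) = 1.
print(f"Sum of predictions: {sum(math.log10(1 + 1 / d) for d in range(1, 10)):.6f}")
```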

Clearly these findings follow Benford’s Law closely, as will most large sets of real-life numbers that aren’t too constrained. The phenomenon was first noticed in 1881 by the mathematician and astronomer Simon Newcomb. While thumbing through books of logarithm tables to perform calculations, he noticed that the pages for numbers beginning with the digit 1 were much more worn than the others. It seemed that people had been doing more calculations with numbers that started with a 1. He examined the distribution of leading digits and found it weighted towards the smaller ones. He wrote about the pattern as a curiosity, but it was soon forgotten.

The phenomenon was rediscovered in 1938 by Frank Benford, a physicist at the General Electric Company. Fascinated by it, he tested many data sets for the pattern, including baseball statistics, the areas of river catchments, and the addresses of the first 342 people listed in the book American Men of Science, and found that most followed the distribution closely. Because of the huge menagerie of data he tested, he is often credited with the law, as its name indicates.

Benford’s Law is proving quite useful for businesses and government agencies as a way of detecting fraud in taxes, accounting, expenses, and insurance. It was also used to help check for Y2K compliance back when that was a problem. No doubt there are many other uses that no one has thought of just yet.

So how is Benford’s Law helpful to you? For an individual, its uses are few. If you ever see a question about the length of a river on a multiple-choice test and you haven’t got a clue, you might lean toward the answer whose first digit is 1. But I don’t recommend using it to cheat anyone, especially the IRS.