PH 2022 election results analysis using Newcomb-Benford’s Law (Sabong News)
Author
Wilson Chua
Date
MAY 12 2022
How do accountants (Like Me) quickly detect the presence of fudged data in financial reports? We use a little known technic called the Newcomb-Benford’s law to efficiently and quickly analyze a set of numbers. FYI:
How does this law work? We take the leftmost digits of all the numbers in any report and tabulate the frequency each LEFTMOST digit (from 1 to 9) comes up. We then look at the distribution/histogram. A normal data distribution would look like this:
This means that a normal set of numbers would see around 30% of all the numbers to start with ‘1’. The ‘2’ will have an expected distribution of 17.6% and so on.
First I show what Benford Distribution would look like in a known false/fudged/faked data. For this I use the TEST dataset from Comelec’s transparency server.
I followed the steps outlined here:
for the steps to visualizing a Benford’s Law algorithm in Tableau.
I connect to the TEST data in
and
. I limit the table to only the test results for PRESIDENTIAL candidates.
I create a calculated field named “TotalVotesByBrgy” (and for President) using LOD (level of detail) set to the Barangay levels.
We now create two more calculated fields:
This extracts the LEFTMOST digit of each of the barangay level tallies for each candidate. And then another calculated field to hold the log of the bedfords Law:
From the
pane, drag
to
, and then drag
to
.
Click
on Rows and choose
>
. I drag this
to the Text Label marks.
Your view now shows the distribution of first digits, and the size of the bars (decreasing from left to right) suggests that the data in this case conforms to Benford’s law.
In the Percentages area, type 80,100,120.
This specifies that you want bands spanning from 80 to 100 percent, and from 100 to 120 percent. Next you will specify what value the percentages are referencing.
We see that most of the leftmost digits did not conform to the expected Benford’s Law distribution. The distribution should be at or near the yellow bands. These yellow bands represent 80–100% of expected distribution. This is what a faked data/fudged/manipulated data would look like.
Next we analyze the
election results from the Comelec Transparency server data (May 11, 2:17pm). This was supplied to Manila Bulletin. And using the same process outlined above, this is the resulting distribution:
Benford Plot for Presidential Votes:
The overall charts conforms to the expected patterns predicted by Benford’s Law algorithm. So I see no red flags that might indicate manipulation. However, as I am hoping my peers can chime in with their ideas and help peer review this as well.