Monday, March 30, 2026

Stop Publishing Garbage Data, It’s Embarrassing



Twice this week, I’ve come across embarrassingly bad data.

The first instance is the UK government’s fuel finder data. It is a downloadable CSV file of fuel station locations and prices from around the UK. A potentially very useful database, especially during the current war in the Middle East. A customer suggested it as a possible practice dataset for my data wrangling and visualization software, Easy Data Transform. So I had a quick look and saw some glaring errors within a few minutes.

A quick plot of the latitude and longitude shows some clear outliers:

On further investigation, some of these UK fuel stations are apparently located in the Indian and South Atlantic oceans. In at least one case, it looks like they got the latitude and longitude the wrong way round.
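A check like the following would probably catch both kinds of error. It is only a minimal sketch: the bounding box is a rough approximation of the UK, the function names are made up for illustration, and the example rows are invented rather than taken from the real file.

```python
# Rough bounding box for the UK (an approximation chosen for this sketch).
UK_LAT = (49.0, 61.0)
UK_LON = (-9.0, 2.5)


def in_uk(lat, lon):
    """True if the point falls inside the rough UK bounding box."""
    return UK_LAT[0] <= lat <= UK_LAT[1] and UK_LON[0] <= lon <= UK_LON[1]


def classify(lat, lon):
    """Label a coordinate pair as 'ok', 'swapped' or 'outlier'."""
    if in_uk(lat, lon):
        return "ok"
    if in_uk(lon, lat):
        # Only plausible if latitude and longitude are exchanged.
        return "swapped"
    return "outlier"


# Invented example rows (name, latitude, longitude), not real stations.
stations = [
    ("A", 51.5, -0.12),
    ("B", -1.9, 52.5),
    ("C", -40.0, -10.0),
]

for name, lat, lon in stations:
    print(name, classify(lat, lon))
```

Rows labelled "swapped" could be queried with the submitter or corrected; rows labelled "outlier" should not be published as they are.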

A quick look at the fuel price columns also shows some major issues:

The ratio between the most expensive and cheapest fuel (per litre) is 1538:1. Clearly wrong.

Shown as a histogram with a logarithmic Y axis:

I’m guessing that the reason for this bad data is that the fuel stations are submitting their own data and, humans being humans, they make mistakes. But then the government is publishing the data without even the most basic checks. That just isn’t good enough.
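As an illustration of how little a basic check takes, here is a sketch that flags prices far from the median. The function name, the factor of 3 and the sample prices are all assumptions made for this example. Comparing against the median means the check doesn't need to know which units the file uses, although it does assume all prices are positive.

```python
from statistics import median


def flag_prices(prices, factor=3.0):
    """Return prices more than `factor` times above or below the median."""
    mid = median(prices)
    return [p for p in prices if p > mid * factor or p < mid / factor]


# Invented prices in whatever units the file uses. Two look like unit
# or decimal point mistakes.
prices = [142.9, 145.9, 139.7, 1.459, 151.9, 1499.0]

print(flag_prices(prices))
```

With these invented values, the two mistyped prices are flagged and the four plausible ones are not.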

I reported the problem on 22-Mar-2026. They acknowledged my email on 24-Mar-2026 (“Thanks for sharing this, we’ve passed this on to the technical team to look at.”). The CSV file published on 29-Mar-2026 still has the garbage data.

The second instance is a report on electric vehicles from UK motoring organisation, the RAC. The first graph in the article is this:

Did the number of Battery Electric Vehicles on the UK’s roads suddenly drop from ~1.4 million in 2024 to ~0.0017 million in 2025? What happened to those ~1.4 million vehicles? I’m guessing that someone got their thousands and millions mixed up. But then they published the report with this glaring error. Did anyone mathematically literate even check this graph?
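A year-on-year sanity check would flag this sort of units mix-up before publication. The sketch below uses only the two approximate figures quoted above. The function name and the 10x threshold are assumptions for illustration.

```python
def suspicious_jumps(series, max_ratio=10.0):
    """Return (year, previous, current) for implausible year-on-year changes."""
    flagged = []
    years = sorted(series)
    for prev_year, year in zip(years, years[1:]):
        prev, cur = series[prev_year], series[year]
        if prev <= 0 or cur <= 0:
            # Zero or negative counts are suspicious in themselves.
            flagged.append((year, prev, cur))
            continue
        ratio = max(prev, cur) / min(prev, cur)
        if ratio > max_ratio:
            flagged.append((year, prev, cur))
    return flagged


# Approximate values read off the graph, in millions of vehicles.
bevs_millions = {2024: 1.4, 2025: 0.0017}

print(suspicious_jumps(bevs_millions))
```

A change of more than 800x in a single year is flagged immediately. A genuine trend would rarely come anywhere near the threshold.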

Lousy data undermines trust in institutions and can lead to bad decisions. I fear we are heading for a future where LLMs generate data, which people don’t bother to properly check. This data is then used to train LLMs. The error is then much harder to spot once it’s served back without the original source by LLMs. A slop-apocalypse.

Authors should have their work proofread, programmers should test their code and data people should do basic data validation. Let’s take some pride in our work.


