We finally have a blog. It took longer than it should have, but here we are.
OpenData started as a side project born out of frustration: public data is everywhere, but actually using it is a mess. You find a CSV on some government site, download it, fight with encoding issues, realize half the columns are undocumented, and repeat. We wanted a place where open datasets are discoverable, consistently formatted, and queryable without the usual headaches.
What we’ll write about
This blog is where we’ll share what we’re learning as we build this thing out. Expect posts on:
- Infrastructure deep dives covering how we handle ingestion, storage, and querying at scale
- Data source spotlights walking through interesting public datasets and what you can do with them
- Project updates on new features, providers, and API changes
- Lessons learned from the trenches of building data infrastructure
Quick taste of the API
If you haven’t poked around yet, here’s what querying a dataset looks like:
curl "https://opnhub.ai/v1/datasets/bls/unemployment-rate/query?limit=5"
{
  "data": [
    { "year": 2025, "month": 12, "rate": 4.1 },
    { "year": 2025, "month": 11, "rate": 4.2 }
  ],
  "total_rows": 1842,
  "columns": ["year", "month", "rate"]
}
Every dataset gets a stable API endpoint backed by Parquet files and DuckDB. No API keys needed for public data.
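If you'd rather work with a response like the one above in code, here's a minimal Python sketch that parses that payload shape. It uses the sample response inline rather than making a live request (field names match the example; for real use you'd fetch the URL with urllib or requests instead):

```python
import json

# Sample payload in the shape returned by the query endpoint above.
# The values are illustrative, taken from the example response.
raw = """
{
  "data": [
    { "year": 2025, "month": 12, "rate": 4.1 },
    { "year": 2025, "month": 11, "rate": 4.2 }
  ],
  "total_rows": 1842,
  "columns": ["year", "month", "rate"]
}
"""

resp = json.loads(raw)

# Flatten each row dict into a (year, month, rate) tuple, newest first.
rows = [(r["year"], r["month"], r["rate"]) for r in resp["data"]]

latest_year, latest_month, latest_rate = rows[0]
print(f"{latest_year}-{latest_month:02d}: {latest_rate}% "
      f"({resp['total_rows']} rows total)")
```

Since responses carry a `columns` list alongside `data`, the same parsing works for any dataset without hardcoding field names.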
Following along
The whole platform is open source under Apache 2.0. If you’re into open data, data engineering, or just want to see how the sausage gets made, stick around. We’ll try to keep these posts practical and worth your time.