busibee.be: a free search engine for Belgian company data

I launched a new side project: busibee.be. It pulls public company data from the three official Belgian government sources (the Crossroads Bank for Enterprises, the National Bank, and the Belgian Official Gazette) and makes it all searchable in one place. Type a company name, a VAT number, an enterprise number or an address and you get a page with financials, official publications, people involved, NACEBEL codes, PEPPOL info and more. Available in English, Dutch, French & German!

Ever since I worked at a fintech company, my interest in making this type of data accessible has grown, but working with it is a pain. It's technically public, yet scattered across multiple government systems, each with its own quirky interface. The existing commercial alternatives are decent but usually expensive, often with the interesting data locked behind paywalls. I figured: if this data is public, access to it should be public too. No account, no trial, no credit card.

What you can do with it

Search by whatever you have and you land on the company page. The usual identification stuff is there, but the parts I find most interesting are the financials.

For financials, busibee shows a multi-year table covering assets, equity, revenue, gross margin, net profit, cash and debt, along with the main ratios, and a hand-rolled overall health score from 0 to 100. All of that is pulled from the balance sheets filed at the National Bank.

For publications, every legal act a company has ever filed in the Belgian Official Gazette is indexed: incorporations, capital changes, board appointments, dissolutions, and so on. I run them through an LLM to extract the subjects so you can see at a glance what each publication is about (the defaults are not great 😅). You can download the original PDFs too (both for publications and balance sheets).
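On searching by VAT or enterprise number: people type these in all kinds of shapes (BE 0123.456.749, 0123456749, the older 9-digit form), so some normalisation is needed before a lookup. Below is a minimal sketch of how that could look, not busibee's actual code. It relies on the public rule that the last two digits of a Belgian enterprise number equal 97 minus the first eight digits modulo 97. The function name and the example number are made up for illustration.

```python
import re
from typing import Optional


def normalize_enterprise_number(raw: str) -> Optional[str]:
    """Return the 10-digit enterprise number, or None if the input is invalid.

    Hypothetical helper for illustration; the example numbers below are
    synthetic and not meant to refer to a real company.
    """
    # Strip everything that is not a digit: the "BE" prefix, dots, spaces.
    digits = re.sub(r"\D", "", raw)

    # The older 9-digit notation is the same number without its leading zero.
    if len(digits) == 9:
        digits = "0" + digits

    # Assumption: enterprise numbers are 10 digits and start with 0 or 1.
    if len(digits) != 10 or digits[0] not in "01":
        return None

    # Check digits: 97 minus (first eight digits modulo 97).
    base, check = int(digits[:8]), int(digits[8:])
    return digits if 97 - base % 97 == check else None
```

With this, "BE 0123.456.749" and "123456749" both normalise to "0123456749", while "0123.456.748" is rejected because the check digits don't match.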

Why I like building these kinds of projects

I really enjoy data-heavy projects like this. They generate an insane amount of leaf pages: every company gets a page, every city, every street, every activity code. That's hundreds of thousands of pages, each one a valid, useful result for someone's search query. Google likes it, I like watching the sitemap grow, and every now and then a page I didn't expect shows up in my analytics because somebody searched something oddly specific.

The funny part: looking at the traffic, most of it isn't human. The majority of requests come from bots and AI crawlers. Google, Bing, ClaudeBot, GPTBot, Meta's crawler: they're all hammering the site continuously. I don't really mind. I built this to make Belgian company data more accessible, and if LLMs end up using it as a source, that just means more people getting correct answers about Belgian businesses when they ask an AI (which indirectly drives traffic).
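For the curious, a rough way to get at a breakdown like that is to bucket requests by user-agent string. This is a small sketch under assumptions, not how busibee measures it: the token list is illustrative and far from complete, the function names are made up, and user agents can be spoofed, so the numbers are only ever an estimate.

```python
from collections import Counter

# Lowercase substrings looked for in the User-Agent header.
# Illustrative only: real crawler lists are much longer.
CRAWLER_TOKENS = {
    "googlebot": "Google",
    "bingbot": "Bing",
    "claudebot": "ClaudeBot",
    "gptbot": "GPTBot",
    "meta-externalagent": "Meta",
}


def classify(user_agent: str) -> str:
    """Map a User-Agent string to a crawler name, or "other" if unknown."""
    ua = user_agent.lower()
    for token, name in CRAWLER_TOKENS.items():
        if token in ua:
            return name
    # "other" covers humans as well as crawlers missing from the list.
    return "other"


def traffic_share(user_agents):
    """Return the fraction of requests per bucket as a dict."""
    counts = Counter(classify(ua) for ua in user_agents)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}
```

Feed it the user-agent column of an access log and you get a dict of shares per crawler, with everything unrecognised lumped under "other".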

Running on my own mini server at home

The whole thing is self-hosted on an ASUS NUC 14 Pro "mini server" that's stored away in a closet, running over my regular Telenet internet subscription. No cloud bills, no scaling worries, just a small box doing its thing. Cloudflare sits in front for caching, which takes some pressure off the upload bandwidth, and so far it's holding up just fine. No downtime, and Telenet is not complaining about my traffic (yet).

There's also a little live status page with all the nerdy stats you'd expect: requests, visitors, cache hit rate, bandwidth, energy used, external service call rates, database record counts and so on. I use it more as an internal dashboard, but it's public anyway.

I'll probably write a separate post about the stack at some point. There's more going on behind the scenes than I expected when I started (data pipelines, queues, a search engine, LLM extraction, XBRL parsing, OCR, ...), but that's for another day. For now, go have a look: search for your employer, your accountant, your neighbour's bakery, and let me know what you think!

👉 https://busibee.be
