Cardano network decentralisation by infrastructure provider
Data gathered and analysed from 1405 Cardano Relays

Hypothesis

This article aims to give a reasonable level of transparency into the Cardano network’s vulnerability to any one hosting provider. That is to say: what would happen if a hosting provider, for any reason, were to block the Cardano network from operating across its platform?

What does the Cardano network look like?

If you are not a stake pool operator, you may not know the exact makeup of the Cardano network. The Cardano network is made up of Relays and Producers. Relays are the public-facing part of the network; they relay communication between the block-producing nodes without allowing direct communication between them. The Producers are the nodes which create, or mint, new blocks. These are typically (according to best practices) connected directly only to that pool’s own relays.

The best image to visualise the network comes from Cardano’s own Stake Pool documentation. In it, the Core nodes represent the Producers and the Relay nodes represent the relays within the network.

Cardano Network Design
Credit: cardano.org Stake Pool operations docs

How the data was obtained

This data was extracted as a “snapshot” of the network on 28/12/2020.

The data source was the Cardano GraphQL API, which is available as open source on GitHub. The data can be extracted from a running instance (here on localhost) with a simple curl command.

curl -X POST -H "Content-Type: application/json" -d '{"query": "{ stakePools{ id, url, relays {ipv4, dnsName, ipv6, dnsSrvName, port} retirements { inEffectFrom, announcedIn { includedAt }, retiredInEpoch {number, lastBlockTime}} }}"}' http://localhost:3100/graphql

From here, using a Python script, we resolved all DNS records and then determined IP ownership through whois records where available.
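The steps above can be sketched roughly as follows. This is a minimal illustration and not the script we actually ran: the function names, the choice of whois.arin.net as the WHOIS server and the list of owner fields are all assumptions, and records held by other regional registries may need a different server or different field names. Relays registered only through dnsSrvName are skipped here, since SRV lookups need a DNS library outside the standard library.

```python
import socket

# Assumed field names, in priority order; real WHOIS output varies by registry.
OWNER_FIELDS = ("orgname", "org-name", "organization", "netname", "descr")

def relay_endpoints(pools):
    """Yield (pool_id, ip_or_hostname) for every relay in the GraphQL result."""
    for pool in pools:
        for relay in pool.get("relays") or []:
            # A relay may be registered by IPv4, IPv6 or DNS name.
            target = relay.get("ipv4") or relay.get("ipv6") or relay.get("dnsName")
            if target:
                yield pool["id"], target

def resolve(host, lookup=socket.getaddrinfo):
    """Return the sorted IP addresses for host, or [] if it does not resolve."""
    try:
        infos = lookup(host, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})

def whois_raw(ip, server="whois.arin.net", port=43, timeout=10):
    """Send a plain WHOIS query over TCP port 43 and return the text reply."""
    with socket.create_connection((server, port), timeout) as sock:
        sock.sendall(ip.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def owner_from_whois(text):
    """Return the highest-priority owner-like field in a WHOIS reply, or None."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if sep and value and key not in fields:
            fields[key] = value
    for name in OWNER_FIELDS:
        if name in fields:
            return fields[name]
    return None
```

A relay whose name does not resolve, or whose WHOIS reply carries none of the expected fields, simply yields no owner, which matches the "where available" caveat above.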

Analysis of the results

Most Utilised Providers

  1. DigitalOcean 20.2%
  2. Amazon 15.4%
  3. Google 13.0%
  4. Hetzner 8.8%
  5. Contabo 4.1%
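A ranking like the one above can be produced by counting one owner label per analysed relay. The sketch below assumes the owners have already been normalised to a single name per provider; the toy input is hypothetical and is not our dataset.

```python
from collections import Counter

def provider_shares(owners):
    """Return [(provider, percent)] sorted by share, largest first."""
    counts = Counter(owners)
    total = sum(counts.values())
    return [(name, round(100 * n / total, 1)) for name, n in counts.most_common()]

# Hypothetical toy input: one owner label per analysed relay.
sample = ["DigitalOcean"] * 4 + ["Amazon"] * 3 + ["Hetzner"] * 2 + ["Home DSL"]
for name, pct in provider_shares(sample):
    print(f"{name}: {pct}%")
# DigitalOcean: 40.0%
# Amazon: 30.0%
# Hetzner: 20.0%
# Home DSL: 10.0%
```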

Initial Thoughts

So what else does this data tell us? Surprisingly, Microsoft makes up a tiny proportion of this dataset at 1.7%, which is an eyebrow-raiser.

There are a lot of home DSL connections here, which are likely vulnerable to DoS attacks and likely have no mitigation plans in place. This may become more of a concern as bandwidth utilisation grows, but hopefully these pools will become sustainable enough to run from a data centre.

The top 4 hosts are predictable and together make up ~57.4% of the network’s infrastructure within our sample set.
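The ~57.4% figure is just the sum of the four largest shares in the list above. A quick check, using only the published percentages (the helper name is ours, for illustration):

```python
# Shares as published above (percent of the 1405 analysed relays).
shares = {
    "DigitalOcean": 20.2,
    "Amazon": 15.4,
    "Google": 13.0,
    "Hetzner": 8.8,
    "Contabo": 4.1,
}

def top_n_share(shares, n):
    """Combined share of the n largest providers, rounded to one decimal."""
    ranked = sorted(shares.values(), reverse=True)
    return round(sum(ranked[:n]), 1)

top4 = top_n_share(shares, 4)
print(top4)                    # 57.4
print(round(100 - top4, 1))    # 42.6, everything outside the top 4
print(top_n_share(shares, 5))  # 61.5
```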

Limitations of data gathered

This data has limitations that come in several forms. At the time of writing there are a total of 2597 Cardano relays. For several reasons, mostly limitations in the whois records, we were only able to analyse 1405 relays from the dataset (roughly 54%). We hope this sample is large enough to reasonably represent the network.

Because best practice is to register only Relay nodes on the blockchain, these data points contain only relays. This means that we have no clear view of where the producers themselves are hosted. Although it is likely that they follow the same pattern, or share the same provider as their relays, we cannot be certain.
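To put the sample size in context, the coverage can be worked out directly from the two counts quoted above:

```python
total_relays = 2597   # relays registered at the time of writing
analysed = 1405       # relays we could attribute to an owner

coverage = 100 * analysed / total_relays
print(f"{coverage:.1f}%")       # 54.1%
print(total_relays - analysed)  # 1192 relays with no usable owner data
```

Every percentage in this article is a share of the 1405 analysed relays, not of all 2597.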

Unexpected discoveries

During the gathering of our data we noticed that some pools that had DNS records for their relays were in fact offline. We first reached out privately to every pool we could, to avoid hurting their reputations, but plenty of pools did not have a contact form on their websites, and plenty of those websites no longer resolved either. It became a challenge, and we resorted to reaching out via public Telegram channels in an attempt to reach these SPOs where we could. Over 90 relays had DNS records that no longer resolved at the time of our research.

When we checked the Adapools site to confirm that these pools were offline, we noticed that they were appearing as online despite their records not resolving. We reached out to the operators of Adapools, who very swiftly identified a bug and patched it shortly thereafter. The bug only affected pools using DNS records that had since expired.
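A check of this kind needs to separate a relay whose DNS name no longer resolves from one that resolves but does not answer. The sketch below is one hedged way to do that with a plain TCP connection attempt; it is not how Adapools or our own script works, the function name is hypothetical, and an accepted TCP connection does not prove that a Cardano node is listening.

```python
import socket

def relay_status(host, port, timeout=5, connect=socket.create_connection):
    """Classify a relay as 'unresolved', 'unreachable' or 'online'."""
    try:
        conn = connect((host, port), timeout)
    except socket.gaierror:
        # The DNS name no longer resolves (must be caught before OSError,
        # because gaierror is a subclass of OSError).
        return "unresolved"
    except OSError:
        # The name resolves, but the connection was refused or timed out.
        return "unreachable"
    conn.close()
    return "online"
```

Calling relay_status with the hostname and port a pool registered on chain returns one of the three labels, so expired DNS records can be reported separately from relays that are merely down.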

Summary

It would seem that even though, from our data, ~57.4% of the network is hosted by DigitalOcean, Amazon, Google and Hetzner, there are plenty of other relays running on everything from business leased lines to alternative providers. Is this number high or low? From our initial estimates we found the network to be far more diversified than we would have thought, but we’d love to hear your thoughts on this.

Future SPOs may wish to find hosting outside of the top 5 providers to help diversify the network. Perhaps more colocated, self-hosted servers within smaller data centres would assist in this endeavour?

We’d love to hear your thoughts on this data set. Please comment below or tweet at us at @NASECstakepool. Perhaps it is meaningless, or are we too centralised across the major providers in your eyes?

If you found this article entertaining, interesting or perhaps even useful, please do let us know. We operate the [NASEC] stake pool, and any delegation means the world to a small SPO like ourselves. We operate professionally out of a dedicated server provided by OVH (1.4%) and believe that, based on this data, we are helping to further distribute the network.