Hypothesis
This article aims to give a reasonable level of transparency into the Cardano network’s exposure to any single hosting provider. That is to say: what would happen if a hosting provider, for whatever reason, blocked the Cardano network from operating across its platform?
What does the Cardano network look like?
In case you are not a stake pool operator, you may not know the exact makeup of the Cardano network. The Cardano network is made up of Relays and Producers. Relays are the public-facing part of the network: they relay communication between the block-producing nodes without allowing direct communication with them. Producers are the nodes which create or mint new blocks; according to best practices, these are typically only connected directly to that pool’s own relays.
The best image to visualise the network comes from Cardano’s own Stake Pool documentation. The Core nodes represent the Producers and the Relays represent the relays within the network.
How the data was obtained
This data was extracted as a “snapshot” of the network on 28/12/2020.
The data source was the Cardano GraphQL API, which is available as open source on GitHub. The data can be extracted with a simple curl command.
curl -X POST -H "Content-Type: application/json" -d '{"query": "{ stakePools{ id, url, relays {ipv4, dnsName, ipv6, dnsSrvName, port} retirements { inEffectFrom, announcedIn { includedAt }, retiredInEpoch {number, lastBlockTime}} }}"}' http://localhost:3100/graphql
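The same request can also be made from Python. The sketch below is only an illustration, not the script used for this article: it assumes the same local endpoint as the curl command, trims the query down to the relay fields, and the helper names `build_request`, `extract_relays` and `fetch_relays` are hypothetical.

```python
import json
import urllib.request

# Same local endpoint as the curl command above (an assumption about your setup).
GRAPHQL_URL = "http://localhost:3100/graphql"

# Relay fields only; the retirements part of the original query is left out here.
QUERY = "{ stakePools { id, url, relays { ipv4, dnsName, ipv6, dnsSrvName, port } } }"


def build_request(url=GRAPHQL_URL, query=QUERY):
    """Build the POST request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_relays(response):
    """Flatten a decoded GraphQL response into (pool_id, host, port) tuples."""
    relays = []
    for pool in response["data"]["stakePools"]:
        for relay in pool.get("relays") or []:
            # Prefer a DNS name, then fall back to a literal address.
            host = relay.get("dnsName") or relay.get("ipv4") or relay.get("ipv6")
            if host:
                relays.append((pool["id"], host, relay.get("port")))
    return relays


def fetch_relays():
    """Send the query and return the flattened relay list (needs a running API)."""
    with urllib.request.urlopen(build_request()) as resp:
        return extract_relays(json.load(resp))
```

Keeping the parsing separate from the network call means the flattening step can be checked against a saved response without a running node.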
From here, using a Python script, we resolved all DNS records and then determined the IP ownership through whois records where available.
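The original script has not been published, so the following is only a sketch of the two steps described above: resolve each relay hostname, then map the address to its owner. DNS resolution uses the standard library’s `socket.getaddrinfo`. The ownership lookup is passed in as a function (`lookup_owner`, a hypothetical name) because the whois client used for this article is not specified.

```python
import socket
from collections import Counter


def resolve_host(host):
    """Return the first IP address for host, or None if it does not resolve."""
    try:
        infos = socket.getaddrinfo(host, None)
    except (socket.gaierror, UnicodeError):
        return None
    return infos[0][4][0] if infos else None


def tally_providers(hosts, lookup_owner, resolve=resolve_host):
    """Count relays per owner and collect hostnames that fail to resolve.

    lookup_owner is any callable mapping an IP string to an owner name
    (or None when no usable whois record exists). It is an assumption of
    this sketch, not a real library call.
    """
    counts = Counter()
    unresolved = []
    for host in hosts:
        ip = resolve(host)
        if ip is None:
            unresolved.append(host)
            continue
        owner = lookup_owner(ip)
        if owner:  # skip relays whose whois record gives no usable owner
            counts[owner] += 1
    return counts, unresolved
```

Relays without a usable owner are simply skipped in this sketch, which mirrors the “where available” caveat above.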
Analysis of the results
Most Utilised Providers
- DigitalOcean 20.2%
- Amazon 15.4%
- Google 13.0%
- Hetzner 8.8%
- Contabo 4.1%
Initial Thoughts
So what else does this data tell us? Surprisingly, Microsoft makes up a tiny proportion of this dataset at 1.7%, which is an eyebrow-raiser.
There are a lot of home DSL connections here, which are likely vulnerable to DoS attacks and likely have no mitigation plans. This may become a concern as bandwidth utilisation grows, but hopefully by then these pools will be sustainable within a data centre.
The top 4 hosts are predictable and make up ~57.4% of the total network’s infrastructure within our sample set.
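As a quick check, the ~57.4% figure is the sum of the four largest shares in the list above. The snippet uses only the percentages published in this article; `top_n_share` is a hypothetical helper name.

```python
# Provider shares (percent of analysed relays) as listed in this article.
shares = {
    "DigitalOcean": 20.2,
    "Amazon": 15.4,
    "Google": 13.0,
    "Hetzner": 8.8,
    "Contabo": 4.1,
}


def top_n_share(shares, n):
    """Combined share of the n largest providers, rounded to one decimal."""
    ranked = sorted(shares.values(), reverse=True)
    return round(sum(ranked[:n]), 1)


print(top_n_share(shares, 4))  # 57.4
print(top_n_share(shares, 5))  # 61.5
```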
Limitations of data gathered
This data has limitations that come in many forms. At the time of writing there are a total of 2597 Cardano relays. For several reasons, mostly limitations in the whois records, we were only able to analyse 1405 relays from the dataset. We hope this is a large enough sample to be reasonably representative of the network.
Because best practice is to register only Relay nodes on the blockchain, these data points contain only relays. This means that we have no clear view of where the producers themselves are hosted. Although it is likely they follow the same pattern or share the same provider as their relays, we cannot be certain.
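To put the sample size in context, the coverage can be worked out directly from the two counts quoted above:

```python
total_relays = 2597  # relays registered at the time of writing
analysed = 1405      # relays whose owner could be determined

coverage = analysed / total_relays
print(f"{coverage:.1%}")        # 54.1%
print(total_relays - analysed)  # 1192 relays could not be attributed
```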
Unexpected discoveries
During the gathering of our data we noticed that some pools that had DNS records for their relays were in fact offline. First we reached out privately to every pool we could about this matter, to avoid hurting their reputations, but plenty of pools did not have a contact form on their websites and plenty of the websites no longer resolved either. It became a challenge, and we resorted to reaching out via public Telegram channels in an attempt to reach these SPOs where we could. Over 90 relays had DNS records that no longer resolved at the time of our research.
When checking the Adapools site for confirmation that these pools were offline, we noticed that they were appearing as online despite the records not resolving. After we reached out to the operators of Adapools, they very swiftly identified a bug and had it patched shortly thereafter. This only affected pools utilising DNS records that had since expired.
Summary
It would seem that even though, from our data, ~57.4% of the network is hosted by Google, DigitalOcean, Amazon and Hetzner, there are plenty of other relays running on networks ranging from business leased lines to alternative providers. Is this number high or low? From our initial estimates we found it to be far more diversified than we would have thought, but we’d love to hear your thoughts on this.
For the sake of network diversity, future SPOs may wish to find hosting outside of the top 5 providers. Perhaps more colocated, self-hosted servers within smaller data centres would assist in this endeavour?
We’d love to hear your thoughts on this data set. Please comment below or tweet at us at @NASECstakepool. Perhaps it is meaningless, or are we too centralised across the major providers in your eyes?
If you found this article entertaining, interesting or even perhaps useful please do let us know. We operate the [NASEC] stake pool and any delegation means the world to a small SPO like ourselves. We operate professionally out of a dedicated server provided by OVH (1.4%) and believe that based on this data we are helping to further distribute the network.
Great insight and good work.
Are the pools you are referencing relative to the amount of stake each pool has?
Or is it simply the number of pools (even with 0 stake) and their locations?
This may impact the true decentralization figure, along with all the other considerations you had mentioned.
Otherwise thank you for the hard work!
Thanks Crador, we have included all registered relays. We didn’t know where to draw the line if we began striking off smaller pools so we felt it would be good to just provide all the data we could.
Great work and insight. Thank you!
Hi, thank you for this post! Could I get your script? I want to update this graphic and choose a datacenter to move to.
Hey, sorry for the long delay in responding! I have since misplaced my script which performs the whois function, but it’s fairly simple to reproduce with Python and the GraphQL data source. I’ll hopefully release a video on this soon with a link to a GitHub repository.
Good info, thanks!
Alex, this is good analysis for consideration. I guess there is some risk, as you said in your video, regarding say AWS banning the protocol, but I think the likelihood of that happening is low unless somehow a government identifies Cardano specifically as a harm, in which case all pools regardless of host or internet provider will likely be blocked. I think mostly, from a host provider’s side, they have factored in the costs for all hosting or will modify the cost of hosting, and that may cause SPOs to move if the economics do not work. From a network decentralization point of view, for those large host providers, it may be interesting to break out how SPOs have allocated their pools across these providers. For example, AWS has hosting in multiple regions of the world. How distributed are the SPOs across those AWS hosting locations? If one of the AWS regions is impacted, which does happen, where are the biggest risk areas regionally?