1) Title

Highly available load-balanced endpoints reserved for ICON P-Reps

2) Project Category

Infrastructure: for supporting the underlying code base of the blockchain - Infrastructure supporting tools, bug patches, node maintenance tools, etc.

3) Project Description

As was outlined in our forum post (link), ICON's current centralized endpoint infrastructure presents a critical single point of failure for the entire network.

Having a centralized API endpoint that numerous parties rely on (e.g. ctz.solidwallet.io) is a mission-critical risk. As seen, if this endpoint goes down, it results in a cascading set of problems that impact the whole network. This grant aims to mitigate these problems.

As we see it now (awaiting further discussions with the Foundation), the core of the problem is with the central solidwallet endpoint, which is what all the nodes on the network by default sync off of. If it goes down, all the sub-preps go down. If enough main preps go down after that, the network will stop functioning as they, by default, sync off of solidwallet. Even if a main prep maintains their own citizen node, it would need coordination between main preps such that they are syncing off each other which in of itself creates additional problems.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/60882816-e508-4430-b977-f0c909baff17/Untitled.png

What we see is needed is a private endpoint that is syncing off several voluntary main preps and serves as a common sync point for the rest of the critical infrastructure. If we focus on making that common sync point as robust and fault tolerant as possible, all the subsequent API infrastructure would be able to sync off it. It is important that this sync point is private such that attackers are unable to access it. We would also be able to potentially make the REST API port 9000 private for main preps as the rest of the network will no longer need access to that port.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/ed474e1d-6788-420e-948f-2fa5f604c196/Untitled.png

After hardening the sync point, further decentralization of the API interface can proceed, particularly in cooperation with Pokt network who appear to have a solid solution to convert the sub-preps into load balanced API nodes and be further compensated for operating a node on the network. These nodes could then take over the load of solidwallet as we prove out its stability on the network. Regardless of if the network fully embraces Pokt as an API provider, having a private network would be critical in making the network more stable.

This type of architecture is not entirely uncommon in other top performing ecosystems. Cosmos, for instance, employs a private sentry layer where nodes are connected to validators over a secure VPN connection. We have implemented such a setup with WireGuard for other ecosystems with this grant laying the groundwork for a follow-on grant to implement this setup. While Cosmos' networking protocols are different than ICON's, the protection layer, in practical purposes, is serving the same purpose.

All development of the infrastructure will leverage prior work done for the Polkadot network, where we built a set of load balanced endpoints on each of the major cloud providers. A critical consideration is taken with auto-scaling behavior as baseline operations should be relatively lean but the endpoints need to be able to scale to meet whatever demand. Crucially, scaling needs to be possible in a reasonable amount of time to make it effective. Currently sync time is roughly 2 hours which can be reduced to under 15 minutes if handled by a custom content delivery network that is updated on a regular basis by a source of truth node. We have already implemented this methodology in Polkadot and were able to bring the scaling time down to 5 minutes which makes scaling simple to manage. We intend on implementing this setup on our own and could operate it in the long term such that all nodes could sync and recover faster.

While the proposal is aimed mitigating the specific threat vectors outlined above, by building it with automated deployment tooling we will be able to offer the community a one-click deployment of citizen nodes for DApps, thereby decreasing the reliance on the public solidwallet endpoint. This type of tooling will come with minimal effort when executing this grant. All components will be built with a combination of Terraform, Ansible, Helm, and fronted with a CLI to walk users through the deployment process.

4) Project Duration

4 Months