Staking Rewards is proud to present an in-depth look at Solana validator performance. The research was conducted by P2P Validator, which draws on its expertise as an industry-leading staking provider to offer insights into the recent Solana downtimes.
In our opinion, the performance of Solana network validators is one of the most important factors determining the network's growth and sustainability. Our team has done a deep dive into this topic, and we want to share the insights gained for the benefit of the Solana community.
The research is devoted to the analysis of the two most important metrics reflecting Solana network validators’ performance: downtime duration (node delinquency/unavailability duration) and block skip rate (measuring how frequently a node fails to produce a transaction block which is subsequently confirmed by consensus on the network).
In this article we reveal the first part of the research findings regarding analysis of downtime. The second part covers block skip rate analysis results.
All data used for analysis in the research were obtained from publicly available sources such as Solana JSON RPC API, Solanabeach API, Validators.app API and are relevant for Mainnet beta epochs №194-236 unless another epoch or time period is explicitly specified.
Solana is a relatively new (went live in March 2020) public high-performance distributed blockchain platform curated by Solana Foundation (non-profit organization headquartered in Geneva, Switzerland) along with professional blockchain developers, organizations and individuals running validator and RPC nodes and DevOps specialists from all over the world who are dedicated to the decentralization, growth, and security of the Solana network.
Solana aims to be fast and scalable without compromising its security or decentralization. Its theoretical throughput limit of 50k transactions per second (TPS) is about twice VISA's limit, which means it can be used for many real-time applications in various business areas. Solana mainnet has already handled more than 35 billion transactions, with current throughput exceeding 2000 TPS (see Figure 1), due to high demand for its capabilities and use cases, including ultra-fast on-chain payments, token creation and distribution, staking through delegation to network validators, smart contracts, and NFT issuance and trading. The Solana ecosystem also provides many DeFi services such as decentralized exchanges, token swaps, liquidity farming, and bridging, which ensures cross-chain interoperability with other blockchains.
There are currently more than 1000 independent validators and 800 RPC nodes (see Figure 2) which comprise a physical layer for the above mentioned functionality while making the network highly secure and decentralized. Each validator supports the network’s operation by providing high-end hardware resources and properly configuring their systems to keep the network running as fast and smoothly as possible.
Validators receive SOL tokens from stakers, participate in the consensus-based process of transaction validation, earn rewards proportional to the delegated stake amount, and distribute these rewards to stakers (in proportion to their staked shares), charging a variable commission. The more stake is delegated to a validator, the more the validator (and its delegators) earns and the more frequently it is chosen to process new transactions on the ledger, which exposes it to greater hardware and network load. Thus, validators are economically motivated both to keep their hardware and software running without interruptions and to update Solana software promptly and improve their nodes and network connection as their stake and the Solana network load increase.
Solana validators downtime
It is normal for a node to be temporarily unavailable/offline sometimes as every technical system needs periodic maintenance and reconfiguration. Typical reasons for server unavailability are usually quite simple such as planned reboots to update host configuration or software, emergencies (power outage), and network problems in the data center or at the provider.
The longer a node is unavailable, the fewer staking rewards and transaction fees it receives. Staking rewards are paid in proportion to the number of vote transactions a node posts, and it cannot post them while it is offline or functioning incorrectly. Validator downtime reduces its delegators' rewards, which is why one should check a validator's recent downtime history before delegating to it.
During periods of downtime an unavailable validator is assigned "delinquent" status, which can be checked using the Solana CLI command solana validators or by parsing the corresponding JSON response (solana --output json validators). By continuously fetching the statuses of all validators on the network, it is possible to measure the durations of delinquency periods, which are a good approximation of downtime duration for further quantitative analysis. The downtime data analyzed is available through the public Redash dashboard.
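The polling approach can be sketched as follows. This is a minimal illustration, not the collection pipeline used for the report: it assumes each poll has already been reduced to a hypothetical (timestamp, delinquent) pair for one validator, and it turns a time-ordered series of such pairs into delinquency periods. The precision of each measured duration is bounded by the polling interval.

```python
from typing import Iterable, List, Tuple

def delinquency_periods(
    samples: Iterable[Tuple[float, bool]],
) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps for each run of delinquent samples.

    `samples` is assumed to be time-ordered (timestamp, delinquent) pairs
    for a single validator; this input format is hypothetical.
    """
    periods = []
    start = None      # timestamp at which the current delinquent run began
    last_ts = None    # timestamp of the most recent sample seen
    for ts, delinquent in samples:
        if delinquent and start is None:
            start = ts                    # a delinquent run begins
        elif not delinquent and start is not None:
            periods.append((start, ts))   # run ends at first healthy sample
            start = None
        last_ts = ts
    if start is not None:
        # Still delinquent at the last poll: close the run there.
        periods.append((start, last_ts))
    return periods

# Example with polls every 60 seconds.
samples = [(0, False), (60, True), (120, True), (180, False),
           (240, True), (300, True)]
durations = [end - start for start, end in delinquency_periods(samples)]
```

Summing the durations per epoch gives the per-validator downtime figure that the statistics in the following sections are built on.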
Factors influencing downtime duration
There are many factors influencing downtime duration but these are typical ones:
- node operator reaction time (node operators may or may not use specialized monitoring and alerting systems);
- node operator skill (imagine the difference between inexperienced enthusiasts and mature professionals who have been working with such high-load systems for years);
- time taken to repair breakdowns in the power grid or communication network (which does not depend on a node operator);
- time needed to debug and fix specific configuration errors or replace hardware parts;
- complexity and duration of software update (i.e., different Solana versions take different time to install), node startup duration, etc.
Although most of these factors cannot be measured directly, we have managed to collect and analyse some important on-chain data related to the topic, which has allowed us to quantitatively describe several aspects regarding Solana network nodes unavailability such as downtime duration statistics over time, and its variability across nodes as well as duration of node software updates.
Downtime data analysis
Here we illustrate retrospective downtime statistics of Solana nodes that were active in the period from epoch №209 (5th of August, 2021) to epoch №236 (17th of October, 2021). Historical data reveal trends in the dynamics of downtime making it easier to understand the normal behavior of the metric as well as to identify abnormal fluctuations.
Nodes downtime duration by epochs
The descriptive statistics for downtime duration by epoch are presented in Figure 3 below. The 5% and 95% quantiles give the maximum downtime among the best-performing 5 and 95 percent of validators, respectively, in each epoch. Average downtime is the simple arithmetic mean, and the median is the downtime duration that separates the best 50 percent of validators from the worst 50 percent.
As can be seen from the chart above, the typical average downtime duration for a node is around 1.5 hours, which is quite low, while the median downtime duration is almost always zero (which means that most nodes usually don't experience shutdowns). There were also several epochs (№214, 223 and 234) with sharp upticks in downtime duration, mainly due to simultaneous upgrades of the Solana software version. Epoch 223 is especially interesting: on 14 September 2021 the Solana network experienced a severe overload that halted the network, and after a successful restart almost all nodes had to update to a new Solana version with the necessary fixes.
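As an illustration of these statistics, the sketch below computes them for one epoch with Python's standard library. The function name and the input format (one downtime value in hours per validator) are assumptions, and the "inclusive" quantile method is one of several reasonable conventions; the report does not say which one was used.

```python
import statistics
from typing import Dict, Sequence

def downtime_summary(downtimes_hours: Sequence[float]) -> Dict[str, float]:
    """Mean, median, and 5%/95% quantiles of per-validator downtime."""
    # n=20 splits the data into 20 equal-probability groups, so the 19
    # cut points fall at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(downtimes_hours, n=20, method="inclusive")
    return {
        "mean": statistics.mean(downtimes_hours),
        "median": statistics.median(downtimes_hours),
        "q05": cuts[0],    # 5% quantile
        "q95": cuts[-1],   # 95% quantile
    }

# Hypothetical epoch: most validators have no downtime, a few have long ones.
epoch_downtimes = [0.0] * 15 + [0.5, 1.0, 2.0, 6.0, 20.0]
summary = downtime_summary(epoch_downtimes)
```

In a sample like this the median is zero while the mean is pulled up by a few long outages, which is the same pattern described for Figure 3.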
Dispersion of downtime duration
As many factors affect downtime duration, it varies greatly across validators within the same epoch. The dispersion measures indicate the metric’s spread magnitude which is slightly changing over time as shown in Figure 4.
It can be seen from the chart above that downtime duration dispersion across validator nodes is dropping slightly over time which indicates that validators, on average, have both lower downtime durations and lower deviations of the metric from the mean.
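The report does not name the dispersion measures plotted in Figure 4. As a hedged illustration, the sketch below assumes two common choices, the population standard deviation and the interquartile range, computed over the per-validator downtimes of a single epoch; the function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence

def downtime_dispersion(downtimes: Sequence[float]) -> Dict[str, float]:
    """Spread of per-validator downtime within one epoch."""
    # Quartile cut points at 25%, 50%, and 75%.
    q1, _, q3 = statistics.quantiles(downtimes, n=4, method="inclusive")
    return {
        "std": statistics.pstdev(downtimes),  # population standard deviation
        "iqr": q3 - q1,                       # interquartile range
    }

# Hypothetical downtimes in hours for eight validators.
dispersion = downtime_dispersion([2, 4, 4, 4, 5, 5, 7, 9])
```

The standard deviation reacts strongly to a handful of very long outages, while the interquartile range describes the spread among typical validators, so tracking both per epoch shows whether a change comes from the tail or from the bulk.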
Supermajority and superminority validators comparison
Since the leading validators with a large stake amount take on much greater financial risk than smaller ones, their nodes' technical characteristics are far better than those of the majority. Therefore, it makes sense to compare the performance of the superminority set of validators (the minimal set of validators that together control more than 33.33% of the total stake) with that of the rest, which fall into the supermajority set holding the remaining roughly 66.66% of total stake (see Figure 5).
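The superminority definition above translates directly into a greedy selection: sort validators by stake in descending order and keep adding them until their combined stake exceeds one third of the total. The sketch below shows this under the assumption that stakes are given as a simple name-to-amount mapping; the names and amounts are hypothetical.

```python
from typing import Dict, List

def superminority(stakes: Dict[str, float], threshold: float = 1 / 3) -> List[str]:
    """Smallest set of validators whose combined stake exceeds `threshold`
    of the total stake, built by taking the largest stakes first."""
    total = sum(stakes.values())
    chosen: List[str] = []
    running = 0.0
    for name, stake in sorted(stakes.items(), key=lambda kv: kv[1], reverse=True):
        if running > total * threshold:
            break               # the chosen set already exceeds the threshold
        chosen.append(name)
        running += stake
    return chosen

# Hypothetical stake distribution (arbitrary units).
stakes = {"a": 30, "b": 30, "c": 20, "d": 20}
minority = superminority(stakes)
majority = [name for name in stakes if name not in minority]
```

Taking the largest stakes first guarantees that no smaller set can cross the threshold, which is what makes the resulting set minimal.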
As the charts above show, the supermajority is usually much worse in terms of average downtime duration, especially after epoch №220 and during hard times like epoch №223, when the Solana network halted and most validators had to perform major software updates.
In contrast, superminority validators (especially P2P Validator) have an average downtime duration and an average number of downtimes (see Figure 6 below) that are much lower than those of the supermajority, and a validator from the superminority set is much less likely to be offline for more than 5% of the total epoch duration (see Figure 7 below).
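The "offline for more than 5% of the epoch" probability can be estimated empirically as the share of validators in a group whose total downtime exceeds that fraction of the epoch length. The sketch below is an illustration only: the function name, the epoch length, and the downtime values are hypothetical.

```python
from typing import Sequence

def share_over_threshold(
    downtime_seconds: Sequence[float],
    epoch_seconds: float,
    threshold: float = 0.05,
) -> float:
    """Fraction of validators offline for more than `threshold` of the epoch."""
    if not downtime_seconds:
        return 0.0
    limit = threshold * epoch_seconds
    over = sum(1 for d in downtime_seconds if d > limit)
    return over / len(downtime_seconds)

# Hypothetical group of four validators in an epoch of 10,000 seconds.
share = share_over_threshold([0, 0, 100, 600], epoch_seconds=10_000)
```

Running this separately for the superminority and supermajority groups in each epoch yields the kind of comparison shown in Figure 7.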
Downtime duration distribution for updates and other causes
As described previously, downtimes may happen due to Solana node software updates as well as hardware upgrades and unexpected halts. The available on-chain data allows us to distinguish between downtimes related to software updates and those related to other causes and compare downtime duration distributions for the supermajority and superminority groups of validators.
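The report does not describe how update-related downtimes were identified. One plausible rule, used here purely as an assumption, is to treat a downtime as update-related when the validator reports a different software version after it than before it. The record type and field names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Downtime:
    validator: str
    minutes: float
    version_before: str   # version reported before going delinquent
    version_after: str    # version reported after coming back

def is_update(d: Downtime) -> bool:
    """Assumed rule: a version change across the downtime marks an update."""
    return d.version_before != d.version_after

def mean_minutes_by_cause(downtimes: List[Downtime]) -> Dict[str, Optional[float]]:
    """Average downtime duration split into update-related and other causes."""
    buckets: Dict[str, List[float]] = {"update": [], "other": []}
    for d in downtimes:
        buckets["update" if is_update(d) else "other"].append(d.minutes)
    return {
        cause: (sum(values) / len(values) if values else None)
        for cause, values in buckets.items()
    }

# Hypothetical events; the durations are made up for illustration.
events = [
    Downtime("a", 60, "1.6.25", "1.7.11"),
    Downtime("b", 120, "1.6.25", "1.7.11"),
    Downtime("c", 30, "1.7.11", "1.7.11"),
]
averages = mean_minutes_by_cause(events)
```

A rule of this kind would miss updates that fail and roll back to the same version, so it should be read as a rough classifier rather than an exact one.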
According to the distributions of downtime duration not related to software updates (see Figure 8), the two groups of validators are quite similar, except that supermajority validators are more likely to have very long outages, which greatly increase the average downtime duration (69 vs. 34 minutes for the superminority group). It should be noted that even when P2P Validator goes down (or becomes delinquent), on average it does so for an extremely short time of 1.5 minutes.
For downtimes due to software updates (see Figure 9), the distributions for the groups differ considerably: the supermajority group shows much more variability in downtime duration than the superminority, and supermajority validators again frequently have much longer update times, leading to a higher average (195 vs. 76 minutes for the superminority group). Superminority validators, including P2P Validator, demonstrate high consistency in update duration, presumably due to specific administration standards developed by the professional engineers who operate these validators.
Average update time by Solana software versions
Different Solana node software versions vary significantly in the complexity and duration of the installation process, which directly affects the downtime duration associated with updates. Figure 10 below shows the average update time of Solana node software versions by validators from the supermajority and superminority groups.
Of the most widely used versions of Solana node software, the update to version 1.6.25 took the longest for both supermajority (4.5 hours on average) and superminority (3.5 hours on average) validators. Long updates to versions 1.7.11 and 1.7.15 were performed only by validators from the supermajority group and took approximately 2-3 hours to complete. Overall, validators from the superminority group usually perform updates significantly faster, which means smaller reward losses for them and their delegators.
Downtime duration is a very important metric, as it reflects the efficiency of Solana validator operators and influences the rewards received by validators and their delegators as well as the overall stability and security of the network. The Solana Foundation and the network validators do everything they can to improve the performance of nodes and the quality of the software that controls node operation, and we can say with confidence that they do it very well, especially validators from the superminority group, thanks to the experience and professionalism of their DevOps engineers.
The authors of the report would like to express gratitude and appreciation to the P2P Validator team, whose guidance, support and encouragement have been invaluable throughout the research. We would also like to thank Stephen Akridge, co-founder of Solana, Ruud van Asseldonk, software engineer at Chorus One, and Robert Dörzbach, product manager of Solana Beach, for helpful advice, comments and corrections.
Information presented in this report and the referenced sources is for educational purposes only. It is not financial or investment advice. Seek a licensed professional for any financial advice. The authors of the report made every reasonable effort to ensure the accuracy and validity of the information provided. However, as price points, conditions, and information are continually changing, the authors reserve the right to change the information contained in the report at any time without notice and make no warranties or representations as to its accuracy or up-to-dateness.
The authors of the report are employees of P2P Validator, a company that provides professional services and consulting for highly secure non-custodial staking across more than 20 blockchain networks, including the Solana network, where it runs mainnet and testnet validator nodes as well as RPC nodes. P2P Validator is therefore not a neutral party and has its own business interests in the Solana ecosystem. Nevertheless, the authors did their best to make the report as objective as possible, with the main purpose being to educate and inform the community.