Solana validators play an integral role in the expansion of the Solana ecosystem. To better understand them, Staking Rewards is proud to present an in-depth look at Solana validator performance. The research was conducted by P2P Validator, which draws on its expertise as an industry-leading staking provider to provide insights into the on-chain performance of validators securing the Solana network.
The first article described our findings on the downtime history of all validators on the network. This part of the report covers the results of the block skip rate analysis.
All data used for the analysis were obtained from publicly available sources (the Solana JSON RPC API, the Solanabeach.io API, the Validators.app API and Bitquery.io) and cover Mainnet Beta epochs №194-236 unless another epoch or time period is explicitly specified.
Solana validators skip rate
Skip rate measures the percentage of leader slots in which the leading validator (the validator selected to process a block of transactions during its scheduled slot) fails to produce a block that is subsequently confirmed by consensus on the network. A lower skip rate means that the leader is producing blocks at a high rate and that these blocks do not end up on forks rejected by the network.
The more stake is delegated to a validator, the more frequently it is chosen as leader to process blocks of transactions, and so the greater the hardware and network load it is exposed to. Validators are economically incentivized not to skip blocks, because each skipped block costs them their share of the transaction fees from that block.
Leading validators do not skip blocks intentionally. Node operators understand that skipped blocks are bad for the Solana network overall, as a high skip rate increases latency for end users, so they continuously monitor their validators and try to lower their skip rate. In most cases validators produce blocks reliably, but several factors have a strong impact on skip rate. They are described below and illustrated with dependency charts wherever the corresponding on-chain data could be collected.
To collect data on skip rate we use the Validators.app API: the validator_block_history endpoint returns the history of block production statistics for a given Solana network validator. Skip rate is calculated as the number of skipped leader slots divided by the number of total leader slots in an epoch.
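The calculation described above can be sketched in a few lines. This is a minimal illustration, not the report's actual pipeline: the function name and the input numbers are hypothetical, and fetching the counts from the Validators.app API is deliberately left out.

```python
def skip_rate(skipped_slots: int, total_leader_slots: int) -> float:
    """Return the skip rate in percent for one validator in one epoch."""
    if total_leader_slots <= 0:
        # A validator that was never scheduled as leader has no defined skip rate.
        raise ValueError("validator had no leader slots in this epoch")
    if not 0 <= skipped_slots <= total_leader_slots:
        raise ValueError("skipped slots must be between 0 and total leader slots")
    return 100.0 * skipped_slots / total_leader_slots


# Hypothetical example: 12 skipped slots out of 200 scheduled leader slots.
example_rate = skip_rate(12, 200)
print(f"skip rate: {example_rate:.1f}%")
```

Raising an error for validators without leader slots, rather than returning zero, keeps them from being counted as perfect performers in later averages.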
By constantly fetching the skip rate statistics every epoch we accumulate historical data in our internal database for further analysis and sharing: the accumulated data and some useful charts are available through the public Redash dashboard.
Factors influencing skip rate for Solana Validators
Solana node software version
Solana node software versions are constantly updated by the joint efforts of the Solana Foundation and community developers in order to eliminate bugs and to increase the stability and speed of the network. When a new stable version becomes available, validator operators gradually update their node software (see Figure 1). Updates usually lead to significant improvements in overall network performance and, in particular, a lower skip rate.
As a rule, most validators prefer the stable versions recommended by the Solana Foundation, as these are the most reliable and properly tested ones. The developer community works continuously to improve the implementation of the protocol, expanding its capabilities and performance, and proposes Solana software version updates approximately once a month. Figure 2 shows that each new stable version decreases skip rate, except for version 1.6.27, which was used by a large share of validators (but not for long), and 1.7.10, which has been used by a few validators for a relatively long time.
Node downtime directly affects skip rate because a node that is offline cannot produce blocks. If there were no other influencing factors, there would be a strong linear dependency between node downtime duration (expressed as a percentage of epoch duration) and skip rate. Because skip rate also depends on other factors, there are large deviations from a linear relationship, as seen in Figure 3. However, the least-squares regression line is very close to direct proportionality, which makes downtime duration the most decisive factor for skip rate, especially for long downtimes.
Please note that there is a small number of cases where the downtime is high but the skip rate is very low, which at first glance seems contrary to common sense. Such cases are explained by the fact that validators with a small stake are rarely chosen as leaders and, therefore, are assigned very few leader slots during an epoch. Even if such a small validator goes offline for >80% of the epoch duration, it still has a chance to successfully produce its few blocks if they fall into the remaining uptime window.
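The least-squares comparison between downtime and skip rate mentioned above can be reproduced with an ordinary least-squares fit. The sketch below is only an illustration under stated assumptions: the helper name `fit_line` and the data points are hypothetical, and the report does not say which tool was used for the actual fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (downtime %, skip rate %) pairs, one per validator and epoch.
downtime_pct = [0.0, 10.0, 20.0, 30.0]
skip_rate_pct = [5.0, 15.0, 25.0, 35.0]
slope, intercept = fit_line(downtime_pct, skip_rate_pct)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A slope close to 1 would correspond to the near-proportional relationship described in the text, while the intercept captures the skip rate that remains at zero downtime because of the other factors.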
Data center location
A node’s data center location may affect skip rate for two reasons. First, some nodes are located in data centers that are very far (in terms of signal latency) from the majority of nodes. Second, the quality of technical support and network stability can vary greatly between data centers and from country to country. High network delays for such nodes increase the probability that a produced block will reach the rest of the validators too late and will not be accepted by the majority of closely located nodes, which have already agreed on another block.
In the medium term, geographic and data center distribution of nodes is gradually changing (see Figure 4) for various reasons: from the cost of service in data centers to the regulatory environment in different countries and also depending on staking pools or Solana Foundation Delegation Program scoring criteria.
Node concentration in data centers located in Germany (DE) is the highest but is gradually decreasing. Finland (FI) and Canada (CA) data centers are also losing validator clients, while the popularity of United States (US) and France (FR) data centers is growing rapidly among Solana validator operators. The box plot in Figure 5 below shows the variability and median of skip rate in different data center locations.
Statistics show that nodes located in Canada, Russia (RU) and Finland data centers have some of the highest median skip rates, while Ukraine (UA), Japan (JP), Ireland (IE) and the United States demonstrate much lower median rates.
It is not only the data center itself that affects the skip rate of the node located in it (due to the quality of service and geographic location). Since block confirmation depends not on a leading node, but on the consensus process within the global Solana network, such an integral network characteristic as the concentration of nodes in data center locations is also a determining metric. We assume that if the geographical concentration of nodes is higher then it is less likely that a block produced by a randomly picked validator will be rejected by consensus (forked) since this block will reach the majority faster as the majority is “closer” to each other.
To check this assumption, we obtained the data center identifiers attached to each validator (see the Validators.app API), calculated the Shannon entropy of the distribution of nodes by data center country and compared it with the average skip rate by epoch (see Figure 6). A higher entropy value means that the Solana network is more geographically decentralized, and a lower value indicates that more nodes are concentrated in a few data centers located in the same countries.
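As a rough sketch of the entropy measure described above, the function below computes the Shannon entropy of a list of country codes, one entry per node. The base-2 logarithm (entropy in bits), the function name and the sample distributions are assumptions made for illustration; the report does not state which logarithm base was used.

```python
from collections import Counter
from math import log2


def shannon_entropy(country_codes):
    """Shannon entropy (in bits) of the distribution of nodes by country."""
    counts = Counter(country_codes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nodes supplied")
    entropy = 0.0
    for count in counts.values():
        p = count / total  # share of nodes hosted in this country
        entropy -= p * log2(p)
    return entropy


# Hypothetical node distributions: one concentrated, one evenly spread.
concentrated = ["DE"] * 8 + ["US"] * 2
spread = ["DE", "US", "FR", "FI", "CA"] * 2
print(f"concentrated: {shannon_entropy(concentrated):.3f} bits")
print(f"spread:       {shannon_entropy(spread):.3f} bits")
```

The evenly spread sample yields the higher value, which matches the reading of entropy in the text: higher entropy means more geographic decentralization.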
Indeed, the validity of our assumption is supported by the statistics presented. Therefore, a dilemma arises: on the one hand, to improve the network security, it is necessary to increase its geographical decentralization, which, in turn, increases the skip rate and slows down the network, and on the other hand, for successful real-time applications, it is necessary to maximize the network performance.
Every month, countless new projects appear on Solana, existing projects integrate each other’s capabilities, and the ecosystem is growing at a very high pace. Adoption and growing utilization of the Solana network are reflected in a rapid increase in the number of transactions per unit of time, meaning that more computational and storage resources must be provided to cope with the demand. This is one of the reasons the Solana Foundation launched a server program to make it easier to find and quickly set up a node for those willing to run a validator on the network.
Higher blockchain load and, consequently, higher validator load arise during periods of increased user, platform and protocol activity in the Solana ecosystem. For example, when a long-awaited new project releases its NFTs or fungible tokens for public sale at a certain point in time, tens of thousands of users and trading bots put a massive load on the Solana blockchain in terms of transactions per second (TPS).
To illustrate the growth of the Solana blockchain load, we obtained transaction count data using the public Bitquery GraphQL API; its dynamics are shown in Figure 7.
The overall rapid growth is obvious from the chart. The series has several short-term surges, probably associated with the launch of new hyped projects. An interesting observation is that, from about mid-October, short-term drops in TPS of 20-50% began to occur periodically.
Since the slot duration for a leader node is fixed, the leader needs to process more transactions per block during periods when more transactions are initiated. This may be within the capabilities of nodes managed by large validators; however, many nodes in the network are not of top-notch quality and therefore do not always cope with higher-than-average load, which increases their block skip rate.
Figure 8 demonstrates that the largest validators falling into the superminority set don’t show the dependence of the skip rate on TPS spikes, while the medium- and small-sized validators (in terms of their stake share) do. On average, for the medium-sized validators, skip rate increases by 0.22 p.p. per 100 additional TPS on top of the average TPS during an epoch, while for the small validators, it increases even more, by 1.13 p.p.
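To make the two slopes quoted above concrete, the sketch below applies them as a simple linear estimate. The slope values come from the text; the function name and the TPS surplus of 300 are hypothetical, and extrapolating the slopes linearly beyond the observed data is an assumption.

```python
# Slopes quoted in the text, in percentage points of skip rate per 100 extra TPS.
PP_PER_100_TPS = {"medium": 0.22, "small": 1.13}


def expected_skip_rate_increase(group: str, extra_tps: float) -> float:
    """Linear estimate of the skip rate increase (in p.p.) for a TPS surplus."""
    return PP_PER_100_TPS[group] * extra_tps / 100.0


# Hypothetical epoch whose TPS is 300 above the average.
for group in PP_PER_100_TPS:
    increase = expected_skip_rate_increase(group, 300)
    print(f"{group}-sized validators: +{increase:.2f} p.p.")
```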
Node hardware characteristics
Obviously, the technical characteristics of the nodes have a strong impact on the block skip rate. Among all the validators in the ecosystem, there are those who use the most modern dedicated servers and powerful hardware with huge amounts of memory and the highest bandwidth as well as those who set up virtual servers for validation with rather mediocre performance.
Unfortunately, information about the current configuration of nodes, and even more so past changes of it, in practice is almost never disclosed by validators, which makes it impossible to quantify the influence of this most important factor on skip rate.
The Solana network is so fast that a simple failure in the local or global network can cause blocks processed by a leader to pass through the network with a slight delay. Such delays lead to inconsistencies in the consensus process: other validators do not see the block produced by the leader and vote for another block, effectively pushing the leader onto a short-lived fork (which the protocol always resolves very quickly). In most cases this is not the fault of the leader or the network provider, as there are clusters of nodes on the network for which throughput is naturally higher due to the slightly shorter distance between these nodes and the rest of the network.
“A memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in a way that memory which is no longer needed is not released” (Wikipedia article). Over time, memory leaks affect the performance of both the particular application as well as the operating system of the node.
Validator nodes run Solana software written mostly in the Rust programming language, which provides some memory safety guarantees, but it is still possible to accidentally allocate memory that is never cleaned up. Unfortunately, such leaks accumulate into a large total leak, which might result in unacceptable response times due to excessive paging. Large leaks slow down nodes and increase the chance of skipping a block.
Solana software has several known issues leading to memory leaks that are actively discussed by the developer community (see example of an issue) and fixed in newer versions as soon as possible to increase Solana network performance. Figure 9 illustrates changes in skip rate over epochs between downtime periods, which are believed to be partly due to an accumulation of memory leaks. Since Solana’s codebase is constantly evolving, it is not always possible to quickly identify all possible causes of memory leaks and eliminate them through software updates. Therefore, node operators may sometimes resort to restarting their machines in order to clear the memory and decrease skip rate.
The analysis of skip rate dynamics over epochs during uptime periods demonstrates that, on average, skip rate increases by 0.22 p.p. per 1 epoch. Therefore, after 10 epochs of continuous operation one should expect that skip rate will increase by 2.2 p.p. (e.g. from 5% to 7.2%). The direct dependency can be traced for almost all versions of Solana, except 1.6.21 (insignificant dependency), 1.6.22 and 1.6.24 (both significant inverse dependency).
Skip rate evolution for Solana Validators
This section presents retrospective skip rate statistics for Solana nodes that were active in the period from epoch №194 (21st of June, 2021) to epoch №236 (17th of October, 2021). Historical charts show a rapid decrease both in skip rate and in its variability across validators, which is very good for the speed and stability of the Solana network.
Skip rate distribution over time
The descriptive statistics for skip rate by epoch are presented in Figure 10 below. The 5% and 95% quantiles reflect the maximum skip rate among the top 5 and top 95 percent of validators, respectively, for each epoch. The average skip rate is the simple arithmetic mean, and the median is the skip rate value that separates the top 50 percent of validators from the bottom 50 percent.
The average skip rate decreases almost continuously over the epochs under analysis. Since the frequency of node downtime and the average downtime duration do not change significantly, we attribute this effect to updates of the Solana software.
The current average skip rate for a node is around 6%, dramatically lower than the ~25% of epoch №194, meaning that in less than six months the frequency of skips has dropped roughly fourfold. There are even validators with a skip rate below 2%, and their number is growing from epoch to epoch.
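The per-epoch statistics described above can be computed with Python's standard `statistics` module. This is only a sketch: the function name and the sample values are hypothetical, and the choice of the "inclusive" interpolation method is an assumption, since the report does not say how its quantiles were interpolated.

```python
from statistics import mean, median, quantiles


def describe_skip_rates(skip_rates):
    """Return the 5% quantile, median, mean and 95% quantile of skip rates."""
    # n=20 splits the data into 20 equal-probability groups, so the first and
    # last cut points are the 5% and 95% quantiles.
    cuts = quantiles(skip_rates, n=20, method="inclusive")
    return {
        "q05": cuts[0],
        "median": median(skip_rates),
        "mean": mean(skip_rates),
        "q95": cuts[-1],
    }


# Hypothetical skip rates (in percent) of 21 validators in one epoch.
sample = [float(v) for v in range(21)]
stats = describe_skip_rates(sample)
for name, value in stats.items():
    print(f"{name}: {value:.1f}%")
```

Running the same function once per epoch would give the four series that a chart like Figure 10 needs.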
Skip rate for validators groups
The difference between validators from the superminority and supermajority groups is not significant overall (see Figure 11). However, starting from epoch №220, superminority validators surpassed the supermajority in terms of successful block production rate. This is an impressive achievement, since the nodes of validators with the highest stake are constantly under extreme load and, therefore, have a much stronger impact on network speed and stability.
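For readers unfamiliar with the grouping, the sketch below shows one way to split validators into the two groups. It assumes the commonly used definition of the superminority as the smallest set of top-stake validators that together hold more than one third of the total active stake; the report itself does not spell out the definition, and the function name and stake values are hypothetical.

```python
def superminority(stakes):
    """Return identities of the smallest top-stake set holding > 1/3 of stake."""
    total = sum(stakes.values())
    selected, cumulative = [], 0
    # Walk validators from the largest stake down until the threshold is crossed.
    for identity, stake in sorted(stakes.items(), key=lambda kv: kv[1], reverse=True):
        if 3 * cumulative > total:
            break
        selected.append(identity)
        cumulative += stake
    return selected


# Hypothetical stake amounts keyed by validator identity.
example_stakes = {"a": 30, "b": 30, "c": 20, "d": 20}
top_group = superminority(example_stakes)
rest_group = [v for v in example_stakes if v not in top_group]
print("superminority:", top_group)
print("remaining validators:", rest_group)
```

The comparison `3 * cumulative > total` avoids floating-point division, so the one-third threshold is evaluated exactly when stakes are integers.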
Dispersion of skip rate over time
Node skip rate variation within the same epoch is decreasing significantly over time which shows that validators have both lower average skip rate as well as lower deviations of it from the mean (see Figure 12).
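One way to quantify the within-epoch variation described above is the standard deviation of skip rates across validators, computed separately for each epoch. Whether Figure 12 uses this exact measure is an assumption, and the records below are hypothetical values chosen only to show the calculation.

```python
from collections import defaultdict
from statistics import pstdev


def skip_rate_dispersion(records):
    """Map each epoch to the population standard deviation of its skip rates."""
    by_epoch = defaultdict(list)
    for epoch, skip_rate_pct in records:
        by_epoch[epoch].append(skip_rate_pct)
    return {epoch: pstdev(values) for epoch, values in sorted(by_epoch.items())}


# Hypothetical (epoch, skip rate %) records for a handful of validators.
records = [(194, 10.0), (194, 30.0), (194, 50.0), (236, 5.0), (236, 6.0), (236, 7.0)]
dispersion = skip_rate_dispersion(records)
for epoch, value in dispersion.items():
    print(f"epoch {epoch}: standard deviation {value:.2f} p.p.")
```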
Top stake validators skip rate
The top stake validators on the Solana network process and produce a very large number of transaction blocks. Their performance has a major impact on speed and resilience of the network. That is why it is important to analyse their skip rate statistics separately from the rest of the validators. Figure 13 shows skip rate dynamics for top 5 validators (in terms of average stake) against the network average (shown in grey dashed line) from epoch №210 to epoch №236.
Top validators performed quite well (close to the network average) from epoch 210 to epoch 220 and much better since epoch 220, when their skip rate became significantly lower than the network average (except for a short period from epoch 204 to epoch 207 for Staking Facilities). Among these top validators, P2P Validator performed particularly well, with an almost consistently lowest skip rate and current values around 1-2%.
Summary of Skip Rates and Solana Validators
Skip rate is a very informative indicator of the efficiency and productivity of validators. It reflects validator performance during block production, the consistency of the consensus process and the overall stability of the Solana network. Validators should monitor skip rate carefully, as a higher skip rate leads to a loss of rewards from confirmed transaction fees. For users, a lower average skip rate means higher speed and stability of the Solana network.
The report describes many factors that have an impact on skip rate: duration of node downtime, Solana node software version, data center location and concentration, TPS spikes, and memory leaks during continuous operation without validator restarts or software updates. Understanding these factors and estimating their impact is very important for ongoing work on increasing the overall performance of the Solana network. Thanks to the efforts of the Solana Foundation and the community, the average skip rate has gradually decreased, and it can be reduced even further by working on the indicated factors.
The authors of the report would like to express their gratitude and appreciation to the P2P Validator team, whose guidance, support and encouragement have been invaluable throughout the research. We would also like to thank Stephen Akridge, co-founder of Solana, Ruud van Asseldonk, software engineer at Chorus One, and Robert Dörzbach, product manager of Solana Beach, for helpful advice, comments and corrections.
Information presented in this report and referenced sources are for educational purposes only. It is not financial/investment advice. Seek a licensed professional for any financial advice. Authors of the report made every reasonable effort to ensure the accuracy and validity of the information provided. However, as price points, conditions, and information are continually changing, authors reserve the right to change at any time without notice, information contained in the report and make no warranties or representations as to its accuracy or up-to-dateness.
The author of the report is an independent contractor of P2P Validator, a company that provides professional services and consulting for highly secure non-custodial staking across more than 20 blockchain networks, including the Solana network with mainnet and testnet validator nodes as well as RPC nodes. Therefore, P2P Validator is not a neutral party and has its own business interests in the Solana ecosystem. Nevertheless, the authors did their best to make the report as objective as possible, with the main purpose being to educate and inform the community.
Sources of data