One of the problems facing Ethereum, like any blockchain, is that it grows over time, both in the complexity of its code and in its storage requirements.
A blockchain must retain all of its historical data, which every client has to store and every new client has to download. As a result, client load and sync times keep increasing.
Moreover, code complexity increases over time because it is “easier to add a new feature than to remove an old one,” Vitalik Buterin wrote on his blog.
Buterin therefore believes that developers must actively work to curb these trends while preserving Ethereum’s permanence. To that end, he has presented The Purge, a three-part plan that aims to simplify the blockchain and reduce its data load.
Part 1: History expiry
A fully synced Ethereum node currently requires around 1.1 TB of storage for the execution client, plus a few hundred gigabytes more for the consensus client. According to Buterin, most of this data is history, such as historical blocks, transactions, and receipts, much of it several years old. Because this history keeps accumulating, the required disk space grows by hundreds of gigabytes every year.
Buterin believes that this problem can be solved with history expiry.
Each block on a blockchain points to the previous one via a hash link. This means that consensus on the current block also implies consensus on the entire history leading up to it.
According to Buterin, as long as the network has consensus on the current block, any piece of historical data can be supplied by a single actor together with a Merkle proof, which allows anyone to verify its integrity. This means that, instead of every node storing all of the data, each node could store a small percentage of it, reducing storage requirements.
In essence, Buterin suggests adopting the operating model of torrent networks, where each participant stores and distributes only a small part of the network’s total data.
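To illustrate why agreeing on the current block pins down the whole history, here is a minimal Python sketch of a hash-linked chain. It simplifies heavily: blocks are plain dictionaries hashed as JSON with SHA-256, whereas Ethereum hashes RLP-encoded block headers with Keccak-256. The helper names `block_hash`, `build_chain`, and `is_linked` are hypothetical.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Simplification: SHA-256 over canonical JSON. Ethereum instead hashes
    # an RLP-encoded block header with Keccak-256.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def build_chain(payloads: list) -> list:
    chain, parent = [], "0" * 64  # placeholder parent for the genesis block
    for number, payload in enumerate(payloads):
        block = {"number": number, "parent_hash": parent, "payload": payload}
        chain.append(block)
        parent = block_hash(block)  # the next block commits to this one
    return chain

def is_linked(chain: list) -> bool:
    # Every block must reference the hash of the block before it.
    return all(chain[i]["parent_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

honest = build_chain(["genesis", "batch 1", "batch 2", "batch 3"])
rewritten = build_chain(["genesis", "forged batch 1", "batch 2", "batch 3"])

# Editing an old block in place breaks the link to its child.
tampered = [dict(b) for b in honest]
tampered[1]["payload"] = "forged batch 1"
print(is_linked(tampered))                                  # False

# Re-linking the forged history is possible, but every later hash changes.
print(is_linked(rewritten))                                 # True
print(block_hash(honest[-1]) == block_hash(rewritten[-1]))  # False
```

Because each block commits to its parent’s hash, a change anywhere in the past propagates all the way to the head, so two nodes that agree on the head hash necessarily agree on everything before it.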
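The Merkle-proof idea can be sketched in Python as follows. This is a plain binary Merkle tree built with SHA-256, chosen only for illustration; Ethereum’s actual commitment structures and hash function differ, and the function names are hypothetical. A node that keeps only the root can still check a block served by a single untrusted peer.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves: list) -> list:
    # levels[0] holds the leaf hashes; each level above hashes pairs of nodes.
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:               # odd count: pair the last node with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def make_proof(levels: list, index: int) -> list:
    # Collect the sibling hash at every level and which side it sits on.
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        sibling_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sibling_hash, index % 2 == 0))  # True: sibling is on the right
        index //= 2
    return proof

def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    # Recompute the path from the leaf up to the root using only the proof.
    node = h(leaf)
    for sibling_hash, sibling_on_right in proof:
        node = h(node + sibling_hash) if sibling_on_right else h(sibling_hash + node)
    return node == root

history = [f"block {n}".encode() for n in range(5)]
levels = build_levels(history)
root = levels[-1][0]           # the one value every node agrees on
proof = make_proof(levels, 3)  # served by any single peer that kept block 3

print(verify(history[3], proof, root))       # True
print(verify(b"forged block", proof, root))  # False
```

The proof holds one sibling hash per tree level, so its size grows only logarithmically with the amount of history, which is what makes it cheap for a node to check data it no longer stores itself.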
Ethereum has already taken steps towards reducing storage requirements—certain information now has an expiry date. For instance, consensus blocks are stored for six months and blobs are stored for 18 days.
EIP-4444 is another step in that direction: it aims to cap the storage period for historical blocks and receipts at one year. The long-term goal, however, is a single fixed period, such as 18 days, during which every node stores everything; older data would then be stored in a distributed way across a peer-to-peer network.
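The storage rule described above (keep everything recent, keep only a share of older data) can be sketched in Python. This is a hypothetical illustration, not how EIP-4444 or any Ethereum client assigns data: the 18-day window is taken from the example above, the replication factor of three is an arbitrary assumption, and rendezvous hashing is swapped in as one simple way to decide deterministically which peers keep each older block.

```python
import hashlib

RECENT_WINDOW_DAYS = 18  # assumption: the example window mentioned above
REPLICAS = 3             # hypothetical: how many peers keep each older block

def score(node_id: str, block_number: int) -> int:
    digest = hashlib.sha256(f"{node_id}:{block_number}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def holders(block_number: int, node_ids: list, replicas: int = REPLICAS) -> list:
    # Rendezvous hashing: the `replicas` nodes with the highest scores keep the block.
    ranked = sorted(node_ids, key=lambda n: score(n, block_number), reverse=True)
    return ranked[:replicas]

def should_store(node_id: str, block_number: int, age_days: float, node_ids: list) -> bool:
    if age_days <= RECENT_WINDOW_DAYS:
        return True  # inside the window, every node keeps everything
    return node_id in holders(block_number, node_ids)

nodes = [f"node-{i}" for i in range(20)]
old_blocks = range(200)
kept = sum(should_store(n, b, age_days=400, node_ids=nodes)
           for n in nodes for b in old_blocks)
print(kept / (len(nodes) * len(old_blocks)))  # 0.15: on average, 15% per node
```

With 20 nodes and three replicas, each node holds about 15% of the older history instead of all of it, while every old block still has exactly three holders.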
Part 2: State expiry
According to Buterin, removing the need for clients to store the entire history does not completely solve the problem of growing storage requirements. Client storage needs still increase by around 50 GB every year because of the “ongoing growth to the state: account balances and nonces, contract code and contract storage.”
A new state object can be created in three ways: by creating a new account, by sending ETH to a new account, or by setting a previously dormant storage slot. Once a state object is created, it remains in the state forever.
Buterin believes that any solution for automatically expiring state objects over time needs to be efficient, user-friendly, and developer-friendly. This means that it should not require large amounts of computation, that users should not lose access to their tokens if they leave them untouched for years, and that developers should not be greatly inconvenienced in the process.
Buterin suggests two types of “known least bad solutions”:
- Partial state expiry proposals
- Address-period-based state expiry proposals
Partial state expiry
Partial state expiry proposals work by dividing the state into “chunks.” Everyone would permanently store a “top-level map” recording which chunks are empty and which are not, but the data within a chunk is stored only if it has been accessed recently. If a chunk’s data is no longer stored, a “resurrection” mechanism lets anyone bring it back by providing a proof of what the data was.
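A toy version of the chunk-based scheme might look like the Python sketch below. It is hypothetical throughout: the `PartialState` class and the `EXPIRY_PERIOD` of 100 blocks are invented for illustration, and the top-level map here stores a SHA-256 hash of each chunk (rather than only an empty/non-empty flag) so that resurrection can be checked by resubmitting the full chunk. A real proposal would use Merkle proofs instead.

```python
import hashlib

EXPIRY_PERIOD = 100  # hypothetical: chunks untouched for 100 blocks lose their data

def commit(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class PartialState:
    def __init__(self):
        self.top_level = {}  # chunk_id -> hash of its data; kept forever
        self.stored = {}     # chunk_id -> (data, last_access); kept only while recent

    def is_empty(self, chunk_id):
        return chunk_id not in self.top_level

    def write(self, chunk_id, data, now):
        self.top_level[chunk_id] = commit(data)
        self.stored[chunk_id] = (data, now)

    def read(self, chunk_id, now):
        if self.is_empty(chunk_id):
            return b""  # never-written chunks are simply empty
        if chunk_id not in self.stored:
            raise KeyError(f"chunk {chunk_id} has expired; resurrect it first")
        data, _ = self.stored[chunk_id]
        self.stored[chunk_id] = (data, now)  # refresh the last-access time
        return data

    def prune(self, now):
        # Forget chunk data not accessed within EXPIRY_PERIOD; the top-level map stays.
        self.stored = {cid: (d, t) for cid, (d, t) in self.stored.items()
                       if now - t < EXPIRY_PERIOD}

    def resurrect(self, chunk_id, claimed_data, now):
        # Anyone may restore an expired chunk by supplying data that matches its hash.
        if self.is_empty(chunk_id) or self.top_level[chunk_id] != commit(claimed_data):
            return False
        self.stored[chunk_id] = (claimed_data, now)
        return True

state = PartialState()
state.write(7, b"balance=42", now=0)
state.prune(now=150)                                     # chunk 7 idle for 150 blocks
print(7 in state.stored, state.is_empty(7))              # False False
print(state.resurrect(7, b"balance=1000000", now=150))   # False
print(state.resurrect(7, b"balance=42", now=150))        # True
print(state.read(7, now=151))                            # b'balance=42'
```

Note that the pruned chunk is still marked as non-empty, which is what prevents anyone from treating expired data as if it had never existed.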
Address-period-based state expiry
Address-period-based state expiry proposes keeping a growing list of state trees instead of a single tree that stores the whole state. Any state that is read or written gets saved into the most recent state tree, and a new, empty state tree is added once per period (for example, once a year).
In this scheme, older state trees are frozen, and full nodes need to store only the two most recent trees. If a state object exists only in an expired tree, it can still be read or written, but the transaction must include a Merkle proof for it. After the transaction, the object is added back to the latest tree.
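The period-based scheme can likewise be sketched in Python. In this hypothetical `PeriodState` class, the node keeps the two newest trees in full and only a hash commitment for each expired tree. As a simplification, the “proof” is the entire expired tree, checked against its stored hash; a real design would use a compact Merkle proof and would also have to show that no newer tree holds a more recent value for the same key, which this sketch does not check.

```python
import hashlib
import json

def commitment(tree: dict) -> str:
    # Stand-in for a state root: hash of the whole tree as canonical JSON.
    return hashlib.sha256(json.dumps(tree, sort_keys=True).encode()).hexdigest()

class PeriodState:
    def __init__(self):
        self.hot = [{}]         # the two newest trees, oldest first, kept in full
        self.frozen_roots = []  # expired trees: only their commitments are kept

    def new_period(self):
        """Start a new, empty tree; freeze the oldest if more than two are kept.
        Returns the expired tree (so an archive can keep it) or None."""
        self.hot.append({})
        if len(self.hot) > 2:
            expired = self.hot.pop(0)
            self.frozen_roots.append(commitment(expired))
            return expired
        return None

    def write(self, key, value):
        self.hot[-1][key] = value  # writes always land in the newest tree

    def read(self, key, proof_tree=None, period=None):
        for tree in reversed(self.hot):  # newest tree first
            if key in tree:
                value = tree[key]
                self.hot[-1][key] = value  # touched state moves to the newest tree
                return value
        if proof_tree is None or period is None:
            raise KeyError(f"{key!r} has expired; a proof is required")
        if commitment(proof_tree) != self.frozen_roots[period] or key not in proof_tree:
            raise ValueError("proof does not match the frozen tree")
        value = proof_tree[key]
        self.hot[-1][key] = value  # resurrected into the newest tree
        return value

# "alice" is written in period 0, then left untouched for two period boundaries.
state = PeriodState()
state.write("alice", 100)
archive = [t for t in (state.new_period(), state.new_period()) if t is not None]

try:
    state.read("alice")
except KeyError as err:
    print(err)  # the object has expired, so a proof is now required
print(state.read("alice", proof_tree=archive[0], period=0))  # 100
print(state.read("alice"))  # 100, no proof needed once it is back in the newest tree
```

Here `archive` plays the role of whoever still holds old data and can serve proofs; the node itself only ever keeps two full trees plus one hash per expired period.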
Part 3: Feature cleanup
Over time, all protocols become complex, no matter how simple they started out.
Buterin wrote:
“If we do not want Ethereum to go into a black hole of ever-increasing complexity, we need to do one of two things: (i) stop making changes and ossify the protocol, (ii) be able to actually remove features and reduce complexity.”
According to Buterin, cleaning up Ethereum’s complexity requires several small fixes, such as removing the SELFDESTRUCT opcode, removing old transaction types and beacon chain committees, reforming LOG, and more. Buterin also suggested simplifying gas mechanics, removing gas observability, and improving static analysis.