The Ethereum Whitepaper

Paper Summary: “A Next-Generation Smart Contract and Decentralized Application Platform”

Dickson Wu

Published in

Geek Culture

21 min read · Nov 4, 2021

Paper by: Vitalik Buterin

Introduction:

In 2009, Bitcoin was born. That day was the start of a radical revolution of currency! Why? It was the first asset that had no backing, no intrinsic value, and no centralized controller (and yet gained traction and has value). But also: it’s a major usage of blockchain, which has huge sub-branches!

Sub branches include:

Coloured coins: Digital assets (ex: customer currencies & financial instruments) that are on the blockchain
Smart property: Ownership of physical assets
Namecoin: Non-fungible assets (ex: domain names). Non-fungible means each item is unique, so it can’t be copied or swapped one-for-one with another (an NFT can only be owned by 1 person)
Smart Contracts: Code that’s on the blockchain that can be executed when certain conditions are met
Decentralized Autonomous Organizations (DAO): Literally in the name :P

Why Ethereum? Ethereum is a blockchain with a Turing-complete programming language, so it can do all those sub-branches above + applications that have yet to be imagined!

Introduction to Bitcoin and Existing Concepts:

History:

Although Bitcoin was created in 2009, the idea of decentralized digital currency existed since the 1980s and 1990s! Chaumian blinding had lots of privacy — but it failed because it relied on a centralized intermediary.

In 1998, B-money proposed creating money by solving computational puzzles and agreeing on it through decentralized consensus, but it failed because it didn’t detail how to implement the decentralized consensus.

In 2005, reusable proofs of work led to a concept for a cryptocurrency. But it failed due to relying on trusted computing. Finally, in 2009, Bitcoin was created through a combination of its predecessors!

Why is proof of work such a big deal? It’s because it solves many problems at once!

Simple and effective consensus mechanism. Nodes can all agree on the state of the Bitcoin Ledger
Allows free entry to become a miner, but also prevents Sybil attacks (where you just make a ton of fake accounts). It does so through an economic barrier — where you need computing power

Another alternative is Proof of Stake — the weight of the node is proportional to how much currency it holds. Both approaches are the backbone of cryptocurrencies.

Bitcoin As A State Transition System

You can see Bitcoin from the perspective that it’s a state transition system:

You have one state (a ledger of balances) → Transition into → a new state with an updated ledger of balances.

We can even define the system formally: “APPLY(S, TX) → S’ or ERROR”. Where Bitcoin is the APPLY function. ERROR occurs if we’re trying to send more funds than we actually have.

The State of the Bitcoin is just a collection of coins or UTXOs — but there is no list of owners and balances! Just the UTXOs. These UTXOs will say who their owner is (and thus that owner is able to actually spend it) + how much they’re worth.

Inputs of the transaction = reference to an existing UTXO + the cryptographic signature of the owner. The output of the transaction is the new UTXO.

So the process of the APPLY(S, TX) → S’ is:

For each input in TX: Return an error if → UTXO isn’t in S (can’t send coins that don’t exist) or the signature doesn’t match the owner of the UTXO (can’t send coins that aren’t yours)
If the sum of denominations (value) of input UTXO < sum of denominations of output UTXOs (conservation of value) return error
Return S’. Remove input UTXOs + Add output UTXOs
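
The three steps above can be sketched in Python. This is only a toy model: the names UTXO, TxInput, Tx, and apply_tx are made up for illustration, and the "signature" is just a string compared against the owner, standing in for real cryptographic verification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UTXO:
    owner: str  # stand-in for a public key / address
    value: int  # denomination


@dataclass(frozen=True)
class TxInput:
    utxo_id: str    # reference to an existing UTXO
    signed_by: str  # stand-in for a real cryptographic signature


@dataclass(frozen=True)
class Tx:
    tx_id: str
    inputs: tuple   # tuple of TxInput
    outputs: tuple  # tuple of UTXO


def apply_tx(state: dict, tx: Tx) -> dict:
    """APPLY(S, TX): return S', or raise ValueError for the ERROR case."""
    spent, total_in = set(), 0
    for inp in tx.inputs:
        # Step 1a: the referenced UTXO must exist (and not be reused in this tx)
        if inp.utxo_id not in state or inp.utxo_id in spent:
            raise ValueError("input UTXO is missing or referenced twice")
        # Step 1b: the signature must match the UTXO's owner
        if state[inp.utxo_id].owner != inp.signed_by:
            raise ValueError("signature does not match the UTXO owner")
        spent.add(inp.utxo_id)
        total_in += state[inp.utxo_id].value
    # Step 2: conservation of value
    if total_in < sum(out.value for out in tx.outputs):
        raise ValueError("outputs are worth more than inputs")
    # Step 3: S' = S minus the input UTXOs plus the output UTXOs
    new_state = {k: v for k, v in state.items() if k not in spent}
    for i, out in enumerate(tx.outputs):
        new_state[f"{tx.tx_id}:{i}"] = out
    return new_state
```

For example, if the state holds one 50-unit UTXO owned by "alice", a transaction that spends it into 30 for "bob" and 19 back to "alice" is accepted, and the leftover 1 unit is never assigned to anyone in S’.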

Mining:

We can’t just save the ledger on a server — that would be centralized. Instead, we need to make sure this whole thing is decentralized. So we gotta make sure a ton of nodes agree on a common history.

We do this by making the nodes constantly trying to create new blocks to add to the longest chain. A new block is created every ~10 minutes. The inside of the block contains:

Timestamp
Nonce
List of the transactions since the previous block
Reference to (i.e. the hash of) the previous block

It’s that last part that chains these blocks together.

There’s an algorithm that checks if the blocks are valid:

Check if referenced (previous) block actually exists and is valid
Check if the timestamp of the block is greater than the previous block’s timestamp and less than 2 hours in the future
Check if PoW is valid
Check if the individual transactions within the block are valid

We can see here that Bitcoin isn’t keeping track of the state in any way — it’s just keeping confirmation that the transactions were valid and happened. The only way to actually get the state is by computing everything from the genesis block to the current block. Also: it matters what the transaction order within the blocks is.

To adjust the difficulty of the PoW, we require the hash that’s produced to have a certain number of leading 0’s → Which translates to getting the output to be less than a certain number. As of October 15th, it was 2¹⁸⁷, meaning that you need to try ~2⁶⁹ tries before getting a valid hash. The number is adjusted every 2016 blocks.

This is why proof of work requires so much work. The hash function is irreversible, thus you have to brute force the hell out of guessing the nonce (the work) until you finally find the valid hash!
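
Here is a minimal sketch of that brute-force search in Python. The header layout (strings joined with "|") and the function names are assumptions for illustration, not Bitcoin's real serialization, and the target is made tiny so the loop finishes quickly. With a target of 2¹⁸⁷ out of 2²⁵⁶ possible hashes, the same loop would need about 2²⁵⁶ / 2¹⁸⁷ = 2⁶⁹ tries on average.

```python
import hashlib


def pow_hash(prev_hash: str, payload: str, nonce: int) -> int:
    """Double SHA-256 of a toy header, read as a 256-bit integer."""
    header = f"{prev_hash}|{payload}|{nonce}".encode()
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "big")


def mine(prev_hash: str, payload: str, target: int) -> int:
    """Try nonces 0, 1, 2, ... until the hash falls below the target."""
    nonce = 0
    while pow_hash(prev_hash, payload, nonce) >= target:
        nonce += 1
    return nonce


# Toy difficulty: a target of 2**244 means roughly 2**12 tries on average.
TOY_TARGET = 2 ** 244
nonce = mine("00" * 32, "alice pays bob 1", TOY_TARGET)
print("found nonce", nonce)
```

Checking the work is a single call to pow_hash, which is the asymmetry that makes the scheme useful: expensive to produce, cheap to verify.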

If you mine the block you get rewarded with Bitcoin. Also, if a transaction’s inputs are worth more than its outputs, the difference goes to the miner as the transaction fee.

If we wanted to attack the network (aka send money to someone, and then redo that transaction by building another fork of blockchain), we would need the majority of the computing power.

This is because while everyone else is working on the valid chain (where the attacker sent the money to the other person), the attacker has to single-handedly build a chain that becomes longer than the valid chain. This isn’t possible unless you have the majority of the computing power.

Merkle trees:

Now the block header doesn’t hold all our transactions. Instead, the transactions are stored in a Merkle tree and only its root goes in the header, which saves space for anyone who only needs the headers! A Merkle tree is a binary tree where each node is just the hash of its children, except the leaves, which hold the transactions. The root node (the final hash) is the only part that’s included in the block header.

Now to validate a transaction, you can download a pruned version of the Merkle tree, and then compute the hashes to see if the root node matches your newly computed hash node.
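
A small Python sketch of that check, assuming SHA-256, a non-empty list of transactions, and the common convention of duplicating the last node when a level has an odd count. The function names are made up for illustration. The "pruned tree" is the list of sibling hashes along the path from the leaf to the root.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    """Hash neighbouring pairs; an odd level reuses its last node."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(txs):
    level = [h(tx) for tx in txs]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(txs, index):
    """Sibling hashes from leaf to root, each tagged with 'sibling is on the left'."""
    level, proof = [h(tx) for tx in txs], []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1  # the other node of the pair
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(tx, proof, root):
    """Recompute the path to the root using only the tx and its proof."""
    node = h(tx)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The proof has about log2(n) hashes, so a light client can check one transaction out of thousands with a few dozen hashes instead of the whole block.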

If we were to store the full blocks, that would take about 15GB of disk space as of April 2014, and grow at ~1GB per month. Instead, there’s a protocol called Simplified Payment Verification (SPV), where light nodes download only the block headers and use Merkle tree branches to check the transactions they care about!

Alternative Blockchain Applications:

There are other applications to blockchain than just transactions!

Namecoin: Where you can register your name and no one else can take it.
Coloured Coins: Where you can create your own coins (or NFTs). You colour the coins by publicly announcing that certain UTXOs have a certain colour, and then you find the colour of any other UTXO by recursively tracing back through the transactions that created it
Meta Coins: Protocols that live on top of Bitcoin where there are different state transitions. You can’t implement complex transactions, but you can do simple ones.

So there are generally 2 approaches to building a consensus protocol:

Build a completely new one: Increases flexibility, but is incredibly difficult to implement. Not all applications are able to muster the support to build their own blockchain
Build one on top of Bitcoin: No need to start from scratch — but there are limitations in terms of complexity and scalability. They don’t inherit SPV, thus can’t scale.

Scripting:

With Bitcoin, we can actually implement smart contracts! This means that in order for a transaction to pass it must be verified by the smart contract itself. We can do things like Multisig (multiple parties confirming for something to pass) or decentralized cross-cryptocurrency exchanges.

But there are limitations:

Lack of Turing Completeness: Namely missing loops (left out to avoid infinite loops), which means code has to be repeated and scripts become space-inefficient
Value-blindness: So we can’t fine-grain control the amount that can be withdrawn. UTXOs are All or nothing
Lack of State: UTXOs are spent or not spent. So it’s hard to have states between
Blockchain-blindness: UTXOs can’t even see the blockchain data

Given all those limitations, there are 3 ways to build more advanced applications: Build a new blockchain, script on top of Bitcoin, or build a meta-coin on top of Bitcoin. Each has its drawbacks.

Introducing Ethereum! Ethereum makes it much easier for people to build applications on top of it while being tied with an economic environment, and security of the blockchain.

Ethereum:

Ethereum is all about providing a platform to build Dapps! They optimize for rapid development times, security, and efficient interactions. Ethereum is a blockchain with a built-in Turing-complete programming language, and its contracts are value-aware, blockchain-aware, and state-aware, so we can create whatever Dapp we want! (Namecoin = 2 lines of code, currencies and reputation systems = 20 lines of code).

Ethereum Accounts:

The state in Ethereum is a bit different than in Bitcoin — there are accounts! These accounts are 20-byte addresses with 4 fields within them:

Nonce (to act as a counter so you can’t spam the same payment)
Ether balance (Ether is the crypto-fuel of Ethereum; it’s what pays the transaction fees)
Contract code
Storage

There are 2 types of accounts:

Externally owned accounts: Controlled by private keys/people. Externally owned accounts don’t have code + can send messages (aka transactions)
Contract accounts: Controlled by contract code. When they get a message they activate and execute their code. Contracts != those paper contracts. Rather they’re like little robots that execute their code but also have their own ether balance

Messages and Transactions:

Transaction in Ethereum means it was signed and sent by a person. This transaction contains:

Recipient of message
Signature of sender
Amount that is transferred
Optional data field
STARTGAS = the maximum amount of Gas the transaction can use
GASPRICE = the fee per computational step

The first 3 = standard fields of cryptocurrencies. Data field = does nothing by default, but the smart contract is able to read this data and do things with it.

STARTGAS and GASPRICE are there to prevent denial-of-service attacks (ex: someone who runs an infinite loop). Since each computation will cost money (with different computations costing different amounts), an attacker must pay proportionally to all the resources they use.

Messages:

Contracts can actually send messages to other contracts too! They’re called… Messages. Messages contain:

Sender of message (implicit)
Recipient of message
Amount of ether transferred
Optional Data Field
STARTGAS

It’s literally the same as a transaction — except sent by a contract rather than a person. These are executed through the CALL opcode.

Note: for STARTGAS, this is the total amount of Gas for both the transaction and all its sub-executions.

Ethereum State Transition Function:

Here are the steps for the APPLY(S, TX) → S’ for Ethereum:

Check if transaction is filled out, a valid signature, nonce = sender’s nonce
Calculate the transaction fee (STARTGAS * GASPRICE), find the sender address (from the signature). Subtract transaction fee from balance + increase the nonce
Initialize GAS = STARTGAS. Subtract some gas per byte to pay for the transaction
Send money to the receiver. If the receiver doesn’t exist, create an account. If the receiver is a contract run it → Until completion or runs out of gas
If at any time there’s a failure, revert everything, except for the gas to pay for the fees → which are sent to the miner
Else, refund the remaining gas fee to the sender → Send fees used to the miner

If we’re sending to another person (so there’s no contract code to run), then the transaction cost would only be GASPRICE * length of the transaction (in bytes). Also, messages work like transactions in terms of their reverts: if a message runs out of gas, that message + all its sub-executions are reverted, but the parent execution that sent it is not.
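
The state transition steps above can be sketched in Python as a toy model. Everything here is an assumption made for illustration: the Account class, the dict-based transaction, the GAS_PER_BYTE value, and the convention that contract code is a Python function taking (state, gas) and returning the gas left over. Signature checking is left out.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Optional

GAS_PER_BYTE = 5  # assumed per-byte charge, not a real protocol constant


class ExecutionFailed(Exception):
    """Raised when execution runs out of gas or otherwise fails."""


@dataclass
class Account:
    nonce: int = 0
    balance: int = 0
    code: Optional[Callable] = None  # code(state, gas) -> gas left over


def apply_tx(state, tx, miner):
    """Return the new state S'; raise ValueError if the tx is invalid."""
    state = copy.deepcopy(state)  # never mutate the caller's state
    sender = state.get(tx["sender"])
    if sender is None or tx["nonce"] != sender.nonce:
        raise ValueError("unknown sender or wrong nonce")
    fee = tx["startgas"] * tx["gasprice"]
    if sender.balance < fee:
        raise ValueError("sender cannot cover STARTGAS * GASPRICE")
    sender.balance -= fee  # pay the maximum fee up front
    sender.nonce += 1
    checkpoint = copy.deepcopy(state)  # what we fall back to on failure
    gas = tx["startgas"] - GAS_PER_BYTE * len(tx["data"])
    try:
        if gas < 0 or sender.balance < tx["value"]:
            raise ExecutionFailed
        sender.balance -= tx["value"]
        receiver = state.setdefault(tx["to"], Account())  # create if missing
        receiver.balance += tx["value"]
        if receiver.code is not None:
            gas = receiver.code(state, gas)  # may raise ExecutionFailed
    except ExecutionFailed:
        state, gas = checkpoint, 0  # revert everything except the fee
    state[tx["sender"]].balance += gas * tx["gasprice"]  # refund unused gas
    used = tx["startgas"] - gas
    state.setdefault(miner, Account()).balance += used * tx["gasprice"]
    return state
```

Taking the checkpoint after the fee and nonce update is what makes a failed transaction still cost the sender: the transfer and any contract effects disappear, but the fee stays with the miner.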

Code Execution:

All code in Ethereum is written in low-level stack-based bytecode. There are 3 types of places we can store data:

Stack: Last-in-first-out container (reset after computation ends)
Memory: Infinitely expandable array (reset after computation ends)
Storage: Long-term storage, key/value storage. This is kept on the blockchain

The EVM execution model is pretty simple. There’s just a giant tuple that’s being updated constantly: (block_state, transaction, message, code, memory, stack, pc, gas).

Every round of execution = executing the instruction at the pc-th byte of the code → Where each instruction has its own rule for how it affects the tuple (ADD pops the top 2 items from the stack and pushes their sum, reduces gas by 1, and increases pc by 1).
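
To make the loop concrete, here is a tiny stack machine in Python. It is a sketch, not the EVM: the program is a list of tokens rather than bytecode, only PUSH, ADD, and STOP exist, and the flat cost of 1 gas per instruction (with STOP free) is an assumption.

```python
def run(code, gas):
    """Execute a toy program and return (stack, pc, gas) when it stops."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == "STOP":
            break
        if gas < 1:
            raise RuntimeError("out of gas")
        gas -= 1  # every instruction costs 1 gas in this sketch
        if op == "PUSH":
            stack.append(code[pc + 1])  # the operand follows the opcode
            pc += 2
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack, pc, gas


program = ["PUSH", 2, "PUSH", 3, "ADD", "STOP"]
print(run(program, gas=10))  # ([5], 5, 7)
```

Running the same program with gas=2 raises the out-of-gas error at ADD, which is the point where a real execution would be reverted.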

Blockchain and Mining:

Ethereum is much like Bitcoin’s blockchain, except there are some differences of course! The blocks in Ethereum have the transaction list + the most recent state + 2 new values (block number and difficulty). Block validation looks like:

Check if the previous referenced block exists and is valid
Check if the timestamp of the block is greater than the previous block and less than 15 minutes into the future
Check if all the contents of the block (block number, difficulty, transaction root, uncle root, gas limit) are valid
Check if PoW is valid
Execute all the individual transactions + update the state + pay the miner
Check if the Merkle tree root of this state is equal to the final state root of the block header

Storing the whole state in every block seems inefficient, but most of the tree isn’t actually going to change from one block to the next, thus most of the tree is just pointers to the previous block’s data. We do this with a Patricia tree, a modified Merkle tree where we can efficiently insert and delete nodes. This allows for a 5–20x savings in space.

Note: Contracts are executed just like how transactions are executed. They’re just state transitions!

Applications:

There are 3 types of applications for Ethereum:

Financial Applications: Sub-currencies, financial derivatives, hedging contracts, saving wallets, wills, and employment contracts
Semi-Financial Applications: self-enforcing bounties
Not-Financial at all: DAOs and Online voting

Token Systems:

Token systems represent all sorts of things! Tokens can represent assets, smart property, unforgeable coupons, or even things with no conventional value at all.

Fundamentally, token systems are literally: Subtract X units from A and give X units to B — given that A had at least X units, and A confirmed it. Thus it’s super easy to implement:
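
The paper gives this as a few lines of Serpent. Below is a Python sketch of the same logic, with the contract’s storage modelled as a dict and the sender passed in explicitly (on Ethereum the sender comes from the message, which is what “A confirmed it” means). The class and method names are illustrative.

```python
class Token:
    def __init__(self, initial_balances):
        self.storage = dict(initial_balances)  # address -> balance

    def send(self, sender, to, value):
        """Move `value` units from sender to `to` if the sender has enough."""
        if value < 0 or self.storage.get(sender, 0) < value:
            return False
        self.storage[sender] = self.storage.get(sender, 0) - value
        self.storage[to] = self.storage.get(to, 0) + value
        return True


token = Token({"alice": 100})
print(token.send("alice", "bob", 30))  # True
print(token.send("bob", "carol", 31))  # False, bob only has 30
```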

We need to slap on a couple extra functions (like distributing the currency, edge cases, let other contracts query the balance etc) but it’s super simple!

One super cool feature is that users could literally pay transaction fees in our coin! The contract keeps an ether balance, uses it to refund senders for the ether they spend on fees, and refills that balance by collecting fees in our coin and reselling them in a constantly running auction.

Financial Derivatives and Stable-Value Currencies:

Financial Derivatives are perfect for smart contracts and super simple to implement in code. But we might need information that’s not present in the blockchain (like the price of ETH wrt USD). Well, we could have a data feed contract (a trusted party keeps its data updated, and other contracts send it a message and get the price back in the response).

But there’s one problem with crypto-commerce → it’s unstable (merchants don’t like that). One way to get around that is to create a sub-currency and then promise 1 unit of that sub-currency is equal to 1 USD or whatever. But that’s not a great system because you can’t always trust them + real-world factors (banking infrastructure is too weak or hostile).

But we can use financial derivatives as an alternative. Speculators provide the funds to back up the asset, and it’s tied to a smart contract so they can’t back out! (though this will still depend on a centralized company to provide the price, but still a massive improvement on relying on an issuer)

Identity and Reputation Systems:

We can easily implement Namecoin in Ethereum:
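
The paper’s version is a two-line Serpent contract: write the value only if the name is not already taken. A Python sketch of the same first-to-file rule, with storage modelled as a dict and the class name chosen for illustration:

```python
class NameRegistry:
    def __init__(self):
        self.storage = {}  # name -> value (ex: an IP address)

    def register(self, name, value):
        """First registration wins; later attempts for the same name fail."""
        if name in self.storage:
            return False
        self.storage[name] = value
        return True


registry = NameRegistry()
print(registry.register("george", "45.54.12.9"))  # True
print(registry.register("george", "10.0.0.1"))    # False, already taken
```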

We can have a more sophisticated version of it too (let other contracts query it, have a notion of “owner”, reputation and web-of-trust functionality)!

Decentralized File Storage:

Ethereum allows people to rent out their hard drives to others and allows people to store their files in a decentralized manner. We can build it in the following way:

Take all our data and split it up into blocks, which we encrypt
Build a Merkle Tree out of it
A contract will be created: Every N blocks, the contract picks a random index in the Merkle tree.
The contract pays X Ether to the first entity that can give an SPV-like proof that it holds the block at that index in the Merkle Tree
If we want to recover the file, we just use a micropayment channel to pay for downloading it back

To increase security, we can just send the file in many places, and then set up a contract that will only pay the storage node if there’s cryptographic proof they’re still storing it.

Decentralized Autonomous Organizations:

Why don’t we implement a democracy-like system where members vote on how the organization should act (with votes weighted either by shares or equally per member)? This will be enforced with cryptography!

The DAO can decide to spend their funds, modify its code, put up bounties, modify salaries and other random stuff! We can implement a DAO by allowing the code to be self-modifying if 2/3 of the people agree to change it. We can modify smart contracts by making a smart contract call upon code stored in modifiable storage. There are 3 types of transaction types for DAOs:

[0,i,K,V] = Proposal with index i to change address storage K to V
[1, i] = Register a vote in favour of proposal i
[2, i] = Finalize proposal i if there have been enough votes
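
A Python sketch of how a contract could handle those three transaction types. The class, the one-member-one-vote rule, and the return values are assumptions for illustration; votes use tag 1 here so they can be told apart from proposals (tag 0), and the 2/3 threshold follows the text above.

```python
class DAO:
    def __init__(self, members, storage=None):
        self.members = set(members)
        self.storage = dict(storage or {})  # the modifiable "code" lives here
        self.proposals = {}                 # i -> {"key", "value", "votes"}

    def handle(self, sender, msg):
        """Process one message; return True if it was accepted."""
        if sender not in self.members or not msg:
            return False
        kind = msg[0]
        if kind == 0 and len(msg) == 4:  # [0, i, K, V]: new proposal
            _, i, key, value = msg
            if i in self.proposals:
                return False
            self.proposals[i] = {"key": key, "value": value, "votes": set()}
            return True
        if kind == 1 and len(msg) == 2:  # [1, i]: vote in favour
            proposal = self.proposals.get(msg[1])
            if proposal is None:
                return False
            proposal["votes"].add(sender)  # a set, so no double voting
            return True
        if kind == 2 and len(msg) == 2:  # [2, i]: finalize
            proposal = self.proposals.get(msg[1])
            if proposal is None:
                return False
            if 3 * len(proposal["votes"]) < 2 * len(self.members):
                return False  # fewer than 2/3 of members in favour
            self.storage[proposal["key"]] = proposal["value"]
            del self.proposals[msg[1]]
            return True
        return False
```

With three members, a proposal needs two votes before a finalize message succeeds and the storage slot K actually changes to V.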

This has exciting applications and features! We have to implement storage of all these proposals and who voted for them, adding + removing members, and even Liquid Democracy (you can assign another person to vote for you, and that assignment can be transitive).

We could have a decentralized corporation where shares weigh the votes. Here we need asset management, the ability to buy and sell shares, the ability to accept offers, and Liquid-Democracy (aka Board of Directors).

Further Applications:

Saving Wallets: You can save your money in a smart contract instead of your wallet. Cap the amount of money that you can take out of the wallet per day. Can even give this ability to another person (with you having the ability to shut this ability down), or give it to another wallet. This way it’s much safer in case you get hacked
Crop Insurance: We could have a smart contract that automatically pays you given some data from the outside world. This can be precipitation or a natural disaster
Decentralized Data Feed: You can crowdsource people to provide you answers and only pay those who are within the 25th and 75th percentile. If only that range gets paid, everyone would want to only feed you the info that everyone else will feed you, which converges to the truth. Thus this can be used to get information from the outside world
Smart Multisignature Escrow: It’s much more advanced than Bitcoin. Here you can quantify how many keys = how much they can spend, and be asynchronous (versus bitcoin where if you reach a threshold, they could spend all the funds)
Cloud Computing: You can pay people to do computations for you! This could be for projects like SETI, Protein folding or genetic algorithms. You have to build it such that people can’t cheat and you have to verify that they’re doing it all correctly
Peer-to-peer Gambling: This allows for near-zero fees + no ability to cheat in gambling!
Prediction Markets: Given an oracle, we could put together a prediction market
On-Chain Decentralized Marketplaces: Using Identity + reputation system as a base

Miscellanea And Concerns:

Modified GHOST Implementation:

There are 2 problems with blockchains that have fast confirmations times. First is a high stale rate — meaning you did all the work to get the right hash — but it was useless. Since it’s fast to find the block, the bottleneck is because of the propagation through the network. Thus lots of nodes end up wasting computation and not contributing to security.

Plus there’s a centralization problem — the bigger your mining pool, the more efficient you are with your computation. This gives an edge over those with less mining power. Thus when you combine those problems together, fast confirmation times allow for a mining pool to have a monopoly over the mining process.

To tackle the first problem we use GHOST, Greedy Heaviest Observed Subtree. We include the stale blocks into the calculation of which chain should be trusted the most. If a chain has more “uncles” then it will be weighted more.

To solve the second issue, we can reward stales (87.5% for the uncle block, and 12.5% for the nephew block). But we don’t reward transaction fees.

Ethereum implemented a simplified GHOST which goes down 7 levels. It looks like:

Blocks will specify a parent and 0/more uncles
Uncles must have the properties of: Direct child of the 2nd to 7th ancestor of the current block. Can’t be an ancestor of the current block. This uncle must be unique within the block, and within the blockchain
If you include an uncle, you get 3.125% of the added coinbase reward. The uncle gets 93.75%

We go up to 7 generations because if it’s unlimited there are too many complicated calculations to make sure the uncle is valid, plus you could start incentivizing the miner to mine the chain of an attacker.

Fees:

If you publish a transaction on the blockchain — the whole network has to download and verify it. Someone could use this to attack the network, thus we need to regulate it.

The way that Bitcoin does it is just let the transactions send an optional fee to pay to the miners (thus if you aren’t including a fee, you end up waiting a long time / never getting your transaction through). People like this because it’s like having a market of miners and transactions that find the perfect balance of a fee to pay.

But the problem with this model is that you’re not taking into account all the other nodes who also have to download and verify it. Almost all the computation is actually the other nodes, not the miner. Thus you can get a tragedy-of-the-commons situation here.

But there’s a way around it. Let’s say there are k operations you need to do, a reward R per operation, there are N miners, and C is the cost per operation. When the miners are looking at the blocks they can see that they have an expected payout of kR/N while costing them kC.

So they could literally just not bother with including any transaction where kC > kR/N → In other words, miners will only include transactions where R > NC. That solves the problem!
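
The inequality is easy to check with numbers. A tiny Python sketch, with a made-up function name and all miners assumed to have equal power (the idealized case described above):

```python
def worth_including(k, reward_per_op, cost_per_op, n_miners):
    """A miner includes a transaction when its expected reward beats its cost."""
    expected_reward = k * reward_per_op / n_miners  # 1/N chance of mining the block
    cost = k * cost_per_op                          # the miner always pays to process it
    return expected_reward > cost


# k cancels out, so only R > N * C matters
print(worth_including(k=100, reward_per_op=5, cost_per_op=1, n_miners=4))   # True:  5 > 4
print(worth_including(k=100, reward_per_op=5, cost_per_op=1, n_miners=10))  # False: 5 < 10
```

Since NC is the total cost of all N nodes processing one operation, a fee that clears the bar for one miner also covers the cost imposed on everyone.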

But reality deviates:

The miner pays a higher cost than verifying nodes. Since verification takes time, which increases the chance of a block being stale
Non-mining full nodes exist
Mining power is not equal across all nodes
Attackers can still set up contracts whose cost is super low

1 and 2 at least partially cancel out (the miner includes fewer transactions, but NC increases). But 3 and 4 are problems. We can handle them by capping the number of operations that a block can do by:
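
In the whitepaper, the cap is a long-running exponential moving average: each block’s limit is mostly its parent’s limit, nudged toward a multiple of how many operations the parent actually used. A minimal Python sketch of that rule; the default values for ema_factor and blk_limit_factor are placeholders chosen for illustration, not protocol constants.

```python
def next_oplimit(parent_oplimit, parent_opcount, ema_factor=1024, blk_limit_factor=1.5):
    """Operation cap for a block, computed from its parent block."""
    target = int(parent_opcount * blk_limit_factor)  # floor of the scaled usage
    return (parent_oplimit * (ema_factor - 1) + target) // ema_factor


# With a small ema_factor the cap moves quickly, which makes the effect visible:
print(next_oplimit(1000, 1000, ema_factor=4))  # 1125, usage is high so the cap rises
print(next_oplimit(1000, 0, ema_factor=4))     # 750, an empty block pulls the cap down
```

A large ema_factor means a single block can only move the cap a little, so an attacker has to fill (and pay for) many blocks to raise it meaningfully.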

Where these values are constantly changing with time.

Also, having bigger blocks means you increase the propagation time, meaning you increase the chance of it turning stale. This is a problem in Bitcoin, but not as much in Ethereum since there’s the GHOST protocol.

Computation and Turing-completeness:

The Ethereum Virtual Machine is Turing Complete! In order to do loops, there’s a JUMP instruction that jumps to a previous part of the code (and a conditional jump → JUMPI). Also, contracts can call other contracts, which could create an infinite loop through recursion.

We get around malicious infinite loops by having a max number of computational steps we can do. If we exceed that, we throw an error and everything is reverted. So here are a few scenarios in which we can attack the network, and how we can prevent it:

Basic infinite loop: We just revert it as usual and take the computation fee
A suuuuuuuppper long infinite loop that would only complete after a few blocks have been created (and thus the block becomes invalid): We just look at the STARTGAS (aka a number of steps we’re taking). If it’s too long we just don’t bother mining it
One where the attacker gives just enough gas to withdraw money from a contract but not enough to record the withdrawal in the balance: The whole thing is reverted anyway so no worries
If we’re calling other contracts to aggregate their data, and an attacker takes over one of those contracts and turns it into an infinite loop to force an error: We set a gas limit on each message, so only that call fails

But what’s the big deal about Turing-complete languages? Turing-incompleteness (no loops) isn’t actually a neat fix: you can still chain contracts that call other contracts to blow up the number of computational steps. To rule that out, you’d have to forbid contracts that create other contracts (not to mention we can’t always tell ahead of time which contracts a contract will call). So Turing-incompleteness is just harder to implement with all its edge cases.

Thus it’s much easier to implement a Turing-complete Language!

Currency And Issuance:

It’s not just Ether though, there are sub-units (think of it like dollars and cents) of Ether:

1: Wei → Fees and protocol implementations
10¹²: Szabo → Fees and protocol implementations
10¹⁵: Finney → Microtransactions
10¹⁸: Ether → Normal transactions
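
Because every unit is just a power of ten of wei, converting is a multiplication. A small Python sketch using the four units listed above; the function names are made up, and Decimal is used so that amounts like 1.5 ether don’t pick up floating-point error.

```python
from decimal import Decimal

UNITS = {"wei": 1, "szabo": 10**12, "finney": 10**15, "ether": 10**18}


def to_wei(amount, unit):
    """Convert an amount in the given unit to an integer number of wei."""
    return int(Decimal(str(amount)) * UNITS[unit])


def from_wei(wei, unit):
    """Convert an integer number of wei to the given unit."""
    return Decimal(wei) / UNITS[unit]


print(to_wei("1.5", "ether"))         # 1500000000000000000
print(from_wei(2 * 10**15, "szabo"))  # 2000
```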

The Issuance model is:

The Ethereum Foundation will sell a ton of Ether (1000–2000 per BTC) → Which is used to pay for the development + salaries + investments. If you buy it early you get a discount. Call the total amount sold x
0.099x = Pay back early contributors
0.099x = long term reserve
0.26x = given to miners per year forever

Now there are some design choices here:

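There’s an endowment pool: If there wasn’t an endowment pool, issuance would have to be lowered to keep the same inflation rate, and the sale would simply bring in more BTC instead. That’s economically equivalent, except the organization would hold only BTC → But they’d rather hold ETH too, so they have an incentive to support its value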
The supply grows linearly: This is to prevent the excessive hoarding that’s present in Bitcoin — everyone has an opportunity to get ETH in the present and future. But there’s still an incentive to hold ETH because the growth rate tends toward 0. But we still add more because we’re going to lose ETH as a function of time
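
The “linear supply, shrinking growth rate” point can be checked with the numbers from the issuance model: the sale is x, the two 0.099x allocations bring the starting supply to 1.198x, and miners add 0.26x every year. A short Python sketch with illustrative function names:

```python
def supply(x, years):
    """Total ether after a number of years, for a sale of size x."""
    return x * (1.198 + 0.26 * years)


def growth_rate(years):
    """Fraction by which the supply grows during the following year."""
    return 0.26 / (1.198 + 0.26 * years)


for n in (0, 1, 5, 50):
    print(n, round(supply(1, n), 3), f"{growth_rate(n):.1%}")
```

The same 0.26x is added every year, but it is divided by an ever larger total, so the percentage growth keeps falling toward 0 without the supply ever being capped.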

ETH will probably switch to Proof-of-stake, which will decrease the issuance rate to somewhere between 0 and 0.05x per year. Also, if the Ethereum organization loses funding or something like that, anyone can create a future candidate version of Ethereum, as long as they comply with the social contract: the total amount of ETH that can ever be produced is capped.

Mining Centralization:

Bitcoin, surprise, has some problems with its mining algorithm which makes it converge to centralization. First of all, we have super-specialized hardware for specifically Bitcoin mining that’s thousands of times faster than normal hardware — increases the barrier to entry, thus increasing centralization.

Plus, most Bitcoin miners don’t even verify the blocks themselves; they rely on a mining pool to give them the block headers. The top 3 mining pools control more than 50% of the processing power, thus could technically attempt a 51% attack.

Ethereum works differently to avoid those problems. Miners need to grab random data from the state, compute some randomly selected transactions from the last N blocks, and return the hash of the result.

Ethereum avoids the first problem because contracts can contain any kind of computation, thus you can’t build super-specialized hardware for it. We avoid the second problem because miners need to store the entire blockchain themselves, which removes the need for mining pools to do the verification.

“Thus, the solution that we are developing is ultimately an adaptive economic human solution rather than purely a technical one”

Scalability:

Scalability is a big problem for both Bitcoin and Ethereum. Bitcoin is currently 15GB, grows at 1MB per hour. But if it were processing things like VISA, it would be 1MB per 3 seconds, 1 GB per hour, and 8 TB per year.

Ethereum is likely to suffer from the same problem: There are so many applications on top of Ethereum (it’s not just a coin (unlike Bitcoin)), but we only need to store the state instead of the entire history so that helps it a bit.

What’s the big deal with big blocks? Centralization. If the blockchain size was like 100TB, only a few people can actually run full nodes. The rest will run SPV. This means that the big nodes could literally collude and cheat. SPV nodes wouldn’t be able to detect until it's too late. At that point, the only way to reverse would be for the light nodes to agree and do a 51% attack against the fraudulent blocks.

But Ethereum avoids that problem. First, all miners must be full nodes (because of the mining algorithm). Secondly, in between each transaction, we provide an intermediate state root. That way, as long as 1 honest node exists, they can prove that the transaction is invalid and propagate that to the rest of the network.

An attacker could get around that by flooding it with a ton of incomplete blocks, but there’s a protocol where you need to prove the block is valid before processing it [I don’t completely understand this part]

Conclusion:

Ethereum is the upgraded version of cryptocurrency that allows for the existence of smart contracts and Dapps through a Turing-Complete programming language.

But what’s interesting is that Ethereum can do more than just money. We can do decentralized file storage, computation, prediction markets and so much more! It adds an economic layer between peer-to-peer protocols and helps fields that don’t even relate to money!

Ethereum is the arbitrary state transition function that can support applications that are both financial and non-financial for many years to come!

If you want to find out more: Read the paper here!

Thanks for reading! I’m Dickson, an 18-year-old Crypto enthusiast who’s excited to use it to impact billions of people 🌎

If you want to follow along on my journey, you can join my monthly newsletter, check out my website, and connect on LinkedIn or Twitter 😃