In recent years web3 has become increasingly relevant and, as with every buzzword, many companies and start-ups have started developing solutions based on it.
Consequently, there has also been an increase in the number of attacks and vulnerabilities found in such projects, for example Saurik’s write-up on Optimism, the PolyNetwork hack, the Ronin validator compromise, and many more.
In this post we will scratch the surface of the topic, limiting our focus to the Ethereum blockchain. We will take a look at the EVM bytecode, and learn how to reverse and emulate a smart contract with Qiling.
NOTE: If you already grasp the basic concepts of Ethereum and smart contracts, feel free to skip this introduction and jump to the juicy stuff below.
The Ethereum technology is basically a distributed state machine.
Ethereum’s global state is a large data structure which changes to a new state from block to block. The Ethereum Virtual Machine (EVM) is what defines the rules for computing a new valid state based on the global consensus.
A “smart contract” is a collection of code (its functions) and data (its state) that resides at a specific address on the Ethereum blockchain and it is executed on the EVM.
Each computer on the network (aka “node”) stores a copy of all the existing smart contracts and their current state alongside the blockchain and transaction data.
User accounts can interact with a smart contract by submitting a transaction which executes a function defined in its code.
When a smart contract receives a transaction from a user, its code is executed by all the nodes in the network in order to reach a consensus on the outcome and update the state.
Smart contracts are called contracts because they define rules, like a regular contract, and automatically enforce them via the code. Smart contracts cannot be deleted, and interactions with them are irreversible; however, it’s possible to end (kill) a contract. When a contract is killed, users can’t interact with it anymore, but its code and state will still be visible on the blockchain.
Moreover, since contracts are stored in the blockchain, any interested party can inspect each contract’s bytecode and its current state.
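To make the "distributed state machine" idea more concrete, here is a minimal Python sketch. It is a toy model and not how Ethereum is implemented: the names Tx, apply_tx and apply_block are made up for illustration, and a real state transition also deals with gas, nonces, signatures and contract code. The only point is that nodes applying the same transactions in the same order deterministically reach the same state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """Hypothetical, heavily simplified transaction: a plain value transfer."""
    sender: str
    recipient: str
    value: int


def apply_tx(state: dict, tx: Tx) -> dict:
    """Return the next state; an invalid transaction leaves the state unchanged."""
    if tx.value < 0 or state.get(tx.sender, 0) < tx.value:
        return state
    new_state = dict(state)  # never mutate the previous state in place
    new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.value
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.value
    return new_state


def apply_block(state: dict, block: list) -> dict:
    """Apply every transaction of a block, in order."""
    for tx in block:
        state = apply_tx(state, tx)
    return state


genesis = {"alice": 10, "bob": 0}
block = [Tx("alice", "bob", 3), Tx("bob", "carol", 1), Tx("carol", "alice", 5)]

# Two independent "nodes" replay the same block and must agree on the result.
node_a = apply_block(genesis, block)
node_b = apply_block(genesis, block)
print(node_a == node_b, node_a)
```

The last transfer is rejected because carol does not hold enough funds, and both nodes reject it in the same way.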
Smart contracts can be programmed using relatively developer-friendly languages: Solidity, Vyper, Yul, Fe, etc. Solidity is by far the most widely used, so we will use it in this article.
In order to deploy and execute a smart contract on the EVM, it needs to be compiled into EVM bytecode. This is done by the solc compiler.
Here is what the source code of a smart contract looks like.
|
|
The Greeter contract is made of two publicly accessible functions: the constructor and a greetings function.
The constructor is called when the contract is deployed. The greetings function, instead, can be called by wallets or other smart contracts.
The contract can be compiled with solc as follows:
solc Greeter.sol -o . --bin --bin-runtime --abi --hashes
This will create 4 files:
1. Greeter.bin-runtime: contains the smart contract code which is executed by the EVM, a stack-based virtual machine (like the Java VM or WebAssembly), meaning that instruction operands are taken from the stack and results are placed on the stack.
2. Greeter.bin: contains the bytecode used to deploy the smart contract code on the blockchain, which is also executed by the EVM but just once, during the contract creation transaction.
3. Greeter.signature: contains the signatures for the functions defined inside the smart contract. Such signatures are computed by taking the first 4 bytes of the keccak256 hash of the function name and the argument types (while argument names are ignored). Like every hash function, keccak256 is not invertible, but there are lookup databases like 4byte.directory or the Ethereum 4bytes list where it is possible to search known hashes and get the function names and argument types.
4. Greeter.abi: contains the Application Binary Interface (ABI) that specifies how to interact with a specific contract. This includes the method names, parameters, constants, data structures, event types (logs), and everything else needed to interact with the contract.
It should be noted that 1 and 2 are always public, as they are stored on the blockchain, while 3 and 4 might be released publicly by the developer, but this is not mandatory.
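To see how such a signature is derived, here is a minimal, dependency-free sketch in Python. Note that Python's hashlib.sha3_256 is not the same function: Ethereum uses the original Keccak padding, so the sketch carries its own small Keccak-256 implementation, written for readability rather than speed. The helper names keccak256 and selector are our own and not part of solc or of any library; in a real script you would probably rely on a maintained library instead.

```python
MASK64 = (1 << 64) - 1


def rol64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane to the left."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(lanes):
    """Apply the 24 rounds of Keccak-f[1600] to a 5x5 matrix of 64-bit lanes."""
    lfsr = 1
    for _ in range(24):
        # theta: mix every lane with the parity of two neighbouring columns
        col = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
               for x in range(5)]
        d = [col[(x + 4) % 5] ^ rol64(col[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step, applied row by row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR a round constant (generated by an 8-bit LFSR) into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by Ethereum (domain byte 0x01, not SHA-3's 0x06)."""
    rate = 136  # bytes absorbed per permutation (1088 bits)
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x01
    padded[-1] ^= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = keccak_f1600(lanes)
    # The 32-byte digest is the first four lanes, serialized little-endian.
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def selector(signature: str) -> str:
    """First 4 bytes of keccak256 over 'name(type1,type2,...)', as hex."""
    return keccak256(signature.encode("ascii"))[:4].hex()


print(selector("transfer(address,uint256)"))  # well-known ERC-20 selector: a9059cbb
```

The string passed to selector must contain only the function name and the canonical argument types, without spaces or argument names, for example selector("greetings(string)").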
As an example, this is what the Greeter contract looks like once deployed on the Ethereum testnet blockchain Rinkeby in the 0x6409aed8d4994bd55400d6531d0607f7e90dac95c739f0684898cd5cbde2720b transaction.
In the “Input Data” field there is the EVM bytecode used to deploy the smart contract, the one executing the constructor function.
The same thing can be seen under the “State” tab. The argument used while deploying the Greeter contract was: Hello.
The actual contract bytecode is also public and can be retrieved at the contract address 0x08eda332751362cfeda082e5861879a0f7ad54c5.
As we have seen from the Blockchain Explorer, the smart contract bytecode is publicly available.
By comparing it with the bytecode that was compiled by solc we can see that there are only small differences (as the deployed one contains the address and some additional metadata).
Using pyevmasm it is possible to disassemble the bytecode stored on the blockchain.
Let’s simply save the code to Greeter_bc.bin-runtime and run the following command:
evmasm -d -i Greeter_bc.bin-runtime -o Greeter.evm
This prints the first 10 lines of the disassembled EVM bytecode as a sequence of textual representations of EVM instructions.
You can see a complete reference of every EVM instruction / opcode at EtherVM.io.
SPOILER alert: Reading the disassembled bytecode is hard!
Understanding the program logic is not trivial and requires a deep understanding of how the EVM works. On the other hand, it is the source of truth we can refer to if more advanced techniques are failing.
Luckily, there are some decompilers, one of them being panoramix, which we can use to decompile the bytecode to almost human-readable code. The resulting code is far from identical to the original, but it is way more readable than the bytecode while retaining the same behaviour.
In the above screenshots 3 objects are defined:
- storage.
- _fallback function that will be executed when the contract is called with a function signature that is not defined in its code.
- unknownb19f4ce3 function that will be executed when the 0xb19f4ce3 signature is called.
If you are wondering what happened to the constructor, nice catch!
The constructor is missing because, as we have seen before, it is executed during the contract deployment, thus it is stored in the input data field of the transaction that created the contract itself.
It can be verified by copying the input data from this transaction and decompiling it with panoramix.
The line stor0[] = Array(len=mem[224], data=mem[256 len mem[224]]) takes the constructor’s parameters and stores them into the contract storage.
Finally, the last return returns the contract bytecode that will be stored on the blockchain with the contract creation transaction.
If you are a regular customer of this blog, you should be familiar with my obsession with the Qiling framework and how I love to turn reversing into emulation.
We can use Qiling to emulate the EVM bytecode and debug it instruction-by-instruction.
First of all let’s make sure to have the EVM Engine installed.
Let’s create a Python script in the same folder as the Greeter smart contract bytecode (Greeter_bc.bin).
We will start by importing the required classes and functions.
|
|
In order to emulate the smart contract we need to read its bytecode and initialize the Qiling engine.
NOTE: we are opening the file containing the input data for the contract creation transaction; this way we will emulate the constructor as well.
|
|
Let’s create an Ethereum account for us and one for the contract we are going to emulate.
Then we can create a transaction/message from our user to the “void” address to “deploy” the contract.
|
|
Now the contract is ready to be used.
We can create a message to call the greetings(string) function that has 0xb19f4ce3 as signature.
Finally, we retrieve the output data, decode it, and print it.
|
|
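To clarify what the call message carries, the following standalone sketch builds the input data for greetings(string) by hand and decodes a string return value, following the Solidity ABI rules for a single dynamic string argument. It deliberately does not use Qiling: encode_string_call and decode_string_return are hypothetical helpers of our own, and the 0xb19f4ce3 selector is simply the value quoted above.

```python
SELECTOR = "b19f4ce3"  # greetings(string), as quoted in the article


def word(value: int) -> bytes:
    """Encode an integer as a 32-byte big-endian ABI word."""
    return value.to_bytes(32, "big")


def encode_string_call(selector_hex: str, text: str) -> bytes:
    """selector | offset to the string | string length | data padded to 32 bytes."""
    raw = text.encode("utf-8")
    padding = (-len(raw)) % 32
    return (bytes.fromhex(selector_hex)
            + word(0x20)      # the string data starts 32 bytes into the arguments
            + word(len(raw))  # length of the string in bytes
            + raw + bytes(padding))


def decode_string_return(data: bytes) -> str:
    """Decode ABI-encoded return data holding a single string."""
    offset = int.from_bytes(data[0:32], "big")
    length = int.from_bytes(data[offset:offset + 32], "big")
    return data[offset + 32:offset + 32 + length].decode("utf-8")


call_data = encode_string_call(SELECTOR, "World")
print("0x" + call_data.hex())
# A returned string uses the same layout as the arguments, without the selector.
print(decode_string_return(call_data[4:]))
```

The resulting message is 100 bytes long: 4 bytes of selector followed by three 32-byte words.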
Time to run the script!
thezero@web3:~/greeter$ python3 Greeter.py
User address: 891b98a7aea5ca40ea8f4ce9b012304eb0ebf5e7
Contract address: e265bf7a8b8afc5249a6cf80e997ae40ba50fef7
Calling 0xb19f4ce3 with "World" parameter
Hello World
🎉🎉🎉
The complete code is available at Greeter.py.
OpenZeppelin’s Ethernaut is a Web3/Solidity based wargame played in the Ethereum Virtual Machine. Each level is a smart contract that needs to be hacked.
In this chapter we will analyze the 8th level, named Vault. This challenge resembles a crackme, so it is perfect to showcase how emulation through Qiling can help.
An instance of the Vault smart contract is live here (but it is suggested to deploy your personal one).
The following is the Solidity source code for it:
|
|
The locked and password variables defined inside the contract body make up the contract’s storage.
The constructor accepts a bytes32 variable called _password and assigns it to the password variable in the storage. Moreover, it sets the locked variable to true.
Having the source code makes the challenge really easy, as we know we could retrieve the constructor arguments from the contract creation transaction on the blockchain, where the _password value is stored.
Unfortunately, in a real-world scenario we would not have the source code. Let’s therefore see how we could solve this challenge with a black-box approach in three different ways: disassembling, decompiling, and emulating the contract.
As we did with the Greeter contract, we download the bytecode from the blockchain and disassemble it with evmasm.
Skipping to address 0x1a of the disassembled code we can notice the following instructions:
0000001a: CALLDATALOAD
0000001b: PUSH1 0xe0
0000001d: SHR
0000001e: DUP1
0000001f: PUSH4 0xcf309012
00000024: EQ
00000025: PUSH1 0x37
00000027: JUMPI
00000028: DUP1
00000029: PUSH4 0xec9b5b3a
0000002e: EQ
0000002f: PUSH1 0x57
00000031: JUMPI
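The contract is performing the following actions:
- CALLDATALOAD: loads 32 bytes of the input data (calldata) onto the VM stack.
- PUSH1: pushes 0xe0 on top of the VM stack.
- SHR: shifts the loaded value right by 0xe0 (224) bits, leaving only its first 4 bytes.
- DUP1: duplicates the value on top of the VM stack.
- PUSH4: pushes 0xcf309012 on top of the VM stack.
- EQ: compares the two topmost values and pushes the result on the stack.
- PUSH1: pushes 0x37 on top of the VM stack.
- JUMPI: if the comparison result is true, jumps to the address in the top cell of the stack and pops both from the stack.
- DUP1: duplicates the value on top of the VM stack again.
- PUSH4: pushes 0xec9b5b3a on top of the VM stack.
- EQ: compares the two topmost values and pushes the result on the stack.
- PUSH1: pushes 0x57 on top of the VM stack.
- JUMPI: if the comparison result is true, jumps to the address in the top cell of the stack and pops both from the stack.
In this case the challenge contract’s source code is public and someone added the signatures for both the functions the code is trying to jump to (0xcf309012 and 0xec9b5b3a) to the 4byte.directory database. They are locked() and unlock(bytes32).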
Having the function names and the knowledge to understand the bytecode, we can see that the contract is checking if the first 4 bytes of the input data sent to the contract match one of the functions defined in the contract itself.
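The dispatcher logic described above can be replayed with a few lines of Python. This is only a hand-written model of the thirteen instructions in the listing, not a general EVM interpreter: the dispatch function and its "fallthrough" return value are our own invention, and we assume CALLDATALOAD reads from offset 0, since the instruction pushing that offset is not part of the snippet.

```python
def dispatch(calldata: bytes) -> str:
    """Replay the dispatcher snippet and return the jump target (or 'fallthrough')."""
    stack = []

    # CALLDATALOAD (assumed offset 0): 32 bytes of calldata, zero-padded on the right
    stack.append(int.from_bytes(calldata[:32].ljust(32, b"\x00"), "big"))
    # PUSH1 0xe0
    stack.append(0xE0)
    # SHR: pop the shift amount, then the value, push value >> shift
    shift, value = stack.pop(), stack.pop()
    stack.append(value >> shift)

    for known_selector, destination in ((0xCF309012, 0x37), (0xEC9B5B3A, 0x57)):
        stack.append(stack[-1])           # DUP1
        stack.append(known_selector)      # PUSH4 <selector>
        a, b = stack.pop(), stack.pop()   # EQ
        stack.append(int(a == b))
        stack.append(destination)         # PUSH1 <destination>
        target, condition = stack.pop(), stack.pop()  # JUMPI pops both values
        if condition:
            return hex(target)
    return "fallthrough"


print(dispatch(bytes.fromhex("cf309012")))              # locked()
print(dispatch(bytes.fromhex("ec9b5b3a") + bytes(32)))  # unlock(bytes32)
print(dispatch(bytes.fromhex("deadbeef")))              # unknown selector
```

Shifting a 256-bit word right by 224 bits leaves exactly the top 32 bits, which is why the comparison only ever sees the 4-byte function signature.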
Going further we reach this piece of bytecode:
00000096: PUSH1 0x1
00000098: SLOAD
00000099: EQ
0000009a: ISZERO
0000009b: PUSH1 0xb8
0000009d: JUMPI
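Which:
- PUSH1: pushes 0x1 on top of the VM stack.
- SLOAD: pops the index from the stack and pushes the storage element stored at that index.
- EQ: compares the two topmost values and pushes the result on the stack.
- ISZERO: negates the comparison result.
- PUSH1: pushes 0xb8 on top of the VM stack.
- JUMPI: if the negated result is true, jumps to the address in the top cell of the stack and pops both from the stack.
Right before this bytecode snippet the CALLDATALOAD opcode has been executed, meaning that the function argument is on top of the stack and that it is checked against the element at index 0x1 of the contract storage.
Basically, this bytecode snippet corresponds to the following Solidity source code line: if (password == _password).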
Knowing this we could look up the contract creation transaction on the blockchain, extract the input and get the password.
Reading the opcodes is fun, but it is not an easy task and, more importantly, it is time-consuming.
Let’s try to decompile the contract with panoramix.
|
|
This is a lot better.
The unlock(bytes32) function is clearly visible and we can easily understand what is going on:
- It checks that the calldata, excluding the function signature (4 bytes), is at least 32 bytes long.
- If the argument matches the password in the storage, it sets the locked variable in the storage to 0 (the boolean false).
Again, knowing this we could look up the contract creation transaction on the blockchain, extract the input and get the password.
By using Qiling we can hook specific opcodes or addresses to read and modify execution data of the EVM. At the time of writing this feature is only available in the dev branch.
We can start with the usual stuff: importing the useful modules, reading the bytecode, initializing Qiling and creating some Ethereum accounts.
NOTE: we are opening the file containing the input data for the contract creation transaction; this way we will emulate the constructor as well.
|
|
Now, we define some utility functions that will help us later:
- stackdump to read elements from the EVM stack
- storagedump to read elements from the contract storage
|
|
Then, we can define some useful hooks on interesting instructions that will be executed every time the EVM emulator hits them.
Below, three hooks are defined on the following instructions:
- SLOAD: printing the value loaded from storage and its index.
- SSTORE: printing the value stored in the storage and its index.
- EQ: printing the two comparison values.
Finally, the hooks are applied to the Qiling engine.
|
|
Everything is ready to create the contract.
|
|
At this point, if we execute the script we can already see the contract parameters that are being stored into the storage with the SSTORE opcode.
thezero@web3:~/vault$ python3 Vault.py
0x4f SLOAD 0 0
0x5d SSTORE 0 1
0x64 SSTORE 1 412076657279207374726f6e67207365637265742070617373776f7264203a29
Having the element at index 1 of the storage we could hex-decode it and get the password: binascii.unhexlify('412076657279207374726f6e67207365637265742070617373776f7264203a29') == b'A very strong secret password :)'
But, as it’s show-time, let’s use some other Qiling APIs and show the full process!
With the following snippet we can call the locked() function to check the status of the contract.
|
|
We already know the correct value to unlock the contract, but if we execute the unlock(bytes32) function we can also leak such value thanks to the EQ hook.
|
|
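Independently of Qiling, the input data for the unlock(bytes32) call can be assembled by hand from the value written to storage slot 1. The sketch below is an illustration only: build_unlock_calldata is a hypothetical helper, and the selector and the slot content are the ones shown earlier in this post.

```python
import binascii

UNLOCK_SELECTOR = "ec9b5b3a"  # unlock(bytes32), as found in the dispatcher
LEAKED_SLOT_1 = "412076657279207374726f6e67207365637265742070617373776f7264203a29"


def build_unlock_calldata(password: bytes) -> str:
    """Selector followed by the bytes32 argument, which is exactly one ABI word."""
    if len(password) != 32:
        raise ValueError("a bytes32 argument must be exactly 32 bytes long")
    return "0x" + UNLOCK_SELECTOR + password.hex()


password = binascii.unhexlify(LEAKED_SLOT_1)
print(password)
print(build_unlock_calldata(password))
```

Since bytes32 is a static type that fills a whole word, no offset or length field is needed, unlike the string argument of the Greeter contract.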
🏃♂️-time:
thezero@web3:~/vault$ python3 Vault.py
...SNIP...
0x24 EQ ec9b5b3a == cf309012
0x2e EQ ec9b5b3a == ec9b5b3a
0x98 SLOAD 1 412076657279207374726f6e67207365637265742070617373776f7264203a29
0x99 EQ 0 == 412076657279207374726f6e67207365637265742070617373776f7264203a29
Again, the complete code is available at Vault.py.
In this chapter we will skip to the 11th Ethernaut level named Privacy.
This challenge is the natural sequel to Vault and resembles a crackme as well.
|
|
While at first glance the challenge might look identical to the previous one, we can see in the State change tab of the contract that it already has multiple elements in the storage.
Before jumping into Qiling, let’s see how elements are stored inside the EVM storage, from the Solidity documentation:
State variables of contracts are stored in storage in a compact way such that multiple values sometimes use the same storage slot. Except for dynamically-sized arrays and mappings, data is stored contiguously item after item starting with the first state variable, which is stored in slot 0. For each variable, a size in bytes is determined according to its type. Multiple, contiguous items that need less than 32 bytes are packed into a single storage slot if possible.
In the Privacy contract the following state variables are declared:
- bool public locked: 1 byte
- uint256 public ID: 32 bytes
- uint8 private flattening: 1 byte
- uint8 private denomination: 1 byte
- uint16 private awkwardness: 2 bytes
- bytes32[3] private data: 96 bytes (32 bytes times 3)
The state variables are stored in the same order they are declared in the code, so the storage will look like this:
| Slot # | Element(s) |
| ------ | ---------------------------------------------------------------- |
| 0 | locked (bool) |
| 1 | ID (uint256) |
| 2 | flattening (uint8), denomination (uint8), awkwardness (uint16) |
| 3 | data[0] (bytes32) |
| 4 | data[1] (bytes32) |
| 5 | data[2] (bytes32) |
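As a rough illustration of how slot 2 is shared, the sketch below packs three small values into a single 32-byte word and unpacks them again. Following the Solidity documentation, the first item of a slot is stored lower-order aligned, so flattening ends up in the least significant byte. The helpers pack_slot and unpack_slot, as well as the sample values, are made up for this example and are not taken from the challenge instance.

```python
def pack_slot(fields):
    """Pack (value, size_in_bytes) pairs, in declaration order, into one 32-byte slot."""
    slot, offset = 0, 0
    for value, size in fields:
        if not 0 <= value < (1 << (8 * size)):
            raise ValueError("value does not fit in its type")
        if offset + size > 32:
            raise ValueError("fields do not fit in a single slot")
        slot |= value << (8 * offset)  # first field goes in the lowest-order bytes
        offset += size
    return slot


def unpack_slot(slot, sizes):
    """Inverse of pack_slot: split a slot back into fields of the given sizes."""
    values, offset = [], 0
    for size in sizes:
        values.append((slot >> (8 * offset)) & ((1 << (8 * size)) - 1))
        offset += size
    return values


# Hypothetical values for flattening (uint8), denomination (uint8), awkwardness (uint16)
slot_2 = pack_slot([(10, 1), (255, 1), (0x1234, 2)])
print(f"{slot_2:064x}")
print(unpack_slot(slot_2, [1, 1, 2]))
```

Read as a 64-digit hex word, the three variables therefore appear in reverse declaration order at the right-hand end of slot 2.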
Now we can fetch the storage information from the transaction on the blockchain and reproduce its state in Qiling with the ql.arch.evm.vm.state.set_storage API.
NOTE: this time we are opening the file containing the contract bytecode.
|
|
And, as expected, the precious EQ hook leaks the value we need once again:
0x26 EQ e1afb08c == b3cea217
0x31 EQ e1afb08c == cf309012
0x3c EQ e1afb08c == e1afb08c
0xef SLOAD 5 cfe1b96a9775651e865d59795a642546a7781f18d98b756e12d2795a9e84920
0x117 EQ cfe1b96a9775651e865d59795a6425400000000000000000000000000000000 == 0
Replacing the call_data variable with the following value will solve the challenge.
call_data = "0xe1afb08c" + ql.arch.evm.abi.convert(['bytes16'], [unhex("0cfe1b96a9775651e865d59795a64254")])
In this post we analyzed the inner workings of smart contracts and how the execution on the EVM is performed. We also explained different approaches you could use and mix to reverse and emulate a smart contract.
The smart contracts we brought as examples were chosen to showcase the power of Qiling in an easy and comprehensible way, but emulation is not always the way to go (e.g. if you are searching for a public function which should be internal-only you might just want to decompile the contract and read it). On the other hand, emulating a smart contract is very useful for crackme-like situations, to fuzz contracts for {over|under}flows (maybe this could be a nice idea for another post), to help your reversing journey by emulating some specific functions (if you are into mobile security this is similar to what you would do with frida), etc.
I trust everyone who loves reversing and low-level stuff as much as I do will enjoy playing with the EVM and its whole ecosystem.
Securing blockchain-based products requires a lot of knowledge of how the specific blockchain works, how to write a secure smart contract, how to secure the interactions between web2 and web3 components, etc.
Do you need some help designing or reviewing your blockchain-based product? Let’s get in touch: https://www.shielder.com/contacts/