Rage against the (EV)Machine part 1: contract size

November 7, 2019

Originally published on Aragon One Official Blog

Introduction

In this blog post series we are going to explore and share with you some issues that we experienced while building Aragon Network’s Court smart contracts.

Working with the Ethereum Virtual Machine has several limitations that can become frustrating; sometimes it feels like using an old 20th-century computer. It's also often counter-intuitive: for example, it has a 256-bit architecture, which sounds quite futuristic compared to today's common 32- and 64-bit ones.

In this post, we are going to talk about the first of the issues we ran into: contract size.

The problem(s)

The maximum size a contract can have is limited in two ways.

The first one, quite obvious, is the gas cost of the deployment transaction: more code in your contract means more bytecode to be included in the deployment transaction, and thus a higher gas cost. Right now the Ethereum mainnet block gas limit is about 10M gas. Make sure you don't go beyond that limit (or whatever the limit is when you deploy), especially if you are running your tests with a big artificial limit (something like ganache-cli -l 50000000). You can check the current gas limit value here.

The second way, perhaps less intuitive, is the infamous EIP-170. It prevents deploying any contract whose bytecode is larger than 2^13 + 2^14 = 24576 bytes. You can easily check the deployed size of a contract with the following command (assuming you are using Truffle, and therefore your compiled contracts are under build/contracts); just copy-paste it in your terminal:
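A sketch of such a command, assuming Truffle-style artifacts where deployedBytecode is a 0x-prefixed hex string (so its size in bytes is (characters − 2) / 2). To keep the snippet self-contained it first creates a tiny fake artifact; point the loop at your real build/contracts instead:

```shell
# Sketch: measure deployed bytecode size from Truffle artifacts.
# deployedBytecode is a 0x-prefixed hex string, so bytes = (chars - 2) / 2.
# The fake "Demo" artifact below just makes the snippet self-contained.
mkdir -p build/contracts
printf '{\n  "contractName": "Demo",\n  "deployedBytecode": "0x60806040"\n}\n' \
  > build/contracts/Demo.json

for f in build/contracts/*.json; do
  name=$(jq -r '.contractName' "$f")
  hex=$(jq -r '.deployedBytecode' "$f")
  echo "$name: $(( (${#hex} - 2) / 2 )) bytes"
done
```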

If you don’t have jq installed, you can try:
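One jq-free alternative (an assumption on my part, not necessarily the exact command from the original post) is to let awk split on the double quotes of the pretty-printed artifact:

```shell
# Sketch without jq: rely on Truffle's pretty-printed artifacts having
# "deployedBytecode": "0x..." on its own line, and split on double quotes.
# The fake "Demo" artifact just makes the snippet self-contained.
mkdir -p build/contracts
printf '{\n  "contractName": "Demo",\n  "deployedBytecode": "0x60806040"\n}\n' \
  > build/contracts/Demo.json

awk -F'"' '/"deployedBytecode"/ {
  print FILENAME ": " (length($4) - 2) / 2 " bytes"
}' build/contracts/*.json
```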

The solutions

We tried several approaches to mitigate the issue of contract size. Initially I disassembled the code and manually inspected the resulting bytecode, trying to detect patterns that could be unexpectedly and unnecessarily increasing the size: functions whose bytecode seemed too big for the amount of logic they contained, patterns repeated too often, or anything else I hadn't anticipated that caught my eye. This was not only extremely laborious but largely pointless, precisely because the contract we are dealing with is so big. Still, it was a good exercise, and it allowed me to learn and get some inspiration.

This blog post series by my teammate Ale was very helpful, and so was his pocketh tool.
After traversing this arduous path, and having learnt our lessons (the hard way), we arrived at a series of rules of thumb that we hereby present to you:

Split your contracts

This one may seem pretty obvious, and it's something we considered from the very beginning. For instance, here we discussed whether we should use our Staking app externally or embed it into the Court contract instead. We initially took the embedding approach and started building a monolith with all the needed components (like the aforementioned Staking app or the Voting app), both to save gas and because we needed simplified versions of them. This didn't play well with contract size: soon enough, we were no longer able to deploy our contracts.

Sometimes splitting things apart is not that easy to implement. To start with, you have to take care of shared state and make sure that the resulting smaller contracts are independent of each other, as having to maintain the same state in two different contracts is error-prone. This can be especially tricky with inheritance, as you may have hidden shared state in a base contract. So you will need a careful analysis before moving forward.

Another thing to consider when splitting contracts is that there is some gas overhead when contracts call each other: roughly 2,000 gas per call to an external contract, as you can see here. As a side note, although not directly related to the subject of this post: when splitting your contracts, make sure your functions are external instead of public whenever possible, as it will save you some gas too by reading parameters directly from calldata instead of copying them into memory. Something nice about Solidity 0.5 is that it forces you to be explicit about the memory and calldata locations for public and external function parameters respectively, so you are more aware of this difference.
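A sketch of the difference (contract and parameter names are illustrative):

```solidity
pragma solidity ^0.5.8;

contract ParamLocations {
    // public: the array is copied from calldata into memory on every call.
    function sumPublic(uint256[] memory _values) public pure returns (uint256 total) {
        for (uint256 i = 0; i < _values.length; i++) {
            total += _values[i];
        }
    }

    // external: the array is read directly from calldata, skipping the copy.
    function sumExternal(uint256[] calldata _values) external pure returns (uint256 total) {
        for (uint256 i = 0; i < _values.length; i++) {
            total += _values[i];
        }
    }
}
```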

In our case we ended up splitting Commit Reveal Voting, the Sum Tree, and Staking from the big monolith the Court initially was. (Later some other splits, refactors, and renaming took place, but they were not related to contract size issues)

Shorten error messages

It's good practice to add an error message whenever your code reverts. It's especially helpful while testing, but you don't need a beautiful, detailed, and user-friendly message there (we were using all uppercase with underscores to start with). You will use these errors either programmatically, to assert in your tests that an expected revert happened, with something like:
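A minimal sketch of such an assertion helper (illustrative, not Aragon's actual test helper): await a transaction promise and check that it reverts with the expected reason string.

```javascript
// Hypothetical helper: await a tx promise and assert it reverts with the
// expected reason string embedded in the error message.
async function assertRevert(promise, expectedReason) {
  try {
    await promise
  } catch (error) {
    if (!error.message.includes(expectedReason)) {
      throw new Error(`Expected reason "${expectedReason}", got: ${error.message}`)
    }
    return // reverted with the expected reason: assertion passes
  }
  throw new Error('Expected transaction to revert, but it succeeded')
}

// Usage in a test (court and roundId are hypothetical):
// await assertRevert(court.draftAdjudicationRound(roundId), 'CT_ROUND_NOT_DRAFT_TERM')
```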

or when manually trying to fix a bug, when the revert is unexpected and Truffle shows you something like:

    Error: Returned error: VM Exception while processing transaction: revert CT_ROUND_NOT_DRAFT_TERM -- Reason given: CT_ROUND_NOT_DRAFT_TERM.

In the first case, it doesn't really matter what the message says, as it's the computer that reads it. In the second case it's going to be you, but you just need a little hint, if any, as you will probably end up inspecting the code anyway to see where the error was thrown.
Eventually web3 may support a way to produce more meaningful, front-end friendly error messages for the user. But again, it should be the front-end code that translates those short messages into human-readable ones. Meanwhile, you can even document those errors with better explanations, like Compound does.

So, you don't need long, user-friendly error messages. Here you can see an example of these reductions.

Let’s dig a bit deeper into this so you can decide for yourself. Initially we were working under the assumption that everything under 32 chars long was ok, and this is true without compiler optimizations. Have a look at these two simple contracts:
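The original contracts aren't reproduced here, but a pair along these lines shows the effect (names are illustrative; 0x41 is the ASCII code for 'A', matching the disassembly below):

```solidity
pragma solidity 0.5.8;

contract ShortReason {
    function check(bool _ok) external pure {
        require(_ok, "A"); // 1-character reason string
    }
}

contract LongerReason {
    function check(bool _ok) external pure {
        require(_ok, "AAAA"); // 4-character reason string
    }
}
```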

If you compile these contracts (we are using Solidity 0.5.8) you'll see that both have a length of 282 bytes. If we disassemble the code, we'll see that the relevant lines are:

    134 {0x7f} [c208, r176] PUSH32             0x4100000000000000000000000000000000000000000000000000000000000000

and

    134 {0x7f} [c208, r176] PUSH32     0x4141414100000000000000000000000000000000000000000000000000000000

respectively. So indeed 32 bytes are always used, no matter how short the strings are.

But if we turn on optimizations (so far we have runs = 1) then we get instead:

    118 {0x60} [c164, r133] PUSH1 0x41

and

    121 {0x63} [c166, r135] PUSH4 0x41414141

If you compare both outputs, you'll notice that those different bytes of course produce some misalignments in the bytecode, as the jump destinations are shifted by the longer string, plus other side effects of the optimizer that I was not able to fully explain. For instance, in this example the size goes up from 204 to 209, so it increases by 5 bytes instead of by 3. But as a good rule of thumb you can count on saving approximately one byte per character removed.

Make variables private to avoid auto-generated getters

Whenever you declare a variable public, Solidity automatically generates a getter for it (with the same name). Even if you can't initially see it, it's in the compiled bytecode, so it adds to the contract size. The hint here is obvious: make internal any variable that you don't need to access from the outside, or for which you already have a function that exposes it (maybe combined with other variables, or because you want to access it in a different way).
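As a sketch (names are illustrative):

```solidity
pragma solidity ^0.5.8;

contract PublicVariable {
    // The compiler auto-generates a `totalStaked()` getter, adding a
    // selector, a wrapper, and read logic to the runtime bytecode.
    uint256 public totalStaked;
}

contract InternalVariable {
    // No auto-generated getters; expose the values only if and how you need to.
    uint256 internal totalStaked;
    uint256 internal totalLocked;

    function totalBalance() external view returns (uint256) {
        return totalStaked + totalLocked;
    }
}
```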

Be aware of modifiers

I personally dislike modifiers a lot, as they are easy to overlook: having something that can modify state, and that can run at the beginning of your function, at the end, or both (you don't really know unless you check), declared in the function header instead of the body sounds like a terrible idea to me. But modifiers have a great advantage too: they are inlined code, a feature that I otherwise miss a lot in Solidity. When it comes to contract size, though, this advantage becomes a problem, as inlining means repeating code and therefore increasing bytecode size. Compare these two contracts:
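The originals aren't reproduced here, but a reconstruction along these lines shows the pattern (the sizes quoted next refer to our original contracts, so your exact numbers will differ):

```solidity
pragma solidity ^0.5.8;

contract WithModifier {
    address internal owner = msg.sender;

    modifier onlyOwner() {
        require(msg.sender == owner, "ERR_NOT_OWNER");
        _; // the whole check is inlined into every decorated function
    }

    function pause() external onlyOwner {}
    function resume() external onlyOwner {}
    function shutdown() external onlyOwner {}
}

contract WithInternalCheck {
    address internal owner = msg.sender;

    function _requireOwner() internal view {
        require(msg.sender == owner, "ERR_NOT_OWNER");
    }

    // Each function adds only a cheap internal jump, not a copy of the check.
    function pause() external view { _requireOwner(); }
    function resume() external view { _requireOwner(); }
    function shutdown() external view { _requireOwner(); }
}
```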

Bytecode size for the first one is 1449 while for the second one is 769. These numbers can vary depending on how you configure the compiler (see below), but it’s definitely something worth checking.

Group getters and other external function considerations

Adding external functions has a bit of overhead due to the function selector (this is quite small though) and the function wrapper. Sometimes you can group a few together and save space too.
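For instance, instead of exposing several fields of a round through individual getters, one function can return them all at once (a sketch; field names are illustrative):

```solidity
pragma solidity ^0.5.8;

contract Rounds {
    uint64 internal draftTermId;
    uint64 internal jurorsNumber;
    bool internal settled;

    // One selector + one wrapper instead of three.
    function getRound() external view returns (uint64, uint64, bool) {
        return (draftTermId, jurorsNumber, settled);
    }
}
```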

There's another subtlety we found in our Court contract. Look at this commit, and let's focus on the important parts. We started with an internal configuration function that was called only from the constructor, and then added an external function that simply wraps it. This change accounts for an increase of ~2600 bytes. How is this possible? Yes, we have the function selector, the function wrapper, all those parameters, and the modifier. But the selector is about 10 bytes, and the wrapper about 100… So what's going on here, if we are just calling an internal function that already exists? The answer is easy: as that internal function was called only from the constructor, it was initially present only in the creation code, but not in the runtime code. Now that it can be called after deployment through the new external wrapper, its logic needs to be included in the runtime code as well.
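As a sketch of the pattern, with hypothetical names (the real code is in the linked commit):

```solidity
pragma solidity ^0.5.8;

contract CourtSketch {
    uint64 internal feeAmount;

    constructor(uint64 _feeAmount) public {
        // Initially the only call site: _setFee only needs to exist
        // in the creation code, not in the runtime code.
        _setFee(_feeAmount);
    }

    // Adding this external wrapper means _setFee can now run after
    // deployment, so its body must be included in the runtime code too.
    function setFee(uint64 _feeAmount) external {
        _setFee(_feeAmount);
    }

    function _setFee(uint64 _feeAmount) internal {
        feeAmount = _feeAmount;
    }
}
```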

For a better understanding of this section, I recommend again Ale's article mentioned above, and in particular this nice diagram.
As a general recommendation, always try to identify what code is only needed during deployment, and isolate it.

Check optimizations

Right now we have this configuration in our truffle-config.js file:

The important thing here is that number 1 (the optimizer's runs setting). It means the code is optimized as if it were going to run only once (and then be forgotten forever), so the compiler prioritizes deployment cost over function execution cost: with this number it will produce the smallest bytecode size possible, but it won't care so much about the gas cost of calling functions. If you set it to, say, 1000, it will instead optimize gas consumption as if the contract were going to be used a thousand times. You can find another explanation here.

This is not good for us, because we expect our functions to be called a gazillion times and we would like to have them as optimized as possible; but if we try to increase the number of runs, we surpass the EIP-170 max size. We are still working on it.


Thanks to Ale and John for reviewing this post.

Header image by Brad Switzer.