Lose Money Fast!

In late 2020 I started my own Ethereum 2.0 validator, which had been quietly running until 20th July, when I returned home to find it had been slashed and exited. For those unfamiliar with the terms: if a validator “misbehaves”, it is slashed, meaning part of its stake is taken as a penalty. In the case of Ethereum 2.0, depending on the misbehaviour, this can also result in the validator being forced to exit (it can no longer act as a validator).

You can see the technical details of the misbehaviour at https://mainnet.beaconcha.in/block/4289503#attester-slashings, but the short version is that two attestations for the same slot were signed by the same validator key. I know what you’re thinking: “Well, just don’t do that, Ross”, so let’s dig into how it happened.
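For the protocol-curious, the rule that was broken here is easy to state. The sketch below paraphrases the consensus spec’s is_slashable_attestation_data check in Python, with the types trimmed down for illustration; it is not the spec’s exact code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Checkpoint:
    epoch: int

@dataclass(frozen=True)
class AttestationData:
    slot: int
    committee_index: int
    beacon_block_root: bytes  # the chain head this attestation votes for
    source: Checkpoint
    target: Checkpoint

def is_slashable(data_1: AttestationData, data_2: AttestationData) -> bool:
    """Paraphrase of the spec's is_slashable_attestation_data check."""
    # Double vote: two *different* attestations voting in the same target epoch.
    double_vote = data_1 != data_2 and data_1.target.epoch == data_2.target.epoch
    # Surround vote: one attestation's source..target span surrounds the other's.
    surround_vote = (data_1.source.epoch < data_2.source.epoch
                     and data_2.target.epoch < data_1.target.epoch)
    return double_vote or surround_vote
```

In my case the double-vote branch is the relevant one: both servers attested in the same slot but presumably saw slightly different chain heads, so the signed data differed.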

I’ll preface this by saying that I can’t prove what happened, but I think I have a good theory.

What I do know:

  • For the Ethereum merge, validators need to run an execution layer (EL) client as well as a consensus layer (CL) client. I had previously been using Infura as an API provider in place of an EL client, but I needed to set up my own.
  • In mid-June, I ordered a dedicated server and installed EL & CL clients on it. Based on my existing disk usage for my CL client and the expected disk usage for Geth, I ordered a server with 1TB of SSD.
  • To migrate the validator, I shut down my previously installed CL client, started up the new CL client, and it ran successfully. I then deleted my previous CL server, which had been hosted in the cloud.
  • In late June, I discovered that I had underestimated the disk requirements, and would exceed 1TB within a few months if not earlier. I ordered a bigger dedicated server, copied the EL & CL clients over, stopped the clients on the first server, started them on the new server, and everything was good.
  • I shut down the first server at this point. However, I did not delete the validator key from it, and I should have. By leaving the key in place, I left behind a server which, if it ever started again, could cause operational issues.
  • The first server was returned to my hosting provider on 18th July 2022.
  • My validator was slashed on 20th July 2022.

I worked with the Lighthouse team to diagnose what had happened, and with their guidance identified the following elements in my validator’s logs:

Jul 19 10:43:51.022 ERRO Failure verifying attestation for gossip, attestation_slot: 4283617, committee_index: 33, request_index: 0, error: PriorAttestationKnown { validator_index: 13209, epoch: Epoch(133863) }
Jul 20 05:19:27.032 ERRO Failure verifying attestation for gossip, attestation_slot: 4289195, committee_index: 28, request_index: 0, error: PriorAttestationKnown { validator_index: 13209, epoch: Epoch(134037) }
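(As a quick sanity check, the epochs in those lines are consistent with their slots: on mainnet an epoch is 32 slots, so the epoch is simply the slot divided by 32, rounding down.)

```python
SLOTS_PER_EPOCH = 32  # mainnet constant

# The epochs reported in the log lines match their slots:
assert 4283617 // SLOTS_PER_EPOCH == 133863  # 19th July
assert 4289195 // SLOTS_PER_EPOCH == 134037  # 20th July
```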

What these log lines indicate is that my validator was seeing another validator colliding with it back on the 19th, and again on the 20th about an hour before the slashing occurred. Realistically this means a second server was running somewhere with the same validator key. I can’t know for certain that the old server was booted back into its original operating system (either by accident, or because it wasn’t wiped fully and someone managed to recover the files), but that seems significantly more likely than the alternative: that someone hacked into my server, downloaded the keys without leaving any indication in the logs, and then started up a whole separate server for no clear reason.

From an operational standpoint, there are two key lessons here:

  • For safety, the validator keys should be wiped from the original server as soon as the new server is confirmed operational.
  • Lighthouse has a feature called “doppelganger protection” which would likely have helped here (a sketch of the idea follows this list). It appears the original server was not running all of the time, so the new server would have started successfully without seeing the doppelganger. However, if I had enabled doppelganger protection on the old server, it would likely have shut itself down every time it started.
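For anyone unfamiliar with the feature: doppelganger protection makes a freshly started validator client stay passive for a couple of epochs, watching the network for attestations made with its own keys, and shut down if it sees any. In Lighthouse it is enabled with the validator client’s --enable-doppelganger-protection flag. Here is a minimal sketch of the idea in Python; the callbacks stand in for a real beacon-node API and are illustrative, not Lighthouse’s actual interface.

```python
import time
from typing import Callable

def doppelganger_check(
    get_current_epoch: Callable[[], int],          # hypothetical beacon-node query
    key_seen_attesting_in: Callable[[int], bool],  # hypothetical gossip query
    wait_epochs: int = 2,
    slot_seconds: int = 12,
) -> bool:
    """Return True if it looks safe to start signing, False if another
    instance appears to be using the same validator key."""
    start = get_current_epoch()
    while get_current_epoch() < start + wait_epochs:
        if key_seen_attesting_in(get_current_epoch()):
            # Our key is already attesting elsewhere: refuse to sign
            # rather than risk producing a slashable double vote.
            return False
        time.sleep(slot_seconds)  # poll roughly once per slot
    return True
```

The cost of this defence is a couple of epochs of missed attestations on every restart, which is cheap insurance compared to a slashing.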

For me personally, this is the end of my experience in running a validator; it will take another few weeks before I can hypothetically reclaim my funds, during which time I will continue to lose money, and in reality I cannot reclaim them until the merge happens. I do not have another 32 ETH (spare or otherwise) to start a new validator, and even if I did, the entire experience has made me realise that running a solo validator does not make economic sense.

To put that in perspective, a validator produces a return of around 4% per year. Taking an Ethereum price of $1,674.30 (as of 4th August 2022) and a maximum stake per validator of 32 ETH (any amount above this does not produce additional returns), a validator can be expected to return roughly $2,145 per year before costs.

That “before costs” is important: a server capable of running an EL & CL client costs around $80/month at a minimum (potentially much more if you have proper backups, operate a hot spare, and/or use a premium hosting provider). That leaves a reward of about $1,185/year, which is great only if you ignore risk and the cost of my time.
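For anyone who wants to check the arithmetic, here it is in a few lines of Python. The inputs are the figures above; since “around 4%” is an approximation, the exact output lands slightly below the rounded numbers quoted in the text.

```python
eth_price_usd = 1_674.30  # price as of 4th August 2022
stake_eth = 32            # maximum stake per validator
apr = 0.04                # "around 4%" per year
hosting_per_month = 80    # minimum for a server running EL & CL clients

gross = stake_eth * eth_price_usd * apr
net = gross - 12 * hosting_per_month
print(f"gross = ${gross:,.0f}/year, net = ${net:,.0f}/year")
# prints: gross = $2,143/year, net = $1,183/year
```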

My time is limited, and running a validator requires ongoing patching and monitoring. More critically, it is incredibly easy to make a mistake which, as I’ve demonstrated, can cost around $1,000. Personally I’m fortunate in that my validator had been running since genesis, so I have accumulated profit to absorb this, but a newly launched solo validator now earns lower rewards than when I started, and is more complex to run.

There are those who would argue this is a flaw in proof of stake; I’d point out that it’s impractical to mine solo any more either, and everyone joins a mining pool instead. Running an Ethereum validator has been a useful educational experience, but from here on it’s one I’ll leave to the shared staking services.