Last week a number of pools reported that they were under DDOS attack, and reports of a memory leak came in from all corners. The core of many of these issues appears to have been Chia Network bringing a number of Bluebox Timelords online, running an updated version of their code, to handle VDF compression. This caused very high network activity and processing load as the new Bluebox Timelords sent out compressed blocks.
The following message was posted to Reddit by J. Eckert, Chia Network VP of Ecosystem Operations, and details what happened from their end. It is possible that there was still a network-level attack against the pools, but at this point this explanation seems more likely to me.
UPDATE: We've gotten to the root of the problem. There has been an existing bug for an unknown (at least to me) amount of time in how nodes process some of the information coming in from Bluebox Timelords… however, the number and scale of Blueboxes in the wild wasn't large enough to create enough calls to trigger this to the point where it was noticeable.
Over the last week, however, Chia has been testing some new updates to the Bluebox Timelord code to make them run more efficiently, and as part of that we deployed a large number of them in the wild to spike things up a bit… and this exacerbated the bug in nodes. For the time being we've shut down our increased cluster as a temporary mitigation while we look into the bug itself. This should remove most of the pressure from the nodes experiencing this.
(For those out of the loop, the Timelord runs as fast as it can to generate the next block, but the tradeoff is it does so inefficiently. So Bluebox Timelords are servers that dig through the chain and look for uncompressed, inefficiently generated blocks in the past and compact them down more efficiently and then gossip these changes out to all the nodes to shrink the overall chain DB size.)
J. Eckert, VP Ecosystem Operations
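To make the parenthetical explanation above a bit more concrete, here is a toy sketch of a single Bluebox pass: scan the chain, compact any block whose VDF proof is still in the fast, uncompressed form, and gossip each result to every connected peer. Everything here (`Block`, `Peer`, `compact_proof`, `bluebox_pass`) is hypothetical and heavily simplified; it is not the chia-blockchain code, and the real proof computation is replaced by flipping a flag.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    height: int
    proof_is_compact: bool  # False for proofs produced by the fast (inefficient) timelord


@dataclass
class Peer:
    name: str
    inbox: list = field(default_factory=list)  # heights of compacted blocks received


def compact_proof(block: Block) -> Block:
    # Hypothetical stand-in for the expensive re-proving of the VDF in
    # compact form; in this sketch it only marks the proof as compact.
    return Block(block.height, proof_is_compact=True)


def bluebox_pass(chain: list[Block], peers: list[Peer]) -> int:
    """Compact every uncompressed block once and gossip each result.

    Returns the number of gossip messages sent during this pass.
    """
    sent = 0
    for i, block in enumerate(chain):
        if block.proof_is_compact:
            continue  # already compressed, nothing to do
        chain[i] = compact_proof(block)
        # Gossip fan-out: every compacted block goes to every peer.
        for peer in peers:
            peer.inbox.append(chain[i].height)
            sent += 1
    return sent
```

With a chain of three blocks, two of them uncompressed, and three peers, one pass sends six messages, and a second pass sends none. Even in this toy model, traffic scales as compacted blocks times peers, and many Bluebox Timelords working at once multiply that again, which is at least consistent with the load described above.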
There are actually a few major implications to this. It appears that this is another attack vector not covered in the Chia Network Consensus Document. It seems that all you would need to do to bring the Chia Network we are farming to its knees is to bring a large number of Bluebox Timelords online and let the network DDOS itself passing those VDFs around.
In a forum post in 2018, Vitalik Buterin of Ethereum wrote about the VDF process and how it is vulnerable to ASIC speedups and blockchain manipulation, but I think that possibility, combined with this reality where a large number of Blueboxes can cause network instability, poses a real risk to the network itself. This seems like something that should be easy to code: rate limiting for Bluebox compression. Let's hope we see that before someone with malicious intent does this instead.
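As one illustration of what that rate limiting could look like, here is a minimal token-bucket sketch that a node might apply per peer to incoming compact-proof messages. This is an assumption about one possible approach, not anything Chia Network has announced; `TokenBucket` and `CompactProofLimiter` are hypothetical names, and the rate and burst values would need tuning against real network traffic.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    rate: float       # tokens added per second (sustained messages per second)
    capacity: float   # maximum burst size
    tokens: float = 0.0
    last: float = 0.0

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token for this message
            return True
        return False  # over the limit: drop or defer the message


class CompactProofLimiter:
    """Hypothetical per-peer limiter for incoming compact-VDF gossip."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def accept(self, peer_id: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(peer_id)
        if bucket is None:
            # A newly seen peer starts with a full bucket.
            bucket = TokenBucket(self.rate, self.capacity,
                                 tokens=self.capacity, last=now)
            self.buckets[peer_id] = bucket
        return bucket.allow(now)
```

With `rate=1.0` and `capacity=3.0`, a peer that sends five compact proofs at once gets the first three accepted and the last two rejected, and two seconds later it has earned two more. Limiting per peer means one noisy Bluebox (or a botnet posing as many) cannot consume a node's whole processing budget, although a real implementation would also need to deduplicate proofs for the same block.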
Update: Thanks to r/Chia mod WillPhule, Flexpool has confirmed their outage was caused, at least partially, by mobile phone IPs in Russia, making it a likely botnet.
I was skeptical, but Sargonas seemed to think that you could be correct. I followed up, and in the case of Flexpool it seems it was an actual attack.
https://www.reddit.com/r/Flexpool/comments/pfx617/downtime_on_us_chia_regions/hbpb8og/?utm_source=reddit&utm_medium=web2x&context=3
I think it's too bad; I'm a big fan of the Bluebox process. I hope they sort this out and get back to it.
Yet another reason to look forward to interoperable implementations of the Chia protocol. I share your concern about the Python runtime being a single point of failure for the Chia network, both in terms of availability and integrity of the Full Nodes exposed to the public Internet.