Now that Chia Network has released the technical write-up about their “CAT 2 Upgrade” process, both detailing the specifics of the security problem that affected CAT1 and highlighting the two previous audits they had undergone, we can see what I got right and what I got wrong in my admittedly hot-headed original opinion on the process they took.
First, I made a suggestion on how I would have resolved the issue. That suggestion was based on the guess that the exploit was around the offer files interacting with the CATs, because that seemed the most likely. It was also incorrect. The real flaw was a counterfeiting bug in the CAT1 code itself, which allowed someone with a single mojo of a CAT to create an infinite supply of it, regardless of the genesis rules of the token. This is much, much worse than I had considered. But because the flaw is about counterfeiting, there was no way for Chia to begin their process of closing offers before the cutoff block as I had recommended. So what they did, in order, was this:
- Identify issue
- Provide fix for issue in secret (CAT2)
- Provide similar fix for NFT1 in public (luckily no one noticed)
- Prepare tooling to counterfeit CATs, acquire bits of every CAT they wanted to forge
- Prepare the public exchanges and ecosystem tooling for the change
- Announce CAT1 cutoff block with 24 hours notice
- After cutoff block, use tooling to begin forging CATs and completing offers with those forged coins, while returning XCH to original holders
- Release Chia Blockchain 1.5.0 with CAT2 support and no CAT1 support
- Finish closing offers for CATs they were able to acquire
This naturally created a window between the cutoff block and the offers getting force-closed where anyone holding a CAT (USDS, say) after the cutoff could find an offer, regardless of how poorly priced, and safely “spend” their CAT1 USDS at any rate, knowing they would simply get sent the USDS CAT2 based on their cutoff block balance. People who had open offers for higher than the price (I think they are called buy-limit orders?) were at risk unless they were on the ball with the latest news.
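To make that window concrete, here is a toy model of it. Everything in it is hypothetical (the account names, the amounts, the `issue_cat2` and `accept_offer` helpers); it is not Chia's code and it ignores how offers actually settle on chain. It only shows the accounting problem: once CAT2 is issued from a snapshot, a CAT1 spend after the snapshot costs the spender nothing.

```python
# Toy model only: names and numbers are hypothetical, not Chia's actual code.

def issue_cat2(snapshot):
    """Assumed rule: CAT2 balances are copied from CAT1 balances at the cutoff block."""
    return dict(snapshot)


def accept_offer(cat1, xch, taker, maker, cat_amount, xch_amount):
    """The taker hands CAT1 to the maker and receives the maker's XCH."""
    if cat1[taker] < cat_amount or xch[maker] < xch_amount:
        raise ValueError("insufficient balance")
    cat1[taker] -= cat_amount
    cat1[maker] += cat_amount
    xch[maker] -= xch_amount
    xch[taker] += xch_amount


# Balances at the cutoff block: alice holds CAT1, bob has a stale open offer
# to buy 100 CAT1 for 5 XCH.
cat1 = {"alice": 100, "bob": 0}
xch = {"alice": 0, "bob": 5}
snapshot = dict(cat1)  # frozen at the cutoff block

# After the cutoff, alice takes bob's offer with CAT1 that no longer matters.
accept_offer(cat1, xch, taker="alice", maker="bob", cat_amount=100, xch_amount=5)

# CAT2 is issued from the snapshot, not from current CAT1 balances.
cat2 = issue_cat2(snapshot)

print("alice:", cat2["alice"], "CAT2 and", xch["alice"], "XCH")
print("bob:  ", cat2["bob"], "CAT2 and", xch["bob"], "XCH")
```

In this sketch alice ends up with both her full CAT2 balance and bob's XCH, while bob holds only obsolete CAT1. That is the exposure for any offer that was not force-closed in time.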
There are reports of people who were exploited this way. I have no idea if any of them are true; possibly not. And based on the nature of the exploit, I do not think it was possible to have solved this problem for people without a fork to the blockchain itself invalidating CAT1 spends entirely. So what would I have done differently? I probably would not have used the exploit to forge tokens on my blockchain, regardless of the potential benefit. I probably would not have accepted offers with forged coins, regardless of where the XCH ended up. I would have released the NFT1 changes and the CAT2 changes simultaneously, since it appears they were related. And I definitely wouldn’t have written a blog post claiming that my unpermissioned actions were somehow “white hat”. I don’t believe in heavy-handed, paternalistic actions done for people “for their best interest” because I don’t like to decide what someone else’s best interests are. But I also haven’t found myself in the position Chia Network Inc did with this issue, so I am willing to concede I might have also been convinced this was the best move for the corporation. Just because it’s “grey hat” ethical security doesn’t mean it was the wrong move. I just don’t like the top-down way it was decided and implemented.
But the real suggestion I had, and have always had for Chia Network, is even more true now than when I wrote the original piece. They have claimed in their posts and online that their code was fully audited before they pushed it into production and that this was caught in a secondary audit as part of their continuous security practice. Based on the audits they published in their blog post, that statement is disingenuous at best or an outright lie at worst.
Both audits were conducted in the early stages of 2021, prior to transactions even going live on the Chia blockchain. The first one, conducted by NCC Group’s Crypto Services division, looked specifically at the consensus system and blockchain itself. Nothing in this audit was related to the CAT1 implementation whatsoever. The second audit, conducted by Least Authority, was finalized in April 2021 and looked at the then-current implementation of Chia’s Coloured Coins. Coloured Coins were the predecessor to CATs and this audit is at least somewhat relevant. But, again, it was finalized 7 months prior to the release of the CAT1 standard, well before that standard was complete and before significant architectural changes were made. Also, it made a number of recommendations that would have helped reduce the likelihood of this kind of attack, as well as one that would have helped mitigate it.
Let’s spend some time on the Least Authority Coloured Coins audit and see what we can learn from it. The focus of this audit was specifically to ensure the non-forgeability of the Coloured Coins implementation. They found some fairly serious flaws that were unresolved at the end of the audit, not with the implementation of the CC standard (which never made it to production anyway), but with Chia Network’s development practices in general. They found that Chia was not writing proper unit tests (Suggestion 1) or property tests for edge case inputs (Suggestion 2) and was relying on integration tests instead. This is, from my understanding, a common but flawed approach to developing code, and I hope that they have begun making that change as the audit suggests they might.
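For anyone unfamiliar with the distinction the auditors are drawing, here is a minimal sketch of a unit test next to a property test. The `transfer` function is a made-up toy ledger, not anything from Chia's code or from the audit, and the property test is hand-rolled with the standard library; a real project would more likely reach for a dedicated library such as Hypothesis. The property being checked is the one that matters for a token: no accepted transfer may change the total supply.

```python
import random


def transfer(balances, sender, receiver, amount):
    """Hypothetical toy ledger transfer; returns a new balances dict."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if balances.get(sender, 0) < amount:
        raise ValueError("insufficient funds")
    updated = dict(balances)
    updated[sender] -= amount
    updated[receiver] = updated.get(receiver, 0) + amount
    return updated


def unit_test_single_transfer():
    # Unit test: one hand-picked input, one expected output.
    result = transfer({"a": 10, "b": 0}, "a", "b", 4)
    assert result == {"a": 6, "b": 4}


def property_test_supply_is_conserved(trials=1000, seed=0):
    # Property test: many generated inputs, including edge cases
    # (negative, zero, huge amounts, self-transfers), one invariant.
    rng = random.Random(seed)
    accounts = ["a", "b", "c"]
    for _ in range(trials):
        balances = {name: rng.randint(0, 50) for name in accounts}
        supply = sum(balances.values())
        sender = rng.choice(accounts)
        receiver = rng.choice(accounts)
        amount = rng.choice([-1, 0, 1, rng.randint(0, 100), 10**18])
        try:
            after = transfer(balances, sender, receiver, amount)
        except ValueError:
            continue  # rejecting a bad input is acceptable behaviour
        assert sum(after.values()) == supply, "supply changed"
        assert all(v >= 0 for v in after.values()), "negative balance"
    return trials


unit_test_single_transfer()
checked = property_test_supply_is_conserved()
print("property held across", checked, "generated cases")
```

The unit test only proves the one case someone thought of. The property test states the rule that must never break and then throws inputs at it, which is the kind of check that tends to surface forgery-style bugs.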
The audit also recommends that they simplify functions to avoid unintended consequences (Suggestion 5). This also seems like a relevant suggestion that Chia Network did not fully take to heart. I think I might have found the commit to the Coloured Coin implementation that introduced the complex hash generation for identifying CATs, but I could be wrong. If I am not, though, that change was both complex and took place in October 2021 – well after the audit was done and it was not re-audited prior to going into production just a month later. Either way this commit is a very big change that took place immediately before rollout.
And this is my main issue here. Unless another audit was conducted between when the CAT standard was finalized and November 2021, when it went live, they did indeed release an unaudited token standard into production and told people it was secure. This is my main point throughout: this was absolutely an avoidable problem, and all it took was time and patience to catch these flaws before they became critical. An infinite inflation bug on a fixed limit token standard is a very big deal. And based on the fix, if there were 100s of millions of dollars of Stably USDS out in offers instead of a few thousand, their fix would have been totally unworkable and MANY people would have been exploited after the cutoff block. This only worked because of how unused Chia is in absolute terms. There is some truth to the argument that it’s good they fixed the problem now before it caused a catastrophe, but they only fixed this one specific issue. They have not proposed any fixes to the process that led to this issue in the first place, which is more “move fast and break things” than “bank-level financial services development”.
This is not a problem unique to Chia Network in Silicon Valley, but based on their target markets it has the potential to bite them harder than most. The only reason this issue wasn’t exploited in the wild before Chia fixed it is because nobody really cares about the Chia blockchain right now. There isn’t an army of security researchers poring over every commit looking for insecure hash generation. But had they “made it” in 2022 or gotten very popular between November 2021 and June 2022, there is a good chance this issue would have destroyed confidence in the system. That’s why banks don’t use agile development for core protocols, and instead use a waterfall methodology that requires significant audits, testing and review prior to deployment.
I know that Chia is a web-style startup, and uses best practices generally set by FAANG or MANGA or whatever, but those companies are producing consumer-facing web infrastructure. Chia is producing software it expects governments and multinational NGOs to use, not Aunt Pattie to post recipes on or for Chad and Cathy to put on background noise while they “chill”. This kind of software requires a wildly different development philosophy and doesn’t allow for “release and catch” bug detection styles. Hell, the FIPS 140 process for certifying software for use on US/Canadian government systems requires code to be frozen for up to a year for review prior to certification, more if they have issues that need to be fixed. This is why Azure US Government regions are so far behind the general tenants, because they require extensive audit and review of every commit prior to deployment.
I have been harping on Chia about this stuff since the very beginning. Test more. Audit more. Make simpler software. Get your SOC2 and develop a secure organization (Suggestion 6 in the Least Authority audit). These are all suggestions made by their independent auditors as well. They are not “nice to haves” for an organization looking to do business at the government level. Chia Network MUST get into the habit of scheduling audits on complete or nearly complete code prior to releasing new features. This is at odds with what the community wants, and is a prime example of where the requirements for a successful cryptocurrency and the requirements for a successful international banking protocol conflict.
I have not yet watched the AMA from Friday prior to writing this article. I am hopeful they address some of these points there. But I was not wrong in my derision. They cannot release standards and core parts of their software stack without audit and rigorous testing. As many people have said, it is impossible to prevent all bugs, and no matter how much effort is spent they will never prevent every bug. But they absolutely need to try, at least. I stand by calling this a stupid, avoidable mistake, because it’s clear they had an audit done of their implementation, then completely changed their implementation and released it without performing another audit. If this is all a joke or a lark, then fine, that’s enough. But if this is going to be the backbone of international markets, then they need to create a real secure code pipeline where independent review of every change happens prior to their partners building on it in production.