The Chia Plot

Is the Chia Full Node hot garbage?

Posted on December 21, 2021 by Chris Dupres

Over the past few months, as the Chia blockchain has gotten a lot busier, many farmers have been having a ton of problems with their Chia full node. From database sync problems caused by power outages to losing sync because of high transaction volume, a lot of Chia farmers and services are having significant issues with the full node. One of the key complaints being made is that it is written in Python, and so it sucks. This clearly isn’t true, and farmer problems don’t tell the whole story. The full node definitely needs some work, but it isn’t all bad either.

One of the main issues with farming Chia right now is reliability. Even with a really good setup, you cannot be sure you will not be affected by transaction issues. During the Chia Dust Storm a couple months ago I had no issues with my setup while people all over the world were unable to maintain sync. Most of the issues seemed to come from low power farmers not able to keep up with the network. After that problem Chia Network promised, and delivered, an update that would prioritize new blocks and hopefully fix the sync issues.

It did not work. Not entirely.

Twice in the past week my farm has just lost sync with the network for many hours while the GUI showed it was in sync with the network. A restart of all the services allowed it to reconnect to the network and resync the 2000 blocks or so I was behind. Judging by the 10% drop in netspace I am clearly not the only one. Not only that, but services like XCHScan and ChiaExplorer have also been having sync issues recently.

My farming setup is not slow. I am running a 6-core Rocket Lake i5 with power limits disabled and 32GB of 3200MHz DDR4. It can haul ass when it needs to. And I don’t think a faster system would have helped here, because I suspect the problem is that a significant part of the network lagged and I got caught in a pocket that just never caught up. Nothing in my system can change the fact that all my peers were behind. I have since radically reduced my number of peers and gotten ruthless about disconnecting peers that look a little behind, and it seems to have helped. For now.

Also, if I, or anyone else in that pocket, had been running a fast timelord, we may have forked the chain. This is no joke. If the enterprise partners that Chia is courting end up running their own nodes and timelords for reliability, those nodes can’t be allowed to fork the chain and run a lower-weight copy for 2 hours because some kid decides to flood the mempool with transactions. Reliability of the blockchain means reliability for everyone, always. Not just in general.

Chia full node mempool over 1w from GraphChia.com

But why does the full node lag when transaction volumes spike? This is infuriating because, from what I can tell, it’s not spiking even close to the theoretical limits of the blockchain. Yes, 12,000 transactions in the mempool all at once is a lot, but if a database system can’t handle 12k concurrent transactions once in a while on modern hardware without falling apart, then it isn’t working right, full stop. The first time this happens to a business they will be enraged. The third time in a few months? They will walk away.

And there is at least one previously (and still) successful business that relies on the Chia full node to operate: Flexpool. Alex from Flexpool has been consistent for many months about how the Chia full node is a huge risk to operations and that the Python base simply was not performant enough for prime time. Flexpool has been open about working on a Golang-based Chia node for themselves so they can mitigate this risk. And it is getting pretty hard to argue with him about this after the last few “hiccups”.

I also have my concerns, from a security perspective, with hundreds of thousands of end-users running Python web servers, as it is really easy for an (admittedly rare) Python Remote Code Execution (RCE) vulnerability to become a full-on remote shell when running as a privileged user (like Chia does on Windows). This is, of course, true of any language or environment, as Log4j has so painfully reminded us, and I would have concerns with any end-user web server. But this is not about security, and this is where I need to consult experts. When I spoke to Alex about this he mentioned Python as an inherent issue, but he also had specific problems he has identified with how the node operates. This is going to get technical because, well, it’s Alex. And that’s how he is.

The worst thing ever about the Chia node is that it is using Python. Each programming language has its own purpose, and Python is definitely not designed for this. Another problem here is that the Chia node is required to evaluate the spend bundle in order to include it in the mempool; other than that, the Chia node has no mempool limits as of now whatsoever. Besides that, during the dust attack, all nodes start to propagate the same spend bundles to the entire network, thus effectively DDoS’ing it.

Looking at other blockchain node implementations, Ethereum’s, for example, has a fixed mempool size limit, and when the mempool exceeds that limit, the node automatically evicts low-fee transactions in favor of the ones that pay more.

But of course, the most significant flaw is the first one I specified: the node is required to evaluate the bundle in order to include it in the mempool. Neither Bitcoin nor Ethereum is required to do that.

Alex, Lead Engineer Flexpool
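
The fixed-size, fee-prioritized mempool Alex describes can be sketched roughly as follows. This is a toy illustration only, not how the Chia or Ethereum nodes actually implement it; the class names, the `fee_per_cost` field, and the cost cap are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(order=True)
class PendingTx:
    fee_per_cost: float                    # sort key: lowest fee rate first
    tx_id: str = field(compare=False)
    cost: int = field(compare=False)


class BoundedMempool:
    """Toy mempool with a fixed total-cost cap (all names are hypothetical)."""

    def __init__(self, max_total_cost: int) -> None:
        self.max_total_cost = max_total_cost
        self.total_cost = 0
        self._heap: List[PendingTx] = []   # min-heap keyed on fee_per_cost

    def add(self, tx: PendingTx) -> List[str]:
        """Insert tx, then evict the cheapest entries until under the cap.

        Returns the ids of evicted transactions, which may include tx itself
        if it pays the lowest fee rate in the pool.
        """
        heapq.heappush(self._heap, tx)
        self.total_cost += tx.cost
        evicted: List[str] = []
        while self.total_cost > self.max_total_cost:
            cheapest = heapq.heappop(self._heap)
            self.total_cost -= cheapest.cost
            evicted.append(cheapest.tx_id)
        return evicted

    def ids(self) -> Set[str]:
        return {tx.tx_id for tx in self._heap}
```

With a cap like this, a flood of zero-fee dust can only ever occupy the space that paying transactions have not claimed, which is the behavior the quote attributes to Ethereum.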

I also asked him what he thought was going on with the disconnects and why the whole network was having issues, and he told me that it comes down to slower nodes falling behind and then spamming the network with GetBlock requests and attempts to re-establish sync. He says a lot of why the node behaves this way is because the fee is defined dynamically, so it’s not like there isn’t a reason for this stuff. It’s just the effect on the network that is problematic. I asked if their Go implementation of the Chia node would solve these issues and the answer was an emphatic “Yes!”

But there is more to this story. When I spoke to Gene Hoffman, COO and President of Chia Network, about the issue, he had a different take on it. Apparently, while clearing out the mempool, the Chia network was handling around 50% more transactions per hour than the Ethereum network does, with 80,000 per hour to Ethereum’s 50,000. If that’s the case, then it’s not so bad. I will be looking into this, but will need some time to look at data.
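
As a quick back-of-the-envelope check, here are the two quoted hourly figures converted to per-second rates. Both numbers are taken at face value from the conversation above and are not independently verified; on those numbers the gap works out closer to 60% than 50%.

```python
# Figures quoted above, taken at face value (not independently verified).
chia_tx_per_hour = 80_000
eth_tx_per_hour = 50_000

chia_tps = chia_tx_per_hour / 3600   # transactions per second
eth_tps = eth_tx_per_hour / 3600
extra = chia_tx_per_hour / eth_tx_per_hour - 1

print(f"Chia: {chia_tps:.1f} tx/s, Ethereum: {eth_tps:.1f} tx/s")
print(f"Chia handled {extra:.0%} more per hour")
```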

As for Alex’s comments about the problems with how Chia has built things, Gene obviously disagreed. As to the spend bundle evaluation, his response was simply “that’s how blockchains work”. He said he had no idea how Ethereum handles them, “But BTC evaluates every spend before mempool inclusion.” I am trying to figure out who is right about Bitcoin here, but this actually isn’t that well documented at this low a level, at least not anywhere I know about. If anyone independent knows, or has documentation to that effect, I would love to see it. I do trust that Alex knows the Ethereum node very well, and I think that’s actually a better comparison to Chia than Bitcoin is because of the block speed. Bitcoin only forms 1 block every 10 minutes, with Ethereum and Chia being more than an order of magnitude faster.

As to the mempool limits, well, they do seem to be set in Chia. The limit is currently set to 10 blocks in the code, and anyone building their own node from source (or just comfortable modifying a Python script) should be able to adjust it easily. It is available here on lines 41/42 as MEMPOOL_BLOCK_BUFFER. But 10 blocks is a lot, and that limit is clearly not helping the farmers who are getting behind during these transaction spikes. So there is a definable limit, but it is set so high that it isn’t helping small farmers stay online, which in turn is knocking a bunch more of us off too. I do not know what the implications of radically reducing that would be for small farmers, so don’t say I told you to.
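
To make the “10 blocks” figure concrete, here is a minimal sketch of how a block-buffer multiplier turns into a mempool cap. Only the name MEMPOOL_BLOCK_BUFFER and the value 10 come from the text above; the per-block cost ceiling and both helper functions are assumptions for illustration, not the Chia source.

```python
# MEMPOOL_BLOCK_BUFFER = 10 is the figure cited above. The per-block cost
# ceiling is an assumed placeholder, not a value read from the Chia source.
MEMPOOL_BLOCK_BUFFER = 10
ASSUMED_MAX_BLOCK_COST = 11_000_000_000


def mempool_cost_cap(max_block_cost: int, block_buffer: int) -> int:
    """Total cost the mempool may hold: this many full blocks' worth."""
    return max_block_cost * block_buffer


def has_room(current_cost: int, new_item_cost: int, cap: int) -> bool:
    """True if a new item fits without pushing the mempool over its cap."""
    return current_cost + new_item_cost <= cap


cap = mempool_cost_cap(ASSUMED_MAX_BLOCK_COST, MEMPOOL_BLOCK_BUFFER)
smaller_cap = mempool_cost_cap(ASSUMED_MAX_BLOCK_COST, 2)
```

Under these assumptions, lowering the buffer from 10 to 2 shrinks the backlog a node agrees to hold by a factor of five, which is the kind of change the paragraph above is cautious about recommending.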

The main point here is that from Chia’s perspective the network is operating just fine, overall. If some nodes go offline it’s not an issue at all because there are lots of nodes to secure the network. They are looking at the network a bit like a Kubernetes cluster where no one node makes any difference. However, each one of those nodes is a person who has invested in the network and learned how things work. I don’t think they are ignoring the issue per se, but it does look like they are treating the issues less seriously than the community because the overall network has been fine. And it has. During the first dust storm, pre-update, the high transaction volumes caused block issuance to slow down. This time, over the past few days, issuance has actually been a little above average.

All in all I do not think the full node is trash. It is doing its job at scale, and I think a smaller network of performant multicore 100W CPUs would have no issue all staying in sync together. However, that’s not how Chia Network advertises their network, nor is it their design goal. There has been a lot of back-and-forth discussion in my Discord about whether they are going to end up refactoring the network code to C or abandoning the Raspberry Pi as their minimum specification. This is a bigger question than it might seem, as a huge amount of their corporate identity involves having lots of low-powered nodes.

I think eventually they will do both, increasing the minimum requirements as well as refactoring to a compiled language like C. Or maybe it will be Rust; that’s a popular choice for high-performance network code. Either way, there is a lot of work to do before Chia can handle the load the Ethereum network handles all day every day, and I think they have a long way to go before doing it with a fleet of Raspberry Pis, regardless of the language used.


9 thoughts on “Is the Chia Full Node hot garbage?”

  1. Rabbit of Caerbannog says:
    December 21, 2021 at 10:36 am

    I’m worried about what will happen if the Dust Stormers leave the script on for like a week or a month. How would you feel if your node was down for 1-2-4 weeks with no rewards and no way to sync up – cause all you could connect to was Zombie Nodes clawing you down? It’s not a happy perspective I think..

  2. Anonymous says:
    December 21, 2021 at 1:33 pm

    I disagree that the problem is the networking code in Python. If you actually trace through the code and measure the parts that take up the most CPU time, you will quickly discover that those parts are already written in C or ASM.

    However, there is still some potential when it comes to multithreading. Chia 1.2.11 already made some improvements on that front (separating block validation from spendbundle validation and moving them to separate threads). But there probably still is some room for improvement. As it stands right now, if you’re running on a decent system, single core speed is one of the bottlenecks. If you look at your core utilization during the dust storm, you will find that one core is usually maxed – that is the block validation process, which is the main factor for slow node progression. Better multithreading is what needs the focus now.

    As Kent Beck once said: “Make it work, make it right, make it fast.” – The team did a great job on the first two parts. Now that those are in place, we’ll hopefully see some performance improvements.

    1. jorge.mendesdejesus says:
      December 21, 2021 at 2:01 pm

      Amen !!!!!

    2. Anonymous says:
      December 22, 2021 at 3:09 am

      Depending how you look at networking code. The low-level code may be well optimized, but it is not what you have, but rather how you use it. Looking at what happened over the past few days, I would divide those affected nodes into two categories: 1. those that clearly fell behind and couldn’t recover, and 2. those that were struggling. Sure, those struggling nodes at some point got tipped, and became the left behind nodes, but for the sake of this argument, let’s keep it like that.

      Looking at those that fell behind, what needs to be considered is how chia does syncing. Basically, that is a serialized process, where data is downloaded from just one node at a time, and slowly rotating through nodes with higher height. As such, when a node is behind (say 20 or 50 blocks), it should drop all activity but syncing. Also, it should drop peer count to something like 5 or 10, as anything more than that is just a waste (processing one block doesn’t take much data to download, but takes long time, as such rotation doesn’t need to be fast, so no need for a lot of peers). This would immediately reduce burden both on those nodes, as well as other nodes that those nodes were connecting to.
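
The suggestion above (drop everything but syncing once a node is far enough behind, and trim the peer list while catching up) might look something like this. It is a sketch of the commenter's idea only, not existing Chia behavior; the thresholds, the `Peer` type, and `plan_sync` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

SYNC_ONLY_LAG = 20    # blocks behind before dropping everything but sync
SYNC_PEER_LIMIT = 5   # peers to keep while catching up


@dataclass
class Peer:
    peer_id: str
    height: int


def plan_sync(our_height: int, peers: List[Peer]) -> dict:
    """Decide whether to enter sync-only mode and which peers to keep."""
    best = max((p.height for p in peers), default=our_height)
    lag = max(0, best - our_height)
    if lag < SYNC_ONLY_LAG:
        # Close enough to the tip: keep operating normally with all peers.
        return {"sync_only": False, "keep": [p.peer_id for p in peers]}
    # Far behind: keep only the few peers furthest ahead of us.
    ahead = sorted(
        (p for p in peers if p.height > our_height),
        key=lambda p: p.height,
        reverse=True,
    )
    return {"sync_only": True, "keep": [p.peer_id for p in ahead[:SYNC_PEER_LIMIT]]}
```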

      The other group (struggling one) had plenty peers with 0 height. Those peers with 0 height were not exactly peers that were behind or struggling, but rather an indication that the struggling node couldn’t fully go through the handshake with all those peers, as such couldn’t read height of those peers, not to mention read/write much (potentially what was read was dropped due to synchronization problems). Also, those other peers that had good heights potentially had the same problem – those heights were basically stale counter. Therefore, nodes that see connections, where Up/Down MB rate stalls, or doesn’t go through the handshake should drop those connections after some timeout (say 1 minute). In addition to that, the new connection should not be made after another timeout (say 5x of the previous one, or better yet adaptive timeout that would grow if more nodes would need to be dropped – standard practice). This would give those nodes some primitive back-off mechanism, and of course more CPU to keep up fully synced.
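
The “primitive back-off mechanism” described above could be sketched like this: a connection that shows no progress for a timeout gets dropped, and the wait before opening a replacement grows with each consecutive drop. The 1 minute and 5x values come from the comment; the doubling rule, the one-hour ceiling, and the class itself are assumptions for illustration.

```python
class ReconnectBackoff:
    """Hypothetical helper: stall detection plus a growing reconnect delay."""

    def __init__(self, stall_timeout: float = 60.0, multiplier: float = 5.0,
                 max_delay: float = 3600.0) -> None:
        self.stall_timeout = stall_timeout  # seconds without progress
        self.multiplier = multiplier        # first wait = 5x the stall timeout
        self.max_delay = max_delay          # assumed ceiling on the wait
        self.drops = 0                      # consecutive drops so far

    def is_stalled(self, last_progress: float, now: float) -> bool:
        """True if the peer has moved no data or handshake for too long."""
        return now - last_progress >= self.stall_timeout

    def record_drop(self) -> float:
        """Register a dropped peer; return seconds to wait before reconnecting."""
        self.drops += 1
        delay = self.stall_timeout * self.multiplier * 2 ** (self.drops - 1)
        return min(delay, self.max_delay)

    def record_success(self) -> None:
        """A healthy connection resets the back-off."""
        self.drops = 0
```

Doubling is only one of many adaptive schemes; the point is that a struggling node stops churning through new handshakes and spends that CPU on validation instead.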

      So, both things are rather simple to implement, and have really nothing to do with how good (fast) or bad the network code is, but obviously point to problems with how it is being used.

      Also, I had the exact same problem with logging on my box. Logs were pushed out less than one millisecond apart. I assume that at some point the OS-level cache was exhausted (handling two DBs of over 40 GB, plus writing 100-byte chunks to the same media every millisecond, basically brings any process or any media down). The offending process (maxing out one core) was start_full_node. The minute I lowered log levels to ERROR, my node got a breather. Again, it is not how the code looks, but how badly it is being used. I have never in my life seen production code that pumps out disk-bound logs as fast as the CPU allows. It is just senseless coding.

      And I fully agree with you on the multithreading issue. My one core was maxed by one start_full_node process, while about 10 other start_full_node processes were idling. The fact that those logs were not forked out to a low priority thread is just moronic.
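
For reference, Python's standard library already has a way to move disk-bound log writes off the busy thread: `logging.handlers.QueueHandler` and `QueueListener`. The sketch below is a generic illustration, not the Chia node's actual logging setup; the logger name and the helper function are made up. Python threads have no priorities, so this only takes the file write off the hot path, it does not make it "low priority".

```python
import logging
import logging.handlers
import queue


def setup_background_logging(log_path: str, level: int = logging.ERROR):
    """Route log records through a queue so a worker thread does the disk I/O."""
    log_queue = queue.Queue(-1)  # unbounded in-memory queue

    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )

    # The listener's own thread pulls records off the queue and writes them.
    listener = logging.handlers.QueueListener(log_queue, file_handler)
    listener.start()

    logger = logging.getLogger("full_node_demo")  # hypothetical logger name
    logger.setLevel(level)                        # drop chatty records early
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.propagate = False
    return logger, listener
```

The caller logs as usual and calls `listener.stop()` at shutdown to flush whatever is still queued.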

      Although, I think that we disagree on that Kent Beck quote, as far as “make it right” 🙂

  3. jorge.mendesdejesus says:
    December 21, 2021 at 1:59 pm

    Yes, Python is slow, as it is human-friendly and intended more for the prototype level, BUT let’s not forget that it is an extremely flexible language, and once you know your prototype you can implement the typical Python speedup tricks like JIT / PyPy / C modules, etc. Disclaimer: I haven’t checked the chia-blockchain Python code in detail……

  4. Richard says:
    December 21, 2021 at 2:55 pm

    Maybe the Chia company feels comfortable knowing that many nodes offline mean there are still a lot more online, so the network is secure and the blockchain progressing as normal. But they can’t be happy with the delay in transactions getting through. There have been reports of transactions taking many hours or even longer to be confirmed on the blockchain. Applications such as the carbon rights deal with the Worldbank might not be bothered but it will be a problem for higher volume/low latency like applications. At least I think so.

  5. Anonymous says:
    December 21, 2021 at 3:20 pm

    I think the problem is that Chia team promises too much.

    E.g. I have a full Daedalus Cardano wallet and it crashes my older iMac with 16GB ram as it eats up all the resources, whereas I can easily run a Chia full node plus several other forks on the same machine with no issues, nonetheless I use an even more powerful machine, a Dell T7920 which I use for plotting too. With this setup everything works smoothly and had no problems whatsoever during the dust storms.

    So to be honest I don’t think the Chia full node is too bad, although refactoring it in Go or C will certainly improve it. So probably the main issue is the Chia team promising that a full node can run on a Raspberry Pi!

  6. Anonymous says:
    December 22, 2021 at 2:50 am

    I know it would be frustrating to be a farmer knocked offline during a dust storm, especially if it was for a long duration. It seems a bit excessive to change a lot just to keep a small subset of farmers online all the time when they have the option of upgrading their hardware. I feel like that is a market dynamic of the proof of work system. I didn’t get knocked offline with my crappy i7, but if I did I would consider upgrading. I don’t disagree that it would be good to keep everyone online all the time if there was a reasonable way to achieve it, however that would be achieved. That side of things definitely isn’t my skillset. I do agree with you that it presents a possible future issue as the usage on the network grows and there is an attack.

    I’m just happy the network didn’t go down.

    I’m kinda seeing the issue like living in the middle of a forest miles away from the nearest town and expecting your power to never go out in storms. If power reliability is an issue, buy a generator. If you can’t afford a generator then you’ll just have to wait until all the trees get picked up off the power lines. It’s an inconvenience but there’s no real harm done.

  7. John says:
    January 7, 2022 at 1:12 pm

    I’ve read some people can weather the dust storms on a Raspberry Pi 4 with an SSD. But the Full Node is hot garbage on a Raspberry Pi 4 using an SD card. After fighting with it for months, I’ve decided to upgrade to an i7 with an SSD and it is a much better experience. If I’d started with the Raspberry Pi with an SSD it might have been better, but at this point I just want something that works and exceeds the minimum specs.


©2021 The Chia Plot - Donate XCH / MRMT / SBX @ xch1p4440d6zwu9ryta2vx073lq2ge3s29d37kskz6t34jp085e8srjqnk0gcr