
Conversation

@Nana-EC
Contributor

@Nana-EC Nana-EC commented Nov 17, 2025

Reviewer Notes

For production purposes, Block Nodes are recommended to be installed in a single-node Kubernetes (k8s) setup.
It's important to make this process clear and easy to reference for node operators of all types.
With this, operators can customize the setup and layer on their own business needs.

  • Add a step-by-step doc for operators to follow, calling out prerequisites and recommendations
  • Add a Taskfile to help automate many steps
  • Add provisioner script(s) for reference

Related Issue(s)

Fixes #1701
Fixes #527

@Nana-EC Nana-EC added this to the 0.24.0 milestone Nov 17, 2025
@Nana-EC Nana-EC self-assigned this Nov 17, 2025
@Nana-EC Nana-EC added Block Node Issues/PR related to the Block Node. Documentation Issues/PR related to documentation Helm - Deploy labels Nov 17, 2025
@Nana-EC Nana-EC changed the title doc: 1701 document single node k8s deploment docs: 1701 document single node k8s deploment Nov 17, 2025
@@ -0,0 +1,515 @@
#!/usr/bin/env bash
Contributor Author

Exploring whether to break this script up into multiple scripts, as inspired by Keifer. This would make it easier to follow and would also allow lift-and-shift where a section may not be applicable to a node operator.

Contributor Author

Will leave as is.
Another doc for Solo Weaver will be provided. We will also add a summary section or script recipe to highlight the differences.

@@ -0,0 +1,159 @@
# Single Node Kubernetes Deployment
Contributor Author

Add a reference here from the charts README after the current PR is merged.

@Nana-EC Nana-EC changed the title docs: 1701 document single node k8s deploment docs: 1701 document single node k8s deployment Nov 19, 2025
@Nana-EC Nana-EC marked this pull request as ready for review November 19, 2025 18:39
@Nana-EC Nana-EC requested review from a team as code owners November 19, 2025 18:39
@Nana-EC Nana-EC requested a review from jasperpotts November 19, 2025 18:39
Contributor

@AlfredoG87 AlfredoG87 left a comment

Looking good. I did a first pass on this; I might need to do another pass and have not tested it out yet. Let me know if you need me to test the process out on the 2nd pass; I might need a new server, but I can arrange that locally on a VM.

- sudo tar -xzf grpcurl.tar.gz -C /usr/local/bin grpcurl
- rm grpcurl.tar.gz

scale-bn:
Contributor

Why do we call this job scale-bn? It's more of a restart-bn.

Contributor Author

Yeah, what I really wanted was a rolling restart; will update.
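
For reference, a rolling restart can be expressed as its own task built on `kubectl rollout restart`. A minimal sketch, assuming the `NAMESPACE` and `RELEASE` variables used elsewhere in this Taskfile, and that the chart deploys the server as a StatefulSet (the resource kind and name here are assumptions and should be adjusted to whatever the chart actually creates):

```yaml
# Sketch: rolling restart of the Block Node pods instead of deleting a pod directly.
restart-bn:
  desc: Rolling restart of the Block Node pods
  cmds:
    # Restart the workload; adjust the kind/name to the chart's actual resource.
    - kubectl -n ${NAMESPACE} rollout restart statefulset/${RELEASE}-block-node-server
    # Block until the restarted pods are ready again.
    - kubectl -n ${NAMESPACE} rollout status statefulset/${RELEASE}-block-node-server --timeout=300s
```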

- kubectl -n ${NAMESPACE} exec ${POD} -- sh -c 'rm -rf /opt/hiero/block-node/data/live/* /opt/hiero/block-node/data/historic/*'
- kubectl -n ${NAMESPACE} delete pod $POD

reset-upgrade:
Contributor

We talk about reset as a storage clean, but for some, reset might mean "restart".
Should we be more explicit? I believe we can add comments in this Taskfile; maybe a comment is enough.

Contributor Author

All resets should clear the store so the BN starts as new. The only consideration is whether you upgrade along with the reset.
I added the restart earlier; agreed that comments would be good.
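
Since the discussion is about naming vs. behavior, Taskfile's `desc` and `summary` fields (plus plain YAML comments) can carry that context directly on the task. A rough sketch, reusing the storage-clean command quoted above and assuming a separate restart task exists for the non-destructive case:

```yaml
reset-upgrade:
  desc: Wipe Block Node storage and upgrade (destructive; NOT just a restart)
  summary: |
    Clears the live and historic data directories so the Block Node starts from a
    clean state, then upgrades the Helm release. Use the restart task instead if
    you only want to bounce the pods without losing data.
  cmds:
    - kubectl -n ${NAMESPACE} exec ${POD} -- sh -c 'rm -rf /opt/hiero/block-node/data/live/* /opt/hiero/block-node/data/historic/*'
    # ... upgrade steps follow ...
```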

Comment on lines 46 to 48
- helm upgrade $RELEASE oci://ghcr.io/hiero-ledger/hiero-block-node/block-node-server --version $VERSION -n $NAMESPACE --install --values values-override/bare-metal-values.yaml
- sleep 90
- kubectl get all -n $NAMESPACE
Contributor

Should we reuse the helm-upgrade job here instead of re-doing the same cmds?

Contributor Author

Yes, will update
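
Taskfile allows one task to invoke another from its `cmds`, so the upgrade commands only need to live in one place. A minimal sketch, assuming the existing helm-upgrade task wraps the `helm upgrade ... --install` command quoted above (the calling task name is only illustrative):

```yaml
upgrade-and-verify:
  cmds:
    # Reuse the existing task instead of repeating the helm command.
    - task: helm-upgrade
    - sleep 90
    - kubectl get all -n $NAMESPACE
```

The same pattern (a `- task: <name>` entry in `cmds`, or a `deps:` list) also works for reusing the storage-clean commands in reset-upgrade.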


reset-upgrade:
cmds:
- kubectl -n ${NAMESPACE} exec ${POD} -- sh -c 'rm -rf /opt/hiero/block-node/data/live/* /opt/hiero/block-node/data/historic/*'
Contributor

Should we reuse the reset-file-store job here?

Comment on lines 8 to 24
The single requirement is a server with a supported operating system and sufficient resources to run Kubernetes and the
Block Node Server.

Suggested minimum specifications:

1. **Local Full History (LFH)**: All block history is stored locally on the server.
- CPU: 24 cores, 48 threads (2024 or newer CPU) (PCIe 4+)
- RAM: 256 GB
- Disk:
- 8 TB NVMe SSD
- 500 TB
- 2 x 10 Gbps Network Interface Cards (NICs)
2. **Remote Full History (RFH)**: Block history is stored remotely.
- CPU: 24 cores, 48 threads (2024 or newer CPU) (PCIe 4+)
- RAM: 256 GB
- Disk: 8 TB NVMe SSD
- 2 x 10 Gbps Network Interface Cards (NICs)
Contributor

These specs are for mainnet, I assume; should we explicitly say so?

This is just a suggestion for the future:
Maybe we can have, in another place (not sure where), a table of small, medium, large, and x-large deployments and an example of the TPS that each deployment is meant to support.

- 2 x 10 Gbps Network Interface Cards (NICs)

Recommendations:
- In both configurations a Linux-based operating system is recommended, such as Ubuntu 22.04 LTS or Debian 11 LTS.
Contributor

I wonder if we should start provisioning in Ubuntu 24.04 LTS

Contributor Author

I think we tested with 22.04, but we can verify 24.04 also.

Once a server has been acquired, it needs to be provisioned with the necessary software components to run Kubernetes
and the Block Node Server.

Assuming a Linux based environment, [./scripts/linux-bare-metal-provisioner.sh](scripts/linux-bare-metal-provisioner.sh) serves as a provisioner script to automate the
Contributor

Have we validated this script on both Ubuntu 22.04 LTS and Debian 11 LTS?

Contributor Author

Yes, I've only listed the verified ones.

Comment on lines 101 to 103
curl -L https://github.com/fullstorydev/grpcurl/releases/download/v1.8.7/grpcurl_1.8.7_linux_x86_64.tar.gz -o grpcurl.tar.gz
sudo tar -xzf grpcurl.tar.gz -C /usr/local/bin grpcurl
rm grpcurl.tar.gz
Contributor

Should we use the setup-grpcurl job/task from the Taskfile.yml instead of repeating the same snippet?

Contributor Author

Good note, I wrote this before creating the Taskfile
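
One way to do that is to keep the install commands only in the Taskfile and have the doc call the task. A sketch of what setup-grpcurl could look like, assuming it wraps the same commands as the doc snippet; the `status` guard is an optional addition to make the task skip when grpcurl is already installed:

```yaml
setup-grpcurl:
  desc: Install grpcurl into /usr/local/bin
  # Skip the task entirely if grpcurl is already on the PATH.
  status:
    - command -v grpcurl
  cmds:
    - curl -L https://github.com/fullstorydev/grpcurl/releases/download/v1.8.7/grpcurl_1.8.7_linux_x86_64.tar.gz -o grpcurl.tar.gz
    - sudo tar -xzf grpcurl.tar.gz -C /usr/local/bin grpcurl
    - rm grpcurl.tar.gz
```

The doc section then reduces to a single `task setup-grpcurl` invocation.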

Comment on lines 109 to 124
VERSION="latest stable version here"
BASE_URL="https://github.com/hiero-ledger/hiero-block-node/releases/download/v${VERSION}"
ARCHIVE="block-node-protobuf-${VERSION}.tgz"
TGZ="block-node-protobuf-${VERSION}.tgz"
DEST_DIR="${HOME}/block-node-protobuf-${VERSION}"

curl -L "${BASE_URL}/${ARCHIVE}" -o "${ARCHIVE}"

# Ensure unzip is available
command -v unzip >/dev/null 2>&1 || sudo apt-get install -y unzip

mkdir -p "${DEST_DIR}"
tar -xzf "${ARCHIVE}" -C "${DEST_DIR}"
# This is needed as tar doesnt support to overwrite the existing tar
tar -xzf "${DEST_DIR}"/block-node-protobuf-${VERSION}.tgz -C "${DEST_DIR}"
rm "${ARCHIVE}" "${DEST_DIR}/${TGZ}"
Contributor

Similar to the above comment, but I see that we might want to improve setup-bn-proto to install unzip if missing.

Contributor Author

I thought we didn't need the unzip option anymore?

Signed-off-by: Nana Essilfie-Conduah <[email protected]>
Signed-off-by: Nana Essilfie-Conduah <[email protected]>
Signed-off-by: Nana Essilfie-Conduah <[email protected]>
@Nana-EC Nana-EC force-pushed the 1701-document-single-node-k8s-deploment branch from 8c6e619 to 52ff765 Compare December 2, 2025 06:35
@codecov

codecov bot commented Dec 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##               main    #1877      +/-   ##
============================================
+ Coverage     79.01%   79.02%   +0.01%     
+ Complexity     1200     1196       -4     
============================================
  Files           130      130              
  Lines          5770     5770              
  Branches        615      615              
============================================
+ Hits           4559     4560       +1     
+ Misses          930      927       -3     
- Partials        281      283       +2     

see 1 file with indirect coverage changes



# Disable Swap (necessary for kubeadm)
# WARNING: This failed in docker container
sudo sed -i.bak 's/^\(.*\sswap\s.*\)$/#\1\n/' /etc/fstab
Contributor

This line prevents the next line from working correctly, and is strictly unnecessary if the goal is just to disable swap for this script.
If we need to disable swap for the host system permanently, then we may need a more sophisticated fix than a simple sed regex (at minimum, the sed command should be preceded by a check for whether a swap device exists in fstab).
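
A sketch of a safer sequence, assuming the goal is to disable swap both for the running system and across reboots (the fstab handling may still need refinement for more exotic setups):

```bash
# Turn off any active swap for the running system (this is what kubeadm checks).
sudo swapoff -a

# Only edit /etc/fstab if it actually has an active swap entry, and comment the
# line out in place without appending an extra newline.
if grep -qE '^[^#].*\sswap\s' /etc/fstab; then
  sudo sed -i.bak -E 's|^([^#].*\sswap\s.*)$|#\1|' /etc/fstab
fi
```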

Comment on lines +95 to +97
vm.swappiness = 10
vm.dirty_ratio = 60
vm.dirty_background_ratio = 2
Contributor

If we are trying to disable swap, swappiness should be 0.
If we want "bare minimum swap, only when absolutely needed", then it should be set to 1.
This value is a "please don't swap, but if we're running out of disk cache, then go ahead".

The next two lines push for a bit more disk cache than normal, but aren't that aggressive.

Given we tried to completely remove swap devices on line 36, we might not have exactly the correct values here.
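
If the intent is that the node never swaps, the sysctl block quoted above would end up looking something like this (illustrative only; the file path is a placeholder and the dirty-ratio values are simply carried over from the current script):

```bash
# Illustrative sysctl settings for a host where swap is fully disabled.
sudo tee /etc/sysctl.d/90-block-node-memory.conf >/dev/null <<'EOF'
vm.swappiness = 0
vm.dirty_ratio = 60
vm.dirty_background_ratio = 2
EOF
# Reload all sysctl configuration so the values take effect immediately.
sudo sysctl --system
```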

Comment on lines +73 to +93
net.core.rmem_default = 31457280
net.core.wmem_default = 31457280
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.core.optmem_max = 25165824
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_rmem = 8192 65536 33554432
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.udp_mem = 65536 131072 262144
net.ipv4.udp_rmem_min = 16384
net.ipv4.tcp_wmem = 8192 65536 33554432
net.ipv4.udp_wmem_min = 16384
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_rfc1337 = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 15
Contributor

We might want to do some performance tuning on these...

net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 15
fs.file-max = 2097152
Contributor

This is going to cause issues with block nodes.
We process a lot of files, and this sets a hard limit of 2 million file descriptors.
The kernel default is absurdly large (typically max_ulong), so setting it smaller is more likely to create problems for the block node than to improve anything.
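
A quick sanity check is to look at the kernel's current value before overriding it; on a server with this much RAM it is typically already far above 2 million, so the simplest fix is probably to drop the fs.file-max line entirely:

```bash
# Print the kernel's current limit on allocated file handles.
cat /proc/sys/fs/file-max
# If this is already much larger than 2097152 (it typically is), setting
# fs.file-max = 2097152 would lower the limit rather than raise it.
```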

@AlfredoG87 AlfredoG87 removed this from the 0.24.0 milestone Dec 5, 2025
@AlfredoG87 AlfredoG87 added this to the 0.25.0 milestone Dec 5, 2025

Development

Successfully merging this pull request may close these issues.

  • Document Simple Single k8s deployment flow
  • Design deployment script for single-node k8s deployment on non-k8s-provisioned server

4 participants