From "Torch not compiled with CUDA enabled" to a certified GPU factory line (ComfyUI on aarch64)
How we took an aarch64 box from CPU-only PyTorch errors to a repeatable, GPU-certified ComfyUI render pipeline—with hard evidence you can copy/paste into your own ops checklist.
I don’t care about “it should work.” I care about a pipeline that’s provably on GPU, repeatable, and boring.
This post is the path we used to take an aarch64 machine from the classic:
> `Torch not compiled with CUDA enabled`
…to a certified GPU factory line: ComfyUI renders that we can treat like an operations primitive (and plug into OpenClaw / LIG).
The hardware/software reality check
If you’re on arm64/aarch64, you don’t get to hand-wave CUDA. You verify.
Here’s the evidence we captured from the machine running the renders:
```text
| NVIDIA-SMI 580.126.09    Driver Version: 580.126.09    CUDA Version: 13.0 |
"pytorch_version": "2.9.1+cu129"
"name": "cuda:0 NVIDIA GB10 : cudaMallocAsync"
```
That’s the bar:
- `nvidia-smi` sees the GPU and a sane driver/CUDA stack
- ComfyUI reports a CUDA-backed PyTorch build (`+cu129` here)
- ComfyUI reports an actual CUDA device (`cuda:0 …`)
If any one of those is missing, you don’t have a GPU pipeline—you have vibes.
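Those three checks are mechanical, so they can be scripted. Here's a minimal sketch, assuming the `/system_stats` payload keeps the shape excerpted above (a `system.pytorch_version` field and a `devices` list); the helper names are ours, not ComfyUI's:

```python
import re

def torch_build_is_cuda(version: str) -> bool:
    """True if a PyTorch version string advertises a CUDA build, e.g. '2.9.1+cu129'."""
    return bool(re.search(r"\+cu\d+", version))

def device_is_cuda(name: str) -> bool:
    """True if a ComfyUI device name is a CUDA device, e.g. 'cuda:0 NVIDIA GB10 : …'."""
    return name.startswith("cuda:")

def certify(system_stats: dict) -> bool:
    """Check a ComfyUI /system_stats payload for a CUDA-backed setup.

    Assumes the payload shape shown in the evidence excerpt above.
    """
    torch_ok = torch_build_is_cuda(system_stats["system"]["pytorch_version"])
    device_ok = any(device_is_cuda(d["name"]) for d in system_stats["devices"])
    return torch_ok and device_ok
```

Run this against the live `/system_stats` response before trusting any render: a `False` here means you have vibes, not a GPU pipeline.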
What “certified” means in practice
We treat a render backend like a factory line. It’s only “certified” if we can:
- Run a known workflow end-to-end
- Collect machine-readable proof that it ran on GPU
- Produce consistent artifacts (images + manifests)
- Repeat it without a human babysitting the box
ComfyUI is a good backend for this because it exposes enough introspection (`/system_stats`) to make the certification objective.
The root cause of the CUDA error (aarch64 edition)
On aarch64, “CUDA is installed” doesn’t mean your Python stack is CUDA-enabled.
The failure mode looks like this:
- The NVIDIA driver is present (`nvidia-smi` works)
- But your PyTorch wheel is CPU-only (or mismatched)
- ComfyUI loads Torch, then falls back to CPU or throws
So the fix isn’t magical. It’s operational:
- Install a CUDA-enabled PyTorch build that matches your platform
- Confirm Torch can see the GPU
- Confirm ComfyUI is actually using that Torch build
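Step 2 ("confirm Torch can see the GPU") can be done in a few lines from inside the same Python environment ComfyUI uses. This sketch degrades gracefully when torch isn't installed, so you can tell "wrong venv" apart from "CPU-only build":

```python
import importlib.util

def torch_cuda_report():
    """Return (torch_version, cuda_available), or (None, False) if torch isn't installed.

    Run this inside ComfyUI's venv — a CUDA wheel in some *other* environment
    proves nothing about what ComfyUI loads.
    """
    if importlib.util.find_spec("torch") is None:
        return None, False
    import torch
    return torch.__version__, torch.cuda.is_available()

version, cuda_ok = torch_cuda_report()
print(f"torch={version} cuda_available={cuda_ok}")
```

On the certified box this should print a `+cu…` version string and `cuda_available=True`; a version without the CUDA suffix is exactly the CPU-only wheel failure mode described above.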
Evidence images (what we actually rendered)
These are the artifacts we produced as part of the certification run. We keep both “background/mock” layers and final GPU outputs because it makes debugging template composition obvious.
OG format (1200×630)


Square format (1080×1080)


Portrait format (1080×1350)

Background asset (OG)

The ops checklist (what to verify, in order)
This is the sequence that prevented us from wasting time:
- Driver / kernel sanity
  - `nvidia-smi` works without needing your Python environment
- PyTorch build sanity
  - you're on a CUDA-enabled build for your platform (don't guess; check the version string)
- ComfyUI device selection sanity
  - ComfyUI reports a CUDA device in `/system_stats`
- Workload sanity
  - run a real workflow (not just a smoke test) and ensure it completes
- Evidence capture
  - save the `nvidia-smi` header + the ComfyUI `/system_stats` excerpt alongside the artifacts
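The last step (evidence capture) is worth automating so the proof always lands next to the renders. A hedged sketch, with names of our choosing, that bundles the `nvidia-smi` header and the `/system_stats` excerpt into one manifest:

```python
import json
import subprocess
import time
from pathlib import Path

def capture_evidence(out_dir: str, system_stats: dict) -> Path:
    """Write GPU evidence (nvidia-smi header + /system_stats excerpt) next to artifacts."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    try:
        # Keep only the header block of nvidia-smi output; the full table is noise here.
        smi = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True
        ).stdout.splitlines()[:4]
    except FileNotFoundError:
        smi = ["nvidia-smi not found"]
    manifest = {
        "captured_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "nvidia_smi_header": smi,
        "system_stats": system_stats,
    }
    path = out / "gpu_evidence.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path
```

One JSON file per run, versioned alongside the images, means a failed render months later can be diffed against a known-good certification.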
Where this plugs into OpenClaw
Once ComfyUI-on-GPU is certified, we can treat it like a reliable backend:
- OpenClaw queues a render job
- ComfyUI executes on `cuda:0`
- We store manifests + images
- We can re-run the same workflow deterministically as part of CI-like ops gates
That’s the “factory line” mindset: inputs → GPU render → audited outputs, every time.
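For the "queues a render job" step, ComfyUI's HTTP API accepts a workflow graph via `POST /prompt` and returns a `prompt_id` for the queued job. A minimal submission sketch, assuming the default server address (`127.0.0.1:8188`); the function names are ours:

```python
import json
import urllib.request

COMFYUI_SERVER = "http://127.0.0.1:8188"  # default ComfyUI listen address

def build_prompt_request(workflow: dict, server: str = COMFYUI_SERVER) -> urllib.request.Request:
    """Build the POST request ComfyUI's /prompt endpoint expects."""
    body = json.dumps({"prompt": workflow}).encode()
    return urllib.request.Request(
        f"{server}/prompt", data=body, headers={"Content-Type": "application/json"}
    )

def submit(workflow: dict, server: str = COMFYUI_SERVER) -> str:
    """Submit a workflow and return the prompt_id ComfyUI assigns to the queued job."""
    with urllib.request.urlopen(build_prompt_request(workflow, server)) as resp:
        return json.load(resp)["prompt_id"]
```

Storing the submitted workflow JSON together with the returned `prompt_id` and the evidence manifest is what makes a run re-runnable as a CI-like ops gate.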