I've been working on an experimental operating system that runs AI inference directly at the kernel level, eliminating the traditional OS overhead between hardware and AI workloads.
The idea: why run an LLM on top of Linux on top of hardware when the AI is the application? EmbodIOS treats inference as a first-class kernel primitive.
Early results:
~10x faster cold boot to inference-ready state ~8x memory reduction vs. Linux + PyTorch stack Direct hardware access for accelerators without userspace context switches
The OS is minimal by design – no scheduler optimized for multi-user timesharing, no filesystem abstractions you don't need, no legacy compatibility layers. Just what's required to feed tensors to silicon. Open sourced at: https://github.com/dddimcha/embodiOS Interested in feedback from anyone working on AI infrastructure, embedded ML, or alternative OS designs. What workloads would benefit most from this approach? What's missing?