Senior Software Engineer

NeuReality

Senior SW Engineer – AI Infrastructure & Optimization

We are looking for a Senior Software Engineer to help build and optimize large-scale, high-performance GenAI infrastructure and inference systems on Kubernetes.

As AI workloads increasingly move toward Kubernetes-native infrastructure, we are building systems that support distributed inference, performance optimization, reliability, observability, and production-grade deployment at scale.

This role is ideal for an engineer who can reason deeply about systems, performance, tradeoffs, and reliability, and who is comfortable owning difficult technical decisions end-to-end. You will work across inference serving, distributed systems, optimization, and Kubernetes-native AI infrastructure.

What You’ll Do

- Build and optimize high-performance Kubernetes-native GenAI inference systems
- Work with modern inference stacks such as vLLM, SGLang, TensorRT-LLM, and related tooling
- Work with Kubernetes-native distributed LLM inference frameworks such as llm-d and NVIDIA Dynamo
- Design and implement optimization algorithms and performance improvements
- Improve reliability, observability, deployment, and operational maturity of AI systems
- Make architectural decisions and take ownership of technical outcomes
- Collaborate with a small, senior engineering team focused on performance and production quality

Requirements:

- Minimum 5 years of experience as a Software Engineer, with strong software engineering and system design skills
- Programming experience in Go and Python
- Hands-on experience with the Kubernetes ecosystem, including Operators, service meshes, GitOps, Gateway API, and OpenTelemetry
- Experience with cloud platforms
- Strong understanding of optimization algorithms and performance engineering
- Ability to independently drive technical initiatives from concept to production
- Strong systems thinking and debugging skills
- Comfort operating in environments with high autonomy and responsibility

Nice to Have

- Experience with modern LLM inference frameworks such as vLLM, SGLang, or TensorRT-LLM
- Experience with distributed LLM inference frameworks such as llm-d or NVIDIA Dynamo
- Contributions to open-source Kubernetes or ML infrastructure projects
- GPU performance optimization and profiling experience
- Familiarity with CUDA, NCCL, or Triton kernels
- Experience running GenAI systems at scale in production
