Who We Are
Hyperbolic Labs is on a mission to democratize AI by breaking down the barriers to computing power with our Open-Access AI Cloud. By aggregating computing resources across the globe, we offer an innovative GPU marketplace and AI inference service that promise affordability and accessibility for all. As pioneers at the intersection of AI and open-source technology, we believe in an open future where AI innovation is limited only by imagination, not by access to resources. We're looking for forward-thinking individuals who share our passion for making AI universally accessible, secure, and affordable. Join us in building a platform that empowers innovators everywhere to turn their visionary AI projects into reality.
As we prepare for growth after our Series A, our team, led by co-founders with PhDs in AI, Math, and Computer Science, is poised to redefine computing.
About the Role
We are seeking a highly technical Vice President of Infrastructure to build and scale the foundational infrastructure powering our AI cloud platform.
This is a hands-on executive leadership role. While you will own infrastructure strategy, organizational growth, and executive-level decision-making, we expect you to remain deeply engaged in architecture, design, and engineering execution. You should expect to spend approximately 30-40% of your time directly contributing to technical design, architecture reviews, debugging critical production issues, and partnering with engineers on implementation.
The ideal candidate has previously built and scaled cloud platforms, preferably GPU-native cloud infrastructure supporting AI training and inference workloads. You have experience operating at the intersection of executive leadership and hands-on engineering and are excited to help build both the technology and the team.
What You'll Own
Cloud Infrastructure Architecture
Lead the design and evolution of our AI cloud platform
Define the architecture for GPU orchestration, compute scheduling, networking, storage, and distributed systems
Make critical decisions regarding cloud infrastructure, bare-metal deployments, and platform scalability
Personally participate in architecture reviews and key technical initiatives
GPU Cloud Platform
Build and scale large GPU clusters supporting customer workloads
Design systems for GPU provisioning, scheduling, utilization optimization, and capacity management
Drive platform reliability and performance for AI training and inference workloads
Partner closely with engineering teams on infrastructure requirements for next-generation AI systems
Technical Leadership
Remain deeply involved in engineering decisions and technical direction
Contribute directly to infrastructure design and implementation efforts
Review architecture proposals, system designs, and major infrastructure changes
Act as the technical escalation point for complex infrastructure challenges
Infrastructure & Reliability
Establish best practices for Kubernetes, observability, CI/CD, security, and operational excellence
Build SRE and Platform Engineering functions from the ground up
Define reliability standards including SLOs, SLIs, incident response processes, and capacity planning
Drive automation across infrastructure operations
Organizational Leadership
Recruit and develop world-class Infrastructure, Platform, and SRE teams
Build a high-performance engineering culture focused on ownership and execution
Partner with executive leadership on company strategy and infrastructure investments
Manage infrastructure budgets, vendor relationships, and capacity planning
Required Experience
Must-Have Background
12+ years building and operating large-scale infrastructure systems
Experience leading infrastructure organizations while remaining hands-on technically
Previous experience building or operating a cloud platform at scale
Experience building GPU infrastructure or AI/ML compute platforms
Proven track record scaling infrastructure in high-growth startup environments
Deep Technical Expertise
Expert-level Kubernetes knowledge
Experience designing and operating multi-region cloud infrastructure
Strong understanding of Linux, networking, distributed systems, and storage architecture
Experience with Infrastructure-as-Code and automation frameworks
Deep expertise in observability, monitoring, and reliability engineering
Experience building highly available production systems
Strongly Preferred
Experience with GPU scheduling, Slurm, Kubernetes GPU operators, Ray, or distributed training systems
Experience managing thousands of GPUs in production environments
Background supporting AI training and inference platforms
Hyperbolic is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.



