Manage and scale GPU clusters with a focus on reliability and performance, using a range of tools and interconnects.
We manage thousands of GPUs today and need to grow this with reliability, security and performance in mind.
You’ll be working on ops for multi-provider GPU clusters.
When applying, please speak to:
- GPU type and count you’ve managed
- Providers you’ve worked with, e.g. hyperscalers, neoclouds, on-prem.
- Interconnects you’ve managed.
- Tooling you’ve used, e.g. for provisioning, scheduling, storage, monitoring, and cost management.
- Tooling you’ve developed.
Our culture
- 🚀 We move fast. We ship weekly: new features, improvements, and fixes go live quickly. Our infrastructure runs cluster scale-up tests daily.
- 👥 We test big. Every month, we stress test with large groups of users face to face, get real-world feedback, and iterate rapidly.
- 💻 We build together. Weekend hackathons push boundaries, drive innovation, and help us level up as a team.
- 🔄 We iterate relentlessly. Direct user feedback shapes our roadmap—we release, test, refine, and keep moving.
- ✈️ We travel when needed. Engineers may travel between SF and Sydney to run events, attend conferences, and meet with clients.
Top Skills
Cost Management Tools
GPUs
Monitoring Tools
Multi-Provider GPU Clusters
Provision Tooling
Scheduling Tools
Storage Tools
Strong Compute Sydney, New South Wales, AUS Office
499-501 Kent St, Sydney, New South Wales, Australia, 2000