Hyphen Connect Limited
LLM Pre-training & Distributed Engineer (AI Infrastructure)
Lead orchestration and optimization of large-scale LLM pretraining across 1,000+ GPUs. Manage distributed training with PyTorch/DeepSpeed/Megatron-LM, tune networking and memory (InfiniBand/RDMA), and implement checkpointing and robust failure recovery for long-running jobs.
We are seeking a highly skilled LLM Pre-training & Distributed Systems Engineer. This role is essential for orchestrating large-scale machine learning training runs and optimizing distributed infrastructure. The ideal candidate will have a deep understanding of GPU clusters and extensive systems engineering experience to keep training efficient and reliable.
Responsibilities:
- Orchestrate distributed training runs across 1,000+ GPUs using PyTorch, DeepSpeed, or Megatron-LM.
- Optimize networking (InfiniBand/RDMA) and memory management to prevent out-of-memory errors.
- Automate checkpointing and failure recovery during month-long training runs.
Required Skills:
- Deep expertise in 3D parallelism (Data, Tensor, Pipeline).
- Experience managing SLURM or Kubernetes-based GPU clusters.
- Strong systems engineering background (C++, CUDA, Python).

