Lead the design and architecture of cloud-based AI infrastructure, manage MLOps platforms, and drive automation. Oversee cross-functional teams to enhance developer experience and ensure compliance with industry standards.
ROLE SUMMARY
The Sr. Manager/Staff Engineer, AI Infrastructure & MLOps Engineering is a senior technical leader responsible for architecting, building, and scaling Pfizer's AI infrastructure and developer platforms. This role leverages extensive experience in cloud engineering, DevOps, and MLOps to deliver robust, high-performance solutions supporting advanced AI/ML workloads in biotechnology, healthcare, and enterprise technology. The successful candidate will drive innovation in automation, reliability, and scalability, enabling scientists and engineers to rapidly develop, deploy, and monitor machine learning models in production environments.
ROLE RESPONSIBILITIES
Platform Architecture & Engineering
- Design, implement, and own large-scale cloud-based HPC and MLOps platforms supporting AI model training, genomic sequencing, and precision medicine.
- Architect multi-environment clusters (AWS, GCP, Azure), enabling GPU/FPGA workloads and advanced observability.
- Lead the development of developer and cloud platforms, including internal engineering accelerators and reusable toolsets.
Platform Catalog & Developer Experience
- Design, implement, and manage unified platform catalogs using Backstage, enhancing developer experience and application metadata management.
- Develop custom plugins and APIs for Backstage to support internal engineering workflows and documentation.
Automation & DevOps Excellence
- Build and maintain Python-based automation frameworks, CI/CD pipelines, and Infrastructure-as-Code (Terraform, Helm, Pulumi, AWS CDK).
- Operationalize containerized solutions using Docker and Kubernetes, integrating MLflow, Kubeflow, and other orchestration platforms.
- Implement robust automation for provisioning, configuring, and managing cloud resources across multiple environments.
MLOps & Reliability Engineering
- Lead the implementation of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and advanced observability (Prometheus, Grafana, PagerDuty).
- Develop and maintain APIs and services for model management, feature stores, and inference pipelines.
- Operationalize ML model serving at scale using frameworks such as TensorFlow Serving, TorchServe, KServe, and Seldon Core.
- Ensure compliance with industry standards and regulations (e.g., HIPAA, FDA requirements) for data protection and reliability.
Collaboration & Leadership
- Mentor engineers and lead cross-functional teams to deliver integrated solutions.
- Champion engineering excellence through design documentation, code reviews, and testing automation.
- Present at industry summits, author technical proposals, and contribute to open-source projects (Kubernetes, Helm, Go, Envoy).
Continuous Improvement
- Drive agile delivery, sprint planning, and performance optimization.
- Lead incident response and disaster recovery initiatives for mission-critical platforms.
- Foster a culture of shared ownership, transparency, and innovation.
BASIC QUALIFICATIONS
- 8+ years of hands-on software engineering experience in cloud infrastructure, DevOps, and MLOps.
- Deep expertise in Python, Kubernetes, Terraform, Helm, and CI/CD pipeline development.
- Proven experience architecting and operating containerized solutions on AWS, GCP, and Azure.
- Strong knowledge of Infrastructure-as-Code, distributed systems, and production system reliability.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
PREFERRED QUALIFICATIONS
- Expertise in AWS cloud services (EC2, S3, Lambda, EKS, SageMaker, API Gateway, CloudFormation, IAM, etc.).
- Experience deploying and customizing Backstage as a unified catalog for teams, services, and technical documentation.
- Experience building and deploying microservices and REST/gRPC APIs for AI model delivery.
- Familiarity with MLflow, Kubeflow, and other MLOps orchestration platforms.
- Proficiency with model serving frameworks (TensorFlow Serving, TorchServe, KServe, Seldon Core, BentoML, etc.).
Work Location Assignment: Remote
Pfizer is an equal opportunity employer and complies with all applicable equal employment opportunity legislation in each jurisdiction in which it operates.
Information & Business Tech
Top Skills
AWS
Azure
CI/CD
Docker
GCP
Grafana
Helm
KServe
Kubeflow
Kubernetes
MLflow
PagerDuty
Prometheus
Python
Seldon Core
TensorFlow Serving
Terraform
TorchServe