MLOps Lead, Central Technology

Remote, Full-time
About the position

Responsibilities
• Provide technical MLOps leadership for a team of MLOps Engineers, managing and leading the team in operating AI training and inference systems.
• Drive the application of MLOps and DevOps principles across multiple platforms to ensure peak operational efficiency.
• Define an end-to-end metrics program, including full proactive monitoring and alerting systems for the MLOps team.
• Facilitate model training by collaborating with AI Researchers to ensure best practices in machine learning and deep learning.
• Optimize the Kubernetes-based AI lifecycle platform through infrastructure-as-code (IaC) practices, and integrate it with on-premises HPC systems.
• Collaborate with the Data Infrastructure Engineering team and science data teams on data systems for AI model training.
• Lead the MLOps team in supporting an on-call rotation, with a focus on automation and proactive alerting.

Requirements
• BS, MS, or PhD degree in Computer Science or a related technical discipline, or equivalent experience.
• 7+ years of relevant coding and systems experience.
• 5+ years of systems architecture and design experience, with a broad range of MLOps experience.
• Proven technical leadership in SRE and MLOps.
• Strong experience scaling containerized applications on Kubernetes or Mesos.
• Proficiency with a cloud platform (AWS, GCP, or Microsoft Azure).
• MLOps experience working with medium- to large-scale GPU clusters on Kubernetes.
• Working knowledge of NVIDIA CUDA and custom AI/ML libraries.
• Knowledge of Linux systems optimization and administration.
• Solid coding experience with a systems language such as Rust, C/C++, C#, Go, Java, or Scala.
• Expertise with a scripting language such as Python, PHP, or Ruby.
• Experience integrating data with the AI lifecycle.
• AI/ML platform operations experience in environments with challenging data and systems platform integration.
• Experience integrating large-scale streaming data systems.
• Experience with Hadoop, Spark, and/or Kafka deployments.
• Experience with workflow scheduling tools such as Apache Airflow, Dagster, or Apache Beam.
• Understanding of data engineering, data governance, data infrastructure, and AI/ML execution platforms.

Nice-to-haves
• Experience with PyTorch, Keras, or TensorFlow.
• Experience with HPC and Slurm.

Benefits
• Generous employer match on employee 401(k) contributions.
• Annual benefit that employees can use for housing, student loan repayment, childcare, commuter costs, or other life needs.
• CZI Life of Service Gifts awarded to employees to support the causes closest to them.
• Paid time off to volunteer at an organization of your choice.
• Funding for select family-forming benefits.
• Relocation support for employees moving to the Bay Area.