Information Technology Services
Requisition # 109145

About the Role

Stanford Research Computing seeks an exceptional GPU Cluster Lead Engineer to oversee technical operations, optimization, and strategic development of Marlowe, Stanford's NVIDIA SuperPOD. This role combines deep technical expertise in GPU computing, large-scale cluster management, and leadership in supporting a diverse research community. You will serve as the technical authority on GPU infrastructure, driving system performance and reliability while enabling groundbreaking research in AI/ML, computational biology, physics, and beyond.

 

Key Responsibilities

System Operations & Management

  • Lead day-to-day operations of the GPU Cluster, ensuring optimal uptime and performance.
  • Architect monitoring, alerting, and observability solutions using Prometheus, Grafana, DCGM, and Base Command Manager.
  • Manage job scheduling and resource allocation using Slurm, implementing advanced GPU partitioning and configurations.
  • Coordinate maintenance windows, system upgrades, and capacity expansions; lead incident response and root cause analyses.
  • Manage system storage, including optimization, benchmarking, and observability reporting.

 

Performance Optimization & Engineering

  • Design performance tuning strategies for GPU utilization, job throughput, and system efficiency.
  • Optimize NVIDIA GPU fabric configurations including NVLink, NVSwitch, and InfiniBand RDMA networking.
  • Develop containerization strategies using NVIDIA NGC, Docker, and Singularity/Apptainer.
  • Engineer solutions for deep learning frameworks (PyTorch, TensorFlow, JAX) and CUDA application optimization.
  • Benchmark system performance and collaborate with NVIDIA on optimization programs.

 

User Support & Research Enablement

  • Serve as the primary technical consultant for researchers using GPU-accelerated computing.
  • Develop documentation, best practices guides, and training materials; deliver workshops on GPU computing workflows.
  • Profile and optimize user workloads, scaling applications from single-GPU to multi-node distributed training.

 

Team Leadership & Strategy

  • Mentor junior engineers and contribute to strategic planning for GPU infrastructure expansion.
  • Evaluate emerging GPU technologies and manage vendor relationships with NVIDIA and hardware suppliers.
  • Represent Stanford Research Computing (SRC) in ongoing interactions with the Stanford Data Sciences group on AI/ML infrastructure; participate in the on-call rotation.

 

Education & Experience 

  • Bachelor's degree in Computer Science, Engineering, or a related field and ten years of relevant experience, or a combination of education and relevant experience.
  • 5+ years in HPC systems administration or research computing; 3+ years managing GPU clusters (NVIDIA A100/H100)

 

Required Qualifications

  • Expert knowledge of NVIDIA GPU architecture, CUDA, and GPU computing principles (NVLink, MIG, GPUDirect)
  • Advanced Linux administration (RHEL, Ubuntu); expertise with Slurm job scheduler
  • Experience with high-performance networking (InfiniBand, RoCE) and parallel filesystems (Lustre, GPFS)
  • Strong scripting (Python, Bash) and containerization experience (Docker, Singularity, Kubernetes)
  • Familiarity with AI/ML frameworks (PyTorch, TensorFlow) and distributed training techniques
  • Experience with monitoring tools (Prometheus, Grafana) and NVIDIA DCGM

 

Preferred Qualifications

  • Experience with Base Command Manager or Bright Cluster Manager
  • Background in academic research computing or national lab environments
  • Contributions to open-source HPC or GPU computing projects
  • Knowledge of MLOps practices and GPU virtualization (vGPU, MIG)

 

Key Competencies

  • Technical leadership
  • Creative problem-solving
  • Excellent communication with technical and non-technical audiences
  • Strong collaboration skills
  • Service-oriented mindset
  • Adaptability to rapidly evolving technology

 

What We Offer

  • Work with cutting-edge NVIDIA GPU technology enabling groundbreaking research
  • Professional development opportunities
  • Collaborative environment with talented engineers and researchers
  • Comprehensive Stanford benefits package including health, dental, retirement, and education benefits
  • Flexible work arrangements

 

Physical Requirements*:

  • Constantly perform desk-based computer tasks.
  • Frequently sit, grasp lightly, and perform fine manipulation.
  • Occasionally stand, walk, and write by hand.
  • Rarely use a telephone, lift/carry/push/pull objects that weigh up to 10 pounds.

* Consistent with its obligations under the law, the University will provide reasonable accommodations to applicants and employees with disabilities. Applicants requiring a reasonable accommodation for any part of the application or hiring process should contact Stanford University Human Resources by submitting a contact form.

 

Working Conditions:

  • May work extended hours, evenings, and weekends.

 

Work Standards:

  • Interpersonal Skills: Demonstrates the ability to work well with Stanford colleagues and clients and with external organizations.
  • Promote Culture of Safety: Demonstrates commitment to personal responsibility and value for safety; communicates safety concerns; uses and promotes safe behaviors based on training and lessons learned.
  • Subject to and expected to comply with all applicable University policies and procedures, including but not limited to the personnel policies and other policies found in Stanford's Administrative Guide, http://adminguide.stanford.edu.

 

The expected pay range for this position is $190,577 to $200,000 per annum.

Stanford University provides pay ranges representing its good faith estimate of the salary or hourly wage the university reasonably expects to pay for a position upon hire. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location and external market pay for comparable jobs.

At Stanford University, base pay represents only one aspect of the comprehensive rewards package. The Cardinal at Work website (https://cardinalatwork.stanford.edu/benefits-rewards) provides detailed information on Stanford’s extensive range of benefits and rewards offered to employees. Specifics about the rewards package for this position may be discussed during the hiring process.

 

The job duties listed are typical examples of work performed by positions in this job classification and are not designed to contain or be interpreted as a comprehensive inventory of all duties, tasks, and responsibilities. Specific duties and responsibilities may vary depending on department or program needs without changing the general nature and scope of the job or level of responsibility. Employees may also perform other duties as assigned.

