MINIMUM REQUIREMENTS
Education and Experience:
- Bachelor’s degree and ten years of relevant experience, or a combination of education and relevant experience. Eight years of increasingly technical work experience preferred.
- In-depth experience managing complex multiuser HPC clusters and storage environments is necessary, as is experience managing GPU-based infrastructure.
Qualifications:
This position requires in-depth knowledge of and substantial hands-on experience with:
- HPC cluster system administration, preferably in an academic/research environment
- GPU technologies and their integration into HPC environments (driver management, software stack tools, monitoring)
- InfiniBand (driver management, software stack tools, monitoring)
- Container platforms (e.g., Apptainer)
- Slurm configuration and management
- NFS-based storage management and configuration
- High-performance parallel filesystem (Lustre) management and configuration
- Scripting for system management, monitoring and task automation
- Installing and repairing servers and associated cluster hardware
- Complex technical problem-solving and troubleshooting, with a proactive approach to system optimization and issue resolution
- Security practices and compliance standards in a computing environment
- Collaborating effectively across teams and with researchers
Additional desired skills and experience include:
- AI/ML software and frameworks, deep learning, and LLM training
- Bright Cluster Manager
- Pyxis/enroot
- CUDA
- System and storage benchmarking
- DataDirect Networks (DDN) SFA high-performance storage systems
Working Conditions:
This is a hybrid position, in which you will work on-site at the Stanford campus for a minimum of 3 days a week through the first 9 months of employment, and at least 2 days a week thereafter.
You will be expected to travel to the research data center (3 miles away, on the SLAC campus) as needed to meet with and escort vendor technicians; inspect and troubleshoot hardware; receive and install FRU components for the system; and manage any RMAs. Stanford service vehicles can typically be checked out for travel between the Stanford campus and the data center. Note that availability and ability to travel to and from the data center are required on all work days (not only on-site days) and in emergency off-hours situations.
Our core work hours are 9 am - 5 pm Pacific. This role will occasionally require extended hours and weekend work, and you will participate in a rotation of on- and off-site responsibilities during the annual winter closure. Periodically, the data center is shut down for required maintenance. All team members with system responsibilities are expected to be physically on-site to return services to production status at the end of any planned facility outage.
The expected pay range for this position is $184,630 to $214,780 per annum.
Stanford University provides pay ranges representing its good faith estimate of what the university reasonably expects to pay for a position. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location and external market pay for comparable jobs.
At Stanford University, base pay represents only one aspect of the comprehensive rewards package. The Cardinal at Work website (https://cardinalatwork.stanford.edu/benefits-rewards) provides detailed information on Stanford’s extensive range of benefits and rewards offered to employees. Specifics about the rewards package for this position may be discussed during the hiring process.
Why Stanford is for You:
Imagine a world without search engines or social platforms. Consider lives saved through first-ever organ transplants and research to cure illnesses. Stanford University has revolutionized the way we live and enriched the world. Supporting this mission is our diverse and dedicated staff of 17,000. We seek talent driven to impact the future of our legacy. Our culture and unique perks empower you with:
- Freedom to grow. We offer career development programs, tuition reimbursement, and course auditing. Join a TED Talk, watch a film screening, or listen to renowned authors and global leaders speak.
- A caring culture. We provide superb retirement plans, generous time off, and family care resources.
- A healthier you. Choose from hundreds of health or fitness classes at our world-class exercise facilities. We provide excellent health care benefits.
- Discovery and fun. Stroll through historic sculptures, trails, and museums.
- Enviable resources. Enjoy free commuter programs, ridesharing incentives, discounts and more.
We look forward to receiving your application and cover letter.
The job duties listed are typical examples of work performed by positions in this job classification and are not designed to contain or be interpreted as a comprehensive inventory of all duties, tasks, and responsibilities. Specific duties and responsibilities may vary depending on department or program needs without changing the general nature and scope of the job or level of responsibility. Employees may also perform other duties as assigned.
Consistent with its obligations under the law, the University will provide reasonable accommodations to applicants and employees with disabilities. Applicants requiring a reasonable accommodation for any part of the application or hiring process should contact Stanford University Human Resources by submitting a contact form.
Stanford is an equal employment opportunity and affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by law.