Lead, Systems Administrator

Date: Jun 24, 2025

Location: Saudi Arabia

Company: King Abdullah University of Science & Technology

Position Summary

Serve as the Lead for the team that ensures smooth operation of the Linux cluster of 300+ GPU/CPU compute nodes, including its parallel filesystems and high-performance network. This is a partly technical, partly people-leadership role that involves supervising 3-4 experienced HPC system administrators. The role also covers developing, implementing and supervising standard operating procedures for the system and the team.

Major Responsibilities

  • System operation and upgrade planning to meet laboratory and customer requirements
  • Workload scheduler policy development and implementation
  • Support of high-performance filesystems
  • Network infrastructure management including TCP/IP and HPC networks
  • Use of scripting languages for node automation and configuration management
  • Hardware failure handling and spare-parts management
  • Build effective relationships with staff, faculty and students through the Core Labs.
  • Manages multiple or significant projects which may require the use of sophisticated project planning techniques. 
  • Plans, schedules, conducts, or coordinates detailed phases of work within a major project, or across an entire project of moderate scope.
  • Identifies technical training needs for staff attached to the area.
  • Serves as a resource and team member in responding to security and safety incidents.
  • Creates opportunities to enhance technical methodology or content through expansion of existing efforts or development of new ones; may extend technology into new application areas; contributes to or leads major intellectual development activities.
  • Provides innovative problem-solving approaches to enhance organizational capabilities; uses peer network to expand technical capabilities and identify new research opportunities.
  • Understands broad strategic objectives and contributes to them; nurtures and maintains relationships with major customers.
  • May initiate new project concepts; develops technical proposals and makes presentations to potential customers.
  • Will supervise several scientists, engineers or technicians on assigned work; provides major input to staffing of overall project teams; builds teams and staff to optimize efficiency and cost effectiveness. 
  • Identifies and evaluates candidates for open positions; mentors/trains staff in development of technical, project and business development skills.
 

Personal Requirements

Competencies

  • SLURM workload manager including GPU scheduling
  • Parallel filesystems (Weka IO, Lustre)
  • TCP/IP and high-performance networks (InfiniBand)
  • Proficient in scripting languages (e.g. Bash, Python, Ruby)
  • Familiar with configuration management tools (Puppet)
  • Strong documentation skills.
  • Maintains working-level contact with users and suppliers
  • Demonstrates an analytical and systematic approach to problem solving.
  • Takes the initiative in identifying and negotiating appropriate development opportunities.
  • Demonstrates effective communication skills in written and oral English.
  • Works effectively with other teams in the Supercomputing Laboratory
  • Plans, schedules and monitors own work (and that of others) competently within limited deadlines and according to relevant legislation and procedures.
  • Ability to work successfully in a highly collaborative research environment.
  • Uses discretion in identifying and resolving complex problems and assignments.
  • Performs a broad range of work, sometimes complex and non-routine, in a variety of environments
  • Maintains expert-level knowledge of most of the laboratory systems, including high-performance computing systems administration, high-performance storage administration, or high-performance network administration

Qualifications and Experience

Bachelor of Science (or equivalent) in a relevant discipline plus 10 years’ experience, OR Master of Science (or equivalent) in a relevant discipline plus 7 years’ experience, OR Doctor of Philosophy (or equivalent) in a relevant discipline plus 5 years’ experience.