Systems and Data Administrator

Date: Apr 28, 2019

Location: Saudi Arabia

Company: King Abdullah University of Science & Technology

Position Summary:

 

The Systems and Data Administrator is responsible for day-to-day oversight and management of ECRC computing infrastructure, including leading-edge HPC servers and scientific workstations tightly connected with hardware accelerators.

Leads the effort to provide research-computing services to the end-users while maintaining availability, supportability, and usability of servers, tools, and data.

Works with ECRC faculty, researchers, and students to facilitate achievement of research goals via the use of ECRC computational resources and services.

Develops ideas and executes projects to support and contribute to ECRC research goals.

 

Major Responsibilities:

 

  • Administer and monitor high-end HPC servers and workstations with hardware accelerators running a variety of operating systems, including Red Hat variants and Ubuntu.
  • Develop reports and customize tools that automate the monitoring of critical systems and alert the team when issues occur.
  • Solve escalated system-related issues, coordinate with vendors to isolate hardware problems, and install firmware or software patches as necessary.
  • Plan for and deploy OS patches across workstations and Linux servers.
  • Lead procurement of new leading-edge servers in close collaboration with ECRC staff and hardware vendors.
  • Install and upgrade the usual HPC commercial and open-source software stack (compilers, numerical libraries, etc.).
  • Compile, install and test nightly builds of ECRC software packages on various systems using the Jenkins software.
  • Act as point of contact for system-related matters for students, researchers, and faculty members within ECRC.
  • Act as point of contact for system-related matters regarding IT data center space and power management.
  • Produce thorough technical documentation.
  • Keep the ECRC webpages up to date, in coordination with Divisional and University webmasters.
  • Set up and support ECRC software development using Git and wiki tools.
  • Deploy and maintain incremental backup strategies over the network.

 

Competencies:

 

Technical Skills – required

  • Familiarity with configuring MPI, OpenMP, the Intel, PGI, and GNU compilers, and numerical libraries.
  • Experience with open source software compilation and modules.
  • In-depth knowledge of TCP/IP networking and related protocols, NFS, etc.
  • Experience in installing and configuring queuing systems on heterogeneous server infrastructures.
  • Knowledge of computer hardware installation and troubleshooting.
  • Experience with incremental backups and cloning solutions.
  • Knowledge of web server and database configuration (Apache, MySQL, PHP).
  • Experience with Version Control tools, such as Git.

 

Technical Skills – preferred

  • Knowledge of HPC software libraries.
  • Knowledge of CI software (Jenkins, Travis, etc.).
  • Familiarity with CMake scripts.

 

Non-Technical Skills or Attributes

  • Proficient in English-language documentation and communication.
  • Ability to support users in an academic environment.
  • Self-learning and proactive working skills.
  • Organization and planning skills.

 

Qualifications:

 

Bachelor's degree (minimum), preferably in Computer Science.

 

Experience Required:

 

A minimum of 3-5 years of progressively responsible experience administering Linux installations (Red Hat, CentOS, Ubuntu, etc.)