Sustainable Talent

Senior Infrastructure Engineer

Sustainable Talent • Santa Clara, CA
Remote

Join the Sustainable Talent team as a Senior Infrastructure Engineer supporting NVIDIA's IPP (Infrastructure, Planning and Process) Cloud Infrastructure Team. This is a full-time, one-year W-2 contract based in Santa Clara, CA. We offer competitive pay of $80-$100/hr, depending on factors such as experience, education, and location, and provide full benefits, PTO, and an amazing company culture!

 

What You'll Be Doing:

  • Collaborate with the Infrastructure Team to manage and optimize operations within our Infrastructure and Cloud environments, with a strong focus on large-scale system configurations and automation.
  • Lead the deployment, configuration, and troubleshooting of data center and cloud-based infrastructures, ensuring efficient operations for NVIDIA's latest hardware and technologies.
  • Design and implement automated solutions for product onboarding into our hosted and private cloud environments, utilizing robust scripting techniques.
  • Work closely with engineers, architects, and product managers to strategize and execute product launches, enhancing deployment processes.
  • Tackle complex challenges related to multi-site deployments of NVIDIA products, applying innovative problem-solving skills.
  • Partner with multi-functional teams, including system engineering, software engineering, and operations, to deliver reliable and scalable platforms from concept to production.
  • Focus on managing systems at scale, writing code for simultaneous configuration of multiple servers, and improving deployment efficiency, including API integrations for automation.

 

What We Need to See:

  • Bachelor’s or Master’s Degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
  • 5+ years of relevant experience, with a strong emphasis on DevOps practices.
  • 3+ years of experience with Linux systems and scripting (Bash, Python).
  • Solid background in managing large-scale infrastructure operations with an emphasis on automation and configuration management.
  • Proven ability to quickly adapt to and implement new technologies, including system-level operations and tools.
  • Strong understanding of embedded systems, orchestration, data centers, and cloud architecture, along with excellent communication and planning skills.
  • Experience in product engineering, debugging, and hardware configuration, with a focus on system-level operations.

 

Ways to Stand Out from the Crowd:

  • Experience in large-scale QA environments and product bring-ups.
  • Familiarity with operations support, bug tracking, and ticket management.
  • Background in supporting GPUs, embedded device development, and CUDA applications.
  • Knowledge of converged and hyper-converged infrastructure.
  • Experience with configuration management tools (e.g., Puppet, Chef) for hardware setups.
  • Strong expertise in system management protocols (e.g., IPMI, Redfish) and BMC-based configuration.
  • Knowledge of CI/CD tools like Jenkins for automating deployment pipelines.
  • Experience working with APIs for system communication and automation.
  • Strong hardware knowledge, particularly in configuring hardware components (e.g., BIOS, CPU) in large-scale environments.
  • Experience configuring BIOS settings remotely in large hardware deployments.

 

Additional Requirements:

  • Proven experience in configuring systems at scale, focusing on automation and efficiency.
  • Familiarity with tools for managing remote server configurations, including BMC/IPMI systems.
  • Ideal candidates may have experience at companies such as Dell, IBM, or HP, or at organizations that produce servers or operate on-premises cloud solutions.

 

Sustainable Talent is an M/F+, disabled, and veteran equal employment opportunity and affirmative action employer.