Senior Infrastructure Engineer

Sustainable Talent • Santa Clara, CA

Remote

Join the Sustainable Talent team as a Senior Infrastructure Engineer supporting NVIDIA's IPP (Infrastructure, Planning and Process) Cloud Infrastructure Team. This is a 1-year, full-time W-2 contract based in Santa Clara, CA. We offer competitive pay of $80-$100/hr, depending on factors such as experience, education, and location, and provide full benefits, PTO, and an amazing company culture!

What You'll Be Doing:

Collaborate with the Infrastructure Team to manage and optimize operations within our infrastructure and cloud environments, with a strong focus on large-scale system configuration and automation.
Lead the deployment, configuration, and troubleshooting of data center and cloud-based infrastructure, ensuring efficient operations for NVIDIA's latest hardware and technologies.
Design and implement automated solutions, built on robust scripting, for onboarding products into our hosted and private cloud environments.
Work closely with engineers, architects, and product managers to strategize and execute product launches, enhancing deployment processes.
Tackle complex challenges related to multi-site deployments of NVIDIA products, applying innovative problem-solving skills.
Partner with multi-functional teams, including system engineering, software engineering, and operations, to deliver reliable and scalable platforms from concept to production.
Focus on managing systems at scale: writing code to configure many servers simultaneously, improving deployment efficiency, and building API integrations for automation.

What We Need to See:

Bachelor’s or Master’s Degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
5+ years of relevant experience, with a strong emphasis on DevOps practices.
3+ years of experience with Linux systems and scripting (Bash, Python).
Solid background in managing large-scale infrastructure operations with an emphasis on automation and configuration management.
Proven ability to quickly adapt to and implement new technologies, including system-level operations and tools.
Strong understanding of embedded systems, orchestration, data centers, and cloud architecture, along with excellent communication and planning skills.
Experience in product engineering, debugging, and hardware configuration, with a focus on system-level operations.

Ways to Stand Out from the Crowd:

Experience in large-scale QA environments and product bring-ups.
Familiarity with operations support, bug tracking, and ticket management.
Background in supporting GPUs, embedded device development, and CUDA applications.
Knowledge of converged and hyper-converged infrastructure.
Experience with configuration management tools (e.g., Puppet, Chef) for hardware setups.
Strong expertise in system configuration and management protocols (e.g., IPMI and Redfish for BMC-based management).
Knowledge of CI/CD tools like Jenkins for automating deployment pipelines.
Experience working with APIs for system communication and automation.
Strong hardware knowledge, particularly in configuring hardware components (e.g., BIOS, CPU) in large-scale environments.
Experience configuring BIOS settings remotely in large hardware deployments.

Additional Requirements:

Proven experience in configuring systems at scale, focusing on automation and efficiency.
Familiarity with tools for managing remote server configurations, including BMC/IPMI systems.
Ideal candidates may have experience at companies such as Dell, IBM, or HP, or at other organizations that build servers or operate on-premises cloud solutions.

Sustainable Talent is an M/F+, disabled, and veteran equal employment opportunity and affirmative action employer.
