FAQs
What are the key responsibilities of a Senior Site Reliability Engineer - Cloud at NVIDIA?
Key responsibilities include designing, building, and maintaining large-scale production systems with high efficiency and availability using software and systems engineering practices; ensuring the reliability and uptime of GPU cloud services; enabling developers to make system changes through careful planning and preparation; automating manual work; tuning performance; and optimizing production systems.
What skills and knowledge are required for this role?
The role requires expertise in systems, networking, coding, databases, capacity management, continuous delivery and deployment, Kubernetes, OpenStack, and other cloud-enabling technologies. A mindset that values problem solving, diversity, intellectual curiosity, and openness is also important for success in this role.
What is the culture like at NVIDIA's Site Reliability Engineering organization?
The culture of NVIDIA's Site Reliability Engineering organization values diversity, collaboration, intellectual curiosity, problem solving, and openness. The organization encourages thinking big, taking risks in a blame-free environment, and self-direction on meaningful projects, and it provides support and mentorship for learning and growth.