Staff Software Engineer, Site Reliability Engineering

Google

6d ago

  • Job
    Full-time
    Senior Level
  • Software Engineering
    IT & Cybersecurity
  • Dublin

AI generated summary

  • You need a Bachelor's degree in Computer Science, a related field, or equivalent practical experience, plus typically 5 years of software development experience, 8 years with data structures or algorithms, and 3 years leading projects and working with distributed systems.
  • You will design, deploy, and monitor services, ensuring reliability and scalability through automation while engaging in incident response and postmortems for continuous improvement.

Requirements

Minimum qualifications:

  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
  • Candidates will typically have 5 years of experience with software development in one or more programming languages.
  • Typically 8 years of experience with data structures or algorithms.
  • Typically 3 years of experience leading projects and designing, analyzing, and troubleshooting distributed systems.

Preferred qualifications:

  • Experience working in computing, distributed systems, storage, or networking.
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Ability to debug and optimize code and to automate routine tasks.
  • Systematic problem-solving approach, coupled with effective verbal and written communication skills.

Responsibilities

  • Engage in and improve the whole lifecycle of services—from inception and design, through to deployment, operation and refinement.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems.

FAQs

What is the minimum educational requirement for this position?

A Bachelor’s degree in Computer Science, a related field, or equivalent practical experience is required.

How many years of software development experience is typically required for this role?

Candidates will typically have 5 years of experience with software development in one or more programming languages.

What are the preferred qualifications for this position?

Preferred qualifications include experience working in computing, distributed systems, storage, or networking; expertise in designing, analyzing, and troubleshooting large-scale distributed systems; the ability to debug and optimize code and to automate routine tasks; and a systematic problem-solving approach coupled with effective verbal and written communication skills.

What does the Site Reliability Engineering (SRE) team focus on?

The SRE team combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems, ensuring reliability and performance for Google Cloud's services.

What are key responsibilities associated with this role?

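Key responsibilities include engaging in and improving the whole lifecycle of services, supporting services before they go live, maintaining live services by measuring and monitoring availability, latency, and overall system health, scaling systems sustainably through automation, and practicing sustainable incident response and blameless postmortems.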

Is there an emphasis on diversity within the team?

Yes, the SRE culture promotes diversity, intellectual curiosity, problem-solving, and openness, encouraging collaboration and diverse perspectives.

Are there opportunities for support and mentorship in this role?

Yes, there is a focus on creating an environment that provides the support and mentorship needed for learning and growth.

Does the role involve working directly on system troubleshooting?

Yes, the role involves designing, analyzing, and troubleshooting distributed systems and maintaining service health.

Will I need to perform incident response for live systems?

Yes, you will practice sustainable incident response and engage in blameless postmortems.

How important is automation in this position?

Automation is crucial: the role involves scaling systems sustainably through mechanisms like automation and automating routine tasks.

Industry: Technology
Employees: 10,001+
Founded Year: 1998

Mission & Purpose

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.