
System Development Manager, AWS Resilience, AWS Incident Response


Amazon

16d ago

  • Job
    Full-time
    Expert / Leadership (9+ years)
  • Software Engineering
    IT & Cybersecurity
  • Dublin
    Remote

AI-generated summary

  • You need 5+ years of experience with cloud technologies and 5+ years managing an engineering team, plus experience with agile development, operational best practices, and customer issue resolution, and strong communication skills.
  • You will define strategic goals, coordinate cross-team communication, manage incident/change processes, and oversee performance and career development for the global incident response team.

Requirements

  • 5+ years of direct experience with cloud hosting technologies (AWS, Azure, etc.).
  • 5+ years of experience managing an engineering team operating at scale.
  • Deep understanding of infrastructure delivered through the software development lifecycle in an API-enabled environment, including agile development, software patterns, and modern cloud services.
  • Experience in implementing, supporting, and evaluating tools and services with a security, scalability, and performance mindset
  • Ability to handle multiple competing priorities in a fast-paced environment
  • Ability to interact with and influence people at all levels.
  • Excellent written and verbal communication skills and ability to get ideas across to the team, peers and customers.
  • Strong understanding of fundamental operational best practices such as monitoring, alerting, deployment and change policies (ITIL a plus)
  • Experience running agile frameworks or other workflow methodologies in a DevOps setting.
  • Experience dealing with customers during issue resolution and operating under pressure.
  • Routine communication of status to senior management
  • SLA definition and refinement
  • Goal-setting for reduction and elimination of customer-facing defects
  • Leading post-mortem analysis, including ensuring a high quality bar for the analysis and follow-through on the resulting action items

Responsibilities

  • Define and Deliver Business Priorities: You will be a key contributor and owner of the direction of the global AWS Incident Response team. You will define, plan, track and deliver on strategic goals for the team, while ensuring that the team remains unblocked and focused.
  • Cross-Site, Cross-Team Coordination: You will be responsible for coordinating with your counterparts to ensure that a clear communication channel exists between AWS Operations teams. You will also work closely with systems and product teams to create and maintain proper processes for monitoring and alarming on services. Part of this work includes establishing both solid operational acceptance criteria and a concrete feedback loop for resolving deviations from that process.
  • Incident/Change Management: You will be the point of contact for inquiries regarding engagement processes and issues within the global Amazon platform during your team's coverage. Responsibilities include delegating emergent engagement issues to team members, driving initiatives to improve existing tools and processes, and providing feedback on new practices and procedures so they scale with the rapid expansion of AWS services and the customer base.
  • Performance Management/Team Health: You will own all facets of performance and career management for the team.

FAQs

What qualifications are required for the System Development Manager position?

The position requires 5+ years of direct experience with cloud hosting technologies, 5+ years of experience managing an engineering team, and a deep understanding of infrastructure delivered through the software development lifecycle in an API-enabled environment.

What is the focus of the AWS Resilience team?

The AWS Resilience team focuses on preventing and responding to availability and security issues for all AWS Services, ensuring the cloud's operation and resilience.

What types of methodologies does the team use for project management?

The posting calls for experience running agile frameworks or other workflow methodologies in a DevOps setting, supported by operational best practices such as monitoring, alerting, deployment, and change policies (ITIL is a plus).

What will be my responsibilities concerning incident management?

You will be the point of contact for engagement-process inquiries during your team's coverage, delegate emergent engagement issues to team members, drive initiatives to improve existing tools and processes, and provide feedback on new practices and procedures.

What kind of communication skills are required for this role?

Excellent written and verbal communication skills are required, as you will need to effectively convey ideas to your team, peers, and customers and communicate status updates to senior management.

Will I have the opportunity to influence team direction?

Yes, you will be a key contributor and owner of the direction of the global AWS Incident Response team, defining and tracking strategic goals.

Is there a focus on diversity and inclusion within the team?

Yes, Amazon is committed to creating a diverse and inclusive workplace and values diversity in its workforce as central to its success.

Are there opportunities for career growth within this role?

Yes, this position offers significant growth potential and opportunities to make a substantial impact within the team and across the organization.

How does the team handle high-visibility incidents?

You will ensure your team effectively directs the resolution of high-visibility incidents, coordinating with global teams and leveraging data to improve automation and tooling.

Do I need prior experience with AWS to apply for this position?

AWS experience is not explicitly required: the posting asks for 5+ years of direct experience with cloud hosting technologies such as AWS or Azure. Hands-on AWS experience is nonetheless highly valuable for this role.

Industry: Retail & Consumer Goods
Employees: 10,001+
Founded: 1994

Mission & Purpose

Amazon is guided by four principles: customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking. We are driven by the excitement of building technologies, inventing products, and providing services that change lives. We embrace new ways of doing things, make decisions quickly, and are not afraid to fail. We have the scope and capabilities of a large company, and the spirit and heart of a small one. Together, Amazonians research and develop new technologies, from Amazon Web Services to Alexa, on behalf of our customers: shoppers, sellers, content creators, and developers around the world.

Our mission is to be Earth's most customer-centric company. Our actions, goals, projects, programs, and inventions begin and end with the customer top of mind. You'll also hear us say that at Amazon, it's always "Day 1." What do we mean? That our approach remains the same as it was on Amazon's very first day: to make smart, fast decisions, stay nimble, invent, and focus on delighting our customers.