Senior Manager AI Reliability Engineering

Lenovo
remote work
United States, North Carolina, Morrisville
Mar 07, 2026


General Information
Req #: WD00096060
Career area: Software Engineering
Country/Region: United States of America
State: North Carolina
City: Morrisville
Date: Friday, March 6, 2026
Working time: Full-time
Additional Locations:
* United States of America - Illinois - Chicago

Why Work at Lenovo
We are Lenovo. We do what we say. We own what we do. We WOW our customers.
Lenovo is a US$69 billion revenue global technology powerhouse, ranked #196 in the Fortune Global 500, serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world's largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high-performance computing, and software-defined infrastructure), software, solutions, and services. Lenovo's continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).
To find out more, visit www.lenovo.com and read the latest news via our StoryHub.

Description and Requirements

About Our Team

Lenovo is building Quantum, a next-generation hybrid AI platform that spans Windows, Android, and cloud. As part of this vision, we are expanding the reliability engineering organization powering Qira, Lenovo's cross-device Personal AI that operates seamlessly across Lenovo and Motorola products.

We are hiring a Senior Manager, AI Reliability Engineering to lead the engineering teams responsible for Qira's foundational reliability capabilities - including system-level observability, telemetry, performance engineering, resiliency architecture, and the reliability of Qira's hybrid edge/cloud AI service.

This is a high-impact leadership role shaping how we measure, operate, and improve reliability across one of Lenovo's most ambitious AI initiatives.


Location: Open to remote work in the US. The preferred work location is Chicago, IL.

What You'll Do

Engineering Leadership

  • Lead and grow multiple engineering teams focused on reliability, observability, and system performance across Qira's hybrid AI ecosystem.

  • Define strategy, roadmaps, and priorities to improve reliability, insight, and operational readiness across device, edge, and cloud systems.

  • Champion reliability as an engineering discipline through design patterns, best practices, and a culture of continuous improvement.

Observability & Telemetry

  • Own the systems that deliver metrics, logs, traces, distributed tracing, AI-specific signals, dashboards, and alerting.

  • Drive the adoption of unified telemetry standards and instrumentation across all Qira components.

  • Ensure engineers have actionable insight into performance, reliability, cost, and AI behavior.

Service Reliability & Performance Engineering

  • Lead engineering efforts to improve the reliability, performance, and scalability of Qira's service architecture - including inference, retrieval, data pipelines, and hybrid edge/cloud workflows.

  • Drive the design and adoption of resilience patterns such as graceful degradation, fallback paths, bulkheads, and rate-limiting strategies.

  • Oversee capacity planning, cost optimization, and performance tuning for high-throughput AI systems.

System Design & Architectural Influence

  • Work with cross-functional engineering teams to embed reliability early in the design process ("shift left").

  • Guide architectural decisions to ensure Qira's engineering foundations remain stable, observable, and predictable at scale.

  • Set service readiness standards for new components entering production.

Cross-Functional Collaboration

  • Partner with Applied AI/ML Engineering, Platform Engineering, Firmware, Product, and Security to align reliability goals with Qira's broader roadmap.

  • Collaborate closely with the incident management and operations teams to ensure strong signal quality, runbook depth, and operational tooling.

  • Act as a reliability engineering representative in executive and engineering leadership forums.

Team & Talent Development

  • Hire and develop world-class engineers across observability, reliability, and performance domains.

  • Provide coaching, mentorship, and clear technical and leadership career paths.

  • Foster a culture of ownership, operational craftsmanship, and data-driven engineering.

Basic Qualifications

  • 12+ years of experience in Site Reliability Engineering, Observability Engineering, Platform Engineering, or large-scale distributed systems, including 5+ years leading engineering teams.

  • Bachelor's Degree in Computer Science, Engineering, or a related technical field.

  • Engineering experience in several of the following:

      • Observability systems (OpenTelemetry, metrics/logs/traces)

      • Distributed systems reliability and performance

      • Cloud infrastructure (Azure preferred)

      • Kubernetes and containerized environments

      • CI/CD pipelines and deployment workflows

      • Infrastructure-as-Code (Terraform, Bicep, etc.)

  • Deep understanding of Linux systems, networking, scalability, and system performance fundamentals.

  • Proven ability to lead engineering teams and drive cross-organizational initiatives.

Preferred Qualifications

  • Experience building or operating large-scale telemetry and observability platforms.

  • Hands-on experience with Grafana, Prometheus, Loki, Tempo, or similar tooling.

  • Experience supporting AI/ML inference systems, vector databases, or GPU-accelerated compute.

  • Background in hybrid systems spanning device, edge, and cloud.

  • Experience implementing resilience patterns and reliability frameworks.

  • Experience with SLOs, SLIs, error budgets, and reliability governance.

  • Passion for building scalable reliability engineering teams and systems.

Why This Role Matters

Qira's reliability is mission-critical to delivering a safe, fast, and trustworthy AI experience to millions of users.
In this role, you will:

  • Build the telemetry and reliability insights that power Qira

  • Architect the service-level reliability patterns that keep Qira stable at scale

  • Lead the engineering teams that ensure Qira performs predictably across devices, edge, and cloud

  • Shape how reliability engineering is practiced across Lenovo's AI ecosystem

This is a rare opportunity to define the engineering foundation of a next-generation global AI platform.

The base salary budgeted range for this position is $190K - $230K. Individuals may also be considered for bonus and/or commission.

Lenovo's various benefits can be found on www.lenovobenefits.com.
We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, religion, sexual orientation, gender identity, national origin, status as a veteran, disability, or any federal, state, or local protected class.
