Virtual Platform Engineering Sr. Manager, Annapurna Labs Machine Learning Accelerators, AWS
Company: Amazon
Location: Cupertino
Posted on: April 3, 2026
Job Description:
AWS's Trainium and Inferentia chips power the world's largest
machine learning clusters. Our team builds virtual platforms —
full-system C++ models of these custom SoCs — that let software
teams start development months before silicon exists. For
Trainium3, our virtual platform enabled a full training workload
within 12 hours of first silicon, putting servers in customer hands
within weeks!

We're hiring a hands-on engineering manager to lead and scale our
virtual platform effort. You'll own the end-to-end delivery of
virtual platforms that our software partners depend on, from model
architecture and level of detail through deployment and customer
enablement. This is a builder-leader role: you'll set
the technical direction while staying close to the code and your
customers.

What you'll do:
- Lead the team delivering virtual platforms used by design
  verification as well as driver, runtime, collectives, and
  application software teams to develop and validate software
  pre-silicon
- Own the virtual platform roadmap: deciding what to model, at what
  fidelity, and when to deliver it, based on customer needs and chip
  schedules
- Drive platform usability, performance, and scalability so teams can
  run real workloads on your models efficiently
- Build and improve the tooling, CI, and release infrastructure
  around the virtual platform so customers get reliable,
  well-documented drops
- Partner closely with software teams and design verification to
  understand their workflows and shape the platform to maximize their
  productivity
- Hire and develop a team of strong modeling engineers, setting high
  standards for code quality, testing, and delivery
- Dive into technical problems when needed: debug model issues,
  review architecture decisions, and unblock the team

Why this role is interesting:
- Your virtual platform directly accelerates AWS's most strategic
  silicon programs: software teams literally can't start without you
- You'll own a product with real internal customers who give you
  direct feedback, not just a component buried in a larger system
- The problem space is rich: full-system simulation, multi-subsystem
  integration, QEMU development, performance at scale, machine
  learning at the bleeding edge
- Small team, big impact, startup pace inside AWS's custom silicon
  org

No ML background needed. You’ll learn the required ML domain
knowledge on the job. What matters is deep virtual platform or system
modeling experience and the ability to lead a technical team.

About the team
More details
about Trainium3, our team's latest achievement, as well as some
insights into our team culture:
- https://www.aboutamazon.com/news/aws/trainium-3-ultraserver-faster-ai-training-lower-cost

Qualifications:
- 7 years of engineering team management experience
- Knowledge of SoC architecture
- 15 years writing functional or performance models for SoCs, CPUs,
  GPUs, or ASICs
- Strong C++ and/or SystemC skills in large-scale OOP codebases
- Experience hiring, developing and promoting engineering talent
- Experience with high-performance, multi-threaded, or distributed
  systems
- Experience developing and calibrating performance models for custom
  silicon
- Background writing benchmarks and analyzing model performance
- ML accelerator architecture knowledge (a plus, not required)
- Experience building CI/CD regression frameworks and developer
  tooling
- Familiarity with AWS EC2 for development workflows

Amazon is an equal opportunity employer and does not
discriminate on the basis of protected veteran status, disability,
or other legally protected status.

Los Angeles County applicants: Job duties for this position include:
work safely and cooperatively with other employees, supervisors, and
staff; adhere to standards of excellence despite stressful
conditions; communicate effectively and respectfully with employees,
supervisors, and staff to ensure exceptional customer service; and
follow all federal, state, and local laws and Company policies.
Criminal history may have a direct, adverse, and negative
relationship with some of the material job duties of this position.
These include the duties and responsibilities listed above, as well
as the abilities to adhere to company policies, exercise sound
judgment, effectively manage stress and work safely and respectfully
with others, exhibit trustworthiness and professionalism, and
safeguard business operations and the Company’s reputation. Pursuant
to the Los Angeles County Fair Chance Ordinance, we will consider for
employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best
results for our customers. If you have a disability and need a
workplace accommodation or adjustment during the application and
hiring process, including support for the interview or onboarding
process, please visit
https://amazon.jobs/content/en/how-we-hire/accommodations for more
information. If the country/region you’re applying in isn’t listed,
please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon
package will include sign-on payments and restricted stock units
(RSUs). Final compensation will be determined based on factors
including experience, qualifications, and location. Amazon also
offers comprehensive benefits including health insurance (medical,
dental, vision, prescription, Basic Life & AD&D insurance and option
for Supplemental life plans, EAP, Mental Health Support, Medical
Advice Line, Flexible Spending Accounts, Adoption and Surrogacy
Reimbursement coverage), 401(k) matching, paid time off, and
parental leave. Learn more about our benefits at
https://amazon.jobs/en/benefits.

USA, CA, Cupertino - 253,100.00 - 342,300.00 USD annually