Full Stack Software Engineer (AI Infrastructure; SWE3) [D.26.0012]

Full Time
Maryland
Posted 3 months ago
This position has been filled

Requires Top Secret/SCI with Full Scope Poly

Description: Join us in building the next generation of AI infrastructure that will power innovation across the customer organization. We’re seeking a senior full-stack software engineer to support our AI infrastructure team. In this role, you’ll lead the development and operation of critical AI platform components, with a focus on scalable inference services and the broader AI application ecosystem. This role includes project leadership and people-care responsibilities for a small, integrated team within a larger AI platform organization.

Responsibilities:

Design, implement, and optimize infrastructure for AI model inference at scale.
Lead the development and maintenance of production AI services and applications, including retrieval-augmented generation (RAG), autonomous agents, and emerging technologies.
Serve as technical lead for AI infrastructure initiatives, coordinating work across integrated teams.
Conduct regular one-on-ones and provide coaching, feedback, and support for assigned team members.
Act as the team point of contact (POC) for contract administration functions.
Navigate ambiguity and define solutions for complex, underspecified systems and requirements.
Establish new technical policies, standards, and governance frameworks where gaps exist.
Drive adoption of new technologies and practices across engineering teams.
Implement and oversee monitoring, logging, and observability solutions for AI services.
Ensure high availability, reliability, performance, and security of AI platform components.
Communicate effectively with stakeholders at multiple organizational levels.

Skills Requirements:

Extensive experience designing, building, and operating large-scale production systems.
Deep expertise in systems integration across diverse technologies and platforms.
Hands-on experience with cloud engineering in AWS.
Advanced proficiency with Kubernetes administration and deployment patterns.
Strong Python programming skills.
Experience implementing and scaling observability solutions (APM, OpenTelemetry, Grafana, Prometheus).
Proven ability to lead technical initiatives and influence organizational change.
Experience developing technical policies and governance frameworks.
Excellent communication, stakeholder management, and leadership skills.
Ability to balance hands-on engineering with leadership and coordination responsibilities.

Nice to Haves:

Experience with AI inference serving technologies (vLLM, LiteLLM, etc.).
Previous experience with agentic frameworks (e.g., LangChain).
Knowledge of vector databases and embedding systems.
Experience with high-performance computing or distributed systems.
Track record of successfully driving technical and cultural change.

Years of Experience (YOE) Requirement: 12 years with a B.S. in a technical discipline, or 4 additional years of experience in place of a B.S.