Full Stack Software Engineer (AI Infrastructure; SWE3) [D.26.0012]

Requires Top Secret/SCI with Full Scope Poly

Description: Join us in building the next generation of AI infrastructure that will power innovation across the customer organization. We're seeking a senior full-stack software engineer to support our AI infrastructure team. In this role, you'll lead the development and operation of critical AI platform components, with a focus on scalable inference services and the broader AI application ecosystem. The role also includes project leadership and people-management responsibilities for a small, integrated team within a larger AI platform organization.

Responsibilities:

  • Design, implement, and optimize infrastructure for AI model inference at scale.
  • Lead the development and maintenance of production AI services and applications, including retrieval augmented generation (RAG), autonomous agents, and emerging technologies.
  • Serve as technical lead for AI infrastructure initiatives, coordinating work across integrated teams.
  • Conduct regular one-on-ones and provide coaching, feedback, and support for assigned team members.
  • Act as the team point of contact (POC) for contract administration functions.
  • Navigate ambiguity and define solutions for complex, underspecified systems and requirements.
  • Establish new technical policies, standards, and governance frameworks where gaps exist.
  • Drive adoption of new technologies and practices across engineering teams.
  • Implement and oversee monitoring, logging, and observability solutions for AI services.
  • Ensure high availability, reliability, performance, and security of AI platform components.
  • Communicate effectively with stakeholders at multiple organizational levels.

Skills Requirements:

  • Extensive experience designing, building, and operating large-scale production systems.
  • Deep expertise in systems integration across diverse technologies and platforms.
  • Hands-on experience with cloud engineering in AWS.
  • Advanced proficiency with Kubernetes administration and deployment patterns.
  • Strong Python programming skills.
  • Experience implementing and scaling observability solutions (APM, OpenTelemetry, Grafana, Prometheus).
  • Proven ability to lead technical initiatives and influence organizational change.
  • Experience developing technical policies and governance frameworks.
  • Excellent communication, stakeholder management, and leadership skills.
  • Ability to balance hands-on engineering with leadership and coordination responsibilities.

Nice to Haves:

  • Experience with AI inference serving technologies (vLLM, LiteLLM, etc.).
  • Previous experience with agentic frameworks (e.g., LangChain).
  • Knowledge of vector databases and embedding systems.
  • Experience with high-performance computing or distributed systems.
  • Track record of successfully driving technical and cultural change.

Experience Requirement: 12 years of experience and a B.S. in a technical discipline; 4 additional years of experience may be substituted for the B.S.

To apply for this job, email your details to jobs@dovernetworks.com.