Requires Top Secret/SCI with Full Scope Polygraph
Description: Join us in building the next generation of AI infrastructure that will power innovation across the customer organization. We’re seeking a senior full-stack software engineer to support our AI infrastructure team. In this role, you’ll lead the development and operation of critical AI platform components, with a focus on scalable inference services and the broader AI application ecosystem. This role includes project leadership responsibilities and people care for a small, integrated team within a larger AI platform organization.
Responsibilities:
- Design, implement, and optimize infrastructure for AI model inference at scale.
- Lead the development and maintenance of production AI services and applications, including retrieval-augmented generation (RAG), autonomous agents, and emerging technologies.
- Serve as technical lead for AI infrastructure initiatives, coordinating work across integrated teams.
- Conduct regular one-on-ones and provide coaching, feedback, and support for assigned team members.
- Act as the team point of contact (POC) for contract administration functions.
- Navigate ambiguity and define solutions for complex, underspecified systems and requirements.
- Establish new technical policies, standards, and governance frameworks where gaps exist.
- Drive adoption of new technologies and practices across engineering teams.
- Implement and oversee monitoring, logging, and observability solutions for AI services.
- Ensure high availability, reliability, performance, and security of AI platform components.
- Communicate effectively with stakeholders at multiple organizational levels.
Skills Requirements:
- Extensive experience designing, building, and operating large-scale production systems.
- Deep expertise in systems integration across diverse technologies and platforms.
- Hands-on experience with cloud engineering in AWS.
- Advanced proficiency with Kubernetes administration and deployment patterns.
- Strong Python programming skills.
- Experience implementing and scaling observability solutions (APM, OpenTelemetry, Grafana, Prometheus).
- Proven ability to lead technical initiatives and influence organizational change.
- Experience developing technical policies and governance frameworks.
- Excellent communication, stakeholder management, and leadership skills.
- Ability to balance hands-on engineering with leadership and coordination responsibilities.
Nice to Haves:
- Experience with AI inference serving technologies (vLLM, LiteLLM, etc.).
- Previous experience with agentic frameworks (e.g., LangChain).
- Knowledge of vector databases and embedding systems.
- Experience with high-performance computing or distributed systems.
- Track record of successfully driving technical and cultural change.
YOE Requirement: 12 years of experience with a B.S. in a technical discipline, or 4 additional years of experience in place of a B.S.
To apply for this job, email your details to jobs@dovernetworks.com
