Building the Agent Is Simple. The Loop Is the Challenge.
Much of my understanding comes from my experience at Netflix, where I helped engineers integrate AI into production. I kept encountering the same questions: How is AI engineering different from ML engineering? Why does my agent fail with real users? What metrics should I track?
After many such discussions, I defined AI engineering as a distinct role—not just a rebranded ML engineer or a developer using an LLM API. It’s a unique discipline with specific skills, mindset, and processes.
Demos Are Easy. Dependability Is the Job.
Creating a demo is simple: a bit of code, a good prompt, and you’re set. But when confronted with real-world variability, these demos often fail, because the model predicts rather than thinks.
The essence is in bridging the gap between potential and reliability. An AI engineer’s main task is to transform something promising but unstable into a dependable product. An unreliable system, like an unreliable employee, is not useful.
AI Engineer vs. ML Engineer
A common question, especially from those moving into AI from ML or other technical roles, is how the two disciplines differ.
ML engineers focus on model training and optimization, living in the model’s realm. Alongside them, research engineers and scientists advance the field through experiments and publications.
AI engineers operate at the application level, turning models and research into usable products. They work with agentic architectures, transforming theory into functional tools for users.
Key Skills
Reviewing AI engineer job listings on LinkedIn revealed four key skills:
1. RAG
2. Evaluations
3. Agents
4. Production deployment
Of these, production deployment is specific to each workplace; the transferable skill is knowing the right questions to ask.
Daily tasks form the discipline’s core:
– Context engineering: deciding which tokens reach the model, and in what form.
– Tool design: ensuring agents can perform necessary tasks without errors.
– Evaluation: measuring and improving agent performance.
– Production reliability: maintaining system performance and handling errors.
These tasks define AI engineering at the application level, distinct from ML engineering.
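Of these tasks, tool design is the most concrete to illustrate. A minimal sketch, assuming a hypothetical `search_orders` tool: the agent sees a JSON schema, and a validation layer checks model-produced arguments before any work happens, so a malformed call fails loudly instead of corrupting state.

```python
import json

# Hypothetical tool definition: the JSON schema the model sees.
SEARCH_ORDERS_TOOL = {
    "name": "search_orders",
    "description": "Look up a customer's recent orders.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50},
        },
        "required": ["customer_id"],
    },
}

def search_orders(customer_id: str, limit: int = 10) -> list:
    # Stub: a real implementation would query an order store.
    return [{"customer_id": customer_id, "order_id": f"ord-{i}"} for i in range(limit)]

def call_tool(raw_args: str) -> list:
    """Validate model-produced arguments before executing the tool."""
    args = json.loads(raw_args)  # model output arrives as a JSON string
    if "customer_id" not in args:
        raise ValueError("missing required argument: customer_id")
    limit = args.get("limit", 10)
    if not 1 <= limit <= 50:
        raise ValueError("limit out of range")
    return search_orders(args["customer_id"], limit)
```

The point is the shape, not the schema format: every argument the model controls gets checked at the boundary, which is most of what "ensuring agents can perform necessary tasks without errors" means day to day.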
The Build, Eval, Improve Loop
The main focus is the continuous build-eval-improve loop:
Build → Eval → Improve → Eval → Improve
Building agents is straightforward, often achievable with minimal coding. However, the challenge lies in evaluating weaknesses, addressing them, and repeating the process continuously. The job is never “done,” as maintaining a dependable, non-deterministic system is an ongoing process.
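The loop can be sketched in a few lines. This is a skeleton, not an implementation: the `agent` callable, the golden set of (input, check) pairs, and the `improve` step are all placeholders for whatever your system actually uses.

```python
def evaluate(agent, golden_set):
    """Return the fraction of cases the agent handles correctly."""
    passed = sum(1 for prompt, check in golden_set if check(agent(prompt)))
    return passed / len(golden_set)

def improve(agent, failures):
    # Placeholder: in practice this means prompt changes, better tools,
    # added context, or routing fixes targeted at the failing cases.
    return agent

def run_loop(agent, golden_set, target=0.95, max_rounds=10):
    """Build -> eval -> improve until the target score or round budget."""
    score = 0.0
    for _ in range(max_rounds):
        score = evaluate(agent, golden_set)
        if score >= target:
            break
        failures = [(p, c) for p, c in golden_set if not c(agent(p))]
        agent = improve(agent, failures)
    return agent, score
```

Even this toy version makes the article's point: `evaluate` and `improve` are where all the real work lives, and neither ever terminates for good.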
Team Effort
AI engineering is a specific discipline, as evidenced by OpenAI’s hiring across diverse roles, from tool selection to safety. Even with dedicated teams, systems like ChatGPT still have room for improvement.
As AI-driven companies grow, teams of engineers specializing in various system aspects will become more common. This specialization reflects the discipline’s complexity and the effort required to develop reliable agents.
Picking Metrics
Choosing the right evaluation metrics is the toughest part of the job. It means selecting data and metrics that yield meaningful insight, evolving them as the system grows more dependable, and accepting that much of the choice rests on subjective judgment.
The effectiveness of an AI engineer lies in selecting metrics that ensure system improvement, differentiating this role from traditional software or ML engineers.
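One concrete illustration of metric choice, with invented data: an overall pass rate can look healthy while one category quietly fails, so a per-category breakdown often tells you more than the average. The categories and records below are hypothetical.

```python
from collections import defaultdict

# Hypothetical eval records: each result tags a category so metrics can
# surface where the agent is weakest, not just an overall average.
results = [
    {"category": "billing", "passed": True},
    {"category": "billing", "passed": True},
    {"category": "refunds", "passed": False},
    {"category": "refunds", "passed": True},
]

def pass_rate(records):
    """Overall fraction of passing cases."""
    return sum(r["passed"] for r in records) / len(records)

def per_category_pass_rate(records):
    """Pass rate broken out by category, to expose weak spots."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r["category"]].append(r["passed"])
    return {cat: sum(v) / len(v) for cat, v in buckets.items()}

overall = pass_rate(results)              # 0.75 overall
by_category = per_category_pass_rate(results)  # refunds lags at 0.5
```

A 75% overall score hides that refunds passes only half the time; deciding that the breakdown, not the average, is the metric to track is exactly the kind of judgment the role demands.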
The Practice Area, Not the Buzzword
AI engineering is a legitimate discipline rather than a mere buzzword. It’s defined by distinct workflows, metrics, and mindsets, differing significantly from both ML engineering and traditional app development. As a growing career path, it focuses on continual enhancement of AI systems.
For developers exploring this field, remember:
Building the agent is easy. The loop is the job.
