LLM-powered systems are gradually being deployed into production, but this process poses challenges that are not typically faced in traditional software practices. Models and agents are non-deterministic systems, making it hard to test changes, understand failures, and confidently release updates. This has driven the demand for new evaluation tools specifically designed for LLM properties.
Comet is a platform that brings its roots in MLOps to the rapidly evolving domain of agent-based systems by treating prompts, tools, and workflows as components that can be optimized and enhanced over time.
Gideon Mendels is the co-founder and CEO of Comet. He previously worked at Google on detecting hate speech and deception, and he founded GroupWize, which created and deployed NLP models that processed billions of chats. In this episode, Gideon joins Kevin Ball to discuss how agent development bridges software engineering and ML, why evals are crucial for many AI teams, prompt optimization as a search problem, and the future of continuously improving production agents.
Full Disclosure: This episode is sponsored by Comet.
Kevin Ball, or KBall, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI in Action discussion group through Latent Space.
Please click here to see the transcript of this episode.
