Agentic AI Isn't Ready. That's Why Product and Engineering Have to Be.

| If you mean... | We'd call that... | What it takes to ship it |
| --- | --- | --- |
| "It runs one tool when I ask" | Level 1 -- Single step | Low risk. Good first use case. |
| "It can choose from a few tools to get the job done" | Level 2 -- Tool choice | Needs guardrails and clear boundaries. |
| "It can follow a defined sequence of actions" | Level 3 -- Multi-step workflow | Needs fallback plans, logging, and testing. |
| "It figures out what to do and how to do it" | Level 4 -- True agent | Technically possible, rarely stable. Proceed carefully. |
| "It acts on its own without being asked" | Level 5 -- Fully autonomous | Sci-fi. Not production-ready. Fund R&D if you must. |

A conversation between product and engineering on building usable AI when the model falls short

Intro: When AI systems are fragile, collaboration isn't optional

If you're building AI tools that operate across multiple steps, call APIs, and act on decisions, you're building agentic workflows—even if you don't use the term.

And if you've tried, you've probably hit the same wall: the model isn't reliable enough yet. It makes small mistakes. It compounds errors. It handles 80 percent of the work, then breaks in unpredictable ways.

This is where a lot of projects stall. Product teams worry about user trust. Engineering teams worry about safety and traceability. Stakeholders wonder if it's ready to ship.

Here's the good news: you can still ship. Here's the catch: you need product and engineering in the room, solving the right problems together.

Below, you'll find two perspectives on this exact moment—one from the product side, one from engineering. These aren't theoretical positions. They're shaped by the systems we're building today.

From the Engineering Side: Why Most Agentic AI Projects Fail

If you're building something with 30 steps and your model is 99% accurate at each one, you end up with a system that's right only 74% of the time (0.99^30 ≈ 0.74). Best case.
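
The arithmetic is easy to check. Here's a minimal sketch, assuming each step fails independently:

```python
# Compounded reliability: if each step succeeds with probability p,
# a workflow of n independent steps succeeds with probability p**n.
def workflow_reliability(p: float, n: int) -> float:
    return p ** n

for n in (5, 10, 30):
    print(f"{n} steps at 99% each -> {workflow_reliability(0.99, n):.0%}")
# 5 steps at 99% each -> 95%
# 10 steps at 99% each -> 90%
# 30 steps at 99% each -> 74%
```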

That's why most agentic AI projects break down before launch. Every step introduces new failure points, and most frameworks aren't built to handle the complexity. We tried using tools like Model Context Protocol and AutoGen. They weren't production-ready. So we built our own.

But the hard part isn't always the code. It's defining what you're actually trying to build. Most businesses want Level 5 autonomy. What they really need is Level 2 or 3, with constraints, human-in-the-loop review, and traceability at every step.

This is where I like to work with product before writing code. If we can define where the model needs backup, what counts as "done," and how to handle failure modes safely, we can actually build something that works.
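
To make that concrete, here's a minimal sketch of a step wrapper with a confidence floor, logging for traceability, and a human-in-the-loop escalation. The names (run_step, NeedsHumanReview, the 0.9 floor) are hypothetical illustrations, not our production framework:

```python
import logging
from dataclasses import dataclass
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

@dataclass
class StepResult:
    output: str
    confidence: float  # scored confidence in [0, 1]; how you score it is its own problem

class NeedsHumanReview(Exception):
    """Raised when a step falls below the confidence floor and needs backup."""
    def __init__(self, step_name: str, result: StepResult):
        super().__init__(f"step {step_name!r} needs human review")
        self.step_name, self.result = step_name, result

def run_step(name: str, step: Callable[[], StepResult],
             confidence_floor: float = 0.9) -> StepResult:
    """Run one workflow step with traceability and a human-in-the-loop fallback."""
    result = step()
    # Traceability: every step is logged, so failures can be reconstructed later.
    log.info("step=%s confidence=%.2f output=%r", name, result.confidence, result.output)
    if result.confidence < confidence_floor:
        # Guardrail: don't let a low-confidence step compound downstream.
        raise NeedsHumanReview(name, result)
    return result
```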

"You don't solve reliability through more modeling. You solve it through guardrails, traceability, and product design." (read more insight from our engineering team here)

From the Product Side: Most Agentic AI Isn't Ready. You Should Still Ship It.

There's always a reason to wait. The model's not there yet. The edge cases aren't handled. The team's worried about trust.

But reliability isn't just an attribute of the model. It's something we can design.

When a system feels unpredictable, it's usually not because of raw accuracy. It's because users don't know what to expect, when to intervene, or what control they have. That's a product problem.

This is where I like to work with engineers like Brian and Alex early in the process. Product defines how the system builds trust: through phased rollout, visible confidence thresholds, smart defaults, and fast feedback loops. We don't hide the AI's limits. We make them clear, navigable, and safe.
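
As a sketch of what "visible confidence thresholds" can mean in practice, here's one hypothetical tiering. The cutoffs and wording are assumptions for illustration, not a standard:

```python
def present(confidence: float, draft: str) -> str:
    """Route a model output by confidence instead of hiding uncertainty."""
    if confidence >= 0.95:
        return f"Done automatically: {draft}"           # smart default
    if confidence >= 0.70:
        return f"Suggested -- please confirm: {draft}"  # user keeps control
    return "Not confident enough to act; routing to a person."  # clear, safe limit

print(present(0.97, "claim routed for review"))
print(present(0.80, "claim routed for review"))
print(present(0.40, "claim routed for review"))
```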

"You don't need a perfect model to ship a reliable experience. But you do need product and engineering in the same room, designing for what's real." (read more insight from our product team here)

Closing: What Engineering & Product Agree On

Agentic AI systems are hard to build. They're unpredictable. They're fragile. But if you scope carefully and design thoughtfully, you can ship now—and learn in the real world.

This isn't about lowering the bar. It's about shifting how we define readiness. If the model is good enough to try, the team should be good enough to contain it.

Want help building reliable agentic AI workflows? This is exactly the kind of work we do at Invene. We specialize in early-stage healthcare AI systems where trust, traceability, and usability all matter. Reach out if you want to explore what's possible—before the model is perfect.
