Stop Guessing: How Structured Outputs and Repair Loops Fix LLM Automation
If you’ve built a production automation pipeline that relies on Large Language Models, you know the specific frustration of the “it works in the notebook” phase. You prompt the model, get a beautiful JSON object, and your downstream code processes it perfectly. Then you deploy it. Suddenly, the keys are inconsistent, the syntax breaks on edge cases, or the model decides to add a helpful comment inside the JSON block. Your pipeline crashes.
The industry has spent years trying to force LLMs to behave like deterministic databases. The result is a messy landscape of brittle regex parsers and fragile prompt engineering. The shift toward structured outputs isn’t just a convenience; it’s a fundamental architectural requirement for LLM automation that actually works.
Treat LLM outputs as data contracts, not creative writing. That means moving away from hoping the model gets it right and toward building systems that enforce, validate, and repair those outputs automatically.
The Problem with LLM Automation
The core issue with current automation workflows is the assumption that LLMs can reliably generate consistent formats without constraints. They can’t. LLMs are probabilistic engines designed to predict the next token, not to adhere to strict structural schemas. When you ask an LLM for JSON “just because,” you’re relying on the model’s implicit understanding of syntax, which varies wildly between versions and contexts.
In production, this manifests as inconsistent keys (e.g., user_id vs userId), malformed syntax (missing commas, trailing brackets), or hallucinated fields that don’t exist in your database schema. The cost of handling these failures is high. You end up writing extensive manual parsing logic, retry loops, and error handling that adds latency and complexity to your stack.
The real-world impact is broken integrations and unreliable data extraction. If your automation pipeline feeds into a CRM, a financial ledger, or a UI component, a single malformed JSON object can break the entire chain. AI reliability cannot be achieved through prompt engineering alone. It requires structural enforcement.
Structured Outputs vs. Function Calling
A significant source of confusion in the ecosystem is the distinction between function calling and structured outputs. While both involve schemas, they serve fundamentally different purposes.
Function calling is designed for connecting models to tools and external data. It allows the LLM to trigger actions, query databases, or retrieve information. The schema here defines the inputs to a tool. In contrast, structured outputs are designed for structuring the model’s response to the user. As noted in the official OpenAI documentation, structured outputs use a specific response_format to ensure the model’s final answer adheres to a defined schema, which is critical for consistent UI generation and data processing 1.
When building automation, choose the right tool for the job. Use function calling when you need the model to do something (e.g., “update the record”). Use structured outputs when you need the model to provide something in a specific format (e.g., “return the updated record as a JSON object”).
The role of response_format is to enforce schema adherence at the API level. This shifts the burden of validation from your application code to the model provider, reducing the likelihood of malformed data entering your pipeline. However, this isn’t a silver bullet. It requires careful schema design and, crucially, a mechanism to handle the inevitable failures.
Building the Schema: From Manual to Automated
Defining strict JSON schemas is the foundation of reliable automation. Historically, this has been a manual, error-prone process. Developers would write JSON schemas by hand, often leading to discrepancies between the schema and the actual data structures used in the codebase.
The modern approach uses tools like Pydantic or Zod to define schemas programmatically. Pydantic, for instance, allows you to define data structures using Python type hints, which are then used to generate the corresponding JSON schema. This ensures that your schema is always in sync with your code.
But we can go further. Instead of manually writing schemas, automate their generation from existing code. By using Python’s inspect module, you can dynamically extract type information from your classes and generate schemas automatically. This reduces maintenance overhead and ensures that any changes to the data model are immediately reflected in the LLM’s constraints.
However, generating the schema is only half the battle. You must also validate the schema itself. JSON Schema validation libraries must support the 2020-12 draft to correctly handle complex features like recursive references and dynamic resolution. Without this support, you risk validating against an outdated or incomplete specification, leading to false positives or negatives in your validation logic 5.
The Repair Loop: Self-Healing Automation
Even with perfect schemas and strict enforcement, LLMs will occasionally fail. The key to robust automation isn’t preventing all errors but handling them efficiently. This is where iterative repair loops come in.
The concept is simple: use validation errors as feedback for the LLM. Instead of failing silently or throwing an exception, the system captures the validation error, formats it into a clear message, and sends it back to the LLM with a request to correct the output. This creates a self-healing workflow that can recover from minor mistakes without human intervention.
This approach aligns with the three-phase workflow demonstrated in OpenAI’s cookbook for building iterative repair loops: Review, Repair, and Validate 4.
- Review: Inspect the artifact (the LLM’s output) against the schema.
- Repair: If validation fails, apply edits or prompt the LLM to correct the issues.
- Validate: Run the checks again. If it passes, proceed. If it fails, loop back to Repair.
By cycling through these phases, you separate judgment from proof. The validation library provides the proof (pass/fail), and the LLM provides the judgment (how to fix it). This loop can be implemented using Codex or custom agents, refining outputs until they pass validation. This is particularly effective for handling dynamic data structures, where rigid objects might fail but key-value pair lists can adapt to the LLM’s output 3.
Practical Implementation for Builders
For those ready to implement this in their own projects, the path is clear but requires attention to detail.
First, set up JSON mode in your OpenAI API calls. This is the baseline for ensuring the model outputs valid JSON. However, JSON mode alone isn’t enough. You must combine it with Pydantic validation to enforce the specific structure of your data.
Integrating validation libraries into your automation chain is straightforward. Use Pydantic to define your models, generate the JSON schema, and pass it to the API. On the receiving end, validate the output against the schema. If validation fails, extract the error message and feed it back to the LLM.
Handling dynamic data structures is another critical consideration. In many cases, rigid objects are too restrictive. Instead, use key-value pair lists or arrays of objects. This allows the LLM to output a flexible structure that can be validated against a more permissive schema. This approach is particularly useful for scenarios where the number of fields is not known in advance.
Finally, ensure that your validation library supports the latest JSON Schema draft. This ensures that you can handle complex schemas without falling back to workarounds that compromise reliability.
Conclusion: Reliability Over Creativity
The future of agentic workflows isn’t about more creative prompts or larger models. It’s about structured, validated, and repairable outputs. Automation requires deterministic results, not creative ones. When we treat LLM outputs as data contracts, we unlock the potential for reliable, scalable automation.
I wouldn’t ship an automation pipeline without a validation layer. The cost of fixing broken integrations later is far higher than the effort of building a robust schema and repair loop now. The tools are available. The patterns are established. The only question is whether you’re ready to stop guessing and start enforcing.
Sources and further reading
Find more practical writing from the RodyTech archive.
RodyTech publishes practical writing on AI systems, infrastructure, and software that teams can actually ship. Use the archive paths below to keep reading by topic or browse the full library.
- Browse the full archive by publication date and topic
- Hands-on notes from real builds, deployments, and ops work
- Category paths for AI, infrastructure, developer tools, and security
No comments yet