Boost AI Agent Scalability by Separating Logic & Search


Researchers from Asari AI, MIT Computer Science and Artificial Intelligence Laboratory, and Caltech have announced a new programming model called Probabilistic Angelic Nondeterminism (PAN) and a corresponding Python implementation named ENCOMPASS. The approach decouples the core workflow logic of an AI agent from the inference strategies that manage uncertainty, aiming to improve scalability and reduce technical debt in production‑grade agents.

Background

Generative large language models (LLMs) are inherently stochastic. A prompt that succeeds on one run may fail on the next, leading developers to embed complex error‑handling loops, retries, and branching logic around their business rules. This entanglement of business logic and uncertainty handling creates maintenance challenges and limits experimentation with different inference strategies.

Traditional agent designs combine the sequence of steps required to complete a task with the methods used to navigate uncertainty, such as best‑of‑N sampling or tree search. Switching from one strategy to another often requires a complete rewrite of the agent’s control flow, discouraging teams from adopting more reliable approaches.

Technical Approach

ENCOMPASS introduces a primitive called branchpoint() that marks locations in code where an LLM call may produce divergent outcomes. Developers write the “happy path” of the workflow as if the LLM call will succeed. At runtime, the framework interprets these branch points to build a search tree of possible execution paths.
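
To make this concrete, here is a minimal, self-contained sketch of the pattern. The paper names only the branchpoint() primitive; the stand-in definitions below (branchpoint, call_llm, run_tests) are illustrative assumptions, not the published ENCOMPASS API.

```python
def branchpoint() -> None:
    """Stand-in: in ENCOMPASS this marks a point where execution may fork."""

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; deterministic here for illustration."""
    return f"<model output for: {prompt[:40]}>"

def run_tests(code: str, tests: str) -> bool:
    """Stand-in validator; a real agent would execute the generated tests."""
    return True

def translate_file(java_source: str) -> str:
    """The 'happy path', written as if every LLM call succeeds."""
    branchpoint()  # the next LLM call may produce divergent outputs
    python_source = call_llm(f"Translate this Java file to Python:\n{java_source}")

    branchpoint()  # test generation is another point of divergence
    tests = call_llm(f"Write unit tests for:\n{python_source}")

    if not run_tests(python_source, tests):
        raise ValueError("translation failed validation")
    return python_source
```

Note that the function body contains no retry loops or sampling logic; the branch points only mark where a runtime could fork execution.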

By treating inference strategies as a search over execution paths, the framework allows developers to apply algorithms such as depth‑first search, beam search, or Monte Carlo tree search without modifying the underlying business logic. This separation creates what the authors term “program‑in‑control” agents, where the code defines the overall workflow and the LLM performs only specific subtasks.
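
The separation can be illustrated with a toy driver. Nothing in the snippet below is the ENCOMPASS runtime; it is a self-contained sketch of how two different strategies can operate over the same expand() interface once execution paths are treated as a search tree.

```python
from typing import Callable

# A partial execution path is represented as a string here; expand()
# returns scored continuations of a node (higher score = better).
Expand = Callable[[str], list[tuple[float, str]]]

def greedy_search(expand: Expand, root: str, depth: int) -> str:
    """Follow the single best-scored child at every branch point."""
    node = root
    for _ in range(depth):
        children = expand(node)
        if not children:
            break
        node = max(children)[1]
    return node

def beam_search(expand: Expand, root: str, depth: int, width: int = 4) -> str:
    """Keep the top `width` partial paths alive at every level."""
    frontier: list[tuple[float, str]] = [(0.0, root)]
    for _ in range(depth):
        candidates = [c for _, node in frontier for c in expand(node)]
        if not candidates:
            break
        frontier = sorted(candidates, reverse=True)[:width]
    return frontier[0][1]
```

The workflow that produces the tree never changes; only the driver handed to it does, which is the property the authors emphasize.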

Case Study: Legacy Code Migration

The research team applied ENCOMPASS to a Java‑to‑Python translation agent. The workflow involved translating repository files, generating inputs, and validating outputs through execution. In a conventional Python implementation, adding search logic required recasting the workflow as a state machine, which obscured the business logic and hurt readability; implementing beam search demanded explicit state management across a dictionary of variables.

Using ENCOMPASS, the team inserted branchpoint() statements before LLM calls, keeping the core logic linear and readable. Beam search applied at both the file and method levels outperformed simpler sampling strategies. The study found that performance improved linearly with the logarithm of inference cost, and the most effective fine‑grained beam search strategy would have been the most complex to implement with traditional coding methods.
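
The fine-grained variant can be pictured by nesting branch points at two granularities. Reusing the stand-ins from the earlier sketch (branchpoint, call_llm), with extract_methods and assemble as additional hypothetical helpers, a file-and-method-level workflow might look like this:

```python
def extract_methods(java_file: str) -> list[str]:
    """Stand-in: a real agent would parse the Java source."""
    return ["methodA() { ... }", "methodB() { ... }"]

def assemble(skeleton: str, methods: list[str]) -> str:
    """Stand-in: splice translated methods back into the file skeleton."""
    return skeleton + "\n" + "\n".join(methods)

def migrate_repo(java_files: list[str]) -> list[str]:
    translated = []
    for java_file in java_files:
        branchpoint()  # file-level divergence: translate the file skeleton
        skeleton = call_llm(f"Translate file skeleton:\n{java_file}")
        methods = []
        for method in extract_methods(java_file):
            branchpoint()  # method-level divergence: translate one method
            methods.append(call_llm(f"Translate method:\n{method}"))
        translated.append(assemble(skeleton, methods))
    return translated
```

A beam applied at both loops corresponds to the fine-grained strategy the study found most effective, and the nested loops themselves never mention beam state.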

Cost and Performance Scaling

Managing inference cost is a key concern for data officers overseeing AI project budgets. The researchers compared scaling the number of refinement loops in a “Reflexion” agent pattern—where an LLM critiques its own output—to using a best‑first search algorithm. The search‑based approach achieved comparable performance to the standard refinement method while reducing cost per task.
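
The two approaches can be caricatured side by side. The refinement loop below is a generic Reflexion-style pattern, not the paper's code, and best_first() is a textbook driver; both reuse the stand-in call_llm() from the first sketch.

```python
import heapq

def reflexion(task: str, loops: int = 3) -> str:
    """Linear refinement: one chain, each step critiques the last draft."""
    draft = call_llm(task)
    for _ in range(loops):
        critique = call_llm(f"Critique this answer:\n{draft}")
        draft = call_llm(f"Revise:\n{draft}\nUsing critique:\n{critique}")
    return draft

def best_first(expand, score, root: str, budget: int = 12) -> str:
    """Search: spend the same call budget across a scored frontier of
    candidates instead of a single refinement chain."""
    frontier = [(-score(root), root)]
    best_neg, best = frontier[0]
    while frontier and budget > 0:
        neg, node = heapq.heappop(frontier)
        budget -= 1
        if neg < best_neg:  # negated scores: smaller means higher score
            best_neg, best = neg, node
        for child in expand(node):
            heapq.heappush(frontier, (-score(child), child))
    return best
```

The budget parameter makes the trade-off explicit: both patterns consume a fixed number of expansions, but the search spreads them across competing candidates rather than one lineage.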

These results suggest that externalizing inference strategy allows teams to balance compute budget and accuracy without rewriting application code. A low‑stakes internal tool could employ a cheap, greedy search strategy, whereas a customer‑facing application could use a more exhaustive search, all within the same codebase.
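
In code terms, tying together the toy drivers sketched above (still illustrative, not the ENCOMPASS API), the deployment context would select a driver while the workflow stays fixed:

```python
def run_agent(context: str, expand: Expand, root: str) -> str:
    """Pick a search budget per deployment context; the workflow behind
    `expand` is untouched (illustrative only)."""
    if context == "internal_tool":
        return greedy_search(expand, root, depth=3)     # cheap, single path
    return beam_search(expand, root, depth=3, width=8)  # wider, same code
```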

Implications for Enterprise AI

Decoupling inference strategy from workflow logic aligns with established software engineering principles of modularity. Hard‑coding probabilistic logic into business applications creates technical debt and hampers testing, auditing, and upgrades. Separating these concerns enables independent optimization of both the logic and the inference strategy.

Governance benefits also emerge. If a particular search strategy produces hallucinations or errors, it can be adjusted globally without reviewing each agent’s code. This simplifies versioning of AI behaviors, a requirement in regulated industries where the “how” of a decision is as important as the outcome.

Future Directions

As inference‑time compute scales, managing execution paths will become increasingly complex. Enterprise architectures that isolate this complexity are likely to prove more durable than those that allow it to permeate the application layer. The research team plans to further evaluate ENCOMPASS in additional domains, including summarization and creative generation, where defining reliable scoring functions remains a challenge.

Overall, the PAN model and ENCOMPASS framework offer a structured way to separate logic from search in AI agents, potentially improving reliability, reducing maintenance overhead, and enabling more flexible cost‑performance trade‑offs in production environments.
