Rethinking LLM Integrations: What Happened When I Sent an Entire OpenAI Conversation as Plain Text to Another Agent
Modern AI applications are becoming deeply tied to specific providers.
A project starts with a simple OpenAI integration. Then come:
- custom tools,
- memory systems,
- function calling,
- role-based prompts,
- orchestration layers,
- agent-to-agent communication,
- structured outputs,
- provider-specific behaviors.
After a few months, the AI model is no longer “just an API.” It becomes part of the application architecture itself.
That is exactly the situation I faced in one of my legacy systems.
The project already had:
- a large set of internal tools,
- its own agent communication logic,
- memory management,
- knowledge systems,
- role-based prompts,
- workflow-specific instructions deeply embedded into the codebase.
Rewriting everything for another provider would have required an enormous amount of work.
At first, the obvious approach seemed to be:
- recreate OpenAI-compatible APIs,
- emulate function calling,
- rebuild protocol compatibility layers,
- rewrite orchestration logic.
But then another idea appeared:
What if I simply stopped caring about the provider protocol entirely?
Instead of rebuilding the infrastructure, what if I just sent the whole conversation — including roles, tools, system prompts, and technical metadata — as plain text to another AI agent?
That experiment led to some surprisingly interesting results.
The agent used in the experiment was based on the open-source HAIH agent system:
The Core Idea
Originally, the system used OpenAI-style chat messages:
[
{
"role": "system",
"content": "Rules and instructions..."
},
{
"role": "user",
"content": "User request..."
}
]
With tools, memory, and agent coordination layered on top.
Instead of rewriting the entire architecture, I tried something radically simpler:
- take the entire message array,
- serialize it into text,
- send it as a single user message to another agent.
Not as a structured API call.
Not through native function calling.
Just raw text.
Something conceptually like this:
OPENAI_PAYLOAD:
[
{ role: "system", content: "..." },
{ role: "assistant", content: "..." },
{ role: "user", content: "..." }
]
TOOLS:
[...]
INSTRUCTION:
Interpret this conversation and continue it correctly.
At first glance, this sounds wrong.
A lot of developers assume:
- roles are deeply enforced,
- system prompts are privileged channels,
- tools require native APIs,
- provider protocols are fundamentally special.
But the experiment showed something important:
Large language models are surprisingly good at reconstructing protocol semantics from plain text alone.
The Unexpected Result
The setup was tested on:
- Qwen 3.5 4B (local model),
- Gemini Flash Lite.
Both models successfully:
- understood role hierarchies,
- followed embedded instructions,
- respected formatting constraints,
- handled large serialized contexts,
- maintained tool-related semantics,
- returned clean outputs without extra commentary.
Even when the entire conversation history was embedded into a single user message.
One test included nearly 70,000 prompt tokens containing:
- system instructions,
- memory records,
- internal rules,
- tool descriptions,
- previous assistant responses,
- knowledge base entries,
- orchestration logic.
The final task itself was trivial: format business information into markdown.
And yet both models consistently produced the correct output.
Without native function calling.
Without provider-specific APIs.
Without special role handling.
What This Experiment Actually Demonstrates
This does not mean that native APIs are useless.
Native APIs still provide:
- validation,
- constrained decoding,
- better reliability,
- streaming support,
- structured outputs,
- runtime safety layers.
But it suggests something deeper:
Many protocol semantics are not hard architectural requirements.
They are learned behavioral patterns.
That distinction matters.
A lot of AI infrastructure today is built on the assumption that:
- system roles are fundamentally privileged,
- chat protocols are strict execution environments,
- function calling is a mandatory architectural layer.
But in practice, many of these mechanisms behave more like conventions than hard boundaries.
The model does not “understand” roles as operating system permissions.
It sees tokens.
And modern models are extremely good at reconstructing intent from structured text.
The Security Illusion
This also exposes a dangerous misconception in the AI ecosystem.
Many developers treat system prompts as if they were immutable security policies.
They are not.
A system prompt is not a sandbox.
It is not an access control system.
It is not a guarantee.
It is a strong probabilistic instruction.
That distinction becomes obvious once you start serializing protocols manually.
If a model can correctly interpret:
role: system
inside a plain text block embedded in a user message, then the boundary between “special protocol” and “ordinary text” becomes much thinner than many people assume.
This is one reason prompt injection remains such a difficult problem.
The model is fundamentally operating in a language environment, not a formally isolated execution environment.
Why This Matters for Legacy Systems
The practical implication is significant.
Many teams today are heavily coupled to:
- OpenAI APIs,
- Anthropic-specific behavior,
- provider-native tools,
- SDK-specific orchestration.
Migrating away often feels impossible.
But this experiment suggests another approach:
Instead of rewriting the orchestration layer, virtualize the protocol itself.
In some cases, you can:
- preserve your existing architecture,
- serialize the operational context,
- route it through another model,
- and continue operating with surprisingly few changes.
This is especially valuable for:
- legacy systems,
- experimental agent platforms,
- self-hosted models,
- multi-provider infrastructures,
- rapid migration scenarios.
Important Limitations
This approach is not magic.
It has tradeoffs:
- lower reliability,
- weaker guarantees,
- larger prompts,
- less deterministic behavior,
- increased prompt complexity.
And some providers handle role semantics more strictly than others.
For example, Anthropic models appear to rely more heavily on structured protocol handling and runtime orchestration layers than smaller open-weight models.
So the results are not universally identical across all LLM ecosystems.
Still, the broader observation remains important:
Modern LLMs can reconstruct surprisingly complex interaction protocols from plain text alone.
And that changes how we should think about AI architecture.
Final Thought
One of the biggest lessons from this experiment is that many AI systems are probably over-coupled to provider-specific abstractions.
The model itself is often more flexible than the surrounding ecosystem assumes.
In practice, this means:
- protocols can sometimes be virtualized,
- orchestration can sometimes be decoupled,
- migration may be easier than expected,
- and many “hard requirements” are actually conventions reinforced by tooling layers.
As AI systems become more complex, understanding the difference between:
- model behavior,
- runtime behavior,
- and protocol conventions
may become one of the most important engineering skills in the field.