OpenAI's new stateful API and the "LLM OS"
A while ago, Andrej Karpathy wrote on X (formerly Twitter) about LLMs becoming the “OS of AI”. Now another post confirms it, drawing more cute analogies. This line of thought connects in an interesting way with the architecture of the new “Assistant API”.
For me, this idea instantly matched a very abstract pattern that, I hope, doesn’t actually apply in OpenAI’s case, at least not in the usual sense.
Sounds obscure? Let me explain what I mean:
The stateless “Completions” API
Before the Assistant API, which came out at the 2023 dev day event, OpenAI’s API was stateless: you “had to” resend the entire Message list with every call to the LLM. Some developers thought this was inefficient. In reality, it is not - the code is simple, and relative to the time and cost of inference, the data that has to be transferred repeatedly wastes only a negligible amount of resources.
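To make this concrete, here is a minimal sketch of the stateless pattern. It assumes the current openai Python SDK and uses a placeholder model name; the only point is that the full message list travels with every call, and all conversation state lives in your own code:

```python
# Minimal sketch of the stateless pattern (assumes the openai v1 Python SDK
# and OPENAI_API_KEY in the environment; the model name is a placeholder).
from openai import OpenAI

client = OpenAI()
messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    messages.append({"role": "user", "content": input("> ")})

    # The entire conversation so far travels with every request.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```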
A few developers I talked to thought it was wasteful for another reason: the context has to be loaded back into the LLM’s state over and over again, which is of course costly. But that is the case anyway - you can’t have all that infrastructure idling while waiting for the user to send the next message; that would be a huge waste of compute.
So - no problem there.
The stateless approach had another important advantage: you, as the developer, had full control over the LLM call. For example, if the conversation got too long, you decided what to do with the text that was about to drift out of context - summarize it before it’s too late, and so on.
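As a sketch of what that control looks like (the token budget, the helper names, the model names, and the use of tiktoken are my own assumptions, nothing prescribed by the API):

```python
# Sketch: fold the oldest messages into a summary before they fall out of the
# context window, instead of letting them be silently truncated.
# Budget, model names, and helpers are illustrative assumptions.
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")

def token_count(messages):
    return sum(len(enc.encode(m["content"])) for m in messages)

def compact(messages, budget=6000, keep_recent=6):
    """If the history is too long, replace older turns with a summary."""
    if token_count(messages) <= budget or len(messages) <= keep_recent + 1:
        return messages
    system, old, recent = messages[0], messages[1:-keep_recent], messages[-keep_recent:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Summarize this conversation so far:\n" + transcript}],
    ).choices[0].message.content
    return [system,
            {"role": "system", "content": "Summary of earlier conversation: " + summary},
            *recent]
```

With a server-side Thread, as we’ll see below, this decision is made for you inside OpenAI’s infrastructure.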
Zooming out a bit, this was a good building block for an ecosystem of AI developers around the world building on top of what OpenAI is creating. What ran in OpenAI’s cloud was just the LLM; everything else was under the developer’s control. In some sense, this was the most open OpenAI could be while still keeping its models proprietary. With this approach, OpenAI as a company would basically have control of, and be responsible for, two things: 1. investing in training the foundation models, and 2. hosting them efficiently, making money by charging for inference. Everything else could be done externally, by the developer ecosystem, with developers around the world competing over how to make the best use of this new miracle technology.
This is extremely important, so let me take it a bit further: foundation models may well be the most impactful technology ever created. Having the most powerful foundation models under the control of one company, or even one person, seems scary. So what can we do about this? Can we break it apart somehow? Whoever trains a model makes a huge investment and takes a risk. If they did a good job, the value is enormous, so they should be allowed to collect fees for the use of that model. In some sense, this is an “atomic” business: we can’t break it apart.
If we stipulate that the value is enormous, and we want an open economy where the most impactful tech of all time is not used by one company as leverage to gain more control than necessary, then limiting OpenAI to just offering LLM inference is healthy. It’s the equivalent of only controlling the operating system kernel, not an entire ecosystem and tech culture. If there is a very valuable “atomic” kind of business, then whoever is in it should maybe not be allowed to compete with the companies that build on it, just as it was arguably problematic for an OS vendor to also offer productivity tools. But again, until November 2023, we didn’t really have this problem. Until the…
The stateful “Assistant” API
Now, with the Assistant API, state is kept in OpenAI’s infrastructure. A “Thread” is stored in OpenAI’s cloud, and you act on it via “Runs”, but you no longer have full control over it (if I understand it correctly).
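For comparison, here is a sketch of the same kind of exchange against the stateful API, using the beta namespace of the openai Python SDK as I understand it at the time of writing (model name and exact fields are assumptions and may change):

```python
# Sketch of the stateful flow: the conversation lives in a server-side Thread,
# not in your code. Model name and field names follow the beta docs as I read
# them at the time of writing.
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    instructions="You are a helpful assistant.",
)
thread = client.beta.threads.create()  # state is now stored on OpenAI's side

client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Hello!"
)
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=assistant.id
)
# Getting the reply back means polling the Run and then listing the Thread's
# Messages - more on that awkwardness further down.
```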
Is there a technical reason to do this? At first sight, not really - unless OpenAI is doing (or anticipating doing) something very clever where, on some calls to the LLM, they can somehow avoid a context switch, or something like that. I don’t know enough about inference and the related infrastructure to judge whether this is even feasible, but I’m guessing it’s expensive. [Feedback appreciated!]
LLM OS Kernel?
Is there a reason why this could make sense long term? Maybe. This is where Karpathy’s “LLM OS” topic comes in.
If you want to do something, and it requires one or more uses of LLMs, do you:
(a) Prompt an LLM to do it, and endow it with subroutines it can call (“functions”) and the ability to call other models, all of it in parallel where possible. Then let it do its thing and hope for the best. This is how I understand Karpathy’s “LLM OS” approach.
(b) Have the process be ultimately controlled by an algorithm, which decides which LLM to call, with which prompt, and when, and how to use the output. This is what people have been doing; the clever way OpenAI designed the tool use / Functions API was the most flexible way to let developers benefit from OpenAI’s training effort to teach models how to decide whether to call a function.
Arguably, (b) is actually at least as general, more flexible and safer.
Safer, because it gives engineers a lot more visibility and control over what’s happening. If the web-search-enabled assistant decides to retrieve “www.nasty-url.com”, and you don’t want that, your algorithm code can easily, definitely, and deterministically prevent it (see the sketch after this comparison).
At least as general, because I think (a) reduces to (b) [*]: (b) can do all the same things. If you want to do (a), you can use (b) with this simple algorithm: ask the LLM from (a) to do it.
More flexible, because you can, for example, easily use LLMs from 3rd parties, or locally run ones. (*)
Another way to look at it: (b) makes it more straightforward to have “hooks” into the whole process where you can run deterministic, traditional code (“algorithms”).
Or are (a) and (b) equivalent? Only in a certain sense. With (b), you hook in at the function calls - they hand control back to your code. But this works only if you don’t use any of the OpenAI-hosted tools (like Code Interpreter or knowledge retrieval).
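Here is what that hook looks like in practice with the plain (stateless) tool-calling API - a sketch only, with a made-up fetch_url tool and blocklist, but it shows that your own code sits between the model’s decision and its execution:

```python
# Sketch of approach (b) with a deterministic guard. The fetch_url tool,
# the blocklist, and the model name are made up for illustration.
import json
from urllib.parse import urlparse
from openai import OpenAI

client = OpenAI()
BLOCKED_HOSTS = {"www.nasty-url.com"}

tools = [{
    "type": "function",
    "function": {
        "name": "fetch_url",
        "description": "Fetch the contents of a web page.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}]

def guarded_fetch(url: str) -> str:
    host = urlparse(url if "//" in url else "https://" + url).hostname
    if host in BLOCKED_HOSTS:
        return "Request refused by policy."   # deterministic, no LLM involved
    return f"(contents of {url})"             # real fetching omitted in this sketch

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Look up www.nasty-url.com for me."}],
    tools=tools,
)
message = response.choices[0].message
for call in message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(guarded_fetch(args["url"]))
```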
So maybe we’re fine here, because it’s unclear whether you would want to use these OpenAI-hosted tools anyway. Running Python in a sandbox isn’t exactly difficult, and the knowledge retrieval is already getting some criticism in the OpenAI dev forums, for example around how larger documents are split into chunks for embedding.
When you have a suboptimal architecture, you end up having to do a lot of things that feel awkward. Here are a few from the OpenAI API documentation:
Once the size of the Messages exceeds the context window of the model, the Thread will attempt to include as many messages as possible that fit in the context window and drop the oldest messages. Note that this truncation strategy may evolve over time.
This is important. There are many ways to do this - in particular, maybe you want to summarize information before it gets lost to truncation and keep the summary in context? It’s not clear what’s best, as indicated by the fact that OpenAI hasn’t decided either.
Then there is this:
When using the Function calling tool, the Run will move to a required_action state once the model determines the names and arguments of the functions to be called. You must then run those functions and submit the outputs before the run proceeds. If the outputs are not provided before the expires_at timestamp passes (roughly 10 mins past creation), the run will move to an expired status.
So - they keep state for you in working memory somewhere, and it has a high enough cost that they have to eventually purge it.
Definitely awkward. I can already see people having to work around this with keep-alive / ask-again type responses.
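For reference, handling that state looks roughly like this (field names are per the beta docs as I understand them, and local_functions is my own hypothetical registry of callables):

```python
# Sketch: when a Run asks for tool outputs, your code must run the functions
# and submit the results before the Run's expires_at passes.
# local_functions is a hypothetical {name: callable} registry.
import json

def handle_tool_calls(client, thread_id, run, local_functions):
    calls = run.required_action.submit_tool_outputs.tool_calls
    outputs = []
    for call in calls:
        fn = local_functions[call.function.name]       # your own code runs here
        args = json.loads(call.function.arguments)
        outputs.append({"tool_call_id": call.id,
                        "output": json.dumps(fn(**args))})
    # If this takes longer than ~10 minutes, the Run expires and the work is lost.
    return client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id, run_id=run.id, tool_outputs=outputs
    )
```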
Or this:
In order to keep the status of your run up to date, you will have to periodically retrieve the Run object. You can check the status of the run each time you retrieve the object to determine what your application should do next. We plan to add support for streaming to make this simpler in the near future.
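In other words, a polling loop like the sketch below (status strings are taken from the beta docs as I read them and may change):

```python
# Sketch of the polling loop the docs describe.
import time

def wait_for_run(client, thread_id, run_id, poll_seconds=1.0):
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in ("queued", "in_progress"):
            time.sleep(poll_seconds)      # nothing to do but ask again
        elif run.status == "requires_action":
            return run                    # hand the tool calls back to your code
        else:                             # completed, failed, cancelled, expired
            return run
```

Once streaming lands this presumably goes away, but for now it’s another bit of glue code on the developer’s side.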
Summarizing, this architecture is really problematic:
(1) Safety concerns - it gives LLMs controlled by OpenAI more control over the process
(2) From an overall systems perspective, it’s awkward
So what is the justification?
Why go Stateful?
If you believe Karpathy’s “LLM OS” is the future, that would, on the surface, justify the stateful architecture: now the “OS” is hosted as well.
But honestly, I find this a bit scary. I think the application developer should have as much control over what they are developing as possible.
Am I overthinking this? Maybe.
Maybe OpenAI just wants to provide useful tools for developers and make their lives easier, and this has nothing to do with wanting more control. But if that’s the case, how about making a public statement about the intention, including some kind of promise not to lock things down later?
All other explanations I can think of are problematic. Maintain control in the interest of "AI safety"? Get more training data? Learn from how developers use this?
Or, and this was my instant reaction, the abstract pattern match mentioned at the beginning: “Extend the scope of the cloud offering, build a moat, and then later make money from API fees”.
So let’s hope that OpenAI stays Open. It doesn’t seem like the people in charge have any interest in having more control or making more money. And useful tools are nice to have as long as they are optional, so let’s just go with that interpretation.
AI use statement
All of the text above was generated by a human (me). Some of the text below, obviously not.
ChatGPT4’s take on this
Your analysis touches on several important aspects of AI development and the evolution of language model-based services, particularly in the context of OpenAI's evolving API infrastructure. Here's a summary and critical review of the key points:
Statelessness vs. Stateful Architecture: Initially, OpenAI's API required resending the entire message list with each call, making it stateless. This had minimal resource impact and allowed developers full control over the conversation context. With the new Assistant API, state is maintained within OpenAI's infrastructure, altering this dynamic.
Control and Flexibility for Developers: Previously, developers had more control over managing the LLM calls and could, for instance, manage longer conversations effectively. The stateful approach potentially reduces this flexibility, raising questions about the balance between ease of use and developer autonomy.
Foundation Models and Market Control: You express concern over the control of impactful technologies like foundation models by single entities. Limiting OpenAI's role to offering LLM inference could be seen as a healthy check on market dominance, maintaining an open ecosystem for AI development.
Orchestration: LLM vs. Algorithm: The distinction between using an LLM as an orchestrator versus traditional algorithms raises questions about control, safety, and flexibility. While LLMs can manage complex tasks, algorithms offer more predictability and control, especially in sensitive applications.
OpenAI's Assistant API Architecture: The new architecture poses potential issues:
Safety concerns due to increased control by OpenAI.
Systemic awkwardness in managing states and function calls.
Questions about the necessity and long-term implications of this architecture.
Karpathy’s “LLM OS” Vision: This vision supports a more integrated, OS-like role for LLMs. While it presents a forward-thinking approach, it also raises concerns about the concentration of control and the role of application developers in this ecosystem.
OpenAI’s Direction and Developer Tools: There's speculation about OpenAI's intentions with these changes. While they could be aimed at simplifying developer experiences, they might also lead to increased control over the AI development process.
Critical Review:
Architectural Shifts: The transition from a stateless to a stateful API reflects a significant shift in how developers interact with AI models. It raises valid concerns about control and transparency. However, this might also be a response to evolving user needs and the desire to streamline complex AI interactions.
Developer Autonomy vs. Ease of Use: Balancing developer control with the convenience of managed services is a perennial challenge in software architecture. OpenAI's approach seems to prioritize ease of use, but at the potential cost of reduced developer autonomy.
Market Control and AI Ethics: Your concerns about a single entity controlling a powerful technology like LLMs are well-founded. It's a debate central to AI ethics and market dynamics, needing ongoing discussion and potential regulatory oversight.
Technical Feasibility and Efficiency: The technical reasons behind maintaining state within OpenAI’s infrastructure aren't entirely clear. Efficiency gains or technical constraints could be factors, but they haven't been explicitly stated.
Future Directions and Transparency: OpenAI’s future direction and its implications for the AI development community remain areas of speculation. Greater transparency from OpenAI regarding its long-term strategy and commitments to openness could alleviate some concerns.
In conclusion, your analysis raises significant and valid concerns about the evolution of AI APIs, developer autonomy, and the concentration of control in AI technologies. It underscores the need for a balanced approach that considers both the technical and ethical dimensions of AI development.
Claude’s take
Here is a summary and critical analysis of the key points made in the attached text:
Summary:
The shift from a stateless to stateful API architecture for OpenAI's services represents a significant change, raising concerns about developer control and flexibility.
Maintaining conversation state within OpenAI's infrastructure rather than fully in the developer's application reduces autonomy. This may pose safety risks and system awkwardness.
The previous stateless approach enabled developers to have full control over orchestrating LLM calls, summarizing context, etc. It supported an open AI ecosystem.
Concentration of power over impactful technologies like LLMs in a single entity is worrisome. Limiting OpenAI's role to offering LLM access helps distribute control.
The "LLM OS" vision entails more centralized control versus traditional code algorithms. This could reduce developer oversight.
OpenAI's motivations are unclear. While the tools aim to help developers, the architecture changes may increase OpenAI's control over the AI development process.
Critical Review:
The architecture change is a major shift that warrants analysis, but ease of use is also a valid priority alongside developer control.
The concerns around market concentration of AI power are reasonable, but OpenAI does operate under a charter with ethical commitments.
More transparency from OpenAI on the rationale and implications of the new architecture would be helpful to evaluate the tradeoffs.
While reduced developer control is a concern, managed services can also enable more developers to use AI by reducing maintenance burdens.
The feasibility of technical efficiency gains from statefulness in OpenAI's infrastructure is uncertain without more details.
The long-term trajectory of OpenAI's offerings and vision requires ongoing monitoring, but immediate conclusions may be premature.
In summary, the ideas raise important points about API changes and their implications, but further details from OpenAI and continued discussion within the developer community will be needed to fully assess the benefits and risks.