OpenAI President Greg Brockman on GPT-5.5 “Spud,” AI Model Moats, and Cybersecurity Risks

Overview

This emergency episode focuses on OpenAI's new model, GPT 5.5, which Greg Brockman confirms is the long-rumored "Spud." He frames it as less of a benchmark upgrade and more of a shift toward AI that can carry out real work across coding, spreadsheets, slides, browser tasks, and other everyday computer jobs with less hand-holding.

The bigger theme is OpenAI's view of where AI is heading: from chat-based tools to agents that act more like assistants or teams, with users setting goals and checking outputs rather than managing every step.

Key Takeaways

Brockman's main point is that GPT 5.5 crosses a practical threshold. He says the jump is not just better coding, which people already expect, but broader usefulness in general computer work. The model is meant to take high-level instructions and handle more of the low-level execution on its own.

He pushes back on the idea that one training trick explains the result. Instead, he describes a full-stack effort across pre-training, reinforcement learning, data, systems, and product design. His analogy is a car: a strong engine is not enough if the rest of the machine is weak. The claim is that OpenAI's edge comes from coordinating the whole system, not from one isolated breakthrough.

There is also a change in how OpenAI says it measures progress. Brockman says the company used to focus more on benchmark gains and raw model capability. Over the last year to year and a half, he says the focus moved toward real applications in finance, sales, marketing, and other computer-based work. That suggests OpenAI is judging models less by abstract intelligence and more by whether they can finish actual tasks.

On competition and pricing, Brockman argues that higher-capability models can still justify their cost because small gains in intelligence can unlock much bigger changes in what users can do. He also says OpenAI's longer-term business is simple: turn compute into intelligence at a positive margin, then scale as demand rises. His answer to open-source pressure is that the hard part is not only training one model, but building the organization and process that repeatedly produces better ones.

On safety, especially cyber risk, he argues for gradual release rather than either full lockdown or total openness. OpenAI's view is that defenders need access too, and that safeguards, trusted-access programs, and staged deployment should move together.

Practical Steps

For listeners trying to get value from current AI tools, a few ideas stand out:

Give models goals, not just commands. Instead of writing step-by-step prompts, start with the outcome you want and let the model propose the path.
Test AI on end-to-end work, not isolated tasks. Try assigning a full research sweep, spreadsheet build, presentation draft, or debugging session and review the result afterward.
Build oversight before you scale usage. If your team is using agents, set rules for access, logging, review, and approval early.
Start with lower-risk workflows. Use agents for research, internal drafting, reporting, or code experiments before giving them broader permissions.
Watch where the model fails. Brockman admits current agents still miss tasks they should handle and may communicate poorly. Use those failures to decide where human review stays in place.
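
The oversight step above (rules for access, logging, review, and approval) can be made concrete with a small sketch. This is an illustration only, not anything described in the episode or any OpenAI product: the names `AgentAction`, `OversightGate`, and `LOW_RISK`, and the choice of which workflows count as low risk, are all hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical low-risk workflow kinds, loosely following the list above.
LOW_RISK = {"research", "internal_draft", "report", "code_experiment"}


@dataclass
class AgentAction:
    agent: str        # which agent is asking to act
    kind: str         # e.g. "research" or "send_email"
    description: str  # human-readable summary for the reviewer


@dataclass
class OversightGate:
    # Callback standing in for a human reviewer; returns True to approve.
    approver: Callable[[AgentAction], bool]
    log: List[dict] = field(default_factory=list)

    def submit(self, action: AgentAction) -> bool:
        # Low-risk kinds pass automatically; everything else needs approval.
        needs_review = action.kind not in LOW_RISK
        allowed = self.approver(action) if needs_review else True
        # Every decision is logged, whether or not it was allowed.
        self.log.append({
            "agent": action.agent,
            "kind": action.kind,
            "needs_review": needs_review,
            "allowed": allowed,
        })
        return allowed


if __name__ == "__main__":
    # A reviewer that rejects everything it is asked about.
    gate = OversightGate(approver=lambda action: False)
    print(gate.submit(AgentAction("agent-1", "research", "Summarize filings")))
    print(gate.submit(AgentAction("agent-1", "send_email", "Email a customer")))
    print(len(gate.log), "decisions logged")
```

The design choice worth noting is that logging happens for every action, not only rejected ones, so the record can later show where human review should stay in place.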

Notable Quotes

Greg Brockman: "It's a new class of intelligence."
Greg Brockman: "You are the overseer, you are the CEO of almost this autonomous corporation or this fleet of agents."
Greg Brockman: "We are headed to a world of compute scarcity."

Full Transcript

Source: openai 27m runtime

OpenAI president and co-founder Greg Brockman joins us to discuss OpenAI's newest model, Spud, a.k.a. GPT 5.5, and where it leaves OpenAI competitively. That's coming up right after this. Welcome to Big Technology podcast.
Host: Today we have an emergency episode with OpenAI president and co-founder Greg Brockman, all about GPT 5.5, the famous Spud model, looking at what it does and what it means for OpenAI. Greg, great to see you. Welcome back to the show.
Greg Brockman: Thank you for having me. Hope it's not too much of an emergency.
Host: Well, I am definitely recording in a Vegas hotel room, so more emergency than our last conversation, but we had some time to prepare, so it's great to be on with you. So let's just start with this. Can you confirm GPT 5.5 is Spud?
Greg Brockman: Yes.
Host: Okay. What is GPT 5.5?
Greg Brockman: Well, it's an amazing model. I think in many ways it is a step towards a new way of getting work done with a computer. It's a new class of intelligence. It's extremely useful at things like programming, right, and all the different aspects of debugging and solving very hard and gnarly problems, just being very proactive and really being able to solve problems end-to-end with little instruction. But the thing that's, to me, most remarkable is not necessarily the fact that it got better at coding. Like that, I think, is what everyone kind of expects. But the fact that it's now really crossed the threshold of usefulness for general kinds of applications. And so it's much better at creating slides, spreadsheets, much better at computer use, using your browser, being able to kind of click through applications that are otherwise hard to have an AI operate. And so I think that we're really seeing the emergence of this new way of using a computer, and it starts with this kind of intelligence at the core.
Host: When we spoke last, you mentioned that this was effectively the culmination of a two-year research process. So was this planned two years ago? Is that how far back OpenAI plans?
Greg Brockman: I would say that, yes, we do have very long horizons for how we plan. Now, one note is that we stack together many research ideas and bets on a variety of timescales.
Greg Brockman: And so the way to think about it is that we are making constant progress across every single part of the stack. And so what GPT 5.5 represents is not an endpoint. In many ways, it's a beginning point. It's really a step towards the kinds of models that we see coming over even just upcoming months. And I think that you should expect that we are going to have even larger improvements in the capability across a wide variety of these aspects of what the model can do. And that's something that I think will be very exciting, and we're just always thinking about how can we make what we're producing more useful for real-world use for real users and real applications.
Host: Can you share specifically what those aspects are that we should be looking out for over the next few months? If this is the beginning, what is it the beginning of?
Greg Brockman: Well, I think that the big vision we have, and you can see it reflected in many things, not just the models, but the kind of, you know, you think about the models as the brain. You can think about the systems and the harnesses like Codex and the applications like the super app as almost the body around it to make it into a useful AI. And that's really what's happening is a shift from language models being the thing that is produced by labs like ourselves to an AI that's actually useful. It's actually an assistant that's out there trying to solve your goal, that's really operating according to your instruction. And you can see right now Codex is becoming this app that's not just for the coders. It's really for anyone using a computer. And it's not perfect, right? That there are still some tasks where it should be able to do it and it doesn't quite get it right. Sometimes the personality isn't quite what you wanted, right?
Greg Brockman: That it doesn't quite, you know, it's like extremely powerful and out there doing a lot of really amazing things, but the way it communicates back to you, that you have to still spend some time really trying to read through, okay, exactly how did it solve this problem? And so these aspects, we know exactly how to make them much better. And I think we've already had a pretty remarkable improvement from 5.4 to 5.5. I think we're going to have even more remarkable improvements across every single aspect of what makes these models useful. And one thing to know internally is that we think a lot about the end application. Like that is one thing that changed for us over the past 12, you know, 18 months, something like that, is that we used to really just be focused on, let's improve on the benchmarks. Let's make these models more cerebrally capable. But we now are really focused on let's bring them to real world application. Let's think about finance, sales, marketing, every single function where someone uses a computer. How can we help with their computer work? [The model should] have not just the theoretical capability to help, but [have] actually experienced those kinds of tasks. It's actually been able to see what good looks like. And I think that the place we're going is one where you as a person doing work, that you are the overseer, you are the CEO of almost this autonomous corporation or, you know, this fleet of agents perhaps is the way to say it, and that they are operating according to your goals. Now you are still accountable, right? You're still in the driver's seat. You're still the person who thinks about, well, is this what I actually wanted? Was this work up to a standard? But that the details of exactly what buttons were clicked and exactly the kind of code that was written or exactly how the formula in the spreadsheet works, that you can abstract yourself from those if they're not important to the evaluation of whether or not something was what you wanted.
Greg Brockman: And so I think it's like increasing leverage for every worker.
Host: Okay. Let me take my best guess as to what's happening and you tell me how close I am. I mean, I'm thinking about this. This is like a, like you mentioned, a culmination of two years of work. There's two different types of, I mean, not to tell you, you know this, but for our audience, two different types of AI training. There's the pre-training, or it's at least the ones that have been pertinent for these models. The pre-training, where you just make the model generally smart by having it predict the next word, and the reinforcement training, where you have it like go out and actually take, you know, try to accomplish different tasks and you reward it when it does a good job with those tasks and effectively it sort of teaches it or it learns how to do those tasks. Is what you're saying basically that like this is the first result that we're seeing where OpenAI has just loaded a ton of reinforcement learning on task-specific stuff into this model, and that's what's producing the results you're talking about?
Greg Brockman: Well, I would actually say it a little differently. I would say that there's many steps in the pipeline, right? That there's pre-training and mid-training, reinforcement learning. There's, you know, the data collection. There's like a lot of these different things that all come together to produce the end result and the way in which it's connected to the world. That's also very key to making it useful. And the thing that I'm really saying is we have been investing on every single one of these and have a repeatable, we have like a team, right? That it's not just about individuals working on these pieces, but a team that really comes together and looks across the whole stack to say, how do we make this more useful for real-world applications? And so it's not really any one thing that we do. It's really about the overall effort of trying to, like if you think about if you're building a car, right?
Greg Brockman: That there's, it's not just about, do you have like a better engine, right? You can build a great engine, but if the rest of the car is not up to the quality level of the engine, it's not going to matter. And so I think that that is the real innovation. It's really the end-to-end co-design and all coming together in a repeatable fashion to make these models better and better for our users.
Host: You were on a media call earlier today with myself and a number of members of the press. And one of the interesting things that you said, or basically, I think you said this right off the bat, is that the model more intuitively knows what you want and you don't have to spell it out exactly as you would in the past. Here's a tweet from Roon: "There are early signs of 5.5 being a competent AI research partner. Several researchers let 5.5 run variations of experiments overnight given only a high-level algorithmic idea, waking up to find a completed sweep, dashboards and samples, never having touched the code or terminal at all." Just to, if you can answer briefly on this, a two-parter, how do you do that? And does that mean prompt engineering is dead?
Greg Brockman: Number one, I think it really comes down to when we say there's a new class of capability, a new class of intelligence, that's really what we mean, right? The models are becoming much more intuitive to use because they have deeper understanding of what it is you're asking of them, right? That they really look at the context, try to understand and puzzle out, what am I being asked to do? And it really makes you realize, you know, to the second part, is prompt engineering dead, which I actually think the prompt engineering in some ways may be even more vibrant than before. But you spend so much time right now trying to explain to your computer what you even want. You try to pack in this context and be like, well, here's what's going on. Here's the situation. Here's the thing I want from you.
Greg Brockman: And you're just like, why do I have to explain this to my computer, right? Like the whole thing is the computer should be doing the work to help me. Like, I don't want to have to be sort of, you know, breaking down the task, trying to explain to it step-by-step how to do things. I want to point it in a direction and I want it to be able to take care of the details and to get me the result, again, in a way that I can observe and kind of provide feedback along the way. But I want it to be the driver of that low-level execution. And so I think that in some ways where prompt engineering is going to go is it's going to be about, you can get so much more out of these models with so much less effort, but with the same amount of effort, you still have a multiplier. Think about how much more you could even get. And I think that we're just at the leading edge right now of seeing the ceiling of what is capable, what even today's models are capable of.
Host: Okay. Let me briefly speak with you about the economics of building a model like this. There's been this pattern where these big massive models now, you're not saying how much money or compute you've used to train this. But I think we can be safe in assuming it was a lot. And there's been this pattern where these massive models come out, they get distilled by open source model makers. And then open source is just a couple months behind the leading foundational models. And, you know, I guess like when the investment was smaller, being a couple months ahead, you know, mattered a lot, but I'm curious now that the investment is so big and the models' capabilities are increasing, you know, fairly dramatically, you know, as you go, how is this defensible in the long term if you're just going to have that pattern repeat over and over?
Greg Brockman: Well, I look at it a little differently. Like I think that the real investment that we are making is into that end-to-end co-design, right?
Greg Brockman: Of having a system, a system of people, right? Who are producing this technology, right? A way of working together. And some of this is about how you leverage these massive supercomputers to produce these models. Now it is also the case that it's not as simple as you can take the output of these models and distill and you have exactly the model of the same capability. It's just smaller and can run fast. If that were the case, we would just do that. And then we would also have a model that would be, you know, much more easy to serve in many ways. And of course there's a lot of great things there. But the point that I'm getting at is that the real thing that we are investing in is the machine that makes the machine. Now, at the deployment side, we think a lot about safeguards. We think a lot about mitigations, and we do that for many, many different aspects of how these models could be misused in real situations. And that's something that we have been investing in for many years. And we think about that across areas like cyber, or thinking about that in areas like bio, that we have a long-standing effort that you can see in our preparedness framework, which is public, about how we approach these kinds of uses of the model and how we try to maximize the benefits, mitigate the risks. And so I think it's a real motion that every piece of what we do needs to connect to the question of how do we continue to make progress, but also how do we make these models broadly available? Because that's something that we really believe in, that we believe this technology empowers people and that we want it to benefit people and lift everyone up.
Host: Yeah, but just to go back on that, the pricing on this model is, I think, double the last model, GPT 5.4.
Host: And so from an economics or business standpoint, the question would be, you know, let's say you keep on progressing, but because there's been all this infrastructure that's been put towards training the models, if open source can deliver not as good performance, but almost as good and do it cheaper, how do you handle that threat?
Greg Brockman: Well, again, I look at it a little differently. So first of all, if you look at our history, which really is not driven by anything in competition, it's just like our own sort of progress and desire, we have dropped prices on the same level of intelligence year over year, sometimes by literally a factor of a hundred, right? It's like at least an order of magnitude year over year, sometimes literally a hundred. But the thing that keeps happening, it's real Jevons paradox where it's like you lower the cost of something, way more activity happens, right? And I think that what we keep seeing is that there are returns to intelligence, right? That for the kinds of tasks that these models are now capable of doing, that a little bit more intelligence goes a long way. And I think that is the story of 5.5, that in some ways you can almost look at it as like, oh, there's just an incremental improvement in intelligence, but I think there's going to be a massive improvement in terms of what people use it for. And by the way, I actually think that incremental is actually very much an understatement for this model relative to 5.4. You know, it's a 0.1 improvement in some ways, but I think that that actually really undersells the magic that we see within this model and that our early testers have really seen in their practical work.
Host: So if people see these numbers and they say, there's IPO pressure on OpenAI, and therefore the, you know, we've been getting a great deal on intelligence and the free ride is over. You would argue against that.
Greg Brockman: Yeah, look, the way I think about this is that we have a very simple business in some ways, right?
Greg Brockman: We rent, build, buy compute, and we resell it with some positive margin. And as long as it's positive operating margin and as long as there's scalable demand for intelligence, which I think is true as long as there's problems to solve, like no one's going to run out of problems to solve. And we've seen this at every step that the demand outstrips our supply, then we can scale that compute all day. And I think that in my mind, that's the main directive that I ask of the team. It's just like, just think about, we need to add value on top of the raw compute and make sure that we are at positive operating margin on it. And that that is something where it's actually not even about the different competition in the marketplace. It's just a question of, can you have compute that gets turned into intelligence? And that's just how, you know, that it does that at a, you know, slightly improved, you know, value coming out relative to the cost going in. And I think that that is something where, again, we're always trying to make more efficient models, but then we just want more of them. And then we want the more intelligent models. And regardless of where they're coming from, it's kind of all the same compute that's going in. And so I think that it's actually a great, like competition in this marketplace has been great for innovation, but I think that it's actually something where it's driving more usage and more overall spend in the ecosystem. And you can see that in the revenue numbers of us and, you know, others in this industry.
Host: Okay, I want to take a quick break and come back and talk with you about cybersecurity, trust, and whatever else we can get to in our time on this emergency show. We'll be back right after this. And we're back here on Big Technology Podcast with OpenAI president and co-founder Greg Brockman. Greg, let me ask you about the cybersecurity implications here. Two very different approaches between OpenAI and Anthropic.
Host: Anthropic's latest massive model Mythos is not released to the public. This one, you know, Spud or 5.5, is released to the public. I mean, let me just ask you straight up. Is there a chance that releasing this powerful model into the public without this like step-by-step practice could lead to some major cyber attacks?
Greg Brockman: Well, I actually have a different view on the premise of the question. So the thing to understand is that we have been investing in cyber safeguards and cybersecurity as a part of our preparedness framework for years, right? This is something we have invested in far ahead of having the kinds of capabilities we see coming. And so we have been taking a very deliberate step-by-step approach. You can see even just over the past couple of weeks where we've expanded our trusted access for cyber program. And in general, we believe in ecosystem resilience, right? That we think that you do want to go step-by-step, that these models are going to continuously get better. We have line of sight to even more capable ones. And that you want to be able to put these models in the hands of defenders to make sure that you're able to protect critical infrastructure. And we believe in that resilience of as you can bring these models into people's hands, that then they're able to explore in ways that you would not be able to without that kind of access. And so you kind of want this graduated approach and to make sure that you are moving down that pipeline as you can bring in additional safeguards in order to make sure that you can maximize the benefits and mitigate the risks. And so we've really taken a deliberate approach. I think our team has been working incredibly hard to think through the cyber implications of this model. We also believe in iterative deployment. Really bringing the models as they continuously get better, and we believe in democratic access.
Greg Brockman: And we believe that ultimately, the goal of creating this technology is to empower people to ensure that it does benefit all of humanity. And so, we are constantly trying to solve for how do we safely and responsibly bring this technology to bear in the world in a broad way.
Host: Right. And I think it suffices to say that your team hasn't been fans of the way that Anthropic's deployed Mythos. That's a quote from Sam. "It's clearly incredible marketing to say, we have built a bomb. We're about to drop it on your head. We will sell you a bomb shelter for $100 million to run across all your stuff, but only if we pick you as a customer." Let me talk through the other case, and then get your response. The other case would be, you can't account for everything. And there are clearly going to be some vulnerabilities that will only be found by people or entities deploying this and looking for them. So maybe it makes sense to start with a trusted group of testers before you deploy it broadly. What do you think?
Greg Brockman: Well, I believe the correct answer here is subtle. And I think it is rooted in the technical specifics of what you have in front of you and many, many factors, right? You need to think about how are the models progressing, right? Not just your own capabilities, but others in the ecosystem. You need to think about what kind of benefit do you get from having a small group that has access and are able to have, you know, are they able to have high leverage by being able to find and produce patches, but then how do you actually coordinate the disclosure of those across an industry? And so there's a lot of factors that go into it. And I think that the true answer is like, either extreme is not quite right, there are tools that can be applied to a specific situation. And I think that this is not the first time we've had to think about this problem. It's not the last time we will have to think about it.
Greg Brockman: But one thing to note is that we have had our model in the hands of defenders for some time, that we've been building up our trusted access program. The model that we're releasing is actually not cyber permissive, right? That it actually has a number of safeguards built into it and that you can then have a gap between what you're privately sharing, testing, those kinds of things. And so I think my short answer is like, there's definitely these different schools of thought in terms of values of is the value that you want to get these models into people's hands and empower them, or is the value that you kind of want them to be centralized and controlled and that you don't want them in people's hands. That is something that is a maybe underlying tension in some of these debates. But I think that the tactics, right, that, you know, that those almost flow from the details and that they can be informed by these values, but either extreme reflexively, I don't think will yield the best outcome for the world.
Host: Okay, I want to ask you about agents, back to agents if we could. These agents work the best if you sort of let them have a high degree of autonomy. I mean, it sort of makes sense. So I'm just kind of curious to hear your perspective as we get more agents that can do more things and access more files and work across programs. What is the proper amount of trust to put into agents right now?
Greg Brockman: So, I think that right now, actually, agents tend to be quite reliable, and even with things like prompt injections, I think that there are still holes there, but that we're patching them and the models are becoming much more resilient. But I also think that the flip side is that as these models are given increasing responsibility and access to more important context, that you need to have some answer for just like if you have employees, you know, if you have a team of five employees, they're all kind of trustworthy, fine.
Greg Brockman: But if you have 500,000 of the same employees, that somehow those numbers, right, just like that there's a lot of large numbers that you start to worry about, okay, how do I have good governance and oversight? And so this is something where as we're investing in these capabilities and making the super app more accessible, not just to coders, but to any person doing work with a computer, we're also investing in governance and oversight. And you can see this very concretely in workspace agents, which we released recently. So that's within your enterprise, you can now define agents. So you get a hosted Codex harness in the cloud. You can hook up tools, you can hook it up to your Slack, and it's doing work. It's like awesome. A lot of people use it. It's been very cool to see how sort of viral it goes within an organization. When you use someone else's agent, you're like, wait, I can build one of these too. And you can just fork it and do your own thing. And then that's an opportunity to have great governance in that you can see that's baked into the product where your IT organization can see all the agents that have been created, that for an agent, you can see the conversations it's had, and that you can think about exactly what the guardrails are around it. So I think that the short answer is like, you want to ramp the responsibility entrusted with the agent and the diversity of things that agents are doing together with security, safety, observability, oversight. And if you're not doing those hand in hand, then I think that that's a little bit out of balance. And I think it's important to think about both sides.
Host: Yeah, basically go ahead, but be careful.
Greg Brockman: But you, and really lean in, right? I think it's like as you scale, like you can prototype and it's just the nature of scale that starts to bring in the, do you still have the ability to oversee what's going on? So you need to kind of make sure at each step, do you feel like you're calibrated?
Greg Brockman: Do you understand what your teams are up to?
Host: Greg, let's end with this. You call this a compute-powered economy. What does that mean?
Greg Brockman: Well, I think we are heading to a world where the more compute is poured into a problem, the faster that problem will be solved. And that the ceiling of problem that can be solved depends on how much compute is available. And you think about things like drug discovery, right? Being able to solve complex diseases. Like those are, solving complex diseases like Alzheimer's is kind of outside of humanity's reach right now. We've never really done it. But imagine a world where you can take a gigawatt data center and have it just think about how to solve Alzheimer's for a month, for a year, however long it takes. And it may not be literally just cerebrally solving this problem, but it may have to consult with world experts. Maybe it has to suggest experiments that get run in a wet lab. But if you can actually solve such a problem, that would be such a transformatively positive thing for humanity. And I think we're heading to a world where that is how important problems get solved. And that is how tasks in your daily life can also be solved. Whether it's having an agent that knows you, that has your personal context that is trustworthy, that you can ask for advice on health and you get back. And that's just the thing. It's a smartphone that's in your pocket, right? You can just talk to it and it'll be out there doing things and proactively knows what are your goals, what are your interests and how it can help you. And I think that big and small, compute is going to be the resource that shows how much computers can be used to help people to do work on behalf of people. And I think we're heading to that world and it's one that we're all building collectively.
Host: Yeah, and that, I think, would explain the massive investments that you've led making these big infrastructure bets.
Greg Brockman: Still not enough. We're going to feel the scarcity.
Greg Brockman: We're going to feel it. We're feeling it already. You can sense it right now on people who are trying to use these agents and just simply cannot, you know, hitting the rate limits. So we're working on behalf of our customers, on behalf of everyone who wants to use these agents, to ensure that there is enough. And I don't think we're going to get there. We're going to do our best, but I think that we are headed to a world of compute scarcity. And again, I think this is something where we can all contribute to trying to help there just be more availability of this in the world.
Host: Greg, busy day. Always appreciate your time. Always great to speak with you. Thanks again for coming on.
Greg Brockman: Likewise. Great chatting.