As companies like Samsung roll out generative AI across their devices, OpenAI is pushing further with a new AI agent called Operator, announced on January 23, 2025. Operator builds on the same technology as ChatGPT but works inside its own web browser, which lets it carry out tasks like ordering groceries or booking travel on a user's behalf.
In a recent blog post, OpenAI hinted that Operator could unlock fresh engagement possibilities for businesses but didn’t go into detail about how that would work.
So, what exactly is Operator? It's an app that combines a web browser with OpenAI's GPT-4o model, which OpenAI trained further to navigate and interact with typical web pages. What sets Operator apart is its knack for making multi-step plans and correcting itself when things go off track. It is specifically trained to handle common web elements such as buttons and forms.
Right now, Operator is in beta, and OpenAI plans to gather feedback from early users to refine the tool. ChatGPT Pro subscribers can sign up for Operator today, and it will soon be available to Plus, Team, and Enterprise users as well. Eventually, OpenAI will fold Operator's features more broadly into ChatGPT, and the Computer-Using Agent (CUA), the model that powers Operator, will soon be accessible through OpenAI's API.
How does Operator actually work? The CUA model uses what OpenAI calls an "inner monologue" to reason through a task step by step and adapt when something unexpected happens. It takes screenshots of web pages and uses a virtual mouse and keyboard to navigate them. As with ChatGPT, you can give Operator custom instructions that it will remember, such as your preferred airline.
Users can prompt Operator in natural language, but it won't log into sites, enter payment details, or solve CAPTCHAs; those steps are handed back to the user. Operator also declines sensitive actions such as banking transactions and won't make high-stakes decisions such as hiring an employee. If it encounters an interface it can't navigate, it likewise defers to the user.
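To make the screenshot, reason, act cycle described above a little more concrete, here is a minimal Python sketch of such a loop under stated assumptions. `FakeBrowser`, `plan_next_step`, `run_agent`, and the action names are all hypothetical; this is not OpenAI's CUA interface, and a real agent would send each screenshot to a model instead of using hard-coded rules. The sketch only illustrates the general pattern of looking at the page, deciding on a step, acting, and handing control back to the user for steps like payment.

```python
from dataclasses import dataclass, field

# Hypothetical reasons for handing control back to the user
# (mirroring the login, payment, and CAPTCHA handoffs described above).
HANDOFF_REASONS = {"login", "payment", "captcha"}


@dataclass
class Step:
    thought: str       # the agent's "inner monologue" for this step
    action: str        # hypothetical actions: "click", "handoff", "done"
    target: str = ""   # page or reason the action applies to


@dataclass
class FakeBrowser:
    """Toy stand-in for a browser; a 'screenshot' is just a string."""
    page: str = "home"
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        return f"screenshot of {self.page}"

    def perform(self, step: Step) -> None:
        # Record the action; a "click" simply moves to the target page.
        self.log.append((step.action, step.target))
        if step.action == "click":
            self.page = step.target


def plan_next_step(screenshot: str) -> Step:
    """Hypothetical policy. A real agent would ask a model what to do next."""
    if "home" in screenshot:
        return Step("I need to open the grocery store.", "click", "store")
    if "store" in screenshot:
        return Step("Milk is listed; go to the cart.", "click", "cart")
    if "cart" in screenshot:
        return Step("Checkout needs payment details.", "handoff", "payment")
    return Step("The task looks complete.", "done")


def run_agent(browser: FakeBrowser, max_steps: int = 10) -> str:
    """Loop: look at the page, decide on a step, then act or hand off."""
    for _ in range(max_steps):
        step = plan_next_step(browser.screenshot())
        if step.action == "handoff" and step.target in HANDOFF_REASONS:
            return f"handed back to user: {step.target}"
        if step.action == "done":
            return "done"
        browser.perform(step)
    return "gave up"
```

Calling `run_agent(FakeBrowser())` clicks through the store and cart pages and then stops at checkout, returning `"handed back to user: payment"`, which loosely mirrors how Operator returns payment steps to the user.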
OpenAI collaborated with various companies to ensure Operator could interact smoothly with their platforms, including DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber.
However, the early version of Operator still struggles with complex interfaces, such as those for creating slideshows or managing calendar events.
Operator enters a competitive landscape, sharing some features with rivals such as Google Gemini and Apple Intelligence. It is also reminiscent of Microsoft's Recall feature, which periodically captures screenshots of a user's screen so that past activity can be searched later. While some functions overlap, Operator's ability to navigate websites autonomously could set it apart. The concept of agentic AI, in which generative models carry out multi-step tasks for users, is gaining traction, though these products still have clear limits.