Startup DreamersStartup Dreamers
  • Home
  • Startup
  • Money & Finance
  • Starting a Business
    • Branding
    • Business Ideas
    • Business Models
    • Business Plans
    • Fundraising
  • Growing a Business
  • More
    • Innovation
    • Leadership
Trending

Say Goodbye to the Undersea Cable That Made the Global Internet Possible

March 1, 2026

The Dilemma Of Profits V.S. Guardrails

March 1, 2026

‘Uncanny Valley’: Pentagon vs. ‘Woke’ Anthropic, Agentic vs. Mimetic, and Trump vs. State of the Union

February 28, 2026
Facebook Twitter Instagram
  • Newsletter
  • Submit Articles
  • Privacy
  • Advertise
  • Contact
Facebook Twitter Instagram
Startup DreamersStartup Dreamers
  • Home
  • Startup
  • Money & Finance
  • Starting a Business
    • Branding
    • Business Ideas
    • Business Models
    • Business Plans
    • Fundraising
  • Growing a Business
  • More
    • Innovation
    • Leadership
Subscribe for Alerts
Startup DreamersStartup Dreamers
Home » Meet The AI Agent With Multiple Personalities
Startup

Meet The AI Agent With Multiple Personalities

adminBy adminApril 22, 20258 ViewsNo Comments3 Mins Read
Facebook Twitter Pinterest LinkedIn Tumblr Email

In the coming years, agents are widely expected to take over more and more chores on behalf of humans, including using computers and smartphones. For now, though, they’re too error prone to be much use.

A new agent called S2, created by the startup Simular AI, combines frontier models with models specialized for using computers. The agent achieves state-of-the-art performance on tasks like using apps and manipulating files—and suggests that turning to different models in different situations may help agents advance.

“Computer-using agents are different from large language models and different from coding,” says Ang Li, cofounder and CEO of Simular. “It’s a different type of problem.”

In Simular’s approach, a powerful general-purpose AI model, like OpenAI’s GPT-4o or Anthropic’s Claude 3.7, is used to reason about how best to complete the task at hand—while smaller open source models step in for tasks like interpreting web pages.

Li, who was a researcher at Google DeepMind before founding Simular in 2023, explains that large language models excel at planning but aren’t as good at recognizing the elements of a graphical user interface.

S2 is designed to learn from experience with an external memory module that records actions and user feedback and uses those recordings to improve future actions.

On particularly complex tasks, S2 performs better than any other model on OSWorld, a benchmark that measures an agent’s ability to use a computer operating system.

For example, S2 can complete 34.5 percent of tasks that involve 50 steps, beating OpenAI’s Operator, which can complete 32 percent. Similarly, S2 scores 50 percent on AndroidWorld, a benchmark for smartphone-using agents, while the next best agent scores 46 percent.

Victor Zhong, a computer scientist at the University of Waterloo in Canada and one of the creators of OSWorld, believes that future big AI models may incorporate training data that helps them understand the visual world and make sense of graphical user interfaces.

“This will help agents navigate GUIs with much higher precision,” Zhong says. “I think in the meantime, before such fundamental breakthroughs, state-of-the-art systems will resemble Simular in that they combine multiple models to patch the limitations of single models.”

To prepare for this column, I used Simular to book flights and scour Amazon for deals, and it seemed better than some of the open source agents I tried last year, including AutoGen and vimGPT.

But even the smartest AI agents are, it seems, still troubled by edge cases and occasionally exhibit odd behavior. In one instance, when I asked S2 to help find contact information for the researchers behind OSWorld, the agent got stuck in a loop hopping between the project page and the login for OSWorld’s Discord.

OSWorld’s benchmarks show why agents remain more hype than reality for now. While humans can complete 72 percent of OSWorld tasks, agents are foiled 38 percent of the time on complex tasks. That said, when the benchmark was introduced in April 2024, the best agent could complete only 12 percent of the tasks.

Read the full article here

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email

Related Articles

Say Goodbye to the Undersea Cable That Made the Global Internet Possible

Startup March 1, 2026

‘Uncanny Valley’: Pentagon vs. ‘Woke’ Anthropic, Agentic vs. Mimetic, and Trump vs. State of the Union

Startup February 28, 2026

An FBI ‘Asset’ Helped Run a Dark Web Site That Sold Fentanyl-Laced Drugs for Years

Startup February 26, 2026

Supreme Court Rules Most of Donald Trump’s Tariffs Are Illegal

Startup February 25, 2026

Mark Zuckerberg Tries to Play It Safe in Social Media Addiction Trial Testimony

Startup February 24, 2026

Inside the Rolling Layoffs at Jack Dorsey’s Block

Startup February 23, 2026
Add A Comment

Leave A Reply Cancel Reply

Editors Picks

Say Goodbye to the Undersea Cable That Made the Global Internet Possible

March 1, 2026

The Dilemma Of Profits V.S. Guardrails

March 1, 2026

‘Uncanny Valley’: Pentagon vs. ‘Woke’ Anthropic, Agentic vs. Mimetic, and Trump vs. State of the Union

February 28, 2026

As Davos & India Celebrated AI, Paris Sounded The Alarm On AI Safety

February 28, 2026

Backyard Baseball Is Getting A New Game And I’m Ready For It In July

February 27, 2026

Latest Posts

Solving The Data Bottleneck For Physical AI

February 26, 2026

Supreme Court Rules Most of Donald Trump’s Tariffs Are Illegal

February 25, 2026

Mark Zuckerberg Tries to Play It Safe in Social Media Addiction Trial Testimony

February 24, 2026

Inside the Rolling Layoffs at Jack Dorsey’s Block

February 23, 2026

Code Metal Raises $125 Million to Rewrite the Defense Industry’s Code With AI

February 22, 2026
Advertisement
Demo

Startup Dreamers is your one-stop website for the latest news and updates about how to start a business, follow us now to get the news that matters to you.

Facebook Twitter Instagram Pinterest YouTube
Sections
  • Growing a Business
  • Innovation
  • Leadership
  • Money & Finance
  • Starting a Business
Trending Topics
  • Branding
  • Business Ideas
  • Business Models
  • Business Plans
  • Fundraising

Subscribe to Updates

Get the latest business and startup news and updates directly to your inbox.

© 2026 Startup Dreamers. All Rights Reserved.
  • Privacy Policy
  • Terms of use
  • Press Release
  • Advertise
  • Contact

Type above and press Enter to search. Press Esc to cancel.

GET $5000 NO CREDIT