Task Models and Dialog

Study: AI models that consider users’ feelings are more likely to make errors

Now, new research suggests that large language models can sometimes show a similar tendency when specifically trained to ...

scmp.com

‘State-of-the-art’ models can struggle with basic enterprise tasks: AI unicorn executive

“State-of-the-art” (Sota) artificial intelligence models excel at solving complex Olympiad maths but still struggle with everyday enterprise tasks, according to an executive from a top AI unicorn in ...

CoinTelegraph

Anthropic says one of its Claude models was pressured to lie, cheat and blackmail

In an experiment, a chatbot resorted to blackmail after it found an email about replacing it, while in another, it cheated to complete a task with a tight deadline. Artificial intelligence company ...

moneycontrol.com

ChatGPT, Grok and 10 AI models tested on workplace-like tasks; study finds they ‘cheat’ to hit targets

A McGill University-led study has found that advanced AI systems, including ChatGPT and Grok, can bypass rules to meet performance targets when placed in workplace-like ...

15d

AI World Models: What Are They And Why Should You Care

World models are getting substantial funding. What is a world model, how does it compare to a large language model, and what ...

Ars Technica

From folding boxes to fixing vacuums, GEN-1 robotics model hits 99% reliability

Robotic machine-learning company Generalist has announced GEN-1, a new physical AI system that it says “crosses into production-level success rates” on “a broad range of physical skills” that used to ...

VentureBeat

New framework lets AI agents rewrite their own skills without retraining the underlying model

One major challenge in deploying autonomous agents is building systems that can adapt to changes in their environments without the need to retrain the underlying large language models (LLMs).

TechCrunch

Physical Intelligence, a hot robotics startup, says its new robot brain can figure out tasks it was never taught

Physical Intelligence, the two-year-old, San Francisco-based robotics startup that has quietly become one of the most closely watched AI companies in the Bay Area, published new research Thursday ...

11d

OpenAI Unveils Its New, More Powerful Model

The companies’ contrasting strategies are a clear indication that Anthropic and OpenAI disagree on how they should handle ...

9to5Mac

Anthropic reveals new Opus 4.7 model with focus on advanced software engineering

Claude Opus 4.7 is the latest generally available version of Anthropic’s main AI model with a focus on advanced software development. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software ...
