How Grok 3 compares to ChatGPT, DeepSeek and other AI rivals

Mashable Tech

xAI is promoting Grok 3 as the best model on the market, claiming it surpassed competitors from OpenAI, Google, Anthropic, and DeepSeek on key benchmarks. Shortly after the benchmarks were shared on the livestream, OpenAI product engineer Rex Asabor posted an "updated" chart with o3 beating Grok 3 Reasoning in math and science benchmarks.

Benchmarking: Networked Nonprofits Measure Their Social Media Results In A Context

Beth's Blog: How Nonprofits Can Use Social Media

Arts and Social Media. At Zoetica, we are facilitating a social media peer learning project called “Leveraging Social Media: Becoming A Networked Nonprofit.” Devon Smith, who writes the 24 Usable Hours blog and is a self-described “data nerd,” did a benchmarking analysis for participants. Benchmarking Study by Devon Smith.

Social Media and the Arts: How Strong Is Your Social Net?

Beth's Blog: How Nonprofits Can Use Social Media

Note From Beth: Back in 2011, I had the pleasure of facilitating a panel discussion at the Grantmakers in the Arts pre-conference on technology and media with Rory MacPherson and Jai Sen from Sen Associates, where I learned about a research study they were conducting on social media use in the arts. keep spending level.

Universal Speech Model (USM): State-of-the-art speech AI for 100+ languages

Google Research AI blog

USM is a family of state-of-the-art speech models with 2B parameters trained on 12 million hours of speech and 28 billion sentences of text, spanning 300+ languages. For en-US, USM has a 6% relative lower WER compared to the current internal state-of-the-art model. USM, which is for use in YouTube (e.g., Lower WER is better.

Data-centric ML benchmarking: Announcing DataPerf’s 2023 challenges

Google Research AI blog

Even many of the standard datasets we use today have been shown to have mislabeled data that can destabilize established ML benchmarks. In this blogpost, we outline dataset development bottlenecks confronting researchers and discuss the role of benchmarks and leaderboards in incentivizing researchers to address these challenges.

The most innovative companies in artificial intelligence for 2025

Fast Company Tech

The o1 model rose quickly to the top of the rankings in common benchmark tests, and soon Google DeepMind, Anthropic, DeepSeek and others were training their models for real-time reasoning. Even before the appearance of new reasoning models, some of AI's hottest companies produced state-of-the-art new AI systems.

Introducing MASK: A Benchmark for Measuring Honesty in AI Systems

The AI Alignment Forum

Published on March 5, 2025 10:56 PM GMT. In collaboration with Scale AI, we are releasing MASK (Model Alignment between Statements and Knowledge), a benchmark with over 1000 scenarios specifically designed to measure AI honesty. Many state-of-the-art models lie under pressure. Interventions: Can We Make AI More Honest?