Benchmark, Comparison and Measure - Nonprofit Technology

A new AI test is outwitting OpenAI, Google models, among others

Mashable Tech

MARCH 25, 2025

Google, OpenAI, DeepSeek, et al. are nowhere near achieving AGI (Artificial General Intelligence), according to a new benchmark. The Arc Prize Foundation, a nonprofit that measures AGI progress, has a new benchmark that is stumping the leading AI models. OpenAI's o3-low model scored 75.7 percent on the first edition of ARC-AGI.

Test

Test Model Google Benchmark

Blackbaud Luminate Online® Benchmark Report Highlights

sgEngage

MARCH 8, 2024

The 16th annual Blackbaud Luminate Online Benchmark Report is here! It’s also a valuable tool to help nonprofits evaluate their results by giving them a comparison point for their performance against organizations of similar sizes and issue areas. We look forward to this report every year.

Blackbaud

Blackbaud Benchmark Online Report

ASUS Zenbook A14 review: A lightweight in every sense

Engadget

MARCH 7, 2025

The A14 is an ideal machine for writing on the go, since you can travel with it effortlessly and it offers a whopping 18 hours and 16 minutes of battery life (according to the PCMark 10 benchmark). But in comparison to the Surface Pro and Laptop, it's like driving an entry-level car instead of a true luxury offering.

Review

Review Laptop Benchmark Video

Measuring Your Crowdsourcing Efforts by Aliza Sherman

Beth's Blog: How Nonprofits Can Use Social Media

SEPTEMBER 19, 2011

We’ve been chatting about how to measure the impact of the crowd and she offered to write this guest post on the topic. Measuring Your Crowdsourcing Efforts by Aliza Sherman. In order to know how to measure crowdsourcing results, you first need to understand what kind of crowdsourcing you’re implementing. Measuring Work.

Measure

Measure Site Consultant Aggregator

Is 2013 the Year of Video for Nonprofits?

Beth's Blog: How Nonprofits Can Use Social Media

JANUARY 23, 2013

While techniques and equipment are important, it is also useful to have some benchmarks and best practices in the nonprofit sector to inform your strategy and measurement plan. Tactics will only go so far. Currently, there are no significant benchmarks around video for nonprofits. Why should you participate?

Video

Video Nonprofit Benchmark Survey

Storytelling Tips: Measure the ROI of Your Non-Profit’s Stories

The Storytelling Non-profit

JANUARY 24, 2022

One of the best storytelling tips I can give you is to set yourself up for measurement success from the very beginning. It can be near impossible to measure the success of a story if you haven’t first thought about what your desired outcomes are and drivers of success. This could be a quarter, month or week.

ROI

ROI Storytelling Measure Story

Please Use Streaming Workload to Benchmark Vector Databases

Towards Data Science

DECEMBER 1, 2023

Static workload benchmark is insufficient. The standard way to evaluate ANN indexes is to use a static workload benchmark, which consists of a fixed dataset and a fixed query set. A static workload benchmark. This evaluation approach was popularized by the ann-benchmarks project which started 5 years ago. MIT Licence.

Benchmark

Benchmark Database Stream API

[VIDEO] Measure Of Success: Creating Tools And Process To Report Impact

Bloomerang

NOVEMBER 13, 2021

And, as I was sharing earlier, we usually teach this in person over a period of two to three hours, so, you all are getting the, what we’ll call, the boot-camp version of “Measure of Success.” Was there anything helpful about that comparison between service engagement and impact? Is it measurable? 0001 pounds.

Impact

Impact Measure Report Process

Running Code and Failing Models

DataRobot

FEBRUARY 10, 2021

Over the holidays, I used DataRobot to reproduce a few machine learning benchmarks. The SARCOS dataset is a widely used benchmark dataset in machine learning. I tested this dataset because it appears in various benchmarks by Google and fast.ai. For comparison, a random forest model achieves 2.38 SARCOS Dataset Failure.

Model

Model Benchmark Metrics Training

Unicef’s Little Bet on Pinboard

Beth's Blog: How Nonprofits Can Use Social Media

NOVEMBER 28, 2012

We’d like to try to benchmark it against other disruptive Pinterest campaigns but we’re not sure there is a good comparison case study. We predicted it might be smaller as the campaign is disruptive to the usual Pinterest pattern. Please tell us if you know of one! Pinterest has only been around for a short time.

Sierra Leone

Sierra Leone Case Study Benchmark Studies

Measuring the Value of Your Blog: Reflections Over the Last Year

Beth's Blog: How Nonprofits Can Use Social Media

AUGUST 4, 2008

About a year ago, I decided to benchmark my blog using some tips suggested by Avinash Kaushik. A year ago, he said that measuring outcomes for social media is, "an evolving art (not quite a science yet) and you have to be up to the challenge of both thinking a bit differently and be ok with leveraging several different tools.

Reflection

Reflection Measure Technorati ROI

How Socialble Are You? I'm 359

Beth's Blog: How Nonprofits Can Use Social Media

SEPTEMBER 1, 2008

HowSociable provides a simple way for you to begin measuring your brand's visibility on the social web. It measures mentions on these twenty sites. For comparison, I benchmarked myself against Chris Pirillo. I scored 359 on September 1st.

Comparison

Comparison Benchmark Measure Site

What's Your Social Media Baseline?

Beth's Blog: How Nonprofits Can Use Social Media

MARCH 7, 2009

Take her ROI and Measurement list. (I've definitely added that link to my social media metrics personal learning space.) She recently pointed to a blog post called "Ten Ways To Measure Social Media Success" by Chris Lake. What I found most valuable was the tip about getting a baseline measurement before you begin.

Social Media

Social Media Media Social ROI

7th Annual Nonprofit Technology Staffing & Investments Report: A Closer Look (Staffing Levels)

NTEN

MAY 20, 2013

You can download the complete report here, and don't forget the companion online benchmarking tool, where you can compare some of your organization's data against your peers in our research. Another way we measure technology staffing levels is determining the Tech Staff-to-Organizational Staff Ratio.

Technology

Technology Report Ratio Nonprofit

Preference learning with automated feedback for cache eviction

Google Research AI blog

JUNE 23, 2023

LRB, LHD, storage applications), it remains a challenge to outperform robust heuristics in a way that can generalize reliably beyond benchmarks to production settings, while maintaining competitive compute and memory overheads. The labels for these pending comparisons can only be resolved at a random future time.

Learning

Learning Sample Comparison Policy

An ML-based approach to better characterize lung diseases

Google Research AI blog

APRIL 27, 2023

One challenge in this process is how we make sense of the vast amount of clinical measurements — the UK Biobank has many petabytes of imaging, metabolic tests, and medical records spanning 500,000 individuals. Precision-recall curves for COPD status and outcomes for our ML model (green) compared to traditional measures.

Method

Method Statistics Measure Associations

What We Talk About When We Talk About Open Rates

NTEN

AUGUST 25, 2011

The traditional measure of open rates – total messages opened ÷ total messages delivered – should be used to gauge the overall health of your e-mail program. Because the comparisons will never be exact, the most important thing to watch is your True Open Rate over time. That's where the "Unweighted Open Rate" comes in. and take action.

Rate

Rate Open Mail Benchmark

7 Things I learned About Social Media Powered Online Fundraising and A Big Heartfelt Thank You for #OceanLoveEarl

Beth's Blog: How Nonprofits Can Use Social Media

JULY 16, 2013

If I have learned anything from co-writing a book about measurement, it is that it is important not only to collect your data, but also to leave space for reflection at the end of a campaign to harvest insights for the next campaign. I try to do this with any project I work on, whether it is a social media campaign or a training workshop.

Social Media

Social Media Fundraising Online Media

Scaling vision transformers to 22 billion parameters

Google Research AI blog

MARCH 31, 2023

Human object recognition alignment: To find out how aligned ViT-22B classification decisions are with human classification decisions, we evaluated ViT-22B fine-tuned with different resolutions on out-of-distribution (OOD) datasets for which human comparison data is available via the model-vs-human toolbox. Cat or elephant? Car or clock?

Training

Training Train Model Arts

Stephen Downes On Blog Metrics

Beth's Blog: How Nonprofits Can Use Social Media

MAY 18, 2007

Stephen Downes summarized my post on Social Media Metrics and Measuring Blog Outcomes and added some commentary. Conversation Rate (measuring success in a social medium). Technorati "Authority" (measuring your impact on the world!). Stephen goes on to say: Measuring "your blog's outcome" is ridiculous.

Metrics

Metrics Technorati Measure Reflection

Frank Barry, Guest Post: 4 Facebook Tips for Nonprofit Success – See What Others are Doing

Beth's Blog: How Nonprofits Can Use Social Media

JULY 12, 2009

According to the “eNonprofit Benchmarks Study” done by NTEN (shout out to Holly Ross), email is still the “killer app” that reaches the most people. By understanding your activity and performance, fan response, trends and comparisons, you are better equipped to improve your presence on Facebook. What is measured, you ask?

Facebook

Facebook Tips Stats Social Media

Retrieval-augmented visual-language pre-training

Google Research AI blog

JUNE 1, 2023

REVEAL achieves higher accuracy in comparison to previous works including ViLBERT, LXMERT, ClipCap, KRISP and GPV-2. We also evaluate REVEAL on the image captioning benchmarks using the MSCOCO and NoCaps datasets. REVEAL achieves a higher score in comparison to Flamingo, VinVL, SimVLM and CoCa.

Language

Language Train Training Knowledge

How much does it cost to build the world’s hottest startups?

The Next Web

DECEMBER 2, 2013

Then Google and Benchmark pumped $258 million more into it this past August. There’s also the enormous looming cost of distribution — one that’s hard to measure and even harder to predict. In comparison to creating effective and data-driven distribution funnels to get your app out to millions, software is cheap.

Build

Build San Francisco New York City Design

Google Research, 2022 & beyond: Algorithmic advances

Google Research AI blog

FEBRUARY 10, 2023

As an example, for graphs with 10T edges, we demonstrate ~100-fold improvements in pairwise similarity comparisons and significant running time speedups with negligible quality loss. We find that academic GNN benchmark datasets exist in regions where model rankings do not change. All transactions are stored to allow fault-tolerance.

Research

Research Google Technique Model

AMD Radeon RX 6800 review: entry-level 4K

The Verge

NOVEMBER 18, 2020

Not only is it a full inch shorter than the RX 6800 (9.5 inches versus 10.5 inches) and slightly slimmer, the Nvidia card is also a bit quieter under load, both in terms of the RX 6800 having a somewhat louder hum and audibly ramping its fan up and down a tad more often in the middle of a benchmark. Each still has HDMI 2.1

Review

Review Game Chart Comparison

What to watch for at today’s Apple silicon Mac event

The Verge

NOVEMBER 10, 2020

To me, you don’t include a “pro” model on day one unless you are very confident in the benchmarks and performance. Apple is surely going to tout some impressive benchmarks for these Macs. It weighs 2.5 tons and measures about 15-18 feet long, according to Giegel. Better to stick with just the mid-range model if you’re not sure.

Review

Review Camera Video Benchmark

Learning with Queried Hints

Google Research AI blog

JANUARY 25, 2023

The user’s satisfaction is measured by a reward that depends on unknown factors such as user preferences and road segment delays. The best expert in hindsight (and hence the benchmark to compare against) is the middle one, with total reward 21. An instance of the experts problem.

Hints

Hints Learning Comparison Problem

Revising Stages-Oversight Reveals Greater Situational Awareness in LLMs

The AI Alignment Forum

MARCH 12, 2025

Published on March 12, 2025 5:56 PM GMT. Summary: The Stages-Oversight benchmark from the Situational Awareness Dataset tests whether large language models (LLMs) can distinguish between evaluation prompts (such as benchmark questions) and deployment prompts (real-world user inputs).

Awareness

Awareness Evaluation Sample Benchmark

Huawei launches new Pura 70 series phones equipped with self-developed Kirin 9010 chip

TechNode

APRIL 19, 2024

The Kirin 9010 processor achieved a single-core score of 1,442 and a multi-core score of 4,471 on Geekbench, a cross-platform benchmarking tool used to measure and compare the performance of CPUs and GPUs across various computing devices.

Phone

Phone Develop Blogger Camera

Data to support the relentless pursuit of racial equity

Candid

FEBRUARY 6, 2024

This was a considerable commitment, but many in the sector saw a need to measure the impact and effectiveness of such funding—specifically, whether the funds were going to nonprofits led by communities of color or to more traditional, white-led institutions. The best way to answer these questions is to measure and analyze consistent data.

Data

Data Support Demographics Evaluation

Trusted AI Cornerstones: Performance Evaluation

DataRobot

APRIL 20, 2021

At DataRobot, we define the benchmark of AI maturity as AI you can trust. This includes the basics: computing summary statistics on each feature, measuring associations between features, observing feature distributions and their correlation with the predictive target, and identifying outliers. Use Multiple Tools and Visualizations.

Evaluation

Evaluation Open Source Model Metrics

Nonprofit Web Design Process Part 2a: Analytics Data as User Research

Connection Cafe

JULY 22, 2013

To establish benchmarks for measuring success of our design efforts. Once we’ve set the timeframe, we then start digging into the data to answer some key questions: What are some benchmark stats for improvement? Our purposes for using Analytics as User Research are: To learn about current visitors to the website. Methodology.

Analytics

Analytics Research Design Web

NTEN and TechSoup Webinar: Share Your Story - ROI and Social Media - Slides and Notes

Beth's Blog: How Nonprofits Can Use Social Media

FEBRUARY 21, 2009

And, it also includes measurement - not just qualitative information. It uses metrics to measure your results and help you improve your strategy over time. ROI had its origins as an accounting term and was originally a measure of return on the total investment in the entire business. Use of metrics to measure your results.

ROI

ROI Social Media Slides NTEN

Your Guide to Tableau Viz Extensions

Tableau

OCTOBER 10, 2024

Polar Areas charts are particularly effective for showcasing relationships and proportions among multiple variables in a format emphasizing comparisons and trends. To build your own, assign one dimension with categories to the Level mark and assign one or multiple measures to Value. Radial Chart Viz Extension by Actinvision in Tableau.

Chart

Chart Guide Relationship Data

International Organizations and Social Media: News, Engagement, and Social Data for Policy Change

Beth's Blog: How Nonprofits Can Use Social Media

JANUARY 14, 2014

I’m teaching a graduate class at the Monterey Institute of International Studies based on my books, The Networked Nonprofit and Measuring the Networked Nonprofit. Benchmark Studies and Examples. The course is about how to leverage networks and social media for learning and impact.

Social Media

Social Media Policy International Social

What do web stats mean, anyway?

Zen and the Art of Nonprofit Technology

SEPTEMBER 17, 2007

As some sort of measure of accountability, raw web statistics (this site got x visits and y pageviews in t timeframe) mean zilch. to care a whole lot about how many hits they got in comparison to similar (or different) organizations. On a related note, I think a benchmarking study might be a useful exercise for nonprofits.

Stats

Stats Web Statistics NTEN

The Acer ConceptD 7 Ezel is the dream computer I’ll never own

The Verge

MAY 11, 2021

The Dell XPS 15 with the same processor and a GTX 1650 Ti took four minutes and 23 seconds (though different versions of Premiere Pro can impact export times, so synthetic benchmarks such as Cinebench are more precise for direct comparison).

Laptop

Laptop Adobe Artist Movie

Google Research, 2022 & Beyond: Language, Vision and Generative Models

Google Research AI blog

JANUARY 18, 2023

Performance comparison between the PaLM 540B parameter model and the prior state-of-the-art (SOTA) on 58 tasks from the Big-bench suite. One of the areas where multi-step reasoning is most clearly beneficial and measurable is in the ability of models to solve complex mathematical reasoning and scientific problems.

Language

Language Generation Model Research

Major Gift Metrics: What You Need to Know

DNL OmniMedia

NOVEMBER 12, 2020

Because of this, it’s important to carefully measure major gifts key performance indicators (KPIs) to understand the success of your program as a whole and specific strategies individually. What are the major gift metrics that matter when measuring success? Why is it important to track fundraising benchmarks?

Metrics

Metrics Gift Track Consultant

Key Findings from the 2021 donorCentrics® Sustainer Summit

sgEngage

APRIL 29, 2021

This year’s summit included data from a variety of sectors, drawn directly from participant CRMs and standardized to allow for consistent comparisons. Offering a premium can work for conversion, though careful monitoring of retention should be a part of the measured results. A lower ask amount may be necessary to convert more donors.

Retention

Retention Gift Donor Conversation

Nvidia GeForce RTX 3070 review: the 1440p sweet spot

The Verge

OCTOBER 27, 2020

I previously tested the RTX 3080 on an older Core i7-7700K , so I’ve gone back and tested Nvidia’s flagship on this new system to provide a comparison between the RTX 2080, RTX 3070, and RTX 3080. As you can see in the benchmark chart below, you won’t often need an RTX 3080 to max out today’s games with a 1440p monitor.

Review

Review Test Game Rate

AdaTape: Foundation model with adaptive computation and dynamic read-and-write

Google Research AI blog

AUGUST 8, 2023

This model is a Transformer-based architecture that uses a dynamic set of tokens to create elastic input sequences, providing a unique perspective on adaptivity in comparison to previous works. In the paper “ Adaptive Computation with Elastic Input Sequence ”, we introduce a new model that utilizes adaptive computation, called AdaTape.

Model

Model Foundation Evaluation Sample

The Nonprofit Engagement Metrics You May Have Overlooked

Neon CRM

JUNE 2, 2023

Want to learn more about the nonprofit email benchmarks that your organization should be using to measure success? However, this metric doesn’t truly measure social media engagement. It’s a very simple measure of engagement that’s often used in the for-profit sector. Download the full report today!

Metrics

Metrics Nonprofit Social Media Rate

Using Machine Learning for Sentiment Analysis: a Deep Dive

DataRobot

MARCH 9, 2022

There are a few standard datasets in the field that are often used to benchmark models and compare accuracies, but new datasets are being developed every day as labeled data continues to become available. In the field of sentiment analysis, one model works particularly well and is easy to set up, making it the ideal baseline for comparison.

Analysis

Analysis Learning Training Train
