How We Analyze Calorie Tracking App Reviews: Our Methodology
- 9 Minute Read
Our complete methodology: data sources, app selection, filtering, authenticity assessment, and the exact analysis prompt. Full transparency.
Transparency is rare in app comparison content. Most "best calorie tracker app" articles don't disclose how they select apps, filter reviews, or arrive at their conclusions. Many haven't been meaningfully updated even when they display a recent date at the top.
We think that should be different. The Diet App Scorecard publishes its complete methodology so that readers and other analysts can evaluate, replicate, or challenge our process. Everything is documented here: the data source (free and publicly accessible), the app selection criteria (verifiable on the App Store), the filtering criteria (explicit), and the analysis prompt (published verbatim below).
This methodology is applied equally to every app analyzed, including MyNetDiary. I'm the CEO of MyNetDiary and the author of this analysis, so yes — you should evaluate these findings with that context in mind. That's exactly why we publish the methodology in full: so you can judge for yourself whether the process introduces bias.
The scorecard analyzes one calendar month of user reviews at a time and recalculates the average rating from that month's reviews only. This is deliberate. The App Store has three problems that make its default view unreliable for evaluating apps.
The first problem is the AI-generated review summary. Apple generates an automatic summary of each app's reviews that appears prominently on the store page, and that summary is predominantly positive even when most recent reviews are negative. An app where 70% of last month's reviews are one- and two-star complaints can display a cheerful AI-generated summary suggesting users love it. It's often the first thing consumers see, and it misrepresents what users are actually saying.
The second problem is the default sort. When you tap into Ratings & Reviews, the App Store defaults to "Most Helpful" — which typically means reviews from several years ago, heavily skewed positive. You might read the first five displayed reviews and see nothing but praise from 2020 or 2021, completely missing the current wave of complaints about bugs, redesigns, or billing problems.
The third problem catches a lot of people off guard. The App Store displays an average star rating accumulated over the app's entire lifetime — which can span a decade or more. It's a star rating, not a review rating. It's not calculated from recent reviews. It's not weighted toward recent experience. An app that was excellent five years ago but has since raised prices, removed features, or pushed a bad redesign still carries those old five-star ratings in its average.
In our analysis, we've observed apps where the monthly review rating is 1.5 to 2.0 stars lower than the all-time average. That's a real decline in user experience that the displayed star rating completely hides.
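To see why a lifetime average barely registers even a sharp recent decline, consider a purely hypothetical example (the numbers below are illustrative, not drawn from our data):

```python
# Illustrative arithmetic only -- the figures below are hypothetical, not scorecard data.
lifetime_ratings = 80_000   # star ratings accumulated over the app's whole history
lifetime_average = 4.6      # the displayed all-time average
month_reviews = 400         # written reviews from the most recent month
month_average = 2.7         # average of just those reviews

combined = (lifetime_ratings * lifetime_average + month_reviews * month_average) / (
    lifetime_ratings + month_reviews
)
print(f"All-time average after the bad month: {combined:.2f}")   # about 4.59
print(f"Monthly review rating: {month_average:.2f}")             # 2.70
```

Even after a month of overwhelmingly negative reviews, the displayed rating in this hypothetical barely moves.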
The Diet App Scorecard exists to show you the current picture these App Store defaults can't.
We pull monthly user reviews from the US App Store using Apple's public RSS feed. This feed provides individual review text, star rating, reviewer name, and date. It's freely accessible to anyone — which means any reader can access the same raw data we analyze.
Reviews can also be obtained through third-party services like data.ai or Sensor Tower, or by manually copying and pasting from the App Store app on any Apple device. We use the RSS feed because it's free, public, and machine-readable, but we encourage readers to verify our data using whichever method they prefer.
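For readers who want to pull the same data programmatically, here is a minimal sketch in Python. It assumes the commonly documented shape of Apple's customer-reviews RSS feed (the URL pattern and JSON field names); verify both against the live feed, and note that the app ID in the example call is a placeholder.

```python
# A minimal sketch of pulling recent US App Store reviews from Apple's public RSS feed.
# The URL pattern and JSON field names reflect the commonly documented feed format;
# verify both against the live feed. The app ID used in the example call is a placeholder.
import requests

def fetch_reviews(app_id: str, page: int = 1, country: str = "us") -> list[dict]:
    url = (
        f"https://itunes.apple.com/{country}/rss/customerreviews/"
        f"page={page}/id={app_id}/sortby=mostrecent/json"
    )
    feed = requests.get(url, timeout=30).json().get("feed", {})
    entries = feed.get("entry", [])
    if isinstance(entries, dict):   # a lone entry comes back as a dict, not a list
        entries = [entries]
    return [
        {
            "author": e["author"]["name"]["label"],
            "rating": int(e["im:rating"]["label"]),
            "title": e["title"]["label"],
            "text": e["content"]["label"],
            "date": e["updated"]["label"],   # ISO 8601 timestamp
        }
        for e in entries
        if "im:rating" in e   # skip any non-review entries
    ]

# reviews = fetch_reviews("123456789")   # placeholder app ID; the feed is paginated,
#                                        # so iterate page= to collect more reviews
```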
Each scorecard analyzes reviews from one full calendar month. No partial months, no rolling windows. A clean, consistent time boundary that anyone can verify.
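Under the same assumptions as the sketch above, restricting the pulled reviews to one calendar month is a simple date comparison:

```python
# Continuing the sketch above: keep only reviews posted within one calendar month.
from datetime import datetime

def reviews_for_month(reviews: list[dict], year: int, month: int) -> list[dict]:
    kept = []
    for r in reviews:
        posted = datetime.fromisoformat(r["date"])
        if posted.year == year and posted.month == month:
            kept.append(r)
    return kept

# january_reviews = reviews_for_month(reviews, 2026, 1)
```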
We limit the analysis to the US App Store for two reasons.
First, verifiability. Anyone can open the App Store on a US-region Apple device and manually read the same reviews we analyze. That kind of verification is realistic for one market. It's not realistic across dozens of country-specific stores where review volume is sparse and the sheer number of stores makes it impractical.
Second, the US App Store is the largest English-language market and produces the highest review volume across all the apps we track. Even smaller apps generate enough reviews for meaningful comparison.
We may add non-US markets in the future. If we do, we'll document the change here.
We don't maintain a fixed, editorially chosen list. On the last day of each month, we check the US App Store's top 100 apps in the Health & Fitness category and identify all apps whose primary function is calorie tracking, food tracking, or diet tracking.
From that top 100, we exclude apps that aren't primarily calorie or food tracking: workout-only trackers, step counters, meditation apps, period trackers, sleep apps, and general fitness apps without food logging. What's left forms that month's scorecard.
Apps enter when they reach the top 100 and exit when they fall below it. If an app is popular enough to rank in the top 100 — whether organically or through paid marketing and advertising — it's relevant to consumers and belongs in the analysis. This also means we naturally capture apps that spend heavily on advertising to acquire downloads but may not deliver a satisfying user experience. The contrast between chart position and review rating is itself a finding.
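To make the selection step concrete, here is a toy sketch of the bookkeeping. Reading the chart and judging each app's primary function are manual steps, so the mapping below is hand-maintained and the app names are invented for illustration.

```python
# A toy sketch of the monthly selection bookkeeping. Reading the chart and judging each
# app's primary function are manual steps, so the mapping below is hand-maintained and
# the app names are invented for illustration.
FOOD_TRACKING = {"calorie tracking", "food tracking", "diet tracking"}

top_100 = {
    1: ("ExampleStepCounter", "step counting"),
    2: ("ExampleCalorieApp", "calorie tracking"),
    3: ("ExampleMeditation", "meditation"),
    # ... the rest of the chart, tagged the same way
}

scorecard_apps = {
    rank: name
    for rank, (name, primary_function) in sorted(top_100.items())
    if primary_function in FOOD_TRACKING
}
print(scorecard_apps)   # {2: 'ExampleCalorieApp'}
```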
The core of the calorie tracking category consistently appears in the top 100, so we have good longitudinal comparability. When apps enter or exit, we note it in that month's Trends section.
After pulling reviews for each app, we apply two filters: duplicate reviews (the same review text posted more than once) are removed, and random, unrelated reviews (text with no meaningful connection to the app) are removed.
What stays in? Everything else. We don't filter based on sentiment, star count, or tone. Negative reviews stay. Positive reviews stay. Competitor mentions, bug reports, emotional rants, feature requests — all of it stays. If MyNetDiary receives a one-star review describing a genuine frustration, it stays in and affects our calculated average exactly as it should.
The goal is simple: the filtered set should reflect what users actually think, with noise removed but no editorial thumb on the scale.
After filtering, we recalculate the average star rating. Simple arithmetic mean: add up all the star ratings, divide by the number of filtered reviews. No weighting by review length, recency, or helpfulness votes. The simplest possible calculation, chosen so anyone can replicate it.
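As a sketch of that arithmetic, using the review records from the fetch example above: duplicates are dropped and the remaining star ratings are averaged with no weighting. In the scorecard itself, duplicate and unrelated reviews are identified during the AI analysis (see the prompt below); this stand-in only shows that the final number is a plain mean.

```python
# A stand-in for the recalculation: drop exact-duplicate review text, then take the plain
# unweighted mean of the remaining star ratings. In the scorecard itself, duplicate and
# unrelated reviews are identified during the AI analysis (see the prompt below).
from statistics import mean

def monthly_rating(reviews: list[dict]) -> float:
    seen, kept = set(), []
    for r in reviews:
        key = r["text"].strip().lower()
        if key in seen:         # duplicate: the same text posted more than once
            continue
        seen.add(key)
        kept.append(r["rating"])
    return round(mean(kept), 2) if kept else 0.0

# rating = monthly_rating(january_reviews)
```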
Three differences separate this number from the star rating the App Store displays: ours covers a single month rather than the app's entire lifetime; it is calculated only from written reviews rather than from every star rating ever left; and it excludes duplicate and unrelated reviews.
These differences explain why our monthly rating can diverge significantly from the App Store's number. That divergence isn't an error — it's the point. The scorecard tells you what's happening now. The App Store star rating tells you what happened over the past decade.
Each app gets a summary of about 100 to 120 words — a length we've found is the sweet spot for capturing what matters without losing nuance. Summaries don't compare apps to each other, so each one stands on its own. They mention other apps only when reviewers themselves do, since switching behavior ("I left MyFitnessPal because...") is a genuine part of the review record.
Every summary is generated by AI — currently Claude 4.6 Extended Thinking by Anthropic — using exactly the same prompt for every app in every month. The AI receives the raw reviews and the prompt below. It doesn't know which app it's analyzing. It gets the same instructions for MyNetDiary as for Cal AI, for Cronometer as for BitePal.
This is the cornerstone of the scorecard's methodology. A human summarizing nine apps' worth of reviews every month would inevitably bring varying attention, unconscious biases, and fatigue to the task. I know this because I tried it myself before switching to AI — my summaries of MyNetDiary were consistently gentler than my summaries of competitors. The AI doesn't have that problem. It applies the same analytical rigor to the 300th review as to the first. The prompt:
"Provide a summary of user reviews, about 100-120 words long, while filtering out random (unrelated) reviews. Report only the count of filtered random/unrelated and duplicate reviews, without listing details. In the summary, don't compare to other apps (mentioning other apps is OK if it's present in the reviews and warranted), so that the summary is independent. Represent positives and negatives proportionally to the total reviews body. Flag authenticity concerns only if review patterns appear suspicious (e.g., clusters of generic five-star reviews, signs of incentivized reviews, or ratings before meaningful use). If patterns appear organic, state so in one sentence. Calculate average review rating after filtering out the duplicate and random reviews."
We publish both the prompt and the AI model because transparency requires it. You can evaluate whether the instructions introduce bias. Other analysts can replicate the process with the same prompt, the same model, and the same publicly available review data, and compare their results to ours. If we change the AI model in the future, we'll document the change here with the date and rationale.
Review authenticity is assessed by the same AI, using the same prompt, as part of each app's review analysis. The prompt instructs the AI to flag authenticity concerns only when review patterns appear suspicious, and to state that patterns appear organic when they do. Based on that assessment, the AI classifies each app into one of three designations.
What triggers Suspicious: clusters of brief, generic five-star reviews contrasting with detailed negatives; apps that prompt ratings before users have accessed any features; reviews from users who say they haven't tried the app yet; rating patterns that don't match what the reviews actually say.
What doesn't trigger it: lots of negative reviews (that's just a poorly received app, not a fake review issue), bimodal distributions where both ends have substance, or review volume differences between apps.
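To make that contrast concrete, here is an illustrative heuristic. It is not part of the scorecard pipeline (the authenticity call is made by the AI via the prompt above), and the thresholds are arbitrary assumptions; it simply encodes the specific pattern described: many brief, generic five-star reviews sitting alongside detailed low-star complaints.

```python
# An illustrative heuristic only -- the scorecard's authenticity call is made by the AI
# via the prompt above, not by this code, and the thresholds here are arbitrary.
def looks_suspicious(reviews: list[dict]) -> bool:
    five_star = [r for r in reviews if r["rating"] == 5]
    low_star = [r for r in reviews if r["rating"] <= 2]
    if not five_star or not low_star:
        return False
    brief_generic_5s = [r for r in five_star if len(r["text"].split()) < 8]
    detailed_lows = [r for r in low_star if len(r["text"].split()) >= 30]
    # A flood of near-empty five-star reviews next to substantive complaints is the
    # pattern worth flagging; plentiful detailed negatives on their own are not.
    return len(brief_generic_5s) / len(five_star) > 0.6 and len(detailed_lows) >= 5
```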
One thing worth addressing directly: negativity bias is real and applies to all apps equally. Frustrated users are more likely to leave reviews. But negativity bias doesn't explain why some apps show clusters of generic five-star reviews alongside detailed complaints. That's a different pattern, and it's the one we flag.
Why does this matter? Because calorie tracking apps influence health decisions. A user who downloads an app based on manipulated ratings and gets wildly inaccurate calorie estimates may make dietary choices that undermine their health goals. Review authenticity in health apps isn't academic. It has real consequences.
We're upfront about the scorecard's limitations.
These limitations are deliberate. The scorecard is designed to do one thing well: report what users are saying right now, with a transparent and repeatable methodology. Other dimensions of app quality are addressed in separate content where they can get the depth they deserve.
The Diet App Scorecard is published by MyNetDiary. We're one of the apps analyzed whenever we appear in the top 100 Health & Fitness apps on the US App Store. We apply the same methodology, filtering criteria, and analysis prompt to our own reviews as to every other app. The methodology is published in full above.
I've been reading App Store reviews every morning since we launched MyNetDiary in 2008. I author the analysis to bring that industry perspective. All health-related content is reviewed by Sue Heikkinen, MS, RDN, CDCES, BC-ADM, ACE-PT. Our inclusion in the scorecard is determined by the same market-driven criteria as every other app: we have to rank in the top 100 on the last day of the month.
All product names, logos, and brands are the property of their respective owners.
It's fair to ask whether an analysis published by one of the apps being analyzed can be trusted. The methodology is published in full, including the exact analysis prompt. App selection is objective: any calorie tracking app in the top 100 Health & Fitness gets included regardless of how it compares to MyNetDiary. The filtering criteria are explicit and don't favor any app. And anyone with access to the App Store's public RSS feed can replicate the entire process. We publish the methodology specifically so you don't have to take our word for it.
Calorie and food tracking apps in the top 100 US App Store Health & Fitness category on the last day of each month. The market determines relevance, not us.
We check monthly. If an app enters the top 100, it joins the next scorecard. If it exits, it's noted and removed. Historical data stays in previous scorecards and the archive.
US App Store, via Apple's public RSS feed. You can also get reviews through data.ai, Sensor Tower, or by manually copying from the App Store app. We use the RSS feed because it's free and public, but you're welcome to verify using any source.
Verifiability. Anyone with a US-region Apple device can read the same reviews we analyze. That level of verification isn't realistic across dozens of country-specific stores. The US is also the largest English-language market with the highest review volume.
The App Store shows an all-time average spanning the app's entire history. We calculate a review rating from one month's written reviews, filtered to remove duplicates and unrelated content. We measure current experience. The App Store measures history. For some apps, our monthly rating is significantly lower than the displayed star rating.
Absolutely. The data source (App Store RSS feed), selection criteria (top 100 Health & Fitness), filtering criteria, analysis prompt, and AI model (Claude 4.6 Extended Thinking) are all published on this page. Have at it.
The methodology is stable by design — consistency over time is what makes longitudinal comparisons valid. If we make changes, we'll document them here with the date and rationale. The current methodology has been in use since the inaugural scorecard in February 2026.
Still new to MyNetDiary? Learn more today by downloading the app for FREE.
Check out PlateAI, our new AI-powered diet app at PlateAI.com