How We Analyze Calorie Tracking App Reviews: Our Methodology
- 9 Minute Read
Our complete methodology: data sources, app selection, filtering, authenticity assessment, and the exact analysis prompt. Full transparency.
Transparency is rare in app comparison content. Most "best calorie tracker app" articles don't disclose how they select apps, filter reviews, or arrive at their conclusions. Many haven't been meaningfully updated even when they display a recent date at the top.
We think that should be different. The Diet App Scorecard publishes its complete methodology so that readers and other analysts can evaluate, replicate, or challenge our process. Everything is documented here: the data source (free and publicly accessible), the app selection criteria (verifiable on the App Store), the filtering criteria (explicit), and the analysis prompt (published verbatim below).
This methodology is applied equally to every app analyzed, including MyNetDiary. I'm the CEO of MyNetDiary and the author of this analysis, so yes — you should evaluate these findings with that context in mind. That's exactly why we publish the methodology in full: so you can judge for yourself whether the process introduces bias.
The scorecard analyzes one calendar month of user reviews at a time and recalculates the average rating from that month's reviews only. This is deliberate. The App Store has three problems that make its default view unreliable for evaluating apps.
The first problem is the AI-generated review summary. Apple generates an automatic summary of each app's reviews that appears prominently on the store page, and that summary is predominantly positive even when most recent reviews are negative. An app where 70% of last month's reviews are one- and two-star complaints can display a cheerful AI-generated summary suggesting users love it. It's often the first thing consumers see, and it misrepresents what users are actually saying.
The second problem is the default sort. When you tap into Ratings & Reviews, the App Store defaults to "Most Helpful" — which typically means reviews from several years ago, heavily skewed positive. You might read the first five displayed reviews and see nothing but praise from 2020 or 2021, completely missing the current wave of complaints about bugs, redesigns, or billing problems.
The third problem catches a lot of people off guard. The App Store displays an average star rating accumulated over the app's entire lifetime — which can span a decade or more. It's a star rating, not a review rating. It's not calculated from recent reviews. It's not weighted toward recent experience. An app that was excellent five years ago but has since raised prices, removed features, or pushed a bad redesign still carries those old five-star ratings in its average.
In our analysis, we've observed apps where the monthly review rating is 1.5 to 2.0 stars lower than the all-time average. That's a real decline in user experience that the displayed star rating completely hides.
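To see why a lifetime average barely registers even a sharp recent decline, consider a purely hypothetical example (the numbers below are illustrative, not drawn from our data):

```python
# Illustrative arithmetic only -- the figures below are hypothetical, not scorecard data.
lifetime_ratings = 80_000   # star ratings accumulated over the app's whole history
lifetime_average = 4.6      # the displayed all-time average
month_reviews = 400         # written reviews from the most recent month
month_average = 2.7         # average of just those reviews

combined = (lifetime_ratings * lifetime_average + month_reviews * month_average) / (
    lifetime_ratings + month_reviews
)
print(f"All-time average after the bad month: {combined:.2f}")   # about 4.59
print(f"Monthly review rating: {month_average:.2f}")             # 2.70
```

Even after a month of overwhelmingly negative reviews, the displayed rating in this hypothetical barely moves.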
The Diet App Scorecard exists to show you the current picture these App Store defaults can't.
We pull monthly user reviews from the US App Store using Apple's public RSS feed. This feed provides individual review text, star rating, reviewer name, and date. It's freely accessible to anyone — which means any reader can access the same raw data we analyze.
Reviews can also be obtained through third-party services like data.ai or Sensor Tower, or by manually copying and pasting from the App Store app on any Apple device. We use the RSS feed because it's free, public, and machine-readable, but we encourage readers to verify our data using whichever method they prefer.
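For readers who want to pull the same data programmatically, here is a minimal sketch in Python. It assumes the commonly documented shape of Apple's customer-reviews RSS feed (the URL pattern and JSON field names); verify both against the live feed, and note that the app ID in the example call is a placeholder.

```python
# A minimal sketch of pulling recent US App Store reviews from Apple's public RSS feed.
# The URL pattern and JSON field names reflect the commonly documented feed format;
# verify both against the live feed. The app ID used in the example call is a placeholder.
import requests

def fetch_reviews(app_id: str, page: int = 1, country: str = "us") -> list[dict]:
    url = (
        f"https://itunes.apple.com/{country}/rss/customerreviews/"
        f"page={page}/id={app_id}/sortby=mostrecent/json"
    )
    feed = requests.get(url, timeout=30).json().get("feed", {})
    entries = feed.get("entry", [])
    if isinstance(entries, dict):   # a lone entry comes back as a dict, not a list
        entries = [entries]
    return [
        {
            "author": e["author"]["name"]["label"],
            "rating": int(e["im:rating"]["label"]),
            "title": e["title"]["label"],
            "text": e["content"]["label"],
            "date": e["updated"]["label"],   # ISO 8601 timestamp
        }
        for e in entries
        if "im:rating" in e   # skip any non-review entries
    ]

# reviews = fetch_reviews("123456789")   # placeholder app ID; the feed is paginated,
#                                        # so iterate page= to collect more reviews
```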
Each scorecard analyzes reviews from one full calendar month. No partial months, no rolling windows. A clean, consistent time boundary that anyone can verify.
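Under the same assumptions as the sketch above, restricting the pulled reviews to one calendar month is a simple date comparison:

```python
# Continuing the sketch above: keep only reviews posted within one calendar month.
from datetime import datetime

def reviews_for_month(reviews: list[dict], year: int, month: int) -> list[dict]:
    kept = []
    for r in reviews:
        posted = datetime.fromisoformat(r["date"])
        if posted.year == year and posted.month == month:
            kept.append(r)
    return kept

# january_reviews = reviews_for_month(reviews, 2026, 1)
```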
We limit the analysis to the US App Store for two reasons.
First, verifiability. Anyone can open the App Store on a US-region Apple device and manually read the same reviews we analyze. That kind of verification is realistic for one market. It's not realistic across dozens of country-specific stores where review volume is sparse and the sheer number of stores makes it impractical.
Second, the US App Store is the largest English-language market and produces the highest review volume across all the apps we track. Even smaller apps generate enough reviews for meaningful comparison.
We may add non-US markets in the future. If we do, we'll document the change here.
We don't maintain a fixed, editorially chosen list. On the last day of each month, we check the US App Store's top 100 apps in the Health & Fitness category and identify all apps whose primary function is calorie tracking, food tracking, or diet tracking.
From that top 100, we exclude apps that aren't primarily calorie or food tracking: workout-only trackers, step counters, meditation apps, period trackers, sleep apps, and general fitness apps without food logging. What's left forms that month's scorecard.
Apps enter when they reach the top 100 and exit when they fall below it. If an app is popular enough to rank in the top 100 — whether organically or through paid marketing and advertising — it's relevant to consumers and belongs in the analysis. This also means we naturally capture apps that spend heavily on advertising to acquire downloads but may not deliver a satisfying user experience. The contrast between chart position and review rating is itself a finding.
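To make the selection step concrete, here is a toy sketch of the bookkeeping. Reading the chart and judging each app's primary function are manual steps, so the mapping below is hand-maintained and the app names are invented for illustration.

```python
# A toy sketch of the monthly selection bookkeeping. Reading the chart and judging each
# app's primary function are manual steps, so the mapping below is hand-maintained and
# the app names are invented for illustration.
FOOD_TRACKING = {"calorie tracking", "food tracking", "diet tracking"}

top_100 = {
    1: ("ExampleStepCounter", "step counting"),
    2: ("ExampleCalorieApp", "calorie tracking"),
    3: ("ExampleMeditation", "meditation"),
    # ... the rest of the chart, tagged the same way
}

scorecard_apps = {
    rank: name
    for rank, (name, primary_function) in sorted(top_100.items())
    if primary_function in FOOD_TRACKING
}
print(scorecard_apps)   # {2: 'ExampleCalorieApp'}
```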
The core of the calorie tracking category consistently appears in the top 100, so we have good longitudinal comparability. When apps enter or exit, we note it in that month's Trends section.
After pulling reviews for each app, we apply two filters: duplicate reviews (the same review text posted more than once) are removed, and random, unrelated reviews (text with no meaningful connection to the app) are removed.
What stays in? Everything else. We don't filter based on sentiment, star count, or tone. Negative reviews stay. Positive reviews stay. Competitor mentions, bug reports, emotional rants, feature requests — all of it stays. If MyNetDiary receives a one-star review describing a genuine frustration, it stays in and affects our calculated average exactly as it should.
The goal is simple: the filtered set should reflect what users actually think, with noise removed but no editorial thumb on the scale.
After filtering, we recalculate the average star rating. Simple arithmetic mean: add up all the star ratings, divide by the number of filtered reviews. No weighting by review length, recency, or helpfulness votes. The simplest possible calculation, chosen so anyone can replicate it.
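As a sketch of that arithmetic, using the review records from the fetch example above: duplicates are dropped and the remaining star ratings are averaged with no weighting. In the scorecard itself, duplicate and unrelated reviews are identified during the AI analysis (see the prompt below); this stand-in only shows that the final number is a plain mean.

```python
# A stand-in for the recalculation: drop exact-duplicate review text, then take the plain
# unweighted mean of the remaining star ratings. In the scorecard itself, duplicate and
# unrelated reviews are identified during the AI analysis (see the prompt below).
from statistics import mean

def monthly_rating(reviews: list[dict]) -> float:
    seen, kept = set(), []
    for r in reviews:
        key = r["text"].strip().lower()
        if key in seen:         # duplicate: the same text posted more than once
            continue
        seen.add(key)
        kept.append(r["rating"])
    return round(mean(kept), 2) if kept else 0.0

# rating = monthly_rating(january_reviews)
```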
Three differences separate this number from the star rating the App Store displays: ours covers a single month rather than the app's entire lifetime; it is calculated only from written reviews rather than from every star rating ever left; and it excludes duplicate and unrelated reviews.
These differences explain why our monthly rating can diverge significantly from the App Store's number. That divergence isn't an error — it's the point. The scorecard tells you what's happening now. The App Store star rating tells you what happened over the past decade.
Each app gets a summary of about 100 to 120 words — a length we've found is the sweet spot for capturing what matters without losing nuance. Summaries don't compare apps to each other, so each one stands on its own. They mention other apps only when reviewers themselves do, since switching behavior ("I left MyFitnessPal because...") is a genuine part of the review record.
Every summary is generated by AI — currently Claude 4.6 Extended Thinking by Anthropic — using exactly the same prompt for every app in every month. The AI receives the raw reviews and the prompt below. It doesn't know which app it's analyzing. It gets the same instructions for MyNetDiary as for Cal AI, for Cronometer as for BitePal.
This is the cornerstone of the scorecard's methodology. A human summarizing nine apps' worth of reviews every month would inevitably bring varying attention, unconscious biases, and fatigue to the task. I know this because I tried it myself before switching to AI — my summaries of MyNetDiary were consistently gentler than my summaries of competitors. The AI doesn't have that problem. It applies the same analytical rigor to the 300th review as to the first. The prompt:
"Provide a summary of user reviews, about 100-120 words long, while filtering out random (unrelated) reviews. Report only the count of filtered random/unrelated and duplicate reviews, without listing details. In the summary, don't compare to other apps (mentioning other apps is OK if it's present in the reviews and warranted), so that the summary is independent. Represent positives and negatives proportionally to the total reviews body. Flag authenticity concerns only if review patterns appear suspicious (e.g., clusters of generic five-star reviews, signs of incentivized reviews, or ratings before meaningful use). If patterns appear organic, state so in one sentence. Calculate average review rating after filtering out the duplicate and random reviews."
We publish both the prompt and the AI model because transparency requires it. You can evaluate whether the instructions introduce bias. Other analysts can replicate the process with the same prompt, the same model, and the same publicly available review data, and compare their results to ours. If we change the AI model in the future, we'll document the change here with the date and rationale.
Review authenticity is assessed by the same AI, using the same prompt, as part of each app's review analysis. The prompt instructs the AI to flag authenticity concerns only when review patterns appear suspicious, and to state that patterns appear organic when they do. Based on that assessment, the AI classifies each app into one of three designations.
What triggers Suspicious: clusters of brief, generic five-star reviews contrasting with detailed negatives; apps that prompt ratings before users have accessed any features; reviews from users who say they haven't tried the app yet; rating patterns that don't match what the reviews actually say.
What doesn't trigger it: lots of negative reviews (that's just a poorly received app, not a fake review issue), bimodal distributions where both ends have substance, or review volume differences between apps.
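To make that contrast concrete, here is an illustrative heuristic. It is not part of the scorecard pipeline (the authenticity call is made by the AI via the prompt above), and the thresholds are arbitrary assumptions; it simply encodes the specific pattern described: many brief, generic five-star reviews sitting alongside detailed low-star complaints.

```python
# An illustrative heuristic only -- the scorecard's authenticity call is made by the AI
# via the prompt above, not by this code, and the thresholds here are arbitrary.
def looks_suspicious(reviews: list[dict]) -> bool:
    five_star = [r for r in reviews if r["rating"] == 5]
    low_star = [r for r in reviews if r["rating"] <= 2]
    if not five_star or not low_star:
        return False
    brief_generic_5s = [r for r in five_star if len(r["text"].split()) < 8]
    detailed_lows = [r for r in low_star if len(r["text"].split()) >= 30]
    # A flood of near-empty five-star reviews next to substantive complaints is the
    # pattern worth flagging; plentiful detailed negatives on their own are not.
    return len(brief_generic_5s) / len(five_star) > 0.6 and len(detailed_lows) >= 5
```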
One thing worth addressing directly: negativity bias is real and applies to all apps equally. Frustrated users are more likely to leave reviews. But negativity bias doesn't explain why some apps show clusters of generic five-star reviews alongside detailed complaints. That's a different pattern, and it's the one we flag.
Why does this matter? Because calorie tracking apps influence health decisions. A user who downloads an app based on manipulated ratings and gets wildly inaccurate calorie estimates may make dietary choices that undermine their health goals. Review authenticity in health apps isn't academic. It has real consequences.
We're upfront about the scorecard's limitations.
These limitations are deliberate. The scorecard is designed to do one thing well: report what users are saying right now, with a transparent and repeatable methodology. Other dimensions of app quality are addressed in separate content where they can get the depth they deserve.
The Diet App Scorecard is published by MyNetDiary. We're one of the apps analyzed whenever we appear in the top 100 Health & Fitness apps on the US App Store. We apply the same methodology, filtering criteria, and analysis prompt to our own reviews as to every other app. The methodology is published in full above.
I've been reading App Store reviews every morning since we launched MyNetDiary in 2008. I author the analysis to bring that industry perspective. All health-related content is reviewed by Sue Heikkinen, MS, RDN, CDCES, BC-ADM, ACE-PT. Our inclusion in the scorecard is determined by the same market-driven criteria as every other app: we have to rank in the top 100 on the last day of the month.
All product names, logos, and brands are the property of their respective owners.
It's fair to ask whether an analysis published by one of the apps being analyzed can be trusted. The methodology is published in full, including the exact analysis prompt. App selection is objective: any calorie tracking app in the top 100 Health & Fitness gets included regardless of how it compares to MyNetDiary. The filtering criteria are explicit and don't favor any app. And anyone with access to the App Store's public RSS feed can replicate the entire process. We publish the methodology specifically so you don't have to take our word for it.
Calorie and food tracking apps in the top 100 US App Store Health & Fitness category on the last day of each month. The market determines relevance, not us.
We check monthly. If an app enters the top 100, it joins the next scorecard. If it exits, it's noted and removed. Historical data stays in previous scorecards and the archive.
US App Store, via Apple's public RSS feed. You can also get reviews through data.ai, Sensor Tower, or by manually copying from the App Store app. We use the RSS feed because it's free and public, but you're welcome to verify using any source.
Verifiability. Anyone with a US-region Apple device can read the same reviews we analyze. That level of verification isn't realistic across dozens of country-specific stores. The US is also the largest English-language market with the highest review volume.
The App Store shows an all-time average spanning the app's entire history. We calculate a review rating from one month's written reviews, filtered to remove duplicates and unrelated content. We measure current experience. The App Store measures history. For some apps, our monthly rating is significantly lower than the displayed star rating.
Absolutely. The data source (App Store RSS feed), selection criteria (top 100 Health & Fitness), filtering criteria, analysis prompt, and AI model (Claude 4.6 Extended Thinking) are all published on this page. Have at it.
The methodology is stable by design — consistency over time is what makes longitudinal comparisons valid. If we make changes, we'll document them here with the date and rationale. The current methodology has been in use since the inaugural scorecard in February 2026.
Still new to MyNetDiary? Learn more today by downloading the app for FREE.
Check out PlateAI, our new AI-powered diet app at PlateAI.com