Data Is Not Diagnosis: The Great Misunderstanding of Computational Pathology
Artificial intelligence (AI) is quietly reshaping the heart of diagnostic medicine—especially in fields like pathology, where visual interpretation has long been the cornerstone of practice. The arrival of whole-slide imaging (WSI) opened the door to a tempting promise: algorithms capable of detecting complex patterns, automating routine tasks, and assisting pathologists in clinical decision-making (1).
But behind the excitement lies a dangerous misunderstanding that must be addressed from the outset: data is not diagnosis.
This statement, simple on the surface, touches a deep tension at the core of digital pathology. The mere availability of data—no matter how massive or seemingly precise—does not equate to clinical understanding. What a computational model predicts with high confidence is not always what a patient needs to hear, nor what a physician is ready to sign off on.
True diagnosis requires more than statistical correlations. It demands context, clinical judgment, cross-disciplinary knowledge, and—above all—responsibility.
The Fundamental Disconnect
Why raw data, no matter how vast, is not enough for a clinical diagnosis
Artificial intelligence has shown extraordinary talent in analyzing images, detecting imperceptible patterns, and processing data at a speed far beyond human capacity. In pathology, this translates into models capable of identifying tumor regions, estimating scores, and even predicting molecular alterations from a single H&E image. But there is a persistent misunderstanding we need to dismantle: pattern recognition is not diagnosis.
Clinical diagnosis is not a mere aggregation of data. It is a complex construction that emerges from the integration of multiple layers of information: histopathologic features, molecular profiles, clinical context, patient evolution—and often, elements that are not in the data but live in the physician’s experience. It is an act of synthesis, not of summation.
This is where we encounter what might be called the fundamental disconnect: the distance between what AI can compute and what a physician needs to decide. While models learn from large datasets, their reasoning remains opaque. Most operate as black boxes, unable to explain how they reached a conclusion (2). And this is not a minor technical detail: it prevents critical validation by the pathologist, limits traceability, and—above all—undermines trust.
Moreover, this opacity doesn’t eliminate the possibility of error—it disguises it. The more “confident” a model appears in its prediction, the more dangerous the illusion of certainty becomes. Without explainability, high algorithmic confidence turns into a paradox: the more certain the output seems, the greater the need for human oversight (3). Because in medicine, the gravest mistakes often stem not from ignorance, but from false certainty.
Ultimately, diagnosis is also an ethical and legal responsibility. It’s not enough for a model to suggest a possibility—someone must interpret it, assume it, and communicate it to a patient. And that someone, at least for now, cannot be an algorithm.
The Illusion of Algorithmic Certainty
The black box: when not understanding becomes a clinical risk
In recent years, artificial intelligence has made remarkable progress in assisted diagnostic tasks. But there’s a question we don’t ask often enough: how does a model reach its conclusions?
Many algorithms—especially those based on deep learning—function as true black boxes: they produce results, sometimes with very high confidence, yet offer no clear or understandable explanation of how they got there (2). For an engineer, this might be a technical issue. For a physician, it’s a matter of life and death.
In medicine, prediction is not enough—we must be able to explain, defend, and justify our decisions (as we’ve learned to do through experience, clinical cases, and the insights of colleagues). Behind every prediction, there is a human being waiting for an answer. And if the physician can’t understand how that prediction was generated, how can they take responsibility for it? (4) How can they challenge it, adapt it to the patient’s context, or simply say: “this doesn’t add up”?
This lack of explainability threatens one of the core ethical pillars of clinical practice: accountability. Trust between doctor and patient doesn’t rely solely on accuracy—it’s built on our ability to make sense of what we’re doing. When the algorithm starts acting more like an oracle than a tool, that trust begins to erode.
Europe has already begun to regulate this decisively: the EU AI Act now requires that clinical decision-support systems be explainable by design. Transparency is no longer a technical luxury—it is a clinical, legal, and ethical imperative (5).
Because if we don’t understand what AI is doing, we won’t know when it fails either. And that’s a risk no pathologist—or any physician—should be willing to accept.
Algorithmic bias: when the model learns our inequalities
One of the most dangerous myths surrounding artificial intelligence is its supposed neutrality. We often hear that “data doesn’t lie,” but that’s not entirely true. Data reflects our practices, our systems, and our past decisions—and that inevitably includes our biases (6).
In pathology, as in many branches of medicine, algorithms are trained on large volumes of historical data. But if that data comes mostly from patients in a specific region—where hospitals are well-funded and healthcare systems are stable—then the model will learn to see the world from that perspective (5). Not because it intends to exclude, but because it doesn’t know any other reality.
The consequences run deep: algorithms that perform better in some populations than in others (6), that detect tumors more accurately in patients who fit that training reality, and that systematically fail to recognize less common variants due to underrepresentation in the dataset.
This isn’t just a technical problem (one that could be solved with some code fixes or retraining). Are we facing a clinical injustice? Or is it rather a training injustice? Were we naïve when building these models? Or did we simply see what we could—and wanted—to see?
Misdiagnoses, delayed treatments, or flawed decisions fall—once again—on the same historically underserved groups (7). Not because we intended it (as we've explained), but because what we perceive as “the world” is only a small part of it. And to make things worse, this bias often goes unseen. When the model is a black box, we don’t even know why it’s failing.
Preventing bias cannot be reduced to reactive data cleaning. It demands an active strategy (8): diverse development teams, rigorous data curation from the source, and an ethical framework integrated into every stage of the model’s lifecycle (9). It’s not enough to detect bias—we must design systems that are born with equity as a structural principle (which may sound complicated, but the solution—like always—is right in front of us).
Because if we don’t change what feeds the algorithm, all we’ll achieve is a more elegant—and more dangerous—version of our existing inequalities.
Excessive simplification: when the model forces a fit that doesn’t belong
One of the main appeals of artificial intelligence is its ability to find patterns, establish correlations, and deliver results quickly. But sometimes, that efficiency comes at the expense of something essential: complexity.
Medicine—and pathology in particular—is full of gray areas, borderline cases, and exceptions that don’t follow the rules. AI, however, tends to force reality to fit what it has learned. It turns continuous variables into discrete categories, reduces complex phenomena into binary labels, and generalizes across patients who, in the real world, are rarely “average.”
And this has consequences.
A treatment plan suggested by a model may make statistical sense, yet be completely unworkable for a patient who lives far from a hospital, lacks access to continuous care, or faces invisible barriers like exhausting work schedules, energy poverty, or untreated mental health issues (7). These social determinants of health rarely show up in datasets, but they shape outcomes as much—or more—than a biomarker (7).
There’s also a subtler but equally important issue: many models are evaluated using metrics like “accuracy” or “AUC,” without asking whether those results actually lead to better decisions. A model might perform well in cross-validation, but if it has learned a superficial correlation—or picked up on a slide artifact—its clinical utility is illusory (10).
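To make that concrete, here is a minimal synthetic sketch (toy data and scikit-learn, not a real pathology pipeline; every name and number is an assumption for illustration) of how a feature that merely tracks the submitting site, say a stain-intensity artifact, can flatter a randomly cross-validated AUC while a leave-one-site-out evaluation tells a more sober story.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n = 2000
site = rng.integers(0, 2, n)                    # two hypothetical labs/scanners
# Tumour prevalence differs between the sites, so "site" acts as a confounder.
y = rng.binomial(1, np.where(site == 0, 0.7, 0.3))
biology = 0.5 * y + rng.normal(0, 1.0, n)       # weak, genuine biological signal
stain = 2.0 * site + rng.normal(0, 0.5, n)      # stain intensity that mostly encodes the site
X = np.column_stack([biology, stain])

model = LogisticRegression()

# Random 5-fold CV mixes both sites into every fold, so the artifact keeps "working" at test time.
auc_random = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

# Leave-one-site-out CV forces the model to generalize to a site it has never seen.
auc_by_site = cross_val_score(model, X, y, groups=site,
                              cv=GroupKFold(n_splits=2), scoring="roc_auc").mean()

print(f"Randomly split CV AUC: {auc_random:.2f}")   # flattering
print(f"Held-out-site CV AUC:  {auc_by_site:.2f}")  # closer to external reality
```

The gap between the two numbers is the share of "performance" that belongs to the artifact rather than the biology, which is why external or site-held-out validation matters so much.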
In that sense, there's a troubling paradox: in trying to translate clinical practice into data language, we risk losing the clinical meaning altogether.
And the most ironic part? This simplification—often marketed as progress—may end up pulling us further away from our original goal: improving care. Because if AI pushes us to see patients as numerical vectors rather than complex stories, medicine loses what makes it human.
Beyond Association
Correlation vs. Causation: a boundary we can’t afford to ignore
In the age of big data, finding correlations has become almost trivial. Feed enough information into a model, and it will detect patterns between genes, images, mutations, and clinical outcomes. But as any clinical researcher knows: correlation is not causation.
Correlation describes a statistical relationship where two or more variables change together—but it doesn’t mean that one causes the other. A classic, almost cartoonish example makes this clear: ice cream sales and drowning incidents tend to rise at the same time (11). But one doesn’t cause the other—they’re both influenced by a third variable: warm weather.
Causation, on the other hand, means that one variable directly produces a change in another (11). Demonstrating this is no easy task—it requires careful experimental design, causal inference techniques, or deep domain knowledge. In machine learning, correlations help boost predictive accuracy, but relying solely on them—without investigating causal relationships—can lead to dangerously misleading conclusions.
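Because the ice cream example is easy to simulate, here is a short Python sketch with purely invented numbers: a hidden common cause produces a strong correlation between two variables, and that correlation vanishes once the confounder is adjusted for.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
temperature = rng.normal(25, 5, n)                    # the hidden common cause
ice_cream = 2.0 * temperature + rng.normal(0, 5, n)   # sales rise with heat
drownings = 0.5 * temperature + rng.normal(0, 2, n)   # swimming, and its risks, rise with heat

print("Raw correlation:", round(np.corrcoef(ice_cream, drownings)[0, 1], 2))

def residualize(y, x):
    """Remove the linear effect of x from y (a simple way to 'control for' x)."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

r_ice = residualize(ice_cream, temperature)
r_drown = residualize(drownings, temperature)
print("Correlation after adjusting for temperature:",
      round(np.corrcoef(r_ice, r_drown)[0, 1], 2))
```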
In medicine, this isn’t just a theoretical concern. It’s a trap that can lead to useless biomarkers, flawed clinical decisions, or failed trials.
Take a concrete example from neuro-oncology: some AI models can predict the methylation status of the MGMT gene in glioblastomas directly from H&E slides (12). Impressive? Absolutely. Useful? Only if we understand what that prediction actually represents, if it’s properly validated, and if it has real therapeutic implications. Otherwise, we’re looking at a visual correlation that doesn’t change the prognosis—or the clinical decision.
The same holds for predictions of recurrence or disease progression. Knowing who is at higher risk doesn’t always mean knowing why, or what to do about it. A model might predict with high confidence, but if there's no actionable mechanism behind it, we’re facing a diagnostic mirage.
This is why the future of computational pathology must go deeper than surface-level associations. We need models that integrate causal inference, that generate plausible biological hypotheses, and that truly support clinical decision-making. Because a good biomarker isn’t just one that predicts—it’s one that can change the course of disease.
The biomarker trap: when prediction isn’t enough
In oncology—and especially in neuro-oncology—the enthusiasm for predictive biomarkers has grown in parallel with the rise of artificial intelligence. But within that excitement lies a trap: believing that what predicts well can necessarily guide treatment.
The case of MGMT promoter methylation in glioblastomas is emblematic. We know it’s associated with response to temozolomide. Some AI models have learned to predict this methylation status directly from H&E slides or MRI scans (12). Yet what looks like a significant technical advance can become clinically ambiguous.
Why? Because promoter methylation doesn’t always lead to protein silencing, and MGMT expression itself isn’t a definitive predictor of resistance or sensitivity (13). Even among experts, there’s ongoing debate over the best way to assess it: PCR, IHC, or transcriptomic profiling.
Now imagine training a model on that “ambiguous truth” and then relying on its prediction as the foundation for a therapeutic decision. That’s where the biomarker trap appears: confusing predictive power with therapeutic utility.
This isn’t a one-off issue. It happens repeatedly: AI models that predict recurrence, progression, or response… yet can’t translate that prediction into actionable steps—because we don’t understand the biological mechanism behind it. We know what’s likely to happen, but not why, or how to prevent it.
And here’s something even more concerning: if the training data is heterogeneous, ambiguous, or even contradictory—as is often the case in real-world settings—the model may amplify that confusion instead of resolving it. Instead of bringing us closer to truth, we end up refining an imprecision.
That’s why a validated biomarker must be more than just correlational. It needs to be biologically grounded, mechanistically coherent, and—above all—able to guide a concrete clinical decision (14).
Noise dressed as pattern: the risk of amplifying heterogeneity
One of the subtlest risks of artificial intelligence in medicine is its ability to detect patterns even where none truly exist, fitting noise as readily as signal. When we train a model on inconsistent or low-reproducibility clinical data, we’re not resolving uncertainty—we’re codifying it.
This is especially evident in tumors like gliomas, where interpretation of MGMT status or histological classification can vary even among experts. If a model is trained on these unstable “truths,” it will end up confidently replicating what is, in fact, a clinical ambiguity.
The problem grows when data comes from multiple centers, scanners, and protocols without clear harmonization. AI doesn’t correct that variability—it absorbs it. And what the model learns may not be a true biological signal, but a statistical artifact disguised as truth.
That’s why building robust models starts with a basic but critical demand: ensuring the quality and consistency of data from the very beginning. Because if we train on noise, no matter how sophisticated the algorithm is—the result will still be noise, just better presented.
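One pragmatic safeguard, offered here as a suggestion rather than a method described above, is a simple batch-effect audit: check how well the acquisition site can be predicted from the very features the model will learn from. The sketch below uses synthetic data and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, d = 600, 32
site = rng.integers(0, 2, n)                  # two hypothetical centres/scanners
features = rng.normal(0, 1, (n, d))
features[:, :4] += 0.8 * site[:, None]        # a few dimensions quietly carry a site signature

site_auc = cross_val_score(LogisticRegression(max_iter=1000),
                           features, site, cv=5, scoring="roc_auc").mean()
print(f"Site predictable from features, AUC = {site_auc:.2f}")
# An AUC well above 0.5 is a warning: any "biological" pattern learned from these
# features may partly be a scanner or protocol artifact in disguise.
```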
The Human Imperative
From Prediction to Care: when the algorithm needs interpretation
Artificial intelligence models can generate scores, heatmaps, alerts, and predictions. But turning those outputs into real clinical decisions—into concrete therapeutic actions—remains a deeply human task.
Many models in computational pathology can predict mutations, tumor grades, or recurrence risks. Yet when those predictions land on the clinician’s desk, an inevitable question arises: what do we actually do with this?
Would we trust an algorithm blindly—even when something inside us, something shaped by experience and clinical reasoning, tells us: this doesn’t quite fit?
A digital score might indicate high risk, but is that enough to alter clinical conduct? Does it carry prognostic weight? Will it change treatment? Is it aligned with the patient’s clinical context, medical history, comorbidities, values, and environment?
The truth is, many AI outputs today—no matter how precise they may seem—are not directly actionable in clinical practice (12). They often require integration with other layers of information—clinical, genetic, social—to gain meaning. And that interpretive process still depends on human judgment (3).
This isn’t a call to reject AI. It’s about placing it where it truly adds value: as one more signal in a broader diagnostic process. A tool that helps us see more clearly—not a voice that overrides the physician’s own (3).
The transition from microscope to digital workflow was already a challenge. But what truly demands a paradigm shift isn’t the technology—it’s the pathologist’s role. Today, it’s no longer just about observing and describing; it’s about connecting data, context, and clinical decisions. About moving from prediction to care.
And for that, the algorithm isn’t enough.
What the algorithm doesn’t see: critical thinking, context, and responsibility
No matter how advanced artificial intelligence becomes, there’s something it still cannot do: recognize when it’s wrong.
An algorithm can identify patterns it has seen before, but it struggles with the unexpected. It lacks clinical intuition, contextual awareness, and the sensitivity that comes from years of experience—behind the microscope and alongside the patient. And while it can automate tasks efficiently, it cannot take responsibility for a clinical diagnosis.
That responsibility still falls on the pathologist (3).
Interpreting an algorithmic output is not just about validating a number; it’s about asking whether that number fits the case, the history, and everything that lies outside the image but within the clinical context. It’s about exercising judgment. Holding space for doubt. Having the courage to say: this doesn’t convince me.
The real challenge isn’t just technical—it’s cultural (15). Integrating AI into clinical workflows requires rethinking roles, overcoming inertia, and acknowledging that without proper data governance, interoperability, reasonable load times, or proven clinical impact, adoption simply won’t happen.
Or, even more concerning, AI adoption might be politically driven—out of sync with the medical community. This could cause deep frictions in the healthcare system and damage public trust in the technology, leading to a perception of AI not as a solution, but as the most expensive tool in history with the least clinical usefulness.
AI can amplify our capabilities.
But it cannot replace clinical insight.
It cannot substitute critical thinking.
And it cannot carry the ethical weight of the diagnosis.
That still belongs to us.
Pioneers of the Future
From Technical Promise to Clinical Use: three keys to move forward
If we truly want artificial intelligence to transform medical practice, we need to stop focusing solely on more accurate models and start building more useful systems. It’s not just about improving metrics—it’s about solving real-world friction. And to do that, there are three critical areas where decisive action is needed:
Actionable XAI: it’s not enough to show—we must explain
Explainability cannot be limited to a heatmap or a line of text on a screen. What pathologists need is not just to know where the algorithm looked, but why it reached that conclusion. Actionable XAI goes beyond visual transparency: it allows us to understand the underlying reasoning, question it, and sometimes even uncover new biological clues that might have gone unnoticed (16,17).
This doesn’t just build trust—it enhances clinical learning and opens the door to true human–AI collaboration.
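As a rough illustration of what "arguing with" an explanation could look like, the sketch below uses invented feature names, a toy dataset, and one of many possible attribution schemes: each feature of a single case is nudged toward the cohort median to see how the predicted risk moves, so the pathologist can judge whether the drivers make biological sense.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
feature_names = ["nuclear_area", "mitoses_per_hpf", "necrosis_fraction", "stroma_ratio"]
X = rng.normal(0, 1, (800, 4))
risk = 1.2 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(0, 0.5, 800)
y = (risk > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)   # stands in for any black-box predictor

case = X[0]
baseline = model.predict_proba(case.reshape(1, -1))[0, 1]
medians = np.median(X, axis=0)
print(f"Predicted risk for this case: {baseline:.2f}")
for i, name in enumerate(feature_names):
    counterfactual = case.copy()
    counterfactual[i] = medians[i]               # "what if this feature were typical?"
    shifted = model.predict_proba(counterfactual.reshape(1, -1))[0, 1]
    print(f"  {name:>18}: {baseline - shifted:+.2f} of the risk rests on its current value")
```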
Human-centered AI: design for those who decide
Many current models seem built to win technical benchmarks, not to fit within real clinical environments. Truly useful AI should adapt to the medical workflow, speak the clinician’s language, and support the decisions that only a physician can make. Diagnostic copilots—conversational models with contextual clinical understanding—are a step in that direction (18).
The goal isn’t to replace the pathologist (3), but to make them more strategic, more precise, and more focused on what truly matters.
Data quality: what feeds the model defines its value
If the data is poorly labeled, poorly scanned, or poorly defined, no algorithm—no matter how sophisticated—can produce reliable results. Data quality is not a minor technical issue; it’s a strategic decision.
Shifting from reactive cleanup to proactive prevention means setting clear standards, automating quality control from the source, and accepting that without strong data foundations, anything built on top will be fragile.
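As one hedged example of what "quality control from the source" might look like at tile level, the sketch below applies two crude, illustrative checks (tissue fraction and a focus score) before a tile is allowed anywhere near a training set; the thresholds are placeholders, not validated values.

```python
import numpy as np
from scipy import ndimage

def qc_tile(tile_rgb, min_tissue_fraction=0.2, min_focus_score=30.0):
    """Return (passed, metrics) for an RGB tile with values in [0, 255]."""
    gray = tile_rgb.mean(axis=2)
    # Crude tissue mask: background on an H&E slide is close to white.
    tissue_fraction = float((gray < 220).mean())
    # Crude focus score: the variance of the Laplacian drops on blurry tiles.
    focus_score = float(ndimage.laplace(gray).var())
    metrics = {"tissue_fraction": tissue_fraction, "focus_score": focus_score}
    passed = tissue_fraction >= min_tissue_fraction and focus_score >= min_focus_score
    return passed, metrics

# Demo on a synthetic "tile": mostly white background with a small darker region,
# so it fails the tissue-fraction check and would never enter the training set.
tile = np.full((256, 256, 3), 245.0)
tile[60:120, 60:120] = 120.0
print(qc_tile(tile))
```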
From Tools to Transformation: three pathways toward real impact
Beyond accuracy or innovation, what will define the true value of AI in pathology is its ability to integrate, protect, and transform. Not with futuristic promises, but with concrete solutions to real-world obstacles. Here are three essential directions:
Multimodal integration: seeing the whole patient, not just the specimen
Most models today operate on a single data type—a slide, a sequence, a report. But patients are not one-dimensional. They are clinical histories, molecular signatures, imaging, life context. AI that truly adds value will be the kind that connects all of that.
Foundation models trained across multiple modalities—images, text, genomics, clinical data—are opening that door (1). They don’t just aim to predict, but to understand contextually, much like a pathologist at a multidisciplinary tumor board.
Federated learning: protecting privacy without fragmentation
One of the biggest barriers to AI development in healthcare is access to data. Privacy laws—legitimate and necessary—restrict centralization. Federated learning offers an elegant solution: allow collaborative model training without moving patient data from its origin (19).
This not only safeguards privacy. It also enables the creation of more diverse, more robust, and more generalizable models by incorporating knowledge across clinical contexts while preserving confidentiality.
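For readers unfamiliar with the mechanics, here is a bare-bones sketch of federated averaging (FedAvg) on a toy linear model. Only model weights travel between the "hospitals" and the server; the local data never moves. Real deployments add secure aggregation, differential privacy, and governance, none of which is shown here.

```python
import numpy as np

rng = np.random.default_rng(7)
true_w = np.array([1.5, -2.0, 0.5])

def make_local_dataset(n):
    X = rng.normal(0, 1, (n, 3))
    y = X @ true_w + rng.normal(0, 0.1, n)
    return X, y

hospitals = [make_local_dataset(n) for n in (200, 500, 300)]   # data stays on site
global_w = np.zeros(3)

def local_update(w, X, y, lr=0.05, epochs=5):
    """A few steps of gradient descent on the local mean squared error."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

for _ in range(20):                                            # communication rounds
    local_weights, sizes = [], []
    for X, y in hospitals:
        local_weights.append(local_update(global_w.copy(), X, y))
        sizes.append(len(y))
    # The server aggregates weights (weighted by local sample counts), never raw data.
    global_w = np.average(local_weights, axis=0, weights=sizes)

print("True weights:   ", true_w)
print("Federated model:", np.round(global_w, 3))
```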
Transformative training and ethics by design: preparing those who decide
No AI model will be truly useful if the physician doesn’t know how to interpret it, validate it, or reject it. AI literacy should no longer be optional—it must become a core part of training for the pathologist of the 21st century.
At the same time, ethics cannot come as an afterthought. It must be built into every stage: from model design and data selection to deployment and feedback loops. Clinically useful AI is, first and foremost, responsible AI.
And who will build it?
Talking about solutions is just the first step. The real challenge isn’t identifying them—it’s shaping them into clinical, technical, and operational reality. It won’t be enough to name them in presentations or mention them at conferences. Implementing explainable AI. Designing workflows centered on the pathologist. Deploying multimodal models that actually work in real-world settings. All of this takes more than vision—it takes craft.
And that, perhaps, is the part I find most compelling.
Because what lies ahead is no longer about imagining the future.
It’s about implementing it.
And in that process, speeches won’t be enough.
These six strategic directions are not a wishlist. They’re a concrete, realistic, deployable roadmap. And each one already has mature technology behind it, solid publications, and teams ready to collaborate.
The problem was never a lack of solutions.
It was a lack of people willing to build them—
with clinical judgment and systemic vision.
That’s exactly what I intend to do.
And those who know already know.
References:
How AI Is Transforming Pathology: Progress & Challenges [Internet]. ThinkBio.Ai [cited 2025 Jul 9]. Available from: https://www.thinkbio.ai/resources/ai-in-pathology-smart-diagnostics/
Lopes S, Mascarenhas M, Fonseca J, Fernandes MGO, Leite-Moreira AF. Artificial Intelligence in Thoracic Surgery: Transforming Diagnostics, Treatment, and Patient Outcomes. Diagnostics. 2025 Jan;15(14):1734.
Why AI Won’t Replace Laboratory Professionals and Pathologists [Internet]. Critical Values. 2023 [cited 2025 Jul 11]. Available from: https://criticalvalues.org/news/all/2023/07/05/why-ai-won-t-replace-laboratory-professionals-and-pathologists
Introduction to AI in Pathology: Main Values & Challenges [Internet]. Scopio Labs. 2024 [cited 2025 Jul 11]. Available from: https://scopiolabs.com/ai/introduction-to-ai-in-pathology-main-values-challenges/
Ethical issues in computational pathology [Internet]. [cited 2025 Jul 11]. Available from: https://jme.bmj.com/content/48/4/278
Shen IZ, Zhang L. Digital and Artificial Intelligence-based Pathology: Not for Every Laboratory – A Mini-review on the Benefits and Pitfalls of Its Implementation. Journal of Clinical and Translational Pathology. 2025 Jun 30;5(2):79–85.
AI Algorithms Used in Healthcare Can Perpetuate Bias [Internet]. [cited 2025 Jul 11]. Available from: https://www.newark.rutgers.edu/news/ai-algorithms-used-healthcare-can-perpetuate-bias
Chanda C. Navigating the Ethical Complexities in AI-driven Digital Pathology [Internet]. PreciPoint. 2025 [cited 2025 Jul 11]. Available from: https://precipoint.com/en/digital-microscopy/navigating-the-ethical-complexities-in-ai-driven-digital-pathology
Lottu O, Jacks B, Ajala O, Okafor E. Towards a conceptual framework for ethical AI development in IT systems. World Journal of Advanced Research and Reviews. 2024 Mar 30;21:408–15.
Ho SY, Wong L, Goh WWB. Avoid Oversimplifications in Machine Learning: Going beyond the Class-Prediction Accuracy. Patterns [Internet]. 2020 May 8 [cited 2025 Jul 11];1(2). Available from: https://www.cell.com/patterns/abstract/S2666-3899(20)30025-8
Correlation vs. Causation in Experimentation and Data Analysis [Internet]. [cited 2025 Jul 11]. Available from: https://www.geteppo.com/blog/correlation-vs-causation-guide
He Y, Duan L, Dong G, Chen F, Li W. Computational pathology-based weakly supervised prediction model for MGMT promoter methylation status in glioblastoma. Front Neurol. 2024 Feb 7;15:1345687.
Zappe K, Pühringer K, Pflug S, Berger D, Böhm A, Spiegl-Kreinecker S, et al. Association between MGMT Enhancer Methylation and MGMT Promoter Methylation, MGMT Protein Expression, and Overall Survival in Glioblastoma. Cells. 2023 Jan;12(12):1639.
Correlation vs. Causation: How Causal AI is Helping Determine Key Connections in Healthcare and Clinical Trials [Internet]. DIA Global Forum. 2024 [cited 2025 Jul 11]. Available from: https://globalforum.diaglobal.org/issue/october-2024/correlation-vs-causation-how-causal-ai-is-helping-determine-key-connections-in-healthcare-and-clinical-trials/
Reis-Filho JS, Kather JN. Overcoming the challenges to implementation of artificial intelligence in pathology. J Natl Cancer Inst. 2023 Mar 17;115(6):608–12.
Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, et al. [Explainable artificial intelligence in pathology]. Pathologie (Heidelb). 2024 Mar;45(2):133–9.
Rathod J. Beyond the Black Box: Explainable AI (XAI) for Ethical AI | Blog [Internet]. ACL Digital. 2025 [cited 2025 Jul 11]. Available from: https://www.acldigital.com/blogs/beyond-black-box-explainable-ai-ethical-responsible-ai
Bilal M, Aadam, Raza M, Altherwy Y, Alsuhaibani A, Abduljabbar A, et al. Foundation Models in Computational Pathology: A Review of Challenges, Opportunities, and Impact [Internet]. arXiv; 2025 [cited 2025 Jul 11]. Available from: http://arxiv.org/abs/2502.08333
Jeong H, Chung TM. Security and Privacy Issues and Solutions in Federated Learning for Digital Healthcare. In 2022 [cited 2025 Jul 11]. p. 316–31. Available from: http://arxiv.org/abs/2401.08458