3.2 AlphaFold Discovery
AlphaFold's 2020 result was not just another AI victory in a scientific competition. It was a fundamental breakthrough in biology, one that many researchers have compared in significance to deciphering the structure of DNA or inventing the microscope. AlphaFold largely solved, at least for single protein chains, one of science's greatest and most complex challenges: predicting a protein's 3D structure from its sequence, usually called the protein folding problem, which researchers had struggled with for about 50 years.
The Problem: The Riddle of Life's "Second Code"
What is a protein? Proteins are molecular machines that perform virtually all functions in a living organism: they transport oxygen (hemoglobin), digest food (enzymes), protect against viruses (antibodies), and form muscles and tissues.
What is "folding"? A protein is synthesized in a cell as a long chain of amino acids, built from 20 standard types. This chain does not remain straight. It folds, typically within milliseconds to seconds, into an extremely complex three-dimensional structure that is characteristic of each protein. It is this 3D shape that determines the protein's function. Just as a key fits a lock, the shape of an enzyme's active site fits the molecule it is meant to break down.
What's the difficulty? Predicting how a chain of hundreds of amino acids will fold in space is extremely hard. The number of possible conformations is astronomically large, so large that a protein could never find its final shape by trying them one by one (Levinthal's paradox). Experimental methods for determining structure (X-ray crystallography, cryo-electron microscopy) can require months or years of meticulous work and expensive equipment, and they do not always succeed.
AlphaFold's Solution: How AI "Saw" the Invisible
DeepMind's AlphaFold 2 is not a single simple algorithm but a complex "pipeline": sequence database searches feed a deep neural network whose components process information at different levels.
1. Gathering "Evolutionary Clues" (Multiple Sequence Alignment - MSA)
The system begins by analyzing not a single protein sequence but an alignment of many related sequences, often thousands, from different species.
The Logic: If two positions in a family of related proteins tend to mutate together throughout evolution, it is a strong signal that they are close to each other in the 3D structure, because a change at one position has to be compensated by a change at its partner. It's like figuring out how furniture is assembled by looking at instructions in different languages: similar descriptions will point to the same connections.
Neural Network #1 (Evoformer): This is the heart of AlphaFold. The Evoformer, a specialized transformer-style architecture, processes a large multiple sequence alignment matrix together with a pairwise representation of the residues. It identifies deep, non-obvious correlations between amino acids that may be separated by hundreds of positions in the chain but end up as neighbors in space. Its output includes a pair representation that encodes the likely spatial relationships, such as distances, between all pairs of amino acids in the protein.
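The co-evolution signal described in this step can be illustrated with a classical statistic, the mutual information between two alignment columns. This is not what the Evoformer computes; it learns far richer representations. The sketch only shows why co-varying columns stand out. The toy alignment and the function names are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy alignment: each string is one aligned sequence.
# Columns 1 and 3 co-vary perfectly (K with E, R with D);
# columns 0 and 1 vary independently of each other.
TOY_MSA = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "GKIE",
    "GRID",
]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def rank_column_pairs(msa):
    """All column pairs, sorted from strongest to weakest covariation."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(msa, i, j)
        for i in range(length)
        for j in range(i + 1, length)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for pair, score in rank_column_pairs(TOY_MSA):
    print(pair, round(score, 3))
```

In this toy alignment the pair (1, 3) ranks first, which a covariation-based method would read as a hint that those two positions are in contact. Real alignments are noisier, which is part of why a learned model outperforms simple statistics like this one.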
2. Building the 3D Model
Neural Network #2 (Structure Module): It takes the representations produced by the Evoformer and builds a spatial model. Its task is to predict the actual 3D coordinates of the protein's atoms, starting from the backbone, so that they are consistent with the spatial relationships inferred in the previous step.
Physical Plausibility: The model doesn't just build abstract geometry. It is trained to respect basic stereochemistry (bond angles, bond lengths, the absence of steric clashes), and the predicted structure can additionally be refined with a physics-based relaxation step, so that the result is chemically plausible. The final model is a set of atoms with spatial coordinates.
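To make "chemically plausible" more concrete, here is a minimal sketch of two geometric sanity checks on a predicted backbone: consecutive C-alpha atoms should sit roughly 3.8 angstroms apart, and non-adjacent ones should not overlap. This is not AlphaFold's own procedure. The tolerance and the clash cutoff are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math

CA_CA_DISTANCE = 3.8   # typical spacing of consecutive C-alpha atoms (angstroms)
BOND_TOLERANCE = 0.3   # assumed tolerance for this sketch
CLASH_CUTOFF = 3.0     # assumed minimum spacing for non-adjacent C-alpha atoms


def check_backbone(ca_coords):
    """Return a list of (kind, i, j, distance) tuples for suspicious geometry.

    ca_coords is a list of (x, y, z) C-alpha positions in chain order.
    """
    problems = []
    n = len(ca_coords)
    # Consecutive residues: spacing should be close to the typical value.
    for i in range(n - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - CA_CA_DISTANCE) > BOND_TOLERANCE:
            problems.append(("bond", i, i + 1, round(d, 2)))
    # Non-adjacent residues: atoms must not sit on top of each other.
    for i in range(n):
        for j in range(i + 2, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d < CLASH_CUTOFF:
                problems.append(("clash", i, j, round(d, 2)))
    return problems


extended_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
folded_back = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.5, 0.0, 0.0)]

print(check_backbone(extended_chain))
print(check_backbone(folded_back))
```

The first chain passes both checks. The second is flagged twice: its last residue is too close to its neighbor and overlaps with the first residue. A full validation would also look at bond angles and all heavy atoms, not only C-alpha positions.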
3. Assessing Reliability (Confidence Score - pLDDT)
AlphaFold doesn't just give an answer. For each amino acid in the prediction, the model assigns a confidence score from 0 to 100, called pLDDT (predicted Local Distance Difference Test). Regions with a high score (>90) are generally predicted with accuracy approaching that of experimental methods. A low score (<50) often indicates intrinsically disordered, flexible regions of the protein, and this too is important biological information.
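In practice, per-residue pLDDT scores are often binned into confidence bands and scanned for long low-confidence stretches. The sketch below uses the >90 and <50 thresholds from the text; the intermediate cutoff at 70 and the band labels follow the convention commonly shown in the AlphaFold database and are an addition here. The function names, the minimum segment length, and the example scores are hypothetical.

```python
def plddt_band(score):
    """Map a pLDDT score (0 to 100) to a confidence band label."""
    if not 0 <= score <= 100:
        raise ValueError(f"pLDDT must be between 0 and 100, got {score}")
    if score > 90:
        return "very high"
    if score > 70:   # assumed intermediate cutoff, not stated in the text
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def low_confidence_segments(scores, cutoff=50.0, min_length=3):
    """Find runs of at least min_length consecutive residues below cutoff.

    Returns (start, end) index pairs, inclusive and 0-based.
    """
    segments = []
    start = None
    for index, score in enumerate(scores):
        if score < cutoff:
            if start is None:
                start = index
        else:
            if start is not None and index - start >= min_length:
                segments.append((start, index - 1))
            start = None
    # Close a run that extends to the end of the chain.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores) - 1))
    return segments


# Made-up scores for a ten-residue example.
example_scores = [95, 92, 88, 45, 40, 30, 72, 91, 20, 25]
print([plddt_band(s) for s in example_scores])
print(low_confidence_segments(example_scores))
```

Segments found this way are candidates for intrinsically disordered regions, but a low score can also simply mean the model lacked information, so they are a starting point for investigation rather than a conclusion.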
The Scale of the Achievement: Numbers That Changed Science
CASP14 (2020): At the prestigious Critical Assessment of protein Structure Prediction (CASP) competition, AlphaFold2 achieved a median score of ~92 GDT (Global Distance Test, on a scale of 0 to 100), far exceeding the result of any previous system (~75 for the previous CASP winner) and surpassing the level of about 90 that is informally considered competitive with experimental accuracy.
AlphaFold DB: In July 2021, DeepMind, in collaboration with the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), publicly released predicted structures for nearly all human proteins and for the proteomes of 20 other key organisms, over 350,000 structures in total. In 2022, the database expanded to over 200 million structures, covering almost all catalogued protein sequences.
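The GDT score cited for CASP14 can be read as follows: for each of four distance cutoffs (1, 2, 4 and 8 angstroms), take the percentage of residues whose predicted position lies within that cutoff of the experimental one, then average the four percentages. The sketch below is a simplified version that assumes the per-residue distances have already been computed after a single superposition; the full procedure searches over many superpositions to maximize each count. The function name and the sample distances are hypothetical.

```python
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # angstroms


def gdt_ts(distances):
    """Simplified GDT_TS from per-residue prediction errors in angstroms.

    Assumes the distances come from one fixed superposition of the
    predicted and experimental structures.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    n = len(distances)
    fractions = [
        sum(1 for d in distances if d <= cutoff) / n
        for cutoff in GDT_TS_CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Made-up errors for a five-residue example.
sample_errors = [0.5, 0.8, 1.5, 3.0, 9.0]
print(round(gdt_ts(sample_errors), 1))
```

For the sample above, 2, 3, 4 and 4 of the 5 residues fall within the respective cutoffs, giving a score of 65. A score above 90 therefore means that nearly every residue sits within about an angstrom or two of its experimentally determined position.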
Practical Consequences: A Revolution in Biology and Medicine
AlphaFold did not replace experimental biology, but radically accelerated and redirected it.
Drug Discovery: Scientists can now start from a predicted 3D structure of a drug target (e.g., a protein on the surface of a virus or cancer cell) and use computer-aided design to look for molecules that fit into it. For targets without an experimental structure, this can cut down years of costly trial and error, although candidate drugs still have to be validated experimentally.
Understanding Diseases: Many genetic diseases are caused by a mutation in a protein's amino acid sequence, which changes its shape and function. With a structural model in hand, researchers can quickly see where a mutation sits in the structure and form hypotheses about the disease mechanism, even though predicting the precise effect of a single mutation remains difficult.
Synthetic Biology: Engineers creating new enzymes to break down plastic or produce biofuels can use AlphaFold to check whether a designed sequence is predicted to fold into the intended shape, bringing protein engineering closer to working "from a blueprint."
A New Language for Biology: Structural information is becoming a basic, accessible property of a protein, just like its genetic sequence. This changes the very methodology of science: hypotheses can now first be tested "in simulation."
Philosophical Meaning: AI as a Tool of Cognition
AlphaFold is not just a powerful calculator. It is a system that uncovered fundamental patterns in evolutionary data that are non-obvious to humans. It learned to "read" the history of life recorded in protein sequences and translate it into the language of three-dimensional structures. This shows that AI can become a fundamentally new tool for scientific discovery, capable of finding connections in data whose scale and complexity exceed human intuition.
The AlphaFold breakthrough showed that the most complex scientific problems can be solved not only by the genius of an individual scientist but also by a well-designed neural network architecture trained on decades of accumulated experimental data: the protein structures and sequences that researchers have deposited in public databases. This opens an era of accelerated scientific and technological progress, driven by the symbiosis of human and artificial intelligence.