Template Learning: Deep learning with domain randomization for particle picking in cryo-electron tomography

Table of Contents

Simulations to train deep learning models on particle picking

Template Learning is a streamlined pipeline for simulating cryo-ET synthetic data for training deep learning models on particle picking. The fundamental goal of Template Learning is to train deep learning models in three aspects:

1.

Identifying targeted particles across diverse variations.
2.

Differentiating target particles from other structures, especially in crowded environments.
3.

Expanding the capabilities of the models to handle the inherent variations present in cryo-ET experimental data.

The practical application of these aspects is achieved through our proposed pipeline, illustrated in Fig. 1 and further elaborated in the subsequent sections.

**Fig. 1: Template Learning workflow for generating simulations used for supervised deep learning used for cryo-ET picking of target particles.**

To enable trained deep learning models to effectively identify various variations of the target biomolecule, we utilize following two strategies:

1.

We use multiple templates of one target biomolecule to account for compositional or significant conformational variabilities of the target particles.
2.

We generate random flexible variations of each template using Normal Mode Analysis (NMA—a method for fast molecular mechanics simulations, see Methods for details)³⁹. The templates with their flexible variations are then simulated at uniformly random orientations to account for orientational variations.

Templates in this study are typically atomic structures. However, since the availability of atomic structures may vary for different target biomolecules, using lower resolution cryo-EM density maps (volumes) is also possible and is explored in a subsequent section.

The existing literature on domain randomization for object recognition in natural images simulates the target objects within a background containing a library of other random objects⁴⁰. These additional objects, distractors, obscure or cast shadows on segments of the target objects. In this study, we selected 100 dissimilar protein assemblies, with diverse molecular weights ranging from 30 kDa to 1 MDa to serve as elements within the background (a comprehensive list of these proteins is available in Supplementary Fig. 1). We inspired our selection of these distractors from the work outlined in TomoTwin, knowing that these structures have different enough feature representations as interpreted by convolutional neural networks allowing to cluster them into corresponding classes. To achieve a similar effect of distractors for cryo-ET data, i.e., overlapping at the level of the tilt series and the missing wedge artifacts, these proteins must be positioned in close proximity to the templates during data modeling.

Recent works^27,41 that involve simulating cryo-ET data for practical applications (e.g., deep learning) utilize an iterative brute-force random placement algorithm, involving the rotation of a duplicate of a molecule in each iteration to find a suitable non-overlapping position within the sample. Nonetheless, random placement leads to unstructured empty spaces between the molecules that prevent achieving highly dense samples.

Two other approaches simulate molecular crowding of cryo-ET data^42,43. The first represents molecules as spheres, based on calculating the minimum bounding sphere for each molecule, and simulates crowding by optimizing a sphere-packing problem. However, this approach is limited to generating high crowding only for spherical-shaped molecules. The second builds on the first by including an additional step of molecular dynamics simulations to enhance packing density. However, routine application of this approach requires significant computational resources.

To achieve high molecular crowding while maintaining generality and efficiency, we drew inspiration for a simplified approach from a recent concept on packing generic 3D objects⁴⁴, a solution we term the “Tetris algorithm.” Our algorithm, illustrated in Fig. 2, operates on an intuitive principle: it places molecules iteratively, with each iteration positioning a new molecule at a uniform random orientation as close as possible to those already placed (the user only needs to choose the minimum distance). For more details on the principles of the Tetris algorithm, see “Methods”.

**Fig. 2: The Tetris algorithm for generating crowded 3D samples.**

Simulations can partially replicate the domain of real-world data; however, variations in the real world can be unpredictable, and simulations simplify some complex phenomena for practical reasons. Thanks to domain randomization^34,36, simulating an exact match of the real world is not necessary for training deep learning models capable of generalizing to real-world data. Instead, domain randomization focuses on training these models on diverse simulated scenarios to minimize the influence of the domain on their capabilities.

In this work, we employ a cryo-ET physics-based simulator called Parakeet²⁸. Parakeet, and its MULTEM backend⁴⁵, provide control over numerous parameters for simulating cryo-ET hardware and the behavior of biological samples during data recording. While many parameters related to the electron beam, lenses, detector, sample, and data acquisition strategy can be controlled, it might not be necessary to randomize every available parameter. On one hand, the combinations required to cover all conditions grow exponentially with the number of variables. On the other hand, some variables might have undesirable effects on the overall simulation. Here, we propose varying a few essential parameters that significantly influence the output simulation: the electron dose, defocus, tilting range, tilting step, and ice density, while giving other parameters default values. The electron dose is crucial for varying the SNR and the irradiation effects on the sample. The defocus plays a key role in controlling the Contrast Transfer Function (CTF)⁴⁶. Varying the tilting range and step may not replicate experimental data acquisition strategies, however, it can expose the deep learning model to variations that increase its robustness, for example, to artifacts and imperfections resulting from tilt series alignment and reconstruction. Lastly, different ice densities can simulate the variability of sample thickness and solvent composition. A combination of three values for the defocus and two values for each of the other variables results in 48 combinations that we found sufficient for efficient domain randomization.

Benchmarking Template Learning for picking ribosomes in situ

In a recent study²⁰, the first exhaustively annotated in situ cryo-ET dataset was published (EMPIAR-10988). Here, we use this dataset to evaluate the proposed Template Learning thoroughly.

In the following, we generate, based on the Template Learning workflow, a simulated dataset to train DeepFinder¹⁹ for ribosome annotation. We compare the performance of this model (i.e., DeepFinder trained solely on simulations) to the contemporary techniques, namely, DeePiCt²⁰ and DeepFinder (trained solely on annotated experimental data), and 3D template matching (we will refer to it as template matching hereafter).

We organized our experiments based on two principles. First, we benchmark the complete method that integrates the aforementioned concepts of Template Learning. Then, we conducted experiments where specific elements from the data simulation pipeline were intentionally omitted (e.g., the structural variations of the target template, the crowding, etc.) to assess their impact on performance and to offer guidelines for future users. Second, acknowledging that the size of a simulated dataset is user-defined (generally in deep learning, the more training data, and the more diversity, the better), we ensured the feasibility of routine use of the method by limiting its time requirement to a maximum of two days on a single GPU (within one day on a typical computing node with 4 GPUs, see Code Availability for details).

To establish a Template Learning workflow, we proceeded with the following steps. We selected 6 eukaryotic ribosome PDB structures (PDB IDs: 4UG0, 4V6X, 6Y0G, 6Y2L, 6Z6M and 7CPU).

The 6 selected structures differ slightly in composition or conformation. By applying NMA, we generated 25 flexible variations for each structure (refer to the NMA section in Methods for details). While generating synthetic data, we employed the aforementioned general distractors (a full list of distractors is given in Supplementary Fig. 1). To simulate a crowded environment, we generated volumes from the template structures (i.e., the 6 ribosome PDBs and their flexible variations) and distractors with a sampling rate of 16 Å using Eman2⁴⁷ software e2pdb2mrc. We used the generated volumes as input to populate 48 crowded volumetric distributions of size 192 × 192 × 64 voxel³ using the Tetris algorithm, with ribosome templates appearing at a frequency of one for every 5 distractors to produce a balanced volumetric density ratio balance between distractors and templates in the simulated tomograms (see Tetris algorithm section in Methods for details). Subsequently, we fed the sample information (i.e., atomic structures of templates and distractors with the positions and orientations of crowding generated using the Tetris algorithm) to Parakeet²⁸ to simulate 48 tilt series using a dose symmetric tilting scheme using the parameters listed in Supplementary Table 1. All other parameters adhered to default values for Volta Phase Plates (VPP) simulations within the Parakeet software. Subsequently, we binned the simulated tilt series to a similar sampling rate as the analyzed dataset (13 Å/pixel) and reconstructed them in IMOD⁴⁸ using Weighted Back Projection. We used the tomograms and their corresponding ground-truth template coordinates (around 6500 simulated ribosomes) and segmentations (i.e., volumes of the same size as the tomograms where template instances appear in white over black background, illustrated in Supplementary Fig. 2) to train DeepFinder¹⁹ using its default parameter settings (the list of parameters is given in Supplementary Table 2).

We applied the DeepFinder simulations-trained model to the 10 VPP tomograms of the EMPIAR-10988 dataset without any preprocessing, resulting in segmentation maps, i.e., ribosome segmentations based on the decisions of the model. Subsequently, we employed the MeanShift algorithm offered by the DeepFinder software, using a clustering radius of 10 voxels, to extract both the coordinates and size of these segments (count of the number of voxels of each segment), with and without the application of a mask (specifically, a cytosol mask sourced from the dataset, utilized to eliminate false positives outside the mask; see Supplementary Note 1 for more details about these masks). We compare the extracted coordinates to the expert-validated annotations provided by the dataset at different levels of segment sizes used as a threshold (i.e., segments smaller than the threshold were removed). For this comparison, we maintained the criterion that two annotations—one from the output of the deep learning model and the other from the expert annotations—were considered to target the same particle if their spatial distance was within 10 voxels (a value consistent with the methodology of a previous study analyzed the same dataset²⁰). An example of a tomogram, its corresponding segmentation map, and the coordinates set at a threshold that removed smaller objects compared to expert annotations are presented in Fig. 3. To evaluate the performance of the method, we computed the overall Recall and Precision with and without masking computed over all the tomograms jointly (refer to the Methods section for more details on the assessment metrics). The point where Precision and Recall curves intersect was utilized to determine the F₁ score per tomogram for the dataset (shown in Fig. 4A). The results, further illustrated in Fig. 4I, J, reveal that Template Learning used to train DeepFinder exclusively on simulations, achieved state-of-the-art performance, outperforming template matching and previous deep learning methods trained solely on annotations derived from the same experimental dataset based on the median F₁ score (compared to values reported in ref. ²⁰).

**Fig. 3: Template Learning trains DeepFinder to single molecule precision on segmenting and annotating ribosomes in situ.**

**Fig. 4: DeepFinder trained on Template Learning simulations only outperforms previous techniques for ribosome annotations in cryo-electron tomograms.**

To reflect on these results, it is essential to highlight that the previously reported findings for DeePiCt and DeepFinder were based on models trained with 8 out of 10 fully annotated tomograms, containing approximately 20,000 ribosome annotations²⁰. Hence, previous evaluations were conducted on 2 out of 10 tomograms, for annotating the remaining 5000 ribosomes, utilizing three cross-validation splits (in each split, a random set of 8 tomograms was used for training and the remaining 2 tomograms for benchmarking). Additionally, each of the two models underwent a different training strategy to achieve optimal performance. In particular, in the case of DeepFinder, the training encompassed two classes—ribosomes and Fatty Acid Synthase (FAS)—as training solely on ribosomes led to suboptimal performance. On the other hand, DeePiCt was exclusively trained on ribosomes, as unlike DeepFinder, combining ribosomes and FAS during training led to suboptimal performance.

In contrast, the DeepFinder model trained solely on the simulations generated from the Template Learning workflow, not fine-tuned on any annotated experimental data, was benchmarked on the complete dataset (i.e., all 10 tomograms). Hence, despite the relatively modest increase in the F₁ score for the Template Learning method (0.85, compared to the previous best of 0.83), it stands as a demonstration that deep learning can be trained effectively for picking a target structure, starting from only prior templates, and domain-randomized simulations. Also, a model of the same network architecture (DeepFinder) trained only on simulations outperforms its counterpart trained only on annotated experimental datasets for cryo-ET particle picking.

To systematically assess the contribution of each component of the Template Learning workflow, we performed a series of ablation and variation experiments, as detailed in the Supplementary Materials – Template Learning variations and Supplementary Notes 3 and 4. These analyses revealed that both incorporating multiple atomic structures and generating flexible molecular variations enhance annotation performance, and that introducing artificial flexible variations can partially compensate for the absence of multiple templates (Fig. 4A–C compared to Fig. 4D and Supplementary Fig. 3). The comparative impact of these design choices is illustrated by recall, precision, and F₁ score. The inclusion of a diverse set of distractor molecules was found to be critical for high precision in particle picking; omitting or restricting distractor variability substantially increased false positive rates (Fig. 4E, F). Furthermore, simulating densely crowded environments via the Tetris algorithm (Supplementary Fig. 4) was essential, as reducing crowding in training simulations led to a marked drop in median F₁ score (Fig. 4G). We also validated the use of low-resolution volumetric templates by developing a real-time conversion to pseudoatomic models compatible with physics-based simulations; this approach enabled training with reasonable recall, albeit at the expense of some precision (Fig. 4H). In addition, Template Learning could be readily adapted to different imaging domains, such as VPP and conventional defocus (DEF), and offered improved performance on cross-domain tasks compared to previous approaches (Fig. 5 and Supplementary Fig. 5).

**Fig. 5: Template Learning does not necessitate data preprocessing for DEF tomograms.**

Fine-tuning Template Learning-trained models using real data

The literature on domain randomization indicates that models pre-trained on simulations and subsequently fine-tuned on real-world data outperform models trained exclusively on real-world data³⁵. This approach can be particularly valuable to cryo-ET particle picking studies that involve challenging-to-annotate molecules, or when training on domain randomization simulations alone does not yield satisfactory results.

In the same dataset used for benchmarking Template Learning on ribosomes in situ (EMPIAR-10988), another molecule, FAS, is annotated. Previous attempts based on template matching and supervised deep learning on experimental datasets have reported challenges in annotating FAS, attributed to its distinctive shell-like structure, where particles have lower SNR compared to ribosomes²⁰.

In the following, we present the benchmarking of Template Learning combined with DeepFinder, initially trained exclusively on a simulated dataset, on FAS annotation in EMPIAR-10988 tomograms. Subsequently, we gradually introduce fine-tuning based on the experimental data annotations to assess whether performance can be improved.

To establish a Template Learning workflow for FAS, we utilized two PDB structures (PDB IDs: 4V59 and 6QL5). By using NMA, we generated 80 flexible variations for each structure (refer to the NMA section in the Methods for details). We kept the remaining parameters for simulated data generation consistent with those aforementioned for the ribosome study, except for training the models on sphere segmentations due to the shell-like structure of FAS.

The F₁ score results of the simulation-trained DeepFinder model for FAS annotations are presented in Fig. 7. The results show that on the VPP dataset, the DeepFinder model trained on the Template Learning workflow outperformed that trained exclusively on experimental data. The results on the DEF dataset gave an F₁ score ranging up to 22%, outperforming previous attempts that failed to pick any particle, based on the results reported in ref. ²⁰. However, these results can still benefit from improvements compared to previously obtained results with DeePiCt trained on annotated experimental data.

We proceeded to fine-tune the simulations-trained DeepFinder model progressively using annotations from experimental data. In the following experiment, we fine-tuned it using annotations from 2 VPP tomograms containing approximately 150 FAS annotations. We kept all the DeepFinder training parameters to default (listed in Supplementary Table 2), except for reducing the number of steps and epochs to 10 steps for 10 epochs (in place of 100 steps for 100 epochs) to avoid overfitting (since the number of training data examples is low). We performed three cross-validation experiments (i.e., in each experiment, the data is randomly split into 2 tomograms for training and 8 for benchmarking). The results presented in Fig. 6 show a significant increase in the median F₁ score compared to training on simulated data only. Notably, de Teresa-Trueba et al.²⁰ previously reported that training a deep learning model with fewer than 600 particles was ineffective. Therefore, the ability to fine-tune our model with only 150 particles after pre-training on simulations represents a major advancement over previous supervised methods that required larger datasets to train models from scratch using only experimental annotations.

**Fig. 6: DeepFinder can be pre-trained on Template Learning simulations and fine-tuned on experimental data annotations.**

In a subsequent experiment, we fine-tuned the simulation-trained DeepFinder model on 8 VPP tomograms containing approximately 600 FAS annotations and inferred the model on the remaining 2 tomograms, using 3 cross-validation splits. Again, we kept all the DeepFinder training parameters to default, except for the number of steps and epochs, keeping 10 steps for 30 epochs (again, to prevent overfitting). The results presented in Fig. 6 show a significant increase in the median F₁ score, outperforming previous supervised deep learning methods trained solely on experimental data.

Finally, we performed a cross-domain experiment, where we fine-tuned the simulation-trained DeepFinder model on the VPP dataset and applied this model to annotate FAS in the DEF dataset after SM preprocessing. Consistent with previous findings, the results presented in Fig. 6 show that training on simulations and fine-tuning on experimental data outperforms training only on experimental data.

Template Learning improves precision and isotropy in nucleosome picking

In cryo-ET data processing, localizing a target biomolecule in a new dataset is a common objective. The application of supervised deep learning to this task requires an initial training dataset of particles of interest extracted directly from the new data. Such a dataset is usually in the order of thousands of particles in different orientations, where more particles are needed for better performance. Manually annotating these particles is time-consuming, making template matching followed by manual elimination of obvious false positives and further curation through STA and classification the most common approach^19,20. The principle of Template Learning offers a time and computing-efficient alternative to this complex procedure by annotating new data directly in a single step using a model trained on synthetic data.

In this section, we explore the efficiency of Template Learning for nucleosome annotation within a new cryo-ET dataset of partially decondensed mitotic chromosomes in vitro (refer to the Method section for sample preparation details) and compare it with a template matching-based annotation routine.

A tomogram central slice of our data is shown in Fig. 7A. Despite this dataset being isolated chromosomes in vitro, it contains an abundance of structures other than nucleosomes that can be false positives which are: DNA linkers, gold nanoparticles, percoll, and the non-histone components of chromatin.

**Fig. 7: Template Learning outperforms traditional template matching in Precision and orientational isotropy in annotating nucleosomes.**

To establish a Template Learning workflow, we utilized six nucleosome template structures with PDB IDs: 2PYO, 7KBE, 7PEX, 7PEY, 7XZY, and 7Y00.

By applying NMA, we generated 100 flexible variations for each structure (more details can be found in the NMA section of “Methods”). Notably, these templates represent different compositional variations of the nucleosome. The 7PEX structure includes the H1 protein, while the other five templates lack H1 but exhibit varying DNA linker lengths.

The remaining parameters of the Template Learning workflow were configured similarly to those used for the workflow established for ribosome and FAS annotation, with a few notable exceptions explained below.

Firstly, recognizing the smaller size of nucleosomes compared to ribosomes, we adjusted the frequency of appearance of nucleosome templates to one nucleosome for every two distractors (in contrast to one ribosome for every five distractors). This adjustment aimed to maintain a balanced volumetric density ratio between distractors and templates in the simulated tomograms.

Secondly, recognizing that nucleosomes have fewer atoms than ribosomes, we observed increased speed in the execution of the physics-based simulator (i.e., Parakeet). This allowed for the generation of larger tomograms for training, all within the same runtime for the ribosome study (see Code Availability for details). In particular, the tomogram size generated in the Tetris algorithm here was set to 256 x 256 x 64 (for a pixel size of 16 Å), in contrast to 192 x 192 x 64 used for ribosome simulations.

Lastly, we adjusted the pixel size of the simulated data to 8 Å through binning, closely approximating the pixel size of our experimental tomogram, which had been previously binned four times before annotation (the unbinned pixel size was 2.075 Å).

Consequently, we trained DeepFinder on the resulting Template Learning simulations and applied it to segment our data. The corresponding segmentation map (score map), is shown in Fig. 7B. Upon visual inspection, it is evident that that model has high Precision in localizing nucleosomes. Subsequently, we employed the MeanShift algorithm from the DeepFinder software to extract the corresponding annotations for the nucleosomes, utilizing a clustering radius of 6 pixels, approximately equivalent to the nucleosome’s radius at this pixel size. We excluded particles close to the edges of the tomogram, the air-water interface, and the sample-carbon interface.

Following this process, we obtained 18 k annotated particles and proceeded with the reference-free STA procedure using Relion V4.0¹⁵ by generating an initial model from the data, followed by two stages of refinement at binning 4 and binning 2, resulting in resolving the nucleosome structure at 12.8 Å (at 0.143 FSC threshold) resolution shown in Fig. 7B.

The angular distribution of the averaged particles seems isotropic, showcasing that the Template Learning method is not biased to certain orientations.

To further verify the Precision of nucleosome identification, we classified the particles into 10 and 50 classes (presented in Supplementary Fig. 6). All class average results showed well-recognizable nucleosome shapes with DNA gyres, with the major differences being in the heterogeneity of the DNA entry-exit segments. We also performed a local resolution assessment using ResMap⁴⁹ (presented in Supplementary Fig. 7) and observed that resolution has ranged from 12–20 Å, with the highest resolution in the DNA close to the core histones, and the lowest resolution around the DNA entry-exit segments.

Our averaging and classification results show that our Template Learning workflow generated precise nucleosome annotations of uniform orientations, facilitating high-throughput STA without the need for multiple rounds of classification to eliminate false-positive particles.

We applied template matching starting from a nucleosome template structure generated from the PDB 2PYO using the default procedure in PyTom¹⁸. We then applied the template matching procedure to our new data using an angular sampling of 7° increments (45,123 rotations).

The score map is shown in Fig. 7C. The visual inspection of the score map shows some high signal for some nucleosomes also identified by the Template Learning procedure (Fig. 7A), but also obvious false positive signal for other objects (an example of high false signal for the Percoll particles is shown at the center of the zoomed-in image in Fig. 7C).

To have a meaningful and fair comparison with the Template Learning procedure, we extracted from the same tomographic region the 18 k particles showing the highest cross-correlation scores. Unlike the results of Template Learning, a brief visual inspection of the template matching peaks showed 50 obvious false positives corresponding to the 10 nm gold and Percoll particles. The false positives were removed, and the data were processed Relion V4.0¹⁵. The reference-free initial model was generated, followed by a stage of refinement at binning 4. Unlike Template Learning, the refinement did not result in a significant improvement of the resolution of the initial model (judged by the shape and the FSC), indicating the presence of a significant ratio of non-nucleosome particles among the annotations. In agreement with this assumption, classification, and alignment into 10 classes at the same binning resulted in only 2 nucleosome-like classes that sum to 10.25 k picks (57% of the original picks). The particles of these 2 classes were selected for a further refinement process at binning 4 reached Nyquist resolution, and the further refinement at binning 2 led to 12.9 Å resolution, which is comparable to that achieved with Template Learning.

Importantly, the angular distribution of nucleosomes annotated by template matching showed a significant imbalance towards side views compared to round top views (Fig. 7C). Our simulations (Supplementary Fig. 8) showed that due to the cylindrical shape of nucleosomes and the missing wedge problem of cryo-ET data, the constrained cross-correlation between a template and the particles is a function of the orientation, where side views show higher cross-correlation than top and oblique views. This problem results in extracting only side views of nucleosomes at an adequate Precision.

The orientation bias of template matching hampers its efficiency for particle detection and is prone to resolution loss in STA. In our case, the resolution of the nucleosome was not affected significantly, because its pseudo-symmetric cylindrical shape allowed the complete 360° orientation range of side views to have cross-correlation sufficient for detection (Fig. 7C). However, this constraint may pose a more significant challenge when annotating particles with asymmetric non-spherical shapes. To further validate the Template Learning pipeline in a crowded cellular environment, we compared its performance with supervised deep learning and template matching for nucleosome annotation within a tomogram of a vitreous section from Drosophila embryonic brain tissue^8,50,51. Our STA results demonstrated that Template Learning outperformed the other two approaches in terms of precision and orientational isotropy (see Supplementary Note 5). Notably, template matching, a manual expert annotation, and a supervised model trained on this annotation exhibited significant orientational bias, favoring nucleosome side views. Our results demonstrate that Template Learning is capable of overcoming this limitation.

link

Template Learning: Deep learning with domain randomization for particle picking in cryo-electron tomography

Simulations to train deep learning models on particle picking

Benchmarking Template Learning for picking ribosomes in situ

Fine-tuning Template Learning-trained models using real data

Template Learning improves precision and isotropy in nucleosome picking

‘Playful’ teaching gaining credibility, say Lego researchers

AI-guided competitive docking for virtual screening and compound efficacy prediction

Stratasys launches multi-material 3D printed model preset for dental training

‘Playful’ teaching gaining credibility, say Lego researchers

AI-guided competitive docking for virtual screening and compound efficacy prediction

Stratasys launches multi-material 3D printed model preset for dental training

Dhammajarinee Witthaya School pioneers learning transformation through AI and Buddhist principles to build “Bridge of Opportunity” for Thai youth

K12 Education Technology Analysis Report 2026: $96.5 Bn

Simulations to train deep learning models on particle picking

Benchmarking Template Learning for picking ribosomes in situ

Fine-tuning Template Learning-trained models using real data

Template Learning improves precision and isotropy in nucleosome picking

More Stories

You may have missed