Introduction

Single-crystal substrates facilitate the epitaxial growth of crystalline thin films based on their surface similarity, in terms of lattice parameters and symmetry. Many single-crystal substrate materials are commercially available with different facet orientations. However, while exfoliated 2D materials such as graphene are naturally smooth, cleaving a crystal does not guarantee an atomically smooth surface. Energetic instabilities of cleaved facets can result in the formation of various defects that limit the quality of a substrate for synthesis efforts. Identifying a wider range of substrates with low surface energies would enable the growth of high-quality materials by minimizing the presence of undesirable defects. For example, the cubic perovskite BaSnO3 has diminished electronic properties due to the lattice mismatch of its (100) surface and the substrates suitable for its growth1,2,3. In addition to lattice mismatch, one must consider the reactivity between a substrate and the thin film being grown, for both the precursor chemical reactions and the final thin film composition, which will further reduce the number of available options.

The discovery of two-dimensional (2D) materials present similar challenges to substrates, most notably that both require materials with low surface energies. Computational efforts to identify 2D material have significantly expanded the list of potential monolayers and helped guide experimental synthesis4,5,6,7,8,9,10,11. One discovery technique is data mining, which searches bulk materials databases for yet unidentified monolayers6,7,8,9,10,11. A recent effort in this direction identified and characterized the bonding network of a crystal structure using the topological scaling algorithm (TSA)7. This algorithm identifies bonding clusters within a finite number of unit cells, then uses the scaling of this network size as a function of supercell size to define dimensionality. One of the most significant contributions of this approach is that one does not require one to define a facet to search along a priori. The algorithm itself will identify a monolayer with a surface parallel to any facet cut, and thus renders the issue of orientation mute. This resulted in the discovery of over 600 low-energy 2D materials7 and the creation of the MaterialsWeb.org 2D material database.

Previous computational efforts have been made to identify potential substrate materials, e.g., Ding et al.12 searched the Materials Project database for a substrate to stabilize a metastable polymorph of VO2. Using lattice mismatch and induced thin-film strain as criteria, they identify several candidates. That search considered a set of low-index facets but did not account for the surface energy of the facets. This illustrates the need for an algorithm to identify likely substrates and a database of low-energy work of cleavage facets, which is the scope of this work.

In this work, we present an approach to identify substrates using a high-throughput approach, which utilizes the computational cleavage of bulk crystals. This effort is motivated by our recent 2D structure search using the genetic algorithm software GASP13,14 for the Ga2O3 system. In that search, we identified a monolayer structure that can be cleaved from the bulk crystal and was already experimentally synthesized15,16, shown in Fig. 1. Thus, we developed data mining techniques to discover planes of cleavage in fully periodic crystals. Rather than using the TSA to identify van der Waals gaps, we use it first to identify conventionally bonded solids; then, we systematically break bonds in the crystal to create a low-dimensional structure. When the breaking of bonds generates a surface (i.e., gives the material a 2D structural motif), we extract a unit cell thick monolayer and calculate the work of cleavage (i.e., the energy cost required to cleave the crystal and form two surfaces) using density functional theory (DFT). This approach identifies nearly 4,000 potential substrates with surface energies comparable to experimentally used substrates.

Fig. 1: Example of a cleavable crystal, Ga2O3.
figure 1

The (20\(\overline{1}\)) plane exhibits a low density of bonds that, when cleaved, creates a low energy surface.

RESULTS

Candidate materials

For our study, we select all materials in the MaterialsProject database17 with five or fewer atoms in the primitive unit cell and within 50 meV atom−1 of the convex hull, the latter to promote thermodynamic stability of the final surface. Next, the topological scaling algorithm (TSA) identifies conventionally networked structures from this subset and creates a list of all bonds between neighboring atoms in these crystals. The bonding identification utilizes the empirical atomic radii of the elements, as provided in the pymatgen software package18, which we increase by 10%.

Bond cleavage

We implement two approaches for cleaving bonds in a crystal structure, which we illustrate in Fig. 2. The first approach breaks all periodic instances of a bond between two atoms in the supercell and is denoted as the “periodic bonds” approach. In this case, the three A–B bonds shown in Fig. 2(a) are treated separately. The second approach breaks the bonds between all periodic instances of the atoms in the supercell and is referred to as the “periodic atom” approach, illustrated in Fig. 2(b).

Fig. 2: Representation of the bond cleaving algorithm.
figure 2

a The periodic bond approach cleaves all periodic images of a bond. b The periodic atom approach cleaves the bonds between all periodic images of the atoms. The periodic atom approach requires at least three atoms in the primitive cell to generate a cleaved facet.

We systematically break symmetrically unique bonds in the primitive cell up to a maximum number of bonds given by

$${N}_{{{{\rm{bonds}}}}}={{{\rm{round}}}}\left(\alpha \ {N}_{{{{\rm{atoms}}}}}^{2/3}\right),$$
(1)

where Natoms is the number of atoms in the primitive cell, and α is a scalable parameter based on the desired maximum number of bonds broken. The exponent of 2/3 appropriately scales the number of bonds per plane with increasing cell size. The choice of α = 1 in this work results in one to three cleaved bonds in primitive cells of one to five atoms. For the periodic bond approach, Nbonds scales with the square of the supercell size. For the periodic atom approach, Nbonds scales at least with the square of the supercell size, with the potential to break a significantly larger number of bonds than the periodic bond approach.

These two approaches are complementary, as illustrated in Fig. 2. For example, applying the periodic atom approach to the structure in Fig. 2(a) or the periodic bond approach to the structure in Fig. 2(b) would both fail to cleave a surface with α = 1 in Eq. (1). As the unit cell size increases, the periodic bond and periodic atom approaches become equivalent. With a high enough α value, the periodic bond approach will also identify the cleaved surfaces of the periodic atom approach. However, this would be significantly more computationally expensive due to the increase in possible combinations of broken bonds, which need to be considered.

Topology of resulting structure

To determine if a cleavage surface has been generated, the TSA is run on the three-dimensional primitive cell with Nbonds or fewer bonds broken (using either approach). If the TSA identifies a two-dimensional structural motif with the same stoichiometry as the overall cell, we have created a cleavable surface. This surface is isolated as a unit cell thick monolayer and oriented such that the \(\overrightarrow{{{{\bf{a}}}}}\) and \(\overrightarrow{{{{\bf{b}}}}}\) lattice vectors span the 2D lattice of the monolayer structure and the \(\overrightarrow{{{{\bf{c}}}}}\) lattice parameter is chosen perpendicular to the \((\overrightarrow{{{{\bf{a}}}}},\overrightarrow{{{{\bf{b}}}}})\) plane. Due to the crystal symmetry, our algorithm can identify multiple instances of some surfaces. We remove these duplicates and identify the unique surfaces extracted from each crystal using the pymatgen structure matching algorithm17.

Predicted substrates

From a starting set of 120,612 materials in the Materials Project database17, 9,089 crystals meet the criteria of exhibiting (i) a formation energy within 50 meV atom−1 of the thermodynamic hull (72,316), (ii) a primitive cell of five or fewer atoms (9,980), (iii) all lattice angles greater than 16 and less than 164 (9,973), and (iv) a bonding network of three-dimensional topology. Criteria (iii) is introduced as a result of the exfoliation algorithm, which is found to return incorrect cleaved surfaces if the lattice has an extreme lattice angle. Though applying the algorithm to the conventional cell of these crystals resolves the issue, this results in a unit cell with greater than five atoms and thus is not considered. Applying the periodic bond and periodic atom cleavage approaches to the 9,089 materials generates 1,925 and 4,006 symmetrically unique surfaces, respectively. We apply the pymatgen structure tool to the combined list of 5,931 surfaces to identify a total of 4,693 unique cleaved surfaces across 2,133 bulk crystals. respectively. We apply the pymatgen structure tool to the combined list of 5,845 surfaces to identify a total of 4,693 unique cleaved surfaces across 2133 bulk crystals. It is worth noting here that Ga2O3 is not identified as cleavable by this search, as (i) the number of atoms in the primitive cell of the crystal is greater than five and (ii) the TSA identifies the crystal as layered when using our choice of 1.1 times the atomic radii of Ga and O, rather than as a fully networked structure.

To determine the stability of the surfaces, we perform DFT calculations with VASP19,20. We extract a single unit cell thick slab for each of the identified 4,693 unique cleavage surfaces and perform geometric optimization of atomic positions within the slabs and a single-point calculation for the bulk precursor to maintain consistency across our energy comparisons. Of the 4,693 monolayers, 4,614 converged or reached our maximum number of iterations with a final surface energy of at most half the work of cleavage. A sizable number of 2,015 of these surfaces contain f-valence elements, for which semilocal exchange-correlation functionals exhibit larger formation energy errors21. Therefore, care should be taken when considering these materials in the MaterialsWeb.org substrate database.

We use the work of cleavage of a monolayer, Ecleave, to measures the energy required to cleave a unit area of the bulk precursor material and the surface energy, Esurf, to describe the thermodynamic stability of a material’s facet:

$${E}_{{{{\rm{cleave}}}}}=\frac{({E}_{{{{\rm{sub,as-cut}}}}}-{E}_{{{{\rm{bulk}}}}}\ \frac{{N}_{{{{\rm{sub}}}}}}{{N}_{{{{\rm{bulk}}}}}})}{{A}_{{{{\rm{sub}}}}}}$$
(2)

and

$${E}_{{{{\rm{surf}}}}}=\frac{({E}_{{{{\rm{sub,opt}}}}}-{E}_{{{{\rm{bulk}}}}}\ \frac{{N}_{{{{\rm{sub}}}}}}{{N}_{{{{\rm{bulk}}}}}})}{2{A}_{{{{\rm{sub}}}}}},$$
(3)

where Esub,as-cut and Esub,opt are the energies of the as-cut substrate immediately after cleaving and of the optimized substrate after relaxing the atomic positions; Ebulk is the energy of the bulk precursor, Nbulk and Nsub represent the number of atoms in the bulk and substrate, respectively, and Asub denotes the substrate area. The factor of two difference is due to the work of cleavage being a measure of the energy needed to create two surfaces, while the surface energy is the thermodynamic stability of a single surface.

To validate that a monolayer accurately approximates the work of cleavage for cleaving a crystal, we calculate the change in work of cleavage with slab thickness for a subset of 21 (001) surfaces, from one to four unit cells. We find changes in the work of cleavage of less than 3 meV Å−2 in these systems, confirming that one unit cell thick slabs sufficiently describe the work of cleavage for a cleaved surface in this high-throughput effort. To verify that the work of cleavage indicates surface stability, we compare the calculated work of cleavage for 4,614 cleaved, free-standing slabs to their partially-optimized surface energy in Fig. 3. We observe that 2,557 materials display surface energies within 10% of half their work of cleavage and 1,615 within 5%.

Fig. 3: The work of cleavage and the surface energy after ten relaxation steps of the cleaved surfaces.
figure 3

Red indicates a higher density of points. The solid, dashed, and dotted black lines represent the work of cleavage of the (0001) CdS, ZnO, and AlN substrates, respectively. The grey region indicates materials have a work of cleavage greater than common substrates.

We note that there is the possibility of surface reconstructions for some of the predicted substrates. However, due to the high computational cost, we do not perform a search for possible reconstructions. The thermodynamic driving force for surface reconstructions is surface energy. Hence, the probability for reconstructions decreases for lower energy surfaces. However, before pursuing synthesis of a substrate in this work, we recommend determining at least the dynamic stability of a surface structure using phonon calculations and, if possible, searching for reconstructions using, e.g., genetic algorithm searches14. Note that as this is a global optimization problem22, there is no guarantee that any reconstruction, let alone the most stable reconstruction, will be identified with computational techniques. Reconstructions will also be significantly altered by the adsorption of species in the first stages of the film growth on the substrate, making this problem more closely related to the specific film-substrate interface.

Discussion

Our search identifies three surfaces currently used as substrates: hexagonal CdS, ZnO, and AlN. These crystals share the same structure and are all commercially available as (0001) substrates. In our search, we identify four substrates for each material: one (1\(\overline{1}\)0), one (\(\overline{1}\)01), and two (001) facets. These facets are equivalent to the Miller-Bravais convention of (1\(\overline{1}\)00), (\(\overline{1}\)011), and (0001). The most stable facet for these crystals is (1\(\overline{1}\)0), with a work of cleavage of 56, 123, and 275 meV Å−2 for CdS, ZnO, and AlN, respectively. The most stable (001) facet of each crystal has a work of cleavage of 92, 196, and 407 meV Å−2, respectively. Figure 3 highlights 3,991 surfaces (2,307 f-valence free and 2,183 with an entry in the Inorganic Crystal Structure Database (ICSD) entry) with a work of cleavage below that of (001) AlN, and 935 (718 f-valence free and 418 with an ICSD entry) surfaces below that of (001) CdS.

We consider any slab with a work of cleavage less than (001) AlN to be a suitable substrate and included in our substrate database. Some facet cuts could have different terminations for the same cleaved unit. For example, the cleaved (001) AlN can has both an aluminum-terminated and nitrogen-terminated facet. For these cases, the reported work of cleavage and surface energy are an average of these terminations. In addition to the energy benchmarks, we identify the conventional cell hkl index of the facets we cut. Of the 4,693 facets, 4,626 have Miller indices no greater than one, showing that the vast majority of planes we cleave are low-index facets. Of the remaining facets, 63 have a maximum Miller index of two and four have a maximum Miller index of three. While our choice of primitive cells biases the indices found, these results demonstrate how our approach is agnostic to the orientation of the facet cuts and the potential for future searches to expand this substrate database.

Figure 4 shows the trend between the number of cleaved bonds and the work of cleavage across the 4614 converged cleaved surfaces. The joint distribution in Fig. 4(a) indicates that our cleavage criterion of Eq. (1) results in both a low density of cleaved bonds and a moderate spread of work of cleavage. Furthermore, the distribution indicates two clustering trends in the plot, which correspond to different average bond energies. We attribute these trends to different types of chemical bonds being broken. Metallic systems typically display high coordination numbers and somewhat lower bond energies. Covalent and ionic bonds share localized electrons, which results in lower coordination numbers and higher energies per bond.

Fig. 4: Distribution of the work of cleavage vs. bond density of cleaved surfaces, with kernel density estimations.
figure 4

a represents all 4,614 cleavage surfaces, (b) for surfaces with metallic precursors, and (c) for surfaces with precursors exhibiting an electronic bandgap. We use the bandgap reported by Materials Project for the bulk precursors of the surfaces. The dashed lines indicate the mean energy per bond. On average, the metallic precursors exhibit a higher coordination number, and hence bond density for the cleavage plane in conjunction with a lower energy per bond than the more covalent and ionically bonded precursors. This trend is reflected in the average energy of the cleaved bonds for metals being lower at 1.3 eV bond−1 compared to 1.8 eV bond−1 for the cleaved covalent and ionic precursors.

To test this hypothesis, Fig. 4(b) and (c) show the work of cleavage and bond density distribution for monolayers derived from metallic and insulating precursors, respectively, based on the precursor bandgap reported in the MaterialsProject database17. We observe that the two clusters in the distribution indeed correspond predominantly to metallic and covalent/ionic bonding. Metallic bonds typically occur in materials with higher coordination numbers and have lower average energy than covalent or ionic bonds, resulting in a broad spread of bond densities in Fig. 4(b). In contrast, covalent/ionic bonds typically correspond to lower coordination numbers with higher energies for each bond, which is reflected in the distribution in Fig. 4(c). The average bond energies of the metallic and insulating precursors are 1.3 and 1.8 ev bond−1, respectively, further demonstrating the difference of bonding in these systems. As bonds exist across a continuous spectrum of metallic, covalent, and ionic character, the average bond energy is not a precise description of the bonding in these systems. However, the observed bond energies in the metallic and insulating precursor materials reveal an overall consistent trend.

To determine the potential of our data mining approach to expand the set of known substrates, we characterize the cleavage surfaces’ symmetry and lattice parameters. Figure 5 illustrates the lattice parameter distribution of the hexagonal, square, and tetragonal substrates identified in this work which are free of f-valence elements, as well as the electronic properties of their bulk precursors. The broad range of electronic behavior and lattice parameters for each symmetry indicates that these cleaved crystals could provide suitable substrates for a variety of thin film systems.

Fig. 5: Lattice vectors and bandgaps for the substrates.
figure 5

Semiconducting or insulating substrates identified are shown for (a) hexagonal, (b) square, and (c) tetragonal surfaces. For the tetragonal substrates in (c), the a and b lattice vectors are both plotted and connected with a dashed line, and the kernel density of the b lattice vectors is plotted in grey. 93 of the 106 commercially available insulating substrates identified are shown in red filled markers. For metallic substrates, (d) shows boxplots for the hexagonal and square surface structures and the b/a ratio of the tetragonal ones. The 23 monoclinic surfaces are not represented in these figures. The symmetry is classified using an in-plane lattice vector length tolerance of 1% and angle tolerance of 0.5˚. Substrates containing f-valence elements are not shown.

We follow by creating a database of commercially available substrates totaling 190 substrates and include the data in Fig. 5 using crystal structures and band gaps sourced from the Materials Project database. Comparing the database of predicted substrates to one of commercially available substrates shows two significant advancements. First, we greatly expand the list of substrates exhibiting band gaps within the visible spectrum and beyond, creating more opportunities for optical excitation of thin films from beneath the substrate rather than from above. Second, we identify substrates with a broader range of lattice parameters, which can help synthesis efforts. For example, currently, 23 commercially available hexagonal substrates exhibit lattice parameters between 2.5 and 5.0 Å. The computational database expands this list to 984 hexagonal substrates within the same range. This increase provides more opportunity to identify a substrate that both epitaxially matches the growth crystal and is chemically compatible.

To demonstrate the application of this substrate database, we epitaxially match cubic perovskite (100) BaSnO3, using the crystal structure and stiffness tensor provided by MaterialsProject17,23 to substrates with a work of cleavage below that of (0001) AlN. We use the pymatgen18 lattice matching algorithm to epitaxially match the perovskite to the layers extracted in our search. We identify 42 cleavage surfaces as potential substrates when using the screening criteria of (i) a work of cleavage less than that of (0001) AlN, (ii) an induced strain energy below 2 meV atom−1, (iii) no f-valence species, and (iv) epitaxial matches to a single unit cell of BaSnO3. Substrate matching also requires considering the chemical compatibility between the substrate and the thin film, and thus we focus further discussion on substrates with chemically inert surfaces.

We highlight here three potential substrates: (001) Rb2NiO224, (001) NiO25, and (001) CaSe26. The work of cleavage for each substrate is small, being 19, 72, and 56 meV Å−2, respectively. The resulting epitaxial strain of BaSnO3 for Rb2NiO2 and NiO are also small, with only +0.3% and +0.4%, which correspond to strain energies of 0.2 and 0.4 meV atom−1, respectively. These are an order of magnitude smaller than currently used substrates such as (001) SrTiO3 (−5.4%) and (001) MgO (+2.2%) indicating the opportunity for defect-free growth of BaSnO3 on these substrates. The strain energy when using CaSe is larger at 1.3 meV atom−1, though still with a low lattice mismatch of +0.7%. All three precursor materials–Rb2NiO2, NiO, and CaSe–have been experimentally synthesized24,25,26, providing increased opportunities for the growth of BaSnO3. In addition, the former two being oxides indicates that the surfaces will be fairly inert and unlikely to form strong chemical bonds with BaSnO3.

To facilitate the use of these substrates for further studies and future synthesis efforts of epitaxial single-crystal thin films, we provide the substrate structures and the data on their stability under an open-source license at MaterialsWeb.org. Furthermore, we make the software for cleaving crystal structures available as part of the open-source MPInterfaces package27,28.

In conclusion, we developed a data mining approach that systematically breaks bonds in three-dimensional crystals to identify cleavage planes for substrate synthesis. We identify 4,693 symmetrically unique cleavage surfaces across 2,133 periodic crystals and determine their structure and stability. We show that 3,991 surfaces display a work of cleavage comparable to that of the known substrate material (0001) AlN. These cleavage surfaces show a broad distribution of electronic properties and lattice parameters. We illustrate their utility by identifying 42 substrates for BaSnO3 with epitaxial matches that are an order of magnitude better than currently used substrates, explicitly discussing two oxide and one chalcogenide substrate. Though we limited our search to small primitive cells, the low surface energy of Ga2O3 indicates that there are many more low-energy substrates which can be cleaved from periodic crystals.

METHODS

Characterization of cleaved materials

To calculate the stability of the cleaved surfaces, we perform density functional theory (DFT) calculations with the projector-augmented wave (PAW) method as implemented in the VASP package19,20. The choice of PAW potentials follows the recommendation of pymatgen17. We employ the Perdew-Burke-Ernzerhof (PBE)29 approximation for the exchange-correlation functionals. To obtain convergence of the energy to 1 meV atom−1, we use a Γ-centered k-point mesh with a density of 60 k-points per Å−1 and a cutoff energy for the plane-wave basis set of 600 eV. For the cleaved slabs, we employ a vacuum spacing of at least 14 Å and reduce the number of k-points in the out-of-plane direction to one. The Brillouin zone integration uses Gaussian smearing with a width of 0.03 eV. The DFT calculations are performed spin-polarized with an initial ferromagnetic configuration. Transition metal and f-valence atoms are initialized with a magnetic moment of 6 μBohr and all others with a moment of 0.5 μBohr. We perform single-point calculations on the bulk precursors sourced from the Materials Project database to ensure accurate energy and consistency of calculation settings across all systems. We allow a maximum of ten iterations (ionic steps) during the optimization of the atomic positions in the slabs, and as not all surfaces converged within that window, we refer to this energy as “partially-relaxed.” For the systems that reached ten iterations, the average and median surface energy gained are 1.6 and 0.2 meV Å−2, respectively, from the relaxations. We keep lattice constants of the slabs fixed at the cleaved values for these calculations.

Workflow

We use the software packages pymatgen18 and MPInterfaces27 to prepare the input files, organize the results for the cleavage algorithm, and analyze the results of the DFT calculations. We use the pymatgen structure matching algorithm with an atomic position tolerance of 10−4 and not permitting any primitive cell reduction, lattice scaling, or supercell transformations17. Decreasing the tolerance to 10−5 only marginally increases the number of surfaces by 27, indicating that the choice of 10−4 provides high confidence in the identified substrates being symmetrically distinct. Any surfaces which diverged in energy or displayed a final surface energy greater than half the work of cleavage are excluded from the analysis.