Implementation of an In-House Platform for Rapid Screening of SARS-CoV-2 Genome Variations

Background: Global real-time monitoring of SARS-CoV-2 variants is crucial to controlling the COVID-19 outbreak. The purpose of this study was to set up a Sanger-based platform for large-scale SARS-CoV-2 variant tracking in laboratories in low-resource settings. Methods: We used a nested RT-PCR assay, Sanger sequencing and lineage assignment for a 930-bp region of the SARS-CoV-2 spike gene, which harbors mutations specific to variants of concern (VOCs). We set up our platform by comparing its results with whole genome sequencing (WGS) data on 137 SARS-CoV-2 positive samples. Then, we applied it to 1028 samples collected from March to September 2021. Results: Mutation detection was concordant in 125 of 137 samples (91.24%). Lineage assignment was concordant in 123 of 137 samples (89.78%); 65 of the 137 samples were assigned as VOCs, and these showed 100% concordance. In the 1028 samples screened by our in-house method, 78 distinct mutations were detected. The most common mutations were: S:D614G (21.91%), S:P681R (12.19%), S:L452R (12.15%), S:T478K (12.15%), S:N501Y (8.91%), S:A570D (8.89%), S:P681H (8.89%), S:T716I (8.74%), S:L699I (3.50%) and S:S477N (0.28%). Of the 1028 samples, 980 were attributed to VOCs, namely the Delta (B.1.617.2) and Alpha (B.1.1.7) variants. Conclusion: Our proposed in-house Sanger-based assay for SARS-CoV-2 lineage assignment is an accessible strategy for countries with limited infrastructure. It can be applied to the rapid tracking of SARS-CoV-2 VOCs during the COVID-19 pandemic.


Introduction
A novel coronavirus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged at the end of 2019 in the city of Wuhan, China, and caused the coronavirus disease 2019 (COVID-19) pandemic, leading to more than 552 504 629 infections and six million deaths around the world as of July 11, 2022. [1][2][3] SARS-CoV-2 is a spherical, enveloped betacoronavirus with a non-segmented, positive-sense, single-stranded RNA genome of nearly 30 kb, 4 showing 79% genomic sequence similarity with SARS-CoV and 50% with MERS-CoV, the two known coronaviruses identified in 2002 and 2012, respectively. 2,4,5 The SARS-CoV-2 genome encodes nonstructural proteins (NSP1-16), the structural proteins S (spike), E (envelope), M (membrane) and N (nucleocapsid), and several accessory proteins. 5 The spike (S) proteins are homotrimeric glycoproteins with a total length of 1273 amino acids that are encoded by a 3783-bp S gene and are involved in binding to angiotensin-converting enzyme 2 (ACE2) receptors and virus entry into host cells. 2,6,7 The S protein consists of the S1 subunit, which contains a 211-amino acid region called the receptor-binding domain (RBD) that plays a crucial role in viral entry by recognizing and attaching to the host ACE2 receptor, and the S2 subunit, which contains heptad repeat regions and the fusion peptide and is involved in viral membrane fusion. 2,5,7 Mutations in the spike gene can increase virus transmissibility, infectivity and immune escape, which makes the gene a major target for research studies, diagnostic and therapeutic strategies and vaccine development. [8][9][10]

Whole genome sequencing (WGS) of SARS-CoV-2 by next-generation sequencing (NGS) has characterized the complete genome sequence of the virus since the beginning of the pandemic, allowing detection of all possible lineages and variants of the virus, which can be used for designing diagnostic methods and developing drugs and vaccines. 11,12 As of July 12, 2022, approximately 11 642 776 complete SARS-CoV-2 genome sequences have been deposited in the Global Initiative on Sharing All Influenza Data (GISAID) EpiCoV™ public database. 13 Considering the continuous evolution of SARS-CoV-2 and the emergence of several variants with specific phenotypic features that can threaten global public health, the World Health Organization (WHO) designated "variants of interest" (VOIs) and "variants of concern" (VOCs) to prioritize worldwide monitoring and research studies. 14,15 Variants associated with increased transmissibility, increased disease severity or reduced efficacy of vaccines or therapeutics are classified as VOCs and need strict screening. 8 Overall, five VOCs have been declared by the WHO, namely Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Delta (B.1.617.2) and Omicron (B.1.1.529), which were first identified in the United Kingdom, South Africa, Brazil, India and South Africa, respectively. 8,16,17

Although Iran was one of the first and most affected countries facing the rapid expansion of SARS-CoV-2, no large-scale SARS-CoV-2 genome sequencing data were available for monitoring the circulating variants throughout the country in the early days of the pandemic. 18,19 In a considerable effort, the Genetics Research Center (GRC) of the University of Social Welfare and Rehabilitation Sciences (USWR) subjected 369 SARS-CoV-2 samples from different regions of Iran to WGS from March 2020 to May 2021. 18,19 Although WGS is the commonly used platform 12 for SARS-CoV-2 genomic surveillance, several factors, including long turnaround time, high cost, and the need for appropriate sequencing facilities with trained staff, mean that it is not routinely accessible in all laboratories. Therefore, the establishment of an alternative method is of paramount importance. 11,20 Considering the critical function of the RBD in viral attachment to host cell receptors, Sanger sequencing of the RBD region of the S gene can be used as one of the best alternative approaches to identifying most of the known variants of SARS-CoV-2. 21, 22 We developed an in-house platform comprising a nested RT-PCR assay followed by Sanger sequencing to target characteristic mutations in the crucial region of the SARS-CoV-2 spike gene. This method can be applied routinely for rapid scanning of SARS-CoV-2 positive samples in laboratories with basic Sanger sequencing facilities.

Patient Recruitment
We obtained 1028 SARS-CoV-2 positive RNA samples from all over the country between March 2021 and September 2021 from private laboratories, the COVID-19 laboratory network and the Iranian Network for Research in Viral Diseases (INRVD). These centers performed viral RNA extraction from respiratory samples (naso-/oropharyngeal swabs) using standard protocols and confirmed SARS-CoV-2 infection by real-time RT-PCR assays.

Primer Selection
We used nested RT-PCR primers to target the region between nucleotide positions 22886 and 23796 of the Wuhan (Wu-1) reference genome (GenBank accession: NC_045512.2). This region lies within the SARS-CoV-2 spike gene and covers part of the characteristic mutations of VOCs, as shown in Table 1. The primers used in the first and second rounds of nested RT-PCR were 23 : Nested1-F:
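To relate the genomic coordinates of the target region to spike residue numbers, the following minimal Python sketch converts a genome position into a spike codon number. It assumes that the spike coding sequence starts at position 21563 of NC_045512.2 (taken from the GenBank annotation, not from this study); the names `S_CDS_START` and `spike_codon` are illustrative only.

```python
# Start of the spike (S) coding sequence in NC_045512.2 (1-based).
# ASSUMPTION: taken from the GenBank annotation, not from this study.
S_CDS_START = 21563


def spike_codon(genome_pos: int) -> int:
    """Return the 1-based spike codon number that contains a genome position."""
    if genome_pos < S_CDS_START:
        raise ValueError("position lies upstream of the spike coding sequence")
    return (genome_pos - S_CDS_START) // 3 + 1


# Target region reported in the text.
REGION_START, REGION_END = 22886, 23796

first_codon = spike_codon(REGION_START)
last_codon = spike_codon(REGION_END)

# Residues of the most common mutations listed in the abstract
# (L452R, S477N, T478K, N501Y, A570D, D614G, P681H/R, L699I, T716I).
residues = [452, 477, 478, 501, 570, 614, 681, 699, 716]
covered = [r for r in residues if first_codon <= r <= last_codon]

print(f"Region spans spike codons {first_codon} to {last_codon}")
print(f"All listed residues covered: {covered == residues}")
```

Under this assumption the region spans spike codons 442 to 745, which includes every residue position reported among the most common mutations.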

Nested RT-PCR Assay and Sanger Sequencing
The nested RT-PCR assay was performed with the SMOBIO [RQ2110] ExcelRT™ One-Step RT-qPCR Kit under the conditions described in Table 2. The 931-bp PCR products were analyzed by 2% agarose gel electrophoresis (Figure 1). After purification with ExoSAP-IT (Affymetrix), sequencing reactions were performed with the BigDye™ Terminator v3.1 Cycle Sequencing Kit on an Applied Biosystems Genetic Analyzer, following the manufacturer's instructions.

Sanger Sequencing-based Mutation Analysis
The general protocol in our in-house platform was as follows:
1. Manual mutation analysis of the electropherograms using CodonCode Aligner v.9.0.1 (Figure 2).
2. Probable lineage assignment of each sample according to the VOC key mutations shown in Table 1. 8,10,14,15,24

Lineage assignment for samples that did not harbor these significant sets of mutations was performed with the help of https://outbreak.info/ and several studies in the literature (Table 1). 24,25

Before applying the proposed method to monitor VOCs in 1028 SARS-CoV-2 positive samples, we checked the concordance of Sanger and NGS results by a blind comparison of the two platforms on 137 samples. Most of these 137 samples had been investigated in two previous studies by our group, and the others were among the samples collected for this project. The details of the WGS method and the lineage assignment of these samples are described in our two previous studies. 18,19

Results
A comparison of Sanger and WGS results in 137 of our samples showed 91.24% (125/137) concordance in mutation detection and 89.78% (123/137) concordance in lineage assignment. Of these 137 samples, 65 were attributed to VOCs and 72 to non-VOCs. In VOCs, mutation detection and lineage assignment had 87% and 100% concordance, respectively.
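The concordance percentages above are simple proportions of matching samples. As a minimal sketch (the helper name `concordance` is illustrative, not part of the study's pipeline), they can be reproduced as follows:

```python
def concordance(matching: int, total: int) -> float:
    """Return the percentage of samples on which two platforms agree."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= matching <= total:
        raise ValueError("matching must lie between 0 and total")
    return round(100.0 * matching / total, 2)


TOTAL_COMPARED = 137

mutation_pct = concordance(125, TOTAL_COMPARED)  # mutation detection
lineage_pct = concordance(123, TOTAL_COMPARED)   # lineage assignment

# The VOC and non-VOC groups reported in the text add up to the full set.
voc, non_voc = 65, 72
assert voc + non_voc == TOTAL_COMPARED

print(f"Mutation detection concordance: {mutation_pct}%")
print(f"Lineage assignment concordance: {lineage_pct}%")
```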

[Figure: agarose gel image. Lanes: Control+, S1 to S15; size scale in base pairs (bp).]

1.617.1 or B.1.617.3). A distinct lineage could not be assigned to thirteen samples, because they carried only a small number of spike mutations, which is not enough for conclusive lineage assignment, and these mutations are found in many Pango lineages.
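The key-mutation logic, including the case in which a sample cannot be assigned, can be sketched in Python as follows. The mutation sets are a small illustrative subset of well-known Alpha and Delta spike mutations that fall inside the sequenced region; they are an assumption for demonstration and do not reproduce Table 1. The function names are likewise hypothetical.

```python
# Illustrative key spike mutations located inside the sequenced region.
# ASSUMPTION: a demonstration subset, not the authors' Table 1.
KEY_MUTATIONS = {
    "Alpha (B.1.1.7)": {"N501Y", "A570D", "P681H", "T716I"},
    "Delta (B.1.617.2)": {"L452R", "T478K", "P681R"},
}


def candidate_lineages(detected: set) -> list:
    """Return every lineage whose complete key set is present in the sample."""
    return sorted(
        name for name, keys in KEY_MUTATIONS.items() if keys <= detected
    )


def assign(detected) -> str:
    """Assign a lineage only when exactly one key set matches."""
    hits = candidate_lineages(set(detected))
    return hits[0] if len(hits) == 1 else "Not assigned"


delta_like = assign({"D614G", "L452R", "T478K", "P681R"})
alpha_like = assign({"D614G", "N501Y", "A570D", "P681H", "T716I"})
ambiguous = assign({"D614G"})  # D614G alone occurs in many lineages

print(delta_like, alpha_like, ambiguous, sep="\n")
```

A sample carrying only widespread mutations such as D614G matches no key set and is reported as "Not assigned", which mirrors the situation described for the thirteen unassigned samples.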

Discussion
Considering the role of the spike protein in binding to ACE2 receptors and entry into host cells, and its importance in developing vaccines and antibody-based therapeutic strategies, scanning of SARS-CoV-2 spike gene mutations is important. 9,21 Although NGS platforms would be ideal for whole-genome surveillance of SARS-CoV-2 variants and comprehensive analysis of the spike gene region, large-scale sequencing of SARS-CoV-2 samples on these platforms is still not feasible in many developing and underdeveloped countries because of high cost, limited supplies and infrastructure, and the need for skilled staff for WGS. 12,20 Establishing alternative platforms for large-scale monitoring of SARS-CoV-2 VOCs can therefore help overcome these challenges. We proposed an alternative Sanger-based platform for lineage assignment of SARS-CoV-2 samples that analyzes characteristic mutations in a distinct part of the spike gene. Sanger sequencing, with 99.99% accuracy, is widely accessible in most laboratories. 26 It can sequence fragments at relatively low cost, with a rapid turnaround time, and under conditions of low viral load. 12,27 Despite these advantages, amplicon length is limited in Sanger sequencing (often less than 1000 bp). It is practically impossible to sequence long fragments such as the whole genome of SARS-CoV-2 or the full spike gene as one amplicon. 12,26,27 We therefore had to select a critical part of the spike gene that allows lineage assignment with a single amplicon. Our selected 930 bp region included the characteristic mutations of the VOCs circulating in the period when we set up this protocol.
According to our study, the selected region in the spike gene was successfully amplified and sequenced, and the probable lineages were assigned. Among all 137 compared NGS and Sanger data, 73 samples (53.28%) were compatible and included B.1.1.7 and B.1 lineages; 50 samples    File 2). The terms "Compatible", "Partially Compatible", and "Incompatible" were used to compare the lineage assignments of the two platforms, and together they covered the classification of all samples. "Compatible" refers to samples with 100% similarity and "Incompatible" to samples with completely different assignments. "Partially Compatible" is used for samples whose exact sub-lineage differed between the platforms but whose assignments both descended from the same parent lineage, either A or B. The label "Not assigned" was used for samples that either had no mutations in the spike region or whose detected spike mutations corresponded to many Pango lineages.

The 1028 investigated samples came from only ten provinces of Iran; about 82% of the samples were from the northern parts of the country, 14.11% from the southern regions and 3.89% from the central parts, so the following statistics might mostly reflect the dominant lineages in these areas. Our study showed that 980 out of 1028 samples were designated as VOCs  Figure 4). Furthermore, the circulation trend of VOCs in these regions of the country during this period is relatively similar to the global landscape. 13

In line with studies by other groups, our study showed that many lineages, especially VOCs, can be recognized successfully using spike sequences. 28 Still, exact lineage assignment using the spike gene alone is not possible for many samples, either because their defining mutations lie outside the spike gene or because the mutations they harbor are compatible with many lineages. 28 To address these challenges, O'Toole et al recently proposed the term "lineage set", which describes a series of Pango lineages concordant with the mutations detected in a given spike gene sequence. 28 Given that the results of our proposed platform are consistent with the results of other studies, this method appears to be a practical and valuable way to identify VOCs in the country.
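The comparison categories used in this section can be expressed as a small decision rule. The Python sketch below is one possible reading of those definitions, not the procedure actually used in the study: it assumes that lineages are given as Pango names, that a missing Sanger or WGS call is reported as "Not assigned", and that only two aliases (P for B.1.1.28 and AY for B.1.617.2) need expanding. The names `ALIASES`, `expand` and `compare` are hypothetical.

```python
from typing import Optional

# Small illustrative subset of Pango aliases.
# ASSUMPTION: a complete alias table would be needed in practice.
ALIASES = {"P": "B.1.1.28", "AY": "B.1.617.2"}


def expand(lineage: str) -> str:
    """Replace a leading alias (for example P.1) with its full Pango name."""
    head, _, tail = lineage.partition(".")
    full = ALIASES.get(head, head)
    return f"{full}.{tail}" if tail else full


def compare(sanger: Optional[str], wgs: Optional[str]) -> str:
    """Classify the agreement between Sanger and WGS lineage calls."""
    if sanger is None or wgs is None:
        return "Not assigned"
    a, b = expand(sanger), expand(wgs)
    if a == b:
        return "Compatible"
    # Same root lineage (A or B) but different sub-lineage.
    if a.split(".")[0] == b.split(".")[0]:
        return "Partially Compatible"
    return "Incompatible"


print(compare("B.1.1.7", "B.1.1.7"))
print(compare("B.1", "B.1.36"))
print(compare("A.23", "B.1"))
print(compare(None, "B.1"))
```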
In conclusion, our proposed method was practical for rapid and straightforward monitoring of SARS-CoV-2 VOCs for outbreak tracing of SARS-CoV-2 in our country.