Tag Archives: #genomes

Untwisting DNA Reveals New Force That Shapes Genomes (Biology)

Advances in microscopy have enabled researchers to picture loops of DNA strands for the first time. The images reveal how the human genome organizes itself in three-dimensional space at much higher resolution than previously possible.

The findings, published in a new study in the journal Molecular Cell, also reveal that the process of DNA being copied into RNA—transcription—indirectly shapes the architecture of the genome. An international team led by Pia Cosma at the Center for Genomic Regulation (CRG) in Barcelona and Melike Lakadamyali at the Perelman School of Medicine at the University of Pennsylvania in the United States found that transcription generates a force that moves across DNA strands like ripples through water.

Known as supercoiling, the force causes structural proteins called cohesins to ‘surf’ along DNA strands, changing the scaffold’s architecture and reshaping the genome as a whole. While it was already known that genome organization regulates gene transcription, this is the first time researchers have found that transcription, through supercoiling, shapes genome organization in return.

According to the researchers, the discovery of this new force may have future implications for understanding genetic diseases such as Cornelia de Lange syndrome, which is caused by mutations in genes encoding cohesin or cohesin regulators. The findings may also be relevant to developmental disorders linked to how chromatin folds, and may open new avenues of research into genome fragility and cancer development.

Researchers used state-of-the-art microscopy to capture how the genome folds inside a space that is just 6 micrometres wide. The resulting 3-D render shows cohesin in magenta, DNA in blue and the enzyme RNA polymerase II, which transcribes DNA into RNA, in green. Credit: Vicky Neguembor/CRG

The researchers studied the biological mechanisms that enable two meters of DNA to be squeezed into a tight space in each human cell. In this condensed form, known as chromatin, the DNA contains many loops that bring together regions of the genome that would otherwise be far apart. This physical proximity is important for transcribing DNA into RNA, which is then used to make proteins, making chromatin looping a fundamental biological mechanism in human health and disease.

According to Vicky Neguembor, Staff Scientist at the CRG and first author of the paper, “Chromatin looping is what allows individual cells to switch different information on and off, which is why for example a neuron or a muscle cell with the same genomic information can still behave so differently. Loops are also one of the ways the genome gets compacted to fit into the nucleus.”

“What we have found is important because it shows the biological process of transcription plays an additional role beyond its fundamental task of creating RNA that eventually turns into proteins. Transcription indirectly compacts the genome in an efficient manner and helps different regions of the genome talk to each other.”

Previous techniques used to study this process could predict where loops were located but not their actual shape or what they look like inside cells. To improve image resolution, the researchers used a special type of microscopy that uses high-power lasers under specific chemical conditions to track the blinking of fluorescent molecules. The technique provides ten times higher resolution than conventional microscopy. Combined with advanced image analysis, it allowed the researchers to identify chromatin loops, and the cohesins that hold the structure together like paper clips, within intact cells.
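The resolution gain comes from locating molecules one at a time. Only a few molecules blink on at any moment, so each one appears as an isolated, blurred spot. The centre of that spot can be pinned down far more precisely than the spot's width. The Python sketch below illustrates this general principle and is not the study's imaging pipeline. It assumes an idealized Gaussian spot with a 125 nm standard deviation (an illustrative value) and no background noise. It estimates each molecule's position as the centroid of its detected photons.

```python
import math
import random


def localize(true_x, true_y, sigma_nm, n_photons, rng):
    """Estimate an emitter's position as the centroid of its detected photons.

    Each photon lands at the true position plus Gaussian blur of width
    sigma_nm (an idealized spot: no background, no pixelation).
    """
    xs = [rng.gauss(true_x, sigma_nm) for _ in range(n_photons)]
    ys = [rng.gauss(true_y, sigma_nm) for _ in range(n_photons)]
    return sum(xs) / n_photons, sum(ys) / n_photons


def rms_error(sigma_nm, n_photons, trials=500, seed=0):
    """Per-axis root-mean-square localization error over many simulated blinks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, y = localize(0.0, 0.0, sigma_nm, n_photons, rng)
        total += x * x + y * y
    return math.sqrt(total / (2 * trials))


if __name__ == "__main__":
    sigma = 125.0  # assumed blur of one spot, in nm (illustrative)
    for n in (1, 100, 1000):
        print(f"{n:5d} photons: simulated {rms_error(sigma, n):6.1f} nm, "
              f"sigma/sqrt(N) = {sigma / math.sqrt(n):6.1f} nm")
```

Under these assumptions the error falls roughly as σ/√N. About 100 photons bring a 125 nm blur down to around 12 nm, broadly in line with the tenfold gain described above. Real localization precision also depends on background, pixel size and drift.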

Featured image: Example of DNA pictured by state-of-the-art microscopy. Credit: Vicky Neguembor / CRG


Reference: Maria Victoria Neguembor, Laura Martin, Álvaro Castells-García, Pablo Aurelio Gómez-García, Chiara Vicario, Davide Carnevali, Jumana AlHaj Abed, Alba Granados, Ruben Sebastian-Perez, Francesco Sottile, Jérôme Solon, Chao-ting Wu, Melike Lakadamyali, Maria Pia Cosma, Transcription-mediated supercoiling regulates genome folding and loop formation, Molecular Cell, 2021, ISSN 1097-2765, https://doi.org/10.1016/j.molcel.2021.06.009. (https://www.sciencedirect.com/science/article/pii/S1097276521004561)


Provided by Center for Genomic Regulation

New Study Traces Back the Progenitor Genomes Causing COVID-19 and Geospatial Spread (Medicine)

Many variant strains were shown to be present before the first known cases identified in China

In the field of molecular epidemiology, the worldwide scientific community has been steadily sleuthing to solve the riddle of the early history of SARS-CoV-2. Despite recent efforts by the World Health Organization, no one to date has identified the first case of human transmission, or “patient zero” in the COVID-19 pandemic.

Finding the earliest possible case is needed to better understand how the virus may first have jumped from its animal host to humans, as well as how the SARS-CoV-2 viral genome has mutated over time and spread globally.

Since the first SARS-CoV-2 infection was detected in December 2019, well over a million SARS-CoV-2 genomes have been sequenced worldwide, revealing that the coronavirus is mutating, albeit slowly, at a rate of 25 mutations per genome per year. Emerging variants, including the UK (B.1.1.7), South African (B.1.351), South American (P.1) and, now, Indian (B.1.617) variants, have not only replaced prior dominant strains in their respective regions but also threaten world health because of their potential to escape today’s vaccines and therapeutics.

“The SARS-CoV-2 virus has already infected more than 145 million people and caused 3 million deaths across the world,” said Sudhir Kumar, director of the Institute for Genomics and Evolutionary Medicine, Temple University. “We set out to find the genetic common ancestor of all these infections, which we call the progenitor genome.”

This progenitor genome (proCoV2) is the mother of all the SARS-CoV-2 coronaviruses that have infected, and continue to infect, people today.

In the absence of patient zero, Kumar and his research team may now have found the next best thing to aid the worldwide molecular epidemiology detective work. “We reconstructed the genome of the progenitor and its early pedigree by using a big dataset of coronavirus genomes obtained from infected individuals since December 2019,” said Kumar, the lead author of a new study appearing in the advance online edition of the journal Molecular Biology and Evolution.

They found that the progenitor gave rise to a family of coronavirus strains, whose members included the strains found in Wuhan, China, in December 2019. “In essence, the events in December in Wuhan, China, represented the first superspreader event of a virus that had all the tools necessary to cause a worldwide pandemic right out of the box,” said Kumar.

Kumar’s group estimates that the SARS-CoV-2 progenitor was already circulating at least 6 to 8 weeks before the first genome sequenced in China, known as Wuhan-1. “This timeline puts the presence of proCoV2 in late October 2019, which is consistent with the report of a fragment of spike protein identical to Wuhan-1 in early December in Italy, among other evidence,” said Sayaka Miura, a senior author of the study.

“We have found the progenitor’s genetic fingerprint in January 2020 and later in multiple coronavirus infections in China and the USA. The progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China,” said Sergei Pond, a co-author of the study.

Besides their findings on SARS-CoV-2’s early history, Kumar’s group also has developed intuitive mutational fingerprints and Greek symbol classification (ν, α, β, γ, δ, and ε) to simplify the categorization of the major strains, sub-strains and variants infecting an individual or colonizing a global region. This may help scientists better trace and provide context for the order of emergence of new variants.

“Overall, our mutational fingerprinting and nomenclature provide a simple way to glean the ancestry of new variants as compared to phylogenetic designations, e.g., B.1.351 and B.1.1.7,” said Kumar.

For example, an α fingerprint refers to genomes that contain one or more of the α variants and no subsequent major variants, and an αβ fingerprint refers to genomes that contain all the α variants, at least one β variant, and no other major variants.
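The two rules above can be written as a small classifier. In this minimal Python sketch, the variant labels (a1, a2, b1 and so on) and their group memberships are hypothetical placeholders, not the study's actual mutations. Extending the pattern to longer fingerprints is an assumption that generalizes the α and αβ definitions: every earlier group must be complete, and at least one variant from the last group must be present.

```python
# Hypothetical variant labels standing in for the study's real mutations,
# grouped in their assumed order of emergence.
MAJOR_GROUPS = [
    ("α", {"a1", "a2", "a3"}),
    ("β", {"b1", "b2"}),
    ("γ", {"g1"}),
]


def fingerprint(genome_variants):
    """Return a fingerprint such as "α" or "αβ" for a set of variant labels.

    Returns "" if the genome carries no major variants, and None if it does
    not fit the scheme (an earlier group is only partially present).
    """
    present = [(symbol, variants) for symbol, variants in MAJOR_GROUPS
               if variants & genome_variants]
    if not present:
        return ""  # no major variants at all
    # Assumed generalization of the stated rules: every group before the
    # last one present must be complete; the last needs at least one variant.
    for _, variants in present[:-1]:
        if not variants <= genome_variants:
            return None
    return "".join(symbol for symbol, _ in present)


# fingerprint({"a1"})                    -> "α"
# fingerprint({"a1", "a2", "a3", "b2"})  -> "αβ"
# fingerprint({"a1", "b1"})              -> None (α group incomplete)
```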

“With our tools, we observed the spread and replacement of prevailing strains in Europe (αβε with αβζ) and Asia (α with αβε), the preponderance of the same strain for most of the pandemic in North America (αβγδ), and the continued presence of multiple high-frequency strains in Asia and North America,” said Pond.

Getting to the root of the problem

To identify the progenitor genome, they used an approach not previously applied to SARS-CoV-2, called mutation order analysis. The technique, which is used extensively in cancer research, relies on a clonal analysis of mutant strains and on the frequency with which pairs of mutations appear together to find the root of the virus.
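The pairwise idea can be shown with a toy version. This minimal Python sketch is a simplification and not the authors' mutation order analysis. It assumes that mutation A arose before mutation B when nearly every genome carrying B also carries A and A is the more frequent of the two. The genomes, mutation names and the 0.95 overlap threshold are all hypothetical.

```python
from itertools import permutations


def carriers(genomes, mutation):
    """Indices of the genomes that carry a given mutation."""
    return {i for i, g in enumerate(genomes) if mutation in g}


def precedence_pairs(genomes, min_overlap=0.95):
    """Infer (earlier, later) mutation pairs from co-occurrence.

    A is taken to precede B when at least min_overlap of B's carriers also
    carry A (tolerating some sequencing noise) and A is strictly more frequent.
    """
    mutations = sorted(set().union(*genomes))
    who = {m: carriers(genomes, m) for m in mutations}
    pairs = []
    for a, b in permutations(mutations, 2):
        overlap = len(who[a] & who[b]) / len(who[b])
        if overlap >= min_overlap and len(who[a]) > len(who[b]):
            pairs.append((a, b))
    return pairs


def mutation_order(genomes, min_overlap=0.95):
    """Order mutations by how many others are inferred to precede them."""
    mutations = sorted(set().union(*genomes))
    n_before = {m: 0 for m in mutations}
    for _, later in precedence_pairs(genomes, min_overlap):
        n_before[later] += 1
    return sorted(mutations, key=lambda m: (n_before[m], m))


if __name__ == "__main__":
    # Hypothetical sampled genomes, each given as its set of mutations.
    genomes = [set(), {"m1"}, {"m1"}, {"m1", "m2"}, {"m1", "m2"},
               {"m1", "m2", "m3"}, {"m1", "m4"}]
    print(mutation_order(genomes))  # ['m1', 'm2', 'm4', 'm3']
```

Mutations that no other mutation precedes (here m1) sit closest to the root. A genome lacking all the derived mutations plays the role of the inferred ancestor in this toy setting.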

Many previous attempts at analyzing such large datasets were not successful because of “the focus on building an evolutionary tree of SARS-CoV-2,” says Kumar. “This coronavirus evolves too slowly, the number of genomes to analyze is too large, and the data quality of genomes is highly variable. I immediately saw parallels between the properties of these genetic data from coronavirus and the genetic data from the clonal spread of another nefarious disease, cancer.”

Kumar and Miura have developed and investigated many techniques for analyzing genetic data from tumors in cancer patients. They adapted and innovated these techniques to build a trail of mutations that traced back to the progenitor genetic fingerprint. “The mutation tracking approach produced the progenitor and the family history of its major mutations. It is a great example of how big data coupled with biologically-informed data mining reveals important patterns,” said Kumar.

An earlier timeline emerges

“This progenitor genome had a sequence very different from what some folks are calling the reference sequence, which is what was observed first in China and deposited into the GISAID SARS-CoV-2 database,” said Kumar.

The closest matches were eight genomes sampled 26 to 80 days after the earliest sampled virus (24 December 2019). Multiple close matches were found on all sampled continents and were detected as late as June 2020 (pandemic day 181) in South America. Overall, 140 of the genomes Kumar’s group analyzed differed from proCoV2 only by synonymous changes. That is, the amino acid sequences of all their proteins were identical to those of the corresponding proCoV2 proteins. A majority (93 genomes) of these protein-level matches were from coronaviruses sampled in China and other Asian countries.
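To tell whether a nucleotide difference is synonymous, translate the affected codon in both sequences and compare the amino acids. This minimal Python sketch handles a single pair of aligned coding sequences. It assumes the standard genetic code and no insertions or deletions, and the short sequences are made up for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids listed in TCAG x TCAG x TCAG codon order
# (TTT, TTC, TTA, ...); "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence (length a multiple of 3) into amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def classify_codon_changes(ref, query):
    """Compare two aligned coding sequences codon by codon.

    Returns (codon_index, ref_codon, query_codon, kind) for each changed codon,
    where kind is "synonymous" if the encoded amino acid is unchanged.
    """
    changes = []
    for i in range(0, len(ref), 3):
        rc, qc = ref[i:i + 3], query[i:i + 3]
        if rc != qc:
            same = CODON_TABLE[rc] == CODON_TABLE[qc]
            changes.append((i // 3, rc, qc, "synonymous" if same else "non-synonymous"))
    return changes


def protein_identical(ref, query):
    """True when the two coding sequences encode the same protein."""
    return translate(ref) == translate(query)


if __name__ == "__main__":
    ref = "ATGCTGAAA"  # hypothetical: Met-Leu-Lys
    print(classify_codon_changes(ref, "ATGCTTAAA"))  # CTG -> CTT, still Leu
    print(classify_codon_changes(ref, "ATGCTGGAA"))  # AAA -> GAA, Lys -> Glu
```

A genome counts as a protein-level match in the sense described above when `protein_identical` holds for every gene, even though its nucleotide sequence differs.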

These spatiotemporal patterns suggested that proCoV2 already possessed the full repertoire of protein sequences needed to infect, spread and persist in the global human population.

They found the proCoV2 virus and its initial descendants arose in China, based on the earliest mutations of proCoV2 and their locations. Furthermore, they also demonstrated that a population of strains with at least three mutational differences from proCoV2 existed at the time of the first detection of COVID-19 cases in China. With estimates of SARS-CoV-2 acquiring 25 mutations per year, this meant that the virus must already have been infecting people several weeks before the December 2019 cases.
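The “several weeks” follows from simple arithmetic. As a rough worked example, assume the quoted rate of 25 mutations per year applies steadily along a single lineage. This is a simplification, since the study's own dating is more involved. Accumulating three mutations then takes about:

```latex
t \approx \frac{3\ \text{mutations}}{25\ \text{mutations per year}}
  = 0.12\ \text{year}
  \approx 0.12 \times 365\ \text{days}
  \approx 44\ \text{days}
  \approx 6\ \text{weeks}
```

This is in line with the 6 to 8 week estimate quoted earlier.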

Mutational signatures

Because there was strong evidence of many mutations predating those found in the reference genome, Kumar’s group devised a new nomenclature of mutational signatures to classify SARS-CoV-2, introducing a series of Greek letter symbols to represent each signature.

For example, they found that the α SARS-CoV-2 genome variants emerged before the first reports of COVID-19. This strongly implies that there was already some sequence diversity in the ancestral SARS-CoV-2 populations. All 17 of the genomes sampled from China in December 2019, including the designated SARS-CoV-2 reference genome, carry all three α variants. However, 1,756 genomes without α variants were sampled across the world up to July 2020. Therefore, the earliest sampled genomes (including the designated reference) were not the progenitor strains.

The analysis also predicts that the progenitor genome had offspring that were spreading worldwide during the earliest phases of COVID-19. It was ready to infect right from the start.

“The progenitor had all the ability it needed to spread,” said Pond. “There is an overabundance of non-synonymous changes in the population. What happened between bats and humans remains unclear, but proCoV2 could already infect at pandemic scales.”

A global spread

Altogether, they identified seven major evolutionary lineages and characterized the episodic nature of their global spread. The proCoV2 genome gave rise to many major offspring lineages, some of which arose in Europe and North America after the likely genesis of the ancestral lineages in China.

“Asian strains founded the whole pandemic,” said Kumar. “But over time, many variants that evolved elsewhere are now infecting Asia much more.”

Their mutational-based analyses also established that North American coronaviruses harbor very different genome signatures than those prevalent in Europe and Asia.

“This is a dynamic process,” said Kumar. “Clearly, there are very different pictures of spread that are painted by the emergence of new mutations, the three εs, γ and δ, which we found to occur after the spike protein change (a β mutation). Scientists are still figuring out if any functional properties of these mutations have sped up the pandemic.”

Remarkably, the mutational signature αβγδ has remained the dominant lineage in North America since April 2020, in contrast to the turnover seen in Europe and Asia. More recently, novel fast-spreading variants carrying the S protein change N501Y, from South Africa and the UK (B.1.1.7), have rapidly increased. According to their classification scheme, coronaviruses with the N501Y variant in South Africa carry the αβγδ genetic fingerprint, whereas those in the UK carry the αβε genetic fingerprint. “Therefore, the αβ ancestor continues to give rise to many major offshoots of this coronavirus,” said Kumar.

Real-time updates

The MBE study relied on three snapshots retrieved from GISAID: July 7, 2020 (60,332 genomes), October 12, 2020 (133,741 genomes) and, finally, an expanded dataset of 172,480 genomes retrieved on December 30, 2020.

Moving forward, they will continue to refine their results as new data becomes available.

“More than a million SARS-CoV-2 genomes are sequenced now,” said Pond. “The power of this approach is that the more data you have, the more easily you can tell the precise frequency of individual mutations and mutation pairs. These variants that are produced, the single nucleotide variants, or SNVs, their frequency, and history can be told very well with more data. Therefore, our analyses infer a credible root for the SARS-CoV-2 phylogeny.”

The MBE study is part of their effort to maintain a continuous, live real-time monitoring of SARS-CoV-2 genomes, which has now grown to include more than 350,000 genomes.

“We have set up a live dashboard showing regularly updated results because the processes of data analysis, manuscript preparation, and peer review of scientific articles are much slower than the pace of expansion of SARS-CoV-2 genome collection,” said Pond. “We also provide a simple ‘in the browser’ tool to classify any SARS-CoV-2 genome based on key mutations derived from the mutation order analysis (MOA).”

“These findings and our intuitive mutational fingerprints and barcodes of SARS-CoV-2 strains have overcome daunting challenges to develop a retrospective on how, when and why COVID-19 has emerged and spread, which is a prerequisite to creating remedies to overcome this pandemic through the efforts of science, technology, public policy and medicine,” said Kumar.

Featured image: The progenitor (proCoV2) virus and its initial descendants arose in China, based on the earliest mutations of proCoV2 and their locations, which were traced back to 6-8 weeks before the outbreak in Wuhan, China. Furthermore, the science team also demonstrated that a population of strains with at least three mutational differences (alpha 1-3) from proCoV2 existed at the time of the first detection of COVID-19 cases in China. The current major variants of interest, including the UK (B.1.1.7), South African (B.1.351), South American (P.1) and, now, Indian (B.1.617) variants, are shown within the pedigree. These variants have not only replaced prior dominant strains in their respective regions but also threaten world health because of their potential to escape today’s vaccines and therapeutics. © Sudhir Kumar, Temple University


Reference: Sudhir Kumar, Qiqing Tao, Steven Weaver, Maxwell Sanderford, Marcos A Caraballo-Ortiz, Sudip Sharma, Sergei L K Pond, Sayaka Miura, An evolutionary portrait of the progenitor SARS-CoV-2 and its dominant offshoots in COVID-19 pandemic, Molecular Biology and Evolution, 2021, msab118, https://doi.org/10.1093/molbev/msab118


Provided by SMBE Journals

Carp Genomes Uncover Speciation and Chromosome Evolution of Fish (Biology)

In a study published online in Molecular Ecology Resources, a research team led by Prof. HE Shunping of the Institute of Hydrobiology (IHB), Chinese Academy of Sciences, together with collaborators, revealed the evolutionary history of the East Asian cyprinids, explored the evolution and speciation of the silver carp and bighead carp, and examined genomic differentiation between their populations.

By integrating short-read sequencing and genetic maps, Prof HE’s team presented chromosomal-level genome assemblies with high quality and contiguity for the silver carp and the bighead carp. 

They sampled 20 silver carp (seven from the Pearl River, four from the Amur River and nine from the Yangtze River) and 22 bighead carp (eight from the Pearl River, four from the Amur River and 10 from the Yangtze River) for re-sequencing. They found that a chromosome fusion specific to East Asian cyprinid genomes took place about 9.2 million years after this clade diverged from the clade containing the common carp and Sinocyclocheilus. The result suggests that the East Asian cyprinids may possess only 24 pairs of chromosomes because of the fusion of two ancestral chromosomes.

Phylogenetic analysis also showed that the bighead carp formed a clade with the silver carp, with an estimated divergence time of 3.6 million years ago. Population genetic analyses indicated that silver carp and bighead carp are highly divergent, although introgression between the two species was also detected. The researchers then identified the genomic regions that might be associated with divergence or speciation.

The results showed that genes in the divergent regions were associated with reproductive system development and the development of primary female sexual characteristics. This suggests that these regions might influence early speciation, reproductive isolation and environmental adaptation in the two species.

“These genomic data are an important resource for further study of the evolution, conservation and commercial breeding of these East Asian cyprinids,” said YANG Liandong from Prof. HE’s team.

Featured image: Carps jumping out of water (Image by IHB)


Reference: Jian, J, Yang, L, Gan, X, et al. Whole genome sequencing of silver carp (Hypophthalmichthys molitrix) and bighead carp (Hypophthalmichthys nobilis) provide novel insights into their evolution and speciation. Mol Ecol Resour. 2020; 00: 1– 12. https://doi.org/10.1111/1755-0998.13297


Provided by Chinese Academy of Sciences

“Genomic Rosetta Stone” For Discovering The Rules Of Gene Regulation (Biology)

As early as 1975, biologists discovered that the protein-coding parts of the chimpanzee and human genomes are more than 99 percent identical. Yet, chimpanzees and humans are clearly different in significant ways. Why?


The answer lies in the fact that how DNA is used is as important as what it says. That is, the genes that make up a genome are not always being used; they can be turned on or off or dialed up or down over time, and they interact with one another in complex ways. Some genes encode instructions for producing specific proteins and others encode information about regulating other genes.

Now, researchers in the laboratory of Rob Phillips, the Fred and Nancy Morris Professor of Biology and Biophysics, have developed a new tool for determining how various genes in the common bacterium Escherichia coli are regulated. Though E. coli has been used as a model organism in biology and bioengineering for decades, researchers understand the regulatory behavior of only about 35 percent of its genes. The new method from the Phillips laboratory sheds light on how nearly 100 previously uncharacterized genes are regulated and lays the foundation for studying many others.

A paper describing the new technique appears in the journal eLife.

Imagine you could read the alphabet and punctuation of some new language, but you could not understand what individual words meant or any of the rules of grammar. You could read a book and recognize each letter you read without having any comprehension of what a sentence or paragraph was saying. This is analogous to the challenge faced by biologists in the modern genomic era: Sequencing an organism’s genome is now rapid and straightforward, but actually understanding how each gene is regulated is much more difficult. An understanding of gene regulation is key to understanding health and disease, and is important if we are to one day repurpose cells so they can do things that we have designed them to do.

“We’ve developed a general tool that researchers could use on nearly any microbial organism,” says Phillips. “Our dream is that someone like Victoria Orphan [James Irvine Professor of Environmental Science and Geobiology] could go down to the ocean floor and come back with some never-before-seen bacterium, and we could use our tool on it to determine not only the sequence of its genome but how it is regulated.”

In the new method, researchers make systematic perturbations to the genome, and see what happens. Essentially, the equivalent of typographical errors are made in the genome, and the impact of those typos on cellular function is observed. For example, if you replace the letter “k” in the word “walk” with the letter “x” to make “walx,” the intent of the original word is still fairly clear. This is not the case if you swap the letter “w” for a “t” to produce “talk.” This suggests that the letter “w” carries important information about the meaning of the original word.

In the same way, making changes to a genome using the DNA alphabet allows researchers to figure out which letters are most important for the correct “meaning.”
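The logic of reading importance from typos can be sketched as a toy mutational scan. The article does not detail the authors' pipeline, so this minimal Python sketch only illustrates the general idea and is not the published method. The promoter fragment, the binding-site positions and the expression rule are all hypothetical. The sketch builds a library of randomly mutated copies of a sequence and scores each one with an invented expression rule. For every position, it then compares mean expression when that position is mutated with mean expression when it is intact.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Copy seq, replacing each base with a different random base with probability rate."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def toy_expression(seq, wild_type, site):
    """Hypothetical readout: each mutated base inside the site cuts expression by 0.2."""
    hits = sum(seq[i] != wild_type[i] for i in site)
    return 1.0 - 0.2 * hits


def expression_shift(wild_type, site, n=5000, rate=0.1, seed=1):
    """Per position: mean expression when mutated minus mean expression when intact."""
    rng = random.Random(seed)
    library = [mutate(wild_type, rate, rng) for _ in range(n)]
    readout = [toy_expression(s, wild_type, site) for s in library]
    shifts = []
    for i in range(len(wild_type)):
        mutated = [e for s, e in zip(library, readout) if s[i] != wild_type[i]]
        intact = [e for s, e in zip(library, readout) if s[i] == wild_type[i]]
        shifts.append(sum(mutated) / len(mutated) - sum(intact) / len(intact))
    return shifts


if __name__ == "__main__":
    wt = "TTGACAGCTAGC"  # hypothetical 12-base promoter fragment
    site = range(2, 6)   # hypothetical binding site at positions 2-5
    for i, shift in enumerate(expression_shift(wt, site)):
        print(f"position {i:2d} ({wt[i]}): shift {shift:+.3f}")
```

Positions inside the hypothetical site show a shift near -0.2, and all other positions stay near zero. This is the code equivalent of finding that the “w” in “walk” matters while the “k” does not.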

To validate their method, Phillips and colleagues first examined 20 particular E. coli genes that researchers already knew how to turn on and off. Their method correctly characterized these 20 genes. Next, the team turned to 80 other, less well understood genes to work out how they are regulated.

For now, the method has only been used on bacterial cells, but ultimately Phillips envisions being able to examine eukaryotic cells (such as human cells), which are more complex, with a modified version of the method.

“This was a decade-long project supported by the NIH Director’s Pioneer Award, and required a sustained hard effort and funding,” says Phillips. “This is the kind of project where there are no quick results.”

The paper is titled “Deciphering the regulatory genome of Escherichia coli, one hundred promoters at a time.” The study’s first author is William T. Ireland. Additional co-authors are former graduate students Nathan Belliveau (PhD ’18) and Nicholas McCarty (MS ’20); Michael Sweredoski and Annie Moradian, senior bioinformatician and senior lab manager, respectively, of the Proteome Exploration Laboratory; and Justin Kinney of Cold Spring Harbor Laboratory. Funding was provided by the National Institutes of Health and the Howard Hughes Medical Institute.

Reference: William T. Ireland et al., “Deciphering the regulatory genome of Escherichia coli, one hundred promoters at a time”, eLife (Physics of Living Systems), 2020. DOI: 10.7554/eLife.55308, https://elifesciences.org/articles/55308

Provided by Caltech

Genomic Study Reveals Evolutionary Secrets Of Banyan Tree (Botany)

The banyan fig tree Ficus microcarpa is famous for its aerial roots, which sprout from branches and eventually reach the soil. The tree also has a unique relationship with a wasp that has coevolved with it and is the only insect that can pollinate it.

The banyan tree Ficus microcarpa produces aerial roots that give it its distinctive look. A new study reveals the genomic changes that allow the tree to produce roots that spring from its branches. © Photo by Gang Wang

In a new study, researchers identify regions in the banyan fig’s genome that promote the development of its unusual aerial roots and enhance its ability to signal its wasp pollinator.

The study, published in the journal Cell, also identifies a sex-determining region in a related fig tree, Ficus hispida. Unlike F. microcarpa, which produces aerial roots and bears male and female flowers on the same tree, F. hispida produces distinct male and female trees and no aerial roots.

Understanding the evolutionary history of Ficus species and their wasp pollinators is important because their ability to produce large fruits in a variety of habitats makes them a keystone species in most tropical forests, said Ray Ming, a plant biology professor at the University of Illinois, Urbana-Champaign who led the study with Jin Chen, of the Chinese Academy of Sciences. Figs are known to sustain at least 1,200 bird and mammal species. Fig trees were among the earliest domesticated crops and appear as sacred symbols in Hinduism, Buddhism and other spiritual traditions.

The relationship between figs and wasps also presents an intriguing scientific challenge. The body shapes and sizes of the wasps correspond exactly to those of the fig fruits, and each species of fig produces a unique perfume to attract its specific wasp pollinator.

To better understand these evolutionary developments, Ming and his colleagues analyzed the genomes of the two fig species, along with that of a wasp that pollinates the banyan tree.

“When we sequenced the trees’ genomes, we found more segmental duplications in the genome of the banyan tree than in F. hispida, the fig without the aerial roots,” Ming said. “Those duplicated regions account for about 27% of the genome.”

The duplications increased the number of genes involved in the synthesis and transport of auxins, a class of hormones that promote plant growth. The duplicated regions also contained genes involved in plant immunity, nutrition and the production of volatile organic compounds that signal pollinators.

“The levels of auxin in the aerial roots are five times higher than in the leaves of trees with or without aerial roots,” Ming said. The elevated auxin levels appear to have triggered aerial root production. The duplicated regions also include genes that code for a light receptor that accelerates auxin production.

When they studied the genome of the fig wasp and compared it with those of other related wasps, the researchers observed that the wasps were retaining and preserving genes for odorant receptors that detect the same smelly compounds the fig trees produce. These genomic signatures are a signal of coevolution between the fig trees and the wasps, the researchers report.

Ming and his colleagues also discovered a Y chromosome-specific gene that is expressed only in male plants of F. hispida and three other fig species that produce separate male and female plants, a condition known as dioecy.

“This gene had been duplicated twice in the dioecious genomes, giving the plants three copies of the gene. But Ficus species that have male and female flowers together on one plant have only one copy of this gene,” Ming said. “This strongly suggests that this gene is a dominant factor affecting sex determination.”

Provided by University of Illinois