The alternative genome

Gil Ast

The human and chimpanzee genomes are 99% identical, including tiny mobile genetic elements, called Alu sequences, that are found only in primates. Alu sequences may be responsible for the creation of new proteins through alternative splicing, and these proteins may be what caused primates to diverge from other mammals. The split of humans from the other primates may also have been due to alternative splicing. Recent studies have shown that nearly identical genes in humans and chimpanzees make identical proteins in most tissues, except in certain areas of the brain. In these regions, certain human genes are more active, and others create significantly different proteins through alternative splicing of gene transcripts.

The conventional rule of "one gene, one protein" no longer holds. The more complex the organism, the more likely it is to have reached its high level of development by extracting many different protein meanings from single genes.

In the spring of 2000, molecular biologists were betting on the number of genes that would be found in the human genome when its nucleotide sequence was determined. Estimates even reached 153,000 genes. After all, many said, humans make about 90,000 different proteins, so we should have at least the same number of genes to code for them. And given our complexity, we should have greater genetic diversity than the roundworm Caenorhabditis elegans, which has 1,000 cells and a pool of 19,500 genes, or the corn plant, which has about 40,000 genes.

So, in the summer of 2000, when the research team published the first draft of the human genome sequence, some readers were surprised by the estimate that it contained only 30,000 to 35,000 protein-coding genes. The low number was almost embarrassing. In the years since the human genome map was completed, updated estimates have lowered the number even further, to fewer than 25,000. In the meantime, however, geneticists have begun to understand that the small number of genes in the human genome may actually be a mark of our sophistication, because humans make incredibly diverse use of so few genes.

Through a mechanism known as alternative splicing, the information stored in the genes of complex organisms can be edited in a variety of ways, so that one gene can encode two or more different proteins. By comparing the human genome to the genomes of other creatures, scientists are beginning to understand that alternative splicing accounts for much of the variation between creatures whose gene pools are relatively similar. In addition, alternative splicing allows different tissues in the same organism to perform different functions using the same small gene pool.

Indeed, the frequency of alternative splicing seems to increase with the complexity of the creature: about three quarters of all human genes are subject to alternative splicing. The alternative splicing mechanism itself probably contributed to the evolution of all animals, and it may contribute to our continued evolution. In the near term, scientists are beginning to understand how defective splicing of genes causes certain types of cancer and congenital diseases, as well as how the splicing mechanism can be used for medical treatment.

Decisive decisions

The importance of alternative splicing in the functioning of many organisms cannot be overstated. Life and death can depend on it, at least for a damaged cell that has to decide whether to go on living. Each cell constantly senses the conditions outside and inside it, so that it can decide whether to continue living or to destroy itself in a controlled process called apoptosis. Cells that cannot repair their DNA activate apoptosis. Craig B. Thompson of the University of Pennsylvania and his research partners recently showed that a gene known as Bcl-x, which is responsible for controlling apoptosis, undergoes alternative splicing to form one of two separate proteins, Bcl-x(L) or Bcl-x(S). The first protein suppresses apoptosis while the second promotes it. So the decision whether to live or die depends on which of the two proteins is produced from the gene by alternative splicing.

The fact that a cell can produce different proteins from a single gene was discovered about 25 years ago, but the phenomenon was considered rare. Recent comparisons between genomes have revealed that it is extremely common and of crucial importance, which has dramatically changed the classical view of how the information stored in genes is translated into proteins. Most of the familiar facts still hold: a complete genome contains all the instructions necessary to produce and maintain an organism, and these instructions are encoded in the four-letter language of the nucleotides that make up DNA (represented by the letters A, G, C and T). In human chromosomes, about three billion nucleotides are strung together on two complementary strands that form the DNA double helix (genomic DNA). When the instructions of a gene must be "expressed", the double helix unzips along a segment, allowing a single-stranded copy of the gene sequence to be made from RNA nucleotides (RNA being the "sibling" of DNA). Any nucleotide sequence of DNA transcribed into RNA in this way is called a gene. Some of the resulting RNA molecules are never translated into proteins but are used for maintenance and control operations inside the cell [see: The Hidden Genome - Diamonds in the Trash, W. Wayt Gibbs, Scientific American Israel, February 2004]. In the end, a cellular mechanism reads the RNA transcripts that do encode proteins and translates them into the appropriate sequence of amino acids, the building blocks of proteins. But first the primary RNA transcript must go through an editing process.

In 1977, Phillip A. Sharp of the Massachusetts Institute of Technology (MIT) and Richard J. Roberts of New England BioLabs discovered that these primary RNA transcripts are like books containing many meaningless chapters scattered throughout the text. These meaningless chapters, called introns, must be removed and the meaningful chapters joined together so that the RNA can tell a coherent story. In the cut-and-paste process known as splicing, the introns are cut out of the primary transcript and discarded. The segments of the transcript that contain meaningful protein-coding sequence, called exons, are joined together to form the final version of the transcript, known as messenger RNA (mRNA). (See the overview at the end of the article.)

But in 1980, Randolph Wall of the University of California, Los Angeles showed that this basic view of messenger RNA splicing, according to which all introns are always discarded and all exons are always joined to the messenger RNA, is not always correct. In fact, the cellular mechanism can "decide" to cut out an exon, or leave an intron or part of an intron, in the final transcript of the messenger RNA. This ability to alternatively edit primary RNA transcripts can considerably increase the coding capacity of each gene and gives the splicing mechanism enormous power to decide whether one type of protein will be produced in excess over other types of proteins encoded by the same gene.
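To get a feel for how exon skipping multiplies the coding capacity of a gene, here is a minimal sketch in Python. The gene, its exon names (E1 to E4) and the choice of which exons are optional are all hypothetical, and the model assumes each optional exon is included or skipped independently, which real genes need not obey.

```python
from itertools import product

def enumerate_isoforms(exons, cassette):
    """List every exon combination obtainable by skipping optional exons.

    exons    -- exon names in their order along the gene (hypothetical names)
    cassette -- the subset of exons that the cell may skip
    """
    optional = [name for name in exons if name in cassette]
    isoforms = []
    # Each optional exon is either kept (True) or skipped (False).
    for choice in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choice))
        # Exons that are not optional are always kept.
        isoforms.append(tuple(e for e in exons if included.get(e, True)))
    return isoforms

gene = ["E1", "E2", "E3", "E4"]          # toy gene, not a real one
isoforms = enumerate_isoforms(gene, cassette={"E2", "E3"})
for iso in isoforms:
    print("-".join(iso))
```

With two optional exons the toy gene yields four different messenger RNAs; every additional optional exon doubles the count in this simplified model.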

In 1984 Tom Maniatis, Michael Green and their collaborators at Harvard University developed a procedure that could be performed in vitro to identify the molecular mechanism responsible for the cutting of introns and the splicing of exons. The details of the mechanism's operation and how it is controlled are still being researched, but this research reveals an extremely complex system that I find fascinating.

The splicing mechanism

In complex organisms, two levels of molecular machinery are involved in the splicing of primary RNA transcripts. The basic machinery, well conserved during evolution, is present in every creature whose genome contains introns, from yeast to humans. It consists of five molecules known as small nuclear RNAs (snRNAs), labeled U1, U2, U4, U5 and U6. These molecules associate with 150 proteins to form a complex called the spliceosome, whose function is to recognize the start and end sites of the introns, cut them out of the primary RNA transcript and join the exons to create messenger RNA.

Four short nucleotide sequences within the introns serve as signals that tell the spliceosome where to cut [see the box at the end of the article]. One of these splicing signals lies at the beginning of the intron and is referred to as the 5′ splice site; the others, located at the end of the intron, are called the branch site, the polypyrimidine tract and, finally, the 3′ splice site.
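The four signals can be illustrated with a deliberately crude check, sketched here in Python. The article does not spell out the sequences involved; the sketch assumes the canonical rule that most introns begin with GT and end with AG in the DNA sequence, treats any A upstream of the tract as a possible branch point, and uses an arbitrary window and threshold for the pyrimidine-rich stretch. The function name, the parameters and the toy intron are invented for illustration, and real splice-site prediction is far more involved.

```python
def check_intron_signals(intron, tract_window=15, min_pyrimidine_fraction=0.7):
    """Report which of the four splicing signals a candidate intron shows.

    Simplifying assumptions (not taken from the article):
    - the 5' splice site is the dinucleotide GT at the start of the intron,
    - the 3' splice site is the dinucleotide AG at its end,
    - the polypyrimidine tract is a C/T-rich window just before the final AG,
    - any A upstream of that window counts as a branch-site candidate.
    """
    intron = intron.upper()
    tract = intron[-(tract_window + 2):-2]        # window just before the AG
    pyr_fraction = (
        sum(base in "CT" for base in tract) / len(tract) if tract else 0.0
    )
    upstream = intron[2:-(tract_window + 2)]      # between the GT and the tract
    return {
        "five_prime_site": intron.startswith("GT"),
        "three_prime_site": intron.endswith("AG"),
        "polypyrimidine_tract": pyr_fraction >= min_pyrimidine_fraction,
        "branch_site_candidate": "A" in upstream,
    }

# Toy intron: GT + filler containing an A + pyrimidine-rich stretch + AG
toy_intron = "GTAAGTGCTAACGG" + "TTTCTTTCCTTTCTC" + "AG"
print(check_intron_signals(toy_intron))
```

A sequence that fails any of the four checks would, in this toy model, not be treated as an intron by the basic machinery.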

A separate regulatory system controls the splicing process by directing the basic machinery to these splice sites. So far more than ten different splicing regulatory proteins (SR proteins) have been identified. Different SR proteins may be present in different tissues, or at different developmental stages of the same tissue. SR proteins can bind to short nucleotide sequences in the exons of the primary RNA transcript. These binding sites are called exonic splicing enhancers (ESE), because when an appropriate SR protein binds to the ESE, it recruits the snRNA molecules of the basic machinery to the splice sites adjacent to each end of the exon. But an SR protein can also bind to a sequence that acts as an exonic splicing silencer (ESS). This suppresses the ability of the basic machinery to bind to the ends of the exon and causes the exon to be cut out of the final messenger RNA transcript.

Skipping even one exon can have dramatic effects. In the fruit fly, for example, alternative splicing is used to control the fly's sex-determining pathways. When a gene known as Sex-lethal is expressed, a male exon can be skipped during splicing, so that a female Sex-lethal protein is produced. This protein can then bind to all the primary RNA transcripts subsequently produced from that gene and ensure that in all later splicing events the male exon is cut out and only female proteins are produced. If, on the other hand, the male exon remains in the RNA transcript in the first round of editing, an inactive messenger RNA is created that causes the fly's cells to develop along the male pathway.

Exon skipping is the most common type of alternative splicing in mammals. But several other types have been identified, one of which causes introns to remain in the final messenger RNA. This type, intron retention, is especially common in plants and in simpler multicellular organisms, and it is probably the earliest evolutionary version of alternative splicing. Even today, the splicing machinery of single-celled organisms such as yeast works by recognizing introns, in contrast to the SR protein system of more complex organisms, which works by defining exons for the basic splicing machinery.

In the unicellular system, the splicing machinery can recognize only intronic sequences of fewer than 500 nucleotides. Yeast has no problem with this limitation because it has very few introns, with an average length of only 270 nucleotides. But as genomes expanded during evolution, their intron sequences grew longer, and it is thought that the cellular splicing machinery had to change from a system that recognized short intronic sequences within exons to one that recognized short exons within seas of introns. For example, the average human protein-coding gene is 28,000 nucleotides long, with 8.8 exons separated by 7.8 introns. Exons are relatively short, usually about 120 nucleotides, while introns range in length from 100 to 100,000 nucleotides.
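A back-of-the-envelope calculation, sketched here in Python, shows what "seas of introns" means for the average gene described above. It uses only the three averages quoted in the text and treats every exon as exactly 120 nucleotides long, so the result is a rough estimate rather than a measured value; the variable names are arbitrary.

```python
# Averages quoted in the text for a human protein-coding gene
gene_length = 28_000      # nucleotides
exons_per_gene = 8.8
exon_length = 120         # nucleotides, "usually about 120"

# Total exonic sequence in the average gene, and its share of the gene
exonic_nucleotides = exons_per_gene * exon_length
exonic_fraction = exonic_nucleotides / gene_length

print(f"Exonic sequence: about {exonic_nucleotides:.0f} nucleotides")
print(f"Share of the gene: about {exonic_fraction:.1%}")
```

Under these assumptions, exons account for roughly 1,000 of the 28,000 nucleotides, which is less than 4% of the gene; the rest is intron.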

The human genome has more introns per gene than that of any other creature, and their length and number raise another interesting issue. The cost of preserving introns is high. A significant part of the energy we consume every day is devoted to the maintenance and repair of introns in DNA, to the transcription of the primary RNA and the removal of the introns, and even to the breakdown of the introns at the end of the splicing process. Furthermore, this system can make costly mistakes. Any error in the cutting and splicing of the primary RNA changes the protein-coding sequence and may even lead to the production of a defective protein.

For example, an inherited disease I research, familial dysautonomia, is caused by a single nucleotide mutation in a gene known as IKBKAP that causes it to undergo alternative splicing in the nervous system. Due to this, less normal IKBKAP protein is produced, and this causes poor development of the nervous system. About half of the patients with this disease die before the age of 30. At least 15% of the mutations that cause genetic diseases (and probably also certain types of cancer) affect primary RNA splicing. Why then has evolution preserved such a complex system capable of causing disease? Maybe because the advantages outweigh the disadvantages.

The advantages of alternatives

Alternative splicing allows humans to make more than 90,000 proteins without having to maintain 90,000 genes. Creating several types of messenger RNA from one gene enables the production of several types of proteins from that gene. On average, each of our genes produces about three different messenger RNA molecules through alternative splicing. But this number still does not explain our need for so many introns, nor does it explain why introns make up most of the sequence within genes, to the extent that exonic sequences account for only one to two percent of the human genome.

After the research teams that sequenced the genome discovered, in 2001, that most of the genome was seemingly empty of information, another mystery arose in 2002, after the publication of the mouse genome. It turned out that the mouse has almost the same number of genes as the human. Although about 100 million years have passed since the common ancestor of man and mouse lived, most of the genes in man and mouse originate from the same ancestor. Most of these genes have the same array of exons and introns, and the sequence of nucleotides in the exons is largely conserved. So the question arises, if human and mouse genes are so similar, what makes us so different from rodents?

Christopher J. Lee and Barmak Modrek of the University of California, Los Angeles recently discovered that a quarter of the exons that undergo alternative splicing in the two genomes are unique to either the human or the mouse. That is, these exons can create proteins that are unique to humans or to mice, so they may be responsible for the differences between the species. Indeed, one group of alternatively spliced exons is unique to primates (humans, great apes and monkeys) and may have contributed to the divergence of primates from other mammals. By studying the processes that give rise to these unique exons, we can begin to discover the advantages inherent in introns, and so justify the effort we invest in maintaining them.

The exons unique to primates originate from mobile genetic elements called Alu sequences, which belong to a larger group of elements called retrotransposons: short DNA sequences whose role seems to be to replicate themselves and then insert the copies back into the genome at random points, like small genomic parasites. Retrotransposons are found in almost all genomes and have had a major impact, contributing to the genome expansion that accompanied the evolution of multicellular organisms. Almost half of the human genome consists of mobile elements, of which Alu sequences are the most common.

Alu sequences are only 300 nucleotides long and have a unique sequence ending in a "poly-A tail." Our genome today contains about 1.4 million copies of Alu sequences and many of them continue to reproduce and insert themselves into new sites in the genome at a rate of once every 100 to 200 births.

Alu sequences were long considered genomic junk, but they are now gaining appreciation as scientists begin to understand how they can increase the protein-making capacity of genes. About 5% of the alternatively spliced exons in the human genome contain Alu sequences. These exons were apparently created when an Alu element "jumped" into an intron of a gene, which would normally have no negative effect on a primate because most introns are cut out and discarded. But following subsequent point mutations, the Alu sequence could transform the intron in which it resides into a genetically meaningful sequence, that is, an exon. This could happen if changes in the Alu sequence created new 5′ or 3′ splice sites within the intron and caused part of the intron to be recognized as an "exon" by the spliceosome. (Such mutations usually occur during cell division, when the genome is copied and copying errors slip in.)

If the new Alu exon undergoes alternative splicing, joining only some of the messenger RNA transcripts, the creature can enjoy the best of both worlds. By including the Alu exon, its cells can make a new protein. But this new ability does not impair the original function of the gene, because the previous types of messenger RNA are still produced when the Alu exon is cut out. A problem arises only when a mutant Alu sequence is spliced in permanently, meaning that it joins all the messenger RNA transcripts made from the gene; the absence of the old protein may then cause a genetic disease. To date, three such diseases caused by misplaced Alu sequences have been identified: Alport syndrome, Sly syndrome and OAT deficiency.

My research colleague and I have shown that to turn an intronic Alu element into a true exon only one letter in the Alu sequence needs to be changed. Currently, the human genome contains approximately 500,000 Alu sequences located within introns, approximately 25,000 of which can become new exons through the same point mutation. That is, Alu sequences have the potential to further enrich the pool of genetic information available for the creation of new human proteins.

Healing with the help of RNA

At least 400 research laboratories and about 3,000 researchers worldwide are trying to understand the complex processes involved in alternative splicing. Although this research is at an early stage, researchers agree that the findings point to future therapeutic possibilities, such as new gene therapy strategies that exploit the splicing mechanism to treat both inherited diseases and acquired ones such as cancer.

One approach is to direct a short sequence of RNA or DNA, called a complementary (antisense) sequence, so that it binds like a zipper to a unique target site in the patient's DNA or RNA. A complementary sequence can be inserted into cells to mask a particular splice site or another control sequence, thereby diverting the splicing activity to another site. Ryszard Kole of the University of North Carolina at Chapel Hill demonstrated this method for the first time in blood stem cells taken from patients with an inherited disease known as beta-thalassemia. In this disease, a defective 5′ splice site impairs the activity of the hemoglobin molecules responsible for transporting oxygen in the blood. By masking the mutation, Kole was able to redirect splicing back to the correct splice site and cause active hemoglobin to be produced.

Kole then showed that the same method could be used in cancer cells growing in culture. By masking a 5′ splice site in the transcript of the Bcl-x gene, which controls apoptosis, he was able to divert the splicing activity so that the cells produced the Bcl-x(S) type of RNA transcript instead of Bcl-x(L). This caused the cancer cells to produce less of the protein that suppresses apoptosis and more of the protein that promotes it. In some cancer cells, this change activated the apoptosis pathway, while in other cells it increased the apoptotic effect of chemotherapy given concurrently with the masking sequences.
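The zipper-like binding that masking relies on is ordinary base pairing, which can be sketched in a few lines of Python. The example assumes simple Watson-Crick pairing in RNA (A with U, G with C) and an exact, full-length match; the transcript and the six-letter target below are toy sequences invented for illustration, not the sequences used in the experiments described above.

```python
# Watson-Crick pairing rules for RNA
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def antisense_rna(target):
    """Return the RNA sequence that pairs with `target`, written 5' to 3'."""
    # The two strands pair in opposite directions, hence the reversal.
    return "".join(COMPLEMENT[base] for base in reversed(target.upper()))

def binding_sites(transcript, oligo):
    """Return the start positions in `transcript` that `oligo` can mask.

    Only perfect matches along the whole oligo are reported (an assumption;
    real oligonucleotides tolerate some mismatches).
    """
    target = antisense_rna(oligo)       # the stretch the oligo pairs with
    transcript = transcript.upper()
    return [
        i for i in range(len(transcript) - len(target) + 1)
        if transcript[i:i + len(target)] == target
    ]

splice_site = "GUAAGU"                  # toy site to be masked
oligo = antisense_rna(splice_site)      # the masking sequence
toy_transcript = "AAG" + splice_site + "CC"
print(oligo, binding_sites(toy_transcript, oligo))
```

Requiring a long, exact match is what makes a masking sequence specific: the longer the oligonucleotide, the less likely it is to pair with anything other than its intended site.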

Another way to use the alternative splicing mechanism for therapeutic purposes was demonstrated in 2003 by Adrian R. Krainer and Luca Cartegni of Cold Spring Harbor Laboratory on Long Island, New York. They found a way to make cells add to the RNA transcript an exon that they would otherwise skip. They created a synthetic molecule that could be programmed to bind to any piece of RNA according to its sequence, and then attached to it the region of the SR protein that binds RNA. This molecule can therefore both bind to a unique sequence on the primary RNA transcript and recruit the basic splicing machinery to the correct splice site. Krainer and Cartegni used this method in human cells growing in culture to correct splicing errors in mutant versions of the BRCA1 gene, which is involved in breast cancer, and the SMN2 gene, which causes spinal muscular atrophy.

A third method exploits the ability of the spliceosome to join two different primary RNA transcripts generated from the same gene to create a composite messenger RNA molecule. This type of event, called trans-splicing, is common in worms but rarely occurs in human cells. If we succeed in forcing the spliceosome to trans-splice, we can cut out a mutant region of the primary RNA that causes a disease and replace it with a sequence that codes for a normal protein. Not long ago, John F. Engelhardt of the University of Iowa used this method in cultured cells to partially repair the primary RNA of a gene that produces a defective protein in the cells lining the airways of cystic fibrosis patients.

Before the decoding of the human genome, only a handful of scientists believed that an organism as complex as a human could exist with only 25,000 genes. Since the sequence was completed, alternative splicing has emerged as the central process that allows a small number of genes to produce a much larger variety of proteins required for building the body and brain and also for controlling the production of proteins in different tissues and at different times. Furthermore, the splicing process explains how the enormous variation between humans, mice and probably all mammals can arise from genomes that are so similar.

Evolution presents organisms with new possibilities and then selects those that give them an advantage. Thus, new proteins created through the splicing of new exons derived from Alu sequences likely helped make humans what they are today. Continued research on alternative splicing of RNA holds the promise of further improving the quality of our lives.

Overview / The complexity of cutting and pasting

A cellular mechanism can edit the instructions encoded in a single gene and extract several meanings from it. Thus, a small pool of protein-coding genes can produce a much larger variety of proteins.

It has been understood for a long time that such alternative splicing of genetic information is possible, but only when the genome sequences of man and other creatures were decoded and it was possible to compare them did geneticists realize how common the alternative splicing is in complex organisms and how much this mechanism contributes to the creation of diversity between creatures with similar gene pools.

Alternative splicing allows a minimal number of genes to produce and maintain highly complex organisms by controlling the timing of gene expression, the type of proteins the organism produces and their location. Humans will soon be able to control the splicing process of human genes to fight disease.

One gene, many proteins

The conventional explanation of gene expression was simple: first the gene is transcribed from DNA to RNA, then the cellular splicing machinery cuts out "junk" sequences called introns and joins the meaningful parts, called exons, into a final RNA version (messenger RNA). Finally, the messenger RNA is translated into a protein. However, it turns out that these rules do not always hold. In complex organisms, the primary RNA transcript can undergo alternative splicing; that is, sometimes exons are thrown out like introns, or parts of introns are retained in the final transcript. This creates a variety of messenger RNA molecules, and therefore a variety of proteins, from one gene.

Distinct expression of genes

A DNA sequence is transcribed into a single-stranded copy made of RNA. Cellular machinery then cuts and splices the primary transcript. Each intron in the transcript is defined by unique nucleotide sequences at its beginning and end, called the 5′ and 3′ splice sites respectively. The machinery removes the introns and discards them, and joins the exons into the messenger RNA version of the gene, the version that the cell will eventually translate into a protein.
*Gil Ast is a senior lecturer in the Department of Human Heredity and Molecular Medicine at the School of Medicine at Tel Aviv University. His research focuses on the molecular mechanism of primary RNA splicing, the evolution and control of alternative splicing and splicing defects associated with cancer and hereditary diseases. Recently, he collaborated with scientists at the Israeli company Compugen to develop a bioinformatics system to predict alternative splicing events to identify new versions of proteins.
