Molecular diagnostics

Dr. István Balogh, Dr. János Kappelmayer, Dr. József Tőzsér (2011)

University of Debrecen

Chapter 1. Biological information

The main purpose of molecular diagnostic procedures is to detect qualitative and/or quantitative changes in the human genome. In its wider sense, molecular diagnostics involves all molecular biological methods that analyze the background of inborn errors; therefore, in addition to the genetic and genomic approaches, it also includes proteomic technologies. This chapter, however, deals with molecular diagnostics in the narrower sense, which covers only the analysis of nucleic acids.

The human genome consists of two main elements: nuclear DNA and mitochondrial DNA. Mitochondrial DNA is organized into one circular DNA molecule, which is 16.6 kilobases (kb) long. From the perspective of molecular pathogenesis, it is important that mitochondrial DNA is inherited only through the mother. Mitochondrial DNA contains 13 protein-coding genes without introns, which distinguishes them from the majority of nuclear genes, as the latter usually contain introns. The nuclear genome is 3.1 gigabases (Gb) long and is organized into chromosomes. The human chromosome set contains 22 different autosomes and two sex chromosomes, X and Y. There are approximately 20,000 protein-coding genes, with a gene density of about 1 gene per 120 kb. Human genes show extreme variability in their length and in their exon number. The average human gene consists of 10 exons. The highest number of exons found in any human gene is 363 (in the gene encoding titin, a protein expressed in muscle). The largest human gene is the dystrophin gene (2400 kb). The encoded protein of the same name is also expressed in muscle tissue. Mutations in the dystrophin gene lead to Duchenne/Becker muscular dystrophy.

The functional expression of the genetic information coded in the DNA is regulated in many different ways (Figure 1.1).

Figure 1.1. Organization of the biological information from the perspective of genetics

The general direction of the flow of genetic information is DNA → RNA → protein. The source of the original information is the genomic or mitochondrial DNA. DNA is transcribed to RNA, and the genetic information is finally present in the expressed proteins (Figure 1.1a).
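The DNA → RNA → protein flow can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a made-up example sequence; the function names `transcribe` and `translate` are illustrative only, and regulation, splicing and the other processing steps are deliberately ignored.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; amino acids are listed with codon positions 1, 2, 3
# each cycling through U, C, A, G ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence matching a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequence, not a real gene.
dna = "GGATGGCCTGGTAAGC"
mrna = transcribe(dna)
print(mrna)             # GGAUGGCCUGGUAAGC
print(translate(mrna))  # MAW
```

The sketch reads only A, C, G and U; any other character in the input raises a `KeyError`, which is acceptable for an illustration but would need handling in real use.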

All levels are affected by complex regulatory processes, many details of which are still unresolved. Complexes formed between DNA and proteins, together with the epigenetic modifications of DNA, define the active regions. Transcriptional regulation can be observed in the different patterns of splicing, in tissue-specific protein expression and in the use of small regulatory RNA molecules. Once translated, proteins may be subject to extensive post-translational modifications (Figure 1.1b).

The average length of an exon is 300 base pairs. The main functional expression of the genetic information is protein synthesis. The genetic information coded in the genes is first transcribed to RNA, which is then translated to protein. The primary transcript contains the entire sequence of the gene, including the introns. During the maturation of the mRNA, the introns are spliced out. The mRNA molecule also undergoes the modifications described below:

  • 5’ capping. The mRNA molecule is modified at its 5’ end with a 7-methylguanosine. The main functions of the 5’ cap are to protect the mRNA from 5’-3’ exonuclease digestion, to facilitate the transport of the mRNA to the cytoplasm, and to facilitate splicing and the binding of the ribosomal apparatus.

  • Polyadenylation signal. The mRNA molecule is modified at its 3’ end with an adenine tail approximately 200 nucleotides long. The site of addition is marked by the AAUAAA sequence motif. The mRNA is cleaved approximately 15-30 nucleotides downstream from this signal and the adenines are added. The roles of polyadenylation are similar to those of the 5’ cap: it supports the cytoplasmic transport of the mRNA, stabilizes the mRNA and facilitates recognition by the ribosomal apparatus. A summary of the mRNA modifications is shown in Figure 1.2.
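The cleavage and polyadenylation step described above can be sketched in Python as follows. This is a simplified illustration, not a model of the real enzymatic machinery: it assumes the canonical AAUAAA hexamer as the signal, takes the last occurrence of the signal in the transcript, and uses a fixed cleavage offset; the function name `polyadenylate` and its parameters are hypothetical.

```python
POLY_A_SIGNAL = "AAUAAA"  # canonical polyadenylation signal hexamer (assumed here)


def polyadenylate(pre_mrna: str, cleavage_offset: int = 20, tail_length: int = 200) -> str:
    """Cleave downstream of the poly(A) signal and append an adenine tail.

    cleavage_offset is the number of nucleotides kept after the signal
    (the text gives roughly 15-30); tail_length is the number of adenines added.
    """
    # Assumption: the relevant signal is the last one in the transcript (3' region).
    signal_start = pre_mrna.rfind(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no polyadenylation signal found")
    cleavage_site = signal_start + len(POLY_A_SIGNAL) + cleavage_offset
    if cleavage_site > len(pre_mrna):
        raise ValueError("transcript ends before the expected cleavage site")
    return pre_mrna[:cleavage_site] + "A" * tail_length


# Hypothetical toy transcript: short body, signal, then 30 downstream nucleotides.
transcript = "GGCAUG" + POLY_A_SIGNAL + "C" * 30
processed = polyadenylate(transcript, cleavage_offset=20, tail_length=10)
print(processed)
```

In this toy example the 30 downstream nucleotides are trimmed to 20 and ten adenines are appended; a short tail is used only to keep the printed output readable.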

The importance of the sequence motifs around the polyadenylation signal is highlighted by the fact that mutations occurring in this region might be pathogenic by altering cleavage and polyadenylation (prothrombin gene 20210A allele). The wild-type allele in this case is the guanine at nucleotide position 20210. When the G to A mutation occurs, the result is a more stable or better processed mRNA molecule, which serves as a template for more effective protein synthesis. The increased amount of newly synthesized prothrombin is secreted, and the plasma level of prothrombin is elevated as a molecular phenotypic consequence. The elevated plasma level of prothrombin increases the risk of venous thrombosis 2- to 3-fold compared to individuals who do not carry the mutant allele. The importance of this mutation is twofold. It is not only a good example of a defect in the fine-tuning of the regulation of gene expression, but its prevalence might also be as high as 2-4% in the general population.

Figure 1.2. From the gene to the protein.

Almost all protein-coding genes contain introns. Non-coding introns constitute the largest part of the gene; coding exons are normally much shorter. During transcription, the primary RNA transcript contains the introns, and splicing produces the intronless mRNA. For splicing to take place correctly, well-defined signals that mark the exon-intron boundaries are necessary. Figure 1.3 shows a gene segment with one intron and two exons. The 5’ end of the intron is the donor splice site and the 3’ end is the acceptor site. Within the intron, an invariant GT dinucleotide immediately follows the upstream exon and an invariant AG dinucleotide immediately precedes the downstream exon (numbers above the letters show the relative occurrence of the given nucleotide). The adenine in the branch point sequence is also invariant. Y means either cytosine or thymine; N can be any nucleotide. Mutations in the invariant sites have a fundamental effect on splicing, as a result of which the proteins generated are often decreased in amount or altered in structure. In addition to the GT-AG introns, there are other types of introns, although they are very rare. The importance of the splicing signals is highlighted by the fact that almost all mutations affecting them are pathogenic.
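The GT-AG rule can be made concrete with a small Python sketch. It assumes that the intron coordinates are already known and checks only the two invariant dinucleotides, ignoring the branch point and the rest of the consensus sequence; the function name `splice` and the example sequence are illustrative, not taken from a real gene.

```python
def splice(gene: str, introns: list) -> str:
    """Remove introns given as (start, end) pairs, 0-based with end exclusive.

    Each intron must begin with the donor dinucleotide GT and finish with the
    acceptor dinucleotide AG; otherwise a ValueError is raised.
    Overlapping introns are not checked in this sketch.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = gene[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} violates the GT-AG rule")
        exons.append(gene[position:start])  # keep the exon before this intron
        position = end
    exons.append(gene[position:])  # keep the last exon
    return "".join(exons)


# Hypothetical gene segment: exon 1, one intron, exon 2.
exon1, intron, exon2 = "ATGGCC", "GTAAGTCTAACTTTCAG", "TGGTAA"
gene = exon1 + intron + exon2
print(splice(gene, [(len(exon1), len(exon1) + len(intron))]))  # ATGGCCTGGTAA

# A donor-site mutation (GT -> AT) makes the check fail.
mutant = exon1 + "AT" + intron[2:] + exon2
try:
    splice(mutant, [(len(exon1), len(exon1) + len(intron))])
except ValueError as error:
    print(error)
```

In a cell, a mutated splice site usually leads to exon skipping, intron retention or the use of a cryptic site rather than a simple failure; raising an error here only flags that the invariant signal is missing.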

Figure 1.3. Consensus sequences of the exon-intron boundary