Comparing genomes (continued)
Insects: Drosophila and Anopheles
The insects are the most species-rich and morphologically diverse animal group on earth. Two insect genomes have been sequenced. Drosophila melanogaster has been a laboratory model for genetic studies for much of the last century, and is arguably the best understood genetic system in biology. Anopheles gambiae, the malaria mosquito, and Plasmodium falciparum, the protistan parasite it transmits, together have an enormous impact on the world's health, and the genomes of both were sequenced in 2002.
The fruit fly Drosophila and the mosquito Anopheles are separated by approximately 250 million years of evolution, and appear to have evolved more rapidly over that interval than vertebrates. The extent of similarity between these two insects is comparable to that between humans and pufferfish, which diverged 450 million years ago. The organization of genes on the chromosomes has undergone significant shuffling between the two insect species. Interestingly, Drosophila has less noncoding DNA than Anopheles, although the evolutionary force driving this reduction in noncoding regions is not clear.
The Protist Plasmodium
The parasitic protist Plasmodium falciparum, which causes malaria, has a relatively small genome of 23 Mb that proved very difficult to sequence, as it has an unusually high proportion of adenine and thymine. The project took five years to complete. P. falciparum appears to have about 5300 genes, with genes of related function clustered together, suggesting they might share the same regulatory DNA.
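The "high proportion of adenine and thymine" mentioned above is a simple base count. The following is a minimal Python sketch of how such a proportion could be computed; the function name at_fraction and the example sequence are hypothetical and chosen only for illustration, not taken from the Plasmodium sequencing project.

```python
def at_fraction(sequence: str) -> float:
    """Return the fraction of A and T among the unambiguous bases (A, C, G, T).

    Any other character (for example N for an unknown base) is ignored.
    """
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return at / total


# Made-up example sequence, not real Plasmodium DNA.
example = "ATATTTAAATGCATATTAAT"
print(f"AT fraction: {at_fraction(example):.2f}")
```

A genome whose AT fraction is far from one half contains long stretches of low-complexity sequence, which is one reason overlapping sequence fragments are harder to assemble unambiguously.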
One interesting finding concerns an odd subcellular component called the apicoplast, found only in Plasmodium and its relatives. It appears to be derived from a chloroplast appropriated from algae consumed by the parasite's ancestor. Analysis of the Plasmodium genome reveals that about 12% of all the parasite's proteins, encoded by the nuclear genome, head for the apicoplast. There these proteins produce fatty acids; the apicoplast is the only place the parasite makes the fatty acids it needs to survive. This suggests that drugs targeted at this biochemical pathway might be very effective against malaria.
Flowering Plants: Rice and Arabidopsis
Few plant genomes have been sequenced, the first being Arabidopsis thaliana, the wall cress, a tiny member of the mustard family often used as a model organism for studying plant molecular genetics and development. The genome sequence was largely completed in 2000. It has 25,948 genes, about as many genes as humans have.
While the importance of Arabidopsis is largely as an experimental model with no commercial significance, the second plant genome for which a draft sequence has been prepared, rice, is of enormous economic significance. Oryza sativa belongs to the grass family, which includes maize (corn), wheat, barley, sorghum, and sugarcane. Together these crops provide most of the world's food and animal feed. Unlike most grasses, rice has a relatively small genome of 430 Mb (the maize genome is 2500 Mb, and the barley genome is an enormous 4900 Mb). Two different varieties of rice were sequenced, yielding similar results. The proportion of the rice nuclear genome devoted to repetitive DNA, for example, was 42% in one variety, and 45% in the other. Retrotransposons, the most numerous large repeats, account for more than 15% of the rice genome in each study.
The rice genome has proven to contain a surprisingly large number of genes. Both rice draft sequences place the gene number for Oryza higher than any other genome yet sequenced. One study suggests 53,000 to 63,000 genes; the other, using more conservative criteria, indicates 33,000 to 50,000 genes. These numbers will become more precise as genome annotation continues.
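To put the percentages above in absolute terms, the short Python sketch below converts them to megabases using only the figures quoted in this paragraph. The variable and function names are illustrative assumptions, and the results are rough estimates from rounded draft-sequence numbers, not published values.

```python
# Genome sizes in megabases (Mb), as quoted in the text.
RICE_GENOME_MB = 430
MAIZE_GENOME_MB = 2500
BARLEY_GENOME_MB = 4900


def fraction_to_mb(genome_mb: float, fraction: float) -> float:
    """Convert a fraction of a genome into megabases."""
    return genome_mb * fraction


# Repetitive DNA reported for the two rice varieties (42% and 45%).
repeats_low = fraction_to_mb(RICE_GENOME_MB, 0.42)
repeats_high = fraction_to_mb(RICE_GENOME_MB, 0.45)

# Retrotransposons account for more than 15%, so this is a lower bound.
retro_min = fraction_to_mb(RICE_GENOME_MB, 0.15)

print(f"Repetitive DNA in rice: {repeats_low:.1f} to {repeats_high:.1f} Mb")
print(f"Retrotransposons in rice: more than {retro_min:.1f} Mb")
print(f"Maize genome is {MAIZE_GENOME_MB / RICE_GENOME_MB:.1f} times the rice genome")
print(f"Barley genome is {BARLEY_GENOME_MB / RICE_GENOME_MB:.1f} times the rice genome")
```

By this estimate, roughly 180 to 195 Mb of the rice genome is repetitive DNA, and the maize and barley genomes are about 6 and 11 times larger than the rice genome, respectively.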
More than 80% of the genes found in rice are also found in Arabidopsis. Among the other 20% must be the genes responsible for the many physiological and morphological differences between rice (a monocot) and Arabidopsis (a dicot), very different kinds of flowering plants.
About one-third of the genes in Arabidopsis and rice appear to be plant-specific genes, not found in any animal or fungal genome sequenced so far. These include the many thousands of genes involved in photosynthesis and photosynthetic anatomy. Among the other plant genes are many that are very similar to those found in animal and fungal genomes, particularly genes involved in basic intermediary metabolism, in genome replication and repair, and in protein synthesis.
Both rice and Arabidopsis have higher copy numbers for gene families (multiple slightly divergent copies of a gene) than are seen in animals or fungi, spread among unlinked chromosomes amidst clusters of other duplicated genes. This suggests that these plants have undergone numerous episodes of polyploidy and/or segmental duplication during the 150 to 200 million years since rice and Arabidopsis diverged from a common ancestor.