University of Oulu

Ojeda, D., Mattila, T., Ruttink, T., Kujala, S., Kärkkäinen, K., Verta, J., Pyhäjärvi, T. (2019) Utilization of Tissue Ploidy Level Variation in de Novo Transcriptome Assembly of Pinus sylvestris. G3: Genes, Genomes, Genetics, October 1, 2019, vol. 9, no. 10, 3409-3421.

Utilization of tissue ploidy level variation in de novo transcriptome assembly of Pinus sylvestris

Author: Ojeda, Dario I.¹; Mattila, Tiina M.²; Ruttink, Tom³;
Organizations: ¹ Department of Ecology and Genetics, Norwegian Institute of Bioeconomy Research, 1433, Ås, Norway
² Department of Ecology and Genetics
³ Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), 9090 Melle, Belgium
⁴ Department of Ecology and Genetics, Natural Resources Institute Finland (Luke), 90570, Oulu, Finland
⁵ Natural Resources Institute Finland (Luke), 90570, Oulu, Finland
⁶ Organismal and Evolutionary Biology, University of Helsinki, P.O. Box 3, 00014, Helsinki, Finland
⁷ Department of Ecology and Genetics, Biocenter Oulu, University of Oulu, P.O. Box 8000, FI-90014, Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 1.4 MB)
Language: English
Published: Genetics Society of America, 2019
Publish Date: 2020-01-31


Compared to angiosperms, gymnosperms lag behind in the availability of assembled and annotated genomes. Most genomic analyses in gymnosperms, especially conifer tree species, rely on de novo assembled transcriptomes. However, the level of allelic redundancy and transcript fragmentation in these assembled transcriptomes, and their effect on downstream applications, have not been fully investigated. Here, we assessed three assembly strategies for short-read data, including the utility of haploid megagametophyte tissue as a single-allele guide during de novo assembly, for six individuals and five different tissues in Pinus sylvestris. We then contrasted haploid and diploid tissue genotype calls obtained from the assembled transcriptomes to evaluate the extent of paralog mapping. The use of the haploid tissue during assembly increased its completeness without reducing the number of assembled transcripts. Our results suggest that current strategies that rely on available genomic resources as guidance to minimize allelic redundancy are less effective than strategies that cluster redundant assembled transcripts. The strategy yielding the lowest level of allelic redundancy among the assembled transcriptomes assessed here was the generation of SuperTranscripts with Lace followed by CD-HIT clustering. However, we still observed some heterozygosity (multiple gene fragments per transcript reflecting allelic redundancy) in this assembled transcriptome on the haploid tissue, indicating that further filtering is required before using these assemblies for downstream applications. We discuss the influence of allelic redundancy when these reference transcriptomes are used to select regions for probe design of exome capture baits and for estimation of population genetic diversity.


Series: G3. Genes, genomes, genetics
ISSN: 2160-1836
ISSN-E: 2160-1836
ISSN-L: 2160-1836
Volume: 9
Issue: 10
Pages: 3409 - 3421
DOI: 10.1534/g3.119.400357
Type of Publication: A1 Journal article – refereed
Field of Science: 1184 Genetics, developmental biology, physiology
Funding: This work was funded by the Academy of Finland (287431 and 293819 to TP, GENOWOOD no. 307582), and the European Commission (EVOLTREE
Academy of Finland Grant Number: 287431
Detailed Information: 287431 (Academy of Finland Funding decision)
293819 (Academy of Finland Funding decision)
307582 (Academy of Finland Funding decision)
Copyright information: © 2019 Ojeda et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.