An Informatics System for Transcriptome Analysis Data
TransAtlasDB is a bioinformatics system that combines a relational database with a NoSQL database system.
The relational database, MySQL, enforces data assurance and integrity, while the NoSQL database system, FastBit, provides fast query performance on big data.
Hence, both database systems must be installed and added to the system's or user's executable path.
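A quick way to see which prerequisites are already on the executable path is sketched below. `missing_tools` is a hypothetical helper, not part of TransAtlasDB. The executable names used in the example (`perl`, the MySQL client `mysql`, and FastBit's command-line tools `ibis` and `ardea`) are assumptions, and the exact set TransAtlasDB calls may differ.

```shell
# missing_tools (hypothetical helper): print every named command that is not
# found on PATH, one per line, and return 1 if at least one is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example (assumed executable names; adjust to your installation):
# missing_tools perl mysql ibis ardea || echo "install the tools listed above first"
```

An empty output with exit status 0 means every listed command was found.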
First, download the appropriate databases and required Perl modules from the Usage & Downloads page before proceeding to the next section.
Assume that we have downloaded the TransAtlasDB toolkit and unpacked the TAR package using
You will see that the directory contains several Perl programs with the .pl suffix.
IMPORTANT: Make sure the TransAtlasDB folder is in your desired permanent location before proceeding, preferably the
NOTE: If you already added the transatlasdb path into your system executable path, then typing
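The sketch below shows one way to add the TransAtlasDB folder to the executable path for the current shell. The location `$HOME/TransAtlasDB` is only a placeholder, and `add_to_path` is a hypothetical helper. Appending the same lines to a shell startup file such as `~/.bashrc` makes the change permanent.

```shell
# add_to_path (hypothetical helper): append a directory to PATH unless it is
# already listed, so repeated calls do not create duplicate entries.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                        # already on PATH, nothing to do
    *) PATH="$PATH:$1"; export PATH ;;
  esac
}

# Placeholder location; replace with wherever TransAtlasDB was unpacked.
# add_to_path "$HOME/TransAtlasDB"
```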
The TransAtlasDB database system and its necessary components must be installed to a local disk using INSTALL-tad.pL.
IMPORTANT: TransAtlasDB installation requires superuser (root) privileges.
Only the
# Syntax for installing TransAtlasDB using defaults and mysql password as 'mysql-password'.
sudo perl transatlasdb-location/INSTALL-tad.pL -password mysql-password
Otherwise, arguments such as the mysql username
# Syntax for installing TransAtlasDB specifying username as 'mysql-username' and databasename as 'mysql-databasename'.
sudo perl INSTALL-tad.pL -p mysql-password -username mysql-username -databasename mysql-databasename
The NoSQL folder-name
# Syntax for installing TransAtlasDB specifying username as 'mysql-username' and databasename as 'mysql-databasename' with nosql folder 'nosqlfolder'.
sudo perl INSTALL-tad.pL -p mysql-password -u mysql-username -d mysql-databasename -fastbitname nosqlfolder
The installation module needs to be run only once per local disk, to prevent database access conflicts. If such a conflict does arise, the user settings can be viewed and/or corrected using connect-tad.pL.
# Syntax for viewing user installation details.
connect-tad.pL
# Syntax for setting existing user installation details. Parameters are the same as the INSTALL-tad.pL script.
connect-tad.pL -p mysql-password -u mysql-username -d mysql-databasename -fastbitname nosqlfolder
Transcriptome analysis data can be imported using tad-import.pl.
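The sample information, commonly known as sample metadata, consists of the details needed to uniquely identify each specimen used for RNA-seq.
The sample metadata can be imported via the -metadata option, either from the FAANG-format Excel spreadsheet or, with -t, from a tab-delimited file.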
# Syntax for importing the example FAANG metadata provided.
tad-import.pl -metadata example/metadata/FAANG/FAANG_GGA_UD.xlsx
# Alternatively, import the tab-delimited example file provided.
tad-import.pl -metadata -t example/metadata/TEMPLATE/metadata_GGA_UD.txt
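A common cause of failed tab-delimited imports is a row with missing or extra columns. The sketch below checks for this before running tad-import.pl. It assumes that the first line of the file is the header row, which may not hold for every template. `check_tsv_columns` is a hypothetical name.

```shell
# check_tsv_columns (hypothetical helper): compare the number of tab-separated
# fields on every non-blank line with the header (first) line. Prints each
# mismatch and returns 1 if any were found.
check_tsv_columns() {
  awk -F'\t' '
    NR == 1 { n = NF; next }
    NF == 0 { next }
    NF != n { printf "line %d: %d fields (expected %d)\n", NR, NF, n; bad = 1 }
    END     { exit bad }
  ' "$1"
}
```

For example, `check_tsv_columns example/metadata/TEMPLATE/metadata_GGA_UD.txt && tad-import.pl -metadata -t example/metadata/TEMPLATE/metadata_GGA_UD.txt` imports the file only if no mismatched rows are reported.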
Transcriptome analysis results can be imported using the -data2db option, together with -all, -gene or -variant to select the type of results.
# Syntax for importing all RNASeq expression and variant results for sample 'GGA_UD_1004'.
tad-import.pl -data2db -all example/sample_sxt/GGA_UD_1004
# Syntax for importing RNASeq expression profiling information for sample 'GGA_UD_1004'.
tad-import.pl -data2db -gene example/sample_sxt/GGA_UD_1004
# Syntax for importing RNASeq variant results for sample 'GGA_UD_1004'.
tad-import.pl -data2db -variant example/sample_sxt/GGA_UD_1004
The variant functional annotations predicted by either VEP or ANNOVAR can also be imported using the additional -vep or -annovar flag.
# Syntax for importing RNASeq variant results with VEP annotations for sample 'GGA_UD_1014'.
tad-import.pl -data2db -variant -vep example/sample_sxt/GGA_UD_1014
# Syntax for importing RNASeq data with ANNOVAR annotations for sample 'GGA_UD_1004'.
tad-import.pl -data2db -all example/sample_sxt/GGA_UD_1004 -annovar
NOTE: The analysis data for each sample should be stored in a single folder, and the folder name must match the sample name as given in the sample metadata. A typical folder directory structure is shown here.
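Because each sample has its own folder named after the sample, several samples can be imported in one pass. The sketch below simply runs the documented `tad-import.pl -data2db -all <folder>` command once per sub-folder. `import_all_samples` and the `TAD_IMPORT` override variable are hypothetical names, and the sketch assumes every sub-folder of the parent directory is a sample folder.

```shell
# import_all_samples (hypothetical helper): run "tad-import.pl -data2db -all"
# once for every sub-folder of the given parent directory.
# Set TAD_IMPORT to override the importer command (e.g. a full path).
import_all_samples() {
  importer=${TAD_IMPORT:-tad-import.pl}
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue           # glob matched nothing
    dir=${dir%/}                        # drop the trailing slash
    "$importer" -data2db -all "$dir" ||
      printf 'import failed: %s\n' "${dir##*/}" >&2
  done
}

# Example: import_all_samples example/sample_sxt
```

A failed import is reported and the loop moves on to the next folder. This matters because sample results can be imported only once, so rerunning the loop will fail for samples that are already in the database.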
Transcriptome analysis data previously stored can be retrieved using tad-export.pl.
NOTE: Analysis data can also be retrieved using the web interface.
The export module offers two methods of extracting data from the database: one performs data manipulation language (DML) SQL statements using -query, and the other runs pre-defined user statements using -db2data.
Users can execute SQL statements directly via the -query option. Adding -nosql with a folder name sends the query to the FastBit (NoSQL) database instead.
More information on writing SELECT statements for FastBit can be viewed here.
For instance, executing 'show tables' lists all the tables currently in the database, and the results of any query can be stored as a tab-delimited file.
# Syntax to view all the tables in the database.
tad-export.pl -query 'show tables'
# Syntax to retrieve all the rows in the Sample table and store the results in a tab-delimited file 'output.txt'.
tad-export.pl -query 'select * from Sample' -output output.txt
# Syntax to view the first ten rows of the gene-information folder in the NoSQL database.
tad-export.pl -nosql gene-information -query 'select genename,organism,tissue,fpkm,tpm where 1=1 limit 10'
The pre-defined user statements can be accessed via -db2data together with one of -avgfpkm, -genexp, -chrvar or -varanno.
# Syntax to view the expression values of genes 'MST' and 'GDF' for the 'Gallus gallus' species in the database.
tad-export.pl -db2data -avgfpkm -gene 'MST,GDF' -species 'Gallus gallus'
# To export prior syntax as a tab-delimited file 'output.txt'.
tad-export.pl -db2data -avgfpkm -gene 'MST,GDF' -species 'Gallus gallus' -o output.txt
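The -gene option takes a comma-separated list of names. If the genes of interest are kept in a text file with one name per line (`genes.txt` is a hypothetical example), a small helper can build that argument. `gene_list` is an illustrative name and is not part of TransAtlasDB.

```shell
# gene_list (hypothetical helper): join one gene name per line into the
# comma-separated form used by -gene (e.g. 'MST,GDF').
# Spaces, tabs, carriage returns and blank lines are dropped.
gene_list() {
  tr -d ' \t\r' < "$1" | grep -v '^$' | paste -s -d, -
}
```

Usage: `tad-export.pl -db2data -avgfpkm -gene "$(gene_list genes.txt)" -species 'Gallus gallus' -o output.txt`.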
# Syntax to view all the genes of all samples in the database for 'Gallus gallus' organism.
tad-export.pl -db2data -genexp -species 'Gallus gallus'
# To export prior syntax as a tab-delimited file 'output.txt'.
tad-export.pl -db2data -genexp -species 'Gallus gallus' -o output.txt
# Syntax to export expression profiles of genes 'OPTN' and 'GDF' for sample 'GGA_UD_1004' to 'output.txt'.
tad-export.pl -db2data -genexp -species 'Gallus gallus' -gene 'OPTN,GDF' -sample 'GGA_UD_1004' -o output.txt
# Syntax to view chromosome variant counts for all samples in the database for 'Gallus gallus' organism.
tad-export.pl -db2data -chrvar -species 'Gallus gallus'
# To export prior syntax as a tab-delimited file 'output.txt'.
tad-export.pl -db2data -chrvar -species 'Gallus gallus' -o output.txt
# Syntax to export variant counts for chromosomes 'chr1,chr2,chr3' for all samples to 'output.txt'.
tad-export.pl -db2data -chrvar -species 'Gallus gallus' -chromosome 'chr1,chr2,chr3' -o output.txt
# Syntax to export variant counts for chromosomes 'chr1,chr2,chr3' for sample 'GGA_UD_1004' to 'output.txt'.
tad-export.pl -db2data -chrvar -species 'Gallus gallus' -chromosome 'chr1,chr2,chr3' -sample 'GGA_UD_1004' -o output.txt
# Syntax to view associated variants with respective annotation information for 'OPTN' gene in 'Gallus gallus' organism.
tad-export.pl -db2data -varanno -gene 'OPTN' -species 'Gallus gallus'
# Syntax to view associated variants with respective annotations for chromosomes 'chr1,chr4,chr7' in 'Gallus gallus' organism.
tad-export.pl -db2data -varanno -chromosome 'chr1,chr4,chr7' -species 'Gallus gallus'
# Syntax to view associated variants with respective annotation for chr1:50000-900000 in 'Gallus gallus' organism.
tad-export.pl -db2data -varanno -chromosome 'chr1' -region 50000-900000 -species 'Gallus gallus'
# To export prior syntax as a tab-delimited file 'output.txt'.
tad-export.pl -db2data -varanno -chromosome 'chr1' -region 50000-900000 -species 'Gallus gallus' -o output.txt
# To export prior syntax as a vcf file 'output.vcf'.
tad-export.pl -db2data -varanno -chromosome 'chr1' -region 50000-900000 -species 'Gallus gallus' -o -vcf output.vcf
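The -varanno examples above pass the chromosome and the coordinate range as separate -chromosome and -region values. Loci are often written in chrom:start-end notation, for instance when copied from a genome browser, sometimes with thousands separators. A helper like the one below can split them. `split_locus` and the `chrom`/`region` variables are hypothetical names.

```shell
# split_locus (hypothetical helper): split "chrom:start-end" into the shell
# variables chrom and region, removing any thousands separators.
split_locus() {
  case $1 in
    *:*-*) ;;
    *) printf "expected chrom:start-end, got '%s'\n" "$1" >&2; return 1 ;;
  esac
  chrom=${1%%:*}
  region=$(printf '%s' "${1#*:}" | tr -d ',')
}
```

Usage: `split_locus chr1:50,000-900,000 && tad-export.pl -db2data -varanno -chromosome "$chrom" -region "$region" -species 'Gallus gallus'`.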
If you are uncertain how to proceed with the export module, the interaction module provides an easy-to-use, menu-driven interface.
The menu offers seven exploratory options and describes in detail what can be done with each.
# Syntax to begin the interaction module.
tad-interact.pl
NOTE: The interaction module outputs only a small subset of the results, and instructions on how to export the complete results are displayed.
Sample results can be imported only once to ensure data integrity; nonetheless, previously imported data can be deleted, with caution, using the -delete option.
# Syntax for deleting sample 'GGA_UD_1004' from the database.
tad-import.pl -delete GGA_UD_1004
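Because a sample can be imported only once, correcting its results means deleting it first and then importing its folder again. The sketch below chains the two documented commands and stops if the deletion fails. It assumes that -delete takes the sample name, as in the example above, and that the folder name equals the sample name, as the folder-structure note requires. It is not stated whether -delete asks for confirmation, so run the sketch interactively. `reimport_sample` and `TAD_IMPORT` are hypothetical names.

```shell
# reimport_sample (hypothetical helper): delete a sample, then import its
# folder again with -data2db -all. The sample name is taken from the
# folder name. Set TAD_IMPORT to override the importer command.
reimport_sample() {
  importer=${TAD_IMPORT:-tad-import.pl}
  folder=${1%/}                         # drop a trailing slash, if any
  sample=${folder##*/}                  # last path component = sample name
  "$importer" -delete "$sample" && "$importer" -data2db -all "$folder"
}

# Example: reimport_sample example/sample_sxt/GGA_UD_1004
```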
If you have questions, comments or bug reports, please email me directly.
Thank you very much for your help and support!