Difference between revisions of "Tassel 5 GBS v2 Pipeline sample script for calling SNPs using reference genome"

From Poland Lab Wiki
Jump to: navigation, search
Line 146: Line 146:
*''Copy keyfile and .sh file for backup''
*''Copy keyfile and .sh file for backup''
<syntaxhighlight lang="bash">
mv /homes/$user/gbs/jobs/name.* keyFileSh/
mv /homes/$user/gbs/jobs/name.* keyFileSh/

Revision as of 16:12, 21 October 2016

For this script to work properly, you should have gbs folder in your home directory, and jobs and projects folders inside gbs

Only copy the code, not the guiding text. Once you have the full script copied in a simple text file with extension .sh, the name variable should be replaced with something more explicit, such as hessian_F3 for a Hessian fly project with F3 population. To do this in a semi-automatic fashion, use find and replace function in your text editor and replace name variable with hessian_F3 or any other project name.

To keep everything organized, I recommend using the same name for the project and the files, for example

Project name - hessian_F3

Keyfile - hessian_F3.txt

Shell script - hessian_F3.sh

Script can be also downloaded from here, but make sure to copy and paste in a simple text file with .sh extension, for example hessian_F3.sh.

Once the Keyfile and shell script are ready, put them in jobs folder and run qsub hessian_F3.sh

  • Start your script with the following piece of code. First line is called Shebang and is required for a shell script. The PATH variable can be changed if the software is installed elsewhere, otherwise, leave it as it is
export PATH=$PATH:/homes/nss470/usr/bin:/homes/nss470/usr/bin/bin

  • Update the user variable with your K-State eID

  • Requesting beocat resources
#$ -l h_rt=60:00:00
#$ -l mem=64G
#$ -M your_email -m aes
#$ -N name
#$ -cwd

  • Your keyfile should have following four mandatory headers
'Flowcell' 'Lane' 'Barcode' 'FullSampleName'

  • Path to sequence directory

  • Database path (reference genome)

  • TASSEL5 path

  • Create required directories for output
mkdir /homes/$user/gbs/projects/name
mkdir /homes/$user/gbs/projects/name/keyFileSh
cd /homes/$user/gbs/projects/name

  • Running TASSEL 5 SNP calling pipeline in series of steps. Each step uses mostly the default settings that can be changed if desired
## GBSSeqToTagDBPlugin - RUN Tags to DB
$tasselPath -Xms64G -Xmx64G -fork1 -GBSSeqToTagDBPlugin -e PstI-MspI \
    -i $seqDir \
    -db name.db \
    -k $keyFile \
    -kmerLength 64 -minKmerL 20 -mnQS 20 -mxKmerNum 250000000 \
    -endPlugin -runfork1 >> z_name_GBSSeqToTagDB.out
## TagExportToFastqPlugin - export Tags
$tasselPath -fork1 -TagExportToFastqPlugin \
    -db name.db \
    -o name_tagsForAlign.fq -c 1 \
    -endPlugin -runfork1 >> z_name_TagExportToFastq.out
bowtie2 -p 15 --very-sensitive-local \
    -x $dbPath \
    -U name_tagsForAlign.fq \
    -S name.sam >> z_name_bowtie2.out
## SAMToGBSdbPlugin - SAM to DB
$tasselPath -Xms64G -Xmx64G -fork1 -SAMToGBSdbPlugin \
    -i name.sam \
    -db name.db \
    -aProp 0.0 -aLen 0 \
    -endPlugin -runfork1 >> z_name_SAMToGBSdb.out
## DiscoverySNPCallerPluginV2 - RUN DISCOVERY SNP CALLER
$tasselPath -Xms64G -Xmx64G -fork1 -DiscoverySNPCallerPluginV2 \
    -db name.db \
    -mnLCov 0.1 -mnMAF 0.01 -deleteOldData true \
    -endPlugin -runfork1 >> z_name_DiscoverySNPCaller.out
## SNPQualityProfilerPlugin - RUN QUALITY PROFILER
$tasselPath -Xms64G -Xmx64G -fork1 -SNPQualityProfilerPlugin \
    -db name.db \
    -statFile name_SNPqual_stats.txt \
    -endPlugin -runfork1 >> z_name_SNPQualityProfiler.out
$tasselPath -Xms64G -Xmx64G -fork1 -UpdateSNPPositionQualityPlugin \
    -db name.db \
    -qsFile name_SNPqual_stats.txt \
    -endPlugin -runfork1 >> z_name_UpdateSNPPositionQuality.out
## ProductionSNPCallerPluginV2 - RUN PRODUCTION PIPELINE - output .h5
$tasselPath -Xms64G -Xmx64G -fork1 -ProductionSNPCallerPluginV2 \
    -db name.db \
    -i $seqDir \
    -k $keyFile \
    -o name.h5 \
    -e PstI-MspI -kmerLength 64 \
    -endPlugin -runfork1 >> z_name_ProductionSNPCaller.out
## ProductionSNPCallerPluginV2 - RUN PRODUCTION PIPELINE - output .vcf
$tasselPath -Xms64G -Xmx64G -fork1 -ProductionSNPCallerPluginV2 \
    -db name.db \
    -i $seqDir \
    -k $keyFile \
    -o name.vcf \
    -e PstI-MspI -kmerLength 64 \
    -endPlugin -runfork1 >> z_name_ProductionSNPCaller.out
## Convert to Hapmap format
$tasselPath -Xms64G -Xmx64G -fork1 -vcf name.vcf \
	-export name -exportType Hapmap

  • Copy keyfile and .sh file for backup
mv /homes/$user/gbs/jobs/name.* keyFileSh/

If everything ran correctly, you should get a folder named hessian_F3 in your gbs/projects folder with all the output files including SNPs containing .vcf and .hmp.txt files that can be used for further analyses