Difference between revisions of "Tassel 5 GBS v2 Pipeline sample script for calling SNPs using reference genome"

From Poland Lab Wiki
Jump to: navigation, search
(Created page with "==== For this script to work properly, you should have 'gbs' folder in your home directory, and 'jobs' and 'projects' folders inside 'gbs' ==== Once you have the full script...")
 
Line 2: Line 2:
  
 
Once you have the full script copied in a file with extension '.sh', the 'name' variable should be replaced with something more explicit, such as '''hessian_F3''' for a Hessian fly project with F3 population. To do this in a semi-automatic fashion, use 'find and replace' function in your text editor and replace 'name' variable with 'hessian_F3' or any other project name.
 
Once you have the full script copied in a file with extension '.sh', the 'name' variable should be replaced with something more explicit, such as '''hessian_F3''' for a Hessian fly project with F3 population. To do this in a semi-automatic fashion, use 'find and replace' function in your text editor and replace 'name' variable with 'hessian_F3' or any other project name.
 +
 +
 +
Script can be downloaded from [[Media:Tassel5v2_sample_script.docx|here]] too, but make sure to copy and paste in a simple text file with '.sh' extension, for example '''hessian_F3.sh'''
  
  

Revision as of 05:35, 21 October 2016

For this script to work properly, you should have 'gbs' folder in your home directory, and 'jobs' and 'projects' folders inside 'gbs'

Once you have the full script copied in a file with extension '.sh', the 'name' variable should be replaced with something more explicit, such as hessian_F3 for a Hessian fly project with F3 population. To do this in a semi-automatic fashion, use 'find and replace' function in your text editor and replace 'name' variable with 'hessian_F3' or any other project name.


Script can be downloaded from here too, but make sure to copy and paste in a simple text file with '.sh' extension, for example hessian_F3.sh


Start your script with the following piece of code. First line is called Shebang and is required for a shell script. The 'PATH' variable can be changed if the software is installed elsewhere, otherwise, leave it as it is

#!/bin/bash
export PATH=$PATH:/homes/nss470/usr/bin:/homes/nss470/usr/bin/bin


Update the user variable with your K-State eID

user=eID


Requesting beocat resources

#$ -l h_rt=60:00:00
#$ -l mem=64G
#$ -M your_email -m aes
#$ -N name
#$ -cwd


Path to keyfile --> keyfile should have following 4 mandatory headers

'Flowcell' 'Lane' 'Barcode' 'FullSampleName'

Place the keyfile in your 'jobs' folder. It should be renamed to the what you are replacing the name variable with, for example 'hessian_F3.txt'

keyFile=/homes/$user/gbs/jobs/name.txt


Path to sequence directory

seqDir=/bulk/jpoland/sequence


Database path (reference genome)

dbPath=/bulk/jpoland/genome/ChineseSpring/pseudomolecules_v1.0/161010_Chinese_Spring_v1.0_pseudomolecules_parts


TASSEL5 path

tasselPath=/homes/nss470/softwares/tassel5/run_pipeline.pl


Create required directories for output

mkdir /homes/$user/gbs/projects/name
mkdir /homes/$user/gbs/projects/name/keyFileSh
cd /homes/$user/gbs/projects/name


Running TASSEL 5 SNP calling pipeline in series of steps. Each step uses mostly the default settings that can be changed if desired

## GBSSeqToTagDBPlugin - RUN Tags to DB
$tasselPath -Xms64G -Xmx64G -fork1 -GBSSeqToTagDBPlugin -e PstI-MspI \
    -i $seqDir \
    -db name.db \
    -k $keyFile \
    -kmerLength 64 -minKmerL 20 -mnQS 20 -mxKmerNum 250000000 \
    -endPlugin -runfork1 >> z_name_GBSSeqToTagDB.out
 
## TagExportToFastqPlugin - export Tags
$tasselPath -fork1 -TagExportToFastqPlugin \
    -db name.db \
    -o name_tagsForAlign.fq -c 1 \
    -endPlugin -runfork1 >> z_name_TagExportToFastq.out
 
## RUN BOWTIE
bowtie2 -p 15 --very-sensitive-local \
    -x $dbPath \
    -U name_tagsForAlign.fq \
    -S name.sam >> z_name_bowtie2.out
 
## SAMToGBSdbPlugin - SAM to DB
$tasselPath -Xms64G -Xmx64G -fork1 -SAMToGBSdbPlugin \
    -i name.sam \
    -db name.db \
    -aProp 0.0 -aLen 0 \
    -endPlugin -runfork1 >> z_name_SAMToGBSdb.out
 
## DiscoverySNPCallerPluginV2 - RUN DISCOVERY SNP CALLER
$tasselPath -Xms64G -Xmx64G -fork1 -DiscoverySNPCallerPluginV2 \
    -db name.db \
    -mnLCov 0.1 -mnMAF 0.01 -deleteOldData true \
    -endPlugin -runfork1 >> z_name_DiscoverySNPCaller.out
 
## SNPQualityProfilerPlugin - RUN QUALITY PROFILER
$tasselPath -Xms64G -Xmx64G -fork1 -SNPQualityProfilerPlugin \
    -db name.db \
    -statFile name_SNPqual_stats.txt \
    -endPlugin -runfork1 >> z_name_SNPQualityProfiler.out
 
## UpdateSNPPositionQualityPlugin - UPDATE DATABASE WITH QUALITY SCORE
$tasselPath -Xms64G -Xmx64G -fork1 -UpdateSNPPositionQualityPlugin \
    -db name.db \
    -qsFile name_SNPqual_stats.txt \
    -endPlugin -runfork1 >> z_name_UpdateSNPPositionQuality.out
 
## ProductionSNPCallerPluginV2 - RUN PRODUCTION PIPELINE - output .h5
$tasselPath -Xms64G -Xmx64G -fork1 -ProductionSNPCallerPluginV2 \
    -db name.db \
    -i $seqDir \
    -k $keyFile \
    -o name.h5 \
    -e PstI-MspI -kmerLength 64 \
    -endPlugin -runfork1 >> z_name_ProductionSNPCaller.out
 
## ProductionSNPCallerPluginV2 - RUN PRODUCTION PIPELINE - output .vcf
$tasselPath -Xms64G -Xmx64G -fork1 -ProductionSNPCallerPluginV2 \
    -db name.db \
    -i $seqDir \
    -k $keyFile \
    -o name.vcf \
    -e PstI-MspI -kmerLength 64 \
    -endPlugin -runfork1 >> z_name_ProductionSNPCaller.out
 
## Convert to Hapmap format
$tasselPath -Xms64G -Xmx64G -fork1 -vcf name.vcf \
	-export name -exportType Hapmap


Copy keyfile and .sh file for backup

mv /homes/$user/gbs/jobs/name.* keyFileSh/


If everything ran correctly, you should get a folder named 'hessian_F3' in your gbs/projects folder with all the output files including SNP containing .vcf and .hmp.txt that can be used for further analyses

End