Difference between revisions of "Tassel 5 GBS v2 Pipeline sample script for calling SNPs using reference genome"

From Poland Lab Wiki
Jump to: navigation, search
Line 1: Line 1:
 
==== For this script to work properly, you should have 'gbs' folder in your home directory, and 'jobs' and 'projects' folders inside 'gbs' ====
 
==== For this script to work properly, you should have 'gbs' folder in your home directory, and 'jobs' and 'projects' folders inside 'gbs' ====
<br>
 
  
Only copy the code, not the guiding text. Once you have the full script copied in a file with extension '.sh', the 'name' variable should be replaced with something more explicit, such as '''hessian_F3''' for a Hessian fly project with F3 population. To do this in a semi-automatic fashion, use 'find and replace' function in your text editor and replace 'name' variable with 'hessian_F3' or any other project name.
+
Only copy the code, not the guiding text. Once you have the full script copied in a simple text file with extension '.sh', the 'name' variable should be replaced with something more explicit, such as '''hessian_F3''' for a Hessian fly project with F3 population. To do this in a semi-automatic fashion, use 'find and replace' function in your text editor and replace 'name' variable with 'hessian_F3' or any other project name.
  
 +
<blockquote>
 +
'''''To keep everything organized, I recommend using same name for the project and the files, for example'''''
 +
 +
Project name - hessian_F3
 +
 +
Keyfile - hessian_F3.txt
 +
 +
Shell script - hessian_F3.sh
 +
</blockquote>
  
 
Script can be downloaded from [[Media:Tassel5v2_sample_script.docx|here]] too, but make sure to copy and paste in a simple text file with '.sh' extension, for example '''hessian_F3.sh'''
 
Script can be downloaded from [[Media:Tassel5v2_sample_script.docx|here]] too, but make sure to copy and paste in a simple text file with '.sh' extension, for example '''hessian_F3.sh'''
  
 +
Once the Keyfile and shell script are ready, put them in 'jobs' folder and run <code> qsub hessian_F3.sh </code>
  
''Start your script with the following piece of code. First line is called Shebang and is required for a shell script. The 'PATH' variable can be changed if the software is installed elsewhere, otherwise, leave it as it is''
 
  
 +
*''Start your script with the following piece of code. First line is called Shebang and is required for a shell script. The 'PATH' variable can be changed if the software is installed elsewhere, otherwise, leave it as it is''
 
<syntaxhighlight lang="bash">
 
<syntaxhighlight lang="bash">
 
#!/bin/bash
 
#!/bin/bash
Line 16: Line 25:
  
  
''Update the ''<code>user</code> ''variable with your K-State eID''
+
*''Update the ''<code>user</code> ''variable with your K-State eID''
 
<syntaxhighlight lang="bash">
 
<syntaxhighlight lang="bash">
 
user=eID
 
user=eID
Line 22: Line 31:
  
  
''Requesting beocat resources''
+
*''Requesting beocat resources''
 
<syntaxhighlight lang="bash">
 
<syntaxhighlight lang="bash">
 
#$ -l h_rt=60:00:00
 
#$ -l h_rt=60:00:00
Line 32: Line 41:
  
  
''Path to keyfile --> '''keyfile''' should have following 4 mandatory headers''
+
*''Path to keyfile --> '''keyfile''' should have following 4 mandatory headers''
  
'Flowcell' 'Lane' 'Barcode' 'FullSampleName'
+
'Flowcell'   'Lane'   'Barcode'   'FullSampleName'
  
 
''Place the keyfile in your 'jobs' folder. It should be renamed to the what you are replacing the name variable with, for example 'hessian_F3.txt'''
 
''Place the keyfile in your 'jobs' folder. It should be renamed to the what you are replacing the name variable with, for example 'hessian_F3.txt'''
Line 42: Line 51:
  
  
''Path to sequence directory''
+
*''Path to sequence directory''
 
<syntaxhighlight lang="bash">
 
<syntaxhighlight lang="bash">
 
seqDir=/bulk/jpoland/sequence
 
seqDir=/bulk/jpoland/sequence
Line 48: Line 57:
  
  
''Database path (reference genome)''
+
*''Database path (reference genome)''
 
<syntaxhighlight lang="bash">
 
<syntaxhighlight lang="bash">
 
dbPath=/bulk/jpoland/genome/ChineseSpring/pseudomolecules_v1.0/161010_Chinese_Spring_v1.0_pseudomolecules_parts
 
dbPath=/bulk/jpoland/genome/ChineseSpring/pseudomolecules_v1.0/161010_Chinese_Spring_v1.0_pseudomolecules_parts
Line 54: Line 63:
  
  
''TASSEL5 path''
+
*''TASSEL5 path''
 
<syntaxhighlight lang="bash">
 
<syntaxhighlight lang="bash">
 
tasselPath=/homes/nss470/softwares/tassel5/run_pipeline.pl
 
tasselPath=/homes/nss470/softwares/tassel5/run_pipeline.pl
Line 60: Line 69:
  
  
''Create required directories for output''
+
*''Create required directories for output''
 
<syntaxhighlight lang="bash">
 
<syntaxhighlight lang="bash">
 
mkdir /homes/$user/gbs/projects/name
 
mkdir /homes/$user/gbs/projects/name
Line 68: Line 77:
  
  
''Running TASSEL 5 SNP calling pipeline in series of steps. Each step uses mostly the default settings that can be changed if desired''
+
*''Running TASSEL 5 SNP calling pipeline in series of steps. Each step uses mostly the default settings that can be changed if desired''
 
<syntaxhighlight lang="java">
 
<syntaxhighlight lang="java">
 
## GBSSeqToTagDBPlugin - RUN Tags to DB
 
## GBSSeqToTagDBPlugin - RUN Tags to DB
Line 139: Line 148:
  
  
''Copy keyfile and .sh file for backup''
+
*''Copy keyfile and .sh file for backup''
 
<pre>
 
<pre>
 
mv /homes/$user/gbs/jobs/name.* keyFileSh/
 
mv /homes/$user/gbs/jobs/name.* keyFileSh/

Revision as of 06:15, 21 October 2016

For this script to work properly, you should have 'gbs' folder in your home directory, and 'jobs' and 'projects' folders inside 'gbs'

Only copy the code, not the guiding text. Once you have the full script copied in a simple text file with extension '.sh', the 'name' variable should be replaced with something more explicit, such as hessian_F3 for a Hessian fly project with F3 population. To do this in a semi-automatic fashion, use 'find and replace' function in your text editor and replace 'name' variable with 'hessian_F3' or any other project name.

To keep everything organized, I recommend using same name for the project and the files, for example

Project name - hessian_F3

Keyfile - hessian_F3.txt

Shell script - hessian_F3.sh

Script can be downloaded from here too, but make sure to copy and paste in a simple text file with '.sh' extension, for example hessian_F3.sh

Once the Keyfile and shell script are ready, put them in 'jobs' folder and run qsub hessian_F3.sh


  • Start your script with the following piece of code. First line is called Shebang and is required for a shell script. The 'PATH' variable can be changed if the software is installed elsewhere, otherwise, leave it as it is
#!/bin/bash
export PATH=$PATH:/homes/nss470/usr/bin:/homes/nss470/usr/bin/bin


  • Update the user variable with your K-State eID
user=eID


  • Requesting beocat resources
#$ -l h_rt=60:00:00
#$ -l mem=64G
#$ -M your_email -m aes
#$ -N name
#$ -cwd


  • Path to keyfile --> keyfile should have following 4 mandatory headers

'Flowcell' 'Lane' 'Barcode' 'FullSampleName'

Place the keyfile in your 'jobs' folder. It should be renamed to the what you are replacing the name variable with, for example 'hessian_F3.txt'

keyFile=/homes/$user/gbs/jobs/name.txt


  • Path to sequence directory
seqDir=/bulk/jpoland/sequence


  • Database path (reference genome)
dbPath=/bulk/jpoland/genome/ChineseSpring/pseudomolecules_v1.0/161010_Chinese_Spring_v1.0_pseudomolecules_parts


  • TASSEL5 path
tasselPath=/homes/nss470/softwares/tassel5/run_pipeline.pl


  • Create required directories for output
mkdir /homes/$user/gbs/projects/name
mkdir /homes/$user/gbs/projects/name/keyFileSh
cd /homes/$user/gbs/projects/name


  • Running TASSEL 5 SNP calling pipeline in series of steps. Each step uses mostly the default settings that can be changed if desired
## GBSSeqToTagDBPlugin - RUN Tags to DB
$tasselPath -Xms64G -Xmx64G -fork1 -GBSSeqToTagDBPlugin -e PstI-MspI \
    -i $seqDir \
    -db name.db \
    -k $keyFile \
    -kmerLength 64 -minKmerL 20 -mnQS 20 -mxKmerNum 250000000 \
    -endPlugin -runfork1 >> z_name_GBSSeqToTagDB.out
 
## TagExportToFastqPlugin - export Tags
$tasselPath -fork1 -TagExportToFastqPlugin \
    -db name.db \
    -o name_tagsForAlign.fq -c 1 \
    -endPlugin -runfork1 >> z_name_TagExportToFastq.out
 
## RUN BOWTIE
bowtie2 -p 15 --very-sensitive-local \
    -x $dbPath \
    -U name_tagsForAlign.fq \
    -S name.sam >> z_name_bowtie2.out
 
## SAMToGBSdbPlugin - SAM to DB
$tasselPath -Xms64G -Xmx64G -fork1 -SAMToGBSdbPlugin \
    -i name.sam \
    -db name.db \
    -aProp 0.0 -aLen 0 \
    -endPlugin -runfork1 >> z_name_SAMToGBSdb.out
 
## DiscoverySNPCallerPluginV2 - RUN DISCOVERY SNP CALLER
$tasselPath -Xms64G -Xmx64G -fork1 -DiscoverySNPCallerPluginV2 \
    -db name.db \
    -mnLCov 0.1 -mnMAF 0.01 -deleteOldData true \
    -endPlugin -runfork1 >> z_name_DiscoverySNPCaller.out
 
## SNPQualityProfilerPlugin - RUN QUALITY PROFILER
$tasselPath -Xms64G -Xmx64G -fork1 -SNPQualityProfilerPlugin \
    -db name.db \
    -statFile name_SNPqual_stats.txt \
    -endPlugin -runfork1 >> z_name_SNPQualityProfiler.out
 
## UpdateSNPPositionQualityPlugin - UPDATE DATABASE WITH QUALITY SCORE
$tasselPath -Xms64G -Xmx64G -fork1 -UpdateSNPPositionQualityPlugin \
    -db name.db \
    -qsFile name_SNPqual_stats.txt \
    -endPlugin -runfork1 >> z_name_UpdateSNPPositionQuality.out
 
## ProductionSNPCallerPluginV2 - RUN PRODUCTION PIPELINE - output .h5
$tasselPath -Xms64G -Xmx64G -fork1 -ProductionSNPCallerPluginV2 \
    -db name.db \
    -i $seqDir \
    -k $keyFile \
    -o name.h5 \
    -e PstI-MspI -kmerLength 64 \
    -endPlugin -runfork1 >> z_name_ProductionSNPCaller.out
 
## ProductionSNPCallerPluginV2 - RUN PRODUCTION PIPELINE - output .vcf
$tasselPath -Xms64G -Xmx64G -fork1 -ProductionSNPCallerPluginV2 \
    -db name.db \
    -i $seqDir \
    -k $keyFile \
    -o name.vcf \
    -e PstI-MspI -kmerLength 64 \
    -endPlugin -runfork1 >> z_name_ProductionSNPCaller.out
 
## Convert to Hapmap format
$tasselPath -Xms64G -Xmx64G -fork1 -vcf name.vcf \
	-export name -exportType Hapmap


  • Copy keyfile and .sh file for backup
mv /homes/$user/gbs/jobs/name.* keyFileSh/


If everything ran correctly, you should get a folder named 'hessian_F3' in your gbs/projects folder with all the output files including SNP containing .vcf and .hmp.txt that can be used for further analyses

End