Tassel 5 GBS v2 Pipeline sample script for calling SNPs using reference genome

From Poland Lab Wiki

Revision as of 02:51, 26 October 2016

For this script to work properly, you should have a gbs folder in your home directory, with jobs and projects folders inside it.
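
If the folders do not exist yet, they can be created in one step. This is only a minimal sketch: the function name make_gbs_layout is my own, and it takes the base directory (normally your home directory) as its only argument.

```shell
#!/bin/bash
# Create the folder layout the pipeline script expects:
#   <base>/gbs/jobs      keyfile and shell script go here
#   <base>/gbs/projects  one output folder per project is created here
# make_gbs_layout is an illustrative helper, not part of TASSEL.
make_gbs_layout() {
    local base=$1
    mkdir -p "$base/gbs/jobs" "$base/gbs/projects"
}
```

For example, `make_gbs_layout "$HOME"` sets up gbs/jobs and gbs/projects under your home directory. Because of `mkdir -p`, running it again on an existing layout does no harm.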

Copy the full script into a plain text file with the extension .sh, then update the two variables at the top: user is your K-State eID, and name should be replaced with something explicit, such as your project name.

The keyfile must have these 4 mandatory headers: Flowcell, Lane, Barcode and FullSampleName.
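
The header line can be checked before submitting the job. The sketch below assumes a tab-delimited keyfile and allows extra columns; check_keyfile_header is a hypothetical helper name, not a TASSEL command.

```shell
#!/bin/bash
# Check that the first line of a keyfile contains the four mandatory headers.
# Assumption: the keyfile is tab-delimited. Extra columns are allowed.
check_keyfile_header() {
    local keyfile=$1 header col missing=0
    # Read the first line and strip a Windows carriage return, if present
    header=$(head -n 1 "$keyfile" | tr -d '\r')
    for col in Flowcell Lane Barcode FullSampleName; do
        # Split the header into one field per line and match whole fields only
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qxF "$col"; then
            echo "Missing header: $col" >&2
            missing=1
        fi
    done
    return $missing
}
```

Calling `check_keyfile_header projectName.txt` returns 0 when all four headers are present, and otherwise prints the missing ones to standard error and returns 1.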

To keep everything organized, I recommend using the same base name for the project and the files, for example:

name variable - projectName

Keyfile name - projectName.txt

Shell script name - projectName.sh

Once the keyfile and shell script are ready, put them in the jobs folder and submit the job from that folder by running qsub projectName.sh
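
Before submitting, it may help to confirm that both files are in place under the same base name. A minimal sketch, where check_job_files and its two arguments (the jobs folder and the project name) are illustrative names of my own:

```shell
#!/bin/bash
# Confirm that <name>.txt (keyfile) and <name>.sh (shell script) both exist
# in the jobs folder. check_job_files is an illustrative helper.
check_job_files() {
    local jobs_dir=$1 name=$2 ext status=0
    for ext in txt sh; do
        if [ ! -f "$jobs_dir/$name.$ext" ]; then
            echo "Not found: $jobs_dir/$name.$ext" >&2
            status=1
        fi
    done
    return $status
}
```

A possible use is `check_job_files ~/gbs/jobs projectName && cd ~/gbs/jobs && qsub projectName.sh`, so that the job is submitted only when both files are found.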


#!/bin/bash
 
# Update the user and name variables, user is your K-State eID
user=eID
name=projectName
 
# Update beocat resources request
#$ -l h_rt=72:00:00 -l mem=64G -cwd
 
 
## NO NEED TO CHANGE ANYTHING FROM HERE ON ##
 
keyFile=/homes/$user/gbs/jobs/${name}.txt
 
seqDir=/bulk/jpoland/sequence
dbPath=/bulk/jpoland/genome/ChineseSpring/pseudomolecules_v1.0/161010_Chinese_Spring_v1.0_pseudomolecules_parts
tasselPath=/homes/nss470/softwares/tassel5/run_pipeline.pl
 
# Create the project folder and keyFileSh inside it; -p avoids an error if they already exist
mkdir -p /homes/$user/gbs/projects/${name}/keyFileSh
cd /homes/$user/gbs/projects/${name} || exit 1
 
# Path for required software
export PATH=$PATH:/homes/nss470/usr/bin:/homes/nss470/usr/bin/bin
 
# Set JAVA VM Version
eselect java-vm set user 5
 
## GBSSeqToTagDBPlugin - RUN Tags to DB
$tasselPath -Xms64G -Xmx64G -fork1 -GBSSeqToTagDBPlugin -e PstI-MspI \
    -i $seqDir \
    -db ${name}.db \
    -k $keyFile \
    -kmerLength 64 -minKmerL 20 -mnQS 20 -mxKmerNum 250000000 \
    -endPlugin -runfork1 >> z_pipeline.out
 
## TagExportToFastqPlugin - export Tags
$tasselPath -fork1 -TagExportToFastqPlugin \
    -db ${name}.db \
    -o ${name}_tagsForAlign.fq -c 10 \
    -endPlugin -runfork1 >> z_pipeline.out
 
## RUN BOWTIE
bowtie2 -p 20 --very-sensitive-local \
    -x $dbPath \
    -U ${name}_tagsForAlign.fq \
    -S ${name}.sam >> z_pipeline.out 2>&1
 
## SAMToGBSdbPlugin - SAM to DB
$tasselPath -Xms64G -Xmx64G -fork1 -SAMToGBSdbPlugin \
    -i ${name}.sam \
    -db ${name}.db \
    -aProp 0.0 -aLen 0 \
    -endPlugin -runfork1 >> z_pipeline.out
 
## DiscoverySNPCallerPluginV2 - RUN DISCOVERY SNP CALLER
$tasselPath -Xms64G -Xmx64G -fork1 -DiscoverySNPCallerPluginV2 \
    -db ${name}.db \
    -mnLCov 0.1 -mnMAF 0.01 -deleteOldData true \
    -endPlugin -runfork1 >> z_pipeline.out
 
## SNPQualityProfilerPlugin - RUN QUALITY PROFILER
$tasselPath -Xms64G -Xmx64G -fork1 -SNPQualityProfilerPlugin \
    -db ${name}.db \
    -statFile ${name}_SNPqual_stats.txt \
    -endPlugin -runfork1 >> z_pipeline.out
 
## UpdateSNPPositionQualityPlugin - UPDATE DATABASE WITH QUALITY SCORE
$tasselPath -Xms64G -Xmx64G -fork1 -UpdateSNPPositionQualityPlugin \
    -db ${name}.db \
    -qsFile ${name}_SNPqual_stats.txt \
    -endPlugin -runfork1 >> z_pipeline.out
 
## ProductionSNPCallerPluginV2 - RUN PRODUCTION PIPELINE - output .h5
$tasselPath -Xms64G -Xmx64G -fork1 -ProductionSNPCallerPluginV2 \
    -db ${name}.db \
    -i $seqDir \
    -k $keyFile \
    -o ${name}.h5 \
    -e PstI-MspI -kmerLength 64 \
    -minPosQS 90 -ko false \
    -endPlugin -runfork1 >> z_pipeline.out
 
## Convert to Hapmap format
$tasselPath -Xms64G -Xmx64G -fork1 -h5 ${name}.h5 \
    -export ${name} -exportType Hapmap
 
# Move the keyfile and shell script from the jobs folder into the project folder
mv /homes/$user/gbs/jobs/${name}.* keyFileSh/


If everything ran correctly, you should get a folder named after your project (for example, hessian_F3) in your gbs/projects folder. It holds all the output files, including the SNP calls in .vcf and .hmp.txt files that can be used for further analyses.
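
A quick sanity check on the result is to count the SNPs in the Hapmap file. This is a minimal sketch, assuming the export step wrote a .hmp.txt file with one header line followed by one line per SNP; count_hapmap_sites is an illustrative name.

```shell
#!/bin/bash
# Count the SNP sites in a Hapmap file.
# Assumption: line 1 is the header and every later non-empty line is one SNP.
count_hapmap_sites() {
    local hmp=$1
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$hmp"
}
```

For example, `count_hapmap_sites projectName.hmp.txt` run inside the project folder prints the number of SNP rows. A count of 0 suggests that one of the earlier steps failed, and z_pipeline.out is the place to look.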

If you have any questions or the script doesn't work, please contact me at nss470@ksu.edu.

End