Difference between revisions of "How to Submit GBS Data to the NCBI Sequence Read Archive (SRA)"

From Poland Lab Wiki

Revision as of 15:47, 24 September 2019

There are three steps in submitting sequence data to the NCBI Sequence Read Archive (SRA):

1. Create a BioProject

Log in to the NCBI Submission Portal: https://submit.ncbi.nlm.nih.gov/subs/bioproject/ and click on New Submission. Then fill out the following sections. After each section, click “Continue” to move on to the next section:

  1. Submitter
  2. Project Type
    • Project data type: Check “Phenotype or Genotype.”
    • Sample scope: Select “Multiisolate.”
  3. Target
    • Organism name: e.g. Triticum aestivum
  4. General Info
    • Release date: Choose either “Release immediately following processing” or “Release on specified date or upon publication, whichever is first” and enter a Projected release date.
    • Project title: Enter a title of your choosing.
    • Public description: Provide a paragraph describing the study goals and relevance.
    • Is your project part of a larger initiative which is already registered with NCBI? Select Yes or No. If yes, enter the "Initiative description" and the "BioProject accession" of the larger initiative to which this project belongs.
  5. BioSample
    • Click on “Continue.” You will be able to create a BioSample after you finish creating this BioProject.
  6. Publications
    • Enter a PubMed ID or DOI if available.
  7. Review & Submit
    • Review all the information and click “Submit” if everything looks ok. If anything looks wrong, you can go back to any section by clicking on its respective tab at the top.

2. Create a BioSample

Log in to the NCBI Submission Portal: https://submit.ncbi.nlm.nih.gov/subs/biosample/ and click on New Submission. Then fill out the following sections. After each section, click “Continue” to move on to the next section:

  1. Submitter
  2. General Info
    • Release date: Select either "Release immediately following processing" or "Release on specified date or upon publication, whichever is first" and enter a Projected Release Date.
    • Specify if you are submitting a single sample or a file containing multiple samples
      • Select "Batch/Multiple" or "Single"
  3. Sample Type
    • Select the Plant Sample check box.
  4. Attributes
    • Provide your BioSample attributes directly on the screen or by uploading a .csv file.
  5. Review & Submit
    • Check that all of the information displayed is correct and make any necessary adjustments. Then click Submit to submit the BioSample.
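
When many samples are submitted at once, the attributes file for step 4 can be generated by a script instead of typed by hand. The following is a minimal Python sketch of that idea, not the lab's actual procedure. The column names (sample_name, organism, cultivar, tissue), the sample names, and the rule that sample names must be unique and no cell may be empty are assumptions made for illustration; the authoritative column list is the one in the template offered on the Attributes page.

```python
import csv
import io

# Hypothetical attribute columns, for illustration only. Take the real
# column names from the template offered on the Attributes page.
COLUMNS = ["sample_name", "organism", "cultivar", "tissue"]


def write_attribute_table(rows, handle):
    """Write one CSV line per sample after two basic sanity checks."""
    seen = set()
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        empty = [column for column in COLUMNS if not row.get(column)]
        if empty:
            raise ValueError(f"{row.get('sample_name')!r} is missing {empty}")
        if row["sample_name"] in seen:
            raise ValueError(f"duplicate sample_name {row['sample_name']!r}")
        seen.add(row["sample_name"])
        writer.writerow(row)


# Placeholder samples; only the organism name comes from the text above.
samples = [
    {"sample_name": "sample_001", "organism": "Triticum aestivum",
     "cultivar": "placeholder_line_1", "tissue": "leaf"},
    {"sample_name": "sample_002", "organism": "Triticum aestivum",
     "cultivar": "placeholder_line_2", "tissue": "leaf"},
]

buffer = io.StringIO()
write_attribute_table(samples, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a file for upload, pass a handle from open("attributes.csv", "w", newline="") in place of the in-memory buffer.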

3. Create Sequence Read Archive Submission

Go to the Sequence Read Archive Submission page: https://www.ncbi.nlm.nih.gov/Traces/sra_sub/sub.cgi?login=pda and click on Create new submission.

  1. Fill out the Submitter page if not already done.
  2. On the general info page enter the following information:
    • Existing BioProject: Enter the PRJNA number (e.g. PRJNA445460) that was generated for the BioProject submission. (This becomes visible once the BioProject submission has been processed.)
    • BioSample: Click on Yes to indicate that a BioSample submission has been completed.
    • Release Date: Select either:
      • Release immediately following processing, or
      • Release on specified date or upon publication, whichever is first, and enter a Projected Release Date.
  3. On the SRA metadata page click on the Upload a file using Excel or text format(tab-delimited) button. The form will update to allow you to download either a tab-delimited file or Excel spreadsheet template.
  4. Fill out the template as shown in this example from Lakin-Fuller RIL Population: https://drive.google.com/open?id=1iq_gK83Yf7FP9cBJ2VrdaKW0NDIrUZWubDmCyBv6400
    • Key Points to Note:
      • bioproject_accession is the accession number for the BioProject that is provided when the BioProject submission processing has been completed.
      • biosample_accession is the accession number for the Sample that is provided when the BioSample submission processing has been completed.
      • All read files are considered to be part of one BioSample and are related by the BioSample accession number.
      • library_id is set to the sampleID associated with the set of files contained in each compressed tgz file, i.e. for paired reads there will be two files per library_id.
      • The file names given should be the uncompressed file names contained within the associated tar archive, i.e. fastq file names.
  5. Create a directory named after the SRA submission number shown at the top of each SRA submission page, e.g. SUB3836349 was the submission number for the Lakin-Fuller RIL Population submission.
  6. Copy the tar archive tgz file containing the read files that need to be uploaded to the SRA into the submission folder created in the previous step. (NCBI SRA will extract the files named in the SRA metadata spreadsheet from the tar archive.)
  7. The best way to upload the files is to use the Aspera command line tool executed in a terminal window. (See Appendix B for installation instructions.)
    • To initiate the upload, type a command of the following form:
    • ~/.aspera/cli/bin/ascp -v -i ~/aspera_key_file/<your aspera key file name> -QT -l1000m -k1 -d ~/<submission_folder>/ subasp@upload.ncbi.nlm.nih.gov:uploads/<your NCBI upload folder>/
    • A progress indication will be displayed in the terminal window and a message will be displayed to indicate successful completion of the upload or an error message if the upload failed.
    • Go back to the SRA Files page and click on the Select preload folder button and select the folder name that corresponds to your submission folder name e.g. SUB2900496.
    • Tick the Autofinish submission check box. The NCBI system will begin processing the files that were uploaded. This can take many hours depending on the size of the uploaded files.
    • When processing is complete, the accession numbers for the completed submission will become available. The SRP number is the study accession number that refers to the overall submission.
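
Two parts of the procedure above are easy to get wrong by hand: the file names in the metadata template must match the members of the tar archive (step 4), and the ascp command has many fixed flags (step 7). The Python sketch below shows one possible way to check the first and assemble the second. It is an illustration under stated assumptions, not part of the documented workflow: the function names, the submission number SUB0000000, the key file name my_key, the upload folder my_upload_folder, and the fastq names are all placeholders, and the ascp flags are copied unchanged from the command in step 7.

```python
import io
import os
import shlex
import tarfile
import tempfile


def missing_from_archive(archive_path, listed_names):
    """Return the listed names that are not regular-file members of the archive."""
    with tarfile.open(archive_path, "r:*") as archive:
        # Compare base names so a leading directory inside the archive
        # does not cause a false mismatch.
        members = {os.path.basename(m.name)
                   for m in archive.getmembers() if m.isfile()}
    return [name for name in listed_names if name not in members]


def build_ascp_command(key_file, submission_dir, upload_folder,
                       ascp="~/.aspera/cli/bin/ascp"):
    """Assemble the ascp argument list with the flags used in step 7."""
    source = os.path.expanduser(submission_dir).rstrip("/") + "/"
    return [
        os.path.expanduser(ascp),
        "-v",
        "-i", os.path.expanduser(key_file),
        "-QT", "-l1000m", "-k1", "-d",
        source,
        "subasp@upload.ncbi.nlm.nih.gov:uploads/" + upload_folder + "/",
    ]


# Demonstration with a throwaway archive; every name below is a placeholder.
with tempfile.TemporaryDirectory() as workdir:
    submission_dir = os.path.join(workdir, "SUB0000000")  # hypothetical number
    os.makedirs(submission_dir)
    archive_path = os.path.join(submission_dir, "sample_A.tgz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in ("sample_A_R1.fastq", "sample_A_R2.fastq"):
            payload = b"@read1\nACGT\n+\nIIII\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))

    # File names as they would appear in the metadata template.
    listed = ["sample_A_R1.fastq", "sample_A_R2.fastq", "sample_A_R3.fastq"]
    missing = missing_from_archive(archive_path, listed)
    print("Not found in archive:", missing)

    command = build_ascp_command("~/aspera_key_file/my_key",
                                 submission_dir, "my_upload_folder")
    print(shlex.join(command))
```

The sketch only prints the command. To start a real upload, the same list could be passed to subprocess.run(command, check=True) once the name check reports nothing missing.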