How to Submit GBS Data to the NCBI Sequence Read Archive (SRA)
There are three steps in submitting sequence data to the NCBI Sequence Read Archive (SRA):
1. Create a BioProject
Log in to the NCBI Submission Portal at https://submit.ncbi.nlm.nih.gov/subs/bioproject/ and click on New Submission. Then fill out the following sections. After each section, click “Continue” to move on to the next section:
- Project Type
- Project data type: Check “Phenotype or Genotype.”
- Sample scope: Select “Multiisolate.”
- Organism name: e.g. Triticum aestivum
- Release date: Choose either “Release immediately following processing” or “Release on specified date or upon publication, whichever is first” and enter a Projected release date.
- Project title: Enter a title of your choosing.
- Public description: Provide a paragraph of the study goals and relevance.
- Is your project part of a larger initiative which is already registered with NCBI? Select Yes or No. If yes, enter the "Initiative description" and the "BioProject accession" of the larger initiative to which this project belongs.
- Click on “Continue.” You will be able to create a BioSample after you finish creating this BioProject.
- Enter a PubMed ID or DOI if available.
- Review all the information and click “Submit” if everything looks ok. If anything looks wrong, you can go back to any section by clicking on its respective tab at the top.
2. Create a BioSample
Log in to the NCBI Submission Portal at https://submit.ncbi.nlm.nih.gov/subs/biosample/ and click on New Submission. Then fill out the following sections. After each section, click “Continue” to move on to the next section:
- General Info
- Release date: Select either "Release immediately following processing" or "Release on specified date or upon publication, whichever is first" and enter a Projected Release Date.
- Specify whether you are submitting a single sample or a file containing multiple samples: select "Single" or "Batch/Multiple."
- Select the Plant Sample check box.
- Provide your BioSample attributes directly on the screen or by uploading a .csv file.
- Check that all of the information displayed is correct and make any necessary adjustments. Then click Submit to submit the BioSample.
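When the attributes are uploaded as a .csv file, it can help to check the file for repeated sample names before submitting. The following is a minimal sketch only: the column name `sample_name` and the sample values are assumptions made for illustration, and the template offered by the Submission Portal remains the reference for the actual column set.

```python
import csv
import io
from collections import Counter


def duplicate_sample_names(csv_text, name_column="sample_name"):
    """Return the sample names that occur more than once, in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[name_column].strip() for row in reader)
    return [name for name, count in counts.items() if count > 1]


# Hypothetical attribute table; replace with the contents of your own file.
example = (
    "sample_name,organism\n"
    "RIL_001,Triticum aestivum\n"
    "RIL_002,Triticum aestivum\n"
    "RIL_001,Triticum aestivum\n"
)
print(duplicate_sample_names(example))
```

An empty list means every sample name in the file is distinct.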
3. Create a Sequence Read Archive Submission
Go to the Sequence Read Archive Submission page: https://www.ncbi.nlm.nih.gov/Traces/sra_sub/sub.cgi?login=pda and click on Create new submission.
Fill out the Submitter page if not already done.
On the General Info page, enter the following information:
- Existing BioProject: Enter the PRJNA number (e.g. PRJNA445460) that was generated for the BioProject submission. (This will be visible once the BioProject submission has been processed.)
- BioSample: Click on Yes to indicate that a BioSample submission has been completed.
- Release date: Select either "Release immediately following processing" or "Release on specified date or upon publication, whichever is first" and enter a Projected Release Date.
On the SRA Metadata page, click on the "Upload a file using Excel or text format (tab-delimited)" button. The form will update to allow you to download either a tab-delimited file or an Excel spreadsheet template. Fill out the template as shown in this example from the Lakin-Fuller RIL Population: https://drive.google.com/open?id=1iq_gK83Yf7FP9cBJ2VrdaKW0NDIrUZWubDmCyBv6400
Key Points to Note:
- bioproject_accession is the accession number for the BioProject that is provided when the BioProject submission processing has been completed.
- biosample_accession is the accession number for the Sample that is provided when the BioSample submission processing has been completed.
- All read files are considered to be part of one BioSample and are related by the BioSample accession number.
- library_id is set to the sampleID associated with the set of files contained in each compressed tgz file, i.e. for paired reads there will be two files per library_id.
- The file names given should be the uncompressed file names contained within the associated tar archive, i.e. the fastq file names.
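For a large number of libraries, the metadata rows can be generated rather than typed by hand. The sketch below is illustrative only: `bioproject_accession`, `biosample_accession` and `library_id` are the fields discussed above, while the `filename` and `filename2` column names, the `SAMN00000000` accession and the fastq file names are assumptions. The downloaded template defines further columns that are not shown here.

```python
import csv
import io


def sra_metadata_rows(bioproject, biosample, paired_files):
    """Build one row per library from a {library_id: (read1, read2)} mapping."""
    rows = []
    for library_id, (read1, read2) in sorted(paired_files.items()):
        rows.append({
            "bioproject_accession": bioproject,
            "biosample_accession": biosample,  # same BioSample for every row
            "library_id": library_id,
            "filename": read1,    # assumed column name for the first read file
            "filename2": read2,   # assumed column name for the second read file
        })
    return rows


def to_tab_delimited(rows):
    """Serialise the rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical libraries; the fastq names must match those inside the tar archive.
libraries = {
    "sample_01": ("sample_01_R1.fastq", "sample_01_R2.fastq"),
    "sample_02": ("sample_02_R1.fastq", "sample_02_R2.fastq"),
}
print(to_tab_delimited(sra_metadata_rows("PRJNA445460", "SAMN00000000", libraries)))
```

The printed text can be pasted into the tab-delimited template alongside the remaining columns that the template asks for.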
Create a directory named after the SRA submission number shown at the top of each SRA submission page (e.g. SUB3836349, the submission number for the Lakin-Fuller RIL Population submission).
Copy the tar archive tgz file containing the read files that need to be uploaded to the SRA into the submission folder created in the previous step. (NCBI SRA will extract the files named in the SRA metadata spreadsheet from the tar archive.)
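The two preparation steps above (creating the submission folder and copying the archive into it) can be scripted, together with a check that every file name listed in the metadata spreadsheet is actually present in the archive. This is a rough sketch under stated assumptions: the function names and paths are hypothetical, and comparing on base names only is a simplification chosen for this example, not a statement of how NCBI matches files.

```python
import shutil
import tarfile
from pathlib import Path


def stage_archive(archive_path, submission_id, base_dir):
    """Copy the tar archive into a folder named after the submission number."""
    folder = Path(base_dir) / submission_id
    folder.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(archive_path, folder))


def missing_from_archive(archive_path, expected_names):
    """Return the expected file names that are not members of the archive."""
    with tarfile.open(archive_path, "r:*") as archive:
        # Compare on base names so members stored under a subdirectory still match.
        members = {Path(m.name).name for m in archive.getmembers() if m.isfile()}
    return [name for name in expected_names if name not in members]


# Example usage with hypothetical paths:
# staged = stage_archive("reads.tgz", "SUB3836349", Path.home())
# print(missing_from_archive(staged, ["sample_01_R1.fastq", "sample_01_R2.fastq"]))
```

An empty list from `missing_from_archive` indicates that every name given in the spreadsheet was found in the archive.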
The best way to upload the files is to use the Aspera command line tool executed in a terminal window. (See Appendix B for installation instructions.)
To initiate the upload type a command of the following form:
~/.aspera/cli/bin/ascp -v -i ~/aspera_key_file/<your aspera key file name> -QT -l1000m -k1 -d ~/<submission_folder>/ email@example.com:uploads/<your NCBI upload folder>/
The terminal window will show a progress indicator, followed by either a message confirming successful completion of the upload or an error message if the upload failed.
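If the upload is launched from a script, the command above can be assembled as an argument list so that paths containing spaces do not need manual quoting. This is a sketch only: `build_ascp_command` is a hypothetical helper, the options are copied unchanged from the command above, and the angle-bracket placeholders must be replaced with your own values.

```python
import os
import shlex


def build_ascp_command(key_file, submission_dir, destination,
                       ascp="~/.aspera/cli/bin/ascp"):
    """Return the ascp argument list, using the same options as the command above."""
    return [
        os.path.expanduser(ascp),
        "-v",
        "-i", os.path.expanduser(key_file),
        "-QT", "-l1000m", "-k1", "-d",
        os.path.expanduser(submission_dir),
        destination,
    ]


# Placeholders in angle brackets are left exactly as in the command above.
command = build_ascp_command(
    "~/aspera_key_file/<your aspera key file name>",
    "~/SUB3836349/",
    "email@example.com:uploads/<your NCBI upload folder>/",
)
print(shlex.join(command))
```

Once the placeholders are filled in, the list can be passed to `subprocess.run(command, check=True)` to start the transfer; `check=True` raises an exception if ascp exits with a non-zero status.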
Go back to the SRA Files page, click on the Select preload folder button, and select the folder name that corresponds to your submission folder name, e.g. SUB2900496.
Tick the Autofinish submission check box. The NCBI system will begin processing the files that were uploaded. This can take many hours depending on the size of the uploaded files.
When processing is complete, the accession numbers for the completed submission will become available. The SRP number is the study accession number that refers to the overall submission.