A user’s guide to IGVF GoogleSheet Submitter

Spreadsheet submission tool that uses GoogleSheet’s AppScript to allow users to submit/modify metadata on the portal without installing any software or having any scripting experience. Users will need to fill in any necessary metadata in the GoogleSheet where the different object types (for example primary_cell, tissue, human_donor, etc.) are populating separate tabs of the spreadsheet. For example, if you’re submitting human_donor, tissue, and primary_cell, three tabs will need to be prepared. With a press of a button, the GoogleSheet will launch an embedded script that interacts with the portal and submits/modifies metadata. *If you are unfamiliar with the schema, please take a look at the Schema overview documentation.

IGVF submission (GoogleSheet’s AppScript) repository

All built-in functions can be found in the toolbar under “IGVF”. Image of Toolbar

Set Up

  1. To authorize submissions, obtain access keys from your lab wrangler, if unsure you can always contact the DACC wrangling team .
  2. Enter your access keys in the google sheet by clicking: IGVF -> Authorize for IGVF -> enter your access key pairs (i.e. ‘username’/’password’). Entering your access keys will only be required once per spreadsheet. Authorize for IGVF in toolbar
  3. Connect to server (set default endpoints): IGVF -> Set endpoints. For IGVF, our sandbox site is https://api.sandbox.igvf.org/ Set defaults in toolbar
  4. Set the schema object type (profile) you want to submit. *Note: if you are submitting multiple object types, it is mandatory to set the corresponding profile type on each of the spreadsheet tabs. IGVF -> Set profile name Set profile name in toolbar
  5. To view current endpoints and profiles: Toolbar -> IGVF -> Show sheet info & header legend

Submitter Spreadsheet Design and Formatting

The first two columns (‘#response’ and ‘#response_time’) provide information regarding submission responses, they are not properties of an object.

Table 1. Common response values

#ResponseDescription
200 or 201Action was successful
404 or 422Action was unsuccessful (ERROR)
409Action was unsuccessful due to a clash (unique value conflict) between the metadata on the portal and metadata you attempted to submit. (ERROR)
401 or 403Action was unsuccessful due to a lack of permissions/authentication (ERROR)
*Note: response can also provide other values, additional information and error messages

Each column contains a header that corresponds to a property name in schema. If you hover your cursor to the property name, a tooltip will appear with information (title, description, type and dependency) on that property.

Here is an example of how the tooltip will look like for "samples" property: properties tooltip

Header property names that are underlined are searchable: Toolbar -> IGVF -> Search will show you all existing objects of that object type on the portal. *Note: A property value has to be selected for search.

Search in toolbar

Table 2. Object Properties Formatting Legend

ColorDescriptionExamples
Redsubmittable, required propertiestaxa or biosample_term
Blacksubmittable, not required but recommended if it applies to your data contextdate_obtained
Bluenon-submittable, an identifying property unique to the object generated by the serveraccession or uuid
Graynon-submittable, a property calculated by the system. *Note: Any values you list here for POST/PATCH will produce an error code.status or submitted_by
Italics + boldArray property. Must be wrapped in square brackets [][“IGVFSM0000AAAA”, “IGVFSM0001AAAB”]
Begins with #Fields specific to the Google Sheets tool for debugging, logging, or uploading. Not actual schema properties.#response, #response_time
Dropdown arrowIndicates field has enums (enums will appear in the drop down)

Table 3. Property Value Types

TypeRequirementsExamples
ArraysNeed to have a [“ ”] wrapper; Array of objects: need to be in valid JSON format for POST/PATCH to succeed[“A”, “B”, “C”]
EnumsA set of options (i.e. statuses)) will be available via the drop down of the cell (▼)[Image 6]
ObjectsNeed to be in valid JSON format for POST/PATCH. *Note: Linked objects must already exist with identifiers in the database. If you are unsure of what identifier to use, please contact the wrangling team.Object identifier:‘alias’,‘accession’,‘uuid’

*Note: Further details of property values can be found here.

Submission Actions

GET: Retrieve objects and properties from the portal, which can be used as a template for new submissions.

  1. To retrieve the available set of properties belonging to the profile (object): Toolbar -> IGVF -> Set profile name Make new template in toolbar
  2. Enter an identifier (in aliases, accession, or uuid). *Note: multiple alias identifiers can be used for retrieving object instances Identifiers in sheet
  3. To retrieve properties belonging to the profile (object type): Toolbar -> IGVF -> GET metadata from portal. Please wait for a couple of seconds for the script to run. You will be notified with a pop-up window when the action is complete. Get metadata from portal in toolbar

POST: Submit new object instances (rows on the sheet) to the portal.

  1. Retrieve the available set of properties belonging to the profile (object): Toolbar -> IGVF -> Set profile name Make new template row
  2. Fill in row values, where a row reflects a single record or instance of that object. *Note: multiple rows can be submitted at a time. Fill in values
  3. POST the object records: Toolbar -> IGVF -> POST new metadata to portal. Please wait for a couple of seconds for the script to run. You will be notified with a pop-up window when the action is complete. An assigned identifier will be generated for each new record. Post new metadata to portal

PATCH: Fix properties of existing objects on the portal.

  1. Prepare the object sheet. Identifier(s) that need patching must be specified on either the accession, aliases, or uuid columns. Enter the values of the property(s) you would like to fix in the relevant columns.
  2. Toolbar -> IGVF -> PATCH selected/all columns. Patch options in toolbar

Review submission:

Please review the results for every record using the ‘#response’ column. See Table 1 for response values.

Troubleshooting:

If submission is unsuccessful, an error message in the ‘#response’ column will be present. Identify the row(s) and cell(s) that are producing the error(s) and fix appropriately before trying to resubmit. Please do not include already successful rows when trying to resubmit.

Table 4. Common Errors

CategoryPotential solutions
Missing valuesLook at object schema descriptions for all required properties
FormattingLook for the SyntaxError: Unexpected token … is not valid JSON, again see schema descriptions for required property syntax. Example: array type property requires [“A”, “B”, “C”] formatting
Wrong valueValue is not one of the enum options (defined by the schema)

Reusing the same spreadsheet

If you plan to reuse the same spreadsheet for another submission, there a feature that allows you to skip over the rows that were successfully GET/POST/PATCH before.

  1. Manually add a column and name it "#skip".
  2. For each row in the column, set it to 1 to skip.
  3. As for any rows you don't want to skip, you can either set it to 0 or just leave the cell blank.
  4. You can GET/POST/PATCH via the toolbar.
  5. Review submissions and repeat for any additional actions.

skip rows

  • Note: first four rows were skipped, especially when #response is 200 or 201 (successful action). The last two rows were not.

External site validation:

If the #response column is not descriptive, you can opt to use external site validation: Toolbar -> IGVF -> Validate

Validate in toolbar

This will compare your sheet metadata against the described profile schema by using an external JSON schema validator. Please use this function for debugging only when the error message is unclear.

WARNING: if you are working with sensitive/restricted data, DO NOT use this feature. Instead, convert each row to JSON format and copy-paste to an internal JSON validator.

Updating spreadsheet script version

Every now and then the spreadsheet will have script updates. The version of the sheet is usually printed on the toolbar header. It is best practice to check for script updates by going to IGVF Toolbar > Check for script update. If the spreadsheet version does not match the newer version on github, there will be a pop-up window with further instructions.

script upgrade

upgrade window

Submitting Files

Submission of files have slightly different steps than other object types.

File setup

  • Same as the setup above, please make sure your googlesheets have access key pairs entered, set endpoints and profiles (reference_file, sequence_file, signal_file, alignment_file).
  • Some files (depending on their file format) need to be gzipped before being uploaded. As of July 2024, the following files need to be gzipped: .bed, .bedpe, .csv, .dat, .fasta, .fastq, .gaf, .gds, .gff, .gtf, .obo, .owl, .pairs, .sam, .tagAlign, .tar, .tsv, .txt, .vcf, .xml and .yaml. To find the latest list of files please refer to https://github.com/IGVF-DACC/igvfd/blob/dev/src/igvfd/types/file.py#L41-L81
  • Please put all your files for submission in a local machine directory where there are at least one subdirectory. For example: path1/file_A.fastq.gz. You can put multiple files in path1 folder, the sheet will be able to initialize upload for all the files in that folder.

File submissions

The submission is separated into 2 parts:

  • Part1: Post files metadata - creating file objects WITHOUT the actual files attached
  • Part2: Upload local files - Attaching the actual files to the appropriate file objects.

Both parts will be covered step by step below.

Part1: Post files metadata

  1. To retrieve the available set of properties belonging to the profile (object): Toolbar -> IGVF -> Set profile name Make new template in toolbar
  2. The first two columns (‘#response’ and ‘#response_time’) provide information regarding submission responses, they are not properties of an object.
  3. Fill in row values, where a row reflects a single record or instance of that object. *Note: multiple rows can be submitted at a time.
  4. For submission of Files, there are other columns starting with ‘#’ that will have to be added manually. Please make new columns for #upload_abspath, #upload_status, and #upload_cmd. Most of these columns will be automatically filled in except for #upload_abspath. File columns
    Column name#response#response_time#upload_abspath#upload_status#upload_cmd
    definitionstatus of file metadata submission (Part1).datetime when response is received from the portal.absolute path of the file in your local machine. Google bucket or AWS file paths are currently not accepted. Please use igvf_utils instead.status of uploading local file (Part2).an alternate way to upload the file by copy and pasting to the command line (using S3 CLI). This field will be automatically filled in after dropping your root file in the File upload sidebar.
    Automatically filled?Yes. This field will be automatically filled in after posting metadata submission. It will give you hints if the posting fails.Yes. This field will be automatically filled in after posting metadata submission.No. Please fill in and make sure it has at least 1 subdirectory files before the actual file. For example, if the file full path is /Users/Downloads/path2/path1/file_A.fastq.gz, the #upload_abspath can be path1/file_A.fastq.gz or /path2/path1/file_A.fastq.gz. Shorter paths will initialize uploads faster than longer paths.Yes. This field will be automatically filled in after dropping your root file in the File upload sidebar.Yes. This field will be automatically filled in after dropping your root file in the File upload sidebar.

5. POST the object records: After the rows have been filled in. Toolbar -> IGVF -> POST new metadata to portal. Please wait for a couple of seconds for the script to run. You will be notified with a pop-up window when the action is complete. An assigned identifier will be generated for each new record (accession). #response and #response_time will be filled in.

Post metadata menu

Troubleshooting for Part1:

  • Make sure files are not repeats of what is already on the portal. File metadata with conflicting md5sums will not be posted.
  • Required properties and other optional properties are properly filled.

Part2: Upload local file

  1. Open the File uploader: IGVF -> Upload local files (sidebar). This will open up a sidebar on the right of the screen. Menu open file uploader
  2. Drag the root file of where the local files are located in your local machine. For example, if the files full path are /path2/path1/file_A.fastq.gz and /path2/path1/file_B.fastq.gz, you can drag and drop either ../path2/ or ../path1/ file into the dotted area. Dropping the shorter ../path1/ will be faster to complete initializing upload compared to dropping file paths that are longer.
    Sidebar
  3. Click on “Initialize upload for current sheet”. Give a few seconds for the script to run and #upload_status will show “status: in progress.” once that’s done. Initialize upload button #upload_status column
  4. Review file upload under “Upload Panel” and Click “Start”. This will upload all files with the same root file that was dragged in step10. All successful uploads will have #upload_status Done. Image: Upload panel

Troubleshooting for Part2:

  • Files that are in revoked, released, replaced, deleted or archived status do not have the permissions for file uploads to be successful. Please contact your wrangler or email igvf-portal-help@lists.stanford.edu.
  • Make sure there is an identifying property (blue properties) filled in.
  • If you receive an error message that says “Expired Token” or “Failed to get upload credentials…”, please contact your wrangler or email igvf-portal-help@lists.stanford.edu.

Submitting attachments

For a profile with attachment property (e.g. document profile), you can define attachment column as a JSON string {"path":"/GOOGLE/DRIVE/PATH/TO/FILE/me.pdf"}.

It is recommended to make a local directory for document files only on your computer, and then drag and drop the folder itself to your Google Drive. Then all files in it will be transferred to Google Drive while keeping the directory structure.