The DACC has been working with IGVF members to prepare to help you submit your data to the IGVF Portal. If you have data ready to submit, please contact the DACC at igvf-portal-help@lists.stanford.edu to start the submission process. Once notified, the DACC wrangling team will reach out to help set up the submission process, providing each submitter with an API access key to the portal, instructions on collecting metadata, and the tools available to help with data submission.
Submission workshop recording (IGVF Consortium meeting 2023): https://drive.google.com/file/d/1bXG186nJbHjZwAvzLieA5FrRvv1ghi2C/view?usp=sharing
The submission workshop video provides a general guide for submitters; however, we still recommend reading the documentation for more detailed and specific help, examples, and guidelines.
After you contact the DACC wranglers, the lab and the wranglers will begin discussing data modeling. This is an ongoing process that may take several Zoom meetings. The wranglers will then use their understanding of the data to update the system accordingly. Next, the lab's submitters will start submitting data to our test server (sandbox). Submissions to the sandbox are for practice only, but the wranglers will still review them and offer feedback. If either side raises concerns at this point, the process repeats until the data model and submissions are finalized. Once the submissions look good, the submitter can proceed to submit to our production server. Data submitted up to this point is not yet released to the public; it is available only to the consortium. There will be another review on production before the lab and the DACC wranglers both agree to release the data to the public.
API access key pairs are used to authenticate a user before granting access to submit data. Please provide your wrangler with an email address associated with a Gmail or GitHub account. They will make sure you have the appropriate permissions to submit data for your lab.
To create a key pair, log in using the button at the bottom left of the side toolbar. Once logged in, click "Profile".
On your "User Profile" page, click "Create Access Key". Your Access Key ID and Access Key Secret will appear in a pop-up window. Make a note of them, as they are shown only once: after the pop-up window is closed, there is no way to retrieve them. If you lose a key pair, however, you can create a new one.
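Scripts that talk to the Portal need to send the key pair with each request. The sketch below assumes the Portal accepts HTTP Basic authentication with the Access Key ID as the username and the Access Key Secret as the password (the convention used by ENCODE-style portals); the environment variable names IGVF_API_KEY and IGVF_SECRET_KEY are hypothetical. Please confirm the details with your wrangler.

```python
import base64
import os


def basic_auth_header(key_id: str, key_secret: str) -> dict:
    """Build an HTTP Basic Authorization header from an access key pair."""
    raw = f"{key_id}:{key_secret}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def header_from_env(id_var: str = "IGVF_API_KEY",
                    secret_var: str = "IGVF_SECRET_KEY") -> dict:
    """Read the key pair from (hypothetically named) environment variables."""
    key_id = os.environ.get(id_var)
    key_secret = os.environ.get(secret_var)
    if not key_id or not key_secret:
        raise RuntimeError(f"Set {id_var} and {secret_var} before submitting.")
    return basic_auth_header(key_id, key_secret)


if __name__ == "__main__":
    # Made-up key pair for illustration; never hard-code real secrets.
    print(basic_auth_header("ABCD1234", "s3cret"))
```

Keeping the secret in the environment rather than in the script itself makes it less likely to end up in a shared notebook or a Git repository.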
Providing rich, reliable metadata is essential for maintaining the high standards set by the IGVF consortium and for making the Portal a valuable resource for the scientific community. Our current data model includes multiple object types (for example tissue, primary_cell, and human_donor). Each object type has its own set of metadata properties, designed in part to capture how objects relate to one another. All metadata will be reviewed by the DACC wrangling team. Any data submitted to the Portal becomes accessible to IGVF consortium members. However, it will not become publicly available (“released”) until the DACC has finished reviewing the submitted data and has received approval from the submitting lab.
The data model supports the submission of objects classified under the following general categories: samples, donors, file sets, files, ontology terms, and other.
Category | Object types |
---|---|
Samples | in_vitro_system, primary_cell, tissue, whole_organism, technical_sample, multiplexed_sample |
Donors | human_donor, rodent_donor |
File sets | analysis_set, curated_set, measurement_set, construct_library, auxiliary_set, model, prediction |
Files | reference_file, sequence_file, configuration_file, signal_file, alignment_file |
Ontology terms | assay_term, phenotype_term, sample_term, platform_term |
Other | award, analysis_steps, biomarker, document, gene, image, lab, modification, page, phenotypic_feature, publication, software, software_version, source, treatment, users, human_genomic_variant, workflow |
*Note: The data model is under active development; see the schemas on GitHub for further detail.
If you are submitting data from a single-cell assay (such as scRNA-seq, 10X multiome, SHARE-seq, or MULTI-seq), you should generate a machine-readable YAML file describing the sequence and structure of your genomic library. Submit the YAML file as a configuration_file and link it from the metadata of the corresponding sequence_file(s) so that your data can be processed.
For help generating a YAML file, please contact Sina Booeshaghi and Lior Pachter. For any other questions about submitting a configuration_file, please contact your wrangler.
The IGVF data model includes schemas organized by object type that list the different properties (metadata) describing the associated experimental artifact. These pages are key resources to refer to while preparing spreadsheets for submission.
There are two tools available for submitters to use: the Google Sheets Submitter and igvf_utils.
Each object type (also known as a profile) needs its own spreadsheet, because each has its own set of metadata properties. Note that although primary_cell, tissue, etc. are all categorized as biosamples, they are different object types in our system. The same applies to human_donor and rodent_donor. You will therefore need to prepare one sheet for each object type you are submitting.
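If you keep metadata in code before uploading it, records for different object types have to be split apart. Here is a minimal sketch that assumes your records are Python dicts tagged with a hypothetical object_type key. It writes one tab-separated sheet per object type and takes its columns from the union of the properties in that group. Whether your tool expects TSV, a Google Sheet, or another layout is something to confirm with your wrangler.

```python
import csv
import io
from collections import defaultdict


def sheets_by_object_type(records: list[dict]) -> dict[str, str]:
    """Group records by their (hypothetical) "object_type" key and render
    one TSV sheet per type. The grouping key is not written to the sheet."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        props = {k: v for k, v in record.items() if k != "object_type"}
        grouped[record["object_type"]].append(props)

    sheets = {}
    for object_type, rows in grouped.items():
        # Union of columns across rows, keeping first-seen order.
        columns = list(dict.fromkeys(col for row in rows for col in row))
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)  # missing properties become empty cells
        sheets[object_type] = buffer.getvalue()
    return sheets


records = [
    {"object_type": "human_donor", "aliases": "john-doe:donor_01",
     "award": "/awards/HG012012", "lab": "/labs/john-doe", "taxa": "Homo sapiens"},
    {"object_type": "human_donor", "aliases": "john-doe:donor_02",
     "award": "/awards/HG012012", "lab": "/labs/john-doe", "taxa": "Homo sapiens"},
    {"object_type": "tissue", "aliases": "john-doe:tissue_01",
     "award": "/awards/HG012012", "lab": "/labs/john-doe",
     "source": "/sources/atcc", "donors": "john-doe:donor_01",
     "biosample_term": "/sample-terms/UBERON_0000955"},
]

for object_type, sheet in sheets_by_object_type(records).items():
    print(f"--- {object_type} ---")
    print(sheet, end="")
```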
Let’s go through the human_donor and tissue schemas, identify which properties are needed and what type each one is, and assign example values.
*Remember that for any property that links to another object, you must provide the identifier of an object that already exists on the Portal. If you are unsure which identifier to use, please contact the wrangling team.
Descriptions of both the required and optional properties for human_donor are available on its schema page in JSON format; an excerpt is shown below. Required properties must be provided to submit an object successfully. Optional properties should be provided whenever they are available and applicable.
```json
{
  "title": "Human Donor",
  "$id": "/profiles/human_donor.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "Derived schema submitting human donors.",
  "type": "object",
  "required": [
    "award",
    "lab",
    "taxa"
  ]
}
```
Required Property | Type | Description | Example Value |
---|---|---|---|
award | string | Link to an associated award or grant object. | /awards/HG012012 |
lab | string | Link to an associated lab. | /labs/john-doe |
taxa | string (enum) | Donor’s taxa. | Homo sapiens |
Optional Property | Type | Description | Example Value |
---|---|---|---|
phenotypic_features | array of strings | List of links to the associated phenotypic features of the donor. | ["HP:0000726", "MONDO:0004975"] |
ethnicities | array of strings (enums) | Terms from the HANCESTRO ontology (http://bioportal.bioontology.org/ontologies/HANCESTRO) are used. | ["Hispanic", "Arab"] |
*Note: Not all optional properties are listed in this example; see the schema page for the full list.
The tissue schema excerpt below lists its required properties:
```json
{
  "title": "Tissue",
  "$id": "/profiles/tissue.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "Schema for submitting a tissue sample",
  "type": "object",
  "required": [
    "award",
    "lab",
    "source",
    "donors",
    "biosample_term"
  ]
}
```
Required Property | Type | Description | Example Value |
---|---|---|---|
award | string | Grant associated with the submission. | /awards/HG012012 |
lab | string | Lab associated with the submission. | /labs/john-doe |
source | string | Sample provider lab or a vendor. | /sources/atcc |
donors | array of strings | Donor(s) the sample was derived from. | ["IGVFDO1645ZWSY", "IGVFDO2416VXNA"] |
biosample_term | string | Ontology term identifying a biosample. Links to Sample Term object (unique identifier). | /sample-terms/UBERON_0000955 |
Optional Property | Type | Description | Example Value |
---|---|---|---|
pmi | integer | Post-mortem interval (PMI): the time elapsed between the donor's death and collection of the tissue. | 3 |
pmi_units | string | The unit in which the PMI time was reported. Enum list includes: second, minute, hour, day, week. | day |
preservation_method | string | The method by which the tissue was preserved. Enum list includes cryopreservation, flash-freezing. | flash-freezing |
date_obtained | string | Date harvested. Date should be submitted as YYYY-MM-DD. | 2022-04-02 |
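Before uploading a sheet, you can check locally that every row supplies the required properties. The minimal sketch below hard-codes the required lists from the human_donor and tissue schema excerpts above. REQUIRED and missing_required are illustrative names and are not part of any IGVF tool. The schemas on the Portal remain the authority and may change as the data model evolves.

```python
# Required properties copied from the schema excerpts above.
REQUIRED = {
    "human_donor": ["award", "lab", "taxa"],
    "tissue": ["award", "lab", "source", "donors", "biosample_term"],
}


def missing_required(object_type: str, row: dict) -> list[str]:
    """Return the required properties that are absent or empty in a row."""
    return [prop for prop in REQUIRED[object_type]
            if row.get(prop) in (None, "", [])]


if __name__ == "__main__":
    tissue_row = {
        "award": "/awards/HG012012",
        "lab": "/labs/john-doe",
        "source": "/sources/atcc",
        "donors": ["IGVFDO1645ZWSY"],
        # biosample_term deliberately omitted
    }
    print(missing_required("tissue", tissue_row))  # ['biosample_term']
```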
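When the required properties and aliases are combined, a human_donor spreadsheet for two donors looks like this:
aliases | award | lab | taxa |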
---|---|---|---|
john-doe:donor_01 | /awards/HG012012 | /labs/john-doe | Homo sapiens |
john-doe:donor_02 | /awards/HG012012 | /labs/john-doe | Homo sapiens |
For every object submitted to the Portal, the system automatically generates a unique identifier (uuid). For a subset of objects, an accession is also generated in addition to the uuid, following the format IGVF(SM|DO)[0-9]{4}[A-Z]{4}, where the two-letter code indicates the object type (DO for donors, SM for samples). For example, the human_donor and tissue objects above will automatically receive accessions of the form IGVFDO[0-9]{4}[A-Z]{4} and IGVFSM[0-9]{4}[A-Z]{4}, respectively.
IMPORTANT: Although accessions and unique identifiers (UUIDs) are generated automatically and can be used to find your object of interest, we strongly encourage using the aliases property, another form of unique identifier. Aliases are not assigned by the system; they let submitters assign an identifier that makes sense for internal records, such as the identifier from the lab's LIMS.
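The accession format can be expressed directly as a regular expression. The following short sketch reports whether an identifier looks like a donor or sample accession. It covers only the DO and SM codes described here. Reading SM as "sample" is inferred from the tissue example.

```python
import re
from typing import Optional

# Accession pattern described above (donor and sample codes only).
ACCESSION_RE = re.compile(r"IGVF(SM|DO)[0-9]{4}[A-Z]{4}")
KIND_BY_CODE = {"DO": "donor", "SM": "sample"}


def accession_kind(identifier: str) -> Optional[str]:
    """Return 'donor' or 'sample' for a matching accession, else None."""
    match = ACCESSION_RE.fullmatch(identifier)
    return KIND_BY_CODE[match.group(1)] if match else None


print(accession_kind("IGVFDO1645ZWSY"))     # donor
print(accession_kind("john-doe:donor_01"))  # None
```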
Aliases are to be formatted in the following way: ‘[lab name]:[chosen identifier]’ (e.g. john-doe:experiment_01).
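Generating aliases programmatically, for example from LIMS identifiers, is easier when one helper enforces the lab-name:identifier format consistently. In the sketch below, the validation rule is an assumption modeled on the john-doe examples in this guide: a lowercase, hyphenated lab name and an identifier without whitespace. The Portal's own alias rules may be stricter or looser. make_alias and split_alias are hypothetical helper names.

```python
import re
from typing import Optional

# Assumed rule: lowercase lab slug (e.g. "john-doe"), a colon, then a
# non-empty identifier with no whitespace.
ALIAS_RE = re.compile(r"([a-z0-9-]+):(\S+)")


def make_alias(lab_name: str, identifier: str) -> str:
    """Build '<lab name>:<identifier>' and reject malformed results."""
    alias = f"{lab_name}:{identifier}"
    if not ALIAS_RE.fullmatch(alias):
        raise ValueError(f"not a valid alias: {alias!r}")
    return alias


def split_alias(alias: str) -> Optional[tuple[str, str]]:
    """Return (lab name, identifier), or None if the string is not an alias."""
    match = ALIAS_RE.fullmatch(alias)
    return (match.group(1), match.group(2)) if match else None


print(make_alias("john-doe", "experiment_01"))  # john-doe:experiment_01
```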
*Note: These three types of IDs (uuid, accession, and aliases) can be used interchangeably to refer to an object in the spreadsheets used for object submission or modification.
After a successful submission, you can view an object by appending its object type, followed by an identifier such as its uuid, accession, or alias, to the server URL.
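Here is a sketch of how such a URL can be assembled in a script. It assumes the hosts https://sandbox.igvf.org (sandbox) and https://data.igvf.org (production), plus a plural, hyphenated collection path such as human-donors, inferred from paths like /sample-terms/ and /awards/ used in this guide. Please confirm all of these with your wrangler. Any of the three identifier types (uuid, accession, or alias) can be passed.

```python
from urllib.parse import quote

# Assumed base URLs; confirm the exact hosts with your wrangler.
SERVERS = {
    "sandbox": "https://sandbox.igvf.org",
    "production": "https://data.igvf.org",
}


def object_url(server: str, collection: str, identifier: str) -> str:
    """Join a server, an object-type collection path, and an identifier.

    The colon in aliases (e.g. john-doe:tissue_01) is kept unescaped;
    other reserved characters in the identifier are percent-encoded.
    """
    base = SERVERS[server].rstrip("/")
    return f"{base}/{collection.strip('/')}/{quote(identifier, safe=':')}/"


print(object_url("sandbox", "human-donors", "IGVFDO1645ZWSY"))
```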
If your objects have metadata errors that need fixing, you can easily patch their property values. The first column header in your spreadsheet should be either accession (for the Google Sheets Submitter) or record_id (for igvf_utils). The properties to be updated go in the following columns.
Example: the pmi and pmi_units properties of two tissue records, both initially specified as 3 days, are changed to 5 weeks.
accession | pmi | pmi_units |
---|---|---|
john-doe:tissue_01 | 5 | week |
john-doe:tissue_02 | 5 | week |
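A patch sheet is built with the same approach as a submission sheet. The only tool-specific detail described here is the name of the first column. The sketch below writes the example above as tab-separated text. The tool keys in ID_COLUMN are hypothetical labels. Please check each tool's documentation for the file format it actually reads.

```python
import csv
import io

# First-column header expected by each tool, as described above.
ID_COLUMN = {"google_sheets_submitter": "accession", "igvf_utils": "record_id"}


def patch_sheet(tool: str, updates: dict[str, dict], columns: list[str]) -> str:
    """Render a TSV patch sheet: identifier column first, then the
    properties to update, one row per object."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([ID_COLUMN[tool], *columns])
    for identifier, values in updates.items():
        writer.writerow([identifier, *(values[col] for col in columns)])
    return buffer.getvalue()


updates = {
    "john-doe:tissue_01": {"pmi": 5, "pmi_units": "week"},
    "john-doe:tissue_02": {"pmi": 5, "pmi_units": "week"},
}
print(patch_sheet("igvf_utils", updates, ["pmi", "pmi_units"]), end="")
```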
IMPORTANT: The order of submission by object type matters! Objects can be linked to each other, and creating these links depends on submitting objects in the proper order. For example, a tissue object links to one or more donor objects through its required donors property, which must specify their unique identifiers (see the tissue example above). The donor(s) must therefore be submitted first; otherwise the tissue cannot reference them, and its submission will fail because donors is a required property.
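When a batch spans several object types, a valid order follows from which types link to which. As a sketch, Python's standard graphlib module can derive such an order from a dependency map. The LINKS_TO map here is hypothetical and partial, built from the human_donor and tissue properties in this guide; it is not the Portal's official submission order. Linked types such as award or lab may already exist on the Portal, and they appear below only so that nothing is referenced before it exists.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map: each object type maps to the object types
# it links to, which must therefore exist before it is submitted.
LINKS_TO = {
    "tissue": {"human_donor", "source", "sample_term", "award", "lab"},
    "human_donor": {"award", "lab", "phenotypic_feature"},
}

# static_order() yields every type after all of the types it depends on.
order = list(TopologicalSorter(LINKS_TO).static_order())
print(order)
```

Ties between independent types (award and lab, for example) may come out in any order; only the dependency constraints are guaranteed.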
See the current order of object types for submission (link).
Some object types are not submittable by labs, for curation purposes (e.g. to prevent duplicates or misuse of objects). Here is a list:
Please contact your wrangler if you would like to submit new objects of any of the object types listed.