IGVFFI9761FYQI

released
validated
File Set
IGVFDS8660TXPW (Homo sapiens genome)
Summary
genome reference
File Format
fasta
Content Type
genome reference
Aliases
tim-reddy:starr-seq-fasta-reference-personal-genome-P009
md5sum
2e63940785c60c702e10f0f13eb8e6d8
Content MD5sum
d0ee6242ac4efe69cedd04d74ad8d9a0
File Size
6.2 GB
Submitted File Name
/data/reddylab/Alex/IGVF/data_submissions/reference_data/AIM_1_STARR_seq/starrseq.personal_genome.fasta_reference.P009.fasta.gz
Checkfiles Version
Submitter Comment
Note on the approach to create custom personal genomes (Reddy Lab): To map individual variants more reliably, and so measure allele-specific regulatory activity accurately, we used modified versions of the GRCh38 reference genome. We created a personalized reference genome for short-read alignment that includes phased information for all haplotypes expected in the sequencing data. The method distinguishes two types of regions: constant regions, which contain no genetic variants relative to the reference, and variable regions, which do. Constant-region sequences were included in the personal genome unchanged. For each variable region, one unique version of the sequence was included for each possible haplotype of that region, given the phasing information previously known for the samples being sequenced. Consecutive regions overlapped so that read alignments did not span multiple regions. If a variable region had more than 150 total variants and spanned a length double the expected fragment length, it was divided into two consecutive variable regions, with an overlap that allows each read to align within a single region.
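The region handling described in the comment can be illustrated with a minimal Python sketch. This is not the Reddy Lab's code: the function names, the SNV-only variant representation, the reading of "double the expected fragment length" as "at least double", and the default overlap of one fragment length are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation of a phased SNV within a region:
# (0-based offset in the region, allele carried by each haplotype).
Variant = Tuple[int, Tuple[str, ...]]


def haplotype_sequences(ref_seq: str, variants: List[Variant]) -> List[str]:
    """Return one sequence per distinct haplotype of a region.

    A region with no variants (a constant region) is returned unchanged.
    Only single-base substitutions are handled in this sketch.
    """
    if not variants:
        return [ref_seq]
    n_haplotypes = len(variants[0][1])
    sequences: List[str] = []
    for h in range(n_haplotypes):
        bases = list(ref_seq)
        for offset, alleles in variants:
            bases[offset] = alleles[h]
        seq = "".join(bases)
        if seq not in sequences:  # keep each unique version only once
            sequences.append(seq)
    return sequences


def split_if_large(
    start: int,
    end: int,
    n_variants: int,
    fragment_len: int,
    max_variants: int = 150,
    overlap: int = 0,
) -> List[Tuple[int, int]]:
    """Split the half-open region [start, end) into two overlapping regions.

    The split applies when the region has more than `max_variants` variants
    and is at least twice the fragment length (assumed interpretation).
    The overlap defaults to one fragment length (assumed value).
    """
    if overlap <= 0:
        overlap = fragment_len
    length = end - start
    if n_variants > max_variants and length >= 2 * fragment_len:
        mid = start + length // 2
        half = overlap // 2
        return [(start, mid + half), (mid - half, end)]
    return [(start, end)]
```

For example, `haplotype_sequences("ACGT", [(1, ("C", "T"))])` yields two versions of the region, one per haplotype, while a site where both haplotypes carry the same allele yields only one.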
Attribution

Files This File Derives From

1 item
Accession
File Set
File Format
fasta
Content Type
genome reference
Lab
External Lab, Community
File Size
873 MB
Upload Status
validated