IGVFFI3258IYJA

released
validated
File Set
IGVFDS8660TXPW (Homo sapiens genome)
Summary
genome reference
File Format
fasta
Content Type
genome reference
Aliases
tim-reddy:starr-seq-fasta-reference-personal-genome-P012
md5sum
e97ade6f6047ac80d0687db769d85665
Content MD5sum
5e2c1a652ddd5233c684e5dfb6726965
File Size
6.1 GB
Submitted File Name
/data/reddylab/Alex/IGVF/data_submissions/reference_data/AIM_1_STARR_seq/starrseq.personal_genome.fasta_reference.P012.fasta.gz
Checkfiles Version
Submitter Comment
Note on the approach to create custom personal genomes (Reddy Lab)
To map individual variants more reliably and accurately measure allele-specific regulatory activity, we used modified versions of the GRCh38 reference genome. We created a personalized reference genome for short-read alignment that includes phased information for all haplotypes expected in the sequencing data. The method distinguishes two types of regions: constant regions, which contain no genetic variants relative to the reference, and variable regions, which do. The sequences of constant regions were included in the personal genome unchanged. For each variable region, one unique version of the sequence was included for each possible haplotype of that region, based on previously known phasing information for the samples being sequenced. Consecutive regions overlapped so that a read alignment did not have to span multiple regions. If a variable region had more than 150 total variants and spanned a length double the expected fragment length, it was divided into two consecutive variable regions, with an overlap that allows each read to align within a single region.
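The region logic described in the comment can be illustrated with a minimal Python sketch. This is not the Reddy Lab pipeline; it only mirrors the stated rules under several assumptions that the comment does not specify: variants are single-nucleotide substitutions, coordinates are 0-based and half-open, "double the expected fragment length" is read as "at least double", and the overlap between split regions equals the expected fragment length. All function names and record names (`apply_haplotype`, `build_personal_records`, `maybe_split`, `region_...`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a variant is (0-based reference position, ref allele, alt allele),
# restricted to single-nucleotide substitutions to keep the sketch short.
Variant = Tuple[int, str, str]
# Assumption: a region is (start, end, is_variable), 0-based and half-open.
Region = Tuple[int, int, bool]


def apply_haplotype(ref: str, start: int, end: int, variants: List[Variant]) -> str:
    """Return ref[start:end] with this haplotype's alternate alleles substituted."""
    seq = list(ref[start:end])
    for pos, ref_allele, alt_allele in variants:
        if not (start <= pos < end):
            continue  # variant lies outside this region
        if ref[pos] != ref_allele:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos - start] = alt_allele
    return "".join(seq)


def build_personal_records(
    ref: str, regions: List[Region], haplotypes: Dict[str, List[Variant]]
) -> List[Tuple[str, str]]:
    """Emit (name, sequence) records: constant regions once, unchanged;
    variable regions once per distinct haplotype sequence."""
    records = []
    for start, end, is_variable in regions:
        if not is_variable:
            records.append((f"region_{start}_{end}", ref[start:end]))
            continue
        seen = set()
        for hap_name, variants in haplotypes.items():
            seq = apply_haplotype(ref, start, end, variants)
            if seq in seen:
                continue  # keep one unique version per haplotype sequence
            seen.add(seq)
            records.append((f"region_{start}_{end}_{hap_name}", seq))
    return records


def maybe_split(
    start: int, end: int, n_variants: int, fragment_len: int, max_variants: int = 150
) -> List[Tuple[int, int]]:
    """Split a large, variant-dense region into two overlapping regions.
    Assumption: the overlap is about one fragment length, centered on the midpoint."""
    length = end - start
    if n_variants > max_variants and length >= 2 * fragment_len:
        mid = start + length // 2
        half = fragment_len // 2
        return [(start, mid + half), (mid - half, end)]
    return [(start, end)]


# Toy example: one constant region and one variable region with three phased haplotypes.
toy_ref = "ACGTACGTAC"
toy_regions = [(0, 4, False), (4, 10, True)]
toy_haplotypes = {"hap1": [(5, "C", "T")], "hap2": [], "hap3": [(5, "C", "T")]}
toy_records = build_personal_records(toy_ref, toy_regions, toy_haplotypes)
```

In the toy example, `hap3` carries the same allele as `hap1`, so only two versions of the variable region are emitted. Writing the records out as FASTA and handling insertions or deletions are left out of the sketch.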
Attribution

Files This File Derives From (1 item)
Accession | File Set | File Format | Content Type | Lab | File Size | Upload Status
 |  | fasta | genome reference | External Lab, Community | 873 MB | validated