IGVFFI3681VSQP

Status
released
Upload Status
validated
File Set
IGVFDS8660TXPW (Homo sapiens genome)
Summary
genome reference
File Format
fasta
Content Type
genome reference
Aliases
tim-reddy:starr-seq-fasta-reference-personal-genome-P013
md5sum
61b6f2db238c2a8ea96ca80d6839ca3d
Content MD5sum
c2d1cbd48fb25e759eab4d9f9d1b1f82
File Size
6.1 GB
Submitted File Name
/data/reddylab/Alex/IGVF/data_submissions/reference_data/AIM_1_STARR_seq/starrseq.personal_genome.fasta_reference.P013.fasta.gz
Checkfiles Version
Submitter Comment
Note on the approach used to create custom personal genomes (Reddy Lab): To improve the mapping of individual variants, and thereby measure allele-specific regulatory activity accurately, we used modified versions of the GRCh38 reference genome. We created a personalized reference genome for short-read alignment that includes phased information for all haplotypes expected in the sequencing data. The method uses two types of regions: constant regions, which contain no genetic variants relative to the reference, and variable regions, which do. Constant regions were included in the personal genome unchanged. For each variable region, one version was included for every distinct haplotype of that region, based on phasing information previously known for the samples being sequenced. Consecutive regions overlapped so that no read alignment needed to span more than one region. If a variable region contained more than 150 variants and spanned twice the expected fragment length, it was divided into two consecutive, overlapping variable regions so that each read could still align within a single region.
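The construction described in the comment can be sketched in Python. This is a hypothetical illustration, not the Reddy Lab pipeline: the `Variant`/`Region` structures, the phased-haplotype input format, the FASTA record names, the reading of the split rule as "more than 150 variants and a length of at least twice the expected fragment length", and the choice of a one-fragment-length overlap when splitting are all assumptions. Input regions are assumed to already overlap as the comment describes, and variants are assumed non-overlapping and fully contained in their region.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    pos: int  # 0-based chromosome position of the first reference base
    ref: str
    alt: str


@dataclass
class Region:
    chrom: str
    start: int  # 0-based, inclusive
    end: int  # 0-based, exclusive
    variants: list = field(default_factory=list)  # variants fully inside [start, end)


def haplotype_sequence(chrom_seq, region, alleles):
    """Spell out one haplotype; alleles[i] is 0 (ref) or 1 (alt) for region.variants[i]."""
    pieces, cursor = [], region.start
    for var, allele in sorted(zip(region.variants, alleles), key=lambda p: p[0].pos):
        pieces.append(chrom_seq[cursor:var.pos])  # unchanged bases before the variant
        pieces.append(var.alt if allele == 1 else var.ref)
        cursor = var.pos + len(var.ref)
    pieces.append(chrom_seq[cursor:region.end])  # unchanged tail of the region
    return "".join(pieces)


def should_split(region, fragment_len, max_variants=150):
    # Assumed reading of the rule: > max_variants AND length >= 2 x fragment length.
    return len(region.variants) > max_variants and region.end - region.start >= 2 * fragment_len


def split_region(region, fragment_len):
    """Split into two consecutive regions overlapping by >= fragment_len bases (assumed overlap)."""
    mid = (region.start + region.end) // 2
    half = (fragment_len + 1) // 2
    bounds = [(region.start, mid + half), (mid - half, region.end)]
    return [
        Region(region.chrom, s, e,
               [v for v in region.variants if s <= v.pos and v.pos + len(v.ref) <= e])
        for s, e in bounds
    ]


def personal_genome_records(chrom_seqs, regions, haplotypes, fragment_len):
    """Yield (name, sequence) FASTA records for a personal genome.

    haplotypes maps (chrom, start, end) of each input region to the phased allele
    tuples observed in the samples (hypothetical input format).
    """
    for region in regions:
        seq = chrom_seqs[region.chrom]
        phased = haplotypes.get((region.chrom, region.start, region.end), [])
        index = {v: i for i, v in enumerate(region.variants)}
        parts = split_region(region, fragment_len) if should_split(region, fragment_len) else [region]
        for part in parts:
            name = f"{part.chrom}:{part.start}-{part.end}"
            if not part.variants:
                # Constant region: copied from the reference with no changes.
                yield name, seq[part.start:part.end]
                continue
            # Variable region: one record per distinct haplotype restricted to this part.
            distinct = sorted({tuple(h[index[v]] for v in part.variants) for h in phased})
            for k, alleles in enumerate(distinct):
                yield f"{name}_hap{k}", haplotype_sequence(seq, part, alleles)
```

On a toy input, the constant region yields one record and the variable region yields one record per distinct haplotype, so the duplicated third haplotype below adds nothing:

```python
regions = [Region("chr1", 0, 4), Region("chr1", 2, 10, [Variant(5, "C", "T")])]
haps = {("chr1", 2, 10): [(0,), (1,), (1,)]}
for name, s in personal_genome_records({"chr1": "ACGTACGTAC"}, regions, haps, fragment_len=300):
    print(f">{name}\n{s}")
```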
Attribution

Files This File Derives From

Accession | File Set | File Format | Content Type | Lab | File Size | Upload Status
 |  | fasta | genome reference | External Lab, Community | 873 MB | validated
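To check a downloaded copy of this file against the md5sum and Content MD5sum values listed above, a minimal Python sketch follows. It assumes, as on other ENCODE-style portals, that md5sum is the MD5 of the compressed .fasta.gz exactly as stored and Content MD5sum is the MD5 of its decompressed content; the helper names and the local filename are hypothetical. Both digests are computed in fixed-size chunks because the file is 6.1 GB.

```python
import gzip
import hashlib


def _md5_of_stream(stream, chunk_size=1 << 20):
    # Read in 1 MiB chunks so a multi-GB file never has to fit in memory.
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path):
    """MD5 of the file exactly as stored (the compressed .fasta.gz)."""
    with open(path, "rb") as fh:
        return _md5_of_stream(fh)


def content_md5(path):
    """MD5 of the decompressed content (assumed meaning of Content MD5sum)."""
    with gzip.open(path, "rb") as fh:
        return _md5_of_stream(fh)
```

Usage, with a hypothetical local path:

```python
path = "IGVFFI3681VSQP.fasta.gz"  # hypothetical local filename
print(file_md5(path) == "61b6f2db238c2a8ea96ca80d6839ca3d")
print(content_md5(path) == "c2d1cbd48fb25e759eab4d9f9d1b1f82")
```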