IGVFFI7070MEEI

Status
released
Upload Status
validated
File Set
IGVFDS8660TXPW (Homo sapiens genome)
Summary
genome reference
File Format
fasta
Content Type
genome reference
Aliases
tim-reddy:starr-seq-fasta-reference-personal-genome-P020
md5sum
5cc25f611a2ff96484643036a222a4e9
Content MD5sum
44c064491af7dae21ca3305fa1ae98ce
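The record lists two checksums, md5sum and Content MD5sum. Below is a minimal Python sketch for checking a local download against both. It assumes, as is usual for gzipped files on ENCODE-style portals but not stated on this page, that md5sum covers the compressed .fasta.gz exactly as stored and Content MD5sum covers the decompressed FASTA. The function names and the local path are hypothetical.

```python
import gzip
import hashlib

# Checksums copied from this file record.
EXPECTED = {
    "md5sum": "5cc25f611a2ff96484643036a222a4e9",
    "content_md5sum": "44c064491af7dae21ca3305fa1ae98ce",
}

CHUNK_SIZE = 1 << 20  # hash 1 MiB at a time so a 6.2 GB file is never fully in memory


def md5_of_stream(stream, chunk_size=CHUNK_SIZE):
    """Return the hex MD5 digest of everything readable from a binary stream."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path):
    """MD5 of the file as stored on disk (the compressed .fasta.gz)."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def content_md5(path):
    """MD5 of the decompressed FASTA bytes (assumed meaning of 'Content MD5sum')."""
    with gzip.open(path, "rb") as handle:
        return md5_of_stream(handle)


def verify(path):
    """Compare both digests of a local copy against the values in this record."""
    return {
        "md5sum": file_md5(path) == EXPECTED["md5sum"],
        "content_md5sum": content_md5(path) == EXPECTED["content_md5sum"],
    }
```

Calling `verify("starrseq.personal_genome.fasta_reference.P020.fasta.gz")` on an intact download should report True for both entries, provided the assumption about what each checksum covers holds. Streaming in fixed-size chunks keeps memory use constant regardless of file size.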
File Size
6.2 GB
Submitted File Name
/data/reddylab/Alex/IGVF/data_submissions/reference_data/AIM_1_STARR_seq/starrseq.personal_genome.fasta_reference.P020.fasta.gz
Checkfiles Version
Submitter Comment
Note on the approach used to create custom personal genomes (Reddy Lab). To map reads carrying individual variants more reliably, and thereby measure allele-specific regulatory activity accurately, we used modified versions of the GRCh38 reference genome. For short-read alignment, we created a personalized reference genome that includes phased sequence for every haplotype expected in the sequencing data. The method uses two kinds of regions: constant regions, which contain no genetic variants relative to the reference, and variable regions, which do. Constant regions were included in the personal genome unchanged. For each variable region, one version of the sequence was included for every haplotype of that region that was possible given the phasing information already known for the sequenced samples. Consecutive regions overlapped so that each read alignment fell within a single region rather than spanning several. If a variable region contained more than 150 variants in total and spanned at least twice the expected fragment length, it was divided into two consecutive variable regions whose overlap still allowed each read to align within a single region.
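The region logic described in the comment can be sketched in Python as follows. This is an illustration under stated assumptions, not the Reddy Lab's pipeline: it handles SNVs only, represents each phased haplotype as the set of variant positions carrying the ALT allele, interprets "each possible haplotype" as each distinct phased haplotype supplied, counts "total variants" as distinct variant positions in the region, and splits an oversized region once at its midpoint with an overlap of one fragment length (the comment gives neither the split point nor the overlap size). All names, including the FASTA record naming scheme, are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Variant:
    pos: int  # 0-based position on the reference chromosome
    ref: str  # reference base (this sketch handles SNVs only)
    alt: str  # alternate base


def split_region(start: int, end: int, n_variants: int, fragment_len: int,
                 max_variants: int = 150) -> List[Tuple[int, int]]:
    """Split a variable region in two if it is both variant-dense and long.

    Rule from the comment: more than `max_variants` variants and a length of at
    least twice the expected fragment length. Splitting at the midpoint with an
    overlap of one fragment length is an assumption of this sketch.
    """
    if n_variants > max_variants and end - start >= 2 * fragment_len:
        mid = (start + end) // 2
        half = fragment_len // 2
        return [(start, mid + half), (mid - half, end)]
    return [(start, end)]


def haplotype_sequence(reference: str, start: int, end: int,
                       variants: Sequence[Variant],
                       alt_positions: FrozenSet[int]) -> str:
    """Spell out reference[start:end] with the ALT allele at each position in alt_positions."""
    seq = list(reference[start:end])
    for v in variants:
        if start <= v.pos < end and v.pos in alt_positions:
            if reference[v.pos] != v.ref:
                raise ValueError(f"REF allele mismatch at position {v.pos}")
            seq[v.pos - start] = v.alt
    return "".join(seq)


def region_records(chrom: str, reference: str, start: int, end: int,
                   variants: Sequence[Variant],
                   haplotypes: Sequence[FrozenSet[int]]) -> List[Tuple[str, str]]:
    """Return (name, sequence) FASTA records for one region.

    A constant region (no variants inside) is emitted once, unchanged; a variable
    region gets one record per distinct phased haplotype within that region.
    """
    inside = [v for v in variants if start <= v.pos < end]
    if not inside:
        return [(f"{chrom}:{start}-{end}", reference[start:end])]
    positions = frozenset(v.pos for v in inside)
    # Restrict each phased haplotype to this region, then deduplicate.
    distinct = sorted({tuple(sorted(h & positions)) for h in haplotypes})
    return [
        (f"{chrom}:{start}-{end}_hap{i}",
         haplotype_sequence(reference, start, end, inside, frozenset(alts)))
        for i, alts in enumerate(distinct)
    ]
```

For example, with reference `ACGTACGTAC`, variants G>T at position 2 and C>A at position 5, and phased haplotypes `{2}`, `{2, 5}` and `{}`, region 0-4 yields two records (`ACGT` and `ACTT`) because only two distinct haplotypes exist there, while region 6-10 contains no variants and is emitted once as `GTAC`. Deduplicating per region keeps the personal genome small: identical haplotypes across samples contribute a single sequence.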
Attribution

Files This File Derives From

File Format
fasta
Content Type
genome reference
Lab
External Lab, Community
File Size
873 MB
Upload Status
validated