tim-reddy:starr-seq-fasta-reference-personal-genome-P009
/data/reddylab/Alex/IGVF/data_submissions/reference_data/AIM_1_STARR_seq/starrseq.personal_genome.fasta_reference.P009.fasta.gz
Note on the approach to create custom personal genomes - Reddy Lab
To map individual variants more reliably, and thereby measure allele-specific regulatory activity more accurately, modified versions of the GRCh38 reference genome were used.
We created a personalized reference genome for short-read alignment that includes phasing information for every haplotype expected in the sequencing data.
The personal genome comprises two main types of regions: constant regions, which contain no genetic variants relative to the reference, and variable regions, which do. The sequences of constant regions were included in the personal genome unchanged. For each variable region, one unique version of the sequence was included for each possible haplotype of that region, based on previously known phasing information for the samples being sequenced. Consecutive regions overlapped so that a read could align entirely within a single region rather than spanning multiple regions.
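The per-haplotype expansion of a variable region can be illustrated with a minimal Python sketch. It is not the lab's pipeline: the `Variant` record, the haplotype IDs, and both function names are hypothetical, and it assumes variants are given as non-overlapping ref/alt substitutions with 0-based offsets inside the region.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical record for one phased variant inside a region."""
    pos: int                    # 0-based offset within the region (assumed)
    ref: str                    # reference allele
    alt: str                    # alternate allele
    carriers: FrozenSet[str]    # IDs of haplotypes that carry the alt allele


def apply_variants(region_seq: str, variants: Iterable[Variant]) -> str:
    """Substitute alt alleles into the reference sequence of a region."""
    seq = region_seq
    # Work right to left so earlier offsets stay valid after indels.
    for v in sorted(variants, key=lambda v: v.pos, reverse=True):
        if seq[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"ref allele mismatch at offset {v.pos}")
        seq = seq[:v.pos] + v.alt + seq[v.pos + len(v.ref):]
    return seq


def haplotype_sequences(region_seq: str,
                        variants: List[Variant],
                        haplotype_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Map each unique haplotype sequence to the haplotype IDs sharing it."""
    unique: Dict[str, List[str]] = {}
    for hap in haplotype_ids:
        carried = [v for v in variants if hap in v.carriers]
        seq = apply_variants(region_seq, carried)
        unique.setdefault(seq, []).append(hap)
    return unique


# Toy example: two samples, two phased haplotypes each.
example = haplotype_sequences(
    "ACGTACGT",
    [Variant(1, "C", "T", frozenset({"s1_h1", "s2_h1"})),
     Variant(5, "C", "CA", frozenset({"s1_h2"}))],
    ["s1_h1", "s1_h2", "s2_h1", "s2_h2"],
)
```

In this toy case the four haplotypes collapse to three distinct sequences, so only three versions of the region would be written to the reference. A region with no variants yields a single sequence identical to the reference, which matches the handling of constant regions.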
If a variable region had more than 150 total variants and spanned double the expected fragment length, it was divided into two consecutive variable regions whose overlap allowed each read to align within a single region.
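The splitting rule can be sketched as follows. Only the 150-variant threshold comes from the description above. The function name, the midpoint split, the `overlap` parameter, and reading "double the expected fragment length" as "at least double" are assumptions made for illustration, since the note does not state the split point or the overlap size.

```python
from typing import List, Tuple


def split_variable_region(start: int, end: int, n_variants: int,
                          fragment_len: int, overlap: int,
                          max_variants: int = 150) -> List[Tuple[int, int]]:
    """Return one or two half-open (start, end) intervals for a region.

    Assumption: the region is split at its midpoint, and the left half is
    extended by `overlap` bases so the two halves share that many bases.
    """
    length = end - start
    # Split only when BOTH conditions hold: many variants and a long span.
    if n_variants <= max_variants or length < 2 * fragment_len:
        return [(start, end)]
    mid = start + length // 2
    left_end = min(end, mid + overlap)   # never extend past the region
    return [(start, left_end), (mid, end)]


# Hypothetical numbers: 1 kb region, 200 variants, 400 bp fragments.
parts = split_variable_region(1000, 2000, n_variants=200,
                              fragment_len=400, overlap=400)
```

With these hypothetical numbers the region becomes `(1000, 1900)` and `(1500, 2000)`, sharing 400 bases, so any fragment up to that length falls entirely inside at least one of the two pieces.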