Title: | Gene Haplotype Statistics, Phenotype Association and Visualization |
---|---|
Description: | Import genome variants data and perform gene haplotype Statistics, visualization and phenotype association with 'R'. |
Authors: | Zhang Renliang [aut, cre], Jia Guanqing [aut] |
Maintainer: | Zhang Renliang <[email protected]> |
License: | GPL-3 |
Version: | 1.2.5 |
Built: | 2024-11-05 05:17:47 UTC |
Source: | https://github.com/zhangrenl/genehapr |
add annotations to INFO fields used for plotHapTable()
addINFO(hap, tag = "", values = values, replace = FALSE, sep = ";") sites(hap)
addINFO(hap, tag = "", values = values, replace = FALSE, sep = ";") sites(hap)
hap |
object of |
tag |
tag names, usually is a single word used before "=" |
values |
annotation for each site. Length of values must be equal with sites in hapResult |
replace |
whether replace origin INFOs in hapResult or not. Default as FALSE |
sep |
a character string to separate the terms. Not
|
object of hapSummary or hapResult class with added/replaced INFOs
data("geneHapR_test") # length of values must be equal with number of sites in hap result values <- paste0("newInfo",c(1:9)) hapResult <- addINFO(hapResult, tag = "new", values = values, replace = TRUE) data("geneHapR_test") # check how many sites were concluded in hapResult/hapSummary sites(hapResult)
data("geneHapR_test") # length of values must be equal with number of sites in hap result values <- paste0("newInfo",c(1:9)) hapResult <- addINFO(hapResult, tag = "new", values = values, replace = TRUE) data("geneHapR_test") # check how many sites were concluded in hapResult/hapSummary sites(hapResult)
add promoter to annotation
addPromoter(anno, PromoterLength = 1500, bedFile = NULL)
addPromoter(anno, PromoterLength = 1500, bedFile = NULL)
anno |
anotation, imported gff/bed |
PromoterLength |
the length of promoter region, default as 1500 |
bedFile |
the output bed file name |
data("geneHapR_test") bed <- addPromoter(gff)
data("geneHapR_test") bed <- addPromoter(gff)
convert hapSummary
or hapResult
class into haplotype
class (pegas)
as.haplotype(hap)
as.haplotype(hap)
hap |
object of |
haplotype class
It's not advised for hapSummary
or hapResult
with indels, due to indels will
convert to SNPs with equal length of each indel.
data("geneHapR_test") hap <- as.haplotype(hapResult) hapSummary <- hap_summary(hapResult) hap <- as.haplotype(hapSummary)
data("geneHapR_test") hap <- as.haplotype(hapResult) hapSummary <- hap_summary(hapResult) hap <- as.haplotype(hapSummary)
gff
contains a example of gff file used for test of
visualization mutations on gene model.pheno
contains a simulated test pheno data used for test of comparison
between different haps
vcf
, a vcfR object provide a data set for test of seq2hap()
.
vcf
contains indels, snps, biallelic sites and multiallelic sites.
AccINFO
a data.frame provide addtional information of accessions,
including accession type, source and location.
show variants on gene model using hapSummary and gene annotations
displayVarOnGeneModel( hapSummary, gff, Chr, startPOS, endPOS, type = "pin", cex = 0.7, CDS_h = 0.05, fiveUTR_h = 0.02, threeUTR_h = 0.01, geneElement = geneElement, hap )
displayVarOnGeneModel( hapSummary, gff, Chr, startPOS, endPOS, type = "pin", cex = 0.7, CDS_h = 0.05, fiveUTR_h = 0.02, threeUTR_h = 0.01, geneElement = geneElement, hap )
hapSummary , hap
|
haplotype result |
gff |
gff |
Chr |
the chromosome name. If missing, the first element in the hapSummary will be used |
startPOS |
If missing, will use the min position in hapSummary |
endPOS |
If missing, will use the max position in hapSummary |
type |
character. Could be "circle", "pie", "pin", "pie.stack" or "flag" |
cex |
a numeric control the size of circle |
CDS_h , fiveUTR_h , threeUTR_h
|
The height of CDS 5'UTR and 3'UTR in gene model |
geneElement |
ploted elements, eg.: c("CDS","five_prime_UTR") |
No return value
data("geneHapR_test") hapSummary <- hap_summary(hapResult) displayVarOnGeneModel(hapSummary, gff, startPOS = 4100, endPOS = 8210, cex = 0.75)
data("geneHapR_test") hapSummary <- hap_summary(hapResult) displayVarOnGeneModel(hapSummary, gff, startPOS = 4100, endPOS = 8210, cex = 0.75)
filter hapResult or hapSummary by remove positions or accessions or haplotypes
filter_hap(hap, rm.mode = c("position", "accession", "haplotype", "freq"), position.rm = position.rm, accession.rm = accession.rm, haplotype.rm = haplotype.rm, freq.min = 5)
filter_hap(hap, rm.mode = c("position", "accession", "haplotype", "freq"), position.rm = position.rm, accession.rm = accession.rm, haplotype.rm = haplotype.rm, freq.min = 5)
hap |
object of hapSummary or hapResult class |
rm.mode |
filter mode, one of "position", "accession", "haplotype" |
position.rm |
numeric vector contains positions need to be removed |
accession.rm |
character vector contains accessions need to be removed, only hapResult can be filtered by accessions |
haplotype.rm |
character vector contains haplotypes need to be removed |
freq.min |
numeric, hapltypes with accessions number less than freq.min will be removed |
hapSummary or hapResult depend input
data("geneHapR_test") hap <- filter_hap(hapResult, rm.mode = c("position", "accession", "haplotype", "freq"), position.rm = c(4879, 4950), accession.rm = c("C1", "C9"), haplotype.rm = c("H009", "H008"), freq.min = 5)
data("geneHapR_test") hap <- filter_hap(hapResult, rm.mode = c("position", "accession", "haplotype", "freq"), position.rm = c(4879, 4950), accession.rm = c("C1", "C9"), haplotype.rm = c("H009", "H008"), freq.min = 5)
filter variants in hapmap format
filter_hmp( x, mode = c("POS", "type", "both"), Chr = Chr, start = start, end = end, gff = gff, type = type, cusTyp = cusTyp, geneID = geneID )
filter_hmp( x, mode = c("POS", "type", "both"), Chr = Chr, start = start, end = end, gff = gff, type = type, cusTyp = cusTyp, geneID = geneID )
x |
genotype dataset in hapmap format, object of data.frame class |
mode |
filter mode, one of "POS", "type", "both" |
Chr |
chromosome name, needed if mode set to "POS" or "both" |
start |
start position, needed if mode set to "POS" or "both" |
end |
end position, needed if mode set to "POS" or "both" |
gff |
object of GRanges class, genome annotations imported by
|
type |
filter type, needed if mode set to "type" or "both",
one of "CDS", "exon", "gene", "genome", "custom",
if |
cusTyp |
character vector, custom filter type,
needed if |
geneID |
gene ID |
# create a dataset of genotype in hapmap format hmp <- hap2hmp(hapResult); # example hmp <- filter_hmp(hmp, mode = "POS", Chr = "scaffold_1", start = 4100, end = 5000)
# create a dataset of genotype in hapmap format hmp <- hap2hmp(hapResult); # example hmp <- filter_hmp(hmp, mode = "POS", Chr = "scaffold_1", start = 4100, end = 5000)
used for filtration of p.link
filter_plink.pedmap(x, mode = c("POS", "type", "both"), Chr = Chr, start = start, end = end, gff = gff, type = type, cusTyp = cusTyp, geneID = geneID)
filter_plink.pedmap(x, mode = c("POS", "type", "both"), Chr = Chr, start = start, end = end, gff = gff, type = type, cusTyp = cusTyp, geneID = geneID)
x |
a list stored the p.link information |
mode |
filtration mode, one of c("POS", "type", "both") |
Chr |
the chromosome name, need if mode set as POS or both |
start , end
|
numeric, the range of filtration, and the start should smaller than end |
gff |
the imported gff object |
type |
should be in |
cusTyp |
if |
geneID |
geneID |
list, similar with x
, but filtered
pedfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.ped", package = "geneHapR") mapfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.map", package = "geneHapR") p.link <- import_plink.pedmap(pedfile = pedfile, mapfile = mapfile, sep_map = "\t", sep_ped = "\t") p.link <- filter_plink.pedmap(p.link, mode = "POS", Chr = "chr08", start = 25948004, end = 25949944) hapResult <- plink.pedmap2hap(p.link, hapPrefix = "H", hetero_remove = TRUE, na_drop = TRUE)
pedfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.ped", package = "geneHapR") mapfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.map", package = "geneHapR") p.link <- import_plink.pedmap(pedfile = pedfile, mapfile = mapfile, sep_map = "\t", sep_ped = "\t") p.link <- filter_plink.pedmap(p.link, mode = "POS", Chr = "chr08", start = 25948004, end = 25949944) hapResult <- plink.pedmap2hap(p.link, hapPrefix = "H", hetero_remove = TRUE, na_drop = TRUE)
filter variants stored in table
filter_table( x, mode = c("POS", "type", "both"), Chr = Chr, start = start, end = end, gff = gff, type = type, cusTyp = cusTyp, geneID = geneID )
filter_table( x, mode = c("POS", "type", "both"), Chr = Chr, start = start, end = end, gff = gff, type = type, cusTyp = cusTyp, geneID = geneID )
x |
genotype dataset in hapmap format, object of data.frame class |
mode |
filter mode, one of "POS", "type", "both" |
Chr |
chromosome name, needed if mode set to "POS" or "both" |
start |
start position, needed if mode set to "POS" or "both" |
end |
end position, needed if mode set to "POS" or "both" |
gff |
object of GRanges class, genome annotations imported by
|
type |
filter type, needed if mode set to "type" or "both",
one of "CDS", "exon", "gene", "genome", "custom",
if |
cusTyp |
character vector, custom filter type,
needed if |
geneID |
gene ID |
# example data("geneHapR_test") table <- filter_table(gt.geno, mode = "POS", Chr = "scaffold_1", start = 4100, end = 5000)
# example data("geneHapR_test") table <- filter_table(gt.geno, mode = "POS", Chr = "scaffold_1", start = 4100, end = 5000)
filter VCF by GFF annotation or by position or both
filter_vcf(vcf, gff = gff, mode = c("POS", "type", "both"), Chr = Chr, start = start, end = end, type = c("CDS", "exon", "gene", "genome", "custom"), cusTyp = c("CDS", "five_prime_UTR", "three_prime_UTR"), geneID = geneID)
filter_vcf(vcf, gff = gff, mode = c("POS", "type", "both"), Chr = Chr, start = start, end = end, type = c("CDS", "exon", "gene", "genome", "custom"), cusTyp = c("CDS", "five_prime_UTR", "three_prime_UTR"), geneID = geneID)
vcf |
object of vcfR class, VCF file imported by |
gff |
object of GRanges class, genome annotations imported by
|
mode |
filter mode, one of "POS", "type", "both" |
Chr |
chromosome name, needed if mode set to "POS" or "both" |
start |
start position, needed if mode set to "POS" or "both" |
end |
end position, needed if mode set to "POS" or "both" |
type |
filter type, needed if mode set to "type" or "both",
one of "CDS", "exon", "gene", "genome", "custom",
if |
cusTyp |
character vector, custom filter type,
needed if |
geneID |
gene ID |
vcfR
# filtet vcf data("geneHapR_test") vcf_f1 <- filter_vcf(vcf, mode = "POS", Chr = "scaffold_1", start = 4300, end = 5890) vcf_f2 <- filter_vcf(vcf, mode = "type", gff = gff, type = "CDS") vcf_f3 <- filter_vcf(vcf, mode = "both", Chr = "scaffold_1", start = 4300, end = 5890, gff = gff, type = "CDS")
# filtet vcf data("geneHapR_test") vcf_f1 <- filter_vcf(vcf, mode = "POS", Chr = "scaffold_1", start = 4300, end = 5890) vcf_f2 <- filter_vcf(vcf, mode = "type", gff = gff, type = "CDS") vcf_f3 <- filter_vcf(vcf, mode = "both", Chr = "scaffold_1", start = 4300, end = 5890, gff = gff, type = "CDS")
Filter/extract one or multiple gene(s)/range(s) from a large
p.link
file.
filterLargeP.link( root, rootOut = rootOut, Chr = Chr, POS = NULL, start = start, end = end, override = TRUE, sep = "\t" )
filterLargeP.link( root, rootOut = rootOut, Chr = Chr, POS = NULL, start = start, end = end, override = TRUE, sep = "\t" )
root |
The file name without suffix. This function only support p.link file format stored in "map" and "ped" format, the file names after removed suffix should be same with each other. |
rootOut |
Path(s) of output |
Chr |
a single CHROM name or CHROM names vector. |
POS , start , end
|
provide the chromosome name should be extract from orignal p.link dataset.
|
override |
whether override existed file or not, default as |
sep |
a character indicate the separation of map and ped file, default is |
This package import P.link files. However, import a large P.link file is time and
memory consuming. It's suggested that extract variants in target
range with filterLargeP.link()
before identification of haplotype.
When filter/extract multi genes/ranges, the parameter of Chr
and POS
must have equal length. Results will save to a single file if the user
provide a single file path or save to multiple P.link file(s) when a equal length
vector consist with file paths is provided.
No return value
# The filteration of P.link of regular size should be done with `filter_plink.pedmap()`. # however, here, we use a mini vcf instead just for example and test pedfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.ped", package = "geneHapR") mapfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.map", package = "geneHapR") oldDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) file.copy(pedfile, "test.ped") file.copy(mapfile, "test.map") # extract a single gene/range from large vcf filterLargeP.link(root = "test", rootOut = "filtered_test", Chr = "scaffold_1", POS = c(4300,5000), override = TRUE) setwd(oldDir) # delete temp_dir unlink(temp_dir, recursive = TRUE)
# The filteration of P.link of regular size should be done with `filter_plink.pedmap()`. # however, here, we use a mini vcf instead just for example and test pedfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.ped", package = "geneHapR") mapfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.map", package = "geneHapR") oldDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) file.copy(pedfile, "test.ped") file.copy(mapfile, "test.map") # extract a single gene/range from large vcf filterLargeP.link(root = "test", rootOut = "filtered_test", Chr = "scaffold_1", POS = c(4300,5000), override = TRUE) setwd(oldDir) # delete temp_dir unlink(temp_dir, recursive = TRUE)
Filter/extract one or multiple gene(s)/range(s) from a large
*.vcf/*.vcf.gz
file.
filterLargeVCF(VCFin = VCFin, VCFout = VCFout, Chr = Chr, POS = NULL, start = start, end = end, override = TRUE)
filterLargeVCF(VCFin = VCFin, VCFout = VCFout, Chr = Chr, POS = NULL, start = start, end = end, override = TRUE)
VCFin |
Path of input |
VCFout |
Path(s) of output |
Chr |
a single CHROM name or CHROM names vector. |
POS , start , end
|
provide the range should be extract from orignal vcf.
|
override |
whether override existed file or not, default as |
This package import VCF files with 'vcfR' which is more efficient to
import/manipulate VCF files in 'R'. However, import a large VCF file is time and
memory consuming. It's suggested that filter/extract variants in target
range with filterLargeVCF()
.
When filter/extract multi genes/ranges, the parameter of Chr
and POS
must have equal length. Results will save to a single file if the user
provide a single file path or save to multiple VCF file(s) when a equal length
vector consist with file paths is provided.
However, if you have hundreds gene/ranges need to extract from very large VCF file(s), it's prefer to process with other linux tools in a script on server, such as: 'vcftools' and 'bcftools'.
No return value
# The filteration of small vcf should be done with `filter_vcf()`. # however, here, we use a mini vcf instead just for example and test. vcfPath <- system.file("extdata", "var.vcf.gz", package = "geneHapR") oldDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) # extract a single gene/range from large vcf filterLargeVCF(VCFin = vcfPath, VCFout = "filtered.vcf.gz", Chr = "scaffold_1", POS = c(4300,5000), override = TRUE) # extract multi genes/ranges from large vcf filterLargeVCF(VCFin = vcfPath, VCFout = c("filtered1.vcf.gz", "filtered2.vcf.gz", "filtered3.vcf.gz"), Chr = rep("scaffold_1", 3), POS = list(c(4300, 5000), c(5000, 6000), c(5000, 7000)), override = TRUE) setwd(oldDir)
# The filteration of small vcf should be done with `filter_vcf()`. # however, here, we use a mini vcf instead just for example and test. vcfPath <- system.file("extdata", "var.vcf.gz", package = "geneHapR") oldDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) # extract a single gene/range from large vcf filterLargeVCF(VCFin = vcfPath, VCFout = "filtered.vcf.gz", Chr = "scaffold_1", POS = c(4300,5000), override = TRUE) # extract multi genes/ranges from large vcf filterLargeVCF(VCFin = vcfPath, VCFout = c("filtered1.vcf.gz", "filtered2.vcf.gz", "filtered3.vcf.gz"), Chr = rep("scaffold_1", 3), POS = list(c(4300, 5000), c(5000, 6000), c(5000, 7000)), override = TRUE) setwd(oldDir)
Get Gene Position
getGenePOS(gff= gff, geneID = geneID, type = type, gffTermContaingeneID = "Parent")
getGenePOS(gff= gff, geneID = geneID, type = type, gffTermContaingeneID = "Parent")
gff |
imported gff |
geneID |
target geneID |
type |
vector consist with one or more types in gff |
gffTermContaingeneID |
which term contains the geneID in your gff, defalt is Parent |
named vectors contains start, end and strand
data("geneHapR_test") genePOS <- getGenePOS(gff = gff, geneID = "test1G0387", type = "CDS", gffTermContaingeneID = "Parent")
data("geneHapR_test") genePOS <- getGenePOS(gff = gff, geneID = "test1G0387", type = "CDS", gffTermContaingeneID = "Parent")
Get Gene Ranges
getGeneRanges(gff= gff, geneID = geneID, type = type, gffTermContaingeneID = "Parent")
getGeneRanges(gff= gff, geneID = geneID, type = type, gffTermContaingeneID = "Parent")
gff |
imported gff |
geneID |
target geneID |
type |
vector consist with one or more types in gff |
gffTermContaingeneID |
which term contains the geneID in your gff, defalt is Parent |
GRanges
data("geneHapR_test") geneRanges <- getGeneRanges(gff = gff, geneID = "test1G0387", type = "CDS", gffTermContaingeneID = "Parent")
data("geneHapR_test") geneRanges <- getGeneRanges(gff = gff, geneID = "test1G0387", type = "CDS", gffTermContaingeneID = "Parent")
A function used for summarize hapResult to visualization and calculation.
hap_summary(hap, hapPrefix = "H", file = file)
hap_summary(hap, hapPrefix = "H", file = file)
hap |
object of hapResult class, generated by |
hapPrefix |
prefix of hap names, default as "H" |
file |
file path where to save the hap summary result. If missing, nothing will be saved to disk. |
It is suggested to use the result of vcf2hap()
or seqs2hap()
as input directly.
However the user can import previously hap result from local file
with import_hap()
hapSummary, first four rows are fixed to meta information: CHR, POS, INFO, ALLELE Hap names were placed in first column, Accessions and freqs were placed at the last two columns.
If the user have changed the default hapPrefix
in vcf2hap()
or seqs2hap()
,
then the parameter hapPrefix
is needed.
Furthermore, a multi-letter prefix of hap names is possible.
data("geneHapR_test") hapSummary <- hap_summary(hapResult, hapPrefix = "H")
data("geneHapR_test") hapSummary <- hap_summary(hapResult, hapPrefix = "H")
Convert hapResult object to hapmap (hmp) format, for interact with other packages
hap2hmp(hap) hmp2hap(hmp, hapPrefix = "H", hetero_remove = TRUE, na_drop = TRUE, ...)
hap2hmp(hap) hmp2hap(hmp, hapPrefix = "H", hetero_remove = TRUE, na_drop = TRUE, ...)
hap |
object of "hapResult" class |
hmp |
object of "data.frame" class in hapmap format |
hapPrefix |
prefix of haplotype names |
hetero_remove |
whether remove accessions contains hyb-sites, Character not A T C G |
na_drop |
whether drop accessions contains missing data ("N", "NA", ".") |
... |
Arguments passed on to
|
a data.frame in hapmap format.
data("geneHapR_test") hmp <- hap2hmp(hapResult) hap <- hmp2hap(hmp)
data("geneHapR_test") hmp <- hap2hmp(hapResult) hap <- hmp2hap(hmp)
show distribution of intereted haplotypes on maps
hapDistribution( hap, AccINFO, LON.col, LAT.col, hapNames, database = "world", regions = ".", hap.color = hap.color, zColours = zColours, legend = TRUE, symbolSize = 1, symbol.lim = c(1, 10), ratio = 1, cex.legend = 0.8, lwd.pie = 1, borderCol.pie = NA, lty.pie = 1, showlabel = TRUE, label.col = "black", label.cex = 0.8, label.font = 1, label.adj = c(0.5, 0.5), map.fill.color = 1, ... )
hapDistribution( hap, AccINFO, LON.col, LAT.col, hapNames, database = "world", regions = ".", hap.color = hap.color, zColours = zColours, legend = TRUE, symbolSize = 1, symbol.lim = c(1, 10), ratio = 1, cex.legend = 0.8, lwd.pie = 1, borderCol.pie = NA, lty.pie = 1, showlabel = TRUE, label.col = "black", label.cex = 0.8, label.font = 1, label.adj = c(0.5, 0.5), map.fill.color = 1, ... )
hap |
an object of hapResult class |
AccINFO |
a data.frame contains accession information |
LON.col , LAT.col
|
column names of
longitude( |
hapNames |
haplotype names used for display |
database |
character string naming a geographical database, a list of
|
regions |
character vector that names the polygons to draw.
Each database is composed of a collection of polygons, and each polygon has
a unique name.
When a region is composed of more than one polygon, the individual polygons
have the name of the region, followed by a colon and a qualifier,
as in |
hap.color , zColours
|
colors to apply to the pie section for each attribute column, "zColours" will be detached in future. |
legend |
a keyword specified the position of legend, one of "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center"; or a numeric vector of length two contains x,y coordinate of the legend |
symbolSize |
a numeric specified the symbol size. It will be detached in future. Please use "symbol.lim" instead. |
symbol.lim |
a numeric vector give the maximum and minimum size of each symbol |
ratio |
the ratio of Y to N in the output map, set to 1 as default |
cex.legend |
character expansion factor for legend relative to current |
lwd.pie |
line width of the pies |
borderCol.pie |
The color of pie's border, default is NA, which means no border will be plotted |
lty.pie |
the line type of pie border |
showlabel |
a bool vector indicates whether show the labels which represens number of individuals. Default as TRUE. |
label.col |
color of the labels, default as "black" |
label.cex |
a number indicates the text size in label, default as 0.8 |
label.font |
Font of label, 1 for normal, 2 for bold, 3 for italica, 4 for bold-italica |
label.adj |
the position of label, default as c(0.5, 0.5) |
map.fill.color |
vector of colors. If fill is FALSE, the first color is used for plotting all lines, and any other colors are ignored. Otherwise, the colors are matched one-one with the polygons that get selected by the region argument (and are reused cyclically, if necessary). If fill = TRUE, the default boundary line colour is given by par("fg"). To change this, you can use the border argument (see '...'). A color of NA causes the corresponding region to be deleted from the list of polygons to be drawn. Polygon colors are assigned after polygons are deleted due to values of the xlim and ylim arguments |
... |
Extra arguments passed to |
No return value
data("geneHapR_test") data(china) hapDistribution(hapResult, AccINFO = AccINFO, LON.col = "longitude", LAT.col = "latitude", hapNames = c("H001", "H002", "H003"))
data("geneHapR_test") data(china) hapDistribution(hapResult, AccINFO = AccINFO, LON.col = "longitude", LAT.col = "latitude", hapNames = c("H001", "H002", "H003"))
hapVsPheno
hapVsPheno( hap, pheno, phenoName, hapPrefix = "H", title = "", comparisons = comparisons, method = "t.test", method.args = list(), symnum.args = list(), mergeFigs = FALSE, angle = angle, hjust = hjust, vjust = vjust, minAcc = minAcc, freq.min = freq.min, outlier.rm = TRUE, ... )
hapVsPheno( hap, pheno, phenoName, hapPrefix = "H", title = "", comparisons = comparisons, method = "t.test", method.args = list(), symnum.args = list(), mergeFigs = FALSE, angle = angle, hjust = hjust, vjust = vjust, minAcc = minAcc, freq.min = freq.min, outlier.rm = TRUE, ... )
hap |
object of hapResult class, generate with |
pheno |
object of data.frame class, imported by |
phenoName |
pheno name for plot, should be one column name of pheno |
hapPrefix |
prefix of hapotypes, default as "H" |
title |
a charater which will used for figure title |
comparisons |
a list contains comparison pairs
eg. |
method |
a character string indicating which method to be used for comparing means. |
method.args |
a list of additional arguments used for the test method.
For example one might use |
symnum.args |
a list of arguments to pass to the function
In other words, we use the following convention for symbols indicating statistical significance:
|
mergeFigs |
bool type, indicate whether merge the heat map and box
plot or not. Default as |
angle |
the angle of x labels |
hjust , vjust
|
hjust and vjust of x labels |
minAcc , freq.min
|
If observations number of a Hap less than this number will not be compared with others or be ploted. Should not less than 3 due to the t-test will meaninglessly. Default as 5 |
outlier.rm |
whether remove ouliers, default as TRUE |
... |
Arguments passed on to
|
list. A list contains a character vector with Haps were applied
student test, a mattrix contains p-value of each compare of Haps and a
ggplot2 object named as figs if mergeFigs set as TRUE
, or two ggplot2
objects names as fig_pvalue and fig_Violin
data("geneHapR_test") # plot the figs directly hapVsPheno(hap = hapResult, pheno = pheno, phenoName = "GrainWeight.2021", minAcc = 3) # do not merge the files results <- hapVsPheno(hap = hapResult, pheno = pheno, phenoName = "GrainWeight.2021", minAcc = 3, mergeFigs = FALSE) plot(results$fig_pvalue) plot(results$fig_Violin)
data("geneHapR_test") # plot the figs directly hapVsPheno(hap = hapResult, pheno = pheno, phenoName = "GrainWeight.2021", minAcc = 3) # do not merge the files results <- hapVsPheno(hap = hapResult, pheno = pheno, phenoName = "GrainWeight.2021", minAcc = 3, mergeFigs = FALSE) plot(results$fig_pvalue) plot(results$fig_Violin)
Comparie phenotype site by site.
hapVsPhenoPerSite( hap, pheno, phenoName, sitePOS, fileName, fileType = NULL, freq.min = 5, ... )
hapVsPhenoPerSite( hap, pheno, phenoName, sitePOS, fileName, fileType = NULL, freq.min = 5, ... )
hap |
an R object of hapresult class |
pheno , phenoName
|
pheno, a data.frame contains the phenotypes; Only one phenotype name is required. |
sitePOS |
the coordinate of site |
fileName , fileType
|
file name and file type will be needed for saving result, file type could be one of "png, tiff, jpg" |
freq.min |
miner allies frequency less than freq.min will not be skipped |
... |
addtional params will be passed to plot saving function like |
data("geneHapR_test") hapVsPhenoPerSite(hapResult, pheno, sitePOS = "4300")
data("geneHapR_test") hapVsPhenoPerSite(hapResult, pheno, sitePOS = "4300")
hapVsPhenos
hapVsPhenos( hap, pheno, outPutSingleFile = TRUE, hapPrefix = "H", title = "Seita.0G000000", width = 12, height = 8, res = 300, compression = "lzw", filename.prefix = filename.prefix, filename.surfix = "pdf", filename.sep = "_", outlier.rm = TRUE, mergeFigs = TRUE, ... )
hapVsPhenos( hap, pheno, outPutSingleFile = TRUE, hapPrefix = "H", title = "Seita.0G000000", width = 12, height = 8, res = 300, compression = "lzw", filename.prefix = filename.prefix, filename.surfix = "pdf", filename.sep = "_", outlier.rm = TRUE, mergeFigs = TRUE, ... )
hap |
object of hapResult class, generate with |
pheno |
object of data.frame class, imported by |
outPutSingleFile |
|
hapPrefix |
prefix of hapotypes, default as "H" |
title |
a charater which will used for figure title |
width |
manual option for determining the output file width in inches. (default: 12) |
height |
manual option for determining the output file height in inches. (default: 8) |
res |
The nominal resolution in ppi which will be recorded in the bitmap file, if a positive integer. Also used for units other than the default, and to convert points to pixels |
compression |
the type of compression to be used. |
filename.prefix , filename.surfix , filename.sep
|
if multi files generated, file names will be formed by
prefix |
outlier.rm |
whether remove ouliers, default as TRUE |
mergeFigs |
bool type, indicate whether merge the heat map and box
plot or not. Default as |
... |
Arguments passed on to
|
No return value
data("geneHapR_test") oriDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) # analysis all pheno in the data.frame of pheno hapVsPhenos(hapResult, pheno, outPutSingleFile = TRUE, hapPrefix = "H", title = "Seita.0G000000", filename.prefix = "test", width = 12, height = 8, res = 300) setwd(oriDir)
data("geneHapR_test") oriDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) # analysis all pheno in the data.frame of pheno hapVsPhenos(hapResult, pheno, outPutSingleFile = TRUE, hapPrefix = "H", title = "Seita.0G000000", filename.prefix = "test", width = 12, height = 8, res = 300) setwd(oriDir)
import accession information including phenotype data, accession group, location from a tab delimed table file
import_AccINFO(file, comment.char = "#", check.names = FALSE, row.names = 1, ...)
import_AccINFO(file, comment.char = "#", check.names = FALSE, row.names = 1, ...)
file |
file path, this file should be a tab delimed table |
comment.char |
character: a character vector of length one
containing a single character or an empty string. Use |
check.names |
logical. If |
row.names |
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names. If there is a header and the first row contains one fewer field than
the number of columns, the first column in the input is used for the
row names. Otherwise if Using |
... |
Further arguments to be passed to |
First column should be Accessions; phenos/accession information should begin from second column, phenoName/group/locations should located at the first row, If a dot '.' is located in pheno name, then the part before the dot will be set as y axis name and the latter will be set as foot when plot figures.
data.frame, Accession names were set as rownames and columns were named by pheno/info names
oldDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) data("geneHapR_test") write.table(pheno, file = "test.pheno.txt", sep = "\t") pheno <- import_AccINFO("test.pheno.txt") pheno setwd(oldDir)
oldDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) data("geneHapR_test") write.table(pheno, file = "test.pheno.txt", sep = "\t") pheno <- import_AccINFO("test.pheno.txt") pheno setwd(oldDir)
import bed files contains annotations into R as GRanges object
import_bed(con, quite = FALSE)
import_bed(con, quite = FALSE)
con |
A path, URL, connection or |
quite |
whether show message |
If there is no genome annotation file in GFF format for your interest species, a BED file is convenient to custom a simple annotation file for a gene. Here we suggest two type of BED format: BED6 and BED4.
As the definition of UCSC. The BED6 contains 6 columns, which are 1) chrom, 2) chromStart, 3) chromEnd, 4) name, 5) score and 6) strand. The BED4 format contains the first 4 column of BED6 format.
However, in gene haplotype statistics, we only care about the type of each site. Thus we use the fourth column to definition the transcripts name and "CDS" or "URTs", separated by a space, eg.:
Chr8 678 890 HD1.1 CDS . -
Chr8 891 989 HD1.1 five_prime_UTR . -
Chr8 668 759 HD1.2 CDS . -
Chr8 908 989 HD1.2 CDS . -
This example indicate a small gene named as HD1 have two transcripts, named as HD1.1 and HD1.2, separately. HD1 has a CDS and a UTR region; while HD1.2 has two CDS region.
GRange object
bed.Path <- system.file("extdata", "annotation.bed6", package = "geneHapR") bed <- import_bed(bed.Path) bed
bed.Path <- system.file("extdata", "annotation.bed6", package = "geneHapR") bed <- import_bed(bed.Path) bed
import genome annotations in GFF/GFF3 format
import_gff(gffFile, format = "gff")
import_gff(gffFile, format = "gff")
gffFile |
the gff file path |
format |
should be one of "gff", "gff1", "gff2", "gff3", "gvf", or "gtf". Default as gff |
GRange object
gff.Path <- system.file("extdata", "annotation.gff", package = "geneHapR") gff <- import_gff(gff.Path, format = "gff") gff
gff.Path <- system.file("extdata", "annotation.gff", package = "geneHapR") gff <- import_gff(gff.Path, format = "gff") gff
This function could be used for import hap result or hap summary result. The type of returned object is decided by input file, see details.
import_hap(file, type = "auto", ...)
import_hap(file, type = "auto", ...)
file |
hapSummary or hapResult file path. |
type |
the content type of imported file, should be one of c("hapResult", "hapSummary") |
... |
extras will pass to |
The hap result and hap summary result have common features. The common features of these two types are: First four rows contains extra information: CHR, POS, INFO and ALLELE Hap names were in the first column. The differences are: Hap summary result have a freq column while hap result not. Rows represent haplotypes in hap summary result, while rows represent accessions in hap result. In addtion, the accessions of each haplotype in hap summary result were separated by ";".
hapSummary or hapResult
oldDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) data("geneHapR_test") write.hap(hapResult, file = "test.pheno.txt", sep = "\t") hap <- import_hap("test.pheno.txt") hap setwd(oldDir)
oldDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) data("geneHapR_test") write.hap(hapResult, file = "test.pheno.txt", sep = "\t") hap <- import_hap("test.pheno.txt") hap setwd(oldDir)
import sequences algned results
import_MultipleAlignment(filepath, format = "fasta", type = "DNA")
import_MultipleAlignment(filepath, format = "fasta", type = "DNA")
filepath |
A character vector (of arbitrary length when reading, of length 1
when writing) containing the paths to the files to read or write.
Note that special values like |
format |
Either |
type |
one of "DNA" and "Protein" |
object of DNAMultipleAlignment
aliSeqPath <- system.file("extdata", "seqs.fa", package = "geneHapR") geneSeqs <- import_MultipleAlignment(filepath = aliSeqPath, format = "fasta", type = "DNA") geneSeqs <- import_MultipleAlignment(filepath = aliSeqPath, format = "fasta", type = "Protein")
aliSeqPath <- system.file("extdata", "seqs.fa", package = "geneHapR") geneSeqs <- import_MultipleAlignment(filepath = aliSeqPath, format = "fasta", type = "DNA") geneSeqs <- import_MultipleAlignment(filepath = aliSeqPath, format = "fasta", type = "Protein")
used for import regular p.link file stored in map and ped format
import_plink.pedmap(root = root, sep_ped = "\t", sep_map = "\t", pedfile = pedfile, mapfile = mapfile)
import_plink.pedmap(root = root, sep_ped = "\t", sep_map = "\t", pedfile = pedfile, mapfile = mapfile)
root |
The file name without suffix. This function only support p.link file format stored in "map" and "ped" format, the file names after removed suffix should be same with each other. |
sep_ped |
a character indicate the separation of ped file |
sep_map |
a character indicate the separation of map file |
pedfile , mapfile
|
if |
list, contains map information stored in data.frame and ped information stored in data.frame
pedfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.ped", package = "geneHapR") mapfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.map", package = "geneHapR") p.link <- import_plink.pedmap(pedfile = pedfile, mapfile = mapfile, sep_map = "\t", sep_ped = "\t") p.link <- filter_plink.pedmap(p.link, mode = "POS", Chr = "chr08", start = 25948004, end = 25949944) hapResult <- plink.pedmap2hap(p.link, hapPrefix = "H", hetero_remove = TRUE, na_drop = TRUE)
pedfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.ped", package = "geneHapR") mapfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.map", package = "geneHapR") p.link <- import_plink.pedmap(pedfile = pedfile, mapfile = mapfile, sep_map = "\t", sep_ped = "\t") p.link <- filter_plink.pedmap(p.link, mode = "POS", Chr = "chr08", start = 25948004, end = 25949944) hapResult <- plink.pedmap2hap(p.link, hapPrefix = "H", hetero_remove = TRUE, na_drop = TRUE)
import DNA sequences in FASTA format
import_seqs(filepath, format = "fasta")
import_seqs(filepath, format = "fasta")
filepath |
A character vector containing the path to the DNA sequences file. Reading files in gzip format (which usually have the '.gz' extension) is supported. Note that only DNA supported here. |
format |
Either "fasta" (the default) or "fastq" |
object of DNAStringSet class
seqPath <- system.file("extdata", "seqs.fa", package = "geneHapR") geneSeqs <- import_seqs(filepath = seqPath, format = "fasta")
seqPath <- system.file("extdata", "seqs.fa", package = "geneHapR") geneSeqs <- import_seqs(filepath = seqPath, format = "fasta")
import *.vcf structured text format,
as well as the compressed *.vcf.gz
format.
import_vcf(file = file, ...) import_vcf(file = file, ...)
import_vcf(file = file, ...) import_vcf(file = file, ...)
file |
file path of VCF file |
... |
pass to |
vcfR object
Zhangrenl
vcfPath <- system.file("extdata", "var.vcf.gz", package = "geneHapR") vcf <- import_vcf(file = vcfPath) vcf
vcfPath <- system.file("extdata", "var.vcf.gz", package = "geneHapR") vcf <- import_vcf(file = vcfPath) vcf
LDheatmap()
is used to produce a graphical display, as a heat map,
of pairwise linkage disequilibrium (LD) measurements for SNPs.
The heat map is a false color image in the upper-left diagonal of a square plot.
Optionally, a line parallel to the diagonal of the image indicating
the physical or genetic map positions of the SNPs may be added, along
with text reporting the total length of the genomic region considered.
plot_LDheatmap( hap, gff, Chr, start, end, geneID = NULL, distances = "physical", LDmeasure = "r", title = "Pairwise LD", add.map = TRUE, map.height = 1, colorLegend = TRUE, geneMapLocation = 0.15, geneMapLabelX = NULL, geneMapLabelY = NULL, SNP.name = TRUE, color = NULL, color_gmodel = "grey", color_snp = "grey", color_snpname = "grey40", cex_snpname = 0.8, snpmarks_height = NULL, newpage = TRUE, name = "ldheatmap", vp.name = NULL, pop = FALSE, text = FALSE )
plot_LDheatmap( hap, gff, Chr, start, end, geneID = NULL, distances = "physical", LDmeasure = "r", title = "Pairwise LD", add.map = TRUE, map.height = 1, colorLegend = TRUE, geneMapLocation = 0.15, geneMapLabelX = NULL, geneMapLabelY = NULL, SNP.name = TRUE, color = NULL, color_gmodel = "grey", color_snp = "grey", color_snpname = "grey40", cex_snpname = 0.8, snpmarks_height = NULL, newpage = TRUE, name = "ldheatmap", vp.name = NULL, pop = FALSE, text = FALSE )
hap |
R object of hapSummary or hapResult class. |
gff |
annotations |
Chr , start , end , geneID
|
chromosome, start and end position, gene ID for extract annotation in target range. |
distances |
A character string to specify whether the provided map locations
are in physical or genetic distances.
If |
LDmeasure |
A character string specifying the measure of LD
|
title |
A character string for the main title of the plot. Default is “Pairwise LD”. |
add.map |
If |
map.height |
the height of gene map, default is 0.02 |
colorLegend |
If |
geneMapLocation |
A numeric value specifying the position of the line
parallel to the diagonal of the matrix; the larger the value, the
farther it lies from the matrix diagonal. Ignored when |
geneMapLabelX |
A numeric value specifying the x-coordinate
of the text indicating the total length of the genomic region
being considered. Ignored when |
geneMapLabelY |
A numeric value specifying the y-coordinate
of the text indicating the total length of the genomic region
being considered. Ignored when |
SNP.name |
a logical vector indicated wherther display SNP names using positions. |
color |
A range of colors to be used for drawing the heat map. Default
is |
color_gmodel , color_snp , color_snpname
|
the color of gene model and snp and snp names respectively, default as grey80. |
cex_snpname |
the size of snp names/labels |
snpmarks_height |
the height of snp marks, if set as NULL, nothing will display on gene model where the heat map is going to be drawn. |
newpage |
If |
name |
A character string specifying the name of the LDheatmap
graphical object ( |
vp.name |
A character string specifying the name of the viewport |
pop |
If |
text |
If |
The input object gdat
can be a data frame of genotype
objects
(a data structure from the genetics package), a SnpMatrix
object
(a data structure from the snpStats package), or
any square matrix with values between 0 and 1 inclusive.
LD computation is much faster for SnpMatrix
objects than for
genotype
objects.
In the case of a matrix of LD values between 0 and 1,
the values above the diagonal will be plotted.
In the display of LD, SNPs appear in the order supplied by the
user as the horizontal and vertical coordinates are increased
and one moves along the off-diagonal line, from the bottom-left
to the top-right corner. To achieve this, the conventions of
the image()
function have been adopted, in which horizontal
coordinates correspond to the rows of the matrix and vertical coordinates
correspond to columns, and vertical coordinates are indexed in increasing
order from bottom to top.
See the package vignette LDheatmap
for more examples and details
of the implementation. Examples of adding “tracks” of genomic
annotation above a flipped heatmap are in the package vignette
addTracks
.
An object of class "LDheatmap"
which contains the following components:
LDmatrix |
The matrix of pairwise LD measurements plotted in the heat map. |
LDheatmapGrob |
A grid graphical object (grob) representing the produced heat map. |
heatmapVP |
The viewport in which the heat map is drawn. See viewport. |
genetic.distances |
The vector of the supplied physical or genetic map locations, or the vector of equispaced marker distances when no distance vector is supplied. |
distances |
A character string specifying whether the provided map distances are physical or genetic. |
color |
The range of colors used for drawing the heat map. |
The grob
LDheatmapGrob
has three grob
s as its children (components).
They are listed below along with their own children and respectively represent
the color image with main title, genetic map and color key of the heat map:
"heatMap"
- "heatmap"
, "title"
;
"geneMap"
- "diagonal"
, "segments"
,
"title"
, "symbols"
, "SNPnames"
; and
"Key"
- "colorKey"
, "title"
, "labels"
,
"ticks"
, "box"
.
The produced heat map can be modified in two ways.
First, it is possible to edit interactively the grob components of the heat map,
by using the function grid.edit
;
the function will not work if there is no
open graphical device showing the heat map.
Alternatively, the user can use the function
editGrob
and work with
the grob LDheatmapGrob
returned by LDheatmap
.
See Examples for usage.
LDheatmap()
uses Grid
, which
does not respond to par()
settings.
Hence modifying par()
settings of mfrow
and mfcol
will not work with LDheatmap()
. The Examples section shows how to
display multiple heat maps on one plot without the use
of par()
.
Shin J-H, Blay S, McNeney B and Graham J (2006). LDheatmap: An R Function for Graphical Display of Pairwise Linkage Disequilibria Between Single Nucleotide Polymorphisms. Journal of Statistical Software, 16 Code Snippet 3
# Pass LDheatmap a SnpMatrix object data(geneHapR_test) plot_LDheatmap(hap = hapResult, gff = gff, Chr = hapResult[1,2], start = 4000, end = 8200)
# Pass LDheatmap a SnpMatrix object data(geneHapR_test) plot_LDheatmap(hap = hapResult, gff = gff, Chr = hapResult[1,2], start = 4000, end = 8200)
computes a haplotype network with haplotype summary result
get_hapNet(hapSummary, AccINFO = AccINFO, groupName = groupName, na.label = "Unknown") getHapGroup( hapSummary, AccINFO = AccINFO, groupName = groupName, na.label = na.label )
get_hapNet(hapSummary, AccINFO = AccINFO, groupName = groupName, na.label = "Unknown") getHapGroup( hapSummary, AccINFO = AccINFO, groupName = groupName, na.label = na.label )
hapSummary |
object of |
AccINFO |
data.frame, specified groups of each accession.
Used for pie plot. If missing, pie will not draw in plotHapNet.
Or you can supplied a hap_group mattrix with |
groupName |
the group name used for pie plot,
should be in |
na.label |
the label of |
hapNet class
Mark P.J. van der Loo (2014) doi:10.32614/RJ-2014-011;
E. Paradis (2010) doi:10.1093/bioinformatics/btp696
plotHapNet()
and
hap_summary()
.
data("geneHapR_test") hapSummary <- hap_summary(hapResult) # calculate haploNet hapNet <- get_hapNet(hapSummary, AccINFO = AccINFO, # accession types groupName = colnames(AccINFO)[2]) # plot haploNet plot(hapNet) # plot haploNet plotHapNet(hapNet, size = "freq", # circle size scale = "log10", # scale circle with 'log10(size + 1)' cex = 1, # size of hap symbol col.link = 2, # link colors link.width = 2, # link widths show.mutation = 2, # mutation types one of c(0,1,2,3) legend = FALSE) # legend position
data("geneHapR_test") hapSummary <- hap_summary(hapResult) # calculate haploNet hapNet <- get_hapNet(hapSummary, AccINFO = AccINFO, # accession types groupName = colnames(AccINFO)[2]) # plot haploNet plot(hapNet) # plot haploNet plotHapNet(hapNet, size = "freq", # circle size scale = "log10", # scale circle with 'log10(size + 1)' cex = 1, # size of hap symbol col.link = 2, # link colors link.width = 2, # link widths show.mutation = 2, # mutation types one of c(0,1,2,3) legend = FALSE) # legend position
convert p.link format data into hapResult
plink.pedmap2hap( p.link, hapPrefix = "H", pad = 3, hetero_remove = TRUE, na_drop = TRUE )
plink.pedmap2hap( p.link, hapPrefix = "H", pad = 3, hetero_remove = TRUE, na_drop = TRUE )
p.link |
list contains p.link information |
hapPrefix |
prefix of haplotype names |
pad |
The number length in haplotype names should be extend to. |
hetero_remove |
whether remove accessions contains hyb-sites |
na_drop |
whether drop accessions contains missing data ("N", NA) |
object of hapSummary class
pedfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.ped", package = "geneHapR") mapfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.map", package = "geneHapR") p.link <- import_plink.pedmap(pedfile = pedfile, mapfile = mapfile, sep_map = "\t", sep_ped = "\t") p.link <- filter_plink.pedmap(p.link, mode = "POS", Chr = "chr08", start = 25948004, end = 25949944) hapResult <- plink.pedmap2hap(p.link, hapPrefix = "H", hetero_remove = TRUE, na_drop = TRUE)
pedfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.ped", package = "geneHapR") mapfile <- system.file("extdata", "snp3kvars-CHR8-25947258-25951166-plink.map", package = "geneHapR") p.link <- import_plink.pedmap(pedfile = pedfile, mapfile = mapfile, sep_map = "\t", sep_ped = "\t") p.link <- filter_plink.pedmap(p.link, mode = "POS", Chr = "chr08", start = 25948004, end = 25949944) hapResult <- plink.pedmap2hap(p.link, hapPrefix = "H", hetero_remove = TRUE, na_drop = TRUE)
plotEFF
plotEFF( siteEFF, gff = gff, Chr = Chr, start = start, end = end, showType = c("five_prime_UTR", "CDS", "three_prime_UTR"), CDS.height = CDS.height, cex = 0.1, col = c("red", "yellow"), pch = 20, main = main, legend.cex = 0.8, gene.legend = TRUE, markMutants = TRUE, mutants.col = 1, mutants.type = 1, y = c("pvalue", "effect"), ylab = ylab, legendtitle = legendtitle, par.restore = TRUE )
plotEFF( siteEFF, gff = gff, Chr = Chr, start = start, end = end, showType = c("five_prime_UTR", "CDS", "three_prime_UTR"), CDS.height = CDS.height, cex = 0.1, col = c("red", "yellow"), pch = 20, main = main, legend.cex = 0.8, gene.legend = TRUE, markMutants = TRUE, mutants.col = 1, mutants.type = 1, y = c("pvalue", "effect"), ylab = ylab, legendtitle = legendtitle, par.restore = TRUE )
siteEFF |
matrix, column name are pheno names and row name are site position |
gff |
gff annotation |
Chr |
the chromosome name |
start |
start position |
end |
end position |
showType |
character vector, eg.: "CDS", "five_prime_UTR", "three_prime_UTR" |
CDS.height |
numeric indicate the height of CDS in gene model,
range: |
cex |
a numeric control the size of point |
col |
vector specified the color bar |
pch |
vector controls points type, see
|
main |
main title |
legend.cex |
a numeric control the legend size |
gene.legend |
whether add legend for gene model |
markMutants |
whether mark mutants on gene model, default as |
mutants.col |
color of lines which mark mutants |
mutants.type |
a vector of line types |
y , ylab , legendtitle
|
y: indicate either pvalue or effect should be used as y axix, ylab,legendtitle:,character, if missing, the value will be decide by y. |
par.restore |
default as |
No return value, called for side effects
data("geneHapR_test") # calculate site functional effect # siteEFF <- siteEFF(hapResult, pheno, names(pheno)) # plotEFF(siteEFF, gff = gff, Chr = "scaffold_1")
data("geneHapR_test") # calculate site functional effect # siteEFF <- siteEFF(hapResult, pheno, names(pheno)) # plotEFF(siteEFF, gff = gff, Chr = "scaffold_1")
plotHapNet
plotHapNet( hapNet, size = "freq", scale = 1, cex = 0.8, cex.legend = 0.6, col.link = 1, link.width = 1, show.mutation = 2, backGround = backGround, hapGroup = hapGroup, legend = FALSE, show_size_legend = TRUE, show_color_legend = TRUE, pie.lim = c(0.5, 2), main = main, labels = TRUE, legend_version = 0, labels.cex = 0.8, labels.col = "blue", labels.adj = NULL, labels.font = 2, ... )
plotHapNet( hapNet, size = "freq", scale = 1, cex = 0.8, cex.legend = 0.6, col.link = 1, link.width = 1, show.mutation = 2, backGround = backGround, hapGroup = hapGroup, legend = FALSE, show_size_legend = TRUE, show_color_legend = TRUE, pie.lim = c(0.5, 2), main = main, labels = TRUE, legend_version = 0, labels.cex = 0.8, labels.col = "blue", labels.adj = NULL, labels.font = 2, ... )
hapNet |
an object of class "haploNet" |
size |
a numeric vector giving the diameter of the circles representing the haplotypes: this is in the same unit than the links and eventually recycled. |
scale |
a numeric indicate the ratio of the scale of the links
representing the number of steps on the scale of the circles
representing the haplotypes or a character one of |
cex |
character expansion factor relative to current par("cex") |
cex.legend |
same as |
col.link |
a character vector specifying the colours of the links; eventually recycled. |
link.width |
a numeric vector giving the width of the links; eventually recycled. |
show.mutation |
an integer value: if 0, nothing is drawn on the links; if 1, the mutations are shown with small segments on the links; if 2, they are shown with small dots; if 3, the number of mutations are printed on the links. |
backGround |
a color vector with length equal to number of Accession types |
hapGroup |
a matrix used to draw pie charts for each haplotype; its number of rows must be equal to the number of haplotypes |
legend |
a logical specifying whether to draw the legend,
or a vector of length two giving the coordinates where to draw the legend;
|
show_size_legend , show_color_legend
|
wether show size or color legend |
pie.lim |
A numeric vector define the maximum and minmum pie size, which will be avoid the pie to samll or too large |
main |
The main title (on top) using font, size (character
expansion) and color |
labels |
a logical specifying whether to identify the haplotypes with their labels (default as TRUE) |
legend_version |
the size legened style, default as 0 |
labels.cex |
the size of labels |
labels.col |
the labels color |
labels.adj |
a named list contains two length vectors defining the adjustment of labels. The names should be exactly matched with the haplotype names. default as NULL. |
labels.font |
the font of labels, default as 2 |
... |
other parameters will pass to |
Additional parameters control the network features: labels.cex = 1, labels.font = 2, link.color = "black", link.type = 1, link.type.alt = 2, link.width = 1, link.width.alt = 1, altlinks = TRUE, threshold = c(1,2), haplotype.inner.color = "white", haplotype.outer.color = "black", mutations.cex = 1, mutations.font = 1, mutations.frame.background = "#0000FF4D", mutations.frame.border = "black", mutations.text.color = 1, mutations.arrow.color = "black", mutations.arrow.type = "triangle", mutations.sequence.color = "#BFBFBF4D", mutations.sequence.end = "round", mutations.sequence.length = 0.3, mutations.sequence.width = 5, pie.inner.segments.color = "black", pie.colors.function = rainbow, scale.ratio = 1, show.mutation = 2
The alter links could be eliminated by set the 'threshold' to 0 or set 'altlinks' as FALSE.
No return value
hap_summary()
and
get_hapNet()
.
data("geneHapR_test") hapSummary <- hap_summary(hapResult) # calculate haploNet hapNet <- get_hapNet(hapSummary, AccINFO = AccINFO, # accession types groupName = colnames(AccINFO)[2]) # plot haploNet plot(hapNet) # plot haploNet plotHapNet(hapNet, size = "freq", # circle size scale = "log10", # scale circle with 'log10(size + 1)' cex = 1, # size of hap symbol col.link = 2, # link colors link.width = 2, # link widths show.mutation = 2, # mutation types one of c(0,1,2,3) legend = FALSE) # legend position
data("geneHapR_test") hapSummary <- hap_summary(hapResult) # calculate haploNet hapNet <- get_hapNet(hapSummary, AccINFO = AccINFO, # accession types groupName = colnames(AccINFO)[2]) # plot haploNet plot(hapNet) # plot haploNet plotHapNet(hapNet, size = "freq", # circle size scale = "log10", # scale circle with 'log10(size + 1)' cex = 1, # size of hap symbol col.link = 2, # link colors link.width = 2, # link widths show.mutation = 2, # mutation types one of c(0,1,2,3) legend = FALSE) # legend position
display hap result as a table-like figure
plotHapTable(hapSummary, hapPrefix = "H", title = "", geneName = geneName, INFO_tag = NULL, tag_split = tag_split, tag_field = tag_field, tag_name = tag_name, displayIndelSize = 0, angle = c(0,45,90), replaceMultiAllele = TRUE, ALLELE.color = "grey90")
plotHapTable(hapSummary, hapPrefix = "H", title = "", geneName = geneName, INFO_tag = NULL, tag_split = tag_split, tag_field = tag_field, tag_name = tag_name, displayIndelSize = 0, angle = c(0,45,90), replaceMultiAllele = TRUE, ALLELE.color = "grey90")
hapSummary |
object of hapSummary class |
hapPrefix |
prefix of haplotype names. Default as "H" |
title |
the main title of the final figure |
geneName |
character, will be used for filter INFO filed of ANN |
INFO_tag |
The annotations in the INFO field are represented as tag-value pairs, where the tag and value are separated by an equal sign, ie "=", and pairs are separated by colons, ie ";". For more information please see details. |
tag_split |
usually, the value of tag-value contains one information.
However, if a tag contains more than one fields, eg "ANN", then |
tag_field |
integer, if a tag-value contains more than one fields,
user need to specified which field should be display. If |
tag_name |
tag name is displayed in Hap figure. If |
displayIndelSize |
display indels with max size of |
angle |
the angle of coordinates, should be one of 0, 45 and 90 |
replaceMultiAllele |
whether to replace MultiAllele with "T*",
default as |
ALLELE.color |
the color of ALLELE row, default as "grey90" |
In VCF files, the INFO field are represented as tag-value pairs, where the tag and value are separated by an equal sign, ie "=", and pairs are separated by colons, ie ";".
If hapSummarys were generated from sequences, INFO row is null.
If hapSummarys were generated from VCF, INFO was take from the INFO column
in the source VCF file.
Some tag-values may contains more than one value separated by
"|", eg.: "ANN" or "snpEFF" added by 'snpeff' or other software. For those
fields we need specified value of tag_field = "ANN"
and tag_split = "[\|]"
,
it's suggest specified the value of tag_name
for display in figure.
'snpeff', a toolbox for genetic variant annotation and functional effect prediction, will add annotations to INFO filed in VCF file under a tag named as "ANN". The annotations contains several fields separated by "|". eg.:
Allele
Annotation
Annotation_Impact
Gene_Name
Gene_ID
Feature_Type
Feature_ID
Transcript_BioType
Rank
HGVS.c
HGVS.p
cDNA.pos/cDNA.length ... ...
However, the INFO in hapResults may missing annotations that we need.
In this case, we can custom INFOs in hapSummarys with addINFO()
.
Once the needed annotations were included in hap results, we can display
them with plotHapTable()
by specify the value of INFO_tag
.
ggplot2 object
data("geneHapR_test") plotHapTable(hapResult)
data("geneHapR_test") plotHapTable(hapResult)
plot the hapResult in table like style using grid system. This function is under development and may not stable. Some parameters may deleted or renamed in future.
plotHapTable2( hapSummary, show_indel_size = 1, replaceMultiAllele = TRUE, angle = 0, show_INFO = FALSE, INFO_split = c(";", ",", "\\|"), INFO_tag = "ANN", geneID = NA, tag_field = -1, title = "", gff = NULL, show_chr_name = TRUE, Chr = NULL, start = NULL, end = NULL, model_rect_col = "black", model_rect_fill = "grey80", model_line_col = "black", model_anno_txt = NULL, model_anno_col = "black", model_anno_cex = 1, table_txt_col = "black", table_txt_cex = 1, model_anno_pos = c(-1, -1), model_anno_adj = c(0, 1), gene_model_height = 0.2, space_height = 0.1, table_height = NULL, CDS_height = 0.3, link_line_type = 3, headrows = 1, equal_col_width = FALSE, head_anno = 1, col_annots = 0, row_labels = 1, row_annots = 1, table_line_col = "white", annot_for_each_transcrips = TRUE, labels_fill = "white", annot_fill = "grey90", head_fill = NULL, cell_fill = NULL, style = gpar(fontfamily = "sans", fontface = 1, cex = 0.7), footbar = "", ... )
plotHapTable2( hapSummary, show_indel_size = 1, replaceMultiAllele = TRUE, angle = 0, show_INFO = FALSE, INFO_split = c(";", ",", "\\|"), INFO_tag = "ANN", geneID = NA, tag_field = -1, title = "", gff = NULL, show_chr_name = TRUE, Chr = NULL, start = NULL, end = NULL, model_rect_col = "black", model_rect_fill = "grey80", model_line_col = "black", model_anno_txt = NULL, model_anno_col = "black", model_anno_cex = 1, table_txt_col = "black", table_txt_cex = 1, model_anno_pos = c(-1, -1), model_anno_adj = c(0, 1), gene_model_height = 0.2, space_height = 0.1, table_height = NULL, CDS_height = 0.3, link_line_type = 3, headrows = 1, equal_col_width = FALSE, head_anno = 1, col_annots = 0, row_labels = 1, row_annots = 1, table_line_col = "white", annot_for_each_transcrips = TRUE, labels_fill = "white", annot_fill = "grey90", head_fill = NULL, cell_fill = NULL, style = gpar(fontfamily = "sans", fontface = 1, cex = 0.7), footbar = "", ... )
hapSummary |
the hapSummary or hapResult object |
show_indel_size |
the Indel length longer will be replaced by "i1,i2,i3,..." |
replaceMultiAllele |
replace multi-allele title by 'T1,T2,...' or not |
angle |
the angle of number positions |
show_INFO |
show annotation field or not, default as |
INFO_split , INFO_tag , geneID , tag_field
|
used to set annotation in haplotype table. And the geneID was used to fileter annotation in INFO field. |
title |
title of plot |
gff |
gff or bed annotation |
show_chr_name |
show chromosome name at left-top cell or not |
Chr , start , end
|
which range should be plotted in gene model |
model_rect_col , model_rect_fill , model_line_col
|
a string specified the color for line/rectangle in gene model |
table_txt_col , table_txt_cex
|
controls the color and size of texts in genotype table |
model_anno_pos , model_anno_adj , model_anno_cex , model_anno_col , model_anno_txt
|
the position (x,y), just (hjust, vjust), color, size and content of annotation text in gene model |
gene_model_height , table_height , space_height
|
the plotting range height of gene model, table and spacer |
CDS_height |
a numeric vector specified the height of CDS, and the height of utr is half of that, only useful when gff is provided, |
link_line_type |
the type of link lines for mutations in gene model and genotype table |
equal_col_width |
a bool or numeric vector specified whether column with should be equal |
col_annots , head_anno , headrows , row_annots , row_labels
|
the column or row number of annotation or labels or heads |
table_line_col |
the line color in genotype table |
annot_for_each_transcrips |
mark the strand and trancripts name for each gene modle |
annot_fill , head_fill , labels_fill
|
the fill color of annotation, head and label row or columns |
cell_fill |
a color vector or function or named vector specified cell fill color |
style |
see help(gpar) |
footbar |
the foot notes |
... |
param not used |
# data(geneHapR_test) plotHapTable2(hapResult) plotHapTable2(hapResult, gff = gff)
# data(geneHapR_test) plotHapTable2(hapResult) plotHapTable2(hapResult, gff = gff)
generate hapResults from aligned and trimed sequences
seqs2hap( seqs, Ref = names(seqs)[1], hetero_remove = TRUE, na_drop = TRUE, maxGapsPerSeq = 0.25, hapPrefix = "H", pad = 3, chrName = "Chr0", ... ) trimSeqs(seqs, minFlankFraction = 0.1)
seqs2hap( seqs, Ref = names(seqs)[1], hetero_remove = TRUE, na_drop = TRUE, maxGapsPerSeq = 0.25, hapPrefix = "H", pad = 3, chrName = "Chr0", ... ) trimSeqs(seqs, minFlankFraction = 0.1)
seqs |
object of DNAStringSet or DNAMultipleAlignment class |
Ref |
the name of reference sequences. Default as the name of the first sequence |
hetero_remove |
whether remove accessions contains hybrid site or not.
Default as |
na_drop |
whether drop sequeces contain "N"
Default as |
maxGapsPerSeq |
value in |
hapPrefix |
prefix of hap names. Default as "H" |
pad |
The number length in haplotype names should be extend to. |
chrName |
the Name should be used for haplotype |
... |
Parameters not used. |
minFlankFraction |
A value in |
object of hapResult class
data("geneHapR_test") seqs seqs <- trimSeqs(seqs, minFlankFraction = 0.1) seqs hapResult <- seqs2hap(seqs, Ref = names(seqs)[1], hetero_remove = TRUE, na_drop = TRUE, maxGapsPerSeq = 0.25, hapPrefix = "H")
data("geneHapR_test") seqs seqs <- trimSeqs(seqs, minFlankFraction = 0.1) seqs hapResult <- seqs2hap(seqs, Ref = names(seqs)[1], hetero_remove = TRUE, na_drop = TRUE, maxGapsPerSeq = 0.25, hapPrefix = "H")
Set position of ATG as zero in hap result and gff annotation. The upstream was negative while the gene range and downstream was positive.
gffSetATGas0(gff = gff, hap = hap, geneID = geneID, Chr = Chr, POS = POS) hapSetATGas0(gff = gff, hap = hap, geneID = geneID, Chr = Chr, POS = POS)
gffSetATGas0(gff = gff, hap = hap, geneID = geneID, Chr = Chr, POS = POS) hapSetATGas0(gff = gff, hap = hap, geneID = geneID, Chr = Chr, POS = POS)
gff |
gene annotations |
hap |
object of hapResult or hapSummary class |
geneID |
geneID |
Chr |
Chromsome name |
POS |
vector consist with start and end position |
Filter hap result and gff annotation according to provided information. And then set position of ATG as zero in hap result and gff annotation. The upstream was negative while the gene range and downstream was positive.
Notice: the position of "ATG" after modified was 0, 1 and 2 separately. The site in hap result exceed the selected range will be dropped.
gffSetATGas0
: filtered gff with position of ATG was as zero
hapSetATGas0
: hap results with position of ATG was set as zero
# load example dataset data("geneHapR_test") # set position of ATG as zero in gff newgff <- gffSetATGas0(gff = gff, hap = hapResult, geneID = "test1G0387", Chr = "scaffold_1", POS = c(4300, 7910)) data("geneHapR_test") # set position of ATG as zero in hap results newhapResult <- hapSetATGas0(gff = gff, hap = hapResult, geneID = "test1G0387", Chr = "scaffold_1", POS = c(4300, 7910))
# load example dataset data("geneHapR_test") # set position of ATG as zero in gff newgff <- gffSetATGas0(gff = gff, hap = hapResult, geneID = "test1G0387", Chr = "scaffold_1", POS = c(4300, 7910)) data("geneHapR_test") # set position of ATG as zero in hap results newhapResult <- hapSetATGas0(gff = gff, hap = hapResult, geneID = "test1G0387", Chr = "scaffold_1", POS = c(4300, 7910))
Calculation of Sites Effective
siteEFF(hap, pheno, phenoNames, quality = FALSE, method = "auto", p.adj = "none")
siteEFF(hap, pheno, phenoNames, quality = FALSE, method = "auto", p.adj = "none")
hap |
object of "hapResult" class |
pheno |
phenotype data, with column names as pheno name and row name as accessions. |
phenoNames |
pheno names used for analysis, if missing,
will use all pheno names in |
quality |
bool type, indicate whther the type of phenos are quality or
quantitative. Length of |
method |
character or character vector with length equal with
|
p.adj |
character, indicate correction method. Could be "BH", "BY", "none" |
The site EFF was determinate by the phenotype difference between each site geno-type.
The p was calculated with statistical analysis method as designated by the
parameter method
. If method
set as "auto", then
chi.test will be
selected for quantity phenotype, eg.: color;
for quantity phynotype, eg.: height, with at least 30 observations per
geno-type and fit Gaussian distribution t.test will be performed, otherwise
wilcox.test will be performed.
a list containing two matrix names as "p" and "EFF", with column name are pheno names and row name are site position. The matrix names as "p" contains all p-value. The matrix named as "EFF" contains scaled difference between each geno-types per site.
data("geneHapR_test") # calculate site functional effect # siteEFF <- siteEFF(hapResult, pheno, names(pheno)) # plotEFF(siteEFF, gff = gff, Chr = "scaffold_1")
data("geneHapR_test") # calculate site functional effect # siteEFF <- siteEFF(hapResult, pheno, names(pheno)) # plotEFF(siteEFF, gff = gff, Chr = "scaffold_1")
Used for all allele effect compare once
compareAllSites( hap, pheno, phenoName = names(pheno)[1], hetero_remove = TRUE, title = "", file = file )
compareAllSites( hap, pheno, phenoName = names(pheno)[1], hetero_remove = TRUE, title = "", file = file )
hap |
object of hapResult class |
pheno |
a data.frame contains phenotypes |
phenoName |
the name of used phenotype |
hetero_remove |
removing the heter-sites or not, default as TRUE |
title |
the title of the figure |
file |
if provieds a file path the comparing results will saved to file. |
convert variants stored in table format into hapResult
table2hap(x, hapPrefix = "H", pad = 3, hetero_remove = TRUE, na_drop = TRUE)
table2hap(x, hapPrefix = "H", pad = 3, hetero_remove = TRUE, na_drop = TRUE)
x |
a data.frame contains variants information. The first file column are fix as Chrome name, position, reference nuclieotide, alter nuclieotide and INFO. Accession genotype should be in followed columns. "-" will be treated as Indel. "." and "N" will be treated as missing data. Heterozygotes should be "A/T", "AAA/A" |
hapPrefix |
prefix of haplotype names |
pad |
The number length in haplotype names should be extend to. |
hetero_remove |
whether remove accessions contains hyb-sites, Character not A T C G |
na_drop |
whether drop accessions contains missing data ("N", "NA", ".") |
object of hapSummary class
data("geneHapR_test") hapResult <- table2hap(gt.geno, hapPrefix = "H", hetero_remove = TRUE, na_drop = TRUE)
data("geneHapR_test") hapResult <- table2hap(gt.geno, hapPrefix = "H", hetero_remove = TRUE, na_drop = TRUE)
Generate hapResult from vcfR object
A simple filter by position was provided in this function,
however it's prefer to filter VCF (vcfR object) through
filter_vcf()
.
vcf2hap( vcf, hapPrefix = "H", filter_Chr = FALSE, filter_POS = FALSE, pad = 3, hetero_remove = TRUE, na_drop = TRUE, ... )
vcf2hap( vcf, hapPrefix = "H", filter_Chr = FALSE, filter_POS = FALSE, pad = 3, hetero_remove = TRUE, na_drop = TRUE, ... )
vcf |
vcfR object imported by |
hapPrefix |
prefix of hap names, default as "H" |
filter_Chr |
not used |
filter_POS |
not used |
pad |
The number length in haplotype names should be extend to. |
hetero_remove |
whether remove accessions contains hybrid site or not.
Default as |
na_drop |
whether remove accessions contains unknown allele site or not
Default as |
... |
Parameters not used |
object of hapResult class
Zhangrenl
extract genotype from vcf:
vcfR::extract_gt_tidy()
,
import vcf files:
import_vcf()
(preferred) and
vcfR::read.vcfR()
,
filter vcf according position and annotations:
filter_vcf()
data("geneHapR_test") hapResult <- vcf2hap(vcf)
data("geneHapR_test") hapResult <- vcf2hap(vcf)
This function will write hap result into a txt file.
write.hap(x, file = file, sep = "\t", pheno = pheno, phenoName = phenoName)
write.hap(x, file = file, sep = "\t", pheno = pheno, phenoName = phenoName)
x |
objec of hapResult or hapSummary class |
file |
file path, where to save the hap result/summary |
sep |
the field separator string. Values within each row of x are separated by this string.
Default as " |
pheno , phenoName
|
the phenotype data.frame, only used for export hapResult object. |
The hap result and hap summary result have common features. The common features of these two types are: First four rows contains extra information: CHR, POS, INFO and ALLELE Hap names were in the first column. The differences are: Hap summary result have a freq column while hap result not. Rows represent haplotypes in hap summary result, while rows represent accessions in hap result. In addtion, the accessions of each haplotype in hap summary result were separated by ";".
No return value
oriDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) data("geneHapR_test") write.hap(hapResult, file = "hapResult.txt") setwd(oriDir)
oriDir <- getwd() temp_dir <- tempdir() if(! dir.exists(temp_dir)) dir.create(temp_dir) setwd(temp_dir) data("geneHapR_test") write.hap(hapResult, file = "hapResult.txt") setwd(oriDir)