SpectralInference
Documentation for SpectralInference.
SpectralInference.UPGMA_tree
SpectralInference.adjustedrandindex
SpectralInference.as_polytomy
SpectralInference.as_polytomy!
SpectralInference.clusters_per_cutlevel
SpectralInference.collectiveLCA
SpectralInference.cuttree
SpectralInference.distancematrix_spaceneeded
SpectralInference.distancetrace_spaceneeded
SpectralInference.empiricalMI
SpectralInference.explainedvariance
SpectralInference.fscore_precision_recall
SpectralInference.getintervals
SpectralInference.getintervalsIQR
SpectralInference.getleafids
SpectralInference.getleafnames
SpectralInference.ij2k
SpectralInference.k2ij
SpectralInference.ladderize!
SpectralInference.mapinternalnodes
SpectralInference.maplocalnodes
SpectralInference.mapnodes
SpectralInference.network_distance
SpectralInference.network_distances
SpectralInference.newickstring
SpectralInference.pairedMI_across_treedepth
SpectralInference.pairwise
SpectralInference.patristic_distance
SpectralInference.patristic_distances
SpectralInference.projectinLSV
SpectralInference.projectinRSV
SpectralInference.projectout
SpectralInference.readphylip
SpectralInference.scaledcumsum
SpectralInference.spectral_lineage_encoding
SpectralInference.spectralcorrelations
SpectralInference.spectraldistances
SpectralInference.spectraldistances_trace
SpectralInference.squareform
SpectralInference.vmeasure_homogeneity_completeness
SpectralInference.UPGMA_tree
— MethodUPGMA_tree(Dij::AbstractMatrix{<:Number})
shorthand for Clustering.hclust(Dij, linkage=:average, branchorder=:optimal)
SpectralInference.adjustedrandindex
— Methodadjustedrandindex(a::AbstractVector{<:Number}, b::AbstractVector{<:Number}; nbins=50)
Args:
- a, vector of numbers
- b, vector of numbers
- nbins, number of bins: continuous values are binned to approximate discrete classes; for discrete labels choose nbins greater than the maximum number of classes
SpectralInference.as_polytomy
— Functionas_polytomy(f::Function, tree)
Makes a copy of the tree, then removes nodes. The function f should return true if the node is to be removed. Children of a removed node are attached to the parent of the removed node.
SpectralInference.as_polytomy!
— Functionas_polytomy!(f::Function, tree)
In-place removal of nodes. The function f should return true if the node is to be removed. Children of a removed node are attached to the parent of the removed node.
SpectralInference.clusters_per_cutlevel
— Functionclusters_per_cutlevel(distfun::Function, tree::Node, ncuts::Number)
Returns:
- clusts: vector of cluster-memberships. Each value indicates the cluster membership of the leaf at that cut. Leaves are ordered in prewalk order within each membership vector
- treedepths: distance from the root for each of the ncuts
SpectralInference.collectiveLCA
— FunctioncollectiveLCA(treenodes::AbstractVector)
returns last common ancestor of the vector of treenodes
SpectralInference.cuttree
— Functioncuttree(tree, θ)
returns the vector of nodes whose distance from the root is < θ and whose children's distances from the root are > θ
SpectralInference.distancematrix_spaceneeded
— Methoddistancematrix_spaceneeded(n, p; bits=64) = Base.format_bytes(binomial(n,2) * p * bits)
how much memory is needed to store distance matrix
Args:
- n: number of samples
SpectralInference.distancetrace_spaceneeded
— Methoddistancetrace_spaceneeded(n, p; bits=64) = Base.format_bytes(binomial(n,2) * p * bits)
how much memory is needed to store spectral residual trace
Args:
- n: number of samples
- p: number of partitions/components
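As a rough sanity check of this size estimate, the number of stored values can be counted directly. The sketch below uses a hypothetical helper `trace_nbytes` (not part of SpectralInference) and assumes that `bits` is the width of one stored element in bits, so it divides by 8 to obtain bytes; the package's own convention may differ.

```julia
# Hypothetical helper, not part of SpectralInference.
# Assumes `bits` is the width of one stored element (64 for Float64).
function trace_nbytes(n::Integer, p::Integer; bits::Integer=64)
    npairs = binomial(n, 2)          # number of unordered sample pairs
    return npairs * p * (bits ÷ 8)   # one element per pair and per partition
end

nbytes = trace_nbytes(1_000, 10)     # 499_500 pairs * 10 partitions * 8 bytes
println(Base.format_bytes(nbytes))   # human-readable size
```

Because the pair count grows quadratically in n, doubling the number of samples roughly quadruples the storage.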
SpectralInference.empiricalMI
— MethodempiricalMI(a::AbstractVector{<:Float}, b::AbstractVector{<:Float}[; nbins=50, edges_a=nothing, edges_b=nothing, normalize=false])
empiricalMI(ab::AbstractVector{F}, mask::AbstractVector{<:Bool}[; nbins=100, edges=nothing, base=ℯ, normalize=false])
Computes the empirical MI from the identity $H(a) + H(b) - H(a,b)$, where $H := -\sum_x p(x) \log p(x) + \log(\Delta)$. The $+ \log(\Delta)$ term corresponds to the log bin width and unbiases the entropy estimate from the choice of bin width. Estimates are roughly stable from $32$ bins per side ($32^2 \approx 1000$ total bins) up to a total number of bins equal to the sample size, going from a small underestimate to a small overestimate across that range. We recommend choosing the square root of the mean of 1000 and the sample size for the nbins argument, or taking a few estimates across that range and averaging.
Args:
- a, vector of length N
- b, AbstractVector of length N
- nbins, number of bins per side, use 1000 < nbins^2 < length(a) for best results
- edges_a, defaults to nothing. If provided, it is used as the breaks defining bins for a, and nbins will be ignored
- edges_b, defaults to nothing. If provided, it is used as the breaks defining bins for b, and nbins will be ignored
- base, base unit of MI (defaults to nats with base=ℯ)
- normalize, bool, whether to normalize with mi / mean(ha, hb)
Returns:
- MI
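The identity $H(a) + H(b) - H(a,b)$ can be illustrated with a plain histogram estimator. This is a minimal sketch under stated assumptions, not the package's implementation: `binned_mi` is a hypothetical name, bins are equal-width between the observed extrema, and the result is in nats. With fixed-width bins the bin-width corrections cancel in the MI itself ($\log \Delta_a + \log \Delta_b - \log(\Delta_a \Delta_b) = 0$), so the sketch omits them.

```julia
# Hypothetical helper for illustration; not SpectralInference.empiricalMI.
function binned_mi(a::AbstractVector{<:Real}, b::AbstractVector{<:Real}; nbins::Int=32)
    length(a) == length(b) || throw(ArgumentError("a and b must have equal length"))
    # map a value to an equal-width bin index in 1:nbins
    binindex(x, lo, hi) = hi == lo ? 1 : clamp(floor(Int, (x - lo) / (hi - lo) * nbins) + 1, 1, nbins)
    alo, ahi = extrema(a)
    blo, bhi = extrema(b)
    joint = zeros(Float64, nbins, nbins)
    for (x, y) in zip(a, b)
        joint[binindex(x, alo, ahi), binindex(y, blo, bhi)] += 1
    end
    joint ./= length(a)              # joint probability table p(a, b)
    pa = vec(sum(joint; dims=2))     # marginal p(a)
    pb = vec(sum(joint; dims=1))     # marginal p(b)
    entropy(p) = -sum(x -> x > 0 ? x * log(x) : 0.0, p)
    return entropy(pa) + entropy(pb) - entropy(vec(joint))
end

x = collect(1.0:100.0)
mi_self = binned_mi(x, x; nbins=4)   # a variable compared with itself: MI equals H(x)
```

Repeating the call for a few values of `nbins` and averaging mirrors the recommendation above.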
SpectralInference.explainedvariance
— Methodexplainedvariance(s::AbstractVector{<:Number})
SpectralInference.fscore_precision_recall
— Functionfscore_precision_recall(reftree, predictedtree)
fscore, precision, and recall of branches between the two trees
SpectralInference.getintervals
— Methodgetintervals(S::AbstractVector{<:Number}; alpha=1.0, q=0.5)
finds spectral partitions. Computes the log difference between successive singular values and, by default, selects the differences that are larger than 1.0 * Q2(differences), i.e. finds breaks in the spectrum that explain smaller scales of variance
Args:
- S: singular values of a SVD factorization
- alpha: scalar multiple of q
- q: which quantile of log differences to use; by default Q2
Returns:
- AbstractVector{UnitRange} indices into S corresponding to the spectral partitions
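The partitioning idea can be sketched in a few lines. The helper below, `spectral_breaks`, is hypothetical and only follows the description above: it assumes the gaps are taken as absolute log differences and that a break is placed wherever a gap strictly exceeds `alpha` times the `q` quantile of all gaps. The package's handling of signs, ties, and edge cases may differ.

```julia
using Statistics: quantile

# Hypothetical sketch of the described procedure; not the package's getintervals.
function spectral_breaks(S::AbstractVector{<:Real}; alpha=1.0, q=0.5)
    gaps = abs.(diff(log.(S)))        # gap between consecutive singular values on a log scale
    θ = alpha * quantile(gaps, q)     # threshold: scalar multiple of the chosen quantile
    breaks = findall(>(θ), gaps)      # a break at i separates S[i] from S[i+1]
    starts = [1; breaks .+ 1]
    stops = [breaks; length(S)]
    return [s:e for (s, e) in zip(starts, stops)]
end

S = [100.0, 80.0, 10.0, 9.0, 8.5, 1.0]
parts = spectral_breaks(S; alpha=1.5)   # large drops after S[2] and S[5]
```

Each returned range indexes a run of singular values of similar magnitude.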
SpectralInference.getintervalsIQR
— MethodgetintervalsIQR(S::AbstractVector{<:Number}; alpha=1.5, ql=.25, qh=.75)
finds spectral partitions. Computes the log difference between successive singular values and, by default, selects the differences that are larger than 1.5 * IQR(differences), i.e. finds breaks in the spectrum that explain smaller scales of variance
Args:
- S: singular values of a SVD factorization
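- alpha: scalar multiple of the interquartile range (IQR) of the log differences
- ql, qh: lower and upper quantiles of the log differences that define the IQR; by default Q1 (0.25) and Q3 (0.75)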
Returns:
- AbstractVector{UnitRange} indices into S corresponding to the spectral partitions
SpectralInference.getleafids
— Function
returns ids of all leaves descended from a node in prewalk order
SpectralInference.getleafnames
— Function
returns names of all leaves descended from a node in prewalk order
SpectralInference.ij2k
— Methodij2k(i,j,n)
given the pair (i,j), returns the index k of that pair among the pairs produced by combinations(vec), where vec is of length n
SpectralInference.k2ij
— Methodk2ij(k, n)
returns the pair (i,j) that produces the kth element of combinations(vec), where vec is of length n
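The mapping between a pair and its linear index can be written in closed form. The sketch below uses hypothetical names (`pair_to_linear`, `linear_to_pair`) and assumes the column-wise lower-triangle ordering with `j < i`; the package's `ij2k` and `k2ij` may order the two indices differently.

```julia
# Hypothetical helpers; the ordering convention (j < i, column-wise) is an assumption.
function pair_to_linear(i::Integer, j::Integer, n::Integer)
    1 <= j < i <= n || throw(ArgumentError("need 1 <= j < i <= n"))
    # columns 1..j-1 hold (j-1)*n - j*(j-1)/2 pairs; (i - j) is the offset inside column j
    return (j - 1) * n - (j * (j - 1)) ÷ 2 + (i - j)
end

function linear_to_pair(k::Integer, n::Integer)
    1 <= k <= binomial(n, 2) || throw(ArgumentError("k out of range"))
    j = 1
    # skip whole columns until k falls inside column j, which holds n - j pairs
    while k > n - j
        k -= n - j
        j += 1
    end
    return (j + k, j)
end

n = 5
pairs = [(i, j) for j in 1:n for i in (j+1):n]   # reference enumeration
```

Enumerating the pairs once, as in `pairs`, is a convenient way to check either direction of the mapping.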
SpectralInference.ladderize!
— Functionladderize!(tree, rev=false)
sorts children of each node by number of leaves descending from the child in ascending order.
SpectralInference.mapinternalnodes
— Functionmapinternalnodes(f::Function, tree)
maps across all nodes that have children in prewalk order and applies function f(node)
SpectralInference.maplocalnodes
— Functionmaplocalnodes(f::Function, tree)
maps across all nodes that have a leaf as a child in prewalk order and applies function f(node)
SpectralInference.mapnodes
— Functionmapnodes(f::Function, tree)
maps across all nodes that have children in prewalk order and applies function f(node)
SpectralInference.network_distance
— Functionnetwork_distance(leaf_i, leaf_j)
returns network distance between two leaves
SpectralInference.network_distances
— Functionnetwork_distances(tree::Node)
returns network distances between all pairs of leaves (leaves are in same order as getleafnames)
SpectralInference.newickstring
— Methodnewickstring(hc::Hclust[, tiplabels::AbstractVector[<:String]])
convert Hclust to newick tree string
Args:
- hc: Hclust object from Clustering package
- tiplabels: AbstractVector{<:String} names in same order as distance matrix
Returns:
- newick tree formatted string
SpectralInference.pairedMI_across_treedepth
— FunctionpairedMI_across_treedepth(metacolumns, metacolumns_ids, tree)
pairedMI_across_treedepth(metacolumns, metacolumns_ids, compare::Function=(==), tree::Node; ncuts=100, bootstrap=false, mask=nothing)
iterates over each metacolumn and calculates MI between the paired elements of the metacolumn and the paired elements of tree clusters.
Args:
- metacolumns: column iterator, can be fed to map(metacolumns)
- metacolumns_ids: ids for each element in the metacolumn. Should match the leafnames of the tree, but not necessarily the order.
- tree: NewickTree tree
- compare: function used to calculate similarity of two elements in metacolumn. Should be written for each element as it will be broadcast across all pairs.
Returns:
- (; MI, treedepths)
  - MI: Vector{Vector{Float64}} MI for each metacolumn and each tree depth
  - treedepths: Vector{<:Number} tree depth (away from root) for each cut of the tree
SpectralInference.pairwise
— Methodpairwise(func::Function, m::AbstractMatrix)
returns the lower columnwise offdiagonal of result[k] = func(i, j), where k is the kth pair and i and j are the ith and jth columns of m, with pairs enumerated as enumerate(((i, j) for j in axes(m, 2) for i in (j+1):lastindex(m, 2)))
SpectralInference.patristic_distance
— Functionpatristic_distance(leaf_i, leaf_j)
returns patristic distance between two leaves
SpectralInference.patristic_distances
— Functionpatristic_distances(tree::Node)
returns patristic distances between all pairs of leaves (leaves are in same order as getleafnames)
SpectralInference.projectinLSV
— MethodprojectinLSV(data::AbstractArray{T}, usv::SVD{T}, [window])
returns estimated left singular vectors (aka: LSV or Û) for new data based on already calculated SVD factorization
SpectralInference.projectinRSV
— MethodprojectinRSV(data::AbstractArray, usv::SVD, [window])
returns estimated transposed right singular vectors (RSV or V̂ᵗ) for new data based on already calculated SVD factorization
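Projecting new data into an existing factorization follows from the standard SVD identities $U = AVΣ^{-1}$ and $V^t = Σ^{-1}U^tA$. The sketch below uses hypothetical helper names and omits the optional window argument; it is an illustration of those identities, not the package source.

```julia
using LinearAlgebra

A = [2.0 0.0 1.0; 0.0 3.0 1.0; 1.0 1.0 4.0; 0.5 0.2 0.1]   # 4 samples × 3 features
F = svd(A)

# Assumed formulas (standard SVD identities, not taken from the package source):
#   Û  = X * V * Σ⁻¹    for new rows X with the same features as A
#   V̂ᵗ = Σ⁻¹ * Uᵗ * Y   for new columns Y with the same samples as A
estimate_lsv(X, F::SVD) = X * F.V * Diagonal(1 ./ F.S)
estimate_rsv(Y, F::SVD) = Diagonal(1 ./ F.S) * F.U' * Y

Uhat = estimate_lsv(A, F)     # projecting the training data recovers F.U
Vthat = estimate_rsv(A, F)    # and likewise F.Vt
```

Both estimates divide by the singular values, so components with singular values near zero should be excluded before projecting.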
SpectralInference.projectout
— Methodprojectout(usv::SVD, [window])
recreates the original matrix, i.e. calculates $UΣV'$, or if window is included creates a spectrally filtered version of the original matrix from the provided components in window,
i.e., usv.U[:, window] * diagm(usv.S[window]) * usv.Vt[window, :]
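The windowed product can be tried directly with the LinearAlgebra standard library. In this sketch `reconstruct` is a hypothetical stand-in that mirrors the expression above, using `Diagonal` in place of `diagm`.

```julia
using LinearAlgebra

A = [2.0 0.0 1.0; 0.0 3.0 1.0; 1.0 1.0 4.0; 0.5 0.2 0.1]
F = svd(A)

# Hypothetical stand-in mirroring the expression in the docstring
reconstruct(F::SVD, window=1:length(F.S)) =
    F.U[:, window] * Diagonal(F.S[window]) * F.Vt[window, :]

full = reconstruct(F)          # all components: recovers A up to rounding
rank1 = reconstruct(F, 1:1)    # spectrally filtered: leading component only
residual = norm(A - rank1)     # Frobenius norm of what the filter removed
```

The residual equals the root sum of squares of the excluded singular values, which gives a quick check on how much a given window discards.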
SpectralInference.readphylip
— Methodreadphylip(fn::String)
Read phylip alignment file, return dataframe of IDs and Sequences
SpectralInference.scaledcumsum
— Methodscaledcumsum(c; dims=1)
cumsum divided by maximum cumulative value
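A minimal sketch of that operation, assuming the maximum is taken along the same dimension as the cumulative sum (the function name is hypothetical):

```julia
# Hypothetical re-implementation for illustration only.
function scaled_cumsum_sketch(c; dims=1)
    cs = cumsum(c; dims=dims)              # running total along `dims`
    return cs ./ maximum(cs; dims=dims)    # rescale so each running total peaks at 1
end

S = [4.0, 2.0, 1.0, 1.0]                   # e.g. singular values
cumulative = scaled_cumsum_sketch(S .^ 2)  # cumulative share of squared singular values
```

Applied to squared singular values, the result reads as the cumulative fraction of variance covered by the leading components.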
SpectralInference.spectral_lineage_encoding
— Functionspectral_lineage_encoding(tree::Node, orderedleafnames=getleafnames(tree); filterfun=x->true)
returns a vector of named tuples with the id nodeid of the node and sle, a vector of booleans ordered by orderedleafnames, where true indicates the leaf descends from the node and false indicates that it does not.
SpectralInference.spectralcorrelations
— Methodspectralcorrelations(vecs::AbstractMatrix{<:Number}, window)
spectralcorrelations(vecs::AbstractMatrix{<:Number}, vals::AbstractVector{<:Number}, window)
Calculates pairwise spectral (spearman) correlations for a set of observations.
Args:
- vecs: set of left or right singular vectors with observations/features on rows and spectral components on columns
- vals: vector of singular values
- window: set of indices of vecs columns to compute correlations across
Returns:
- correlation matrix where each pixel is the correlation between a pair of observations
SpectralInference.spectraldistances
— Methodspectraldistances(A::AbstractMatrix; [onrows=true, alpha=1.0, q=0.5])
spectraldistances(usv::SVD; [onrows=true, alpha=1.0, q=0.5])
spectraldistances(vecs::AbstractMatrix, vals::AbstractMatrix, intervals::AbstractVector{<:UnitRange})
computes the cumulative spectral residual distance for spectral phylogenetic inference
(∑_{p ∈ P} ||UₚΣₚ||₂)²
where $P$ are the spectral partitions found with getintervals.
Args:
- A, usv: AbstractMatrix or SVD factorization (AbstractMatrix is passed to svd() before calculation)
- onrows: if true will compute spectral distances on the left singular vectors (U matrix), if false will calculate on the right singular vectors (V matrix)
- alpha, q: are passed to getintervals(), see its documentation
Returns:
- distance matrix
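One way to read the formula is sketched below. It is an interpretation, not the package source: it assumes that for each pair of rows the residual is the difference of their singular-vector entries within a partition, scaled by that partition's singular values, and that the per-partition 2-norms are summed before squaring. `spectral_distance_sketch` is a hypothetical name.

```julia
using LinearAlgebra

# Hypothetical interpretation of (∑_{p ∈ P} ||UₚΣₚ||₂)² for each pair of rows.
function spectral_distance_sketch(U::AbstractMatrix, S::AbstractVector, intervals)
    n = size(U, 1)
    D = zeros(n, n)
    for j in 1:n, i in (j+1):n
        # residual norm within each partition, summed across partitions, then squared
        total = sum(norm((U[i, p] .- U[j, p]) .* S[p]) for p in intervals)
        D[i, j] = D[j, i] = total^2
    end
    return D
end

A = [2.0 0.0 1.0; 0.0 3.0 1.0; 1.0 1.0 4.0; 0.5 0.2 0.1]
F = svd(A)
D = spectral_distance_sketch(F.U, F.S, [1:1, 2:3])   # two partitions
D1 = spectral_distance_sketch(F.U, F.S, [1:3])       # a single partition
```

Under this reading, a single partition spanning all components reduces to the squared Euclidean distance between rows of A, and splitting the spectrum into partitions can only increase the distance.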
SpectralInference.spectraldistances_trace
— Methodspectraldistances_trace(usv::SVD; onrows=true, groups=nothing, alpha=1.0, q=0.5)
spectraldistances_trace(vecs, vals, groups)
calculates spectral residual within each partition of spectrum and each pair of taxa
returns matrix where rows are spectral partitions and columns are taxa:taxa pairs ordered as the upper triangle in rowwise order, or lower triangle in colwise order.
Args:
- method: spectraldistances_trace(vecs, vals, groups)
  - vecs: either usv.U or usv.V matrix
  - vals: usv.S singular values vector
  - groups: usually calculated with getintervals(usv.S; alpha=alpha, q=q)
- method: spectraldistances_trace(usv::SVD; onrows=true, groups=nothing, alpha=1.0, q=0.5)
  - usv: SVD object
  - onrows: true/false switch to calculate spectral distance on rows (U matrix) or columns (V matrix).
  - groups: if nothing, groups are calculated with getintervals(usv.S; alpha=alpha, q=q); otherwise they are assumed to be a vector of index ranges [1:1, 2:3, ...] to group usv.S with.
  - alpha: passed to getintervals
  - q: passed to getintervals
SpectralInference.squareform
— Functionsquareform(d::AbstractVector, fillvalue=zero(eltype(d)))
squareform(d::AbstractMatrix, fillvalue=zero(eltype(d)))
If d is a vector, squareform checks that its length is n choose 2 for some integer n, then fills the values of a symmetric square matrix with the values of d.
If d is a matrix, squareform checks that it is square, then fills the values of a vector with the lower offdiagonal of matrix d in column order.
fillvalue is the initial value of the produced vector or matrix. It is only really apparent in a produced matrix, where it will be the values on the diagonal.
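Both directions of the conversion can be sketched with plain loops. The helper names below are hypothetical, and the column-order lower-offdiagonal layout follows the description above; the package's own checks and error messages may differ.

```julia
# Hypothetical helpers illustrating the two conversions; not the package's squareform.
function vec_to_square(d::AbstractVector, fillvalue=zero(eltype(d)))
    n = (1 + isqrt(1 + 8 * length(d))) ÷ 2     # solve n*(n-1)/2 == length(d)
    binomial(n, 2) == length(d) ||
        throw(ArgumentError("length(d) must equal n choose 2 for some integer n"))
    M = fill(fillvalue, n, n)                  # the diagonal keeps fillvalue
    k = 0
    for j in 1:n, i in (j+1):n                 # lower offdiagonal, column by column
        k += 1
        M[i, j] = M[j, i] = d[k]
    end
    return M
end

function square_to_vec(M::AbstractMatrix)
    size(M, 1) == size(M, 2) || throw(ArgumentError("matrix must be square"))
    return [M[i, j] for j in axes(M, 2) for i in (j+1):size(M, 1)]
end

d = [1.0, 2.0, 3.0]
M = vec_to_square(d)
```

Round-tripping a vector through both helpers returns the original values, which is a useful check when mixing condensed and square distance formats.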
SpectralInference.vmeasure_homogeneity_completeness
— Methodvmeasure_homogeneity_completeness(labels_true, labels_pred; β=1.)
calculates and returns v-measure, homogeneity, completeness; similar to f-score, precision, and recall respectively
Args:
- β, weighting term for v-measure: if β is greater than 1, completeness is weighted more strongly in the calculation; if β is less than 1, homogeneity is weighted more strongly
Citation:
- A. Rosenberg, J. Hirschberg, in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (Association for Computational Linguistics, Prague, Czech Republic, 2007; https://aclanthology.org/D07-1043), pp. 410–420.
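The three quantities follow from the definitions in the cited paper: homogeneity $h = 1 - H(C|K)/H(C)$, completeness $c = 1 - H(K|C)/H(K)$, and $V_β = (1+β)hc / (βh + c)$, with classes $C$ and clusters $K$. The sketch below is a hypothetical re-implementation of those formulas, not the package source; by assumption a score is set to 1 when its normalizing entropy is zero.

```julia
# Hypothetical sketch of the Rosenberg & Hirschberg definitions.
function vmeasure_sketch(labels_true, labels_pred; β=1.0)
    n = length(labels_true)
    n == length(labels_pred) || throw(ArgumentError("label vectors must have equal length"))
    function counts(xs)
        d = Dict{Any,Int}()
        for x in xs
            d[x] = get(d, x, 0) + 1
        end
        return d
    end
    entropy(d) = -sum(cnt / n * log(cnt / n) for cnt in values(d))
    Hc = entropy(counts(labels_true))                      # H(C), classes
    Hk = entropy(counts(labels_pred))                      # H(K), clusters
    Hck = entropy(counts(zip(labels_true, labels_pred)))   # joint H(C, K)
    # chain rule: H(C|K) = H(C,K) - H(K) and H(K|C) = H(C,K) - H(C)
    hom = Hc == 0 ? 1.0 : 1 - (Hck - Hk) / Hc
    com = Hk == 0 ? 1.0 : 1 - (Hck - Hc) / Hk
    v = hom + com == 0 ? 0.0 : (1 + β) * hom * com / (β * hom + com)
    return (; v, homogeneity=hom, completeness=com)
end

perfect = vmeasure_sketch([1, 1, 2, 2], ["a", "a", "b", "b"])
lumped = vmeasure_sketch([1, 1, 2, 2], [1, 1, 1, 1])   # one big cluster
```

Putting every item in one cluster is fully complete but not homogeneous, which is why `lumped` scores a v-measure of zero.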