SpectralInference

Documentation for SpectralInference.

SpectralInference.adjustedrandindex Method
adjustedrandindex(a::AbstractVector{<:Number}, b::AbstractVector{<:Number}; nbins=50)

Args:

  • a, vector of numbers
  • b, vector of numbers
  • nbins, number of bins: continuous inputs are binned to approximate discrete labels; for discrete inputs choose nbins > max_number_of_classes

SpectralInference.as_polytomy Function
as_polytomy(f::Function, tree)

Makes a copy of the tree, then removes nodes from the copy. The function f should return true for each node that is to be removed. Children of a removed node are attached to that node's parent.
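The removal rule described above can be sketched on a toy tree. This is a minimal illustration under stated assumptions, not the package's implementation: `ToyNode` and `collapse!` are hypothetical names, and the package itself operates on NewickTree nodes.

```julia
# Hypothetical minimal tree type, used only for this sketch.
mutable struct ToyNode
    name::String
    children::Vector{ToyNode}
end
ToyNode(name) = ToyNode(name, ToyNode[])

# Remove every non-root node for which f(node) is true and
# re-attach its children to the removed node's parent.
function collapse!(f, node::ToyNode)
    newchildren = ToyNode[]
    for child in node.children
        collapse!(f, child)                       # collapse the subtree first
        if f(child)
            append!(newchildren, child.children)  # promote the grandchildren
        else
            push!(newchildren, child)
        end
    end
    node.children = newchildren
    return node
end

root = ToyNode("root", [ToyNode("A", [ToyNode("a1"), ToyNode("a2")]), ToyNode("b")])
collapse!(n -> n.name == "A", root)
childnames = [c.name for c in root.children]
```

A non-mutating variant in the spirit of as_polytomy could call `collapse!(f, deepcopy(root))`, which leaves the original tree untouched.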

SpectralInference.as_polytomy! Function
as_polytomy!(f::Function, tree)

In-place removal of nodes. The function f should return true for each node that is to be removed. Children of a removed node are attached to that node's parent.

SpectralInference.clusters_per_cutlevel Function
clusters_per_cutlevel(distfun::Function, tree::Node, ncuts::Number)

Returns:

  • clusts: vector of cluster-membership vectors, one per cut. Each value gives the cluster membership of a leaf at that cut. Leaves are in prewalk order within each membership vector
  • treedepths: distance from the root for each of the ncuts cuts

SpectralInference.cuttree Function
cuttree(tree, θ)

returns a vector of nodes whose distance from the root is < θ and whose children's distances from the root are > θ

SpectralInference.distancetrace_spaceneeded Method
distancetrace_spaceneeded(n, p; bits=64) = Base.format_bytes(binomial(n,2) * p * bits)

how much memory is needed to store spectral residual trace

Args:

  • n: number of samples
  • p: number of partitions/components
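The size arithmetic can be sketched step by step: the trace holds one value per pair of samples and per partition, each `bits` wide. The variable names below are illustrative only. Dividing by 8 before formatting is an assumption of this sketch, made because `Base.format_bytes` interprets its argument as a byte count; the signature above is shown without that division.

```julia
# A sketch of the size arithmetic, not the package's code.
n, p, bits = 1000, 10, 64

npairs = binomial(n, 2)   # number of unordered sample pairs
nvalues = npairs * p      # one stored value per pair and partition
nbits = nvalues * bits    # total storage in bits
nbytes = nbits ÷ 8        # assumption: convert bits to bytes before formatting

println(Base.format_bytes(nbytes))   # human-readable size string
```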

SpectralInference.empiricalMI Method
empiricalMI(a::AbstractVector{<:Float}, b::AbstractVector{<:Float}[; nbins=50, edges_a=nothing, edges_b=nothing, normalize=false])
empiricalMI(ab::AbstractVector{F}, mask::AbstractVector{<:Bool}[; nbins=100, edges=nothing, base=ℯ, normalize=false])

Computes empirical MI from the identity $H(a) + H(b) - H(a,b)$, where $H := -\sum_x p(x) \log p(x) + \log(\Delta)$. The $+ \log(\Delta)$ term is the log bin width and corrects the entropy estimate for the choice of bin width. Estimates are roughly stable from nbins = $32$ ($32^2 ≈ 1000$ total bins) up to a total bin count equal to the sample size, going from a small underestimate to a small overestimate across that range. We recommend setting nbins to the square root of the mean of 1000 and the sample size, or taking a few estimates across that range and averaging.

Args:

  • a, AbstractVector of length N
  • b, AbstractVector of length N
  • nbins, number of bins per side, use 1000 < nbins^2 < length(a) for best results
  • edges_a, defaults to nothing. If provided is used as the breaks defining bins for a, nbins will be ignored
  • edges_b, defaults to nothing. If provided is used as the breaks defining bins for b, nbins will be ignored
  • base, base unit of MI (defaults to nats with base=ℯ)
  • normalize, bool, whether to normalize with mi / mean(ha, hb)

Returns:

  • MI
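The identity $H(a) + H(b) - H(a,b)$ can be sketched with a plain histogram estimator. This is a minimal sketch under stated assumptions, not the package's implementation: `binindex`, `entropy_nats`, and `sketch_mi` are hypothetical names, the bins are equal-width over the data range, and the $\log(\Delta)$ terms are omitted because for equal-width bins they cancel in the MI sum ($\log \Delta_a + \log \Delta_b - \log(\Delta_a \Delta_b) = 0$).

```julia
# Assign each value to one of nbins equal-width bins spanning the data range.
function binindex(x::AbstractVector{<:Real}, nbins::Integer)
    lo, hi = extrema(x)
    width = (hi - lo) / nbins
    width == 0 && return ones(Int, length(x))   # constant input: a single bin
    return [clamp(floor(Int, (v - lo) / width) + 1, 1, nbins) for v in x]
end

# Plug-in entropy (in nats) of a vector of discrete labels.
function entropy_nats(labels)
    counts = Dict{eltype(labels),Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end
    n = length(labels)
    return -sum(c / n * log(c / n) for c in values(counts))
end

# MI estimate from H(a) + H(b) - H(a, b) on the binned values.
function sketch_mi(a, b; nbins=50)
    ia, ib = binindex(a, nbins), binindex(b, nbins)
    return entropy_nats(ia) + entropy_nats(ib) - entropy_nats(collect(zip(ia, ib)))
end

a = [0.0, 0.0, 1.0, 1.0]
b = [0.0, 1.0, 0.0, 1.0]
mi_independent = sketch_mi(a, b; nbins=2)   # a and b carry no shared information
mi_identical = sketch_mi(a, a; nbins=2)     # equals H(a) for identical inputs
```

With these toy inputs the first estimate is zero and the second equals `log(2)`, the entropy of a balanced two-bin variable.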

SpectralInference.getintervals Method
getintervals(S::AbstractVector{<:Number}; alpha=1.0, q=0.5)

finds spectral partitions. Computes the log difference between consecutive singular values and by default selects the differences that are larger than 1.0 * Q2(differences)

i.e. finds breaks in the spectrum that explain smaller scales of variance

Args:

  • S: singular values of a SVD factorization
  • alpha: scalar multiple of q
  • q: which quantile of log differences to use; by default Q2

Returns:

  • AbstractVector{UnitRange} indices into S corresponding to the spectral partitions
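The thresholding described above can be sketched as follows. This is a hedged illustration, not the package's code: `sketch_intervals` is a hypothetical name, and taking the absolute value of the log differences and using a strict `>` comparison are assumptions of the sketch.

```julia
using Statistics

function sketch_intervals(S::AbstractVector{<:Number}; alpha=1.0, q=0.5)
    logdiffs = abs.(diff(log.(S)))              # log difference of consecutive singular values
    threshold = alpha * quantile(logdiffs, q)   # by default 1.0 * median
    breaks = findall(>(threshold), logdiffs)    # gap i lies between S[i] and S[i+1]
    starts = [1; breaks .+ 1]
    stops = [breaks; length(S)]
    return [a:b for (a, b) in zip(starts, stops)]
end

S = [100.0, 90.0, 10.0, 9.0, 1.0]
intervals = sketch_intervals(S)
```

For this toy spectrum the two large gaps (90 to 10 and 9 to 1) exceed the median log difference, so the sketch returns the partitions `1:2`, `3:4`, and `5:5`.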

SpectralInference.getintervals_IQR Method
getintervals_IQR(S::AbstractVector{<:Number}; alpha=1.5, ql=.25, qh=.75)

finds spectral partitions. Computes the log difference between consecutive singular values and by default selects the differences that are larger than 1.5 * IQR(differences)

i.e. finds breaks in the spectrum that explain smaller scales of variance

Args:

  • S: singular values of a SVD factorization
  • alpha: scalar multiple of the interquantile range
  • ql: lower quantile of log differences to use; by default Q1
  • qh: upper quantile of log differences to use; by default Q3

Returns:

  • AbstractVector{UnitRange} indices into S corresponding to the spectral partitions

SpectralInference.ij2k Method
ij2k(i,j,n)

given a pair (i, j), returns the index k of that pair among the pairs produced by combinations(vec), where vec has length n
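A closed form for this kind of index can be sketched as below. It is an illustration under assumptions, not the package's code: `pairindex` is a hypothetical name, indices are 1-based with i != j, and pairs are taken in lexicographic order (the order in which `combinations(vec, 2)` from Combinatorics.jl yields them, as far as this sketch assumes).

```julia
# Index of the unordered pair {i, j} among all pairs of 1:n in lexicographic order.
function pairindex(i, j, n)
    lo, hi = minmax(i, j)
    # pairs whose first element is smaller than lo, plus the offset within lo's block
    return (lo - 1) * n - lo * (lo - 1) ÷ 2 + (hi - lo)
end

n = 5
pairs = [(i, j) for i in 1:n for j in (i+1):n]   # lexicographic enumeration
matches = all(pairindex(i, j, n) == k for (k, (i, j)) in enumerate(pairs))
```

Because the sketch sorts its arguments with `minmax`, it gives the same index for (i, j) and (j, i).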

SpectralInference.newickstring Method
newickstring(hc::Hclust[, tiplabels::AbstractVector{<:String}])

convert Hclust to newick tree string

Args:

  • hc: Hclust object from Clustering package
  • tiplabels: AbstractVector{<:String} names in same order as distance matrix

Returns:

  • newick tree string

SpectralInference.pairedMI_across_treedepth Function
pairedMI_across_treedepth(metacolumns, metacolumns_ids, tree)
pairedMI_across_treedepth(metacolumns, metacolumns_ids, compare::Function=(==), tree::Node; ncuts=100, bootstrap=false, mask=nothing)

iterates over each metacolumn and calculates MI between the paired elements of the metacolumn and the paired elements of tree clusters.

Args:

  • metacolumns: column iterator, can be fed to map(metacolumns)
  • metacolumns_ids: ids for each element in the metacolumn. These should match the leaf names of the tree, but not necessarily their order.
  • tree: NewickTree tree
  • compare: function used to calculate the similarity of two elements in a metacolumn. It should be written for a single pair of elements, as it will be broadcast across all pairs.

Returns:

  • (; MI, treedepths)
  • MI: Vector{Vector{Float64}} MI for each metacolumn and each tree depth
  • treedepths: Vector{<:Number} tree depth (away from root) for each cut of the tree

SpectralInference.pairwise Method
pairwise(func::Function, m::AbstractMatrix)

returns the lower off-diagonal in columnwise order: result[k] = func(i, j), where k indexes the kth pair and i and j are the ith and jth columns of m, with pairs enumerated by enumerate(((i, j) for j in axes(m, 2) for i in (j+1):lastindex(m, 2)))
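The pair ordering can be made concrete with a small sketch. `sketch_pairwise` is a hypothetical stand-in written for illustration, not the package's implementation; it applies `func` to column views in the enumeration order quoted above.

```julia
function sketch_pairwise(func, m::AbstractMatrix)
    # lower off-diagonal in column order: (2,1), (3,1), ..., (3,2), ...
    pairs = [(i, j) for j in axes(m, 2) for i in (j+1):lastindex(m, 2)]
    return [func(view(m, :, i), view(m, :, j)) for (i, j) in pairs]
end

m = [0.0 1.0 3.0;
     0.0 1.0 3.0]
cityblock(a, b) = sum(abs.(a .- b))   # example column-to-column function
result = sketch_pairwise(cityblock, m)
```

For the three columns of `m` the pairs come out as (2,1), (3,1), (3,2), so `result` holds the distances 2, 6, and 4 in that order.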

SpectralInference.projectinLSV Method
projectinLSV(data::AbstractArray{T}, usv::SVD{T}, [window])

returns estimated left singular vectors (aka: LSV or Û) for new data based on already calculated SVD factorization

SpectralInference.projectinRSV Method
projectinRSV(data::AbstractArray, usv::SVD, [window])

returns estimated transposed right singular vectors (RSV or V̂ᵗ) for new data based on already calculated SVD factorization
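The linear algebra behind projectinLSV and projectinRSV can be sketched from the factorization $A = UΣV'$, which gives $U = AVΣ^{-1}$ and $V' = Σ^{-1}U'A$. The formulas below are an assumption about what "projecting in" means here, not a transcription of the package's code, and they are applied to the original matrix so that the result can be checked against the factorization.

```julia
using LinearAlgebra

A = [2.0 0.0 0.0;
     0.0 1.0 0.0;
     0.0 0.0 0.5;
     1.0 1.0 1.0]
usv = svd(A)

# Assumed projection formulas derived from A = U * Σ * V':
Uhat = A * usv.V * Diagonal(1 ./ usv.S)       # Û  = A V Σ⁻¹
Vhat_t = Diagonal(1 ./ usv.S) * usv.U' * A    # V̂ᵗ = Σ⁻¹ Uᵗ A
```

For new data, the same expressions would be applied to new rows (for Û) or new columns (for V̂ᵗ) that share the other dimension with the original matrix.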

SpectralInference.projectout Method
projectout(usv::SVD, [window])

recreates original matrix i.e. calculates $UΣV'$ or if window is included creates a spectrally filtered version of the original matrix off of the provided components in window.

i.e., usv.U[:, window] * diagm(usv.S[window]) * usv.Vt[window, :]
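The expression above can be tried directly with the standard library. In this sketch `rebuild` is a hypothetical helper name, and `Diagonal` is used in place of `diagm`, which gives the same product for a vector of singular values.

```julia
using LinearAlgebra

A = [4.0 0.0 1.0;
     0.0 3.0 0.0;
     1.0 0.0 2.0;
     0.0 1.0 1.0]
usv = svd(A)

# Rebuild the matrix from the components selected by window.
rebuild(usv, window) = usv.U[:, window] * Diagonal(usv.S[window]) * usv.Vt[window, :]

recon_all = rebuild(usv, 1:length(usv.S))   # all components: recovers A up to rounding
filtered = rebuild(usv, 1:1)                # leading component only
```

Using every component reproduces `A`, while a single-component window yields a rank-1 matrix of the same size.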

SpectralInference.spectral_lineage_encoding Function
spectral_lineage_encoding(tree::Node, orderedleafnames=getleafnames(tree); filterfun=x->true)

returns a vector of named tuples, each with fields nodeid (the id of the node) and sle (a vector of booleans ordered by orderedleafnames, where true indicates that the leaf descends from the node and false indicates that it does not).

SpectralInference.spectralcorrelations Method
spectralcorrelations(vecs::AbstractMatrix{<:Number}, window)
spectralcorrelations(vecs::AbstractMatrix{<:Number}, vals::AbstractVector{<:Number}, window)

Calculates pairwise spectral (spearman) correlations for a set of observations.

Args:

  • vecs: set of left or right singular vectors with observations/features on rows and spectral components on columns
  • vals: vector of singular values
  • window: set of indices of vecs columns to compute correlations across

Returns:

  • correlation matrix where each entry is the correlation between a pair of observations

SpectralInference.spectraldistances Method
spectraldistances(A::AbstractMatrix; [onrows=true, alpha=1.0, q=0.5])
spectraldistances(usv::SVD; [onrows=true, alpha=1.0, q=0.5])
spectraldistances(vecs::AbstractMatrix, vals::AbstractMatrix, intervals::AbstractVector{<:UnitRange})

computes the cumulative spectral residual distance for spectral phylogenetic inference

$\left(\sum_{p \in P} \lVert U_p \Sigma_p \rVert_2\right)^2$

where $P$ are the spectral partitions found with getintervals.

Args:

  • A, usv: AbstractMatrix or SVD factorization (AbstractMatrix is passed to svd() before calculation)
  • onrows: if true will compute spectral distances on the left singular vectors (U matrix), if false will calculate on the right singular vectors or (V matrix)
  • alpha, q: are passed to getintervals() see its documentation

Returns:

  • distance matrix

SpectralInference.spectraldistances_trace Method
spectraldistances_trace(usv::SVD; onrows=true, groups=nothing, alpha=1.0, q=0.5)
spectraldistances_trace(vecs, vals, groups)

calculates spectral residual within each partition of spectrum and each pair of taxa

returns matrix where rows are spectral partitions and columns are taxa:taxa pairs ordered as the upper triangle in rowwise order, or lower triangle in colwise order.

Args:

  • method: spectraldistances_trace(vecs, vals, groups)
    • vecs: either usv.U or usv.V matrix
    • vals: usv.S singular values vector
    • groups: usually calculated with getintervals(usv.S; alpha=alpha, q=q)
  • method: spectraldistances_trace(usv::SVD; onrows=true, groups=nothing, alpha=1.0, q=0.5)
    • usv: SVD object
    • onrows: true/false switch to calculate spectral distance on rows (U matrix) or columns (V matrix).
    • groups: if nothing, groups are calculated with getintervals(usv.S; alpha=alpha, q=q); otherwise groups is expected to be a vector of index ranges [1:1, 2:3, ...] used to group usv.S.
    • alpha: passed to getintervals
    • q: passed to getintervals
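The layout of the returned matrix can be sketched as below. This is a hedged illustration, not the package's implementation: `sketch_trace` is a hypothetical name, and defining the residual of a pair within a partition as the Euclidean norm of the difference of their singular-value-scaled coordinates is an assumption of the sketch.

```julia
using LinearAlgebra

function sketch_trace(vecs::AbstractMatrix, vals::AbstractVector, groups)
    n = size(vecs, 1)
    pairs = [(i, j) for j in 1:n for i in (j+1):n]   # lower triangle, column order
    trace = zeros(length(groups), length(pairs))
    for (p, grp) in enumerate(groups), (k, (i, j)) in enumerate(pairs)
        # assumed residual: distance between taxa i and j within partition grp
        trace[p, k] = norm((vecs[i, grp] .- vecs[j, grp]) .* vals[grp])
    end
    return trace
end

vecs = [1.0 0.0;
        0.0 1.0;
        1.0 1.0]
vals = [2.0, 1.0]
groups = [1:1, 2:2]
trace = sketch_trace(vecs, vals, groups)
```

Rows correspond to the partitions in `groups` and columns to the taxa pairs (2,1), (3,1), (3,2), matching the ordering described above.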

SpectralInference.squareform Function
squareform(d::AbstractVector, fillvalue=zero(eltype(d)))
squareform(d::AbstractMatrix, fillvalue=zero(eltype(d)))

If d is a vector, squareform checks that its length is n choose 2 for some integer n, then fills a symmetric square matrix with the values of d.

If d is a matrix, squareform checks that it is square, then fills a vector with the lower off-diagonal of d in column order.

fillvalue is the initial value of the produced vector or matrix. It is only apparent in a produced matrix, where it is the value on the diagonal.
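The vector-to-matrix direction can be sketched as follows. `sketch_squareform` is a hypothetical name and this is not the package's code; it assumes the vector is laid out as the lower off-diagonal in column order, as described above.

```julia
function sketch_squareform(d::AbstractVector, fillvalue=zero(eltype(d)))
    n = (1 + isqrt(1 + 8 * length(d))) ÷ 2     # solve n(n-1)/2 == length(d)
    n * (n - 1) ÷ 2 == length(d) ||
        throw(ArgumentError("length of d is not n choose 2 for an integer n"))
    M = fill(fillvalue, n, n)                  # fillvalue ends up on the diagonal
    k = 0
    for j in 1:n, i in (j+1):n                 # lower off-diagonal, column order
        k += 1
        M[i, j] = d[k]
        M[j, i] = d[k]                         # mirror to keep M symmetric
    end
    return M
end

d = [1.0, 2.0, 3.0]
M = sketch_squareform(d)
back = [M[i, j] for j in 1:3 for i in (j+1):3]   # matrix-to-vector direction
```

Reading the lower off-diagonal back in column order recovers the original vector, which is the round trip the two methods are described as providing.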

SpectralInference.vmeasure_homogeneity_completeness Method
vmeasure_homogeneity_completeness(labels_true, labels_pred; β=1.)

calculates and returns v-measure, homogeneity, completeness; similar to f-score, precision, and recall respectively

Args:

  • β, weighting term for v-measure. If β is greater than 1, completeness is weighted more strongly in the calculation; if β is less than 1, homogeneity is weighted more strongly

Citation:

  • A. Rosenberg, J. Hirschberg, in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (Association for Computational Linguistics, Prague, Czech Republic, 2007; https://aclanthology.org/D07-1043), pp. 410–420.
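The three quantities can be sketched from the definitions in the cited paper: homogeneity $h = 1 - H(C|K)/H(C)$, completeness $c = 1 - H(K|C)/H(K)$, and $V_β = (1+β)hc / (βh + c)$, with $C$ the true classes and $K$ the predicted clusters. The helper names below are hypothetical and the code is a minimal sketch, not the package's implementation; setting a score to 1 when its normalizing entropy is zero follows the paper's convention.

```julia
# Count occurrences of each distinct value.
function countmap_simple(xs)
    counts = Dict{eltype(xs),Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

# Plug-in entropy in nats.
function entropy_nats(xs)
    n = length(xs)
    return -sum(c / n * log(c / n) for c in values(countmap_simple(xs)))
end

function sketch_vmeasure(labels_true, labels_pred; β=1.0)
    hc = entropy_nats(labels_true)                               # H(C)
    hk = entropy_nats(labels_pred)                               # H(K)
    hck = entropy_nats(collect(zip(labels_true, labels_pred)))   # H(C, K)
    homogeneity = hc == 0 ? 1.0 : 1 - (hck - hk) / hc            # 1 - H(C|K)/H(C)
    completeness = hk == 0 ? 1.0 : 1 - (hck - hc) / hk           # 1 - H(K|C)/H(K)
    denom = β * homogeneity + completeness
    v = denom == 0 ? 0.0 : (1 + β) * homogeneity * completeness / denom
    return (; v, homogeneity, completeness)
end

perfect = sketch_vmeasure([1, 1, 2, 2], [5, 5, 7, 7])   # same partition, relabeled
merged = sketch_vmeasure([1, 1, 2, 2], [1, 1, 1, 1])    # everything in one cluster
```

A relabeled but identical partition scores 1 on all three measures, while merging everything into one cluster is fully complete but not homogeneous, so its v-measure is 0.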
