SPI
Documentation for SPI.
SPI.UPGMA_treeSPI.adjustedrandindexSPI.calc_spcorr_mtxSPI.calc_spi_mtxSPI.calc_spi_traceSPI.calc_spi_treeSPI.empiricalMISPI.explainedvarianceSPI.getintervalsSPI.getintervalsIQRSPI.ij2kSPI.k2ijSPI.minspaceneededSPI.nwstrSPI.pairwiseSPI.projectinLSVSPI.projectinRSVSPI.projectoutSPI.readphylipSPI.scaledcumsumSPI.spimtx_spaceneededSPI.spiresultSPI.squareformSPI.vmeasure_homogeneity_completeness
SPI.UPGMA_tree — MethodUPGMA_tree(Dij::AbstractMatrix{<:Number})shorthand for Clustering.hclust(Dij, linkage=:average, branchorder=:optimal)
SPI.adjustedrandindex — Methodadjustedrandindex(a::AbstractVector{<:Number}, b::AbstractVector{<:Number}; nbins=50)Args:
- a, vector of numbers
- b, vector of numbers
- nbins, for continuous approximates discrete, for discrete choose nbins>maxnumberof_classes
SPI.calc_spcorr_mtx — Methodcalc_spcorr_mtx(vecs::AbstractMatrix{<:Number}, window)
calc_spcorr_mtx(vecs::AbstractMatrix{<:Number}, vals::AbstractVector{<:Number}, window)Calculates pairwise spectral (pearson) correlations for a set of observations.
Args:
vecs, set of left singular vectors or principal components with observations on rows and components on columnsvals, vector of singular valueswindow, set of indices ofvecscolumns to compute correlations across
Returns:
- correlation matrix where each pixel is the correlation between a pair of observations
SPI.calc_spi_mtx — Methodcalc_spi_mtx(A::AbstractMatrix; [Nsmps=size(A,1), Nfeats=size(A,2), alpha=1.0, q=0.5])
calc_spi_mtx(usv::SVD; [Nsmps=size(A,1), Nfeats=size(A,2), alpha=1.0, q=0.5])
calc_spi_mtx(usv::SVD[, SPI.LSVs(); Nsmps=size(A,1), Nfeats=size(A,2), alpha=1.0, q=0.5])
calc_spi_mtx(usv::SVD[, SPI.RSVs(); Nsmps=size(A,1), Nfeats=size(A,2), alpha=1.0, q=0.5])computes the cumulative spectral residual distance for spectral phylogenetic inference
(∑_{p ∈ P} ||UₚΣₚ||₂)²
where $P$ are the spectral partitions found with getintervals.
Args:
- A,usv = AbstractMatrix or SVD factorization (AbstractMatrix is just passed to
svd()before calculation) - SPI.Left() computes SPI matrix for LSVs; SPI.Right() computes SPI matrix for RSVs
- alpha, q are passed to
getintervals()see its documentation
Returns:
- distance matrix
SPI.calc_spi_trace — Methodcalc_spi_trace(usv::SVD; onrows=true, groups=nothing, alpha=1.0, q=0.5)
calc_spi_trace(vecs, vals, groups)calculates spectral residual within each partition of spectrum and each pair of taxa
returns matrix where rows are spectral partitions and columns are taxa:taxa pairs ordered as the upper triangle in rowwise order, or lower triangle in colwise order.
Args:
- method:
calc_spi_trace(vecs, vals, groups)- vecs: either usv.U or usv.V matrix
- vals: usv.S singular values vector
- groups: usually calculated with
getintervals(usv.S; alpha=alpha, q=q)
- method:
calc_spi_trace(usv::SVD; onrows=true, groups=nothing, alpha=1.0, q=0.5)- usv: SVD object
- onrows: switch to calculate SPI on rows (U matrix) or columns (V matrix).
- groups: if nothing groups are calculated with
getintervals(usv.S; alpha=alpha, q=q), otherwise they assume a vector of index ranges[1:1, 2:3, ...]to groupusv.Swith. - alpha: passed to
getintervals - q: passed to
getintervals
SPI.calc_spi_tree — Methodcalc_spi_tree(A[, ids; labelinternalnodes=true])helper function that immediately returns the newick tree string inferred by SPI
SPI.empiricalMI — MethodempiricalMI(a::AbstractVector{<:Float}, b::AbstractVector{<:Float}[, nbins=50, normalize=false])computes empirical MI from identity of $H(a) + H(b) - H(a,b)$. where $H := -sum(p(x)*log(p(x))) + log(Δ)$ the $+ log(Δ)$ corresponds to the log binwidth and unbiases the entropy estimate from binwidth choice. estimates are roughly stable from $32$ ($32^2 ≈ 1000$ total bins) to size of sample. going from a small undersestimate to a small overestimate across that range. We recommend choosing the sqrt(mean(1000, samplesize)) for nbins argument, or taking a few estimates across that range and averaging.
Args:
- a, vecter of length N
- b, AbstractVector of length N
- nbins, number of bins per side, use 1000 < nbins^2 < length(a) for best results
- base, base unit of MI (defaults to nats with base=ℯ)
- normalize, bool, whether to normalize with mi / mean(ha, hb)
Returns:
- MI
SPI.explainedvariance — Methodexplainedvariance(s::AbstractVector{<:Number})SPI.getintervals — Methodgetintervals(S::AbstractVector{<:Number}; alpha=1.0, q=0.5)finds spectral partitions. Computes log difference between each subsequent singular value and by default selects the differences that are larger than 1.0 * Q2(differences)
i.e. finds breaks in the spectrum that explain smaller scales of variance
Args:
- S = singular values of a SVD factorization
- alpha = scalar multiple of
q - q = which quantile of log differences to use; by default Q2
Returns:
- AbstractVector{UnitRange} indices into S corresponding to the spectral partitions
SPI.getintervalsIQR — MethodgetintervalsIQR(S::AbstractVector{<:Number}; alpha=1.5, ql=.25, qh=.75)finds spectral partitions. Computes log difference between each subsequent singular value and by default selects the differences that are larger than ALPHA * Q3(differences)
i.e. finds breaks in the spectrum that explain smaller scales of variance
Args:
- S = singular values of a SVD factorization
- alpha = scalar multiple of
q - q = which quantile of log differences to use; by default Q3
Returns:
- AbstractVector{UnitRange} indices into S corresponding to the spectral partitions
SPI.ij2k — Methodij2k(i,j,n)with pair (i,j) give index k to the pairs produced by combinations(vec), where vec is length n
SPI.k2ij — Methodk2ij(k, n)which pair (i,j) produces the kth element of combinations(vec), where vec is length n
SPI.minspaceneeded — Methodminspaceneeded(n, p; bits=64) = Base.format_bytes(binomial(n,2) * p * bits)how much memory is needed to store spectral residual trace
SPI.nwstr — Methodnwstr(hc::Hclust[, tiplabels::AbstractVector[<:String]])convert Hclust to newick tree string Args:
- hc,
Hclustobject from Clustering package - tiplabels,
AbstractVector{<:String}names in same order as distance matrix
SPI.pairwise — Methodpairwise(func::Function, m::M) where M<:AbstractMatrixreturns upper offdiagonals of res[k] = func(i, j) where (k, (i,j)) are calculated from enumerate(((i, j) for j in axes(m, 2) for i in (j+1):lastindex(m, 2)))
SPI.projectinLSV — MethodprojectinLSV(data::AbstractArray{T}, usv::SVD{T}, [window])returns estimated left singular vectors (aka: LSV or Û) for new data based on already calculated SVD factorization
SPI.projectinRSV — MethodprojectinRSV(data::AbstractArray{T}, usv::SVD{T}, [window])returns estimated transposed right singular vectors (RSV or V̂ᵗ) for new data based on already calculated SVD factorization
SPI.projectout — Methodprojectout(usv::SVD, [window])recreates original matrix i.e. calculates $UΣV'$ or if window is included creates a spectrally filtered version of the original matrix off of the provided components in window.
SPI.readphylip — Methodreadphylip(fn::String)Read phylip alignment file, return dataframe of IDs and Sequences
SPI.scaledcumsum — Methodscaledcumsum(c; dims=1)cumsum divided by maximum cumulative value
SPI.spimtx_spaceneeded — Methodspimtx_spaceneeded(n, p; bits=64) = Base.format_bytes(binomial(n,2) * p * bits)how much memory is needed to store SPI distance matrix
SPI.spiresult — Methodspiresult(A::AbstractMatrix{<:Number})convenience function for optaining SVD, SPImtx, and SPItree
SPI.squareform — Functionsquareform(d::AbstractVector, fillvalue=zero(eltype(d)))
squareform(d::AbstractVector, fillvalue=zero(eltype(d)))If d is a vector, squareform checks if it of n choose 2 length for integer n, then fills the values of a symetric square matrix with the values of d.
If d is a matrix, squareform checks if it is square then fills the values of vector with the lower offdiagonal of matrix d in column order form.
fillvalue is the initial value of the produced vector or matrix. Only really apparant in a produced matrix where it will be the values on the diagonal.
SPI.vmeasure_homogeneity_completeness — Methodvmeasure_homogeneity_completeness(labels_true, labels_pred; β=1.)calculates and returns v-measure, homogeneity, completeness; similar to f-score, precision, and recall respectively
Args:
- β, weighting term for v-measure, if β is greater than 1 completeness
is weighted more strongly in the calculation, if β is less than 1, homogeneity is weighted more strongly
Citation:
- A. Rosenberg, J. Hirschberg, in Proceedings of the 2007 Joint Conference
on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (Association for Computational Linguistics, Prague, Czech Republic, 2007; https://aclanthology.org/D07-1043), pp. 410–420.