SPI

Documentation for SPI.

SPI.UPGMA_tree
SPI.adjustedrandindex
SPI.calc_spcorr_mtx
SPI.calc_spi_mtx
SPI.calc_spi_trace
SPI.calc_spi_tree
SPI.empiricalMI
SPI.explainedvariance
SPI.getintervals
SPI.getintervalsIQR
SPI.ij2k
SPI.k2ij
SPI.minspaceneeded
SPI.nwstr
SPI.pairwise
SPI.projectinLSV
SPI.projectinRSV
SPI.projectout
SPI.readphylip
SPI.scaledcumsum
SPI.spimtx_spaceneeded
SPI.spiresult
SPI.squareform
SPI.vmeasure_homogeneity_completeness

SPI.UPGMA_tree — Method

UPGMA_tree(Dij::AbstractMatrix{<:Number})

shorthand for Clustering.hclust(Dij, linkage=:average, branchorder=:optimal)

source

SPI.adjustedrandindex — Method

adjustedrandindex(a::AbstractVector{<:Number}, b::AbstractVector{<:Number}; nbins=50)

Args:

a, vector of numbers
b, vector of numbers
nbins, for continuous approximates discrete, for discrete choose nbins>maxnumberof_classes

source

SPI.calc_spcorr_mtx — Method

calc_spcorr_mtx(vecs::AbstractMatrix{<:Number}, window)
calc_spcorr_mtx(vecs::AbstractMatrix{<:Number}, vals::AbstractVector{<:Number}, window)

Calculates pairwise spectral (pearson) correlations for a set of observations.

Args:

vecs, set of left singular vectors or principal components with observations on rows and components on columns
vals, vector of singular values
window, set of indices of vecs columns to compute correlations across

Returns:

correlation matrix where each pixel is the correlation between a pair of observations

source

SPI.calc_spi_mtx — Method

calc_spi_mtx(A::AbstractMatrix; [Nsmps=size(A,1), Nfeats=size(A,2), alpha=1.0, q=0.5])
calc_spi_mtx(usv::SVD; [Nsmps=size(A,1), Nfeats=size(A,2), alpha=1.0, q=0.5])
calc_spi_mtx(usv::SVD[, SPI.LSVs(); Nsmps=size(A,1), Nfeats=size(A,2), alpha=1.0, q=0.5])
calc_spi_mtx(usv::SVD[, SPI.RSVs(); Nsmps=size(A,1), Nfeats=size(A,2), alpha=1.0, q=0.5])

computes the cumulative spectral residual distance for spectral phylogenetic inference

(∑_{p ∈ P} ||UₚΣₚ||₂)²

where $P$ are the spectral partitions found with getintervals.

Args:

A,usv = AbstractMatrix or SVD factorization (AbstractMatrix is just passed to svd() before calculation)
SPI.Left() computes SPI matrix for LSVs; SPI.Right() computes SPI matrix for RSVs
alpha, q are passed to getintervals() see its documentation

Returns:

distance matrix

source

SPI.calc_spi_trace — Method

calc_spi_trace(usv::SVD; onrows=true, groups=nothing, alpha=1.0, q=0.5)
calc_spi_trace(vecs, vals, groups)

calculates spectral residual within each partition of spectrum and each pair of taxa

returns matrix where rows are spectral partitions and columns are taxa:taxa pairs ordered as the upper triangle in rowwise order, or lower triangle in colwise order.

Args:

method: calc_spi_trace(vecs, vals, groups)
- vecs: either usv.U or usv.V matrix
- vals: usv.S singular values vector
- groups: usually calculated with getintervals(usv.S; alpha=alpha, q=q)
method: calc_spi_trace(usv::SVD; onrows=true, groups=nothing, alpha=1.0, q=0.5)
- usv: SVD object
- onrows: switch to calculate SPI on rows (U matrix) or columns (V matrix).
- groups: if nothing groups are calculated with getintervals(usv.S; alpha=alpha, q=q), otherwise they assume a vector of index ranges [1:1, 2:3, ...] to group usv.S with.
- alpha: passed to getintervals
- q: passed to getintervals

source

SPI.calc_spi_tree — Method

calc_spi_tree(A[, ids; labelinternalnodes=true])

helper function that immediately returns the newick tree string inferred by SPI

source

SPI.empiricalMI — Method

empiricalMI(a::AbstractVector{<:Float}, b::AbstractVector{<:Float}[, nbins=50, normalize=false])

computes empirical MI from identity of $H(a) + H(b) - H(a,b)$. where $H := -sum(p(x)*log(p(x))) + log(Δ)$ the $+ log(Δ)$ corresponds to the log binwidth and unbiases the entropy estimate from binwidth choice. estimates are roughly stable from $32$ ($32^2 ≈ 1000$ total bins) to size of sample. going from a small undersestimate to a small overestimate across that range. We recommend choosing the sqrt(mean(1000, samplesize)) for nbins argument, or taking a few estimates across that range and averaging.

Args:

a, vecter of length N
b, AbstractVector of length N
nbins, number of bins per side, use 1000 < nbins^2 < length(a) for best results
base, base unit of MI (defaults to nats with base=ℯ)
normalize, bool, whether to normalize with mi / mean(ha, hb)

Returns:

source

SPI.explainedvariance — Method

explainedvariance(s::AbstractVector{<:Number})

source

SPI.getintervals — Method

getintervals(S::AbstractVector{<:Number}; alpha=1.0, q=0.5)

finds spectral partitions. Computes log difference between each subsequent singular value and by default selects the differences that are larger than 1.0 * Q2(differences)

i.e. finds breaks in the spectrum that explain smaller scales of variance

Args:

S = singular values of a SVD factorization
alpha = scalar multiple of q
q = which quantile of log differences to use; by default Q2

Returns:

AbstractVector{UnitRange} indices into S corresponding to the spectral partitions

source

SPI.getintervalsIQR — Method

getintervalsIQR(S::AbstractVector{<:Number}; alpha=1.5, ql=.25, qh=.75)

finds spectral partitions. Computes log difference between each subsequent singular value and by default selects the differences that are larger than ALPHA * Q3(differences)

i.e. finds breaks in the spectrum that explain smaller scales of variance

Args:

S = singular values of a SVD factorization
alpha = scalar multiple of q
q = which quantile of log differences to use; by default Q3

Returns:

AbstractVector{UnitRange} indices into S corresponding to the spectral partitions

source

SPI.ij2k — Method

ij2k(i,j,n)

with pair (i,j) give index k to the pairs produced by combinations(vec), where vec is length n

source

SPI.k2ij — Method

k2ij(k, n)

which pair (i,j) produces the kth element of combinations(vec), where vec is length n

source

SPI.minspaceneeded — Method

minspaceneeded(n, p; bits=64) = Base.format_bytes(binomial(n,2) * p * bits)

how much memory is needed to store spectral residual trace

source

SPI.nwstr — Method

nwstr(hc::Hclust[, tiplabels::AbstractVector[<:String]])

convert Hclust to newick tree string Args:

hc, Hclust object from Clustering package
tiplabels, AbstractVector{<:String} names in same order as distance matrix

source

SPI.pairwise — Method

pairwise(func::Function, m::M) where M<:AbstractMatrix

returns upper offdiagonals of res[k] = func(i, j) where (k, (i,j)) are calculated from enumerate(((i, j) for j in axes(m, 2) for i in (j+1):lastindex(m, 2)))

source

SPI.projectinLSV — Method

projectinLSV(data::AbstractArray{T}, usv::SVD{T}, [window])

returns estimated left singular vectors (aka: LSV or Û) for new data based on already calculated SVD factorization

source

SPI.projectinRSV — Method

projectinRSV(data::AbstractArray{T}, usv::SVD{T}, [window])

returns estimated transposed right singular vectors (RSV or V̂ᵗ) for new data based on already calculated SVD factorization

source

SPI.projectout — Method

projectout(usv::SVD, [window])

recreates original matrix i.e. calculates $UΣV'$ or if window is included creates a spectrally filtered version of the original matrix off of the provided components in window.

source

SPI.readphylip — Method

readphylip(fn::String)

Read phylip alignment file, return dataframe of IDs and Sequences

source

SPI.scaledcumsum — Method

scaledcumsum(c; dims=1)

cumsum divided by maximum cumulative value

source

SPI.spimtx_spaceneeded — Method

spimtx_spaceneeded(n, p; bits=64) = Base.format_bytes(binomial(n,2) * p * bits)

how much memory is needed to store SPI distance matrix

source

SPI.spiresult — Method

spiresult(A::AbstractMatrix{<:Number})

convenience function for optaining SVD, SPImtx, and SPItree

source

SPI.squareform — Function

squareform(d::AbstractVector, fillvalue=zero(eltype(d)))
squareform(d::AbstractVector, fillvalue=zero(eltype(d)))

If d is a vector, squareform checks if it of n choose 2 length for integer n, then fills the values of a symetric square matrix with the values of d.

If d is a matrix, squareform checks if it is square then fills the values of vector with the lower offdiagonal of matrix d in column order form.

fillvalue is the initial value of the produced vector or matrix. Only really apparant in a produced matrix where it will be the values on the diagonal.

source

SPI.vmeasure_homogeneity_completeness — Method

vmeasure_homogeneity_completeness(labels_true, labels_pred; β=1.)

calculates and returns v-measure, homogeneity, completeness; similar to f-score, precision, and recall respectively

Args:

β, weighting term for v-measure, if β is greater than 1 completeness

is weighted more strongly in the calculation, if β is less than 1, homogeneity is weighted more strongly

Citation:

A. Rosenberg, J. Hirschberg, in Proceedings of the 2007 Joint Conference

on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (Association for Computational Linguistics, Prague, Czech Republic, 2007; https://aclanthology.org/D07-1043), pp. 410–420.

source