Title: | Binarization of One-Dimensional Data |
---|---|
Description: | Provides methods for the binarization of one-dimensional data and some visualization functions. |
Authors: | Stefan Mundus, Christoph Müssel, Florian Schmid, Ludwig Lausser, Tamara J. Blätte, Martin Hopfensitz, Hans A. Kestler |
Maintainer: | Hans Kestler <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.3.1 |
Built: | 2024-11-01 11:16:17 UTC |
Source: | https://github.com/cran/Binarize |
A specialized class storing the results of a call to binarize.BASC.
Objects of this class shouldn't be created directly. They are created implicitly by a call to binarize.BASC.
p.value
:The p-value of the statistical test for reliability of the binarization.
intermediateSteps
:A matrix specifying the optimal step functions from which the binarization was calculated. The number of rows corresponds to the number of step functions, and the number of columns is determined by the length of the input vector minus 2 (that is, the length of the step function corresponding to the input vector). From the first to the last row, the number of steps increases. The non-zero entries of the matrix represent the locations of the steps. Step functions with fewer steps than the input step function have entries set to zero.
intermediateHeights
:A matrix giving the jump heights of the steps supplied in intermediateSteps.
intermediateStrongestSteps
:A vector with one entry for each step function (row) in intermediateSteps. The entries specify the location of the strongest step for each of the functions.
originalMeasurements
:A numeric vector storing the input measurements.
binarizedMeasurements
:An integer vector of binarized values (0 or 1) corresponding to the original measurements.
threshold
:The threshold that separates 0 and 1.
method
:A string describing the binarization method that yielded the result.
Class "BinarizationResult", directly.
signature(x = "BASCResult")
: Plot the intermediate optimal step functions used to determine the threshold.
signature(x = "BASCResult")
: Print a summary of the binarization.
signature(object = "BASCResult")
: ...
binarize.BASC, BinarizationResult
An artificial data set consisting of ten feature vectors that are used to illustrate the binarization methods in the package vignette. Each row of the matrix binarizationExample corresponds to one feature vector, of which 10 measurements are drawn from a normal distribution N(0,1). The remaining 10 measurements are drawn from a normal distribution N(m,1), with m=10:1 decreasing from the first to the last row.
data(binarizationExample)
The data is a matrix with 20 columns and 10 rows.
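To make the construction concrete, the following base R sketch simulates a matrix with the same shape and the same distributions as described above. It is only an illustration: the seed, the name sim, and the column order (the N(0,1) draws first) are assumptions, and the values will not match the shipped binarizationExample data.

```r
# Simulate data shaped like the description of binarizationExample.
# Assumption: the 10 N(0,1) draws come first in each row; the real
# data set may order its columns differently.
set.seed(42)                       # arbitrary seed, for reproducibility only
m <- 10:1                          # mean of the second group, one per row

sim <- t(sapply(m, function(mu) {
  c(rnorm(10, mean = 0, sd = 1),   # measurements around 0
    rnorm(10, mean = mu, sd = 1))  # measurements around mu
}))

dim(sim)                           # 10 rows (features), 20 columns
```

Rows near the top have well separated groups (m = 10), while rows near the bottom (m = 1) overlap heavily and are therefore harder to binarize.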
This is the base class for objects that store the results of a binarization algorithm. It defines the slots and methods that the results of all algorithms share.
Objects of this class shouldn't be created directly. They are created implicitly by a call to one of the binarization algorithms.
originalMeasurements
:A numeric vector storing the input measurements.
binarizedMeasurements
:An integer vector of binarized values (0 or 1) corresponding to the original measurements.
threshold
:The threshold that separates 0 and 1.
method
:A string describing the binarization method that yielded the result.
p.value
:The p-value obtained by a test for validity of the binarization (e.g. BASC bootstrap test, Hartigan's dip test for k-means binarization, scan statistic p-value for best window). If no test was performed, this is NA.
signature(x = "BinarizationResult")
: Plot the binarization and the threshold.
signature(x = "BinarizationResult")
: Print a summary of the binarization.
signature(object = "BinarizationResult")
: ...
binarize.BASC, binarize.kMeans, BASCResult
Binarizes real-valued data using the multiscale BASC methods.
binarize.BASC(vect, method = c("A","B"), tau = 0.01, numberOfSamples = 999, sigma = seq(0.1, 20, by=.1), na.rm=FALSE)
method |
Chooses the BASC method to use (see details), i.e. either "A" or "B". |
vect |
A real-valued vector of data to binarize. |
tau |
This parameter adjusts the sensitivity and the specificity of the statistical testing procedure that rates the quality of the binarization. Defaults to 0.01. |
numberOfSamples |
The number of samples for the bootstrap test. Defaults to 999. |
sigma |
If |
na.rm |
If set to |
The two BASC methods can be subdivided into three steps:
An initial step function is obtained by rearranging the original time series measurements in increasing order. Then, step functions with fewer discontinuities are calculated. BASC A calculates these step functions in such a way that each minimizes the Euclidean distance to the initial step function. BASC B obtains step functions from smoothened versions of the input function in a scale-space manner.
A strong discontinuity is a high jump size (derivative) in combination with a low approximation error.
Based on these estimates, data values can be excluded from further analyses.
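The idea of a "strong discontinuity" can be illustrated with a heavily simplified base R sketch. It sorts the data, tries every single-step function, and scores each step by jump height divided by approximation error. This is an assumption-laden toy, not the BASC A or BASC B algorithm as implemented in the package: it skips the multiscale series of step functions and the bootstrap test, and the exact definitions of jump height and error used here are chosen for illustration.

```r
# Toy illustration of "high jump, low approximation error".
# NOT the Binarize implementation; all definitions below are simplified.
x <- c(0.1, 0.3, 0.2, 5.1, 4.9, 5.3)

sorted <- sort(x)                  # initial step function: sorted measurements
n <- length(sorted)

score <- sapply(seq_len(n - 1), function(i) {
  left  <- sorted[1:i]
  right <- sorted[(i + 1):n]
  jump  <- mean(right) - mean(left)              # height of the single step
  err   <- sum((left - mean(left))^2) +
           sum((right - mean(right))^2)          # squared approximation error
  jump / err                                     # large jump, small error wins
})

best      <- which.max(score)                    # position of the strongest step
threshold <- (sorted[best] + sorted[best + 1]) / 2
binarized <- as.integer(x > threshold)

print(threshold)
print(binarized)
```

In this example the strongest step lies between the third and fourth sorted value, so the three small measurements map to 0 and the three large ones to 1.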
Returns an object of class BASCResult.
M. Hopfensitz, C. Müssel, C. Wawra, M. Maucher, M. Kuehl, H. Neumann, and H. A. Kestler. Multiscale Binarization of Gene Expression Data for Reconstructing Boolean Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 9(2):487-498, 2012.
BinarizationResult, BASCResult
par(mfrow=c(2,1))
result <- binarize.BASC(iris[,"Petal.Length"], method="A", tau=0.15)
print(result)
plot(result)
result <- binarize.BASC(iris[,"Petal.Length"], method="B", tau=0.15)
print(result)
plot(result)
Binarizes a vector of real-valued data using the k-means clustering algorithm. The data is first split into 2 clusters. The values belonging to the cluster with the smaller centroid are set to 0, and the values belonging to the cluster with the greater centroid are set to 1.
binarize.kMeans(vect, nstart=1, iter.max=10, dip.test=TRUE, na.rm=FALSE)
vect |
A real-valued vector to be binarized (at least 3 measurements). |
nstart |
The number of restarts for k-means. See |
iter.max |
The maximum number of iterations for k-means. See |
dip.test |
If set to |
na.rm |
If set to |
Returns an object of class BinarizationResult.
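The clustering rule described above can be reproduced in a few lines with stats::kmeans from base R. This is a minimal sketch of the idea, not the package's code: the threshold formula (midpoint between the two groups) is an assumption made for illustration, and the dip test is omitted.

```r
# Sketch of two-cluster k-means binarization using base R only.
# Assumption: the threshold is taken as the midpoint between the largest
# "0" value and the smallest "1" value; binarize.kMeans may define it
# differently.
x <- iris[, "Petal.Length"]

set.seed(1)                                   # k-means uses random starts
km <- kmeans(x, centers = 2, nstart = 1, iter.max = 10)

high      <- which.max(km$centers[, 1])       # cluster with the greater centroid
binarized <- as.integer(km$cluster == high)   # 1 for that cluster, 0 otherwise
threshold <- (max(x[binarized == 0]) + min(x[binarized == 1])) / 2

table(binarized)
```

Increasing nstart makes the result less dependent on the random initial centroids, at the cost of more computation.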
result <- binarize.kMeans(iris[,"Petal.Length"])
print(result)
plot(result, twoDimensional=TRUE)
Binarizes a matrix of measurements all at once, and returns the binarized vectors as well as the binarization thresholds and the p-values.
binarizeMatrix(mat, method = c("BASCA", "BASCB", "kMeans"), adjustment = "none", ...)
mat |
An n x m matrix comprising m raw measurements of n features. |
method |
The binarization algorithm to be used. |
adjustment |
Specifies an optional adjustment for multiple testing that is applied to the p-values (see |
... |
Further parameters that are passed to the respective binarization methods ( |
An n x (m+2) matrix of binarized measurements. The first m columns contain the binarized measurements, column m+1 contains the binarization thresholds for the features, and column m+2 contains the p-values.
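The adjustment argument is described above as a multiple-testing correction applied to the p-values, and p.adjust is listed as a related function. As a rough illustration, assuming the argument accepts the same method names as stats::p.adjust, the base R sketch below applies a few corrections to a hypothetical vector of per-feature p-values. The numbers are invented and are not output of binarizeMatrix.

```r
# Hypothetical raw p-values, one per feature (row of the input matrix).
p <- c(0.001, 0.02, 0.04, 0.30)

adj_none <- p.adjust(p, method = "none")        # leaves the p-values unchanged
adj_bonf <- p.adjust(p, method = "bonferroni")  # multiplies by length(p), capped at 1
adj_bh   <- p.adjust(p, method = "BH")          # Benjamini-Hochberg correction

print(rbind(none = adj_none, bonferroni = adj_bonf, BH = adj_bh))
```

With the default adjustment = "none" the p-values are reported as computed; a correction matters mainly when many features are tested at once.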
binarize.BASC, binarize.kMeans, p.adjust
bin <- binarizeMatrix(t(iris[,1:4]))
print(bin)
Visualizes a binarization as a ray or a two-dimensional plot.
## S4 method for signature 'BinarizationResult,ANY'
plot(x, twoDimensional=FALSE, showLegend=TRUE, showThreshold=TRUE, ...)
## S4 method for signature 'numeric,BinarizationResult'
plot(x, y, showLegend=TRUE, showThreshold=TRUE, ...)
x |
If |
y |
If |
twoDimensional |
Specifies whether the binarization is depicted as a ray or as a two-dimensional curve (see details). |
showLegend |
If set to |
showThreshold |
If set to |
... |
Further graphical parameters to be passed to |
The function comprises two different plots: If twoDimensional = TRUE, the positions in the input vector are aligned with the x axis, and the y axis corresponds to the values. The binarization threshold is shown as a horizontal line, and the binarization is indicated by two different symbols.
If twoDimensional = FALSE, the binarized values are aligned with a one-dimensional ray, and the separating threshold is depicted as a vertical line.
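The two layouts can be approximated with base graphics, which may help when reading the plots. The sketch below is only a rough imitation of the description above and does not call the package's plot method; the threshold of 2.5 is a hypothetical value picked by eye, and the symbols and axis labels are arbitrary choices.

```r
# Rough base-graphics imitation of the two plot layouts.
# Assumption: threshold = 2.5 is hand-picked, not computed by the package.
x <- iris[, "Petal.Length"]
threshold <- 2.5
b <- as.integer(x > threshold)          # 0 below the threshold, 1 above

op <- par(mfrow = c(2, 1))

# Two-dimensional layout: position on the x axis, value on the y axis,
# threshold as a horizontal line, one symbol per class.
plot(seq_along(x), x, pch = c(1, 4)[b + 1],
     xlab = "Position", ylab = "Value")
abline(h = threshold, lty = 2)

# Ray layout: values along a single axis, threshold as a vertical line.
plot(x, rep(0, length(x)), pch = c(1, 4)[b + 1],
     yaxt = "n", xlab = "Value", ylab = "")
abline(v = threshold, lty = 2)

par(op)                                 # restore the previous layout
```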
# plot a binarization in one and two dimensions
res <- binarize.BASC(iris[,"Petal.Length"], method="A")
plot(res)
plot(res, twoDimensional = TRUE)
plot(res, twoDimensional = TRUE, pch = c("x", "+"), col = c("red", "black", "royalblue"), lty = 4, lwd = 2)
Visualizes a trinarization as a ray or a two-dimensional plot.
## S4 method for signature 'TrinarizationResult,ANY'
plot(x, twoDimensional=FALSE, showLegend=TRUE, showThreshold=TRUE, ...)
## S4 method for signature 'numeric,TrinarizationResult'
plot(x, y, showLegend=TRUE, showThreshold=TRUE, ...)
x |
If |
y |
If |
twoDimensional |
Specifies whether the trinarization is depicted as a ray or as a two-dimensional curve (see details). |
showLegend |
If set to |
showThreshold |
If set to |
... |
Further graphical parameters to be passed to |
The function comprises two different plots: If twoDimensional = TRUE, the positions in the input vector are aligned with the x axis, and the y axis corresponds to the values. The trinarization thresholds are shown as horizontal lines, and the trinarization is indicated by three different symbols.
If twoDimensional = FALSE, the trinarized values are aligned with a one-dimensional ray, and the separating thresholds are depicted as vertical lines.
# plot a trinarization in one and two dimensions
res <- TASC(iris[,"Petal.Length"])
plot(res)
plot(res, twoDimensional = TRUE)
plot(res, twoDimensional = TRUE, pch = c("x", "+"), col = c("red", "black", "royalblue", "green"), lty = 4, lwd = 2)
A specialized visualization that plots all the optimal step functions computed by the BASC algorithms or TASC.
plotStepFunctions(x, showLegend=TRUE, connected=FALSE, withOriginal=TRUE, ...)
x |
A binarization (or trinarization) result object of class |
showLegend |
If |
connected |
If |
withOriginal |
If set to |
... |
Additional graphical parameters to be passed to |
BASCResult, binarize.BASC, TASCResult, TASC
result <- binarize.BASC(iris[,"Petal.Width"], method="B")
plotStepFunctions(result)
result <- TASC(iris[,"Petal.Width"])
plotStepFunctions(result)
Trinarizes real-valued data using the multiscale TASC method.
TASC(vect, method = c("A","B"), tau = 0.01, numberOfSamples = 999, sigma = seq(0.1, 20, by=.1), na.rm=FALSE, error = c("mean", "min"))
method |
Chooses the TASC method to use (see details), i.e. either "A" or "B". |
vect |
A real-valued vector of data to trinarize. |
tau |
This parameter adjusts the sensitivity and the specificity of the statistical testing procedure that rates the quality of the trinarization. Defaults to 0.01. |
numberOfSamples |
The number of samples for the bootstrap test. Defaults to 999. |
sigma |
If |
na.rm |
If set to |
error |
Determines which error is used for the data points between the two thresholds: the "mean" error to the thresholds (default) or the "min" error. |
The two TASC methods can be subdivided into three steps:
An initial step function is obtained by rearranging the original time series measurements in increasing order. Then, step functions with fewer discontinuities are calculated. TASC A calculates these step functions in such a way that each minimizes the Euclidean distance to the initial step function. TASC B obtains step functions from smoothened versions of the input function in a scale-space manner.
A strong discontinuity is a high jump size (derivative) in combination with a low approximation error. For TASC a pair of strongest discontinuities is determined.
Based on these estimates, data values can be excluded from further analyses.
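For intuition about determining a pair of discontinuities, the following base R sketch sorts the data and searches all pairs of step positions for the three-level step function with the smallest squared approximation error. This is a deliberately simplified stand-in: it uses a plain least-squares criterion rather than the TASC scoring, and it omits the multiscale step functions, the error argument, and the bootstrap test.

```r
# Toy search for two thresholds; NOT the TASC implementation.
x <- c(0.1, 0.3, 0.2, 5.1, 4.9, 5.3, 10.2, 9.8, 10.0)

sorted <- sort(x)
n <- length(sorted)
sse <- function(v) sum((v - mean(v))^2)     # squared error around the mean

pairs <- t(combn(n - 1, 2))                 # all step positions i < j
err <- apply(pairs, 1, function(p) {
  sse(sorted[1:p[1]]) +
    sse(sorted[(p[1] + 1):p[2]]) +
    sse(sorted[(p[2] + 1):n])
})

best <- pairs[which.min(err), ]             # pair with the lowest total error
threshold1 <- mean(sorted[best[1] + 0:1])   # midpoint across the first step
threshold2 <- mean(sorted[best[2] + 0:1])   # midpoint across the second step
trinarized <- as.integer(x > threshold1) + as.integer(x > threshold2)

print(c(threshold1, threshold2))
print(trinarized)
```

The exhaustive pair search is quadratic in the input length, which is acceptable for a toy example but is one reason a real implementation would work differently.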
Returns an object of class TASCResult.
TrinarizationResult, TASCResult
par(mfrow=c(2,1))
result <- TASC(iris[,"Petal.Width"], method="A", tau=0.15)
print(result)
plot(result)
result <- TASC(iris[,"Petal.Width"], method="B", tau=0.15)
print(result)
plot(result)
A specialized class storing the results of a call to TASC.
Objects of this class shouldn't be created directly. They are created implicitly by a call to TASC.
p.value
:The p-value of the statistical test for reliability of the trinarization.
intermediateSteps
:A matrix specifying the optimal step functions from which the trinarization was calculated. The number of rows corresponds to the number of step functions, and the number of columns is determined by the length of the input vector minus 2 (that is, the length of the step function corresponding to the input vector). From the first to the last row, the number of steps increases. The non-zero entries of the matrix represent the locations of the steps. Step functions with fewer steps than the input step function have entries set to zero.
intermediateHeights1
:A matrix giving the jump heights of the steps supplied in intermediateSteps for the first threshold.
intermediateHeights2
:A matrix giving the jump heights of the steps supplied in intermediateSteps for the second threshold.
intermediateStrongestSteps
:A matrix with one row for each step function (row) in intermediateSteps. The entries specify the location of the two strongest steps for each of the functions.
originalMeasurements
:A numeric vector storing the input measurements.
trinarizedMeasurements
:An integer vector of trinarized values (0, 1 or 2) corresponding to the original measurements.
threshold1
:The threshold that separates 0 from 1.
threshold2
:The threshold that separates 1 from 2.
method
:A string describing the trinarization method that yielded the result.
Class "TrinarizationResult", directly.
signature(x = "TASCResult")
: Plot the intermediate optimal step functions used to determine the thresholds.
signature(x = "TASCResult")
: Print a summary of the trinarization.
signature(object = "TASCResult")
: ...
An artificial data set consisting of 100 feature vectors that are used to illustrate the trinarization methods in the package vignette. Each row of the matrix trinarizationExample corresponds to one feature vector, of which 5 measurements are drawn from a normal distribution N(0,1). The remaining 10 measurements are drawn from two normal distributions N(m,1), with m=10:1 and m=seq(20,2,by=-2) (5 measurements per distribution).
data(trinarizationExample)
The data is a matrix with 15 columns and 100 rows.
This is the base class for objects that store the results of a trinarization algorithm. It defines the slots and methods that the results of all algorithms share.
Objects of this class shouldn't be created directly. They are created implicitly by a call to one of the trinarization algorithms.
originalMeasurements
:A numeric vector storing the input measurements.
trinarizedMeasurements
:An integer vector of trinarized values (0, 1 or 2) corresponding to the original measurements.
threshold1
:The threshold that separates 0 and 1.
threshold2
:The threshold that separates 1 and 2.
method
:A string describing the trinarization method that yielded the result.
p.value
:The p-value obtained by a test for validity of the trinarization (e.g. TASC bootstrap test). If no test was performed, this is NA.
signature(x = "TrinarizationResult")
: Plot the trinarization and the thresholds.
signature(x = "TrinarizationResult")
: Print a summary of the trinarization.
signature(object = "TrinarizationResult")
: ...
Trinarizes a matrix of measurements all at once, and returns the trinarized vectors as well as the trinarization thresholds and the p-values.
trinarizeMatrix(mat, method = c("TASCA", "TASCB"), adjustment = "none", ...)
mat |
An n x m matrix comprising m raw measurements of n features. |
method |
The trinarization algorithm to be used. |
adjustment |
Specifies an optional adjustment for multiple testing that is applied to the p-values (see |
... |
Further parameters that are passed to the respective trinarization methods ( |
An n x (m+3) matrix of trinarized measurements. The first m columns contain the trinarized measurements, columns m+1 and m+2 contain the two trinarization thresholds for the features, and column m+3 contains the p-values.
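Because the thresholds and p-values are appended as extra columns, it can be convenient to split the result by column position. The sketch below does this on a small hand-made matrix that mimics the documented layout; the values and the variable names are hypothetical, and the sketch addresses columns by position rather than assuming any column names.

```r
# Toy stand-in for a result with n = 2 features and m = 4 measurements.
# All values are made up; a real result would come from trinarizeMatrix().
m <- 4
res <- rbind(
  c(0, 1, 2, 2, 1.5, 3.5, 0.01),
  c(0, 0, 1, 2, 2.0, 4.0, 0.20)
)

trinarized <- res[, seq_len(m), drop = FALSE]  # first m columns: values 0, 1, 2
thresholds <- res[, m + 1:2, drop = FALSE]     # columns m+1 and m+2
p_values   <- res[, m + 3]                     # column m+3

print(trinarized)
print(thresholds)
print(p_values)
```

The same pattern applies to the binarizeMatrix result, where a single threshold column sits at position m+1 and the p-values at position m+2.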
tri <- trinarizeMatrix(t(iris[,1:4]))
print(tri)