Package 'Binarize' reference manual

Title:	Binarization of One-Dimensional Data
Description:	Provides methods for the binarization of one-dimensional data and some visualization functions.
Authors:	Stefan Mundus, Christoph Müssel, Florian Schmid, Ludwig Lausser, Tamara J. Blätte, Martin Hopfensitz, Hans A. Kestler
Maintainer:	Hans Kestler <[email protected]>
License:	Artistic-2.0
Version:	1.3.1
Built:	2025-03-31 05:10:30 UTC
Source:	https://github.com/cran/Binarize

Class "BASCResult"

Description

A specialized class storing the results of a call to binarize.BASC.

Objects of this class

Objects of this class shouldn't be created directly. They are created implicitly by a call to binarize.BASC.

Slots

p.value:: The p-value of the statistical test for reliability of the binarization.
intermediateSteps:: A matrix specifying the optimal step functions from which the binarization was calculated. The number of rows corresponds to the number of step functions, and the number of columns is determined by the length of the input vector minus 2 (that is, the length of the step function corresponding to the input vector). From the first to the last row, the number of steps increases. The non-zero entries of the matrix represent the locations of the steps. Step functions with fewer steps than the input step function have entries set to zero.
intermediateHeights:: A matrix giving the jump heights of the steps supplied in intermediateSteps.
intermediateStrongestSteps:: A vector with one entry for each step function (row) in intermediateSteps. The entries specify the location of the strongest step for each of the functions.
originalMeasurements:: A numeric vector storing the input measurements.
binarizedMeasurements:: An integer vector of binarized values (0 or 1) corresponding to the original measurements.
threshold:: The threshold that separates 0 and 1.
method:: A string describing the binarization method that yielded the result.
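
To make the layout of intermediateSteps concrete, the following sketch builds a small matrix of the shape described above. The values are hypothetical (they are not output of binarize.BASC) and assume an input vector of length 5, so that the matrix has 5 - 2 = 3 columns. The variable names are chosen for this illustration only.

```r
# Hypothetical intermediateSteps matrix for an input vector of length 5.
# Each row is one step function; rows are ordered by increasing number of steps.
# Non-zero entries are step locations; unused positions are padded with zeros.
steps <- rbind(c(3, 0, 0),   # step function with 1 step
               c(1, 3, 0),   # step function with 2 steps
               c(1, 3, 4))   # step function with 3 steps

# Number of steps in each step function = number of non-zero entries per row
stepsPerFunction <- rowSums(steps != 0)
print(stepsPerFunction)
```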

Extends

Class "BinarizationResult", directly.

Methods

plotStepFunctions: signature(x = "BASCResult"): Plot the intermediate optimal step functions used to determine the threshold.
print: signature(x = "BASCResult"): Print a summary of the binarization.
show: signature(object = "BASCResult"): ...

An artificial data set consisting of ten artificial feature vectors.

Description

An artificial data set consisting of ten artificial feature vectors that are used to illustrate the binarization methods in the package vignette. Each row of the matrix binarizationExample corresponds to one feature vector, of which 10 measurements are drawn from a normal distribution N(0,1). The remaining 10 measurements are drawn from a normal distribution N(m,1), with m=10:1 decreasing from the first to the last row.

Usage

data(binarizationExample)

Format

The data is a matrix with 20 columns and 10 rows.
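
A data set with the same structure can be simulated in base R. This is a sketch based on the description above, not the code that produced binarizationExample; the seed and the variable name simulated are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# One row per feature vector: 10 values from N(0,1), then 10 values from N(m,1),
# with m decreasing from 10 to 1 across the rows.
simulated <- t(sapply(10:1, function(m) {
  c(rnorm(10, mean = 0, sd = 1), rnorm(10, mean = m, sd = 1))
}))

print(dim(simulated))  # 10 rows, 20 columns
```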

Class "BinarizationResult"

Description

This is the base class for objects that store the results of a binarization algorithm. It defines the slots and methods that the results of all algorithms share.

Objects of this class

Objects of this class shouldn't be created directly. They are created implicitly by a call to one of the binarization algorithms.

Slots

originalMeasurements:: A numeric vector storing the input measurements.
binarizedMeasurements:: An integer vector of binarized values (0 or 1) corresponding to the original measurements.
threshold:: The threshold that separates 0 and 1.
method:: A string describing the binarization method that yielded the result.
p.value:: The p-value obtained by a test for validity of the binarization (e.g. the BASC bootstrap test, Hartigan's dip test for k-means binarization, or the scan statistic p-value for the best window). If no test was performed, this is NA.

Methods

plot: signature(x = "BinarizationResult"): Plot the binarization and the threshold.
print: signature(x = "BinarizationResult"): Print a summary of the binarization.
show: signature(object = "BinarizationResult"): ...

Binarization Across Multiple Scales

Description

Binarizes real-valued data using the multiscale BASC methods.

Usage

binarize.BASC(vect, 
              method = c("A","B"), 
              tau = 0.01, 
              numberOfSamples = 999, 
              sigma = seq(0.1, 20, by=.1),
              na.rm=FALSE)

Arguments

`method`	Chooses the BASC method to use (see details), i.e. either "A" or "B".
`vect`	A real-valued vector of data to binarize.
`tau`	This parameter adjusts the sensitivity and the specificity of the statistical testing procedure that rates the quality of the binarization. Defaults to 0.01.
`numberOfSamples`	The number of samples for the bootstrap test. Defaults to 999.
`sigma`	If `method="B"`, this specifies a vector of different sigma values for the convolutions with the Bessel function. Ignored for `method="A"`.
`na.rm`	If set to `TRUE`, `NA` values are removed from the input. Otherwise, binarization will fail in the presence of `NA` values.

Details

The two BASC methods can be subdivided into three steps:

Compute a series of step functions:: An initial step function is obtained by rearranging the original time series measurements in increasing order. Then, step functions with fewer discontinuities are calculated. BASC A calculates these step functions in such a way that each minimizes the Euclidean distance to the initial step function. BASC B obtains step functions from smoothened versions of the input function in a scale-space manner.
Find strongest discontinuity in each step function:: A strong discontinuity is a high jump size (derivative) in combination with a low approximation error.
Estimate location and variation of the strongest discontinuities:: Based on these estimates, data values can be excluded from further analyses.
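
The first two steps can be illustrated with a deliberately simplified base R sketch. It is not the BASC algorithm: it only sorts the input into the initial step function and picks the largest jump, without the series of reduced step functions, the approximation error, or the bootstrap test. The function name largestJumpThreshold and the midpoint rule for the threshold are assumptions made for this illustration.

```r
# Simplified illustration only; not the BASC algorithm.
largestJumpThreshold <- function(vect) {
  sorted <- sort(vect)        # initial step function: values in increasing order
  heights <- diff(sorted)     # jump height of each discontinuity
  pos <- which.max(heights)   # location of the largest jump
  # Assumption: place the threshold halfway across the largest jump
  threshold <- (sorted[pos] + sorted[pos + 1]) / 2
  list(threshold = threshold,
       binarized = as.integer(vect > threshold))
}

res <- largestJumpThreshold(c(0.1, 0.3, 0.2, 5.1, 4.9, 5.3))
print(res$threshold)
print(res$binarized)
```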

Value

Returns an object of class BASCResult.

References

M. Hopfensitz, C. Müssel, C. Wawra, M. Maucher, M. Kuehl, H. Neumann, and H. A. Kestler. Multiscale Binarization of Gene Expression Data for Reconstructing Boolean Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 9(2):487-498, 2012.

Examples

par(mfrow=c(2,1))
result <- binarize.BASC(iris[,"Petal.Length"], method="A", tau=0.15)
print(result)
plot(result)

result <- binarize.BASC(iris[,"Petal.Length"], method="B", tau=0.15)
print(result)
plot(result)

k-means Binarization

Description

Binarizes a vector of real-valued data using the k-means clustering algorithm. The data is first split into 2 clusters. The values belonging to the cluster with the smaller centroid are set to 0, and the values belonging to the cluster with the greater centroid are set to 1.
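
The idea can be sketched with kmeans from the stats package. This is an illustration of the description above, not the source of binarize.kMeans; the helper name kmeansBinarize and the midpoint rule for the threshold are assumptions.

```r
# Sketch of two-cluster binarization; helper name and threshold rule are assumptions.
kmeansBinarize <- function(vect, nstart = 5, iter.max = 10) {
  km <- kmeans(vect, centers = 2, nstart = nstart, iter.max = iter.max)
  # The cluster with the greater centroid is mapped to 1, the other to 0
  high <- which.max(km$centers[, 1])
  binarized <- as.integer(km$cluster == high)
  # Assumption: threshold halfway between the two groups
  threshold <- (max(vect[binarized == 0]) + min(vect[binarized == 1])) / 2
  list(binarized = binarized, threshold = threshold)
}

set.seed(42)  # k-means uses random starting centers
res <- kmeansBinarize(c(1.0, 1.2, 0.9, 6.1, 5.8, 6.3))
print(res$binarized)
print(res$threshold)
```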

Usage

binarize.kMeans(vect, 
                nstart=1, 
                iter.max=10,
                dip.test=TRUE,
                na.rm=FALSE)

Arguments

`vect`	A real-valued vector to be binarized (at least 3 measurements).
`nstart`	The number of restarts for k-means. See `kmeans` for details.
`iter.max`	The maximum number of iterations for k-means. See `kmeans` for details.
`dip.test`	If set to `TRUE`, Hartigan's dip test for unimodality is performed on `vect`, and its p-value is returned in the `p.value` slot of the result. An insignificant test indicates that the data may not be binarizable.
`na.rm`	If set to `TRUE`, `NA` values are removed from the input. Otherwise, binarization will fail in the presence of `NA` values.

Value

Returns an object of class BinarizationResult.

Examples

result <- binarize.kMeans(iris[,"Petal.Length"])

print(result)
plot(result, twoDimensional=TRUE)

Utility function to binarize a matrix of measurements

Description

Binarizes a matrix of measurements all at once, and returns the binarized vectors as well as the binarization thresholds and the p-values.

Usage

binarizeMatrix(mat, 
               method = c("BASCA", "BASCB", "kMeans"), 
               adjustment = "none", 
               ...)

Arguments

`mat`	An n x m matrix comprising m raw measurements of n features.
`method`	The binarization algorithm to be used. `method="BASCA"` calls `binarize.BASC` with `method="A"`. `method="BASCB"` calls `binarize.BASC` with `method="B"`. `method="kMeans"` calls `binarize.kMeans`.
`adjustment`	Specifies an optional adjustment for multiple testing that is applied to the p-values (see `p.adjust` for possible values). By default, no adjustment is applied.
`...`	Further parameters that are passed to the respective binarization methods (`binarize.BASC` or `binarize.kMeans`).
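
The `adjustment` argument takes the method names understood by `p.adjust` from the stats package. The following base R lines show the effect of one such method on a hypothetical vector of per-feature p-values; the numbers are made up for illustration.

```r
# Hypothetical raw p-values, one per feature (row of mat)
pvals <- c(0.01, 0.02, 0.03)

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
adjusted <- p.adjust(pvals, method = "bonferroni")
print(adjusted)

# The available method names are listed in p.adjust.methods
print(p.adjust.methods)
```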

Value

An n x (m+2) matrix of binarized measurements. The first m columns contain the binarized measurements, column m+1 contains the binarization thresholds for the features, and column m+2 contains the p-values.
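
Given this layout, the binarized values, thresholds, and p-values can be separated by column index. The sketch below uses a small hand-built matrix in place of real output, so the values and the variable names are hypothetical.

```r
# Hypothetical stand-in for a binarizeMatrix result: n = 2 features, m = 3 measurements
bin <- rbind(c(0, 1, 1, 2.5, 0.001),
             c(0, 0, 1, 7.0, 0.040))
m <- ncol(bin) - 2

binarized  <- bin[, seq_len(m), drop = FALSE]  # first m columns: 0/1 values
thresholds <- bin[, m + 1]                     # column m+1: thresholds
pvalues    <- bin[, m + 2]                     # column m+2: p-values

print(binarized)
print(thresholds)
print(pvalues)
```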

Examples

bin <- binarizeMatrix(t(iris[,1:4]))
print(bin)

Visualization of binarization results.

Description

Visualizes a binarization as a ray or a two-dimensional plot.

Usage

## S4 method for signature 'BinarizationResult,ANY'
plot(x,
     twoDimensional=FALSE, 
     showLegend=TRUE, 
     showThreshold=TRUE, 
     ...)
## S4 method for signature 'numeric,BinarizationResult'
plot(x,
     y,
     showLegend=TRUE,
     showThreshold=TRUE,
     ...)

Arguments

`x`	If `y` is supplied, this is a vector of x coordinates for the binarization values in `y`, which are plotted on the y axis. If `y` is not supplied, this is an object of class `BinarizationResult` containing the binarized values to visualize.
`y`	If `x` is a vector of x coordinates, this is an object of class `BinarizationResult` containing the binarized values to visualize.
`twoDimensional`	Specifies whether the binarization is depicted as a ray or as a two-dimensional curve (see details).
`showLegend`	If set to `TRUE`, a legend is included in the plot.
`showThreshold`	If set to `TRUE`, the binarization threshold is depicted as a horizontal or vertical line (depending on `twoDimensional`).
`...`	Further graphical parameters to be passed to `plot`. The parameters `col` and `pch` can be supplied in different ways: If supplied as vectors of size 2, the first value corresponds to a 0 in the binarization, and the second value corresponds to a 1 in the binarization. `col` can also have length 3, in which case the third entry is the color of the threshold line. If `col` or `pch` have the size of the input vector, the corresponding colors and symbols are assigned to the data points.

Details

The function comprises two different plots: If twoDimensional = TRUE, the positions in the input vector are aligned with the x axis, and the y axis corresponds to the values. The binarization threshold is shown as a horizontal line, and the binarization is indicated by two different symbols.

If twoDimensional = FALSE, the binarized values are aligned with a one-dimensional ray, and the separating threshold is depicted as a vertical line.

Examples

# plot a binarization in one and two dimensions
res <- binarize.BASC(iris[,"Petal.Length"], method="A")
plot(res)
plot(res, twoDimensional = TRUE)
plot(res, twoDimensional = TRUE, 
     pch = c("x", "+"), 
     col = c("red", "black", "royalblue"), 
     lty = 4, lwd = 2)

Visualization of trinarization results.

Description

Visualizes a trinarization as a ray or a two-dimensional plot.

Usage

## S4 method for signature 'TrinarizationResult,ANY'
plot(x,
	twoDimensional=FALSE,
	showLegend=TRUE,
	showThreshold=TRUE,
	...)
## S4 method for signature 'numeric,TrinarizationResult'
plot(x,
	y,
	showLegend=TRUE,
	showThreshold=TRUE,
	...)

Arguments

`x`	If `y` is supplied, this is a vector of x coordinates for the trinarization values in `y`, which are plotted on the y axis. If `y` is not supplied, this is an object of class `TrinarizationResult` containing the trinarized values to visualize.
`y`	If `x` is a vector of x coordinates, this is an object of class `TrinarizationResult` containing the trinarized values to visualize.
`twoDimensional`	Specifies whether the trinarization is depicted as a ray or as a two-dimensional curve (see details).
`showLegend`	If set to `TRUE`, a legend is included in the plot.
`showThreshold`	If set to `TRUE`, the trinarization thresholds are depicted as horizontal or vertical lines (depending on `twoDimensional`).
`...`	Further graphical parameters to be passed to `plot`. The parameters `col` and `pch` can be supplied in different ways: If supplied as vectors of size 3, the first value corresponds to a 0 in the trinarization, the second value corresponds to a 1, and the third value corresponds to a 2. `col` can also have length 4, in which case the fourth entry is the color of the threshold line. If `col` or `pch` have the size of the input vector, the corresponding colors and symbols are assigned to the data points.

Details

The function comprises two different plots: If twoDimensional = TRUE, the positions in the input vector are aligned with the x axis, and the y axis corresponds to the values. The trinarization thresholds are shown as horizontal lines, and the trinarization is indicated by three different symbols.

If twoDimensional = FALSE, the trinarized values are aligned with a one-dimensional ray, and the separating thresholds are depicted as vertical lines.

Examples

# plot a trinarization in one and two dimensions
res <- TASC(iris[,"Petal.Length"])
plot(res)
plot(res, twoDimensional = TRUE)
plot(res, twoDimensional = TRUE, 
	 pch = c("x", "+"), 
	 col = c("red", "black", "royalblue", "green"), 
	 lty = 4, lwd = 2)

Plot all step functions for BASC or TASC

Description

A specialized visualization that plots all the optimal step functions computed by the BASC algorithms or TASC.

Usage

plotStepFunctions(x, 
                  showLegend=TRUE, 
                  connected=FALSE, 
                  withOriginal=TRUE, 
                  ...)

Arguments

`x`	A binarization (or trinarization) result object of class `BASCResult` (or `TASCResult`).
`showLegend`	If `TRUE`, a legend is included in the plot.
`connected`	If `TRUE`, the single steps of the step functions are connected by lines.
`withOriginal`	If set to `TRUE`, the original step function (i.e. the sorted input vector) is included in the plot.
`...`	Additional graphical parameters to be passed to `plot`.

Examples

result <- binarize.BASC(iris[,"Petal.Width"], 
                        method="B")
plotStepFunctions(result)

result <- TASC(iris[,"Petal.Width"])
plotStepFunctions(result)

Trinarization Across Multiple Scales

Description

Trinarizes real-valued data using the multiscale TASC method.

Usage

TASC(vect, 
	method = c("A","B"), 
	tau = 0.01, 
	numberOfSamples = 999, 
	sigma = seq(0.1, 20, by=.1),
	na.rm=FALSE,
	error = c("mean", "min"))

Arguments

`method`	Chooses the TASC method to use (see details), i.e. either "A" or "B".
`vect`	A real-valued vector of data to trinarize.
`tau`	This parameter adjusts the sensitivity and the specificity of the statistical testing procedure that rates the quality of the trinarization. Defaults to 0.01.
`numberOfSamples`	The number of samples for the bootstrap test. Defaults to 999.
`sigma`	If `method="B"`, this specifies a vector of different sigma values for the convolutions with the Bessel function. Ignored for `method="A"`.
`na.rm`	If set to `TRUE`, `NA` values are removed from the input. Otherwise, trinarization will fail in the presence of `NA` values.
`error`	Determines which error is used for the data points between the two thresholds: the "mean" error to the thresholds (default) or the "min" error.

Details

The two TASC methods can be subdivided into three steps:

Compute a series of step functions:: An initial step function is obtained by rearranging the original time series measurements in increasing order. Then, step functions with fewer discontinuities are calculated. TASC A calculates these step functions in such a way that each minimizes the Euclidean distance to the initial step function. TASC B obtains step functions from smoothened versions of the input function in a scale-space manner.
Find strongest discontinuities in each step function:: A strong discontinuity is a high jump size (derivative) in combination with a low approximation error. For TASC a pair of strongest discontinuities is determined.
Estimate location and variation of the strongest discontinuities:: Based on these estimates, data values can be excluded from further analyses.
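
The notion of a pair of strongest discontinuities can be illustrated with a deliberately simplified base R sketch. It is not the TASC algorithm: it only sorts the input and takes the two largest jumps, ignoring the reduced step functions, the approximation error, and the bootstrap test. The function name twoLargestJumps and the midpoint rule for the thresholds are assumptions made for this illustration.

```r
# Simplified illustration only; not the TASC algorithm.
twoLargestJumps <- function(vect) {
  sorted <- sort(vect)       # initial step function: values in increasing order
  heights <- diff(sorted)    # jump height of each discontinuity
  # Locations of the two largest jumps, ordered by position
  pos <- sort(order(heights, decreasing = TRUE)[1:2])
  # Assumption: each threshold lies halfway across its jump
  thresholds <- (sorted[pos] + sorted[pos + 1]) / 2
  list(thresholds = thresholds,
       trinarized = as.integer(vect > thresholds[1]) +
                    as.integer(vect > thresholds[2]))
}

res <- twoLargestJumps(c(0.1, 0.2, 5.0, 5.2, 10.1, 9.9))
print(res$thresholds)
print(res$trinarized)
```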

Value

Returns an object of class TASCResult.

Examples

par(mfrow=c(2,1))
result <- TASC(iris[,"Petal.Width"], method="A", tau=0.15)
print(result)
plot(result)

result <- TASC(iris[,"Petal.Width"], method="B", tau=0.15)
print(result)
plot(result)

Class "TASCResult"

Description

A specialized class storing the results of a call to TASC.

Objects of this class

Objects of this class shouldn't be created directly. They are created implicitly by a call to TASC.

Slots

p.value:: The p-value of the statistical test for reliability of the trinarization.
intermediateSteps:: A matrix specifying the optimal step functions from which the trinarization was calculated. The number of rows corresponds to the number of step functions, and the number of columns is determined by the length of the input vector minus 2 (that is, the length of the step function corresponding to the input vector). From the first to the last row, the number of steps increases. The non-zero entries of the matrix represent the locations of the steps. Step functions with fewer steps than the input step function have entries set to zero.
intermediateHeights1:: A matrix giving the jump heights of the steps supplied in intermediateSteps for the first threshold.
intermediateHeights2:: A matrix giving the jump heights of the steps supplied in intermediateSteps for the second threshold.
intermediateStrongestSteps:: A matrix with one row for each step function (row) in intermediateSteps. The entries specify the location of the two strongest steps for each of the functions.
originalMeasurements:: A numeric vector storing the input measurements.
trinarizedMeasurements:: An integer vector of trinarized values (0, 1 or 2) corresponding to the original measurements.
threshold1:: The threshold that separates 0 from 1.
threshold2:: The threshold that separates 1 from 2.
method:: A string describing the trinarization method that yielded the result.

Extends

Class "TrinarizationResult", directly.

Methods

plotStepFunctions: signature(x = "TASCResult"): Plot the intermediate optimal step functions used to determine the thresholds.
print: signature(x = "TASCResult"): Print a summary of the trinarization.
show: signature(object = "TASCResult"): ...

An artificial data set consisting of 100 artificial feature vectors.

Description

An artificial data set consisting of 100 artificial feature vectors that are used to illustrate the trinarization methods in the package vignette. Each row of the matrix trinarizationExample corresponds to one feature vector, of which 5 measurements are drawn from a normal distribution N(0,1). The remaining 10 measurements are drawn from two normal distributions N(m,1), with m=10:1 and m=seq(20,2,by=-2) (5 measurements per distribution).

Usage

data(trinarizationExample)

Format

The data is a matrix with 15 columns and 100 rows.

Class "TrinarizationResult"

Description

This is the base class for objects that store the results of a trinarization algorithm. It defines the slots and methods that the results of all algorithms share.

Objects of this class

Objects of this class shouldn't be created directly. They are created implicitly by a call to one of the trinarization algorithms.

Slots

originalMeasurements:: A numeric vector storing the input measurements.
trinarizedMeasurements:: An integer vector of trinarized values (0, 1 or 2) corresponding to the original measurements.
threshold1:: The threshold that separates 0 and 1.
threshold2:: The threshold that separates 1 and 2.
method:: A string describing the trinarization method that yielded the result.
p.value:: The p-value obtained by a test for validity of the trinarization (e.g. TASC bootstrap test). If no test was performed, this is NA.

Methods

plot: signature(x = "TrinarizationResult"): Plot the trinarization and the thresholds.
print: signature(x = "TrinarizationResult"): Print a summary of the trinarization.
show: signature(object = "TrinarizationResult"): ...

Utility function to trinarize a matrix of measurements

Description

Trinarizes a matrix of measurements all at once, and returns the trinarized vectors as well as the trinarization thresholds and the p-values.

Usage

trinarizeMatrix(mat, 
               method = c("TASCA", "TASCB"), 
               adjustment = "none", 
               ...)

Arguments

`mat`	An n x m matrix comprising m raw measurements of n features.
`method`	The trinarization algorithm to be used. `method="TASCA"` calls `TASC` with `method="A"`. `method="TASCB"` calls `TASC` with `method="B"`.
`adjustment`	Specifies an optional adjustment for multiple testing that is applied to the p-values (see `p.adjust` for possible values). By default, no adjustment is applied.
`...`	Further parameters that are passed to the respective trinarization methods (`TASC`).

Value

An n x (m+3) matrix of trinarized measurements. The first m columns contain the trinarized measurements, columns m+1 and m+2 contain the two trinarization thresholds for the features, and column m+3 contains the p-values.

Examples

tri <- trinarizeMatrix(t(iris[,1:4]))
print(tri)
