---
title: "Using sparta"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{using_sparta}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

A probability mass function can be represented by a multi-dimensional array. However, for high-dimensional distributions where each variable may have a large state space, computer memory quickly becomes a bottleneck. For example, an $80$-dimensional random vector in which each variable has $10$ levels leads to a state space with $10^{80}$ cells. Such a distribution cannot be stored in a computer; in fact, $10^{80}$ is one of the estimates of the number of atoms in the universe. However, if the array contains only a few non-zero values, we need only store these values along with information about their location, that is, a sparse representation of the table. **sparta** was created for efficient multiplication and marginalization of sparse tables.

# How to use sparta

```{r setup}
library(sparta)
```

Consider two arrays `f` and `g`:

```{r}
# Helper: build named dimnames, e.g. list(X = c("x1", "x2"), ...)
dn <- function(x) setNames(lapply(x, paste0, 1:2), toupper(x))
d  <- c(2, 2, 2)
# Values are filled in column-major order (first index varies fastest)
f  <- array(c(5, 4, 0, 7, 0, 9, 0, 0), d, dn(c("x", "y", "z")))
g  <- array(c(7, 6, 0, 6, 0, 0, 9, 0), d, dn(c("y", "z", "w")))
```

with flat layouts

```{r}
ftable(f, row.vars = "X")
ftable(g, row.vars = "W")
```

We can convert these to their equivalent **sparta** versions as

```{r}
sf <- as_sparta(f)
sg <- as_sparta(g)
```

Printing the object with the default printing method yields

```{r}
print.default(sf)
```

The columns are the cells in the sparse matrix, and the `vals` attribute holds the corresponding values, which can be extracted with the `vals` function. Furthermore, the domain resides in the `dim_names` attribute, which can be extracted using the `dim_names` function. From the output, we see that (`x2`, `y2`, `z1`) has a value of $7$.
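
To make the cell/value layout concrete, the same information can be recovered from the dense array with base R alone. This is only an illustrative sketch, not how **sparta** is implemented; the names `idx` and `cells` are introduced here for illustration, and the column order **sparta** uses internally may differ.

```{r}
# Rebuild f with explicit dimnames so the chunk is self-contained
f <- array(
  c(5, 4, 0, 7, 0, 9, 0, 0),
  dim = c(2, 2, 2),
  dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2"), Z = c("z1", "z2"))
)

# One row per non-zero cell; columns are the indices along X, Y and Z
idx <- which(f != 0, arr.ind = TRUE, useNames = FALSE)

# Transpose so that each column is a cell, mirroring the layout described above
cells <- t(idx)
rownames(cells) <- names(dimnames(f))

# The non-zero values, in the same order as the columns of `cells`
vals <- f[idx]

cells
vals

# Cross-check the dense lookup for the cell (x2, y2, z1)
f["x2", "y2", "z1"]
```
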
Using the **sparta** print method prettifies things:

```{r}
print(sf)
```

where row $i$ corresponds to column $i$ in the sparse matrix. The product of `sf` and `sg` is

```{r}
mfg <- mult(sf, sg); mfg
```

Converting `sf` into a conditional probability table (CPT) with conditioning variable `Z`:

```{r}
sf_cpt <- as_cpt(sf, y = "Z"); sf_cpt
```

Slicing `sf` on `X = x1` and dropping the `X` dimension

```{r}
slice(sf, s = c(X = "x1"), drop = TRUE)
```

reduces `sf` to a single non-zero element, whereas the equivalent dense case would result in a `(Y,Z)` table with one non-zero element and three zero elements. Marginalizing (or summing) out `Y` in `sg` yields

```{r}
marg(sg, y = c("Y"))
```

Finally, we mention that a sparse table can be created using the constructor `sparta_struct`, which may be necessary when the corresponding dense table is too large to fit in memory.

# Functionalities in sparta

| Function name                      | Description                                                        |
|:-----------------------------------|:-------------------------------------------------------------------|
| `as_sparta`                        | Convert an `array`-like object to a `sparta` object                |
| `as_array`, `as_df`, `as_cpt`      | Convert a `sparta` object to an `array/data.frame/CPT`             |
| `sparta_struct`                    | Constructor for `sparta` objects                                   |
| `mult, div, marg, slice`           | Multiply/divide/marginalize/slice                                  |
| `normalize`                        | Normalize (the values of the result sum to one)                    |
| `get_val`                          | Extract the value for a specific named cell                        |
| `get_cell_name`                    | Extract the named cell                                             |
| `get_values`                       | Extract the values                                                 |
| `dim_names`                        | Extract the domain                                                 |
| `names`                            | Extract the variable names                                         |
| `max/min`                          | The maximum/minimum value                                          |
| `which_max_cell`, `which_min_cell` | The column index referring to the max/min value                    |
| `which_max_idx`, `which_min_idx`   | The configuration corresponding to the max/min value               |
| `sum`                              | Sum the values                                                     |
| `equiv`                            | Test if two tables are identical up to permutations of the columns |
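
As a cross-check of what marginalization means for a sparse table, summing out `Y` from `g` can be sketched in base R by grouping the non-zero values on the remaining variables. This is a hedged illustration only and makes no claim about the algorithm **sparta** uses; `idx`, `key`, `sparse_marg` and `dense_marg` are names introduced here.

```{r}
# Rebuild g with explicit dimnames so the chunk is self-contained
g <- array(
  c(7, 6, 0, 6, 0, 0, 9, 0),
  dim = c(2, 2, 2),
  dimnames = list(Y = c("y1", "y2"), Z = c("z1", "z2"), W = c("w1", "w2"))
)

# Sparse view: indices and values of the non-zero cells only
idx <- which(g != 0, arr.ind = TRUE, useNames = FALSE)
colnames(idx) <- names(dimnames(g))
vals <- g[idx]

# Label each non-zero cell by its (Z, W) configuration, ignoring Y
key <- paste(
  dimnames(g)$Z[idx[, "Z"]],
  dimnames(g)$W[idx[, "W"]],
  sep = ","
)

# Summing out Y: add up the values that share a (Z, W) configuration
sparse_marg <- tapply(vals, key, sum)
sparse_marg

# Dense reference: sum over the first dimension (Y), keeping Z and W
dense_marg <- apply(g, c(2, 3), sum)
dense_marg
```

The sparse result only contains the `(Z, W)` configurations with non-zero mass, whereas the dense reference also stores the zero cell.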