---
title: "Using sparta"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{using_sparta}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
A probability mass function can be represented by a multi-dimensional array. However, for
high-dimensional distributions where each variable may have a large
state space, lack of computer memory can become a problem. For
example, an $80$-dimensional random vector in which each variable
has $10$ levels will lead to a state space with $10^{80}$ cells. Such
a distribution cannot be stored in a computer; in fact, $10^{80}$ is
one of the estimates of the number of atoms in the universe. However,
if the array consists of only a few non-zero values, we need only store
these values along with information about their location, that is, a
sparse representation of the table. The **sparta** package was created for
efficient multiplication and marginalization of sparse tables.
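To make the memory argument concrete, the following base R sketch estimates the size of a dense table of doubles and builds a toy sparse representation (non-zero cells plus their values) by hand. The names `dense_bytes`, `toy`, `cells` and `values` are illustrative and not part of **sparta**; the layout only mimics the idea of storing locations and values, not necessarily the package's internal format.

```{r}
# Bytes needed for a dense table of doubles (8 bytes per cell)
# with `n_vars` variables, each having `n_levels` levels.
dense_bytes <- function(n_vars, n_levels = 10) 8 * n_levels^n_vars

# Already 9 variables with 10 levels each require 8 GB.
dense_bytes(9) / 1e9

# A toy 2 x 2 x 2 table with four non-zero entries.
toy <- array(c(5, 4, 0, 7, 0, 9, 0, 0), c(2, 2, 2))

# Sparse representation: one column per non-zero cell, plus the values.
cells  <- t(which(toy != 0, arr.ind = TRUE))
values <- toy[toy != 0]
cells
values
```

Only four of the eight cells are stored here; the saving grows with the fraction of zero cells in the table.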
# How to use sparta
```{r setup}
library(sparta)
```
Consider two arrays `f` and `g`:
```{r}
dn <- function(x) setNames(lapply(x, paste0, 1:2), toupper(x))
d <- c(2, 2, 2)
f <- array(c(5, 4, 0, 7, 0, 9, 0, 0), d, dn(c("x", "y", "z")))
g <- array(c(7, 6, 0, 6, 0, 0, 9, 0), d, dn(c("y", "z", "w")))
```
with flat layouts
```{r}
ftable(f, row.vars = "X")
ftable(g, row.vars = "W")
```
We can convert these to their equivalent **sparta** versions as
```{r}
sf <- as_sparta(f); sg <- as_sparta(g)
```
Printing the object with the default print method yields
```{r}
print.default(sf)
```
The columns of the sparse matrix are the non-zero cells, and the `vals` attribute holds the corresponding values, which can be extracted with the `vals` function. Furthermore, the domain resides in the `dim_names` attribute, which can also be extracted using the `dim_names` function. From the output, we see that (`x2`, `y2`, `z1`) has a value of $7$. Using the **sparta** print method prettifies things:
```{r}
print(sf)
```
where row $i$ corresponds to column $i$ in the sparse matrix. The product of `sf` and `sg` is obtained with `mult`:
```{r}
mfg <- mult(sf, sg); mfg
```
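As a cross-check on what the product represents, the same multiplication can be sketched on the dense arrays in base R, assuming the product is the usual pointwise one over the union of the domains, $f(x, y, z) \cdot g(y, z, w)$. The names `f_dense`, `g_dense` and `fg_dense` are illustrative only, and this is not how **sparta** computes the result internally.

```{r}
dn <- function(x) setNames(lapply(x, paste0, 1:2), toupper(x))
f_dense <- array(c(5, 4, 0, 7, 0, 9, 0, 0), c(2, 2, 2), dn(c("x", "y", "z")))
g_dense <- array(c(7, 6, 0, 6, 0, 0, 9, 0), c(2, 2, 2), dn(c("y", "z", "w")))

# Dense product over (X, Y, Z, W): for each level of W, repeat the (Y, Z)
# slice of g once per level of X so that it lines up with the cells of f.
fg_dense <- array(0, c(2, 2, 2, 2), dn(c("x", "y", "z", "w")))
for (w in 1:2) {
  fg_dense[, , , w] <- f_dense * rep(g_dense[, , w], each = 2)
}

# Only 4 of the 16 cells are non-zero.
sum(fg_dense != 0)
fg_dense["x2", "y1", "z2", "w2"]
```

The dense result has 16 cells, of which 12 are zero; a sparse representation only needs to carry the remaining four.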
Converting `sf` into a conditional probability table (CPT) with conditioning variable `Z`:
```{r}
sf_cpt <- as_cpt(sf, y = "Z"); sf_cpt
```
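For intuition, a dense analogue of this conversion can be sketched with `prop.table` from base R, under the assumption that conditioning on `Z` means normalizing the values within each level of `Z` so that they sum to one. The name `cpt_dense` is illustrative and not part of **sparta**.

```{r}
dn <- function(x) setNames(lapply(x, paste0, 1:2), toupper(x))
f_dense <- array(c(5, 4, 0, 7, 0, 9, 0, 0), c(2, 2, 2), dn(c("x", "y", "z")))

# Normalize within each level of Z (the third dimension).
cpt_dense <- prop.table(f_dense, margin = 3)

# Each Z-slice now sums to one.
apply(cpt_dense, 3, sum)
```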
Slicing `sf` on `X = x1` and dropping the `X` dimension
```{r}
slice(sf, s = c(X = "x1"), drop = TRUE)
```
reduces `sf` to a single non-zero element, whereas the equivalent dense case would result in a `(Y,Z)` table with one non-zero element and three zero-elements.
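The dense counterpart of this slice can be sketched with ordinary array indexing in base R; `f_dense` and `slice_dense` are illustrative names, not **sparta** objects.

```{r}
dn <- function(x) setNames(lapply(x, paste0, 1:2), toupper(x))
f_dense <- array(c(5, 4, 0, 7, 0, 9, 0, 0), c(2, 2, 2), dn(c("x", "y", "z")))

# Fix X = x1; indexing drops the X dimension and leaves a (Y, Z) table.
slice_dense <- f_dense["x1", , ]
slice_dense

# One non-zero cell and three zero cells.
sum(slice_dense != 0)
```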
Marginalizing (or summing) out `Y` in `sg` yields
```{r}
marg(sg, y = c("Y"))
```
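The same marginalization can be sketched on the dense array with `apply` from base R, which sums over every dimension not listed in `MARGIN`. The names `g_dense` and `marg_dense` are illustrative only.

```{r}
dn <- function(x) setNames(lapply(x, paste0, 1:2), toupper(x))
g_dense <- array(c(7, 6, 0, 6, 0, 0, 9, 0), c(2, 2, 2), dn(c("y", "z", "w")))

# Keep Z and W; sum over the remaining dimension Y.
marg_dense <- apply(g_dense, c("Z", "W"), sum)
marg_dense
```

The dense result keeps the zero cell (`z1`, `w2`), which a sparse table does not need to store.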
Finally, a sparse table can also be created directly with the constructor `sparta_struct`, which is necessary when the corresponding dense table is too large to hold in memory.
# Functionalities in sparta
| Function name                   | Description |
|:--------------------------------|:-------------------------------------------------------------------|
| `as_sparta`                     | Convert an `array`-like object to a `sparta` object |
| `as_array/as_df/as_cpt`         | Convert a `sparta` object to an `array/data.frame/CPT` |
| `sparta_struct`                 | Constructor for `sparta` objects |
| `mult, div, marg, slice`        | Multiply/divide/marginalize/slice |
| `normalize`                     | Normalize (the values of the result sum to one) |
| `get_val`                       | Extract the value for a specific named cell |
| `get_cell_name`                 | Extract the named cell |
| `get_values`                    | Extract the values |
| `dim_names`                     | Extract the domain |
| `names`                         | Extract the variable names |
| `max/min`                       | The maximum/minimum value |
| `which_max_cell/which_min_cell` | The column index referring to the max/min value |
| `which_max_idx/which_min_idx`   | The configuration corresponding to the max/min value |
| `sum`                           | Sum the values |
| `equiv`                         | Test if two tables are identical up to permutations of the columns |