Title: | Efficient Stepwise Selection in Decomposable Models |
---|---|
Description: | An implementation of the ESS algorithm following Amol Deshpande, Minos Garofalakis, Michael I Jordan (2013) <arXiv:1301.2267>. The ESS algorithm is used for model selection in decomposable graphical models. |
Authors: | Mads Lindskou [aut, cre] |
Maintainer: | Mads Lindskou <[email protected]> |
License: | GPL-3 |
Version: | 1.1.2 |
Built: | 2024-11-07 03:41:26 UTC |
Source: | https://github.com/mlindsk/ess |
The class of graphical models is a family of probability distributions for which conditional dependencies can be read off from a graph. If the graph is decomposable, the maximum likelihood estimates of the parameters in the model can be shown to be on exact form. This is what enables ESS to be fast and efficient for model selection in decomposable graphical models.
Maintainer: Mads Lindskou [email protected]
Useful links:
Extracts the adjacency list of a gengraph
adj_lst(x) ## S3 method for class 'gengraph' adj_lst(x)
adj_lst(x) ## S3 method for class 'gengraph' adj_lst(x)
x |
|
An adjacency list
Extracts the adjacency matrix of a gengraph
object
adj_mat(x) ## S3 method for class 'gengraph' adj_mat(x)
adj_mat(x) ## S3 method for class 'gengraph' adj_mat(x)
x |
|
An adjacency matrix
Converts an adjacency matrix to an adjacency list
as_adj_lst(A)
as_adj_lst(A)
A |
Adjacency matrix |
Converts an adjacency list to an adjacency matrix
as_adj_mat(adj)
as_adj_mat(adj)
adj |
Adjacency list |
An adjacency matrix
adj <- list(a = c("b", "d"), b = c("a", "c", "d"), c = c("b", "d"), d = c("a", "c", "b")) as_adj_mat(adj)
adj <- list(a = c("b", "d"), b = c("a", "c", "d"), c = c("b", "d"), d = c("a", "c", "b")) as_adj_mat(adj)
Convert a gengraph
object to an igraph
object
as_igraph(x) ## S3 method for class 'gengraph' as_igraph(x)
as_igraph(x) ## S3 method for class 'gengraph' as_igraph(x)
x |
|
An igraph
object
Finds the components of a graph
components(adj)
components(adj)
adj |
Adjacency list or |
A list where the elements are the components of the graph
This data set contains 358 observations (we have removed 8 with missing values). It contains 12 clinical attributes and 21 histopathological attributes. The age attribute has been discretized. The class variable "ES" has six levels; each describing a skin disease.
derma
derma
An object of class tbl_df
(inherits from tbl
, data.frame
) with 358 rows and 35 columns.
Finds the elements in the component of root
dfs(adj, root)
dfs(adj, root)
adj |
A named adjacency list of a decomposable grah |
root |
The node from which the component should be found |
All nodes connected to root
x <- list(a = c("b", "d"), b = c("a", "d"), c = c("b", "a"), d = c("e", "f"), e = c("d", "f"), f = c("d", "e")) dfs(x, "a")
x <- list(a = c("b", "d"), b = c("a", "d"), c = c("b", "a"), d = c("e", "f"), e = c("d", "f"), f = c("d", "e")) dfs(x, "a")
Simulate observations from a decomposable graphical model
dgm_sim_from_graph(g, lvls, nsim = 1000, cell_rate = 0.5)
dgm_sim_from_graph(g, lvls, nsim = 1000, cell_rate = 0.5)
g |
An adjacency list |
lvls |
Named list with levels of the discrete variables |
nsim |
Number of simulations |
cell_rate |
Control discrete cell probabilities |
This function returns a matrix of dimension where each row correspond
to a simulated observation from a DGM represented by g
.
g = list( A = c("B", "X", "Y"), B = c("A", "Y"), X = c("A", "Y"), Y = c("A", "X", "B") ) lvls <- list( A = c("0", "1"), B = c("0", "1"), X = c("a", "b", "c"), Y = c("0", "1", "2") ) dgm_sim_from_graph(g, lvls, nsim = 10) #'
g = list( A = c("B", "X", "Y"), B = c("A", "Y"), X = c("A", "Y"), Y = c("A", "X", "B") ) lvls <- list( A = c("0", "1"), B = c("0", "1"), X = c("a", "b", "c"), Y = c("0", "1", "2") ) dgm_sim_from_graph(g, lvls, nsim = 10) #'
Calculates the joint entropy over discrete variables in df
entropy(df, thres = 5, npc = new.env())
entropy(df, thres = 5, npc = new.env())
df |
data.frame |
thres |
A threshold mechanism for choosing between two different ways of calculating the entropy. Can Speed up the procedure with the "correct" value. |
npc |
An environment. If supplied, the number of positive cells
in the underlying pmf will be stored in the environment with the name
|
A number representing the entropy of the variables in df
.
entropy(derma[1:100, 1:3])
entropy(derma[1:100, 1:3])
Structure learning in decomposable graphical models on several components
fit_components( df, comp, type = "fwd", q = 0.5, trace = FALSE, thres = 5, wrap = TRUE )
fit_components( df, comp, type = "fwd", q = 0.5, trace = FALSE, thres = 5, wrap = TRUE )
df |
Character data.frame |
comp |
A list with character vectors. Each element in the list is a component in the graph (using expert knowledge) |
type |
Character ("fwd", "bwd", "tree" or "tfwd") |
q |
Penalty term in the stopping criterion
where |
trace |
Logical indicating whether or not to trace the procedure |
thres |
A threshold mechanism for choosing between two different ways of calculating the entropy. |
wrap |
logical specifying if the result of a run with type = "tree" should be converted to a "fwd" object |
An adjacency list object
fit_graph
, adj_lst.gengraph
,
adj_mat.gengraph
, walk.fwd
,
walk.bwd
, gengraph
A generic method for structure learning in decomposable graphical models
fit_graph( df, type = "fwd", q = 0.5, trace = FALSE, sparse_qic = FALSE, thres = 5, wrap = TRUE )
fit_graph( df, type = "fwd", q = 0.5, trace = FALSE, sparse_qic = FALSE, thres = 5, wrap = TRUE )
df |
Character data.frame |
type |
Character ("fwd", "bwd", "tree" or "tfwd") |
q |
Penalty term in the stopping criterion
where |
trace |
Logical indicating whether or not to trace the procedure |
sparse_qic |
Logical. If |
thres |
A threshold mechanism for choosing between two different ways of calculating the entropy. |
wrap |
logical specifying if the result of a run with type = "tree" should be converted to a "fwd" object |
The types are
"fwd": forward selection
"bwd": backward selection
"tree": Chow-Liu tree (first order interactions only)
"tfwd": A combination of "tree" and "fwd". This can speed up runtime considerably in high dimensions.
Using adj_lst
on an object returned by fit_graph
gives the
adjacency list corresponding to the graph. Similarly one can use adj_mat
to obtain an adjacency matrix. Applying the rip
function on an
adjacency list returns the cliques and separators of the graph.
A gengraph
object representing a decomposable graph.
https://arxiv.org/abs/1301.2267, doi:10.1109/ictai.2004.100
adj_lst
, adj_mat
,
as_igraph
, gengraph
g <- fit_graph(derma) print(g) plot(g) # Adjacency matrix and adjacency list adjm <- adj_mat(g) adjl <- adj_lst(g)
g <- fit_graph(derma) print(g) plot(g) # Adjacency matrix and adjacency list adjm <- adj_mat(g) adjl <- adj_lst(g)
A generic structure for decomposable graphical models
gengraph(df, type = "fwd", q = 0.5, sparse_qic = TRUE)
gengraph(df, type = "fwd", q = 0.5, sparse_qic = TRUE)
df |
Character data.frame |
type |
Character ("fwd", "bwd", "tree" or "tfwd") |
q |
Penalty term in the stopping criterion
where |
sparse_qic |
Logical. If |
A gengraph
object with child class type
used for model selection.
adj_lst.gengraph
, adj_mat.gengraph
, fit_graph
, walk.fwd
, walk.bwd
gengraph(derma, type = "fwd") gengraph(derma, type = "bwd")
gengraph(derma, type = "fwd") gengraph(derma, type = "bwd")
This function returns TRUE
if the graph is decomposable and FALSE
otherwise
is_decomposable(adj)
is_decomposable(adj)
adj |
Adjacency list of an undirected graph |
Logial describing whether or not adj
is decomposable
# 4-cycle: adj <- list(a = c("b", "d"), b = c("a", "c"), c = c("b", "d"), d = c("a", "c")) is_decomposable(adj) # FALSE # Two triangles: adj2 <- list(a = c("b", "d"), b = c("a", "c", "d"), c = c("b", "d"), d = c("a", "c", "b")) is_decomposable(adj2) # TRUE
# 4-cycle: adj <- list(a = c("b", "d"), b = c("a", "c"), c = c("b", "d"), d = c("a", "c")) is_decomposable(adj) # FALSE # Two triangles: adj2 <- list(a = c("b", "d"), b = c("a", "c", "d"), c = c("b", "d"), d = c("a", "c", "b")) is_decomposable(adj2) # TRUE
A helper function to make an adjacency list corresponding to a complete graph
make_complete_graph(nodes)
make_complete_graph(nodes)
nodes |
A character vector containing the nodes to be used in the graph |
An adjacency list of a complete graph
d <- derma[, 5:8] cg <- make_complete_graph(colnames(d))
d <- derma[, 5:8] cg <- make_complete_graph(colnames(d))
A helper function to make an adjacency list corresponding to a null graph (no edges)
make_null_graph(nodes)
make_null_graph(nodes)
nodes |
A character vector containing the nodes to be used in the graph |
An adjacency list the null graph with no edges
d <- derma[, 5:8] ng <- make_null_graph(colnames(d))
d <- derma[, 5:8] ng <- make_null_graph(colnames(d))
Maximum Cardinality Search
mcs(adj, check = TRUE)
mcs(adj, check = TRUE)
adj |
A named adjacency list of a decomposable graph |
check |
Boolean: check if adj is decomposable |
If adj is not the adjacency list of a decomposable graph an error is raised
A list with a perfect numbering of the nodes and a perfect sequence of sets
x <- list(a = c("b", "d"), b = c("a", "c", "d"), c = c("b", "d"), d = c("a", "c", "b")) mcs(x)
x <- list(a = c("b", "d"), b = c("a", "c", "d"), c = c("b", "d"), d = c("a", "c", "b")) mcs(x)
A wrapper around igraphs plot method for gengraph
objects
## S3 method for class 'gengraph' plot(x, vc = NULL, ...)
## S3 method for class 'gengraph' plot(x, vc = NULL, ...)
x |
A |
vc |
Named character vector; the names are the vertices and the elements are the colors of the nodes |
... |
Extra arguments. See the igraph package |
No return value, called for side effects
d <- derma[, 10:25] g <- fit_graph(d) vs <- colnames(d) vcol <- structure(vector("character", length(vs)), names = vs) vcol[1:4] <- "lightsteelblue2" vcol[5:7] <- "orange" vcol[8:16] <- "pink" plot(g, vcol)
d <- derma[, 10:25] g <- fit_graph(d) vs <- colnames(d) vcol <- structure(vector("character", length(vs)), names = vs) vcol[1:4] <- "lightsteelblue2" vcol[5:7] <- "orange" vcol[8:16] <- "pink" plot(g, vcol)
A print method for gengraph
objects
## S3 method for class 'gengraph' print(x, ...)
## S3 method for class 'gengraph' print(x, ...)
x |
A |
... |
Not used (for S3 compatability) |
A print method for tree
objects
## S3 method for class 'tree' print(x, ...)
## S3 method for class 'tree' print(x, ...)
x |
A |
... |
Not used (for S3 compatability) |
Given a decomposable graph, this functions finds a perfect numbering on the vertices using maximum cardinality search, and hereafter returns a list with two elements: "C" - A RIP-ordering of the cliques and "S" - A RIP ordering of the separators.
rip(adj, check = TRUE)
rip(adj, check = TRUE)
adj |
A named adjacency list of a decomposable graph |
check |
Boolean: check if adj is decomposable |
A list with cliques and separators of adj
x <- list(a = c("b", "d"), b = c("a", "c", "d"), c = c("b", "d"), d = c("a", "c", "b")) y <- rip(x) # Cliques: y$C # Separators: y$S
x <- list(a = c("b", "d"), b = c("a", "c", "d"), c = c("b", "d"), d = c("a", "c", "b")) y <- rip(x) # Cliques: y$C # Separators: y$S
Construct a subgraph with a given set of nodes removed
subgraph(x, g)
subgraph(x, g)
x |
Character vector of nodes |
g |
Adjacency list (named) or a adjacency matrix with dimnames given as the nodes |
An adjacency list or adjacency matrix.
adj <- list(a = c("b", "d"), b = c("a", "c", "d"), c = c("b", "d"), d = c("a", "c", "b")) d <- data.frame(a = "", b = "", c ="", d = "") # Toy data so we can plot the graph subgraph(c("c", "b"), adj) subgraph(c("b", "d"), as_adj_mat(adj))
adj <- list(a = c("b", "d"), b = c("a", "c", "d"), c = c("b", "d"), d = c("a", "c", "b")) d <- data.frame(a = "", b = "", c ="", d = "") # Toy data so we can plot the graph subgraph(c("c", "b"), adj) subgraph(c("b", "d"), as_adj_mat(adj))
Stepwise model selection in decomposable graphical models
walk(x, df, q, thres)
walk(x, df, q, thres)
x |
|
df |
data.frame |
q |
Penalty term in the stopping criterion ( |
thres |
A threshold mechanism for choosing between two different ways of calculating the entropy. Can Speed up the procedure with the "correct" value. |
A fwd
(or bwd
) object can be created using the gengraph
constructor with type = "fwd"
.
A fwd
or bwd
object with one additional edge than the input object.
d <- derma[, 10:25] g <- gengraph(d, type = "fwd") s <- walk(g, d) print(s) plot(s) adj_lst(s) adj_mat(s)
d <- derma[, 10:25] g <- gengraph(d, type = "fwd") s <- walk(g, d) print(s) plot(s) adj_lst(s) adj_mat(s)
Stepwise backward selection in decomposable graphical models
## S3 method for class 'bwd' walk(x, df, q = 0.5, thres = 5)
## S3 method for class 'bwd' walk(x, df, q = 0.5, thres = 5)
x |
|
df |
data.frame |
q |
Penalty term in the stopping criterion ( |
thres |
A threshold mechanism for choosing between two different ways of calculating the entropy. Can Speed up the procedure with the "correct" value. |
A bwd
object can be created using the gengraph
constructor with type = "bwd"
A bwd
object; a subclass of gengraph
) used for backward selection.
d <- derma[, 10:25] g <- gengraph(d, type = "bwd") s <- walk(g, d) print(s) plot(s) adj_lst(s) adj_mat(s)
d <- derma[, 10:25] g <- gengraph(d, type = "bwd") s <- walk(g, d) print(s) plot(s) adj_lst(s) adj_mat(s)
Stepwise efficient forward selection in decomposable graphical models
## S3 method for class 'fwd' walk(x, df, q = 0.5, thres = 5)
## S3 method for class 'fwd' walk(x, df, q = 0.5, thres = 5)
x |
A |
df |
data.frame |
q |
Penalty term in the stopping criterion ( |
thres |
A threshold mechanism for choosing between two different ways of calculating the entropy. Can Speed up the procedure with the "correct" value. |
A fwd
object can be created using the gengraph
constructor with type = "fwd"
A fwd
object; a subclass of gengraph
) used for forward selection.
https://arxiv.org/abs/1301.2267, doi:10.1109/ictai.2004.100
d <- derma[, 10:25] g <- gengraph(d, type = "fwd") s <- walk(g, d) print(s) plot(s) adj_lst(s) adj_mat(s)
d <- derma[, 10:25] g <- gengraph(d, type = "fwd") s <- walk(g, d) print(s) plot(s) adj_lst(s) adj_mat(s)