Title: | Bayesian Analysis Framework for MVN (Mixture) Distribution |
---|---|
Description: | Tools of Bayesian analysis framework using the method suggested by Berger (1985) <doi:10.1007/978-1-4757-4286-2> for multivariate normal (MVN) distribution and multivariate normal mixture (MixMVN) distribution: a) calculating Bayesian posteriori of (Mix)MVN distribution; b) generating random vectors of (Mix)MVN distribution; c) Markov chain Monte Carlo (MCMC) for (Mix)MVN distribution. |
Authors: | ZHANG Chen |
Maintainer: | ZHANG Chen <[email protected]> |
License: | GPL-2 |
Version: | 0.0.8-11 |
Built: | 2024-11-06 03:11:03 UTC |
Source: | https://github.com/cubiczebra/mvnbayesian |
Tools of Bayesian analysis framework using the method suggested by Berger (1985) <doi:10.1007/978-1-4757-4286-2> for multivariate normal (MVN) distribution and multivariate normal mixture (MixMVN) distribution: a) calculating Bayesian posteriori of (Mix)MVN distribution; b) generating random vectors of (Mix)MVN distribution; c) Markov chain Monte Carlo (MCMC) for (Mix)MVN distribution.
This package is aimed to build a easy approach for MVN (mixture) distribution in Bayesian analysis framework. Bayesian posteriori MVN (mixture) distribution can be calculated in conditions of given priori MVN (mixture) informations. The conjugated property of MVN distribution makes it effective in parameter estimation using Bayesian iterator. Joint and marginal probability densities of a certain MVN (mixture) can be achieved through random vector generator, using Gibbs sampling. Conditional probability densities from a certain MVN (mixture) can be simulated using MCMC method.
ZHANG Chen
Maintainer: ZHANG Chen <[email protected]>
"Statistical Inference" by George Casella. Roger L. Berger;
"Statistical Decision Theory and Bayesian Analysis" by James O. Berger;
"Matrix Computation" by Gee H. Golub. Charles F. Van Loan;
"Bayesian Statistics" by WEI Laisheng;
"Machine Learning" by NAKAGAWA Hiroshi.
library(Rfast) library(mvtnorm) library(plyr) head(dataset1) BP <- MVN_BayesianPosteriori(dataset1) BP BP_Gibbs <- MVN_GibbsSampler(5000, BP) colMeans(BP_Gibbs) colrange(BP_Gibbs) result <- MVN_MCMC(BP, 5000, c(1), c(77.03)) result$Accept
library(Rfast) library(mvtnorm) library(plyr) head(dataset1) BP <- MVN_BayesianPosteriori(dataset1) BP BP_Gibbs <- MVN_GibbsSampler(5000, BP) colMeans(BP_Gibbs) colrange(BP_Gibbs) result <- MVN_MCMC(BP, 5000, c(1), c(77.03)) result$Accept
Renumbering vector by elemental frequency in ascending order.
# Tidy vector by elemental frequency: Ascending_Num(data)
# Tidy vector by elemental frequency: Ascending_Num(data)
data |
An 1d-vector. |
return a renumbered vector by elemental frequency. Factors will be positive integers arrayed in ascending order.
library(plyr) x <- c(1,2,2,2,2,2,2,2,3,3,3,1,3,3,3) x Ascending_Num(x)
library(plyr) x <- c(1,2,2,2,2,2,2,2,3,3,3,1,3,3,3) x Ascending_Num(x)
Dataset built for MVN mixture test, which contains 3 variables and 25 observations.
data("dataset1")
data("dataset1")
A data frame with 25 observations on 3 independent variables, named as fac1
, fac2
and fac3
.
fac1
The 1st factor.
fac2
The 2nd factor.
fac3
The 3rd factor.
dataset1
dataset1
Dataset built for MVN mixture test, which contains 4 variables (the first 4 columns), clustering (the last column) and 96 observations.
data("dataset2")
data("dataset2")
A data frame with 96 pseudo-observations generated by random number generator. All observations come from 3 different centers which have been marked in the last column "species". More specifically, data of species=1 comes from the center (1,1,1,1); data of species=2 comes from the center (2,2,2,0); data of species=3 comes from the center (1,0,2,2).
dimen1
the 1st variable
dimen2
the 2nd variable
dimen3
the 3rd variable
dimen4
the 4th variable
species
clustering label
dataset2
dataset2
Interchange all elements between two specified rows and columns in a matrix.
# A matrix-like data MatrixAlternative(data, sub, rep)
# A matrix-like data MatrixAlternative(data, sub, rep)
data |
A matrix to be processed. |
sub |
A positive integer. The first selected dimension. |
rep |
A positive integer. The second selected dimension. Default value is 1. |
return a matrix with interchanged rows and columns in two specified dimensions.
library(plyr) M <- matrix(1:9,3,3,1) M MatrixAlternative(M, 2)
library(plyr) M <- matrix(1:9,3,3,1) M MatrixAlternative(M, 2)
Given a design matrix (data) including sufficient samplings from different multivariate normal distribution, export the parameters of Bayesian posterior multivariate normal mixture distribution. Parameters contains mixture probability, mean vector and covariance matrix, for each cluster.
# paramtric columns-only as input data: # data <- dataset2[,1:4] # Specify species to get parameters of mixture MVN: MixMVN_BayesianPosterior(data, species, idx)
# paramtric columns-only as input data: # data <- dataset2[,1:4] # Specify species to get parameters of mixture MVN: MixMVN_BayesianPosterior(data, species, idx)
data |
Design matrix: data.frame or matrix-like data sampling from different multivariate normal distribution, |
species |
Number of clusters: number of clusters for import data. It will be only called once by the next argument |
idx |
port for clustering result: an vector which should have the identical dimension as the number of samplings of data. Default value is generated by kmeans algrithm. |
return a matrix-like result with contains all parameters of Bayesian posterior multivariate normal mixture distribution: All clusters are arrayed in rows and mixture probability, posterior mean and posterior covariance for each cluster are arrayed in columns.
ZHANG Chen
kmeans, MVN_BayesianPosterior
library(plyr) # Design matrix should only contain columns of variables # Export will be a matrix-like data # Using kmeans (default) clustering algrithm data_dim <- dataset2[,1:4] result <- MixMVN_BayesianPosterior(data=data_dim, species=3) result # class(result) # Get the parameters of the 1st cluster: result[1,] # class(result[1,]) # Get the mixture probability of cluster2: result[2,1][[1]] # class(result[2,1][[1]]) # class(result[2,]$probability) # class(result[2,1]) # Get the mean vector of cluster1: result[1,2][[1]] # class(result[1,2][[1]]) # class(result[1,]$mean) # class(result[1,2]) # Get the covariance matrix of cluster3: result[3,3][[1]] # class(result[3,3][[1]]) # class(result[3,]$var) # class(result[3,3])
library(plyr) # Design matrix should only contain columns of variables # Export will be a matrix-like data # Using kmeans (default) clustering algrithm data_dim <- dataset2[,1:4] result <- MixMVN_BayesianPosterior(data=data_dim, species=3) result # class(result) # Get the parameters of the 1st cluster: result[1,] # class(result[1,]) # Get the mixture probability of cluster2: result[2,1][[1]] # class(result[2,1][[1]]) # class(result[2,]$probability) # class(result[2,1]) # Get the mean vector of cluster1: result[1,2][[1]] # class(result[1,2][[1]]) # class(result[1,]$mean) # class(result[1,2]) # Get the covariance matrix of cluster3: result[3,3][[1]] # class(result[3,3][[1]]) # class(result[3,]$var) # class(result[3,3])
The function to export the mixture probabilities, the mean vectors and covariance matrices of Bayesian posteriori MVN mixture distribution in the basis of given priori information (priori MVN mixture) and observation data (a design matrix containing all variables).
# paramtric columns-only as input data: # data <- dataset2[,1:4] # Specify species to get parameters of MVN mixture model: MixMVN_BayesianPosteriori(data, species, idx)
# paramtric columns-only as input data: # data <- dataset2[,1:4] # Specify species to get parameters of MVN mixture model: MixMVN_BayesianPosteriori(data, species, idx)
data |
A data.frame or matrix-like data: obervations should be arrayed in rows while variables should be arrayed in columns. |
species |
A positive integer. The number of clusters for import data. It will be only called once by the next argument |
idx |
A vector-like data to import for accepting clustering result. Default value is generated by |
return a matrix-like result containing all parameters of Bayesian posteriori MVN mixture distribution: Clusters are arrayed in rows, while the mixture probabilities, posteriori mean vectors and posteriori covariance matrices are arrayed in columns.
kmeans
, MVN_BayesianPosteriori
library(plyr) # Design matrix should only contain columns of variables # Export will be a matrix-like data # Using kmeans (default) clustering algrithm data_dim <- dataset2[,1:4] result <- MixMVN_BayesianPosteriori(data=data_dim, species=3) result # Get the parameters of the cluster1: result[1,] # Get the mixture probability of cluster2: # (Attention to the difference between # result[2,1][[1]] and result[2,1]) result[2,1][[1]] # Get the mean vector of cluster1: result[1,2][[1]] # Get the covariance matrix of cluster3: result[3,3][[1]]
library(plyr) # Design matrix should only contain columns of variables # Export will be a matrix-like data # Using kmeans (default) clustering algrithm data_dim <- dataset2[,1:4] result <- MixMVN_BayesianPosteriori(data=data_dim, species=3) result # Get the parameters of the cluster1: result[1,] # Get the mixture probability of cluster2: # (Attention to the difference between # result[2,1][[1]] and result[2,1]) result[2,1][[1]] # Get the mean vector of cluster1: result[1,2][[1]] # Get the covariance matrix of cluster3: result[3,3][[1]]
Generating random vectors on the basis of a given MVN mixture distribution, through Gibbs sampling algorithm or matrix factorization.
# Bayesian posteriori MVN mixture model as input data: # data <- MixMVN_BayesianPosteriori(dataset2[,1:4], species=3) # Generate random vectors based on Bayesian posteriori MVN mixture: MixMVN_GibbsSampler(n, data, random_method = c("Gibbs", "Fast"), reject_rate=0, ...)
# Bayesian posteriori MVN mixture model as input data: # data <- MixMVN_BayesianPosteriori(dataset2[,1:4], species=3) # Generate random vectors based on Bayesian posteriori MVN mixture: MixMVN_GibbsSampler(n, data, random_method = c("Gibbs", "Fast"), reject_rate=0, ...)
n |
A positive integer. The numbers of random vectors to be generated. |
data |
A matrix-like data which contains the mixture probability, mean vector and covariance matrix for each cluster in each row. |
random_method |
The method to generate random vectors. Options are |
reject_rate |
A numeric value which will be efficient if the |
... |
Other arguments to control the process in Gibbs sampling if the |
It is recommanded using the random method of "Fast" due to the high efficiency. The time complexity of "Gibbs" method is O(k*n) where the k means dimensionality of MVN mixture model and n means generated numbers of random vectors; while that of the "Fast" method is only O(n), without considering the effect of burn-in period. this discrepancy will be even further significant when we use MCMC methods to do some further analysis in which random vectors will be generated every time when we set conditions.
return a series random vectors in the basis of given MVN mixture distribution.
Ascending_Num
, MixMVN_BayesianPosteriori
, MVN_BayesianPosteriori
library(plyr) library(mvtnorm) library(stats) # Use dataset2 for demonstration. Get parameters of Bayesian # posteriori multivariate normal mixture distribution head(dataset2) dataset2_par <- dataset2[,1:4] # only parameter columns are premitted MixBPos <- MixMVN_BayesianPosteriori(dataset2_par, species=3) MixBPos # Generate random vectors using Gibbs sampling: MixBPos_Gibbs <- MixMVN_GibbsSampler(5000, MixBPos, random_method = "Gibbs") head(MixBPos_Gibbs) # Compared generation speed of "Gibbs" to that of "Fast" MixBPos_Fast <- MixMVN_GibbsSampler(5000, MixBPos, random_method = "Fast") head(MixBPos_Fast) # Visulization by clusters: library(rgl) dimen1 <- MixBPos_Gibbs[,1] dimen2 <- MixBPos_Gibbs[,2] dimen3 <- MixBPos_Gibbs[,3] dimen4 <- MixBPos_Gibbs[,4] plot3d(x=dimen1, y=dimen2, z=dimen3, col=MixBPos_Gibbs[,5], size=2)
library(plyr) library(mvtnorm) library(stats) # Use dataset2 for demonstration. Get parameters of Bayesian # posteriori multivariate normal mixture distribution head(dataset2) dataset2_par <- dataset2[,1:4] # only parameter columns are premitted MixBPos <- MixMVN_BayesianPosteriori(dataset2_par, species=3) MixBPos # Generate random vectors using Gibbs sampling: MixBPos_Gibbs <- MixMVN_GibbsSampler(5000, MixBPos, random_method = "Gibbs") head(MixBPos_Gibbs) # Compared generation speed of "Gibbs" to that of "Fast" MixBPos_Fast <- MixMVN_GibbsSampler(5000, MixBPos, random_method = "Fast") head(MixBPos_Fast) # Visulization by clusters: library(rgl) dimen1 <- MixBPos_Gibbs[,1] dimen2 <- MixBPos_Gibbs[,2] dimen3 <- MixBPos_Gibbs[,3] dimen4 <- MixBPos_Gibbs[,4] plot3d(x=dimen1, y=dimen2, z=dimen3, col=MixBPos_Gibbs[,5], size=2)
Function to get a MCMC simulation results based on the imported MVN mixture distribution. It is commonly used for inquiring the specified conditional probability of MVN mixture distribuiton calculated through Bayesian posteriori.
# Bayesian posteriori mix MVN as input data: # data <- MixMVN_BayesianPosteriori(dataset2[,1:4], 3) # run MCMC simulation based on Bayesian posteriori mix MVN: MixMVN_MCMC(data, steps, pars, values, tol, random_method, ...)
# Bayesian posteriori mix MVN as input data: # data <- MixMVN_BayesianPosteriori(dataset2[,1:4], 3) # run MCMC simulation based on Bayesian posteriori mix MVN: MixMVN_MCMC(data, steps, pars, values, tol, random_method, ...)
data |
A matrix-like data containing the mixture probability, mean vector and covariance matrix for each cluster in each row. |
steps |
A positive integer. The numbers of random vectors to be generated for MCMC step. |
pars |
A integer vector to declare fixed dimension(s). For example if the desired dimensions are 1st=7 and 3rd=10, set this argument as c(1,3). |
values |
A numeric vector to assign value(s) to declared dimension(s). For example if the desired dimensions are 1st=7 and 3rd=10, set this argument as c(7,10). |
tol |
Tolerance. A numeric value to control the generated vectors to be accepted or rejected. Criterion uses Euclidean distance in declared dimension(s). Default value is 0.3. |
random_method |
The method to generate random vectors. Options are |
... |
Other arguments to control the process in Gibbs sampling if the |
return a list which contains:
AcceptRate |
Acceptance of declared conditions of MCMC |
MCMCdata |
All generated random vectors in MCMC step based on MVN mixture distribution |
Accept |
Subset of accepted sampling in MCMCdata |
Reject |
Subset of rejected sampling in MCMCdata |
MixMVN_BayesianPosteriori
, MixMVN_GibbsSampler
, MVN_GibbsSampler
, MVN_FConditional
library(plyr) library(mvtnorm) library(stats) # dataset2 has 4 parameters: dimen1, dimen2, dimen3 and dimen4: head(dataset2) dataset2_dim <- dataset2[,1:4] # extract parametric columns # Get posteriori parameters of dataset2 using kmeans 3 clustering: MixBPos <- MixMVN_BayesianPosteriori(dataset2_dim, 3) # If we want to know when dimen1=1, which clusters are accepted, run: MixBPos_MCMC <- MixMVN_MCMC(MixBPos, steps=5000, pars=c(1), values=c(1), tol=0.3) MixBPos_MCMC$AcceptRate result <- MixBPos_MCMC$MCMCdata head(result) # count accepted samples by clustering: count(result[which(result[,7]==1),5]) library(rgl) # Visualization using plot3d() if necessary: # Clustering result in the rest 3 dimensions: plot3d(result[,2], result[,3], z=result[,4], col=result[,5], size=2) # Acceptance rejection visualization: plot3d(result[,2], result[,3], z=result[,4], col=result[,7]+1, size=2)
library(plyr) library(mvtnorm) library(stats) # dataset2 has 4 parameters: dimen1, dimen2, dimen3 and dimen4: head(dataset2) dataset2_dim <- dataset2[,1:4] # extract parametric columns # Get posteriori parameters of dataset2 using kmeans 3 clustering: MixBPos <- MixMVN_BayesianPosteriori(dataset2_dim, 3) # If we want to know when dimen1=1, which clusters are accepted, run: MixBPos_MCMC <- MixMVN_MCMC(MixBPos, steps=5000, pars=c(1), values=c(1), tol=0.3) MixBPos_MCMC$AcceptRate result <- MixBPos_MCMC$MCMCdata head(result) # count accepted samples by clustering: count(result[which(result[,7]==1),5]) library(rgl) # Visualization using plot3d() if necessary: # Clustering result in the rest 3 dimensions: plot3d(result[,2], result[,3], z=result[,4], col=result[,5], size=2) # Acceptance rejection visualization: plot3d(result[,2], result[,3], z=result[,4], col=result[,7]+1, size=2)
Function to execute parameter estimation for MVN distribution, under Bayesian analysis framework.
# Get parameters of Bayesian posteriori MVN: MVN_BayesianIterator(data, pri_mean=colMeans(data), Gibbs_nums=5000, pseudo_nums=dim(data)[1], threshold=1e-04, iteration=100, ...)
# Get parameters of Bayesian posteriori MVN: MVN_BayesianIterator(data, pri_mean=colMeans(data), Gibbs_nums=5000, pseudo_nums=dim(data)[1], threshold=1e-04, iteration=100, ...)
data |
A data.frame or matrix-like data: obervations should be arrayed in rows while variables should be arrayed in columns. |
pri_mean |
A numeric vector to assign priori mean for MVN. Default value applies |
Gibbs_nums |
A positive integer. The numbers of random vectors to be generated for each iteration step. Defult value is 5000. |
pseudo_nums |
A positive integer. The argument to determine numbers of generated vectors used for each iteration step. Default value keeps the same scale as input data. Notice that a too small value can result in singular matrix. |
threshold |
A numeric value to control stoping the iteration loop. Default value used 0.0001. While the Euclidean distance of mean vectors between pseudo-data (the last |
iteration |
A positive integer. Argument to assign the maximum steps for iteration. Default value is 100 after which the iteration loop will compulsively exit. |
... |
Other arguments to control the process in Gibbs sampling. |
Because that MVN distribution possess conjugated property in Bayesian analysis framework, the convergence of Bayesian iterator for MVN distribution can be ensured, accoumpanied with the shrink of 2nd-norm of Bayesian posteriori covariance matrix. But pay attention to the fact that pseudo-data leads to the randomness, the argument pseudo_nums
should be set carefully.
return a double level list containing Bayesian posteriori after iteration process:
mean |
Bayesian posteriori mean vector |
var |
Bayesian posteriori covariance matrix |
If the parameter values are the only interested thing we concerned, this iterator makes sense. Since it can significantly help us decrease the scale of covariance matrix, to obtain a more reliable estimation for the parameters. However, in more cases, some correlationships of a certain group of pamameters are more valuable, which are usually clued by the covariance matrix.
MVN_BayesianPosteriori
, MVN_GibbsSampler
, MVN_FConditional
, MatrixAlternative
library(mvtnorm) # Bayesian posteriori before iteration using dataset1 as example, # c(80, 16, 3) as priori mean: # View 2-norm of covariance matrix of Bayesian posteriori: BPos_init <- MVN_BayesianPosteriori(dataset1, c(80,16,3)) BPos_init norm(as.matrix(BPos_init$var), type = "2") # Bayesian posteriori after iteration using c(80,16,3) as priori # Using 30 last samples generated by GibbsSampler for each step: BPos_fina1 <- MVN_BayesianIterator(dataset1, c(80,16,3), 5000, 30) BPos_fina1 norm(as.matrix(BPos_fina1$var), type = "2") # Too small pseudo_nums setting can results in singular system, try: MVN_BayesianIterator(dataset1, pseudo_nums=3)
library(mvtnorm) # Bayesian posteriori before iteration using dataset1 as example, # c(80, 16, 3) as priori mean: # View 2-norm of covariance matrix of Bayesian posteriori: BPos_init <- MVN_BayesianPosteriori(dataset1, c(80,16,3)) BPos_init norm(as.matrix(BPos_init$var), type = "2") # Bayesian posteriori after iteration using c(80,16,3) as priori # Using 30 last samples generated by GibbsSampler for each step: BPos_fina1 <- MVN_BayesianIterator(dataset1, c(80,16,3), 5000, 30) BPos_fina1 norm(as.matrix(BPos_fina1$var), type = "2") # Too small pseudo_nums setting can results in singular system, try: MVN_BayesianIterator(dataset1, pseudo_nums=3)
Given a design matrix (data) and priori information, export the mean vector and covariance matrix of Bayesian posterior multivariate normal distribution.
# Given the data as design matrix, priori mean vector and priori covariance # matrix, this function will export a list which contains mean ($mean) and # covariance ($var) of Bayesian posterior multivariate normal distribution. MVN_BayesianPosterior(data, pri_mean, pri_var) # defualt pri_mean uses colMenas() # defualt pri_var uses unit matrix
# Given the data as design matrix, priori mean vector and priori covariance # matrix, this function will export a list which contains mean ($mean) and # covariance ($var) of Bayesian posterior multivariate normal distribution. MVN_BayesianPosterior(data, pri_mean, pri_var) # defualt pri_mean uses colMenas() # defualt pri_var uses unit matrix
data |
Design matrix: data.frame or matrix-like data, |
pri_mean |
priori mean: necessary vector which should be of the identical dimensions of data ( |
pri_var |
prior covariance matrix: a real symmetric matrix by definition; the default value is an unit matrix with the same dimension of priori mean vector. |
Although this function is very simple, the observation data should be diagnosed firstly. it is strongly recommanded that researchers and developers should have some prior knowledge of ill-conditioned system before using this function. Simply, ill-conditioned system, or singular matrix, is caused by a) insufficient data or b) almostly linear dependency of two certain parameters, which two can result in a too small eigenvalue then cause a ill-conditioned (singular) system. Therefore users should make sure the data contains enough observations and the degree of freedom is strictly equal to the number of parameters.
return a list of:
mean |
mean vector of Bayesian posterior |
var |
covariance of Bayesian posterior |
ZHANG Chen
# Demo using dataset1: head(dataset1) BPos <- MVN_BayesianPosterior(dataset1, c(80,16,3)) BPos$mean BPos$var # Singular system caused by insufficient data eigen(var(dataset1[1:3,]))$values rcond(var(dataset1[1:3,])) eigen(var(dataset1[1:6,]))$values rcond(var(dataset1[1:6,])) # Singular system caused by improper degree of freedom K <- cbind(dataset1, dataset1[,3]*(-2)+3) eigen(var(K[,2:4]))$values rcond(var(K[,2:4]))
# Demo using dataset1: head(dataset1) BPos <- MVN_BayesianPosterior(dataset1, c(80,16,3)) BPos$mean BPos$var # Singular system caused by insufficient data eigen(var(dataset1[1:3,]))$values rcond(var(dataset1[1:3,])) eigen(var(dataset1[1:6,]))$values rcond(var(dataset1[1:6,])) # Singular system caused by improper degree of freedom K <- cbind(dataset1, dataset1[,3]*(-2)+3) eigen(var(K[,2:4]))$values rcond(var(K[,2:4]))
The function to export the mean vector and covariance matrix of Bayesian posteriori MVN distribution in the basis of given priori information (priori MVN) and observation data (a design matrix containing all variables).
# Given the data as design matrix, priori mean vector and priori covariance # matrix, this function will export a list which contains mean ($mean) and # covariance ($var) of Bayesian posteriori multivariate normal distribution. MVN_BayesianPosteriori(data, pri_mean, pri_var)
# Given the data as design matrix, priori mean vector and priori covariance # matrix, this function will export a list which contains mean ($mean) and # covariance ($var) of Bayesian posteriori multivariate normal distribution. MVN_BayesianPosteriori(data, pri_mean, pri_var)
data |
A data.frame or matrix-like data: obervations should be arrayed in rows while variables should be arrayed in columns. |
pri_mean |
A numeric vector to assign priori mean for MVN. Default value applies |
pri_var |
A matrix-like parameter to assign priori covariance matrix. Default value uses unit matrix. |
return a double level list containing:
mean |
mean vector of Bayesian posteriori MVN distribution |
var |
covariance of Bayesian posteriori MVN distribution |
It is strongly recommanded that users should have some prior knowledge of ill-conditioned system before using this function. Simply, ill-conditioned system, or singular matrix, is caused by a) insufficient data or b) almostly linear dependency of two certain parameters, which two can result in a excessively small eigenvalue then cause a ill-conditioned (singular) system. Therefore users must diagnose their data firstly to confirm the fact that the it contains enough observations, and the degree of freedom is strictly equal to the number of parameters as well. Additionally, for the argument pri_var
, a real symmetric matrix is desired by definition.
# Demo using dataset1: head(dataset1) BPos <- MVN_BayesianPosteriori(dataset1, c(80,16,3)) BPos$mean BPos$var # Singular system caused by insufficient data eigen(var(dataset1[1:3,]))$values rcond(var(dataset1[1:3,])) eigen(var(dataset1[1:6,]))$values rcond(var(dataset1[1:6,])) # Singular system caused by improper degree of freedom K <- cbind(dataset1, dataset1[,3]*(-2)+3) eigen(var(K[,2:4]))$values rcond(var(K[,2:4]))
# Demo using dataset1: head(dataset1) BPos <- MVN_BayesianPosteriori(dataset1, c(80,16,3)) BPos$mean BPos$var # Singular system caused by insufficient data eigen(var(dataset1[1:3,]))$values rcond(var(dataset1[1:3,])) eigen(var(dataset1[1:6,]))$values rcond(var(dataset1[1:6,])) # Singular system caused by improper degree of freedom K <- cbind(dataset1, dataset1[,3]*(-2)+3) eigen(var(K[,2:4]))$values rcond(var(K[,2:4]))
Function to export parameters of full conditional normal distribution in basis of given MVN distribution, the undecided dimension, as well as all values in the rest dimensions.
# Bayesian posteriori as input data: # data <- MVN_BayesianPosteriori(dataset1, c(80,16,3)) # inquire parameters of full-conditional distribution based on Bayesian posteriori: MVN_FConditional(data, variable, z)
# Bayesian posteriori as input data: # data <- MVN_BayesianPosteriori(dataset1, c(80,16,3)) # inquire parameters of full-conditional distribution based on Bayesian posteriori: MVN_FConditional(data, variable, z)
data |
A double level list containing all parameters of MVN distribution: mean vector ( |
variable |
A integer to specify the undecided dimension. |
z |
A nd-vector to assign conditions (n = dimensions of given MVN distribution). It should be noted that the value in dimension specified by |
It can be proved that any full conditional distribution from a given MVN will degenerate to an 1d-normal distribution.
return a double level list containing the following parameters of full conditional normal distributions of given MVN in specified dimension:
mean |
a numberic mean of a normal distribution |
var |
a numberic variance of a normal distribution |
MVN_BayesianPosteriori
, MatrixAlternative
head(dataset1) BPos <- MVN_BayesianPosteriori(dataset1, c(80,16,3)) BPos # Bayesian Posteriori result <- MVN_FConditional(BPos, variable = 1, z=c(75, 13, 4)) result$mean class(result$mean) result$var class(result$var) # compare the following results: MVN_FConditional(BPos, variable = 2, z=c(75, 13, 4)) MVN_FConditional(BPos, variable = 2, z=c(75, 88, 4)) MVN_FConditional(BPos, variable = 1, z=c(75, 88, 4))
head(dataset1) BPos <- MVN_BayesianPosteriori(dataset1, c(80,16,3)) BPos # Bayesian Posteriori result <- MVN_FConditional(BPos, variable = 1, z=c(75, 13, 4)) result$mean class(result$mean) result$var class(result$var) # compare the following results: MVN_FConditional(BPos, variable = 2, z=c(75, 13, 4)) MVN_FConditional(BPos, variable = 2, z=c(75, 88, 4)) MVN_FConditional(BPos, variable = 1, z=c(75, 88, 4))
Generating random vectors on the basis of a given MVN distribution, through Gibbs sampling algorithm.
# Bayesian posteriori as data # data <- MVN_BayesianPosteriori(dataset1) # Using Gibbs sampler to generate random vectors based on Bayesian posteriori: MVN_GibbsSampler(n, data, initial, reject_rate, burn)
# Bayesian posteriori as data # data <- MVN_BayesianPosteriori(dataset1) # Using Gibbs sampler to generate random vectors based on Bayesian posteriori: MVN_GibbsSampler(n, data, initial, reject_rate, burn)
n |
A positive integer. The numbers of random vectors to be generated. |
data |
A double level list which contains the mean vector ( |
initial |
Initial vector where Markov chain starts. Default value use a random vector generated by |
reject_rate |
A numeric to control burn-in period by ratio. Default value is 0.2, namely the first 20% generated vectors will be rejected. If this arg was customized, the next arg |
burn |
A numeric to control burn-in period by numbers. If this arg was customized, final result will be generated by this manner in which it will drop the first n numbers (n= |
There're also some literatures suggest using the mean or mode of priori as initial vector. Users can customize this setting according to their own needs.
return a series random vectors in the basis of given MVN distribution.
MVN_FConditional
, MatrixAlternative
library(mvtnorm) # Get parameters of Bayesian posteriori multivariate normal distribution BPos <- MVN_BayesianPosteriori(dataset1) BPos # Using previous result (BPos) to generate random vectors through Gibbs # sampling: 7000 observations, start from c(1,1,2), use 0.3 burning rate BPos_Gibbs <- MVN_GibbsSampler(7000, BPos, initial=c(1,1,2), 0.3) tail(BPos_Gibbs) # Check for convergence of Markov chain BPos$mean colMeans(BPos_Gibbs) BPos$var var(BPos_Gibbs) # 3d Visulization: library(rgl) fac1 <- BPos_Gibbs[,1] fac2 <- BPos_Gibbs[,2] fac3 <- BPos_Gibbs[,3] plot3d(x=fac1, y=fac2, z=fac3, col="red", size=2)
library(mvtnorm) # Get parameters of Bayesian posteriori multivariate normal distribution BPos <- MVN_BayesianPosteriori(dataset1) BPos # Using previous result (BPos) to generate random vectors through Gibbs # sampling: 7000 observations, start from c(1,1,2), use 0.3 burning rate BPos_Gibbs <- MVN_GibbsSampler(7000, BPos, initial=c(1,1,2), 0.3) tail(BPos_Gibbs) # Check for convergence of Markov chain BPos$mean colMeans(BPos_Gibbs) BPos$var var(BPos_Gibbs) # 3d Visulization: library(rgl) fac1 <- BPos_Gibbs[,1] fac2 <- BPos_Gibbs[,2] fac3 <- BPos_Gibbs[,3] plot3d(x=fac1, y=fac2, z=fac3, col="red", size=2)
Function to get a MCMC simulation results based on the imported MVN distribution. It is commonly used for inquiring the specified conditional probability of MVN distribuiton calculated through Bayesian posteriori.
# Bayesian posteriori as input data # data <- MVN_BayesianPosteriori(dataset1, pri_mean=c(80,16,3)) # run MCMC simulation using Bayesian posteriori: MVN_MCMC(data, steps, pars, values, tol, ...)
# Bayesian posteriori as input data # data <- MVN_BayesianPosteriori(dataset1, pri_mean=c(80,16,3)) # run MCMC simulation using Bayesian posteriori: MVN_MCMC(data, steps, pars, values, tol, ...)
data |
A double level list which contains the mean vector ( |
steps |
A positive integer. The numbers of random vectors to be generated for MCMC step. |
pars |
A integer vector to declare fixed dimension(s). For example if the desired dimensions are 1st=7 and 3rd=10, set this argument as c(1,3). |
values |
A numeric vector to assign value(s) to declared dimension(s). For example if the desired dimensions are 1st=7 and 3rd=10, set this argument as c(7,10). |
tol |
Tolerance. A numeric value to control the generated vectors to be accepted or rejected. Criterion uses Euclidean distance in declared dimension(s). Default value is 0.3. |
... |
Other arguments to control the process in Gibbs sampling. |
return a list which contains:
AcceptRate |
Acceptance of declared conditions of MCMC |
MCMCdata |
All generated random vectors in MCMC step based on MVN distribution |
Accept |
Subset of accepted sampling in MCMCdata |
Reject |
Subset of rejected sampling in MCMCdata |
MVN_GibbsSampler
, MVN_FConditional
library(mvtnorm) library(plyr) # dataset1 has three parameters: fac1, fac2 and fac3: head(dataset1) # Get posteriori parameters of dataset1 using prior of c(80,16,3): BPos <- MVN_BayesianPosteriori(dataset1, pri_mean=c(80,16,3)) # If we want to know when fac1=78, how fac2 responses to fac3, run: BPos_MCMC <- MVN_MCMC(BPos, steps=8000, pars=c(1), values=c(78), tol=0.3) MCMC <- BPos_MCMC$MCMCdata head(MCMC) # Visualization using plot3d() if necessary: library(rgl) plot3d(MCMC[,1], MCMC[,2], z=MCMC[,3], col=MCMC[,5]+1, size=2) # Visualization: 2d scatter plot MCMC_2d <- BPos_MCMC$Accept head(MCMC_2d) plot(MCMC_2d[,3], MCMC_2d[,2], pch=20, col="red", xlab = "fac3", ylab = "fac2") # Compared to the following scatter plot when fac1 is not fixed: plot(BPos_MCMC$MCMCdata[,3], BPos_MCMC$MCMCdata[,2], pch=20, col="red", xlab = "fac3", ylab = "fac2")
library(mvtnorm) library(plyr) # dataset1 has three parameters: fac1, fac2 and fac3: head(dataset1) # Get posteriori parameters of dataset1 using prior of c(80,16,3): BPos <- MVN_BayesianPosteriori(dataset1, pri_mean=c(80,16,3)) # If we want to know when fac1=78, how fac2 responses to fac3, run: BPos_MCMC <- MVN_MCMC(BPos, steps=8000, pars=c(1), values=c(78), tol=0.3) MCMC <- BPos_MCMC$MCMCdata head(MCMC) # Visualization using plot3d() if necessary: library(rgl) plot3d(MCMC[,1], MCMC[,2], z=MCMC[,3], col=MCMC[,5]+1, size=2) # Visualization: 2d scatter plot MCMC_2d <- BPos_MCMC$Accept head(MCMC_2d) plot(MCMC_2d[,3], MCMC_2d[,2], pch=20, col="red", xlab = "fac3", ylab = "fac2") # Compared to the following scatter plot when fac1 is not fixed: plot(BPos_MCMC$MCMCdata[,3], BPos_MCMC$MCMCdata[,2], pch=20, col="red", xlab = "fac3", ylab = "fac2")