Title: | Goodness-of-Fit for Zero-Inflated Univariate Hidden Markov Models |
---|---|
Description: | Inference, goodness-of-fit tests, and predictions for continuous and discrete univariate Hidden Markov Models (HMM), including zero-inflated distributions. The goodness-of-fit test is based on a Cramer-von Mises statistic and uses parametric bootstrap to estimate the p-value. The description of the methodology is taken from Nasri et al (2020) <doi:10.1029/2019WR025122>. |
Authors: | Bouchra R. Nasri [aut, cre, cph], Mamadou Yamar Thioub [aut, cph], Bruno N. Remillard [aut, cph] |
Maintainer: | Bouchra R. Nasri <[email protected]> |
License: | GPL-3 |
Version: | 0.2.1 |
Built: | 2025-03-14 05:31:55 UTC |
Source: | https://github.com/cran/GenHMM1d |
This function computes the cumulative distribution function (cdf) of a univariate distribution
CDF(family, y, param, size = 0)
CDF(family, y, param, size = 0)
family |
distribution name; run the function distributions() for help |
y |
values at which the cdf is evaluated |
param |
parameters of the distribution; (1 x p) |
size |
additional parameter for some discrete distributions; run the command distributions() for help |
f |
cdf |
This function allows the users to find the details on the available distributions.
distributions()
distributions()
No returned value, allows the users to know the different distributions and parameters
This function computes the expected shortfall of an univariate distribution, excluding zero-inflated.
ES(p, param, family, size = 0, Nsim = 25000)
ES(p, param, family, size = 0, Nsim = 25000)
p |
value (1 x 1) at which the expected shortfall needs to be computed; between 0 and 1; (e.g 0.01, 0.05) |
param |
parameters of the distribution; (1 x p) |
family |
distribution name; run the function distributions() for help |
size |
additional parameter for some discrete distributions; run the command distributions() for help |
Nsim |
number of simulations |
es |
expected shortfall |
family = "gaussian" theta = c(-1.5, 1.7) ; es = ES( 0.01, theta, family) print('Expected shortfall : ') print(es$es)
family = "gaussian" theta = c(-1.5, 1.7) ; es = ES( 0.01, theta, family) print('Expected shortfall : ') print(es$es)
This function estimates the parameters from a univariate hidden Markov model
EstHMMGen( y, ZI = 0, reg, family, start = 0, max_iter = 10000, eps = 1e-04, size = 0, theta0 = NULL, graph = FALSE )
EstHMMGen( y, ZI = 0, reg, family, start = 0, max_iter = 10000, eps = 1e-04, size = 0, theta0 = NULL, graph = FALSE )
y |
observations; (n x 1) |
ZI |
1 if zero-inflated, 0 otherwise (default) |
reg |
number of regimes (including zero-inflated; must be > ZI) |
family |
distribution name; run the function distributions() for help |
start |
starting parameters for the estimation; (1 x p) |
max_iter |
maximum number of iterations of the EM algorithm; suggestion 10000 |
eps |
precision (stopping criteria); suggestion 0.001. |
size |
additional parameter for some discrete distributions; run the command distributions() for help |
theta0 |
initial parameters for each regimes; (r x p), default is NULL |
graph |
TRUE a graph, FALSE otherwise (default); only for continuous distributions |
#############################################################################
theta |
estimated parameters; (r x p) |
Q |
estimated transition matrix for the regimes; (r x r) |
eta |
conditional probabilities of being in regime k at time t given observations up to time t; (n x r) |
lambda |
conditional probabilities of being in regime k at time t given all observations; (n x r) |
U |
matrix of Rosenblatt transforms; (n x r) |
cvm |
cramer-von-Mises statistic for goodness-of-fit |
W |
pseudo-observations that should be uniformly distributed under the null hypothesis |
LL |
log-likelihood |
nu |
stationary distribution |
AIC |
Akaike information criterion |
BIC |
Bayesian information criterion |
CAIC |
consistent Akaike information criterion |
AICcorrected |
Akaike information criterion corrected |
HQC |
Hannan-Quinn information criterion |
stats |
empirical means and standard deviation of each regimes using lambda |
pred_l |
estimated regime using lambda |
pred_e |
estimated regime using eta |
runs_l |
estimated number of runs using lambda |
runs_e |
estimated number of runs using eta |
family = "gaussian" Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) ; theta = matrix(c(-1.5, 1.7, 1, 1),2,2) ; y = SimHMMGen(theta, Q=Q, family=family, n=100)$SimData est = EstHMMGen(y, reg=2, family=family)
family = "gaussian" Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) ; theta = matrix(c(-1.5, 1.7, 1, 1),2,2) ; y = SimHMMGen(theta, Q=Q, family=family, n=100)$SimData est = EstHMMGen(y, reg=2, family=family)
This function computes the forecasted cumulative distribution function of a univariate HMM for multiple horizons, given observations up to time n
ForecastHMMCdf( x, ZI = 0, family, theta, Q, eta, size = 0, k = 1, graph = FALSE )
ForecastHMMCdf( x, ZI = 0, family, theta, Q, eta, size = 0, k = 1, graph = FALSE )
x |
points at which the cdf function is computed |
ZI |
1 if zero-inflated, 0 otherwise (default) |
family |
distribution name; run the function distributions() for help |
theta |
parameters; (r x p) |
Q |
probability transition matrix for the regimes; (r x r) |
eta |
vector of the estimated probability of each regime at time n; (1 x r) |
size |
additional parameter for some discrete distributions; run the command distributions() for help |
k |
prediction times |
graph |
TRUE to produce plots (FALSE by default). |
cdf |
values of the cdf function |
family = "gaussian" theta = matrix(c(-1.5, 1.7, 1, 1),2,2) Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) eta = c(0.96, 0.04) x=seq(from=-6, to=6, by=0.1) k=c(1,5,10,20) cdf = ForecastHMMCdf(x, 0, family, theta, Q, eta, size=0, k, graph=TRUE)
family = "gaussian" theta = matrix(c(-1.5, 1.7, 1, 1),2,2) Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) eta = c(0.96, 0.04) x=seq(from=-6, to=6, by=0.1) k=c(1,5,10,20) cdf = ForecastHMMCdf(x, 0, family, theta, Q, eta, size=0, k, graph=TRUE)
This function computes the predicted probabilities of the regimes for a new observation of a univariate HMM, given observations up to time n
ForecastHMMeta(ynew, ZI = 0, family, theta, Q, eta)
ForecastHMMeta(ynew, ZI = 0, family, theta, Q, eta)
ynew |
new observations |
ZI |
1 if zero-inflated, 0 otherwise (default) |
family |
distribution name; run the function distributions() for help |
theta |
parameters; (r x p) |
Q |
probability transition matrix for the regimes; (r x r) |
eta |
vector of the estimated probability of each regime at time n; (1 x r) |
etanew |
predicted probabilities of the regimes |
family = "gaussian" theta = matrix(c(-1.5, 1.7, 1, 1),2,2) Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) eta = c(0.96, 0.04) ForecastHMMeta(1.5, 0, family, theta, Q, eta)
family = "gaussian" theta = matrix(c(-1.5, 1.7, 1, 1),2,2) Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) eta = c(0.96, 0.04) ForecastHMMeta(1.5, 0, family, theta, Q, eta)
This function computes the probability forecasted density function (with respect to Dirac(0)+Lesbesgue) of a univariate HMM for multiple horizons, given observations up to time n
ForecastHMMPdf( y, ZI = 0, family, theta, Q, eta, size = 0, k = 1, graph = FALSE )
ForecastHMMPdf( y, ZI = 0, family, theta, Q, eta, size = 0, k = 1, graph = FALSE )
y |
points at which the pdf function is computed |
ZI |
1 if zero-inflated, 0 otherwise (default) |
family |
distribution name; run the function distributions() for help |
theta |
parameters; (r x p) |
Q |
probability transition matrix for the regimes; (r x r) |
eta |
vector of the estimated probability of each regime at time n; (1 x r) |
size |
additional parameter for some discrete distributions; run the command distributions() for help |
k |
prediction times (may be a vector of integers) |
graph |
TRUE to produce plots (FALSE is default) |
pdf |
values of the pdf function |
family = "gaussian" theta = matrix(c(-1.5, 1.7, 1, 1),2,2) Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) eta = c(0.06, 0.94) x=seq(from=-6, to=6, by=0.1) k=c(1,5,10,20) pdf = ForecastHMMPdf(x, 1, family, theta, Q, eta, k=k, graph=TRUE)
family = "gaussian" theta = matrix(c(-1.5, 1.7, 1, 1),2,2) Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) eta = c(0.06, 0.94) x=seq(from=-6, to=6, by=0.1) k=c(1,5,10,20) pdf = ForecastHMMPdf(x, 1, family, theta, Q, eta, k=k, graph=TRUE)
This function computes the VAR of a univariate HMM for multiple horizons, given observations up to time n
ForecastHMMVAR(U, ZI = 0, family, theta, Q, eta, k = 1)
ForecastHMMVAR(U, ZI = 0, family, theta, Q, eta, k = 1)
U |
values (n x 1) between 0 and 1 |
ZI |
1 if zero-inflated, 0 otherwise (default) |
family |
distribution name; run the function distributions() for help |
theta |
parameters; (r x p) |
Q |
probability transition matrix for the regimes; (r x r) |
eta |
vector of the estimated probability of each regime at time n; (1 x r) |
k |
prediction times (may be a vector of integers). |
var |
values at risk (1 x horizon) |
family = "gaussian" theta = matrix(c(-1.5, 1.7, 1, 1),2,2) Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) eta = c(0.96, 0.04) U=c(0.01,0.05) k=c(1,2,3,4,5) ForecastHMMVAR(U, 0, family, theta, Q, eta=eta,k)
family = "gaussian" theta = matrix(c(-1.5, 1.7, 1, 1),2,2) Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) eta = c(0.96, 0.04) U=c(0.01,0.05) k=c(1,2,3,4,5) ForecastHMMVAR(U, 0, family, theta, Q, eta=eta,k)
This function performs a goodness-of-fit test for a univariate hidden Markov model
GofHMMGen( y, ZI = 0, reg, family, start = 0, max_iter = 10000, eps = 1e-04, size = 0, n_samples = 1000, n_cores = 1, useFest = TRUE )
GofHMMGen( y, ZI = 0, reg, family, start = 0, max_iter = 10000, eps = 1e-04, size = 0, n_samples = 1000, n_cores = 1, useFest = TRUE )
y |
observations |
ZI |
1 if zero-inflated, 0 otherwise (default) |
reg |
number of regimes |
family |
distribution name; run the function distributions() for help |
start |
starting parameter for the estimation |
max_iter |
maximum number of iterations of the EM algorithm; suggestion 10000 |
eps |
precision (stopping criteria); suggestion 0.0001. |
size |
additional parameter for some discrete distributions; run the command distributions() for help |
n_samples |
number of bootstrap samples; suggestion 1000 |
n_cores |
number of cores to use in the parallel computing |
useFest |
TRUE (default) to use the first estimated parameters as starting value for the bootstrap, FALSE otherwise |
pvalue |
pvalue of the Cramer-von Mises statistic in percent |
theta |
Estimated parameters; (r x p) |
Q |
estimated transition matrix; ; (r x r) |
eta |
(conditional probabilities of being in regime k at time t given observations up to time t; (n x r) |
lambda |
conditional probabilities of being in regime k at time t given all observations; (n x r) |
U |
matrix of Rosenblatt transforms; (n x r) |
cvm |
Cramer-von-Mises statistic for goodness-of-fit |
W |
pseudo-observations that should be uniformly distributed under the null hypothesis |
LL |
log-likelihood |
nu |
stationary distribution |
AIC |
Akaike information criterion |
BIC |
bayesian information criterion |
CAIC |
consistent Akaike information criterion |
AICcorrected |
Akaike information criterion corrected |
HQC |
Hannan-Quinn information criterion |
stats |
Empirical means and standard deviation of each regimes using lambda |
pred_l |
Estimated regime using lambda |
pred_e |
Estimated regime using eta |
runs_l |
Estimated number of runs using lambda |
runs_e |
Estimated number of runs using eta |
family = "gaussian" Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) ; theta = matrix(c(0, 1.7, 0, 1),2,2) ; y = SimHMMGen(theta, size=0, Q, ZI=1, family, 100)$SimData out=GofHMMGen(y,1,2,family,n_samples=10)
family = "gaussian" Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) ; theta = matrix(c(0, 1.7, 0, 1),2,2) ; y = SimHMMGen(theta, size=0, Q, ZI=1, family, 100)$SimData out=GofHMMGen(y,1,2,family,n_samples=10)
This function shows the graphs resulting from the estimation of a HMM model
graphEstim(y, ZI = 0, reg, theta, family, pred_l, pred_e)
graphEstim(y, ZI = 0, reg, theta, family, pred_l, pred_e)
y |
observations |
ZI |
1 if zero-inflated, 0 otherwise (default) |
reg |
number of regimes |
theta |
estimated parameters; (r x p) |
family |
distribution name; run the function distributions() for help |
pred_l |
estimated regime using lambda |
pred_e |
estimated regime using eta |
No returned value; produces figures of interest for the HMM model
This function performs a gridsearch to find a good starting value for the EM algorithm. A good starting value for the EM algorithm is one for which all observations have strictly positive density (the higher the better)
GridSearchS0(family, y, params, size = 0, lbpdf = 0)
GridSearchS0(family, y, params, size = 0, lbpdf = 0)
family |
distribution name; run the function distributions() for help |
y |
observations |
params |
list of six vectors named (p1, p2, p3, p4, p5, p6). Each corresponding to a parameter of the distribution (additionnal parameters will be ignored). For example : params = list(p1=c(0.5, 5, 0.5), p2=c(1, 5, 1), p3=c(0.1, 0.9, 0.1), p4=c(1,1,1), p5=c(1,1,1), p6=c(1,1,1)) where p1 is the grid of value for the first parameter. |
size |
additional parameter for some discrete distributions; run the command distributions() for help |
lbpdf |
minimal acceptable value of the density; (should be >= 0) |
goodStart |
accepted parameter set |
family = "gaussian" Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) ; theta = matrix(c(-1.5, 1.7, 1, 1),2,2) ; sim = SimHMMGen(theta, size=0, Q, ZI=0,"gaussian", 50)$SimData ; params = list(p1=c(-2, 2, 0.5), p2=c(1, 5, 1), p3=c(1, 1, 1), p4=c(1,1,1), p5=c(1,1,1), p6=c(1,1,1)) accepted_params = GridSearchS0(family, sim, params)
family = "gaussian" Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) ; theta = matrix(c(-1.5, 1.7, 1, 1),2,2) ; sim = SimHMMGen(theta, size=0, Q, ZI=0,"gaussian", 50)$SimData ; params = list(p1=c(-2, 2, 0.5), p2=c(1, 5, 1), p3=c(1, 1, 1), p4=c(1,1,1), p5=c(1,1,1), p6=c(1,1,1)) accepted_params = GridSearchS0(family, sim, params)
This function computes the probability density function (pdf) of a univariate distribution
PDF(family, y, param, size = 0)
PDF(family, y, param, size = 0)
family |
distribution name; run the function distributions() for help |
y |
observations |
param |
parameters of the distribution; (1 x p) |
size |
additional parameter for some discrete distributions; run the command distributions() for help |
f |
|
This function computes the quantile function of a univariate distribution, excluding zero-inflated.
QUANTILE(p, param, family, size = 0)
QUANTILE(p, param, family, size = 0)
p |
values at which the quantile needs to be computed; between 0 and 1; (e.g 0.01, 0.05) |
param |
parameters of the distribution; (1 x p) |
family |
distribution name; run the function distributions() for help |
size |
additional parameter for some discrete distributions; run the command distributions() for help |
q |
quantile/VAR |
family = "gaussian" theta = matrix(c(-1.5, 1.7),1,2) ; quantile = QUANTILE(0.01, theta, family) print('Quantile : ') print(quantile)
family = "gaussian" theta = matrix(c(-1.5, 1.7),1,2) ; quantile = QUANTILE(0.01, theta, family) print('Quantile : ') print(quantile)
This function simulates observation from a univariate hidden Markov model
SimHMMGen(theta, size = 0, Q, ZI = 0, family, n)
SimHMMGen(theta, size = 0, Q, ZI = 0, family, n)
theta |
parameters; (r x p) |
size |
additional parameter for some discrete distributions; run the command distributions() for help |
Q |
transition probability matrix for regimes; (r x r) |
ZI |
1 if zero-inflated, 0 otherwise (default) |
family |
distribution name; run the function distributions() for help |
n |
number of simulated observations |
SimData |
Simulated data |
MC |
Simulated Markov chain |
family = "gaussian" Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) ; theta = matrix(c(0, 1.7, 0, 10),2,2) ; y = SimHMMGen(theta, Q=Q, ZI=1,family=family, n=50)$SimData
family = "gaussian" Q = matrix(c(0.8, 0.3, 0.2, 0.7), 2, 2) ; theta = matrix(c(0, 1.7, 0, 10),2,2) ; y = SimHMMGen(theta, Q=Q, ZI=1,family=family, n=50)$SimData
This function generates a Markov chain X(1), ..., X(n) with transition matrix Q, starting from a state eta0 or the uniform distribution on 1,..., r.
SimMarkovChain(Q, n, eta0)
SimMarkovChain(Q, n, eta0)
Q |
transition probability matrix |
n |
number of simulated vectors |
eta0 |
initial value in 1,...,r. |
x |
Generated Markov chain |
This function computes the Cramer-von Mises statistic Sn for goodness-of-fit of the null hypothesis of a univariate uniform distrubtion over [0,1]
Snd1(U)
Snd1(U)
U |
vector of pseudos-observations (approximating uniform) |
sta |
Cramer-von Mises statistic |