Introduction

In many disciplines, such as biomedicine, meteorology, and economics, zero-inflated non-negative continuous data with censored values are frequently encountered1. This refers to data sets containing a substantial number of zeros interspersed with positive continuous values, with censored data indicating unobserved data points due to specific reasons during observations or experiments. It is common to encounter values below a certain detection limit (DL) that remain unobserved or recorded, this type of data is referred to as Type I left-censored data. In this study, we focus on left-censored data. When data is observed using different instruments, samples may have various DLs. For example, Krishnamoothy and Wang2 estimated confidence and prediction limits for gamma distributions with censored data in their study of groundwater or air pollution data to identify potential sources of contamination.

Precipitation data is a common example that can be modeled using a zero-inflated gamma distribution with censored data. In this type of data, zero values represent days without rainfall, while the right-skewed positive values represent the amount of precipitation on rainy days. Because precipitation measurement devices typically have DLs, there are days where the precipitation amount is between zero and the DL, but the exact value is unknown, making this type of data as censored data. The gamma distribution is a common right-skewed model frequently used in the previous studies of precipitation data3,4. However, it can only handle the positive part of the data, often resulting in unsatisfactory outcomes5 for zero-inflated data. Considering the zero values in precipitation data leads to the zero-inflated gamma distribution, which has become a commonly used model for analyzing precipitation data. For example, V\(\ddot{a}\)nnman6conducted a study on the distribution of data means under the zero-inflated gamma distribution in large samples, obtaining CIs for the mean of the zero-inflated gamma distribution. Mualidharan and Kale7performed maximum likelihood estimation of three unknown parameters based on the zero-inflated gamma distribution, obtaining the asymptotic distribution of the parameters and subsequently calculating the 95% CI for the unknown parameters. Wang et al5. utilized the Fiducial, parameter bootstrap, and the Method of Variance Estimates Recovery (MOVER) based on the zero-inflated gamma distribution to estimate CIs for the mean. Kaewprasert, Niwitpong S-A, and Niwitpong S8 proposed a joint CI for the mean using methods such as Fiducial based on the zero-inflated gamma distribution.

When conducting comparative studies on precipitation data from different time periods or regions, it is often necessary to compare the mean differences or mean ratios between two or more zero-inflated gamma distributions. For instance, Satter and Zhao9proposed the adjusted Jackknife empirical likelihood methods to construct nonparametric CIs for the mean difference of two independent zero-inflated skewed populations. When analyzing daily precipitation data from different time periods or regions, Ren, Liu, and Pu10proposed a precise Fiducial method to construct CIs for the mean differences of multiple zero-inflated gamma distributions. Kaewprasert, Niwitpong S-A, and Niwitpong S11introduced eight Bayesian methods to construct CIs for the mean ratios of zero-inflated gamma distributions with application to the precipitation data from northern Thailand. Kaewprasert, Niwitpong S-A, and Niwitpong S12 constructed CIs and Bayesian credible intervals for the mean and mean differences of two independent zero-inflated delta gamma distributions.

However, when analyzing the differences or variations in zero-inflated semi-continuous data from different regions or time periods, the mean differences or mean ratios used in the above-mentioned studies may not effectively reflect the differences between two groups of data, as the influence of variance on the data has not been considered. Therefore, many researchers consider using the coefficient of variation (CV), a dimensionless indicator, to characterize the differences between data sets. CV, defined as the ratio of the standard deviation to the mean, is a relative measure of variability that quantifies the magnitude of the standard deviation relative to the corresponding mean. Albatineh, Kibria, and Zogheib13demonstrated that even when the measurement units of data at different time points are different, CV can be used to compare changes in population data. Ananthakrishnan and Soman14explored the relationship between the CV of precipitation series and the statistical distribution of daily precipitation. Starting from the interval estimation of CV, Puggard, Niwitpong S-A, and Niwitpong S15employed methods such as generalized CIs, MOVER, and Bayesian credible intervals to study differences between data sets. Buntao and Niwitpong S-A16 also compared two delta log-normal distributions based on CV difference.

Although there are many studies on studying the CV in various situations, there is currently few work on constructing CIs for the CV difference of zero-inflated gamma distributions with censored data. In this study, we mainly focus on studying this problem. We consider a total of eight CI construction methods, including two basic methods of Fiducial (denoted as F) and Box-Cox transformation (denoted as BC), and six MOVER methods by combining two basic methods for estimating gamma parameters and three methods for estimating the number of zero values. The Fiducial method proposed by Fisher17is widely used to construct CIs, it only requires sample information. The BC method proposed by Gao and Tian18is to construct the CI of the difference and ratio of the means of two gamma distributions. In addition, we introduce three different methods of estimating the parameters of the binomial distribution. These three methods are proposed by Li, Zou and Tian19(denoted as Li), Ren, Liu and Pu10(denoted as Ren) and Wilson20(denoted as Wilson), respectively. The MOVER method is usually used to construct CIs of parameter functions, originally proposed by Zou and Donner21. Tang22applied the MOVER method to construct CIs for effect parameter differences or ratios under stratified sampling. Li, Tang and Wong23studied the performance of the MOVER method in constructing CIs, and used different MOVER methods to construct CIs for parameter ratios in the Poisson distribution. Donner and Zou24 proved that the CI constructed using the MOVER method is quite close to the exact CI, and used it to construct the CI of the mean and standard deviation function of the normal distribution.

The rest of this paper is organized as follows. Section 2 presents the definitions of censored zero-inflated gamma distribution and the CV difference. Section 3 investigates the construction methods of CIs. Section 4 describes the simulation experiments to evaluate the performances of the proposed methods. Section 5 performs real data analysis of the monthly precipitation data from Zhengzhou and Lhasa using the proposed methods. Section 6 gives some conclusions of this study.

The coefficients of variation (CV) difference of censored zero-inflated gamma distributions

The censored zero-inflated gamma distribution

Suppose that \(X_i=(X_{i1},X_{i2},\dots ,X_{in_i}), i=1,2\), is the ith group of samples from censored zero-inflated gamma distribution as follows

$$\begin{aligned} X_i\sim g_i(x;\delta _i,a_i,b_i)=\delta _iI(x=0)+(1-\delta _i)I(x>0)f(x;DL_k,a_i,b_i), \end{aligned}$$

where \(\delta _i\) is the proportion of zero values, \(a_i\) and \(b_i\) are the shape parameter and scale parameter, respectively, \(a_i>0\) and \(b_i>0\). \(I(\cdot )\) is the indicator function, and \(f(x;DL_k,a_i,b_i)\) is the probability density function of censored gamma distribution given by

$$\begin{aligned} f(x;DL_k,a_i,b_i)=\frac{\frac{1}{b_i^{a_i}}x^{a_i-1}e^{-\frac{1}{b_i}x}}{\Gamma (a_i)}, \end{aligned}$$

where the k detection limits satisfy \(DL_1<DL_2<\dots <DL_k\).

The CV difference

Let \(N_{i(0)}=\sum _{j=1}^{n_i}I(X_{ij}=0)\), it can be easily seen that \(N_{i(0)}\sim bin(n_i,\delta _i)\), \(\delta _i\) is the binomial parameter and \(N_{i(1)}=n_i-N_{i(0)}\), \(n_{i(0)}\) and \(n_{i(1)}\) represent the numbers of zero and nonzero observed values. The mean and variance of \(X_i\) from censored zero-inflated gamma distribution are \(\textrm{E}(X_i)=(1-\delta _i)a_ib_i\) and \(\textrm{Var}(X_i)=(1-\delta _i)a_ib_i^2\), respectively. Therefore, the CV of \(X_i\) is denoted as

$$\begin{aligned} \eta _i=\textrm{CV}(X_i)=\frac{\sqrt{\textrm{Var}(X_i)}}{\textrm{E}(X_i)}=\sqrt{\frac{1}{(1-\delta _i)a_i}}. \end{aligned}$$
(1)

From (1), we can see that the CV of a censored zero-inflated gamma distributed random variable depends on the binomial parameter \(\delta _i\) and the shape parameter \(a_i\). In this paper, we aim to construct CIs for the CV difference of two independent censored zero-inflated gamma distributions as follows

$$\begin{aligned} \gamma =\eta _1-\eta _2=\sqrt{\frac{1}{(1-\delta _1)a_1}}-\sqrt{\frac{1}{(1-\delta _2)a_2}}. \end{aligned}$$

The construction methods of confidence intervals (CIs)

Due to the fact that the binomial distribution parameter \(\delta _i\) is one of the important determinants for the CV difference of censored zero-inflated gamma distributions, so we first introduce three previous estimates of \(\delta _i\) before constructing CIs.

The first one is proposed by Li, Zou and Tian19, which introduces the pivot quantity of \(\delta _i\) that can be expressed as

$$\begin{aligned} G_{\delta _i}=\frac{n_{i(0)}+\frac{Z_{W_i}^2}{2}}{n_i+Z_{W_i}^2}-\frac{Z_{W_i}}{n_i+Z_{W_i}^2}{[n_{i(0)}(1-\frac{n_{i(0)}}{n_i})+\frac{Z_{W_i}^2}{4}]}^{\frac{1}{2}}. \end{aligned}$$
(2)

where \(Z_{W_i}=\frac{N_{i(0)}-n_i\delta _i}{\sqrt{n_i\delta _i(1-\delta _i)}}\sim N(0,1)\). The distribution of \(G_{\delta _i}\) does not rely on any unknown parameters. When \(N_{i(0)}=n_{i(0)}\), the observed value of \(G_{\delta _i}\) is equal to \(\delta _i\).

The second one is proposed by Ren, Liu and Pu10, which transforms \(\delta _i\) to another form \(\tilde{\delta _i}\) and develops a new pivot quantity of \(\delta _i\). The following relationship holds,

$$\begin{aligned} 2\sqrt{n_i+2+\frac{1}{n_i}}(arcsin\sqrt{\tilde{\delta }_i}-arcsin\sqrt{\delta _i})\xrightarrow {d}N(0,1), \end{aligned}$$

where \(\tilde{\delta _i}=\frac{n_{i(0)}+0.5}{n_i+1},i=1,2\), \(\xrightarrow {d}\) stands for convergence distribution. The pivot quantity for \(\delta _i\) is

$$\begin{aligned} G_{\delta _i}=sin^2(arcsin\sqrt{{\tilde{\delta }_i}^{obs}}-\frac{Z_i}{2\sqrt{n_i+2+\frac{1}{n_i}}}), \end{aligned}$$

where \(Z_i\sim N(0,1)\) and \({\tilde{\delta }_i}^{obs}\) is the observed value of \({\tilde{\delta }_i}\). Different from the traditional approximate inference, this is an accurate parameter inference method.

The third one is proposed by Wilson20, it gives an accurate estimate for the CI of \(\delta _i\) directly, which can be expressed as

$$\begin{aligned} [\delta _{il},\delta _{iu}]=\left[\left({\hat{\delta }_i+\frac{Z_{\frac{\alpha }{2}}^2}{2n_i}-Z_{\frac{\alpha }{2}}\sqrt{\frac{\hat{\delta }_i(1-\hat{\delta }_i)}{n_i}+\frac{Z_{\frac{\alpha }{2}}^2}{4n_i^2}}}\right) /\left({1+\frac{Z_{\frac{\alpha }{2}}^2}{n_i}}\right),\left({\hat{\delta }_i+\frac{Z_{\frac{\alpha }{2}}^2}{2n_i}+Z_{\frac{\alpha }{2}}\sqrt{\frac{\hat{\delta }_i(1-\hat{\delta }_i)}{n_i}+\frac{Z_{\frac{\alpha }{2}}^2}{4n_i^2}}}\right)/\left({1+\frac{Z_{\frac{\alpha }{2}}^2}{n_i}}\right)\right], \end{aligned}$$

where \(\delta _{il}\) and \(\delta _{iu}\) are the lower and upper confidence limits for \(\delta _i\), respectively. \(\hat{\delta }_i=\frac{n_{i(0)}}{n_i}\), and \(Z_{\frac{\alpha }{2}}\) represents the upper \(\frac{\alpha }{2}\) quartiles of the standard normal distribution.

In the following parts, we study several CI construction methods for CV difference of two censored zero-inflated gamma distributions.

Fiducial method

Fiducial inference is commonly used in the construction of CIs, because it only require the sample information. Due to the fact that the cubic root of samples from gamma distribution asymptotically follow a normal distribution, Krishnamoorthy and Wang2 proposed approximate Fiducial quantities for the gamma distribution based on the normal distribution. Suppose \(X_{ij}\sim gamma(a_i,b_i),~X_{ij}>0,~j=1,2,\dots ,n_{i(1)};~X_{ij}=0,~j=n_{i(1)}+1,n_{i(1)}+2,\dots ,n_{i}\), \(i=1,2\). Let \(Y_{ij}=X_{ij}^{\frac{1}{3}},~X_{ij}>0\), and \(DL_{yl}=DL_l^{\frac{1}{3}}\), then \(Y_{ij}\) approximately follows the normal distribution \(N(\mu _i,\sigma _i^2)\), where the relationship between \(\mu _i,~\sigma _i^2\) and \(a_i,~b_i\) is

$$\begin{aligned} \mu _i={(b_ia_i)}^{\frac{1}{3}}(1-\frac{1}{9a_i}), \end{aligned}$$
(3)

and

$$\begin{aligned} \sigma _i^2=\frac{b_i^{\frac{1}{3}}}{9a_i^{\frac{1}{3}}}. \end{aligned}$$
(4)

Suppose that \(x_i=(x_{i1},x_{i2},\dots ,x_{in_{i(1)}})\) is the ith nonzero observations, the transformed data is \(y_i=(y_{i1},y_{i2},\dots ,y_{in_{i(1)}})\), and the k detection limits satisfy \(DL_{y1}<DL_{y2}<\dots <DL_{yk}\). Let \(m_{il}\) be the number of undetected samples between \(DL_{y(l-1)}\) and \(DL_{yl}\), then the total number of undetected samples is \(m_i=\sum _{l=1}^{k}m_{il}\). Let \(y_{i1},y_{i2},\dots ,y_{i(n_{i(1)}-m_i)}\) be the detected ones in observations. The sample mean and variance can be defined as

$$\begin{aligned} \overline{y}_i=\frac{1}{{n_{i(1)}-m_i}}\sum _{j=1}^{{n_{i(1)}-m_i}}y_{ij}, \ s_i^2=\frac{1}{{n_{i(1)}-m_i}-1}\sum _{j=1}^{{n_{i(1)}-m_i}}{(y_{ij}-\overline{y}_i)}^2. \end{aligned}$$

Krishnamoorthy et al25,26. proposed the log-likelihood function of normal distribution \(N(\mu _i,{\sigma _i}^2)\) with censored data after omitting constants, that is,

$$\begin{aligned} l(\mu _i,\sigma _i)\propto \sum _{l=1}^{k} m_{il}ln\Phi (z_l^*)-[n_{i(1)}-m_i]ln\sigma _i-\frac{[n_{i(1)}-m_i](s_i^2+{\overline{y}_i-\mu _i}^2)}{2\sigma _i^2}, \end{aligned}$$
(5)

where \(\Phi (\cdot )\) is the distribution function of N(0, 1), and \(z_l^*=\frac{DL_{yl}-\mu _i}{\sigma _i},l=1,2,\dots ,k\). By maximizing the above log-likelihood function, the maximum likelihood estimate (MLE) \((\hat{\mu }_i,\hat{\sigma }_i)\) of \((\mu _i,\sigma _i)\) can be obtained. In addition, suppose that \((\hat{\mu }_i^*,\hat{\sigma }_i^*)\) is the MLE of N(0, 1) with censored data, and its detection limit is \(DL_l^*=\frac{DL_{yl}-\mu _i}{\sigma _i},l=1,2,\dots ,k\). Replace \(DL_l^*\) by \(\hat{DL}_l^*=\frac{DL_{yl}-\hat{\mu }_i}{\hat{\sigma }_i}\), an approximate distribution of N(0, 1) with censored data can be obtained, where \(\frac{\hat{\mu }_i-\mu _i}{\sigma _i}\sim \hat{\mu }_i^*\) and \(\frac{\hat{\sigma }_i}{\sigma _i}\sim \hat{\sigma }_i^*\). The relationship of \({\mu }_i\), \(\hat{\mu }_i\) and \(\hat{\mu }_i^*\), as well as \({\sigma }_i\), \(\hat{\sigma }_i\) and \(\hat{\sigma }_i^*\), can be expressed as

$$\begin{aligned} \hat{\mu }_i \overset{\text {d}}{=}\mu _i+\hat{\mu }_i^* \sigma _i,~~ \hat{\sigma }_i\overset{\text {d}}{=}\hat{\sigma }_i^* \sigma _i, ~~\text {approximately}. \end{aligned}$$
(6)

Let \((\hat{\mu }_{i0},\hat{\sigma }_{i0})\) be a group of observations of \((\hat{\mu }_{i},\hat{\sigma }_{i})\), replace \((\hat{\mu }_{i},\hat{\sigma }_{i})\) with \((\hat{\mu }_{i0},\hat{\sigma }_{i0})\), and take it into (6) to solve \((\mu _i,\sigma _i)\), which can be expressed as

$$\begin{aligned} G_{\mu _i}=\hat{\mu }_{i0}-\frac{\hat{\mu }_i^*}{\hat{\sigma }_i^*}\hat{\sigma }_i,~~ G_{\sigma _i^2}=\frac{\hat{\sigma }_i^2}{{(\hat{\sigma }_i^*)}^2}. \end{aligned}$$
(7)

Take (7) into (3) and (4), we can get the Fiducial quantities of \(a_i\) and \(b_i\) in censored gamma distribution as follows

$$\begin{aligned} G_{a_i}=\frac{1}{9}\{{(1+\frac{G_{\mu _i}^2}{2G_{\sigma _i^2}})+[{(1+\frac{G_{\mu _i}^2}{2G_{\sigma _i^2}})}^2-1]^\frac{1}{2}\}}, \end{aligned}$$
(8)

and

$$\begin{aligned} G_{b_i}=27G_{a_i}^{\frac{1}{2}}{(G_{\sigma _i}^2)}^{\frac{2}{3}}. \end{aligned}$$

Once the Fiducial quantity of \(a_i\) has been obtained, we herein choose the first pivot quantity of \(\delta _i\) to give the Fiducial quantity of the CV difference of two independent censored zero-inflated gamma distributions, which can be expressed as

$$\begin{aligned} G_{\gamma }=G_{\eta _1}-G_{\eta _2}=\sqrt{\frac{1}{(1-G_{\delta _1})G_{a_1}}}-\sqrt{\frac{1}{(1-G_{\delta _2})G_{a_2}}}. \end{aligned}$$
(9)

The CI of \(100(1-\alpha )\%\) for \(\gamma\) is denoted as \([G_{\gamma ;(\frac{\alpha }{2})},G_{\gamma ;(1-\frac{\alpha }{2})}]\), where \(G_{\gamma ;(\frac{\alpha }{2})}\) and \(G_{\gamma ;(1-\frac{\alpha }{2})}\) are the \(100(\frac{\alpha }{2})\), \(100(1-\frac{\alpha }{2})\) quartiles for \(G_\gamma\)’s distribution, respectively.

The algorithm of Fiducial method to estimate the CI of the CV difference of two independent censored zero-inflated gamma distributions is as follows

Algorithm 1:

Step 1: Determine \(n_{i(0)}\) and \(n_{i(1)}\), where \(n_{i(0)}\) is a realization of \(N_{i(0)}\sim bin(n_i,\delta _i),~i=1,2,\) calculate \(G_{\delta _i}\) using (2);

Step 2: For a given sample of size \(n_{i(1)}\) with detection limits \(DL_1<DL_2<\dots <DL_k\), calculate the MLEs \(\hat{\mu }_{i0}\) and \(\hat{\sigma }_{i0}\) based on the cube root transformed sample and cube root transformed detection limits;

Step 3: Let \(\hat{DL}_l^*=\frac{DL_L^{\frac{1}{3}}-\hat{\mu }_i}{\hat{\sigma }_i}, ~l=1,2,\dots ,k\), be the detection limits to generate a sample of size \(n_{i(1)}\) which follows censored N(0, 1);

Step 4: Calculate the MLEs \(\hat{\mu }_i^*\) and \(\hat{\sigma }_i^*\) of censored N(0, 1), and calculate \(G_{(\mu _i)}\) and \(G_{(\sigma _i)}\) using (7);

Step 5: Calculate \(G_{a_i}\) using (8);

Step 6: Calculate \(G_\gamma\) using (9);

Step 7: Repeat Steps 2-6 for 1000 times;

Step 8: Calculate the \(100(1-\alpha )\%\) CI for \(\gamma\), and denote it as \([l_{f},u_{f}]\).

Box-Cox (BC) transformation method

The Box-Cox (BC) transformation is a classic statistical methodology to stabilize the variance and/or normalize the distribution of variables. For samples distributed from the gamma distribution, Gao and Tian18 proposed to convert the data into a normal distribution by Box-Cox transformation as follows

$$\begin{aligned} Y_{ij}={X_{ij}^{(\lambda _i)}}= {\left\{ \begin{array}{ll} \frac{{X_{ij}}^{\lambda _i}-1}{\lambda _i},~~ \lambda _i>0,\\ {\log (X_{ij})},~~ \lambda _i=0. \end{array}\right. } \end{aligned}$$

where \(X_{ij}\sim gamma(a_i,b_i)\) and \(Y_{ij}\sim N(\mu _i,\sigma _i^2)\), for \(i=1,2\) and \(j=1,\dots ,n_{i(1)}\). The estimation of the Box-Cox transformation parameter \(\lambda _i\) can be obtained by the MLE method, denoted as \(\hat{\lambda _i}\). Under two different values of \(\hat{\lambda _i}\), the parameters in gamma distribution exhibit different relationships with the corresponding normal distribution.

If \(\hat{\lambda }_i\ne 0\), the relationships can be expressed as

$$\begin{aligned} \frac{\Gamma (G_{a_i}+\hat{\lambda }_i){G_{b_i}}^{\hat{\lambda }_i}}{\Gamma (G_{a_i})}=\hat{\lambda }_iG_{\mu _i}+1, \end{aligned}$$
(10)

and

$$\begin{aligned} \frac{\Gamma (G_{a_i}+2\hat{\lambda }_i){G_{b_i}}^{2\hat{\lambda }_i}}{\Gamma (G_{a_i})}=(G_{\mu _i}^2+G_{\sigma _i^2})\hat{\lambda }_i^2+2\hat{\lambda }_iG_{\mu _i}+1. \end{aligned}$$
(11)

If \(\hat{\lambda }_i=0\), the relationships can be expressed as

$$\begin{aligned} G_{\mu _i}=\psi (G_{a_i})+\log (G_{b_i}), \end{aligned}$$
(12)

and

$$\begin{aligned} G_{\sigma _i^2}=\psi ^{(1)}(G_{a_i}). \end{aligned}$$
(13)

where \(\Gamma (\cdot )\) is the gamma function, \(\psi (\cdot )\) is the digamma function, and \(\psi ^{(1)}(\cdot )\) is the trigamma function.

Similarly, by maximizing (5), we can obtain the MLEs \(\hat{\mu }_i\) and \(\hat{\sigma }_i\) of the transformed parameters \(\mu _i\) and \(\sigma _i\). Then calculate the MLEs \(\hat{\mu }_i^*\) and \(\hat{\sigma }_i^*\) from censored N(0, 1), the estimator of detection limit is \(\hat{DL}_l^*=\frac{DL_{yl}-\hat{\mu }_i}{\hat{\sigma }_i},~l=1,2,\dots ,k\). Let \((\hat{\mu }_{i0},\hat{\sigma }_{i0})\) be a group of observed values of \((\hat{\mu }_i, \hat{\sigma }_i)\), take \((\hat{\mu }_{i0},\hat{\sigma }_{i0})\) and \((\hat{\mu }_i^*,\hat{\sigma }_i^*)\) into (7) to solve the Fiducial quantity \((G_{\mu _i},G_{\sigma _i^2})\). Based on the value of \(\hat{\lambda }_i\), the Fiducial quantity \(G_{(a_i)}\) of censored gamma distribution can be obtained.

For the binomial distribution part, we choose the quantity \(G_{(\delta _i)}\) in (2). Similarly, use (9) to obtain the quantity \(G_\gamma\) of CV difference and its \(100(1-\alpha )\%\) CI of two independent censored zero-inflated gamma distributions.

The algorithm of Box-Cox transformation method to estimate the CI of the CV difference of two independent censored zero-inflated gamma distributions is as follows

Algorithm 2:

Step 1: Determine \(n_{i(0)}\) and \(n_{i(1)}\), where \(n_{i(0)}\) is a realization of \(N_{i(0)}\sim bin(n_i,\delta _i),~i=1,2,\) calculate \(G_{\delta _i}\) using (2);

Step 2: For a given sample of size \(n_{i(1)}\) with detection limits \(DL_1<DL_2<\dots <DL_k\), calculate the MLEs \(\hat{\mu }_{i0}\) and \(\hat{\sigma }_{i0}\) based on the Box-Cox transformed sample and Box-Cox transformed detection limits;

Step 3: Let \(\hat{DL}_l^*=\frac{DL_L^{\frac{1}{3}}-\hat{\mu }_i}{\hat{\sigma }_i}, ~l=1,2,\dots ,k\), be the detection limits to generate a sample of size \(n_{i(1)}\) which follows censored N(0, 1);

Step 4: Calculate the MLEs \(\hat{\mu }_i^*\) and \(\hat{\sigma }_i^*\) of censored N(0, 1), and calculate \(G_{(\mu _i)}\) and \(G_{(\sigma _i)}\) using (7);

Step 5: Calculate \(G_{a_i}\) using (10) and (11) when \(\hat{\lambda }_i\ne 0\), or using (12) and (13) when \(\hat{\lambda }_i=0\);

Step 6: Calculate \(G_\gamma\) using (9);

Step 7: Repeat Steps 2-6 for 1000 times;

Step 8: Calculate the \(100(1-\alpha )\%\) CI for \(\gamma\), and denote it as \([l_{BC},u_{BC}]\).

MOVER method

The method of MOVER has been proposed as a technique for constructing CIs for the functions of parameters, such as the difference or ratio between two parameters. In MOVER, the lower and upper confidence limits for \(\theta _1+\theta _2\) are given as follows

$$\begin{aligned} L=\hat{\theta }_1+\hat{\theta }_2-\sqrt{{(\hat{\theta }_1-l_1)}^2+{(\hat{\theta }_2-l_2)}^2}, \end{aligned}$$
(14)

and

$$\begin{aligned} U=\hat{\theta }_1+\hat{\theta }_2-\sqrt{{(u_1-\hat{\theta }_1)}^2+{(u_2-\hat{\theta }_2)}^2}. \end{aligned}$$
(15)

Similarly, the lower and upper confidence limits for \(\theta _1-\theta _2\) are given as follows

$$\begin{aligned} L^{'}=\hat{\theta }_1-\hat{\theta }_2-\sqrt{{(\hat{\theta }_1-l_1)}^2+{(u_2-\hat{\theta }_2)}^2}, \end{aligned}$$
(16)

and

$$\begin{aligned} U^{'}=\hat{\theta }_1-\hat{\theta }_2+\sqrt{{(u_1-\hat{\theta }_1)}^2+{(\hat{\theta }_2-l_2)}^2} \end{aligned}$$
(17)

\(\hat{\theta }_1\) and \(\hat{\theta }_2\) are independent point estimators, \((l_1, u_1)\) and \((l_2, u_2)\) represent the \(100(1-\alpha )\%\) CIs for \(\theta _1\) and \(\theta _2\), respectively.

For the CV of censored zero-inflated gamma distribution, we can perform a log-transformation, \(\log (\eta _i)=\log (\sqrt{\frac{1}{(1-\delta _i)a_i}})=-\frac{1}{2}[\log (1-\delta _i)+\log (a_i)]\). Suppose \(\theta _{i1}\triangleq \log (1-\delta _i)\) and \(\theta _{i2}\triangleq \log (a_i)\), we can obtain the CI for \(\log (\eta _i)\) according to (14) and (15), and \(\eta _i=\exp [-\frac{1}{2}(\theta _{i1}+\theta _{i2})]\). For the CV difference of two independent censored zero-inflated gamma distribution, we can obtain the MLE of two CVs , \(\hat{\eta }_i=\sqrt{\frac{1}{(1-\hat{\delta }_i)\hat{a}_i}}\), \(i=1,2\), then the CI for CV difference can be calculated using (16) and (17).

Consequently, there are two methods (F and BC) to estimate the shape parameter \(a_i\), and three methods (Li, Ren and Wilson) to estimate the binomial distribution parameter \(\delta _i\). Therefore, we can obtain six MOVER combination methods for constructing CIs of the CV difference.

Simulation experiments

In order to evaluate the performances of the proposed methods, we conduct Monte Carlo simulations to compare the CI construction methods according to the following measurements.

Coverage probability (CP): the ratio of true parameters falling within the CIs.

Average length (AL): the average length of all estimated CIs.

Left error rate (\(ER_L\)): the percentage of a true parameter of interest falls to the left of lower confidence limits (\(L_{cl}\)).

Right error rate (\(ER_R\)): the percentage of a true parameter of interest falls to the right of upper confidence limits (\(U_{cl}\)).

The corresponding formulas are listed as follows

$$\begin{aligned} CP= & \frac{C(L_{cl}\le \gamma \le U_{cl})}{t},\\ AL= & \frac{\sum _{i=1}^{t}(U_{{cl}_i}-L_{{cl}_i})}{t},\\ ER_L= & \frac{C(\gamma \le L_{cl})}{t},\\ ER_R= & \frac{C(U_{cl} \le \gamma )}{t}, \end{aligned}$$

where \(C(L_{cl}\le \gamma \le U_{cl})\) denotes the number of experiments where the CV difference \(\gamma\) falls within the CIs, and t represents the total number of experiments, which is set to be 1,000. In the evaluation of CI performance, the preferred CIs are those that exhibit coverage probabilities equal to or exceeding the nominal level, while also maintaining relatively short interval lengths. Additionally, the left and right tail error rates of the CIs are considered, with a general preference for CIs that exhibit balanced tail error rates.

In the simulation study, we set the sample size pair \((n_1,n_2)\) to be (10, 10), (10, 50), (50, 50), (50, 132), and (132, 132). For the zero-inflated components, we assume that the proportions of zero values in both groups are roughly equal. We consider two scenarios for the proportion of zero values: a smaller proportion \((\delta _1, \delta _2)\) of (0.1, 0.1) and a larger proportion \((\delta _1, \delta _2)\) of (0.3, 0.3). Similarly, we vary the shape parameters \((a_1,a_2)\), considering both the same and different values, including combinations such as (2, 2), (2, 5), and (5, 5). The confidence level \(\alpha\) is set to 0.05, and detection limits are set at 0.1 and 0.2. The number of simulation trials is set to 1000 (\(t = 1000\)). As in Wang et al4., the scale parameters are set to 1 for all scenarios.

This paper introduces two fundamental construction methods, Fiducial (F) and Box-Cox transformation (BC), as well as six methods derived from combining MOVER. Tables 1 and 2 display the coverage probabilities, average lengths, left error rate and right error rate of these eight methods under different parameter settings when the detection limit \(DL=0.1\). Tables 3 and 4 present the results when the detection limit \(DL=0.2\). The values in bold in the tables correspond to the method with the coverage probability closest to the confidence level of 0.95 for each parameter set.

The simulation results are all implemented in R, and it can be concluded from the results:

  1. 1)

    Whether the proportion of zero values is small or large, the CIs based on Fiducial, Box-Cox, and MOVER closely approach the \(100(1-\alpha )\%\) confidence level in terms of coverage probability, indicating the suitability of these methods for zero-inflated models.

  2. 2)

    Under both scenarios of detection limit \(DL=0.1\) and \(DL=0.2\), the aforementioned methods have yielded satisfactory coverage probabilities for CIs, indicating their capability to effectively handle censored data.

  3. 3)

    When the detection limit is set to 0.1, indicating a lower degree of information loss, the coverage probabilities obtained from the Box-Cox transformation and its associated MOVER method approximate the \(100(1-\alpha )\%\) confidence level more closely. Conversely, when the detection limit is 0.2, signifying a higher degree of information loss, the coverage probabilities obtained from the Fiducial method and its associated MOVER approach exhibit superior performance.

  4. 4)

    In cases of smaller sample sizes, the Fiducial method and its associated MOVER approach achieve relatively shorter average lengths. As the sample size increases, the Box-Cox method and its associated MOVER approach yield relatively shorter average lengths.

  5. 5)

    All the aforementioned methods are capable of produce CIs that maintain a balance between left error rate and right error rate.

  6. 6)

    In terms of computational efficiency, the MOVER method has relatively higher computational costs, while the Fiducial method exhibits the highest computational efficiency.

To compare the computational time of different methods, we select the sample size \(n_1:n_2=50:132\). The comparison is conducted under two scenarios: one with a lower proportion of zero values (\(\delta _1=\delta _2=0.1\)) and the other with a higher proportion of zero values (\(\delta _1=\delta _2=0.3\)). The average time consumption (in seconds) of different methods is compared for different shape parameters, as shown in Figure 1. The Fiducial method and its associated MOVER method consume less average time than Box-Cox and its associated MOVER method. Taking \(\delta _1=\delta _2=0.1, a_1=2,a_2=5\) as an example, the average time wasted by F, F-Li, F-Ren, and F-Wilson are 0.2501, 0.2724, 0.2537, and 0.2532, respectively. The average time wasted by BC, BC-Li, BC-Ren, and BC-Wilson are 0.7328, 0.7310, 0.7307, and 0.7301, respectively.

Fig. 1
figure 1

Time comparison of all the methods. a: under a lower proportion of zero values (\(\delta _1=\delta _2=0.1\)); b: under a higher proportion of zero values (\(\delta _1=\delta _2=0.3\)).

Table 1 CP, AL, and tail error rates of all methods (\(\delta _1=\delta _2=0.1\), \(DL=0.1\)).
Table 2 CP, AL, and tail error rates of all methods (\(\delta _1=\delta _2=0.3\), \(DL=0.1\)).
Table 3 CP, AL, and tail error rates of all methods (\(\delta _1=\delta _2=0.1\), \(DL=0.2\)).
Table 4 CP, AL, and tail error rates of all methods (\(\delta _1=\delta _2=0.3\), \(DL=0.2\)).

Empirical analysis

In order to assess the practical applicability of the proposed methods for constructing CIs, an investigation is conducted using the monthly rainfall data obtained from the National Meteorological Science Data Centers in Zhengzhou and Lhasa, including the period spanning 2012 to 2022. The density curves of the monthly rainfall data for these two regions are illustrated in Figure 2(a). The datasets for Zhengzhou and Lhasa exhibit a considerable number of zero values, while the nonzero part displays values below the detection limit of 1. To test whether the precipitation data follow independent zero-inflated gamma distribution approximately, we examine the independence and distribution characteristics of the two datasets. For the independence of the two groups of precipitation data, we calculate both the joint empirical distribution function and the individual empirical distribution functions, then randomly generate 100 data points within the actual range of precipitation and compute the absolute difference between the joint empirical distribution function and the product of the respective empirical distribution functions. The result shows that the average absolute difference is only 0.0186, so we can conclude that the two real datasets are approximately independent. For the specific distribution of the two datasets, we present histograms of the non-zero parts of each dataset and overlay the density functions of the potential gamma distribution, which are shown in Figure 2(b-c). It is intuitive that the data exhibit characteristics of gamma distribution. Furthermore, we include Q-Q plots for both of the non-zero parts of the precipitation data, which are shown in Figure 2(d-e). The points in the Q-Q plots are closely aligned along the two sides of the reference line, suggesting that they can be considered to follow a gamma distribution. Additionally, the censored data conforms well to a gamma distribution. Thus, the two datasets can be regarded as two independent zero-inflated gamma distributions with censored data. On the other hand, to assess the goodness of fit for various common distributions to the nonzero and detectable data, we calculate the Akaike Information Criterion (AIC) values, the results are presented in Table 5. Upon comparison, it is observed that the gamma distribution yielded the lowest AIC value among all the distributions considered. This finding suggests that the gamma distribution provides a favorable fit to the nonzero and detectable data.

Fig. 2
figure 2

Description of precipitaion data in two regions. a: Density curves for precipitaion data in Zhengzhou and Lhasa; b: Histogram with gamma density of precipitation in Zhengzhou; c: Histogram with gamma density of precipitation in Lhasa; d: Q-Q plot of precipitation in Zhengzhou; e: Q-Q plot of precipitation in Lhasa.

In addition, we utilize the “goft” package in the R programming language for data fitting and parameter estimation. When testing whether the data follows a gamma distribution using the “goft” package, the gamma test function, with the null hypothesis assuming that the data follows a gamma distribution, is commonly employed. If the resulting p-value is greater than the pre-specified significance level, it indicates the inability to reject the null hypothesis, suggesting that the data follows a gamma distribution. Conversely, if the resulting p-value is less than or equal to the significance level, it implies rejecting the null hypothesis, indicating that the data does not follow a gamma distribution. When utilizing the “goft” package to estimate the specific parameter values of the gamma distribution, the gamma fitting function, is commonly employed.

The results show that the p-values associated with the nonzero detectable rainfall datasets in both Zhengzhou and Lhasa are 0.4429 and 0.1704, respectively, which imply that the data of two cities follow gamma distributions. Subsequent estimation utilizing the gamma fitting function yields parameter estimations of (0.62, 90.50) for Zhengzhou and (0.51, 102.3717) for Lhasa. When juxtaposed with the AIC values provided in Table 5, these outcomes affirm that the zero-inflated gamma distribution with censored data, stands as the most suitable framework for the data analysis.

Table 5 AICs of different distributions to fit precipitation data in Zhengzhou and Lhasa.

For the data from these two cities, the maximum likelihood estimates for each parameter are \(\delta _1=0.023\), \(a_1\)=0.62, \(\delta _2=0.242\), \(a_2\)=0.51. The CV difference between the two data sets is 0.3340365. Empirical research is conducted using two basic methods, Fiducial and Box-Cox transformation methods, as well as various combinations of the MOVER method. The CIs and average lengths obtained are presented in Table 6.

Table 6 The confidence interval and interval length obtained by all methods.

We can observe that all the CIs indicate that the precipitation in Zhengzhou is greater than that in Lhasa, and they all encompass the true CV difference between the monthly rainfall data of Zhengzhou and Lhasa. It is noticeable that the intervals obtained by BC and its MOVER methods are shorter than those obtained by F and its MOVER methods. From the simulation experiments, we can attribute this to the larger sample size, making the intervals obtained by BC and its MOVER methods more suitable in this scenario.

Conclusion

In this study, two basic methods including cube root transformation and Box-Cox transformation are considered, along with six methods constructed by combining MOVER. In previous research scenarios, these methods are commonly applied to gamma distribution or zero-inflated gamma distribution. This paper extends the method of constructing CIs using cube root transformation to study the censored data. Furthermore, we propose interval estimation of CV difference based on Box-Cox transformation for censored zero-inflated gamma distribution. Through Monte Carlo simulation experiments, we conclude that the performances of all these CIs are compared, resulting in satisfactory outcomes.

The simulation study suggests that different scenarios call for different approaches, and there is no one-size-fits-all method suitable for varying sample sizes, proportions of zero values, and different detection limits. However, the results of constructing CIs using the BC method in this paper are more satisfactory compared to other CIs on the whole. Specifically, BC CIs demonstrate coverage probabilities very close to the confidence level, particularly in scenarios with moderate to large sample sizes. From the perspective of interval length, the basic methods BC and F CIs are more satisfactory than the MOVER CIs. Additionally, basic methods are more time-efficient compared to the MOVER methods.

In practical applications, if the sample size is small, there is a high proportion of zero values, and there are time constraints, the F method may be preferred. However, if the sample size is moderate to large and there is a significant proportion of zero values, then the BC method proposed in this paper is the best choice.