Introduction

Breastfeeding is feeding a mother’s breast milk to her infant directly from the breast or by expressing (or pumping) the milk and bottle-feeding it to the infant. This method provides the ideal nutrition for infants and significantly contributes to their health, growth, and development. Additionally, breastfeeding benefits the mother’s health and fosters a strong emotional bond between mother and baby1. The World Health Organization recommends exclusive breastfeeding for the first six months of life, followed by continued breastfeeding and appropriate complementary foods for up to two years or beyond. Breastfeeding data is essential for several reasons:

  1. 1.

    Infant health: breastfeeding data helps understand infants’ health benefits. Breastfed babies have lower risks of various conditions, including asthma, obesity, type 1 diabetes, severe lower respiratory diseases, acute otitis media (ear infections), sudden infant death syndrome (SIDS), gastrointestinal infections (such as diarrhea and vomiting), necrotizing enterocolitis (NEC) in preterm infants, malnutrition, and nephrotic syndrome1,2,3,4,5.

  2. 2.

    Maternal health: breastfeeding can also provide health benefits for mothers, including a reduced risk of hypertension, type 2 diabetes, ovarian cancer, and breast cancer2.

  3. 3.

    Health care costs: Breastfeeding data can help understand and reduce health care costs. For example, if 90% of families breastfed exclusively for six months, the United States would save $2.2 billion per year on medical costs6,7.

  4. 4.

    Public health policy: breastfeeding data shapes public health policies. It helps develop interventions and programs that promote breastfeeding, especially in communities with low breastfeeding rates8.

  5. 5.

    Research: breastfeeding data is also essential for research purposes. It can be used to study the effects of breastfeeding on various health outcomes and to identify factors that influence breastfeeding initiation and duration9.

  6. 6.

    Education and support: breastfeeding data can help to identify areas where more education and support are needed to encourage breastfeeding10.

Several factors influence the cessation of exclusive breastfeeding, including difficulties with breastfeeding, concerns about infant weight gain, illnesses affecting mothers or infants, the need for medication, misperceptions about milk supply, younger maternal age, lower levels of maternal education, unplanned pregnancies, employment outside the home, and a lack of emotional support, among others1,5,11,12,13,14,15.

Exclusive breastfeeding rates widely vary beyond six months. For instance, they are only 2% in Bulgaria, 3.7% in Poland, and 1% in Greece, Finland, and the United Kingdom. In contrast, some countries meet international recommendations, including Kyrgyzstan at 56.1%, Georgia at 54.8%, Croatia at 52.4%, Slovakia at 49%, and Hungary at 44%16.

However, the literature has documented that there are critical periods for breastfeeding. Changes in breastfeeding behavior among infants at 0, 1, 3.5, and 5 months. These periods must be incorporated into the analysis of the data in some way. Many babies stopped breastfeeding for various reasons. This aligns with findings by Gianni et al.11, which indicated that 63% of Spanish mothers surveyed reported experiencing difficulties within the first month postpartum12. Similarly, a study on Taiwanese women showed that only 29.3% maintained exclusive breastfeeding before the third month13. Furthermore, only 38.4% of Saudi Arabian women breastfed exclusively for six months or more, with many ceasing prior to that mark14. Furthermore, because lactation studies often cannot be considered over such a long period, it is common for this type of study to end before all mothers in the study have weaned, making some of the data subject to right-censoring. In summary, breastfeeding data is crucial for monitoring the health of infants and mothers, informing public health policy, guiding research, and supporting breastfeeding education and promotion efforts; in this article, we will focus on point 5.

Thus, the survival analysis is an ad hoc tool to analyze data related to the time of the occurrence until an event of interest, typically censored. In this line, many distributions have been used in the literature. A key distribution used in this context is the Weibull (W) model, both because of the simplicity of its survival and risk functions and because its risk function can be monotonic (increasing, decreasing, or constant) depending only on the shape parameter. The corresponding cumulative distribution (cdf) and probability density function (pdf) for \(T\sim \text{ W }(\gamma ,\alpha )\) are defined by

$$\begin{aligned} g(t;\alpha ,\gamma ) = \frac{\alpha t^{\alpha -1}}{\gamma ^{\alpha }}\exp {\left\{ -\left( \frac{t}{\gamma }\right) ^\alpha \right\} }\quad \text{ and } \quad G(t) = 1-\exp {\left\{ -\left( \frac{t}{\gamma }\right) ^\alpha \right\} }, \end{aligned}$$

respectively and \(t>0\), \(\alpha ,\gamma >0\). Other models that appear recurrently in the literature (and especially in classical books, for example,17) associated with survival analysis are the log-normal and log-logistic models. However, the log-normal model has a non-monotone hazard function, while the logistic model can only assume decreasing monotone hazard functions. The problem is that all of these models do not allow for breaking points to be incorporated into the analysis, as discussed above for our weaning problem. An interesting alternative to these models, and considering this issue proposed more than 50 years ago, is the piecewise exponential (PE) model proposed by Feigl and Zelen18. Similar to the exponential model, this distribution has the characteristic of having a constant hazard function, not constant in \((0, \infty )\), but rather constant in each of L defined intervals. Specifically, its hazard function is given by

$$\begin{aligned} h(t; {\varvec{\lambda }}\mid {\varvec{a}})=\lambda _l, \quad \text {for} \ t \in [a_{l-1},a_l), \quad l=1,\ldots ,L, \end{aligned}$$
(1)

where \(0=a_0<a_1<\cdots<a_{L-1} < a_L=\infty\). The corresponding cdf and pdf are given by

$$\begin{aligned} R(t; {\varvec{\lambda }} \mid {\varvec{a}})&=1-\exp \left( -\sum _{l=1}^L \lambda _l \nabla _l(t)\right) , \quad \text{ and } \end{aligned}$$
(2)
$$\begin{aligned} r(t; {\varvec{\lambda }} \mid {\varvec{a}})&=\kappa _l \lambda _l \exp (-\lambda _l (t-a_{l-1})), \quad \text {for} \ t \in [a_{l-1},a_l), \quad l=1,\ldots ,L, \end{aligned}$$
(3)

respectively, with

$$\begin{aligned} \kappa _l= \left\{ \begin{array}{ll} 1 & \text{, } \text{ if } l=1, \\ \exp \left( -\displaystyle \sum _{i=1}^{l-1} \lambda _i(a_i-a_{i-1})\right) & \text{, } \text{ if } l=2,\ldots ,L, \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} \nabla _l(t)= \left\{ \begin{array}{lll} 0 & \text {,~if } t<a_{l-1}, \\ t-a_{l-1} & \text {,~if } a_{l-1}\le t<a_l, ~~~~l=1, \ldots , L,\\ a_l-a_{l-1} & \text {,~if } t \ge a_l. \end{array} \right. \end{aligned}$$

We use the notation \(T\sim PE({\varvec{\lambda }},{\varvec{a}})\) to denote the PE distribution with vector of parameters \({\varvec{\lambda }}=(\lambda _1,\ldots ,\lambda _L)^\top\) (\(\top\) indicates the transpose) and time partition \({\varvec{a}}= (a_0,a_1,\ldots ,a_L)^\top\). The case with covariates was introduced in Friedman19. One of the limitations of the PE model is that its hazard function is constant in each interval. An extension of the PE model solving this problem, using the power model20, was developed by Gómez et al.21 and extended subsequently for the interval-censored case in dos Santos and Schneider22. Along these lines, we seek to propose new models that also allow these breakpoints to be incorporated into the analysis and that are more flexible regarding the hazard function within each interval. This can be developed by creating a new family of distribution following the idea of Alzaatreh et. al.23. Different models can be found in the literature using this family distribution24,25. For our purpose, the Weibull distribution is going to be used, and as a baseline, the PE distribution, then a new distribution is going to be developed in the next sections.

The method proposed by Alzaatreh et al.23, adapted to our context of positive random variables, can be summarized as following. Let T be a random variable with pdf and cdf r(t) and R(t), respectively, and X be a continuous random variable with pdf g(x) defined on \((0,\infty )\). Also, let W(R(t)) be a function of the cdf R(t) of any random variable X so that W(R(t)) satisfies the following conditions:

  1. a)

    \(W(R(t)) \in (0,\infty )\), \(\forall t>0\);

  2. b)

    W(R(t)) is differentiable and monotonically non-decreasing, \(\forall t>0\);

  3. c)

    \(W(R(t)) \rightarrow 0\) as \(t \rightarrow 0\) and \(W(R(t)) \rightarrow \infty\) as \(t \rightarrow \infty\).

The cdf of a new family of distribution is defined as

$$\begin{aligned} F(t) = \int _{0}^{W(R(t))} g(x) dx = G(W(R(t))). \end{aligned}$$
(4)

The pdf related to Eq. (4) is

$$\begin{aligned} f(t) = \left[ \frac{d}{dt} W(R(t))\right] r(t) g[W(R(t))]. \end{aligned}$$

Motivated by the breastfeeding problem, we propose to consider the PE distribution for r(t) and R(t), henceforth denotes by \(r(t;{\varvec{\lambda }}\mid {\varvec{a}})\) and \(R(t;{\varvec{\lambda }}\mid {\varvec{a}})\), respectively, providing new classes of piecewise models.

The remainder of this paper is organized as follows. In “The new models”, the new model is discussed, and the properties of the new distribution are presented. Estimation using the maximum likelihood (ML) method, the inference used for this model is presented in “Inference”. A simulation study is presented in “Simulation study”, and real-data applications are reported in “Real data applications”. Finally, concluding remarks are given in “Conclusions”.

The new models

For the first time in the literature, we propose to investigate piecewise extensions for the T-X family of distributions considering \(T \sim \text{ PE }({\varvec{\lambda }},{\varvec{a}})\) and \(X \sim \text{ W }(1, \alpha )\) (Bourguignon et al26), i.e., the Weibull distribution with scale and shape parameters 1 and \(\alpha\), respectively. We will focus on this combination of distributions because they provide a piecewise version of the Weibull model, which is a model that has been subject of many studies (see, e.g., Alzaatreh and Ghosh,24; Thair et al.25). We will name this models as Weibull piecewise exponential (WPE) distributions. However, depending as the choice of \(W(\cdot )\), we propose two cases which are detailed below.

Case 1: \(W(R(t)) = \dfrac{R(t)}{1-R(t)}\)

In this case, the cdf of the new model is

$$\begin{aligned} F(t; \varvec{\lambda }, \alpha \, |\, \varvec{a}) = 1- \exp \left\{ -\left( \frac{R(t; \varvec{\lambda }\, |\, \varvec{a})}{1- R(t; \varvec{\lambda }\, |\, \varvec{a})} \right) ^{\alpha } \right\} , \quad t>0, \end{aligned}$$
(5)

where \(R(\cdot ; \varvec{\lambda }\, |\, \varvec{a})\) is the cdf for the PE model described in Eq. (2). We will say \(T\sim \text{ WPE1 }(\varvec{\lambda }, \alpha )\) to refer to a random variable with cdf as in (5). The corresponding pdf is given by

$$\begin{aligned} f(t; \varvec{\lambda }, \alpha \, |\, \varvec{a}) = \frac{\alpha r(t; \varvec{\lambda }\, |\, \varvec{a}) \bigl [R(t; \varvec{\lambda }\, |\, \varvec{a}) \bigr ]^{\alpha -1}}{\bigl [1 -R(t; \varvec{\lambda }\, |\, \varvec{a}) \bigr ]^{\alpha +1 }} \, \exp \left\{ - \left( \frac{R(t; \varvec{\lambda }\, |\, \varvec{a})}{1 - R(t; \varvec{\lambda }\, |\, \varvec{a})} \right) ^{\alpha } \right\} , \quad t>0. \end{aligned}$$

The quantile function, Q(p), \(0< p < 1\), is given by

$$\begin{aligned} Q(p) = R^{-1} \biggl (\frac{[-\log (1 - p)]^{{1}/{\alpha }}}{1+[-\log (1 - p)]^{{1}/{\alpha }}};\,\ \varvec{\lambda }\,|\,\varvec{a} \biggr ). \end{aligned}$$

The hazard function can be expressed as,

$$\begin{aligned} h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a}) = \frac{\alpha \, r(t; \varvec{\lambda }\, |\, \varvec{a}) \bigl [R(t; \varvec{\lambda }\, |\, \varvec{a})\bigr ]^{\alpha -1}}{ \bigl [1 -R(t; \varvec{\lambda }\, |\, \varvec{a})\bigr ]^{\alpha +1 }}, \quad t>0. \end{aligned}$$
(6)

The behavior of the hazard function for the WPE1 model is described in the next proposition.

Proposition 1

The hazard function for the WPE1 model is increasing, constant, or decreasing within each interval defined by the partition time for \(\alpha >1\), \(\alpha =1\), and \(\alpha <1\), respectively, except at most in an interval, where it will be non-monotonic.

Proof

The derivative of the logarithm in Eq. (6) is given by

$$\begin{aligned} \frac{\partial \log h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})}{\partial t}=\frac{r'(t; \varvec{\lambda }\, |\, \varvec{a})}{r(t; \varvec{\lambda }\, |\, \varvec{a})}+r(t; \varvec{\lambda }\, |\, \varvec{a})\left[ \frac{\alpha -1+2R(t; \varvec{\lambda }\, |\, \varvec{a})}{R(t; \varvec{\lambda }\, |\, \varvec{a})[1-R(t; \varvec{\lambda }\, |\, \varvec{a})]}\right] , \end{aligned}$$

where \(r'(t; \varvec{\lambda }\, |\, \varvec{a})=\partial r(t; \varvec{\lambda }\, |\, \varvec{a})/\partial t=-\kappa _l \lambda _l^2 \exp (-\lambda _l(t-a_{l-1}))I^{(t)}_{[a_{l-1},a_l)}\) and \(I_{A}^{(t)}\) denotes the indicator function. Therefore, \(r'(t; \varvec{\lambda }\, |\, \varvec{a})/r(t; \varvec{\lambda }\, |\, \varvec{a})=-\lambda _l I^{(t)}_{[a_{l-1},a_l)}\), for \(l=1,\dots ,L\). In addition, if \(t\in [a_{l-1},a_l)\) we also can write \(r(t; \varvec{\lambda }\, |\, \varvec{a})=\lambda _l C_{l}\exp (-\lambda _l t)\) and \(R(t; \varvec{\lambda }\, |\, \varvec{a})=1-C_{l}\exp (-\lambda _l t)\), with \(C_{l}=\kappa _l \exp (\lambda _l a_{l-1})\). With those notations and considering \(t \in [a_{l-1},a_l)\), we obtain

$$\begin{aligned} \frac{\partial \log h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})}{\partial t}=-\lambda _l+\frac{\lambda _l [\alpha -1+2(1-C_{l}e^{-\lambda _l t})]}{(1-C_{l}e^{-\lambda _l t})}. \end{aligned}$$

Therefore, the solution for \(\partial \log h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})/\partial t=0\) is given by \(t_l=-\frac{1}{\lambda _l}\log \left( \frac{\alpha }{C_l}\right)\). Note that \(t_l \in [a_{l-1},a_l)\), if \(\kappa _l\exp (-\lambda _l(a_l-a_{l-1})) < \alpha \le \kappa _l\). As \(\kappa _1<\kappa _2<\ldots <\kappa _L\), then: i) for a single \(l=1,\ldots ,L\), it can be fulfilled that if \(\kappa _l\exp (-\lambda _l(a_l-a_{l-1})) < \alpha \le \kappa _l\) or; ii) for no l is it true that if \(\kappa _l\exp (-\lambda _l(a_l-a_{l-1})) < \alpha \le \kappa _l\). The first case implies that hazard function is monotonic within all intervals, except in one, whereas the second case implies that the hazard function is monotonic within all the intervals. \(\square\)

Figure 1 shows the hazard function for different time partitions. Note that when \(\alpha <1\), the function is decreasing in both partitions, and if \(\alpha > 1\), the function has different behavior because in the first time partition, the function is increasing, and in the second partition, it is a parabola. It is possible to note that in the first time partition, the function is decreasing and in the second time partition, the function is increasing. \({\varvec{\lambda }}\) value plays the role of scale parameter. The figures were made with the statistical software R27.

Figure 1
figure 1

Hazard function for WPE1 model and different time partitions. Left panel: \(L=2\) and \(a_1=2\) with different values of \(\lambda _1, \lambda _2\) and \(\alpha\). Right panel: \(L=3, a_1=1\) and \(a_2=3\) with various values of \(\lambda _1, \lambda _2, \lambda _3\) and \(\alpha\).

Case 2: \(W(R(t))=-\log \left( 1-R(t)\right)\)

In this case, the cdf of the new model is

$$\begin{aligned} F(t; \varvec{\lambda }, \alpha \, |\, \varvec{a}) = 1 - \exp \biggl \{-\Bigl (- \log \bigl [1 - R(t; \varvec{\lambda }\, |\, \varvec{a}) \bigr ] \Bigr )^\alpha \biggr \}, \quad t > 0, \end{aligned}$$
(7)

and the pdf is

$$\begin{aligned} f(t; \varvec{\lambda }, \alpha \, |\, \varvec{a}) = \frac{\alpha r(t; \varvec{\lambda }\, |\, \varvec{a}) }{ 1- R(t; \varvec{\lambda }\, |\, \varvec{a})} \exp \biggl \{-\Bigl (- \log \bigl [1 - R(t; \varvec{ \lambda }\, |\, \varvec{a}) \bigr ] \Bigr )^\alpha \biggr \} \Bigl ( -\log \bigl [1 - R(t; \varvec{\lambda }\, |\, \varvec{a}) \bigr ] \Bigr )^{\alpha -1}. \end{aligned}$$

Henceforth, we will refer to this model as WPE2\((\varvec{\lambda , a}, \alpha )\). Then, the survival and hazard function are given by,

$$\begin{aligned} S(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})= \exp \biggl \{-\Bigl (- \log \bigl [1 - R(t; \varvec{\lambda }\, |\, \varvec{a}) \bigr ] \Bigr )^\alpha \biggr \}, \end{aligned}$$

and

$$\begin{aligned} h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a}) = \frac{\alpha r(t; \varvec{\lambda }\, |\, \varvec{a})}{1 - R(t; \varvec{\lambda }\, |\, \varvec{a})} \biggl ( -\log \bigl [1 - R(t; \varvec{\lambda }\, |\, \varvec{a}) \bigr ] \biggr )^{\alpha -1}, \end{aligned}$$

where \(R(\cdot )\) and \(r(\cdot )\) are the pdf and cdf of the PE model defined in Eqs. (2) and (3), respectively. Figure 2 illustrates the flexibility of the WPE2 model. The parameter \(\alpha\) has a similar form in \(L=2\) and \(L=3\). In the case of \(\alpha < 1\), the hazard function is decreasing, and for \(\alpha > 1\) the function is increasing. When \(\alpha\) is too small, the hazard function decreases faster. The different pattern for the hazard function of the WPE2 model is presented in the following proposition.

Proposition 2

The hazard function for the WPE2 model is increasing, constant, or decreasing within each interval defined by the partition time for \(\alpha >1\), \(\alpha =1\), and \(\alpha <1\), respectively.

Proof

As \(r(t; \varvec{\lambda }\, |\, \varvec{a})/[1-R(t; \varvec{\lambda }\, |\, \varvec{a})]\) is constant between each partition time, it follows that

$$\begin{aligned} \frac{\partial \log h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})}{\partial t}=-\frac{(\alpha -1)r(t; \varvec{\lambda }\, |\, \varvec{a})}{(1-R(t; \varvec{\lambda }\, |\, \varvec{a}))\log (1-R(t; \varvec{\lambda }\, |\, \varvec{a}))}. \end{aligned}$$

Note that \(\partial \log h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})/\partial t=0\) has no solution for \(t>0\). It follows directly the result. \(\square\)

Some particular cases of the WPE2 model are:

  • WPE2 reduce to PE model when \(\alpha =1\)

  • WPE2 reduce to Exponential model when \(\alpha =1\) and \(L=1\)

Figure 2
figure 2

Hazard function for WPE2 model and different time partition. Left panel: \(L=2\) and \(a_1=2\) with different values of \(\lambda _1, \lambda _2\) and \(\alpha\). Right panel: \(L=3, a_1=1\) and \(a_2=3\) with various values of \(\lambda _1, \lambda _2, \lambda _3\) and \(\alpha\).

The quantile function is

$$\begin{aligned} Q(p) = R^{-1} \Bigl ( 1 - \exp \Bigl \{ - \left[ -\log (1 - p)\right] ^{{1}/{\alpha }}\Bigr \} ;\,\varvec{\lambda }\,|\,\varvec{a}\Bigr ), \qquad 0<p<1 \end{aligned}$$

Proposition 3

If \(T \sim \text {WPE2}(\varvec{\lambda , a}, \alpha )\), then the n-th moments is,

$$\begin{aligned} \begin{aligned} E[T^n]&= \sum _{\ell =1}^L \lambda _{\ell }^{-1} \sum _{k=0}^\infty \left( {\begin{array}{c}n\\ k\end{array}}\right) A^{n-k} \biggl \{ \Gamma \biggl [ \frac{k}{\alpha } + 1, \biggl (\sum _{d=1}^L \lambda _d (a_d - a_{d-1})+\lambda _{\ell }(a_\ell - a_{\ell - 1})\biggr )^{\alpha } \biggr ] \\&- \Gamma \biggl [ \frac{k}{\alpha } + 1, \biggl (\sum _{d=1}^L \lambda _d (a_d - a_{d-1}))\biggr )^{\alpha } \biggr \}. \end{aligned} \end{aligned}$$

Proof

By definition and using the alternative expression for the pdf we obtain

$$\begin{aligned} E[T^n]&= \int _{0}^{\infty } t^n r(t; \varvec{\lambda , a}, \alpha ) dt \\&= \alpha \sum _{\ell =1}^L \lambda _\ell \int _{a_{l-1}}^{a_l} t^n \biggl ( \sum ^{L}_{\ell =1} \lambda _\ell \nabla _{l}(t) \biggr )^{\alpha -1} \exp \left\{ -\biggl ( \sum ^{L}_{d=1} \lambda _d \nabla _{d}(t) \biggr )^{\alpha } \right\} dt. \end{aligned}$$

Using the definition of \(\nabla _{\ell }\) in Eq. (1), putting \(u = w^\alpha\), with \(w = I_{\ell >1} \sum _{d=1}^L \lambda _d (a_d - a_{d-1}) + \lambda _{\ell } (t - a_{\ell - 1})\) and using the identity \((1+b)^c=\sum _{k=0}^{\infty }\left( {\begin{array}{c}c\\ k\end{array}}\right) b^k\), the result is obtained. \(\square\)

The non-homogeneous case

The Cox proportional hazards (PH) model approach28 was considered to incorporate predictor variables. Cox assumed that the hazard ratio between any two observations is constant over time, according to the relationship

$$\begin{aligned} h(t\,|\,{\varvec{x}}_i) = h_0(t)\exp {({\varvec{x}}_i^{\top }{\varvec{\beta }})}, \quad t>0, \end{aligned}$$

where \(h_0(\cdot )\) is called the baseline or reference risk, \({\varvec{x}}_i= (x_{i1}, \ldots , x_{ip})\) is a vector of covariates for the i-th individual and \({\varvec{\beta }}=(\beta _1,\ldots ,\beta _p)\) is the vector of coefficients related to the p observed covariates (both without intercept term). In this context, \(\exp (\beta _j)\) represents the increase or decrease in the hazard function if all covariates were fixed and the j-th covariate was increased in one unit. Based on this scheme, the survival function becomes

$$\begin{aligned} S(t\,|\,{\varvec{x}}_i) = S_0(t)^{\exp {({\varvec{x}}_i^{\top }{\varvec{\beta }})}}, \quad t>0, \end{aligned}$$

with \(S_0(\cdot )\) being the baseline survival function corresponding to \(h_0(\cdot )\). Taking into account our initial problem of considering the critical months for modeling the breastfeeding problem, we propose to explore the use of the WPE1 and WPE2 distributions in this context.

Inference

In this section, we study the ML estimation for the parameters of the WPE1 and WPE2 for the right censored problem.

ML estimation

For this problem and considering that the data are subject to right censoring, the observed data for the i-th individual can be represented by \(t_i=\min (t_i^*,c_i)\), \(\delta _i=I(t_i^*\le c_i)\) and the vector of covariates \(\textbf{x}_{i}\), where \(t_i^*\) and \(c_i\) denotes the failure and censoring times, respectively, and \(I(\cdot )\) denotes the indicator function. Under a right censoring scheme, the corresponding log-likelihood function for \({\varvec{\theta }}=({\varvec{\beta }}^\top ,{\varvec{\lambda }}^\top , \alpha )\) (the vector of parameters) under the assumption of non-informative censoring is expressed as

$$\begin{aligned} \ell ({\varvec{\theta }})= \sum _{i=1}^n \left[ \delta _i \left\{ \log h_{0}(t_i\mid {\varvec{\lambda }},\alpha )+{\varvec{x}}_i^{\top }{\varvec{\beta }}\right\} +\exp {({\varvec{x}}_i^{\top }{\varvec{\beta }})}\log S_0(t_i\mid {\varvec{\lambda }},\alpha )\right] . \end{aligned}$$
(8)

The ML estimator of \({\varvec{\theta }}\) is obtained maximizing \(\ell ({\varvec{\theta }})\) in relation to \({\varvec{\theta }}\). This is equivalent to solve the equations

$$\begin{aligned} \frac{\partial \ell ({\varvec{\theta }})}{\partial {\varvec{\beta }}}=0, \quad \frac{\partial \ell ({\varvec{\theta }})}{\partial {\varvec{\lambda }}}=0 \quad \text{ and } \quad \frac{\partial \ell ({\varvec{\theta }})}{\partial \alpha }=0. \end{aligned}$$
(9)

Note that

$$\begin{aligned} \frac{\partial \ell ({\varvec{\theta }})}{\partial {\varvec{\beta }}}=\sum _{i=1}^n {\varvec{x}}_i^{\top }\left[ \delta _i+\exp ({\varvec{x}}_i^{\top } {\varvec{\beta }})\log S_0(t_i\mid {\varvec{\lambda }}, \alpha )\right] . \end{aligned}$$

On the other hand, for the WPE1 model

$$\begin{aligned} \frac{\partial \ell ({\varvec{\theta }})}{\partial \lambda _\ell }&= \sum _{i=1}^n \bigg \{\delta _i \left[ \frac{1}{\lambda _\ell }- (t_i-a_{\ell -1})I(a_{\ell -1}\le t_i < a_\ell )+(\alpha -1)\frac{[1-R(t_i\mid {\varvec{\lambda }})]}{R(t_i\mid {\varvec{\lambda }})}\nabla _\ell (t_i)-(\alpha +1)\nabla _\ell (t_i)\right] \\&~~~~~~~~~~~~~-\alpha \exp {({\varvec{x}}_i^{\top }{\varvec{\beta }})}\left( \frac{R(t_i\mid {\varvec{\lambda }})}{1-R(t_i\mid {\varvec{\lambda }})}\right) ^{\alpha -1}\frac{\nabla _\ell (t_i)}{[1-R(t_i\mid {\varvec{\lambda }})]}\bigg \}, \quad \ell =1,\ldots ,L,\\ \frac{\partial \ell ({\varvec{\theta }})}{\partial \alpha }&=\sum _{i=1}^n \left\{ \delta _i\left[ \frac{1}{\alpha }+ \log \left( \frac{R(t_i\mid {\varvec{\lambda }})}{1- R(t_i\mid {\varvec{\lambda }})}\right) \right] -\exp {({\varvec{x}}_i^{\top }{\varvec{\beta }})}\left( \frac{R(t_i\mid {\varvec{\lambda }})}{1-R(t_i\mid {\varvec{\lambda }})}\right) ^{\alpha }\log \left( \frac{R(t_i\mid {\varvec{\lambda }})}{1-R(t_i\mid {\varvec{\lambda }})}\right) \right\} , \end{aligned}$$

whereas for the WPE2 model

$$\begin{aligned} \frac{\partial \ell ({\varvec{\theta }})}{\partial \lambda _\ell }&= \sum _{i=1}^n \bigg \{\delta _i \left[ \frac{1}{\lambda _\ell }- (t_i-a_{\ell -1})I(a_{\ell -1}\le t_i < a_\ell )+\nabla _\ell (t_i)+(\alpha -1)\frac{\nabla _\ell (t_i)}{\sum _{l=1}^L \nabla _l(t_i)}\right] \\&~~~~~~~~~~~~~-\alpha \exp {({\varvec{x}}_i^{\top }{\varvec{\beta }})}\left( \sum _{l=1}^L \lambda _l \nabla _l(t_i)\right) ^{\alpha -1}\nabla _\ell (t_i)\bigg \}, \quad \ell =1,\ldots ,L,\\ \frac{\partial \ell ({\varvec{\theta }})}{\partial \alpha }&=\sum _{i=1}^n \left\{ \delta _i\left[ \frac{1}{\alpha }+\log \sum _{l=1}^L \lambda _l \nabla _l(t_i)\right] -\exp {({\varvec{x}}_i^{\top }{\varvec{\beta }})}\left( \sum _{l=1}^L \lambda _l \nabla _l(t_i)\right) ^\alpha \log \left( \sum _{l=1}^L \lambda _l \nabla _l(t_i)\right) \right\} . \end{aligned}$$

Equation (9) should be solved numerically using some optimization algorithm, such as Newton-Raphson and quasi-Newton29. The R software27 was used for this purpose. Specifically, the optim function was used for solve this optimization problem. Under mild regularity conditions (Cox and Hinley30) the asymptotic distribution of the ML estimator \(\widehat{\varvec{\theta }}\) is

$$\begin{aligned} \sqrt{n}\left[ \mathscr {I}({\varvec{\theta }})\right] ^{-1}\left( \widehat{\varvec{\theta }}-{\varvec{\theta }}\right) {\mathop {\longrightarrow }\limits ^{\mathscr {D}}} N_{p+L+1}({\varvec{0}}_{p+L+1}, {\varvec{I}}_{p+L+1}), \quad \text{ as } n\rightarrow \infty , \end{aligned}$$
(10)

where \(N_{r}({\varvec{\mu }},{\varvec{\Sigma }})\) denotes the r-variate normal distribution with mean \({\varvec{\mu }}\) and variance \({\varvec{\Sigma }}\), \({\varvec{0}}_r\) denotes a vector of zeros with dimension r, \({\varvec{I}}_r\) denotes the identity matrix with dimension r and \(\mathscr {I}({\varvec{\theta }})\) denotes the Fisher information matrix, which can be consistently estimated by \(\widehat{\mathscr {I}}(\widehat{\varvec{\theta }})\), the hessian matrix related to the log-likelihood function in Eq. (8) evaluated at \(\widehat{\varvec{\theta }}\). In addition, approximated confidence intervals for the components of \({\varvec{\theta }}\) can be considered as

$$\begin{aligned} \left[ \widehat{\mathscr {I}}(\widehat{\varvec{\theta }})\right] ^{-1}=\left( \begin{array}{ccc} \widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\varvec{\beta }}} & \widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\varvec{\lambda }}} & \widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\alpha }} \\ \cdot & \widehat{\mathscr {I}}^{\widehat{\varvec{\lambda }}\widehat{\varvec{\lambda }}} & \widehat{\mathscr {I}}^{\widehat{\varvec{\lambda }}\widehat{ \alpha }} \\ \cdot & \cdot & \widehat{\mathscr {I}}^{\widehat{\alpha }\widehat{\alpha }} \\ \end{array} \right) . \end{aligned}$$

Then, the approximated \(100(1-\nu )\%\) confidence intervals for \(\beta _r\), with \(\nu \in (0,1)\) and \(r=1,\ldots ,p\) is given by

$$\begin{aligned} \left( \widehat{\beta }_r - z_{1-\nu /2}\sqrt{\widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\varvec{\beta }}}_{rr}},\widehat{\beta }_r + z_{1-\nu /2}\sqrt{\widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\varvec{\beta }}}_{rr}}\right) , \end{aligned}$$

where \(z_u\) denotes the u-th quantile of the standard normal distribution, \(\widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\varvec{\beta }}}_{rr}\) denotes the r-th element of the diagonal of \(\widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\varvec{\beta }}}\). In a similar way, the approximated \(100(1-\nu )\%\) confidence intervals for \(\lambda _\ell\), \(\ell =1,\ldots ,L\) and \(\alpha\) are

$$\begin{aligned} \left( \widehat{\lambda }_\ell - z_{1-\nu /2}\sqrt{\widehat{\mathscr {I}}^{\widehat{\varvec{\lambda }}\widehat{\varvec{\lambda }}}_{\ell \ell }},\widehat{\lambda }_\ell + z_{1-\nu /2}\sqrt{\widehat{\mathscr {I}}^{\widehat{\varvec{\lambda }}\widehat{\varvec{\lambda }}}_{\ell \ell }}\right) \quad \text{ and } \quad \left( \widehat{\alpha } - z_{1-\nu /2}\sqrt{\widehat{\mathscr {I}}^{\widehat{\alpha }\widehat{ \alpha }}},\widehat{\alpha } + z_{1-\nu /2}\sqrt{\widehat{\mathscr {I}}^{\widehat{\alpha }\widehat{ \alpha }}}\right) , \end{aligned}$$

respectively.

Simulation study

In this section, we present a simulation study in order to assess the performance of the ML estimators for the WPE2 model under a scenario similar to application 1, which was carried out with the statistical software R27. The algorithm used to generate censored samples for the \(\text {WPE2}\) regression model (with \(100\%C\) censoring of observations) is the following:

  • For \(i = 1,\ldots , n\), draw the covariates vector \(\textbf{x}_i\), where \(\textbf{x}_i=(x_{1i},x_{2i},x_{3i},x_{4i})\) is a random sample from \(X_1\sim Bern(63/150)\), \(X_2\sim Bern(70/150)\), \(X_3\sim Bern(66/150)\) and \(X_4\sim Bern(56/150)\).

  • Draw \(\displaystyle {U_i \sim U(0,1) }\).

  • Compute \(W_i^{(T)} = \displaystyle {1 - \exp \left\{ - \left( \frac{- \log (U_i)}{\textbf{x}_i^{\top } {\varvec{\beta }}} \right) ^{1/\alpha } \right\} }\) and \(W_i^{(C)} = \displaystyle {1 - \exp \left\{ - \left( \frac{- \log (C)}{ \textbf{x}_i^{\top } {\varvec{\beta }}} \right) ^{1/\alpha } \right\} }\).

  • Compute \(Y_i = R^{-1}\left( W_i^{(T)}; {\varvec{\lambda }} \mid \varvec{a} \right) .\) (the failure times)

  • Compute \(C_i=R^{-1}\left( W_i^{(C)}; {\varvec{\lambda }}\mid \varvec{a} \right) .\) (the censoring times)

  • Compute \(T_i=\min (Y_i,C_i)\) and \(\delta _i=I(Y_i \le C_i)\), the observed times and failure indicators.

Similar to the scheme of the first application, we assumed \(L=4\) partitions and the partition time as \({\varvec{a}}= (0,0.08,0.29,0.41)\). The parameters used were \(\varvec{\beta } = (\beta _1, \beta _2,\beta _3,\beta _4) =(0.58,0.76,0.68,0.75)\), \(\varvec{\lambda } =(\lambda _1, \lambda _2,\lambda _3,\lambda _4) =(0.011,0.056,0.189,0.188)\) and \(\alpha =0.58\). Different sample size were considered: 100, 150, 200 and 500, and three censoring cases: \(10\%\), \(25\%\) and \(50\%\). For each combination of \(\varvec{\beta }\), \(\varvec{\lambda }\), \(\alpha\), censoring and sample size, we simulated 1000 replicates of the WPE2 regression model, and the parameter estimation was performed based on the ML method. In each case, we compute the bias, the root of the mean square error (RMSE), the mean of the estimated standard errors (SE), and the 95% coverage probability (CP). The results are summarized in Table 1. It is possible to observe a reduction in the bias for all the parameters and under any censoring scenario when the sample size is increasing. In each case, we compute the mean and the standard error. In addition, the RMSE and SE terms are closer, specially when the sample size is increased, suggesting that the variance of the ML estimators is reasonably well estimated. Finally, the CP terms are closer to the nominal value used to their construction, suggesting that the asymptotic distribution for the ML estimators in Eq. (10) is appropriated even in finite samples.

Table 1 Empirical bias, SE, RMSE and CP for ML estimators in the WPE2 regression model for different scenarios.

Real data applications

Several international health organizations established that breast milk is one of the most important ways to ensure the survival and health of children. It is recommended, for infants, exclusive breastfeeding for up to 6 months, starting at the first hour of life, which is recommended by OMS. In this Section, we analyzed Breastfeeding data from Brazil31 and the United States17. The codes are available in a public GitHub repository (https://github.com/Ymgomez89/Proportional-Hazard-Model).

Breastfeeding from Brazil

Colosimo and Giolo31 provide a data set related to 150 mothers, including the time from birth to cessation of breastfeeding (in months), as well as four covariates associated with each. Censored observations occur when the breastfeeding record is cut off for different reasons. The 4 reported covariates are related to previous breastfeeding experience (\(x_1=1\) for yes (\(42\%\)) and \(x_1=0\) for no (\(58\%\))), ideal breastfeeding time (\(x_2=1\) for greater than 6 months (\(47\%\)) and \(x_2=0\) for otherwise (\(53\%\))), breastfeeding problems (\(x_3=1\) for yes (\(44\%\)) and \(x_3=0\) for no (\(56\%\))) and exclusive breastfeeding with breast milk (\(x_4=1\) for yes (\(37\%\)) and \(x_4=0\) for no (\(63\%\))). Figure 3 shows the Kaplan–Meier (K–M) estimator32 by the four covariates.

Figure 3
figure 3

Estimated K–M survival curve for covariates of brazilian breastfeeding data set.

Based on the discussion provided in the introduction Section, we fix the partition time in 0, 1, 3, and 5 months (i.e., \(L=4\)). Table 2 shows the log-likelihood, Akaike information criterion (AIC; Akaike33), and the Bayesian information criterion (BIC; Schwarz34) values obtained for the fitted piecewise-type models (PE, PPE, WPE1, WPE2) with \(L=4\). Based on those criteria, the WPE2 model is preferred.

Table 2 Log-likelihood, AIC and BIC for the PE, PPE, WPE1 and WPE2 models in brazilian breastfeeding data.

The estimates for the WPE2 model are presented in Table 3. Based on the results, we can obtain the following conclusions (keeping in mind that the other covariates are kept fixed).

  • The risk of early weaning for mothers without previous experience in breastfeeding is \(\exp (\widehat{\beta _1})\approx 1.6\) times higher in relation to who did have experience (95% confidence interval 1–2.8).

  • The risk of early weaning for mothers who think that the ideal breastfeeding time is less than or equal to 6 months is approximately \(\exp (\widehat{\beta _2})\approx 1.7\) times the risk of mothers who believe the opposite. (95% confidence interval: 1 - 2.8).

  • The risk of early weaning in mothers who had breastfeeding problems is \(\exp (\widehat{\beta _3})\approx 2\) times higher compared to those who did not experience problems. (95% confidence interval 1.2–3.3).

  • The risk of early weaning in infants who were not exclusively breastfed is \(\exp (\widehat{\beta _4})\approx 1.9\) times greater than the risk of premature weaning in infants who are exclusively breastfed. (95% confidence interval 1.1–3.1).

Table 3 ML estimates and standard errors (in parenthesis) for the WPE2 model in Brazilian breastfeeding data.

Breastfeeding from United States

We consider a second data set related to breastfeeding obtained from Klein17. These data were taken from a National Survey of Youth Labor (NLSY) in Ohio, which began in 1979 with youths aged 14–21, interviewed annually until 1988. Beginning in 1983, women were asked about any pregnancy that had occurred since the last interview (pregnancies prior to 1983 were also documented), recording breastfeeding information. The data set includes information on 927 firstborns whose mothers chose to breastfeed. The sample is restricted to children who were born after 1978 and had a gestation of 20–45 weeks in order to eliminate recall problems.

The main variable is the duration of breastfeeding (in weeks) and whether the baby was weaned (an indicator of failure). Explanatory variables include race, poverty status, maternal smoking and drinking habits, maternal age at birth, maternal education, and prenatal care. The only significant variables are race (categorical: white \((71\%)\), black \((13\%)\) and other \((16\%)\)), poverty status (\(x_2\): 756 non-poor (82%), 171 poor (18%)), and smoking habits (\(x_3\); 657 non-smoking mothers (71%), 270 mothers who smoked after birth (29%)). Figure 4 shows the K–M estimator by the three covariates.

Figure 4
figure 4

Estimated K–M survival curve for covariates of American breastfeeding data set.

Table 4 shows that the AIC and BIC criteria for the same piecewise type models fitted in the last application. Again, based on those criteria the WPE2 model provides the best fit among the considered models.

Table 4 Log-likelihood, AIC and BIC for the PE, PPE, WPE1 and WPE2 models for second dataset.

In addition, Table 5 presents the estimates for those models. Based on the results, we can obtain the following conclusions (keeping in mind that the other covariates are kept fixed).

  • The risk of early weaning in black race is \(\exp (\widehat{\beta _{11}})=1.2\) higher than white race (95% confidence interval 1.0 - 1.5), and in another race it is \(\exp (\widehat{\beta _{12}})=1.5\) times higher with respect to the white race (95% confidence interval 1.2 - 1.8).

  • The risk of early weaning of mothers in poverty status is \(\exp (\widehat{\beta _2})=0.9\) times in relation to mothers without poverty status (95% confidence interval 0.7 - 1.0).

  • The risk of early weaning who smoked after the baby was born is \(\exp (\widehat{\beta _3})=1.4\) times higher with respect to mothers who did not smoke after birth (95% confidence interval 1.2 - 1.6).

Table 5 ML estimates and standard errors (in parenthesis) for the WPE2 model in american breastfeeding data.

Conclusions

The introduction of the Weibull piecewise distribution represents a significant advancement in flexible statistical modeling. By allowing for different shapes of the hazard function across multiple time intervals, this model accommodates the complexities often observed in real-world data where risks may change over time, such as breastfeeding data. The dual approach to time partitioning–either estimating it directly from the data or predefining intervals–provides versatility for its application. This flexibility enables researchers and practitioners to tailor the model to their specific data characteristics, enhancing its relevance and usability.

Simulation studies have demonstrated the effectiveness of maximum likelihood estimation for deriving parameter values across various sample sizes. Such empirical validation is crucial, as it affirms the robustness of the model when applied to different contexts. This aspect also highlights the model’s practicality, offering a reliable method for analysts facing diverse data challenges. Finally, the proposed distribution is applied to a real data set to show the potential of this new distribution.

In conclusion, we define a flexible approach to modeling lifetime datasets across multiple time intervals by compounding the classes of Weibull (which is generally adequate for modeling monotone hazard rates) and PE (which allows breaking points to be incorporated into the analysis) distributions. The compounding procedure follows by taking the Weibull family of distributions as the baseline distribution in (2). In addition, we motivate the use of the new distribution in breastfeeding data. We hope that these two facts combined may attract more complex applications in survival analysis. Further exploration of the model could involve applying it to different datasets to assess its performance and robustness compared to traditional distributions. Additionally, investigating potential applications in fields such as reliability engineering, survival analysis, and risk management could provide valuable insights into its utility and versatility.