Abstract
The World Health Organization recommends exclusive breastfeeding for the first 6 months of life, but several factors influence its cessation. Changes in breastfeeding behavior have been reported in the literature among infants at 0, 1, 3 and 5 months. The purpose of this paper is to introduce a new proportional hazards model based on the transformed-transformer method, employing the Weibull and piecewise exponential distributions as baseline models, with the aim of setting the partitions at the times when most weaning occurs. Some properties of the model, including the monotonicity of the hazard function and the quantile function, are discussed in detail. Additionally, a regression model for censored data is presented. The parameters of the proposed distribution are estimated using the maximum likelihood approach, and a simulation study of the estimators is presented and discussed. Finally, two real applications related to breastfeeding in Brazil and the USA are provided to illustrate the usefulness of our proposal.
Introduction
Breastfeeding is feeding a mother’s breast milk to her infant directly from the breast or by expressing (or pumping) the milk and bottle-feeding it to the infant. This method provides the ideal nutrition for infants and significantly contributes to their health, growth, and development. Additionally, breastfeeding benefits the mother’s health and fosters a strong emotional bond between mother and baby1. The World Health Organization recommends exclusive breastfeeding for the first six months of life, followed by continued breastfeeding and appropriate complementary foods for up to two years or beyond. Breastfeeding data is essential for several reasons:
1. Infant health: breastfeeding data helps us understand the health benefits for infants. Breastfed babies have lower risks of various conditions, including asthma, obesity, type 1 diabetes, severe lower respiratory diseases, acute otitis media (ear infections), sudden infant death syndrome (SIDS), gastrointestinal infections (such as diarrhea and vomiting), necrotizing enterocolitis (NEC) in preterm infants, malnutrition, and nephrotic syndrome1,2,3,4,5.
2. Maternal health: breastfeeding can also provide health benefits for mothers, including a reduced risk of hypertension, type 2 diabetes, ovarian cancer, and breast cancer2.
3. Health care costs: breastfeeding data can help understand and reduce health care costs. For example, if 90% of families breastfed exclusively for six months, the United States would save $2.2 billion per year on medical costs6,7.
4. Public health policy: breastfeeding data shapes public health policies. It helps develop interventions and programs that promote breastfeeding, especially in communities with low breastfeeding rates8.
5. Research: breastfeeding data is also essential for research purposes. It can be used to study the effects of breastfeeding on various health outcomes and to identify factors that influence breastfeeding initiation and duration9.
6. Education and support: breastfeeding data can help to identify areas where more education and support are needed to encourage breastfeeding10.
Several factors influence the cessation of exclusive breastfeeding, including difficulties with breastfeeding, concerns about infant weight gain, illnesses affecting mothers or infants, the need for medication, misperceptions about milk supply, younger maternal age, lower levels of maternal education, unplanned pregnancies, employment outside the home, and a lack of emotional support, among others1,5,11,12,13,14,15.
Exclusive breastfeeding rates up to six months vary widely across countries. For instance, they are only 2% in Bulgaria, 3.7% in Poland, and 1% in Greece, Finland, and the United Kingdom. In contrast, some countries meet international recommendations, including Kyrgyzstan at 56.1%, Georgia at 54.8%, Croatia at 52.4%, Slovakia at 49%, and Hungary at 44%16.
However, the literature has documented that there are critical periods for breastfeeding: changes in breastfeeding behavior have been reported among infants at 0, 1, 3.5, and 5 months, and these periods must be incorporated into the analysis of the data in some way. Many babies stop breastfeeding for various reasons. This aligns with the findings of Gianni et al.11 and with a survey in which 63% of Spanish mothers reported experiencing difficulties within the first month postpartum12. Similarly, a study of Taiwanese women showed that only 29.3% maintained exclusive breastfeeding up to the third month13. Furthermore, only 38.4% of Saudi Arabian women breastfed exclusively for six months or more, with many ceasing before that mark14. In addition, because lactation studies often cannot follow mothers over such a long period, it is common for them to end before all mothers in the study have weaned, so part of the data is subject to right censoring. In summary, breastfeeding data is crucial for monitoring the health of infants and mothers, informing public health policy, guiding research, and supporting breastfeeding education and promotion efforts; in this article, we focus on point 5.
Survival analysis is therefore the natural tool for analyzing the time until the occurrence of an event of interest, which is typically subject to censoring. Many distributions have been used for this purpose in the literature. A key distribution in this context is the Weibull (W) model, both because of the simplicity of its survival and hazard functions and because its hazard function is monotonic (increasing, decreasing, or constant) depending only on the shape parameter. The cumulative distribution function (cdf) and probability density function (pdf) for \(T\sim \text{ W }(\gamma ,\alpha )\), with scale parameter \(\gamma\) and shape parameter \(\alpha\), are defined by
\[F(t)=1-\exp \left\{ -\left( \frac{t}{\gamma }\right) ^{\alpha }\right\} \quad \text {and}\quad f(t)=\frac{\alpha }{\gamma }\left( \frac{t}{\gamma }\right) ^{\alpha -1}\exp \left\{ -\left( \frac{t}{\gamma }\right) ^{\alpha }\right\} ,\]
respectively, for \(t>0\) and \(\alpha ,\gamma >0\). Other models that appear recurrently in the survival analysis literature (especially in classical books, for example,17) are the log-normal and log-logistic models. However, the log-normal model has a non-monotone hazard function, while the hazard function of the log-logistic model is either decreasing or unimodal. Moreover, none of these models allows breakpoints to be incorporated into the analysis, as required for the weaning problem discussed above. An interesting alternative that addresses this issue, proposed more than 50 years ago, is the piecewise exponential (PE) model of Feigl and Zelen18. Like the exponential model, this distribution has a constant hazard function; however, the hazard is not constant on the whole of \((0, \infty )\), but only within each of L predefined intervals. Specifically, its hazard function is given by
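\[h(t;{\varvec{\lambda }}\mid {\varvec{a}})=\lambda _l, \quad t\in [a_{l-1},a_l),\ l=1,\ldots ,L,\]
where \(0=a_0<a_1<\cdots<a_{L-1} < a_L=\infty\). The corresponding cdf and pdf are given by
\[R(t;{\varvec{\lambda }}\mid {\varvec{a}})=1-\kappa _l\exp \{-\lambda _l(t-a_{l-1})\} \quad \text {and}\quad r(t;{\varvec{\lambda }}\mid {\varvec{a}})=\kappa _l\lambda _l\exp \{-\lambda _l(t-a_{l-1})\},\]
respectively, for \(t\in [a_{l-1},a_l)\), with
\[\kappa _l=\exp \left\{ -\sum _{j=1}^{l-1}\lambda _j(a_j-a_{j-1})\right\} \quad \text {and}\quad \kappa _1=1.\]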
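The PE quantities above are straightforward to evaluate numerically. The following Python sketch is only an illustration (the authors' own code is in R); it assumes the cut points are passed as `a = [a_0, ..., a_{L-1}]` with \(a_L=\infty\) implicit, and all function names are hypothetical. It relies on the identities \(R=1-\exp\{-H(t)\}\) and \(r=\lambda _l\exp\{-H(t)\}\), where \(H(t)=-\log \{1-R(t)\}\) is the PE cumulative hazard, which is equivalent to the \(\kappa _l\) form.

```python
import math
from bisect import bisect_right


def pe_rate(t, lam, a):
    """Hazard lambda_l of the interval [a_{l-1}, a_l) that contains t >= 0."""
    return lam[bisect_right(a, t) - 1]


def pe_cumhaz(t, lam, a):
    """Cumulative hazard H(t): each rate times the time spent in its interval up to t."""
    total = 0.0
    for j, rate in enumerate(lam):
        start = a[j]
        end = a[j + 1] if j + 1 < len(a) else math.inf  # a_L = infinity
        if t <= start:
            break
        total += rate * (min(t, end) - start)
    return total


def pe_cdf(t, lam, a):
    """R(t) = 1 - kappa_l exp{-lambda_l (t - a_{l-1})} = 1 - exp{-H(t)}."""
    return 1.0 - math.exp(-pe_cumhaz(t, lam, a))


def pe_pdf(t, lam, a):
    """r(t) = kappa_l lambda_l exp{-lambda_l (t - a_{l-1})} = lambda_l exp{-H(t)}."""
    return pe_rate(t, lam, a) * math.exp(-pe_cumhaz(t, lam, a))


def pe_quantile(u, lam, a):
    """R^{-1}(u) for 0 <= u < 1: invert H at -log(1 - u), interval by interval."""
    target = -math.log1p(-u)
    acc = 0.0
    for j, rate in enumerate(lam):
        start = a[j]
        end = a[j + 1] if j + 1 < len(a) else math.inf
        step = rate * (end - start)  # infinite in the last interval
        if target <= acc + step:
            return start + (target - acc) / rate
        acc += step
    raise ValueError("all rates must be positive")


lam, a = [0.5, 1.0, 2.0], [0.0, 1.0, 3.0]  # L = 3 intervals
print(pe_cdf(pe_quantile(0.7, lam, a), lam, a))  # recovers 0.7 up to rounding
```

Computing through \(H(t)\) avoids storing the \(\kappa _l\) explicitly; with \(L=1\) the functions reduce to the exponential distribution.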
We use the notation \(T\sim PE({\varvec{\lambda }},{\varvec{a}})\) to denote the PE distribution with vector of parameters \({\varvec{\lambda }}=(\lambda _1,\ldots ,\lambda _L)^\top\) (\(\top\) indicates the transpose) and time partition \({\varvec{a}}= (a_0,a_1,\ldots ,a_L)^\top\). The case with covariates was introduced by Friedman19. One of the limitations of the PE model is that its hazard function is constant within each interval. An extension of the PE model that overcomes this limitation, based on the power model20, was developed by Gómez et al.21 and subsequently extended to interval-censored data by dos Santos and Schneider22. Along these lines, we seek to propose new models that also allow these breakpoints to be incorporated into the analysis while being more flexible regarding the hazard function within each interval. This can be achieved by constructing a new family of distributions following the idea of Alzaatreh et al.23; different models based on this family can be found in the literature24,25. For our purpose, the Weibull distribution is used as the generator and the PE distribution as the baseline, yielding the new distribution developed in the next sections.
The method proposed by Alzaatreh et al.23, adapted to our context of positive random variables, can be summarized as follows. Let T be a random variable with pdf r(t) and cdf R(t), and let X be a continuous random variable with pdf g(x) defined on \((0,\infty )\). Also, let W(R(t)) be a function of the cdf R(t) satisfying the following conditions:
a) \(W(R(t)) \in (0,\infty )\), \(\forall t>0\);
b) W(R(t)) is differentiable and monotonically non-decreasing, \(\forall t>0\);
c) \(W(R(t)) \rightarrow 0\) as \(t \rightarrow 0\) and \(W(R(t)) \rightarrow \infty\) as \(t \rightarrow \infty\).
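The cdf of the new family of distributions is defined as
\[F(t)=\int _0^{W(R(t))}g(x)\,dx=G\left( W(R(t))\right) , \tag{4}\]
where \(G(\cdot )\) denotes the cdf of X. The pdf related to Eq. (4) is
\[f(t)=g\left( W(R(t))\right) \frac{d}{dt}W(R(t)).\]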
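Eq. (4) is simply a composition of three functions, which the Python sketch below makes explicit with the two choices of \(W\) used in this paper. All names are hypothetical. As a sanity check, it assumes an exponential baseline (the PE model with \(L=1\)) and the \(\text{W}(1,\alpha )\) generator. With \(W(u)=-\log (1-u)\), the construction should reproduce a Weibull cdf \(1-\exp \{-(\lambda t)^\alpha \}\).

```python
import math


def tx_cdf(t, baseline_cdf, link, generator_cdf):
    """T-X family cdf F(t) = G(W(R(t)))."""
    return generator_cdf(link(baseline_cdf(t)))


def weibull_1_alpha_cdf(x, alpha):
    """Cdf G(x) = 1 - exp(-x^alpha) of W(1, alpha): scale 1, shape alpha."""
    return 1.0 - math.exp(-x ** alpha)


def odds_link(u):
    """Case 1: W(R) = R / (1 - R)."""
    return u / (1.0 - u)


def cumhaz_link(u):
    """Case 2: W(R) = -log(1 - R)."""
    return -math.log1p(-u)


lam, alpha, t = 2.0, 1.5, 0.4
exp_cdf = lambda s: 1.0 - math.exp(-lam * s)  # PE baseline with L = 1
G = lambda x: weibull_1_alpha_cdf(x, alpha)

f2 = tx_cdf(t, exp_cdf, cumhaz_link, G)
print(f2, 1.0 - math.exp(-(lam * t) ** alpha))  # the two values agree up to rounding
f1 = tx_cdf(t, exp_cdf, odds_link, G)  # equals 1 - exp{-(e^{lam t} - 1)^alpha}
```

Both links satisfy conditions a) to c): they map \((0,1)\) onto \((0,\infty )\), are non-decreasing, and tend to 0 and \(\infty\) at the endpoints.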
Motivated by the breastfeeding problem, we propose to take the PE distribution for r(t) and R(t), henceforth denoted by \(r(t;{\varvec{\lambda }}\mid {\varvec{a}})\) and \(R(t;{\varvec{\lambda }}\mid {\varvec{a}})\), respectively, which provides new classes of piecewise models.
The remainder of this paper is organized as follows. In “The new models”, the new models and their properties are presented. Maximum likelihood (ML) estimation and the related inference are described in “Inference”. A simulation study is presented in “Simulation study”, and real-data applications are reported in “Real data applications”. Finally, concluding remarks are given in “Conclusions”.
The new models
For the first time in the literature, we investigate piecewise extensions of the T-X family of distributions considering \(T \sim \text{ PE }({\varvec{\lambda }},{\varvec{a}})\) and \(X \sim \text{ W }(1, \alpha )\) (Bourguignon et al.26), i.e., the Weibull distribution with scale and shape parameters 1 and \(\alpha\), respectively. We focus on this combination of distributions because it provides a piecewise version of the Weibull model, which has been the subject of many studies (see, e.g., Alzaatreh and Ghosh24; Tahir et al.25). We call these models Weibull piecewise exponential (WPE) distributions. Depending on the choice of \(W(\cdot )\), we propose two cases, which are detailed below.
Case 1: \(W(R(t)) = \dfrac{R(t)}{1-R(t)}\)
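In this case, the cdf of the new model is
\[F(t;\varvec{\lambda },\alpha \mid \varvec{a})=1-\exp \left\{ -\left[ \frac{R(t; \varvec{\lambda }\, |\, \varvec{a})}{1-R(t; \varvec{\lambda }\, |\, \varvec{a})}\right] ^{\alpha }\right\} ,\quad t>0, \tag{5}\]
where \(R(\cdot ; \varvec{\lambda }\, |\, \varvec{a})\) is the cdf of the PE model described in Eq. (2). We write \(T\sim \text{ WPE1 }(\varvec{\lambda }, \alpha )\) for a random variable with cdf as in (5). The corresponding pdf is given by
\[f(t;\varvec{\lambda },\alpha \mid \varvec{a})=\frac{\alpha \, r(t; \varvec{\lambda }\, |\, \varvec{a})\, R(t; \varvec{\lambda }\, |\, \varvec{a})^{\alpha -1}}{[1-R(t; \varvec{\lambda }\, |\, \varvec{a})]^{\alpha +1}}\exp \left\{ -\left[ \frac{R(t; \varvec{\lambda }\, |\, \varvec{a})}{1-R(t; \varvec{\lambda }\, |\, \varvec{a})}\right] ^{\alpha }\right\} .\]
The quantile function, Q(p), \(0< p < 1\), is given by
\[Q(p)=R^{-1}\left( \frac{[-\log (1-p)]^{1/\alpha }}{1+[-\log (1-p)]^{1/\alpha }}; \varvec{\lambda }\mid \varvec{a}\right) .\]
The hazard function can be expressed as
\[h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})=\frac{\alpha \, r(t; \varvec{\lambda }\, |\, \varvec{a})\, R(t; \varvec{\lambda }\, |\, \varvec{a})^{\alpha -1}}{[1-R(t; \varvec{\lambda }\, |\, \varvec{a})]^{\alpha +1}}. \tag{6}\]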
The behavior of the hazard function for the WPE1 model is described in the next proposition.
Proposition 1
The hazard function for the WPE1 model is increasing, constant, or decreasing within each interval defined by the partition time for \(\alpha >1\), \(\alpha =1\), and \(\alpha <1\), respectively, except in at most one interval, where it is non-monotonic.
Proof
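The derivative of the logarithm of the hazard function in Eq. (6) is given by
\[\frac{\partial \log h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})}{\partial t}=\frac{r'(t; \varvec{\lambda }\, |\, \varvec{a})}{r(t; \varvec{\lambda }\, |\, \varvec{a})}+(\alpha -1)\frac{r(t; \varvec{\lambda }\, |\, \varvec{a})}{R(t; \varvec{\lambda }\, |\, \varvec{a})}+(\alpha +1)\frac{r(t; \varvec{\lambda }\, |\, \varvec{a})}{1-R(t; \varvec{\lambda }\, |\, \varvec{a})},\]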
where \(r'(t; \varvec{\lambda }\, |\, \varvec{a})=\partial r(t; \varvec{\lambda }\, |\, \varvec{a})/\partial t=-\kappa _l \lambda _l^2 \exp (-\lambda _l(t-a_{l-1}))I^{(t)}_{[a_{l-1},a_l)}\) and \(I_{A}^{(t)}\) denotes the indicator function. Therefore, \(r'(t; \varvec{\lambda }\, |\, \varvec{a})/r(t; \varvec{\lambda }\, |\, \varvec{a})=-\lambda _l I^{(t)}_{[a_{l-1},a_l)}\), for \(l=1,\dots ,L\). In addition, if \(t\in [a_{l-1},a_l)\), we can also write \(r(t; \varvec{\lambda }\, |\, \varvec{a})=\lambda _l C_{l}\exp (-\lambda _l t)\) and \(R(t; \varvec{\lambda }\, |\, \varvec{a})=1-C_{l}\exp (-\lambda _l t)\), with \(C_{l}=\kappa _l \exp (\lambda _l a_{l-1})\). With this notation and considering \(t \in [a_{l-1},a_l)\), we obtain
\[\frac{\partial \log h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})}{\partial t}=\frac{\lambda _l\left\{ \alpha -C_l\exp (-\lambda _l t)\right\} }{1-C_l\exp (-\lambda _l t)}.\]
Therefore, the solution of \(\partial \log h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})/\partial t=0\) is given by \(t_l=-\frac{1}{\lambda _l}\log \left( \frac{\alpha }{C_l}\right)\). Note that \(t_l \in [a_{l-1},a_l)\) if and only if \(\kappa _l\exp (-\lambda _l(a_l-a_{l-1})) < \alpha \le \kappa _l\). Since \(\kappa _1>\kappa _2>\cdots >\kappa _L\) and \(\kappa _l\exp (-\lambda _l(a_l-a_{l-1}))=\kappa _{l+1}\) (with \(\kappa _{L+1}=0\)), these ranges of \(\alpha\) are disjoint. Hence, either i) the condition holds for exactly one \(l=1,\ldots ,L\), or ii) it holds for no l. The first case implies that the hazard function is monotonic within all intervals except one, whereas the second case implies that it is monotonic within every interval. \(\square\)
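The case analysis in the proof can be checked numerically. The Python sketch below (hypothetical names; cut points `a = [a_0, ..., a_{L-1}]`) walks through the intervals using \(\kappa _{l+1}=\kappa _l\exp \{-\lambda _l(a_l-a_{l-1})\}\). It returns the unique interval l with \(\kappa _{l+1}<\alpha \le \kappa _l\) together with \(t_l=a_{l-1}+\lambda _l^{-1}\log (\kappa _l/\alpha )\), which is equivalent to \(t_l=-\lambda _l^{-1}\log (\alpha /C_l)\), or `None` if no interval qualifies. It also evaluates \(\partial \log h/\partial t=\lambda _l\{\alpha -S(t)\}/\{1-S(t)\}\), with \(S(t)=1-R(t)=C_l\exp (-\lambda _l t)\), to show the sign change at \(t_l\).

```python
import math
from bisect import bisect_right


def wpe1_critical_point(lam, a, alpha):
    """Return (l, t_l) with t_l in [a_{l-1}, a_l), or None if no interval qualifies."""
    kappa = 1.0  # kappa_1 = 1
    for j, rate in enumerate(lam):
        start = a[j]
        end = a[j + 1] if j + 1 < len(a) else math.inf
        kappa_next = kappa * math.exp(-rate * (end - start))  # 0.0 for the last interval
        if kappa_next < alpha <= kappa:
            return j + 1, start + math.log(kappa / alpha) / rate
        kappa = kappa_next
    return None


def wpe1_dlog_hazard(t, lam, a, alpha):
    """d log h / dt = lambda_l (alpha - S(t)) / (1 - S(t)) for t > 0, with S = 1 - R (PE)."""
    H = 0.0
    for j, rate in enumerate(lam):
        start = a[j]
        end = a[j + 1] if j + 1 < len(a) else math.inf
        if t <= start:
            break
        H += rate * (min(t, end) - start)
    S = math.exp(-H)
    rate = lam[bisect_right(a, t) - 1]
    return rate * (alpha - S) / (1.0 - S)


lam, a = [0.5, 1.0, 2.0], [0.0, 1.0, 3.0]
print(wpe1_critical_point(lam, a, 0.4))  # interval 2, t_2 = 0.5 - log(0.4), about 1.416
print(wpe1_critical_point(lam, a, 1.5))  # None, since alpha > kappa_1 = 1
```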
Figure 1 shows the hazard function for different time partitions. When \(\alpha <1\), the function is decreasing in both partitions. When \(\alpha > 1\), its behavior differs across partitions: in the first time partition the function is increasing, and in the second it is parabola-shaped. In other settings, the function is decreasing in the first time partition and increasing in the second. The vector \({\varvec{\lambda }}\) plays the role of a scale parameter. The figures were made with the statistical software R27.
Case 2: \(W(R(t))=-\log \left( 1-R(t)\right)\)
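In this case, the cdf of the new model is
\[F(t;\varvec{\lambda },\alpha \mid \varvec{a})=1-\exp \left\{ -\left[ -\log \left( 1-R(t; \varvec{\lambda }\, |\, \varvec{a})\right) \right] ^{\alpha }\right\} ,\quad t>0,\]
and the pdf is
\[f(t;\varvec{\lambda },\alpha \mid \varvec{a})=\frac{\alpha \, r(t; \varvec{\lambda }\, |\, \varvec{a})}{1-R(t; \varvec{\lambda }\, |\, \varvec{a})}\left[ -\log \left( 1-R(t; \varvec{\lambda }\, |\, \varvec{a})\right) \right] ^{\alpha -1}\exp \left\{ -\left[ -\log \left( 1-R(t; \varvec{\lambda }\, |\, \varvec{a})\right) \right] ^{\alpha }\right\} .\]
Henceforth, we refer to this model as WPE2\((\varvec{\lambda , a}, \alpha )\). The survival and hazard functions are given by
\[S(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})=\exp \left\{ -\left[ -\log \left( 1-R(t; \varvec{\lambda }\, |\, \varvec{a})\right) \right] ^{\alpha }\right\}\]
and
\[h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})=\frac{\alpha \, r(t; \varvec{\lambda }\, |\, \varvec{a})}{1-R(t; \varvec{\lambda }\, |\, \varvec{a})}\left[ -\log \left( 1-R(t; \varvec{\lambda }\, |\, \varvec{a})\right) \right] ^{\alpha -1},\]
where \(R(\cdot )\) and \(r(\cdot )\) are the cdf and pdf of the PE model defined in Eqs. (2) and (3), respectively. Figure 2 illustrates the flexibility of the WPE2 model. The parameter \(\alpha\) plays a similar role for \(L=2\) and \(L=3\): for \(\alpha < 1\) the hazard function is decreasing, and for \(\alpha > 1\) it is increasing. The smaller \(\alpha\), the faster the hazard function decreases. The possible shapes of the hazard function of the WPE2 model are described in the following proposition.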
Proposition 2
The hazard function for the WPE2 model is increasing, constant, or decreasing within each interval defined by the partition time for \(\alpha >1\), \(\alpha =1\), and \(\alpha <1\), respectively.
Proof
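Since \(r(t; \varvec{\lambda }\, |\, \varvec{a})/[1-R(t; \varvec{\lambda }\, |\, \varvec{a})]=\lambda _l\) is constant within each interval \([a_{l-1},a_l)\), it follows that
\[\frac{\partial \log h(t; \varvec{\lambda }, \alpha \, |\, \varvec{a})}{\partial t}=(\alpha -1)\frac{r(t; \varvec{\lambda }\, |\, \varvec{a})}{[1-R(t; \varvec{\lambda }\, |\, \varvec{a})]\left[ -\log \left( 1-R(t; \varvec{\lambda }\, |\, \varvec{a})\right) \right] }=\frac{(\alpha -1)\lambda _l}{-\log \left( 1-R(t; \varvec{\lambda }\, |\, \varvec{a})\right) }.\]
Since \(-\log (1-R(t; \varvec{\lambda }\, |\, \varvec{a}))>0\) for \(t>0\), this derivative is positive, zero, or negative for \(\alpha >1\), \(\alpha =1\), and \(\alpha <1\), respectively, and it has no root for \(t>0\) when \(\alpha \ne 1\). The result follows directly. \(\square\)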
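A Python sketch of the WPE2 survival and hazard functions, again written through the PE cumulative hazard \(H(t)=-\log \{1-R(t)\}\), so that \(S(t)=\exp \{-H(t)^\alpha \}\) and \(h(t)=\alpha \lambda _l H(t)^{\alpha -1}\) on \([a_{l-1},a_l)\). The function names and parameter values are hypothetical. The loop at the end evaluates the hazard at a few points inside one interval to illustrate the monotonicity pattern of Proposition 2.

```python
import math
from bisect import bisect_right


def pe_cumhaz(t, lam, a):
    """PE cumulative hazard H(t) = -log(1 - R(t)); a = [a_0, ..., a_{L-1}]."""
    total = 0.0
    for j, rate in enumerate(lam):
        start = a[j]
        end = a[j + 1] if j + 1 < len(a) else math.inf
        if t <= start:
            break
        total += rate * (min(t, end) - start)
    return total


def wpe2_survival(t, lam, a, alpha):
    return math.exp(-pe_cumhaz(t, lam, a) ** alpha)


def wpe2_hazard(t, lam, a, alpha):
    """h(t) = alpha * r/(1-R) * [-log(1-R)]^(alpha-1) = alpha * lambda_l * H^(alpha-1), t > 0."""
    rate = lam[bisect_right(a, t) - 1]
    return alpha * rate * pe_cumhaz(t, lam, a) ** (alpha - 1)


lam, a = [0.5, 1.0, 2.0], [0.0, 1.0, 3.0]
grid = [1.1, 1.5, 2.0, 2.5]  # points inside [a_1, a_2)
for alpha in (0.5, 1.0, 2.0):
    print(alpha, [round(wpe2_hazard(t, lam, a, alpha), 4) for t in grid])
# alpha = 0.5: decreasing; alpha = 1.0: constant (= lambda_2); alpha = 2.0: increasing
```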
Some particular cases of the WPE2 model are:
- WPE2 reduces to the PE model when \(\alpha =1\);
- WPE2 reduces to the Weibull model when \(L=1\);
- WPE2 reduces to the exponential model when \(\alpha =1\) and \(L=1\).
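The quantile function is
\[Q(p)=R^{-1}\left( 1-\exp \left\{ -[-\log (1-p)]^{1/\alpha }\right\} ; \varvec{\lambda }\mid \varvec{a}\right) ,\quad 0<p<1.\]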
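The quantile function gives a direct inverse-transform sampler: draw \(U\sim U(0,1)\) and return \(Q(U)\). In this Python sketch (hypothetical names), \(R^{-1}(1-e^{-z})\) is computed as the inverse of the PE cumulative hazard evaluated at \(z=[-\log (1-p)]^{1/\alpha }\), with cut points `a = [a_0, ..., a_{L-1}]`.

```python
import math
import random


def pe_cumhaz_inv(h, lam, a):
    """Inverse of the PE cumulative hazard; a_L = infinity is implicit."""
    acc = 0.0
    for j, rate in enumerate(lam):
        start = a[j]
        end = a[j + 1] if j + 1 < len(a) else math.inf
        step = rate * (end - start)
        if h <= acc + step:
            return start + (h - acc) / rate
        acc += step
    raise ValueError("all rates must be positive")


def wpe2_quantile(p, lam, a, alpha):
    z = (-math.log1p(-p)) ** (1.0 / alpha)  # -log(1 - R) at Q(p)
    return pe_cumhaz_inv(z, lam, a)


def wpe2_sample(n, lam, a, alpha, seed=None):
    """Inverse-transform sampling: T = Q(U) with U ~ U(0, 1)."""
    rng = random.Random(seed)
    return [wpe2_quantile(rng.random(), lam, a, alpha) for _ in range(n)]


# With L = 1 and alpha = 1 the model is exponential: Q(p) = -log(1 - p) / lambda
print(wpe2_quantile(0.5, [2.0], [0.0], 1.0), math.log(2) / 2)
```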
Proposition 3
If \(T \sim \text {WPE2}(\varvec{\lambda , a}, \alpha )\), then its n-th moment is
Proof
By definition and using the alternative expression for the pdf we obtain
Using the definition of \(\nabla _{\ell }\) in Eq. (1), putting \(u = w^\alpha\), with \(w = I_{\ell >1} \sum _{d=1}^{\ell -1} \lambda _d (a_d - a_{d-1}) + \lambda _{\ell } (t - a_{\ell - 1})\), and using the identity \((1+b)^c=\sum _{k=0}^{\infty }\left( {\begin{array}{c}c\\ k\end{array}}\right) b^k\) (valid for \(|b|<1\)), the result is obtained. \(\square\)
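Since \(E(T^n)=\int _0^1 Q(p)^n\,dp\), the series expression of Proposition 3 can be cross-checked numerically. The Python sketch below uses a midpoint rule on (0, 1), which never evaluates the quantile at p = 1; the number of nodes is an arbitrary choice and the function names are hypothetical. For \(L=1\) the WPE2 model is a Weibull distribution with scale \(1/\lambda\) and shape \(\alpha\), so \(E(T)=\Gamma (1+1/\alpha )/\lambda\) provides a reference value.

```python
import math


def pe_cumhaz_inv(h, lam, a):
    """Inverse of the PE cumulative hazard; a = [a_0, ..., a_{L-1}]."""
    acc = 0.0
    for j, rate in enumerate(lam):
        start = a[j]
        end = a[j + 1] if j + 1 < len(a) else math.inf
        step = rate * (end - start)
        if h <= acc + step:
            return start + (h - acc) / rate
        acc += step
    raise ValueError("all rates must be positive")


def wpe2_moment(n, lam, a, alpha, nodes=100_000):
    """Midpoint-rule approximation of E(T^n) = integral over (0, 1) of Q(p)^n dp."""
    total = 0.0
    for k in range(nodes):
        p = (k + 0.5) / nodes
        q = pe_cumhaz_inv((-math.log1p(-p)) ** (1.0 / alpha), lam, a)
        total += q ** n
    return total / nodes


# L = 1 reduces to a Weibull model: E(T) = Gamma(1 + 1/alpha) / lambda
print(wpe2_moment(1, [1.0], [0.0], 2.0), math.gamma(1.5))
```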
The non-homogeneous case
To incorporate predictor variables, we consider the Cox proportional hazards (PH) model28. Cox assumed that the hazard ratio between any two observations is constant over time, according to the relationship
\[h(t\mid {\varvec{x}}_i)=h_0(t)\exp ({\varvec{x}}_i^\top {\varvec{\beta }}),\]
where \(h_0(\cdot )\) is called the baseline or reference hazard, \({\varvec{x}}_i= (x_{i1}, \ldots , x_{ip})\) is a vector of covariates for the i-th individual and \({\varvec{\beta }}=(\beta _1,\ldots ,\beta _p)\) is the vector of coefficients related to the p observed covariates (both without an intercept term). In this context, \(\exp (\beta _j)\) is the factor by which the hazard function is multiplied when the j-th covariate increases by one unit and all other covariates are held fixed. Under this model, the survival function becomes
\[S(t\mid {\varvec{x}}_i)=S_0(t)^{\exp ({\varvec{x}}_i^\top {\varvec{\beta }})},\]
with \(S_0(\cdot )\) being the baseline survival function corresponding to \(h_0(\cdot )\). Given our initial problem of accounting for the critical months when modeling breastfeeding duration, we propose to use the WPE1 and WPE2 distributions as baseline models in this context.
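Under the PH structure, the covariates only rescale the baseline: \(S(t\mid \textbf{x})=S_0(t)^{\exp (\textbf{x}^\top \varvec{\beta })}\) and \(h(t\mid \textbf{x})=h_0(t)\exp (\textbf{x}^\top \varvec{\beta })\). The Python sketch below uses a WPE2 baseline with hypothetical names and illustrative parameter values, and shows that the hazard ratio between two covariate profiles does not depend on t.

```python
import math
from bisect import bisect_right


def pe_cumhaz(t, lam, a):
    """PE cumulative hazard; a = [a_0, ..., a_{L-1}]."""
    total = 0.0
    for j, rate in enumerate(lam):
        start = a[j]
        end = a[j + 1] if j + 1 < len(a) else math.inf
        if t <= start:
            break
        total += rate * (min(t, end) - start)
    return total


def ph_wpe2_hazard(t, x, beta, lam, a, alpha):
    """h(t | x) = h_0(t) exp(x'beta), WPE2 baseline h_0 = alpha * lambda_l * H^(alpha - 1)."""
    h0 = alpha * lam[bisect_right(a, t) - 1] * pe_cumhaz(t, lam, a) ** (alpha - 1)
    return h0 * math.exp(sum(b * xi for b, xi in zip(beta, x)))


def ph_wpe2_survival(t, x, beta, lam, a, alpha):
    """S(t | x) = S_0(t)^exp(x'beta), with S_0(t) = exp{-H(t)^alpha}."""
    s0 = math.exp(-pe_cumhaz(t, lam, a) ** alpha)
    return s0 ** math.exp(sum(b * xi for b, xi in zip(beta, x)))


beta, lam, a, alpha = [0.5, -0.3], [0.5, 1.0, 2.0], [0.0, 1.0, 3.0], 0.8
for t in (0.5, 2.0, 4.0):
    ratio = ph_wpe2_hazard(t, [1, 1], beta, lam, a, alpha) / ph_wpe2_hazard(t, [0, 0], beta, lam, a, alpha)
    print(t, ratio)  # always exp(0.5 - 0.3) = exp(0.2)
```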
Inference
In this section, we study ML estimation of the parameters of the WPE1 and WPE2 models under right censoring.
ML estimation
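Since the data are subject to right censoring, the observed data for the i-th individual can be represented by \(t_i=\min (t_i^*,c_i)\), \(\delta _i=I(t_i^*\le c_i)\) and the vector of covariates \(\textbf{x}_{i}\), where \(t_i^*\) and \(c_i\) denote the failure and censoring times, respectively, and \(I(\cdot )\) denotes the indicator function. Under the assumption of non-informative censoring, the log-likelihood function for the parameter vector \({\varvec{\theta }}=({\varvec{\beta }}^\top ,{\varvec{\lambda }}^\top , \alpha )^\top\) is expressed as
\[\ell ({\varvec{\theta }})=\sum _{i=1}^{n}\left\{ \delta _i\left[ \log h_0(t_i)+\textbf{x}_i^\top {\varvec{\beta }}\right] -\exp (\textbf{x}_i^\top {\varvec{\beta }})\,H_0(t_i)\right\} , \tag{8}\]
where \(h_0(\cdot )\) is the WPE1 or WPE2 hazard function and \(H_0(t)=-\log S_0(t)\) is the corresponding baseline cumulative hazard function, namely \(H_0(t)=[R(t)/(1-R(t))]^{\alpha }\) for WPE1 and \(H_0(t)=[-\log (1-R(t))]^{\alpha }\) for WPE2.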
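Eq. (8) translates directly into code. The sketch below (Python, hypothetical names) evaluates the right-censored log-likelihood of the WPE2 regression model using \(\log h_0(t)=\log \alpha +\log \lambda _l+(\alpha -1)\log H(t)\) and \(H_0(t)=H(t)^\alpha\), where \(H\) is the PE cumulative hazard. It only provides the objective function. The paper maximized it with R's `optim`, and any numerical optimizer could be applied to this function in the same way.

```python
import math
from bisect import bisect_right


def pe_cumhaz(t, lam, a):
    """PE cumulative hazard; a = [a_0, ..., a_{L-1}]."""
    total = 0.0
    for j, rate in enumerate(lam):
        start = a[j]
        end = a[j + 1] if j + 1 < len(a) else math.inf
        if t <= start:
            break
        total += rate * (min(t, end) - start)
    return total


def wpe2_ph_loglik(beta, lam, alpha, a, times, events, covariates):
    """Sum over i of delta_i * log h(t_i | x_i) + log S(t_i | x_i)."""
    ll = 0.0
    for t, delta, x in zip(times, events, covariates):
        eta = sum(b * xi for b, xi in zip(beta, x))  # linear predictor x'beta
        H = pe_cumhaz(t, lam, a)
        if delta:  # observed failure contributes log h(t | x)
            rate = lam[bisect_right(a, t) - 1]
            ll += math.log(alpha) + math.log(rate) + (alpha - 1) * math.log(H) + eta
        ll -= math.exp(eta) * H ** alpha  # log S(t | x) = -exp(eta) * H_0(t)
    return ll


# Exponential special case (L = 1, alpha = 1, no covariates): d*log(lambda) - lambda*sum(t)
print(wpe2_ph_loglik([], [0.8], 1.0, [0.0], [0.5, 1.2, 2.0], [1, 0, 1], [[], [], []]))
```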
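The ML estimator of \({\varvec{\theta }}\) is obtained by maximizing \(\ell ({\varvec{\theta }})\) with respect to \({\varvec{\theta }}\), which is equivalent to solving the score equations
\[\frac{\partial \ell ({\varvec{\theta }})}{\partial {\varvec{\theta }}}={\varvec{0}}. \tag{9}\]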
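Note that, for both models,
\[\frac{\partial \ell ({\varvec{\theta }})}{\partial {\varvec{\beta }}}=\sum _{i=1}^{n}\textbf{x}_i\left[ \delta _i-\exp (\textbf{x}_i^\top {\varvec{\beta }})\,H_0(t_i)\right] .\]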
On the other hand, for the WPE1 model
whereas for the WPE2 model
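Equation (9) must be solved numerically using an optimization algorithm, such as the Newton-Raphson or quasi-Newton methods29. The R software27 was used for this purpose; specifically, the optim function was used to solve this optimization problem. Under mild regularity conditions (Cox and Hinkley30), the asymptotic distribution of the ML estimator \(\widehat{\varvec{\theta }}\) is
\[\mathscr {I}({\varvec{\theta }})^{1/2}\left( \widehat{\varvec{\theta }}-{\varvec{\theta }}\right) \xrightarrow {d} N_{r}({\varvec{0}}_r,{\varvec{I}}_r),\quad r=p+L+1, \tag{10}\]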
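where \(N_{r}({\varvec{\mu }},{\varvec{\Sigma }})\) denotes the r-variate normal distribution with mean vector \({\varvec{\mu }}\) and covariance matrix \({\varvec{\Sigma }}\), \({\varvec{0}}_r\) denotes an r-dimensional vector of zeros, \({\varvec{I}}_r\) denotes the identity matrix of dimension r, and \(\mathscr {I}({\varvec{\theta }})\) denotes the Fisher information matrix, which can be consistently estimated by \(\widehat{\mathscr {I}}(\widehat{\varvec{\theta }})\), the negative of the Hessian matrix of the log-likelihood function in Eq. (8) evaluated at \(\widehat{\varvec{\theta }}\). In addition, approximate confidence intervals for the components of \({\varvec{\theta }}\) can be built from the blocks of its inverse,
\[\widehat{\mathscr {I}}(\widehat{\varvec{\theta }})^{-1}=\begin{pmatrix} \widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\varvec{\beta }}} & \widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\varvec{\lambda }}} & \widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\alpha }}\\ \widehat{\mathscr {I}}^{\widehat{\varvec{\lambda }}\widehat{\varvec{\beta }}} & \widehat{\mathscr {I}}^{\widehat{\varvec{\lambda }}\widehat{\varvec{\lambda }}} & \widehat{\mathscr {I}}^{\widehat{\varvec{\lambda }}\widehat{\alpha }}\\ \widehat{\mathscr {I}}^{\widehat{\alpha }\widehat{\varvec{\beta }}} & \widehat{\mathscr {I}}^{\widehat{\alpha }\widehat{\varvec{\lambda }}} & \widehat{\mathscr {I}}^{\widehat{\alpha }\widehat{\alpha }} \end{pmatrix}.\]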
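Then, approximate \(100(1-\nu )\%\) confidence intervals for \(\beta _r\), with \(\nu \in (0,1)\) and \(r=1,\ldots ,p\), are given by
\[\widehat{\beta }_r\pm z_{1-\nu /2}\sqrt{\widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\varvec{\beta }}}_{rr}},\]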
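where \(z_u\) denotes the u-th quantile of the standard normal distribution and \(\widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\varvec{\beta }}}_{rr}\) denotes the r-th diagonal element of \(\widehat{\mathscr {I}}^{\widehat{\varvec{\beta }}\widehat{\varvec{\beta }}}\). Similarly, the approximate \(100(1-\nu )\%\) confidence intervals for \(\lambda _\ell\), \(\ell =1,\ldots ,L\), and \(\alpha\) are
\[\widehat{\lambda }_\ell \pm z_{1-\nu /2}\sqrt{\widehat{\mathscr {I}}^{\widehat{\varvec{\lambda }}\widehat{\varvec{\lambda }}}_{\ell \ell }}\quad \text {and}\quad \widehat{\alpha }\pm z_{1-\nu /2}\sqrt{\widehat{\mathscr {I}}^{\widehat{\alpha }\widehat{\alpha }}},\]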
respectively.
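Each of these Wald intervals only needs a point estimate and the matching diagonal element of the inverse information matrix. A minimal Python sketch using the standard library, where `statistics.NormalDist` supplies \(z_{1-\nu /2}\). The inputs in the usage lines are illustrative numbers, not estimates from this paper. Exponentiating the endpoints of the interval for \(\beta _r\) gives an interval for the hazard ratio \(\exp (\beta _r)\), the scale on which covariate effects are interpreted in the applications.

```python
import math
from statistics import NormalDist


def wald_ci(estimate, inv_info_diag, nu=0.05):
    """estimate +/- z_{1-nu/2} * sqrt(diagonal element of the inverse information)."""
    z = NormalDist().inv_cdf(1.0 - nu / 2.0)
    half = z * math.sqrt(inv_info_diag)
    return estimate - half, estimate + half


def hazard_ratio_ci(beta_hat, inv_info_diag, nu=0.05):
    """Point estimate and interval for exp(beta), obtained by exponentiating the Wald limits."""
    lo, hi = wald_ci(beta_hat, inv_info_diag, nu)
    return math.exp(beta_hat), (math.exp(lo), math.exp(hi))


print(wald_ci(0.5, 0.01))  # about (0.304, 0.696) for nu = 0.05
print(hazard_ratio_ci(0.5, 0.01))
```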
Simulation study
In this section, we present a simulation study to assess the performance of the ML estimators for the WPE2 regression model under a scenario similar to that of the first application; all computations were carried out with the statistical software R27. The algorithm used to generate censored samples from the \(\text {WPE2}\) regression model, with a proportion C of censored observations (i.e., \(100C\%\) censoring), is the following:
- For \(i = 1,\ldots , n\), draw the covariate vector \(\textbf{x}_i=(x_{1i},x_{2i},x_{3i},x_{4i})\), whose components are independent draws from \(X_1\sim Bern(63/150)\), \(X_2\sim Bern(70/150)\), \(X_3\sim Bern(66/150)\) and \(X_4\sim Bern(56/150)\).
- Draw \(\displaystyle {U_i \sim U(0,1) }\).
- Compute \(W_i^{(T)} = \displaystyle {1 - \exp \left\{ - \left( \frac{- \log (U_i)}{\exp (\textbf{x}_i^{\top } {\varvec{\beta }})} \right) ^{1/\alpha } \right\} }\) and \(W_i^{(C)} = \displaystyle {1 - \exp \left\{ - \left( \frac{- \log (C)}{\exp (\textbf{x}_i^{\top } {\varvec{\beta }})} \right) ^{1/\alpha } \right\} }\).
- Compute \(Y_i = R^{-1}\left( W_i^{(T)}; {\varvec{\lambda }} \mid \varvec{a} \right)\) (the failure times).
- Compute \(C_i=R^{-1}\left( W_i^{(C)}; {\varvec{\lambda }}\mid \varvec{a} \right)\) (the censoring times).
- Compute \(T_i=\min (Y_i,C_i)\) and \(\delta _i=I(Y_i \le C_i)\), the observed times and failure indicators.
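The generation scheme can be sketched in Python as follows, using the standard library only. All names are hypothetical. The linear predictor enters through \(\exp (\textbf{x}_i^\top \varvec{\beta })\), which is what the PH survival function \(S(t\mid \textbf{x})=S_0(t)^{\exp (\textbf{x}^\top \varvec{\beta })}\) requires. \(R^{-1}(1-e^{-z};\varvec{\lambda }\mid \varvec{a})\) is computed as the inverse of the PE cumulative hazard at z. Because \(C_i\) is the point where \(S(C_i\mid \textbf{x}_i)=C\), the expected proportion of censored observations is exactly C. The usage lines take the parameter values of this simulation study.

```python
import math
import random


def pe_cumhaz_inv(h, lam, a):
    """Inverse PE cumulative hazard, so R^{-1}(1 - exp(-h)) = pe_cumhaz_inv(h)."""
    acc = 0.0
    for j, rate in enumerate(lam):
        start = a[j]
        end = a[j + 1] if j + 1 < len(a) else math.inf
        step = rate * (end - start)
        if h <= acc + step:
            return start + (h - acc) / rate
        acc += step
    raise ValueError("all rates must be positive")


def simulate_wpe2_ph(n, beta, lam, a, alpha, cens, probs, seed=None):
    """Right-censored sample of (t_i, delta_i, x_i) tuples from the WPE2 PH model."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = [1 if rng.random() < p else 0 for p in probs]  # Bernoulli covariates
        risk = math.exp(sum(b * xi for b, xi in zip(beta, x)))  # exp(x_i' beta)
        u = 1.0 - rng.random()  # U_i in (0, 1], avoids log(0)
        # W = 1 - exp(-z) followed by R^{-1}(W) is the same as H^{-1}(z)
        y = pe_cumhaz_inv((-math.log(u) / risk) ** (1.0 / alpha), lam, a)
        c = pe_cumhaz_inv((-math.log(cens) / risk) ** (1.0 / alpha), lam, a)
        sample.append((min(y, c), int(y <= c), x))
    return sample


data = simulate_wpe2_ph(
    n=150, beta=[0.58, 0.76, 0.68, 0.75], lam=[0.011, 0.056, 0.189, 0.188],
    a=[0.0, 0.08, 0.29, 0.41], alpha=0.58, cens=0.25,
    probs=[63 / 150, 70 / 150, 66 / 150, 56 / 150], seed=2024)
print(sum(1 - d for _, d, _ in data) / len(data))  # close to 0.25 on average
```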
Following the setting of the first application, we assumed \(L=4\) intervals with partition times \({\varvec{a}}= (0,0.08,0.29,0.41)\). The parameters used were \(\varvec{\beta } = (\beta _1, \beta _2,\beta _3,\beta _4) =(0.58,0.76,0.68,0.75)\), \(\varvec{\lambda } =(\lambda _1, \lambda _2,\lambda _3,\lambda _4) =(0.011,0.056,0.189,0.188)\) and \(\alpha =0.58\). Different sample sizes were considered (100, 150, 200 and 500), together with three censoring proportions: \(10\%\), \(25\%\) and \(50\%\). For each combination of \(\varvec{\beta }\), \(\varvec{\lambda }\), \(\alpha\), censoring proportion and sample size, we simulated 1000 replicates of the WPE2 regression model and estimated the parameters by the ML method. In each case, we computed the bias, the root mean squared error (RMSE), the mean of the estimated standard errors (SE), and the 95% coverage probability (CP). The results are summarized in Table 1. The bias of all parameters decreases as the sample size increases, under every censoring scenario. In addition, the RMSE and SE values are close, especially for larger sample sizes, suggesting that the variance of the ML estimators is reasonably well estimated. Finally, the CPs are close to the nominal 95% level, suggesting that the asymptotic distribution of the ML estimators in Eq. (10) is appropriate even in finite samples.
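The summary measures reported in Table 1 can be computed from the replicate estimates as sketched below (Python, hypothetical names, toy numbers in the usage line). The coverage count assumes the Wald interval \(\widehat{\theta }\pm z_{0.975}\widehat{\mathrm{SE}}\) for each replicate.

```python
import math
from statistics import NormalDist


def summarize_replicates(estimates, std_errors, true_value, level=0.95):
    """Bias, RMSE, mean estimated SE and coverage probability over replicates."""
    m = len(estimates)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    bias = sum(estimates) / m - true_value
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / m)
    mean_se = sum(std_errors) / m
    covered = sum(abs(e - true_value) <= z * s for e, s in zip(estimates, std_errors))
    return {"bias": bias, "rmse": rmse, "se": mean_se, "cp": covered / m}


print(summarize_replicates([0.9, 1.1, 1.3], [0.1, 0.1, 0.1], true_value=1.0))
```

Close agreement between `rmse` and `se` and a `cp` near the nominal level are the patterns discussed above.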
Real data applications
Several international health organizations have established that breast milk is one of the most important means of ensuring the survival and health of children. The World Health Organization (WHO) recommends exclusive breastfeeding for up to 6 months, starting within the first hour of life. In this section, we analyze breastfeeding data from Brazil31 and the United States17. The codes are available in a public GitHub repository (https://github.com/Ymgomez89/Proportional-Hazard-Model).
Breastfeeding from Brazil
Colosimo and Giolo31 provide a data set related to 150 mothers, including the time from birth to cessation of breastfeeding (in months), as well as four covariates associated with each. Censored observations occur when the breastfeeding record is cut off for different reasons. The 4 reported covariates are related to previous breastfeeding experience (\(x_1=1\) for yes (\(42\%\)) and \(x_1=0\) for no (\(58\%\))), ideal breastfeeding time (\(x_2=1\) for greater than 6 months (\(47\%\)) and \(x_2=0\) for otherwise (\(53\%\))), breastfeeding problems (\(x_3=1\) for yes (\(44\%\)) and \(x_3=0\) for no (\(56\%\))) and exclusive breastfeeding with breast milk (\(x_4=1\) for yes (\(37\%\)) and \(x_4=0\) for no (\(63\%\))). Figure 3 shows the Kaplan–Meier (K–M) estimator32 by the four covariates.
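The K–M curves are nonparametric estimates of the survival function, \(\widehat{S}(t)=\prod _{t_j\le t}(1-d_j/n_j)\), where \(d_j\) is the number of weanings at time \(t_j\) and \(n_j\) is the number of mothers still at risk just before \(t_j\). A small Python sketch with hypothetical names and toy data. It follows the usual convention that observations censored at the same time as an event are still in the risk set at that time.

```python
def kaplan_meier(times, events):
    """Return [(t_j, S(t_j))] at each distinct failure time t_j."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied observations at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # failures and censorings leave the risk set after t
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))  # S is about 0.8, 0.6, 0.3 at t = 1, 2, 3
```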
Based on the discussion in the Introduction, we fix the partition times at 0, 1, 3, and 5 months (i.e., \(L=4\)). Table 2 shows the log-likelihood, Akaike information criterion (AIC; Akaike33), and Bayesian information criterion (BIC; Schwarz34) values obtained for the fitted piecewise-type models (PE, power PE (PPE), WPE1, WPE2) with \(L=4\). Based on these criteria, the WPE2 model is preferred.
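For reference, the criteria are \(\mathrm{AIC}=-2\ell (\widehat{\varvec{\theta }})+2k\) and \(\mathrm{BIC}=-2\ell (\widehat{\varvec{\theta }})+k\log n\), where k is the number of estimated parameters. For the WPE2 regression model with p covariates and L intervals, \(k=p+L+1\). Smaller values are preferred. A minimal Python sketch, in which the log-likelihood in the usage line is an illustrative value rather than one taken from Table 2:

```python
import math


def aic_bic(loglik, k, n):
    """Return (AIC, BIC) for a fitted model with k parameters and n observations."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * math.log(n)


# Brazilian data: p = 4 covariates, L = 4 intervals -> k = 9 for WPE2; n = 150
print(aic_bic(-200.0, 9, 150))
```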
The estimates for the WPE2 model are presented in Table 3. Based on the results, we can obtain the following conclusions (keeping in mind that the other covariates are kept fixed).
- The risk of early weaning for mothers without previous breastfeeding experience is \(\exp (\widehat{\beta _1})\approx 1.6\) times that of mothers with previous experience (95% confidence interval: 1.0–2.8).
- The risk of early weaning for mothers who think that the ideal breastfeeding time is less than or equal to 6 months is approximately \(\exp (\widehat{\beta _2})\approx 1.7\) times the risk of mothers who believe the opposite (95% confidence interval: 1.0–2.8).
- The risk of early weaning for mothers who had breastfeeding problems is \(\exp (\widehat{\beta _3})\approx 2\) times that of mothers who did not experience problems (95% confidence interval: 1.2–3.3).
- The risk of early weaning for infants who were not exclusively breastfed is \(\exp (\widehat{\beta _4})\approx 1.9\) times the risk for infants who were exclusively breastfed (95% confidence interval: 1.1–3.1).
Breastfeeding from United States
We consider a second breastfeeding data set obtained from Klein and Moeschberger17. These data come from the National Longitudinal Survey of Youth (NLSY), which began in 1979 with youths aged 14–21 who were interviewed annually until 1988. Beginning in 1983, women were asked about any pregnancy that had occurred since the last interview (pregnancies before 1983 were also documented), and breastfeeding information was recorded. The data set includes information on 927 firstborn children whose mothers chose to breastfeed. The sample is restricted to children born after 1978 with a gestational age of 20–45 weeks, in order to eliminate recall problems.
The response variable is the duration of breastfeeding (in weeks), together with an indicator of whether the baby was weaned (the failure indicator). Explanatory variables include race, poverty status, maternal smoking and drinking habits, maternal age at birth, maternal education, and prenatal care. The only significant variables are race (\(x_1\); categorical: white \((71\%)\), black \((13\%)\) and other \((16\%)\)), poverty status (\(x_2\): 756 non-poor (82%), 171 poor (18%)), and smoking habits (\(x_3\): 657 non-smoking mothers (71%), 270 mothers who smoked after birth (29%)). Figure 4 shows the K–M estimator stratified by each of the three covariates.
Table 4 shows the AIC and BIC values for the same piecewise-type models fitted in the previous application. Again, based on these criteria, the WPE2 model provides the best fit among the models considered.
In addition, Table 5 presents the estimates for those models. Based on the results, we can obtain the following conclusions (keeping in mind that the other covariates are kept fixed).
- The risk of early weaning for black mothers is \(\exp (\widehat{\beta _{11}})=1.2\) times that of white mothers (95% confidence interval: 1.0–1.5), and for mothers of other races it is \(\exp (\widehat{\beta _{12}})=1.5\) times that of white mothers (95% confidence interval: 1.2–1.8).
- The risk of early weaning for mothers in poverty is \(\exp (\widehat{\beta _2})=0.9\) times that of mothers not in poverty (95% confidence interval: 0.7–1.0).
- The risk of early weaning for mothers who smoked after the baby was born is \(\exp (\widehat{\beta _3})=1.4\) times that of mothers who did not smoke after birth (95% confidence interval: 1.2–1.6).
Conclusions
The Weibull piecewise exponential distributions introduced here provide a flexible tool for statistical modeling. By allowing different shapes of the hazard function across multiple time intervals, these models accommodate the complexities often observed in real-world data where risks change over time, such as breastfeeding data. The time partition can either be estimated directly from the data or fixed in advance, which makes the approach versatile and enables researchers and practitioners to tailor the model to the characteristics of their data.
The simulation study indicates that maximum likelihood estimation performs well across the sample sizes and censoring proportions considered, which supports the practical use of the model. Finally, the proposed distribution was applied to two real data sets to show its potential.
In conclusion, we define a flexible approach to modeling lifetime data across multiple time intervals by compounding the Weibull distribution (which is generally adequate for modeling monotone hazard rates) with the PE distribution (which allows breakpoints to be incorporated into the analysis). The compounding procedure uses the PE distribution in (2) as the baseline of the Weibull-generated family. In addition, we motivate the use of the new distribution with breastfeeding data. We hope that these two features combined will attract more complex applications in survival analysis. Further work could apply the model to other datasets to assess its performance and robustness relative to traditional distributions. Investigating potential applications in fields such as reliability engineering and risk management could also provide valuable insights into its utility and versatility.
Data availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.
References
Zhang, C., Zhao, W., Pan, X. & Huang, A. Effects of feeding patterns during the first 6 months on weight development of infants ages 0–12 months: a longitudinal study. Sci. Rep. 14, 17451 (2024).
Meek, J. Y., Noble, L. & Section on Breastfeeding. Policy statement: Breastfeeding and the use of human milk. Pediatrics 150, e2022057988. https://doi.org/10.1542/peds.2022-057988 (2022).
Park, P. G., Heo, J. S., Ahn, Y. H. & Kang, H. G. Association between exclusive breastfeeding and the incidence of childhood nephrotic syndrome. J. Pediatr. 276, 114266–6 (2024).
Kawan, M., Gregory, E. F. & Spatz, D. L. Improving breastfeeding care & support in a large, urban, pediatric primary care practice. J. Pediatr. Nurs. 70, e3-8 (2023).
Shitie, A. et al. Breast problems and associated factors among lactating women in Northeast Ethiopia, 2022. Sci. Rep. 14, 9202 (2024).
Oliveira, V., Prell, M. & Cheng, X. The economic impacts of breastfeeding: A focus on USDA’s Special Supplemental Nutrition Program for Women, Infants, and Children (WIC). Economic Research Report ERR-261, U.S. Department of Agriculture, Economic Research Service (2019).
Bartick, M. C. et al. Suboptimal breastfeeding in the United States: Maternal and pediatric health outcomes and costs. Maternal Child Nutr. 13, e12366. https://doi.org/10.1111/mcn.12366 (2017).
World Health Organization & UNICEF. Joint statement by the UNICEF Executive Director and the WHO Director-General on the occasion of World Breastfeeding Week (2023).
Carter, S. K., Stone, A. & Cox, J. M. About research: Intersectionality in breastfeeding research. J. Hum. Lact. 39, 202–205. https://doi.org/10.1177/08903344231156446 (2023).
Walsh, A. et al. Improving breastfeeding support through the implementation of the Baby-Friendly Hospital and Community Initiatives: A scoping review. Int. Breastfeed. J. 18, 22. https://doi.org/10.1186/s13006-023-00556-2 (2023).
Gianni, M. L. et al. Breastfeeding difficulties and risk for early breastfeeding cessation. Nutrients 11, 2266. https://doi.org/10.3390/nu11102266 (2019).
Lechosa-Muñiz, C., Paz-Zulueta, M., Cayón-De las Cuevas, J., Llorca, J. & Cabero-Pérez, M. J. Declared reasons for cessation of breastfeeding during the first year of life: An analysis based on a cohort study in Northern Spain. Int. J. Environ. Res. Public Health 18, 8414. https://doi.org/10.3390/ijerph18168414 (2021).
Chang, P. C., Li, S. F., Yang, H. Y. et al. Factors associated with cessation of exclusive breastfeeding at 1 and 2 months postpartum in Taiwan. Int. Breastfeed. J. 14. https://doi.org/10.1186/s13006-019-0213-1 (2019).
Abusabah, A. M. et al. Prevalence and determinants of exclusive breastfeeding among mothers of children aged 6–24 months in the Aseer region. Cureus 16, e66788. https://doi.org/10.7759/cureus.66788 (2024).
Zitkute, V. et al. Reasons for breastfeeding cessation in the first year after childbirth in Lithuania: A prospective cohort study. Medicina 56, 226. https://doi.org/10.3390/medicina56050226 (2020).
Bagci Bosi, A. T., Eriksen, K. G., Sobko, T., Wijnhoven, T. M. & Breda, J. Breastfeeding practices and policies in WHO European Region Member States. Public Health Nutr. 19, 753–764 (2016).
Klein, J. P. & Moeschberger, M. L. Survival Analysis: Techniques for Censored and Truncated Data (Springer, 1997).
Feigl, P. & Zelen, M. Estimation of exponential survival probabilities with concomitant information. Biometrics 21 (1965).
Friedman, M. Piecewise exponential models for survival data with covariates. Ann. Stat. 10 (1982).
Durrans, S. R. Distributions of fractional order statistics in hydrology. Water Resour. Res. 28, 1649–1655 (1992).
Gómez, Y. M., Gallardo, D. I. & Arnold, B. C. The power piecewise exponential model. J. Stat. Comput. Simul. 10 (2017).
dos Santos Junior, P. C. & Schneider, S. Power piecewise exponential model for interval-censored data. J. Stat. Theory Pract. 16 (2022).
Alzaatreh, A., Lee, C. & Famoye, F. A new method for generating families of continuous distributions. METRON 71, 63 (2013).
Alzaatreh, A. & Ghosh, I. On the Weibull-X family of distributions. J. Stat. Theory Appl. 10 (2015).
Tahir, M. H., Cordeiro, G. M., Alzaatreh, A., Mansoor, M. & Zubair, M. The logistic-X family of distributions and its applications. Commun. Stat. Theory Methods 45 (2016).
Bourguignon, M., Silva, R. B. & Cordeiro, G. M. The Weibull-G family of probability distributions. J. Data Sci. 12, 53–68 (2014).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2025).
Cox, D. R. Regression models and life-tables. J. Roy. Stat. Soc.: Ser. B (Methodol.) 34, 187–202 (1972).
Soetaert, K. & Herman, P. A Practical Guide to Ecological Modelling: Using R as a Simulation Platform (Springer, 2009).
Cox, D. R. & Hinkley, D. V. Theoretical Statistics (CRC Press, Boca Raton, 1979).
Colosimo, E. A. & Giolo, S. R. Análise de Sobrevivência Aplicada [Applied Survival Analysis] (Editora Blucher, 2021).
Kaplan, E. L. & Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481 (1958).
Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974).
Schwarz, G. Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978).
Acknowledgements
The work of the first author is partially supported by grant Fondo Nacional de Desarrollo Científico y Tecnológico (FONDECYT) 11230397.
Author information
Contributions
Y.M.G. and D.I.G. conceived the idea presented. W.E.C. and J.L.S. developed the theory and performed the computations. Y.M.G., D.I.G. and M.B. verified the analytical methods. J.L.S. and W.E.C. performed the numerical simulations. Y.M.G. and R.Z. analysed the data. Y.M.G., R.Z., D.I.G. and J.L.S. prepared the draft manuscript. All authors reviewed the results, approved the final manuscript as submitted and agreed to be held accountable for all aspects of the work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Gómez, Y.M., Caimanque, W.E., Santibáñez, J.L. et al. A new proportional hazard model with applications to breastfeeding data. Sci Rep 15, 21869 (2025). https://doi.org/10.1038/s41598-025-08219-4
DOI: https://doi.org/10.1038/s41598-025-08219-4