Abstract
Efforts to optimize natural coagulants are often limited by unsystematic trial-and-error approaches, which cannot accurately predict the complex interactions among jet mixing parameters, coagulant dosage, and environmental conditions. To overcome these obstacles, this paper proposes hybrid machine learning models to enhance flocculation efficiency. We combine the CatBoost model with the Neural Tangent Kernel (NTK) to learn the intricate nonlinear interactions among jet velocity, mixing time, coagulant dose, pH, and turbidity. CatBoost handles categorical data, such as the type of coagulant, effectively, while NTK improves the model's generalization, especially when the experimental dataset is small. Self-organizing maps (SOMs) and Multivariate Adaptive Regression Splines (MARS) are used to recognize patterns and trace the crucial interactions among mixing parameters. Reinforcement learning techniques, namely Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC), dynamically optimize jet velocity, mixing time, and coagulant dosage in real time. Using Neural Architecture Search (NAS) and Hyperband to automate model tuning reduced computation time by 40%. The proposed models improve the efficiency of the flocculation process by 20–25% and achieve a predictive accuracy of 95–97%. Importantly, interpretability is provided by SHAP and counterfactual explanations, which give actionable insights into the factors that most influence flocculation efficiency. This work introduces robust, interpretable, real-time optimization methods that offer a practical tool for making water treatment processes both sustainable and efficient.
Introduction
Flocculation, a key process in water and wastewater treatment, removes suspended matter by gathering dispersed particles into larger aggregates called flocs [1]. The flocs then settle out of the water easily and quickly [2]. Flocculation efficiency determines water quality but is sensitive to many operational parameters, such as jet velocity, coagulant dosage, and mixing time, and to environmental conditions such as pH and turbidity [3]. Optimization of flocculation has typically relied on empirical methods, which are tedious and labor-intensive and often overlook the nonlinear interactions among the parameters controlling the process [4]. Such trial-and-error techniques do not account for nonlinear interactions between critical parameters such as jet velocity, coagulant dosage, mixing time, pH, and turbidity [6,7]. Machine learning (ML) and data-driven techniques could transform this domain by enabling a more accurate and adaptive approach to process optimization [5]. ML has shown considerable promise in handling the complexity of such relationships and enabling data-driven optimization [8]. However, even standard ML models such as Random Forest and Support Vector Machines generalize poorly, particularly when applied to small datasets. This work therefore proposes combining CatBoost, a gradient-boosting algorithm that handles categorical variables efficiently, with the NTK, which treats deep neural networks as kernel-based models to improve generalization [9]. This combination could yield more robust predictions by capturing the relevant interactions between the mixing parameters and environmental conditions. Besides predictive modeling, real-time adaptability is crucial in any dynamic water treatment process [10].
Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC), reinforcement learning (RL) techniques that operate in continuous action spaces, are introduced to dynamically adjust jet velocity, coagulant dosage, and mixing time based on real-time conditions [11,12]. These RL models enable automated optimization, eliminating the need for static process configurations that may not respond well to fluctuating water quality parameters. The framework further includes Neural Architecture Search (NAS) and Hyperband, which automate the tuning of hyperparameters and neural network architectures to reduce computational overhead while improving accuracy. Random Forest and Support Vector Machines have recently been applied to predict the outcomes of water treatment operations [7,13]. However, such models tend to generalize poorly to the small experimental datasets typical of laboratory-scale studies. Furthermore, they offer little insight into the behaviour of the system and are not flexible enough for real-time optimization in dynamic systems. More advanced ML models, such as CatBoost and Neural Tangent Kernel (NTK) networks, offer better performance in addressing these limitations [14]. CatBoost deals well with categorical data such as coagulant type and environmental conditions, whereas NTK models enhance generalization by allowing deep neural networks to be treated as kernel methods. These models can better capture the complex interactions among the many process variables, thus enhancing predictive accuracy. Moreover, whereas traditional optimization methods, including grid search and trial-and-error tuning, require large amounts of resources, RL methods such as DDPG and SAC can adaptively optimize process parameters in real time. RL methods are therefore well suited to continuous action spaces, such as adjusting jet velocity and coagulant dosage.
In addition, Neural Architecture Search and Hyperband automate the tuning of hyperparameters and neural network architectures, which further reduces the models' computational load and enhances their robustness.
This study integrates CatBoost, NTK, reinforcement learning, and optimization algorithms into a complete hybrid approach for improving flocculation efficiency. The model not only predicts flocculation efficiency with very high accuracy but also optimizes jet mixing parameters in real time, enabling tighter process control. Interpretability is provided by Shapley additive explanations (SHAP) and counterfactual explanations, which reveal how each parameter affects the outcome and which actionable improvements could be introduced in the process. Together, these components aim to significantly raise the efficiency, sustainability, and adaptability of flocculation in water treatment.
Motivation and contribution
There is an urgent need to improve the efficiency and sustainability of flocculation processes, which are at the core of water and wastewater treatment [15]. Traditional methodologies for optimizing flocculation, especially when natural coagulants are involved, fall short because they depend on laborious trial and error. These techniques are time-consuming and cannot deal with the complex, nonlinear interactions between key variables such as jet velocity, mixing time, and environmental factors. Traditional prediction models also perform poorly on small experimental datasets, which results in suboptimal flocculation efficiency and higher operating costs. Hybrid machine learning models offer a clear solution to these challenges by providing new techniques for prediction and optimization. The primary contribution of this work is a novel integrated machine learning framework that combines CatBoost, NTK, and reinforcement learning models to optimize flocculation efficiency. Unlike earlier works, it combines ML techniques that address two aspects simultaneously: efficient handling of categorical data with CatBoost, and improved generalization on small datasets by treating the deep learning model as a kernel method using NTK. On the reinforcement learning side, DDPG and SAC adjust jet velocity, coagulant dosage, and mixing time in real time to optimize the flocculation parameters during operation. Using NAS and Hyperband for model tuning makes optimization faster and more effective than the approaches reported in the literature. A further key point of the paper is model interpretability, which is critical in machine learning applications for process optimization but is often neglected.
Beyond predicting and optimizing flocculation efficiency, the SHAP and counterfactual explanations in the proposed framework offer actionable insights into which parameters exert the most significant influence on the outcome. This level of transparency lets operators make more informed decisions and thus enhances the reliability and applicability of the model in real-world operation. In short, this work advances flocculation technology by providing a comprehensive, adaptive, and interpretable solution that improves both predictive accuracy and operational efficiency.
In-depth review of models used for enhancing flocculation efficiency levels
The combination of natural coagulants and machine learning models has opened new possibilities for optimizing flocculation and pollutant removal in water treatment. A review of the literature reveals a range of approaches and methodologies for enhancing the efficacy of water treatment systems. These studies pursue two broad aims: developing novel coagulants and developing advanced modeling techniques, including AI, for performance prediction and optimization. Several researchers have concentrated on eco-friendly, plant-derived coagulants for water purification and on their applications. Mahanna et al. [16] reported that an oat-onion seed combination is an attractive candidate for improving turbidity removal from water. Similarly, Khalidi-Idrissi et al. [17] examined a dual flotation/coagulation-flocculation process for refinery wastewater treatment using natural coagulants. Obiora-Okafo et al. [18] likewise noted that natural polymer coagulants applied to dye removal offer benefits such as lower toxicity and biodegradability. Supporting this trend, Igwegbe et al. [19] tested green flocculation methods using Parkia biglobosa extract to purify municipal landfill leachate, reflecting the growing interest in bio-based treatment technologies for complex wastewater streams. Tsoutsa et al. [20] examined agricultural waste materials for use as natural coagulants or adsorbents, drawing attention to their use for dye removal from wastewater.
These studies belong to a broader trend toward sustainable, eco-friendly water treatment solutions that are likely to be effective while causing little to no harm to the environment and human health, in contrast to traditional chemical coagulants. Several studies have focused on optimizing natural coagulants through approaches such as response surface methodology (RSM) and machine learning [21]. Asadi-Ghalhari et al. [5] optimized turbidity removal using polyaluminum chloride with rice starch as a coagulant aid, integrating RSM with machine learning models to better forecast and optimize the process parameters. Similarly, Benalia et al. [22] used a cactus-based coagulant for lead removal from water and optimized the process through RSM. Such optimization strategies provide a basis for improving the performance of natural coagulants over a range of environmental conditions. Advances in wastewater treatment also concern the coagulation mechanisms of natural materials and their interaction with the environment. Das et al. [23] investigated tannin-based coagulants and their efficiency in water, sewage, and drinking water treatment. Mohamed Noor and Ngadi [24] conducted a systematic review of the ecotoxicological hazards associated with the coagulation-flocculation process, providing an in-depth analysis of the environmental implications of using chemical coagulants. These studies underscore the need to balance performance optimization with ecological sustainability in wastewater treatment. Bio-coagulants for industrial wastewater treatment have also come to the forefront. Dkhissi et al. [25] studied cactus as a flocculant for treating vegetable oil refinery wastewater, showing that natural materials can match or even outperform synthetic chemical coagulants.
As can be seen in Table 1, predicting and optimizing coagulation-flocculation remains a critically important domain for advanced machine learning models. Artificial neural networks (ANNs), random forests, and SVMs have been used to predict removal efficiency. Krishnan et al. [26] applied an ANN to model the efficiency of polyaluminum chloride and Moringa oleifera, achieving superior prediction accuracy. Li et al. [28] followed a similar line, comparing micro-flocculation and ozonation as pretreatments for ultrafiltration and modeling their efficacy with machine learning. These studies make the case for integrating AI into water treatment to improve prediction, optimize performance, and reduce reliance on purely empirical, trial-and-error methods. Other optimization work has addressed complex systems involving bio-coagulants, flocculation processes, and wastewater treatment mechanisms. For example, de Melo Franco Domingos et al. [29] studied coagulation-flocculation combined with hydrodynamic cavitation and ozonation for landfill leachate, confirming the benefits of multistage processes for higher pollutant removal efficiency. Foulani et al. [30] investigated composite coagulants based on polyaluminum chloride and sodium alginate, considering their performance under changing water quality conditions. These works emphasize the advantages of integrating several treatment technologies to achieve higher pollutant removal efficiencies. In addition, there is rising concern over the environmental and health impacts of disposing of sludge that contains coagulant residues. Taiwo et al. [31] analyzed the pollution, ecological, and health risks of removing heavy metals from soil with natural coagulants, with an emphasis on safe disposal practices.
This concern is echoed in the studies by Madjene et al. [32], Golzadeh et al. [33], and Bhagat et al. [34], which investigated sludge generation and management following the coagulation-flocculation process and emphasized the development of more sustainable waste management strategies. A good number of studies also focus on mathematical modeling of coagulation-flocculation processes to improve process design and scale-up. Bhuvanendran et al. [27] reported the development of a chemical coagulation reactor for treating dairy wastewater, with possible product recovery from the waste sludge. The study demonstrated the applicability of mathematical models for predicting process outcomes and optimizing reactor design. Anyaene et al. [35] used soft computing techniques such as genetic algorithms and fuzzy logic to optimize bio-coagulation for the treatment of aquaculture wastewater, a new approach to modeling and optimizing the process. The growing push toward plant-based coagulants is another theme that emerges in the literature. Nero et al. [36] used plant parts of Cassia fistula to treat mine wastewater, which points to industrial applications for natural materials. Chik et al. [37] and Alnawajha et al. [38] also studied plant-based coagulants for treating aquaculture effluent, assessing their efficiency in removing suspended solids and turbidity. In another interesting study, Joaquin et al. [39] applied mixed natural and synthetic coagulants to sewage wastewater treatment.
The proposed study advances existing machine learning methodologies by integrating multiple specialized techniques into a single real-time adaptive framework for flocculation optimization, a feature not commonly reported in prior literature. While CatBoost, NTK, and reinforcement learning have each been explored individually in water treatment, their combined use to predict and dynamically optimize flocculation efficiency in real time represents a significant methodological enhancement. Unlike traditional studies that apply static ML models for prediction, this framework enables real-time parameter adjustments based on incoming data, improving process adaptability. In addition, SHAP and counterfactual explanations provide interpretability, the lack of which is a common limitation of previous machine learning-based water treatment models. This work also introduces novelty by integrating reinforcement learning (DDPG and SAC) for real-time control. Most existing studies rely on static optimization techniques such as grid search or genetic algorithms, which do not adapt dynamically to process variations.
In contrast, a reinforcement learning algorithm continuously adjusts the jet velocity, coagulant dosage, and mixing time according to real-time water quality parameters, allowing high efficiency without manual input. Real-time adaptability substantially enhances operational feasibility, so such a model would be deployable in industrial settings where process fluctuations occur. Furthermore, NAS and Hyperband tune the model automatically and reduce computational overhead compared with previous studies, which required time-consuming manual hyperparameter tuning. From an application point of view, this work differs from previous work because it explicitly aims at optimizing natural coagulants, which are inherently more variable than their synthetic counterparts. Unlike previous flocculation studies, which were mainly based on chemical coagulants with known and predictable behaviour, this model focuses on the nonlinear interactions of natural coagulants such as Moringa oleifera and chitosan under different environmental conditions. The hybrid model is therefore intended to generalize across water quality scenarios, adapt in real time to changing contaminant-load profiles, and provide interpretations in terms of identified variables, supporting sustainable water treatment technologies. The review of existing methods shows that water treatment has made relevant progress, with a prevalent trend toward natural coagulants and bio-flocculants for pollutant removal. These materials offer advantages over synthetic chemical coagulants, including lower toxicity, biodegradability, and cost-effectiveness. Incorporating machine learning models further enhances the ability to predict and optimize the performance of these natural coagulants, providing efficient tools for improving water treatment.
However, scaling up these processes raises concerns about environmental sustainability and the management of sludge disposal after treatment. Future studies should address these challenges through advanced modeling techniques, more sustainable waste management strategies, and optimized coagulant usage in large-scale industrial applications.
Proposed hybrid machine learning model design for optimizing flocculation efficiency using natural coagulants and jet mixing techniques
This section describes the design of a hybrid machine learning model that optimizes flocculation efficiency with natural coagulants and jet mixing techniques, addressing the low efficiency and high complexity of current methods. The hybrid machine learning framework proposed in this study, shown in Fig. 1, integrates CatBoost and NTK with a deep neural network (DNN), SOMs, and Multivariate Adaptive Regression Splines (MARS) to predict and optimize flocculation efficiency. The components are designed to complement one another, each addressing a particular challenge of the flocculation process, from the nonlinear effects of jet mixing parameters to coagulant dosage and environmental conditions such as pH and turbidity. The post-jet mixing system, which simulates flocculation under realistic conditions, has a high-precision peristaltic pump for controlled coagulant dosing, a variable-speed impeller for mixing, and a turbidity sensor that measures flocculation efficiency in real time. Each experimental run systematically varied jet velocity (0.5–2.5 m/s), mixing time (10–60 min), and coagulant dosage (10–100 mg/L) to capture the full range of operational scenarios.
Data collection used both optical and gravimetric approaches. Suspended solids concentrations before and after flocculation were measured over time with a turbidity meter, and sedimentation rates were determined gravimetrically. The dataset underwent preprocessing steps, namely outlier removal (based on z-score analysis), normalization, and missing value imputation using k-nearest neighbours (KNN), to ensure data quality. Experimental error was controlled by taking triplicate measurements for each experimental condition, with deviations below ± 2% of the mean value. This procedure standardizes the methodology, enhances reproducibility, and ensures consistent findings under repeated trials.
This work uses CatBoost rather than other ensemble techniques such as Random Forest and XGBoost for several reasons, including better handling of categorical features and better generalization with less risk of overfitting on the limited experimental data. Random Forests are powerful, effective models for classification and regression tasks. XGBoost, another gradient boosting algorithm, is known for its good performance on structured data but is prone to overfitting when applied to small datasets. CatBoost can handle categorical features without one-hot encoding and offers ordered boosting, which mitigates target leakage, making it an excellent choice for optimizing flocculation parameters across diverse water treatment conditions. The Neural Tangent Kernel (NTK) is also incorporated to improve generalization in deep learning-based models by approximating infinite-width neural networks as kernel functions. This characteristic helps NTK-enhanced models retain high prediction accuracy even with little training data, a prevalent situation in lab-scale water treatment experiments.
Aside from predictive modeling, RL contributes to real-time process optimization. DDPG and SAC were chosen over optimization techniques such as genetic algorithms (GA) or Bayesian Optimization because of their flexibility in continuous control tasks. DDPG, an actor-critic RL model, optimizes continuous variables such as jet velocity and coagulant dosage while exploring the system by trial and error. SAC is an entropy-regularized RL algorithm that improves robustness by balancing exploration and exploitation as water quality conditions change, so that optimal flocculation conditions are identified dynamically. GA, by contrast, follows predefined mutation and crossover operations that are normally computationally expensive and less adaptive to real-time changes, while Bayesian Optimization is better suited to static parameter tuning than to continuous, real-time adjustments.
Further work will address transfer learning, so that models learned on specific datasets can be transferred to new treatment plants with minimal retraining. By addressing these scalability-related challenges, the proposed framework may be scaled up to large-scale applications, making flocculation optimization more feasible and impactful.
CatBoost is a gradient-boosting algorithm that handles categorical variables well; it directly models the effect of coagulant type on flocculation, which makes it very apt for this study. It trains a series of decision trees by minimizing a loss function over successive iterations. The loss function, L(y,y'), represents the residual error between the true value y and the predicted value y' of the response variable. For N observations, training minimizes the total loss via Eq. 1,
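The preprocessing steps described above (z-score outlier removal, normalization, KNN imputation) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' pipeline: the function names, the z-score threshold, the min-max form of normalization, and the number of neighbours k are all assumptions, since the text does not state them. Missing values are represented as `None`.

```python
import math
import statistics


def _column_values(rows, j):
    """Observed (non-missing) values of column j."""
    return [row[j] for row in rows if row[j] is not None]


def remove_outliers(rows, threshold=3.0):
    """Drop rows in which any observed value has |z-score| > threshold (assumed value)."""
    stats = []
    for j in range(len(rows[0])):
        vals = _column_values(rows, j)
        stats.append((statistics.mean(vals), statistics.pstdev(vals)))
    kept = []
    for row in rows:
        z_ok = all(
            v is None or sd == 0 or abs(v - mu) / sd <= threshold
            for v, (mu, sd) in zip(row, stats)
        )
        if z_ok:
            kept.append(list(row))
    return kept


def min_max_normalize(rows):
    """Rescale each column to [0, 1]; missing entries stay None."""
    bounds = []
    for j in range(len(rows[0])):
        vals = _column_values(rows, j)
        bounds.append((min(vals), max(vals)))
    out = []
    for row in rows:
        new_row = []
        for v, (lo, hi) in zip(row, bounds):
            if v is None:
                new_row.append(None)
            elif hi > lo:
                new_row.append((v - lo) / (hi - lo))
            else:
                new_row.append(0.0)
        out.append(new_row)
    return out


def knn_impute(rows, k=3):
    """Fill each None with the column mean over the k nearest complete rows.

    Distances use only the features observed in the incomplete row.
    Assumes at least one complete row exists.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        def distance(other, row=row, observed=observed):
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=distance)[:k]
        filled.append([
            v if v is not None else statistics.mean([nb[j] for nb in neighbours])
            for j, v in enumerate(row)
        ])
    return filled


def preprocess(rows, z_threshold=3.0, k=3):
    """Outlier removal, then normalization, then KNN imputation."""
    return knn_impute(min_max_normalize(remove_outliers(rows, z_threshold)), k)
```

Imputing after normalization keeps the neighbour distances from being dominated by whichever feature has the largest numeric range, for example dosage in mg/L against jet velocity in m/s.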
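Eq. 1 is not reproduced in this text. A form consistent with the definitions above, assuming the total loss is a plain sum over the N observations (the symbol for the total loss and the index i are assumptions), would be:

```latex
\mathcal{L} = \sum_{i=1}^{N} L\left(y_i,\, y'_i\right) \tag{1}
```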
This residual is iteratively corrected by the next decision tree, with the predictions updated at each iteration t via Eq. 2,
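Eq. 2 is likewise missing from this text. A standard gradient-boosting update that matches the symbols defined next would read as follows; the minus sign assumes that g_t is the gradient itself, so each step moves against it:

```latex
y'^{(t+1)}_i = y'^{(t)}_i - \eta\, g_t(x_i) \tag{2}
```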
where g_t(x_i) denotes the gradient of the loss function with respect to the model output at iteration t, and η is the learning rate, which controls the step size of the updates. The main advantage of CatBoost is its ability to capture complex interactions among jet velocity, mixing time, and coagulant dosage while handling categorical data robustly, and it remains computationally efficient even on small datasets.
The second component is NTK with DNN, which lets the DNN generalize better on small experimental datasets, a critical advantage in lab-scale studies. The NTK framework linearizes a deep neural network so that it behaves as a kernel method, which improves interpretability while retaining the expressiveness of deep learning. The NTK is defined as a kernel function K(x,x′) that measures the similarity of the inputs x and x′, calculated as the inner product of the gradients of the network output with respect to the weights via Eq. 3,
In this expression, f(x; θ) is the neural network output, and θ is the set of network parameters. The NTK captures how small changes in the input space, such as slight variations in jet velocity or coagulant dosage, affect the predictions of the network, thereby enhancing the robustness of the model. The kernel K(x,x′) implicitly maps the data into a higher-dimensional space so that the model can learn more complex yet generalizable functions. This helps to reduce overfitting, which mainly occurs with small datasets.
SOMs were used to visualize the high-dimensional input data and reveal hidden patterns in jet mixing configurations and coagulant properties. SOMs map the input data onto a two-dimensional grid while preserving the topological relationships among the inputs.
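Eq. 3 does not appear in this text. The usual definition of the neural tangent kernel, written with the symbols f, θ, x and x′ used here, is:

```latex
K(x, x') = \nabla_{\theta} f(x;\theta)^{\top}\, \nabla_{\theta} f(x';\theta) \tag{3}
```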
The model updates the weights of the map through a competitive learning process in which the weight vector w_i of each neuron i near the winning neuron c (the neuron closest to the input x) is updated via Eq. 4,
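Eq. 4 is absent from this text. The standard Kohonen update rule, written with the symbols defined in the following sentence, would be:

```latex
w_i(t+1) = w_i(t) + \alpha(t)\, h_{ci}(t)\, \bigl[x - w_i(t)\bigr] \tag{4}
```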
where α(t) is the learning rate, and h_ci(t) is the neighborhood function, which decreases with distance from the winning neuron c. By dynamically adjusting the weight vectors over time, SOMs cluster similar jet mixing and coagulant configurations, providing insights into optimal conditions for the flocculation process.
MARS complements the SOMs by modeling nonlinear interactions between jet velocity, dosage, and mixing delays. MARS generates a piecewise linear regression model that captures interactions between these variables. The MARS model can be described as a sum of basis functions B_m(x), where each function models a specific interaction or nonlinearity via Eq. 5,
The basis functions B_m(x) are either hinge functions or products of hinge functions, so MARS can model sharp changes in flocculation efficiency as process parameters vary. The model can identify the main breakpoints at which the relationship between variables changes. This is very important for optimizing the highly nonlinear flocculation process.
In the reinforcement learning component, DDPG dynamically optimizes jet velocity, coagulant dosage, and mixing time. DDPG works in a continuous action space, adjusting the process parameters according to the state of the system. The policy function μ(s∣θ^μ) maps the state s (the current jet velocity, turbidity, and so on) to an action a, which is the set of adjustments to the process. This policy function is trained to maximize the cumulative reward R, which is defined in terms of the expected flocculation efficiency via Eq. 6,
where r_t is the reward at time step t, and γ is the discount factor that determines the importance of future rewards. The critic network approximates the value function Q(s,a∣θ^Q), which estimates the expected reward for a given state-action pair, and the policy implemented by the actor network is updated based on the critic's feedback.
The hybrid model proposed for optimizing flocculation efficiency combines several advanced techniques: DDPG, SAC, reinforcement-learning-based NAS, Hyperband with successive halving, SHAP explanations, and counterfactual explanations. Each was chosen to address a specific challenge in optimizing jet mixing parameters, coagulant dosage, and other environmental factors. Together, these approaches deliver not only accurate predictions but also real-time process adaptability, explainability, and optimization efficiency.
As shown in Fig. 2, DDPG is used to manage the continuous action space inherent in flocculation optimization. DDPG is a model-free, off-policy reinforcement learning algorithm that trains two networks: an actor network μ(s∣θ^μ), which outputs an action a (adjustments to jet velocity, mixing time, or coagulant dosage) given the current state s (system conditions such as water turbidity and pH), and a critic network Q(s,a∣θ^Q), which evaluates the action by estimating the expected return. The return R is defined as the sum of future rewards discounted by γ, via Eq. 7,
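Eq. 5 is not reproduced in this text. The usual MARS expansion is shown below, where the intercept β_0, the coefficients β_m, and the number of basis functions M are assumed symbols; a single hinge term has the form max(0, x_j − c) or max(0, c − x_j) for some knot c:

```latex
\hat{f}(x) = \beta_0 + \sum_{m=1}^{M} \beta_m\, B_m(x) \tag{5}
```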
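Eq. 6 is missing from this text. A form consistent with the description (an expected, discounted sum of per-step rewards) would be the following, where the horizon T is an assumed symbol:

```latex
R = \mathbb{E}\left[\sum_{t=0}^{T} \gamma^{t}\, r_t\right] \tag{6}
```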
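Eq. 7 is not reproduced in this text. The conventional discounted return from time step t onward is given below; the subscript on R and the summation index k are assumptions added for clarity:

```latex
R_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k} \tag{7}
```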
The critic network’s objective is to minimize the loss between the predicted value of the state-action pair and the target value derived from the Bellman equation via Eq. 8,
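Eq. 8 is absent from this text. In the standard DDPG formulation the critic loss is the mean squared Bellman error shown below. The target networks Q′ and μ′ and the next state s′ are part of the standard algorithm and are assumed here rather than taken from the text:

```latex
L(\theta^{Q}) = \mathbb{E}\left[\bigl(Q(s,a \mid \theta^{Q}) - y\bigr)^{2}\right],
\qquad
y = r + \gamma\, Q'\bigl(s', \mu'(s' \mid \theta^{\mu'}) \mid \theta^{Q'}\bigr) \tag{8}
```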
Simultaneously, the actor network is updated by maximizing the critic’s evaluation of the chosen action, thereby improving the policy through gradient ascent via Eq. 9,
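Eq. 9 does not appear in this text. The deterministic policy gradient used by standard DDPG, written with the actor and critic notation above (J denotes the expected return and is an assumed symbol), is:

```latex
\nabla_{\theta^{\mu}} J \approx
\mathbb{E}\left[\nabla_{a} Q(s,a \mid \theta^{Q})\big|_{a=\mu(s \mid \theta^{\mu})}\;
\nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\right] \tag{9}
```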
The advantage of DDPG is its ability to adjust process parameters continuously in real time, which makes it well suited to optimizing flocculation, where dynamic environmental conditions necessitate ongoing adjustments to jet velocity and dosage. SAC maximizes the reward while also increasing the entropy of the action distribution, which encourages robust policies. This amounts to exploring a more diverse action space, which matters when dealing with the uncertainty inherent in flocculation processes.
The SAC objective is to maximize the expected return while also maximizing entropy H(π(a∣s)) via Eq. 10,
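Eq. 10 is not reproduced in this text. The maximum-entropy objective in the usual SAC formulation is given below, where ρ_π (the state-action distribution induced by the policy) is an assumed symbol:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t,a_t)\sim\rho_{\pi}}
\left[\, r(s_t,a_t) + \alpha\, H\bigl(\pi(\cdot \mid s_t)\bigr) \right] \tag{10}
```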
The temperature parameter α controls the exploration–exploitation trade-off through the weight assigned to the entropy term. SAC optimizes both the policy and the value function by solving two coupled minimization tasks, one for the soft Q-function and one for the policy, via Eqs. 11 and 12:
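Eqs. 11 and 12 are missing from this text. In the commonly used formulation of SAC, the two objectives take the form below. The replay buffer D, the policy parameters φ, and the soft state value V are assumed symbols from that formulation, not from this paper:

```latex
\begin{align}
J_Q(\theta^{Q}) &= \mathbb{E}_{(s_t,a_t)\sim\mathcal{D}}
\left[\tfrac{1}{2}\Bigl(Q(s_t,a_t \mid \theta^{Q})
- \bigl(r_t + \gamma\, \mathbb{E}_{s_{t+1}}\left[V(s_{t+1})\right]\bigr)\Bigr)^{2}\right],
\quad
V(s) = \mathbb{E}_{a\sim\pi}\left[Q(s,a \mid \theta^{Q}) - \alpha \log \pi(a \mid s)\right] \tag{11} \\
J_{\pi}(\phi) &= \mathbb{E}_{s_t\sim\mathcal{D},\; a_t\sim\pi_{\phi}}
\left[\alpha \log \pi_{\phi}(a_t \mid s_t) - Q(s_t,a_t \mid \theta^{Q})\right] \tag{12}
\end{align}
```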
This combination of exploration through entropy maximization and exploitation through value estimation makes SAC robust to uncertainty and noisy data, and therefore a useful choice for optimizing complex flocculation systems.
To improve the predictive performance and efficiency of the model, NAS is used to automatically discover a neural network architecture tailored to flocculation optimization. NAS operates in a reinforcement learning setting in which the space of network architectures is treated as the action space. A controller network samples candidate architectures, and the performance of each is evaluated with a reward signal, such as the validation accuracy or another appropriate performance metric, A(θ). The optimal architecture A maximizes this reward signal, as given in Eq. 13,
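Equation 13 is not reproduced in this copy. A generic statement of the search objective, consistent with the description above, is sketched below. The search space 𝒜, the reward function R, and the controller parameters θ_c are hypothetical symbols introduced for illustration; how they map onto the source's A(θ) notation cannot be recovered from the text.

```latex
\begin{align}
A^{*} &= \arg\max_{A \in \mathcal{A}} R(A) \\
J(\theta_c) &= \mathbb{E}_{A \sim \pi(\cdot\,;\,\theta_c)} \left[ R(A) \right]
\end{align}
```

The controller is trained to increase J(θ_c), so that architectures with a higher validation reward are sampled more often in later iterations.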
Based on the reward signal, the controller iteratively refines its search toward better-performing architectures. Hyperband with successive halving then performs large-scale hyperparameter tuning, significantly reducing computation time by focusing on promising hyperparameter configurations. Hyperband dynamically allocates more resources to configurations that show early promise, evaluating them on progressively larger subsets of the data. The successive halving algorithm works as follows: it assigns every configuration a small budget and then repeatedly shrinks the set of configurations based on their performance, as given in Eq. 14,
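Equation 14 is not reproduced in this copy. One common way of writing the successive halving schedule is given below. The initial number of configurations n_0 and the per-configuration budget r_i are assumed symbols, and the authors' equation may allocate the budget differently.

```latex
\begin{align}
n_i &= \left\lfloor \frac{n_0}{\eta^{\,i}} \right\rfloor, \qquad
r_i = \left\lfloor \frac{B}{n_i \left\lceil \log_{\eta} n_0 \right\rceil} \right\rfloor, \qquad
i = 0, 1, \ldots, \left\lceil \log_{\eta} n_0 \right\rceil - 1
\end{align}
```

Each round therefore spends roughly the same share of the total budget, but on fewer configurations, so the survivors are evaluated more thoroughly.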
where B is the total budget, η is the halving factor, and ni is the number of configurations in round i. Only the best configurations from this hyperparameter search are fully evaluated, so optimization is faster and more effective.
For interpretability, SHAP (SHapley Additive exPlanations) is used to explain the model's predictions. SHAP values quantify the contribution of each feature to a prediction, so the model remains interpretable even as it grows more complex. The SHAP value for feature i is obtained from the difference in prediction with and without that feature, as given in Eq. 15,
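Equation 15 is not reproduced in this copy. The classical Shapley value on which SHAP is built averages the with-and-without difference over all feature subsets. The symbols F (the set of all features) and f_S (the model evaluated using only the features in S) are assumed notation.

```latex
\phi_i = \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\big(|F| - |S| - 1\big)!}{|F|!}
  \left[ f_{S \cup \{i\}}\big(x_{S \cup \{i\}}\big) - f_{S}\big(x_{S}\big) \right]
```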
This computes the marginal contribution of feature i. Model predictions can thus be attributed to understandable components, from which operators can discern which parameters, such as jet velocity or coagulant dosage, have the greatest impact on flocculation efficiency. Finally, counterfactual explanations make the model actionable by producing input modifications that would lead to a different model prediction. They answer the question: "What changes to the input parameters would result in a desired outcome?" Counterfactuals are obtained by solving the optimization problem given in Eq. 16.
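Equation 16 is not reproduced in this copy. A common formulation of the counterfactual search, consistent with the distance-minimization description in the text, is the constrained problem below. The distance function d and the model f are assumed symbols, and in practice the constraint is often relaxed into a penalty term.

```latex
x' = \arg\min_{x'} \; d(x, x') \quad \text{subject to} \quad f(x') = y'
```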
where x′ is the counterfactual input and y′ is the desired prediction outcome. The optimization minimizes the distance between the counterfactual x′ and the original input x, and thus suggests realistic parameter adjustments for higher flocculation performance.
Taken together, DDPG, SAC, NAS, Hyperband, SHAP, and counterfactual explanations provide a comprehensive framework for optimizing and explaining the flocculation process. The approaches complement one another, balancing real-time dynamic optimization, robust exploration of the parameter space, and interpretability, so that the framework is both effective and understandable across scenarios. The efficiency of this model and its comparison with existing methods are discussed in the next section.
Result analysis and comparisons
The experimental design of this study combines modern machine learning algorithms, including CatBoost, NTK with deep neural networks, SOMs, MARS, DDPG, SAC, and Hyperband. The core experimental conditions were chosen to reflect the complexity of realistic scenarios involving natural coagulants. A laboratory-scale jet mixing system was used for the experiments, and the main inputs were jet velocity, mixing time, coagulant dosage, pH, and turbidity. Jet velocity ranged from 0.5 to 2.5 m/s, covering a wide range of the operating conditions normally seen in water treatment plants. The natural coagulants Moringa oleifera and chitosan were dosed at 10 to 100 mg/L to explore eco-friendly alternatives to conventional chemicals. Mixing times were set between 5 and 60 min so that both short and prolonged mixing were covered. The environmental parameters were pH, in the range 6.0 to 9.0, and turbidity, varied between 20 and 200 NTU to simulate different contamination levels. To ensure robustness, every parameter combination was repeated at least 10 times to account for variation, and data were collected from multiple experiments.
The dataset used in this work is based on the Water Treatment Plant Data Set from the UCI Machine Learning Repository, which contains data obtained from a real water treatment plant whose treatment process involves several input variables. Flocculation efficiency levels are depicted in Fig. 3. The dataset consists of 527 instances with 38 continuous and categorical attributes covering water flow rates, pH levels, and chemical concentrations, including coagulant dosages and turbidity within the ranges given above.
Three input factors are of particular note: water pH, which falls between 6.0 and 9.0; coagulant dosage, in mg/L; and turbidity, which ranges from 20 to 200 NTU, with high contamination levels directly affecting flocculation efficiency. The dataset also records the outcomes of the water treatment process. It therefore serves as a suitable benchmark for training machine learning models to forecast and optimize flocculation performance. Because it is large, multi-dimensional, and drawn from a real-world context, it provides an appropriate basis for developing and validating the predictive and optimization algorithms used in this study, such as CatBoost, Neural Tangent Kernel with DNN, and reinforcement learning. Data pre-processing handles missing values and normalizes the features for consistency across the machine learning models, which supports the development of robust, generalizable models for optimizing flocculation.
A more detailed analysis of how jet velocity affects flocculation efficiency was performed to demonstrate the practical use of SHAP in informing operational decisions. SHAP values showed that jet velocity accounted for about 35% of the model's predictive power, making it the most significant of the process parameters. In a case where flocculation efficiency was 85%, SHAP analysis revealed that the water treatment facility was operating at suboptimal levels: the jet velocity, at only 1.2 m/s, was too low and contributed negatively to efficiency. Based on this, plant operators increased the jet velocity to 1.8 m/s, which raised flocculation efficiency by 7 percentage points, to 92%. This illustrates how SHAP can identify key influencing factors and yield actionable insights for process optimization in real-world applications.
Counterfactual explanations were used to simulate process adjustments and recommend parameter changes that improve flocculation efficiency. In one case, efficiency was recorded as 88%. The counterfactual model suggested increasing the mixing time from 30 to 40 min and the coagulant dosage from 40 to 50 mg/L. After this change, efficiency improved to 95%, showing that counterfactuals give operators tangible recommendations. In a similar case, a shift in pH reduced efficiency below 85%. Counterfactual analysis suggested that better performance is likely when the pH is kept close to 7.2 rather than 6.5, a factor that SHAP analysis indicated as the second most influential. By providing data-driven, interpretable recommendations, these approaches bridge the gap between high-performing ML predictions and practical implementation. Whereas traditional black-box models give only context-free predictions, SHAP and counterfactual explanations help process engineers understand why certain decisions lead to better flocculation outcomes.
For the machine learning models, a dataset of approximately 1000 data points was generated from these experiments. Each data point comprised the input features (jet velocity, mixing time, coagulant dosage, pH, and turbidity) and the response variable, flocculation efficiency, measured as the fraction of suspended particles removed from the water. Flocculation efficiency was obtained from particle size analysis and settling velocity measurements and expressed as a score between 0 and 1, where 1 denotes maximum efficiency. The data were divided into a test subset comprising 20% of the records, with the remaining 80% used for training and cross-validation. Model calibration ensured that the nonlinear relationships between input parameters were captured. CatBoost handled the categorical coagulant data, while NTK and deep neural networks captured higher-order interactions. SOM and MARS contributed clustering and regression patterns. In addition, real-time process optimization was carried out by passing jet velocity, coagulant dosage, and mixing time to the DDPG and SAC models, which adjust them to maximize flocculation efficiency under actual conditions. Hyperband accelerated convergence toward promising hyperparameter configurations, and SHAP and counterfactual explanations added interpretability by identifying which input parameters were most critical for optimizing the flocculation process. The experiments covered a variety of water treatment scenarios, as the results obtained from the individual datasets show.
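The 80/20 partition with cross-validation on the training portion described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names, the random seed, and the choice of five folds are assumptions, since the text does not state how many folds were used.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fixed fraction as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(train_indices, k=5):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation


# About 1000 data points, as stated in the text; k=5 is an assumed value.
train_idx, test_idx = train_test_split_indices(1000, test_fraction=0.2)
cv_splits = list(k_fold_indices(train_idx, k=5))
print(len(train_idx), len(test_idx), len(cv_splits))
```

The test indices are set aside before any cross-validation so that hyperparameter tuning never sees the held-out 20%.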
Water samples with a pH of 7.2 and a turbidity of 120 NTU, coagulated with 50 mg/L of Moringa oleifera, reached an average flocculation efficiency of 0.85 at a jet velocity of 1.8 m/s and a mixing time of 30 min. A second test sample, with a pH of 6.5 and a turbidity of 50 NTU, reached 0.92 with a 30 mg/L chitosan dose under similar mixing conditions. Test-to-test variability allowed the models to learn from these variations and refine their predictions and adjustments, so that the optimization strategies remain robust under real-world water treatment conditions. The experimental setup and the subsequent machine learning analysis were designed to bridge the gap between theoretical models and practical flocculation processes, providing a holistic framework for optimizing water treatment operations.
Figure 4 compares prediction accuracy across the models for optimizing flocculation efficiency. The Water Treatment Plant dataset from the UCI repository was used to evaluate the proposed hybrid machine learning model. For comparison, three benchmark methods were implemented: Method 5, Method 8, and Method 18, which represent traditional and machine learning approaches to optimizing the flocculation process. The CatBoost plus NTK model achieves 94% to 98% accuracy across multiple configurations, compared with 82% to 93% for the earlier methods. With Moringa oleifera, a jet velocity of 1.5 m/s, and a mixing time of 30 min, the model reaches 92%, while Method 5 reaches 80%. NTK and reinforcement learning methods such as DDPG and SAC make the model robust to complex datasets and parameter interactions. The tables below report the results of the three methods and the proposed model under different experimental conditions and demonstrate the advantages of the integrated machine learning framework.
Table 2 and Figs. 3 and 4 show that the proposed model delivered consistently higher flocculation efficiency than the other approaches. For example, with Moringa oleifera as the coagulant, a jet velocity of 1.5 m/s, and a mixing time of 30 min, the proposed model achieved an efficiency of 92%, whereas Method 5 achieved 80% and Methods 8 and 18 achieved 85% and 87%, respectively. The improved performance can be attributed to the techniques used in the model, such as CatBoost and NTK, which capture the complicated interactions among jet velocity, mixing time, and coagulant type.
Figure 5 shows the prediction accuracy of flocculation efficiency for the various methods. The proposed model also substantially reduces optimization time compared with the other methods. It performs best with Moringa oleifera at a jet velocity of 1.5 m/s and a mixing time of 30 min, taking only 8.5 s to optimize, whereas Method 5 requires 14.2 s and Methods 8 and 18 take 12.3 and 11.0 s, respectively. This trend holds for the other experimental setups, demonstrating the efficiency of the proposed model's real-time optimization.
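The relative time savings implied by the optimization times quoted above can be checked with a short calculation. The timings are taken from the text; the helper name `relative_reduction` is introduced here purely for illustration.

```python
def relative_reduction(baseline_s, proposed_s):
    """Percentage by which proposed_s is shorter than baseline_s."""
    return 100.0 * (baseline_s - proposed_s) / baseline_s


proposed_time = 8.5  # seconds, proposed model (value quoted in the text)
benchmark_times = {"Method 5": 14.2, "Method 8": 12.3, "Method 18": 11.0}

reductions = {
    name: round(relative_reduction(t, proposed_time), 1)
    for name, t in benchmark_times.items()
}
for name, pct in reductions.items():
    print(f"{name}: {pct}% less optimization time")
# Method 5: 40.1% less optimization time
# Method 8: 30.9% less optimization time
# Method 18: 22.7% less optimization time
```

For this configuration, a reduction of about 40% is therefore reached relative to Method 5, while the savings relative to Methods 8 and 18 are smaller.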
As shown in Table 3, the proposed model is significantly more accurate than the benchmark approaches, with accuracy ranging from 94% to 98%. Method 5, for example, consistently had the lowest accuracy, ranging from 82% to 86%. Because the combination of NTK and DDPG improves the model's generalization, particularly for complex datasets, the proposed model is reliable in forecasting flocculation outcomes.
Table 4 compares the optimization times (in seconds) that the different methods need to complete the flocculation optimization tasks. The proposed model optimizes in 7.2 to 8.9 s, faster than Methods 5, 8, and 18: Method 5 takes 14.7 s, and Methods 8 and 18 take 12.3 and 11.0 s, respectively. These optimization times matter because real-time applications require rapid decisions and adjustments for optimal water treatment. The speed comes from Hyperband for hyperparameter tuning and SAC for real-time optimization. With its shorter optimization time, the model is better suited to dynamic situations and real-time processes that require fast responses.
Figure 6 shows the feature importance analysis. SHAP (SHapley Additive exPlanations) analysis measured feature contributions and found that jet velocity and mixing time dominated in all methods; the proposed model assigned jet velocity 35% and mixing time 28%. Compared with coagulant dosage, water pH, and turbidity, jet velocity was the crucial component that set the proposed model apart. This matches the observed trends, in which adjustments to jet velocity and mixing time improved efficiency, and shows the model's capacity to optimize these key variables in real time.
Table 5 summarizes the flocculation efficiency gains of the proposed model over Method 5. Across several experimental setups with Moringa oleifera and chitosan coagulants and different jet velocities and mixing times, the proposed model consistently increased flocculation efficiency by 11% to 13% over Method 5. These gains result from using advanced machine learning approaches for parameter adjustment and real-time optimization. Method 8 and Method 18 showed smaller efficiency gains of 5% to 7%, which supports the proposed model's superiority.
The SHAP (SHapley Additive exPlanations) integrated analysis in Fig. 7 shows how the different features affect the prediction of flocculation efficiency. In all models, jet velocity and mixing time are the most important predictors, and the proposed model emphasizes jet velocity more than the standard methods do. In the proposed model, jet velocity accounts for 35% of predictive importance, compared with 28% to 32% in the existing models, and mixing time accounts for 28%, compared with 25% to 29% in the other techniques. Coagulant dosage and water pH affect the efficiency prediction less, and turbidity has the least effect across all approaches.
The SHAP analysis in Table 6 reports the importance of each feature in predicting flocculation efficiency. Jet velocity and mixing time were consistently the most influential features in all models. Notably, the proposed model assigns higher importance to jet velocity (35%) than the other methods, which assign it 28%, 30%, and 32%, respectively. Mixing time also carries significant weight in the proposed model, at 28%, which lies within the 25% to 29% range assigned by the other methods. The stronger emphasis on jet velocity reflects the model's ability to capture the interactions among process variables.
As illustrated in Table 7, counterfactual explanations indicate how input parameters should be adjusted to improve flocculation efficiency. In Case 1, increasing the jet velocity from 1.2 to 1.6 m/s improved flocculation efficiency by 5 percentage points, from 85% to 90%. Adjustments to coagulant dosage and mixing time likewise produced marked improvements in efficiency. These counterfactual explanations give practical insight into which adjustments are likely to benefit performance in real-time operational settings. The results in these tables show that the proposed model outperforms traditional methods in both prediction accuracy and optimization performance. The work thus offers a powerful and interpretable approach to improving flocculation in water treatment using advanced machine learning techniques. The next section presents an example validation experiment with the proposed model to give readers a clearer view of the whole process.
Validation with practical use case scenario analysis
The optimization of flocculation efficiency with the hybrid machine learning framework was implemented on experimental data produced with the controlled water treatment system. The dataset features are jet velocity, mixing time, coagulant dosage, pH, and turbidity. These inputs were used at each step to train and test the model, followed by cross-comparison with established machine learning models. The outputs of each method are presented below, together with a detailed comparison of performance and the intermediate outcomes at different stages of the optimization process. The tables report prediction accuracy, feature importance, optimization metrics, and interpretability results. This step-by-step presentation shows how each part of the model contributes to optimal flocculation efficiency.
For the practical use case analysis, the Water Treatment Plant Data Set from the UCI Machine Learning Repository was used, with a focus on real-world operating data gathered from a water treatment plant. It contains 527 instances whose attributes cover different aspects of the treatment process, including key variables such as jet velocity, mixing time, coagulant dosage (10 to 100 mg/L), pH (6.0 to 9.0), and turbidity (20 to 200 NTU). The data include the natural coagulants Moringa oleifera and chitosan, which were applied in the experiments to measure their effect on floc formation and particle removal. Flocculation efficiency is the outcome variable, measured as the fraction of suspended particles removed and expressed as a continuous score between 0 and 1, where 1 is optimal efficiency. The dataset covers a wide range of conditions and inputs relevant to the water treatment process, making it well suited to applying machine learning techniques to optimize flocculation efficiency.
Table 8 shows that both CatBoost and NTK with DNN predict flocculation efficiency well. CatBoost outperforms NTK in most cases, although the differences on individual samples are small. For instance, for Sample 001 CatBoost predicted a flocculation efficiency of 92%, whereas NTK predicted 90%. These small differences indicate that CatBoost performed better where categorical variables such as coagulant type were involved, while NTK provided a more general approximation, which was useful with smaller datasets.
Table 9 presents the results of SOM-based clustering and MARS regression on the same data. SOMs are useful for organizing input samples into clusters, since they group samples by similarity in jet velocity, mixing time, and the other parameters. For example, Samples 001 and 004 fell into the same cluster, Cluster 3, indicating identical or near-identical parameter settings. The MARS model supports this: it predicted similar efficiencies for these samples, 91% and 94%, respectively.
Table 10 shows flocculation efficiency being optimized in real time with the DDPG and SAC algorithms, which dynamically adjusted the jet velocity and mixing time of the samples. In Sample 001, DDPG increased the jet velocity from 1.5 to 1.7 m/s, raising flocculation efficiency from 85% to 92%. After further optimization with SAC, the mixing time was adjusted from 30 to 35 min, raising efficiency from 92% to 94%.
Table 11 shows the improvement due to NAS and Hyperband. NAS found the best-performing architectures for the neural networks, for example by increasing the number of layers and neurons where appropriate. Hyperband then selected suitable hyperparameters, such as a finer learning rate. These adjustments increased accuracy; for Sample 002, accuracy rose from an initial 89% to a final 96%.
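The pruning step that Hyperband relies on, successive halving, can be sketched as follows. This is a minimal illustration under stated assumptions, not the tuning code used in the study: the `evaluate` function is a hypothetical stand-in for training a model with a given budget, the candidate learning rates are invented, and only a single successive-halving bracket is shown rather than the full Hyperband loop over brackets.

```python
import math


def successive_halving(configs, evaluate, total_budget, eta=2):
    """Keep the top 1/eta of configurations each round until one remains."""
    survivors = list(configs)
    # Count how many pruning rounds are needed to get down to one survivor.
    n_rounds, n = 0, len(survivors)
    while n > 1:
        n = max(1, n // eta)
        n_rounds += 1
    n_rounds = max(1, n_rounds)

    history = []
    for _ in range(n_rounds):
        # Split this round's share of the budget evenly across survivors.
        budget_each = max(1, total_budget // (len(survivors) * n_rounds))
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget_each), reverse=True)
        history.append((len(survivors), budget_each))
        survivors = ranked[: max(1, len(ranked) // eta)]
    return survivors[0], history


def evaluate(config, budget):
    """Hypothetical score (higher is better), peaking at a learning rate of 0.01."""
    return -abs(math.log10(config["lr"]) + 2.0) * (1.0 + 1.0 / budget)


candidates = [{"lr": lr} for lr in (1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1)]
best, history = successive_halving(candidates, evaluate, total_budget=240, eta=2)
print(best, history)
```

With eight candidates and η = 2, the sketch runs three rounds on 8, 4, and 2 configurations, doubling the per-configuration budget each time, which is how poor configurations are discarded cheaply.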
Table 12 presents the results of the SHAP analysis. SHAP made the machine learning models interpretable by quantifying feature contributions. For both samples, jet velocity and mixing time were the top contributors, at 33% to 36% and 27% to 30%, respectively. These findings agree with the intuition that these two variables are critical in determining flocculation efficiency.
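To make the notion of a feature contribution concrete, exact Shapley values can be computed by brute force for a model with only a few inputs. The sketch below is illustrative only: `toy_efficiency` is a hypothetical surrogate with invented coefficients, not the fitted model from this study, and absent features are replaced by a baseline value, which is one of several conventions. Production SHAP tooling uses faster approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every subset of the other features.

    Features outside a subset take their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(subset):
        point = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return predict(point)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi[i] = total
    return phi


def toy_efficiency(p):
    """Hypothetical surrogate model; the coefficients are invented."""
    v, t = p["jet_velocity"], p["mixing_time"]
    return 0.5 + 0.10 * v + 0.004 * t + 0.001 * v * t


x = {"jet_velocity": 1.8, "mixing_time": 30.0, "ph": 7.2}
baseline = {"jet_velocity": 1.2, "mixing_time": 20.0, "ph": 7.2}
phi = shapley_values(toy_efficiency, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction at `x` and at the baseline, and a feature the model ignores (here pH) receives a contribution of zero.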
Table 13 summarizes the counterfactual explanations for each sample, that is, the recommended process parameter adjustments for improving flocculation efficiency. For Sample 001, increasing the jet velocity from 1.5 to 1.8 m/s improved efficiency from 85% to 92%. For Sample 002, adjusting the coagulant dosage and mixing time improved efficiency from 88% to 95%.
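A counterfactual recommendation of this kind can be obtained, in its simplest form, by searching a grid of admissible settings for the one closest to the current operating point that reaches the target efficiency. The sketch below assumes a hypothetical linear surrogate (`toy_efficiency`, with invented coefficients chosen so that the starting point gives 0.85), invented grids, and a range-scaled L1 distance; none of these are taken from the study.

```python
from itertools import product


def toy_efficiency(p):
    """Hypothetical surrogate model; the coefficients are invented."""
    return 0.325 + 0.25 * p["jet_velocity"] + 0.005 * p["mixing_time"]


def find_counterfactual(predict, x, target, grids, scales):
    """Return the grid point nearest to x whose prediction reaches target.

    Distance is an L1 norm with each feature divided by its scale.
    """
    names = list(grids)
    best, best_dist = None, float("inf")
    for values in product(*(grids[k] for k in names)):
        candidate = dict(x, **dict(zip(names, values)))
        if predict(candidate) < target:
            continue  # does not reach the desired outcome
        dist = sum(abs(candidate[k] - x[k]) / scales[k] for k in names)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist


x = {"jet_velocity": 1.5, "mixing_time": 30}  # current setting, predicts 0.85
grids = {
    "jet_velocity": [1.5, 1.6, 1.7, 1.8, 1.9, 2.0],  # m/s
    "mixing_time": [30, 35, 40, 45],                 # min
}
scales = {"jet_velocity": 2.0, "mixing_time": 55.0}  # operating ranges from the text
counterfactual, distance = find_counterfactual(toy_efficiency, x, 0.92, grids, scales)
print(counterfactual, round(distance, 3))
```

Under these assumptions the nearest setting that reaches the 0.92 target raises only the jet velocity, to 1.8 m/s, and leaves the mixing time unchanged; a different surrogate or distance scaling could favor a different adjustment.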
Table 14 summarizes the final flocculation efficiency results after optimization. The proposed model performed better than the traditional methods in all experiments, with efficiencies between 91% and 96%; Methods 5, 8, and 18 gave lower efficiencies. For instance, Sample 001 reached an efficiency of 94% with the proposed model, compared with 80% for Method 5 and 87% for Method 18. This further demonstrates the advantage of the integrated machine learning framework for flocculation process optimization. Taken together, these tables trace the effectiveness of the hybrid machine learning model from the first predictions to the final optimized output, including feature importance and the corresponding process improvements.
A case study of a water treatment facility optimizing flocculation parameters under fluctuating turbidity was conducted to illustrate the practical use of SHAP and counterfactual explanations. In one case, SHAP analysis showed that jet velocity (35%) and coagulant dosage (28%) were the most influential parameters affecting flocculation efficiency. The model initially suggested a jet velocity of 1.2 m/s and a coagulant dosage of 40 mg/L, giving an efficiency of 88%. SHAP indicated that increasing the jet velocity to 1.8 m/s would yield an additional 6 percentage points, which prompted the operators to make the change. After the adjustment, efficiency increased to 94%, an instance of SHAP-guided insight leading to real-time operational improvement.
The counterfactual explanations provided further actionable recommendations. In one case, an efficiency of 85% was recorded at a low mixing time of 20 min; the counterfactual recommended increasing the mixing time to 35 min while keeping the coagulant dosage constant at 50 mg/L. After this recommendation was applied, the facility's efficiency rose to 91%. The model also identified cases in which adjusting the pH from 6.5 to 7.2 yielded a 4% efficiency gain, giving operators data-driven guidelines. These results underline the practical benefits of SHAP and counterfactuals in making complex ML models interpretable and actionable in real-world water treatment operations.
Conclusion, limitations and future scopes
The proposed model, which combines CatBoost, NTK with deep neural networks, SOMs, MARS, DDPG, and SAC, showed better performance in both prediction and real-time optimization of flocculation processes. It outperformed the benchmark Methods 5, 8, and 18 on several metrics. Specifically, the proposed model delivered a 20–25% increase in flocculation efficiency compared with the benchmark methods. With chitosan and optimal combinations of jet velocity and mixing time, efficiency reached 95%. Across all scenarios, the predictive accuracy of the model was between 94% and 98%, whereas the other approaches achieved only 82% to 93%. The model's ability to perform real-time optimization with reinforcement learning techniques, including DDPG and SAC, reduced optimization times by 40%, with an average optimization time between 7.2 and 8.9 s against roughly 11.0 to 14.7 s for the other methods. The analysis also identified the most important features, namely jet velocity and mixing time, and quantified their contributions to the prediction of flocculation efficiency: 35% for jet velocity and 28% for mixing time. Counterfactual explanations provided effective strategies for improving flocculation: increasing the jet velocity or altering the coagulant dosage gave an average efficiency gain of 5–7%. These results underscore the suitability of the proposed model in terms of prediction accuracy and real-time adaptability, making it an effective tool for optimizing water treatment processes.
Limitations
While the proposed hybrid model demonstrates high predictive accuracy and optimization efficiency, its robustness under varying environmental conditions remains a limitation. Factors such as temperature fluctuations, organic load variations, and seasonal changes in water quality were not explicitly tested, and they may influence coagulant interactions and flocculation efficiency. Furthermore, the model was trained and validated mainly with natural coagulants such as Moringa oleifera and chitosan. Other types of coagulant, such as synthetic ones with different physicochemical properties, may require recalibration or even retraining of the model. Addressing these limitations would significantly improve the model's applicability in large-scale industrial operations and help ensure reliable, sustainable performance across conditions.
Future scope
Despite the improvements in flocculation efficiency and optimization, the study leaves significant gaps for further research. Extending the experimental setup to additional natural and synthetic coagulants would generalize the model to a wider range of water treatment scenarios. The model could also be strengthened by studying how it responds to temperature and organic load variations. More advanced reinforcement learning methods such as PPO or TD3 could reduce optimization times and improve real-time decision-making in highly dynamic and uncertain water treatment contexts. To enhance interpretability and trustworthiness, future work should combine explainability techniques such as Local Interpretable Model-agnostic Explanations (LIME) with SHAP to gain deeper insight into the model's decisions. For scalability, the model could be parallelized across a distributed system in large industrial water treatment plants, or deployed in the cloud, to allow real-time optimization on large datasets. Finally, transfer learning, which would allow a model developed for one water treatment setup to be adapted to new plants or different geographical regions with minimal retraining, would increase the practicality of this work in diverse operational environments.
Further research should also consider a cloud-based optimization platform to which treatment plants upload operational data and from which they receive real-time optimization recommendations from the trained model. Smaller water treatment facilities might adopt a SaaS model to obtain advanced machine learning solutions without hiring AI experts. In this way, advanced optimization techniques could reach many water treatment stakeholders economically and operationally. Such implementations would enable the proposed framework to make municipal and industrial water treatment systems efficient, cost-effective, and sustainable.
Future work could also develop hybrid AI-physics models in which ML-driven parameter adjustments are validated against fluid mechanics simulations, ensuring that AI decisions remain consistent with fundamental hydraulic principles in large-scale treatment operations.
The study evaluates model performance at pH levels from 6.0 to 9.0 and turbidity from 20 to 200 NTU; however, further testing is needed under extreme scenarios with high organic pollution and water hardness. Organic pollutants such as dissolved organic carbon (DOC) and industrial contaminants can interfere with coagulation-flocculation mechanisms and reduce process efficiency. To check model robustness, high DOC levels of 50–100 mg/L were applied, and calcium/magnesium hardness was varied from 100 to 400 mg/L as CaCO₃. Preliminary results show that flocculation efficiency stays above 85% at moderate contamination levels but drops below 80% when DOC exceeds 80 mg/L, which indicates that additional coagulant dosage adjustments are needed under extreme conditions. The model was then trained on an expanded dataset that included highly polluted industrial wastewater to improve generalization. The reinforcement learning algorithm was adapted to account for changing organic matter dynamics so that the system could self-regulate coagulant dosing from real-time DOC measurements. Moreover, SHAP analysis showed that under high-hardness conditions, adjustments to jet velocity had a more significant effect than changes in coagulant dosage, which prompted changes in operation. These results support the model's ability to adapt to complex water chemistries and open up deployment in water treatment environments beyond conventional municipal applications.
Data availability
All data generated or analysed during this study are included within this manuscript. All characterization, analysis, and testing work was carried out solely by Pallavi Randive and Madhuri S. Bhagat. Additionally, the raw data can be obtained on request from the corresponding authors, Pallavi Randive and Madhuri S. Bhagat.
Abbreviations
- ANN: Artificial neural networks
- CatBoost: Categorical boosting algorithm
- DDPG: Deep deterministic policy gradient
- DNN: Deep neural network
- MARS: Multivariate adaptive regression splines
- ML: Machine learning
- NAS: Neural architecture search
- NTK: Neural tangent kernel
- NTU: Nephelometric turbidity units
- pH: Potential of hydrogen
- RL: Reinforcement learning
- SAC: Soft actor-critic
- SHAP: SHapley additive explanations
- SVM: Support vector machines
- XGBoost: Extreme gradient boosting
References
Pillai, S. B. & Thombre, N. V. Coagulation, flocculation, and precipitation in water and used water purification BT—Handbook of water and used water purification. In Handbook of Water and Used Water Purification (ed. Lahnsteiner, J.) 3–27 (Springer, 2024). https://doi.org/10.1007/978-3-319-78000-9_63.
Lasaki, B. A., Maurer, P., Schönberger, H. & Alvarez, E. P. Empowering municipal wastewater treatment: Enhancing particulate organic carbon removal via chemical advanced primary treatment. Environ. Technol. Innov. 32, 103436. https://doi.org/10.1016/j.eti.2023.103436 (2023).
Ullah, M., Innocenzi, V., Ayedi, K., Vegliò, F. & Ippolito, N. M. Automotive wastewater treatment processes and technologies: A review. ACS ES&T Water 4(9), 3663–3680. https://doi.org/10.1021/acsestwater.4c00301 (2024).
Ban, Y., Liu, L., Du, J. & Ma, C. Investigation of the treatment efficiency and mechanism of microporous flocculation magnetic fluidized bed (MFMFB) reactor for Pb(II)-containing wastewater. Sep. Purif. Technol. 334, 125963. https://doi.org/10.1016/j.seppur.2023.125963 (2024).
Asadi-Ghalhari, M. et al. Modeling and optimization of the coagulation/flocculation process in turbidity removal from water using poly aluminum chloride and rice starch as a natural coagulant aid. Environ. Monit. Assess. 195(4), 527. https://doi.org/10.1007/s10661-023-11150-8 (2023).
Zhan, C., Dai, Z., Yin, S., Carroll, K. C. & Soltanian, M. R. Conceptualizing future groundwater models through a ternary framework of multisource data, human expertise, and machine intelligence. Water Res. 257, 121679. https://doi.org/10.1016/j.watres.2024.121679 (2024).
Yu, J. et al. Full recovery of brines at normal temperature with process-heat-supplied coupled air-carried evaporating separation (ACES) cycle. npj Clean Water 7(1), 133. https://doi.org/10.1038/s41545-024-00430-6 (2024).
Murisa, V., Ncube, S., Moyo, L. B., Danha, G. & Mamvura, T. A. Treatment of effluent from a malting processing plant using bio-coagulants. Discov. Civ. Eng. 1(1), 29. https://doi.org/10.1007/s44290-024-00030-w (2024).
Hu, Q. et al. Stabilization of arsenic sulfide sludge to form stable johnbaumite by alkaline-oxidative hydrothermal treatment. ACS ES&T Eng. 4(7), 1657–1667. https://doi.org/10.1021/acsestengg.4c00072 (2024).
Chen, X. et al. One-stage anammox and thiocyanate-driven autotrophic denitrification for simultaneous removal of thiocyanate and nitrogen: Pathway and mechanism. Water Res. 265, 122268. https://doi.org/10.1016/j.watres.2024.122268 (2024).
Rao, M. et al. Study on ultrasonic assisted intensive leaching of germanium from germanium concentrate using HCl/NaOCl. Hydrometallurgy 230, 106385. https://doi.org/10.1016/j.hydromet.2024.106385 (2024).
Wang, N., Zhang, Z., Zhang, Y., Xu, X. & Guan, Q. Fe-Mn oxide activating persufate for the in-situ chemical remediation of organic contaminated groundwater. Sep. Purif. Technol. 355, 129566. https://doi.org/10.1016/j.seppur.2024.129566 (2025).
Yu, Y. et al. Green recycling of end-of-life photovoltaic modules via deep-Eutectic solvents. Chem. Eng. J. 499, 155933. https://doi.org/10.1016/j.cej.2024.155933 (2024).
Liu, R., Jiang, S., Ou, J., Kouadio, K. L. & Xiong, B. Multifaceted anomaly detection framework for leachate monitoring in landfills. J. Environ. Manage. 368, 122130. https://doi.org/10.1016/j.jenvman.2024.122130 (2024).
Zhuang, Q. et al. Catalysis enhancement of Co3O4 through the epitaxial growth of inert ZnO in peroxymonosulfate activation: The catalytic mechanism of surface hydroxyls in singlet oxygen generation. Cryst. Growth Des. 25(2), 319–329. https://doi.org/10.1021/acs.cgd.4c01357 (2025).
Mahanna, H., Fouad, M., Zedan, T. & Mossad, M. Effective turbid water treatment using natural eco-friendly coagulants derived from oat and onion seeds. Int. J. Environ. Sci. Technol. 21(5), 4773–4787. https://doi.org/10.1007/s13762-023-05326-5 (2024).
Khalidi-Idrissi, A., Hartal, O., Madinzi, A., El-Abbadi, K. & Souabi, S. Natural flotation and coagulation–flocculation: A dual approach to refinery wastewater treatment. Euro-Mediterranean J. Environ. Integr. https://doi.org/10.1007/s41207-024-00558-4 (2024).
Obiora-Okafo, I. A., Onukwuli, O. D., Igwegbe, C. A., Onu, C. E. & Omotioma, M. Enhanced performance of natural polymer coagulants for dye removal from wastewater: Coagulation kinetics, and mathematical modelling approach. Environ. Process. 9(2), 20. https://doi.org/10.1007/s40710-022-00561-3 (2022).
Igwegbe, C. A., Ovuoraye, P. E., Białowiec, A., Onukwuli, O. D. & Balogun, P. A. Green flocculation for sustainable remediation of municipal landfill leachate using Parkia biglobosa extract: Optimization, mechanistic insights and implication for design. Clean Technol. Environ. Policy 26(10), 3429–3456. https://doi.org/10.1007/s10098-024-02815-0 (2024).
Tsoutsa, E. K., Tolkou, A. K., Kyzas, G. Z. & Katsoyiannis, I. A. An update on agricultural wastes used as natural adsorbents or coagulants in single or combined systems for the removal of dyes from wastewater. Water Air Soil Pollut. 235(3), 178. https://doi.org/10.1007/s11270-024-06979-9 (2024).
Bu, S. et al. Regeneration of Ti coagulants from water treatment sludge using acid leaching: Efficiency in turbidity and pollutant removal. Water Air Soil Pollut. 235(11), 695. https://doi.org/10.1007/s11270-024-07496-5 (2024).
Benalia, A. et al. Removal of lead in water by coagulation flocculation process using Cactus-based natural coagulant: Optimization and modeling by response surface methodology (RSM). Environ. Monit. Assess. 196(3), 244. https://doi.org/10.1007/s10661-024-12412-9 (2024).
Das, N., Shende, A. P., Mandal, S. K. & Ojha, N. Biologia Futura: Treatment of wastewater and water using tannin-based coagulants. Biol. Futur. 73(3), 279–289. https://doi.org/10.1007/s42977-022-00128-1 (2022).
Mohamed Noor, M. H. & Ngadi, N. Ecotoxicological risk assessment on coagulation-flocculation in water/wastewater treatment: A systematic review. Environ. Sci. Pollut. Res. 31(40), 52631–52657. https://doi.org/10.1007/s11356-024-34700-0 (2024).
Dkhissi, O. et al. Vegetable oil refinery wastewater treatment by using the cactus as a bio-flocculant in the coagulation-flocculation process. Water Air Soil Pollut. 234(5), 322. https://doi.org/10.1007/s11270-023-06337-1 (2023).
Krishnan, A. G., Krishnamoorthy Lakshmi, P. & Chellappan, S. Artificial neural network modelling approach for the prediction of turbidity removal efficiency of PACl and Moringa Oleifera in water treatment plants. Model. Earth Syst. Environ. 9(2), 2893–2903. https://doi.org/10.1007/s40808-022-01651-9 (2023).
Bhuvanendran, R. K. et al. Implications of natural coagulants and the development of a chemical coagulation reactor for dairy wastewater treatment with product recovery from waste sludge. Biomass Convers. Biorefinery 15(3), 4695–4715. https://doi.org/10.1007/s13399-024-05332-8 (2025).
Li, Y. et al. A comparison of micro-flocculation and ozonation as pretreatments for ultrafiltration: Organic removal and membrane fouling. Environ. Sci. Pollut. Res. 30(52), 112267–112276. https://doi.org/10.1007/s11356-023-30322-0 (2023).
de Melo Franco Domingos, J. et al. Effect of the association of coagulation/flocculation, hydrodynamic cavitation, ozonation and activated carbon in landfill leachate treatment system. Sci. Rep. 13(1), 9502. https://doi.org/10.1038/s41598-023-36662-8 (2023).
El Foulani, A.-A., Hammoudan, I., Byoud, F., Jamal-eddine, J. & Lekhlif, B. Synthesis, characterization, and evaluation of new composites coagulants polyaluminum chloride-sodium alginate. Water Air Soil Pollut. 233(8), 301. https://doi.org/10.1007/s11270-022-05786-4 (2022).
Taiwo, A. M., Oladotun, O. R., Gbadebo, A. M. & Alegbeleye, W. O. Pollution, ecological, and health risk assessments of heavy metal remediated soils by compost fortified with natural coagulants. Chem. Africa 6(3), 1579–1593. https://doi.org/10.1007/s42250-022-00564-5 (2023).
Madjene, F., Benhabiles, O., Boutra, A., Benchaib, M. & Bouchakour, I. Coagulation/flocculation process using Moringa oleifera bio-coagulant for industrial paint wastewater treatment: Optimization by D-optimal experimental design. Int. J. Environ. Sci. Technol. 20(11), 12131–12140. https://doi.org/10.1007/s13762-023-04808-w (2023).
Golzadeh, N., Lorestani, B., Sobhanardakani, S., Cheraghi, M. & Khorasani, N. Comparing the effectiveness of ferric chloride chemical coagulant and natural coagulant of plane tree leaves in turbidity removal from industrial wastewater. Res. Chem. Intermed. 49(12), 5613–5633. https://doi.org/10.1007/s11164-023-05151-y (2023).
Bhagat, R. & Khandeshwar, S. Characterization of developed agro-based adsorbents: A study. Key Eng. Mater. 960, 171–184. https://doi.org/10.4028/p-RbOb2z (2023).
Anyaene, I. H., Onukwuli, O. D., Babayemi, A. K., Obiora-Okafo, I. A. & Ezeh, E. M. Application of bio coagulation-flocculation and soft computing aids for the removal of organic pollutants in aquaculture effluent discharge. Chem. Africa 7(1), 455–478. https://doi.org/10.1007/s42250-023-00754-9 (2024).
Nero, B. F., Nyanzu, B. A. & Campion, B. B. Mine wastewater treatment using cassia fistula plant parts as bio-coagulants. Water Conserv. Sci. Eng. 8(1), 11. https://doi.org/10.1007/s41101-023-00178-z (2023).
Chik, C. E. N. C. E. et al. Chitosan coagulant: coagulation/flocculation studies on turbidity removal from aquaculture wastewater by response surface methodology. Int. J. Environ. Sci. Technol. 21(1), 805–816. https://doi.org/10.1007/s13762-023-04989-4 (2024).
Alnawajha, M. M. et al. Plant-based coagulants/flocculants: Characteristics, mechanisms, and possible utilization in treating aquaculture effluent and benefiting from the recovered nutrients. Environ. Sci. Pollut. Res. 29(39), 58430–58453. https://doi.org/10.1007/s11356-022-21631-x (2022).
Joaquin, A. A., Sivamani, S. & Gnanasundaram, N. Statistical experimental design and analysis of mixed natural-synthetic coagulants for the reduction of total suspended solids and turbidity in sewage wastewater treatment. Biomass Convers. Biorefinery 14(4), 4583–4590. https://doi.org/10.1007/s13399-022-02566-2 (2024).
Acknowledgements
The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through large group Research Project under Grant Number RGP2/28/44.
Author information
Contributions
Conceptualization, PR, MSB, MPB, RMB, SMV, SAGAR SHELARE (SS); methodology, SW PR, MSB, MPB, RMB, SMV, SAGAR SHELARE (SS); formal analysis, PR, MSB, MPB, RMB, SMV, SAGAR SHELARE (SS), SHUBHAM SHARMA (SS); investigation, PR, MSB, MPB, RMB, SMV, SAGAR SHELARE (SS); writing—original draft preparation, PR, MSB, MPB, RMB, SMV, SAGAR SHELARE (SS); writing—review and editing, SHUBHAM SHARMA (SS), NB, SH, PK, AK, EESM, DG, JL; supervision, NB, SH, PK, AK, EESM, DG, JL; project administration, NB, SH, PK, AK, EESM, DG, JL; funding acquisition, NB, SH, PK, AK, EESM, DG, JL. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Randive, P., Bhagat, M.S., Bhorkar, M.P. et al. Adaptive optimization of natural coagulants using hybrid machine learning approach for sustainable water treatment. Sci Rep 15, 16096 (2025). https://doi.org/10.1038/s41598-025-96750-9