
Behind the Tools
Discover the essence and secrets behind the tools in our web applications.
Machine Learning Model Development
Machine learning (ML) techniques have gained significant attention in recent years for their ability to model and predict complex chemical processes. One such application is the prediction of hydrogen (H₂) and carbon dioxide (CO₂) yields from biomass gasification, a thermochemical process that converts organic matter into a gaseous mixture known as syngas.
​
The development of an ML model for this purpose typically involves several key steps:
​
-
Data Collection: A total of 327 experimental datasets were gathered from biomass gasification experiments, including input parameters (e.g., temperature, pressure, biomass composition) and output measurements (e.g., H₂ and CO₂ concentrations).
-
Data Preprocessing: Several techniques were applied to prepare the raw data for modeling, including:
-
Dealing with missing values: Columns with too many missing values were deleted; the remaining missing values were imputed using the mean, median, or mode, or were predicted by sub-models.
-
Normalization: All variables were scaled to the range 0 to 1 using min-max normalization, x' = (x - min) / (max - min).
-
One-hot encoding: One-hot encoding was applied to represent categorical variables like feedstock type as binary vectors.
-
Data Preparation and Exploratory Data Analysis (EDA): EDA was performed to gain insights into data characteristics, patterns, and relationships. Pearson correlation coefficients were computed and visualized to reveal relationships between variables, and highly correlated variables were removed to reduce redundancy.
-
Regressor Model Development: Regressor models were developed to predict continuous target variables, such as H₂ and CO₂ yields. The steps included:
-
The data was split into 70% training and 30% testing sets.
-
Three regressor models were used: Extra Trees, Gradient Boosting, and K-Nearest Neighbors.
-
Hyperparameters were tuned using RandomizedSearchCV with 5-fold cross-validation to optimize model performance.
-
Models were evaluated using metrics like R-squared, Mean Squared Error, and Mean Absolute Error, and the best-performing model was selected.
-
SHAP (SHapley Additive exPlanations) analysis was employed to interpret model predictions and feature importance using beeswarm and bar plots.
-
Results and Deployment: The analysis revealed that the Extra Trees model achieved impressive accuracy:
-
H₂ Concentration Prediction: Extra Trees with a maximum depth of 16 and 280 estimators attained an R² of 0.9641, explaining 96.41% of the variance in H₂ concentration.
-
CO₂ Concentration Prediction: The default Extra Trees configuration (no maximum depth, 100 estimators) attained an R² of 0.8473, explaining 84.73% of the variance in CO₂ concentration.
-
These high-performing models were integrated into our web application to predict gas outputs. They also serve as the objective functions in the multi-objective optimization process.
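The normalization, one-hot encoding, and evaluation metrics listed above can be sketched in plain Python. This is a minimal illustration and not the code behind the web application: scaling to the range 0 to 1 is assumed to be min-max scaling, the function names, feedstock labels, and numbers are all hypothetical, and in practice a library such as scikit-learn would normally handle these steps.

```python
from statistics import mean


def min_max_normalize(values):
    """Scale numbers into [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to scale; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(labels):
    """Return the sorted categories and one binary vector per label."""
    categories = sorted(set(labels))
    vectors = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, vectors


def regression_metrics(y_true, y_pred):
    """Return R-squared, mean squared error, and mean absolute error."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": ss_res / n,
        "mae": sum(abs(e) for e in errors) / n,
    }


# Hypothetical values, for illustration only.
temperatures = [650.0, 750.0, 850.0]
print(min_max_normalize(temperatures))  # [0.0, 0.5, 1.0]

feedstocks = ["rice husk", "wood chips", "rice husk"]
print(one_hot_encode(feedstocks))

print(regression_metrics([30.0, 40.0, 50.0], [32.0, 39.0, 49.0]))
```

R² is the share of variance in the measured values that the predictions explain, which is why an R² of 0.9641 is sometimes quoted as 96.41%.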
Multi-objective Optimization
While ML models can effectively predict the outputs of biomass gasification processes, optimizing the operating conditions to achieve desired objectives, such as minimizing CO₂ emissions and maximizing H₂ production, requires a different approach. Multi-objective optimization (MOO) techniques are well-suited for addressing such problems, where multiple conflicting objectives need to be simultaneously optimized.
​
In the context of biomass gasification, the primary objectives are often to minimize the production of CO₂, a greenhouse gas, while maximizing the yield of H₂, a valuable energy carrier. These two objectives are conflicting, as the conditions that favor high H₂ yields may also result in higher CO₂ emissions, and vice versa.
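One way to write this trade-off down is as a bi-objective problem together with the usual Pareto dominance relation. The notation is an assumption made for illustration: x stands for the vector of feedstock properties and operating conditions, the two functions f stand for the yields predicted by the trained ML models, and the feasible set collects whatever bounds and category choices apply.

```latex
\begin{aligned}
\max_{x}\quad & f_{\mathrm{H_2}}(x) \\
\min_{x}\quad & f_{\mathrm{CO_2}}(x) \\
\text{s.t.}\quad & x \in \mathcal{X}
\end{aligned}

% Pareto dominance between two feasible points x^a and x^b:
x^{a} \text{ dominates } x^{b} \iff
f_{\mathrm{H_2}}(x^{a}) \ge f_{\mathrm{H_2}}(x^{b})
\;\text{and}\;
f_{\mathrm{CO_2}}(x^{a}) \le f_{\mathrm{CO_2}}(x^{b}),
\text{ with at least one inequality strict.}
```

Because the objectives conflict, there is generally no single x that is best for both; the points that no other feasible point dominates form the Pareto optimal set.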
​
The MOO process for optimizing biomass gasification typically involves the following steps:
​
-
Objectives and Constraints Definition: The objectives were to maximize H₂ yield and minimize CO₂ yield by adjusting feedstock and operating conditions. The MOO problem was formulated with the objective functions representing H₂ and CO₂ yields predicted by the trained ML models. Constraints were set on the bounds of continuous parameters and selection of one category per categorical variable (operation mode, reactor type, bed material, catalyst presence, system scale).
-
Classifier Sub-model Development: Classifier models (Extra Trees, Gradient Boosting, K-Nearest Neighbors) were developed to predict the gasifying agent type from steam/biomass ratio and equivalence ratio. Model performance was evaluated using accuracy, precision, recall, and F1 score. The best classifier model was used as a sub-model to predict feature variables for the main prediction models.
-
Solving MOO Problems: The NSGA-II genetic algorithm was used to solve the MOO problems and find Pareto optimal solutions. Multiple optimization runs were performed with varying population sizes and random seeds to obtain a comprehensive set of optimal solutions.
-
Optimization Result Analysis and Validation: The Pareto optimal solutions were ranked using non-dominated sorting and Data Envelopment Analysis (DEA) methods. Selected solutions from both methods were compared, visualized to construct the Pareto front, and validated using chemical engineering principles.
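The idea of keeping only non-dominated solutions can be illustrated with a short filter over candidate (H₂ yield, CO₂ yield) pairs. This is a simplified sketch of extracting the first non-dominated front, not NSGA-II itself and not the ranking code used in the application; the function names and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if solution a dominates b.

    Each solution is a (h2_yield, co2_yield) pair; H2 is maximized
    and CO2 is minimized.
    """
    h2_a, co2_a = a
    h2_b, co2_b = b
    no_worse = h2_a >= h2_b and co2_a <= co2_b
    strictly_better = h2_a > h2_b or co2_a < co2_b
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


# Hypothetical candidate outcomes, for illustration only.
candidates = [
    (40.0, 20.0),
    (45.0, 25.0),
    (38.0, 22.0),  # dominated by (40.0, 20.0)
    (45.0, 30.0),  # dominated by (45.0, 25.0)
    (30.0, 15.0),
]
front = pareto_front(candidates)
print(front)  # [(40.0, 20.0), (45.0, 25.0), (30.0, 15.0)]
```

The pairwise comparison is quadratic in the number of solutions, which is fine for a sketch but is one reason dedicated algorithms use faster sorting schemes.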
Biomass Blending
​
Different biomass feedstocks have varying compositions, leading to different product yields. The multi-objective optimization (MOO) approach identified optimal compositions that balance maximizing H₂ yield against minimizing undesired CO₂. To achieve these target compositions cost-effectively, an optimal biomass blending strategy is required.
​
The following steps outline the approach used for biomass blending in Thailand:
​
-
Data Collection and Preparation: This step involved gathering all the necessary data required for the optimization, including biomass compositions, costs, supply amounts, and transportation distances.
-
Composition data, specifically carbon and hydrogen content, was collected for the various biomass types available in Thailand.
-
Data on feedstock costs was gathered, and transportation costs were calculated based on factors such as fuel consumption, tire costs, and truck maintenance.
-
Information on the average annual agricultural product supply by province from 2019 to 2022 was obtained and converted to equivalent biomass amounts using conversion factors.
-
Data on the distances between potential biomass gasification plant locations and biomass suppliers in each province was collected.
-
Mathematical Model Formulation: In this step, a mathematical model was formulated to represent the biomass supply chain optimization problem.
-
Indices, parameters, and decision variables were defined to accurately represent the biomass supply chain dynamics.
-
An objective function was formulated to minimize the total cost, which is the sum of the feedstock cost and transportation cost.
-
Various constraints were imposed, including single gasification plant use, supplier use, supplier-plant connections, supply capacity, carbon/hydrogen content balance, minimum biomass supply, and at least one supplier-plant connection.
-
Optimization using Mixed Integer Linear Programming (MILP): The optimization was performed using the MILP technique, with the optimal compositions from MOO as targets.
-
The MILP technique was employed to optimize the biomass blending strategy.
-
The optimal compositions from the MOO were used as target carbon and hydrogen contents in the constraints.
-
The goal was to identify the optimal biomass types, amounts, suppliers, and gasification plant locations to meet the target compositions while minimizing the total cost.
-
Result Analysis: The optimized solution was analyzed to determine the biomass blending strategy.
-
The optimized solution was thoroughly examined to understand the values of the decision variables.
-
The amount of each biomass type to be transported from each supplier to the selected gasification plant was determined.
-
Implementation: This MILP model is implemented in our web application, allowing users to optimize their biomass blending strategies based on specific target compositions and cost structures.
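To make the cost objective and the carbon/hydrogen content balance concrete, the sketch below evaluates one candidate blend: it sums feedstock and transportation costs and computes the mass-weighted carbon and hydrogen contents. It does not solve the MILP, which would need a solver. The field names, the per-tonne-kilometre transport cost, the tolerance around the targets, and all numbers are assumptions made for illustration.

```python
def evaluate_blend(shipments, target_carbon, target_hydrogen, tolerance=0.5):
    """Evaluate total cost and blended composition of a candidate blend.

    Each shipment is a dict with hypothetical keys:
      amount_t, carbon_pct, hydrogen_pct,
      feed_cost_per_t, transport_cost_per_t_km, distance_km
    """
    total_amount = sum(s["amount_t"] for s in shipments)
    if total_amount <= 0:
        raise ValueError("blend must contain a positive amount of biomass")

    feedstock_cost = sum(s["amount_t"] * s["feed_cost_per_t"] for s in shipments)
    transport_cost = sum(
        s["amount_t"] * s["transport_cost_per_t_km"] * s["distance_km"]
        for s in shipments
    )

    # Mass-weighted average composition of the blend.
    carbon = sum(s["amount_t"] * s["carbon_pct"] for s in shipments) / total_amount
    hydrogen = sum(s["amount_t"] * s["hydrogen_pct"] for s in shipments) / total_amount

    # Assumption: targets count as met within an absolute tolerance.
    meets_target = (
        abs(carbon - target_carbon) <= tolerance
        and abs(hydrogen - target_hydrogen) <= tolerance
    )
    return {
        "total_cost": feedstock_cost + transport_cost,
        "carbon_pct": carbon,
        "hydrogen_pct": hydrogen,
        "meets_target": meets_target,
    }


# Hypothetical shipments to a single plant, for illustration only.
shipments = [
    {"amount_t": 60.0, "carbon_pct": 40.0, "hydrogen_pct": 5.0,
     "feed_cost_per_t": 10.0, "transport_cost_per_t_km": 0.5, "distance_km": 100.0},
    {"amount_t": 40.0, "carbon_pct": 50.0, "hydrogen_pct": 6.0,
     "feed_cost_per_t": 20.0, "transport_cost_per_t_km": 0.5, "distance_km": 50.0},
]
print(evaluate_blend(shipments, target_carbon=44.0, target_hydrogen=5.4))
```

In a MILP, the shipment amounts would be decision variables and the content balance would be expressed as linear constraints; this function only checks a blend that has already been chosen.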