Boruta-grid-search least square support vector machine for NO2 pollution prediction using big data analytics and IoT emission sensors Habeeb Balogun, Hafiz Alaka and Christian Nnaemeka Egwim Big Data Technologies and Innovation Laboratory, University of Hertfordshire, Hatfield, UK Abstract Purpose – This paper seeks to assess the performance levels of BA-GS-LSSVM compared to popular standalone algorithms used to build NO2 prediction models. The purpose of this paper is to pre-process a relatively large data of NO2 from Internet of Thing (IoT) sensors with time-corresponding weather and traffic data and to use the data to develop NO2 prediction models using BA-GS-LSSVM and popular standalone algorithms to allow for a fair comparison. Design/methodology/approach – This research installed and used data from 14 IoT emission sensors to develop machine learning predictive models for NO2 pollution concentration. The authors used big data analytics infrastructure to retrieve the large volume of data collected in tens of seconds for over 5 months. Weather data from the UK meteorology department and traffic data from the department for transport were collected and merged for the corresponding time and location where the pollution sensors exist. Findings – The results show that the hybrid BA-GS-LSSVM outperforms all other standalone machine learning predictive Model for NO2 pollution. Practical implications – This paper’s hybrid model provides a basis for giving an informed decision on the NO2 pollutant avoidance system. Originality/value – This research installed and used data from 14 IoT emission sensors to develop machine learning predictive models for NO2 pollution concentration. Keywords IoT, Bigdata, Air pollution prediction, Hybrid machine learning Paper type Research paper 1. Introduction Air pollution, a release of pollutants into the air, remains one of the significant challenges in the UK and globally, with over 25,000 associated deaths recorded yearly in the UK [1] and around 8.8 million deaths recorded globally [2]. Apart from deaths, air pollution exposure can result in various short and long-term health challenges [3, 4]. Examples of short-term health challenges include eye pain, throat irritation, headaches, allergic reactions, and upper respiratory infections. While lung cancer, brain damage, liver damage, kidney damage, heart disease, respiratory disease, and suchlike are examples of long-term health challenges [5]. Aside from the severe impact of air pollutants on health, air pollution has significant consequences on the UK and the global economy. It costs the UK government approximately £40bn yearly [6] and around £3 trillion economic costs globally [7]. Recent studies by the centre for research on energy and clean air (CREA) links over 1.5 billion days of absence from NO2 pollution prediction ©Habeeb Balogun, Hafiz Alaka and Christian Nnaemeka Egwim. Published inApplied Computing and Informatics. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http:// creativecommons.org/licences/by/4.0/legalcode The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/2210-8327.htm Received 21 April 2021 Revised 24 June 2021 5 July 2021 Accepted 7 July 2021 Applied Computing and Informatics Emerald Publishing Limited e-ISSN: 2210-8327 p-ISSN: 2634-1964 DOI 10.1108/ACI-04-2021-0092 work, over 3.5million new cases of asthma and approximately 2million preterm births to air pollutants leading to an increase in health care cost and decrease in economic productivity. Air pollutants are airborne substances, usually of two categories: particulate matter and gases. Of the gases, Nitrogen dioxide (NO2) is arguably the most dangerous to human health [8]. NO2 pollutant emanates from combustion processes such as vehicle emissions, and this was noted during the Covid-19 pandemic, with a 20% decrease in global NO2 concentration [9]. However, a recent finding hypothesized that NO2 will still exceed the (Air quality index) AQI limit by 2025 [3]. Therefore, the NO2 AQI estimate poses a responsibility to stakeholders and researchers to devise strategic means to curb exposure to this UK’s pollutant. Arguably, predicting NO2 concentration is among the most efficient and effective ways to save lives from exposure to this deadly pollutant in different geographical locations. Furthermore, this prediction can help people avoid such areas when they have high NO2 concentration levels. 1.1 Related Work Studies on NO2 prediction models have thus justifiably increased since the turn of the millennium. However, if they are helpful to users vulnerable to pollution, e.g. coronavirus patients, the effectiveness of such models depends on the model’s performance. Lesser performance can be misguiding and could expose the user to a pollution hotspot, triggering life-threatening attacks. A machine learning-built predictive model’s performance is, among other factors, vastly dependent on the machine learning algorithm used [10, 11]. Several studies have thus compared some of the most popular algorithms (e.g. artificial neural network, support vector machine, and suchlike) in terms of their performance in predicting NO2 [12– 14] with Random forest, support vector machine usually performing better. However, despite clear proof from the literature that hybrid algorithms have performed better than standalone, they have not been vastly employed in the comparison studies [15, 16]. One such hybrid is the optimal-hybrid artificial intelligent algorithm based on the Least squares support vectormachine optimized by grid search, whose featureswere selected using the Boruta Algorithm (BA-GS-LSSVM). The Least square support vector machine (LSSVM) differs from the classical SVM due to improved objective function. LSSVM is widely used for classification and regression problems due to its high predictive ability compared to classical SVM. Findings from research like the prediction of gasoline’s price [17], speed of wind’s forecast [18] indicated that this model presents more operation speed and convergence accuracy. However, some shortcomings are associated with this algorithm’s performance, including optimizing parameter and feature selection. The BA-GS-LSSVM solves these comings. Thus, this paper seeks to assess the performance levels of BA-GS-LSSVM compared to popular standalone algorithms used to build NO2 prediction models. The objectives are as follows: (1) To pre-process a relatively large data of NO2 from IoT sensors with time- corresponding weather and traffic data (2) To use the data to develop NO2 prediction models using BA-GS-LSSVM and popular standalone algorithms to allow for a fair comparison. It is imperative to describe the symbols used in this researchwork. Table 1 definesmost of the symbols and their description. The rest of this paper organized as follows: section two presents a brief explanation of the source of data and data volume. Section three presents feature selection techniques used in selecting the valuable features for developing the hybrid model. Section four presents the ACI hybrid model detailing the theoretical/mathematical representation of the model and how it differs from classical SVM. Lastly, Section five describes the development of the BA-GS- LSSVM, other popular standalone machine learning algorithms for NO2 prediction and their performance assessment for comparison. Finally, the conclusion and discussion form part of the fifth section. 2. Data description and big data analytics Many UK cities, just like other cities of the world, suffer from air pollution. A significant contributor to air pollution is increasing traffic emissions [19]. Air pollution caused by traffic depends on the type of vehicle (diesel, gasoline, petrol, electric), level of congestion, time spent in the traffic jam, and the atmospheric/geographical features of the environment at a given time. To monitor/reduce exposure to air pollution, most cities now deploy monitoring sensors for measuring traffic intensity, weather characteristics, and air quality of the environment. The data is collected at specified frequencies (seconds, minutes, hours, days, and suchlike) depending on the users’ preference. For this project, a total of 14 Internet of Things (IoT) monitoring sensors for NO2 and other pollutant concentrations represented as blue circles were deployed across Wolverhampton City in the UK (see Figure 1). The sensors collected NO2 concentration and other harmful pollutant’s data every 10 s for five months (i.e. December 2019 and April 2020). Over ten billion (i.e. 10 3 6 3 603 603 243 303 53 14) data points were generated for this period which was massive. The data through the Middleware gateway deployed on elastic bean of amazon web service Symbol Description yI Observed NO2 concentration y*i Predicted value for NO2 concentration Log(n) Depth of tree n Number of rows d Number of features t Number of trees k Number of k neigbour γ Regularisation parameter σ2 width of Kernel parameter Table 1. Symbol and description Figure 1. Map showing the 14 NO2 IoT monitoring sensors deployed at Wolverhampton City, UK NO2 pollution prediction (AWS) directly dump the data into an AWS Elastic Computing cloud two (EC2) Relational database.We used the AWSEC2 infrastructure to run the big data analytics required for this study due to the extensive data. For the development of the BA-GS-LSSVM, the data from the 14 IoTs monitoring sensors for NO2 pollutant concentration was the dependent variable. In contrast, weather, other pollutants, e.g. PMx and Ozone and traffic data, were the independent variables. The traffic data was sourced from the UK’s Department for Traffic (DfT) and included mainly vehicle counts, split into various vehicle types (see Table 2). The traffic data, covering the same period as the data from the sensors, were retrieved. The weather data for a similar period was recovered from the UK Met Office. It included various weather variables like ambient pressure and humidity, among others (see Table 2). Traffic and weather data were provided hourly, and each had over fifty thousand data points. To match the weather and traffic data with the pollutant data from the sensors, the hourly average of the pollutant concentration was used to match the corresponding hourly weather and traffic data leading to (24 hrs 3 30 days 3 5months 3 14 IoTs) data points. The concentration of NO2 from December 2019 to April 2020 across the 14 installed sensors indicates some interesting trends, with some outliers around the end of 2019 (see Figure 2). These outliers at the end of the year are arguable due to the shopping, Christmas and other season celebration. In addition, the pollution concentration is arguably S/N Features Unit Data source 1 Ambient humidity RH UKMETOFFICE 2 Ambient pressure Pa 3 Ambient temp 0C 4 Humidity RH 5 Temp 0C 6 Road type – DFT 7 Link length in Km – 8 Link length in miles – 9 Pedal cycles – 10 Two wheeled motor – 11 Cars and taxis – 12 Buses and coaches – 13 Lgvs – 14 Hgvs 2 rigid Axle – 15 Hgvs 3 rigid Axle – 16 Hgvs 4 or more rig – 17 Hgvs 3 or 4 Articulate Axle – 18 Hgvs_5_Articulated_Axle – 19 Hgvs_6_Articulated_Axle – 20 All Hgvs – 21 All motor vehicles – 22 Zid – IoT 23 Date – 24 Holiday – 25 Day of the week – 26 X (3d coordinates) – 27 Y (3d coordinates) – 28 Z (3d coordinates) – 29 Pm1 mg/m3 30 Pm10 mg/m3 31 Pm25 mg/m3 Table 2. Independent features after matching the three data sources ACI influenced by the national lockdown imposed across the UK cities during the covid19 pandemic (see Figure 2). Another exciting exploration is the outliers discovered within some days of the week (see Figure 3). Looking at the boxplot, it is arguably the days with the highest amount of traffic as we can hypothetically say these are days many go out to bars, clubs and other gatherings at the end of the week. After pre-processing was completed on the AWS Big data infrastructure, the complete data was split into data (60%) for training and (40%) for model testing at random to avoid biases and other shortcomings. 3. Feature selection The predictive capability of various machine learning depends on the features’ dimensionality; LSSVM is not an exception [20]. Not all features impact the prediction, making feature/variable selection critical in developing/building machine learning predictive models. 600 500 400 300 200 100 0 20 19 -1 2- 22 20 20 -0 1- 01 20 20 -0 1- 22 20 20 -0 2- 01 20 20 -0 2- 22 20 20 -0 3- 01 20 20 -0 3- 02 20 20 -0 4- 01 20 20 -0 5- 01 20 20 -0 4- 22 co n ce n tr at io n N O 2 [ u g /m 3 ] Datetime [–] 80 70 60 50 40 30 20 10 0 N O 2 T h u rs d ay F ri d ay S at u rd ay S u n d ay M o n d ay T u es d ay W ed n es d ay Figure 2. The hourly averaged NO2 concentration across the 14 IoT emission sensors Figure 3. Day of the week NO2 Concentration collected across the 14 IoT sensors NO2 pollution prediction Dimensionality reduction has been proven to help make predictive models perform better [21]. Of the reduction techniques, feature selection is selecting the most impactful features from the original set of features as the new input features. Since Random forest (RF) has consistently proven in past studies, e.g. [21–24], to be very good at selecting themost impactful features, thewrapper Boruta algorithm(BA) built around the RF was implemented for feature selection in this study. BA uses the same strategy as the classical RF classifier model introduced by [25]. The BA is implemented using the following steps: (1) Replicate and add a copy of all input features, i.e. weather and traffic features, to form an information system (IS) (2) Shuffle the IS and remove correlations among features in the IS (3) Apply a random forest classifier on the comprehensive IS (4) compute the Z scores represented as n for all the features and Identify themaximum n among shadow features (MnSF) (5) Assign a value to every feature that scores more than MnSF. (6) Carry out a test of equality and drop features lower than MnSF (7) Eliminate all the shadow features (8) Repeat the procedure After applying the feature selection process, a total of 13 essential features were selected by the BA, namely, Timestamp, O3, All motor vehicles, Humidity, Ambient pressure, Temperature, PM10, PM2.5, PM1, Day of the week, and x,y,z which is the 3d-geocentric- representation of the longitude and latitude. The timestamp arguably suggests some level of consistency in the pollutant levels at a specific time. For instance, the morning peak period or close of the day peak periods results in higher traffic. Afterwards, the 13 selected features will be used in developing an LSSVM predictive model, as discussed in the next section. 4. Least square support vector machine LSSVM, improvement to SVMwas proposed [15]. LSSVMprovides a linear equation solution with an improvement in the objective function of classical SVM. We use xk as the 13 feature selected with BA and yk is the NO2 concentration Then the improved SVM model can be mathematically written: yðxÞ ¼ ωT :∅ðxÞ þ b (1) where, ∅ðxÞ5 nonlinear mapping function, ω5 weight, and b 5 bias. The equation can be expressed: min ω;b;e ðω; eÞ ¼ 1 2 ωTω þ 1 2 γ Xn k¼1 e2k (2) Subject to yk ¼ ωT∅ðxkÞ þ b þ ek; k ¼ 1; 2; . . . n (3) Where, γ 5 regularisation parameter and ek 5 error term. ACI The model can be optimized using the LaGrange function as follows Lðω; b; e; αÞ ¼ 1 2 ωTω þ 1 2 γ Xn k¼1 e2k  Xn k¼1  αk  ωT∅ðxkÞ þ b þ ek  yk  (4) where αk ∈ R 5 Lagrange multiplier From Karush-Kuhn-Tucker (KKT) equation given as. 8>>>< >>>: ω ¼ Xn k¼1 αkwðxkÞ Xn k¼1 αk ¼ 0 αk ¼ ekγ; ωTwðxkÞ þ bþ ek  yk ¼ 0 (5) The optimization equation can be transformed to the linear equation given as Eqn 6, after eliminating the variables ω and ek.2 4 0 1 . . . 1 1 Kðx1; x1Þ þ 1=γ . . . Kðx1; xiÞ 1 Kðxj; x1Þ . . . Kðxj; xiÞ þ 1=γ 3 5 2 4 b α1 αj 3 5 ¼ 2 4 0 y1 yj 3 5 (6) The final equation of the LSSVM model is. f ðxÞ ¼ Xj k¼1 ∝ kKðx;xiÞ þ b (7) where Kðx;xiÞ ¼ wðxÞT * wðxjÞ is the kernel function: The finite response and the Radial basis function (RBF) kernel function was used in this research and mathematically expressed as, Kðx;xiÞ ¼ exp  −γ*jx xij2  (8) Where, γ ¼ 1 2σ2 4.1 Grid search After the development of the LSSVM using the features selected with the BA, the optimization of the LSSVM model parameter: regularisation parameter ( γ ) and kernel- parameter ðσ2Þ is another challenging area that should not be ignored as it could lead to poor prediction performance if not carefully chosen. The choice is where the grid search algorithm comes in. It does this by pairing the all- possible values of regularisation and kernel parameter ðγ; σ2Þ. It is applied to optimize these two parameters to have an improved prediction capability. Each pair of regularisation and kernel parameters is subjected to cross-validation and hence producing MSE value. For this study, the LSSVM parameters were initialized, taking a search range [0.01, 1000] and [0.1, 1000] for γ and σ2, respectively. Afterwards, cross-validation, with all possible values of γ, and σ2, the pair with the minimum MSE is the best, and that was used to create GS-LSSVM. NO2 pollution prediction 5. Model development process and performance measures Related works on the development of predictive models (e.g. [26– 29], identified Random forest (RF), Support vector machine (SVM), Decision tree (DT), XGboost (XGB), Adaboost, Artificial neural network (ANN) and Linear Regression (LR) as powerful machine learning algorithms for prediction. These popular algorithms were developed and compared with BA-GS-LSSVM. Given that feature selection may not be entirely favourable to some algorithms [11, 20], we developed the predictive models for each algorithm in two ways to allow fairer comparison. The first was to develop the models using all the available variables before the feature selection processes. The results from this were recorded and compared (see section 5 on results). The second was to develop the models using the 13 features selected with the Boruta algorithm. The results from this were also recorded and compared (see section 5 on results). Finally, the best results for each algorithm (whether from the first or second development) were compared to determine the best algorithm overall. The LSSVM Regression model has no specific python package, so we have implemented the Scikit learn package in python. Figure 4 presents the flow chart and overall procedures to build the hybrid GS-LSSVM predictive model to predict the concentration of NO2. To determine the predictive capability of a regression predictive machine learning model, various metrics measures loss and score models. Among these metrics, four, including the mean absolute error (MAE), mean square error (MSE), Explained variance score (EVS) and R Squared (R2), were used in this paper because of their popularity and are briefly described below. MAE is the average absolute variation(error) between each point in a scatter plot between the actual observation and the corresponding predicted value. It is a risk metric corresponding to the expected value of the absolute error loss. The best possible score is 0.00; the higher the MAE, the worse the predictive model’s performance. The MAE can be mathematically written as MAE ¼ 1=n Xn i yi  y*i  (9) Data source IoT sensor Traffic data Meteorology data Jupyter API ETL RDS Amazon S3 AWS Platform/Bigdata Feature Selection Data pre-process Feature extraction Optimize LSSVM Sagemaker Model Building API Gateway Amazon EC2 API: Application programme interface ETL: Extraction, Transform and Loading RDS: Relational database EC2: Amazon Elastic compute cloud IoT: internet of Things S3: Simple storage service AWS: Amazon web service Jupyter: Local Python IDE environment Data Matching Using Unique keys Figure 4. Flowchart for the GS-LSSVM ACI MSE is another risk metric that corresponds to the average of all the error squares between the predicted value and the actual value of the target variable. It is also referred to as mean squared deviation. MSE value is strictly positive, ranging between [0,1], and values closer to zero signifies a better predictive model. The mathematical definition of MSE is as follows. MSE ¼ 1=n Xn i¼0  yi  y*i 2 (10) Unlike the risk metric functions (i.e. MAE, MSE), The Explained variance score and R-square score depicts a better regression model when the score is getting closer to 1.0 and not zero. The EVS score measures the variation (a measure of dispersion) of the test data set. The best possible score of EVS is 1.0, and it is mathematically written as. EVS ¼ 1 yi  y * i yi (11) Lastly, R-squared referred to as the coefficient of determination is the proportion of dispersion in the feature(s) and the target variable. It indicates the goodness of fit and aid the measurement of how well-unseen data are likely to be predicted by the model. The best possible value for R-square is 1.0. It is mathematically given as. R2 ¼ 1 Pn i¼0  yi  y*i 2 Pn i¼0ðyi  yiÞ2 (12) Where yI ¼ Pn i¼1 yi n 5.1 Discussion of result In this study, an optimal-hybrid artificial intelligent algorithm based on the Least squares support vectormachine was optimized by grid search, whose features were selected using the Boruta Algorithm (BA-GS-LSSVM) to predict NO2 pollutant concentration were developed. We identified the most optimal values for the parametric functions of LSSVM to be γ ¼ 1000 and σ2 ¼ 10 using grid the search. The model was all developed following the union of the three data sources, including the weather, traffic, and IoT data on a big data platform considering many data points recorded (i.e. 24 hrs 3 30 days 3 5 months 3 14 IoTs). The data were merged and matched for the development of the predictive models. We then compared the performance capability of the proposed hybrid model and other powerful standalone machine learning in predicting NO2, however, in two streams to achieve a fair comparison with no bias. The two streams of comparisons to avoid biases are; (1) All models implemented without feature selection (2) All Models implemented with feature selection. Table 3 and Figure 5 presents metrics for developed models with (WF) and without feature (WoF) selection for a fair comparison. The BA-GS-LSSVM and all other ML models developed were done on a big data platform due to the algorithms’ time complexity. The time complexity of the models developed are GS-LSSVM:O(n3), RF:O(d*log(n)), KNN:O(knd), ANN:O(n4), DT:O(n*log(n)*d), SVR:O(n3), XGB:O(n*d*log(n)), LR:O(nd), LSSVM:O(n3), and ADB:O(nd2). As shown in Figure 5, the error measures, including the MAE, MSE for all the developed models, were presented in decreasing order. The order, in this case, shows the Adaboost (AB) to have the maximum error, followed by the LSSVM and linear Regression (LR) implemented without feature selection. At the same time, we can see GS-LSSVM with feature selection, i.e. BA-GS-LSSVM with the most negligible error score value. This explains the higher performance ability of the hybrid model over other standalone models. NO2 pollution prediction In addition, the doughnut chart shows the R-squared score for all the models developed, and the maximum score was yielded in the development of BA-GS-LSSVM. i.e. 6.35%(0.82). The assertion of the bias caused by feature selection was proved in this paper; for instance, the first approach (i.e. model’s implementation without feature selection) shows poor and woeful model performance. In addition, themodels developedwith feature selectionwas subjected to the 10-fold cross- validation to ensure efficient/unbias evaluation. For this, the research present in Figure 6, a box and whisker plot showing the spread in different performance metrics across each cross- validation fold for each algorithm From these results, BA-GS-LSSVM is identified the best considering its minimal error metric scores (i.e. lowest MAE andMSE score) compared to other ML developed. At the same time, BA-GS-LSSVM has scored the highest EVS and R-square score. Thus, our model Conclusively performs best compared to all other standard and powerful standalone ML models developed in this paper. The use of the big-data platform reduced the computational complexity for most of the models implemented. Also, the lower computational complexity of the LSSVM over the SVM is another outstanding advantage recognized in this research. Model/Metrics MAE MSE R-square EVS WoF WF WoF WF WoF WF WoF WF GS-LSSVM 8.32 6.91 114.57 97.08 0.76 0.82 0.76 0.82 KNN 7.6 7.59 152.9 131.93 0.73 0.77 0.73 0.77 RF 5.8 7.66 110.8 132.87 0.79 0.77 0.79 0.77 ANN 8.3 8.26 164 134.3 0.71 0.77 0.71 0.77 LSSVM 11.6 8.32 285.5 114.57 0.51 0.76 0.51 0.76 XGB 9.4 9.01 219 189.1 0.62 0.74 0.62 0.74 SVR 9.5 9.01 219.5 189.1 0.62 0.67 0.62 0.67 LR 9.5 10.3 219.5 199.98 0.62 0.62 0.62 0.62 DT 8.1 10.49 192 267.48 0.67 0.54 0.67 0.54 ADB 18.4 14.5 523 332.67 0.1 0.42 0.1 0.42 Figure 5. The NO2 concentration predictive models developed Table 3. Results of the overall model developed ACI R -S qu ar e sc or e M A E S co re M SE S co re EV S Sc or e 0 .8 0 .7 0 .6 0 .5 0 .4 0 .3 0 .2 0 .8 0 .7 0 .6 0 .5 0 .4 0 .3 0 .2– 8 – 9 – 1 0 – 1 1 – 1 2 – 1 3 – 1 0 0 – 1 5 0 – 2 0 0 – 2 5 0 – 3 0 0 – 3 5 0L S S V M S V M L R D T R F X G B A d aB o o st A N N K N N G S _ L S S V M L S S V M S V M L R D T R F X G B A d aB o o st A N N K N N G S _ L S S V M L S S V M S V M L R D T R F X G B A d aB o o st A N N K N N G S _ L S S V M L S S V M S V M L R D T R F X G B A d aB o o st A N N K N N G S _ L S S V M (a ) (b ) (d ) (c ) Figure 6. Graphical illustrationss of the spread in (a) R-square, (b) MAE, (c) MSE, and (d) EVS score for each algorithm developed with feature selection NO2 pollution prediction 6. Conclusions High-precision NO2 prediction is critical to people’s well-being, especially those that are vulnerable to air pollution. However, the BA-GS-LSSVM model in this paper happens to be appealing and proves to be better than popular algorithms. To demonstrate the advantages of the BA-GS-LSSVM model, nine different algorithms were compared. At the end of the study, the following list of the inferences can be reached, including: (1) Boruta, a dimensionality selection technique, improves the performance of the ML model (2) Compared with all other standalone models developed, our model, BA-GS-LSSVM, exhibits a better predictive ability in NO2 concentration (3) The BA-GS-LSSVMmodel provides a basis for delivering an informed decision on the NO2 pollutant avoidance system. Future studies should explore the use of BA-GS-LSSVM to predict other pollutants in the UK and other parts of the world experiencing this outburst in air pollution concentration. References 1. Public Health England. Review of interventions to improve outdoor air quality and public health. 2019. 2. Nethery RC, Dominici F. Estimating pollution-attributable mortality at the regional and global scales: challenges in uncertainty estimation and causal inference. Eur Heart J. Oxford University Press. 2019; 40(20): 1597-1599. 3. DEFRA. Supplement to the UK plan for tackling roadside nitrogen dioxide concentrations. 2018(October), 1-54. 4. DfT and DEFRA. UK plan for tackling roadside nitrogen dioxide concentrations: detailed plan. Dep. Environ. Food Rural Aff. together with Dep Transp. 2017(July), 1-11. 5. Abdul Halim ND et al. The long-term assessment of air quality on an island in Malaysia. Heliyon. 2018; 4(12). 6. WHO Regional Office for Europe OECD. Economic cost of the health impact of air pollution in Europe: clean air, health and wealth. Eur Environ Heal Process. 2015, 1-54. 7. Myllyvirta L. Quantifying the economic costs of air pollution from fossil fuels key messages. 2020, 2-13. 8. Kopparapu R, Arney G., Haqq-Misra J., Lustig-Yaeger J., Villanueva G. Nitrogen dioxide pollution as a signature of extraterrestrial technology. Astrophys J. 2021; 908(2): 164. 9. Bauwens M et al. Impact of coronavirus outbreak on NO2 pollution assessed using TROPOMI and OMI observations. Geophys Res Lett. 2020; 47(11): 1-9. 10. Alaka HA et al. Systematic review of bankruptcy prediction models: towards a framework for tool selection. Expert Syst Appl. 2018; 94: 164-184. 11. Alaka H, Oyedele L, Owolabi H, Akinade O, Bilal M, Ajayi S. Firms failure prediction models. IEEE Trans Eng Manag. 2018(4), 1-10. 12. Kaminska JA. A random forest partition model for predicting NO2 concentrations from traffic flow and meteorological conditions. Sci Total Environ. 2019; 651(2): 475-483. 13. Juhos I, Makra L, Toth B. Forecasting of traffic origin NO and NO2 concentrations by support vector machines and neural networks using principal component analysis. Simul Model Pract Theory. 2008; 16(9): 1488-1502. 14. Dou X et al. Estimates of daily ground-level NO2 concentrations in China based on big data and machine learning approaches. 2020; arXiv(2). ACI 15. Ardabili S, Mosavi A, Varkonyi-Koczy AR. Advances in machine learning modeling reviewing hybrid and ensemble methods. Lect Notes Networks Syst. 2020; 101(August): 215-227. 16. Karballaeezadeh N, Mohammadzadeh SD, Shamshirband S, Hajikhodaverdikhan P, Mosavi A, wing Chau K. Prediction of remaining service life of pavement using an optimized support vector machine (case study of Semnan–Firuzkuh road). Eng Appl Comput Fluid Mechs. 2019; 13(1): 188-198. 17. Mustaffa Z, Yusof Y, Kamaruddin SS. Gasoline price forecasting: an application of LSSVM with improved ABC. Proced - Soc Behav Sci. 2014; 129: 601-609. 18. Tian Z. Short-term wind speed prediction based on LMD and improved FA optimized combined kernel function LSSVM. Eng Appl Artif Intell. 2020; 91(February): 103573. 19. Jia C, Li W, Wu T, He M. Road traffic and air pollution: evidence from a nationwide traffic control during coronavirus disease 2019 outbreak. Sci Total Environ. 2021. 20. Hafiz A, Lukumon O, Muhammad B, Olugbenga A, Hakeem O, Saheed A. Bankruptcy prediction of construction businesses: towards a big data analytics approach. Proc. - 2015 IEEE 1st Int Conf Big Data Comput Serv Appl BigDataService. 2015; 2015, 347-352. 21. Reddy GT et al. Analysis of dimensionality reduction techniques on big data. IEEE Access. 2020; 8: 54776-54788. 22. Deng H, Runger G. Gene selection with guided regularised random forest. Pattern Recognit. 2013; 46(12), 3483-3489. 23. Menze BH et al. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinf. 2009; 10: 1-16. 24. Dimitriadis SI, Liparas D, Tsolaki MN. Random forest feature selection, fusion and ensemble strategy: combining multiple morphological MRI measures to discriminate among healhy elderly, MCI, cMCI and alzheimer’s disease patients: from the alzheimer’s disease neuroimaging initiative (ADNI) data. J Neurosci Methods. 2018; 302: 14-23. 25. Kursa MB, Rudnicki WR. Feature selection with the boruta package. J Stat Softw. 2010; 36(11): 1-13. 26. Choi J, Gu B, Chin S, Lee JS. Machine learning predictive model based on national data for fatal accidents of construction workers. Autom Constr. 2020; 110(May): 102974. 27. Purnus A, Bodea CN. A predictive model of contractor financial effort in transport infrastructure projects. Proced Eng. 2017; 196(June): 746-753. 28. Bilal M, yedele LO, Guidelines for applied machine learning in construction industry—a case of profit margins estimation, Adv Eng Informatics. 2020; 43(March) 2019, 101013. 29. Mehtab S, Sen J, Stock price prediction using convolutional neural networks on a multivariate timeseries. 2020. Corresponding author Hafiz Alaka can be contacted at: hafizalaka@outlook.com For instructions on how to order reprints of this article, please visit our website: www.emeraldgrouppublishing.com/licensing/reprints.htm Or contact us for further details: permissions@emeraldinsight.com NO2 pollution prediction