Effect of traffic dataset on various machine-learning algorithms when forecasting air quality
Purpose: Road traffic emissions are generally believed to contribute substantially to air pollution, but the effect of road traffic datasets on air quality predictions has not been clearly investigated. This research investigates the effect that traffic datasets have on the performance of machine learning (ML) predictive models in air quality prediction.
Design/methodology/approach: We set up an experiment in which the control dataset comprises only the air quality (AQ) and meteorological (Met) datasets, while the experimental dataset comprises the AQ, Met, and traffic datasets. Several ML models (such as Extra Trees Regressor, eXtreme Gradient Boosting Regressor, Random Forest Regressor, K-Neighbors Regressor, and five others) were trained, tested, and compared on each of these dataset combinations to predict the concentrations of PM2.5, PM10, NO2, and O3 in the atmosphere at various times of the day.
Findings: The results show that the ML algorithms react differently to the traffic dataset, although it improved the performance of all the ML algorithms considered in this study by at least 20% and reduced their error by at least 18.97%.
Research limitations/implications: This research is limited in terms of the study area, and the results cannot be generalized outside of the UK because many conditions may differ elsewhere. Additionally, only the ML algorithms commonly used in the literature are considered, which leaves out a few other ML algorithms.
Practical implications: This study reinforces the belief that the traffic dataset has a significant effect on improving the performance of air pollution ML prediction models. It also indicates that ML algorithms behave differently when trained with a form of traffic dataset in the development of an air quality prediction model. This implies that developers and researchers in air quality prediction need to identify the ML algorithms that behave in their best interest before implementation.
Originality/value: The findings will enable researchers to focus more on the algorithms that are of benefit when using traffic datasets in air quality prediction.
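
As a rough illustration of the control-versus-experimental comparison described under Design/methodology/approach, the following minimal sketch trains the same regressor on two feature sets, one without and one with a traffic feature, and reports the relative error reduction. It is not the authors' pipeline. The data are synthetic, and the feature names (`wind`, `temp`, `traffic`), the coefficients, the split sizes, and the hand-written k-nearest-neighbours regressor are all assumptions chosen so that the example runs with the Python standard library alone. The synthetic `pm25` target stands in for the AQ dataset.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=5):
    """Average the targets of the k training rows closest to `query`."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: math.dist(train_X[i], query),
    )[:k]
    return sum(train_y[i] for i in nearest) / k


def rmse(train_X, train_y, test_X, test_y, k=5):
    """Root-mean-square error of the k-NN predictions on the test rows."""
    squared = [
        (knn_predict(train_X, train_y, x, k) - y) ** 2
        for x, y in zip(test_X, test_y)
    ]
    return math.sqrt(sum(squared) / len(squared))


def make_synthetic_rows(n, seed=0):
    """Synthetic stand-in data: PM2.5 is assumed to depend on traffic and wind."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        wind, temp, traffic = rng.random(), rng.random(), rng.random()
        # Hypothetical relationship, invented purely for this illustration.
        pm25 = 10 + 30 * traffic - 5 * wind + rng.gauss(0, 1)
        rows.append({"wind": wind, "temp": temp, "traffic": traffic, "pm25": pm25})
    return rows


def evaluate(rows, feature_names, n_train=200):
    """Train on the first n_train rows and return RMSE on the remaining rows."""
    X = [tuple(r[f] for f in feature_names) for r in rows]
    y = [r["pm25"] for r in rows]
    return rmse(X[:n_train], y[:n_train], X[n_train:], y[n_train:])


rows = make_synthetic_rows(300)

# Control: meteorological features only. Experimental: the same plus traffic.
rmse_control = evaluate(rows, ["wind", "temp"])
rmse_experimental = evaluate(rows, ["wind", "temp", "traffic"])

# Relative error reduction attributable to the added traffic feature.
error_reduction_pct = 100 * (rmse_control - rmse_experimental) / rmse_control

print(f"control RMSE:      {rmse_control:.2f}")
print(f"experimental RMSE: {rmse_experimental:.2f}")
print(f"error reduction:   {error_reduction_pct:.1f}%")
```

Repeating the same comparison for each algorithm and each pollutant would give one error-reduction figure per algorithm and pollutant pair, which is the kind of per-algorithm difference the findings refer to. The numbers printed here reflect only the invented synthetic relationship and say nothing about real traffic data.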