Random Forest Feature Selection for Particulate Matter (PM10) Pollution Concentration

Balogun, Habeeb; Alaka, Hafiz; Egwim, Christian; Ajayi, Saheed O.

Abstract

Many articles already describe strategies to limit human exposure to particulate matter (PM10) pollution because of its disastrous impact on the environment and on people's well-being in the United Kingdom (UK) and around the globe. These strategies include imposing sanctions on places with higher levels of exposure, discouraging the use of non-environmentally friendly vehicles, promoting cycling for transportation, and encouraging the use of eco-friendly fuels in industry. All of these are viable options, but they take a long time to implement. An efficient machine learning model for predicting PM10 is therefore needed, built on the most impactful features/data. Such a predictive model would offer more strategic ways of avoiding this lethal air pollutant, complementing current efforts. However, the diversity of the existing data is a challenge. This paper addresses it by (1) bringing together numerous data sources on an Amazon Web Services (AWS) big data platform and (2) investigating which features contribute most to building a high-performance PM10 predictive model. The data sources used in this research include traffic information, pollution concentration information, geographical/built environment information, and meteorological information. Random forest was applied to select the most impactful features because it performed better than the decision tree and XGBoost feature selection methods. One finding of this work is that the height of buildings in a geographical area plays a role in the dispersion of PM10.
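The feature-ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it is a from-scratch, standard-library-only approximation of random forest importance (bootstrap samples, random feature subsets, importance measured as total squared-error reduction), and in practice a library implementation would normally be used. The feature names, the synthetic data, and every function name below are hypothetical and chosen only to echo the data categories listed above.

```python
import random


def best_split(X, y, rows, features):
    """Return (gain, feature, threshold) of the best squared-error split, or None."""
    n = len(rows)
    total = sum(y[i] for i in rows)
    total_sq = sum(y[i] ** 2 for i in rows)
    parent = total_sq - total ** 2 / n  # squared error around the node mean
    best = None
    for f in features:
        ordered = sorted(rows, key=lambda i: X[i][f])
        left_sum = left_sq = 0.0
        for k in range(1, n):
            i = ordered[k - 1]
            left_sum += y[i]
            left_sq += y[i] ** 2
            lo, hi = X[i][f], X[ordered[k]][f]
            if lo == hi:  # cannot place a threshold between equal values
                continue
            left = left_sq - left_sum ** 2 / k
            right = (total_sq - left_sq) - (total - left_sum) ** 2 / (n - k)
            gain = parent - left - right
            if best is None or gain > best[0]:
                best = (gain, f, (lo + hi) / 2)
    return best


def grow(X, y, rows, depth, n_try, rng, importance):
    """Grow one tree recursively, crediting each split's gain to its feature."""
    if depth == 0 or len(rows) < 4:
        return
    features = rng.sample(range(len(X[0])), n_try)  # random feature subset
    split = best_split(X, y, rows, features)
    if split is None or split[0] <= 0:
        return
    gain, f, threshold = split
    importance[f] += gain
    left = [i for i in rows if X[i][f] <= threshold]
    right = [i for i in rows if X[i][f] > threshold]
    grow(X, y, left, depth - 1, n_try, rng, importance)
    grow(X, y, right, depth - 1, n_try, rng, importance)


def forest_importance(X, y, names, n_trees=30, depth=3, seed=0):
    """Rank features by normalised squared-error reduction across the forest."""
    rng = random.Random(seed)
    importance = [0.0] * len(names)
    n_try = max(1, len(names) // 3)  # common heuristic for regression forests
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        grow(X, y, rows, depth, n_try, rng, importance)
    total = sum(importance) or 1.0
    return sorted(((imp / total, name) for name, imp in zip(names, importance)),
                  reverse=True)


def make_synthetic_data(n=300, seed=1):
    """Synthetic stand-in data; the coefficients are invented, not measured."""
    rng = random.Random(seed)
    names = ["traffic_volume", "wind_speed", "building_height", "random_noise"]
    X = [[rng.random() for _ in names] for _ in range(n)]
    y = [3.0 * r[0] - 2.0 * r[1] + 1.5 * r[2] + rng.gauss(0, 0.1) for r in X]
    return X, y, names


ranking = forest_importance(*make_synthetic_data())
for score, name in ranking:
    print(f"{name}: {score:.3f}")
```

Features whose score falls below a chosen cutoff could then be dropped before training the final predictive model. The cutoff itself is a modelling decision that this sketch does not address.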