Monitoring the health of a city by its soundscape

Traffic, sirens, horns, building work: this is the daily soundtrack of our cities. We have got used to it, but such noise may point to severe pollution


Have you ever noticed how many sounds make up an urban environment? They can tell us much about what is happening in a city. Most are due to traffic. Monitoring and classifying such sounds allows the level of noise pollution, a growing concern for large cities, to be assessed. By monitoring the soundscape, authorities can improve the lives of town dwellers. In recent years, urban sound classification has made remarkable progress and remains an active research area in audio pattern recognition.

A recent paper published by MAtchUP partner Universitat Politecnica de Valencia investigates this topic and presents pioneering WASN technology, short for Wireless Acoustic Sensor Network. The system recognises and classifies a given set of sounds. The authors tested the technology on major avenues in Valencia, one of the smart cities participating in the MAtchUP project.


This work presents a wireless acoustic sensor network (WASN) that monitors urban environments by recognizing a given set of sound events or classes. The nodes of the WASN are Raspberry Pi devices that not only record the ambient sound but also detect and recognize different sound events. All the signal processing tasks, from recording to classification by a convolutional neural network (CNN), run on the Raspberry Pi devices themselves. Thanks to the low cost of the proposed acoustic nodes, the system is highly scalable. The underlying WASN has been designed according to the open FIWARE standard, so the whole system can be deployed without the need for proprietary software.

Regarding classification performance, the proposed WASN achieves accuracy similar to that of WASNs that rely on cloud computing, while generating far less network traffic: the nodes exchange no audio signals, only contextual information in the form of labels. Most of the time, however, the class reported by the nodes is the "background" soundscape, which usually contains no event of interest. This is the case when monitoring the soundscape of big avenues, where four events have been identified: "traffic", "siren", "horn" and "noisy vehicles", with the "traffic" class corresponding to the background soundscape.

The paper therefore proposes a simple pre-detection stage before the CNN classification, with the aim of saving computation and power consumption at the nodes. The pre-detection stage distinguishes the three other relevant sounds from "traffic" and activates the classifier only when one of these events is likely occurring. The proposed stage has been validated on data recorded in the city of Valencia (Spain), reducing the Raspberry Pi CPU's usage by a factor of six.
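The idea of gating an expensive CNN behind a cheap pre-detector can be illustrated with a minimal sketch. This is not the authors' implementation: here a simple RMS-level test against the running "traffic" background stands in for the paper's pre-detection stage, and `cnn_classify`, `background_rms` and the threshold are illustrative names and values chosen for the example.

```python
import numpy as np

TRAFFIC = "traffic"  # background class for big avenues, per the paper
EVENT_CLASSES = ["siren", "horn", "noisy vehicles"]

def predetect(frame, background_rms, ratio_threshold=2.0):
    """Cheap gate: flag a frame as a candidate event when its RMS level
    clearly exceeds the running background (traffic) level.
    `ratio_threshold` is an assumed tuning parameter, not from the paper."""
    rms = np.sqrt(np.mean(frame ** 2))
    return rms > ratio_threshold * background_rms

def classify_frame(frame, background_rms, cnn_classify):
    """Run the (expensive) CNN only when the pre-detector fires;
    otherwise report the background 'traffic' class directly,
    saving CPU and power on the node."""
    if predetect(frame, background_rms):
        return cnn_classify(frame)
    return TRAFFIC

# Illustrative use with a stand-in classifier:
background_rms = 0.1
quiet_frame = np.full(16000, 0.05)  # near background level -> no CNN call
loud_frame = np.full(16000, 0.5)    # well above background -> CNN runs
fake_cnn = lambda f: "siren"
print(classify_frame(quiet_frame, background_rms, fake_cnn))  # traffic
print(classify_frame(loud_frame, background_rms, fake_cnn))   # siren
```

The design point is that most frames on a busy avenue are background "traffic", so the cheap gate handles the common case and the CNN runs only on the rare candidate events; the node then transmits only the resulting label, not the audio.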