I'm excited to share my challenging experience working on my latest project using Edge Impulse. This time, I was on a mission to train a machine learning model to recognise and respond to two core sounds in Egyptian percussion: "Doom" and "Tak." Spoiler alert: it was a lot tougher than my "Hey Shrouk" project!

Step 1: Data Collection I started by collecting data for the "Doom" and "Tak" sounds, recording multiple samples of each directly in Edge Impulse Studio.

Edge Impulse data collection interface with options to set label, length, and category. A blue 'Start recording' button is centered, with 'Audio captured with current settings: 0s' displayed below.
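For anyone curious what this step looks like outside the Studio UI, here's a minimal sketch of how clips like mine could be captured and labelled locally with Python's sounddevice and soundfile packages. The 16 kHz rate, 1-second length, and file naming are assumptions for illustration, not my exact project settings.

```python
# A minimal sketch of capturing labelled "doom"/"tak" clips locally.
# Sample rate, duration, and naming scheme are assumptions.
import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 16000   # Hz; a common rate for Edge Impulse audio projects
DURATION = 1.0        # seconds per clip

def record_sample(label: str, index: int) -> str:
    """Record one clip from the default microphone and save it as WAV."""
    frames = int(SAMPLE_RATE * DURATION)
    audio = sd.rec(frames, samplerate=SAMPLE_RATE, channels=1)
    sd.wait()  # block until the recording is finished
    path = f"{label}.{index}.wav"
    sf.write(path, audio, SAMPLE_RATE)
    return path

for i in range(10):
    input(f"Press Enter and play a 'doom' ({i + 1}/10)...")
    print("Saved", record_sample("doom", i))
```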

Step 2: Dataset Overview With 41 seconds of audio collected, I split it into training and test sets (roughly 73%/27%) so I could validate the model later on data it had never seen. Each sound ("Doom" and "Tak") had its own dedicated, labelled samples for effective training.

Edge Impulse dataset overview showing 41s of collected data, 73% train/27% test split, and a list of 'tak' audio samples with timestamps.
Edge Impulse dataset overview showing 41s of collected data, 73% train/27% test split, and a list of 'doom' audio samples with timestamps.
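Edge Impulse handles this split for you, but as a rough equivalent, here's how the same ~73%/27% split could be done by hand with scikit-learn. The file names and counts below are hypothetical stand-ins for my 41 one-second clips.

```python
# A rough sketch of a stratified ~73/27 train/test split with scikit-learn.
# File names and class counts are hypothetical.
from sklearn.model_selection import train_test_split

files = [f"doom.{i}.wav" for i in range(20)] + [f"tak.{i}.wav" for i in range(21)]
labels = ["doom"] * 20 + ["tak"] * 21

train_files, test_files, train_labels, test_labels = train_test_split(
    files, labels, test_size=0.27, stratify=labels, random_state=42
)
print(len(train_files), "training clips /", len(test_files), "test clips")
```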

Step 3: Impulse Design Designing the impulse involved setting up time series data, adding a processing block, and configuring the learning block. This setup is the backbone of how the model will interpret and classify the sounds! šŸ§©

Edge Impulse interface for configuring impulse design, showing time series data settings for audio input with options to add processing and learning blocks.
Edge Impulse feature extraction options, highlighting Audio (MFCC) and Audio (MFE) methods for processing voice and non-voice audio signals respectively.
Edge Impulse interface showing configured impulse design for audio classification, with time series data, Audio (MFE) processing, and Classification blocks set up to distinguish between 'doom' and 'tak' sounds.
Edge Impulse interface showing feature generation for a machine learning model. Training set contains 30s of data with 2 classes (doom, tak) and 30 training windows. A 'Generate features' button is prominent.
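Under the hood, the time series block slices each clip into fixed-length windows before any feature extraction happens. Here's a tiny sketch of that windowing; the 1-second window and stride are assumptions inferred from the 30 training windows produced from 30 seconds of data above.

```python
# Sketch of the windowing a time-series block performs: slice each clip
# into fixed-length windows before feature extraction. Window and stride
# sizes are assumptions.
import numpy as np

def sliding_windows(samples: np.ndarray, window: int, stride: int):
    """Yield fixed-length windows from a 1-D audio signal."""
    for start in range(0, len(samples) - window + 1, stride):
        yield samples[start:start + window]

sr = 16000
clip = np.zeros(sr * 2)  # a dummy 2-second clip
windows = list(sliding_windows(clip, window=sr, stride=sr))  # 1 s windows
print(len(windows), "windows")  # -> 2
```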

Step 4: Feature Generation Here, I used the Audio (MFE) block to extract Mel-filterbank energy features from the audio signals. These features are essential for distinguishing between “Doom” and “Tak.” This step felt like giving the model its auditory senses šŸŽ§

Console output showing successful feature generation process in Edge Impulse. Steps include dimension reduction, embedding construction, and output file writing, completed in about 9 seconds.
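If you wanted to reproduce MFE-style features outside Edge Impulse, librosa's Mel spectrogram gets you most of the way there. This is a hedged sketch rather than Edge Impulse's exact implementation; the frame and filterbank parameters are assumptions, and the input file is a hypothetical clip from Step 1.

```python
# A sketch of Mel-filterbank energy (MFE) extraction with librosa.
# Frame and filter parameters are assumptions, not Edge Impulse defaults.
import librosa

y, sr = librosa.load("doom.0.wav", sr=16000)  # hypothetical clip
mel = librosa.feature.melspectrogram(
    y=y, sr=sr,
    n_fft=512,        # 32 ms analysis frames at 16 kHz
    hop_length=256,   # 50% overlap between frames
    n_mels=40,        # number of Mel filterbank bands
)
log_mel = librosa.power_to_db(mel)  # log-scale the energies
print(log_mel.shape)  # (n_mels, n_frames) matrix fed to the classifier
```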

Step 5: Training the Model Next, I trained the model with the collected data. This part was challenging, as I had to tweak parameters like the sample rate and how noise is handled. Despite my efforts, the model sometimes detected a "Tak" when there was none and vice versa, though it handled "Doom" better. šŸ‹ļøā€ā™‚ļø
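One of those knobs is the sample rate itself. As an illustration, resampling a clip with librosa looks like this (the file name and target rate are assumptions):

```python
# Resampling a clip to a different rate with librosa; the target rate
# here is an assumption for illustration.
import librosa

y, sr = librosa.load("tak.0.wav", sr=None)  # keep the original rate
y_16k = librosa.resample(y, orig_sr=sr, target_sr=16000)
print(f"resampled {len(y)} samples at {sr} Hz to {len(y_16k)} at 16000 Hz")
```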

Step 6: Feature Explorer Visualising the features in the explorer gave me insights into how well the model was learning to differentiate the sounds. The scatter plot shows the clusters of “Doom” and “Tak,” albeit with some overlap due to noise issues. šŸ”

Screenshot of the Edge Impulse feature explorer showing a scatter plot with clusters of 'Doom' and 'Tak' sound features, with 'Doom' represented by blue dots and 'Tak' by orange dots. Processing time and peak RAM usage statistics are displayed below.
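The explorer's 2-D view comes from reducing the high-dimensional MFE features down to two axes (the console output in Step 4 mentions an embedding step). As a rough stand-in, here's a PCA projection with scikit-learn and matplotlib using placeholder features; it's an approximation, not the exact embedding Edge Impulse computes.

```python
# An approximate recreation of the feature explorer: project flattened
# MFE features to 2-D and scatter-plot them by class. Features here are
# random placeholders, not real data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(30, 40 * 60))  # 30 flattened MFE matrices
labels = np.array(["doom"] * 15 + ["tak"] * 15)

coords = PCA(n_components=2).fit_transform(features)
for name, colour in [("doom", "tab:blue"), ("tak", "tab:orange")]:
    mask = labels == name
    plt.scatter(coords[mask, 0], coords[mask, 1], label=name, color=colour)
plt.legend()
plt.show()
```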

Step 7: Neural Network Architecture I configured the neural network with several convolutional layers to process the audio features. This architecture aimed to enhance the model’s ability to recognise intricate patterns in the sounds.

Screenshot of the Edge Impulse neural network architecture configuration page. The setup includes layers such as reshape, 1D convolution, dropout, flatten, and an output layer for classifying 'Doom' and 'Tak' sounds. A 'Start training' button is visible at the bottom.
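For reference, here's roughly what that layer stack looks like written out in Keras (reshape, 1D convolutions, dropout, flatten, output). The filter counts, kernel sizes, and feature shape are assumptions based on the screenshot, not an export of my actual project.

```python
# A sketch of the layer stack shown above, approximated in Keras.
# Filter counts, kernel sizes, and input shape are assumptions.
import tensorflow as tf

N_MELS, N_FRAMES, N_CLASSES = 40, 60, 2  # feature shape; "doom" and "tak"

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(N_FRAMES * N_MELS,)),
    tf.keras.layers.Reshape((N_FRAMES, N_MELS)),
    tf.keras.layers.Conv1D(8, kernel_size=3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Conv1D(16, kernel_size=3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```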

Step 8: Model Performance The final model showed an impressive 100% accuracy on the validation set, though with a dataset this small that perfect score says more about the limited data than about real robustness. Sure enough, matching it in real-world conditions proved tricky due to noise and timing issues. The confusion matrix shows a perfect score on the test data, but the model still needs refinement for practical use šŸ™

Model training results showing 100% accuracy on the limited training and test dataset.
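To sanity-check a matrix like that offline, scikit-learn's confusion_matrix does the job; the predictions below are placeholders standing in for real model output.

```python
# Recreating the confusion-matrix check offline with scikit-learn.
# These labels and predictions are placeholders, not my real results.
from sklearn.metrics import confusion_matrix, classification_report

y_true = ["doom", "doom", "tak", "tak", "tak", "doom"]
y_pred = ["doom", "doom", "tak", "tak", "tak", "doom"]  # a perfect run

print(confusion_matrix(y_true, y_pred, labels=["doom", "tak"]))
print(classification_report(y_true, y_pred))
```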

Working on the "Doom" and "Tak" project with Edge Impulse has been an enlightening experience. It pushed my boundaries and taught me the importance of fine-tuning parameters to handle real-world complexities. I am still struggling with sample rates and timings to represent a beat more accurately. While the journey is a little tough, especially dealing with noise and sample rate issues, the process is still rewarding, and I'm super excited to figure out how to solve those issues!

Machine learning is as much an art as it is a science, so I will keep experimenting! šŸ’ŖšŸ½šŸ„³
