I’m excited to share my latest Edge Impulse project, and it was a challenging one. This time, I set out to train a machine learning model to recognise and respond to two core sounds in Egyptian percussion: “Doom” and “Tak.” Spoiler alert: it was a lot tougher than my “Hey Shrouk” project!
Step 1: Data Collection
I started by collecting data for the “Doom” and “Tak” sounds, recording multiple samples of each.
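Suppose the samples come from one long take of repeated strikes rather than separate recordings. In that case they first need to be cut into one clip per hit. Below is a minimal sketch of one way to do that in plain Python. It is not how Edge Impulse itself segments audio. `slice_hits`, the 10 ms frame size, the 0.5 s clip length and the 0.1 energy threshold are all hypothetical choices that would need tuning to real recordings.

```python
import math

def frame_rms(signal, frame_len):
    """RMS energy of consecutive, non-overlapping frames of frame_len samples."""
    return [
        math.sqrt(sum(x * x for x in signal[i:i + frame_len]) / frame_len)
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]

def slice_hits(signal, sample_rate, clip_s=0.5, frame_ms=10, threshold=0.1):
    """Cut one fixed-length clip per strike, using an RMS energy threshold
    as a crude onset detector. Default values are illustrative guesses."""
    frame_len = round(sample_rate * frame_ms / 1000)
    clip_len = round(sample_rate * clip_s)
    clips, next_allowed = [], 0
    for i, energy in enumerate(frame_rms(signal, frame_len)):
        start = i * frame_len
        # Only open a new clip once we are past the end of the previous one.
        if energy >= threshold and start >= next_allowed:
            clip = signal[start:start + clip_len]
            clips.append(clip + [0.0] * (clip_len - len(clip)))  # zero-pad short tails
            next_allowed = start + clip_len
    return clips

# Hypothetical take: 3 s of silence at 1 kHz with two short bursts.
rate = 1000
take = [0.0] * 3000
for onset in (500, 2000):
    take[onset:onset + 100] = [0.5] * 100
hits = slice_hits(take, rate)
print(len(hits), len(hits[0]))  # 2 500
```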
Step 2: Dataset Overview
With 41 seconds of audio collected, I split the data into training and test sets. The test set is held back so the model can be validated later on samples it never saw during training. Every sample was labelled as either “Doom” or “Tak” so the model had clear targets to learn from.
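Splitting each label separately keeps both “Doom” and “Tak” represented in the test set. Here is a minimal sketch of such a stratified split. It assumes each sample is a (file name, label) pair and uses an 80/20 train/test ratio, which is a common choice but an assumption here. The function name and file names are illustrative.

```python
import random

def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (name, label) pairs into train/test lists, shuffling and splitting
    each label separately so every class appears in both sets."""
    rng = random.Random(seed)
    by_label = {}
    for name, label in samples:
        by_label.setdefault(label, []).append((name, label))
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_fraction))  # at least one test sample per class
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test

# Hypothetical file names: 10 recordings of each stroke.
samples = ([(f"doom_{i}.wav", "doom") for i in range(10)]
           + [(f"tak_{i}.wav", "tak") for i in range(10)])
train, test = stratified_split(samples)
print(len(train), len(test))  # 16 4
```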
Step 3: Impulse Design
Designing the impulse meant setting up three parts. The time series input cuts the audio into windows. A processing block turns each window into features. A learning block classifies those features. This setup is the backbone of how the model interprets and classifies the sounds!
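To make the three stages concrete, here is a toy sketch of an impulse running over a longer recording. Windows are cut with a fixed size and stride, each window goes through a feature extractor, and the features go to a classifier. The 1000 ms window and 500 ms stride are assumed values. `peak` and `label` are hypothetical stand-ins for the real processing and learning blocks.

```python
def sliding_windows(signal, sample_rate, window_ms, stride_ms):
    """Cut the signal into fixed-size windows that advance by a fixed stride."""
    win = round(sample_rate * window_ms / 1000)
    step = round(sample_rate * stride_ms / 1000)
    if win <= 0 or step <= 0:
        raise ValueError("window and stride must each be at least one sample")
    return [(start, signal[start:start + win])
            for start in range(0, len(signal) - win + 1, step)]

def run_impulse(signal, sample_rate, extract, classify, window_ms=1000, stride_ms=500):
    """Input windowing -> processing block (extract) -> learning block (classify).
    Returns (start_time_in_seconds, prediction) pairs."""
    return [(start / sample_rate, classify(extract(window)))
            for start, window in sliding_windows(signal, sample_rate, window_ms, stride_ms)]

def peak(window):
    """Toy 'processing block': the loudest absolute sample in the window."""
    return max(abs(x) for x in window)

def label(feature):
    """Toy 'learning block': a fixed threshold instead of a trained model."""
    return "hit" if feature > 0.5 else "silence"

rate = 16000
audio = [0.0] * (2 * rate)   # 2 s of silence...
audio[rate + 100] = 0.9      # ...with one loud sample just after the 1 s mark
print(run_impulse(audio, rate, peak, label))
# [(0.0, 'silence'), (0.5, 'hit'), (1.0, 'hit')]
```

Because the windows overlap, one strike can show up in two consecutive windows. This is one source of the timing headaches mentioned later.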
Step 4: Feature Generation
Here, I used the Audio (MFE) block to extract Mel-filterbank energy features from the audio signals. These features describe how the sound’s energy is spread across frequency bands over time. That is exactly what separates the low, resonant “Doom” from the sharper, higher-pitched “Tak.” This step felt like giving the model its auditory senses!
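For readers curious about what the MFE block computes, here is a simplified, dependency-free sketch of log Mel-filterbank energies. It frames the signal and applies a Hann window. It then takes the power spectrum, weights it with triangular filters spaced evenly on the mel scale, and takes the log. The parameter names echo typical MFE settings (frame length, frame stride, number of filters, FFT length), but the values and details are assumptions. Edge Impulse’s own implementation, including any normalisation it applies, may differ. The sketch uses a plain DFT for clarity, so it is slow on long clips.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

def mel_filterbank(num_filters, fft_len, sample_rate, low_hz=0.0, high_hz=None):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    high_hz = sample_rate / 2 if high_hz is None else high_hz
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    mels = [lo + i * (hi - lo) / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((fft_len + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    filters = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (fft_len // 2 + 1)
        for k in range(left, centre):
            row[k] = (k - left) / (centre - left)    # rising edge
        for k in range(centre, right):
            row[k] = (right - k) / (right - centre)  # falling edge
        filters.append(row)
    return filters

def power_spectrum(frame, fft_len):
    """|DFT|^2 / N of one frame, truncated or zero-padded to fft_len samples."""
    x = frame[:fft_len] + [0.0] * (fft_len - len(frame[:fft_len]))
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * n / fft_len)
                    for n, v in enumerate(x))) ** 2 / fft_len
            for k in range(fft_len // 2 + 1)]

def mfe(signal, sample_rate, frame_s=0.02, stride_s=0.01, num_filters=40, fft_len=512):
    """Log mel-filterbank energies: one row per frame, one column per filter."""
    frame_len = round(frame_s * sample_rate)
    step = round(stride_s * sample_rate)
    fbank = mel_filterbank(num_filters, fft_len, sample_rate)
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, step):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], hann)]
        power = power_spectrum(frame, fft_len)
        # Small constant avoids log(0) for filters that receive no energy.
        rows.append([math.log(sum(f * p for f, p in zip(filt, power)) + 1e-10)
                     for filt in fbank])
    return rows

# A low 500 Hz tone vs a high 2 kHz tone, 0.1 s each at 8 kHz.
rate = 8000
def tone(freq, seconds=0.1):
    return [math.sin(2 * math.pi * freq * n / rate) for n in range(round(rate * seconds))]

feats = {freq: mfe(tone(freq), rate, num_filters=20, fft_len=256) for freq in (500, 2000)}
for freq, rows in feats.items():
    strongest = rows[0].index(max(rows[0]))
    print(f"{freq} Hz: {len(rows)} frames x {len(rows[0])} filters, strongest filter {strongest}")
```

The lower tone lights up a lower-numbered filter than the higher one. That is the kind of difference the model can use to tell a “Doom” from a “Tak.”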
Step 5: Training the Model
Next, I trained the model on the collected data. This part was challenging because I had to tweak parameters such as the sample rate and noise handling. Despite my efforts, the model sometimes detected a “Tak” where there was none, and vice versa, though it handled “Doom” better.
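Two common tools for noise and sample-rate problems are mixing noise into training clips at a controlled signal-to-noise ratio (SNR) and resampling recordings to one consistent rate. This sketch assumes Gaussian white noise and naive linear-interpolation resampling. Because no anti-aliasing filter is applied, it is only a rough illustration when downsampling. Real background noise recorded in the room would likely match live conditions better than white noise. The function names are hypothetical.

```python
import math
import random

def add_noise(signal, snr_db, seed=None):
    """Mix Gaussian white noise into signal at the requested SNR (in dB)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

def resample_linear(signal, src_rate, dst_rate):
    """Resample by linear interpolation between neighbouring samples.
    No anti-aliasing filter is applied, so this is only a rough illustration."""
    n_out = int(len(signal) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(signal) - 1)
        frac = pos - left
        out.append(signal[left] * (1 - frac) + signal[right] * frac)
    return out

rate = 16000
clean = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]  # 1 s, 1 kHz tone
noisy = add_noise(clean, snr_db=10, seed=0)
noise = [a - b for a, b in zip(noisy, clean)]
measured = 10 * math.log10(sum(x * x for x in clean) / sum(x * x for x in noise))
print(f"measured SNR: {measured:.1f} dB")
down = resample_linear(clean, rate, 8000)
print(len(down))  # 8000
```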
Step 6: Feature Explorer
Visualising the features in the feature explorer showed me how well the two sounds could be told apart. The scatter plot shows separate clusters for “Doom” and “Tak,” although they overlap somewhat, which I put down to noise.
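One way to put a number on the overlap in the scatter plot is to project the features to 2D and then count how many points sit closer to the other class’s centroid. Edge Impulse applies its own dimensionality reduction in the explorer. This sketch swaps in plain PCA, computed by power iteration. The helper functions are hypothetical, and the synthetic clusters are stand-in data rather than real MFE features.

```python
import math
import random

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def top_eigenvector(matrix, iters=200, seed=0):
    """Power iteration: repeatedly multiply a vector by the matrix and renormalise."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v

def project_2d(rows):
    """Project feature vectors onto their first two principal components."""
    mu = mean(rows)
    centred = [[x - m for x, m in zip(r, mu)] for r in rows]
    d, n = len(mu), len(rows)
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    pc1 = top_eigenvector(cov)
    lam = sum(pc1[i] * cov[i][j] * pc1[j] for i in range(d) for j in range(d))
    # Remove the first component's contribution, then find the next one.
    deflated = [[cov[i][j] - lam * pc1[i] * pc1[j] for j in range(d)] for i in range(d)]
    pc2 = top_eigenvector(deflated, seed=1)
    return [(sum(a * b for a, b in zip(r, pc1)), sum(a * b for a, b in zip(r, pc2)))
            for r in centred]

def centroid_overlap(points, labels):
    """Fraction of points lying closer to another class's centroid than to their own."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    centroids = {lab: mean(ps) for lab, ps in groups.items()}
    wrong = sum(1 for p, lab in zip(points, labels)
                if min(centroids, key=lambda c: math.dist(p, centroids[c])) != lab)
    return wrong / len(points)

# Synthetic, well-separated 3D "features" for each stroke.
rng = random.Random(42)
doom = [[rng.gauss(0, 0.3) for _ in range(3)] for _ in range(20)]
tak = [[5 + rng.gauss(0, 0.3) for _ in range(3)] for _ in range(20)]
points = project_2d(doom + tak)
labels = ["doom"] * 20 + ["tak"] * 20
print(centroid_overlap(points, labels))  # 0.0
```

On noisy real features this score would rise above zero, giving a simple way to check whether a change to the noise handling actually tightens the clusters.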
Step 7: Neural Network Architecture
I configured the neural network with several convolutional layers to process the audio features. The aim of this architecture was to help the model pick up patterns across neighbouring frames of the features, rather than looking at each frame in isolation.
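The following sketch shows the forward pass of a small 1D convolutional network over MFE features arranged as time frames × filters. It uses two convolution + ReLU + max-pooling stages, then a dense softmax layer over the two classes. The 8 and 16 filters, kernel size 3 and random, untrained weights are assumptions for illustration. They are not the exact architecture used in this project. Training would normally be done with a framework rather than by hand.

```python
import math
import random

def conv1d(x, weights, bias):
    """'Same'-padded 1D convolution over time, followed by ReLU.
    x: [time][in_ch], weights: [out_ch][kernel][in_ch], bias: [out_ch]."""
    kernel = len(weights[0])
    pad = kernel // 2
    t_len, in_ch = len(x), len(x[0])
    out = []
    for t in range(t_len):
        row = []
        for o, w in enumerate(weights):
            acc = bias[o]
            for k in range(kernel):
                src = t + k - pad
                if 0 <= src < t_len:  # zero padding outside the signal
                    acc += sum(w[k][c] * x[src][c] for c in range(in_ch))
            row.append(max(0.0, acc))  # ReLU
        out.append(row)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling over time; an incomplete last group is dropped."""
    return [[max(x[t + i][c] for i in range(size)) for c in range(len(x[0]))]
            for t in range(0, len(x) - size + 1, size)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def random_conv(out_ch, kernel, in_ch, rng):
    weights = [[[rng.uniform(-0.1, 0.1) for _ in range(in_ch)]
                for _ in range(kernel)] for _ in range(out_ch)]
    return weights, [0.0] * out_ch

def build_params(frames, mel_bins, classes=2, seed=0):
    """Random, untrained weights: conv(8) -> pool -> conv(16) -> pool -> dense."""
    rng = random.Random(seed)
    flat_len = (frames // 2 // 2) * 16  # time halved by each pool, 16 channels
    dense_w = [[rng.uniform(-0.1, 0.1) for _ in range(flat_len)] for _ in range(classes)]
    return {"conv1": random_conv(8, 3, mel_bins, rng),
            "conv2": random_conv(16, 3, 8, rng),
            "dense": (dense_w, [0.0] * classes)}

def forward(features, params):
    """Class probabilities for one [time][mel_bins] feature matrix."""
    h = max_pool(conv1d(features, *params["conv1"]))
    h = max_pool(conv1d(h, *params["conv2"]))
    flat = [v for row in h for v in row]
    w, b = params["dense"]
    logits = [sum(wi * xi for wi, xi in zip(row, flat)) + bi for row, bi in zip(w, b)]
    return softmax(logits)

frames, mel_bins = 20, 10
rng = random.Random(1)
features = [[rng.random() for _ in range(mel_bins)] for _ in range(frames)]
probs = forward(features, build_params(frames, mel_bins))
print(len(probs), round(sum(probs), 6))  # 2 1.0
```

Each convolution looks at three neighbouring frames at once. That is how the network can respond to how a sound evolves over time, such as a sharp attack followed by a quick decay.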
Step 8: Model Performance
The final model reached 100% accuracy on the validation set, and the confusion matrix shows a perfect score on the test data as well. Achieving this in real-world conditions proved tricky because of noise and timing issues, so the model still needs refinement for practical use.
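A confusion matrix and accuracy are simple to compute from predictions. A confidence interval then shows how much a perfect score on a small test set can really tell you. This sketch assumes string labels and uses the Wilson score interval. The 10-out-of-10 example is hypothetical, not the project’s actual test count.

```python
import math

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m

def accuracy(m):
    """Correct predictions (the diagonal) over all predictions."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total

def per_class_recall(m, labels):
    """Share of each true class that was predicted correctly."""
    return {lab: (m[i][i] / sum(m[i]) if sum(m[i]) else float("nan"))
            for i, lab in enumerate(labels)}

def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for an observed accuracy."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

labels = ["doom", "tak"]
m = confusion_matrix(["doom", "doom", "tak", "tak"], ["doom", "tak", "tak", "tak"], labels)
print(m, accuracy(m), per_class_recall(m, labels))
# [[1, 1], [0, 2]] 0.75 {'doom': 0.5, 'tak': 1.0}
lo, hi = wilson_interval(10, 10)  # hypothetical: 10 of 10 test clips correct
print(round(lo, 2), round(hi, 2))  # 0.72 1.0
```

Even a perfect 10/10 is consistent with a true accuracy as low as about 72%. This is one reason a perfect test score can still fall short in live use.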
Working on the “Doom” and “Tak” project with Edge Impulse has been an enlightening experience. It pushed my boundaries and taught me how important it is to fine-tune parameters to handle real-world complexities. I am still struggling with sample rates and timing, both of which matter for representing a beat accurately. The journey has been a little tough, especially dealing with noise, but the process is rewarding and I’m super excited to figure out how to solve these issues!
Machine learning is as much an art as it is a science, and I will keep experimenting!