Hey Shrouk! Building My AI Voice-Activated Assistant!

In my quest to make my belly dancing robot recognise percussion sounds, I started with a more generic AI sound recognition exercise. And it was super exciting! “Hey Siri”? Nah Uh – Hey Shrouk! Let me take you on a journey through my experience of building a model using Edge Impulse that responds to my voice saying, “Hey Shrouk!”

Step 1: Collecting Data

To kick things off, I needed to gather some data. Edge Impulse made this super easy! I recorded myself saying “Hey Shrouk” multiple times. The interface even had a cool progress ring to show when I was collecting samples. With just a few minutes of audio data, I was all set.

Edge Impulse interface for sound recognition model creation. Screen shows "[3/10] Collecting some data" with instructions to record "Hey Shrouk" samples. A colorful circular progress bar indicates 0 seconds of recording remaining.

Step 2: Data Collection Complete

Once I had enough samples, Edge Impulse gave me the green light with a big, friendly checkmark. LEVEL UP! ✔️

Edge Impulse interface showing completion of data collection. A green checkmark in a circle is displayed above the text "Great! We have recorded enough samples." An "OK" button is centered below.

Step 3: Designing the Impulse

Next, I moved on to designing the impulse. An impulse takes raw data, applies signal processing to extract features, and then uses a learning block to classify new data. It sounds complicated, but let’s walk through it together!

Edge Impulse interface showing step 5 of 10: "Designing your impulse". It explains the concept of an 'impulse' for detecting "Hey Shrouk", mentioning MFE signal processing and neural network classifier. Options to quit or proceed to spectrograms are visible.
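If it helps to picture it, here is a tiny Python sketch of that idea: a signal processing step chained to a classifier. Everything in it (the `Impulse` class, `toy_dsp`, `toy_classifier`) is a made-up stand-in for illustration and not Edge Impulse's API; the "classifier" just maps loudness to a score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Features = List[float]


@dataclass
class Impulse:
    """Hypothetical stand-in for an impulse: a DSP step followed by a classifier."""
    dsp_block: Callable[[Sequence[float]], Features]
    learn_block: Callable[[Features], Dict[str, float]]

    def classify(self, raw_audio: Sequence[float]) -> Dict[str, float]:
        features = self.dsp_block(raw_audio)   # raw data -> features
        return self.learn_block(features)      # features -> label scores


def toy_dsp(raw_audio: Sequence[float]) -> Features:
    # Toy feature: mean absolute amplitude (a real impulse would use MFE here).
    if not raw_audio:
        return [0.0]
    return [sum(abs(x) for x in raw_audio) / len(raw_audio)]


def toy_classifier(features: Features) -> Dict[str, float]:
    # Toy rule, NOT a neural network: louder audio scores higher for the keyword.
    loudness = min(max(features[0], 0.0), 1.0)
    return {"hey_shrouk": loudness, "noise": 1.0 - loudness}


impulse = Impulse(dsp_block=toy_dsp, learn_block=toy_classifier)
print(impulse.classify([0.5, -0.5, 0.5, -0.5]))
```

The point is only the shape: swapping in a better DSP block or a trained network does not change how the two pieces connect.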

Step 4: Generating Spectrograms

To make sense of the audio data, I converted it into spectrograms. This step highlights interesting frequencies and reduces the amount of data, making it easier for the model to understand my voice.

Edge Impulse interface showing step 6 of 10: "Spectrograms". It explains the role of signal processing in impulse creation, mentioning de-noising, frequency highlighting, and data reduction. Options to quit or proceed to generate features are visible.
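For the curious, a spectrogram can be sketched in plain Python: slice the audio into short overlapping frames, then measure how much energy each frame holds at each frequency. This is a minimal, dependency-free sketch with a deliberately slow DFT; the function names, the Hann window, and the 100 Hz test tone are my assumptions, and Edge Impulse's own implementation will differ.

```python
import cmath
import math


def frame_signal(samples, frame_len, stride):
    """Slice samples into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, stride)]


def power_spectrum(frame):
    """Power in DFT bins 0..N/2 of one Hann-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def spectrogram(samples, frame_len, stride):
    """One power spectrum per frame: rows are time, columns are frequency bins."""
    return [power_spectrum(f) for f in frame_signal(samples, frame_len, stride)]


# Synthetic test tone (not real voice data): 100 Hz sampled at 1000 Hz.
sample_rate = 1000
tone = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(200)]
spec = spectrogram(tone, frame_len=50, stride=25)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames x", len(spec[0]), "bins, peak at",
      peak_bin * sample_rate / 50, "Hz")
```

Each row is one moment in time and each column is one frequency band, which is exactly what the coloured spectrogram images show.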

Step 5: Raw Data Visualization

Here’s a glimpse of the raw audio data I collected. It’s like looking at the heartbeat of my voice, representing every “Hey Shrouk” I recorded!

Edge Impulse interface showing raw audio data visualization for "Hey Shrouk.26" sample. The graph displays audio amplitude over time (0-990ms), with significant waveform activity visible between 350-700ms. A playback control is present below the graph.
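You can find that burst of activity in a waveform programmatically by measuring the loudness (RMS) of short windows. The sketch below uses a synthetic one-second clip, not my actual recording: silence with a 200 Hz tone placed between 350 ms and 700 ms to loosely mimic the plot. The 16 kHz sample rate, the 10 ms window, and the 0.1 threshold are all assumptions.

```python
import math


def rms_per_window(samples, window):
    """Root-mean-square level of each non-overlapping window of `window` samples."""
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        levels.append(math.sqrt(sum(x * x for x in chunk) / window))
    return levels


def active_span_ms(samples, sample_rate, window_ms=10, threshold=0.1):
    """Return (start_ms, end_ms) of the region louder than threshold, or None."""
    window = sample_rate * window_ms // 1000
    levels = rms_per_window(samples, window)
    active = [i for i, level in enumerate(levels) if level > threshold]
    if not active:
        return None
    return active[0] * window_ms, (active[-1] + 1) * window_ms


SAMPLE_RATE = 16000  # assumed; a common rate for speech audio
clip = [0.0] * SAMPLE_RATE  # one second of silence
for i in range(350 * SAMPLE_RATE // 1000, 700 * SAMPLE_RATE // 1000):
    clip[i] = 0.5 * math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)

span = active_span_ms(clip, SAMPLE_RATE)
print(span)  # (350, 700)
```

Real speech is messier than a clean tone, so a fixed threshold like this is only a rough first look.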

Step 6: DSP Results

Next came the Digital Signal Processing (DSP) results: the Mel energies computed from my recordings. This step helped the AI model differentiate between my voice and background noise.

This image shows a spectrogram representing the Mel Energies (DSP Output) from audio processing. The spectrogram displays frequencies over time, with colors ranging from blue (low energy) to red (high energy). Vertical patterns indicate distinct sound events, corresponding to utterances of "Hey Shrouk" captured during data collection for the voice recognition model.

Step 7: FFT Bin Weighting

Next up was the FFT Bin Weighting. This visual shows how the signal processing step weights the different frequencies in my voice before they reach the model!

This image shows the FFT (Fast Fourier Transform) Bin Weighting visualization in Edge Impulse. It displays a curved line graph with colors transitioning from red (low frequencies) to blue (high frequencies). Below is a color scale bar and a snippet of processed feature values. This graph represents how different frequency components are weighted in the audio processing, which is crucial for voice recognition model training.
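One common way to weight FFT bins is a Mel filterbank: a row of overlapping triangles, narrow at low frequencies and wide at high ones, each summing nearby bins into a single energy value. Here is a minimal sketch of that construction. The function names and the values (40 filters, FFT length 256, 16 kHz) are illustrative assumptions, and Edge Impulse's exact weighting may be computed differently.

```python
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, fft_length, sample_rate, low_hz=0.0, high_hz=None):
    """Return num_filters rows of triangular weights over FFT bins 0..fft_length/2."""
    if high_hz is None:
        high_hz = sample_rate / 2
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    # num_filters + 2 edge points, evenly spaced on the Mel scale.
    edges_hz = [mel_to_hz(low_mel + (high_mel - low_mel) * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bin_hz = [k * sample_rate / fft_length for k in range(fft_length // 2 + 1)]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising edge
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_energies(power_spectrum, bank):
    """Collapse one power spectrum into one energy value per Mel filter."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


bank = mel_filterbank(num_filters=40, fft_length=256, sample_rate=16000)
widths = [sum(1 for w in row if w > 0) for row in bank]
print("bins covered by first and last filter:", widths[0], widths[-1])
```

This is also where the data reduction comes from: 129 FFT bins per frame shrink to 40 Mel energies in this example.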

Step 8: Tuning Feature Parameters

I fine-tuned parameters like frame length, frame stride, and the number of filters. These settings help the model capture the nuances of my voice: frame length sets the size of each slice of the recording that gets analysed, frame stride sets how much time we skip forward in the recording on each pass, and the number of filters sets how many frequency bands each slice is summarised into!

This image shows the parameter settings for audio processing in Edge Impulse. It displays raw feature values labeled as "hey_shrouk" and various adjustable parameters for Mel-filterbank energy features. These include frame length, frame stride, filter number, FFT length, frequency ranges, and noise floor. The interface allows for manual adjustment or autotuning of these parameters, which are crucial for optimizing the voice recognition model's performance.
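To get a feel for how frame length and frame stride interact, here is a small sketch that counts how many frames fit in a clip. The numbers (a 1000 ms clip, 20 ms frames, 10 ms stride, 40 filters) are illustrative assumptions, not the settings from my project, and Edge Impulse may pad or count frames slightly differently.

```python
def frame_starts_ms(clip_ms, frame_length_ms, frame_stride_ms):
    """Start time of every full frame that fits inside the clip."""
    starts = []
    t = 0
    while t + frame_length_ms <= clip_ms:
        starts.append(t)
        t += frame_stride_ms
    return starts


overlapping = frame_starts_ms(clip_ms=1000, frame_length_ms=20, frame_stride_ms=10)
back_to_back = frame_starts_ms(clip_ms=1000, frame_length_ms=20, frame_stride_ms=20)

num_filters = 40  # assumed filter count
print(len(overlapping), "frames ->", len(overlapping) * num_filters, "features")
print(len(back_to_back), "frames ->", len(back_to_back) * num_filters, "features")
```

A shorter stride means more overlap and more features per clip, which gives the model finer time detail at the cost of more data to process.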

Step 9: Exploring Features

The Feature Explorer gave me a visual representation of all the data features. Seeing the clusters of “Hey Shrouk” data separated from noise was like finding order in chaos! No model is 100% accurate though, and we can see a few “Hey Shrouk” outliers that have snuck into the noise and unknown data cluster.

The Feature Explorer shows a scatter plot of audio samples. Most "hey_shrouk" samples (blue dots) are clustered separately from noise and unknown samples (orange and green dots), which are intermixed. This visual separation indicates that the extracted features distinguish the target phrase from other sounds, though some overlap exists.
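One simple way to think about those outliers: a point is suspicious if it sits closer to another cluster's centre than to its own. The sketch below does exactly that on made-up 2D points. It is not how the Feature Explorer works internally, and the coordinates are invented for illustration.

```python
import math


def centroid(points):
    """Average position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def find_outliers(points_by_label):
    """List (label, point, nearest_label) for points nearer to another centroid."""
    centroids = {label: centroid(pts) for label, pts in points_by_label.items()}
    outliers = []
    for label, pts in points_by_label.items():
        for p in pts:
            nearest = min(centroids, key=lambda other: math.dist(p, centroids[other]))
            if nearest != label:
                outliers.append((label, p, nearest))
    return outliers


# Invented coordinates: one "hey_shrouk" point has wandered into the noise cluster.
samples = {
    "hey_shrouk": [(0, 0), (1, 0), (0, 1), (9, 9)],
    "noise": [(10, 10), (9, 10), (10, 9)],
}
print(find_outliers(samples))
```

Samples flagged this way are worth listening to again, since they are often mislabeled or unusually quiet recordings.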

Step 10: Training the Model

Finally, it was time to train the neural network. Edge Impulse showed me the performance metrics, including accuracy, loss, and a confusion matrix. I was excited to see a high accuracy rate of 95.9%!

This image shows the model's performance metrics after training. The overall accuracy is 95.9% with a loss of 0.15. A confusion matrix displays the classification results for "hey_shrouk", noise, and unknown categories. The model correctly identifies "hey_shrouk" 93.3% of the time, noise 97.2%, and unknown sounds 95.2%. F1 scores for each category are also provided, all above 0.93, indicating strong performance across all classes.
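If you are wondering how accuracy and F1 scores fall out of a confusion matrix, here is a minimal sketch. The counts below are invented purely for illustration and are not my model's results; only the three label names come from my project.

```python
def per_class_metrics(matrix, labels):
    """matrix[i][j] = number of samples with true label i predicted as label j."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        actual = sum(matrix[i])                   # row total: samples truly in class i
        predicted = sum(row[i] for row in matrix)  # column total: samples predicted as i
        recall = true_pos / actual if actual else 0.0
        precision = true_pos / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report


labels = ["hey_shrouk", "noise", "unknown"]
made_up_counts = [
    [45, 3, 2],  # true hey_shrouk
    [1, 47, 2],  # true noise
    [2, 3, 45],  # true unknown
]
accuracy, report = per_class_metrics(made_up_counts, labels)
print(f"accuracy: {accuracy:.3f}")
for label, scores in report.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

Recall is the "correctly identifies X% of the time" figure for each class, while F1 balances it against how often a predicted label turns out to be right.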

Creating a voice-activated assistant with Edge Impulse was an amazing experience! The platform’s user-friendly interface and powerful tools made it easy and fun to bring my project to life. If you’re into AI, machine learning, or just love tinkering with tech, I highly recommend giving Edge Impulse a try. Who knows what awesome projects you’ll come up with next? ✨
