Introduction to Shrouk’s Studio!

The image shows Shrouk working intently at their colourful and creative electronics lab. Shrouk has an afro hairstyle and is wearing a brightly coloured dress which is mainly deep oranges and reds with intricate patterns. She is working on a printed circuit board with an oscilloscope, seemingly deeply focused on the detailed electronics work or experiment they are engaged in.

Shrouk's workspace has an artistic, energetic vibe, filled with various bright decorations and objects in oranges, pinks, blues and greens that match Shrouk's expressive and unique style. The photograph captures a moment of concentration as Shrouk applies their skills to an electronics project, with the vibrant atmosphere of the lab reflecting Shrouk's creative personality and approach. The setting suggests Shrouk is bringing an imaginative flair to their electronics work.

Welcome to the Shrouk El-Attar Digital Studio! I am a neurodivergent British-Egyptian artist, engineer, and creative technologist, aiming to create a digital space where cultural heritage meets cutting-edge technology ✨

My studio specialises in creating interactive AI-driven art installations and non-binary belly dancing robots. I am super committed to decolonising technology and making engineering accessible to underrepresented communities, including refugees and disabled individuals.

Thank you for taking the time to stop by and feel free to leave a comment! Let’s inspire and empower others to explore the fusion of creativity and technology, breaking down barriers and redefining the future of art and tech!


Playing Back Sounds at SPECIFIC Timestamps with JavaScript!

Hello! So this has been super cool and fun – I hope you find it fun too! I’ve been working with JavaScript to make sure we play sounds back at exactly the right times!

Imagine you have two different sound patterns:

  • Doom (1-second pause) Doom
  • Doom (half a second pause) Doom

To make this work, we need to play these sounds back at just the right moments. We can use a bit of JavaScript magic to play sounds at specific timestamps. Check out this script:

Code snippet in a dark-themed editor showing HTML and JavaScript for a web page titled 'Play Sounds at Timestamps'. The HTML structure includes a button to play sounds, and the JavaScript code defines an array of sound data with timestamps and a function to play these sounds at specified times using the Audio API and setTimeout.
An accessible version of the code is printed in full at the bottom of the page

Breaking It Down

  1. HTML Structure: We have an HTML page with a button that says “Play Sounds.” When you click it, the playSounds() function runs.
  2. Sound Data: The soundData array lists our sounds (doom and tak – real percussion sounds!) with their sources and the times they should play.
  3. playSounds Function: This function goes through each sound in soundData and uses setTimeout to play each one at the right time.

Our machine learning model first identifies when each sound occurs, and we then export those timestamps to create our soundData array. Using JavaScript, we can now play the sounds exactly when the AI model hears them! Yay!
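
If you’re curious how the soundData array gets built from the model’s output, here’s a rough sketch of that hand-off. The detections array below is a made-up example rather than the model’s real export format; the idea is simply to map each detected beat to an audio file and convert its time from seconds into the milliseconds that setTimeout expects.

// Hypothetical model output: one entry per detected beat.
// The real export format from the model may look different.
const detections = [
  { label: "doom", timeSeconds: 1.0 },
  { label: "tak",  timeSeconds: 3.0 },
  { label: "doom", timeSeconds: 5.0 }
];

// Map each detection to an audio file and convert seconds to milliseconds
// so the entries match the soundData format used above.
const soundData = detections.map(d => ({
  src: `${d.label}.mp3`,                      // e.g. doom.mp3, tak.mp3
  timestamp: Math.round(d.timeSeconds * 1000)
}));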


Here’s an accessible version of the code you can copy and paste if you’d like to give it a go yourself!

<!DOCTYPE html>
<html>
<head>
  <title>Play Sounds at Timestamps</title>
</head>
<body>
  <h1>Play Sounds at Timestamps</h1>
  <button onclick="playSounds()">Play Sounds</button>

  <script>
    const soundData = [
      { "src": "sound1.mp3", "timestamp": 1000 },
      { "src": "sound2.mp3", "timestamp": 3000 },
      { "src": "sound3.mp3", "timestamp": 5000 }
    ];

    function playSounds() {
      // Schedule each sound to start at its timestamp, measured in
      // milliseconds from the moment the button is clicked.
      soundData.forEach(data => {
        const audio = new Audio(data.src);
        setTimeout(() => {
          audio.play();
        }, data.timestamp);
      });
    }
  </script>
</body>
</html>


AI RECOGNISING Timestamps! ⏰

Let’s get one thing straight! ☝

Doom (1-second pause) Doom ⏳⏳

is a DIFFERENT sound from

Doom (half a second pause) Doom ⏳

I need to play those back using real instruments at the right time. To achieve this, I have to do two things:

  1. Get the timestamps of the sound recognition from my machine learning model in a way that can interact with a web app.
  2. Play back the sounds at the correct intervals.

Today, I want to chat about the first part.

Once I run my machine learning model, it recognises and timestamps when sounds were performed. You can see this in the first photo below. The model shows the probabilities of recognising each sound, such as “doom”, “tak”, or “tak-a-tak”.

A dashboard showing detailed results of sound recognition. It displays a table with timestamps and probabilities for different sounds (background noise, doom, tak, tak-a-tak). On the right, there are two scatter plots visualizing processed features and spectral features, with color-coded data points representing different sound classifications.

Next, I need to export this model as a WebAssembly package so it can run in a web browser. This allows anyone to interact with my online doom/tak exhibition!

The deployment configuration screen for a machine learning model named "Doom Tak". It shows options for deploying the model as a WebAssembly package for web browsers. The interface includes a QR code and a "Launch in browser" button for testing the prototype.

In the second photo, you can see the deployment section where I configure the WebAssembly package. This makes the model run efficiently in web browsers without an internet connection, minimising latency.

Exporting the model is straightforward. Once exported, it produces a JavaScript package that can be used on a webpage. Now I can call this entire machine learning model from a web app script to do whatever I want.
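
On the web app side, the wiring looks roughly like this. I’m using placeholder names here (EdgeImpulseClassifier, init(), classify()) modelled on Edge Impulse’s browser examples, so treat it as the shape of the call rather than the exact API – the generated files tell you the real names to use.

// Sketch only: the class and method names are placeholders based on
// Edge Impulse's browser deployment examples, not a guaranteed API.
// EdgeImpulseClassifier would come from the files generated by the export.
async function detectBeat(rawAudioWindow) {
  const classifier = new EdgeImpulseClassifier();
  await classifier.init(); // loads the WebAssembly module

  // rawAudioWindow: an array of audio samples covering one analysis window
  const result = classifier.classify(rawAudioWindow);

  // Each entry looks something like { label: "doom", value: 0.93 }
  const best = result.results.reduce((a, b) => (a.value > b.value ? a : b));
  return best; // the highest-probability label and its confidence
}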

Easy peasy lemon squeezy!

A close-up of the model optimization settings. It compares quantized (int8) and unoptimized (float32) versions of the model, showing metrics like latency, RAM usage, and flash memory requirements for different components (MFCC, Spectral Features, Classifier). The image also shows a "Build" button and indicates that the estimates are for a MacBook Pro 16" 2020 with an Intel Core i9 2.4GHz processor.


Understanding Spectral Analysis in my Doom Tak Machine Learning Model!

One of the big improvements I implemented in my machine learning model is using something called ✨ spectral analysis ✨.

But what is Spectral Analysis?

Spectral analysis helps us break down sounds into their different parts. For example, when you hear a song, you can pick out the drums, guitar, and vocals, right?

Well, spectral analysis does something similar by looking at the different frequencies in a sound, which makes it easier for our model to tell different sounds apart! Woohoo!

Why Use Spectral Analysis?

Unlike other methods that only look at how loud a sound is over time, spectral analysis gives us a detailed picture by showing us all the different frequencies! This helps our model recognise and separate sounds that might seem similar at first!

How We Use Spectral Analysis

First, we get the sound data ready. This means making the audio signals more uniform and cutting the audio into smaller parts. Then, we use a tool called Fast Fourier Transform (FFT) to change the sound data from a time-based view to a frequency-based view. This lets us see all the different frequencies in the sound. After using FFT, we pick out important details from the frequency data to understand the sound better. 
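
To make the time-to-frequency step concrete, here’s a tiny sketch of the idea using a naive discrete Fourier transform. Real pipelines use an optimised FFT library (the direct sum below is far too slow for production), but the output is the same: the strength of each frequency present in one small chunk of audio.

// Naive DFT: returns the magnitude of each frequency bin for one short frame
// of audio samples. An FFT computes the same thing much more efficiently.
function magnitudeSpectrum(frame) {
  const N = frame.length;
  const magnitudes = [];
  for (let k = 0; k < N / 2; k++) { // the first half of the bins is enough for real-valued audio
    let re = 0;
    let im = 0;
    for (let n = 0; n < N; n++) {
      const angle = (2 * Math.PI * k * n) / N;
      re += frame[n] * Math.cos(angle);
      im -= frame[n] * Math.sin(angle);
    }
    magnitudes.push(Math.sqrt(re * re + im * im)); // energy at frequency bin k
  }
  return magnitudes; // bin k sits at roughly k * sampleRate / N Hz
}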

We already use a method called MFCC (Mel-Frequency Cepstral Coefficients – check out my previous blog about it!) to get key features from sounds. By adding spectral analysis to MFCC, we make our model EVEN BETTER at recognising sounds!

It is still not perfect, but this has made my machine learning model much better at detecting and differentiating between doom and tak sounds!

Two graphs illustrating spectral analysis. Top: Time-domain waveform of a complex audio signal. Bottom: Frequency-domain spectrum showing the signal's component frequencies, demonstrating how spectral analysis breaks down sound into its constituent parts.


But how am I supposed to detect sounds close together?!

One issue with the previous machine learning model I built was that it couldn’t recognise more than one sound played directly after another. I had to wait about a second before starting the next sound for the model to pick it up (I know that doesn’t sound like much, but trust me – IT IS!)

That’s frustrating and not acceptable. Let’s fix this!

After extensive research, we identified that time series data is crucial for our needs. It can significantly improve our model’s performance.

So, what did we do?

1. Made the Window Size Smaller

  • We adjusted the window size to be shorter. Why? Because the samples are brief, and we want it to be just long enough to capture the longest possible sound, which is “takatak” at 750 milliseconds.

2. Decreased the “Window Increase” Interval

  • We reduced the window increase interval to 100ms. Why? Because if the model only updates every half a second, it can only predict at those intervals. By making the window increase more gradual, the model can catch sounds more accurately and promptly. (There’s a code sketch of this sliding window just after the diagram below.)

3. Actual BIG BOSS: Spectral Analysis

  • Spectral analysis is awesome, she is the real game-changer here! Last time, we talked about using Mel-Frequency Cepstral Coefficients (MFCC), which are super cool. This time, we are also adding spectral analysis alongside MFCC. This extra step allows the model to examine the waveform itself to detect a “doom” or a “tak” based on its shape. More on Spectral Analysis in the next blog!
Diagram showing sliding window technique for time series analysis. Top: Timeline with colored dots representing data points. Bottom: Overlapping windows capturing subsets of data points over time.
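
Here’s a small code sketch of points 1 and 2 above: a 750 ms window sliding forward 100 ms at a time over the audio. The classifyWindow function passed in is a stand-in for whatever the exported model actually exposes.

// Slide a 750 ms window over the audio in 100 ms steps, classifying each window.
// classifyWindow is a placeholder for the real model call.
function slideAndClassify(samples, sampleRate, classifyWindow) {
  const windowLength = Math.round(sampleRate * 0.75); // long enough for the longest sound, "takatak"
  const step = Math.round(sampleRate * 0.1);          // the "window increase" interval
  const detections = [];
  for (let start = 0; start + windowLength <= samples.length; start += step) {
    const window = samples.slice(start, start + windowLength);
    detections.push({
      timeMs: Math.round((start / sampleRate) * 1000), // where this window begins
      result: classifyWindow(window)
    });
  }
  return detections;
}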


What The Heck is an MFCC?!

Three time-series graphs with three headings off to the left of each graph. The first heading is "Time Domain Waveform" and the graph shows a typical waveform. The two underneath are "Spectrogram" and "MFCC Spectrogram" and show two graphs in a range of green, yellow, orange and red colours

MFCC?! Yuh, Mel-frequency cepstral coefficients – obvz! Errrm, what?! Listen, they may be the secret sauce in my Doom and Tak project! Could they be key to my AI-powered bellydancing robot? Could they help me bring this project to life and solve my woes? Maybe!

So, what makes MFCCs so special for analysing rhythmic sounds? Well, it all starts with how they mimic our human hearing. MFCCs are like the ears of AI models, capturing the way we perceive sound by focusing on the most crucial frequencies in vocal sounds. This means they MIGHT be brilliant at picking up the nuances in my “Doom” and “Tak” Egyptian rhythm sounds, making sure the robot dances perfectly to the beat.

Rhythmic sounds, as you know, are all about those delightful patterns and beats. MFCCs don’t just capture the sound; they capture the essence of the rhythm. By breaking down the audio into tiny, manageable bits, they allow the AI to understand the energy and timing in each beat of “Doom” and “Tak.” Sounds like it might be exactly what I need!
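
As a rough picture of that “breaking down into tiny bits”, here’s the framing step an MFCC pipeline starts from: chop the audio into short overlapping frames and measure the energy in each one. (The mel filterbank and the cepstral coefficients themselves come after this and aren’t shown here.)

// Framing: split a mono sample buffer into short overlapping frames and compute
// each frame's log-energy. This is only the first stage of an MFCC pipeline –
// the mel filterbank and DCT stages would follow.
function frameLogEnergies(samples, sampleRate, frameMs = 25, strideMs = 10) {
  const frameLength = Math.round((sampleRate * frameMs) / 1000);
  const stride = Math.round((sampleRate * strideMs) / 1000);
  const energies = [];
  for (let start = 0; start + frameLength <= samples.length; start += stride) {
    let energy = 0;
    for (let i = start; i < start + frameLength; i++) {
      energy += samples[i] * samples[i];
    }
    energies.push(Math.log(energy + 1e-10)); // small offset avoids log(0) on silence
  }
  return energies;
}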

Another reason MFCCs are the thing I’m experimenting with at the moment is their efficiency. They condense all the important information from the audio into a compact form, making it easier and faster for the AI to process. This means quicker training times and more responsive performance, which is exactly what you need when you’re trying to get a robot to dance along!

One of the biggest challenges in audio analysis is dealing with background noise and variations in loudness. Thankfully, MFCCs seem to be fairly robust against these issues. Theoretically, they should maintain their accuracy even when there’s a bit of chaos in the background.

Stay tuned while I investigate!

Over and Out


Tuning the Beats! (with AI obvz)

I’m excited to share my challenging experience working on my latest project using Edge Impulse. This time, I was on a mission to train a machine learning model to recognise and respond to two core sounds in Egyptian percussion: “Doom” and “Tak.” Spoiler alert: it was a lot tougher than my “Hey Shrouk” project! 

Step 1: Data Collection
I started by collecting data for the “Doom” and “Tak” sounds. This involved recording multiple samples of each sound.

Edge Impulse data collection interface with options to set label, length, and category. A blue 'Start recording' button is centered, with 'Audio captured with current settings: 0s' displayed below.

Step 2: Dataset Overview
With 41 seconds of audio data collected, I split it into training and test sets. This step is for validating the model later on. Each sound (“Doom” and “Tak”) had its dedicated samples labelled for effective training.

Edge Impulse dataset overview showing 41s of collected data, 73% train/27% test split, and a list of 'tak' audio samples with timestamps.
Edge Impulse dataset overview showing 41s of collected data, 73% train/27% test split, and a list of 'doom' audio samples with timestamps.

Step 3: Impulse Design
Designing the impulse involved setting up time series data, adding a processing block, and configuring the learning block. This setup is the backbone of how the model will interpret and classify the sounds!

Edge Impulse interface for configuring impulse design, showing time series data settings for audio input with options to add processing and learning blocks.
Edge Impulse feature extraction options, highlighting Audio (MFCC) and Audio (MFE) methods for processing voice and non-voice audio signals respectively.
Edge Impulse interface showing configured impulse design for audio classification, with time series data, Audio (MFE) processing, and Classification blocks set up to distinguish between 'doom' and 'tak' sounds.
Edge Impulse interface showing feature generation for a machine learning model. Training set contains 30s of data with 2 classes (doom, tak) and 30 training windows. A 'Generate features' button is prominent.

Step 4: Feature Generation
Here, I used the Audio (MFE) block to extract Mel-filterbank energy features from the audio signals. These features are essential for distinguishing between “Doom” and “Tak.” This step felt like giving the model its auditory senses!

Console output showing successful feature generation process in Edge Impulse. Steps include dimension reduction, embedding construction, and output file writing, completed in about 9 seconds.

Step 5: Training the Model
Next, I trained the model with the collected data. This part was challenging as I had to tweak parameters like sample rate and noise handling. Despite my efforts, the model sometimes detected a “Tak” when there was none and vice versa, though it handled “Doom” better.

Step 6: Feature Explorer
Visualising the features in the explorer gave me insights into how well the model was learning to differentiate the sounds. The scatter plot shows the clusters of “Doom” and “Tak,” albeit with some overlap due to noise issues.

Screenshot of the Edge Impulse feature explorer showing a scatter plot with clusters of 'Doom' and 'Tak' sound features, with 'Doom' represented by blue dots and 'Tak' by orange dots. Processing time and peak RAM usage statistics are displayed below.

Step 7: Neural Network Architecture
I configured the neural network with several convolutional layers to process the audio features. This architecture aimed to enhance the model’s ability to recognise intricate patterns in the sounds.

Screenshot of the Edge Impulse neural network architecture configuration page. The setup includes layers such as reshape, 1D convolution, dropout, flatten, and an output layer for classifying 'Doom' and 'Tak' sounds. A 'Start training' button is visible at the bottom.

Step 8: Model Performance
The final model showed an impressive accuracy of 100% on the validation set, but achieving this in real-world conditions proved tricky due to noise and timing issues. The confusion matrix highlights the model’s perfect score on the test data, but it still needs refinement for practical use.

The results from the model training, showing 100% accuracy on the limited test and training dataset.

Working on the “Doom” and “Tak” project with Edge Impulse has been an enlightening experience. It has pushed my boundaries and taught me the importance of fine-tuning parameters to handle real-world complexities. I am still struggling with sample rates and timings to represent a beat more accurately. While the journey is a little tough, especially dealing with noise and sample rate issues, the process is still rewarding and I’m super excited to figure out how to solve those issues!

Machine learning is as much an art as it is a science – I will keep experimenting!


Hey Shrouk! Building My AI Voice-Activated Assistant!

In my quest to make my belly dancing robot recognise percussion sounds, I started with a more generic AI sound recognition exercise. And it was super exciting! “Hey Siri”? Nah Uh – Hey Shrouk! Let me take you on a journey through my experience of building a model using Edge Impulse that responds to my voice saying, “Hey Shrouk!”

Step 1: Collecting Data
To kick things off, I needed to gather some data. Edge Impulse made this super easy! I recorded myself saying “Hey Shrouk” multiple times. The interface even had a cool progress ring to show when I was collecting samples. With just a few minutes of audio data, I was all set.

Edge Impulse interface for sound recognition model creation. Screen shows "[3/10] Collecting some data" with instructions to record "Hey Shrouk" samples. A colorful circular progress bar indicates 0 seconds of recording remaining.

Step 2: Data Collection Complete
Once I had enough samples, Edge Impulse gave me the green light with a big, friendly checkmark. LEVEL UP! ✔️

Edge Impulse interface showing completion of data collection. A green checkmark in a circle is displayed above the text "Great! We have recorded enough samples." An "OK" button is centered below.

Step 3: Designing the Impulse
Next, I moved on to designing the impulse. Edge Impulse takes raw data, applies signal processing, and uses a learning block to classify new data. It sounds complicated, but let’s walk through it together!

Edge Impulse interface showing step 5 of 10: "Designing your impulse". It explains the concept of an 'impulse' for detecting "Hey Shrouk", mentioning MFE signal processing and neural network classifier. Options to quit or proceed to spectrograms are visible.

Step 4: Generating Spectrograms
To make sense of the audio data, I converted it into spectrograms. This step highlights interesting frequencies and reduces the amount of data, making it easier for the model to understand my voice!

Edge Impulse interface showing step 6 of 10: "Spectrograms". It explains the role of signal processing in impulse creation, mentioning de-noising, frequency highlighting, and data reduction. Options to quit or proceed to generate features are visible.

Step 5: Raw Data Visualisation
Here’s a glimpse of the raw audio data I collected. It’s like looking at the heartbeat of my voice, representing every “Hey Shrouk” I recorded!

Edge Impulse interface showing raw audio data visualization for "Hey Shrouk.26" sample. The graph displays audio amplitude over time (0-990ms), with significant waveform activity visible between 350-700ms. A playback control is present below the graph.

Step 6: DSP Results
The Digital Signal Processing (DSP) results came next. This step helped the AI model differentiate between my voice and background noise.

This image shows a spectrogram representing the Mel Energies (DSP Output) from audio processing. The spectrogram displays frequencies over time, with colors ranging from blue (low energy) to red (high energy). Vertical patterns indicate distinct sound events, corresponding to utterances of "Hey Shrouk" captured during data collection for the voice recognition model.

Step 7: FFT Bin Weighting
Next up was the FFT Bin Weighting. This visual shows how the model processes different frequencies in my voice!

This image shows the FFT (Fast Fourier Transform) Bin Weighting visualization in Edge Impulse. It displays a curved line graph with colors transitioning from red (low frequencies) to blue (high frequencies). Below is a color scale bar and a snippet of processed feature values. This graph represents how different frequency components are weighted in the audio processing, which is crucial for voice recognition model training.

Step 8: Tuning Feature Parameters
I fine-tuned parameters like frame length, frame stride, and filter number. These settings ensure that the model accurately captures the nuances of my voice by changing the size of the sample (i.e. recording!) and how much time we skip forward in the audio recording in each pass!

This image shows the parameter settings for audio processing in Edge Impulse. It displays raw feature values labeled as "hey_shrouk" and various adjustable parameters for Mel-filterbank energy features. These include frame length, frame stride, filter number, FFT length, frequency ranges, and noise floor. The interface allows for manual adjustment or autotuning of these parameters, which are crucial for optimizing the voice recognition model's performance.
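
One way to get a feel for how frame length and frame stride interact is to count how many frames a clip produces. The little calculation below uses illustrative 25 ms / 10 ms values rather than my exact settings.

// How many analysis frames fit into a clip, for a given frame length and stride?
// The 25 ms / 10 ms defaults here are illustrative, not my exact settings.
function frameCount(clipMs, frameLengthMs = 25, frameStrideMs = 10) {
  if (clipMs < frameLengthMs) return 0;
  return Math.floor((clipMs - frameLengthMs) / frameStrideMs) + 1;
}

console.log(frameCount(1000)); // a 1000 ms "Hey Shrouk" clip -> 98 frames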

Step 9: Exploring Features
The Feature Explorer gave me a visual representation of all the data features. Seeing the clusters of “Hey Shrouk” data separated from noise was like finding order in chaos! No model is 100% accurate, though, and we can see a few “Hey Shrouk” outliers that have snuck into the noise and unknown data cluster.

The Feature Explorer shows a scatter plot of audio samples. Most "hey_shrouk" samples (blue dots) are clustered separately from noise and unknown samples (orange and green dots), which are intermixed. This visual separation indicates the model is distinguishing the target phrase from other sounds, though some overlap exists.

Step 10: Training the Model
Finally, it was time to train the neural network. Edge Impulse showed me the performance metrics, including accuracy, loss, and a confusion matrix. I was excited to see a high accuracy rate of 95.9%!

This image shows the model's performance metrics after training. The overall accuracy is 95.9% with a loss of 0.15. A confusion matrix displays the classification results for "hey_shrouk", noise, and unknown categories. The model correctly identifies "hey_shrouk" 93.3% of the time, noise 97.2%, and unknown sounds 95.2%. F1 scores for each category are also provided, all above 0.93, indicating strong performance across all classes.

Creating a voice-activated assistant with Edge Impulse was an amazing experience! The platform’s user-friendly interface and powerful tools made it easy and fun to bring my project to life. If you’re into AI, machine learning, or just love tinkering with tech, I highly recommend giving Edge Impulse a try. Who knows what awesome projects you’ll come up with next? ✨


Dancing for Freedom: Honouring Sarah Hegazi

A photo of Sarah Hegazi smiling widely and holding a rainbow flag aloft at a crowded nighttime outdoor event, with stadium lights visible in the background.

When I dance, I always dance with a rainbow flag. The same flag that Sarah Hegazi was imprisoned, tortured, and eventually died for. When I dance, I dance as a protest. I dance in all my queerness. I dance in all my Egyptianess. I will not apologise for who I am. I will not apologise for existing.

The Bellydancing Robot: A Blend of Art and Technology

As a neurodivergent, British-Egyptian queer refugee woman in engineering, I am vastly underrepresented in this field. The engineering industry can be incredibly challenging and often feels unwelcoming for people like me. Despite my love for engineering, the lack of diversity and representation makes it a difficult environment. 

Representation is crucial. When people see someone like me succeeding in engineering and combining it with art, it opens doors and inspires others. It challenges the stereotypes and norms that have long dominated this field. It is important to showcase that diversity in engineering leads to innovation and richer, more inclusive technological advancements.

The bellydancing robot project began as a way to explore how robotics could be used in art as a form of resistance against the systemic barriers that exist in my industry. Using my engineering skills and love for dance, I designed a robot that can perform bellydancing moves. It celebrates my cultural heritage.

However, the term “belly dance” is a Western name that has colonised a rich and diverse tradition. The dance I integrate into my robot’s performances is rooted in Egyptian culture, not the often exoticised version seen in Western media. This project is my way of reclaiming and decolonising this art form, highlighting its true origins and cultural significance.

Sarah Hegazi: A Symbol of Courage and Resistance

Sarah Hegazi was an Egyptian LGBTQ+ activist who became a symbol of courage and defiance. In 2017, she was arrested and tortured for raising a rainbow flag at a concert in Cairo. Despite her suffering, Sarah continued to fight for LGBTQ+ rights until her death in 2020. Her story is one of bravery and a strong commitment to justice and equality. Sarah was also my friend.

Combining Technology and Advocacy

The bellydancing robot is a way for me to honour Sarah Hegazi’s legacy. Just as Sarah used her voice to fight for the marginalised, the robot uses dance to promote messages of inclusivity and resilience. By including LGBTQ+ themes in the performances, I aim to raise awareness and create a sense of solidarity within the community.

A Tribute to Sarah Hegazi

Sarah died for my right to wave this flag. I will exercise this right every single chance I get. It’s been almost four years since Sarah died, and at times, it still doesn’t feel real. I met Sarah in 2016 through a mutual friend. We were part of a secret LGBT+ Egyptian group. Sarah always encouraged me to visit Egypt again, saying it was getting safer for LGBTQ+ people.

Then, one night in 2017, at a Mashrou3 Leila concert in Cairo, she waved a rainbow flag. For a few hours, I thought she was right—Egypt was getting safer. But soon, terror swept through our community. People disappeared, and Sarah was taken by the Egyptian secret services.

We later found out she was held in solitary confinement under terrorism charges. She was tortured with electric shocks, beaten, and sexually assaulted. When she was released, activists helped her escape to Canada. Despite the trauma, Sarah continued to support our community.

Why Vital Capacities Is So Important

Vital Capacities is crucial because it merges accessibility with artistic innovation. It provides a unique platform where I can grow and contribute. The residency’s commitment to breaking down barriers and fostering diverse artistic expressions aligns perfectly with my goals to decolonise technology and make engineering more inclusive.

Integrating AI into the Bellydancing Robot

My residency with Vital Capacities is about enhancing my ongoing project by integrating Artificial Intelligence into the bellydancing robot. The aim is to create a dynamic installation that not only performs Egyptian dance but interacts with audiences through AI. It will recognise and respond to a mix of ‘doom’ and ‘tak,’ the core sounds of Egyptian percussion, and choreograph dance movements in real-time.

Integrating AI into my work with the bellydancing robot will be transformative. It will elevate my artistic practice from static performances to interactive, responsive shows where each audience interaction is unique. This technology not only broadens the scope of my performances but also deepens my engagement with audiences, creating a truly immersive experience.

Forever

Sarah’s life and activism have greatly influenced my work. Through the bellydancing robot, I continue her legacy by creating a platform that entertains, educates, and inspires. Each performance is a reminder of the courage needed to stand up against injustice and a tribute to those who fight for equality and human rights.

Next time you wave a rainbow flag, please remember that Sarah Hegazi died for your right to do so. Wear that flag as a badge of honour. Pride is a protest. Our mere existence is an act of defiance. Forever, in your name, Sarah. ✊


Making posts accessible with AI

As part of my artist residency exploring AI and inclusivity, I’ve been utilising the AI assistant Claude to make my visual content more accessible. One simple yet impactful way I’m doing this is by having Claude generate alt-text for the images I share.


Alt-text is a written description of an image that gets read aloud by screen readers, enabling people with visual impairments to understand the content of visual media. It’s an important part of making the web more accessible and inclusive. ✨
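
In practical terms, the description simply ends up in the image’s alt attribute. Here’s a tiny sketch of setting it from JavaScript – the filename and text are just illustrative examples.

// Attach alt-text to an image so screen readers announce it in place of the picture.
// The filename and description here are illustrative only.
const img = document.createElement("img");
img.src = "shrouk-studio.jpg";
img.alt = "Shrouk working intently at their colourful and creative electronics lab";
document.body.appendChild(img);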


To create the alt-text, I simply upload an image to my conversation with Claude and ask it to provide a detailed description of what the image depicts. For example, for the photo of me in the Welcome to the Studio post, Claude created the text below (with a few tiny tweaks and edits from me first!):


“The image shows Shrouk working intently at their colourful and creative electronics lab. Shrouk has an afro hairstyle and is wearing a brightly coloured dress which is mainly deep oranges and reds with intricate patterns. She is working on a printed circuit board with an oscilloscope, seemingly deeply focused on the detailed electronics work or experiment they are engaged in.

Shrouk’s workspace has an artistic, energetic vibe, filled with various bright decorations and objects in oranges, pinks, blues and greens that match Shrouk’s expressive and unique style. The photograph captures a moment of concentration as Shrouk applies their skills to an electronics project, with the vibrant atmosphere of the lab reflecting Shrouk’s creative personality and approach. The setting suggests Shrouk is bringing an imaginative flair to their electronics work.”


Descriptions like this capture the key details in a clear, concise manner. Using an AI assistant to quickly produce high-quality alt-text at scale can make the process of adding this important accessibility element much easier. What used to be a time-consuming manual task can now be largely automated.


This is just one example of how AI tools can be harnessed to make content more inclusive. I encourage other creators to explore using AI to generate alt-text and to continue to come up with innovative ways to leverage emerging technologies in service of accessibility. Together, we can make strides toward a more inclusive digital world.
