Artful Image Processing and Algorithmic Drawing

Having set up the object detection pipeline, I proceeded to the image processing stage, where the raw visual data and the model’s outputs are transformed into a more artistic representation. My process involved several key steps. First, I applied edge detection algorithms to the video frames. This technique identifies points in a digital image where the brightness changes sharply, effectively outlining the shapes and contours of objects in the scene. Next, I inverted the black and white values, creating a stark, high-contrast visual style. Finally, I took the bounding boxes generated by the YOLO detection model and redrew them onto this processed image. This layering of machine perception over a stylised version of reality creates a compelling visual dialogue between the actual scene and the AI’s interpretation of it.
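To make the step concrete, here is a minimal sketch of how one such pass could look, assuming OpenCV’s Canny edge detector and the Ultralytics results format; the thresholds, colours and the helper name stylise_frame are illustrative placeholders rather than the project’s exact code:

```python
import cv2

def stylise_frame(frame, results):
    """Edge-detect a frame, invert it, and redraw YOLO boxes on top.
    `results` is assumed to be a single Ultralytics Results object."""
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(grey, 100, 200)        # white edges on a black background
    inverted = 255 - edges                   # invert: black edges on white
    canvas = cv2.cvtColor(inverted, cv2.COLOR_GRAY2BGR)

    # Layer the detector's bounding boxes over the stylised image
    for box in results.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        label = results.names[int(box.cls[0])]
        cv2.rectangle(canvas, (x1, y1), (x2, y2), (0, 0, 255), 2)
        cv2.putText(canvas, label, (x1, y1 - 6),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
    return canvas
```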

Processing Perceptions with YOLO and the COCO Dataset

With the video pipeline established, I turned my attention to processing the visual data using a combination of powerful tools. The core of this stage is the YOLO (You Only Look Once) object detection model. YOLO is a state-of-the-art, real-time object detection system that identifies and classifies objects in a single pass of an image, making it incredibly fast and efficient. For this project, I am intentionally using a model pre-trained on the COCO (Common Objects in Context) dataset. The COCO dataset is a large-scale collection of images depicting common objects in everyday scenes and is a standard benchmark for training and evaluating computer vision models.
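For reference, running a COCO-pretrained YOLO model over a single frame takes only a few lines with the Ultralytics package; the checkpoint name and image path below are illustrative stand-ins:

```python
from ultralytics import YOLO

# Load a model pre-trained on the COCO dataset (80 object classes)
model = YOLO("yolov8n.pt")  # any COCO-pretrained checkpoint would work here

# Single-pass detection on one image or video frame
results = model("frame.jpg")[0]  # "frame.jpg" is a placeholder path

for box in results.boxes:
    name = results.names[int(box.cls[0])]
    conf = float(box.conf[0])
    print(f"{name}: {conf:.2f}, box {box.xyxy[0].tolist()}")
```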

My goal is not to achieve flawless object recognition but rather to play with the inherent “mistakes” and misinterpretations the machine makes. The default COCO dataset is perfectly suited for this, as its generalised training can lead to incorrect predictions when applied to novel or ambiguous scenes. To manipulate the image data, which is essentially a collection of pixels, I am using NumPy (Numerical Python). NumPy is a fundamental library for scientific computing in Python that allows for efficient manipulation of large, multi-dimensional arrays and matrices—the very structure that represents digital images.
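Because a frame is just an array, most pixel-level manipulation reduces to plain NumPy operations, as in this small sketch (the blank frame here is only a placeholder):

```python
import numpy as np

# A digital frame is a NumPy array of shape (height, width, colour channels)
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # placeholder black frame

print(frame.shape)            # (1080, 1920, 3)
inverted = 255 - frame        # invert every pixel in one vectorised operation
left_half = frame[:, :960]    # slicing pulls out regions of the image directly
```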

What is Object Detection?

Object detection is a field of computer vision and image processing concerned with identifying and locating instances of objects within images and videos. Unlike simple image classification, which assigns a single label to an entire image, object detection models draw bounding boxes around each detected object and assign a class label to it, providing more detailed information about the scene.

What are NumPy and the COCO Dataset?

  • NumPy: A Python library that provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays. In image processing, an image is treated as a 3D array (height, width, colour channels), making NumPy an indispensable tool for any pixel-level manipulation.
  • COCO Dataset: Standing for “Common Objects in Context,” this is a massive dataset designed for object detection, segmentation, and captioning tasks. It contains hundreds of thousands of images with millions of labelled object instances across 80 “thing” categories and 91 “stuff” categories, providing a rich foundation for training computer vision models.

Objects Detectable by the COCO Dataset:

A model trained on the COCO dataset can identify 80 common object categories, including:

  • People: person
  • Vehicles: bicycle, car, motorcycle, airplane, bus, train, truck, boat
  • Outdoor: traffic light, fire hydrant, stop sign, parking meter, bench
  • Animals: bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe
  • Accessories: backpack, umbrella, handbag, tie, suitcase
  • Sports: frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket
  • Kitchen: bottle, wine glass, cup, fork, knife, spoon, bowl
  • Food: banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake
  • Furniture: chair, couch, potted plant, bed, dining table, toilet
  • Electronics: tv, laptop, mouse, remote, keyboard, cell phone
  • Appliances: microwave, oven, toaster, sink, refrigerator
  • Indoor: book, clock, vase, scissors, teddy bear, hair drier, toothbrush

Weaving a World Model with Reinforcement Learning Concepts

With a large dataset of generated policies, the next step is to import them back into the primary software application that displays the 360-degree video. This integration allows the dynamically generated rules to influence the visual output or behaviour of the system in real-time. My use of the term “policy” is a deliberate nod to its origins in the field of Reinforcement Learning (RL), a concept dating back to the 1990s. In RL, a policy is the strategy an agent employs to make decisions and take actions in its environment. It is the core component that dictates the agent’s behaviour as it learns through trial and error to maximise cumulative reward. By generating policies based on visual input, my system is, in a sense, creating its own world model—a simplified, learned representation of its environment and the relationships within it. This process echoes the fundamental principles of how an AI agent learns to react to and make sense of the real world, a topic I have delved into in more detail in some of my earlier writings.
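A minimal sketch of that re-import step might look like the following; the file name and policy fields are assumptions made for illustration, not the project’s actual format:

```python
import json

# Load the previously generated policies back into the 360-degree viewer.
# Each entry is assumed to record the pair of COCO classes it applies to.
with open("policies.json") as f:
    policies = json.load(f)  # e.g. [{"pair": ["person", "car"], "rule": "...", "alert": "high"}]

def lookup_policy(class_a, class_b):
    """Return the stored policy for a detected pair of classes, if any."""
    for policy in policies:
        if set(policy["pair"]) == {class_a, class_b}:
            return policy
    return None
```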

Generating AI Policies from Object Proximity

In a more experimental turn, I developed a separate piece of software to explore the concept of emergent behaviour based on the object detection output. This program uses a Large Language Model (LLM) to generate “policies” when objects from the COCO dataset are detected in close proximity on the screen. The system calculates the normalised distance between the bounding boxes of detected objects. This distance value is then fed to the LLM, which has been prompted to generate a policy or rule based on the perceived danger or interaction potential of the objects being close together. For instance, if a “person” and a “car” are detected very close to each other, the LLM might generate a high-alert policy, whereas a “cup” and a “dining table” would result in a benign, functional policy. This creates a dynamic system where the AI is not just identifying objects, but also creating a narrative or a set of rules about their relationships in the environment.
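Below is a small sketch of the proximity calculation and the prompt that could be handed to the LLM; normalising against the frame diagonal and the prompt wording are illustrative assumptions, and the actual LLM call is omitted:

```python
import math

def normalised_distance(box_a, box_b, frame_w, frame_h):
    """Distance between two bounding-box centres, scaled by the frame diagonal,
    so 0.0 means the boxes coincide and 1.0 means opposite corners."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return math.hypot(ax - bx, ay - by) / math.hypot(frame_w, frame_h)

def build_policy_prompt(label_a, label_b, distance):
    """Prompt text for the LLM; the wording here is purely illustrative."""
    return (f"A '{label_a}' and a '{label_b}' were detected at a normalised "
            f"distance of {distance:.2f}. Write a short policy describing the "
            f"risk or interaction potential of these objects being this close.")

# Example: a person and a car almost touching in a 1920x1080 frame
d = normalised_distance((100, 200, 300, 600), (310, 250, 900, 700), 1920, 1080)
print(build_policy_prompt("person", "car", d))
```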

Setting Up the Software with 360-Degree Vision

The initial phase of this project involved tackling the technical groundwork required to process 360-degree video. I began by using OpenCV, a powerful open-source computer vision library, to stitch together the two separate video feeds from my 360-degree camera. OpenCV is an essential tool for real-time image and video processing, providing the necessary functions to merge the hemispheric views into a single, equirectangular frame. After successfully connecting the camera to my computer, I set up a basic Python workspace within my integrated development environment (IDE). The next step was to write a script that could access the camera’s video stream and display it in a new window, confirming that the foundational hardware and software were communicating correctly. This setup provides the visual canvas upon which the subsequent layers of AI-driven interpretation will be built.
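The capture-and-display step amounts to a short OpenCV loop along these lines (device index 0 is a placeholder; a 360-degree camera may expose its stream under a different index or URL):

```python
import cv2

cap = cv2.VideoCapture(0)  # open the camera stream

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("360 feed", frame)          # confirm hardware and software are talking
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```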

Embracing the Algorithmic Uncanny

I am revisiting a creative process that has captivated my interest for some time: enabling an agent to perceive and learn about its environment through the lens of a computer vision model. In a previous exploration, I experimented with CLIP (Contrastive Language-Image Pre-Training), which led to the whimsical creation of a sphere composed of text, a visual representation of the model’s understanding. This time, however, my focus shifts to the YOLO (You Only Look Once) model. My prior experiences with YOLO, using the default COCO dataset, often yielded amusingly incorrect object detections—a lamp mistaken for a toilet, or a cup identified as a person’s head. Instead of striving for perfect accuracy, I intend to embrace these algorithmic errors. This project will be a playful exploration of the incorrectness and the fascinating illusions generated by an AI model, turning its faults into a source of creative inspiration.

* visualization using CLIP and Blender for artwork “Golem Wander in Crossroads”

Ultralytics YOLO

https://docs.ultralytics.com

Ultralytics YOLO is a family of real-time object detection models renowned for their speed and efficiency. Unlike traditional models that require multiple passes over an image, YOLO processes the entire image in a single pass to identify and locate objects, making it ideal for applications like autonomous driving and video surveillance. The architecture divides an image into a grid, and each grid cell is responsible for predicting bounding boxes and class probabilities for objects centered within it. Over the years, YOLO has evolved through numerous versions, each improving on the speed and accuracy of its predecessors.
(Text from Gemini-2.5-Pro and edited by artist)

CLIP

https://github.com/openai/CLIP

CLIP (Contrastive Language-Image Pre-Training), developed by OpenAI, is a neural network that learns visual concepts from natural language descriptions. It consists of two main components: an image encoder and a text encoder, which are trained jointly on a massive dataset of 400 million image-text pairs from the internet. This allows CLIP to create a shared embedding space where similar images and text descriptions are located close to one another. A key capability of CLIP is “zero-shot” classification, meaning it can classify images into categories it wasn’t explicitly trained on, simply by providing text descriptions of those categories.

(Text from Gemini-2.5-Pro and edited by artist)
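For reference, zero-shot classification with the openai/CLIP repository looks roughly like this; the image path and candidate descriptions are placeholders:

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a lamp", "a toilet", "a person's head"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # how strongly the image matches each text description
```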

COCO

https://cocodataset.org/#home

https://docs.ultralytics.com/datasets/detect/coco

COCO (Common Objects in Context) is a large-scale object detection, segmentation, and captioning dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking computer vision models. It is an essential dataset for researchers and developers working on object detection, segmentation, and pose estimation tasks.

(Text from Ultralytics YOLO Docs)

Return to Earth: Buzz Aldrin’s Autobiography

A book standing spine-up on a dark cherry wood surface. It is old and beaten up, worn-out blue, with faded gold lettering that just about reads: RETURN TO EARTH. More text underneath and to the side is illegible. It casts a monolithic shadow in front of itself.

I read Buzz Aldrin’s autobiography ‘Return to Earth’ in 2020/1 as part of my research for The Siren of the Deep – a solo show I did at Eastside Projects in Birmingham. The arc of it has stayed with me ever since, and is a part of this work too, so I’ve come back to it now. He describes his whole life leading up to the moon landing – achieving excellence at every stage of the specific path he’d cut out for himself (or that had sometimes been cut out for him by others), until this extreme pinnacle of achievement – something no-one had ever done before him and Neil Armstrong – a transcendence so astounding it had previously barely been imaginable. But then what? This is the bit I’m interested in. He ended up in a psychiatric hospital, because the depression that followed was too much to cope with. Because, what do you do after you’ve landed on the moon?

I think this is the hole that apophany fills: the absence of epiphany.

An open book. On the left page is a black and white photo of an astronaut in their suit. In the reflection of their helmet is the sketchy figure of someone else, or perhaps just some equipment. The horizon in the reflection and behind the astronaut is black, meeting the edge of the moon they’re standing on. On the right page is text in a dated font which reads ‘RETURN TO EARTH, Colonel Edwin E. “Buzz” Aldrin, Jr., with Wayne Warga, Random House, New York.’ A white hand holds the book open, a milky painted thumbnail pins down the left page at the bottom. The book is resting on a white surface.

The Dream Pool Backrooms

A computer generated image of a room covered in white tiles, from ceiling to walls to floor. In its centre is a circular hole in the ceiling through which a pillar extends, and a staircase spirals around. The room is half full of turquoise water. In the background, a tunnel leads elsewhere.

Since I went to Bath, I’ve been dreaming about its backrooms. Divots in sandy stone filled with green water that I can slide my arm into, an exact fit. Semi-organic cave networks getting darker and deeper, pools of water deep and shallow suspended and plunging at different levels.

Do you know about The Backrooms? They’re this concept of liminal spaces, usually devoid of living beings, that you can enter by leaving reality. There’s a Dream Pool genre of Backrooms that I’ve been following for a while, and earlier this year a game, Dreamcore, came out which holds a lot of these to explore. There’s something cyclical about recognising these as a pocket of your subconscious mind, then taking the images back into your sleep…

Another entirely tiled room. These tiles are pastel pink, and in the centre of the room is a hexagon shaped pool full of water. The room follows its shape and two doorways are opposite us, on two of the sides of the shape. One leads to darkness, it looks like a stairway downstairs. The other has a staircase upstairs, and an orange rectangular light from an unknown source illuminates it.
Another tiled space, this one dark and gloomy. It's the corner of a swimming pool from above, where the metal ladder leads into the water. It plunges down into an unrealistic depth, mysterious and dark.
A room in a 1960s minimalist style. An expanse of plain floor and ceiling space, marked only by the reflections of ripples of an oblong pool on the left. These reflections also pick out a large sphere at the back of the room, a plain sofa on the right, and some more ornamental sphere shapes, much smaller. Above the pool is a black expanse, perhaps a night sky, perhaps a blank ceiling.
A sunlit white tiled interior. The walls curve in a wavy line, a long thin pool following them, and a pillar that seems architecturally functionally useless. At least, according to waking logic.
A dark underground space, all tiled. It looks like a subway or somewhere corporate and abandoned - but only recently - a still living plant in a plant box at the right. The kind you find outside an office block. A stairway upwards leads to darkness, and an ankle-deep measure of water pervades throughout, strangely purple, showing ripples of movement from nothing we can see capable of motion. It seems lit by a torch, whilst the edges recede into blackness.
A dark white tiled space with a winding thin pool lit by spotlights from above. A rubber ring floats on its surface next to a metal ladder leading into it.
A very creepy pool interior. An abandoned public swimming pool - a colourful spiral flume at the left, and a converted industrial looking ceiling with institutional lighting. But they don't fully light the space, it's a bit too dark to be comfortable, and unnervingly misty.
A sunlit white tiled interior, half full of turquoise water. Two doorways on different walls in front of us fall back into a never-ending series of doorways behind them.

More moments

A hand hangs out of the car window and is reflected in the wing mirror. The road behind doesn't look too busy.
A hand hangs out of the car window and is reflected in the wing mirror. The fingers are stretched out as if trying to reach or perhaps gently asking for attention. The road behind gets busier. There is a double decker bus and lorry approaching.
A hand hangs out of the car window and is reflected in the wing mirror. The fingers are straight but relaxed, as if gently asking for attention or maybe feeling a breeze. The road behind gets busier. The double decker bus and the lorry pass by.

Sometimes accidents can be so difficult to recreate. Sometimes in trying to recreate them, they can lead to new things. I loved the gentleness of the hand compared to all the noise and traffic in the background. Not sure if the gesture is quite what I’d like it to be yet.

Lullaby

Rebekah Ubuntu has been encouraging us to consider the process of this residency to be ‘the thing’. A big part of my process has been working around looking after my son, so to mark this here is a lullaby that I’ve been singing to him since he was born, recorded on my phone a while ago so he can hear it even on the rare occasion I’m not there.

Video Description: The subtitles of the song are in white text at the bottom of the screen. They overlay a background of refracted light ripples moving leisurely across sand at the bottom of the ocean. Occasionally they bounce backwards and reverse their direction.

Audio Caption: Leah sings in a bathroom – slightly echoey but small-sounding. The mic is just a phone, and sometimes distorts with the sound of breathing. The song is slow, with lots of space in between lines.

Song to the Siren | Tim Buckley

(Lyrics are slightly altered by Leah from the original)

All afloat on the shipless ocean

I did all my best to smile

‘Til your singing eyes and fingers

Drew me loving to your isle

For you sang

Sail to me

Sail to me let me unfold you

Here I am

Here I am

Waiting to hold you

Did I dream 

You dreamed about me?

Were you hare while I was fox?

Now my foolish boat is leaning

Torn lovelorn on your rocks

For you sang

Touch me not

Touch me not come back tomorrow

Oh my heart

Oh my heart

Shies from the sorrow

I’m as puzzled as a newborn child

I’m as riddled as the tide

Should I stand amid the breakers?

Should I lie with death for my bride?

And you sang

Swim to me

Swim to me let me unfold you

Here I am

Here I am

Waiting to hold you.

Had a little moment earlier in the week feeling the breeze on my hand while in traffic. I wanted to recreate this image or at least experiment with this idea a little more but then some barriers got in the way (broken lifts) so I’ve been stuck indoors for a couple days.

An arm hangs out of a car window on a tree-lined street in London. The car's side mirror reflects the hand feeling a gentle breeze. It is a sunny day.

A solid medium orange color background
A solid muted coral-pink color background
A solid warm beige or tan color background

Some colours I noticed dominating previous images I have taken. I had it in my head that I wanted to use more colour in any potential new work. I wondered if some previous work could inform what those colours could be…?

Experimenting…

Today I’ve been testing out an experimental way of filming, which I’ll use to make part of the work. Here is a little peek at what’s to come…

ID: Out of the darkness appears a stream of white light. It refracts into rainbow colours, then reassembles its original colour, slipping in and out of pink, yellow and blue and back to white again. Waves of more defined lines flow like ripples, but smokey – lighter than water.

At the Altar

‘Those seeking divine help for an illness or affliction might rest overnight in special temple buildings. On waking, priests of the Roman god of healing, Aesculapius, helped them interpret their dreams or visions’

I made it to the temple.

A museum banner in a pillared 18th century interior. On it is a Roman sculpted face, obscured in shadow and picked out in dramatic light. It reads ‘the goddess awaits you at the temple of Sulis Minerva’
A stone sculpted head in a dark space. It’s on a plinth and its mouth and nose have eroded away, leaving blank eyes and an impressive plaited hair arrangement like a crown over her head.
A coffin underfoot, under glass. It is small and yellowish, made of an unknown material. It is enclosed at one end, and warped by its two thousand years.
Behind a statue, its cape hanging in folds, we look down upon the green bath from a height. The statue’s counterparts face it opposite, along the walkway which follows around the edge of the bath. Each of them is permanently posed, guarding or adorning the watery centre.
Up close at the corner of the bath. The cut stone corner descends in steps, the water consumes them in its milky green opacity.
A central view of the bath from the bathside: a green rectangular body of water, Roman pillars surrounding it. A small walkway runs behind the pillars, ending at ancient walls. Above, statues line a balcony, and the windows of other old (but perhaps less ancient) buildings surround it.

Image IDs in Alt Text, Video IDs here:

  1. A green body of water, edged in stone. At this corner, a flat rock – perhaps an ancient seat – is laid over a stream trickling underneath it, from a source behind, into the milky green pool. We zoom out and see more of the walkway behind, and the length of the bath. Pillars surround the edge of the pool, receding into darkness behind. We zoom back in to the gentle trickling. 
  2. Hot steaming water gushes out of an arched hole. Dark and underground, its surfaces stained orange by sulphur or some mysterious element.
  3. A hot, bubbling green thermal spring. It is contained by straight stone edges, cut into a square with a corner lopped off where the wall of a building in the same material meets it.  We zoom into the bubbles, becoming consumed by it.
  4. Water rippling gently in the sun, down its shallow stone path. The surface underneath is stained orange by something, something invisible in the clear water. It flows underneath a stone slab, and into its destination: the large body of green water. We follow its small journey.

REST without AI

This week, Hong Kong was battered by heavy rain, and I took the chance to take a breather and recharge. The last few weeks have been manic. I’ve been working on three software projects at once. The non-stop pace had left me totally overloaded, so this rain break was just what I needed. I decided to visit my wife’s home village, a recharging place in the middle of the city’s forests. The air smelt of earth, and the quiet beauty of the landscape was a nice change. I could feel the tension of my tightly wound days begin to unravel, replaced by a sense of calm that felt long overdue. The mountains were like silent guards, making me think about the balance between creativity and rest.

I might have got myself a little stuck in my searches. I tried a few online image libraries… trawling the many pages of the Wellcome Collection’s catalogue which I still haven’t reached the end of.

I already have examples of what I would like from some of my previous work that I have shared so I’m giving myself a reminder that the task isn’t impossible. I am considering the thought that maybe I’m already surrounded by the images I’m looking for. For example, I have a mug with this John William Waterhouse painting on it.

A 19th-century painting depicting Saint Cecilia seated in a garden, eyes closed in serene contemplation, with an open book resting on her lap. Two angels kneel before her, one playing a violin and the other holding an instrument. Behind them, a stone balustrade overlooks a harbor with ships and distant mountains. The scene is filled with lush roses and greenery, evoking a peaceful, spiritual atmosphere.
Saint Cecilia (1895) by John William Waterhouse

Books on Drawing™️

Wanted to include some phone images I took from a book on drawing people that I found in the local library. There are loads of them on drawing people, cats, dogs, flowers, buildings… It’s all very Drawing™️.

An image of an open book. The left page shows a man's trunk sketched and his head turned to the side. The right page includes a few sketches each focusing on different sections of the trunk.

A little fascinated by the eerie “perfection” of it all. Especially in this book, which was full of sketches and descriptions of muscles that make up a body part and how to combine it all together on a super athletic male body. It’s quite the opposite to what I was hoping to find when I set out on this search for images. It’s almost too healthy and tense. There’s no ease.

An image of an open book. The left page has text explaining how different parts of the head come together. The right page includes a few sketches each focusing on different aspects of a head such as the skull and different perspectives.

A page from a book showing a drawing of a shoulder with every muscle clearly highlighted.

On the left, a vintage ad shows a woman lighting a "Metro" gas burner in a classic interior. On the right, a modern photo in a black frame depicts a hand holding a rain-soaked handrail.

I was digging through old images and enjoyed how, upon opening the image on the left, it brought up the one on the right which was buried under windows and tabs on my screen.