With no crew, no camera, little money and a lot of words, anyone can become a filmmaker today. Satyen K. Bordoloi chronicles the history of the AI filmmaking movement with an emphasis on the week that rewrote cinema history.


In 2001: A Space Odyssey, the appearance of the black monolith throughout history signalled humanity’s leap forward. It eventually led to AI and, via a disobedient AI system, HAL 9000, to the next evolution of humanity itself. In the last 250 years of science, the world has seen many such monolith moments. The one for cinema came 130 years ago with the invention of the motion picture camera, and with it, filmmaking.

A year ago, between June 6 and 12, another monolith appeared: Kling AI, launched by China’s Kuaishou Technology on June 6, and Luma AI’s Dream Machine, which erupted across the globe on June 12. In that week, the camera died. Not literally, of course. But as social feeds worldwide flooded with videos conjured from words and images alone, we crossed a threshold into the age of No-Camera Filmmaking. A year later, we still stand in the aftershocks of that earthquake in the media landscape.

AI killed the camera

Pre-History: The Age of AI

The dream of Artificial Intelligence began not with silicon, but with a soul: that of Alan Turing. The father of computing and artificial intelligence envisioned, in the 1940s and 50s, machines that could “think”. By 1966, ELIZA, a rudimentary conversation program, mimicked human dialogue so well that it fooled users. The minds of scientists had gone where computing hardware had yet to reach, yet through many AI winters and summers, researchers kept pursuing the idea even as hardware limitations stifled progress.

This research birthed, in the 1980s and 90s, RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory networks), architectures that process sequences such as language. They became the basis for computing systems to grasp the most essential thing: context.

Then came 2012, which changed AI more than any year before it. It was the year neural networks learned to see. AlexNet, a Convolutional Neural Network (CNN), demolished rivals at the ImageNet Challenge. Meanwhile, in the famous “Cat Experiment”, Google fed a neural network 10 million unlabelled images taken from YouTube videos, and it learned to recognise cats on its own, proving neural nets could make sense of the tangible world. The AI renaissance, the era of deep learning, had begun.

GANs, VAEs, and The Attention Revolution

In 2014, Ian Goodfellow and his colleagues developed GANs (Generative Adversarial Networks), and they changed everything. The clue is in the name, but imagine a digital arena like the Colosseum in Rome: a generator creates images, while a discriminator tries to expose them as fakes. Their adversarial back-and-forth pushed generators to produce data that was synthetic but felt real. Alongside them, VAEs (Variational Autoencoders) gave AI a mathematical backbone for generating coherent forms.
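
For the technically curious, here is a minimal sketch of that adversarial tug of war in PyTorch. The tiny networks and random data below are hypothetical stand-ins, not the models from Goodfellow’s paper; they only illustrate how the generator and discriminator train against each other.

    import torch
    import torch.nn as nn

    latent_dim, data_dim, batch = 16, 64, 32
    # Toy generator and discriminator (illustrative stand-ins, not real image models)
    G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
    D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCELoss()

    real = torch.randn(batch, data_dim)  # stand-in for a batch of real images

    # Discriminator step: learn to call real data 1 and generated data 0
    fake = G(torch.randn(batch, latent_dim)).detach()
    d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: learn to fool the discriminator into calling fakes real
    fake = G(torch.randn(batch, latent_dim))
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

Run this loop over a real dataset for long enough and the generator’s output starts to look like the data it has never copied, only competed against.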

However, the real breakthrough, the moment generative AI was born, came in 2017. Google researchers’ paper “Attention Is All You Need” introduced the Transformer architecture. Its self-attention mechanism allowed AI to weigh the relationships between words and finally grasp context well. This led to LLMs (Large Language Models), text-to-image tools, and AI video tools. And here is the weird coincidence: the paper was released on June 12, 2017, exactly seven years before Luma AI’s Dream Machine.
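
For readers who want to peek under the hood, here is a minimal sketch of the scaled dot-product self-attention idea in PyTorch. It is a simplification: a real Transformer uses learned query, key and value projections and multiple attention heads, which are omitted here.

    import torch
    import torch.nn.functional as F

    def toy_self_attention(x):
        # x: (sequence_length, d_model) word embeddings
        d_model = x.size(-1)
        # Score how strongly each word relates to every other word
        scores = x @ x.transpose(0, 1) / (d_model ** 0.5)
        weights = F.softmax(scores, dim=-1)  # each row sums to 1
        # Each word becomes a context-aware blend of the whole sequence
        return weights @ x

    sentence = torch.randn(5, 8)  # five "words", eight-dimensional embeddings
    print(toy_self_attention(sentence).shape)  # torch.Size([5, 8])

The crucial point is that every word looks at every other word at once, which is how the architecture “weighs word relationships” and grasps context.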

The Image Revolution: From Blobs to Bollywood

One of the early images created with DALL-E from the prompt: A photorealistic image of an astronaut riding a horse

As early as 2015, alignDRAW spat out pixelated abstractions, proof that text could generate images, however crudely. Later, it was GANs that sparked real improvement. In January 2021, OpenAI’s DALL-E announcement stunned the world: it could create images out of simple words. Released for public use only a year and a half later, in July 2022, it proved revolutionary immediately. Type “A photorealistic image of an astronaut riding a horse” and voilà, an astronaut on horseback against the backdrop of space emerges, as if from a surrealist’s dream, onto your screen. Others, like Midjourney and Stable Diffusion, turned amateurs into digital artists.
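
For a sense of how simple this is in practice, here is a rough sketch using OpenAI’s current Python SDK. The model name and parameters are assumptions for illustration, and the 2022 release was accessed through a web interface rather than code.

    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in your environment
    result = client.images.generate(
        model="dall-e-3",  # assumed model name; adjust to whatever your account offers
        prompt="A photorealistic image of an astronaut riding a horse",
        size="1024x1024",
        n=1,
    )
    print(result.data[0].url)  # URL of the generated image

One sentence in, one image out: that single loop is the entire workflow that turned amateurs into digital artists.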

This was created in July 2022, when the first generation of text-to-image tools was released

In Bollywood, my screenwriting partner Sumit Purohit, screenwriter and editor of the wily Scam 1992, became an early adopter. He started with DALL-E but found his true mojo with Midjourney, using it to craft whimsical vignettes: leopards commuting on Mumbai locals, the history of giant goats in Uttarakhand.

Birth Pangs: Video’s Cambrian Explosion (2022-2024)

As text-to-image models matured, AI companies raced to figure out how to animate their output. Runway’s Gen-1 (early 2023) could apply the style of an image or a text prompt to existing videos; Gen-2 could make short clips from text alone, though they were often distorted. Pika Labs generated equally rough clips, while Google’s Imagen Video and Phenaki and Meta’s Make-A-Video also struggled with coherence in high definition.

The results were bad, though that did not stop AI film festivals from being organised. These featured “films” assembled from AI images and the crude AI videos of the time, stitched together with human voiceovers and sound effects added at the editing table to give a semblance of cinema.

The video series titled ‘Will Smith Eating Spaghetti’ became an informal benchmark for AI-generated video

One of the worst results of those early days came from a Reddit user who used the tools then available to post, on March 23, 2023, “Will Smith Eating Spaghetti”, an AI-generated nightmare of morphing noodles and the melting face of Will Smith. It went viral not for its quality, but precisely for the lack of it. Overnight, it became the Uncanny Valley Benchmark of AI video, as if teasing each AI company: could your model make Will Smith eat spaghetti without traumatising humanity?

What was clear from all these bad early attempts was that convincing AI-generated video was no longer a matter of if, but when.

Though OpenAI teased Sora with these videos generated by its model, it was not the first to release a text-to-video tool to everyone in the world

And that is what happened. On February 15, 2024, OpenAI announced Sora and released videos created with the model, and the world gasped. Here were photorealistic videos with complex camera motion, emotional depth, and temporal understanding. No-Camera Filmmaking was now just a matter of which text-to-video tool would be released to the world first. Google, not to be outdone, announced Veo at its I/O conference in May with some videos of its own, but Sora had stolen the buzz.

Around this time, in May 2024, I joined an AI startup in Mumbai, Phenomenal AI, that was trying to build a text-to-video model in India. We announced our beta on July 7 but have yet to reach the level of the global players.

But the actual earthquake came in…

The Week the Camera Died

June 6–12, 2024: On June 6, 2024, Kuaishou Technology’s Kling AI launched in China, rendering cinematic scenes in minutes. The rest of the world held its breath, but not for long: on June 12, Luma AI, led by Mumbai-born CEO Amit Jain, unleashed Dream Machine globally. Social media exploded. Memes, music videos, and micro-films flooded timelines. For the first time, anyone could direct without needing a camera.

Luma AI announced its text-to-video model, Dream Machine, with this video on June 12, coincidentally the seventh anniversary of the transformative Transformer paper

I generated videos myself: a dancer twirling in a monsoon, a cyberpunk Mumbai alley. But motion-heavy scenes? Still glitchy. I tried to make a car chase sequence, but the video looked like something Salvador Dalí might have painted, with cars morphing into one another. When a video did not demand extreme motion, though, it blurred the line between what was real and what was AI pixel dust.

OpenAI had teased this future early with Sora’s curated show of clips (some of which, we learnt later, were VFX-heavy). Still, it was only with Luma that everyone could make cinematic-quality videos for free and finally see what this technology could do, and where it was headed: towards a new art form, the no-camera cinema. And you know the cosmic coincidence? June 12, the day Dream Machine launched, marked seven years since “Attention Is All You Need” birthed the Transformer, the architecture that made generative AI, and thus Dream Machine, possible.

This might, today, be the first and only article to call June 6-12 No-Camera Filmmaking Week. But mark my words: in a couple of decades, that is how this week will be celebrated.

Bollywood’s Digital Big Bang

It was as if century-old shackles had been thrown off. And why not? If you could make a video for a few hundred rupees that just days earlier would have cost millions to produce, wouldn’t you unleash your creativity on things you had never even fantasised about? That is what I saw worldwide as music videos, short films and even full-length films were attempted, and many were released. And it was happening everywhere, from Timbuktu to Tamil Nadu.

This image, created by my screenwriting partner Sumit Purohit on Midjourney, fooled me into believing it was real

Take Sumit Purohit. I told him about the new tools days after Kling’s and Luma’s releases. He spent nights experimenting with them, using Midjourney, Kling, Luma and ElevenLabs to craft trailers for unmade films that had so far been locked inside his head but which we could now see. One, a noir thriller set in colonial India, left producers speechless. It conjured crowds, costumes, clouds, and chiaroscuro lighting without a single DoP or assistant. Bollywood’s boardrooms now buzz with AI, not as a gimmick but as a co-director. And Sumit’s phone has not stopped ringing with calls from producers since.

The Floodgates Open: Gen-2 and the New Pioneers

Since that seismic week, the field has detonated. Runway’s Gen-3 Alpha (July 1, 2024) delivered eerie realism; Pika 1.5 (October 1, 2024) mastered motion physics. China’s Hailuo, Alibaba’s Qwen, and Kling 1.5 pushed the boundaries, while LTX Studio empowered storyboard-to-film workflows. Open-source challengers like Hunyuan Video and Mochi put the underlying models in anyone’s hands. Even Adobe Firefly Video entered the fray.

With Veo 3, AI videos leave the uncanny valley and enter the desert of the real, where you cannot tell what is real and what is AI-generated, as in this Veo 3-generated video

And then, just a few weeks ago, at Google’s I/O 2025 conference on May 20, Google released its latest iteration, Veo 3. This set off another Cambrian explosion of creativity, because the tool merges video, dialogue and sound effects into a single, complete output. If you use a phone or laptop with an internet connection, it’s unlikely you haven’t yet seen a video made with Veo 3. It is that real. I have reported on AI for eight years, and even I have been fooled by a few of them. With Veo 3, telling the fake from the real in video is no longer possible. Everyone’s commenting on its repercussions, so I won’t bore you with mine.

The Ghost in the Machine

One year ago, between June 6 and 12, we buried the camera’s monopoly on filmmaking. The true democratisation of filmmaking was brought about not by cheap mobile cameras but by text-to-video AI. The lens taught us to capture light; AI teaches us to manifest our visions and fantasies with nothing but words and images. Every AI-generated frame today, whether a child’s bedtime story visualised or a protest film birthed under censorship, emanates from that week when we traded apertures for algorithms.

In 2001: A Space Odyssey, the monolith wasn’t an endpoint. It was a doorway into a new future for humanity. As we enter year two of No-Camera Filmmaking, remember that every single one of us is no longer merely a recorder of reality but an architect of awe. The camera framed what is. AI imagines what could be.

So, happy No-Camera Filmmaking Week, dear reader. May your prompts bend light. And with it, give birth to a new reality.


Satyen is an award-winning scriptwriter and journalist based in Mumbai. He loves to let his pen roam the intersection of artificial intelligence, consciousness, and quantum mechanics. His written words have appeared in many Indian and foreign publications.
