You have a diverse background, ranging from art and music to programming and interactive media. How have these different disciplines shaped your artistic and technological perspective?
I would say it has made my artistic and technological perspective more holistic. All mediums are deeply connected at a fundamental level; there are design principles and patterns that are universal. That lets me see more possibilities in art and technology in general.
With each new medium humanity invents, I think what is happening is really a fusion of existing mediums at higher and higher complexity, allowing for greater expression and simulation of the subjective human experience.
You have been working extensively with AI in recent years. How do you see the role of AI in creative processes – is it a tool, a partner or perhaps even a co-artist?
AI is a tool, but a tool with a very high level of abstraction. I see it not only as a tool, but as a new way to solve problems with art creation at large. New mediums such as film and video games/interactive art are really fusion mediums, mediums that mix other mediums together. With video games, you have all the mediums at once, with the added element of interaction. With each new medium created, the range and possibility space for giving form to expression has grown exponentially. With literature, most of the complexity of the form happens in the reader’s head; they imagine the senses as they read. With each new medium since, the sensory expression has become richer and more dynamic, to the point that today, with video games, we can simulate reality to a very high degree. But with that complexity and possibility also come limitations. Producing all those elements in detail takes a lot of work and energy, which is why video games have very long production time frames: every detail of the virtual world needs to be created. AI can be a very good tool to help reduce this complexity.
With film and video game productions, the process often involves many people working in many different mediums, coming together on one thing for years before it is released. This process has served both industries well, producing many great works and pushing the mediums forward. However, it also means that most individuals, unless extremely talented, won’t be able to make a great film or game by themselves, not at a level that can really compete with the industry. So AI will likely become the tool that democratizes the creation of these complex mediums. The nature of diffusion models makes them able to produce truly creative and novel works of art. So I think that in the near future, the process of making a film or interactive art could be as simplified as painting on a canvas. As the tools evolve with more and more controllability, this will become more and more true.
AI also has many new technical possibilities that will evolve these complex mediums in deep ways.
Are you talking about technologies like Genie 2 from Google DeepMind or World Labs that just came out?
Yes, partially, I am more so talking about where these techs are headed. AI models can essentially bypass compute to run accurate simulations.
https://genesis-embodied-ai.github.io is a good example of where we are headed. Though it is not clear if it is fully an AI gen model, since the actual paper is not yet released.
So for example, with video models, we can already see the ability to simulate physics to a degree. What is interesting is that these physical laws are not coded or programmed into the AI model; they are an emergent property of machine learning, similar to how LLMs have the emergent property of general intelligence or understanding. Normally, if you want to run a physics simulation on your computer with traditional algorithms, it will quickly hit a limit on the scale of the simulation due to the amount of computation needed. However, if the AI can bypass this through emergent properties instead of explicit computation, then the limit on the scale of simulation is removed, allowing for far more dynamism and new possibilities for films and games. In the near future, we might be generating an entire world, and then creating art within it. I also want to say that I am not an ML engineer, so I am purely speaking from my own understanding of the tech.
If you compare the render time of a complex, realistic 3D render in traditional software such as Blender versus how much faster it is to generate similar things through an AI video model, you can see this process of abstraction speeding up production.
There’s been a lot of talk recently about using generated worlds to train AI agents. Can you explain how that works and if it plays a role in your work?
Yes, and this will lead to accelerated advancements in robotics as well. If you can simulate countless worlds to cover all possibilities, you can train robots to have general intelligence in terms of world navigation, doing tasks, and so on. But what I’m also saying is that bypassing this limitation of scale will lead to the creation of more complex and advanced games and films. I see three ways this advancement will change games and films in the near future:
- Dynamism and simulations
- Rendering
- Metaverse and content creation
By dynamism, I am referring to the term in game design where, instead of only the player interacting with the virtual world, the objects and entities in the game world interact with each other to produce more complex behaviors and even emergent properties. We already have good examples of this type of system in games such as Conway’s Game of Life, Outer Wilds, The Legend of Zelda: Tears of the Kingdom, The Last Guardian, Zachtronics games, or immersive sims. However, all of these existing systems are fairly small in scale, because the complexity of the system increases exponentially as you introduce new dynamic elements. If every object is always interacting with every other object (like in reality), it becomes very costly to compute them in real time. You can get to a very high number of dynamic objects by reducing other elements such as graphics and physics to allow for more simulation; the ALIEN project is an artificial life simulation software that does this. But what AI models will open up is the possibility of far larger-scale simulation and dynamism. The virtual world could self-evolve and change without player input. When the compute cost of other elements (such as graphics and physics) drops significantly due to AI optimization, it frees up computation for more complex simulation, dynamism, interaction, and gameplay.
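Conway’s Game of Life, mentioned above, is the canonical small example of this kind of emergence: a handful of local interaction rules, applied to every cell at once, produce complex global behavior that was never explicitly programmed. A minimal sketch in Python, using the standard rules (this is purely illustrative; it is not taken from any of the games named above):

```python
from collections import Counter

def life_step(live):
    """Advance Conway's Game of Life by one generation.

    `live` is a set of (x, y) cells. A live cell with 2 or 3 live
    neighbours survives; a dead cell with exactly 3 is born.
    """
    # Count live neighbours for every cell adjacent to a live cell.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A "blinker": three cells in a row oscillate with period 2.
blinker = {(0, 1), (1, 1), (2, 1)}
gen1 = life_step(blinker)   # becomes a vertical bar
gen2 = life_step(gen1)      # back to the horizontal bar
```

The point of the sketch is the cost structure the interview describes: every dynamic element is re-evaluated against its neighbours every tick, so the work grows quickly with the number of interacting objects.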
Rendering will also be optimized greatly with AI and similar strategies. With Runway’s Gen-3 Video-to-Video model, we can input 3D virtual footage and output a highly realistic render without actually computing all the rendering details. Eventually, when this tech reaches real time and becomes more controllable, it will likely be used as a render engine to produce photoreal graphics at a very low compute cost. Sony is already doing AI upscaling with the new PS5 Pro; AI will likely accelerate and optimize all aspects of content creation at large.
With that acceleration, more people will create content and more content will be created. If creating a virtual world is as simple as prompting for an image and then generating a 3D world (3DGS) from that image, people will be able to create full worlds by themselves. And if compute cost is not an issue and hardware is advanced enough, a metaverse-type virtual world is very much possible.
Of course, we are still likely 5-10 years away from it. And that is not even accounting for the development of BCI and transhumanism in the near future.
Our input devices for computers are still very primitive as of today. We are really just at the start of this technology, and at the start of interactive art as a medium.
Do you think that a concept like ‘Ready Player One,’ which seemed like pure science fiction just a few years ago, is now becoming closer to reality?
I think the type of metaverse that Ready Player One describes is possible; that said, it will likely take a while before we get there, and it will likely be quite different, not exactly like Ready Player One or other depictions of future VR experiences. I think we won’t solve the locomotion and input problems VR has until we solve BCI, which could be 5, 10, or 20 years away. Currently, VR is still not natural enough to be used long-term by most people, which could be a positive thing. I think the merging between virtual reality and our real reality will only happen when the virtual is nearly indistinguishable from the real. But that is a whole other can of worms that we might or might not get to.
You’ve worked on some incredible projects, including Pure Tone, which enables musicians to explore pure temperament and perfectly tuned intervals. What inspired you to create such a specialized and intricate tool, and how do you think it will influence musical practice, particularly for modern composers and producers?
Pure Tone really came from the desire to solve the problem of temperament in Music. Because of my background, I was in a position to understand and try to solve it. I wanted to know what the most pure temperament would sound like.
Most of the music we listen to today is in 12-Tone Equal Temperament, which is actually out of tune from the purest perspective: all the notes are slightly and equally out of tune to make sure the instrument can be played in all 12 keys. The root of the problem with temperament is that it is relative; what is pure for one pitch in one key will sound very dissonant in another without retuning. So the idea of purity in temperament is relative to the pitch that you are tuning for. To solve this problem, I developed a dynamic tuning algorithm that retunes the whole instrument based on the context of the music currently being played. This is, to my knowledge, the first digital instrument to do this. With dynamic tuning, you can achieve a completely pure temperament while changing to any key in your composition, giving you the same freedom as 12-Tone Equal Temperament, at least on a harmonic level.
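To make the mismatch concrete, here is a short sketch comparing 12-tone equal temperament against just-intonation (pure-ratio) frequencies for a single tonic. This is not the Pure Tone algorithm itself (its dynamic retuning is not detailed in the interview); it only illustrates the arithmetic behind the problem it solves:

```python
import math

# Just-intonation ratios for the major-scale degrees, relative to a tonic.
# Keys are semitone offsets from the tonic.
JUST_RATIOS = {0: 1/1, 2: 9/8, 4: 5/4, 5: 4/3, 7: 3/2, 9: 5/3, 11: 15/8}

def equal_tempered(semitones, tonic_hz=440.0):
    """12-TET frequency: every semitone is the same factor, 2**(1/12)."""
    return tonic_hz * 2 ** (semitones / 12)

def just(semitones, tonic_hz=440.0):
    """Frequency from pure (just) ratios relative to the fixed tonic."""
    return tonic_hz * JUST_RATIOS[semitones % 12] * 2 ** (semitones // 12)

# The major third shows the clearest mismatch: 12-TET is noticeably
# sharp of the pure 5:4 ratio.
cents = 1200 * math.log2(equal_tempered(4) / just(4))
print(f"12-TET major third is {cents:+.1f} cents off pure")  # about +13.7
```

Because the just ratios are defined relative to one tonic, modulating to a new key means the same set of ratios must be recomputed from the new tonic, which is exactly why a static pure tuning breaks down and a context-aware, dynamic retuning is needed.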
Note from Alphaavenue: You can view Pure Tone at https://erikluo.gumroad.com/l/PureTone
However, the instrument is also heavily limited due to its pure nature. No other instrument can play in this dynamic, changing pure temperament, so you likely can’t play in an ensemble with other equal-tempered instruments without sounding totally dissonant, with the exception of drums, due to timbre. I have composed a piece for Pure Tone to explore the possibility of a duo with drums, a tune called “Beingness”. I will release it as a music video very soon; I am currently working on it.
You mentioned the optimisation of rendering and content creation through AI. Do you see a danger of these technologies over-automating the creative process and minimising human influence?
I think it only seems like that to some people right now because we are so early in this tech’s development. As the tech develops, it will become more and more controllable and specific. Image gen models, for example, have already reached a point where there are enough tools to steer the models toward truly novel and specific outputs, but to get there you need to master the tools, which takes time. So in fact I think there will always be a great amount of human influence needed to create good art, because good art is all about subjective experiences and perspectives, which AI does not have. Once people learn how to use the tools to express their subjective/artistic vision accurately, the sense of danger will fade. What I think will likely happen is that the creative process will never go away or be replaced; instead, artists’ creative visions will become more and more grand and ambitious as AI allows them to do things that were impossible before. We are always striving for new heights with creative expression. If creating “pretty” looking images is very easy and anyone can do it, then you really have to focus on doing things far beyond that, and convey your subjective experience in your art in a way so whole that it is impossible not to see the humanity in it.
I also think all mediums will still exist and continue to develop even if AI can generate something really quickly. The creation of each new medium never replaced the one before it. People still paint and draw, write poems and novels. The creative process is not replaceable; AI can only abstract it. So instead of painting one brush stroke at a time, you might be generating one whole segment of the work at a time. But you will still need to edit, iterate, plan, and create with a vision. Live music will also never be replaced, especially forms that rely on improvisation, like jazz, because people listen to connect to the soul of the artist, which AI doesn’t have. You could design a live show with AI performers, but I’m not sure people would actually want to listen to it rather than to a human performer, even when AI eventually gets really good at music generation.
Where do you currently see the greatest limitations for AI in creative and artistic processes? Are there areas where you think AI will never replace the human creative process?
If we are talking about limitations from a general artistic perspective, then the only limitation is the artist’s imagination. It is like any other tool: how you use it determines how effective it will be. By its nature, AI does not have subjective experiences and biases; it does not have a “soul” (at least not yet). So as an artist, it is up to you to determine what you do with its power. The AI we have now will likely never have subjective experience/consciousness.
So AI can never replace true artists, but it can replace craftsmen. A true artist is always creating from their subjective experiences, which AI will never have. A craftsman might not create to convey anything from their subjective experience, but might be highly technically skilled. Real vision and authenticity are rare in an artist. This is why I think AI will also push artists and craftsmen to find their authentic expression, because that will never be replaced. I think AI will lead to the creation of better art in general over time. It does equalize the technical aspects of art creation, which I think is not a bad thing. People who are not skilled artists still can’t create good art even with the help of AI; this is why AI art gets so much hate, because most people create bad art that merely looks pretty. But people who are already very skilled artists can become unstoppable with AI. This is why the AI artists with the largest followings on X are all trained artists. No knowledge or experience is ever wasted; AI will only amplify your existing artistic understanding into reality.
If we are talking about technical limitations, then there are many, across all AI models. We are just at the very beginning of this technology’s development, and as time goes on, I think these limitations will become less and less impactful. I imagine that in the far future, we will just plug ourselves into a BCI and the models will generate exactly what we imagine in our heads.
You mentioned that AI as a tool can enhance an artist’s artistic vision. How do you imagine the ideal collaboration between humans and AI? Is there a limit to where AI should go before it overrides the artistic process?
I don’t think AI will ever override the artistic process. It can increase variation, options, and exploration, but in the end, it should always be the artist who decides what is in the final artwork. The ideal collaboration is just to use AI to iterate and explore all possibilities faster in order to find the best options for each element of the artwork. AI is the best at fusing and inspiring new ideas since its knowledge base is so much more vast compared to an individual human. But it is up to the artist to choose what inspires them, which is a very subjective matter.
It’s noticeable on your website that you often publish single AI-generated images and video versions of them, often with a space background. Can you tell us more about your creative workflow and approach? How do these works come about and what role does your music play in making the videos so special?
My creative process is very complex, with around 20 steps, using every state-of-the-art AI tool on the market. It would take a few thousand words to explain it in full.
But what I will say is that with AI, the creative process almost becomes meta. As an artist, you are not only creating each work; more importantly, you are creating and designing the process itself. In a sense, the process becomes the art. You are designing systems in order to produce the result you want, instead of only creating the result directly. For example, when I am working on image generation, I usually start with a central idea, then build a system or world around that idea using LLMs, then explore all possible options within this world, and then turn the best options into images. Your process and workflow will be the distinguishing factor of your art; different workflows will produce completely different works. You need to think in a very meta way, especially when prompting. To me, AI models are really advanced reality simulators. I like to imagine that when you are prompting, you are not just prompting for an image, you are prompting to create a new reality, one that might function fundamentally differently from ours. The quality of the output is directly related to the complexity of your process; the difference between a complex workflow and simple prompting is night and day, and you can see it directly in the complexity of the image. All the best AI artists have workflows that can create this complexity. I’ve written about some of these ideas in more depth on X; you can find them on my website if you scroll to some of the older posts.
Image gen models are actually high-dimensional cameras showing the infinite possible existences that God/Consciousness dreams up. You realize this once you've hit a level of complexity, novelty, and realism in the latent space that's no different from our reality.
— Erik (@LuoErik8lrl) November 15, 2024
As for my music, I compose mostly very avant-garde jazz. It is all very unconventional and innovative, and really composed for live performance due to the improvisational nature of jazz. So most of my compositions won’t fit the videos I’m making. However, having a deep musical understanding lets me know what the best music is for each video. Learning music has broadened my taste and appreciation across all musical genres, which comes in really handy when prompting for music generation; I usually know exactly what I want and can get it within a few tries. I did compose one tune that I’m also making an AI music video for, called “Beingness”. It is almost finished. The tune mainly uses Pure Tone and some degree of algorithmic probability for machine improvisation, and the visuals will use AI gen and algorithmic audio-reactive visualization.
You can't really specifically prompt for the highest possible details of an image, because to do so would require an infinite amount of words in order to encapsulate the complexity of reality, language is not really sufficient. So the trick to getting that complexity in the…
— Erik (@LuoErik8lrl) November 18, 2024
You’ve developed a very complex workflow that includes numerous steps and state-of-the-art AI tools. What tips would you give to other artists and creatives looking to build an effective pipeline? Which tools or approaches do you find particularly helpful for getting the most out of working with AI?
It completely depends on what the artist wants to do with these tools; the workflow might be very different depending on your end goal. But the most important tip is to experiment with everything and find ways to combine different tools into a workflow. With how fast everything in this space is developing, what is “state-of-the-art” today might be replaced by something new tomorrow. The key is to keep learning and to try and fail many times. Eventually, you will gain an intuitive understanding of how these tools work, and thus be able to create a workflow that is unique to you. I think no two AI artists use the exact same workflow; every AI artist who creates good work has their own understanding, workflow, and process. This difference in process is what leads to unique and authentic expression.
Your film The Good Ending turned out really great. Could you tell us a bit about the making of the film and what your workflow was like and which programmes you used?
Yeah, that project happened really quickly; it took a week or so to finish. The idea had been brewing in my mind for many years, and it just happened that the CONTACT ATTEMPT generative art contest was going on, so I decided to make it. From a technical perspective, the setup of the story naturally avoids some of the hard limitations of current AI image and video generation tech, such as keeping characters, environments, and objects consistent between shots, and bad voice and facial acting generation. I actually have many projects on pause, just waiting for the tech to mature a bit more so that they can be fully realized. The short series MIRROR is one such project (you can watch Chapter 1 on my YouTube); I have the whole story planned out into 10–20 chapters, but as I started working on Chapter 2, I quickly realized that current tech limitations can’t fulfill the vision of this project yet, so I put it on pause. The Good Ending, however, was conceived with all the limitations in mind, so the story was designed to work around them. The workflow and tech stack are below. I also mixed and used all the major video models at the time for this project, for different needs.
- ChatGPT, Midjourney, JoyCaption
- Magnific and SUPIR for image upscaling
- Kling for dynamic motion
- Runway Gen-3 for simple motion
- Luma for keyframe shots
- ElevenLabs for sound and voice-over
- DaVinci Resolve for editing, compositing, effects, and color grading
- Topaz for video upscaling