Weekly AI Technology Discoveries Summary - Week 43

Oct. 27, 2024 | Ryan Fitzpatrick

The artificial intelligence landscape continues to evolve at a breathtaking pace, with this week bringing particularly exciting developments across multiple domains.

From groundbreaking video generation capabilities to sophisticated real-time interaction tools, the boundaries of what's possible with AI continue to expand.

Reimagining Image Creation and Interaction

The way we interact with AI-generated imagery is undergoing a fundamental transformation. Midjourney's latest release exemplifies this shift with its new web-based editing tool, which introduces sophisticated retexturing capabilities. This isn't just another image editor; it's a complete reimagining of how artists can refine and adjust their AI-generated works, offering unprecedented control through intuitive text prompts.

Meanwhile, Krea.ai, another discovery of mine this week, offers a host of robust tools for Flux and Stable Diffusion, including real-time AI interaction with its innovative Real-Time Canvas. By integrating Flux and Stable Diffusion, they've created something that feels less like traditional AI image generation and more like painting with an intelligent assistant. Artists can now see their ideas materialize in real time, bridging the gap between imagination and creation.
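
Under the hood, a canvas like this boils down to fast image-to-image: every time you paint a stroke, the rough canvas gets re-denoised into a finished frame. Here is a minimal sketch of that idea using Diffusers with SD-Turbo; it illustrates the technique, not Krea's actual stack, and the file names and prompt are placeholders.

```python
# Rough sketch of a real-time canvas loop: run a fast img2img pass over the
# user's sketch whenever it changes. Illustration only, not Krea's implementation.
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to("cuda")

def refine_canvas(canvas: Image.Image, prompt: str) -> Image.Image:
    """Turn the current rough canvas into a finished preview in a couple of steps."""
    return pipe(
        prompt=prompt,
        image=canvas.resize((512, 512)),
        strength=0.5,            # how far the model may drift from the user's strokes
        num_inference_steps=2,   # strength * steps must be >= 1 for turbo models
        guidance_scale=0.0,      # turbo models are distilled to run without CFG
    ).images[0]

# In an interactive app, each brushstroke would re-trigger this call.
preview = refine_canvas(Image.open("rough_sketch.png").convert("RGB"),
                        "a lighthouse on a cliff at sunset, oil painting")
preview.save("preview.png")
```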

They also released a video extend feature at the end of the week for all their supported video APIs. This one is interesting: you provide the start and end as either video or image, and it blends the middle together for you, basically a quicker way of joining clips without doing the manual edits and busywork yourself.
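
For a sense of what that automates, here is the manual version: a straight cross-fade between two clips. A minimal OpenCV sketch, assuming both clips share the same resolution and frame rate; file names and the overlap length are placeholders, and this is the busywork being replaced, not how Krea generates the in-between frames.

```python
# Manual cross-fade between two clips: write clip A, alpha-blend an overlap
# window into clip B, then write the rest of B.
import cv2

def read_frames(path: str) -> list:
    """Load every frame of a video into memory (fine for short clips)."""
    cap, frames = cv2.VideoCapture(path), []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames

def crossfade(clip_a: str, clip_b: str, out_path: str, overlap: int = 24, fps: int = 24) -> None:
    a, b = read_frames(clip_a), read_frames(clip_b)
    h, w = a[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in a[:-overlap]:                 # clip A up to the transition
        writer.write(frame)
    for i in range(overlap):                   # blended middle
        alpha = (i + 1) / (overlap + 1)
        writer.write(cv2.addWeighted(a[-overlap + i], 1 - alpha, b[i], alpha, 0))
    for frame in b[overlap:]:                  # rest of clip B
        writer.write(frame)
    writer.release()

crossfade("clip_a.mp4", "clip_b.mp4", "blended.mp4")
```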

Video Generation: Open Source Pushes Toward Feature Parity

The video generation space has seen particularly dramatic advances this week. Genmo's release of Mochi-1 represents a significant milestone in democratizing AI video creation. As an open-source alternative to established platforms like Runway's Gen-3, Mochi-1 promises to put powerful text-to-video capabilities in the hands of creators who prefer open-source solutions. You just need a couple of H100s and you're set, but hopefully, given its open-source nature, it will be optimized over time to run with lower memory requirements.
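
Some of that memory work already shows up in the Hugging Face Diffusers port (MochiPipeline). A rough sketch, assuming that integration rather than Genmo's own reference repo; CPU offload and VAE tiling are what pull it below the multi-H100 tier, at the cost of speed.

```python
# Rough sketch of running Mochi-1 through the Diffusers MochiPipeline with the
# standard memory savers enabled; prompt and frame count are placeholders.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained(
    "genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()   # keep most weights on the CPU between layers
pipe.enable_vae_tiling()          # decode the video latents in tiles to cap VRAM

frames = pipe(
    "a sailboat gliding across a glassy alpine lake at sunrise",
    num_frames=84,
).frames[0]
export_to_video(frames, "mochi_sample.mp4", fps=30)
```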

In the realm of character animation, two notable discoveries stand out.

- LivePortrait by Kijai is a collection of custom ComfyUI nodes that brings professional-grade lip synchronization to local setups, including img2vid and vid2vid lip-sync workflows.

- Runway's new Act-One motion capture feature complements this with impressive facial and body gesture synchronization, opening new possibilities for character animation.

Compare for yourself below:

Breaking New Ground in 3D

The 3D modeling space has seen remarkable progress, with Tripo3D's enhanced procedural modeling system leading the charge. Their updated platform, complete with new libraries and presets, is making VR-ready model generation more accessible than ever. Check out some results here: Tuna in 3D & Sushi In 3D

Viggle continues to gain popularity as it introduces the Turbo v2 model, including templates for multiple characters, with support for custom multi-character templates coming soon. It's quite incredible how the model understands the source video and then applies a decent 3D model based on a single image of the character, with generations usually taking about a minute. Its proprietary JST-1 model is specifically designed to understand videos and 3D characters, replacing or re-animating scenes with new and interesting characters. They also offer a relaxed mode when you run out of tokens, so you can keep generating past your monthly allotment, just at a slower pace. I wish more services adopted this model; it's very generous in a world where the currency is tokens per dollar per watt.

A local option seems to exist as well; I just found it today and am in the process of installing it to run on a 3090 Ti.

Source code can be found here: https://github.com/deepseek-ai/DreamCraft3D/tree/main

Also impressive is Rendernet's breakthrough in character-to-character transfer technology. Their system's ability to maintain character consistency across different scenes outperforms even established platforms like Midjourney, marking a significant advance in character portrayal fidelity.

Language Models: The Next Generation

The language model space continues to evolve with Anthropic's latest Claude announcements: an upgraded Claude 3.5 Sonnet and a new Claude 3.5 Haiku. These updates bring notable improvements in conversational capabilities and context handling, particularly in creative tasks and content creation. The enhanced Q&A responses and more nuanced interactions suggest we're moving closer to truly natural human-AI collaboration.

The upgraded Claude 3.5 Sonnet also ships with a computer use capability, released in public beta, that gives the AI a level of direct interaction with computers and could revolutionize the way we work. Claude can now scroll through screens, move the cursor, click, and type. This functionality, while not perfect, is a big step towards fully independent AI.
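
For developers, the capability is exposed through the Messages API as a set of beta tools: the model requests actions (screenshot, click, type) and your own agent loop executes them and returns the results. A minimal sketch with the Anthropic Python SDK, using the 2024-10-22 beta identifiers and omitting that execution loop:

```python
# Minimal computer-use request via the Anthropic SDK. The model answers with
# tool_use blocks (screenshot, left_click, type, ...) that your own code must
# execute on a real or sandboxed desktop and feed back as tool_result messages.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[
        {
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": 1280,
            "display_height_px": 800,
        },
        {"type": "bash_20241022", "name": "bash"},
    ],
    messages=[{"role": "user", "content": "Open a browser and check tomorrow's weather."}],
)

for block in response.content:   # inspect the actions the model wants to take
    print(block)
```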

Looking Ahead

This week's developments represent more than just incremental improvements; they signal a fundamental shift in how we create and interact with AI-generated content. The convergence of real-time interaction, sophisticated video generation, and improved character consistency suggests we're entering a new phase in the AI revolution, where the tools are becoming more intuitive, powerful, and accessible.

As these technologies continue to mature, we can expect to see even more exciting developments in the coming weeks and months. The barriers between imagination and creation are becoming increasingly thin, and the tools available to creators are more powerful than ever before. The future of AI-assisted creation looks brighter and more interesting than ever.