Generate voice-cloned speech from text and audio
Detect objects in images or videos
Generate images preserving face identity
Generate realistic audio from text