Meta introduced a generative artificial intelligence (AI) model “CM3leon” (pronounced like chameleon) that performs both text-to-image generation and image-to-text generation.
CM3leon is the first multimodal model trained with a recipe adapted from text-only language models: a large-scale retrieval-augmented pre-training stage followed by a multitask supervised fine-tuning (SFT) stage. The approach is straightforward yet yields a robust model, showing that tokenizer-based transformers can be trained as effectively as existing generative diffusion-based models. Remarkably, despite using five times less compute than previous transformer-based methods, CM3leon achieves state-of-the-art performance in text-to-image generation.
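To make the recipe more concrete, the sketch below shows one way a retrieval-augmented training example could be assembled: each training document is concatenated with its nearest multimodal neighbours so the model conditions on retrieved context while predicting the target tokens. The toy retriever, the sentinel tokens, and the function names are illustrative assumptions, not Meta's actual implementation.

```python
from typing import List

# Hypothetical document-boundary tokens for a mixed-modal token stream
# (assumed for illustration; not CM3leon's actual vocabulary).
BOS, EOD = "<s>", "</doc>"

def retrieve_neighbours(doc_tokens: List[str], corpus: List[List[str]], k: int = 2) -> List[List[str]]:
    """Toy stand-in for a dense retriever: rank corpus documents by token overlap."""
    def overlap(other: List[str]) -> int:
        return len(set(doc_tokens) & set(other))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def build_retrieval_augmented_example(doc_tokens: List[str], corpus: List[List[str]]) -> List[str]:
    """Prepend retrieved documents to the target document, so the model
    sees the retrieved context before predicting the target tokens."""
    sequence: List[str] = []
    for neighbour in retrieve_neighbours(doc_tokens, corpus):
        sequence += [BOS] + neighbour + [EOD]
    sequence += [BOS] + doc_tokens + [EOD]
    return sequence

# Tiny usage example with word-level "tokens".
corpus = [["a", "photo", "of", "a", "dog"], ["stock", "chart"], ["a", "dog", "on", "grass"]]
print(build_retrieval_augmented_example(["a", "small", "dog"], corpus))
```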
The model combines the versatility and strength of autoregressive models with low training cost and efficient inference. Termed a causal masked mixed-modal (CM3) model, CM3leon can generate sequences of text and images conditioned on arbitrary sequences of image and text content, going well beyond earlier models that handled only text-to-image or only image-to-text generation.
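The "causal masked" objective behind the CM3 name can be illustrated with a small token-level transformation: spans are cut out of the sequence, replaced by mask sentinels, and re-emitted at the end, so a left-to-right decoder learns both ordinary continuation and infilling over mixed text and image tokens. The sentinel names and the helper below are assumptions for illustration, not the model's actual tokenisation.

```python
from typing import List, Tuple

MASK, INFILL = "<mask:0>", "<infill>"  # illustrative sentinel tokens

def causal_mask_transform(tokens: List[str], span: Tuple[int, int]) -> List[str]:
    """Rewrite a sequence in CM3 style: the span [start, end) is removed from
    its original position and appended after an infill marker, so an
    autoregressive decoder is trained to fill the gap using both the left
    and right context."""
    start, end = span
    prefix, masked, suffix = tokens[:start], tokens[start:end], tokens[end:]
    return prefix + [MASK] + suffix + [INFILL, MASK] + masked

# Example: mask the image tokens inside a mixed text/image-token sequence.
seq = ["A", "photo", "of", "<img_tok_17>", "<img_tok_42>", "on", "a", "beach"]
print(causal_mask_transform(seq, (3, 5)))
```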
According to the company, CM3leon’s capabilities enable its imaging tools to follow prompts more easily and produce more consistent images.
On the most widely used image-generation benchmark, zero-shot MS-COCO, CM3leon achieves an FID (Fréchet Inception Distance) of 4.88, establishing a new state of the art in text-to-image generation.
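For context, FID measures the distance between Gaussian fits to Inception features of real and generated images; lower is better. Below is a minimal sketch of the standard computation (not code released with CM3leon), assuming feature matrices for the two image sets have already been extracted.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID between two feature sets (rows = images):
    FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2))."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):   # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2.0 * covmean))

# Toy usage with random features (real evaluations use Inception-v3 activations).
rng = np.random.default_rng(0)
print(frechet_inception_distance(rng.normal(size=(256, 64)), rng.normal(size=(256, 64))))
```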
Additionally, Meta said CM3leon excels at various vision-language tasks, such as visual question answering and long-form captioning.