Meta Platforms has revealed that its new Meta AI chatbot has been trained using publicly available data from Instagram and Facebook posts, excluding private content shared with family and friends.

According to Reuters, Nick Clegg, Meta’s President of Global Affairs, emphasized the company’s commitment to respecting user privacy by excluding datasets heavy in personal information and filtering private details out of the public data used for training.

Clegg clarified that private chats on Meta’s messaging services were not utilized for training the AI model. The company has taken proactive measures to ensure that private and copyrighted materials are not incorporated into its AI systems, given the potential issues surrounding copyright infringement and privacy concerns. Meta has also refrained from using websites like LinkedIn due to privacy concerns.

During Meta’s annual Connect conference, CEO Mark Zuckerberg introduced Meta AI as a significant product among the company’s latest consumer-facing AI tools. The virtual assistant utilizes a custom model based on Meta’s Llama 2 large language model and Emu, a new model capable of generating images in response to text prompts. Meta AI can generate text, images, and audio while also providing real-time information through a partnership with Microsoft.

As mentioned earlier, public Instagram and Facebook posts, including both text and photos, were used to train Meta AI. The image generation elements were trained using these posts, while the chat functions relied on the Llama 2 model with additional publicly available and annotated datasets. Meta stated that user interactions with Meta AI may also contribute to future feature improvements.

Clegg emphasized that Meta has implemented safety restrictions to prevent the generation of realistic fake images of people, especially public figures. He acknowledged the likelihood of litigation regarding the reproduction of copyrighted content and expressed Meta’s belief that the fair use doctrine covers the limited use of protected works for purposes such as research, commentary, and parody.

When asked about steps taken by Meta to avoid the reproduction of copyrighted imagery, a spokesperson pointed to the new terms of service, which prohibit users from generating content that could violate intellectual property or privacy rights.

Tech companies like Meta, OpenAI, and Google have faced criticism for using internet-scraped information without permission to train AI models. These companies are now actively seeking ways to handle private or copyrighted materials gathered during the training process.
