#aitraining

Big tech companies want total control but opt-out should be the way to go:

"OpenAI and Google have rejected the government’s preferred approach to solve the dispute about artificial intelligence and copyright.

In February almost every UK daily newspaper gave over its front page and website to a campaign to stop tech giants from exploiting the creative industries.

The government’s plan, which has prompted protests from leading figures in the arts, is to amend copyright law to allow developers to train their AI models on publicly available content for commercial use without consent from rights holders, unless they opt out.

However, OpenAI has called for a broader copyright exemption for AI, rejecting the opt-out model."

thetimes.com/uk/technology-uk/

The Times · AI giants reject government’s approach to solving copyright row · By Georgia Lambert
#AI #GenerativeAI #UK

"Now consider the chatbot therapist: what are its privacy safeguards? Well, the companies may make some promises about what they will and won't do with the transcripts of your AI sessions, but they are lying. Of course they're lying! AI companies lie about what their technology can do (of course). They lie about what their technologies will do. They lie about money. But most of all, they lie about data.

There is no subject on which AI companies have been more consistently, flagrantly, grotesquely dishonest than training data. When it comes to getting more data, AI companies will lie, cheat and steal in ways that would seem hacky if you wrote them into fiction, like they were pulp-novel dope fiends:
(...)
But it's not just people struggling with their mental health who shouldn't be sharing sensitive data with chatbots – it's everyone. All those business applications that AI companies are pushing, the kind where you entrust an AI with your firm's most commercially sensitive data? Are you crazy? These companies will not only leak that data, they'll sell it to your competition. Hell, Microsoft already does this with Office365 analytics:
(...)
These companies lie all the time about everything, but the thing they lie most about is how they handle sensitive data. It's wild that anyone has to be reminded of this. Letting AI companies handle your sensitive data is like turning arsonists loose in your library with a can of gasoline, a book of matches, and a pinky-promise that this time, they won't set anything on fire."

pluralistic.net/2025/04/01/doc

pluralistic.net · Pluralistic: Anyone who trusts an AI therapist needs their head examined (01 Apr 2025) – Pluralistic: Daily links from Cory Doctorow
#AI #GenerativeAI #LLMs

Emboldened by #Trump, A.I. Companies Lobby for Fewer Rules

President Trump at the White House in January with, from left, Oracle’s chairman, Larry Ellison; SoftBank’s chief executive, Masayoshi Son; and OpenAI’s chief executive, Sam Altman.
#ai #privacy #openai #softbank #oracle #aitraining #training

nytimes.com/2025/03/24/technol

The New York Times · Emboldened by Trump, A.I. Companies Lobby for Fewer Rules · By Cecilia Kang

Ars Technica: Open Source devs say AI crawlers dominate traffic, forcing blocks on entire countries. “Software developer Xe Iaso reached a breaking point earlier this year when aggressive AI crawler traffic from Amazon overwhelmed their Git repository service, repeatedly causing instability and downtime. Despite configuring standard defensive measures—adjusting robots.txt, blocking known […]

https://rbfirehose.com/2025/03/26/ars-technica-open-source-devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/

ResearchBuzz: Firehose | Individual posts from ResearchBuzz · Ars Technica: Open Source devs say AI crawlers dominate traffic, forcing blocks on entire countries | ResearchBuzz: Firehose

TorrentFreak: Meta’s BitTorrent Uploads of ‘Pirate Library’ Data Equaled 30% of Downloads, Expert Says. “A lawsuit filed by several authors against Meta centers on Meta’s alleged use of pirated books for AI training data and the technical details of BitTorrent which was used to obtain them. Yesterday, Meta filed a motion for summary judgment, while countering the authors’ request to […]

https://rbfirehose.com/2025/03/26/torrentfreak-metas-bittorrent-uploads-of-pirate-library-data-equaled-30-of-downloads-expert-says/

MIT Press: A note on LibGen and the unauthorized use of our authors’ work. “We want to be clear: The MIT Press has not licensed any of our books or journal articles for LLM training purposes, nor have we granted permission for any such use. However, we are well aware that many MIT Press publications have ended up in pirated training data sets. We share the deep distress of our authors whose […]

https://rbfirehose.com/2025/03/22/mit-press-a-note-on-libgen-and-the-unauthorized-use-of-our-authors-work/

ResearchBuzz: Firehose | Individual posts from ResearchBuzz · MIT Press: A note on LibGen and the unauthorized use of our authors’ work | ResearchBuzz: Firehose
#ai #aitraining #books

Fast Company: Hollywood warns about AI industry’s push to change copyright law. “A who’s who of musicians, actors, directors, and more have teamed up to sound the alarm as AI leaders including OpenAI and Google argue that they shouldn’t have to pay copyright holders for AI training material. In an open letter, submitted to the White House Office of Science and Technology, more than 400 […]

https://rbfirehose.com/2025/03/20/fast-company-hollywood-warns-about-ai-industrys-push-to-change-copyright-law/

"The AI landscape is in danger of being dominated by large companies with deep pockets. These big names are in the news almost daily. But they’re far from the only ones – there are dozens of AI companies with fewer than 10 employees trying to build something new in a particular niche.

This bill demands that creators of any AI model – even a two-person company or a hobbyist tinkering with a small software build – identify copyrighted materials used in training. That requirement will be incredibly onerous, even if limited just to works registered with the U.S. Copyright Office. The registration system is a cumbersome beast at best – neither machine-readable nor accessible, it’s more like a card catalog than a database – that doesn’t offer information sufficient to identify all authors of a work, much less help developers to reliably match works in a training set to works in the system.

Even for major tech companies, meeting these new obligations would be a daunting task. For a small startup, throwing on such an impossible requirement could be a death sentence. If A.B. 412 becomes law, these smaller players will be forced to devote scarce resources to an unworkable compliance regime instead of focusing on development and innovation. The risk of lawsuits—potentially from copyright trolls—would discourage new startups from even attempting to enter the field."

eff.org/deeplinks/2025/03/cali

Electronic Frontier Foundation · California’s A.B. 412: A Bill That Could Crush Startups and Cement A Big Tech AI Monopoly · California legislators have begun debating a bill (A.B. 412) that would require AI developers to track and disclose every registered copyrighted work used in AI training. At first glance, this might sound like a reasonable step toward transparency. But it’s an impossible standard that could crush...
#USA #California #AI

TechCrunch: Bluesky users debate plans around user data and AI training. “Social network Bluesky recently published a proposal on GitHub outlining new options it could give users to indicate whether they want their posts and data to be scraped for things like generative AI training and public archiving.”

https://rbfirehose.com/2025/03/17/techcrunch-bluesky-users-debate-plans-around-user-data-and-ai-training/

#ai #aitraining #bluesky

"Anyone at an AI company who stops to think for half a second should be able to recognize they have a vampiric relationship with the commons. While they rely on these repositories for their sustenance, their adversarial and disrespectful relationships with creators reduce the incentives for anyone to make their work publicly available going forward (freely licensed or otherwise). They drain resources from maintainers of those common repositories often without any compensation. They reduce the visibility of the original sources, leaving people unaware that they can or should contribute towards maintaining such valuable projects. AI companies should want a thriving open access ecosystem, ensuring that the models they trained on Wikipedia in 2020 can be continually expanded and updated. Even if AI companies don’t care about the benefit to the common good, it shouldn’t be hard for them to understand that by bleeding these projects dry, they are destroying their own food supply.

And yet many AI companies seem to give very little thought to this, seemingly looking only at the months in front of them rather than operating on years-long timescales. (Though perhaps anyone who has observed AI companies’ activities more generally will be unsurprised to see that they do not act as though they believe their businesses will be sustainable on the order of years.)

It would be very wise for these companies to immediately begin prioritizing the ongoing health of the commons, so that they do not wind up strangling their golden goose. It would also be very wise for the rest of us to not rely on AI companies to suddenly, miraculously come to their senses or develop a conscience en masse.

Instead, we must ensure that mechanisms are in place to force AI companies to engage with these repositories on their creators' terms."

citationneeded.news/free-and-o

Citation Needed · “Wait, not like that”: Free and open access in the age of generative AI · The real threat isn’t AI using open knowledge — it’s AI companies killing the projects that make knowledge free

#OpenAI declares #AI race “over” if #training on #copyrighted works isn’t fair use

OpenAI is hoping that Donald Trump's AI Action Plan, due out this July, will settle #copyright debates by declaring #AItraining fair use—paving the way for AI companies' unfettered access to training data that OpenAI claims is critical to defeat #China in the AI race.
#fairuse #Trump

arstechnica.com/tech-policy/20

Ars Technica · OpenAI declares AI race “over” if training on copyrighted works isn’t fair use · By Ashley Belanger

TechCrunch: Judge allows authors’ AI copyright lawsuit against Meta to move forward. “A federal judge is allowing an AI-related copyright lawsuit against Meta to move forward, although he dismissed part of the suit. In Kadrey vs. Meta, authors including Richard Kadrey, Sarah Silverman, and Ta-Nehisi Coates have alleged that Meta has violated their intellectual property rights by using their […]

https://rbfirehose.com/2025/03/10/techcrunch-judge-allows-authors-ai-copyright-lawsuit-against-meta-to-move-forward/

ResearchBuzz: Firehose | Individual posts from ResearchBuzz · TechCrunch: Judge allows authors’ AI copyright lawsuit against Meta to move forward | ResearchBuzz: Firehose