Google: All your content are belong to us

Apparently, Google has changed their privacy policy and now says that they’ll scrape everything you post online to train their AI tools. I post my fiction online on Substack and my WordPress blog and now wonder if this is a bad idea.

What makes me most mad about this Google “will scrape what you post online to train AI tools” thing is the gall of them to claim they have the right to do so. They index the Internet, they DON’T own it. I pay for hosting fees, I write the content. It’s like a postman taking things from your house just because he delivers things to your address.

It would be fine if they opened a programme calling for data set submissions, with compensation.

It would’ve been fine if they were a company that created Internet tools with ethics in mind.

No, it’s “all your content are belong to us”.

I struck up a conversation on Mastodon about this, and I was hoping someone would come along to tell me that I’m panicking for no reason. But no, most of the responses I got were a resigned “acceptance” that there was no stopping them.

I think the worst thing about this is how some folks who work in tech are rolling their eyes at our reactions.

I suppose I shouldn’t be surprised by the lack of empathy some bros have about the whole “Google is scraping content to train their AI” thing. Lots of people are defending Google in tech forums, saying this has been done forever so why are we being such babies about it?

Maybe because all they can think about is the $$ they can get from us content producers, and we’re standing in the way of that. All that yummy free content that they could mine, and we have the gall to complain about being plundered so they could earn money from our labour? Pfft! We’re so entitled, us content creators!

I feel like the Internet is devolving faster than my emotions can keep up.

Wondering if there’s a way to protect my content more if I use WordPress or Substack now that Google is openly admitting what we have suspected they have been doing all along.

I heard that paywalls/subscriber only walls can deter them?

What do you think writers can do to protect their content? Or should we just roll over and accept that this is the way things will be from now on?

Read: Google Says It’ll Scrape Everything You Post Online for AI


Comments

92 responses to “Google: All your content are belong to us”

  1. I find the whole thing repulsive – this was not the internet we all hoped it would be, probably not even the founders of Google! I’m not sure I’d trust anything to keep my content safe, not paywalls or anything. So the only alternative is to take your content down, presumably including e-books which seem to be disseminated throughout the world the instant you post them to KDP. And, on that basis, printed books too. Maybe we need to all just start sending paper letters – and more pertinently, samizdat – to each other again?


    1. Sigh, I hope it never gets to that point. I find it repulsive too. For now, it seems that keeping your content behind a paywall or a membership option could help. Also, on WP.com you can opt out of being indexed by Google. Of course I have no idea if this will stop them, but it gives SOME control. I guess it’s early days. We need to find out more. I cannot believe that Google will get away with this. There is highly sensitive content on organisational and government websites, after all.


  2. Back in 1993 I read a book by Jean-Marie Guéhenno (a highly connected person in meatspace) who predicted that the way to gain influence in the new digital world was to tell everybody everything. That is, the connections we build by sharing are more valuable than any individual thing we produce. I followed that advice and it’s worked fabulously. But maybe the world has changed. I couldn’t read Guéhenno’s latest book because it was in a nonstandard epub format that my reader rejected, and the publisher won’t let me try another reader without buying a new copy. This may be an omen.

    What’s the new threat? My first guess was that an LLM will write stories like yours and destroy your market. But a lot of the market destruction already happened when the magazines went under and the book publishers consolidated. I’m not sure what Google can do that’s worse. We know you’re better at imagining than I am. What do you see coming?


    1. Oof, this will take a post to answer lol. But I’m already seeing real-life ramifications. Freelance writers losing clients, fake news proliferating, indie authors getting their books drowned out by AI trash.
      Society will only realise its effects years from now. By then it could be too late to change things.


  3. noellemitchell: This whole thing makes me feel mad too. Google has too much power on the internet now. Not sure what I can do about it though.


    1. Yeah, most of us feel this way, which makes us madder! lol. I think someone will come up with a solution soon. I doubt governments concerned about their data will want to be scraped, for one.


  4. Elizabeth Tai :verified:: @Bam One day I shall write fiction about this. Which they will scrape up and I hope it burns when it enters their AI model. via hachyderm.io


  5. Shine F.: @liztai As a writer, I have no idea how I can protect my work, especially those that are published by media sites I’ve contributed to. Google is acting like they own everything on the web now. It sure feels like resistance is futile. Ugh. via mstdn.social


  6. Elizabeth Tai :verified:: @luthien1126 Yeah I feel the same. And absolutely furious at being subjected to this. via hachyderm.io


  7. Shine F.: @liztai There is no option to opt out, argh! via mstdn.social


  8. Elizabeth Tai :verified:: @luthien1126 Well, there is actually with robots.txt. For wp.com I can opt not to have Google index me. The question is how much they’ll respect that, I guess. via hachyderm.io


  9. Shine F.: @liztai But what about those media sites I’ve contributed to? I doubt if they’ll lift a finger, and it’s not like I can demand that they do something about it unless these companies are also threatened because of this move by Google. It’s one thing to willingly share my work with the World Wide Web through the articles I contribute to online publications; it’s another to have my published works scraped for AI training. via mstdn.social


  10. Shine F.: @liztai It’s essentially a lose-lose situation for us. I want potential clients to be able to search for my works, so getting indexed is an advantage. Opting out, on the other hand, means potential clients won’t be able to see my sample works. There is no winning. 😡😭 via mstdn.social


  11. pariahkite: @liztai @luthien1126 Google will probably only respect the robots tag to the extent that the site would be excluded from search results. They’ll still greedily scrape away at the site I’m sure. via social.cowcornerfeeds.co.in


  12. Shine F.: @pariahkite @liztai So in the end, Google still wins. For all we know, others are already doing/have already done this (hello, OpenAI!). Google just put it out in the open. via mstdn.social


  13. As The World Turns: @luthien1126 @liztai opt out -> stop using Google products, stop using GAFAM #deleteGAFAM. Visit the following for privacy alternative services: switching.software, gofoss.net. via libranet.de


  14. Shine F.: @sammi @liztai Hah, this always seems like the easy way out and the default answer, but for small business owners like me, it really isn’t. Unless I want to cut off ties entirely with the rest of the world. via mstdn.social


  15. Elizabeth Tai :verified:: @luthien1126 @sammi Exactly. I have gotten work via Google searches hitting my website. via hachyderm.io


  16. mkj: @liztai It’s not for everyone, but #Google does publish the hosts and IP addresses they use for crawling. https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot I’m pretty sure that information can be used to selectively block Google from crawling one’s site. via social.linux.pizza


  17. Elizabeth Tai :verified:: @mkj Unfortunately most writers and artists are not tech savvy enough to use this. And they know it, I guess! via hachyderm.io


  18. mkj: @liztai Indeed, hence the “it’s not for everyone”. I imagine it wouldn’t be THAT difficult to use it to write, say for example, a #WordPress plugin to block Googlebot; if that hasn’t been done already. Hmm, now there’s an idea… via social.linux.pizza


  19. Elizabeth Tai :verified:: @mkj *waits hopefully* lol. Damn it, I should’ve been a software engineer or something. via hachyderm.io


  20. LFV: @mkj @liztai Oh yeah, such things already exist. Most of the reputable security plugins will let you block by IP. But then, of course, Google won’t be including your site in search results. So… DuckDuckGo, anyone? via mastodon.social


  21. mkj: @vickifarmer @liztai True; such blocking would definitely reduce search engine visibility. But so, arguably, would deleting what one has previously published, or not publishing it at all going forward. via social.linux.pizza


  22. etym dub: @mkj Sounds like the killer ‘app’ here would be a plugin that allows them to index the first page where you put bio, table of contents, whatever you want them to have, & then locks them out from the rest. Especially if there could be a standard form for this page & everyone did it, like “I see you are a crawler, here is my business card, now go away” @liztai via mstdn.social


  23. pariahkite: @mkj @liztai that’s what their official docs claim. I’ve a feeling they may deploy hidden crawlers that check blocked sites say, once a month via unlisted IPs to evade detection. We all know they want it all! via social.cowcornerfeeds.co.in


  24. mkj: @pariahkite @liztai That’s a possibility, and I wouldn’t be ENTIRELY surprised if that’s even the case. However, if you were to block them coming from their officially listed hosts, and they take steps to circumvent that block, you DEFINITELY have grounds to argue that they are not just passively grabbing what they can, but actively disregarding when people tell them “no”. via social.linux.pizza


  25. Flauschpolizei: @liztai I can’t quite believe that copyright laws would allow that use by Google. Obviously, that’s difficult to do for ordinary people but I hope someone sues them via strangeobject.space


  26. Keev: @liztai We now have a robots tag to manage crawling; we should have an AI tag as well. via mastodonczech.cz


  27. Ellane W: @liztai Another reason why everyone here should consider activating the setting to auto delete old toots. via pkm.social


  28. John Caveney woke is me🛠️: @liztai I’ve avoided g00gle like the plague for years, I can’t totally avoid it but I do as much as I can. via toot.community


  29. Jane Vogel: @liztai #privacy #tech #writingcommunity #art #writing I tried to contact Google support. The chat claims to be a human but only gives canned answers: “Please know that when you create a document within the Google platforms, Jane. It is still within the property of Google. However, no one can access those files in your account.” Sounds like doublespeak, but I have to have a subscription to support to get someone to explain it. via mastodon.social


  30. R Scott Jones: @liztai We made a Faustian bargain that we’d accept “free” services without payment, never realizing how much we’d really be giving up. If the web economy hadn’t been based so heavily on data mining, I can’t imagine that Google would be attempting this. via mastodon.social


  31. Enrique Barcelli: @liztai I don’t think #Google #privacy policy has changed significantly. I couldn’t find any mention of “For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.” in the policy text either.

    However, that is beside the point, because they have built a business based on scraping people’s data and they need it for that purpose, so they are going to further that activity regardless of what their policies say.

    It is like big oil saying that they really care about the environment, or banks saying that they are seriously fighting money laundering… they are after profits with a model that hurts others in their way… they may have some concerns about their negative impact on others, but their top priority will always be their profits… these are anti-social behaviors fueled by greed… which is part of our nature and, to different degrees, of everyone. That is why we need regulation, or otherwise we go back to slavery and genocide.

    Going back to Google and privacy, there are only 3 ways we can go about it IMHO:

    1.- Never post, email or say anything you don’t want them to use. A bit unrealistic, but really applicable for the things we really want or need to keep private.

    2.- Post all our content along with a clear declaration of how we are licensing it, so it is legally clear how and what it can be used for. This is a lot easier in the #Fediverse than in central services, as we can control our own servers and therefore include all the required legal wordings.

    3.- Sue them and fight them with activism and the laws and regulations of the territories where they operate, like #GDPR in the EU, fraud and money laundering in the US, etc., every time they infringe or abuse, in order to make their anti-social behaviors more and more expensive through fines and jail time, so they are compelled to abide by the social norms/contracts.

    via acc4e.com


  32. Alexander Hay: @liztai There goes the Open Web. Promised much. Delivered a little. Pounded to dust. via mastodon.social
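
For anyone who wants to try the two technical ideas raised in the thread (the robots.txt opt-out and blocking crawler IP addresses), here is a minimal Python sketch, not a finished plugin. It only uses the standard library. The robots.txt rules, the example.com URL and the IP ranges are all made up for illustration: the ranges are reserved documentation blocks, not Google's real ones, so you would need to swap in the list Google publishes. As the thread points out, none of this guarantees a crawler will behave, and shutting out Googlebot also costs you search visibility.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a writer's site: Googlebot is shut out,
# every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

# Placeholder ranges taken from the reserved documentation blocks.
# These are NOT Google's real crawler ranges (assumption for the demo).
HYPOTHETICAL_CRAWLER_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_blocklist(cidrs):
    """Parse CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_blocked(client_ip: str, blocklist) -> bool:
    """Return True if client_ip falls inside any network on the blocklist."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in blocklist)


if __name__ == "__main__":
    page = "https://example.com/fiction/chapter-1"
    for agent in ("Googlebot", "DuckDuckBot"):
        print(agent, "allowed by robots.txt:", is_allowed(ROBOTS_TXT, agent, page))

    blocklist = build_blocklist(HYPOTHETICAL_CRAWLER_RANGES)
    for ip in ("192.0.2.15", "198.51.100.7"):
        print(ip, "blocked by IP list:", is_blocked(ip, blocklist))
```

The robots.txt half is a polite request that a crawler can choose to ignore. The IP half is the part a security plugin or server rule would actually enforce, but only for the addresses on the list.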

