
Following up on delisting my website from Google

It’s been about a month since I decided to take my site off Google in response to their training generative AI on web content. I’ve confirmed my site is included in some small web directories. I looked at my server stats to see how my robots.txt instructions were working and whether Google was delisting my content.

Adding my site to directories

Since I started this in July, I’ve also added my Now page to nownownow.com. I’d avoided doing it, despite maintaining a Now page for years, because I felt bad emailing Derek Sivers, whom I presume to be a very busy person. But I finally decided he’s a smart guy, so there must be a reason he’s still doing this himself instead of having, like, an intern do it: he must be getting something out of maintaining the site 🤷‍♀️

My site was already on Ye Olde Blogroll (thanks Ray!). I confirmed it was also included in Ooh! Directory, indieblog.page, Search My Site, indieseek.xyz, and Marginalia search. It feels awkward to search for yourself, but I suppose it’s ‘in the interest of a well-connected indie web’.

(Hahahaha oh noooo, I saw that an embarrassing website I designed for a relative in 2007 is on Wiby 😅😅😅 I designed it in HTML, so it still renders OK, although it’s not what I would make today lol. Be careful what you put your name on when you’re young, I guess.)

A drop in referrals from Google

The number of referrals I’m getting from Google has dropped. I’m interpreting that to mean they are gradually delisting my pages, since Googlebot is still crawling my site heavily. (I’m waiting to block it in robots.txt until the whole site is delisted.)

In June, before I started delisting my site from Google, I got 1300 referrals from google.com. In July, once I’d started the process of getting my site delisted, that dropped to about 800. In the first half of August, I’ve gotten 300. More anecdotally, in June I had referrals from 15 country-specific versions of Google, down to 2 so far in August. At this rate, it could be another month or two before I’m fully off Google.
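Because June and July have different lengths and the August figure covers only half a month, comparing average referrals per day makes the trend a little easier to read. A small Python sketch using the numbers above, assuming “the first half of August” means 15 days:

```python
# Google referral counts from the post. The August figure covers only the
# first half of the month, assumed here to be 15 days.
REFERRALS = [
    ("June", 1300, 30),
    ("July", 800, 31),
    ("August 1-15", 300, 15),
]


def daily_rate(count: int, days: int) -> float:
    """Average number of referrals per day over a period."""
    return count / days


june_rate = daily_rate(1300, 30)
for label, count, days in REFERRALS:
    rate = daily_rate(count, days)
    # Express each period's daily average as a share of June's baseline.
    print(f"{label}: {rate:.1f} per day ({rate / june_rate:.0%} of June's rate)")
```

By this measure, early-August referrals are running at a bit under half of June’s daily rate.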

Which bots are crawling my site?

CCBot, the Common Crawl robot I blocked for providing data to generative AI companies, is no longer crawling my site. So they appear to respect robots.txt 👍
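One way to double-check what a crawler that honors robots.txt should conclude from a file is Python’s standard-library `urllib.robotparser`. This is a minimal sketch: the two-group robots.txt below is an assumed example (block CCBot, allow everyone else), not the site’s actual file, and the `example.com` URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Assumed example robots.txt: block Common Crawl's CCBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; only the path matters for the rule check.
URL = "https://example.com/2023/08/some-post/"

for agent in ("CCBot", "Googlebot", "bingbot"):
    verdict = "allowed" if parser.can_fetch(agent, URL) else "blocked"
    print(f"{agent}: {verdict}")
```

This only shows what a well-behaved crawler ought to do; whether a given bot actually complies still has to be confirmed from the server logs, as with CCBot here.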

I pulled a list of all the robots that visited my site in July and August: 90 unique bots and spiders, some with as few as one visit, and Googlebot at the top with more than 12,000 visits in July. (Actually, something called “feed” is higher, but I assume that’s related to RSS 🤷‍♀️)
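Server stats packages do this tallying for you, but with only raw access logs a rough version is easy to script. This Python sketch assumes the common Apache/Nginx “combined” log format, where the user agent is the last double-quoted field, and uses a crude heuristic: the first user-agent token containing “bot”, “spider”, or “crawler” counts as a bot. That heuristic would miss agents such as facebookexternalhit, and the sample log lines are made up.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# In the "combined" log format, the user agent is the last double-quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')
# Crude heuristic: the first token containing "bot", "spider", or "crawler".
BOT_TOKEN = re.compile(r"[\w.\-]*(?:bot|spider|crawler)[\w.\-]*", re.IGNORECASE)


def bot_name(line: str) -> Optional[str]:
    """Return the bot token from a log line, or None if it looks like a person."""
    ua = USER_AGENT.search(line)
    if ua is None:
        return None
    token = BOT_TOKEN.search(ua.group(1))
    return token.group(0) if token else None


def count_bots(lines: Iterable[str]) -> Counter:
    """Tally visits per bot token across log lines."""
    return Counter(name for name in map(bot_name, lines) if name)


# Made-up sample lines (documentation-range IP addresses).
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Jul/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [10/Jul/2023:14:02:11 +0000] "GET /now/ HTTP/1.1" 200 3301 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.30 - - [11/Jul/2023:08:15:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.20 - - [11/Jul/2023:09:00:00 +0000] "GET /now/ HTTP/1.1" 200 3301 '
    '"https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) '
    'Gecko/20100101 Firefox/115.0"',
]

for name, visits in count_bots(SAMPLE_LOG).most_common():
    print(f"{name}: {visits}")
```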

I’ve never looked at the list of bots before, and it was interesting to see the categories they fall into: search engines, feed fetchers, link expanders, SEO sites, and more. (In the list below, I left off a bunch that seemed to be generic or associated with a particular browser.)

Search engines

  • Googlebot and Googlebot-Image and AdsBot-Google
  • 2345Explorer
  • Baidu and Baiduspider
  • Barkrowler
  • bingbot and BingPreview
  • DuckDuckBot-Https and DuckDuckGo-Favicons-Bot
  • Mail.RU Bot
  • MegaIndex.ru
  • MixrankBot
  • MJ12bot
  • MojeekBot
  • msnbot
  • SeznamBot
  • YandexBot and YandexImages and YandexMetrika
  • yacybot

Feed fetchers

Link expanders

  • bit.ly
  • Discordbot
  • facebookexternalhit
  • LinkedInBot
  • Slackbot-LinkExpanding
  • TelegramBot

Recognizable sites

  • Applebot
  • archive.org_bot
  • Pinterestbot

SEO / marketing sites

Sus 🤔

(At least to me: either I couldn’t find much info, or I see no benefit to supporting their services.)

  • The Knowledge AI
  • 360Spider
  • aiHitBot
  • netEstate NE Crawler

Blocking more bots

Since I originally posted in July, I’ve already added OpenAI’s ChatGPT bot to my robots.txt, but now that I’ve seen the whole list, I’m going to add a few more.

I’ll ditch all the ones I flagged as suspect. I’m side-eyeing those SEO bots, but I guess they’re not hurting anything, so I’ll leave them 🤷‍♀️

I also think I’ll block Pinterestbot. I do use Pinterest as a bookmarking site (primarily for shopping), but I also get really frustrated by Pinterest images in search results, and I don’t feel a need to supply images for their website. Their “why we crawl your site” page is aimed at businesses, so it’s not clear why they’re crawling my personal website. I don’t actually have many images on here; primarily, I re-host cover images for books I review. A handful of images are mine, but the majority are not, so there’s no need for Pinterest to pull images from my site.
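A sketch of what those additions might look like, generated by a small hypothetical Python helper (`build_robots_txt` exists only for illustration) and sanity-checked with the standard library’s `urllib.robotparser`. The user-agent tokens are assumptions: `GPTBot` is the token OpenAI publishes for its crawler, `360Spider` and `aiHitBot` are taken from the bot names listed above, and the exact tokens for The Knowledge AI and netEstate NE Crawler would need checking, so they are left out.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser

# Hypothetical list of user-agent tokens to block. Each token should be
# checked against the crawler's own documentation before use.
BLOCKED_AGENTS = ["CCBot", "GPTBot", "Pinterestbot", "360Spider", "aiHitBot"]


def build_robots_txt(blocked: Iterable[str]) -> str:
    """Return robots.txt text with one disallow-everything group per agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked]
    # Every crawler not named above may still crawl the whole site.
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(BLOCKED_AGENTS)
print(robots_txt)

# Sanity check: the named agents are blocked, an unnamed one (bingbot) is not.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for agent in BLOCKED_AGENTS + ["bingbot"]:
    print(agent, parser.can_fetch(agent, "/"))
```

One group per agent keeps each block easy to remove later; the robots.txt standard (RFC 9309) also allows stacking several `User-agent` lines above a single `Disallow`, which would be more compact.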

What to do about other Google bots

Two other Google bots have also crawled my site: Googlebot-Image and AdsBot-Google.

AFAIK Google isn’t training a visual generative AI, so I could continue to let them index my images. However, the same thought process as with Pinterest applies: I have very little original visual content on my site, so it’s not really worth them crawling and indexing it.

I only have one hit from the ad bot, but I also don’t understand why it would look at my site in the first place, given I don’t run ads? 🤨
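Blocking these two would take their own robots.txt groups. As I understand Google’s crawler documentation, the AdsBot crawlers ignore the `User-agent: *` group, so AdsBot-Google has to be named explicitly (worth double-checking against Google’s current docs). Giving Googlebot-Image its own group leaves regular Googlebot unaffected. A sketch, with an assumed file and a placeholder path, checked with Python’s `urllib.robotparser` (whose agent matching is simpler than Google’s, so it only approximates Google’s behavior):

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt groups for the two extra Google crawlers. AdsBot-Google
# is named explicitly because, per Google's docs, it ignores the "*" group.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: AdsBot-Google
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder path. Regular Googlebot falls through to the "*" group.
PATH = "/2023/08/some-post/"
for agent in ("Googlebot", "Googlebot-Image", "AdsBot-Google"):
    print(agent, parser.can_fetch(agent, PATH))
```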

The dilemma of Bing

Bing Chat runs on GPT-4, but it draws on Bing search results in the answers it gives. Is this equivalent to Google’s deep-sea trawling, “all your website are belong to us”? If Bing is pulling info from my site, isn’t that nearly as bad as training on my site?

I didn’t want to drop Bing because it’s the chief source of DuckDuckGo’s index, and DDG is my usual search engine. But maybe DDG is indexing me itself? I had 6 hits from DuckDuckBot over six weeks, but only 37 of my pages come up in DDG versus 1800 in Bing search. One would think that if DDG used the same index, all the same pages would be available? 🤷‍♀️

I’m going to ‘phone a friend’ on this one. (I’m not actually going to phone them, you heathen)

By Tracy Durnell

Writer and designer in the Seattle area. Reach me at tracy.durnell@gmail.com. She/her.
