Link tags: scraping

8

Declare your AIndependence: block AI bots, scrapers and crawlers with a single click

This is a great move from Cloudflare. I may start using their service.

AI Pollution – David Bushell – Freelance Web Design (UK)

AI is steeped in marketing drivel, built upon theft, and intent on replacing our creative output with a depressingly shallow imitation.

AI Agents robots.txt Builder | Dark Visitors

A handy resource for keeping your blocklist up to date in your robots.txt file.

Though the name of the website is unfortunate with its racism-via-laziness nomenclature.

Robots.txt - Jim Nielsen’s Blog

I realized why I hadn’t yet added any rules to my robots.txt: I have zero faith in it.

Mercury by Postlight

Readability is back, but now it’s called Mercury.

kimono : Turn websites into structured APIs from your browser in seconds

This tool for building ScrAPIs is an interesting development—the current trend for not providing a simple API (or even a simple RSS feed) is being interpreted as damage and routed around.

I Don’t Need No Stinking API: Web Scraping For Fun and Profit | Hartley Brody

A handy step-by-step guide to scraping HTML to get data out. Useful for services (—cough—Twitter—cough—) that keep changing the rules of their API use.