Is there a simple way to severely impede web scraping and LLM data collection of my website?
I am working on a simple static website that gives visitors basic information about myself and the work I do. I want to use it to introduce myself to potential clients, collaborators, etc., rather than rely solely on LinkedIn as my visiting card.
This may sound rather oxymoronic given that I am literally going to be placing (some relevant) details about myself and my work on the internet, but I want to limit access to the website by bots, web scrapers, and content collection for LLMs.
Is this a realistic expectation?
Also, any suggestions for privacy-respecting yet inexpensive domains that I can purchase in Europe would be a great help.
Scrape a bunch of Onion articles, link them together in an index, then add an invisible link on your home page that spiders will follow but humans can't see.
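If it helps, here is a rough sketch of the index and the hidden link in Python. The function names `build_index` and `hidden_link`, the paths, and the `display:none` approach are all my own assumptions for illustration, and whether any given crawler actually follows a link hidden this way is not guaranteed.

```python
from html import escape


def build_index(article_paths):
    """Return an HTML page that links to every decoy article (hypothetical helper)."""
    items = "\n".join(
        f'<li><a href="{escape(p, quote=True)}">{escape(p)}</a></li>'
        for p in article_paths
    )
    return f"<!doctype html>\n<title>Index</title>\n<ul>\n{items}\n</ul>\n"


def hidden_link(href):
    """Return an anchor that is present in the markup but not rendered.

    Assumption: display:none keeps it out of sight for human visitors,
    and aria-hidden / tabindex keep it away from screen readers and
    keyboard navigation.
    """
    return (
        f'<a href="{escape(href, quote=True)}" style="display:none" '
        'aria-hidden="true" tabindex="-1">archive</a>'
    )
```

You would write the output of `build_index([...])` to something like `decoys/index.html` and paste `hidden_link("/decoys/index.html")` somewhere in your home page markup.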
Write a script to randomize the words in all the articles and link those in too. Then change the image tags to point to random Wikimedia files.
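The word-randomizing part can be as small as this. It is only a sketch: `shuffle_words` is a made-up name, it assumes the article is already plain text (no HTML tags to preserve), and it shuffles within each line so the paragraphs keep their rough shape.

```python
import random


def shuffle_words(text, seed=None):
    """Shuffle the words within each line of plain text (hypothetical helper).

    Passing a seed makes the output reproducible; leaving it as None
    gives a different scramble on every run.
    """
    rng = random.Random(seed)
    shuffled_lines = []
    for line in text.splitlines():
        words = line.split()
        rng.shuffle(words)  # in-place shuffle of this line's words
        shuffled_lines.append(" ".join(words))
    return "\n".join(shuffled_lines)
```

Run it over each article's text before writing the page out, e.g. `shuffle_words(article_text)`.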
If there's one thing we've learned, it's that there's very little quality control. Channel your inner Ken Kesey / Merry Prankster. Have fun.