Tuesday, February 24, 2026

Free or not to free

Have you ever heard the saying, "If something is free, you're the product"? You innately know that's true. That's why Facebook has ads and listens to your conversations (which is a super-annoying invasion of privacy, I might add). And this is why you can pay for ad-free YouTube.

Well, things are no different with "free" Artificial Intelligence (AI) apps. Instead of shoving ads in your face, they make you the bug in their jar. What you input into AI is used to train the AI - you become the training data. It's the price you pay for using their products for "free". However, even some of the subscription-based apps still use your data for training, but at least you might be able to opt out, as you can with ChatGPT. The same goes for the paid versions of Google Gemini.

So if you are using AI to process personal data, like family trees, DNA results, etc., be sure to remove all data on living persons first so it doesn't end up in the training data. The Coalition for Responsible AI in Genealogy (CRAIGEN.org) provides a good principle to follow for this: "AI usage can lead to unintended data exposure, putting private information at risk of being publicly disclosed. Therefore, members of the genealogical community take reasonable measures to safeguard private information when using AI."

One more thing to mention on the topic of training data: if you are following this blog, I wrote yesterday about how the data that AI produces comes from a compendium of all human knowledge. AI companies gather most of that knowledge by sending out bots to scrape it from websites. While this would seem harmless when the data is publicly available (unlike Perplexity sending their bots behind paywalls), it does create a huge load on a site's servers. This is why you are seeing such an increase in "Verifying You're Human" captchas. The scraping bots were overloading our server at the International Society of Genetic Genealogy (ISOGG), so we had to implement one of those captchas.

The captchas are annoying (but not as annoying as Facebook spying), but at least you now know why they are showing up so much more often. So go easy on sites that implement them. We should not have to pay more for a server to host information we provide for free just so an AI company can turn around and charge for it as data they provide!
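If you want a head start on the "remove all data on living persons" advice before uploading a family tree to an AI app, here is a minimal Python sketch for a GEDCOM file. It is only an illustration under a big assumption: anyone without a death (DEAT) entry is treated as living and dropped. The function names are my own, not part of any genealogy program, and family records may still contain pointers to the removed people, so check the result by hand before you share it.

```python
def split_records(gedcom_text):
    """Group GEDCOM lines into records; each record starts at a level-0 line."""
    records = []
    for line in gedcom_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("0 ") or not records:
            records.append([line])
        else:
            records[-1].append(line)
    return records


def is_individual(record):
    """True for records whose first line looks like '0 @I1@ INDI'."""
    parts = record[0].split()
    return len(parts) >= 3 and parts[2] == "INDI"


def has_death(record):
    """True if the record has a level-1 DEAT tag (assumption: no DEAT means living)."""
    return any(line.split()[:2] == ["1", "DEAT"] for line in record[1:])


def remove_presumed_living(gedcom_text):
    """Return the GEDCOM text without individuals that lack a death entry."""
    kept = [
        record
        for record in split_records(gedcom_text)
        if not is_individual(record) or has_death(record)
    ]
    return "\n".join(line for record in kept for line in record) + "\n"
```

To use it, read your exported file into a string, pass it to `remove_presumed_living`, and save the result under a new name so the original stays untouched. Most genealogy programs also have their own "privatize living people" export option, which is worth using if yours has one.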

By the way, I gave ChatGPT 5.2 free license (very little prompting) to create this graphic - pretty good I'd say!
