OpenAI claims it has implemented filters ensuring that GPTBot will not access sources behind paywalls, sources that collect personally identifiable information, or any content that violates OpenAI’s policies. The OpenAI docs also explain how to block GPTBot from crawling a website using the industry-standard robots.txt file, a text file that sits in the root directory of a website and tells web crawlers (such as those used by search engines) which parts of the site they may or may not crawl.
Source: Sites scramble to block ChatGPT web crawler after instructions emerge
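As a rough illustration of the robots.txt mechanism described above, the sketch below uses Python's standard-library `urllib.robotparser` to check how a rule set that disallows GPTBot would be read by a crawler that honors it. The two directives follow the form given in OpenAI's documentation; the helper name `is_allowed`, the `SomeOtherBot` agent, and the `example.com` URL are hypothetical placeholders, and the check only models a well-behaved crawler, not any enforcement.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content that asks GPTBot not to crawl any path on the site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain, not a site from the article.
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/article")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/article")

print(f"GPTBot allowed: {gptbot_allowed}")
print(f"Other crawler allowed: {other_allowed}")
```

Because the rule names only GPTBot, crawlers with other user-agent strings are unaffected. Compliance with robots.txt is voluntary on the crawler's side, so the file expresses a request rather than a technical barrier.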