The block on GPTBot is visible in the publishers’ robots.txt files, which tell crawlers from search engines and other entities which pages they are allowed to visit. “Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety,” OpenAI said in a blog post that included instructions on how to disallow the crawler.
Source: New York Times, CNN and Australia’s ABC block OpenAI’s GPTBot web crawler from accessing content
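
For readers curious what such a block looks like in practice, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt content, the example.com URL, the "ExampleSearchBot" name and the `is_allowed` helper are illustrative assumptions, not the actual files published by any of the outlets named above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuses GPTBot everywhere, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URL and crawler names only.
story_url = "https://example.com/news/story"
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", story_url)
other_allowed = is_allowed(ROBOTS_TXT, "ExampleSearchBot", story_url)

print("GPTBot allowed:", gptbot_allowed)
print("Other crawler allowed:", other_allowed)
```

Note that robots.txt is a voluntary convention: it states the site's wishes, and it is up to each crawler to honour them.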