Startup Dreamers

OpenAI’s ChatGPT Crawler Bot Poses A Dilemma For Publishers And Newsmakers

By admin | August 24, 2023

The First Online Content Revolution: From Websites to Directories to Search

Some of us still remember the “dotcom” boom of 1999-2001 and the “Cambrian explosion” of websites catering to the masses as well as to the long tail of people seeking very specific content. All you needed to do was remember a site's domain name. Then came Yahoo, AOL, and other directories where websites competed for placement and ratings. Within a few years, these directories were replaced by search engines, with Google emerging as a virtual search monopoly. Google developed the most efficient crawling bot, which visited each website, followed every link, and copied the data. It then processed this data and returned a list of the most relevant websites for a given set of keywords. As a user, you no longer needed to remember the domain name; you could simply type in a few keywords and find the relevant link.

This transition created a huge dilemma for publishers and newsmakers. On the one hand, you wanted to rank among the top ten search results, which meant making all your content crawlable and optimized for search engines (SEO). On the other hand, you lost a good deal of advertising revenue from front-page ads, as well as the visibility of the ads that users would have seen had they navigated the website themselves. Most publishers and newsmakers decided to adapt to the new realities of search and become fully searchable. Even premium paywalled content was made available to search engines to increase the probability of it being picked up by crawling bots. This new paradigm also led to the creation of new media, which supplied user-generated content in the form of blogs and contributor networks. These new outlets could now compete with traditional media for traffic and advertising revenue, while the bulk of the advertising revenue went to the search engines.

This redistribution of power and advertising revenue has been reflected in the valuations of publishers in recent years. One of the most famous and prominent publishing houses, Forbes, was acquired for just $415 million in 2016 by the Hong Kong-based Integrated Whale Media Investments, and is expected to change hands again for around $800 million in a deal led by the young technology genius and founder of Luminar, Austin Russell. Fortune was acquired for a mere $150 million by the Thai billionaire Chatchaval Jiaravanon in 2018. In 2013, Jeff Bezos acquired the Washington Post for $250 million.

In comparison, at the time of this writing, the market capitalization of Google was $1.66 trillion, Facebook traded at around $760 billion, and Twitter sold for $44 billion. These tech giants became the leading traffic aggregators and attracted the majority of the advertising revenue, taking it away from the content creators who fund professional journalism.

This unfair redistribution of ad revenue, combined with publishers’ desire to capture aggregator traffic, led professional media outlets to focus on Search Engine Optimization (SEO), coming up with flashier titles and catering to consumer wants rather than focusing on more balanced and professional reporting. Some governments have recognized this trend and are trying to enforce a fairer distribution of advertising revenue. For example, Canada introduced a bill requiring the online giants to share advertising revenue with publishers, a move strongly opposed by the search engines and social networks.

Publishers’ Dilemma: Should You Allow the ChatGPT Bot to Crawl Your Content?

While there are rumors that ChatGPT was trained on data from Microsoft Bing’s crawling bot and on other data provided by Microsoft, OpenAI revealed its own web crawler, GPTBot, in a short note in its documentation. Almost immediately, on August 8, 2023, Bryson Masse of VentureBeat reported that some publishers and creators had started blocking the bot to preserve their content. Benj Edwards of Ars Technica expanded on the story.

It is no secret that some transformer-based Large Language Models (LLMs), like GPT-4, have become so good that they outperform humans in many tasks, including some analytical tasks. These models are still far from perfect, however, and most high-quality publishers, including Forbes.com and Nature Publishing Group, have introduced strict policies banning the use of generative tools for content creation.

I previously wrote an article explaining that publishers with massive amounts of proprietary content are the most likely beneficiaries of the generative AI revolution, as they may be able to develop their own trustworthy chatbots or license their content to generative AI companies. However, if they let their content be “crawled” and processed by the crawler bots operated by generative AI companies without proper watermarking and copyright notices, they are likely to lose this advantage. At the same time, now that generative AI systems are becoming more interpretable and can point users to the primary source, not being crawled will decrease the probability of the content being accessed.

This is the new dilemma that most publishers will face sooner or later. At this point, the safer course is to protect paywalled content from being crawled, make only the title and keywords accessible to the crawler bots, and invest in internal generative AI capabilities.
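As a rough illustration of that approach, the sketch below uses Python's standard `urllib.robotparser` to check what a robots.txt policy would permit. It assumes the crawler identifies itself with the user-agent token `GPTBot`, and the `/headlines/` and `/premium/` paths and the `example.com` domain are hypothetical placeholders, not any real publisher's layout. Keep in mind that robots.txt is advisory: it only works if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the AI crawler may read only a headlines
# section, while all other crawlers are kept out of premium content.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /headlines/
Disallow: /

User-agent: *
Disallow: /premium/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Assumed user-agent tokens and illustrative URLs.
checks = [
    ("GPTBot", "https://example.com/headlines/ai-story"),
    ("GPTBot", "https://example.com/premium/ai-story"),
    ("Googlebot", "https://example.com/premium/ai-story"),
    ("Googlebot", "https://example.com/news/ai-story"),
]
for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:10} {verdict:8} {url}")
```

Python's parser applies the first matching rule within a group, which is why the narrower `Allow` line is listed before the blanket `Disallow`. Other crawlers may resolve conflicting rules differently, so a publisher would want to confirm the behavior against each crawler's own documentation.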

It is also important to note that much published content has already been crawled by the search engines and may be reused for training generative AI systems. For example, Google invested in the complete digitization of books and has crawled the entire Internet. Will these books and this crawled content be used for training LLMs? Extensive testing is needed to determine whether this content has already been used by the leading AI players.

Will Generative AI Cause Further Decline in Professional Journalism?

It is natural to expect some decline in the quality of content from the publishers and newsmakers where the use of generative tools is allowed or even encouraged. There are multiple spam publishers already doing that and often confusing the search engines. However, we should not underestimate the further potential loss of advertising revenue. Serious publishers require a steady stream of advertising and subscription revenue to maintain their high editorial standards. Here is where lawmakers may come into play to ensure that independent media is supported and professional journalism is encouraged. Otherwise, we are likely to see the Internet being polluted with AI-generated content produced by the LLMs that demonetized professional publishers.
