Startup DreamersStartup Dreamers
  • Home
  • Startup
  • Money & Finance
  • Starting a Business
    • Branding
    • Business Ideas
    • Business Models
    • Business Plans
    • Fundraising
  • Growing a Business
  • More
    • Innovation
    • Leadership
Trending

When To See A Dramatic ‘Planet Parade’ This Week As Worlds Align

September 14, 2025

Want to Retire One Day? Avoid 3 Common Retirement Mistakes

September 14, 2025

Why Steve Aoki is Backing Brain-Boosting Gum Brand

September 14, 2025
Facebook Twitter Instagram
  • Newsletter
  • Submit Articles
  • Privacy
  • Advertise
  • Contact
Facebook Twitter Instagram
Startup DreamersStartup Dreamers
  • Home
  • Startup
  • Money & Finance
  • Starting a Business
    • Branding
    • Business Ideas
    • Business Models
    • Business Plans
    • Fundraising
  • Growing a Business
  • More
    • Innovation
    • Leadership
Subscribe for Alerts
Startup DreamersStartup Dreamers
Home » Why The Future Of Generative AI Lies In A Company’s Own Data
Innovation

Why The Future Of Generative AI Lies In A Company’s Own Data

adminBy adminOctober 17, 20230 ViewsNo Comments5 Mins Read
Facebook Twitter Pinterest LinkedIn Tumblr Email

Alex Ratner is the co-founder and CEO at Snorkel AI, and an Affiliate Assistant Professor of Comput. Sci. at the University of Washington.

The age of large language models (LLMs) and generative AI has sparked excitement for business leaders. But those who want to launch their own LLM face many hurdles between wanting a production generative AI tool and developing one that delivers real business value and sustained advantage.

Foundation models themselves have quickly become commoditized. Any developer can build on the Google Bard or OpenAI APIs. More mature organizations can deploy models like Llama 2 in their own walled gardens. But if their competitors also use Llama 2, what advantage do they have? Proprietary data—and the knowledge of how to develop and use it—provides the only sustainable enterprise AI moat.

As generative AI has reached the peak of the Gartner AI Hype Cycle, enterprises are learning that off-the-shelf LLMs can’t solve every problem—particularly not unique, high-value problems. Proprietary data can close the gap, but only when properly curated and developed.

Your Data, Your Moat

Off-the-shelf LLMs yield fun experiments and demos, but using them in a business setting will rarely achieve the accuracy needed to deliver business value. Businesses don’t need chatbots that can discuss poetry as competently as they explain computer code—they need highly accurate specialists.

Private data is a moat—a potential competitive advantage. By leveraging your proprietary data and subject matter expertise, you can build generative models that work better for your domain, your chosen tasks and your customers.

Enterprises can gain these advantages in three ways:

1. Retrieval augmentation.

2. Fine-tuning with prompts and responses.

3. Self-supervised pre-training.

Let’s briefly look at each.

Retrieval Augmentation

Retrieval augmented generation, better known as RAG, allows your generative AI pipeline to enrich prompts with query-specific knowledge from a company’s proprietary databases or document archives. This generally yields better, more accurate answers, even with a standard LLM.

But this is similar to giving an intern access to your intranet: Even with all the information before them, the intern may misunderstand or miscommunicate. To get better performance, your data team needs a customized model.

Fine-Tuning With Prompts And Responses

Data-driven organizations can fine-tune LLMs with curated prompts and responses. This sharpens and improves the model’s output on the organization’s most important tasks. To use a metaphor, a doctor needs access to a patient’s medical chart (retrieval augmentation) and specialty training (fine-tuning) to render an accurate diagnosis. Data scientists can carefully choose the prompts and responses used to fine-tune the LLM to improve performance on a wide variety of tasks or greatly boost performance on a very narrow set of them.

Self-Supervised Pre-Training

Some organizations may want to take their LLM customization further and build a model from scratch. However, this can demand more effort than it’s worth. Firms with business vocabularies well-represented in the embedding spaces of off-the-shelf LLMs can often achieve the necessary performance gains through fine-tuning alone.

If an organization feels that it needs a model custom-built from the ground up, its data team first selects a model architecture and then trains it on unstructured text—initially on a large, generalized corpus, then on proprietary data. This teaches the model to understand the relationships between words in a way that’s specific to the company’s domain, history, positioning and products. The data team can then further train the model on prompts and responses to make it not just knowledgeable but task-oriented.

The Data Lift

The ideal deployment would incorporate all three of the above approaches, but that represents a heavy data labeling load. Studies from McKinsey and Appen show that a lack of high-quality labeled data blocks enterprise AI projects more often than any other factor—and, to be clear, all three of these approaches require labeled data.

Fine-tuning with prompts and responses requires data teams to identify and label prompts according to the task and then determine high-quality responses. Pre-training with self-supervised learning requires companies to carefully curate the unstructured data they feed the model. Training on lunch orders and payroll could degrade performance or cause sensitive data to leak internally.

Even retrieval augmentation benefits from data labeling. Although vector databases efficiently handle relevance metrics, they won’t know if a retrieved document is accurate and up to date. No company wants its internal chatbot to return out-of-date prices or recommend discontinued products.

Data Is Essential To Delivering Generative AI Value

Using your proprietary data to build your AI moat requires work, and that work rests heavily on data-centric approaches including data labeling and curation. Firms can outsource some labeling to crowd workers, but much of it will be too complex, specialized or sensitive for gig workers to handle. And it’s still time-consuming and expensive, similar to relying on internal experts for data labeling.

Data science teams can use active learning approaches such as label spreading to amplify the impact of internal labelers. Programmatic labeling is another option. Our researchers used those tools to build a better LLM.

Your data—properly prepared—is the most important thing your organization brings to AI and where your organization should spend the most time to extract the most value. Your data is your moat. Use it.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives. Do I qualify?

Read the full article here

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email

Related Articles

When To See A Dramatic ‘Planet Parade’ This Week As Worlds Align

Innovation September 14, 2025

UFC Cuts Ties With Hard-Luck Former TUF Finalist

Innovation September 13, 2025

We Are At Acute Agency Decay Amid AI. 4 Ways To Preserve Your Brain

Innovation September 12, 2025

49ers Brock Purdy May Miss Week 2 With Toe And Shoulder Injuries

Innovation September 11, 2025

Today’s NYT Mini Crossword Clues And Answers For Wednesday, September 10th

Innovation September 10, 2025

Why ProSocial AI Is The New ESG

Innovation September 9, 2025
Add A Comment

Leave A Reply Cancel Reply

Editors Picks

When To See A Dramatic ‘Planet Parade’ This Week As Worlds Align

September 14, 2025

Want to Retire One Day? Avoid 3 Common Retirement Mistakes

September 14, 2025

Why Steve Aoki is Backing Brain-Boosting Gum Brand

September 14, 2025

I Founded a $1.7 Billion Business. Here’s My Success Secret.

September 14, 2025

UFC Cuts Ties With Hard-Luck Former TUF Finalist

September 13, 2025

Latest Posts

We Are At Acute Agency Decay Amid AI. 4 Ways To Preserve Your Brain

September 12, 2025

Why the Future of Finance Won’t Be Built on Innovation Alone

September 12, 2025

Can Startup Founders Become Great CEOs?

September 12, 2025

The Doomers Who Insist AI Will Kill Us All

September 12, 2025

49ers Brock Purdy May Miss Week 2 With Toe And Shoulder Injuries

September 11, 2025
Advertisement
Demo

Startup Dreamers is your one-stop website for the latest news and updates about how to start a business, follow us now to get the news that matters to you.

Facebook Twitter Instagram Pinterest YouTube
Sections
  • Growing a Business
  • Innovation
  • Leadership
  • Money & Finance
  • Starting a Business
Trending Topics
  • Branding
  • Business Ideas
  • Business Models
  • Business Plans
  • Fundraising

Subscribe to Updates

Get the latest business and startup news and updates directly to your inbox.

© 2025 Startup Dreamers. All Rights Reserved.
  • Privacy Policy
  • Terms of use
  • Press Release
  • Advertise
  • Contact

Type above and press Enter to search. Press Esc to cancel.

GET $5000 NO CREDIT