Snowflake-Labs/openflow-unstructured-data-pipeline-demo

Unstructured Document Intelligence Demo

Powered by Snowflake Openflow and Cortex

Transform your Google Drive business documents into actionable strategic intelligence with the complete Openflow → Cortex Search → Snowflake Intelligence pipeline.

Complete setup guides, business-focused demos, and step-by-step tutorials

⚠️ IMPORTANT PREREQUISITE: This demo requires Snowflake Openflow, which is currently available only
for Enterprise accounts as BYOC (Bring Your Own Cloud) or SPCS (Snowpark Container Services)
Public Preview. Contact your Snowflake account team to enable Openflow access.

🚀 Executive Summary

This demonstration showcases how Snowflake Intelligence with Cortex Search transforms unstructured business
documents from Google Drive into queryable strategic intelligence through Openflow data processing.

Business Impact: Convert scattered business documents (PDFs, presentations, Word docs, images) into a unified
intelligence platform that executives can query in natural language to make data-driven decisions.

ROI Highlight: Enable non-technical business users to extract insights from documents without SQL knowledge,
reducing time-to-insight by 90% and democratizing access to organizational knowledge.


🌟 What This Demo Provides

Complete Documentation Site

📖 Professional Documentation with:

  • Setup Guides - Prerequisites, database setup, Openflow configuration
  • Business Demos - 4 category-specific presentations ready for different audiences
  • AI Integration - Snowflake Intelligence agent setup for conversational queries
  • Reference Materials - Sample questions, commands, troubleshooting

Business-Ready Demo Categories

| Demo Category | Target Audience | Business Focus |
| --- | --- | --- |
| 🎯 Strategic Planning | C-level executives, board members | Investment decisions, market expansion |
| 🔧 Operations Excellence | Operations managers, tech leaders | Process optimization, modernization |
| ⚖️ Compliance & Risk | Compliance officers, audit teams | Policy enforcement, regulatory adherence |
| 📚 Knowledge Management | HR teams, training managers | Staff development, knowledge sharing |

→ Browse All Demo Categories

Competitive Advantages

  • ✅ Multi-Format Processing: PDF, DOCX, PPTX, JPG - all in one platform
  • ✅ Natural Language Queries: "What are our 2025 expansion plans?" vs complex SQL
  • ✅ Enterprise Security: Snowflake's enterprise-grade security for sensitive documents
  • ✅ Scalable Intelligence: Handles thousands of documents with sub-second query response

🚀 Quick Start Options

Option 1: Full Documentation Experience (Recommended)

🌐 Browse Complete Site

  • Visual setup guides with screenshots
  • Business-focused demo presentations
  • Copy-pasteable sample queries
  • AI integration tutorials

Option 2: Technical Quick Start

For experienced users who want immediate setup:

# Run the provided setup script with the Snowflake CLI
snow sql -f sql/setup.sql

What the setup script creates:

  • ✅ Role: FESTIVAL_DEMO_ROLE with appropriate permissions
  • ✅ Warehouse: FESTIVAL_DEMO_S for compute resources
  • ✅ Database: OPENFLOW_FESTIVAL_DEMO for data storage
  • ✅ Schema: FESTIVAL_OPS for organized data structure

Alternative - Manual SQL:

-- Or run these commands individually in your Snowflake worksheet
CREATE ROLE IF NOT EXISTS FESTIVAL_DEMO_ROLE;
CREATE WAREHOUSE IF NOT EXISTS FESTIVAL_DEMO_S;
CREATE DATABASE IF NOT EXISTS OPENFLOW_FESTIVAL_DEMO;
CREATE SCHEMA IF NOT EXISTS OPENFLOW_FESTIVAL_DEMO.FESTIVAL_OPS;
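The statements above only create empty objects; the role still needs privileges before it can use them. The following is a minimal sketch of the grants such a setup typically needs. It is not a copy of sql/setup.sql, whose exact contents are not shown here, so treat the privilege list as an assumption and compare it with the script before relying on it.

```sql
-- Illustrative grants only (assumption: sql/setup.sql may grant more or less).
-- Run with a role that is allowed to grant these privileges.
GRANT USAGE ON WAREHOUSE FESTIVAL_DEMO_S TO ROLE FESTIVAL_DEMO_ROLE;
GRANT USAGE ON DATABASE OPENFLOW_FESTIVAL_DEMO TO ROLE FESTIVAL_DEMO_ROLE;
GRANT USAGE ON SCHEMA OPENFLOW_FESTIVAL_DEMO.FESTIVAL_OPS TO ROLE FESTIVAL_DEMO_ROLE;

-- Let the role create the objects the pipeline writes into this schema.
GRANT CREATE TABLE, CREATE STAGE, CREATE CORTEX SEARCH SERVICE
  ON SCHEMA OPENFLOW_FESTIVAL_DEMO.FESTIVAL_OPS
  TO ROLE FESTIVAL_DEMO_ROLE;

-- Cortex functions are gated by this Snowflake-provided database role.
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE FESTIVAL_DEMO_ROLE;

-- Allow the current user to assume the demo role.
SET demo_user = CURRENT_USER();
GRANT ROLE FESTIVAL_DEMO_ROLE TO USER IDENTIFIER($demo_user);
```

After running the grants, `USE ROLE FESTIVAL_DEMO_ROLE;` followed by `USE SCHEMA OPENFLOW_FESTIVAL_DEMO.FESTIVAL_OPS;` is a quick way to confirm the role can reach the schema.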

Next Steps:

  1. 📋 Prerequisites - Technical requirements
  2. ⚡ Quick Setup - 15-minute streamlined setup
  3. 🔧 Openflow Setup - Connector configuration
  4. 🎯 Demo Categories - Business presentations

💡 Sample Capabilities

Once your pipeline is set up, you can ask questions like:

🎯 Strategic: "What are our 2025 expansion strategies and expected ROI?"
🔧 Operations: "Find all technology modernization projects and their budgets"
⚖️ Compliance: "Show me current health and safety policies"
📚 Knowledge: "What training materials are available for staff development?"

→ Browse 50+ Sample Questions

🏗️ What Gets Built

End-to-End Architecture:

graph LR
    A[📁 Google Drive] --> B[🔄 Openflow Pipeline]
    B --> C[🔍 Cortex Search Service]
    C --> D[🤖 Snowflake Intelligence]
    D --> E[💬 Natural Language Queries]
    E --> F[📊 Business Insights]

Core Components:

  • ✅ Multi-format document processing (PDF, DOCX, PPTX, JPG)
  • ✅ Automated Cortex Search service creation and indexing
  • ✅ Business-focused demo categories for different stakeholders
  • ✅ Optional AI agent for conversational document queries
  • ✅ Production-ready setup with security and authentication

Executive Use Cases Demonstrated

  1. Strategic Planning: "What are our 2025 expansion plans and expected ROI?"

    • Result: Instant access to strategy documents, financial projections, board decisions
  2. Operational Excellence: "Show me all technology modernization projects and their budgets"

    • Result: $2.8M sound system upgrade project with complete business case
  3. Compliance & Risk: "What health and safety policies are currently in effect?"

    • Result: Complete policy documentation with incident analysis
  4. Knowledge Management: "Find all training materials and staff development programs"

    • Result: Cross-department training resources with collaboration insights

Financial Justification

  • Cost Avoidance: Eliminate manual document searching (saves 2-3 hours/week per knowledge worker)
  • Revenue Acceleration: Faster strategic decision-making enables quicker market responses
  • Risk Reduction: Instant compliance documentation access reduces regulatory risks
  • Scalability: Framework supports enterprise-wide knowledge management expansion
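To turn the cost-avoidance bullet into a number for your own organization, the arithmetic can be run directly in a worksheet. This is a rough sketch: the 2-3 hours/week range comes from the text above, while the headcount, working weeks, and hourly cost are hypothetical inputs that you should replace with your own figures.

```sql
-- Hypothetical inputs (assumptions, not figures from the demo):
-- 200 knowledge workers, 48 working weeks per year, 60 USD loaded hourly cost.
WITH inputs AS (
  SELECT 200 AS knowledge_workers,
         48  AS working_weeks,
         60  AS hourly_cost_usd
)
SELECT
  knowledge_workers * working_weeks * 2                   AS hours_saved_low,       -- 2 hours/week
  knowledge_workers * working_weeks * 3                   AS hours_saved_high,      -- 3 hours/week
  knowledge_workers * working_weeks * 2 * hourly_cost_usd AS cost_avoided_low_usd,
  knowledge_workers * working_weeks * 3 * hourly_cost_usd AS cost_avoided_high_usd
FROM inputs;
```

Keeping the inputs in a single CTE makes it easy to rerun the estimate for one department (for a pilot) and then for the whole organization.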

🛠️ For Sales Engineers & Solution Architects

Technical Architecture

📁 Google Shared Drive → 🔄 Openflow (Google Drive Connector) → 🧠 Cortex Search → 📊 Snowflake Intelligence

Demo Environment Setup

Prerequisites

Google Drive & Google Cloud Requirements:

  • Google Admin Access: Super Admin permissions for your organization
  • Google Cloud Project with Organization Policy Administrator & Organization Administrator roles
  • Service Account Setup:
    • Enable service account key creation (disabled by default)
    • Create service account with JSON key download
    • Configure domain-wide delegation with 6 required OAuth scopes
    • Full setup guide: Google Drive connector documentation

Snowflake Requirements:

  • Account: Enterprise Snowflake account in AWS Commercial Regions
  • Openflow: Available only for Enterprise accounts as BYOC (Bring Your Own Cloud) or SPCS
    (Snowpark Container Services) Public Preview
    • This demo requires Snowflake Openflow which is currently in Public Preview
    • Contact your Snowflake account team to enable Openflow access
  • Service User: SERVICE type user with key-pair authentication
  • Secrets Manager: AWS/Azure/HashiCorp recommended for production
  • Cortex Search: Enabled for document intelligence queries
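The service-user requirement above can be sketched as follows. This is a minimal example under stated assumptions: the user name FESTIVAL_DEMO_SVC is hypothetical, and the key value is a placeholder that must be replaced with the body of your own RSA public key before the statement will succeed.

```sql
-- Hypothetical service user for the connector (name is an assumption).
-- TYPE = SERVICE marks a non-interactive user that cannot log in with a password.
CREATE USER IF NOT EXISTS FESTIVAL_DEMO_SVC
  TYPE = SERVICE
  DEFAULT_ROLE = FESTIVAL_DEMO_ROLE
  DEFAULT_WAREHOUSE = FESTIVAL_DEMO_S
  -- Placeholder: paste the public key body without the PEM header and footer lines.
  RSA_PUBLIC_KEY = '<public_key_body>';

-- The service user needs the demo role to reach the demo objects.
GRANT ROLE FESTIVAL_DEMO_ROLE TO USER FESTIVAL_DEMO_SVC;

-- Check that the key was registered: look for the RSA_PUBLIC_KEY_FP property.
DESCRIBE USER FESTIVAL_DEMO_SVC;
```

The matching private key stays outside Snowflake, ideally in one of the secrets managers listed above, and is referenced from the connector configuration.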

Quick Start (5 minutes):

  1. Clone repository (all document formats included)
  2. Create Google Drive structure: Use Google Apps Script for automated folder creation
  3. Upload 16 demo documents per folder structure
  4. Configure Openflow Google Drive connector
  5. Execute natural language queries

Document Collection Overview

16 Multi-Format Business Documents:

  • 3 PDF: Formal contracts, policies, financial reports
  • 2 PPTX: Training materials, executive presentations
  • 2 DOCX: Collaborative meeting minutes, project documentation
  • 9 JPG: Visual operational manuals, strategic planning diagrams
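Once the connector has ingested the collection, a quick count by file extension helps confirm that all 16 documents arrived. The query below is a sketch only: the table DOC_CHUNKS and the column FILE_NAME are hypothetical stand-ins for whatever table and column the Openflow connector actually creates in your schema.

```sql
-- Assumption: DOC_CHUNKS and FILE_NAME are placeholder names; substitute the
-- table and column created by your Openflow Google Drive connector.
SELECT
  UPPER(SPLIT_PART(FILE_NAME, '.', -1)) AS file_format,   -- text after the last dot
  COUNT(DISTINCT FILE_NAME)             AS documents
FROM OPENFLOW_FESTIVAL_DEMO.FESTIVAL_OPS.DOC_CHUNKS
GROUP BY file_format
ORDER BY documents DESC;
```

If ingestion is complete, the counts should match the breakdown listed above (3 PDF, 2 PPTX, 2 DOCX, 9 JPG).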

4 Business Intelligence Categories:

  1. Strategic & Executive Intelligence (25%)
  2. Operations Excellence & Technology (25%)
  3. Compliance & Risk Management (25%)
  4. Knowledge Management & Training (25%)

Sample Demo Queries

Natural language queries for Snowflake Intelligence after Openflow-Cortex Search integration:

-- Strategic Intelligence
"What are our 2025 expansion plans across all document formats?"

-- Operational Excellence  
"Find all technology modernization projects and their business cases"

-- Cross-Format Analysis
"Show me comprehensive insights across all 16 documents - what patterns emerge?"

These queries become available once Google Drive documents are processed through the Openflow
data pipeline and indexed by Cortex Search for intelligent document retrieval.
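The same questions can also be tested directly against the search service from a worksheet, which is useful for checking retrieval before involving Snowflake Intelligence. The sketch below uses the SNOWFLAKE.CORTEX.SEARCH_PREVIEW function; the service name FESTIVAL_DOCS_SEARCH and the column names are hypothetical, so replace them with the names of the service in your account.

```sql
-- Assumption: the service name and the columns CHUNK_TEXT and FILE_NAME are
-- placeholders for the Cortex Search service created by your pipeline.
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    'OPENFLOW_FESTIVAL_DEMO.FESTIVAL_OPS.FESTIVAL_DOCS_SEARCH',
    '{
       "query": "What are our 2025 expansion plans?",
       "columns": ["CHUNK_TEXT", "FILE_NAME"],
       "limit": 5
     }'
  )
)['results'] AS results;
```

The function returns a JSON string, so PARSE_JSON is applied to extract the `results` array of matching chunks.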

Technical Implementation Details

  • Multi-Format Processing: Handles PDF text extraction, PPTX content parsing, DOCX collaboration data, JPG OCR
  • Metadata Extraction: Document authors, versions, collaboration patterns, business categories
  • Natural Language Processing: Cortex Search enables business user queries without SQL knowledge
  • Scalability: Architecture supports thousands of documents with enterprise-grade performance

📁 Repository Structure & Resources

📊 Analytics & Demo Materials

📄 Sample Business Documents

🛠️ Technical Resources

🎯 Quick Navigation

| Audience | Start Here | Key Resource |
| --- | --- | --- |
| Product Marketing | Business Use Cases | Analytics Overview |
| Executives | Business Impact | Demo Execution Guide |
| Technical Teams | Architecture | Document Collection |

📚 Documentation & Resources

Complete Documentation Site

🌐 snowflake-labs.github.io/openflow-unstructured-data-pipeline-demo

The documentation site provides:

  • 🔧 Technical Setup - Prerequisites, database setup, connector configuration
  • 🎯 Business Demos - Ready-to-present category demonstrations
  • 🤖 AI Integration - Snowflake Intelligence agent setup
  • 📋 Reference Guides - Commands, sample questions, troubleshooting

Key Documentation Sections

For Business Analysis

  1. Explore Document Intelligence Analysis for analytics opportunities
  2. Review Sample Queries for business scenarios
  3. Understand Business Value Propositions for stakeholder conversations

For Technical Implementation

  1. Review Architecture and technical requirements
  2. Examine Sample Documents for data understanding
  3. Configure Cortex Search Service per Snowflake requirements
  4. Implement Openflow Google Drive connector with provided document collection
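For orientation on step 3, a manually defined Cortex Search service looks roughly like the sketch below. The demo describes the service as being created automatically, so this is not the pipeline's own definition; the service name, the source table DOC_CHUNKS, and its columns are all hypothetical.

```sql
-- Assumption: FESTIVAL_DOCS_SEARCH, DOC_CHUNKS, CHUNK_TEXT, FILE_NAME and
-- CATEGORY are illustrative names, not objects defined by this repository.
CREATE OR REPLACE CORTEX SEARCH SERVICE OPENFLOW_FESTIVAL_DEMO.FESTIVAL_OPS.FESTIVAL_DOCS_SEARCH
  ON CHUNK_TEXT                      -- text column that is indexed and searched
  ATTRIBUTES FILE_NAME, CATEGORY     -- columns available for filtering results
  WAREHOUSE = FESTIVAL_DEMO_S        -- warehouse used to build and refresh the index
  TARGET_LAG = '1 hour'              -- how far the index may trail the source table
  AS (
    SELECT CHUNK_TEXT, FILE_NAME, CATEGORY
    FROM OPENFLOW_FESTIVAL_DEMO.FESTIVAL_OPS.DOC_CHUNKS
  );
```

A shorter TARGET_LAG keeps newly uploaded documents searchable sooner at the cost of more frequent refreshes on the warehouse.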

📈 Business Value Summary

| Metric | Current State | With Document Intelligence | Improvement |
| --- | --- | --- | --- |
| Document Search Time | 30-60 minutes manual search | 5-second natural language query | 90% reduction |
| Cross-Format Analysis | Impossible without manual review | Instant insights across all formats | New capability |
| Executive Insight Access | Requires IT/analyst support | Self-service natural language queries | 100% democratization |
| Compliance Documentation | Hours of manual document location | Instant policy and regulation access | 95% time savings |

🎯 Next Steps

Immediate Actions

  1. Schedule Demo: Contact solution architect for live demonstration
  2. Assess Documents: Identify your organization's Google Drive document collection
  3. Plan Implementation: Review technical requirements and integration points
  4. ROI Planning: Calculate business value using provided metrics and your document volumes

Enterprise Implementation

  1. Pilot Program: Start with one department's document collection
  2. Scale Planning: Design enterprise-wide document intelligence architecture
  3. Integration Strategy: Connect with existing business systems and workflows
  4. Change Management: Train business users on natural language query capabilities

🎯 Get Started Today

Ready to transform your unstructured business documents into strategic intelligence?

  1. 📖 Start with Prerequisites
  2. ⚡ Follow Quick Setup
  3. 🎯 Run Your First Demo
  4. 🤖 Add AI Integration

📞 Support & Resources


⚠️ Note: This demo uses synthetic festival operations data for demonstration purposes. All business scenarios, names, and data are fictional and created specifically for showcasing Snowflake capabilities.

License

Copyright (c) Snowflake Inc. All rights reserved. Licensed under the Apache 2.0 license.
