diff --git a/.env.example b/.env.example
index 77bb5630e..778ad5519 100644
--- a/.env.example
+++ b/.env.example
@@ -1,6 +1,6 @@
 OPENAI_API_KEY=
-# Update these with your Supabase details from your project settings > API
+# Update these with your Pinecone details from your project settings > API and dashboard settings
 PINECONE_API_KEY=
 PINECONE_ENVIRONMENT=
-
+PINECONE_INDEX_NAME=
diff --git a/README.md b/README.md
index e4d796977..5696be598 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,38 @@
-# GPT-4 & LangChain - Create a ChatGPT Chatbot for Your PDF Docs
+# GPT-4 & LangChain - Create a ChatGPT Chatbot for Your PDF Files
 
-Use the new GPT-4 api to build a chatGPT chatbot for Large PDF docs (56 pages used in this example).
+Use the new GPT-4 API to build a ChatGPT chatbot for multiple large PDF files.
 
 Tech stack used includes LangChain, Pinecone, Typescript, Openai, and Next.js. LangChain is a framework that makes it easier to build scalable AI/LLM apps and chatbots. Pinecone is a vectorstore for storing embeddings and your PDF in text to later retrieve similar docs.
 
 [Tutorial video](https://www.youtube.com/watch?v=ih9PBGVVOO4)
 
-[Get in touch via twitter if you have questions](https://twitter.com/mayowaoshin)
+[Join the discord if you have questions](https://discord.gg/E4Mc77qwjm)
 
 The visual guide of this repo and tutorial is in the `visual guide` folder.
 
 **If you run into errors, please review the troubleshooting section further down this page.**
 
+Prerequisite: make sure you have Node.js installed on your system and that the version is 18 or greater.
+
 ## Development
 
-1. Clone the repo
+1. Clone the repo or download the ZIP
 
 ```
 git clone [github https url]
 ```
 
+
 2. Install packages
 
+First run `npm install yarn -g` to install yarn globally (if you haven't already).
+
+Then run:
+
 ```
-pnpm install
+yarn install
 ```
 
+After installation, you should now see a `node_modules` folder.
+
 3. Set up your `.env` file
@@ -37,28 +45,30 @@ OPENAI_API_KEY=
 
 PINECONE_API_KEY=
 PINECONE_ENVIRONMENT=
+PINECONE_INDEX_NAME=
+
 ```
 
 - Visit [openai](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key) to retrieve API keys and insert into your `.env` file.
-- Visit [pinecone](https://pinecone.io/) to create and retrieve your API keys.
+- Visit [pinecone](https://pinecone.io/) to create and retrieve your API keys, and also retrieve your environment and index name from the dashboard.
 
-4. In the `config` folder, replace the `PINECONE_INDEX_NAME` and `PINECONE_NAME_SPACE` with your own details from your pinecone dashboard.
+4. In the `config` folder, replace the `PINECONE_NAME_SPACE` with a `namespace` where you'd like to store your embeddings on Pinecone when you run `npm run ingest`. This namespace will later be used for queries and retrieval.
 
-5. In `utils/makechain.ts` chain change the `QA_PROMPT` for your own usecase. Change `modelName` in `new OpenAIChat` to a different api model if you don't have access to `gpt-4`. See [the OpenAI docs](https://platform.openai.com/docs/models/model-endpoint-compatibility) for a list of supported `modelName`s. For example you could use `gpt-3.5-turbo` if you do not have access to `gpt-4`, yet.
+5. In `utils/makechain.ts`, change the `QA_PROMPT` for your own use case. Change `modelName` in `new OpenAI` to `gpt-4` if you have access to the `gpt-4` API. Please verify outside this repo that you have access to the `gpt-4` API; otherwise the application will not work.
 
-## Convert your PDF to embeddings
+## Convert your PDF files to embeddings
 
-1. In `docs` folder replace the pdf with your own pdf doc.
+**This repo can load multiple PDF files**
 
-2. In `scripts/ingest-data.ts` replace `filePath` with `docs/{yourdocname}.pdf`
+1. Inside the `docs` folder, add your PDF files or folders that contain PDF files.
 
-3. Run the script `npm run ingest` to 'ingest' and embed your docs
+2. Run the script `npm run ingest` to 'ingest' and embed your docs. If you run into errors, see the troubleshooting section below.
 
-4. Check Pinecone dashboard to verify your namespace and vectors have been added.
+3. Check Pinecone dashboard to verify your namespace and vectors have been added.
 
 ## Run the app
 
-Once you've verified that the embeddings and content have been successfully added to your Pinecone, you can run the app `npm run dev` to launch the local dev environment and then type a question in the chat interface.
+Once you've verified that the embeddings and content have been successfully added to your Pinecone, you can run the app `npm run dev` to launch the local dev environment, and then type a question in the chat interface.
 
 ## Troubleshooting
@@ -67,18 +77,22 @@ In general, keep an eye out in the `issues` and `discussions` section of this re
 
 **General errors**
 
 - Make sure you're running the latest Node version. Run `node -v`
+- Try a different PDF or convert your PDF to text first. It's possible your PDF is corrupted, scanned, or requires OCR to convert to text.
+- `console.log` the `env` variables and make sure they are exposed.
 - Make sure you're using the same versions of LangChain and Pinecone as this repo.
-- Check that you've created an `.env` file that contains your valid (and working) API keys.
-- If you change `modelName` in `OpenAIChat` note that the correct name of the alternative model is `gpt-3.5-turbo`
-- Pinecone indexes of users on the Starter(free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter.
+- Check that you've created an `.env` file that contains your valid (and working) API keys, environment, and index name.
+- If you change `modelName` in `OpenAI`, make sure you have access to the API for the appropriate model.
+- Make sure you have enough OpenAI credits and a valid card on your billing account.
+- Check that you don't have multiple OpenAI API keys in your global environment. If you do, the local `.env` file from the project will be overridden by the system's `env` variable.
+- Try to hard-code your API keys into the `process.env` variables if there are still issues.
 
 **Pinecone errors**
 
-- Make sure your pinecone dashboard `environment` and `index` matches the one in your `config` folder.
+- Make sure your Pinecone dashboard `environment` and `index` match the ones in the `pinecone.ts` and `.env` files.
 - Check that you've set the vector dimensions to `1536`.
-- Switch your Environment in pinecone to `us-east1-gcp` if the other environment is causing issues.
-
-If you're stuck after trying all these steps, delete `node_modules`, restart your computer, then `pnpm install` again.
+- Make sure your Pinecone namespace is in lowercase.
+- Pinecone indexes of users on the Starter (free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter before 7 days.
+- Retry from scratch with a new Pinecone project, index, and cloned repo.
 
 ## Credit
diff --git a/components/layout.tsx b/components/layout.tsx
index 4481b4dcb..5e3d20700 100644
--- a/components/layout.tsx
+++ b/components/layout.tsx
@@ -14,7 +14,7 @@ export default function Layout({ children }: LayoutProps) {
-
+
{children}
diff --git a/config/pinecone.ts b/config/pinecone.ts
index f1851c8da..ce2dadaad 100644
--- a/config/pinecone.ts
+++ b/config/pinecone.ts
@@ -1,8 +1,12 @@
 /**
- * Change the index and namespace to your own
+ * Change the namespace to the namespace on Pinecone in which you'd like to store your embeddings.
  */
-const PINECONE_INDEX_NAME = 'langchainjsfundamentals';
+if (!process.env.PINECONE_INDEX_NAME) {
+  throw new Error('Missing Pinecone index name in .env file');
+}
+
+const PINECONE_INDEX_NAME = process.env.PINECONE_INDEX_NAME ?? '';
 
 const PINECONE_NAME_SPACE = 'pdf-test'; //namespace is optional for your vectors
diff --git a/declarations/pdf-parse.d.ts b/declarations/pdf-parse.d.ts
new file mode 100644
index 000000000..5b2ab5020
--- /dev/null
+++ b/declarations/pdf-parse.d.ts
@@ -0,0 +1,5 @@
+declare module 'pdf-parse/lib/pdf-parse.js' {
+  import pdf from 'pdf-parse';
+
+  export default pdf;
+}
diff --git a/docs/MorseVsFrederick.pdf b/docs/MorseVsFrederick.pdf
deleted file mode 100644
index 570f464ff..000000000
Binary files a/docs/MorseVsFrederick.pdf and /dev/null differ
diff --git a/package.json b/package.json
index 8d10f96c9..82579df5b 100644
--- a/package.json
+++ b/package.json
@@ -16,11 +16,11 @@
   },
   "dependencies": {
     "@microsoft/fetch-event-source": "^2.0.1",
-    "@pinecone-database/pinecone": "^0.0.10",
+    "@pinecone-database/pinecone": "0.0.12",
     "@radix-ui/react-accordion": "^1.1.1",
     "clsx": "^1.2.1",
     "dotenv": "^16.0.3",
-    "langchain": "0.0.33",
+    "langchain": "0.0.55",
     "lucide-react": "^0.125.0",
     "next": "13.2.3",
     "pdf-parse": "1.1.1",
@@ -43,6 +43,9 @@
     "tsx": "^3.12.3",
     "typescript": "^4.9.5"
   },
+  "engines": {
+    "node": ">=18"
+  },
   "keywords": [
     "starter",
     "gpt4",
diff --git a/pages/api/chat.ts b/pages/api/chat.ts
index a96464411..b9f41f54d 100644
--- a/pages/api/chat.ts
+++ b/pages/api/chat.ts
@@ -1,6 +1,6 @@
 import type { NextApiRequest, NextApiResponse } from 'next';
-import { OpenAIEmbeddings } from 'langchain/embeddings';
-import { PineconeStore } from 'langchain/vectorstores';
+import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
+import { PineconeStore } from 'langchain/vectorstores/pinecone';
 import { makeChain } from '@/utils/makechain';
 import { pinecone } from '@/utils/pinecone-client';
 import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
@@ -11,52 +11,45 @@ export default async function handler(
 ) {
   const { question, history } = req.body;
 
+  console.log('question', question);
+
+  //only accept post requests
+  if (req.method !== 'POST') {
+    res.status(405).json({ error: 'Method not allowed' });
+    return;
+  }
+
   if (!question) {
     return res.status(400).json({ message: 'No question in the request' });
   }
   // OpenAI recommends replacing newlines with spaces for best results
   const sanitizedQuestion = question.trim().replaceAll('\n', ' ');
 
-  const index = pinecone.Index(PINECONE_INDEX_NAME);
-
-  /* create vectorstore*/
-  const vectorStore = await PineconeStore.fromExistingIndex(
-    index,
-    new OpenAIEmbeddings({}),
-    'text',
-    PINECONE_NAME_SPACE, //optional
-  );
-
-  res.writeHead(200, {
-    'Content-Type': 'text/event-stream',
-    'Cache-Control': 'no-cache, no-transform',
-    Connection: 'keep-alive',
-  });
-
-  const sendData = (data: string) => {
-    res.write(`data: ${data}\n\n`);
-  };
-
-  sendData(JSON.stringify({ data: '' }));
-
-  //create chain
-  const chain = makeChain(vectorStore, (token: string) => {
-    sendData(JSON.stringify({ data: token }));
-  });
-
   try {
-    //Ask a question
+    const index = pinecone.Index(PINECONE_INDEX_NAME);
+
+    /* create vectorstore*/
+    const vectorStore = await PineconeStore.fromExistingIndex(
+      new OpenAIEmbeddings({}),
+      {
+        pineconeIndex: index,
+        textKey: 'text',
+        namespace: PINECONE_NAME_SPACE, //namespace comes from your config folder
+      },
+    );
+
+    //create chain
+    const chain = makeChain(vectorStore);
+    //Ask a question using chat history
     const response = await chain.call({
       question: sanitizedQuestion,
       chat_history: history || [],
     });
 
     console.log('response', response);
-    sendData(JSON.stringify({ sourceDocs: response.sourceDocuments }));
-  } catch (error) {
+    res.status(200).json(response);
+  } catch (error: any) {
     console.log('error', error);
-  } finally {
-    sendData('[DONE]');
-    res.end();
+    res.status(500).json({ error: error.message || 'Something went wrong' });
   }
 }
diff --git a/pages/index.tsx b/pages/index.tsx
index 0b29b8450..c80830751 100644
--- a/pages/index.tsx
+++ b/pages/index.tsx
@@ -1,8 +1,7 @@
-import { useRef, useState, useEffect, useMemo } from 'react';
+import { useRef, useState, useEffect } from 'react';
 import Layout from '@/components/layout';
 import styles from '@/styles/Home.module.css';
 import { Message } from '@/types/chat';
-import { fetchEventSource } from '@microsoft/fetch-event-source';
 import Image from 'next/image';
 import ReactMarkdown from 'react-markdown';
 import LoadingDots from '@/components/ui/LoadingDots';
@@ -17,7 +16,7 @@ import {
 export default function Home() {
   const [query, setQuery] = useState('');
   const [loading, setLoading] = useState(false);
-  const [sourceDocs, setSourceDocs] = useState([]);
+  const [error, setError] = useState(null);
   const [messageState, setMessageState] = useState<{
     messages: Message[];
     pending?: string;
@@ -31,12 +30,9 @@
       },
     ],
     history: [],
-    pendingSourceDocs: [],
   });
 
-  const { messages, pending, history, pendingSourceDocs } = messageState;
-
-  console.log('messageState', messageState);
+  const { messages, history } = messageState;
 
   const messageListRef = useRef(null);
   const textAreaRef = useRef(null);
@@ -49,6 +45,8 @@
   async function handleSubmit(e: any) {
     e.preventDefault();
 
+    setError(null);
+
     if (!query) {
       alert('Please input a question');
       return;
@@ -65,17 +63,13 @@
           message: question,
         },
       ],
-      pending: undefined,
     }));
 
     setLoading(true);
    setQuery('');
-    setMessageState((state) => ({ ...state, pending: '' }));
-
-    const ctrl = new AbortController();
 
     try {
-      fetchEventSource('/api/chat', {
+      const response = await fetch('/api/chat', {
         method: 'POST',
         headers: {
           'Content-Type': 'application/json',
@@ -84,42 +78,35 @@
         },
         body: JSON.stringify({
          question,
          history,
        }),
-        signal: ctrl.signal,
-        onmessage: (event) => {
-          if (event.data === '[DONE]') {
-            setMessageState((state) => ({
-              history: [...state.history, [question, state.pending ?? '']],
-              messages: [
-                ...state.messages,
-                {
-                  type: 'apiMessage',
-                  message: state.pending ?? '',
-                  sourceDocs: state.pendingSourceDocs,
-                },
-              ],
-              pending: undefined,
-              pendingSourceDocs: undefined,
-            }));
-            setLoading(false);
-            ctrl.abort();
-          } else {
-            const data = JSON.parse(event.data);
-            if (data.sourceDocs) {
-              setMessageState((state) => ({
-                ...state,
-                pendingSourceDocs: data.sourceDocs,
-              }));
-            } else {
-              setMessageState((state) => ({
-                ...state,
-                pending: (state.pending ?? '') + data.data,
-              }));
-            }
-          }
-        },
       });
+      const data = await response.json();
+      console.log('data', data);
+
+      if (data.error) {
+        setError(data.error);
+      } else {
+        setMessageState((state) => ({
+          ...state,
+          messages: [
+            ...state.messages,
+            {
+              type: 'apiMessage',
+              message: data.text,
+              sourceDocs: data.sourceDocuments,
+            },
+          ],
+          history: [...state.history, [question, data.text]],
+        }));
+      }
+      console.log('messageState', messageState);
+
+      setLoading(false);
+
+      //scroll to bottom
+      messageListRef.current?.scrollTo(0, messageListRef.current.scrollHeight);
     } catch (error) {
       setLoading(false);
+      setError('An error occurred while fetching the data. Please try again.');
       console.log('error', error);
     }
   }
@@ -133,21 +120,6 @@
     }
   };
 
-  const chatMessages = useMemo(() => {
-    return [
-      ...messages,
-      ...(pending
-        ? [
-            {
-              type: 'apiMessage',
-              message: pending,
-              sourceDocs: pendingSourceDocs,
-            },
-          ]
-        : []),
-    ];
-  }, [messages, pending, pendingSourceDocs]);
-
   return (
     <>
@@ -158,12 +130,13 @@
- {chatMessages.map((message, index) => { + {messages.map((message, index) => { let icon; let className; if (message.type === 'apiMessage') { icon = ( AI -
+
{icon}
@@ -201,14 +175,17 @@ export default function Home() {
{message.sourceDocs && ( -
+
{message.sourceDocs.map((doc, index) => ( -
+

Source {index + 1}

@@ -230,26 +207,6 @@ export default function Home() { ); })} - {sourceDocs.length > 0 && ( -
- - {sourceDocs.map((doc, index) => ( -
- - -

Source {index + 1}

-
- - - {doc.pageContent} - - -
-
- ))} -
-
- )}
@@ -296,9 +253,14 @@ export default function Home() {
+ {error && ( +
+

{error}

+
+ )}
-
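For context on the `config/pinecone.ts` hunk above: the patch replaces a hard-coded index name with a fail-fast guard on `process.env.PINECONE_INDEX_NAME`, so a missing `.env` entry is caught at startup rather than as a confusing Pinecone error later. A minimal sketch of that pattern follows; the function name and the explicit `env` parameter are illustrative (the repo reads `process.env` directly at module scope):

```typescript
// Fail-fast guard for a required environment variable, mirroring the
// check added in config/pinecone.ts. Taking the env map as a parameter
// (instead of reading process.env directly) is an illustrative tweak
// that makes the guard easy to test in isolation.
function requirePineconeIndexName(
  env: Record<string, string | undefined>,
): string {
  if (!env.PINECONE_INDEX_NAME) {
    throw new Error('Missing Pinecone index name in .env file');
  }
  return env.PINECONE_INDEX_NAME;
}

// With the variable set, the name is returned unchanged;
// with it unset or empty, the guard throws immediately.
console.log(requirePineconeIndexName({ PINECONE_INDEX_NAME: 'pdf-index' }));
```

Because the guard runs when `config/pinecone.ts` is first imported, misconfiguration surfaces before any request ever reaches the Pinecone client.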