diff --git a/.env.example b/.env.example
index 77bb5630e..778ad5519 100644
--- a/.env.example
+++ b/.env.example
@@ -1,6 +1,6 @@
 OPENAI_API_KEY=
 
-# Update these with your Supabase details from your project settings > API
+# Update these with your Pinecone details from your project settings > API and dashboard settings
 PINECONE_API_KEY=
 PINECONE_ENVIRONMENT=
-
+PINECONE_INDEX_NAME=
diff --git a/README.md b/README.md
index e4d796977..5696be598 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,38 @@
-# GPT-4 & LangChain - Create a ChatGPT Chatbot for Your PDF Docs
+# GPT-4 & LangChain - Create a ChatGPT Chatbot for Your PDF Files
 
-Use the new GPT-4 api to build a chatGPT chatbot for Large PDF docs (56 pages used in this example).
+Use the new GPT-4 API to build a ChatGPT chatbot for multiple large PDF files.
 
 Tech stack used includes LangChain, Pinecone, Typescript, Openai, and Next.js. LangChain is a framework that makes it easier to build scalable AI/LLM apps and chatbots. Pinecone is a vectorstore for storing embeddings and your PDF in text to later retrieve similar docs.
 
 [Tutorial video](https://www.youtube.com/watch?v=ih9PBGVVOO4)
 
-[Get in touch via twitter if you have questions](https://twitter.com/mayowaoshin)
+[Join the Discord if you have questions](https://discord.gg/E4Mc77qwjm)
 
 The visual guide of this repo and tutorial is in the `visual guide` folder.
 
 **If you run into errors, please review the troubleshooting section further down this page.**
 
+Prelude: Please make sure you have Node installed on your system and that its version is 18 or greater.
+
 ## Development
 
-1. Clone the repo
+1. Clone the repo or download the ZIP
 
 ```
 git clone [github https url]
 ```
 
+
 2. Install packages
 
+First run `npm install yarn -g` to install yarn globally (if you haven't already).
+
+Then run:
+
 ```
-pnpm install
+yarn install
 ```
+After installation, you should now see a `node_modules` folder.
 
 3. Set up your `.env` file
 
@@ -37,28 +45,30 @@ OPENAI_API_KEY=
 PINECONE_API_KEY=
 PINECONE_ENVIRONMENT=
 
+PINECONE_INDEX_NAME=
+
 ```
 
 - Visit [openai](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key) to retrieve API keys and insert into your `.env` file.
-- Visit [pinecone](https://pinecone.io/) to create and retrieve your API keys.
+- Visit [pinecone](https://pinecone.io/) to create and retrieve your API keys, and also retrieve your environment and index name from the dashboard.
 
-4. In the `config` folder, replace the `PINECONE_INDEX_NAME` and `PINECONE_NAME_SPACE` with your own details from your pinecone dashboard.
+4. In the `config` folder, replace the value of `PINECONE_NAME_SPACE` with the `namespace` where you'd like to store your embeddings on Pinecone when you run `npm run ingest`. This namespace will later be used for queries and retrieval.
 
-5. In `utils/makechain.ts` chain change the `QA_PROMPT` for your own usecase. Change `modelName` in `new OpenAIChat` to a different api model if you don't have access to `gpt-4`. See [the OpenAI docs](https://platform.openai.com/docs/models/model-endpoint-compatibility) for a list of supported `modelName`s. For example you could use `gpt-3.5-turbo` if you do not have access to `gpt-4`, yet.
+5. In `utils/makechain.ts`, change the `QA_PROMPT` for your own use case. Change `modelName` in `new OpenAI` to `gpt-4` if you have access to the `gpt-4` API. Please verify outside this repo that you have access to the `gpt-4` API; otherwise the application will not work.
 
-## Convert your PDF to embeddings
+## Convert your PDF files to embeddings
 
-1. In `docs` folder replace the pdf with your own pdf doc.
+**This repo can load multiple PDF files**
 
-2. In `scripts/ingest-data.ts` replace `filePath` with `docs/{yourdocname}.pdf`
+1. Inside the `docs` folder, add your PDF files or folders that contain PDF files.
 
-3. Run the script `npm run ingest` to 'ingest' and embed your docs
+2. Run the script `npm run ingest` to 'ingest' and embed your docs. If you run into errors, see the troubleshooting section below.
 
-4. Check Pinecone dashboard to verify your namespace and vectors have been added.
+3. Check the Pinecone dashboard to verify that your namespace and vectors have been added.
 
 ## Run the app
 
-Once you've verified that the embeddings and content have been successfully added to your Pinecone, you can run the app `npm run dev` to launch the local dev environment and then type a question in the chat interface.
+Once you've verified that the embeddings and content have been successfully added to your Pinecone, you can run the app `npm run dev` to launch the local dev environment, and then type a question in the chat interface.
 
 ## Troubleshooting
 
@@ -67,18 +77,22 @@ In general, keep an eye out in the `issues` and `discussions` section of this re
 **General errors**
 
 - Make sure you're running the latest Node version. Run `node -v`
+- Try a different PDF or convert your PDF to text first. It's possible your PDF is corrupted, scanned, or requires OCR to convert to text.
+- `console.log` the `env` variables and make sure they are exposed.
 - Make sure you're using the same versions of LangChain and Pinecone as this repo.
-- Check that you've created an `.env` file that contains your valid (and working) API keys.
-- If you change `modelName` in `OpenAIChat` note that the correct name of the alternative model is `gpt-3.5-turbo`
-- Pinecone indexes of users on the Starter(free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter.
+- Check that you've created an `.env` file that contains your valid (and working) API keys, environment, and index name.
+- If you change `modelName` in `OpenAI`, make sure you have access to the API for the appropriate model.
+- Make sure you have enough OpenAI credits and a valid card on your billing account.
+- Check that you don't have multiple OpenAI API keys in your global environment. If you do, the system's `env` variable will override the value in the project's local `.env` file.
+- Try hard-coding your API keys into the `process.env` variables if there are still issues.
 
 **Pinecone errors**
 
-- Make sure your pinecone dashboard `environment` and `index` matches the one in your `config` folder.
+- Make sure your Pinecone dashboard `environment` and `index` match the ones in the `pinecone.ts` and `.env` files.
 - Check that you've set the vector dimensions to `1536`.
-- Switch your Environment in pinecone to `us-east1-gcp` if the other environment is causing issues.
-
-If you're stuck after trying all these steps, delete `node_modules`, restart your computer, then `pnpm install` again.
+- Make sure your Pinecone namespace is in lowercase.
+- Pinecone indexes of users on the Starter (free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter before the 7 days elapse.
+- Retry from scratch with a new Pinecone project, index, and cloned repo.
 
 ## Credit
 
diff --git a/components/layout.tsx b/components/layout.tsx
index 4481b4dcb..5e3d20700 100644
--- a/components/layout.tsx
+++ b/components/layout.tsx
@@ -14,7 +14,7 @@ export default function Layout({ children }: LayoutProps) {
-
{error}
+
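The README's troubleshooting advice (log the `env` variables, and confirm that `.env` holds valid API keys, the environment, and the index name) can be turned into a small fail-fast check. This is a minimal sketch, not code from this repo: `missingEnvVars` and `assertEnv` are hypothetical helper names, and the variable list simply mirrors `.env.example`. It only checks that each value is present and non-blank; it cannot tell whether a key is actually valid.

```typescript
// Hypothetical helpers; only the variable names are taken from `.env.example`.
const REQUIRED_ENV_VARS: readonly string[] = [
  "OPENAI_API_KEY",
  "PINECONE_API_KEY",
  "PINECONE_ENVIRONMENT",
  "PINECONE_INDEX_NAME",
];

type Env = Record<string, string | undefined>;

// Returns the required names whose values are undefined, empty, or whitespace-only.
function missingEnvVars(env: Env, required: readonly string[] = REQUIRED_ENV_VARS): string[] {
  return required.filter((name) => !env[name]?.trim());
}

// Throws one error listing every missing variable, so a single run reports them all.
function assertEnv(env: Env): void {
  const missing = missingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

For example, calling `assertEnv(process.env)` at the start of the ingest script would surface a missing `PINECONE_INDEX_NAME` before any request reaches Pinecone. Collecting all missing names, rather than stopping at the first one, saves a round of edit-and-rerun.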
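Two of the Pinecone tips in the troubleshooting section (an index with `1536` dimensions and a lowercase namespace) can be checked before running `npm run ingest`. The value 1536 is the output dimension of OpenAI's `text-embedding-ada-002` model, which is presumably the embedding model this setup relies on. If a different embedding model were used, the constant would need to change. This is a hypothetical sketch: `checkPineconeSettings`, `PineconeSettings`, and `EXPECTED_DIMENSION` are illustrative names, and the index dimension is assumed to be copied by hand from the Pinecone dashboard rather than fetched through the Pinecone client.

```typescript
// Assumed to match the embedding size the README asks for (`1536`).
const EXPECTED_DIMENSION = 1536;

// Hypothetical shape: values copied from `config` and the Pinecone dashboard.
interface PineconeSettings {
  namespace: string;
  indexDimension: number;
}

// Returns a human-readable list of problems; an empty array means both checks passed.
function checkPineconeSettings(settings: PineconeSettings): string[] {
  const problems: string[] = [];
  const ns = settings.namespace;
  if (ns !== ns.toLowerCase()) {
    problems.push(`namespace "${ns}" should be lowercase`);
  }
  if (settings.indexDimension !== EXPECTED_DIMENSION) {
    problems.push(`index dimension is ${settings.indexDimension}, expected ${EXPECTED_DIMENSION}`);
  }
  return problems;
}
```

Returning a list instead of throwing lets a setup script print every mismatch at once, which fits the README's suggestion to compare the dashboard values against the local configuration.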