# GPT Crawler Crawl a site to generate knowledge files to create your own custom GPT from one or multiple URLs ![Gif showing the crawl run](https://github.com/BuilderIO/gpt-crawler/assets/844291/feb8763a-152b-4708-9c92-013b5c70d2f2) - [Example](#example) - [Get started](#get-started) - [Running locally](#running-locally) - [Clone the repository](#clone-the-repository) - [Install dependencies](#install-dependencies) - [Configure the crawler](#configure-the-crawler) - [Run your crawler](#run-your-crawler) - [Alternative methods](#alternative-methods) - [Running in a container with Docker](#running-in-a-container-with-docker) - [Running as a CLI](#running-as-a-cli) - [Development](#development) - [Upload your data to OpenAI](#upload-your-data-to-openai) - [Create a custom GPT](#create-a-custom-gpt) - [Create a custom assistant](#create-a-custom-assistant) - [Contributing](#contributing) ## Example [Here is a custom GPT](https://chat.openai.com/g/g-kywiqipmR-builder-io-assistant) that I quickly made to help answer questions about how to use and integrate [Builder.io](https://www.builder.io) by simply providing the URL to the Builder docs. This project crawled the docs and generated the file that I uploaded as the basis for the custom GPT. [Try it out yourself](https://chat.openai.com/g/g-kywiqipmR-builder-io-assistant) by asking questions about how to integrate Builder.io into a site. > Note that you may need a paid ChatGPT plan to access this feature ## Get started ### Running locally #### Clone the repository Be sure you have Node.js >= 16 installed. ```sh git clone https://github.com/builderio/gpt-crawler ``` #### Install dependencies ```sh npm i ``` #### Configure the crawler Open [config.ts](config.ts) and edit the `url` and `selectors` properties to match your needs. E.g. to crawl the Builder.io docs to make our custom GPT you can use: ```ts export const defaultConfig: Config = { url: "https://www.builder.io/c/docs/developers", match: "https://www.builder.io/c/docs/**", selector: `.docs-builder-container`, maxPagesToCrawl: 50, outputFileName: "output.json", }; ``` See [config.ts](src/config.ts) for all available options. Here is a sample of the common configu options: ```ts type Config = { /** URL to start the crawl */ url: string; /** Pattern to match against for links on a page to subsequently crawl */ match: string; /** Selector to grab the inner text from */ selector: string; /** Don't crawl more than this many pages */ maxPagesToCrawl: number; /** File name for the finished data */ outputFileName: string; }; ``` #### Run your crawler ```sh npm start ``` ### Alternative methods #### [Running in a container with Docker](./containerapp/README.md) To obtain the `output.json` with a containerized execution. Go into the `containerapp` directory. Modify the `config.ts` same as above, the `output.json`file should be generated in the data folder. Note : the `outputFileName` property in the `config.ts` file in containerapp folder is configured to work with the container. ### Upload your data to OpenAI The crawl will generate a file called `output.json` at the root of this project. Upload that [to OpenAI](https://platform.openai.com/docs/assistants/overview) to create your custom assistant or custom GPT. #### Create a custom GPT Use this option for UI access to your generated knowledge that you can easily share with others > Note: you may need a paid ChatGPT plan to create and use custom GPTs right now 1. Go to [https://chat.openai.com/](https://chat.openai.com/) 2. Click your name in the bottom left corner 3. Choose "My GPTs" in the menu 4. Choose "Create a GPT" 5. Choose "Configure" 6. Under "Knowledge" choose "Upload a file" and upload the file you generated ![Gif of how to upload a custom GPT](https://github.com/BuilderIO/gpt-crawler/assets/844291/22f27fb5-6ca5-4748-9edd-6bcf00b408cf) #### Create a custom assistant Use this option for API access to your generated knowledge that you can integrate into your product. 1. Go to [https://platform.openai.com/assistants](https://platform.openai.com/assistants) 2. Click "+ Create" 3. Choose "upload" and upload the file you generated ![Gif of how to upload to an assistant](https://github.com/BuilderIO/gpt-crawler/assets/844291/06e6ad36-e2ba-4c6e-8d5a-bf329140de49) ## Contributing Know how to make this project better? Send a PR!

Made with love by Builder.io