GPT Crawler

Crawl a site to generate knowledge files for creating your own custom GPT from one or more URLs.

Gif showing the crawl run

Example

Here is a custom GPT that I quickly made to help answer questions about how to use and integrate Builder.io by simply providing the URL to the Builder docs.

This project crawled the docs and generated the file that I uploaded as the basis for the custom GPT.

Try it out yourself by asking questions about how to integrate Builder.io into a site.

Note that you may need a paid ChatGPT plan to access this feature.

Get started

Install

npm i -g @builder.io/gpt-crawler

Run

gpt-crawler --url https://www.builder.io/c/docs/developers --match "https://www.builder.io/c/docs/**" --selector .docs-builder-container --maxPagesToCrawl 50 --outputFileName output.json

Upload your data to OpenAI

The crawl will generate a file called output.json at the root of this project. Upload that to OpenAI to create your custom assistant or custom GPT.
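
Before uploading, it can help to sanity-check the generated file. The following is a minimal sketch, not part of the project: `summarizeCrawlOutput` is a hypothetical helper, and it assumes the output is a JSON array with one entry per crawled page. The `title`, `url`, and `html` fields in the sample data are likewise assumptions about the record shape.

```typescript
// Hypothetical helper: summarize the JSON text of a crawl output before upload.
// Assumption: the file holds a JSON array with one entry per crawled page.
type CrawlSummary = { pages: number; characters: number };

function summarizeCrawlOutput(jsonText: string): CrawlSummary {
  const parsed: unknown = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected the crawl output to be a JSON array");
  }
  if (parsed.length === 0) {
    throw new Error("The crawl output is empty; check the selector and match pattern");
  }
  return { pages: parsed.length, characters: jsonText.length };
}

// Example with inline data; the record fields here are assumed, not confirmed.
const sampleOutput = JSON.stringify([
  { title: "Docs", url: "https://example.com/docs", html: "Hello" },
]);
console.log(summarizeCrawlOutput(sampleOutput));
```

In practice you would pass the contents of output.json to the function instead of the inline sample.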

Create a custom GPT

Use this option for UI access to your generated knowledge that you can easily share with others.

Note: you may currently need a paid ChatGPT plan to create and use custom GPTs.

  1. Go to https://chat.openai.com/
  2. Click your name in the bottom left corner
  3. Choose "My GPTs" in the menu
  4. Choose "Create a GPT"
  5. Choose "Configure"
  6. Under "Knowledge" choose "Upload a file" and upload the file you generated

Gif of how to upload a custom GPT

Create a custom assistant

Use this option for API access to your generated knowledge that you can integrate into your product.

  1. Go to https://platform.openai.com/assistants
  2. Click "+ Create"
  3. Choose "upload" and upload the file you generated

Gif of how to upload to an assistant

(Alternate method) Running in a container with Docker

To produce output.json from a containerized run, go into the containerapp directory and modify config.ts as you would for a local run. The output.json file should be generated in the data folder. Note: the outputFileName property in the config.ts file in the containerapp folder is already configured to work with the container.

Contributing

Know how to make this project better? Send a PR!

Get started developing

Prerequisites

Be sure you have Node.js >= 16 installed, along with bun.

Clone the repo

git clone https://github.com/builderio/gpt-crawler

Install Dependencies

bun i

Running GPT Crawler with a hardcoded configuration file

Open hardcoded.ts and edit the url, match, and selector properties to match your needs.

For example, to crawl the Builder.io docs for our custom GPT, you can use:

export const config: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: `.docs-builder-container`,
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
};
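
To get a feel for which links a `match` pattern like the one above would select, here is a small illustrative sketch. It is not the crawler's actual matcher: `globToRegExp` is a hypothetical helper, and it assumes that `**` matches any characters including `/` while a single `*` matches any characters except `/`.

```typescript
// Illustrative only: a tiny glob-to-RegExp converter, NOT the crawler's matcher.
// Assumed semantics: "**" crosses path segments, "*" stays within one segment.
function globToRegExp(glob: string): RegExp {
  const source = glob
    .split("**")
    .map((part) =>
      part
        .split("*")
        // Escape regex metacharacters in the literal pieces.
        .map((literal) => literal.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join("[^/]*"),
    )
    .join(".*");
  return new RegExp(`^${source}$`);
}

const docsPattern = globToRegExp("https://www.builder.io/c/docs/**");

// A page under /c/docs/ is selected; a page elsewhere on the site is not.
console.log(docsPattern.test("https://www.builder.io/c/docs/developers"));
console.log(docsPattern.test("https://www.builder.io/blog/some-post"));
```

Under these assumed semantics, the first URL matches and the second does not, which is why the example config stays inside the docs section of the site.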

The type definition at the top of the config.ts file shows everything you can configure:

// Page is the Playwright page type used by onVisitPage below
import { Page } from "playwright";

type Config = {
  /** URL to start the crawl */
  url: string;
  /** Pattern to match against for links on a page to subsequently crawl */
  match: string;
  /** Selector to grab the inner text from */
  selector: string;
  /** Don't crawl more than this many pages */
  maxPagesToCrawl: number;
  /** File name for the finished data */
  outputFileName: string;
  /** Optional function to run for each page found */
  onVisitPage?: (options: {
    page: Page;
    pushData: (data: any) => Promise<void>;
  }) => Promise<void>;
  /** Optional timeout for waiting for a selector to appear */
  waitForSelectorTimeout?: number;
};
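
As an illustration of the optional onVisitPage hook, here is a minimal sketch that records the title and URL of each visited page. To stay self-contained it uses `PageLike`, a hypothetical structural stand-in for the two `Page` methods it calls, rather than the real `Page` type. The record shape passed to `pushData` is also an assumption.

```typescript
// Assumption: a minimal stand-in for the parts of Page that this sketch uses.
type PageLike = { title(): Promise<string>; url(): string };

type VisitOptions = {
  page: PageLike;
  pushData: (data: any) => Promise<void>;
};

// Hypothetical hook: push one record with the title and URL of the visited page.
const onVisitPage = async ({ page, pushData }: VisitOptions): Promise<void> => {
  const title = await page.title();
  await pushData({ title, url: page.url() });
};

// Usage with a fake page, so the sketch runs without launching a browser.
const collected: unknown[] = [];
const fakePage: PageLike = {
  title: async () => "Docs",
  url: () => "https://example.com/docs",
};
onVisitPage({
  page: fakePage,
  pushData: async (data) => {
    collected.push(data);
  },
}).then(() => console.log(collected));
```

In a real configuration, the same function would be assigned to the onVisitPage property of the config object, and the crawler would supply the page and pushData arguments.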

Run your crawler

bun start



Made with love by Builder.io