Steve Sewell f0da00d031 Updates 2 years ago
src e07b50d022 updoots 2 years ago
.dockerignore a855e667de begin 2 years ago
.gitignore a855e667de begin 2 years ago
Dockerfile a855e667de begin 2 years ago
README.md f0da00d031 Updates 2 years ago
config.ts e07b50d022 updoots 2 years ago
package-lock.json a855e667de begin 2 years ago
package.json f0da00d031 Updates 2 years ago
tsconfig.json ae80eb88dd forum 2 years ago

README.md

GPT Crawler

Crawl a site to generate knowledge files to create your own custom GPT

Get started

Prerequisites

Be sure you have Node.js >= 16 installed

Clone the repo

git clone https://github.com/builderio/gpt-crawler

Configure the crawler

Open config.ts and edit the url, match, and selector properties to match your needs.

For example, to crawl the Builder.io docs to make our custom GPT, you can use:

export const config: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: `.docs-builder-container`,
  maxPagesToCrawl: 1000,
  outputFileName: "output.json",
};
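To make the role of match more concrete, here is a minimal sketch of glob-style URL filtering. The helpers globToRegExp and shouldCrawl are hypothetical and not part of this repo; the crawler's actual matching may treat edge cases differently. The sketch assumes that `**` matches any characters, including `/`, and that a single `*` stays within one path segment.

```typescript
// Hypothetical helper (not from this repo): turn a glob such as
// "https://www.builder.io/c/docs/**" into a RegExp.
// Assumption: "**" spans path segments, "*" stays within one segment.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        pattern += ".*"; // "**" matches anything, including "/"
        i++; // skip the second "*"
      } else {
        pattern += "[^/]*"; // "*" matches within a single segment
      }
    } else {
      // Escape characters that are special in regular expressions
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Hypothetical helper: decide whether a discovered link should be crawled.
function shouldCrawl(url: string, match: string): boolean {
  return globToRegExp(match).test(url);
}
```

With the example config above, a link under /c/docs/ would pass this filter, while a link to another part of the site would be skipped.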

The type definition at the top of the file shows everything you can configure:

// Page is the Playwright page type used by onVisitPage below
import { Page } from "playwright";

type Config = {
  /** URL to start the crawl */
  url: string;
  /** Pattern to match against for links on a page to subsequently crawl */
  match: string;
  /** Selector to grab the inner text from */
  selector: string;
  /** Don't crawl more than this many pages */
  maxPagesToCrawl: number;
  /** File name for the finished data */
  outputFileName: string;
  /** Optional function to run for each page found */
  onVisitPage?: (options: {
    page: Page;
    pushData: (data: any) => Promise<void>;
  }) => Promise<void>;
};
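As an illustration of the optional onVisitPage hook, the following sketch records each page's URL and title through pushData. It is an assumption-laden example, not code from this repo: PageLike is a hypothetical stand-in for the two Playwright Page methods used here (url() and title()), so the sketch runs without Playwright installed, and the fake page and its title are invented test data. What the crawler does with data passed to pushData is not described in this README; presumably it ends up in the output file.

```typescript
// Minimal stand-in for the parts of Playwright's Page used in this sketch.
interface PageLike {
  url(): string;
  title(): Promise<string>;
}

type VisitOptions = {
  page: PageLike;
  pushData: (data: any) => Promise<void>;
};

// Hypothetical hook: push the URL and title of every visited page.
const onVisitPage = async ({ page, pushData }: VisitOptions): Promise<void> => {
  const title = await page.title();
  await pushData({ url: page.url(), title });
};

// Exercise the hook with a fake page and an in-memory collector.
async function demo(): Promise<Array<{ url: string; title: string }>> {
  const collected: Array<{ url: string; title: string }> = [];
  const fakePage: PageLike = {
    url: () => "https://www.builder.io/c/docs/developers",
    title: async () => "Developer docs", // invented value for the demo
  };
  await onVisitPage({
    page: fakePage,
    pushData: async (data) => {
      collected.push(data);
    },
  });
  return collected;
}
```

In a real config you would assign a function like this to the onVisitPage property and type the page parameter as Playwright's Page.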

Run your crawler

npm start

Upload your data to OpenAI

The crawl will generate a file called output.json at the root of this project. Upload that file to OpenAI to create your custom GPT.
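Before uploading, it can help to check that the file parsed correctly and to see how large it is. The sketch below assumes that output.json contains a JSON array with one entry per crawled page; this README does not spell out the format, so treat that as an assumption. The summarizeOutput helper is hypothetical.

```typescript
type CrawlSummary = { pages: number; characters: number };

// Hypothetical helper: count entries and characters in the crawl output.
// Assumption: the file holds a JSON array with one entry per page.
function summarizeOutput(json: string): CrawlSummary {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected the output file to contain a JSON array");
  }
  return { pages: parsed.length, characters: json.length };
}
```

In Node.js you could pass it the file contents, for example the result of fs.readFileSync("output.json", "utf8").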

Contributing

Know how to make this project better? Send a PR!



Made with love by Builder.io