
# GPT Crawler

Crawl a site to generate knowledge files to create your own custom GPT.

## Get started

### Prerequisites

Be sure you have Node.js >= 16 installed.

### Clone the repo

```sh
git clone https://github.com/builderio/gpt-crawler
```
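After cloning, install the project's dependencies before running anything (assuming npm as the package manager, since the repo ships a package-lock.json):

```sh
cd gpt-crawler
npm install
```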

### Configure the crawler

Open config.ts and edit the `url`, `match`, and `selector` properties to match your needs.

For example, to crawl the Builder.io docs for a custom GPT you can use:

```ts
export const config: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: ".docs-builder-container",
  maxPagesToCrawl: 1000,
  outputFileName: "output.json",
};
```

See the top of the file for the type definition of what you can configure:

```ts
type Config = {
  /** URL to start the crawl */
  url: string;
  /** Pattern to match against for links on a page to subsequently crawl */
  match: string;
  /** Selector to grab the inner text from */
  selector: string;
  /** Don't crawl more than this many pages */
  maxPagesToCrawl: number;
  /** File name for the finished data */
  outputFileName: string;
  /** Optional function to run for each page found */
  onVisitPage?: (options: {
    page: Page;
    pushData: (data: any) => Promise<void>;
  }) => Promise<void>;
};
```
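As a sketch of the optional `onVisitPage` hook (hypothetical values; `Config` and `Page` are the types referenced in config.ts, with `Page` being a Playwright page object), a config that also records each page's title might look like:

```ts
export const config: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: ".docs-builder-container",
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
  // Runs once per crawled page; pushData appends a record to the output file.
  onVisitPage: async ({ page, pushData }) => {
    const title = await page.title(); // standard Playwright Page API
    await pushData({ title, url: page.url() });
  },
};
```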

### Run your crawler

```sh
npm start
```

### Upload your data to OpenAI

The crawl will generate a file called output.json at the root of this project. Upload it to OpenAI to create your custom GPT.
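Before uploading, it can help to sanity-check the file. A minimal sketch, assuming each record in output.json has `title`, `url`, and `html` fields (the exact shape isn't documented here, so verify against your own output):

```ts
// Assumed record shape -- check your own output.json before relying on this.
interface CrawledPage {
  title: string;
  url: string;
  html: string;
}

// Count pages and total extracted characters as a quick sanity check
// before uploading the file to OpenAI.
function summarize(pages: CrawledPage[]): { pages: number; chars: number } {
  return {
    pages: pages.length,
    chars: pages.reduce((sum, p) => sum + p.html.length, 0),
  };
}

// Example usage with an inline record instead of reading output.json:
const sample: CrawledPage[] = [
  { title: "Docs", url: "https://example.com", html: "<p>hi</p>" },
];
console.log(summarize(sample)); // { pages: 1, chars: 9 }
```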

## Contributing

Know how to make this project better? Send a PR!



Made with love by Builder.io