GPT Crawler

Crawl a site to generate knowledge files to create your own custom GPT

Get started

Prerequisites

Be sure you have Node.js >= 16 installed

Clone the repo

git clone https://github.com/builderio/gpt-crawler
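Then install the project dependencies before running anything (a standard npm setup is assumed here):

cd gpt-crawler
npm install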

Configure the crawler

Open config.ts and edit the url, match, and selector properties to match your needs.

E.g., to crawl the Builder.io docs to make our custom GPT, you can use:

export const config: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: `.docs-builder-container`,
  maxPagesToCrawl: 1000,
  outputFileName: "output.json",
};

See the top of config.ts for the type definition of everything you can configure:

type Config = {
  /** URL to start the crawl */
  url: string;
  /** Pattern to match against for links on a page to subsequently crawl */
  match: string;
  /** Selector to grab the inner text from */
  selector: string;
  /** Don't crawl more than this many pages */
  maxPagesToCrawl: number;
  /** File name for the finished data */
  outputFileName: string;
  /** Optional function to run for each page found */
  onVisitPage?: (options: {
    page: Page;
    pushData: (data: any) => Promise<void>;
  }) => Promise<void>;
};
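
The optional onVisitPage hook lets you attach extra data for each crawled page. As a minimal sketch, assuming page is the Playwright Page object the crawler drives, you could record each page's title and URL alongside the extracted text:

export const config: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: `.docs-builder-container`,
  maxPagesToCrawl: 1000,
  outputFileName: "output.json",
  // Illustrative hook: push the page title and URL for every page visited.
  onVisitPage: async ({ page, pushData }) => {
    const title = await page.title();
    await pushData({ title, url: page.url() });
  },
};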

Run your crawler

npm start

Upload your data to OpenAI

The crawl will generate a file called output.json at the root of this project. Upload that file to OpenAI to create your custom GPT.
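
If you prefer to script the upload instead of using the ChatGPT UI, a minimal sketch using the official openai Node package (not part of this repo, and assuming an OPENAI_API_KEY environment variable is set) might look like:

import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Upload output.json as a knowledge file, e.g. to attach to an assistant.
const file = await openai.files.create({
  file: fs.createReadStream("output.json"),
  purpose: "assistants",
});
console.log(`Uploaded file id: ${file.id}`);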

Contributing

Know how to make this project better? Send a PR!



Made with love by Builder.io