Steve Sewell e07b50d022 updoots 2 years ago
src e07b50d022 updoots 2 years ago
.dockerignore a855e667de begin 2 years ago
.gitignore a855e667de begin 2 years ago
Dockerfile a855e667de begin 2 years ago
README.md e07b50d022 updoots 2 years ago
config.ts e07b50d022 updoots 2 years ago
package-lock.json a855e667de begin 2 years ago
package.json e07b50d022 updoots 2 years ago
tsconfig.json ae80eb88dd forum 2 years ago

README.md

GPT Crawler

Crawl a site to generate knowledge files to create your own custom GPT

Get started

Prerequisites

Make sure Node.js 16 or later is installed.

Clone the repo

git clone https://github.com/bridgeproject/gpt-crawler

Configure the crawler

Open config.ts and edit the url and selector properties to match your needs.

For example, to crawl the Builder.io docs and build a custom GPT from them, you can use:

export const config: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: `.docs-builder-container`,
  maxPagesToCrawl: 1000,
  outputFileName: "output.json",
};
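
To get a feel for which links the match pattern above would follow, here is a minimal sketch of glob-style URL matching. It is only an approximation for illustration: globToRegExp and shouldCrawl are hypothetical helpers, not part of this project, and the crawler's actual matching may treat edge cases differently.

```typescript
// Hypothetical helper: turns a glob such as "https://example.com/docs/**"
// into a RegExp. This approximates common glob semantics only.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        pattern += ".*"; // "**" spans any characters, including "/"
        i++; // skip the second "*"
      } else {
        pattern += "[^/]*"; // a single "*" stays within one path segment
      }
    } else {
      // Escape characters that are special in regular expressions
      pattern += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Hypothetical helper: would a discovered link be followed under this sketch?
function shouldCrawl(url: string, match: string): boolean {
  return globToRegExp(match).test(url);
}

const match = "https://www.builder.io/c/docs/**";
console.log(shouldCrawl("https://www.builder.io/c/docs/developers", match)); // true
console.log(shouldCrawl("https://www.builder.io/blog/some-post", match)); // false
```

Under this sketch, anything below /c/docs/ is followed and everything else on the site is skipped, which is why a narrow match keeps the crawl within maxPagesToCrawl.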

The type definition at the top of config.ts lists everything you can configure:

type Config = {
  /** URL to start the crawl */
  url: string;
  /** Pattern to match against for links on a page to subsequently crawl */
  match: string;
  /** Selector to grab the inner text from */
  selector: string;
  /** Don't crawl more than this many pages */
  maxPagesToCrawl: number;
  /** File name for the finished data */
  outputFileName: string;
  /** Optional function to run for each page found */
  onVisitPage?: (options: {
    page: Page;
    pushData: (data: any) => Promise<void>;
  }) => Promise<void>;
};
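
As an illustration of the optional onVisitPage hook, the sketch below records the URL and title of each visited page through pushData. It assumes the Page type exposes url() and title() the way Playwright's Page does; PageLike, VisitOptions, and recordPageMeta are hypothetical names introduced here so the example is self-contained.

```typescript
// Hypothetical stand-in for the `Page` type referenced in Config.
// Assumption: it exposes url() and title(), as Playwright's Page does.
interface PageLike {
  url(): string;
  title(): Promise<string>;
}

// Mirrors the options object passed to onVisitPage in the Config type.
type VisitOptions = {
  page: PageLike;
  pushData: (data: any) => Promise<void>;
};

// Pushes one extra record per visited page containing its URL and title.
async function recordPageMeta({ page, pushData }: VisitOptions): Promise<void> {
  const title = await page.title();
  await pushData({ url: page.url(), title });
}
```

With a compatible Page type, a hook like this could be wired in by adding onVisitPage: recordPageMeta to the exported config object.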

Run your crawler

npm start

Upload your data to OpenAI

The crawl generates a file called output.json at the root of this project. Upload that file to OpenAI to create your custom GPT.
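
Before uploading, it can help to check that the crawl produced something. The sketch below is a minimal example and rests on an assumption this README does not confirm: that output.json holds a JSON array with one entry per crawled page. summarizeCrawlOutput is a hypothetical helper, not part of this project.

```typescript
// Hypothetical helper. Assumption: the file holds a JSON array with one
// entry per crawled page; the shape of each entry is not specified here.
function summarizeCrawlOutput(jsonText: string): { entries: number; characters: number } {
  const parsed: unknown = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected the crawl output to be a JSON array");
  }
  // Report how many entries were collected and how large the file text is
  return { entries: parsed.length, characters: jsonText.length };
}

// Example with a tiny inline sample instead of a real output.json
const sample = '[{"a":1},{"a":2}]';
console.log(summarizeCrawlOutput(sample)); // { entries: 2, characters: 17 }
```

In Node.js, the real file contents can be read with fs.readFileSync("output.json", "utf8") and passed to the function. An entry count of zero usually means the url, match, or selector values need another look.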

Contributing

Know how to make this project better? Send a PR!



Made with love by Builder.io