Crawl a site to generate knowledge files to create your own custom GPT
Be sure you have Node.js >= 16 installed
git clone https://github.com/bridgeproject/gpt-crawler
Open config.ts and edit the url and selector properties to match your needs.
For example, to crawl the Builder.io docs to make our custom GPT, you can use:
export const config: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: `.docs-builder-container`,
  maxPagesToCrawl: 1000,
  outputFileName: "output.json",
};
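To give a rough feel for how a match pattern such as "https://www.builder.io/c/docs/**" selects links, here is a minimal sketch. It assumes that `**` matches any run of characters (including `/`) and that a single `*` stays within one path segment. The helpers `globToRegExp` and `matchesPattern` are hypothetical names used for illustration; this is not the crawler's own matching code, which may behave differently in edge cases.

```typescript
// Hypothetical helper: convert a simple glob into a RegExp.
// Assumption: "**" matches anything (including "/"),
// while "*" matches within a single path segment.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  let i = 0;
  while (i < glob.length) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        pattern += ".*"; // "**" crosses path separators
        i += 2;
      } else {
        pattern += "[^/]*"; // "*" stops at the next "/"
        i += 1;
      }
    } else {
      // Escape characters that have a special meaning in regular expressions.
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Hypothetical helper: does this URL fall under the glob?
function matchesPattern(url: string, glob: string): boolean {
  return globToRegExp(glob).test(url);
}
```

Under these assumptions, a link to https://www.builder.io/c/docs/developers would be followed, while a link to a blog page on the same site would be skipped.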
See the top of the file for the type definition for what you can configure:
type Config = {
  /** URL to start the crawl */
  url: string;
  /** Pattern to match against for links on a page to subsequently crawl */
  match: string;
  /** Selector to grab the inner text from */
  selector: string;
  /** Don't crawl more than this many pages */
  maxPagesToCrawl: number;
  /** File name for the finished data */
  outputFileName: string;
  /** Optional function to run for each page found */
  onVisitPage?: (options: {
    page: Page;
    pushData: (data: any) => Promise<void>;
  }) => Promise<void>;
};
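The optional onVisitPage hook is the least obvious field, so here is a minimal sketch of what one might look like. It assumes that `Page` is Playwright's page type, which provides `title()` and `url()`; the `PageLike` interface, the `VisitOptions` type, and the `demo` function are hypothetical stand-ins so the example can run without a browser.

```typescript
// Minimal stand-in for the `Page` type referenced by Config.
// Assumption: the real type is Playwright's Page, which has title() and url().
interface PageLike {
  title(): Promise<string>;
  url(): string;
}

// Same shape as the options object passed to onVisitPage in the Config type.
type VisitOptions = {
  page: PageLike;
  pushData: (data: any) => Promise<void>;
};

// Hypothetical hook: record the title and URL of every visited page.
const onVisitPage = async ({ page, pushData }: VisitOptions): Promise<void> => {
  const title = await page.title();
  await pushData({ title, url: page.url() });
};

// Exercise the hook with a fake page instead of a real browser.
async function demo(): Promise<any[]> {
  const collected: any[] = [];
  const fakePage: PageLike = {
    title: async () => "Docs home",
    url: () => "https://www.builder.io/c/docs/developers",
  };
  await onVisitPage({
    page: fakePage,
    pushData: async (data) => {
      collected.push(data);
    },
  });
  return collected;
}
```

In a real config.ts, a function like this would be assigned to the onVisitPage property of the config object; what the crawler does with the pushed data is up to its implementation.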
npm start
The crawl will generate a file called output.json at the root of this project. Upload that file to OpenAI to create your custom GPT.
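Before uploading, it can be worth checking that the crawl actually captured something. The sketch below assumes the output file holds a JSON array with one entry per crawled page; that layout is an assumption, not something this README specifies, and `countCrawledPages` is a hypothetical helper.

```typescript
// Hypothetical sanity check for the crawl output before uploading it.
// Assumption: the file holds a JSON array with one entry per crawled page.
function countCrawledPages(jsonText: string): number {
  const parsed: unknown = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected the output file to contain a JSON array");
  }
  return parsed.length;
}
```

With Node.js you could pass it the file contents, for example the string returned by `fs.readFileSync("output.json", "utf8")`, and compare the count against maxPagesToCrawl.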
Know how to make this project better? Send a PR!