Crawl a site to generate knowledge files to create your own custom GPT
Be sure you have Node.js >= 16 installed, then clone the repository:
git clone https://github.com/builderio/gpt-crawler
Open config.ts and edit the url and selector properties to match your needs.
E.g. to crawl the Builder.io docs to make our custom GPT you can use:
export const config: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: `.docs-builder-container`,
  maxPagesToCrawl: 1000,
  outputFileName: "output.json",
};
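To make the role of the match pattern more concrete, here is a minimal sketch of how a glob such as `https://www.builder.io/c/docs/**` could decide which links get crawled. The helpers `globToRegExp` and `matchesPattern` are hypothetical and not part of this project, and the glob semantics shown (`**` spans path segments, `*` stays within one) are an assumption; the crawler's actual matcher may behave differently.

```typescript
// Hypothetical helper, not part of the project: converts a glob into a RegExp.
// Assumption: "**" matches any characters (including "/"),
// while a single "*" matches only within one path segment.
function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters, leaving "*" alone so it can be expanded below.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // temporary placeholder for "**"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

// Returns true when the URL is covered by the glob.
function matchesPattern(url: string, glob: string): boolean {
  return globToRegExp(glob).test(url);
}

const match = "https://www.builder.io/c/docs/**";
console.log(matchesPattern("https://www.builder.io/c/docs/developers", match)); // true
console.log(matchesPattern("https://www.builder.io/c/docs/guides/intro", match)); // true
console.log(matchesPattern("https://www.builder.io/blog/post", match)); // false
```

Under these assumptions, a narrower pattern keeps the crawl inside the docs section and well below maxPagesToCrawl.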
See the top of the file for the type definition for what you can configure:
type Config = {
  /** URL to start the crawl */
  url: string;
  /** Pattern to match against for links on a page to subsequently crawl */
  match: string;
  /** Selector to grab the inner text from */
  selector: string;
  /** Don't crawl more than this many pages */
  maxPagesToCrawl: number;
  /** File name for the finished data */
  outputFileName: string;
  /** Optional function to run for each page found */
  onVisitPage?: (options: {
    page: Page;
    pushData: (data: any) => Promise<void>;
  }) => Promise<void>;
};
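The optional onVisitPage hook is the least self-explanatory field, so here is a rough sketch of what one might look like. It assumes the `Page` object exposes `title()` and `url()` methods; `PageLike`, `VisitOptions`, and `runExample` are hypothetical stand-ins introduced only so the snippet runs on its own without the crawler.

```typescript
// Minimal stand-in for the Page object (hypothetical).
// Assumption: the real Page exposes title() and url() with these shapes.
interface PageLike {
  title(): Promise<string>;
  url(): string;
}

type VisitOptions = {
  page: PageLike;
  pushData: (data: any) => Promise<void>;
};

// Example hook: records the title and URL of each visited page.
async function onVisitPage({ page, pushData }: VisitOptions): Promise<void> {
  const title = await page.title();
  await pushData({ title, url: page.url() });
}

// Exercise the hook with a fake page instead of a real browser.
async function runExample(): Promise<any[]> {
  const collected: any[] = [];
  const fakePage: PageLike = {
    title: async () => "Developer docs",
    url: () => "https://www.builder.io/c/docs/developers",
  };
  await onVisitPage({
    page: fakePage,
    pushData: async (data) => {
      collected.push(data);
    },
  });
  return collected;
}

runExample().then((rows) => console.log(rows));
```

In a real config.ts, a function like this would be assigned to the onVisitPage property of the config object.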
npm start
The crawl will generate a file called output.json at the root of this project. Upload that to OpenAI to create your custom assistant or custom GPT.
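Before uploading, it can be worth a quick sanity check that the file parsed correctly and is not empty. The sketch below assumes output.json contains a JSON array with one entry per crawled page; that shape, the `summarizeOutput` helper, and the field names in the sample data are assumptions for illustration, not something this README specifies.

```typescript
// Hypothetical helper: summarizes the text content of output.json.
// Assumption: the file holds a JSON array with one entry per crawled page.
function summarizeOutput(jsonText: string): { entries: number; characters: number } {
  const parsed: unknown = JSON.parse(jsonText); // throws on malformed JSON
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of crawled pages");
  }
  return { entries: parsed.length, characters: jsonText.length };
}

// Sample data standing in for the real file contents (field names are made up).
const sample = JSON.stringify([
  { url: "https://www.builder.io/c/docs/developers", text: "..." },
]);
console.log(summarizeOutput(sample).entries); // 1
```

To run it against a real crawl, pass in the file contents read as a UTF-8 string.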
Create a custom GPT: use this option for UI access to your generated knowledge that you can easily share with others.
Note: you may need a paid ChatGPT plan to create and use custom GPTs right now.
Create a custom assistant: use this option for API access to your generated knowledge that you can integrate into your product.
Know how to make this project better? Send a PR!