A command line application that allows users to ask questions about their DuckDB data. The application leverages Groq API's JSON mode to generate SQL queries based on the user's questions and execute them on a DuckDB database.
Text-to-SQL: The application uses natural language processing to convert user questions into SQL queries, making it easy for users to query their data without knowing SQL.
JSON mode: A feature which enables the LLM to respond strictly in a structured JSON output, provided we supply it with the desired format
Data Summarization: After executing a SQL query, the application uses the AI to summarize the resulting data in relation to the user's original question.
The application queries data from two CSV files located in the data folder:
employees.csv: Contains employee data including their ID, full name, and email address.
purchases.csv: Records purchase details including purchase ID, date, associated employee ID, amount, and product name.
The base prompt for the AI is stored in a text file in the prompts folder:
base_prompt.txtA well-crafted system prompt is essential for building a functional Text-to-SQL application. Ours will serve 3 purposes:
chat_with_groq(): Sends a prompt to the Groq API and returns the AI's response.execute_duckdb_query(): Executes a SQL query on a DuckDB database and returns the result.get_summarization(): Generates a prompt for the AI to summarize the data resulting from a SQL query.You will need to store a valid Groq API Key as a secret to proceed with this example. You can generate one for free here.
You can fork and run this application on Replit or run it on the command line with python main.py.
This application is designed to be flexible and can be easily customized to work with your own data. If you want to use your own data, follow these steps:
Replace the CSV files: The application queries data from two CSV files located in the data folder: employees.csv and purchases.csv. Replace these files with your own CSV files.
Modify the base prompt: The base prompt for the AI, stored in the prompts folder as base_prompt.txt, contains specific information about the data metadata. Modify this prompt to match the structure and content of your own data. Make sure to accurately describe the tables, columns, and any specific rules or tips for querying your dataset.
By following these steps, you can tailor the DuckDB Query Generator to your own data and use cases. Feel free to experiment and build off this repository to create your own powerful data querying applications.