Sanyam Bhutani 3 months ago
parent
commit
db7f9e9d8a
100 changed files with 0 additions and 16126 deletions
  1. 0 55
      docs/FAQ.md
  2. 0 64
      docs/LLM_finetuning.md
  3. BIN
      docs/img/feature_based_fn.png
  4. BIN
      docs/img/feature_based_fn_2.png
  5. BIN
      docs/img/full_param_fn.png
  6. BIN
      docs/img/llama2_gradio.png
  7. BIN
      docs/img/llama2_streamlit.png
  8. BIN
      docs/img/llama2_streamlit2.png
  9. BIN
      docs/img/messenger_api_settings.png
  10. BIN
      docs/img/messenger_llama_arch.jpg
  11. BIN
      docs/img/wandb_screenshot.png
  12. BIN
      docs/img/whatsapp_dashboard.jpg
  13. BIN
      docs/img/whatsapp_llama_arch.jpg
  14. 0 198
      docs/multi_gpu.md
  15. 0 128
      docs/single_gpu.md
  16. 0 8
      recipes/3p_integrations/README.md
  17. 0 307
      recipes/3p_integrations/aws/getting_started_llama_3_on_amazon_bedrock.ipynb
  18. 0 2151
      recipes/3p_integrations/aws/prompt_engineering_with_llama_2_on_amazon_bedrock.ipynb
  19. 0 579
      recipes/3p_integrations/aws/react_llama_3_bedrock_wk.ipynb
  20. 0 494
      recipes/3p_integrations/azure/Azure MaaS/azure_api_example.ipynb
  21. 0 2
      recipes/3p_integrations/azure/README.md
  22. 0 11
      recipes/3p_integrations/crusoe/README.md
  23. 0 85
      recipes/3p_integrations/crusoe/vllm-fp8/README.md
  24. BIN
      recipes/3p_integrations/crusoe/vllm-fp8/assets/tpot_vs_qps_chart.png
  25. BIN
      recipes/3p_integrations/crusoe/vllm-fp8/assets/ttft_vs_qps_chart.png
  26. 0 427
      recipes/3p_integrations/crusoe/vllm-fp8/benchmarks/backend_request_func.py
  27. 0 770
      recipes/3p_integrations/crusoe/vllm-fp8/benchmarks/benchmark_serving.py
  28. 0 518
      recipes/3p_integrations/crusoe/vllm-fp8/benchmarks/sonnet.txt
  29. 0 59
      recipes/3p_integrations/crusoe/vllm-fp8/convert_hf_to_fp8.py
  30. 0 41
      recipes/3p_integrations/crusoe/vllm-fp8/main.tf
  31. 0 72
      recipes/3p_integrations/crusoe/vllm-fp8/plot.py
  32. 0 12
      recipes/3p_integrations/crusoe/vllm-fp8/pyproject.toml
  33. 0 12
      recipes/3p_integrations/crusoe/vllm-fp8/run_benchmark.sh
  34. 0 1038
      recipes/3p_integrations/groq/groq-api-cookbook/function-calling-101-ecommerce/Function-Calling-101-Ecommerce.ipynb
  35. 0 41
      recipes/3p_integrations/groq/groq-api-cookbook/function-calling-101-ecommerce/customers.csv
  36. 0 21
      recipes/3p_integrations/groq/groq-api-cookbook/function-calling-101-ecommerce/orders.csv
  37. 0 21
      recipes/3p_integrations/groq/groq-api-cookbook/function-calling-101-ecommerce/products.csv
  38. 0 8
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/data/employees.csv
  39. 0 6
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/data/purchases.csv
  40. 0 677
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/json-mode-function-calling-for-sql.ipynb
  41. 0 7
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/verified-queries/employees-without-purchases.yaml
  42. 0 9
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/verified-queries/most-expensive-purchase.yaml
  43. 0 11
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/verified-queries/most-recent-purchases.yaml
  44. 0 6
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/verified-queries/number-of-teslas.yaml
  45. 0 639
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/SDOH-Json-mode.ipynb
  46. 0 31
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/clinical_notes/00456321.txt
  47. 0 28
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/clinical_notes/00567289.txt
  48. 0 28
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/clinical_notes/00678934.txt
  49. 0 32
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/clinical_notes/00785642.txt
  50. 0 30
      recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/clinical_notes/00893247.txt
  51. 0 427
      recipes/3p_integrations/groq/groq-api-cookbook/llama3-stock-market-function-calling/llama3-stock-market-function-calling.ipynb
  52. 0 340
      recipes/3p_integrations/groq/groq-api-cookbook/parallel-tool-use/parallel-tool-use.ipynb
  53. 0 2
      recipes/3p_integrations/groq/groq-api-cookbook/parallel-tool-use/requirements.txt
  54. 0 993
      recipes/3p_integrations/groq/groq-api-cookbook/rag-langchain-presidential-speeches/presidential_speeches.csv
  55. 0 664
      recipes/3p_integrations/groq/groq-api-cookbook/rag-langchain-presidential-speeches/rag-langchain-presidential-speeches.ipynb
  56. 0 21
      recipes/3p_integrations/groq/groq-example-templates/conversational-chatbot-langchain/README.md
  57. 0 74
      recipes/3p_integrations/groq/groq-example-templates/conversational-chatbot-langchain/main.py
  58. 0 0
      recipes/3p_integrations/groq/groq-example-templates/conversational-chatbot-langchain/requirements.txt
  59. 0 23
      recipes/3p_integrations/groq/groq-example-templates/crewai-agents/README.md
  60. 0 184
      recipes/3p_integrations/groq/groq-example-templates/crewai-agents/main.py
  61. 0 3
      recipes/3p_integrations/groq/groq-example-templates/crewai-agents/requirements.txt
  62. 0 21
      recipes/3p_integrations/groq/groq-example-templates/groq-quickstart-conversational-chatbot/README.md
  63. 0 38
      recipes/3p_integrations/groq/groq-example-templates/groq-quickstart-conversational-chatbot/main.py
  64. 0 1
      recipes/3p_integrations/groq/groq-example-templates/groq-quickstart-conversational-chatbot/requirements.txt
  65. 0 27
      recipes/3p_integrations/groq/groq-example-templates/groqing-the-stock-market-function-calling-llama3/README.md
  66. 0 139
      recipes/3p_integrations/groq/groq-example-templates/groqing-the-stock-market-function-calling-llama3/main.py
  67. 0 12
      recipes/3p_integrations/groq/groq-example-templates/groqing-the-stock-market-function-calling-llama3/requirements.txt
  68. 0 21
      recipes/3p_integrations/groq/groq-example-templates/llamachat-conversational-chatbot-with-llamaIndex/README.md
  69. 0 46
      recipes/3p_integrations/groq/groq-example-templates/llamachat-conversational-chatbot-with-llamaIndex/main.py
  70. 0 2
      recipes/3p_integrations/groq/groq-example-templates/llamachat-conversational-chatbot-with-llamaIndex/requirements.txt
  71. 0 33
      recipes/3p_integrations/groq/groq-example-templates/presidential-speeches-rag-with-pinecone/README.md
  72. 0 114
      recipes/3p_integrations/groq/groq-example-templates/presidential-speeches-rag-with-pinecone/main.py
  73. 0 8
      recipes/3p_integrations/groq/groq-example-templates/presidential-speeches-rag-with-pinecone/requirements.txt
  74. 0 57
      recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/README.md
  75. 0 8
      recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/data/employees.csv
  76. 0 6
      recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/data/purchases.csv
  77. 0 145
      recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/main.py
  78. 0 42
      recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/prompts/base_prompt.txt
  79. 0 4
      recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/requirements.txt
  80. 0 53
      recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/README.md
  81. 0 8
      recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/data/employees.csv
  82. 0 6
      recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/data/purchases.csv
  83. 0 158
      recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/main.py
  84. 0 9
      recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/requirements.txt
  85. 0 7
      recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/verified-queries/employees-without-purchases.yaml
  86. 0 9
      recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/verified-queries/most-expensive-purchase.yaml
  87. 0 9
      recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/verified-queries/most-recent-purchases.yaml
  88. 0 6
      recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/verified-queries/number-of-teslas.yaml
  89. 0 1708
      recipes/3p_integrations/groq/llama3_cookbook_groq.ipynb
  90. 0 26
      recipes/3p_integrations/lamini/text2sql_memory_tuning/README.md
  91. BIN
      recipes/3p_integrations/lamini/text2sql_memory_tuning/assets/manual_filtering.png
  92. BIN
      recipes/3p_integrations/lamini/text2sql_memory_tuning/assets/website.png
  93. 0 40
      recipes/3p_integrations/lamini/text2sql_memory_tuning/data/gold-test-set-v2.jsonl
  94. 0 20
      recipes/3p_integrations/lamini/text2sql_memory_tuning/data/gold-test-set.jsonl
  95. 0 220
      recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/archive/generated_queries_large_filtered_cleaned.jsonl
  96. 0 128
      recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/archive/generated_queries_v2_large_filtered_cleaned.jsonl
  97. 0 159
      recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/generated_queries.jsonl
  98. 0 1149
      recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/generated_queries_large.jsonl
  99. 0 330
      recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/generated_queries_large_filtered.jsonl
  100. 0 0
      recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/generated_queries_v2.jsonl

+ 0 - 55
docs/FAQ.md

@@ -1,55 +0,0 @@
-# FAQ
-
-Here we discuss frequently asked questions and the answers we found useful along the way.
-
-1. Does FSDP support mixed precision in one FSDP unit? That is, can some parameters in one FSDP unit be in FP16/BF16 while others are in FP32?
-
-    FSDP requires each FSDP unit to have consistent precision, so this case is not supported at this point. It might be added in the future, but there is no ETA at the moment.
-
-2.  How does FSDP handle mixed grad requirements?
-
-    FSDP does not support mixed `requires_grad` in one FSDP unit. This means that if you plan to freeze some layers, you need to do it at the FSDP unit level rather than at the model layer level. For example, assume our model has 30 decoder layers and we want to freeze the bottom 28 layers and train only the top 2 transformer layers. In this case, we need to make sure `requires_grad` is set to `True` for the top two transformer layers.
-
-3. How do PEFT methods work with FSDP in terms of grad requirements/layer freezing?
-
-    We wrap the PEFT modules separately from the transformer layers in the auto-wrapping policy. As a result, the PEFT modules have `requires_grad=True` while the rest of the model has `requires_grad=False`.
-
-4. Can I add custom datasets?
-
-    Yes, you can find more information on how to do that [here](../recipes/quickstart/finetuning/datasets/README.md).
-
-5. What are the hardware SKU requirements for deploying these models?
-
-    Hardware requirements vary based on latency, throughput and cost constraints. For good latency, the models were split across multiple GPUs with tensor parallelism in a machine with NVIDIA A100s or H100s. But TPUs, other types of GPUs like A10G, T4, L4, or even commodity hardware can also be used to deploy these models (e.g. https://github.com/ggerganov/llama.cpp).
-    If working on a CPU, it is worth looking at this [blog post](https://www.intel.com/content/www/us/en/developer/articles/news/llama2.html) from Intel for an idea of Llama 2's performance on a CPU.
-
-6. What are the hardware SKU requirements for fine-tuning Llama pre-trained models?
-
-    Fine-tuning requirements vary based on the amount of data, the time available to complete fine-tuning, and cost constraints. To fine-tune these models we have generally used multiple NVIDIA A100 machines with data parallelism across nodes and a mix of data and tensor parallelism within a node. But using a single machine, or other GPU types like NVIDIA A10G or H100, is definitely possible (e.g. alpaca models are trained on a single RTX4090: https://github.com/tloen/alpaca-lora).
-
-7. How can I handle CUDA memory fragmentation during fine-tuning that may lead to an OOM?
-
-    In some cases, especially with FSDP (this usually does not happen with PEFT methods), you may find that the reserved and allocated CUDA memory has increased after model checkpointing. This might be due to CUDA memory fragmentation. PyTorch recently added an environment variable that helps to better manage memory fragmentation (this feature is available in PyTorch nightlies at the time of writing this doc, July 30 2023). You can set it in your main training script as follows:
-
-    ```python
-    import os
-
-    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
-    ```
-    We also added this environment variable in `setup_environ_flags` of [train_utils.py](../src/llama_recipes/utils/train_utils.py); feel free to uncomment it if required.
-
-8. Additional debugging flags?
-
-    The environment variable `TORCH_DISTRIBUTED_DEBUG` can be used to trigger additional useful logging and collective synchronization checks to ensure all ranks are synchronized appropriately. `TORCH_DISTRIBUTED_DEBUG` can be set to OFF (default), INFO, or DETAIL depending on the debugging level required. Please note that the most verbose option, DETAIL, may impact application performance and thus should only be used when debugging issues.
-
-    We also added this environment variable in `setup_environ_flags` of [train_utils.py](../src/llama_recipes/utils/train_utils.py); feel free to uncomment it if required.
-
-9. I am getting import errors when running inference.
-
-    Verify that CUDA environment variables are set correctly on your machine. For example, for bitsandbytes, you can generally set them as below to get things working on A100 80GB GPUs on AWS.
-
-    ```bash
-    export CUDA_HOME="/usr/local/cuda-11.8"
-    export PATH=$CUDA_HOME/bin:$PATH
-    export LD_LIBRARY_PATH=$CUDA_HOME/lib:$CUDA_HOME/lib64:$CUDA_HOME/efa/lib:/opt/amazon/efa/lib:$LD_LIBRARY_PATH
-    ```
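FAQ items 7 and 8 in the removed file both come down to setting environment variables early in the training script. A minimal sketch of how the two could be set together, assuming a hypothetical helper named `configure_debug_env` (it is not part of llama-recipes, and the repository's own `setup_environ_flags` may look different):

```python
import os


def configure_debug_env(expandable_segments=True, distributed_debug="OFF"):
    """Hypothetical helper: set the two variables discussed in FAQ items 7 and 8.

    Assumption: this is called early in the training script, before PyTorch
    touches CUDA or creates a process group.
    """
    allowed = {"OFF", "INFO", "DETAIL"}
    if distributed_debug not in allowed:
        raise ValueError(f"distributed_debug must be one of {sorted(allowed)}")

    if expandable_segments:
        # Allocator setting quoted in FAQ item 7.
        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
    # Logging level from FAQ item 8; DETAIL is the most verbose option.
    os.environ["TORCH_DISTRIBUTED_DEBUG"] = distributed_debug

    # Return the resulting values so the caller can log them.
    return {
        name: os.environ.get(name)
        for name in ("PYTORCH_CUDA_ALLOC_CONF", "TORCH_DISTRIBUTED_DEBUG")
    }
```

Calling `configure_debug_env(distributed_debug="INFO")` at the top of a script would enable the extra logging for one debugging run; leaving the default keeps `TORCH_DISTRIBUTED_DEBUG` at `OFF`.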

File diff suppressed because it is too large
+ 0 - 64
docs/LLM_finetuning.md


BIN
docs/img/feature_based_fn.png


BIN
docs/img/feature_based_fn_2.png


BIN
docs/img/full_param_fn.png


BIN
docs/img/llama2_gradio.png


BIN
docs/img/llama2_streamlit.png


BIN
docs/img/llama2_streamlit2.png


BIN
docs/img/messenger_api_settings.png


BIN
docs/img/messenger_llama_arch.jpg


BIN
docs/img/wandb_screenshot.png


BIN
docs/img/whatsapp_dashboard.jpg


BIN
docs/img/whatsapp_llama_arch.jpg


File diff suppressed because it is too large
+ 0 - 198
docs/multi_gpu.md


File diff suppressed because it is too large
+ 0 - 128
docs/single_gpu.md


+ 0 - 8
recipes/3p_integrations/README.md

@@ -1,8 +0,0 @@
-## Llama-Recipes 3P Integrations
-
-This folder contains example scripts showcasing the use of Meta Llama with popular platforms and tooling in the LLM ecosystem. 
-
-Each folder is maintained by the platform-owner. 
-
-> [!NOTE]
-> If you'd like to add your platform here, please open a new issue with details of your examples.

File diff suppressed because it is too large
+ 0 - 307
recipes/3p_integrations/aws/getting_started_llama_3_on_amazon_bedrock.ipynb


File diff suppressed because it is too large
+ 0 - 2151
recipes/3p_integrations/aws/prompt_engineering_with_llama_2_on_amazon_bedrock.ipynb


File diff suppressed because it is too large
+ 0 - 579
recipes/3p_integrations/aws/react_llama_3_bedrock_wk.ipynb


+ 0 - 494
recipes/3p_integrations/azure/Azure MaaS/azure_api_example.ipynb

@@ -1,494 +0,0 @@
-{
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "# Use Azure API with Llama 3.1\n",
-        "\n",
-        "This notebook shows examples of how to use Llama 3.1 APIs offered by Microsoft Azure. We will cover:  \n",
-        "* HTTP requests API usage for Llama 3.1 instruct models in CLI\n",
-        "* HTTP requests API usage for Llama 3.1 instruct models in Python\n",
-        "* Plug the APIs into LangChain\n",
-        "* Wire the model with Gradio to build a simple chatbot with memory\n",
-        "\n",
-        "\n"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "## Prerequisite\n",
-        "\n",
-        "Before we start building with Azure Llama 3.1 APIs, there are certain steps we need to take to deploy the models:\n",
-        "\n",
-        "* Register for a valid Azure account with subscription [here](https://azure.microsoft.com/en-us/free/search/?ef_id=_k_CjwKCAiA-P-rBhBEEiwAQEXhH5OHAJLhzzcNsuxwpa5c9EJFcuAjeh6EvZw4afirjbWXXWkiZXmU2hoC5GoQAvD_BwE_k_&OCID=AIDcmm5edswduu_SEM__k_CjwKCAiA-P-rBhBEEiwAQEXhH5OHAJLhzzcNsuxwpa5c9EJFcuAjeh6EvZw4afirjbWXXWkiZXmU2hoC5GoQAvD_BwE_k_&gad_source=1&gclid=CjwKCAiA-P-rBhBEEiwAQEXhH5OHAJLhzzcNsuxwpa5c9EJFcuAjeh6EvZw4afirjbWXXWkiZXmU2hoC5GoQAvD_BwE)\n",
-        "* Take a quick look at what [Azure AI Studio](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio?tabs=home) is and navigate to the website from the link in the article.\n",
-        "* Follow the demos in the article to create a project and [resource](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-portal) group.\n",
-        "* For Llama 3.1 instruct models from the Model catalog, click Deploy on the model page and select \"Serverless API with Azure AI Content Safety\". Once deployed successfully, you should be assigned an API endpoint and a security key for inference.\n",
-        "* For Llama 3.1 pretrained models, Azure currently only supports manual deployment under a regular subscription. This means you will need to acquire a virtual machine with managed compute resources. We won't cover it in this tutorial.\n",
-        "\n",
-        "For more information, you should consult Azure's official documentation [here](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-llama?tabs=azure-studio) for model deployment and inference."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "## HTTP Requests API Usage in CLI\n",
-        "\n",
-        "### Basics\n",
-        "\n",
-        "The usage and schema of the API are identical to those of the Llama 3 API hosted on Azure.\n",
-        "\n",
-        "To use the REST API, you will need an endpoint URL and the authentication key associated with that endpoint.  \n",
-        "These can be acquired from the previous steps.  \n",
-        "\n",
-        "In this chat completion example for the instruct model, we use a simple curl call for illustration. There are three major components:  \n",
-        "\n",
-        "* The `host-url` is your endpoint URL with the completion schema. \n",
-        "* The `headers` define the content type as well as your API key. \n",
-        "* The `payload` or `data` contains your prompt details and model hyperparameters."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "The `host-url` path needs to be `/v1/chat/completions`, and the request payload needs to include the role of each message in the conversation. Here is a sample payload:  \n",
-        "\n",
-        "```\n",
-        "{ \n",
-        "  \"messages\": [ \n",
-        "    { \n",
-        "      \"content\": \"You are a helpful assistant.\", \n",
-        "      \"role\": \"system\" \n",
-        "    }, \n",
-        "    { \n",
-        "      \"content\": \"Hello!\", \n",
-        "      \"role\": \"user\" \n",
-        "    } \n",
-        "  ], \n",
-        "  \"max_tokens\": 50 \n",
-        "} \n",
-        "```\n",
-        "\n",
-        "Here is a sample curl call for chat completion:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "!curl -X POST -L https://your-endpoint.inference.ai.azure.com/v1/chat/completions -H 'Content-Type: application/json' -H 'Authorization: your-auth-key' -d '{\"messages\":[{\"content\":\"You are a helpful assistant.\",\"role\":\"system\"},{\"content\":\"What is good about Wuhan?\",\"role\":\"user\"}], \"max_tokens\": 50}'"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Streaming\n",
-        "\n",
-        "One fantastic feature the API offers is the streaming capability.  \n",
-        "Streaming allows the generated tokens to be sent as data-only server-sent events whenever they become available.  \n",
-        "This is extremely important for interactive applications such as chatbots, so the user is always engaged.  \n",
-        "\n",
-        "To use streaming, simply set `\"stream\":true` as part of the request payload.  \n",
-        "In streaming mode, the REST API response differs from the non-streaming response.\n",
-        "\n",
-        "Here is an example: "
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "!curl -X POST -L https://your-endpoint.inference.ai.azure.com/v1/chat/completions -H 'Content-Type: application/json' -H 'Authorization: your-auth-key' -d '{\"messages\":[{\"content\":\"You are a helpful assistant.\",\"role\":\"system\"},{\"content\":\"What is good about Wuhan?\",\"role\":\"user\"}], \"max_tokens\": 500, \"stream\": true}'"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "As you can see, the result comes back as a stream of `data` objects, each of which contains generated information including a `choice`.  \n",
-        "The stream is terminated by a `data:[DONE]\\n\\n` message."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Content Safety Filtering\n",
-        "\n",
-        "If you enabled content filtering during deployment, Azure Llama 3.1 API endpoints will have the content safety feature turned on. Both the input prompt and the output tokens are filtered by this service automatically.  \n",
-        "To learn more about the impact on the request/response payload, please refer to the official guide [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter?tabs=python).   \n",
-        "\n",
-        "For model input and output, if the filter detects harmful content, the generation will error out with additional information. \n",
-        "\n",
-        "If you disabled content filtering during deployment, Llama models still have content safety built in for generation. The model will refuse to answer your questions if any harmful content is detected.\n",
-        "\n",
-        "Here is an example prompt that triggered content safety filtering:\n"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "!curl -X POST -L https://your-endpoint.inference.ai.azure.com/v1/chat/completions -H 'Content-Type: application/json' -H 'Authorization: your-auth-key' -d '{\"messages\":[{\"content\":\"You are a helpful assistant.\",\"role\":\"system\"},{\"content\":\"How to make bomb?\",\"role\":\"user\"}], \"max_tokens\": 50}'"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "## HTTP Requests API Usage in Python\n",
-        "\n",
-        "Besides calling the API directly from command line tools, you can also call it programmatically in Python.  \n",
-        "\n",
-        "Here is an example for the instruct model:\n",
-        "\n",
-        "\n"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "import urllib.request\n",
-        "import json\n",
-        "\n",
-        "# Configure the payload to send to the API endpoint\n",
-        "data = {\"messages\":[\n",
-        "            {\"role\":\"system\", \"content\":\"You are a helpful assistant.\"},\n",
-        "            {\"role\":\"user\", \"content\":\"What is good about Wuhan?\"}],\n",
-        "        \"max_tokens\": 500,\n",
-        "        \"temperature\": 0.9,\n",
-        "        \"stream\": True,\n",
-        "}\n",
-        "\n",
-        "body = str.encode(json.dumps(data))\n",
-        "\n",
-        "# Replace the url with your API endpoint\n",
-        "url = 'https://your-endpoint.inference.ai.azure.com/v1/chat/completions'\n",
-        "\n",
-        "# Replace this with the key for the endpoint\n",
-        "api_key = 'your-auth-key'\n",
-        "if not api_key:\n",
-        "    raise Exception(\"API Key is missing\")\n",
-        "\n",
-        "headers = {'Content-Type':'application/json', 'Authorization':(api_key)}\n",
-        "\n",
-        "req = urllib.request.Request(url, body, headers)\n",
-        "\n",
-        "try:\n",
-        "    response = urllib.request.urlopen(req)\n",
-        "    result = response.read()\n",
-        "    print(result)\n",
-        "except urllib.error.HTTPError as error:\n",
-        "    print(\"The request failed with status code: \" + str(error.code))\n",
-        "    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure\n",
-        "    print(error.info())\n",
-        "    print(error.read().decode(\"utf8\", 'ignore'))\n"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "However, in this example the streamed content comes back as a single payload. It was not streamed as a series of data events as we wished. To build true streaming capabilities on top of the API endpoint, we will use the [`requests`](https://requests.readthedocs.io/en/latest/) library instead."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Streaming in Python\n",
-        "\n",
-        "The `requests` library is a simple HTTP library for Python built on [`urllib3`](https://github.com/urllib3/urllib3). It automatically handles keep-alive and HTTP connection pooling. With the `Session` class, we can easily stream the results of our API calls.  \n",
-        "\n",
-        "Here is a quick example:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "import json\n",
-        "import requests\n",
-        "\n",
-        "data = {\"messages\":[\n",
-        "            {\"role\":\"system\", \"content\":\"You are a helpful assistant.\"},\n",
-        "            {\"role\":\"user\", \"content\":\"What is good about Wuhan?\"}],\n",
-        "        \"max_tokens\": 500,\n",
-        "        \"temperature\": 0.9,\n",
-        "        \"stream\": True\n",
-        "}\n",
-        "\n",
-        "\n",
-        "def post_stream(url):\n",
-        "    s = requests.Session()\n",
-        "    api_key = \"your-auth-key\"\n",
-        "    headers = {'Content-Type':'application/json', 'Authorization':(api_key)}\n",
-        "\n",
-        "    with s.post(url, data=json.dumps(data), headers=headers, stream=True) as resp:\n",
-        "        print(resp.status_code)\n",
-        "        for line in resp.iter_lines():\n",
-        "            if line:\n",
-        "                print(line)\n",
-        "\n",
-        "\n",
-        "url = \"https://your-endpoint.inference.ai.azure.com/v1/chat/completions\"\n",
-        "post_stream(url)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "## Use Llama 3.1 API with LangChain\n",
-        "\n",
-        "In this section, we will demonstrate how to use Llama 3.1 APIs with LangChain, one of the most popular frameworks for accelerating the development of AI products.  \n",
-        "One common solution here is to create your customized LLM instance, so you can add it to various chains to complete different tasks.  \n",
-        "In this example, we will use the `AzureMLChatOnlineEndpoint` class that LangChain provides to build a customized LLM instance. This class is designed to take an Azure endpoint and API key as inputs and wire them up with HTTP calls, so under the hood it is very similar to how we used the `urllib.request` library to send RESTful calls to the Azure endpoint in the previous examples.   \n",
-        "\n",
-        "First, let's install dependencies: \n",
-        "\n"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "pip install langchain"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "Once all dependencies are installed, you can directly create a `llm` instance based on `AzureMLChatOnlineEndpoint` as follows:  "
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "from langchain_community.chat_models.azureml_endpoint import (\n",
-        "    AzureMLEndpointApiType,\n",
-        "    CustomOpenAIChatContentFormatter,\n",
-        "    AzureMLChatOnlineEndpoint,\n",
-        ")\n",
-        "\n",
-        "from langchain_core.messages import HumanMessage\n",
-        "\n",
-        "llm = AzureMLChatOnlineEndpoint(\n",
-        "    endpoint_api_key=\"your-auth-key\",\n",
-        "    endpoint_url=\"https://your-endpoint.inference.ai.azure.com/v1/chat/completions\",\n",
-        "    endpoint_api_type=AzureMLEndpointApiType.serverless,\n",
-        "    model_kwargs={\"temperature\": 0.6, \"max_tokens\": 256, \"top_p\": 0.9},\n",
-        "    content_formatter=CustomOpenAIChatContentFormatter(),\n",
-        ")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "You might wonder what `CustomOpenAIChatContentFormatter` is doing when the `llm` instance is created.   \n",
-        "The `CustomOpenAIChatContentFormatter` is a [handler class](https://python.langchain.com/docs/integrations/llms/azure_ml#content-formatter) for transforming the request and response of an AzureML endpoint to match the required schema. There are various models in the Azure model catalog, and each of them needs its data handled accordingly.  \n",
-        "In our case, we can use the default `CustomOpenAIChatContentFormatter`, which can handle Llama model schemas. If you need special handling, you can customize this class. \n",
-        "\n",
-        "Once you have the `llm` ready, you can simply run inference with it:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "response = llm.invoke([HumanMessage(content=\"What is good about Wuhan?\")])\n",
-        "response"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "Here is an example that creates a translator chain with the `llm` instance and translates English to French:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "from langchain.chains import LLMChain\n",
-        "from langchain.prompts import PromptTemplate\n",
-        "\n",
-        "template = \"\"\"\n",
-        "You are a Translator. Translate the following content from {input_language} to {output_language} and reply with only the translated result.\n",
-        "{input_content}\n",
-        "\"\"\"\n",
-        "\n",
-        "translator_chain = LLMChain(\n",
-        "    llm = llm,\n",
-        "    prompt = PromptTemplate(\n",
-        "            template=template,\n",
-        "            input_variables=[\"input_language\", \"output_language\", \"input_content\"],\n",
-        "        ),\n",
-        ")\n",
-        "\n",
-        "print(translator_chain.run(input_language=\"English\", output_language=\"French\", input_content=\"What is good about Wuhan?\"))\n"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "## Build a chatbot with Llama 3.1 API\n",
-        "\n",
-        "In this section, we will build a simple chatbot using Azure Llama 3.1 API, LangChain and [Gradio](https://www.gradio.app/)'s `ChatInterface` with memory capability.\n",
-        "\n",
-        "Gradio is a framework to help demo your machine learning model with a web interface. We also have a dedicated Gradio chatbot [example](https://github.com/meta-llama/llama-recipes/blob/main/recipes/use_cases/customerservice_chatbots/RAG_chatbot/RAG_Chatbot_Example.ipynb) built with Llama 3 on-premises with RAG.   \n",
-        "\n",
-        "First, let's install Gradio dependencies.\n"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "pip install gradio==4.39.0"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "Let's use `AzureMLChatOnlineEndpoint` class from the previous example.  \n",
-        "In this example, we have three major components:  \n",
-        "1. The chatbot UI, hosted as a web interface by Gradio. This is the UI logic that renders our model predictions.\n",
-        "2. The model itself, which is the core component that ingests prompts and returns an answer.\n",
-        "3. The memory component, which stores previous conversation context. In this example, we will use a [conversation window buffer](https://python.langchain.com/docs/modules/memory/types/buffer_window), which keeps only the most recent interactions (the last `k` exchanges) as context. \n",
-        "\n",
-        "All of them are chained together using LangChain."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "import gradio as gr\n",
-        "import langchain\n",
-        "from langchain.chains import ConversationChain\n",
-        "from langchain.prompts import PromptTemplate\n",
-        "from langchain.memory import ConversationBufferWindowMemory\n",
-        "from langchain_core.messages import HumanMessage\n",
-        "from langchain_community.chat_models.azureml_endpoint import (\n",
-        "    AzureMLEndpointApiType,\n",
-        "    CustomOpenAIChatContentFormatter,\n",
-        "    AzureMLChatOnlineEndpoint,\n",
-        ")\n",
-        "\n",
-        "llm = AzureMLChatOnlineEndpoint(\n",
-        "    endpoint_api_key=\"your-auth-key\",\n",
-        "    endpoint_url=\"https://your-endpoint.inference.ai.azure.com/v1/chat/completions\",\n",
-        "    endpoint_api_type=AzureMLEndpointApiType.serverless,\n",
-        "    model_kwargs={\"temperature\": 0.6, \"max_tokens\": 256, \"top_p\": 0.9},\n",
-        "    content_formatter=CustomOpenAIChatContentFormatter(),\n",
-        ")\n",
-        "\n",
-        "langchain.debug=True\n",
-        "\n",
-        "#Create memory\n",
-        "memory = ConversationBufferWindowMemory(llm=llm, k=5, memory_key=\"chat_history\", ai_prefix=\"Assistant\", human_prefix=\"User\")\n",
-        "\n",
-        "#Create input prompt template with chat history for chaining\n",
-        "INPUT_TEMPLATE = \"\"\"Current conversation:\n",
-        "{chat_history}\n",
-        "\n",
-        "User question:{input}\"\"\"\n",
-        "\n",
-        "conversation_prompt_template = PromptTemplate(\n",
-        "    input_variables=[\"chat_history\", \"input\"], template=INPUT_TEMPLATE\n",
-        ")\n",
-        "\n",
-        "conversation_chain_with_memory = ConversationChain(\n",
-        "    llm = llm,\n",
-        "    prompt = conversation_prompt_template,\n",
-        "    verbose = True,\n",
-        "    memory = memory,\n",
-        ")\n",
-        "\n",
-        "#Prediction\n",
-        "def predict(message, history):\n",
-        "    history_format = []\n",
-        "    for user, assistant in history:\n",
-        "        history_format.append({\"role\": \"user\", \"content\": user })\n",
-        "        history_format.append({\"role\": \"assistant\", \"content\":assistant})\n",
-        "    history_format.append({\"role\": \"user\", \"content\": message})\n",
-        "    response = conversation_chain_with_memory.run(input=message)\n",
-        "    return response\n",
-        "\n",
-        "#Launch Gradio chatbot interface\n",
-        "gr.ChatInterface(predict).launch()"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "After successfully executing the code above, a chat interface should appear as the interactive output, or you can open the localhost URL in a browser window. As you can see, an AI chatbot can be built in just a few lines of code.\n",
-        "\n",
-        "This concludes our tutorial and examples. Here are some additional references:  \n",
-        "* [Fine-tune Llama](https://learn.microsoft.com/azure/ai-studio/how-to/fine-tune-model-llama)\n",
-        "* [Plan and manage costs (marketplace)](https://learn.microsoft.com/azure/ai-studio/how-to/costs-plan-manage#monitor-costs-for-models-offered-through-the-azure-marketplace)\n"
-      ]
-    }
-  ],
-  "metadata": {
-    "fileHeader": "",
-    "fileUid": "599e1edd-cd59-4e55-823f-17157fc07b18",
-    "isAdHoc": false,
-    "kernelspec": {
-      "display_name": "Python 3",
-      "language": "python",
-      "name": "python3"
-    },
-    "language_info": {
-      "codemirror_mode": {
-        "name": "ipython",
-        "version": 3
-      },
-      "file_extension": ".py",
-      "mimetype": "text/x-python",
-      "name": "python",
-      "nbconvert_exporter": "python",
-      "pygments_lexer": "ipython3",
-      "version": "3.9.6"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 2
-}
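
The notebook above relies on a windowed conversation memory (`k=5`) whose contents are substituted into `INPUT_TEMPLATE`. As a rough, library-free sketch of that idea, the snippet below keeps only the last `k` exchanges and renders them into the same template. `WindowMemory` and `build_prompt` are hypothetical names used for illustration only; this is not LangChain's implementation.

```python
from collections import deque

# Same template text as the notebook cell above.
INPUT_TEMPLATE = """Current conversation:
{chat_history}

User question:{input}"""


class WindowMemory:
    """Hypothetical stand-in for a windowed buffer: keeps the last k exchanges."""

    def __init__(self, k: int = 5) -> None:
        # A bounded deque drops the oldest exchange once k is exceeded.
        self.turns = deque(maxlen=k)

    def save(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def render(self) -> str:
        lines = []
        for user, assistant in self.turns:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {assistant}")
        return "\n".join(lines)


def build_prompt(memory: WindowMemory, message: str) -> str:
    """Fill the template with the remembered window and the new message."""
    return INPUT_TEMPLATE.format(chat_history=memory.render(), input=message)


memory = WindowMemory(k=2)
for i in range(3):
    memory.save(f"question {i}", f"answer {i}")

prompt = build_prompt(memory, "question 3")
print(prompt)
```

With `k=2`, the first exchange has already been dropped by the time the fourth question is asked, which is the trade-off a window buffer makes: bounded prompt size in exchange for forgetting older context.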

+ 0 - 2
recipes/3p_integrations/azure/README.md

@@ -1,2 +0,0 @@
-In this folder, we show various recipes for Llama models working with Azure AI services. This includes:
-* Examples for running Llama model inference on Azure's serverless API offerings (a.k.a. MaaS)

+ 0 - 11
recipes/3p_integrations/crusoe/README.md

@@ -1,11 +0,0 @@
-Below are recipes for deploying common Llama workflows on [Crusoe's](https://crusoe.ai) high-performance, sustainable cloud. Each workflow corresponds to a subfolder with its own README and supplemental materials. Please reference the table below for hardware requirements.
-
-| Workflow | Model(s) | VM type | Storage |
-|:----:  | :----:  | :----:| :----: |
-| [Serving Llama3.1 in FP8 with vLLM](vllm-fp8/) | [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct), [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) | l40s-48gb.8x | 256 GiB Persistent Disk |
-
-# Requirements
-First, ensure that you have a Crusoe account (you can sign up [here](https://console.crusoecloud.com/)). We will provision resources using Terraform, so please ensure that your environment is configured; refer to the Crusoe [docs](https://github.com/crusoecloud/terraform-provider-crusoe?tab=readme-ov-file#getting-started) for guidance.
-
-# Serving Models
-Some recipes in this repo require firewall rules to expose ports in order to reach the inference server. To manage firewall rules, please refer to our [networking documentation](https://docs.crusoecloud.com/networking/firewall-rules/managing-firewall-rules).

+ 0 - 85
recipes/3p_integrations/crusoe/vllm-fp8/README.md

@@ -1,85 +0,0 @@
-In this article, we will show how to benchmark FP8 models on L40S using the vLLM inference engine. By the end, you should understand how to use `llm-compressor` to quantize existing higher-precision Llama3 finetunes to fp8, benchmark throughput and latency to compare performance, and finally serve models using `vllm`.
-
-# Provisioning Resources
-First, navigate to this repository from your local machine. Update the corresponding variables in `locals` inside `main.tf` to match your environment (e.g. the path to your SSH key), then initialize the terraform project with `terraform init` and provision resources with `terraform apply`. Note that this will create a VM equipped with 8xL40S and a 256GB persistent disk. After the VM has been created, terraform will output the public IP address.
-
-## Mount Storage
-`ssh` into your VM. Then, run the commands below to format the attached disk and mount it at `/scratch`.
-```bash
-mkfs.ext4 /dev/vdb
-mkdir /scratch
-mount -t ext4 /dev/vdb /scratch
-cd /scratch
-```
-
-# Install Dependencies
-We'll use [uv](https://github.com/astral-sh/uv) to install dependencies. First, install the tool with
-```bash
-apt-get update && apt-get install -y curl
-apt-get install -y tmux
-curl -LsSf https://astral.sh/uv/install.sh | sh
-source $HOME/.cargo/env
-```
-
-Now, clone the recipes and navigate to this tutorial. Initialize the virtual environment and install dependencies:
-```bash
-git clone https://github.com/meta-llama/llama-recipes.git
-cd llama-recipes/recipes/3p_integrations/crusoe/vllm-fp8/
-uv add vllm setuptools
-```
-
-# Run Benchmarks
-Before starting the vLLM server, we'll configure HuggingFace to save to our shared disk, specify the model tag, and set tensor parallelism to 1.
-```bash
-export HF_HOME=/scratch/
-export MODEL=neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8-dynamic
-export TP_SIZE=1
-```
-Now, we'll use tmux to run our server inside of a detachable session.
-```bash
-tmux new -s server
-uv run vllm serve $MODEL --enable-chunked-prefill --disable-log-requests --tensor-parallel-size $TP_SIZE
-```
-vLLM will download the model from HF and serve it on port 8000. Now, detach from the tmux session (`ctrl+b` then `d`) and we'll simulate a client.
-```bash
-tmux new -s client
-chmod +x run_benchmark.sh
-./run_benchmark.sh
-```
-Let's inspect the benchmark script to see what's going on.
-```bash
-TOTAL_SECONDS=120
-QPS_RATES=("1" "3" "5" "7" "9")
-
-for QPS in ${QPS_RATES[@]}; do
-    NUM_PROMPTS=$((TOTAL_SECONDS * QPS))
-    echo "===== RUNNING NUM_PROMPTS = $NUM_PROMPTS QPS = $QPS ====="
-
-    uv run benchmarks/benchmark_serving.py \
-        --model $MODEL \
-        --dataset-name sonnet --sonnet-input-len 550 --sonnet-output-len 150 --dataset-path benchmarks/sonnet.txt \
-        --num-prompts $NUM_PROMPTS --request-rate $QPS --save-result
-done
-```
-This is a convenience wrapper that re-runs the vLLM `benchmarks/benchmark_serving.py` with queries-per-second (QPS) gradually increasing from 1 to 9 and saves the results. After each run completes, a JSON will appear in the same directory containing inference statistics.
-
-# Results
-We repeated the above benchmark across the fp8 and fp16 versions of both Llama3.1 8B and 70B.
-
-![TPOT vs QPS](assets/tpot_vs_qps_chart.png "TPOT vs QPS")
-In the above chart, we compare time-per-output-token (TPOT) across different QPS volumes. The fp16 70B model runs across 8 GPUs, while the fp8 version uses only 4 and still stays in the same TPOT range. Both 8B models run on 1 GPU, and fp8 is noticeably faster.
-
-![TTFT vs QPS](assets/ttft_vs_qps_chart.png "TTFT vs QPS")
-Looking at our time-to-first-token (TTFT), we observe the same trends. Even though the fp8 70B is run across half as many GPUs, its TTFT is roughly the same as the fp16 version on 8.
-
-# Converting Llama3 models to FP8
-If you wish to convert your existing finetunes to FP8, you can do so with [llmcompressor](https://github.com/vllm-project/llm-compressor).
-```bash
-uv add llmcompressor
-uv run convert_hf_to_fp8.py NousResearch/Hermes-3-Llama-3.1-70B
-```
-
-To use the converted model, update `$MODEL` to your absolute path for the converted version, then rerun `uv run vllm serve $MODEL --enable-chunked-prefill --disable-log-requests --tensor-parallel-size $TP_SIZE`. Now, we have a vLLM server up with our converted finetune and can rerun our previous benchmarks to verify performance.
-
-# Cleaning up
-To clean up the resources you've provisioned, run `terraform destroy` from within this repository on your local machine.
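
The README above sizes each benchmark run as `TOTAL_SECONDS * QPS` prompts and compares runs by TTFT and TPOT. The sketch below reproduces that arithmetic and shows one common way to derive TPOT from per-request timings. The formula (decode time after the first token divided by the remaining output tokens) is an assumption about how the metric is defined, not something confirmed by the README, and `tpot_seconds` is a hypothetical helper name.

```python
TOTAL_SECONDS = 120
QPS_RATES = [1, 3, 5, 7, 9]

# Number of prompts sent per run, mirroring NUM_PROMPTS in run_benchmark.sh.
num_prompts = {qps: TOTAL_SECONDS * qps for qps in QPS_RATES}


def tpot_seconds(latency: float, ttft: float, output_len: int) -> float:
    """Assumed definition: time spent after the first token, per remaining token."""
    if output_len <= 1:
        return 0.0  # no decode phase to average over
    return (latency - ttft) / (output_len - 1)


# Illustrative numbers only, not measured results.
example = tpot_seconds(latency=2.0, ttft=0.5, output_len=151)

print(num_prompts)
print(f"TPOT: {example * 1000:.1f} ms")
```

Keeping the run duration fixed and scaling the prompt count with QPS means every request rate is observed for roughly the same wall-clock time, which makes the per-rate statistics easier to compare.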

BIN
recipes/3p_integrations/crusoe/vllm-fp8/assets/tpot_vs_qps_chart.png


BIN
recipes/3p_integrations/crusoe/vllm-fp8/assets/ttft_vs_qps_chart.png


+ 0 - 427
recipes/3p_integrations/crusoe/vllm-fp8/benchmarks/backend_request_func.py

@@ -1,427 +0,0 @@
-import json
-import os
-import sys
-import time
-import traceback
-from dataclasses import dataclass, field
-from typing import List, Optional, Union
-
-import aiohttp
-import huggingface_hub.constants
-from tqdm.asyncio import tqdm
-from transformers import (AutoTokenizer, PreTrainedTokenizer,
-                          PreTrainedTokenizerFast)
-
-AIOHTTP_TIMEOUT = aiohttp.ClientTimeout(total=6 * 60 * 60)
-
-
-@dataclass
-class RequestFuncInput:
-    prompt: str
-    api_url: str
-    prompt_len: int
-    output_len: int
-    model: str
-    best_of: int = 1
-    use_beam_search: bool = False
-
-
-@dataclass
-class RequestFuncOutput:
-    generated_text: str = ""
-    success: bool = False
-    latency: float = 0.0
-    ttft: float = 0.0  # Time to first token
-    itl: List[float] = field(
-        default_factory=list)  # List of inter-token latencies
-    prompt_len: int = 0
-    error: str = ""
-
-
-async def async_request_tgi(
-    request_func_input: RequestFuncInput,
-    pbar: Optional[tqdm] = None,
-) -> RequestFuncOutput:
-    api_url = request_func_input.api_url
-    assert api_url.endswith("generate_stream")
-
-    async with aiohttp.ClientSession(timeout=AIOHTTP_TIMEOUT) as session:
-        assert not request_func_input.use_beam_search
-        params = {
-            "best_of": request_func_input.best_of,
-            "max_new_tokens": request_func_input.output_len,
-            "do_sample": True,
-            "temperature": 0.01,  # TGI does not accept 0.0 temperature.
-            "top_p": 0.99,  # TGI does not accept 1.0 top_p.
-        }
-        payload = {
-            "inputs": request_func_input.prompt,
-            "parameters": params,
-        }
-        output = RequestFuncOutput()
-        output.prompt_len = request_func_input.prompt_len
-
-        ttft = 0.0
-        st = time.perf_counter()
-        most_recent_timestamp = st
-        try:
-            async with session.post(url=api_url, json=payload) as response:
-                if response.status == 200:
-                    async for chunk_bytes in response.content:
-                        chunk_bytes = chunk_bytes.strip()
-                        if not chunk_bytes:
-                            continue
-                        chunk_bytes = chunk_bytes.decode("utf-8")
-
-                        #NOTE: Sometimes TGI returns a ping response without
-                        # any data, we should skip it.
-                        if chunk_bytes.startswith(":"):
-                            continue
-                        chunk = remove_prefix(chunk_bytes, "data:")
-
-                        data = json.loads(chunk)
-                        timestamp = time.perf_counter()
-                        # First token
-                        if ttft == 0.0:
-                            ttft = time.perf_counter() - st
-                            output.ttft = ttft
-
-                        # Decoding phase
-                        else:
-                            output.itl.append(timestamp -
-                                              most_recent_timestamp)
-
-                        most_recent_timestamp = timestamp
-
-                    output.latency = most_recent_timestamp - st
-                    output.success = True
-                    output.generated_text = data["generated_text"]
-                else:
-                    output.error = response.reason or ""
-                    output.success = False
-        except Exception:
-            output.success = False
-            exc_info = sys.exc_info()
-            output.error = "".join(traceback.format_exception(*exc_info))
-
-        if pbar:
-            pbar.update(1)
-        return output
-
-
-async def async_request_trt_llm(
-    request_func_input: RequestFuncInput,
-    pbar: Optional[tqdm] = None,
-) -> RequestFuncOutput:
-    api_url = request_func_input.api_url
-    assert api_url.endswith("generate_stream")
-
-    async with aiohttp.ClientSession(timeout=AIOHTTP_TIMEOUT) as session:
-        assert not request_func_input.use_beam_search
-        assert request_func_input.best_of == 1
-        payload = {
-            "accumulate_tokens": True,
-            "text_input": request_func_input.prompt,
-            "temperature": 0.0,
-            "top_p": 1.0,
-            "max_tokens": request_func_input.output_len,
-            "stream": True,
-        }
-        output = RequestFuncOutput()
-        output.prompt_len = request_func_input.prompt_len
-
-        ttft = 0.0
-        st = time.perf_counter()
-        most_recent_timestamp = st
-        try:
-            async with session.post(url=api_url, json=payload) as response:
-                if response.status == 200:
-                    async for chunk_bytes in response.content:
-                        chunk_bytes = chunk_bytes.strip()
-                        if not chunk_bytes:
-                            continue
-
-                        chunk = remove_prefix(chunk_bytes.decode("utf-8"),
-                                              "data:")
-
-                        data = json.loads(chunk)
-                        output.generated_text += data["text_output"]
-                        timestamp = time.perf_counter()
-                        # First token
-                        if ttft == 0.0:
-                            ttft = time.perf_counter() - st
-                            output.ttft = ttft
-
-                        # Decoding phase
-                        else:
-                            output.itl.append(timestamp -
-                                              most_recent_timestamp)
-
-                        most_recent_timestamp = timestamp
-
-                    output.latency = most_recent_timestamp - st
-                    output.success = True
-
-                else:
-                    output.error = response.reason or ""
-                    output.success = False
-        except Exception:
-            output.success = False
-            exc_info = sys.exc_info()
-            output.error = "".join(traceback.format_exception(*exc_info))
-
-        if pbar:
-            pbar.update(1)
-        return output
-
-
-async def async_request_deepspeed_mii(
-    request_func_input: RequestFuncInput,
-    pbar: Optional[tqdm] = None,
-) -> RequestFuncOutput:
-    async with aiohttp.ClientSession(timeout=AIOHTTP_TIMEOUT) as session:
-        assert request_func_input.best_of == 1
-        assert not request_func_input.use_beam_search
-
-        payload = {
-            "prompt": request_func_input.prompt,
-            "max_tokens": request_func_input.output_len,
-            "temperature": 0.01,  # deepspeed-mii does not accept 0.0 temp.
-            "top_p": 1.0,
-        }
-        output = RequestFuncOutput()
-        output.prompt_len = request_func_input.prompt_len
-
-        # NOTE: DeepSpeed-MII doesn't support streaming as of Jan 28 2024,
-        # will use 0 as placeholder.
-        # See https://github.com/microsoft/DeepSpeed-MII/pull/311
-        output.ttft = 0
-
-        st = time.perf_counter()
-        try:
-            async with session.post(url=request_func_input.api_url,
-                                    json=payload) as response:
-                if response.status == 200:
-                    parsed_resp = await response.json()
-                    output.latency = time.perf_counter() - st
-                    output.generated_text = parsed_resp["text"][0]
-                    output.success = True
-                else:
-                    output.error = response.reason or ""
-                    output.success = False
-        except Exception:
-            output.success = False
-            exc_info = sys.exc_info()
-            output.error = "".join(traceback.format_exception(*exc_info))
-
-        if pbar:
-            pbar.update(1)
-        return output
-
-
-async def async_request_openai_completions(
-    request_func_input: RequestFuncInput,
-    pbar: Optional[tqdm] = None,
-) -> RequestFuncOutput:
-    api_url = request_func_input.api_url
-    assert api_url.endswith(
-        ("completions", "profile")
-    ), "OpenAI Completions API URL must end with 'completions' or 'profile'."
-
-    async with aiohttp.ClientSession(timeout=AIOHTTP_TIMEOUT) as session:
-        assert not request_func_input.use_beam_search
-        payload = {
-            "model": request_func_input.model,
-            "prompt": request_func_input.prompt,
-            "temperature": 0.0,
-            "best_of": request_func_input.best_of,
-            "max_tokens": request_func_input.output_len,
-            "stream": True,
-        }
-        headers = {
-            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY')}"
-        }
-
-        output = RequestFuncOutput()
-        output.prompt_len = request_func_input.prompt_len
-
-        generated_text = ""
-        ttft = 0.0
-        st = time.perf_counter()
-        most_recent_timestamp = st
-        try:
-            async with session.post(url=api_url, json=payload,
-                                    headers=headers) as response:
-                if response.status == 200:
-                    async for chunk_bytes in response.content:
-                        chunk_bytes = chunk_bytes.strip()
-                        if not chunk_bytes:
-                            continue
-
-                        chunk = remove_prefix(chunk_bytes.decode("utf-8"),
-                                              "data: ")
-                        if chunk == "[DONE]":
-                            latency = time.perf_counter() - st
-                        else:
-                            data = json.loads(chunk)
-
-                            # NOTE: Some completion API might have a last
-                            # usage summary response without a token so we
-                            # want to check a token was generated
-                            if data["choices"][0]["text"]:
-                                timestamp = time.perf_counter()
-                                # First token
-                                if ttft == 0.0:
-                                    ttft = time.perf_counter() - st
-                                    output.ttft = ttft
-
-                                # Decoding phase
-                                else:
-                                    output.itl.append(timestamp -
-                                                      most_recent_timestamp)
-
-                                most_recent_timestamp = timestamp
-                                generated_text += data["choices"][0]["text"]
-
-                    output.generated_text = generated_text
-                    output.success = True
-                    output.latency = latency
-                else:
-                    output.error = response.reason or ""
-                    output.success = False
-        except Exception:
-            output.success = False
-            exc_info = sys.exc_info()
-            output.error = "".join(traceback.format_exception(*exc_info))
-
-    if pbar:
-        pbar.update(1)
-    return output
-
-
-async def async_request_openai_chat_completions(
-    request_func_input: RequestFuncInput,
-    pbar: Optional[tqdm] = None,
-) -> RequestFuncOutput:
-    api_url = request_func_input.api_url
-    assert api_url.endswith(
-        "chat/completions"
-    ), "OpenAI Chat Completions API URL must end with 'chat/completions'."
-
-    async with aiohttp.ClientSession(timeout=AIOHTTP_TIMEOUT) as session:
-        assert not request_func_input.use_beam_search
-        payload = {
-            "model": request_func_input.model,
-            "messages": [
-                {
-                    "role": "user",
-                    "content": request_func_input.prompt,
-                },
-            ],
-            "temperature": 0.0,
-            "max_tokens": request_func_input.output_len,
-            "stream": True,
-        }
-        headers = {
-            "Content-Type": "application/json",
-            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY')}",
-        }
-
-        output = RequestFuncOutput()
-        output.prompt_len = request_func_input.prompt_len
-
-        generated_text = ""
-        ttft = 0.0
-        st = time.perf_counter()
-        most_recent_timestamp = st
-        try:
-            async with session.post(url=api_url, json=payload,
-                                    headers=headers) as response:
-                if response.status == 200:
-                    async for chunk_bytes in response.content:
-                        chunk_bytes = chunk_bytes.strip()
-                        if not chunk_bytes:
-                            continue
-
-                        chunk = remove_prefix(chunk_bytes.decode("utf-8"),
-                                              "data: ")
-                        if chunk == "[DONE]":
-                            latency = time.perf_counter() - st
-                        else:
-                            timestamp = time.perf_counter()
-                            data = json.loads(chunk)
-
-                            delta = data["choices"][0]["delta"]
-                            if delta.get("content", None):
-                                # First token
-                                if ttft == 0.0:
-                                    ttft = time.perf_counter() - st
-                                    output.ttft = ttft
-
-                                # Decoding phase
-                                else:
-                                    output.itl.append(timestamp -
-                                                      most_recent_timestamp)
-
-                                generated_text += delta["content"]
-
-                            most_recent_timestamp = timestamp
-
-                    output.generated_text = generated_text
-                    output.success = True
-                    output.latency = latency
-                else:
-                    output.error = response.reason or ""
-                    output.success = False
-        except Exception:
-            output.success = False
-            exc_info = sys.exc_info()
-            output.error = "".join(traceback.format_exception(*exc_info))
-
-    if pbar:
-        pbar.update(1)
-    return output
-
-
-# Since vllm must support Python 3.8, we can't use str.removeprefix(prefix)
-# introduced in Python 3.9
-def remove_prefix(text: str, prefix: str) -> str:
-    if text.startswith(prefix):
-        return text[len(prefix):]
-    return text
-
-
-def get_model(pretrained_model_name_or_path: str) -> str:
-    if os.getenv('VLLM_USE_MODELSCOPE', 'False').lower() == 'true':
-        from modelscope import snapshot_download
-
-        model_path = snapshot_download(
-            model_id=pretrained_model_name_or_path,
-            local_files_only=huggingface_hub.constants.HF_HUB_OFFLINE,
-            ignore_file_pattern=[".*.pt", ".*.safetensors", ".*.bin"])
-
-        return model_path
-    return pretrained_model_name_or_path
-
-
-def get_tokenizer(
-    pretrained_model_name_or_path: str, trust_remote_code: bool
-) -> Union[PreTrainedTokenizer, PreTrainedTokenizerFast]:
-    if pretrained_model_name_or_path is not None and not os.path.exists(
-            pretrained_model_name_or_path):
-        pretrained_model_name_or_path = get_model(
-            pretrained_model_name_or_path)
-    return AutoTokenizer.from_pretrained(pretrained_model_name_or_path,
-                                         trust_remote_code=trust_remote_code)
-
-
-ASYNC_REQUEST_FUNCS = {
-    "tgi": async_request_tgi,
-    "vllm": async_request_openai_completions,
-    "lmdeploy": async_request_openai_completions,
-    "deepspeed-mii": async_request_deepspeed_mii,
-    "openai": async_request_openai_completions,
-    "openai-chat": async_request_openai_chat_completions,
-    "tensorrt-llm": async_request_trt_llm,
-    "scalellm": async_request_openai_completions,
-}
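
The streaming request functions in this file all follow the same pattern: strip the `data: ` prefix from each server-sent chunk, record the time to the first token, then record the gap between later tokens. The sketch below isolates that bookkeeping in a pure function so it can be exercised without a server. `timings_from_stream` and its `(timestamp, raw_bytes)` input shape are assumptions made for illustration; the payload layout follows the completions-style chunks parsed above.

```python
import json
from typing import Iterable, List, Tuple


def remove_prefix(text: str, prefix: str) -> str:
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


def timings_from_stream(
    start: float, events: Iterable[Tuple[float, bytes]]
) -> Tuple[float, List[float], float, str]:
    """Return (ttft, inter-token latencies, total latency, generated text)."""
    ttft = 0.0
    itl: List[float] = []
    latency = 0.0
    text = ""
    last = start
    for timestamp, raw in events:
        raw = raw.strip()
        if not raw:
            continue  # blank keep-alive line between events
        chunk = remove_prefix(raw.decode("utf-8"), "data: ")
        if chunk == "[DONE]":
            latency = timestamp - start
            continue
        token = json.loads(chunk)["choices"][0]["text"]
        if not token:
            continue  # e.g. a trailing usage summary without a token
        if ttft == 0.0:
            ttft = timestamp - start  # first token
        else:
            itl.append(timestamp - last)  # decoding phase
        last = timestamp
        text += token
    return ttft, itl, latency, text


events = [
    (0.5, b'data: {"choices": [{"text": "Hi"}]}\n'),
    (0.5, b"\n"),
    (0.75, b'data: {"choices": [{"text": " there"}]}\n'),
    (1.0, b"data: [DONE]\n"),
]
ttft, itl, latency, text = timings_from_stream(0.0, events)
print(ttft, itl, latency, text)
```

Unlike the functions above, this sketch defaults `latency` to `0.0`, so a stream that ends without a `[DONE]` marker yields a zero latency rather than an unbound variable.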

+ 0 - 770
recipes/3p_integrations/crusoe/vllm-fp8/benchmarks/benchmark_serving.py

@@ -1,770 +0,0 @@
-"""Benchmark online serving throughput.
-
-On the server side, run one of the following commands:
-    vLLM OpenAI API server
-    vllm serve <your_model> \
-        --swap-space 16 \
-        --disable-log-requests
-
-    (TGI backend)
-    ./launch_tgi_server.sh <your_model> <max_batch_total_tokens>
-
-On the client side, run:
-    python benchmarks/benchmark_serving.py \
-        --backend <backend> \
-        --model <your_model> \
-        --dataset-name sharegpt \
-        --dataset-path <path to dataset> \
-        --request-rate <request_rate> \ # By default <request_rate> is inf
-        --num-prompts <num_prompts> # By default <num_prompts> is 1000
-
-    when using tgi backend, add
-        --endpoint /generate_stream
-    to the end of the command above.
-"""
-import argparse
-import asyncio
-import json
-import os
-import random
-import time
-import warnings
-from dataclasses import dataclass
-from datetime import datetime
-from typing import Any, AsyncGenerator, Dict, List, Optional, Tuple
-
-import numpy as np
-from backend_request_func import (ASYNC_REQUEST_FUNCS, RequestFuncInput,
-                                  RequestFuncOutput)
-from tqdm.asyncio import tqdm
-from transformers import PreTrainedTokenizerBase
-
-try:
-    from vllm.transformers_utils.tokenizer import get_tokenizer
-except ImportError:
-    from backend_request_func import get_tokenizer
-
-try:
-    from vllm.utils import FlexibleArgumentParser
-except ImportError:
-    from argparse import ArgumentParser as FlexibleArgumentParser
-
-
-@dataclass
-class BenchmarkMetrics:
-    completed: int
-    total_input: int
-    total_output: int
-    request_throughput: float
-    input_throughput: float
-    output_throughput: float
-    mean_ttft_ms: float
-    median_ttft_ms: float
-    std_ttft_ms: float
-    p99_ttft_ms: float
-    mean_tpot_ms: float
-    median_tpot_ms: float
-    std_tpot_ms: float
-    p99_tpot_ms: float
-    mean_itl_ms: float
-    median_itl_ms: float
-    std_itl_ms: float
-    p99_itl_ms: float
-
-
-def sample_sharegpt_requests(
-    dataset_path: str,
-    num_requests: int,
-    tokenizer: PreTrainedTokenizerBase,
-    fixed_output_len: Optional[int] = None,
-) -> List[Tuple[str, int, int]]:
-    if fixed_output_len is not None and fixed_output_len < 4:
-        raise ValueError("output_len too small")
-    # Load the dataset.
-    with open(dataset_path) as f:
-        dataset = json.load(f)
-    # Filter out the conversations with fewer than 2 turns.
-    dataset = [data for data in dataset if len(data["conversations"]) >= 2]
-    # Only keep the first two turns of each conversation.
-    dataset = [(data["conversations"][0]["value"],
-                data["conversations"][1]["value"]) for data in dataset]
-
-    # Shuffle the dataset.
-    random.shuffle(dataset)
-
-    # Filter out sequences that are too long or too short
-    filtered_dataset: List[Tuple[str, int, int]] = []
-    for i in range(len(dataset)):
-        if len(filtered_dataset) == num_requests:
-            break
-
-        # Tokenize the prompts and completions.
-        prompt = dataset[i][0]
-        prompt_token_ids = tokenizer(prompt).input_ids
-        completion = dataset[i][1]
-        completion_token_ids = tokenizer(completion).input_ids
-        prompt_len = len(prompt_token_ids)
-        output_len = len(completion_token_ids
-                         ) if fixed_output_len is None else fixed_output_len
-        if prompt_len < 4 or output_len < 4:
-            # Prune too short sequences.
-            continue
-        if prompt_len > 1024 or prompt_len + output_len > 2048:
-            # Prune too long sequences.
-            continue
-        filtered_dataset.append((prompt, prompt_len, output_len))
-
-    return filtered_dataset
-
-
-def sample_sonnet_requests(
-    dataset_path: str,
-    num_requests: int,
-    input_len: int,
-    output_len: int,
-    prefix_len: int,
-    tokenizer: PreTrainedTokenizerBase,
-) -> List[Tuple[str, str, int, int]]:
-    assert (
-        input_len > prefix_len
-    ), "'args.sonnet-input-len' must be greater than 'args.sonnet-prefix-len'."
-
-    # Load the dataset.
-    with open(dataset_path) as f:
-        poem_lines = f.readlines()
-
-    # Tokenize the poem lines.
-    poem_token_ids = tokenizer(poem_lines).input_ids
-    average_poem_len = sum(
-        len(token_ids) for token_ids in poem_token_ids) / len(poem_token_ids)
-
-    # Base prefix for all requests.
-    base_prompt = "Pick as many lines as you can from these poem lines:\n"
-    base_message = [{
-        "role": "user",
-        "content": base_prompt,
-    }]
-    base_prompt_formatted = tokenizer.apply_chat_template(
-        base_message, add_generation_prompt=True, tokenize=False)
-    base_prompt_offset = len(tokenizer(base_prompt_formatted).input_ids)
-
-    assert (
-        input_len > base_prompt_offset
-    ), f"Please set 'args.sonnet-input-len' higher than {base_prompt_offset}."
-    num_input_lines = round(
-        (input_len - base_prompt_offset) / average_poem_len)
-
-    # First approximately `prefix_len` number of tokens in the
-    # prompt are fixed poem lines.
-    assert (
-        prefix_len > base_prompt_offset
-    ), f"Please set 'args.sonnet-prefix-len' higher than {base_prompt_offset}."
-
-    num_prefix_lines = round(
-        (prefix_len - base_prompt_offset) / average_poem_len)
-    prefix_lines = poem_lines[:num_prefix_lines]
-
-    # Sample the rest of lines per request.
-    sampled_requests: List[Tuple[str, str, int, int]] = []
-    for _ in range(num_requests):
-        sampled_lines = "".join(
-            prefix_lines +
-            random.sample(poem_lines, num_input_lines - num_prefix_lines))
-
-        prompt = f"{base_prompt}{sampled_lines}"
-        message = [
-            {
-                "role": "user",
-                "content": prompt,
-            },
-        ]
-        prompt_formatted = tokenizer.apply_chat_template(
-            message, add_generation_prompt=True, tokenize=False)
-        prompt_len = len(tokenizer(prompt_formatted).input_ids)
-        sampled_requests.append(
-            (prompt, prompt_formatted, prompt_len, output_len))
-
-    return sampled_requests
-
-
-def sample_random_requests(
-        input_len: int, output_len: int, num_prompts: int, range_ratio: float,
-        tokenizer: PreTrainedTokenizerBase) -> List[Tuple[str, int, int]]:
-
-    input_lens = np.random.randint(
-        int(input_len * range_ratio),
-        input_len + 1,
-        size=num_prompts,
-    )
-    output_lens = np.random.randint(
-        int(output_len * range_ratio),
-        output_len + 1,
-        size=num_prompts,
-    )
-    offsets = np.random.randint(0, tokenizer.vocab_size, size=num_prompts)
-    input_requests = []
-    for i in range(num_prompts):
-        prompt = tokenizer.decode([(offsets[i] + i + j) % tokenizer.vocab_size
-                                   for j in range(input_lens[i])])
-        input_requests.append(
-            (prompt, int(input_lens[i]), int(output_lens[i])))
-
-    return input_requests
-
-
-async def get_request(
-    input_requests: List[Tuple[str, int, int]],
-    request_rate: float,
-) -> AsyncGenerator[Tuple[str, int, int], None]:
-    input_requests = iter(input_requests)
-    for request in input_requests:
-        yield request
-
-        if request_rate == float("inf"):
-            # If the request rate is infinity, then we don't need to wait.
-            continue
-
-        # Sample the request interval from the exponential distribution.
-        interval = np.random.exponential(1.0 / request_rate)
-        # The next request will be sent after the interval.
-        await asyncio.sleep(interval)
-
-
-def calculate_metrics(
-    input_requests: List[Tuple[str, int, int]],
-    outputs: List[RequestFuncOutput],
-    dur_s: float,
-    tokenizer: PreTrainedTokenizerBase,
-) -> Tuple[BenchmarkMetrics, List[int]]:
-    actual_output_lens: List[int] = []
-    total_input = 0
-    completed = 0
-    itls: List[float] = []
-    tpots: List[float] = []
-    ttfts: List[float] = []
-    for i in range(len(outputs)):
-        if outputs[i].success:
-            # We use the tokenizer to count the number of output tokens for all
-            # serving backends instead of looking at len(outputs[i].itl) since
-            # multiple output tokens may be bundled together
-            # Note: this may inflate the output token count slightly
-            output_len = len(
-                tokenizer(outputs[i].generated_text,
-                          add_special_tokens=False).input_ids)
-            actual_output_lens.append(output_len)
-            total_input += input_requests[i][1]
-            if output_len > 1:
-                tpots.append(
-                    (outputs[i].latency - outputs[i].ttft) / (output_len - 1))
-            itls += outputs[i].itl
-            ttfts.append(outputs[i].ttft)
-            completed += 1
-        else:
-            actual_output_lens.append(0)
-
-    if completed == 0:
-        warnings.warn(
-            "All requests failed. This is likely due to a misconfiguration "
-            "on the benchmark arguments.",
-            stacklevel=2)
-    metrics = BenchmarkMetrics(
-        completed=completed,
-        total_input=total_input,
-        total_output=sum(actual_output_lens),
-        request_throughput=completed / dur_s,
-        input_throughput=total_input / dur_s,
-        output_throughput=sum(actual_output_lens) / dur_s,
-        mean_ttft_ms=np.mean(ttfts or 0) *
-        1000,  # ttfts is empty if streaming is not supported by backend
-        median_ttft_ms=np.median(ttfts or 0) * 1000,
-        std_ttft_ms=np.std(ttfts or 0) * 1000,
-        p99_ttft_ms=np.percentile(ttfts or 0, 99) * 1000,
-        mean_tpot_ms=np.mean(tpots or 0) * 1000,
-        median_tpot_ms=np.median(tpots or 0) * 1000,
-        std_tpot_ms=np.std(tpots or 0) * 1000,
-        p99_tpot_ms=np.percentile(tpots or 0, 99) * 1000,
-        mean_itl_ms=np.mean(itls or 0) * 1000,
-        median_itl_ms=np.median(itls or 0) * 1000,
-        std_itl_ms=np.std(itls or 0) * 1000,
-        p99_itl_ms=np.percentile(itls or 0, 99) * 1000,
-    )
-
-    return metrics, actual_output_lens
-
-
-async def benchmark(
-    backend: str,
-    api_url: str,
-    base_url: str,
-    model_id: str,
-    tokenizer: PreTrainedTokenizerBase,
-    input_requests: List[Tuple[str, int, int]],
-    best_of: int,
-    use_beam_search: bool,
-    request_rate: float,
-    disable_tqdm: bool,
-    profile: bool,
-):
-    if backend in ASYNC_REQUEST_FUNCS:
-        request_func = ASYNC_REQUEST_FUNCS[backend]
-    else:
-        raise ValueError(f"Unknown backend: {backend}")
-
-    print("Starting initial single prompt test run...")
-    test_prompt, test_prompt_len, test_output_len = input_requests[0]
-    test_input = RequestFuncInput(
-        model=model_id,
-        prompt=test_prompt,
-        api_url=api_url,
-        prompt_len=test_prompt_len,
-        output_len=test_output_len,
-        best_of=best_of,
-        use_beam_search=use_beam_search,
-    )
-    test_output = await request_func(request_func_input=test_input)
-    if not test_output.success:
-        raise ValueError(
-            "Initial test run failed - Please make sure benchmark arguments "
-            f"are correctly specified. Error: {test_output.error}")
-    else:
-        print("Initial test run completed. Starting main benchmark run...")
-
-    if profile:
-        print("Starting profiler...")
-        profile_input = RequestFuncInput(
-            model=model_id,
-            prompt=test_prompt,
-            api_url=base_url + "/start_profile",
-            prompt_len=test_prompt_len,
-            output_len=test_output_len,
-            best_of=best_of,
-            use_beam_search=use_beam_search,
-        )
-        profile_output = await request_func(request_func_input=profile_input)
-        if profile_output.success:
-            print("Profiler started")
-
-    print(f"Traffic request rate: {request_rate}")
-
-    pbar = None if disable_tqdm else tqdm(total=len(input_requests))
-
-    benchmark_start_time = time.perf_counter()
-    tasks: List[asyncio.Task] = []
-    async for request in get_request(input_requests, request_rate):
-        prompt, prompt_len, output_len = request
-        request_func_input = RequestFuncInput(
-            model=model_id,
-            prompt=prompt,
-            api_url=api_url,
-            prompt_len=prompt_len,
-            output_len=output_len,
-            best_of=best_of,
-            use_beam_search=use_beam_search,
-        )
-        tasks.append(
-            asyncio.create_task(
-                request_func(request_func_input=request_func_input,
-                             pbar=pbar)))
-    outputs: List[RequestFuncOutput] = await asyncio.gather(*tasks)
-
-    if profile:
-        print("Stopping profiler...")
-        profile_input = RequestFuncInput(
-            model=model_id,
-            prompt=test_prompt,
-            api_url=base_url + "/stop_profile",
-            prompt_len=test_prompt_len,
-            output_len=test_output_len,
-            best_of=best_of,
-            use_beam_search=use_beam_search,
-        )
-        profile_output = await request_func(request_func_input=profile_input)
-        if profile_output.success:
-            print("Profiler stopped")
-
-    if pbar is not None:
-        pbar.close()
-
-    benchmark_duration = time.perf_counter() - benchmark_start_time
-
-    metrics, actual_output_lens = calculate_metrics(
-        input_requests=input_requests,
-        outputs=outputs,
-        dur_s=benchmark_duration,
-        tokenizer=tokenizer,
-    )
-
-    print("{s:{c}^{n}}".format(s=' Serving Benchmark Result ', n=50, c='='))
-    print("{:<40} {:<10}".format("Successful requests:", metrics.completed))
-    print("{:<40} {:<10.2f}".format("Benchmark duration (s):",
-                                    benchmark_duration))
-    print("{:<40} {:<10}".format("Total input tokens:", metrics.total_input))
-    print("{:<40} {:<10}".format("Total generated tokens:",
-                                 metrics.total_output))
-    print("{:<40} {:<10.2f}".format("Request throughput (req/s):",
-                                    metrics.request_throughput))
-    print("{:<40} {:<10.2f}".format("Input token throughput (tok/s):",
-                                    metrics.input_throughput))
-    print("{:<40} {:<10.2f}".format("Output token throughput (tok/s):",
-                                    metrics.output_throughput))
-    print("{s:{c}^{n}}".format(s='Time to First Token', n=50, c='-'))
-    print("{:<40} {:<10.2f}".format("Mean TTFT (ms):", metrics.mean_ttft_ms))
-    print("{:<40} {:<10.2f}".format("Median TTFT (ms):",
-                                    metrics.median_ttft_ms))
-    print("{:<40} {:<10.2f}".format("P99 TTFT (ms):", metrics.p99_ttft_ms))
-    print("{s:{c}^{n}}".format(s='Time per Output Token (excl. 1st token)',
-                               n=50,
-                               c='-'))
-    print("{:<40} {:<10.2f}".format("Mean TPOT (ms):", metrics.mean_tpot_ms))
-    print("{:<40} {:<10.2f}".format("Median TPOT (ms):",
-                                    metrics.median_tpot_ms))
-    print("{:<40} {:<10.2f}".format("P99 TPOT (ms):", metrics.p99_tpot_ms))
-    print("{s:{c}^{n}}".format(s='Inter-token Latency', n=50, c='-'))
-    print("{:<40} {:<10.2f}".format("Mean ITL (ms):", metrics.mean_itl_ms))
-    print("{:<40} {:<10.2f}".format("Median ITL (ms):", metrics.median_itl_ms))
-    print("{:<40} {:<10.2f}".format("P99 ITL (ms):", metrics.p99_itl_ms))
-    print("=" * 50)
-
-    result = {
-        "duration": benchmark_duration,
-        "completed": metrics.completed,
-        "total_input_tokens": metrics.total_input,
-        "total_output_tokens": metrics.total_output,
-        "request_throughput": metrics.request_throughput,
-        "input_throughput": metrics.input_throughput,
-        "output_throughput": metrics.output_throughput,
-        "mean_ttft_ms": metrics.mean_ttft_ms,
-        "median_ttft_ms": metrics.median_ttft_ms,
-        "std_ttft_ms": metrics.std_ttft_ms,
-        "p99_ttft_ms": metrics.p99_ttft_ms,
-        "mean_tpot_ms": metrics.mean_tpot_ms,
-        "median_tpot_ms": metrics.median_tpot_ms,
-        "std_tpot_ms": metrics.std_tpot_ms,
-        "p99_tpot_ms": metrics.p99_tpot_ms,
-        "mean_itl_ms": metrics.mean_itl_ms,
-        "median_itl_ms": metrics.median_itl_ms,
-        "std_itl_ms": metrics.std_itl_ms,
-        "p99_itl_ms": metrics.p99_itl_ms,
-        "input_lens": [output.prompt_len for output in outputs],
-        "output_lens": actual_output_lens,
-        "ttfts": [output.ttft for output in outputs],
-        "itls": [output.itl for output in outputs],
-        "generated_texts": [output.generated_text for output in outputs],
-        "errors": [output.error for output in outputs],
-    }
-    return result
-
-
-def main(args: argparse.Namespace):
-    print(args)
-    random.seed(args.seed)
-    np.random.seed(args.seed)
-
-    backend = args.backend
-    model_id = args.model
-    tokenizer_id = args.tokenizer if args.tokenizer is not None else args.model
-
-    if args.base_url is not None:
-        api_url = f"{args.base_url}{args.endpoint}"
-        base_url = f"{args.base_url}"
-    else:
-        api_url = f"http://{args.host}:{args.port}{args.endpoint}"
-        base_url = f"http://{args.host}:{args.port}"
-
-    tokenizer = get_tokenizer(tokenizer_id,
-                              trust_remote_code=args.trust_remote_code)
-
-    if args.dataset is not None:
-        warnings.warn(
-            "The '--dataset' argument will be deprecated in the next "
-            "release. Please use '--dataset-name' and "
-            "'--dataset-path' in the future runs.",
-            stacklevel=2)
-        input_requests = sample_sharegpt_requests(
-            dataset_path=args.dataset,
-            num_requests=args.num_prompts,
-            tokenizer=tokenizer,
-            fixed_output_len=args.sharegpt_output_len,
-        )
-
-    elif args.dataset_name == "sharegpt":
-        input_requests = sample_sharegpt_requests(
-            dataset_path=args.dataset_path,
-            num_requests=args.num_prompts,
-            tokenizer=tokenizer,
-            fixed_output_len=args.sharegpt_output_len,
-        )
-
-    elif args.dataset_name == "sonnet":
-        # Do not format the prompt, pass to message directly
-        if args.backend == "openai-chat":
-            input_requests = sample_sonnet_requests(
-                dataset_path=args.dataset_path,
-                num_requests=args.num_prompts,
-                input_len=args.sonnet_input_len,
-                output_len=args.sonnet_output_len,
-                prefix_len=args.sonnet_prefix_len,
-                tokenizer=tokenizer,
-            )
-            input_requests = [(prompt, prompt_len, output_len)
-                              for prompt, prompt_formatted, prompt_len,
-                              output_len in input_requests]
-        else:
-            assert (
-                tokenizer.chat_template or tokenizer.default_chat_template
-            ), "Tokenizer/model must have chat template for sonnet dataset."
-            input_requests = sample_sonnet_requests(
-                dataset_path=args.dataset_path,
-                num_requests=args.num_prompts,
-                input_len=args.sonnet_input_len,
-                output_len=args.sonnet_output_len,
-                prefix_len=args.sonnet_prefix_len,
-                tokenizer=tokenizer,
-            )
-            input_requests = [(prompt_formatted, prompt_len, output_len)
-                              for prompt, prompt_formatted, prompt_len,
-                              output_len in input_requests]
-
-    elif args.dataset_name == "random":
-        input_requests = sample_random_requests(
-            input_len=args.random_input_len,
-            output_len=args.random_output_len,
-            num_prompts=args.num_prompts,
-            range_ratio=args.random_range_ratio,
-            tokenizer=tokenizer,
-        )
-
-    else:
-        raise ValueError(f"Unknown dataset: {args.dataset_name}")
-
-    benchmark_result = asyncio.run(
-        benchmark(
-            backend=backend,
-            api_url=api_url,
-            base_url=base_url,
-            model_id=model_id,
-            tokenizer=tokenizer,
-            input_requests=input_requests,
-            best_of=args.best_of,
-            use_beam_search=args.use_beam_search,
-            request_rate=args.request_rate,
-            disable_tqdm=args.disable_tqdm,
-            profile=args.profile,
-        ))
-
-    # Save config and results to json
-    if args.save_result:
-        result_json: Dict[str, Any] = {}
-
-        # Setup
-        current_dt = datetime.now().strftime("%Y%m%d-%H%M%S")
-        result_json["date"] = current_dt
-        result_json["backend"] = backend
-        result_json["model_id"] = model_id
-        result_json["tokenizer_id"] = tokenizer_id
-        result_json["best_of"] = args.best_of
-        result_json["use_beam_search"] = args.use_beam_search
-        result_json["num_prompts"] = args.num_prompts
-
-        # Metadata
-        if args.metadata:
-            for item in args.metadata:
-                if "=" in item:
-                    # Split on the first "=" only so values may contain "=".
-                    kvstring = item.split("=", 1)
-                    result_json[kvstring[0].strip()] = kvstring[1].strip()
-                else:
-                    raise ValueError(
-                        "Invalid metadata format. Please use KEY=VALUE format."
-                    )
-
-        # Traffic
-        result_json["request_rate"] = (
-            args.request_rate if args.request_rate < float("inf") else "inf")
-
-        # Merge with benchmark result
-        result_json = {**result_json, **benchmark_result}
-
-        # Save to file
-        base_model_id = model_id.split("/")[-1]
-        file_name = f"{backend}-{args.request_rate}qps-{base_model_id}-{current_dt}.json"  #noqa
-        if args.result_filename:
-            file_name = args.result_filename
-        if args.result_dir:
-            file_name = os.path.join(args.result_dir, file_name)
-        with open(file_name, "w") as outfile:
-            json.dump(result_json, outfile)
-
-
-if __name__ == "__main__":
-    parser = FlexibleArgumentParser(
-        description="Benchmark the online serving throughput.")
-    parser.add_argument(
-        "--backend",
-        type=str,
-        default="vllm",
-        choices=list(ASYNC_REQUEST_FUNCS.keys()),
-    )
-    parser.add_argument(
-        "--base-url",
-        type=str,
-        default=None,
-        help="Server or API base url if not using http host and port.",
-    )
-    parser.add_argument("--host", type=str, default="localhost")
-    parser.add_argument("--port", type=int, default=8000)
-    parser.add_argument(
-        "--endpoint",
-        type=str,
-        default="/v1/completions",
-        help="API endpoint.",
-    )
-    parser.add_argument(
-        "--dataset",
-        type=str,
-        default=None,
-        help="Path to the ShareGPT dataset, will be deprecated in the "
-        "next release.",
-    )
-    parser.add_argument(
-        "--dataset-name",
-        type=str,
-        default="sharegpt",
-        choices=["sharegpt", "sonnet", "random"],
-        help="Name of the dataset to benchmark on.",
-    )
-    parser.add_argument("--dataset-path",
-                        type=str,
-                        default=None,
-                        help="Path to the dataset.")
-    parser.add_argument(
-        "--model",
-        type=str,
-        required=True,
-        help="Name of the model.",
-    )
-    parser.add_argument(
-        "--tokenizer",
-        type=str,
-        help=
-        "Name or path of the tokenizer, if not using the default tokenizer.",  # noqa: E501
-    )
-    parser.add_argument(
-        "--best-of",
-        type=int,
-        default=1,
-        help="Generates `best_of` sequences per prompt and "
-        "returns the best one.",
-    )
-    parser.add_argument("--use-beam-search", action="store_true")
-    parser.add_argument(
-        "--num-prompts",
-        type=int,
-        default=1000,
-        help="Number of prompts to process.",
-    )
-    parser.add_argument(
-        "--sharegpt-output-len",
-        type=int,
-        default=None,
-        help="Output length for each request. Overrides the output length "
-        "from the ShareGPT dataset.")
-    parser.add_argument(
-        "--sonnet-input-len",
-        type=int,
-        default=550,
-        help=
-        "Number of input tokens per request, used only for sonnet dataset.",
-    )
-    parser.add_argument(
-        "--sonnet-output-len",
-        type=int,
-        default=150,
-        help=
-        "Number of output tokens per request, used only for sonnet dataset.",
-    )
-    parser.add_argument(
-        "--sonnet-prefix-len",
-        type=int,
-        default=200,
-        help=
-        "Number of prefix tokens per request, used only for sonnet dataset.",
-    )
-    parser.add_argument(
-        "--random-input-len",
-        type=int,
-        default=1024,
-        help=
-        "Number of input tokens per request, used only for random sampling.",
-    )
-    parser.add_argument(
-        "--random-output-len",
-        type=int,
-        default=128,
-        help=
-        "Number of output tokens per request, used only for random sampling.",
-    )
-    parser.add_argument(
-        "--random-range-ratio",
-        type=float,
-        default=1.0,
-        help="Range of sampled ratio of input/output length, "
-        "used only for random sampling.",
-    )
-    parser.add_argument(
-        "--request-rate",
-        type=float,
-        default=float("inf"),
-        help="Number of requests per second. If this is inf, "
-        "then all the requests are sent at time 0. "
-        "Otherwise, we use a Poisson process to synthesize "
-        "the request arrival times.",
-    )
-    parser.add_argument("--seed", type=int, default=0)
-    parser.add_argument(
-        "--trust-remote-code",
-        action="store_true",
-        help="Trust remote code from huggingface",
-    )
-    parser.add_argument(
-        "--disable-tqdm",
-        action="store_true",
-        help="Specify to disable tqdm progress bar.",
-    )
-    parser.add_argument(
-        "--profile",
-        action="store_true",
-        help="Use Torch Profiler. The endpoint must be launched with "
-        "VLLM_TORCH_PROFILER_DIR to enable profiler.",
-    )
-    parser.add_argument(
-        "--save-result",
-        action="store_true",
-        help="Specify to save benchmark results to a json file",
-    )
-    parser.add_argument(
-        "--metadata",
-        metavar="KEY=VALUE",
-        nargs="*",
-        help="Key-value pairs (e.g., --metadata version=0.3.3 tp=1) "
-        "for metadata of this run to be saved in the result JSON file "
-        "for record keeping purposes.",
-    )
-    parser.add_argument(
-        "--result-dir",
-        type=str,
-        default=None,
-        help="Specify directory to save benchmark json results. "
-        "If not specified, results are saved in the current directory.",
-    )
-    parser.add_argument(
-        "--result-filename",
-        type=str,
-        default=None,
-        help="Specify the filename to save benchmark json results. "
-        "If not specified, results will be saved in "
-        "{backend}-{args.request_rate}qps-{base_model_id}-{current_dt}.json"
-        " format.",
-    )
-
-    args = parser.parse_args()
-    main(args)

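Reviewer note: the removed `benchmark_serving.py` paces requests with exponentially distributed gaps (a Poisson arrival process) and reports time per output token as `(latency - ttft) / (output_len - 1)`. The sketch below restates those two calculations using only the standard library. The helper names `poisson_arrival_times` and `time_per_output_token` are hypothetical and do not appear in the deleted file, which uses `np.random.exponential` and `asyncio.sleep` instead.

```python
import random
from typing import List, Optional


def poisson_arrival_times(num_requests: int, request_rate: float,
                          seed: int = 0) -> List[float]:
    """Return cumulative send times in seconds for a Poisson arrival process.

    Hypothetical helper for illustration only.
    """
    if request_rate == float("inf"):
        # Infinite rate: every request is sent at time 0.
        return [0.0] * num_requests
    rng = random.Random(seed)
    times: List[float] = []
    now = 0.0
    for _ in range(num_requests):
        # The first request goes out immediately; later ones wait for a gap.
        times.append(now)
        # Exponential inter-arrival gap with mean 1 / request_rate.
        now += rng.expovariate(request_rate)
    return times


def time_per_output_token(latency_s: float, ttft_s: float,
                          output_len: int) -> Optional[float]:
    """Return seconds per output token, excluding the first token.

    Returns None when there is no second token to average over.
    """
    if output_len <= 1:
        return None
    return (latency_s - ttft_s) / (output_len - 1)


if __name__ == "__main__":
    print(poisson_arrival_times(5, request_rate=2.0))
    print(time_per_output_token(latency_s=2.0, ttft_s=0.5, output_len=16))
```

With a finite `request_rate`, the mean gap between consecutive send times approaches `1 / request_rate` as the number of requests grows, which is the behaviour the `--request-rate` help text describes.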
+ 0 - 518
recipes/3p_integrations/crusoe/vllm-fp8/benchmarks/sonnet.txt

@@ -1,518 +0,0 @@
-FROM fairest creatures we desire increase,
-That thereby beauty's rose might never die,
-But as the riper should by time decease,
-His tender heir might bear his memory:
-But thou, contracted to thine own bright eyes,
-Feed'st thy light'st flame with self-substantial fuel,
-Making a famine where abundance lies,
-Thyself thy foe, to thy sweet self too cruel.
-Thou that art now the world's fresh ornament
-And only herald to the gaudy spring,
-Within thine own bud buriest thy content
-And, tender churl, makest waste in niggarding.
-Pity the world, or else this glutton be,
-To eat the world's due, by the grave and thee.
-When forty winters shall beseige thy brow,
-And dig deep trenches in thy beauty's field,
-Thy youth's proud livery, so gazed on now,
-Will be a tatter'd weed, of small worth held:
-Then being ask'd where all thy beauty lies,
-Where all the treasure of thy lusty days,
-To say, within thine own deep-sunken eyes,
-Were an all-eating shame and thriftless praise.
-How much more praise deserved thy beauty's use,
-If thou couldst answer 'This fair child of mine
-Shall sum my count and make my old excuse,'
-Proving his beauty by succession thine!
-This were to be new made when thou art old,
-And see thy blood warm when thou feel'st it cold.
-Look in thy glass, and tell the face thou viewest
-Now is the time that face should form another;
-Whose fresh repair if now thou not renewest,
-Thou dost beguile the world, unbless some mother.
-For where is she so fair whose unear'd womb
-Disdains the tillage of thy husbandry?
-Or who is he so fond will be the tomb
-Of his self-love, to stop posterity?
-Thou art thy mother's glass, and she in thee
-Calls back the lovely April of her prime:
-So thou through windows of thine age shall see
-Despite of wrinkles this thy golden time.
-But if thou live, remember'd not to be,
-Die single, and thine image dies with thee.
-Unthrifty loveliness, why dost thou spend
-Upon thyself thy beauty's legacy?
-Nature's bequest gives nothing but doth lend,
-And being frank she lends to those are free.
-Then, beauteous niggard, why dost thou abuse
-The bounteous largess given thee to give?
-Profitless usurer, why dost thou use
-So great a sum of sums, yet canst not live?
-For having traffic with thyself alone,
-Thou of thyself thy sweet self dost deceive.
-Then how, when nature calls thee to be gone,
-What acceptable audit canst thou leave?
-Thy unused beauty must be tomb'd with thee,
-Which, used, lives th' executor to be.
-Those hours, that with gentle work did frame
-The lovely gaze where every eye doth dwell,
-Will play the tyrants to the very same
-And that unfair which fairly doth excel:
-For never-resting time leads summer on
-To hideous winter and confounds him there;
-Sap cheque'd with frost and lusty leaves quite gone,
-Beauty o'ersnow'd and bareness every where:
-Then, were not summer's distillation left,
-A liquid prisoner pent in walls of glass,
-Beauty's effect with beauty were bereft,
-Nor it nor no remembrance what it was:
-But flowers distill'd though they with winter meet,
-Leese but their show; their substance still lives sweet.
-Then let not winter's ragged hand deface
-In thee thy summer, ere thou be distill'd:
-Make sweet some vial; treasure thou some place
-With beauty's treasure, ere it be self-kill'd.
-That use is not forbidden usury,
-Which happies those that pay the willing loan;
-That's for thyself to breed another thee,
-Or ten times happier, be it ten for one;
-Ten times thyself were happier than thou art,
-If ten of thine ten times refigured thee:
-Then what could death do, if thou shouldst depart,
-Leaving thee living in posterity?
-Be not self-will'd, for thou art much too fair
-To be death's conquest and make worms thine heir.
-Lo! in the orient when the gracious light
-Lifts up his burning head, each under eye
-Doth homage to his new-appearing sight,
-Serving with looks his sacred majesty;
-And having climb'd the steep-up heavenly hill,
-Resembling strong youth in his middle age,
-yet mortal looks adore his beauty still,
-Attending on his golden pilgrimage;
-But when from highmost pitch, with weary car,
-Like feeble age, he reeleth from the day,
-The eyes, 'fore duteous, now converted are
-From his low tract and look another way:
-So thou, thyself out-going in thy noon,
-Unlook'd on diest, unless thou get a son.
-Music to hear, why hear'st thou music sadly?
-Sweets with sweets war not, joy delights in joy.
-Why lovest thou that which thou receivest not gladly,
-Or else receivest with pleasure thine annoy?
-If the true concord of well-tuned sounds,
-By unions married, do offend thine ear,
-They do but sweetly chide thee, who confounds
-In singleness the parts that thou shouldst bear.
-Mark how one string, sweet husband to another,
-Strikes each in each by mutual ordering,
-Resembling sire and child and happy mother
-Who all in one, one pleasing note do sing:
-Whose speechless song, being many, seeming one,
-Sings this to thee: 'thou single wilt prove none.'
-Is it for fear to wet a widow's eye
-That thou consumest thyself in single life?
-Ah! if thou issueless shalt hap to die.
-The world will wail thee, like a makeless wife;
-The world will be thy widow and still weep
-That thou no form of thee hast left behind,
-When every private widow well may keep
-By children's eyes her husband's shape in mind.
-Look, what an unthrift in the world doth spend
-Shifts but his place, for still the world enjoys it;
-But beauty's waste hath in the world an end,
-And kept unused, the user so destroys it.
-No love toward others in that bosom sits
-That on himself such murderous shame commits.
-For shame! deny that thou bear'st love to any,
-Who for thyself art so unprovident.
-Grant, if thou wilt, thou art beloved of many,
-But that thou none lovest is most evident;
-For thou art so possess'd with murderous hate
-That 'gainst thyself thou stick'st not to conspire.
-Seeking that beauteous roof to ruinate
-Which to repair should be thy chief desire.
-O, change thy thought, that I may change my mind!
-Shall hate be fairer lodged than gentle love?
-Be, as thy presence is, gracious and kind,
-Or to thyself at least kind-hearted prove:
-Make thee another self, for love of me,
-That beauty still may live in thine or thee.
-As fast as thou shalt wane, so fast thou growest
-In one of thine, from that which thou departest;
-And that fresh blood which youngly thou bestowest
-Thou mayst call thine when thou from youth convertest.
-Herein lives wisdom, beauty and increase:
-Without this, folly, age and cold decay:
-If all were minded so, the times should cease
-And threescore year would make the world away.
-Let those whom Nature hath not made for store,
-Harsh featureless and rude, barrenly perish:
-Look, whom she best endow'd she gave the more;
-Which bounteous gift thou shouldst in bounty cherish:
-She carved thee for her seal, and meant thereby
-Thou shouldst print more, not let that copy die.
-When I do count the clock that tells the time,
-And see the brave day sunk in hideous night;
-When I behold the violet past prime,
-And sable curls all silver'd o'er with white;
-When lofty trees I see barren of leaves
-Which erst from heat did canopy the herd,
-And summer's green all girded up in sheaves
-Borne on the bier with white and bristly beard,
-Then of thy beauty do I question make,
-That thou among the wastes of time must go,
-Since sweets and beauties do themselves forsake
-And die as fast as they see others grow;
-And nothing 'gainst Time's scythe can make defence
-Save breed, to brave him when he takes thee hence.
-O, that you were yourself! but, love, you are
-No longer yours than you yourself here live:
-Against this coming end you should prepare,
-And your sweet semblance to some other give.
-So should that beauty which you hold in lease
-Find no determination: then you were
-Yourself again after yourself's decease,
-When your sweet issue your sweet form should bear.
-Who lets so fair a house fall to decay,
-Which husbandry in honour might uphold
-Against the stormy gusts of winter's day
-And barren rage of death's eternal cold?
-O, none but unthrifts! Dear my love, you know
-You had a father: let your son say so.
-Not from the stars do I my judgment pluck;
-And yet methinks I have astronomy,
-But not to tell of good or evil luck,
-Of plagues, of dearths, or seasons' quality;
-Nor can I fortune to brief minutes tell,
-Pointing to each his thunder, rain and wind,
-Or say with princes if it shall go well,
-By oft predict that I in heaven find:
-But from thine eyes my knowledge I derive,
-And, constant stars, in them I read such art
-As truth and beauty shall together thrive,
-If from thyself to store thou wouldst convert;
-Or else of thee this I prognosticate:
-Thy end is truth's and beauty's doom and date.
-When I consider every thing that grows
-Holds in perfection but a little moment,
-That this huge stage presenteth nought but shows
-Whereon the stars in secret influence comment;
-When I perceive that men as plants increase,
-Cheered and cheque'd even by the self-same sky,
-Vaunt in their youthful sap, at height decrease,
-And wear their brave state out of memory;
-Then the conceit of this inconstant stay
-Sets you most rich in youth before my sight,
-Where wasteful Time debateth with Decay,
-To change your day of youth to sullied night;
-And all in war with Time for love of you,
-As he takes from you, I engraft you new.
-But wherefore do not you a mightier way
-Make war upon this bloody tyrant, Time?
-And fortify yourself in your decay
-With means more blessed than my barren rhyme?
-Now stand you on the top of happy hours,
-And many maiden gardens yet unset
-With virtuous wish would bear your living flowers,
-Much liker than your painted counterfeit:
-So should the lines of life that life repair,
-Which this, Time's pencil, or my pupil pen,
-Neither in inward worth nor outward fair,
-Can make you live yourself in eyes of men.
-To give away yourself keeps yourself still,
-And you must live, drawn by your own sweet skill.
-Who will believe my verse in time to come,
-If it were fill'd with your most high deserts?
-Though yet, heaven knows, it is but as a tomb
-Which hides your life and shows not half your parts.
-If I could write the beauty of your eyes
-And in fresh numbers number all your graces,
-The age to come would say 'This poet lies:
-Such heavenly touches ne'er touch'd earthly faces.'
-So should my papers yellow'd with their age
-Be scorn'd like old men of less truth than tongue,
-And your true rights be term'd a poet's rage
-And stretched metre of an antique song:
-But were some child of yours alive that time,
-You should live twice; in it and in my rhyme.
-Shall I compare thee to a summer's day?
-Thou art more lovely and more temperate:
-Rough winds do shake the darling buds of May,
-And summer's lease hath all too short a date:
-Sometime too hot the eye of heaven shines,
-And often is his gold complexion dimm'd;
-And every fair from fair sometime declines,
-By chance or nature's changing course untrimm'd;
-But thy eternal summer shall not fade
-Nor lose possession of that fair thou owest;
-Nor shall Death brag thou wander'st in his shade,
-When in eternal lines to time thou growest:
-So long as men can breathe or eyes can see,
-So long lives this and this gives life to thee.
-Devouring Time, blunt thou the lion's paws,
-And make the earth devour her own sweet brood;
-Pluck the keen teeth from the fierce tiger's jaws,
-And burn the long-lived phoenix in her blood;
-Make glad and sorry seasons as thou fleets,
-And do whate'er thou wilt, swift-footed Time,
-To the wide world and all her fading sweets;
-But I forbid thee one most heinous crime:
-O, carve not with thy hours my love's fair brow,
-Nor draw no lines there with thine antique pen;
-Him in thy course untainted do allow
-For beauty's pattern to succeeding men.
-Yet, do thy worst, old Time: despite thy wrong,
-My love shall in my verse ever live young.
-A woman's face with Nature's own hand painted
-Hast thou, the master-mistress of my passion;
-A woman's gentle heart, but not acquainted
-With shifting change, as is false women's fashion;
-An eye more bright than theirs, less false in rolling,
-Gilding the object whereupon it gazeth;
-A man in hue, all 'hues' in his controlling,
-Much steals men's eyes and women's souls amazeth.
-And for a woman wert thou first created;
-Till Nature, as she wrought thee, fell a-doting,
-And by addition me of thee defeated,
-By adding one thing to my purpose nothing.
-But since she prick'd thee out for women's pleasure,
-Mine be thy love and thy love's use their treasure.
-So is it not with me as with that Muse
-Stirr'd by a painted beauty to his verse,
-Who heaven itself for ornament doth use
-And every fair with his fair doth rehearse
-Making a couplement of proud compare,
-With sun and moon, with earth and sea's rich gems,
-With April's first-born flowers, and all things rare
-That heaven's air in this huge rondure hems.
-O' let me, true in love, but truly write,
-And then believe me, my love is as fair
-As any mother's child, though not so bright
-As those gold candles fix'd in heaven's air:
-Let them say more than like of hearsay well;
-I will not praise that purpose not to sell.
-My glass shall not persuade me I am old,
-So long as youth and thou are of one date;
-But when in thee time's furrows I behold,
-Then look I death my days should expiate.
-For all that beauty that doth cover thee
-Is but the seemly raiment of my heart,
-Which in thy breast doth live, as thine in me:
-How can I then be elder than thou art?
-O, therefore, love, be of thyself so wary
-As I, not for myself, but for thee will;
-Bearing thy heart, which I will keep so chary
-As tender nurse her babe from faring ill.
-Presume not on thy heart when mine is slain;
-Thou gavest me thine, not to give back again.
-As an unperfect actor on the stage
-Who with his fear is put besides his part,
-Or some fierce thing replete with too much rage,
-Whose strength's abundance weakens his own heart.
-So I, for fear of trust, forget to say
-The perfect ceremony of love's rite,
-And in mine own love's strength seem to decay,
-O'ercharged with burden of mine own love's might.
-O, let my books be then the eloquence
-And dumb presagers of my speaking breast,
-Who plead for love and look for recompense
-More than that tongue that more hath more express'd.
-O, learn to read what silent love hath writ:
-To hear with eyes belongs to love's fine wit.
-Mine eye hath play'd the painter and hath stell'd
-Thy beauty's form in table of my heart;
-My body is the frame wherein 'tis held,
-And perspective it is the painter's art.
-For through the painter must you see his skill,
-To find where your true image pictured lies;
-Which in my bosom's shop is hanging still,
-That hath his windows glazed with thine eyes.
-Now see what good turns eyes for eyes have done:
-Mine eyes have drawn thy shape, and thine for me
-Are windows to my breast, where-through the sun
-Delights to peep, to gaze therein on thee;
-Yet eyes this cunning want to grace their art;
-They draw but what they see, know not the heart.
-Let those who are in favour with their stars
-Of public honour and proud titles boast,
-Whilst I, whom fortune of such triumph bars,
-Unlook'd for joy in that I honour most.
-Great princes' favourites their fair leaves spread
-But as the marigold at the sun's eye,
-And in themselves their pride lies buried,
-For at a frown they in their glory die.
-The painful warrior famoused for fight,
-After a thousand victories once foil'd,
-Is from the book of honour razed quite,
-And all the rest forgot for which he toil'd:
-Then happy I, that love and am beloved
-Where I may not remove nor be removed.
-Lord of my love, to whom in vassalage
-Thy merit hath my duty strongly knit,
-To thee I send this written embassage,
-To witness duty, not to show my wit:
-Duty so great, which wit so poor as mine
-May make seem bare, in wanting words to show it,
-But that I hope some good conceit of thine
-In thy soul's thought, all naked, will bestow it;
-Till whatsoever star that guides my moving
-Points on me graciously with fair aspect
-And puts apparel on my tatter'd loving,
-To show me worthy of thy sweet respect:
-Then may I dare to boast how I do love thee;
-Till then not show my head where thou mayst prove me.
-Weary with toil, I haste me to my bed,
-The dear repose for limbs with travel tired;
-But then begins a journey in my head,
-To work my mind, when body's work's expired:
-For then my thoughts, from far where I abide,
-Intend a zealous pilgrimage to thee,
-And keep my drooping eyelids open wide,
-Looking on darkness which the blind do see
-Save that my soul's imaginary sight
-Presents thy shadow to my sightless view,
-Which, like a jewel hung in ghastly night,
-Makes black night beauteous and her old face new.
-Lo! thus, by day my limbs, by night my mind,
-For thee and for myself no quiet find.
-How can I then return in happy plight,
-That am debarr'd the benefit of rest?
-When day's oppression is not eased by night,
-But day by night, and night by day, oppress'd?
-And each, though enemies to either's reign,
-Do in consent shake hands to torture me;
-The one by toil, the other to complain
-How far I toil, still farther off from thee.
-I tell the day, to please them thou art bright
-And dost him grace when clouds do blot the heaven:
-So flatter I the swart-complexion'd night,
-When sparkling stars twire not thou gild'st the even.
-But day doth daily draw my sorrows longer
-And night doth nightly make grief's strength seem stronger.
-When, in disgrace with fortune and men's eyes,
-I all alone beweep my outcast state
-And trouble deal heaven with my bootless cries
-And look upon myself and curse my fate,
-Wishing me like to one more rich in hope,
-Featured like him, like him with friends possess'd,
-Desiring this man's art and that man's scope,
-With what I most enjoy contented least;
-Yet in these thoughts myself almost despising,
-Haply I think on thee, and then my state,
-Like to the lark at break of day arising
-From sullen earth, sings hymns at heaven's gate;
-For thy sweet love remember'd such wealth brings
-That then I scorn to change my state with kings.
-When to the sessions of sweet silent thought
-I summon up remembrance of things past,
-I sigh the lack of many a thing I sought,
-And with old woes new wail my dear time's waste:
-Then can I drown an eye, unused to flow,
-For precious friends hid in death's dateless night,
-And weep afresh love's long since cancell'd woe,
-And moan the expense of many a vanish'd sight:
-Then can I grieve at grievances foregone,
-And heavily from woe to woe tell o'er
-The sad account of fore-bemoaned moan,
-Which I new pay as if not paid before.
-But if the while I think on thee, dear friend,
-All losses are restored and sorrows end.
-Thy bosom is endeared with all hearts,
-Which I by lacking have supposed dead,
-And there reigns love and all love's loving parts,
-And all those friends which I thought buried.
-How many a holy and obsequious tear
-Hath dear religious love stol'n from mine eye
-As interest of the dead, which now appear
-But things removed that hidden in thee lie!
-Thou art the grave where buried love doth live,
-Hung with the trophies of my lovers gone,
-Who all their parts of me to thee did give;
-That due of many now is thine alone:
-Their images I loved I view in thee,
-And thou, all they, hast all the all of me.
-If thou survive my well-contented day,
-When that churl Death my bones with dust shall cover,
-And shalt by fortune once more re-survey
-These poor rude lines of thy deceased lover,
-Compare them with the bettering of the time,
-And though they be outstripp'd by every pen,
-Reserve them for my love, not for their rhyme,
-Exceeded by the height of happier men.
-O, then vouchsafe me but this loving thought:
-'Had my friend's Muse grown with this growing age,
-A dearer birth than this his love had brought,
-To march in ranks of better equipage:
-But since he died and poets better prove,
-Theirs for their style I'll read, his for his love.'
-Full many a glorious morning have I seen
-Flatter the mountain-tops with sovereign eye,
-Kissing with golden face the meadows green,
-Gilding pale streams with heavenly alchemy;
-Anon permit the basest clouds to ride
-With ugly rack on his celestial face,
-And from the forlorn world his visage hide,
-Stealing unseen to west with this disgrace:
-Even so my sun one early morn did shine
-With all triumphant splendor on my brow;
-But out, alack! he was but one hour mine;
-The region cloud hath mask'd him from me now.
-Yet him for this my love no whit disdaineth;
-Suns of the world may stain when heaven's sun staineth.
-Why didst thou promise such a beauteous day,
-And make me travel forth without my cloak,
-To let base clouds o'ertake me in my way,
-Hiding thy bravery in their rotten smoke?
-'Tis not enough that through the cloud thou break,
-To dry the rain on my storm-beaten face,
-For no man well of such a salve can speak
-That heals the wound and cures not the disgrace:
-Nor can thy shame give physic to my grief;
-Though thou repent, yet I have still the loss:
-The offender's sorrow lends but weak relief
-To him that bears the strong offence's cross.
-Ah! but those tears are pearl which thy love sheds,
-And they are rich and ransom all ill deeds.
-No more be grieved at that which thou hast done:
-Roses have thorns, and silver fountains mud;
-Clouds and eclipses stain both moon and sun,
-And loathsome canker lives in sweetest bud.
-All men make faults, and even I in this,
-Authorizing thy trespass with compare,
-Myself corrupting, salving thy amiss,
-Excusing thy sins more than thy sins are;
-For to thy sensual fault I bring in sense--
-Thy adverse party is thy advocate--
-And 'gainst myself a lawful plea commence:
-Such civil war is in my love and hate
-That I an accessary needs must be
-To that sweet thief which sourly robs from me.
-Let me confess that we two must be twain,
-Although our undivided loves are one:
-So shall those blots that do with me remain
-Without thy help by me be borne alone.
-In our two loves there is but one respect,
-Though in our lives a separable spite,
-Which though it alter not love's sole effect,
-Yet doth it steal sweet hours from love's delight.
-I may not evermore acknowledge thee,
-Lest my bewailed guilt should do thee shame,
-Nor thou with public kindness honour me,
-Unless thou take that honour from thy name:
-But do not so; I love thee in such sort
-As, thou being mine, mine is thy good report.
-As a decrepit father takes delight
-To see his active child do deeds of youth,
-So I, made lame by fortune's dearest spite,
-Take all my comfort of thy worth and truth.
-For whether beauty, birth, or wealth, or wit,
-Or any of these all, or all, or more,
-Entitled in thy parts do crowned sit,
-I make my love engrafted to this store:
-So then I am not lame, poor, nor despised,
-Whilst that this shadow doth such substance give
-That I in thy abundance am sufficed
-And by a part of all thy glory live.
-Look, what is best, that best I wish in thee:
-This wish I have; then ten times happy me!

+ 0 - 59
recipes/3p_integrations/crusoe/vllm-fp8/convert_hf_to_fp8.py

@@ -1,59 +0,0 @@
-import torch
-import argparse
-from transformers import AutoTokenizer
-from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
-from llmcompressor.transformers.compression.helpers import (  # noqa
-    calculate_offload_device_map,
-    custom_offload_device_map,
-)
-
-def main():
-    parser = argparse.ArgumentParser(description="Compress a language model.")
-    parser.add_argument("model_stub", type=str, help="The model stub (e.g., 'bosonai/Higgs-Llama-3-70B')")
-    args = parser.parse_args()
-
-    recipe = """
-    quant_stage:
-        quant_modifiers:
-            QuantizationModifier:
-                ignore: ["lm_head"]
-                config_groups:
-                    group_0:
-                        weights:
-                            num_bits: 8
-                            type: float
-                            strategy: channel
-                            dynamic: false
-                            symmetric: true
-                        input_activations:
-                            num_bits: 8
-                            type: float
-                            strategy: token
-                            dynamic: true
-                            symmetric: true
-                        targets: ["Linear"]
-    """
-
-    model_stub = args.model_stub
-    model_name = model_stub.split("/")[-1]
-
-    device_map = calculate_offload_device_map(
-        model_stub, reserve_for_hessians=False, num_gpus=1, torch_dtype=torch.float16
-    )
-
-    model = SparseAutoModelForCausalLM.from_pretrained(
-        model_stub, torch_dtype=torch.float16, device_map=device_map
-    )
-
-    output_dir = f"./{model_name}-FP8-dynamic"
-
-    oneshot(
-        model=model,
-        recipe=recipe,
-        output_dir=output_dir,
-        save_compressed=True,
-        tokenizer=AutoTokenizer.from_pretrained(model_stub),
-    )
-
-if __name__ == "__main__":
-    main()

+ 0 - 41
recipes/3p_integrations/crusoe/vllm-fp8/main.tf

@@ -1,41 +0,0 @@
-terraform {
-  required_providers {
-    crusoe = {
-      source = "registry.terraform.io/crusoecloud/crusoe"
-    }
-  }
-}
-
-locals {
-  my_ssh_key = file("~/.ssh/id_ed25519.pub")
-}
-
-// new VM
-resource "crusoe_compute_instance" "vllm_vm" {
-  name     = "vllm-example"
-  type     = "l40s-48gb.8x"
-  location = "us-southcentral1-a"
-
-  # specify the base image
-  image = "ubuntu22.04-nvidia-slurm:12.4"
-
-  disks = [
-    {
-      id              = crusoe_storage_disk.vllm_data_disk.id
-      mode            = "read-write"
-      attachment_type = "data"
-    }
-  ]
-
-  ssh_key = local.my_ssh_key
-}
-
-resource "crusoe_storage_disk" "vllm_data_disk" {
-  name     = "vllm-example-disk"
-  size     = "256GiB"
-  location = "us-southcentral1-a"
-}
-
-output "instance_public_ip" {
-  value = crusoe_compute_instance.vllm_vm.network_interfaces[0].public_ipv4.address
-}

+ 0 - 72
recipes/3p_integrations/crusoe/vllm-fp8/plot.py

@@ -1,72 +0,0 @@
-import json
-import os
-import re
-import matplotlib.pyplot as plt
-import numpy as np
-from collections import defaultdict
-
-def extract_info_from_filename(filename):
-    pattern = r'(?P<backend>[^-]+)-(?P<qps>\d+\.\d+)qps-(?P<model>.+)-(?P<date>\d{8}-\d{6})\.json'
-    match = re.match(pattern, filename)
-    if match:
-        return {
-            'qps': float(match.group('qps')),
-            'model': match.group('model')
-        }
-    return None
-
-def read_json_files(directory):
-    data_tpot = defaultdict(list)
-    data_ttft = defaultdict(list)
-    for filename in os.listdir(directory):
-        if filename.endswith('.json'):
-            filepath = os.path.join(directory, filename)
-            file_info = extract_info_from_filename(filename)
-            if file_info:
-                with open(filepath, 'r') as file:
-                    json_data = json.load(file)
-                    median_tpot = json_data.get('median_tpot_ms')
-                    std_tpot = json_data.get('std_tpot_ms')
-                    median_ttft = json_data.get('median_ttft_ms')
-                    std_ttft = json_data.get('std_ttft_ms')
-                    if all(v is not None for v in [median_tpot, std_tpot, median_ttft, std_ttft]):
-                        data_tpot[file_info['model']].append((file_info['qps'], median_tpot, std_tpot))
-                        data_ttft[file_info['model']].append((file_info['qps'], median_ttft, std_ttft))
-    return {
-        'tpot': {model: sorted(points) for model, points in data_tpot.items()},
-        'ttft': {model: sorted(points) for model, points in data_ttft.items()}
-    }
-
-def create_chart(data, metric, filename):
-    plt.figure(figsize=(12, 6))
-    
-    colors = plt.cm.rainbow(np.linspace(0, 1, len(data)))
-    for (model, points), color in zip(data.items(), colors):
-        qps_values, median_values, std_values = zip(*points)
-        plt.errorbar(qps_values, median_values, yerr=std_values, fmt='o-', capsize=5, capthick=2, label=model, color=color)
-        plt.fill_between(qps_values, 
-                         np.array(median_values) - np.array(std_values),
-                         np.array(median_values) + np.array(std_values),
-                         alpha=0.2, color=color)
-
-    plt.xlabel('QPS (Queries Per Second)')
-    plt.ylabel(f'Median {metric.upper()} (ms)')
-    plt.title(f'Median {metric.upper()} vs QPS with Standard Deviation')
-    plt.grid(True)
-    plt.legend(title='Model', bbox_to_anchor=(1.05, 1), loc='upper left')
-    plt.tight_layout()
-    plt.savefig(filename, dpi=300, bbox_inches='tight')
-    plt.close()
-
-def main():
-    directory = './'
-    data = read_json_files(directory)
-    if data['tpot'] and data['ttft']:
-        create_chart(data['tpot'], 'tpot', 'tpot_vs_qps_chart.png')
-        create_chart(data['ttft'], 'ttft', 'ttft_vs_qps_chart.png')
-        print("Charts have been saved as 'tpot_vs_qps_chart.png' and 'ttft_vs_qps_chart.png'")
-    else:
-        print("No valid data found in the specified directory.")
-
-if __name__ == "__main__":
-    main()

+ 0 - 12
recipes/3p_integrations/crusoe/vllm-fp8/pyproject.toml

@@ -1,12 +0,0 @@
-[project]
-name = "vllm-l40s"
-version = "0.1.0"
-description = "Add your description here"
-readme = "README.md"
-requires-python = ">=3.10"
-dependencies = [
-    "setuptools>=74.0.0",
-    "vllm>=0.5.5",
-    "matplotlib>=3.9.2",
-    "llmcompressor>=0.1.0",
-]

+ 0 - 12
recipes/3p_integrations/crusoe/vllm-fp8/run_benchmark.sh

@@ -1,12 +0,0 @@
-TOTAL_SECONDS=120
-QPS_RATES=("1" "3" "5" "7" "9")
-
-for QPS in ${QPS_RATES[@]}; do
-    NUM_PROMPTS=$((TOTAL_SECONDS * QPS))
-    echo "===== RUNNING NUM_PROMPTS = $NUM_PROMPTS QPS = $QPS ====="
-
-    uv run benchmarks/benchmark_serving.py \
-        --model $MODEL \
-        --dataset-name sonnet --sonnet-input-len 550 --sonnet-output-len 150 --dataset-path benchmarks/sonnet.txt \
-        --num-prompts $NUM_PROMPTS --request-rate $QPS --save-result
-done

File changes will not be displayed because they are too large
+ 0 - 1038
recipes/3p_integrations/groq/groq-api-cookbook/function-calling-101-ecommerce/Function-Calling-101-Ecommerce.ipynb


+ 0 - 41
recipes/3p_integrations/groq/groq-api-cookbook/function-calling-101-ecommerce/customers.csv

@@ -1,41 +0,0 @@
-customer_id,name,email,address
-1,Erin Boyle MD,erin.boyle.md@example.com,"165 Brown Springs
-Michaelport, IL 60228"
-2,Matthew Saunders,matthew.saunders@example.com,"219 Steven Mountains
-Port Gabriellafort, OH 52281"
-3,Amanda Anderson,amanda.anderson@example.com,"498 Laurie Glens
-Mitchelltown, CT 93655"
-4,Julian Butler,julian.butler@example.com,"909 Rodriguez Harbors Suite 119
-New Tracyburgh, MS 15487"
-5,Zachary Mitchell MD,zachary.mitchell.md@example.com,"9087 Matthew Drives
-Caitlynshire, OR 42442"
-6,Troy Bennett,troy.bennett@example.com,"73329 Kimberly Loaf Apt. 029
-Shellyborough, TX 55939"
-7,Allison Hall,allison.hall@example.com,"210 Shannon Camp
-New Michael, MO 65990"
-8,Carolyn Davis,carolyn.davis@example.com,"64228 Carol Courts Suite 087
-New Micheleshire, MT 42516"
-9,Cindy Munoz,cindy.munoz@example.com,"1722 Christine Plaza
-Danielport, UT 12261"
-10,Tom Testuser,tom.testuser@example.com,"451 Victoria Bridge Suite 529
-Pageton, WI 27404"
-11,Charles Walker,charles.walker@example.com,"2077 Lamb Drive
-Salazarton, IN 54619"
-12,Brianna Molina,brianna.molina@example.com,"586 Khan Mills Suite 202
-Lake Dominique, VA 98527"
-13,Austin Andrade,austin.andrade@example.com,"4857 Donna Cliffs
-Floydstad, PR 82540"
-14,Brandon Andrade,brandon.andrade@example.com,"906 Olivia Motorway
-Kelleyfort, AK 48960"
-15,Diane Lam,diane.lam@example.com,"070 Eric Rapid Suite 159
-Townsendbury, MI 57664"
-16,Jason Kelly,jason.kelly@example.com,"873 Angela Track Apt. 972
-Stephenville, NV 32705"
-17,Mr. Mitchell Saunders,mr..mitchell.saunders@example.com,"USS White
-FPO AE 91058"
-18,Regina Ross,regina.ross@example.com,"91857 Wendy Place
-East Charlesshire, CA 43705"
-19,Mrs. Denise May DDS,mrs..denise.may.dds@example.com,"64590 Kathleen Cove Apt. 736
-Derrickton, AK 05935"
-20,Lisa Boyle,lisa.boyle@example.com,"USNS Russell
-FPO AE 51528"

+ 0 - 21
recipes/3p_integrations/groq/groq-api-cookbook/function-calling-101-ecommerce/orders.csv

@@ -1,21 +0,0 @@
-order_id,product_id,customer_id,order_date
-1,13,18,2024-02-15 15:15
-2,19,6,2024-01-03 17:43
-3,12,20,2024-03-11 1:13
-4,7,20,2024-02-04 12:04
-5,14,3,2024-05-02 17:12
-6,17,6,2024-02-12 1:46
-7,20,4,2024-02-26 2:59
-8,4,7,2024-05-02 16:51
-9,11,2,2024-01-04 11:09
-10,6,9,2024-04-09 15:04
-11,3,7,2024-02-21 21:17
-12,6,18,2024-02-21 18:50
-13,17,11,2024-05-02 16:20
-14,11,15,2024-04-20 2:49
-15,16,7,2024-01-18 1:12
-16,16,16,2024-05-03 11:20
-17,14,18,2024-03-26 22:51
-18,20,16,2024-05-07 23:25
-19,1,12,2024-05-20 12:41
-20,20,3,2024-01-17 7:25

+ 0 - 21
recipes/3p_integrations/groq/groq-api-cookbook/function-calling-101-ecommerce/products.csv

@@ -1,21 +0,0 @@
-product_id,name,description,price,stock_quantity
-1,Laptop,High performance laptop with 16GB RAM and 512GB SSD.,753.03,15
-2,Smartphone,Latest model smartphone with a stunning display and great camera.,398.54,59
-3,Headphones,Noise-cancelling over-ear headphones with long battery life.,889.79,97
-4,Monitor,24-inch 1080p monitor with vibrant colors and wide viewing angles.,604.44,98
-5,Keyboard,Mechanical keyboard with customizable RGB lighting.,500.24,52
-6,Mouse,Wireless mouse with ergonomic design and long battery life.,321.98,57
-7,Printer,All-in-one printer with wireless connectivity and high-quality printing.,695.29,32
-8,Tablet,Portable tablet with 10-inch display and powerful processor.,625.75,28
-9,Smartwatch,Stylish smartwatch with fitness tracking and notifications.,952.72,42
-10,Camera,Digital camera with 20MP sensor and 4K video recording.,247.93,99
-11,Speaker,Bluetooth speaker with excellent sound quality and deep bass.,896.4,32
-12,Router,Wi-Fi router with high speed and wide coverage.,976.16,59
-13,External Hard Drive,1TB external hard drive with fast data transfer speeds.,434.46,18
-14,USB Flash Drive,64GB USB flash drive with compact design and reliable storage.,991.09,77
-15,Microphone,Professional microphone with clear sound and adjustable settings.,276.23,30
-16,Webcam,HD webcam with wide-angle lens and built-in microphone.,890.39,13
-17,Drone,Compact drone with HD camera and stable flight controls.,285.93,37
-18,Projector,Portable projector with bright display and multiple connectivity options.,290.22,31
-19,Fitness Tracker,Fitness tracker with heart rate monitor and sleep tracking.,953.65,4
-20,E-Reader,Lightweight e-reader with high-resolution display and long battery life.,132.15,62

+ 0 - 8
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/data/employees.csv

@@ -1,8 +0,0 @@
-employee_id,name,email
-1,Richard Hendricks,richard@piedpiper.com
-2,Erlich Bachman,erlich@aviato.com
-3,Dinesh Chugtai,dinesh@piedpiper.com
-4,Bertram Gilfoyle,gilfoyle@piedpiper.com
-5,Jared Dunn,jared@piedpiper.com
-6,Monica Hall,monica@raviga.com
-7,Gavin Belson,gavin@hooli.com

+ 0 - 6
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/data/purchases.csv

@@ -1,6 +0,0 @@
-purchase_id,purchase_date,product_name,employee_id,amount
-1,'2024-02-01',iPhone,1,750
-2,'2024-02-02',Tesla,2,70000
-3,'2024-02-03',Humane pin,3,500
-4,'2024-02-04',iPhone,4,700
-5,'2024-02-05',Tesla,5,75000

File changes will not be displayed because they are too large
+ 0 - 677
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/json-mode-function-calling-for-sql.ipynb


+ 0 - 7
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/verified-queries/employees-without-purchases.yaml

@@ -1,7 +0,0 @@
-description: Employees without a purchase since Feb 1, 2024
-sql: |
-  SELECT employees.name as employees_without_purchases
-  FROM employees.csv AS employees
-  LEFT JOIN purchases.csv AS purchases ON employees.employee_id = purchases.employee_id
-  AND purchases.purchase_date > '2024-02-01'
-  WHERE purchases.purchase_id IS NULL

+ 0 - 9
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/verified-queries/most-expensive-purchase.yaml

@@ -1,9 +0,0 @@
-description: Employee with the most expensive purchase
-sql: |
-  SELECT employees.name AS employee_name,
-        MAX(amount) AS max_purchase_amount
-  FROM purchases.csv AS purchases
-  JOIN employees.csv AS employees ON purchases.employee_id = employees.employee_id
-  GROUP BY employees.name
-  ORDER BY max_purchase_amount DESC
-  LIMIT 1

+ 0 - 11
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/verified-queries/most-recent-purchases.yaml

@@ -1,11 +0,0 @@
-description: Five most recent purchases
-sql: |
-  SELECT 
-         purchases.purchase_date,
-         purchases.product_name,
-         purchases.amount,
-         employees.name
-  FROM purchases.csv AS purchases
-  JOIN employees.csv AS employees ON purchases.employee_id = employees.employee_id
-  ORDER BY purchases.purchase_date DESC
-  LIMIT 5;

+ 0 - 6
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-function-calling-for-sql/verified-queries/number-of-teslas.yaml

@@ -1,6 +0,0 @@
-description: Number of Teslas purchased
-sql: |
-  SELECT COUNT(*) as number_of_teslas
-  FROM purchases.csv AS p
-  JOIN employees.csv AS e ON e.employee_id = p.employee_id
-  WHERE p.product_name = 'Tesla'

File changes will not be displayed because they are too large
+ 0 - 639
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/SDOH-Json-mode.ipynb


File changes will not be displayed because they are too large
+ 0 - 31
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/clinical_notes/00456321.txt


File changes will not be displayed because they are too large
+ 0 - 28
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/clinical_notes/00567289.txt


File changes will not be displayed because they are too large
+ 0 - 28
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/clinical_notes/00678934.txt


+ 0 - 32
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/clinical_notes/00785642.txt

@@ -1,32 +0,0 @@
-**Date:** March 28, 2024
-
-**Patient:** Brian Lee, 55 years old
-
-**MRN:** 00785642
-
-**Chief Complaint:** "I've been having trouble managing my blood sugar levels."
-
-**History of Present Illness:** The patient is a 55-year-old with a known diagnosis of Type 2 Diabetes Mellitus, presenting with difficulty in managing blood sugar levels over the past month. Reports fluctuating blood sugar readings despite adherence to prescribed diet and medication. The patient expresses a desire to avoid any complications associated with poor diabetes management.
-
-**Past Medical History:** Type 2 Diabetes Mellitus, controlled hypertension
-
-**Social History:**
-The patient is a self-employed graphic designer, working from a home office. They describe their work as fulfilling and report a stable income. They own a home in a well-regarded neighborhood, noting its quiet and safe environment. The patient has a supportive spouse and a close circle of friends, often participating in social gatherings and community events.
-
-The patient completed a bachelor's degree in graphic design and continues to take online courses to stay updated in their field. They have reliable transportation, a recent model car, ensuring timely access to healthcare appointments. The patient is an active member of a local walking group, which meets thrice a week for exercise and socialization.
-
-Nutritionally, the patient is mindful of their diet, focusing on low-glycemic foods, and has not faced issues with food security. They have comprehensive health insurance coverage through a private provider, with satisfactory benefits that cover their medical needs, including diabetes management.
-
-**Review of Systems:** Reports consistent adherence to diabetic diet and medication regimen. Denies any episodes of hypoglycemia or diabetic ketoacidosis.
-
-**Physical Examination:**
-- General: Well-nourished and well-kept appearance. Alert and oriented.
-- Vitals: BP 130/80, HR 72, Temp 98.6°F, Resp 14/min
-
-**Assessment/Plan:**
-- Review current diabetes management plan and consider medication adjustments.
-- Recommend continuous glucose monitoring (CGM) to better understand glucose patterns and variability.
-- Encourage continued engagement with community exercise groups and dietary mindfulness.
-- Schedule a follow-up appointment in 3 months or sooner if glucose management issues persist.
-
-**Comments:** The patient demonstrates a proactive approach to managing their diabetes, supported by a stable and healthy social environment. Continued focus on lifestyle modification and close monitoring of blood sugar levels are key to preventing complications.

File changes will not be shown because they are too large
+ 0 - 30
recipes/3p_integrations/groq/groq-api-cookbook/json-mode-social-determinants-of-health/clinical_notes/00893247.txt


File changes will not be shown because they are too large
+ 0 - 427
recipes/3p_integrations/groq/groq-api-cookbook/llama3-stock-market-function-calling/llama3-stock-market-function-calling.ipynb


+ 0 - 340
recipes/3p_integrations/groq/groq-api-cookbook/parallel-tool-use/parallel-tool-use.ipynb

@@ -1,340 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "104f2b97-f9bb-4dcc-a4c8-099710768851",
-   "metadata": {},
-   "source": [
-    "# Parallel Tool use"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f8dc57b6-2c48-4ee3-bb2c-25441274ed2f",
-   "metadata": {},
-   "source": [
-    "### Setup"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e70814b4",
-   "metadata": {},
-   "source": [
-    "Make sure you have `ipykernel` and `pip` pre-installed"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "962ae5e2",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%pip install -r requirements.txt"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "id": "e21816b3",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "'Groq API key configured: gsk_7FdrzM...'"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import os\n",
-    "import json\n",
-    "\n",
-    "from groq import Groq\n",
-    "from dotenv import load_dotenv\n",
-    "\n",
-    "load_dotenv()\n",
-    "\"Groq API key configured: \" + os.environ[\"GROQ_API_KEY\"][:10] + \"...\""
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "7f7c9c55-e925-4cc1-89f2-58237acf14a4",
-   "metadata": {},
-   "source": [
-    "We will use the ```llama3-70b-8192``` model in this demo. Note that you will need a Groq API Key to proceed and can create an account [here](https://console.groq.com/) to generate one for free. Only Llama 3 models support parallel tool use at this time (05/07/2024).\n",
-    "\n",
-    "We recommend using the 70B Llama 3 model; the 8B model has subpar consistency."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "id": "0cca781b-1950-4167-b36a-c1099d6b3b00",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "client = Groq(api_key=os.getenv(\"GROQ_API_KEY\"))\n",
-    "model = \"llama3-70b-8192\""
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2c23ec2b",
-   "metadata": {},
-   "source": [
-    "Let's define a dummy function we can invoke in our tool use loop"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "id": "f2ce18dc",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def get_weather(city: str):\n",
-    "    if city == \"Madrid\":\n",
-    "        return 35\n",
-    "    elif city == \"San Francisco\":\n",
-    "        return 18\n",
-    "    elif city == \"Paris\":\n",
-    "        return 20\n",
-    "    else:\n",
-    "        return 15"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a37e3c92",
-   "metadata": {},
-   "source": [
-    "Now we define our messages and tools and run the completion request."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "id": "6b454910-4352-40cc-b9b2-cc79edabd7c1",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "messages = [\n",
-    "    {\"role\": \"system\", \"content\": \"\"\"You are a helpful assistant.\"\"\"},\n",
-    "    {\n",
-    "        \"role\": \"user\",\n",
-    "        \"content\": \"What is the weather in Paris, Tokyo and Madrid?\",\n",
-    "    },\n",
-    "]\n",
-    "tools = [\n",
-    "    {\n",
-    "        \"type\": \"function\",\n",
-    "        \"function\": {\n",
-    "            \"name\": \"get_weather\",\n",
-    "            \"description\": \"Returns the weather in the given city in degrees Celsius\",\n",
-    "            \"parameters\": {\n",
-    "                \"type\": \"object\",\n",
-    "                \"properties\": {\n",
-    "                    \"city\": {\n",
-    "                        \"type\": \"string\",\n",
-    "                        \"description\": \"The name of the city\",\n",
-    "                    }\n",
-    "                },\n",
-    "                \"required\": [\"city\"],\n",
-    "            },\n",
-    "        },\n",
-    "    }\n",
-    "]\n",
-    "response = client.chat.completions.create(\n",
-    "    model=model, messages=messages, tools=tools, tool_choice=\"auto\", max_tokens=4096\n",
-    ")\n",
-    "\n",
-    "response_message = response.choices[0].message"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "25c2838f",
-   "metadata": {},
-   "source": [
-    "# Processing the tool calls\n",
-    "\n",
-    "Now we process the assistant message and construct the required messages to continue the conversation. \n",
-    "\n",
-    "This includes invoking each `tool_call` against our actual function."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "id": "fe623ab9",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[\n",
-      "  {\n",
-      "    \"role\": \"system\",\n",
-      "    \"content\": \"You are a helpful assistant.\"\n",
-      "  },\n",
-      "  {\n",
-      "    \"role\": \"user\",\n",
-      "    \"content\": \"What is the weather in Paris, Tokyo and Madrid?\"\n",
-      "  },\n",
-      "  {\n",
-      "    \"role\": \"assistant\",\n",
-      "    \"tool_calls\": [\n",
-      "      {\n",
-      "        \"id\": \"call_5ak8\",\n",
-      "        \"function\": {\n",
-      "          \"name\": \"get_weather\",\n",
-      "          \"arguments\": \"{\\\"city\\\":\\\"Paris\\\"}\"\n",
-      "        },\n",
-      "        \"type\": \"function\"\n",
-      "      },\n",
-      "      {\n",
-      "        \"id\": \"call_zq26\",\n",
-      "        \"function\": {\n",
-      "          \"name\": \"get_weather\",\n",
-      "          \"arguments\": \"{\\\"city\\\":\\\"Tokyo\\\"}\"\n",
-      "        },\n",
-      "        \"type\": \"function\"\n",
-      "      },\n",
-      "      {\n",
-      "        \"id\": \"call_znf3\",\n",
-      "        \"function\": {\n",
-      "          \"name\": \"get_weather\",\n",
-      "          \"arguments\": \"{\\\"city\\\":\\\"Madrid\\\"}\"\n",
-      "        },\n",
-      "        \"type\": \"function\"\n",
-      "      }\n",
-      "    ]\n",
-      "  },\n",
-      "  {\n",
-      "    \"role\": \"tool\",\n",
-      "    \"content\": \"20\",\n",
-      "    \"tool_call_id\": \"call_5ak8\"\n",
-      "  },\n",
-      "  {\n",
-      "    \"role\": \"tool\",\n",
-      "    \"content\": \"15\",\n",
-      "    \"tool_call_id\": \"call_zq26\"\n",
-      "  },\n",
-      "  {\n",
-      "    \"role\": \"tool\",\n",
-      "    \"content\": \"35\",\n",
-      "    \"tool_call_id\": \"call_znf3\"\n",
-      "  }\n",
-      "]\n"
-     ]
-    }
-   ],
-   "source": [
-    "tool_calls = response_message.tool_calls\n",
-    "\n",
-    "messages.append(\n",
-    "    {\n",
-    "        \"role\": \"assistant\",\n",
-    "        \"tool_calls\": [\n",
-    "            {\n",
-    "                \"id\": tool_call.id,\n",
-    "                \"function\": {\n",
-    "                    \"name\": tool_call.function.name,\n",
-    "                    \"arguments\": tool_call.function.arguments,\n",
-    "                },\n",
-    "                \"type\": tool_call.type,\n",
-    "            }\n",
-    "            for tool_call in tool_calls\n",
-    "        ],\n",
-    "    }\n",
-    ")\n",
-    "\n",
-    "available_functions = {\n",
-    "    \"get_weather\": get_weather,\n",
-    "}\n",
-    "for tool_call in tool_calls:\n",
-    "    function_name = tool_call.function.name\n",
-    "    function_to_call = available_functions[function_name]\n",
-    "    function_args = json.loads(tool_call.function.arguments)\n",
-    "    function_response = function_to_call(**function_args)\n",
-    "\n",
-    "    # Note how we create a separate tool call message for each tool call\n",
-    "    # the model is able to discern the tool call result through the tool_call_id\n",
-    "    messages.append(\n",
-    "        {\n",
-    "            \"role\": \"tool\",\n",
-    "            \"content\": json.dumps(function_response),\n",
-    "            \"tool_call_id\": tool_call.id,\n",
-    "        }\n",
-    "    )\n",
-    "\n",
-    "print(json.dumps(messages, indent=2))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1abe981a",
-   "metadata": {},
-   "source": [
-    "Now we run our final completion with multiple tool call results included in the messages array.\n",
-    "\n",
-    "**Note**\n",
-    "\n",
-    "We pass the tool definitions again to help the model understand:\n",
-    "\n",
-    "1. The assistant message containing the tool calls\n",
-    "2. The tool results that follow it."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "id": "5f077df3",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "The weather in Paris is 20°C, in Tokyo is 15°C, and in Madrid is 35°C.\n"
-     ]
-    }
-   ],
-   "source": [
-    "response = client.chat.completions.create(\n",
-    "    model=model, messages=messages, tools=tools, tool_choice=\"auto\", max_tokens=4096\n",
-    ")\n",
-    "\n",
-    "print(response.choices[0].message.content)"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.10.13"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
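
The processing step in the notebook above can be sketched without any network call. This is a minimal illustration under assumptions: the tool calls are plain dictionaries shaped like the printed notebook output (the Groq SDK returns objects instead), the call IDs are made up, and `run_tool_calls` is a hypothetical helper, not something the notebook defines. The lookup for an unknown tool name is an added safeguard the notebook does not have.

```python
import json


def get_weather(city: str) -> int:
    # Stand-in for the notebook's dummy function, with the same return values.
    return {"Madrid": 35, "San Francisco": 18, "Paris": 20}.get(city, 15)


# Hypothetical tool calls, shaped like the ones in the notebook's printed output.
tool_calls = [
    {"id": "call_a", "type": "function",
     "function": {"name": "get_weather", "arguments": '{"city":"Paris"}'}},
    {"id": "call_b", "type": "function",
     "function": {"name": "get_weather", "arguments": '{"city":"Tokyo"}'}},
    {"id": "call_c", "type": "function",
     "function": {"name": "get_weather", "arguments": '{"city":"Madrid"}'}},
]

available_functions = {"get_weather": get_weather}


def run_tool_calls(tool_calls, available_functions):
    """Return one role="tool" message per tool call, linked by tool_call_id."""
    tool_messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        func = available_functions.get(name)
        if func is None:
            # Report an unknown tool back to the model instead of raising.
            content = json.dumps({"error": f"unknown tool: {name}"})
        else:
            args = json.loads(call["function"]["arguments"])
            content = json.dumps(func(**args))
        tool_messages.append(
            {"role": "tool", "content": content, "tool_call_id": call["id"]}
        )
    return tool_messages


tool_messages = run_tool_calls(tool_calls, available_functions)
print(json.dumps(tool_messages, indent=2))
```

Each result gets its own message so that the model can match it to the originating call through `tool_call_id`, which is the point the notebook's inline comment makes.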

+ 0 - 2
recipes/3p_integrations/groq/groq-api-cookbook/parallel-tool-use/requirements.txt

@@ -1,2 +0,0 @@
-groq
-python-dotenv

File changes will not be shown because they are too large
+ 0 - 993
recipes/3p_integrations/groq/groq-api-cookbook/rag-langchain-presidential-speeches/presidential_speeches.csv


File changes will not be shown because they are too large
+ 0 - 664
recipes/3p_integrations/groq/groq-api-cookbook/rag-langchain-presidential-speeches/rag-langchain-presidential-speeches.ipynb


+ 0 - 21
recipes/3p_integrations/groq/groq-example-templates/conversational-chatbot-langchain/README.md

@@ -1,21 +0,0 @@
-# Groq LangChain Conversational Chatbot
-
-A simple application that allows users to interact with a conversational chatbot powered by LangChain. The application uses the Groq API to generate responses and leverages LangChain's [ConversationBufferWindowMemory](https://python.langchain.com/v0.1/docs/modules/memory/types/buffer_window/) to maintain a history of the conversation to provide context for the chatbot's responses.
-
-## Features
-
-- **Conversational Interface**: The application provides a conversational interface where users can ask questions or make statements, and the chatbot responds accordingly.
-
-- **Contextual Responses**: The application maintains a history of the conversation, which is used to provide context for the chatbot's responses.
-
-- **LangChain Integration**: The chatbot is built with LangChain, which manages the prompt template and conversation memory around the Groq-hosted model that generates the responses.
-
-## Usage
-
-<!-- markdown-link-check-disable -->
-
-You will need to store a valid Groq API Key as a secret to proceed with this example. You can generate one for free [here](https://console.groq.com/keys).
-
-<!-- markdown-link-check-enable -->
-
-You can [fork and run this application on Replit](https://replit.com/@GroqCloud/Chatbot-with-Conversational-Memory-on-LangChain) or run it on the command line with `python main.py`

+ 0 - 74
recipes/3p_integrations/groq/groq-example-templates/conversational-chatbot-langchain/main.py

@@ -1,74 +0,0 @@
-import os
-from groq import Groq
-
-from langchain.chains import ConversationChain, LLMChain
-from langchain_core.prompts import (
-    ChatPromptTemplate,
-    HumanMessagePromptTemplate,
-    MessagesPlaceholder,
-)
-from langchain_core.messages import SystemMessage
-from langchain.chains.conversation.memory import ConversationBufferWindowMemory
-from langchain_groq import ChatGroq
-from langchain.prompts import PromptTemplate
-
-
-def main():
-    """
-    Main entry point of the application. It sets up the Groq chat model and the conversation memory, then runs the command-line chat loop.
-    """
-
-    # Get Groq API key
-    groq_api_key = os.environ['GROQ_API_KEY']
-    model = 'llama3-8b-8192'
-    # Initialize Groq Langchain chat object and conversation
-    groq_chat = ChatGroq(
-            groq_api_key=groq_api_key, 
-            model_name=model
-    )
-    
-    print("Hello! I'm your friendly Groq chatbot. I can help answer your questions, provide information, or just chat. I'm also super fast! Let's start our conversation!")
-
-    system_prompt = 'You are a friendly conversational chatbot'
-    conversational_memory_length = 5  # number of previous exchanges (k) the chatbot will remember during the conversation
-
-    memory = ConversationBufferWindowMemory(k=conversational_memory_length, memory_key="chat_history", return_messages=True)
-
-
-    #chat_history = []
-    while True:
-        user_question = input("Ask a question: ")
-
-        # If the user has asked a question,
-        if user_question:
-
-            # Construct a chat prompt template using various components
-            prompt = ChatPromptTemplate.from_messages(
-                [
-                    SystemMessage(
-                        content=system_prompt
-                    ),  # This is the persistent system prompt that is always included at the start of the chat.
-
-                    MessagesPlaceholder(
-                        variable_name="chat_history"
-                    ),  # This placeholder will be replaced by the actual chat history during the conversation. It helps in maintaining context.
-
-                    HumanMessagePromptTemplate.from_template(
-                        "{human_input}"
-                    ),  # This template is where the user's current input will be injected into the prompt.
-                ]
-            )
-
-            # Create a conversation chain using the LangChain LLM (Large Language Model)
-            conversation = LLMChain(
-                llm=groq_chat,  # The Groq LangChain chat object initialized earlier.
-                prompt=prompt,  # The constructed prompt template.
-                verbose=False,  # Set to True to enable verbose output, which can be useful for debugging.
-                memory=memory,  # The conversational memory object that stores and manages the conversation history.
-            )
-            # The chatbot's answer is generated by sending the full prompt to the Groq API.
-            response = conversation.predict(human_input=user_question)
-            print("Chatbot:", response)
-
-if __name__ == "__main__":
-    main()
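
The script above relies on `ConversationBufferWindowMemory(k=5)` to limit how much history reaches the model. A rough way to picture that behaviour is a fixed-size queue of exchanges. The `WindowMemory` class below is a hypothetical stand-in written for illustration, not LangChain's implementation, and it assumes that one unit of memory is a human message plus the reply to it.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k exchanges (one human message plus one AI reply each)."""

    def __init__(self, k: int):
        # A deque with maxlen silently drops the oldest exchange when full.
        self._exchanges = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self._exchanges.append((human, ai))

    def messages(self) -> list:
        # Flatten the stored exchanges into chat-style messages, oldest first.
        out = []
        for human, ai in self._exchanges:
            out.append({"role": "user", "content": human})
            out.append({"role": "assistant", "content": ai})
        return out


memory = WindowMemory(k=2)
for i in range(1, 4):
    memory.save(f"question {i}", f"answer {i}")

window = memory.messages()
print(window)
```

With `k=2`, the first exchange has already been dropped by the time the third is saved, so the prompt size stays bounded no matter how long the conversation runs.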

+ 0 - 0
recipes/3p_integrations/groq/groq-example-templates/conversational-chatbot-langchain/requirements.txt


+ 0 - 23
recipes/3p_integrations/groq/groq-example-templates/crewai-agents/README.md

@@ -1,23 +0,0 @@
-# CrewAI Machine Learning Assistant
-
-## Overview
-
-The [CrewAI](https://docs.crewai.com/) Machine Learning Assistant is a command line application designed to kickstart your machine learning projects. It leverages a team of AI agents to guide you through the initial steps of defining, assessing, and solving machine learning problems.
-
-## Features
-
-- **Agents**: Utilizes specialized agents to perform tasks such as problem definition, data assessment, model recommendation, and code generation, enhancing the workflow and efficiency of machine learning projects.
-
-- **CrewAI Framework**: Integrates multiple agents into a cohesive framework, enabling seamless interaction and task execution to streamline the machine learning process.
-
-- **LangChain Integration**: Incorporates LangChain to facilitate natural language processing and enhance the interaction between the user and the machine learning assistant.
-
-## Usage
-
-<!-- markdown-link-check-disable -->
-
-You will need to store a valid Groq API Key as a secret to proceed with this example. You can generate one for free [here](https://console.groq.com/keys).
-
-<!-- markdown-link-check-enable -->
-
-You can [fork and run this application on Replit](https://replit.com/@GroqCloud/CrewAI-Machine-Learning-Assistant) or run it on the command line with `python main.py`. You can upload a sample .csv to the same directory as `main.py` to give the application a head start on your ML problem. The application will output a Markdown file including Python code for your ML use case to the same directory as `main.py`.

+ 0 - 184
recipes/3p_integrations/groq/groq-example-templates/crewai-agents/main.py

@@ -1,184 +0,0 @@
-import pandas as pd
-import os
-from crewai import Agent, Task, Crew
-from langchain_groq import ChatGroq
-
-
-def main():
-    """
-    Main function to initialize and run the CrewAI Machine Learning Assistant.
-
-    This function sets up a machine learning assistant using the Llama 3 model with the ChatGroq API.
-    It provides a text-based interface for users to define, assess, and solve machine learning problems
-    by interacting with multiple specialized AI agents. The function outputs the results to the console 
-    and writes them to a markdown file.
-
-    Steps:
-    1. Initialize the ChatGroq API with the specified model and API key.
-    2. Display introductory text about the CrewAI Machine Learning Assistant.
-    3. Create and configure four AI agents:
-        - Problem_Definition_Agent: Clarifies the machine learning problem.
-        - Data_Assessment_Agent: Evaluates the quality and suitability of the provided data.
-        - Model_Recommendation_Agent: Suggests suitable machine learning models.
-        - Starter_Code_Generator_Agent: Generates starter Python code for the project.
-    4. Prompt the user to describe their machine learning problem.
-    5. Check if a .csv file is available in the current directory and try to read it as a DataFrame.
-    6. Define tasks for the agents based on user input and data availability.
-    7. Create a Crew instance with the agents and tasks, and run the tasks.
-    8. Print the results and write them to an output markdown file.
-    """
-
-    model = 'llama3-8b-8192'
-
-    llm = ChatGroq(
-            temperature=0, 
-            groq_api_key = os.getenv('GROQ_API_KEY'), 
-            model_name=model
-        )
-
-    print('CrewAI Machine Learning Assistant')
-    multiline_text = """
-    The CrewAI Machine Learning Assistant is designed to guide users through the process of defining, assessing, and solving machine learning problems. It leverages a team of AI agents, each with a specific role, to clarify the problem, evaluate the data, recommend suitable models, and generate starter Python code. Whether you're a seasoned data scientist or a beginner, this application provides valuable insights and a head start in your machine learning projects.
-    """
-
-    print(multiline_text)
-
-
-    Problem_Definition_Agent = Agent(
-        role='Problem_Definition_Agent',
-        goal="""clarify the machine learning problem the user wants to solve, 
-            identifying the type of problem (e.g., classification, regression) and any specific requirements.""",
-        backstory="""You are an expert in understanding and defining machine learning problems. 
-            Your goal is to extract a clear, concise problem statement from the user's input, 
-            ensuring the project starts with a solid foundation.""",
-        verbose=True,
-        allow_delegation=False,
-        llm=llm,
-    )
-
-    Data_Assessment_Agent = Agent(
-        role='Data_Assessment_Agent',
-        goal="""evaluate the data provided by the user, assessing its quality, 
-            suitability for the problem, and suggesting preprocessing steps if necessary.""",
-        backstory="""You specialize in data evaluation and preprocessing. 
-            Your task is to guide the user in preparing their dataset for the machine learning model, 
-            including suggestions for data cleaning and augmentation.""",
-        verbose=True,
-        allow_delegation=False,
-        llm=llm,
-    )
-
-    Model_Recommendation_Agent = Agent(
-        role='Model_Recommendation_Agent',
-        goal="""suggest the most suitable machine learning models based on the problem definition 
-            and data assessment, providing reasons for each recommendation.""",
-        backstory="""As an expert in machine learning algorithms, you recommend models that best fit 
-            the user's problem and data. You provide insights into why certain models may be more effective than others,
-            considering classification vs regression and supervised vs unsupervised frameworks.""",
-        verbose=True,
-        allow_delegation=False,
-        llm=llm,
-    )
-
-
-    Starter_Code_Generator_Agent = Agent(
-        role='Starter_Code_Generator_Agent',
-        goal="""generate starter Python code for the project, including data loading, 
-            model definition, and a basic training loop, based on findings from the problem definitions,
-            data assessment and model recommendation""",
-        backstory="""You are a code wizard, able to generate starter code templates that users 
-            can customize for their projects. Your goal is to give users a head start in their coding efforts.""",
-        verbose=True,
-        allow_delegation=False,
-        llm=llm,
-    )
-
-
-    user_question = input("Describe your ML problem: ")
-    data_upload = False
-    # Check if there is a .csv file in the current directory
-    if any(file.endswith(".csv") for file in os.listdir()):
-        sample_fp = [file for file in os.listdir() if file.endswith(".csv")][0]
-        try:
-            # Attempt to read the uploaded file as a DataFrame
-            df = pd.read_csv(sample_fp).head(5)
-
-            # If successful, set 'data_upload' to True
-            data_upload = True
-
-            # Display the DataFrame in the app
-            print("Data successfully uploaded and read as DataFrame:")
-            print(df)
-        except Exception as e:
-            print(f"Error reading the file: {e}")
-
-    if user_question:
-
-        task_define_problem = Task(
-        description="""Clarify and define the machine learning problem, 
-            including identifying the problem type and specific requirements.
-
-            Here is the user's problem:
-            {ml_problem}
-            """.format(ml_problem=user_question),
-        agent=Problem_Definition_Agent,
-        expected_output="A clear and concise definition of the machine learning problem."
-        )
-
-        if data_upload:
-            task_assess_data = Task(
-                description="""Evaluate the user's data for quality and suitability, 
-                suggesting preprocessing or augmentation steps if needed.
-
-                Here is a sample of the user's data:
-                {df}
-                The file name is called {uploaded_file}
-
-                """.format(df=df.head(),uploaded_file=sample_fp),
-                agent=Data_Assessment_Agent,
-                expected_output="An assessment of the data's quality and suitability, with suggestions for preprocessing or augmentation if necessary."
-            )
-        else:
-            task_assess_data = Task(
-                description="""The user has not uploaded any specific data for this problem,
-                but please go ahead and consider a hypothetical dataset that might be useful
-                for their machine learning problem. 
-                """,
-                agent=Data_Assessment_Agent,
-                expected_output="A hypothetical dataset that might be useful for the user's machine learning problem, along with any necessary preprocessing steps."
-            )
-
-        task_recommend_model = Task(
-        description="""Suggest suitable machine learning models for the defined problem 
-            and assessed data, providing rationale for each suggestion.""",
-        agent=Model_Recommendation_Agent,
-        expected_output="A list of suitable machine learning models for the defined problem and assessed data, along with the rationale for each suggestion."
-        )
-
-
-        task_generate_code = Task(
-        description="""Generate starter Python code tailored to the user's project using the model recommendation agent's recommendation(s), 
-            including snippets for package import, data handling, model definition, and training
-            """,
-        agent=Starter_Code_Generator_Agent,
-        expected_output="Python code snippets for package import, data handling, model definition, and training, tailored to the user's project, plus a brief summary of the problem and model recommendations."
-        )
-
-
-        crew = Crew(
-            agents=[Problem_Definition_Agent, Data_Assessment_Agent, Model_Recommendation_Agent,  Starter_Code_Generator_Agent], 
-            tasks=[task_define_problem, task_assess_data, task_recommend_model,  task_generate_code], 
-            verbose=False
-        )
-
-        result = crew.kickoff()
-
-        print(result)
-
-        with open('output.md', "w") as file:
-            print('\n\nThese results have been exported to output.md')
-            file.write(str(result))
-
-
-if __name__ == "__main__":
-    main()
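
Step 5 of the docstring above (find a `.csv` in the working directory and read a small sample) can be isolated into a helper that is easy to test. The sketch below is an assumption-laden alternative, not the script's code: `first_csv_sample` is a hypothetical name, it uses the standard-library `csv` module instead of pandas, and it sorts the candidates so that the chosen file is deterministic, which `os.listdir()` does not guarantee.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path


def first_csv_sample(directory, n_rows=5):
    """Return (file name, header, first n_rows rows) for the first CSV, or None."""
    csv_paths = sorted(Path(directory).glob("*.csv"))
    if not csv_paths:
        return None
    path = csv_paths[0]
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file has no rows
        rows = list(islice(reader, n_rows))
    return path.name, header, rows


# Demonstration in a throwaway directory with hypothetical contents.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sample.csv").write_text("x,y\n1,2\n3,4\n")
    empty_dir = Path(tmp, "empty")
    empty_dir.mkdir()
    sample = first_csv_sample(tmp)
    empty = first_csv_sample(empty_dir)

print(sample, empty)
```

Returning `None` when no file is found maps onto the script's `data_upload` flag: the caller can branch on the result to choose between the data-assessment task and the hypothetical-dataset task.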

+ 0 - 3
recipes/3p_integrations/groq/groq-example-templates/crewai-agents/requirements.txt

@@ -1,3 +0,0 @@
-crewai
-langchain_groq
-pandas

+ 0 - 21
recipes/3p_integrations/groq/groq-example-templates/groq-quickstart-conversational-chatbot/README.md

@@ -1,21 +0,0 @@
-# Groq Quickstart Conversational Chatbot
-
-A simple application that allows users to interact with a conversational chatbot powered by Groq. This application is designed to get users up and running quickly with building a chatbot.
-
-## Features
-
-**Conversational Interface**: Provides a simple interface where users can input text and receive responses from the chatbot.
-
-**Short Responses**: The chatbot replies with very short and concise answers, keeping interactions brief and to the point.
-
-**Groq Integration**: Utilizes the Groq API to generate responses, leveraging the power of the Llama3-70b-8192 model.
-
-## Usage
-
-<!-- markdown-link-check-disable -->
-
-You will need to store a valid Groq API Key as a secret to proceed with this example. You can generate one for free [here](https://console.groq.com/keys).
-
-<!-- markdown-link-check-enable -->
-
-You can [fork and run this application on Replit](https://replit.com/@GroqCloud/Groq-Quickstart-Conversational-Chatbot) or run it on the command line with `python main.py`.

+ 0 - 38
recipes/3p_integrations/groq/groq-example-templates/groq-quickstart-conversational-chatbot/main.py

@@ -1,38 +0,0 @@
-# Set GROQ_API_KEY in the secrets
-
-import os
-from groq import Groq
-
-# Create the Groq client
-client = Groq(
-    api_key=os.environ.get("GROQ_API_KEY")
-)
-
-# Set the system prompt
-system_prompt = {
-    "role": "system",
-    "content":
-    "You are a helpful assistant. You reply with very short answers."
-}
-
-# Initialize the chat history
-chat_history = [system_prompt]
-
-while True:
-  # Get user input from the console
-  user_input = input("You: ")
-
-  # Append the user input to the chat history
-  chat_history.append({"role": "user", "content": user_input})
-
-  response = client.chat.completions.create(model="llama3-70b-8192",
-                                            messages=chat_history,
-                                            max_tokens=100,
-                                            temperature=1.2)
-  # Append the response to the chat history
-  chat_history.append({
-      "role": "assistant",
-      "content": response.choices[0].message.content
-  })
-  # Print the response
-  print("Assistant:", response.choices[0].message.content)
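
The loop above mixes console input, the API call, and history bookkeeping, and it never exits. One way to make the bookkeeping testable offline is to pass the completion step in as a function. This is a sketch under assumptions: `chat_turn` and `fake_complete` are hypothetical names, and `fake_complete` merely echoes the last message instead of calling Groq.

```python
def chat_turn(chat_history, user_input, complete):
    """Append the user message, call `complete`, then append and return the reply."""
    chat_history.append({"role": "user", "content": user_input})
    reply = complete(chat_history)
    chat_history.append({"role": "assistant", "content": reply})
    return reply


def fake_complete(messages):
    # Hypothetical stand-in for the Groq chat completion call.
    return f"echo: {messages[-1]['content']}"


chat_history = [
    {
        "role": "system",
        "content": "You are a helpful assistant. You reply with very short answers.",
    }
]
reply = chat_turn(chat_history, "hello", fake_complete)
print("Assistant:", reply)
```

In the real script, `complete` would wrap the existing `client.chat.completions.create(...)` call and return `response.choices[0].message.content`; the surrounding `while True` loop could then break on an empty input or a quit command.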

+ 0 - 1
recipes/3p_integrations/groq/groq-example-templates/groq-quickstart-conversational-chatbot/requirements.txt

@@ -1 +0,0 @@
-groq

+ 0 - 27
recipes/3p_integrations/groq/groq-example-templates/groqing-the-stock-market-function-calling-llama3/README.md

@@ -1,27 +0,0 @@
-# 'Groqing the Stock Market' with Llama 3 Function Calling
-
-This is a simple application that leverages the yfinance API to provide insights into stocks and their prices. The application uses the Llama 3 model on Groq in conjunction with LangChain to call functions based on the user prompt.
-
-## Key Functions
-
-- **get_stock_info(symbol, key)**: This function fetches various information about a given stock symbol. The information can be anything from the company's address to its financial ratios. The 'key' parameter specifies the type of information to fetch.
-
-- **get_historical_price(symbol, start_date, end_date)**: This function fetches the historical stock prices for a given symbol from a specified start date to an end date. The returned data is a DataFrame with the date and closing price of the stock.
-
-- **plot_price_over_time(historical_price_dfs)**: This function takes a list of DataFrames (each containing historical price data for a stock) and plots the prices over time using Plotly. The plot is saved to the same directory as the app.
-
-- **call_functions(llm_with_tools, user_prompt)**: This function takes the user's question, invokes the appropriate tool (either get_stock_info or get_historical_price), and generates a response. If the user asked for historical prices, it also calls plot_price_over_time to generate a plot.
-
-## Function Calling
-
-Function calling in this application is handled by the Groq API, abstracted with LangChain. When the user asks a question, the application invokes the appropriate tool with parameters based on the user's question. The tool's output is then used to generate a response.
-
-## Usage
-
-<!-- markdown-link-check-disable -->
-
-You will need to store a valid Groq API Key as a secret to proceed with this example. You can generate one for free [here](https://console.groq.com/keys).
-
-<!-- markdown-link-check-enable -->
-
-You can [fork and run this application on Replit](https://replit.com/@GroqCloud/Groqing-the-Stock-Market-Function-Calling-with-Llama3) or run it on the command line with `python main.py`.
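
The function-calling flow this README describes can be sketched without any network calls. This is a minimal illustration under stated assumptions, not the deleted `main.py` (whose diff is suppressed above, and which uses LangChain): `dispatch_tool_call` is a hypothetical helper, `get_stock_info` is stubbed with canned data instead of yfinance, and the tool call is represented as a plain name plus a JSON argument string.

```python
import json


def get_stock_info(symbol, key):
    """Hypothetical stand-in for the yfinance-backed tool; returns canned data."""
    fake_data = {"AAPL": {"sector": "Technology"}}
    return fake_data.get(symbol, {}).get(key, "unknown")


# Map the tool names a model may request to local Python callables.
AVAILABLE_FUNCTIONS = {"get_stock_info": get_stock_info}


def dispatch_tool_call(name, arguments_json, functions=AVAILABLE_FUNCTIONS):
    """Look up the requested tool and call it with JSON-decoded keyword arguments."""
    if name not in functions:
        raise KeyError(f"Unknown tool: {name}")
    kwargs = json.loads(arguments_json)
    return functions[name](**kwargs)


result = dispatch_tool_call("get_stock_info", '{"symbol": "AAPL", "key": "sector"}')
print(result)
```

Keeping the name-to-callable mapping explicit means a tool name the model invents fails loudly instead of being executed.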

File diff suppressed because it is too large
+ 0 - 139
recipes/3p_integrations/groq/groq-example-templates/groqing-the-stock-market-function-calling-llama3/main.py


+ 0 - 12
recipes/3p_integrations/groq/groq-example-templates/groqing-the-stock-market-function-calling-llama3/requirements.txt

@@ -1,12 +0,0 @@
-streamlit
-pandas
-numpy
-groq
-langchain_community
-langchain_groq
-yfinance
-plotly
-langchain_core
-nbformat>=4.2.0
-ipython
-kaleido

+ 0 - 21
recipes/3p_integrations/groq/groq-example-templates/llamachat-conversational-chatbot-with-llamaIndex/README.md

@@ -1,21 +0,0 @@
-# LlamaChat: Conversational Chatbot with LlamaIndex and Llama3
-
-A simple application that allows users to interact with a conversational chatbot powered by the LlamaIndex framework and Meta's Llama3 model. The application uses the Groq API to generate responses and supports different modes of interaction, including simple chat, streaming chat, and customizable chat with system prompts.
-
-## Features
-
-**LlamaIndex**: The application uses LlamaIndex to manage and generate responses, leveraging the power of Groq's language model.
-
-**Simple Chat**: Generates responses based on user input using the Groq API with LlamaIndex.
-
-**Streaming Chat**: Provides real-time streaming responses for user input.
-
-**Customizable Chat**: Allows for chat customization by setting a system prompt to guide the chatbot's responses.
-
-## Usage
-
-<!-- markdown-link-check-disable -->
-
-You will need to store a valid Groq API Key as a secret to proceed with this example. You can generate one for free [here](https://console.groq.com/keys).
-
-<!-- markdown-link-check-enable -->

+ 0 - 46
recipes/3p_integrations/groq/groq-example-templates/llamachat-conversational-chatbot-with-llamaIndex/main.py

@@ -1,46 +0,0 @@
-from llama_index.llms.groq import Groq
-from llama_index.core.llms import ChatMessage
-
-llm = Groq(model="llama3-8b-8192")
-
-
-system_prompt = 'You are a friendly but highly sarcastic chatbot assistant'
-
-while True:
-    # Get the user's question
-    user_input = input("User: ")
-
-    #user_input = 'write a few paragraphs explaining generative AI to a college freshman'
-
-    ##################################
-    # Simple Chat
-    ##################################
-    print('Simple Chat:\n\n')
-    response = llm.complete(user_input)
-    print(response)
-
-
-    ##################################
-    # Streaming Chat
-    ##################################
-    stream_response = llm.stream_complete(
-        user_input
-    )
-    print('\n\nStreaming Chat:\n')
-    for t in stream_response:
-        print(t.delta, end="")
-
-
-    ##################################
-    # Customizable Chat
-    ##################################
-    messages = [
-        ChatMessage(role="system", content=system_prompt),
-        ChatMessage(role="user", content=user_input),
-    ]
-    print('\n\nChat with System Prompt:\n')
-    response_with_system_prompt = llm.chat(messages)
-
-    print(response_with_system_prompt)
-
-

+ 0 - 2
recipes/3p_integrations/groq/groq-example-templates/llamachat-conversational-chatbot-with-llamaIndex/requirements.txt

@@ -1,2 +0,0 @@
-llama_index
-llama-index-llms-groq

+ 0 - 33
recipes/3p_integrations/groq/groq-example-templates/presidential-speeches-rag-with-pinecone/README.md

@@ -1,33 +0,0 @@
-# Presidential Speeches RAG with Pinecone
-
-This repository contains a command line application that allows users to ask questions about US presidential speeches by applying Retrieval-Augmented Generation (RAG) over a Pinecone vector database. The application uses RAG to answer the user's question by retrieving the most relevant presidential speeches and using them to supplement the LLM response.
-
-## Features
-
-- **RAG (Retrieval-Augmented Generation)**: Enhances the generation of responses by integrating retrieval-based methods. This feature allows the system to fetch relevant information from a large corpus of data, providing more accurate and contextually appropriate answers by combining retrieved content with generative capabilities.
-
-- **Vector Databases (Pinecone)**: Integrates with Pinecone to store and manage vector embeddings efficiently. Pinecone's high-performance vector database allows for fast and scalable similarity searches, enabling quick retrieval of relevant data for various machine learning and AI applications.
-
-- **LangChain Integration**: Leverages LangChain to facilitate natural language processing tasks. LangChain enhances the interaction between the user and the system by providing robust language modeling capabilities, ensuring seamless and intuitive communication.
-
-## Code Overview
-
-The main script of the application is [main.py](./main.py). Here's a brief overview of its main functions:
-
-- `get_relevant_excerpts(user_question, docsearch)`: This function takes a user's question and a Pinecone vector store as input, performs a similarity search on the vector store using the user's question, and returns the most relevant excerpts from presidential speeches.
-
-- `presidential_speech_chat_completion(client, model, user_question, relevant_excerpts)`: This function takes a Groq client, a model name, a user's question, and relevant excerpts from presidential speeches as input. It generates a response to the user's question based on the relevant excerpts.
-
-## Usage
-
-<!-- markdown-link-check-disable -->
-
-You will need to store a valid Groq API Key as a secret to proceed with this example outside of this Repl. You can generate one for free [here](https://console.groq.com/keys).
-
-<!-- markdown-link-check-enable -->
-
-You will also need your own [Pinecone](https://www.pinecone.io/) index with presidential speech embeddings to run this code locally. You can create a Pinecone API key and one index for a small project for free on their Starter plan. Visit [this Cookbook post](https://github.com/groq/groq-api-cookbook/blob/dan/replit-conversion/presidential-speeches-rag/presidential-speeches-rag.ipynb) for more information on RAG and a guide to uploading these embeddings to a vector database.
-
-You can [fork and run this application on Replit](https://replit.com/@GroqCloud/Presidential-Speeches-RAG-with-Pinecone) or run it on the command line with `python main.py`.
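
The retrieve-then-prompt pattern described in this README can be sketched in a few lines. This is only an illustration: a toy word-overlap score stands in for the Pinecone similarity search, the documents are plain strings, and `overlap_score` and `build_user_message` are hypothetical names that do not appear in the deleted `main.py`.

```python
SEPARATOR = "\n\n" + "-" * 40 + "\n\n"


def overlap_score(question, text):
    """Toy relevance score: number of distinct lowercase words shared by both strings."""
    return len(set(question.lower().split()) & set(text.lower().split()))


def get_relevant_excerpts(question, docs, k=3):
    """Rank documents by the toy score and join the top k into one string."""
    ranked = sorted(docs, key=lambda d: overlap_score(question, d), reverse=True)
    return SEPARATOR.join(ranked[:k])


def build_user_message(question, excerpts):
    """Combine the question and the retrieved excerpts into a single user message."""
    return "User Question: " + question + "\n\nRelevant Speech Excerpt(s):\n\n" + excerpts


docs = ["national unity matters", "the economy grew", "unity of the nation"]
excerpts = get_relevant_excerpts("what about unity", docs, k=2)
print(build_user_message("what about unity", excerpts))
```

In a real application an embedding model and a vector index would supply the ranking step; everything downstream of it keeps the same shape.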

+ 0 - 114
recipes/3p_integrations/groq/groq-example-templates/presidential-speeches-rag-with-pinecone/main.py

@@ -1,114 +0,0 @@
-import pandas as pd
-import numpy as np
-from groq import Groq
-from pinecone import Pinecone
-import os
-
-from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
-from langchain_pinecone import PineconeVectorStore
-
-
-def get_relevant_excerpts(user_question, docsearch):
-    """
-    This function retrieves the most relevant excerpts from presidential speeches based on the user's question.
-    Parameters:
-    user_question (str): The question asked by the user.
-    docsearch (PineconeVectorStore): The Pinecone vector store containing the presidential speeches.
-    Returns:
-    str: A string containing the most relevant excerpts from presidential speeches.
-    """
-
-    # Perform a similarity search on the Pinecone vector store using the user's question
-    relevant_docs = docsearch.similarity_search(user_question)
-
-    # Extract the page content from the top 3 most relevant documents and join them into a single string
-    relevant_excerpts = '\n\n------------------------------------------------------\n\n'.join([doc.page_content for doc in relevant_docs[:3]])
-
-    return relevant_excerpts
-
-
-def presidential_speech_chat_completion(client, model, user_question, relevant_excerpts):
-    """
-    This function generates a response to the user's question using a pre-trained model.
-    Parameters:
-    client (Groq): The Groq client used to interact with the pre-trained model.
-    model (str): The name of the pre-trained model.
-    user_question (str): The question asked by the user.
-    relevant_excerpts (str): A string containing the most relevant excerpts from presidential speeches.
-    Returns:
-    str: A string containing the response to the user's question.
-    """
-
-    # Define the system prompt
-    system_prompt = '''
-    You are a presidential historian. Given the user's question and relevant excerpts from 
-    presidential speeches, answer the question by including direct quotes from presidential speeches. 
-    When using a quote, cite the speech that it was from (ignoring the chunk).
-    '''
-
-    # Generate a response to the user's question using the pre-trained model
-    chat_completion = client.chat.completions.create(
-        messages = [
-            {
-                "role": "system",
-                "content":  system_prompt
-            },
-            {
-                "role": "user",
-                "content": "User Question: " + user_question + "\n\nRelevant Speech Excerpt(s):\n\n" + relevant_excerpts,
-            }
-        ],
-        model = model
-    )
-
-    # Extract the response from the chat completion
-    response = chat_completion.choices[0].message.content
-
-    return response
-
-
-def main():
-    """
-    This is the main function that runs the application. It initializes the Groq client and the SentenceTransformer model,
-    gets user input from the command line, retrieves relevant excerpts from presidential speeches based on the user's question,
-    generates a response to the user's question using a pre-trained model, and displays the response.
-    """
-
-    model = 'llama3-8b-8192'
-
-    embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
-
-    # Initialize the Groq client
-    groq_api_key = os.getenv('GROQ_API_KEY')
-    pinecone_api_key=os.getenv('PINECONE_API_KEY')
-    pinecone_index_name = "presidential-speeches"
-    client = Groq(
-        api_key=groq_api_key
-    )
-
-    pc = Pinecone(api_key = pinecone_api_key)
-    docsearch = PineconeVectorStore(index_name=pinecone_index_name, embedding=embedding_function)
-
-    # Display the title and introduction of the application
-    print("Presidential Speeches RAG")
-    multiline_text = """
-    Welcome! Ask questions about U.S. presidents, like "What were George Washington's views on democracy?" or "What did Abraham Lincoln say about national unity?". The app matches your question to relevant excerpts from presidential speeches and generates a response using a pre-trained model.
-    """
-
-    print(multiline_text)
-
-
-    while True:
-        # Get the user's question
-        user_question = input("Ask a question about a US president: ")
-
-        if user_question:
-            pinecone_index_name = "presidential-speeches"
-            relevant_excerpts = get_relevant_excerpts(user_question, docsearch)
-            response = presidential_speech_chat_completion(client, model, user_question, relevant_excerpts)
-            print(response)
-
-
-
-if __name__ == "__main__":
-    main()

+ 0 - 8
recipes/3p_integrations/groq/groq-example-templates/presidential-speeches-rag-with-pinecone/requirements.txt

@@ -1,8 +0,0 @@
-pandas
-numpy
-groq
-langchain_community
-langchain_pinecone
-transformers
-scikit-learn
-sentence-transformers

+ 0 - 57
recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/README.md

@@ -1,57 +0,0 @@
-# DuckDB Text-to-SQL with JSON Mode
-
-A command line application that allows users to ask questions about their DuckDB data. The application leverages Groq API's JSON mode to generate SQL queries based on the user's questions and execute them on a DuckDB database.
-
-## Features
-
-- **Text-to-SQL**: The application uses natural language processing to convert user questions into SQL queries, making it easy for users to query their data without knowing SQL.
-
-- **JSON mode**: A feature that makes the LLM respond strictly with structured JSON output, provided we supply it with the desired format.
-
-- **Data Summarization**: After executing a SQL query, the application uses the AI to summarize the resulting data in relation to the user's original question.
-
-## Data
-
-The application queries data from two CSV files located in the `data` folder:
-
-- `employees.csv`: Contains employee data including their ID, full name, and email address.
-
-- `purchases.csv`: Records purchase details including purchase ID, date, associated employee ID, amount, and product name.
-
-## Prompts
-
-The base prompt for the AI is stored in a text file in the `prompts` folder:
-
-- `base_prompt.txt`
-
-A well-crafted system prompt is essential for building a functional Text-to-SQL application. Ours will serve 3 purposes:
-
-1. Provide the metadata schemas for our database tables
-2. Indicate any relevant context or tips for querying the DuckDB language or our database schema specifically
-3. Define our desired JSON output (note that to use JSON mode, we must include 'JSON' in the prompt)
-
-## Functions
-
-- `chat_with_groq()`: Sends a prompt to the Groq API and returns the AI's response.
-- `execute_duckdb_query()`: Executes a SQL query on a DuckDB database and returns the result.
-- `get_summarization()`: Generates a prompt for the AI to summarize the data resulting from a SQL query.
-
-## Usage
-
-<!-- markdown-link-check-disable -->
-
-You will need to store a valid Groq API Key as a secret to proceed with this example. You can generate one for free [here](https://console.groq.com/keys).
-
-<!-- markdown-link-check-enable -->
-
-You can [fork and run this application on Replit](https://replit.com/@GroqCloud/Building-a-Text-to-SQL-app-with-Groqs-JSON-mode) or run it on the command line with `python main.py`.
-
-## Customizing with Your Own Data
-
-This application is designed to be flexible and can be easily customized to work with your own data. If you want to use your own data, follow these steps:
-
-1. **Replace the CSV files**: The application queries data from two CSV files located in the `data` folder: `employees.csv` and `purchases.csv`. Replace these files with your own CSV files.
-
-2. **Modify the base prompt**: The base prompt for the AI, stored in the `prompts` folder as `base_prompt.txt`, contains specific information about the data metadata. Modify this prompt to match the structure and content of your own data. Make sure to accurately describe the tables, columns, and any specific rules or tips for querying your dataset.
-
-By following these steps, you can tailor the DuckDB Query Generator to your own data and use cases. Feel free to experiment and build off this repository to create your own powerful data querying applications.
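
The JSON-mode contract described above (a reply that carries either a `sql` key or an `error` key, as the base prompt requests) can be handled defensively along these lines. This is a sketch, not the deleted `main.py`: `parse_sql_response` is a hypothetical helper, and the raw reply is passed in as a plain string so that no API call is needed.

```python
import json


def parse_sql_response(raw):
    """Return ("sql", query) or ("error", message) for a JSON-mode reply."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ("error", f"Response was not valid JSON: {exc}")
    if not isinstance(payload, dict):
        return ("error", "Expected a JSON object")
    if "sql" in payload:
        return ("sql", payload["sql"])
    return ("error", payload.get("error", "No 'sql' or 'error' key in response"))


kind, value = parse_sql_response('{"sql": "SELECT 1"}')
print(kind, value)
```

Returning a tagged tuple keeps the caller's branching explicit: run the query only when the tag is `"sql"`, and show the message otherwise.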

+ 0 - 8
recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/data/employees.csv

@@ -1,8 +0,0 @@
-employee_id,name,email
-1,Richard Hendricks,richard@piedpiper.com
-2,Erlich Bachman,erlich@aviato.com
-3,Dinesh Chugtai,dinesh@piedpiper.com
-4,Bertram Gilfoyle,gilfoyle@piedpiper.com
-5,Jared Dunn,jared@piedpiper.com
-6,Monica Hall,monica@raviga.com
-7,Gavin Belson,gavin@hooli.com

+ 0 - 6
recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/data/purchases.csv

@@ -1,6 +0,0 @@
-purchase_id,purchase_date,product_name,employee_id,amount
-1,'2024-02-01',iPhone,1,750
-2,'2024-02-02',Tesla,2,70000
-3,'2024-02-03',Humane pin,3,500
-4,'2024-02-04',iPhone,4,700
-5,'2024-02-05',Tesla,5,75000

+ 0 - 145
recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/main.py

@@ -1,145 +0,0 @@
-import os
-from groq import Groq
-import json
-import duckdb
-import sqlparse
-
-def chat_with_groq(client, prompt, model, response_format):
-    """
-    This function sends a prompt to the Groq API and retrieves the AI's response.
-
-    Parameters:
-    client (Groq): The Groq API client.
-    prompt (str): The prompt to send to the AI.
-    model (str): The AI model to use for the response.
-    response_format (dict): The format of the response. 
-        If response_format is a dictionary with {"type": "json_object"}, it configures JSON mode.
-
-    Returns:
-    str: The content of the AI's response.
-    """
-    
-    completion = client.chat.completions.create(
-    model=model,
-    messages=[
-        {
-            "role": "user",
-            "content": prompt
-        }
-    ],
-    response_format=response_format
-    )
-
-    return completion.choices[0].message.content
-
-
-def execute_duckdb_query(query):
-    """
-    This function executes a SQL query on a DuckDB database and returns the result.
-
-    Parameters:
-    query (str): The SQL query to execute.
-
-    Returns:
-    DataFrame: The result of the query as a pandas DataFrame.
-    """
-    original_cwd = os.getcwd()
-    os.chdir('data')
-
-    try:
-        conn = duckdb.connect(database=':memory:', read_only=False)
-        query_result = conn.execute(query).fetchdf().reset_index(drop=True)
-    finally:
-        os.chdir(original_cwd)
-
-    return query_result
-
-
-def get_summarization(client, user_question, df, model):
-    """
-    This function generates a summarization prompt based on the user's question and the resulting data. 
-    It then sends this summarization prompt to the Groq API and retrieves the AI's response.
-
-    Parameters:
-    client (Groq): The Groq API client.
-    user_question (str): The user's question.
-    df (DataFrame): The DataFrame resulting from the SQL query.
-    model (str): The AI model to use for the response.
-    
-    Returns:
-    str: The content of the AI's response to the summarization prompt.
-    """
-    prompt = '''
-    A user asked the following question pertaining to local database tables:
-    
-    {user_question}
-    
-    To answer the question, a dataframe was returned:
-    
-    Dataframe:
-    {df}
-    
-    In a few sentences, summarize the data in the table as it pertains to the original user question. Avoid qualifiers like "based on the data" and do not comment on the structure or metadata of the table itself
-    '''.format(user_question = user_question, df = df)
-    
-    # Response format is set to 'None'
-    return chat_with_groq(client,prompt,model,None)
-
-def main():
-    """
-    The main function of the application. It handles user input, controls the flow of the application, 
-    and initiates a conversation in the command line.
-    """
-
-    model = "llama3-70b-8192"
-
-    # Get the Groq API key and create a Groq client
-    groq_api_key = os.getenv('GROQ_API_KEY')
-    client = Groq(
-        api_key=groq_api_key
-    )
-
-    print("Welcome to the DuckDB Query Generator!")
-    print("You can ask questions about the data in the 'employees.csv' and 'purchases.csv' files.")
-
-    # Load the base prompt
-    with open('prompts/base_prompt.txt', 'r') as file:
-        base_prompt = file.read()
-
-    while True:
-        # Get the user's question
-        user_question = input("Ask a question: ")
-
-        if user_question:
-            # Generate the full prompt for the AI
-            full_prompt = base_prompt.format(user_question=user_question)
-
-            # Get the AI's response. Call with '{"type": "json_object"}' to use JSON mode
-            llm_response = chat_with_groq(client, full_prompt, model, {"type": "json_object"})
-
-            result_json = json.loads(llm_response)
-            if 'sql' in result_json:
-                sql_query = result_json['sql']
-                results_df = execute_duckdb_query(sql_query)
-
-                formatted_sql_query = sqlparse.format(sql_query, reindent=True, keyword_case='upper')
-
-                print("```sql\n" + formatted_sql_query + "\n```")
-                print(results_df.to_markdown(index=False))
-
-                summarization = get_summarization(client,user_question,results_df,model)
-                print(summarization.replace('$','\\$'))
-            elif 'error' in result_json:
-                print("ERROR:", 'Could not generate valid SQL for this question')
-                print(result_json['error'])
-
-if __name__ == "__main__":
-    main()
-
-
-
-
-
-
-
-

+ 0 - 42
recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/prompts/base_prompt.txt

@@ -1,42 +0,0 @@
-You are Groq Advisor, and you are tasked with generating SQL queries for DuckDB based on user questions about data stored in two tables derived from CSV files:
-
-Table: employees.csv
-Columns:
-employee_id (INTEGER): A unique identifier for each employee.
-name (VARCHAR): The full name of the employee.
-email (VARCHAR): employee's email address
-
-Table: purchases.csv
-Columns:
-purchase_id (INTEGER): A unique identifier for each purchase.
-purchase_date (DATE): Date of purchase
-employee_id (INTEGER): References the employee_id from the employees table, indicating which employee made the purchase.
-amount (FLOAT): The monetary value of the purchase.
-product_name (STRING): The name of the product purchased
-
-Given a user's question about this data, write a valid DuckDB SQL query that accurately extracts or calculates the requested information from these tables and adheres to SQL best practices for DuckDB, optimizing for readability and performance where applicable.
-
-Here are some tips for writing DuckDB queries:
-* DuckDB syntax requires querying from the .csv file itself, i.e. employees.csv and purchases.csv. For example: SELECT * FROM employees.csv as employees
-* All tables referenced MUST be aliased
-* DuckDB does not implicitly include a GROUP BY clause
-* CURRENT_DATE gets today's date
-* Aggregated fields like COUNT(*) must be appropriately named
-
-And some rules for querying the dataset:
-* Never include employee_id in the output - show employee name instead
-
-Also note that:
-* Valid values for product_name include 'Tesla','iPhone' and 'Humane pin'
-
-
-Question:
---------
-{user_question}
---------
-Reminder: Generate a DuckDB SQL to answer to the question:
-* respond as a valid JSON Document
-* [Best] If the question can be answered with the available tables: {{"sql": <sql here>}} 
-* If the question cannot be answered with the available tables: {{"error": <explanation here>}}
-* Ensure that the entire output is returned on only one single line
-* Keep your query as simple and straightforward as possible; do not use subqueries

+ 0 - 4
recipes/3p_integrations/groq/groq-example-templates/text-to-sql-json-mode/requirements.txt

@@ -1,4 +0,0 @@
-duckdb
-groq
-sqlparse
-pandas

+ 0 - 53
recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/README.md

@@ -1,53 +0,0 @@
-# Executing Verified Queries with Function Calling
-
-A command line application that allows users to ask questions about their DuckDB data using the Groq API. The application uses function calling to find the most similar pre-verified query to the user's question, execute it against the data, and return the results.
-
-## Features
-
-- **Function Calling**: The application uses function calling to match the user's question to the most relevant pre-verified SQL query.
-
-- **SQL Execution**: The application executes the selected SQL query on a DuckDB database and displays the result.
-
-## Functions
-
-- `get_verified_queries(directory_path)`: Reads YAML files from the specified directory and loads the verified SQL queries and their descriptions.
-
-- `execute_duckdb_query_function_calling(query_name, verified_queries_dict)`: Executes the provided SQL query using DuckDB and returns the result as a DataFrame.
-
-## Data
-
-The application queries data from CSV files located in the data folder:
-
-- `employees.csv`: Contains employee data including their ID, full name, and email address.
-
-- `purchases.csv`: Records purchase details including purchase ID, date, associated employee ID, amount, and product name.
-
-## Verified Queries
-
-The verified SQL queries and their descriptions are stored in YAML files located in the `verified-queries` folder. Descriptions are used to semantically map prompts to queries:
-
-- `most-recent-purchases.yaml`: Returns the 5 most recent purchases
-
-- `most-expensive-purchase.yaml`: Finds the most expensive purchases
-
-- `number-of-teslas.yaml`: Counts the number of Teslas purchased
-
-- `employees-without-purchases.yaml`: Gets employees without any recent purchases
-
-## Usage
-
-<!-- markdown-link-check-disable -->
-
-You will need to store a valid Groq API Key as a secret to proceed with this example. You can generate one for free [here](https://console.groq.com/keys).
-
-<!-- markdown-link-check-enable -->
-
-You can [fork and run this application on Replit](https://replit.com/@GroqCloud/Execute-Verified-SQL-Queries-with-Function-Calling) or run it on the command line with `python main.py`.
-
-## Customizing with Your Own Data
-
-This application is designed to be flexible and can be easily customized to work with your own data. If you want to use your own data, follow these steps:
-
-1. **Replace the CSV files**: The application queries data from CSV files located in the `data` folder. Replace these files with your own CSV files.
-
-2. **Modify the verified queries**: The verified SQL queries and their descriptions are stored in YAML files located in the `verified-queries` folder. Replace these files with your own verified SQL queries and descriptions. Each file name (without the `.yaml` extension) becomes the query name that the model selects.
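
Because query names are derived from the YAML file names, it helps to derive them in a way that does not depend on how the directory path is written. The sketch below uses only the standard library and a temporary directory with placeholder files; `query_names` is a hypothetical helper, and no YAML parsing is performed.

```python
import glob
import os
import tempfile
from pathlib import Path


def query_names(directory_path):
    """Return sorted query names (file stems) for every *.yaml file in the directory."""
    pattern = os.path.join(directory_path, "*.yaml")
    return sorted(Path(p).stem for p in glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("most-recent-purchases.yaml", "number-of-teslas.yaml", "notes.txt"):
        Path(tmp, name).write_text("description: placeholder\n")
    print(query_names(tmp))
```

`Path.stem` gives the same name whether or not the directory path ends with a slash, which slicing the path string by length does not guarantee.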

+ 0 - 8
recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/data/employees.csv

@@ -1,8 +0,0 @@
-employee_id,name,email
-1,Richard Hendricks,richard@piedpiper.com
-2,Erlich Bachman,erlich@aviato.com
-3,Dinesh Chugtai,dinesh@piedpiper.com
-4,Bertram Gilfoyle,gilfoyle@piedpiper.com
-5,Jared Dunn,jared@piedpiper.com
-6,Monica Hall,monica@raviga.com
-7,Gavin Belson,gavin@hooli.com

+ 0 - 6
recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/data/purchases.csv

@@ -1,6 +0,0 @@
-purchase_id,purchase_date,product_name,employee_id,amount
-1,'2024-02-01',iPhone,1,750
-2,'2024-02-02',Tesla,2,70000
-3,'2024-02-03',Humane pin,3,500
-4,'2024-02-04',iPhone,4,700
-5,'2024-02-05',Tesla,5,75000

+ 0 - 158
recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/main.py

@@ -1,158 +0,0 @@
-import os
-from groq import Groq
-import duckdb
-import yaml
-import glob
-import json
-
-def get_verified_queries(directory_path):
-    """
-    Reads YAML files from the specified directory, loads the verified SQL queries and their descriptions,
-    and stores them in a dictionary.
-
-    Parameters:
-        directory_path (str): The path to the directory containing the YAML files with verified queries.
-
-    Returns:
-        dict: A dictionary where the keys are the names of the YAML files (without the directory path and file extension)
-              and the values are the parsed content of the YAML files.
-    """
-    verified_queries_yaml_files = glob.glob(os.path.join(directory_path, '*.yaml'))
-    verified_queries_dict = {}
-    for file in verified_queries_yaml_files:
-        with open(file, 'r') as stream:
-            try:
-                file_name = os.path.splitext(os.path.basename(file))[0]
-                verified_queries_dict[file_name] = yaml.safe_load(stream)
-            except yaml.YAMLError as exc:
-                continue
-        
-    return verified_queries_dict
-
-
-def execute_duckdb_query_function_calling(query_name,verified_queries_dict):
-    """
-    Executes a SQL query from the verified queries dictionary using DuckDB and returns the result as a DataFrame.
-
-    Parameters:
-        query_name (str): The name of the query to be executed, corresponding to a key in the verified queries dictionary.
-        verified_queries_dict (dict): A dictionary containing verified queries, where the keys are query names and the values
-                                      are dictionaries with query details including the SQL statement.
-
-    Returns:
-        pandas.DataFrame: The result of the executed query as a DataFrame.
-    """
-    
-    original_cwd = os.getcwd()
-    os.chdir('data')
-
-    query = verified_queries_dict[query_name]['sql']
-    
-    try:
-        conn = duckdb.connect(database=':memory:', read_only=False)
-        query_result = conn.execute(query).fetchdf().reset_index(drop=True)
-    finally:
-        os.chdir(original_cwd)
-
-    return query_result
-
-
-model = "llama3-8b-8192"
-
-# Initialize the Groq client
-groq_api_key = os.getenv('GROQ_API_KEY')
-client = Groq(
-    api_key=groq_api_key
-)
-
-directory_path = 'verified-queries/'
-verified_queries_dict = get_verified_queries(directory_path)
-
-# Display the title and introduction of the application
-multiline_text = """
-Welcome! Ask questions about employee data or purchase details, like "Show the 5 most recent purchases" or "What was the most expensive purchase?". The app matches your question to pre-verified SQL queries for accurate results.
-"""
-
-print(multiline_text)
-
-    
-while True:
-    # Get user input from the console
-    user_input = input("You: ")
-
-    
-    #Simplify verified_queries_dict to just show query name and description
-    query_description_mapping = {key: subdict['description'] for key, subdict in verified_queries_dict.items()}
-    
-    # Step 1: send the conversation and available functions to the model
-    # Define the messages to be sent to the Groq API
-    messages = [
-        {
-            "role": "system",
-            "content": '''You are a function calling LLM that uses the data extracted from the execute_duckdb_query_function_calling function to answer questions around a DuckDB dataset.
-
-            Extract the query_name parameter from this mapping by finding the one whose description best matches the user's question: 
-            {query_description_mapping}
-            '''.format(query_description_mapping=query_description_mapping)
-        },
-        {
-            "role": "user",
-            "content": user_input,
-        }
-    ]
-
-    # Define the tool (function) to be used by the Groq API
-    tools = [
-        {
-            "type": "function",
-            "function": {
-                "name": "execute_duckdb_query_function_calling",
-                "description": "Executes a verified DuckDB SQL Query",
-                "parameters": {
-                    "type": "object",
-                    "properties": {
-                        "query_name": {
-                            "type": "string",
-                            "description": "The name of the verified query (i.e. 'most-recent-purchases')",
-                        }
-                    },
-                    "required": ["query_name"],
-                },
-            },
-        }
-    ]
-
-    # Send the conversation and available functions to the Groq API
-    response = client.chat.completions.create(
-        model=model,
-        messages=messages,
-        tools=tools,
-        tool_choice="auto",  
-        max_tokens=4096
-    )
-
-    # Extract the response message and any tool calls from the response
-    response_message = response.choices[0].message
-    tool_calls = response_message.tool_calls
-
-    # Define a dictionary of available functions
-    available_functions = {
-        "execute_duckdb_query_function_calling": execute_duckdb_query_function_calling,
-    }
-
-    # Iterate over the tool calls in the response
-    for tool_call in tool_calls:
-        function_name = tool_call.function.name  # Get the function name
-        function_to_call = available_functions[function_name]  # Get the function to call
-        function_args = json.loads(tool_call.function.arguments)  # Parse the function arguments
-        print('Query found: ', function_args.get("query_name"))
-        
-        # Call the function with the provided arguments
-        function_response = function_to_call(
-            query_name=function_args.get("query_name"),
-            verified_queries_dict=verified_queries_dict
-        )
-
-    # Print the function response (query result)
-    print(function_response)
-
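Review note: the removed loop assumes the model always returns at least one tool call. If `tool_calls` is `None`, the `for` statement raises a `TypeError`. If it is an empty list, `function_response` is never assigned and the final `print` raises a `NameError`. A minimal sketch of a more defensive dispatcher follows. The helper name `run_tool_calls` and the stand-in objects are hypothetical and not part of the removed file. The fake tool call only mimics the attribute shape (`function.name`, `function.arguments`) that the removed code reads.

```python
import json
from types import SimpleNamespace


def run_tool_calls(tool_calls, available_functions, **extra_kwargs):
    """Run each requested tool call and return a list of (name, result) pairs.

    Returns an empty list when the model requested no tool call at all.
    """
    results = []
    for tool_call in tool_calls or []:  # tolerate None as well as an empty list
        name = tool_call.function.name
        func = available_functions.get(name)
        if func is None:
            # The model named a function that was never offered; skip it.
            continue
        args = json.loads(tool_call.function.arguments)
        results.append((name, func(**args, **extra_kwargs)))
    return results


# Hypothetical stand-ins so the sketch runs without the Groq client or DuckDB.
def fake_execute(query_name, verified_queries_dict):
    return verified_queries_dict[query_name]["sql"]


fake_call = SimpleNamespace(
    function=SimpleNamespace(
        name="execute_duckdb_query_function_calling",
        arguments='{"query_name": "number-of-teslas"}',
    )
)
functions = {"execute_duckdb_query_function_calling": fake_execute}
queries = {"number-of-teslas": {"sql": "SELECT 1"}}

no_calls = run_tool_calls(None, functions, verified_queries_dict=queries)
one_call = run_tool_calls([fake_call], functions, verified_queries_dict=queries)
```

With this shape the caller can print a fallback message when the returned list is empty instead of crashing.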

+ 0 - 9
recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/requirements.txt

@@ -1,9 +0,0 @@
-groq
-sentence-transformers
-langchain_community
-scikit-learn
-numpy
-duckdb
-pyyaml
-sqlparse
-tabulate

+ 0 - 7
recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/verified-queries/employees-without-purchases.yaml

@@ -1,7 +0,0 @@
-description: Employees without a purchase since Feb 1, 2024
-sql: |
-  SELECT employees.name as employees_without_purchases
-  FROM employees.csv AS employees
-  LEFT JOIN purchases.csv AS purchases ON employees.employee_id = purchases.employee_id
-  AND purchases.purchase_date > '2024-02-01'
-  WHERE purchases.purchase_id IS NULL
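Review note: the removed query is an anti-join. The date filter sits in the `ON` clause, so employees with no purchase after the cutoff still appear once with NULL purchase columns, and `WHERE ... IS NULL` keeps exactly those rows. The sketch below reproduces the pattern with Python's built-in `sqlite3` instead of DuckDB and uses made-up rows. The table contents and employee names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (employee_id INTEGER, name TEXT);
    CREATE TABLE purchases (purchase_id INTEGER, employee_id INTEGER, purchase_date TEXT);
    INSERT INTO employees VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO purchases VALUES (10, 1, '2024-03-05'), (11, 2, '2023-12-20');
    """
)

# Date filter in the ON clause: unmatched employees survive with NULL purchase columns.
anti_join = """
    SELECT e.name
    FROM employees AS e
    LEFT JOIN purchases AS p
      ON e.employee_id = p.employee_id
     AND p.purchase_date > '2024-02-01'
    WHERE p.purchase_id IS NULL
    ORDER BY e.name
"""

# Same filter moved to WHERE: the two conditions can never both hold, so no rows match.
filter_in_where = """
    SELECT e.name
    FROM employees AS e
    LEFT JOIN purchases AS p
      ON e.employee_id = p.employee_id
    WHERE p.purchase_date > '2024-02-01'
      AND p.purchase_id IS NULL
    ORDER BY e.name
"""

without_recent = [row[0] for row in conn.execute(anti_join)]
broken_variant = [row[0] for row in conn.execute(filter_in_where)]
```

Here Ana has a purchase after the cutoff, Ben only has an older one, and Cy has none, so the anti-join returns Ben and Cy while the variant with the filter in `WHERE` returns nothing.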

+ 0 - 9
recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/verified-queries/most-expensive-purchase.yaml

@@ -1,9 +0,0 @@
-description: Employee with the most expensive purchase
-sql: |
-  SELECT employees.name AS employee_name,
-        MAX(amount) AS max_purchase_amount
-  FROM purchases.csv AS purchases
-  JOIN employees.csv AS employees ON purchases.employee_id = employees.employee_id
-  GROUP BY employees.name
-  ORDER BY max_purchase_amount DESC
-  LIMIT 1

+ 0 - 9
recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/verified-queries/most-recent-purchases.yaml

@@ -1,9 +0,0 @@
-description: Five most recent purchases
-sql: |
-  SELECT purchases.product_name,
-         purchases.amount,
-         employees.name
-  FROM purchases.csv AS purchases
-  JOIN employees.csv AS employees ON purchases.employee_id = employees.employee_id
-  ORDER BY purchases.purchase_date DESC
-  LIMIT 5;

+ 0 - 6
recipes/3p_integrations/groq/groq-example-templates/verified-sql-function-calling/verified-queries/number-of-teslas.yaml

@@ -1,6 +0,0 @@
-description: Number of Teslas purchased
-sql: |
-  SELECT COUNT(*) as number_of_teslas
-  FROM purchases.csv AS p
-  JOIN employees.csv AS e ON e.employee_id = p.employee_id
-  WHERE p.product_name = 'Tesla'

File changes are not shown because the diff is too large
+ 0 - 1708
recipes/3p_integrations/groq/llama3_cookbook_groq.ipynb


+ 0 - 26
recipes/3p_integrations/lamini/text2sql_memory_tuning/README.md

@@ -1,26 +0,0 @@
-# Tune Llama 3 for text-to-SQL and improve accuracy from 30% to 95%
-
-This repo and notebook `meta_lamini.ipynb` demonstrate how to tune Llama 3 to generate valid SQL queries and improve accuracy from 30% to 95%.
-
-In this notebook we'll be using Lamini, and more specifically, Lamini Memory Tuning.
-
-Lamini is an integrated platform for LLM inference and tuning for the enterprise. Lamini Memory Tuning is a new tool you can use to embed facts into LLMs that improves factual accuracy and reduces hallucinations. Inspired by information retrieval, this method has set a new standard of accuracy for LLMs with less developer effort.
-
-Learn more about Lamini Memory Tuning: https://www.lamini.ai/blog/lamini-memory-tuning
-
-Please head over to https://app.lamini.ai/account to get your free api key.
-
-You can authenticate by writing the following to a file `~/.lamini/configure.yaml`
-
-```
-production:
-    key: <YOUR-LAMINI-API-KEY>
-```
-
-This tuning tutorial uses the `nba_roster` sqlite database to tune a Llama 3 model.
-
-## Additional resources
-
-▫️ Fortune 500 case study: http://www.lamini.ai/blog/llm-text-to-sql <br>
-▫️ Technical paper: https://github.com/lamini-ai/Lamini-Memory-Tuning/blob/main/research-paper.pdf <br>
-▫️ Model weights: https://huggingface.co/engineering-lamini/lamini-1-random
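Review note: the removed README asks the reader to create `~/.lamini/configure.yaml` by hand. A minimal sketch of doing the same from Python follows. The helper name `write_lamini_config` is hypothetical, and the demo writes into a temporary directory so the real home directory is left alone. The file layout is copied from the README snippet; nothing here calls the Lamini library itself.

```python
import tempfile
from pathlib import Path


def write_lamini_config(api_key: str, config_path: Path) -> Path:
    """Write the two-line config shown in the README to config_path."""
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(f"production:\n    key: {api_key}\n")
    return config_path


# Demo against a throwaway directory; the key is the README's placeholder.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = write_lamini_config(
        "<YOUR-LAMINI-API-KEY>", Path(tmp) / ".lamini" / "configure.yaml"
    )
    demo_text = demo_path.read_text()
```

To target the location named in the README, pass `Path.home() / ".lamini" / "configure.yaml"` as `config_path`.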

BIN
recipes/3p_integrations/lamini/text2sql_memory_tuning/assets/manual_filtering.png


BIN
recipes/3p_integrations/lamini/text2sql_memory_tuning/assets/website.png


File changes are not shown because the diff is too large
+ 0 - 40
recipes/3p_integrations/lamini/text2sql_memory_tuning/data/gold-test-set-v2.jsonl


+ 0 - 20
recipes/3p_integrations/lamini/text2sql_memory_tuning/data/gold-test-set.jsonl

@@ -1,20 +0,0 @@
-{"question": "What is the 99th percentile salary in the NBA?", "answer": "46741590", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*99/100-1;"}
-{"question": "What is the 75th percentile salary in the NBA?", "answer": "13932008", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*75/100-1;"}
-{"question": "What is the 25th percentile salary in the NBA?", "answer": "2413304", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*25/100-1;"}
-{"question": "What is the median weight in the NBA?", "answer": "215", "sql": "select CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What is the average weight in the NBA?", "answer": "214.98", "sql": "SELECT AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER)) FROM nba_roster;"}
-{"question": "What is the median height in the NBA?", "answer": "6.58333333333333", "sql": "select CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What is the average height in the NBA?", "answer": "6.54986111111111", "sql": "select AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster;"}
-{"question": "Can you tell me how many players are in the NBA?", "answer": "600", "sql": "select count(*) from nba_roster;"}
-{"question": "Would you please let me know what the highest paid players are for each position?", "answer": "The highest paid players are Nikola Jokic (C), Paul George (F), Norman Powell (G), Kevin Durant (PF), Stephen Curry (PG), LeBron James (SF), Bradley Beal (SG).", "sql": "SELECT name, pos, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS;"}
-{"question": "Is Jalen Johnson 23 years old?", "answer": "No, Jalen Johnson is 21 years old", "sql" : "Select name, age from nba_roster where name='Jalen Johnson';"}
-{"question": "Who is the oldest player on the Brooklyn Nets?", "answer": "Spencer Dinwiddie, Dorian Finney-Smith, Royce O'Neale", "sql" : "SELECT NAME FROM nba_roster WHERE TEAM = 'Brooklyn Nets' AND AGE = (SELECT MAX(AGE) FROM nba_roster WHERE TEAM = 'Brooklyn Nets');"}
-{"question": "Who has the higest salary on the Memphis Grizzlies?", "answer": "Ja Morant", "sql" : "select salary, name from nba_roster where team='Memphis Grizzlies' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which player has the higest salary on the Cleveland Cavaliers?", "answer": "Darius Garland", "sql" : "select salary, name from nba_roster where team='Cleveland Cavaliers' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Who is the highest paid center on the Dallas Mavericks?", "answer": "Dereck Lively II", "sql" : "select salary, name from nba_roster where team='Dallas Mavericks' and POS='C' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "How much is Marcus Smart getting paid?", "answer": "$18,833,712", "sql" : "select salary from nba_roster where name='Marcus Smart';"}
-{"question": "What's the average age of the Trail Blazers?", "answer": "24", "sql" : "select avg(age) from nba_roster where team='Portland Trail Blazers';"}
-{"question": "What's the median age of the NBA?", "answer": "25", "sql" : "select CAST(AGE as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What's the median age of the Miami Heat?", "answer": "26", "sql" : "select CAST(AGE as INTEGER) as percentile from nba_roster where team='Miami Heat' order by percentile limit 1 offset (select count(*) from nba_roster where team='Miami Heat')/2;"}
-{"question": "What are the 5 teams with the oldest average age in the NBA", "answer": "Golden State Warriors, Milwaukee Bucks, Miami Heat, LA Clippers, Phoenix Suns", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC LIMIT 5;"}
-{"question": "What is the average salary of Power Forward players in the NBA", "answer": "$10948045", "sql": "select avg(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary from nba_roster where POS = 'PF';"}
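Review note: several gold queries above compute a percentile by stripping `$` and `,` from the text `SALARY` column, sorting, and then skipping `count * pct / 100 - 1` rows with `LIMIT 1 OFFSET (...)`. The sketch below runs that same pattern with Python's built-in `sqlite3` on a tiny made-up table. The four salaries and the helper name `percentile_salary` are assumptions for illustration, not values from the `nba_roster` database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nba_roster (NAME TEXT, SALARY TEXT)")
conn.executemany(
    "INSERT INTO nba_roster VALUES (?, ?)",
    [
        ("P1", "$1,000,000"),
        ("P2", "$2,000,000"),
        ("P3", "$3,000,000"),
        ("P4", "$4,000,000"),
        ("P5", "--"),  # unknown salary, excluded by the WHERE clause
    ],
)


def percentile_salary(connection, pct):
    """Return the salary at the given integer percentile using LIMIT/OFFSET."""
    query = """
        SELECT CAST(REPLACE(REPLACE(SALARY, '$', ''), ',', '') AS INTEGER) AS percentile
        FROM nba_roster
        WHERE SALARY != '--'
        ORDER BY percentile
        LIMIT 1 OFFSET (SELECT COUNT(*) FROM nba_roster WHERE SALARY != '--') * ? / 100 - 1
    """
    return connection.execute(query, (pct,)).fetchone()[0]


p75 = percentile_salary(conn, 75)
p50 = percentile_salary(conn, 50)
```

With four known salaries, the 75th percentile offset is `4 * 75 / 100 - 1 = 2`, which selects the third-smallest value. The offset uses integer division, so on small tables the result moves in coarse steps.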

+ 0 - 220
recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/archive/generated_queries_large_filtered_cleaned.jsonl

@@ -1,220 +0,0 @@
-{"question": "What is the average height of NBA players who are 25 years old or older", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) AS average_height FROM nba_roster WHERE CAST(AGE AS INTEGER) >= 25;"}
-{"question": "Which team has the most players who attended the University of Michigan", "sql": "SELECT team, COUNT(*) AS num_players FROM nba_roster WHERE COLLEGE = 'Michigan' GROUP BY team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What is the most common position in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the average age of all players in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster;"}
-{"question": "What position has the most players aged 30 or older in the NBA", "sql": "SELECT POS, COUNT(*) AS count FROM nba_roster WHERE AGE >= 30 GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the age of the oldest 25% of the players in the NBA", "sql": "SELECT CAST(AGE AS INTEGER) AS percentile FROM nba_roster ORDER BY percentile LIMIT 1 OFFSET (SELECT COUNT(*) FROM nba_roster) * 75/100 - 1;"}
-{"question": "What is the average age of players at each position in the NBA", "sql": "SELECT POS, AVG(AGE) AS avg_age FROM nba_roster GROUP BY POS;"}
-{"question": "What is the position with the highest average salary in the NBA", "sql": "SELECT POS, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS avg_salary FROM nba_roster GROUP BY POS ORDER BY avg_salary DESC LIMIT 1;"}
-{"question": "What is the average age of the youngest players in the NBA", "sql": "SELECT AVG(AGE) as avg_age FROM nba_roster WHERE AGE <= 25;"}
-{"question": "What is the team with the highest average salary in the NBA", "sql": "SELECT TEAM, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY TEAM ORDER BY avg_salary DESC LIMIT 1;"}
-{"question": "Who are the top 5 most valuable players in the NBA, considering both their salary and jersey number", "sql": "SELECT name, (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) + CAST(Jersey AS INTEGER)) AS total_value, POS FROM nba_roster WHERE SALARY!= '--' ORDER BY total_value DESC LIMIT 5;"}
-{"question": "Which three teams in the NBA have the highest average salary", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY avg_salary DESC LIMIT 3;"}
-{"question": "How many players in the NBA are more than 5 years older than the average age of all players", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE - (SELECT AVG(AGE) FROM nba_roster) > 5;"}
-{"question": "What is the position with the oldest average age in the NBA", "sql": "SELECT POS, AVG(AGE) as avg_age FROM nba_roster GROUP BY POS ORDER BY avg_age DESC LIMIT 1;"}
-{"question": "Which 10 teams in the NBA have the oldest average age among their players", "sql": "SELECT Team, AVG(AGE) AS avg_age FROM nba_roster GROUP BY Team ORDER BY avg_age DESC LIMIT 10;"}
-{"question": "Who is the tallest player in the NBA", "sql": "SELECT NAME, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height FROM nba_roster GROUP BY NAME ORDER BY height DESC LIMIT 1;"}
-{"question": "Who are the top 5 highest-paid players in the NBA", "sql": "SELECT NAME, SALARY FROM nba_roster WHERE SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 5;"}
-{"question": "How many players in the NBA are older than 10 years old", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE AGE > 10;"}
-{"question": "What are the top 3 colleges with the highest average salaries for their NBA players", "sql": "SELECT COLLEGE, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY average_salary DESC LIMIT 3;"}
-{"question": "What is the 75th percentile salary in the NBA", "sql": "SELECT (SELECT CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) as percentile FROM nba_roster WHERE SALARY!= '--' ORDER BY percentile ASC LIMIT 1 OFFSET (SELECT COUNT(*) FROM nba_roster WHERE SALARY!= '--')*75/100-1) as seventy_fifth_percentile_salary;"}
-{"question": "What is the average age of players on each NBA team", "sql": "SELECT TEAM, AVG(AGE) as average_age FROM nba_roster GROUP BY TEAM ORDER BY average_age;"}
-{"question": "What is the average age of the players on the Toronto Raptors", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE team='Toronto Raptors';"}
-{"question": "What is the age range of players on each team in the NBA", "sql": "SELECT team, MIN(AGE) as youngest_player, MAX(AGE) as oldest_player FROM nba_roster GROUP BY team;"}
-{"question": "What are the min and max salaries for each team", "sql": "SELECT MIN(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as min_salary, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary, team FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY min_salary DESC, max_salary DESC;"}
-{"question": "What is the name of the player who attended the college with the longest name", "sql": "SELECT NAME, COLLEGE FROM nba_roster WHERE COLLEGE!= '--' ORDER BY LENGTH(COLLEGE) DESC LIMIT 1;"}
-{"question": "What is the number of players on each team in the NBA", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster GROUP BY Team;"}
-{"question": "What is the most represented college in the NBA", "sql": "SELECT COLLEGE, COUNT(*) AS frequency FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY frequency DESC LIMIT 1;"}
-{"question": "How many Boston Celtics players did not attend college", "sql": "SELECT COUNT(*) as count FROM nba_roster WHERE team='Boston Celtics' AND COLLEGE!='--';"}
-{"question": "What is the team with the highest average age in the NBA", "sql": "SELECT AVG(AGE) as average_age, TEAM FROM nba_roster GROUP BY TEAM ORDER BY average_age DESC LIMIT 1;"}
-{"question": "What is the average salary of all players in the NBA, excluding those with a salary of '--'", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What is the average salary for players of each age group in the NBA, excluding those with unknown salaries", "sql": "SELECT AVG(AGE) AS avg_age, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY AGE ORDER BY avg_age;"}
-{"question": "Who is the player with the highest jersey number in the NBA", "sql": "SELECT NAME, JERSEY FROM nba_roster WHERE JERSEY!= 'NA' ORDER BY JERSEY DESC LIMIT 1;"}
-{"question": "What is the number of players on the Toronto Raptors", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE team='Toronto Raptors';"}
-{"question": "What is the average age of all NBA players with a known salary", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What is the position with the highest average age among players between the ages of 22 and 25", "sql": "SELECT AVG(AGE) AS avg_age, POS FROM nba_roster WHERE AGE BETWEEN 22 AND 25 GROUP BY POS ORDER BY avg_age DESC LIMIT 1;"}
-{"question": "What are the top 5 positions in the NBA with the highest average salary", "sql": "SELECT POS, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster GROUP BY POS ORDER BY avg_salary DESC LIMIT 5;"}
-{"question": "What are the top 5 highest-paid players in the NBA", "sql": "SELECT * FROM nba_roster ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 5;"}
-{"question": "Which player has the highest average salary in the NBA", "sql": "SELECT name, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY name ORDER BY average_salary DESC LIMIT 1;"}
-{"question": "Which team has the tallest players on average", "sql": "SELECT TEAM, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as average_height FROM nba_roster GROUP BY TEAM ORDER BY average_height DESC LIMIT 1;"}
-{"question": "Who is the highest-paid player in the NBA who has attended a college with an unknown college affiliation", "sql": "SELECT NAME FROM nba_roster WHERE SALARY!= '--' AND COLLEGE = '--' ORDER BY CAST(SUBSTR(SALARY, 2) as INTEGER) DESC LIMIT 1;"}
-{"question": "What is the average age and salary for each position in the NBA", "sql": "SELECT POS, AVG(AGE) as avg_age, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster GROUP BY POS;"}
-{"question": "What is the number of unique colleges represented in the NBA", "sql": "SELECT COUNT(DISTINCT COLLEGE) FROM nba_roster WHERE COLLEGE!= '--';"}
-{"question": "Which team has the oldest average age among all NBA teams", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC LIMIT 1;"}
-{"question": "What is the highest-paid player on the Los Angeles Lakers", "sql": "SELECT salary, name FROM nba_roster WHERE team='Los Angeles Lakers' AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',', '') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which NBA team has the most players from the University of Michigan", "sql": "SELECT team, COUNT(*) AS num_players FROM nba_roster WHERE COLLEGE='Michigan' GROUP BY team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What are the most common positions in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS ORDER BY count DESC;"}
-{"question": "What are the top 5 teams with the highest average salary in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster GROUP BY team ORDER BY average_salary DESC LIMIT 5;"}
-{"question": "How many NBA players attended a college other than '--'", "sql": "SELECT COUNT(*) FROM nba_roster WHERE COLLEGE!= '--';"}
-{"question": "Who is the highest-paid player on the Memphis Grizzlies", "sql": "select name, team, salary from nba_roster where team='Memphis Grizzlies' and SALARY!='--' order by CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) desc limit 1;"}
-{"question": "Which team has the highest average salary", "sql": "SELECT Team, AVG(CAST(SUBSTR(SALARY, 2, LENGTH(SALARY)-2) AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY Team ORDER BY average_salary DESC LIMIT 1;"}
-{"question": "What college has the highest average age of its alumni in the NBA", "sql": "SELECT NAME, AVG(AGE) as average_age FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY average_age DESC LIMIT 1;"}
-{"question": "Who is the highest-paid player in the NBA who has attended college", "sql": "SELECT NAME FROM nba_roster WHERE COLLEGE!= '--' AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Who is the highest-paid player in the NBA who is older than 25 years old", "sql": "SELECT name, salary FROM nba_roster WHERE AGE > 25 AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the average salary for each age group in the NBA", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary, AGE as age_group FROM nba_roster GROUP BY AGE;"}
-{"question": "What is the most common age and position combination in the NBA", "sql": "SELECT AGE, POS, COUNT(*) AS count FROM nba_roster GROUP BY AGE, POS ORDER BY count DESC;"}
-{"question": "Who are the top 5 players with the highest jersey numbers in the NBA", "sql": "SELECT NAME, Jersey FROM nba_roster WHERE Jersey IN (SELECT Jersey FROM nba_roster ORDER BY CAST(CAST(Jersey AS INTEGER) AS INTEGER) DESC LIMIT 5);"}
-{"question": "What is the average height of players in the NBA who are 25 years old or younger", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)) AS avg_height FROM nba_roster WHERE AGE <= 25;"}
-{"question": "What are the top 5 highest-paid players in each position in the NBA", "sql": "WITH ranked_positions AS (SELECT *, DENSE_RANK() OVER (PARTITION BY POS ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC) AS rank FROM nba_roster) SELECT * FROM ranked_positions WHERE rank <= 5;"}
-{"question": "How many players in the NBA are older than 25 years old", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE > 25;"}
-{"question": "What is the most common position for players under the age of 25 in the NBA", "sql": "SELECT POS, COUNT(*) AS count FROM nba_roster WHERE AGE <= 25 GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "Who is the player with the highest jersey number on the Golden State Warriors", "sql": "SELECT NAME FROM nba_roster WHERE TEAM = 'Golden State Warriors' AND CAST(Jersey AS INTEGER) = (SELECT MAX(CAST(Jersey AS INTEGER)) FROM nba_roster WHERE TEAM = 'Golden State Warriors');"}
-{"question": "Which five teams in the NBA have the largest rosters", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster GROUP BY Team ORDER BY num_players DESC LIMIT 5;"}
-{"question": "What is the average salary for each position in the NBA, and which position has the highest average salary", "sql": "SELECT POS, AVG(CAST(SUBSTR(SALARY, 2) AS INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS ORDER BY avg_salary DESC;"}
-{"question": "Which team has the highest average salary in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY average_salary DESC LIMIT 1;"}
-{"question": "Who is the oldest player in the NBA, on average, among those with known salaries", "sql": "SELECT NAME, AVG(AGE) as avg_age FROM nba_roster WHERE SALARY!= '--' GROUP BY NAME ORDER BY avg_age DESC LIMIT 1;"}
-{"question": "What is the total salary of all players in the NBA who are 25 years old or younger", "sql": "SELECT SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE AGE <= 25;"}
-{"question": "Who is the second-highest paid player on the Memphis Grizzlies", "sql": "select name, team, salary from nba_roster where team='Memphis Grizzlies' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1 OFFSET 1;"}
-{"question": "Who are the top 3 highest-paid players in the NBA", "sql": "SELECT * FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC) as row_num FROM nba_roster WHERE SALARY!= '--') AS subquery WHERE row_num <= 3;"}
-{"question": "What is the average age of players for each team in the NBA", "sql": "SELECT team, AVG(AGE) AS avg_age FROM nba_roster GROUP BY team;"}
-{"question": "How many Boston Celtics players have a salary greater than $5,000,000", "sql": "SELECT COUNT(*) as count FROM nba_roster WHERE team='Boston Celtics' AND CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) > 5000000;"}
-{"question": "What is the average age of the players in the NBA roster", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster;"}
-{"question": "Who are the top 3 highest-paid players at each position in the NBA", "sql": "WITH ranked_positions AS (SELECT *, DENSE_RANK() OVER (PARTITION BY POS ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC) AS rank FROM nba_roster) SELECT * FROM ranked_positions WHERE rank <= 3;"}
-{"question": "Who is the oldest player on the Toronto Raptors", "sql": "SELECT name, age FROM nba_roster WHERE team='Toronto Raptors' ORDER BY age DESC LIMIT 1;"}
-{"question": "Which team has the oldest average age in the NBA", "sql": "SELECT Team, AVG(AGE) AS Average_Age FROM nba_roster GROUP BY Team ORDER BY Average_Age DESC LIMIT 1;"}
-{"question": "What are the positions with the most players under the age of 25", "sql": "SELECT pos, COUNT(*) as num_players FROM nba_roster WHERE age < 25 GROUP BY pos;"}
-{"question": "Who are the top 3 players in the NBA roster with the highest jersey numbers", "sql": "SELECT NAME, JERSEY FROM nba_roster ORDER BY JERSEY DESC LIMIT 3;"}
-{"question": "What is the average height of the youngest players in the NBA", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height FROM nba_roster WHERE age <= 25;"}
-{"question": "What is the oldest player in the NBA", "sql": "SELECT NAME FROM nba_roster WHERE AGE = (SELECT MAX(AGE) FROM nba_roster);"}
-{"question": "What are the top 5 teams with the highest average salaries in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY average_salary DESC LIMIT 5;"}
-{"question": "What is the highest-paid player on the same team as a Toronto Raptors player", "sql": "SELECT name, team, salary FROM nba_roster WHERE team IN (SELECT team FROM nba_roster WHERE name IN (SELECT name FROM nba_roster WHERE team='Toronto Raptors')) ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which teams have the most young players in the NBA", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE AGE < 25 GROUP BY team order by num_players desc;"}
-{"question": "What is the position with the most players in the age range of 22-25 in the NBA", "sql": "SELECT POS, COUNT(*) AS count FROM nba_roster WHERE AGE BETWEEN 22 AND 25 GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the number of players in the NBA who are older than the average age of all players in the league", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE > (SELECT AVG(AGE) FROM nba_roster);"}
-{"question": "What is the most common position for young players in the NBA", "sql": "SELECT POS, COUNT(*) AS count FROM nba_roster WHERE AGE BETWEEN 22 AND 25 GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "Which three teams in the NBA have the largest rosters", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster GROUP BY Team ORDER BY num_players DESC LIMIT 3;"}
-{"question": "What are the top 5 teams with the oldest average age of players", "sql": "SELECT Team, AVG(AGE) as average_age FROM nba_roster GROUP BY Team ORDER BY average_age DESC LIMIT 5;"}
-{"question": "What age group has the most players in the NBA", "sql": "SELECT AGE, COUNT(*) as count FROM nba_roster GROUP BY AGE ORDER BY count DESC;"}
-{"question": "What is the average age of players in each position in the NBA", "sql": "SELECT AVG(AGE) AS avg_age, POS FROM nba_roster GROUP BY POS ORDER BY avg_age;"}
-{"question": "What are the top 3 highest-paid players from Duke University", "sql": "SELECT name, salary FROM nba_roster WHERE COLLEGE = 'Duke' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 3;"}
-{"question": "Which team has the most non-point guards", "sql": "SELECT name, team FROM nba_roster WHERE team IN (SELECT team FROM nba_roster WHERE POS='PG' GROUP BY team HAVING COUNT(*) > 5 ORDER BY COUNT(*) DESC LIMIT 1) AND POS!= 'PG';"}
-{"question": "Who is the player with the highest jersey number on the Boston Celtics", "sql": "SELECT NAME FROM nba_roster WHERE team='Boston Celtics' AND CAST(Jersey AS INTEGER) = (SELECT MAX(CAST(Jersey AS INTEGER)) FROM nba_roster WHERE team='Boston Celtics');"}
-{"question": "Which teams have the most players aged 25 or older", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE age >= 25 GROUP BY team;"}
-{"question": "How many players in the NBA are older than 20 years old", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE age + (2022 - 2000) > 10;"}
-{"question": "What is the average age and height of Power Forward players in the NBA", "sql": "SELECT AVG(CAST(SUBSTR(AGE, 1, INSTR(AGE,' ') - 1) AS INTEGER)) as average_age, AVG(CAST(SUBSTR(AGE, INSTR(AGE,' ') + 1) AS FLOAT)) as average_height FROM nba_roster WHERE POS = 'PF';"}
-{"question": "Which team has the most players under the age of 36", "sql": "SELECT team, COUNT(*) FROM nba_roster WHERE AGE < 3*12 GROUP BY team ORDER BY COUNT(*) DESC LIMIT 1;"}
-{"question": "What is the number of players under the age of 25 with known heights for each position in the NBA", "sql": "SELECT pos, COUNT(*) as num_players FROM nba_roster WHERE AGE < 25 AND HT!= 'NA' GROUP BY pos;"}
-{"question": "What is the average salary of NBA players 25 years old or younger", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE AGE <= 25 AND SALARY!= '--';"}
-{"question": "What is the most popular position in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE POS!= 'NA' GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What position has the most players earning a salary above the average salary in the NBA", "sql": "SELECT POS, COUNT(*) AS count FROM nba_roster WHERE CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) > (SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) FROM nba_roster) GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "Which three teams in the NBA have the highest average salaries", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY average_salary DESC LIMIT 3;"}
-{"question": "Which five colleges have produced the most players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 5;"}
-{"question": "Which teams have the most players who are at least 5 years older than the youngest player in the league", "sql": "SELECT team, COUNT(*) AS num_players FROM nba_roster WHERE age - (SELECT MIN(age) FROM nba_roster) > 5 GROUP BY team ORDER BY num_players DESC;"}
-{"question": "Who are the Boston Celtics players aged 25 or older, listed in order of their jersey number", "sql": "SELECT name FROM nba_roster WHERE team='Boston Celtics' AND age>=25 ORDER BY CAST(Jersey AS INTEGER) ASC;"}
-{"question": "What is the average salary of all NBA players who are 25 years or older", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster WHERE AGE >= 25;"}
-{"question": "What is the highest-paid player on the Cleveland Cavaliers", "sql": "SELECT salary, name FROM nba_roster WHERE team='Cleveland Cavaliers' AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the highest-paid player on the Toronto Raptors", "sql": "SELECT name, salary FROM nba_roster WHERE team='Toronto Raptors' AND salary!= '--' ORDER BY CAST(REPLACE(REPLACE(salary, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the highest average salary for each position in the NBA", "sql": "SELECT POS, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS max_salary FROM nba_roster GROUP BY POS;"}
-{"question": "What is the average salary of all NBA players, excluding those with unknown salaries", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "Who is the youngest player on the Toronto Raptors", "sql": "SELECT NAME FROM nba_roster WHERE AGE = (SELECT MIN(AGE) FROM nba_roster WHERE TEAM = 'Toronto Raptors');"}
-{"question": "What is the height of the 75th percentile of NBA players", "sql": "SELECT CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 as percentile from nba_roster order by percentile limit 1 offset (SELECT COUNT(*) FROM nba_roster)*0.75;"}
-{"question": "Who are the top 5 players in the NBA with the highest total value, considering both their salary and jersey number", "sql": "SELECT name, (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) + CAST(Jersey AS INTEGER)) as total_value, POS FROM nba_roster WHERE SALARY!= '--' AND Jersey!= 'NA' ORDER BY total_value DESC LIMIT 5;"}
-{"question": "Which colleges have more than one player in the NBA", "sql": "SELECT COLLEGE, COUNT(*) AS num_players FROM nba_roster GROUP BY COLLEGE HAVING COUNT(*) > 1;"}
-{"question": "Who is the highest-paid guard on the Los Angeles Lakers", "sql": "SELECT name FROM nba_roster WHERE team='Los Angeles Lakers' AND POS='G' AND SALARY!='--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which players in the NBA are taller than 6'7", "sql": "SELECT name FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 >= 6.67;"}
-{"question": "What is the average height of all players in the NBA", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as average_height from nba_roster;"}
-{"question": "How many players on the Boston Celtics did not attend college", "sql": "SELECT COUNT(*) FROM nba_roster WHERE team='Boston Celtics' AND COLLEGE!='--';"}
-{"question": "What is the team with the most players 30 or older in the NBA", "sql": "SELECT TEAM, COUNT(*) as num_players FROM nba_roster WHERE AGE >= 30 GROUP BY TEAM ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What are the top 10 most common positions in the NBA", "sql": "SELECT POS, COUNT(*) AS count FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 10;"}
-{"question": "What is the number of players on each team who earn more than $1,000,000 and the total number of players on each team", "sql": "SELECT team, COUNT(*) as num_players, SUM(CASE WHEN CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) > 1000000 THEN 1 ELSE 0 END) as num_players_above_1m FROM nba_roster WHERE SALARY!= '--' GROUP BY team;"}
-{"question": "Who is the player with the highest average salary in the NBA", "sql": "SELECT name, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY name ORDER BY average_salary DESC LIMIT 1;"}
-{"question": "Who are the top 5 players in the NBA in terms of their total value, combining their salary and jersey number", "sql": "SELECT name, (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) + CAST(Jersey AS INTEGER)) as total_value, POS FROM nba_roster WHERE SALARY!= '--' ORDER BY total_value DESC LIMIT 5;"}
-{"question": "How many players are on the Toronto Raptors", "sql": "SELECT COUNT(*) FROM nba_roster WHERE team='Toronto Raptors';"}
-{"question": "Which team has the most players over the age of 30", "sql": "SELECT Team, COUNT(*) as count FROM nba_roster WHERE CAST(AGE as INTEGER) > 30 GROUP BY Team ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the average height of point guards in the NBA", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)) as average_height FROM nba_roster WHERE POS='PG';"}
-{"question": "What is the average salary of players in the NBA who are more than 5 years older than the average age of all players", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE AGE - (SELECT AVG(AGE) FROM nba_roster) > 5 AND SALARY!= '--';"}
-{"question": "Which five teams in the NBA have the most players on their roster", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster GROUP BY Team ORDER BY num_players DESC LIMIT 5;"}
-{"question": "What college has produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) as frequency FROM nba_roster GROUP BY COLLEGE ORDER BY frequency DESC LIMIT 1;"}
-{"question": "What is the number of players in the NBA who are 25 years old or younger", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE AGE <= 25;"}
-{"question": "What are the top 5 players who have played the most seasons in each position in the NBA", "sql": "SELECT pos, name, COUNT(*) as seasons_played FROM nba_roster WHERE SALARY!= '--' GROUP BY pos, name ORDER BY seasons_played DESC LIMIT 5;"}
-{"question": "What are the average salaries for each position in the NBA, and which positions have the highest average salaries", "sql": "SELECT POS, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS avg_salary FROM nba_roster GROUP BY POS ORDER BY avg_salary DESC;"}
-{"question": "Which team has the most players who are significantly older than the average age of all NBA players", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE AGE - (SELECT AVG(AGE) FROM nba_roster) > 5 GROUP BY team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What is the most common height range among NBA players under the age of 25", "sql": "SELECT HT, COUNT(*) as count FROM nba_roster WHERE AGE <= 25 GROUP BY HT ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the breakdown of players by position in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS;"}
-{"question": "What are the top 3 teams in the NBA with the highest average salary", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster GROUP BY team ORDER BY average_salary DESC LIMIT 3;"}
-{"question": "What is the most common age range and position combination among NBA players", "sql": "SELECT age_range, POS, COUNT(*) AS count FROM (SELECT CASE WHEN AGE <= 25 THEN 'Young' WHEN AGE <= 30 THEN 'Established' ELSE 'Veteran' END AS age_range, POS FROM nba_roster) AS subquery GROUP BY age_range, POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the team with the tallest average height in the NBA", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster group by team order by height desc limit 1;"}
-{"question": "What is the number of the player with the highest jersey number in the NBA", "sql": "SELECT NAME, JERSEY FROM nba_roster ORDER BY CAST(JERSEY AS INTEGER) DESC LIMIT 1;"}
-{"question": "How many players in the NBA are older than the sum of their jersey number and age", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE age + CAST(SUBSTR(Jersey, 1, INSTR(Jersey,' ')-1) AS INTEGER) > 5;"}
-{"question": "How many players in the NBA are under the age of 25", "sql": "SELECT COUNT(*) AS under_25 FROM nba_roster WHERE AGE < 25;"}
-{"question": "What are the top 5 teams in the NBA with the highest average salary", "sql": "SELECT Team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS avg_salary FROM nba_roster GROUP BY Team ORDER BY avg_salary DESC LIMIT 5;"}
-{"question": "What is the average age of players who attended the same college as Otto Porter Jr.", "sql": "SELECT COLLEGE, AVG(AGE) AS avg_age FROM nba_roster WHERE COLLEGE IN (SELECT COLLEGE FROM nba_roster WHERE NAME = 'Otto Porter Jr.') GROUP BY COLLEGE;"}
-{"question": "How many players in the NBA are at least 5 years older than the youngest player in the league", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE - (SELECT MIN(AGE) FROM nba_roster) > 5;"}
-{"question": "How many players in the NBA are more than 5 years older than the average age of all players in the league", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE - (SELECT AVG(AGE) FROM nba_roster) > 5;"}
-{"question": "What is the average salary of the Toronto Raptors players", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster WHERE team='Toronto Raptors' AND SALARY!= '--';"}
-{"question": "Who is the highest-paid player on the Los Angeles Lakers who attended college", "sql": "SELECT NAME FROM nba_roster WHERE TEAM = 'Los Angeles Lakers' AND COLLEGE!= '--' AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the position with the most players in the NBA", "sql": "SELECT POS, COUNT(*) AS count FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "Who is the highest-paid player in the NBA, excluding those with unknown salaries", "sql": "SELECT NAME, SALARY FROM nba_roster WHERE SALARY = (SELECT MAX(SALARY) FROM nba_roster WHERE SALARY!= '--');"}
-{"question": "What is the name and jersey number of the player with the highest jersey number in the NBA", "sql": "SELECT NAME, JERSEY FROM nba_roster WHERE JERSEY!= 'NA' ORDER BY CAST(JERSEY AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the average age of the players in the NBA who are at least 6 feet 7 inches tall", "sql": "SELECT AVG(AGE) AS average_age FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 >= 6.67;"}
-{"question": "What is the average age of players in the NBA who have a total of 12 years of experience or less", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE AGE * 12 * 5 <= (SELECT SUM(AGE * 12) FROM nba_roster);"}
-{"question": "Which team has the most players from the University of Michigan", "sql": "SELECT team, COUNT(*) as count FROM nba_roster WHERE COLLEGE='Michigan' GROUP BY team ORDER BY count DESC LIMIT 1;"}
-{"question": "Who is the tallest Power Forward in the NBA", "sql": "SELECT POS, NAME, MAX(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) AS max_height FROM nba_roster WHERE POS='PF';"}
-{"question": "What is the average age for each position in the NBA", "sql": "SELECT pos, AVG(AGE) AS avg_age FROM nba_roster WHERE POS IN ('PG', 'SG', 'SF', 'PF', 'C') GROUP BY pos;"}
-{"question": "How many players are currently on the Toronto Raptors", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE team='Toronto Raptors';"}
-{"question": "Which teams in the NBA have the oldest average age among their players", "sql": "SELECT TEAM, AVG(AGE) as avg_age FROM nba_roster WHERE SALARY!= '--' GROUP BY TEAM ORDER BY avg_age DESC;"}
-{"question": "What is the distribution of players across different positions in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS;"}
-{"question": "How many players in the NBA have been in the league for more than 10 years longer than the average age of all players", "sql": "SELECT COUNT(*) as long_tenured_players FROM nba_roster WHERE AGE > (SELECT AVG(AGE) FROM nba_roster) + 10;"}
-{"question": "What is the average height of all NBA players", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) as INTEGER)) FROM nba_roster;"}
-{"question": "Who is the oldest player from the University of Michigan to have played in the NBA", "sql": "SELECT NAME, MAX(AGE) as oldest FROM nba_roster WHERE COLLEGE='Michigan';"}
-{"question": "What are the most common colleges represented in the NBA, excluding players who did not attend college or did not disclose their college information", "sql": "SELECT COLLEGE, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE;"}
-{"question": "What is the average age of all NBA players who are older than 5 years old", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE AGE > 5;"}
-{"question": "Who are the top 5 oldest Point Guards in the NBA", "sql": "SELECT * FROM nba_roster WHERE POS='PG' AND AGE > 25 ORDER BY AGE DESC LIMIT 5;"}
-{"question": "How many players in the NBA are older than 5 years old", "sql": "SELECT COUNT(*) FROM nba_roster WHERE age > 5;"}
-{"question": "How many players in the NBA have had a longer career than the average player and attended a college other than '--'", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE AGE - (SELECT AVG(AGE) FROM nba_roster) > 5 AND COLLEGE!= '--';"}
-{"question": "What are the top 5 colleges that have produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY num_players DESC LIMIT 5;"}
-{"question": "Who is the highest-paid player on the Boston Celtics who plays either Small Forward or Power Forward", "sql": "SELECT name, salary FROM nba_roster WHERE team='Boston Celtics' AND (POS='SF' OR POS='PF' OR POS='SF/PF') AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the number of players in the NBA who attended a college other than '--'?", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE COLLEGE!= '--';"}
-{"question": "How many young players in the NBA are earning a salary", "sql": "SELECT COUNT(*) as young_players FROM nba_roster WHERE AGE <= 25 AND SALARY!= '--';"}
-{"question": "Who are the top 3 players with the highest total value in the NBA", "sql": "SELECT name, team, (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) + CAST(Jersey AS INTEGER)) AS total_value FROM nba_roster WHERE SALARY!= '--' AND Jersey!= 'NA' ORDER BY total_value DESC LIMIT 3;"}
-{"question": "What is the average age of players on each team in the NBA, excluding those with unknown salaries", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY average_age ASC;"}
-{"question": "Which team has the most players who attended college", "sql": "SELECT TEAM, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY TEAM ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What is the average age and maximum salary for each position in the NBA", "sql": "SELECT pos, AVG(AGE) as avg_age, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS;"}
-{"question": "How many players in the NBA are 25 years old or younger", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE <= 25;"}
-{"question": "What are the top 5 players in the NBA in terms of salary-to-age ratio", "sql": "SELECT NAME, CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) as salary, AGE FROM nba_roster WHERE SALARY!= '--' ORDER BY (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)/AGE) DESC LIMIT 5;"}
-{"question": "What are the top 3 players with the highest jersey numbers who are not Point Guards", "sql": "SELECT jersey, name FROM nba_roster WHERE pos!= 'PG' ORDER BY CAST(Jersey AS INTEGER) DESC LIMIT 3;"}
-{"question": "What is the most common college attended by NBA players", "sql": "SELECT COLLEGE, COUNT(*) AS frequency FROM nba_roster GROUP BY COLLEGE ORDER BY frequency DESC LIMIT 1;"}
-{"question": "Who is the tallest player in the league who plays either point guard, shooting guard, or small forward", "sql": "SELECT NAME, HT FROM nba_roster WHERE POS IN ('PG', 'SG', 'SF') ORDER BY HT DESC LIMIT 1;"}
-{"question": "What is the average salary of NBA players who attended the University of Michigan", "sql": "SELECT COLLEGE, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE COLLEGE='Michigan' GROUP BY COLLEGE;"}
-{"question": "What is the tallest team in the NBA", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')) AS INTEGER)) AS average_height FROM nba_roster GROUP BY team ORDER BY average_height DESC LIMIT 1;"}
-{"question": "What is the average height and age of NBA players", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height, AVG(CAST(AGE AS INTEGER)) as age FROM nba_roster;"}
-{"question": "What positions have more than 5 years of experience compared to the average age of all players in the NBA", "sql": "SELECT POS, COUNT(*) as num_players FROM nba_roster WHERE AGE - (SELECT AVG(AGE) FROM nba_roster) > 5 GROUP BY POS;"}
-{"question": "What is the second-highest paid player in the NBA", "sql": "SELECT name FROM nba_roster WHERE SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1 OFFSET (SELECT COUNT(*) FROM nba_roster WHERE SALARY!= '--') - 1;"}
-{"question": "What is the average age of the youngest power forward in the NBA", "sql": "SELECT AVG(AGE) AS avg_age, CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 AS height FROM nba_roster WHERE POS='PF' GROUP BY height ORDER BY avg_age ASC LIMIT 1;"}
-{"question": "What are the top 5 highest-paid players for each position in the NBA", "sql": "WITH ranked_positions AS (SELECT *, DENSE_RANK() OVER (PARTITION BY POS ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC) as rank FROM nba_roster WHERE SALARY!= '--') SELECT * FROM ranked_positions WHERE rank <= 5;"}
-{"question": "What is the tallest player on each team in the NBA", "sql": "SELECT team, MAX(HT), name as max_height FROM nba_roster WHERE HT!= 'NA' GROUP BY team;"}
-{"question": "What is the position with the oldest players in the NBA", "sql": "SELECT POS, AVG(AGE) AS avg_age FROM nba_roster GROUP BY POS ORDER BY avg_age DESC LIMIT 1;"}
-{"question": "What is the average height of the players in the Boston Celtics", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) AS average_height FROM nba_roster WHERE team='Boston Celtics';"}
-{"question": "How many Los Angeles Lakers players did not attend college", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE team='Los Angeles Lakers' AND COLLEGE!='--';"}
-{"question": "What is the average salary of players on the Toronto Raptors who are 25 years or older", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster WHERE team='Toronto Raptors' AND age>=25 AND SALARY!= '--';"}
-{"question": "Which teams have the most players under the age of 25", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE AGE < 25 GROUP BY team ORDER BY num_players DESC;"}
-{"question": "What are the average height and average salary for each team in the NBA", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as avg_height, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster GROUP BY team;"}
-{"question": "Who are the top 5 players in the NBA with the highest jersey numbers", "sql": "SELECT NAME, JERSEY FROM nba_roster WHERE JERSEY!= 'NA' ORDER BY CAST(JERSEY AS INTEGER) DESC LIMIT 5;"}
-{"question": "Who is the highest-paid player from the University of Michigan in the NBA", "sql": "select name, salary from nba_roster where college='Michigan' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the average age and height of players for each team in the NBA", "sql": "SELECT AVG(AGE) as avg_age, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as avg_height, TEAM FROM nba_roster GROUP BY TEAM;"}
-{"question": "What is the average age and height of players on teams with more than 5 players in the NBA", "sql": "SELECT TEAM, AVG(AGE) as avg_age, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as avg_height FROM nba_roster GROUP BY TEAM HAVING COUNT(*) > 5;"}
-{"question": "What is the average height of NBA players by age group", "sql": "SELECT AGE, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) AS avg_height FROM nba_roster GROUP BY AGE;"}
-{"question": "What is the average age of NBA players who play as Point Guard or Shooting Guard", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE POS = 'PG' OR POS = 'SG' OR POS = 'PG/SG' OR POS = 'SG/PG';"}
-{"question": "Who are the top 4 highest-paid players in the NBA", "sql": "SELECT POS, NAME, CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) as salary FROM nba_roster ORDER BY salary DESC LIMIT 5 OFFSET 0;"}
-{"question": "Who is the highest-paid player on the Boston Celtics who did not attend college", "sql": "SELECT NAME FROM nba_roster WHERE team='Boston Celtics' AND COLLEGE!='--' AND SALARY=(SELECT MAX(SALARY) FROM nba_roster WHERE team='Boston Celtics' AND COLLEGE!='--');"}
-{"question": "What is the average age of players in the NBA who are taller than 6 feet 7 inches", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 > 6.67;"}
-{"question": "What is the 99th percentile salary in the NBA?", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*99/100-1;"}
-{"question": "What is the 75th percentile salary in the NBA?", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*75/100-1;"}
-{"question": "What is the 25th percentile salary in the NBA?", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*25/100-1;"}
-{"question": "What is the median weight in the NBA?", "sql": "select CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What is the average weight in the NBA?", "sql": "SELECT AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER)) FROM nba_roster;"}
-{"question": "What is the median height in the NBA?", "sql": "select CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What is the average height in the NBA?", "sql": "select AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster;"}
-{"question": "Can you tell me how many players are in the NBA?", "sql": "select count(*) from nba_roster;"}
-{"question": "Would you please let me know what the highest paid players are for each position?", "sql": "SELECT name, pos, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS;"}
-{"question": "Is Jalen Johnson 23 years old?", "sql": "Select name, age from nba_roster where name='Jalen Johnson';"}
-{"question": "Who is the oldest player on the Brooklyn Nets?", "sql": "SELECT NAME FROM nba_roster WHERE TEAM = 'Brooklyn Nets' AND AGE = (SELECT MAX(AGE) FROM nba_roster WHERE TEAM = 'Brooklyn Nets');"}
-{"question": "Who has the higest salary on the Memphis Grizzlies?", "sql": "select salary, name from nba_roster where team='Memphis Grizzlies' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which player has the higest salary on the Cleveland Cavaliers?", "sql": "select salary, name from nba_roster where team='Cleveland Cavaliers' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Who is the highest paid center on the Dallas Mavericks?", "sql": "select salary, name from nba_roster where team='Dallas Mavericks' and POS='C' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "How much is Marcus Smart getting paid?", "sql": "select salary from nba_roster where name='Marcus Smart';"}
-{"question": "What's the average age of the Trail Blazers?", "sql": "select avg(age) from nba_roster where team='Portland Trail Blazers';"}
-{"question": "What's the median age of the NBA?", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What's the median age of the Miami Heat?", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster where team='Miami Heat' order by percentile limit 1 offset (select count(*) from nba_roster where team='Miami Heat')/2;"}
-{"question": "What are the 5 teams with the oldest average age in the NBA", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC LIMIT 5;"}
-{"question": "What is the average salary of Power Forward players in the NBA", "sql": "select avg(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary from nba_roster where POS = 'PF';"}

+ 0 - 128
recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/archive/generated_queries_v2_large_filtered_cleaned.jsonl

@@ -1,128 +0,0 @@
-{"question": "How many players are on each team in the NBA", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster GROUP BY team;"}
-{"question": "Who is the tallest player in the NBA roster", "sql": "SELECT name, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height FROM nba_roster GROUP BY name ORDER BY height DESC LIMIT 1;"}
-{"question": "What is the average age of NBA players", "sql": "SELECT AVG(AGE) FROM nba_roster;"}
-{"question": "Who is the heaviest player in the NBA", "sql": "SELECT NAME, WT FROM nba_roster WHERE WT!= 'NA' ORDER BY CAST(SUBSTR(WT, 1, INSTR(WT,' ')-1) AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the total salary of all players in the NBA who are at least 6 feet 7 inches tall", "sql": "SELECT SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 >= 6.67;"}
-{"question": "Which three teams have the most players from a particular college", "sql": "SELECT team, COLLEGE, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY team, COLLEGE ORDER BY num_players DESC LIMIT 3;"}
-{"question": "What is the total salary for each team in the NBA, excluding teams with missing salary data", "sql": "SELECT team, SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS total_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY total_salary DESC;"}
-{"question": "Which team has the most players under the age of 25", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE AGE <= 25 GROUP BY team ORDER BY num_players DESC;"}
-{"question": "What is the average age of players in the NBA who are older than 5 years", "sql": "SELECT AVG(AGE) AS average_age FROM nba_roster WHERE AGE * 12 > 60;"}
-{"question": "What team pays its players the most, on average", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY average_salary DESC LIMIT 1;"}
-{"question": "Who is the highest paid center on the Dallas Mavericks who is older than 5 years old", "sql": "SELECT name, salary FROM nba_roster WHERE team='Dallas Mavericks' AND POS='C' AND SALARY!= '--' AND age > 5 ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Who is the highest-paid Power Forward on the Chicago Bulls", "sql": "SELECT name, salary FROM nba_roster WHERE team='Chicago Bulls' AND POS='PF' AND SALARY!='--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "How many players are currently on the Toronto Raptors' roster", "sql": "SELECT COUNT(*) FROM nba_roster WHERE Team = 'Toronto Raptors';"}
-{"question": "How many players in the NBA are over the age of 30", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE AGE > 30;"}
-{"question": "What is the most common position among players 25 or older in the NBA", "sql": "SELECT POS, COUNT(*) AS count FROM nba_roster WHERE AGE >= 25 GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the jersey number of the player with the 75th percentile of jersey numbers in the NBA", "sql": "SELECT CAST(Jersey AS INTEGER) as percentile FROM nba_roster ORDER BY percentile LIMIT 1 OFFSET (SELECT COUNT(*) FROM nba_roster) * 0.75;"}
-{"question": "What is the most common position among the Toronto Raptors players", "sql": "SELECT POS, COUNT(*) AS count FROM nba_roster WHERE team='Toronto Raptors' GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "Which team has the heaviest average weight", "sql": "SELECT team, AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')-1) AS INTEGER) + CAST(SUBSTR(WT, INSTR(WT,' ')+1) AS FLOAT)/16) as average_weight FROM nba_roster WHERE WT!= 'NA' GROUP BY team ORDER BY average_weight DESC LIMIT 1;"}
-{"question": "Who are the top 3 highest-paid Power Forwards in the NBA", "sql": "SELECT NAME, SALARY FROM nba_roster WHERE POS = 'PF' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 3;"}
-{"question": "Which teams have the smallest rosters and what is the average age of their players", "sql": "SELECT team, COUNT(*) AS roster_size, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY roster_size ASC;"}
-{"question": "Which team has the highest average salary for players who attended college", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as salary FROM nba_roster WHERE COLLEGE!= '--' GROUP BY team ORDER BY salary DESC LIMIT 1;"}
-{"question": "Which team has the shortest average height among players 25 years old or younger", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height FROM nba_roster WHERE AGE <= 25 GROUP BY team ORDER BY height ASC LIMIT 1;"}
-{"question": "Which three teams have the tallest players on average", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as average_height FROM nba_roster WHERE HT!= 'NA' GROUP BY team ORDER BY average_height DESC LIMIT 3;"}
-{"question": "Who are the top 3 players in the league by salary, excluding those who did not attend college", "sql": "SELECT name, SALARY FROM nba_roster WHERE COLLEGE!= '--' ORDER BY CAST(SUBSTRING(SALARY, 2) AS INTEGER) DESC LIMIT 3;"}
-{"question": "Which five teams have the oldest average age among their players", "sql": "SELECT TEAM, AVG(AGE) as avg_age FROM nba_roster WHERE POS!= '--' GROUP BY TEAM ORDER BY avg_age DESC LIMIT 5;"}
-{"question": "Which three teams in the NBA have the highest average salary among their players", "sql": "SELECT team, AVG(CAST(SUBSTRING(SALARY, 2, LENGTH(SALARY)-2) AS INTEGER)) AS avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY avg_salary DESC LIMIT 3;"}
-{"question": "Who is the highest-paid player in the NBA who did not attend college", "sql": "SELECT name, SALARY FROM nba_roster WHERE SALARY!= '--' AND COLLEGE = '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the average age of players in the Toronto Raptors", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE TEAM = 'Toronto Raptors';"}
-{"question": "What is the player with the highest jersey number that is not 'NA'", "sql": "SELECT MAX(Jersey) as jersey_num, name FROM nba_roster WHERE Jersey!= 'NA' GROUP BY name ORDER BY jersey_num DESC LIMIT 1;"}
-{"question": "Who is the youngest player in the NBA", "sql": "SELECT name FROM nba_roster ORDER BY AGE ASC LIMIT 1;"}
-{"question": "What is the number of players in the NBA who are older than 5 years old", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE AGE > 5;"}
-{"question": "Who is the highest-paid player on the Los Angeles Lakers", "sql": "SELECT name, salary FROM nba_roster WHERE team='Los Angeles Lakers' AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which colleges tend to produce the oldest players in the NBA", "sql": "SELECT COLLEGE, AVG(AGE) AS average_age FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY average_age DESC;"}
-{"question": "What percentage of players in the NBA play each position", "sql": "SELECT POS, COUNT(*) as count, ROUND(COUNT(*)*100.0/(SELECT COUNT(*) FROM nba_roster),2) as percentage FROM nba_roster WHERE POS!= '--' GROUP BY POS ORDER BY percentage DESC;"}
-{"question": "What are the top 10 teams with the most players in the NBA", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster GROUP BY Team ORDER BY num_players DESC LIMIT 10;"}
-{"question": "What is the average age of players in the NBA who are older than the average age of all players in the league plus 5 years", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE AGE > (SELECT AVG(AGE) FROM nba_roster) + 5;"}
-{"question": "Which team has the most players who are older than the average age of all players in the NBA plus 5 years", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE AGE > (SELECT AVG(AGE) FROM nba_roster) + 5 GROUP BY team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What are the average height and weight for each team in the NBA", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as avg_height, AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER)) as avg_weight FROM nba_roster GROUP BY team;"}
-{"question": "What are the top 3 teams with the highest average salary", "sql": "SELECT Team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS avg_salary FROM nba_roster GROUP BY Team ORDER BY avg_salary DESC LIMIT 3;"}
-{"question": "What position has the most players in the NBA roster", "sql": "SELECT POS, COUNT(*) AS count, POS FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the age of the 75th percentile of NBA players", "sql": "SELECT age FROM nba_roster WHERE AGE!= '--' ORDER BY age LIMIT 1 OFFSET (SELECT COUNT(*) FROM nba_roster WHERE AGE!= '--')*75/100-1;"}
-{"question": "What is the average age and salary of NBA players, excluding those with unknown salaries", "sql": "SELECT AVG(AGE) AS average_age, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What is the average age of players in each position group in the NBA", "sql": "SELECT POS, AVG(AGE) AS avg_age FROM nba_roster WHERE POS IN ('PG', 'SG', 'SF', 'PF', 'C') GROUP BY POS;"}
-{"question": "What team has the most players at the point guard position", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE POS='PG' GROUP BY team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What is the name of the heaviest player on the Los Angeles Lakers", "sql": "SELECT name FROM nba_roster WHERE team='Los Angeles Lakers' AND WT=(SELECT MAX(WT) FROM nba_roster WHERE team='Los Angeles Lakers');"}
-{"question": "Who are the top 5 players in the NBA in terms of salary-to-age ratio", "sql": "SELECT name, CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) as salary, AGE FROM nba_roster WHERE SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)/AGE DESC LIMIT 5;"}
-{"question": "Which NBA teams have the most players who attended college", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!='--' GROUP BY team ORDER BY num_players DESC;"}
-{"question": "What is the highest paid player for each position in the NBA", "sql": "SELECT pos, name, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY pos ORDER BY pos;"}
-{"question": "Which NBA teams have the most players", "sql": "SELECT Team, COUNT(*) as count FROM nba_roster GROUP BY Team ORDER BY count DESC;"}
-{"question": "What is the position with the most players under the age of 25 in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE AGE <= 25 GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What players in the NBA have a height greater than or equal to 6 feet 7 inches", "sql": "SELECT NAME FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 >= 6.67;"}
-{"question": "Who are the top 3 highest-paid players under the age of 25 in the NBA", "sql": "SELECT NAME, SALARY FROM nba_roster WHERE AGE < 25 ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 3;"}
-{"question": "What are the names of all the players on the Toronto Raptors who are 25 years or older", "sql": "SELECT name FROM nba_roster WHERE age >= 25 AND team = 'Toronto Raptors';"}
-{"question": "What is the position with the shortest average height in the NBA", "sql": "SELECT pos, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height, COUNT(*) as count FROM nba_roster GROUP BY pos ORDER BY height ASC LIMIT 1;"}
-{"question": "What is the average age of the players on the Memphis Grizzlies", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE team='Memphis Grizzlies';"}
-{"question": "What are the average ages of the players on each NBA team, listed from youngest to oldest", "sql": "SELECT team, AVG(AGE) as average_age FROM nba_roster GROUP BY team ORDER BY average_age ASC;"}
-{"question": "What is the average age and height for each position in the NBA", "sql": "SELECT POS, AVG(AGE) AS avg_age, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) AS avg_height FROM nba_roster GROUP BY POS;"}
-{"question": "What is the highest-paid player in the NBA", "sql": "SELECT name, SALARY FROM nba_roster WHERE SALARY!= '--' ORDER BY CAST(SUBSTR(SALARY, 2) AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which team has the most players 25 or older", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE AGE >= 25 GROUP BY team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What is the average age of NBA players who are at least 6 feet 7 inches tall", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 >= 6.67;"}
-{"question": "What are all the players in the NBA whose last name is Johnson", "sql": "SELECT * FROM nba_roster WHERE NAME LIKE '%Johnson';"}
-{"question": "What is the average salary for players from each college, and which colleges produce the most highly paid NBA players", "sql": "SELECT COLLEGE, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY avg_salary DESC;"}
-{"question": "Who are the top 3 highest-paid players on the Los Angeles Lakers", "sql": "SELECT name, SALARY FROM nba_roster WHERE team='Los Angeles Lakers' ORDER BY CAST(SUBSTRING(SALARY, 2) AS INTEGER) DESC LIMIT 3;"}
-{"question": "What is the average height of all NBA players who are 25 years old or younger", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)) as average_height FROM nba_roster WHERE AGE <= 25;"}
-{"question": "What college has the most players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) AS COUNT FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY COUNT(*) DESC LIMIT 1;"}
-{"question": "Who are the 25-year-old players on the Toronto Raptors", "sql": "SELECT name FROM nba_roster WHERE team='Toronto Raptors' AND age=25;"}
-{"question": "Who is the highest-paid player in the NBA who attended college", "sql": "SELECT name, SALARY FROM nba_roster WHERE COLLEGE!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What college and position combination has the most players in the NBA", "sql": "SELECT COLLEGE, POS, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE, POS ORDER BY count DESC LIMIT 1;"}
-{"question": "Who is the heaviest player in the NBA roster", "sql": "SELECT name, WT, CAST(SUBSTR(WT, 1, INSTR(WT,' ')-1) AS INTEGER) as weight FROM nba_roster WHERE WT!= 'NA' ORDER BY weight DESC LIMIT 1;"}
-{"question": "What is the average height of players on each team, excluding those under 25 and with unknown heights", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as avg_height FROM nba_roster WHERE HT!= 'NA' AND age > 25 GROUP BY team;"}
-{"question": "What is the average salary of NBA players over the age of 25", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE AGE > 25 AND SALARY!= '--';"}
-{"question": "What are the 5 oldest players in the NBA", "sql": "SELECT NAME, AGE FROM nba_roster WHERE AGE != '--' ORDER BY AGE DESC LIMIT 5;"}
-{"question": "Which team has the most players over the age of 5 in the NBA", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE age > 5 GROUP BY team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "Who is the highest-paid player in the NBA, excluding those under the age of 6 and those with unknown salaries", "sql": "SELECT name, team FROM nba_roster WHERE age > 5 AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the average height and weight of players on each NBA team", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as avg_height, AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) AS INTEGER)) as avg_weight FROM nba_roster GROUP BY team;"}
-{"question": "Which positions in the NBA have the most players and which positions have the oldest players on average", "sql": "SELECT POS, COUNT(*) as count, AVG(AGE) as average_age FROM nba_roster GROUP BY POS ORDER BY count DESC;"}
-{"question": "What is the position with the tallest players in the NBA", "sql": "SELECT POS, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)) as avg_height FROM nba_roster GROUP BY POS ORDER BY avg_height DESC LIMIT 1;"}
-{"question": "What are the top 3 tallest players in the NBA", "sql": "SELECT NAME, HT, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height FROM nba_roster GROUP BY NAME, HT ORDER BY height DESC LIMIT 3;"}
-{"question": "Who is the highest-paid player on the Toronto Raptors with a jersey number greater than 10", "sql": "SELECT name, salary FROM nba_roster WHERE team='Toronto Raptors' AND CAST(Jersey AS INTEGER) > 10 AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which teams have the most players in their roster", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster GROUP BY team ORDER BY num_players DESC;"}
-{"question": "What is the average salary of all NBA players, excluding those who are not paid or have an unknown position", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' AND POS!= 'NA';"}
-{"question": "Which team has invested the most in young talent, with an average salary for players 5 years or less younger than the average age of all players", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE AGE <= (SELECT AVG(AGE) FROM nba_roster) * 5 GROUP BY team ORDER BY average_salary DESC LIMIT 1;"}
-{"question": "Which 5 teams have the most players who have publicly disclosed their college information", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY team ORDER BY num_players DESC LIMIT 5;"}
-{"question": "What is the average age of players by position in the NBA", "sql": "SELECT POS, AVG(AGE) as average_age FROM nba_roster GROUP BY POS ORDER BY average_age ASC;"}
-{"question": "What is the average height of the tallest positions in the NBA", "sql": "SELECT POS, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)) as avg_height FROM nba_roster GROUP BY POS ORDER BY avg_height DESC;"}
-{"question": "What is the number of players on the Chicago Bulls who are 25 years old or younger", "sql": "SELECT COUNT(*) FROM nba_roster WHERE team='Chicago Bulls' AND AGE <= 25;"}
-{"question": "What are the average heights for each position in the NBA, and which position has the tallest players on average", "sql": "SELECT pos, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)) as avg_height, COUNT(*) as count FROM nba_roster WHERE HT!= 'NA' GROUP BY pos ORDER BY avg_height DESC;"}
-{"question": "Which team has the oldest average age among its guards and forwards", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster WHERE POS IN ('PG', 'SG', 'SF', 'PF', 'C') GROUP BY team ORDER BY average_age DESC LIMIT 1;"}
-{"question": "Who is the point guard for the Golden State Warriors?", "sql": "select name from nba_roster where team='Golden State Warriors' and POS='PG';"}
-{"question": "What is the number of players on the Chicago Bulls who are 25 years old or younger", "sql": "SELECT COUNT(*) FROM nba_roster WHERE team='Chicago Bulls' AND AGE <= 25;"}
-{"question": "Who is the highest-paid player on the Los Angeles Lakers", "sql": "SELECT NAME, SALARY FROM nba_roster WHERE team='Los Angeles Lakers' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Who is the highest paid player in the NBA?", "sql": "SELECT NAME, salary FROM nba_roster WHERE SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What team is LaMelo Ball on?", "sql": "select team from nba_roster where name='LaMelo Ball';"}
-{"question": "How much does Lonzo Ball weigh?", "sql": "select wt from nba_roster where name='Lonzo Ball';"}
-{"question": "What college sent the most players to the current NBA?", "sql": "select college from nba_roster where college != '--'  group by college order by count(*) desc limit 1;"}
-{"question": "How old is LeBron James?", "sql": "select age from nba_roster where name='LeBron James';"}
-{"question": "What is the most popular jersey number in the current NBA?", "sql": "select Jersey from nba_roster where Jersey != 'NA' group by Jersey order by count(*) desc limit 1;"}
-{"question": "Can you give me a list of all the players without college data?", "sql": "SELECT name FROM nba_roster WHERE COLLEGE IS NULL OR COLLEGE = '--';"}
-{"question": "What team has the smallest roster?", "sql": "select team from nba_roster group by team order by count(*) asc limit 1;"}
-{"question": "What team has the largest roster?", "sql": "select team, count(*) from nba_roster group by team order by count(*) desc limit 1;"}
-{"question": "What team is paying its players the most in total?", "sql": "select team, sum(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) from nba_roster group by team order by sum(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) desc limit 1;"}
-{"question": "Which team is paying its players the least?", "sql": "select team from nba_roster group by team order by sum(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) asc limit 1;"}
-{"question": "Which team is on average the tallest?", "sql": "select team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster group by team order by height desc limit 1;"}
-{"question": "Which team is on average the shortest?", "sql": "select team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster group by team order by height asc limit 1;"}
-{"question": "Who are the tallest 5 centers in the league?", "sql": "SELECT name, HT FROM nba_roster WHERE POS = 'C' ORDER BY HT DESC LIMIT 5;"}
-{"question": "Who are the top 5 highest paid power forwards in the league?", "sql": "SELECT NAME, salary FROM nba_roster WHERE POS = 'PF' AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 5;"}
-{"question": "What is the median salary in the NBA?", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*50/100-1;"}
-{"question": "What is the average salary in the NBA?", "sql": "SELECT avg(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What is the 99th percentile salary in the NBA?", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*99/100-1;"}
-{"question": "What is the 75th percentile salary in the NBA?", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*75/100-1;"}
-{"question": "What is the 25th percentile salary in the NBA?", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*25/100-1;"}
-{"question": "What is the median weight in the NBA?", "sql": "select CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)*50/100-1;"}
-{"question": "What is the average weight in the NBA?", "sql": "SELECT AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER)) FROM nba_roster;"}
-{"question": "What is the median height in the NBA?", "sql": "select CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)*50/100-1;"}
-{"question": "What is the average height in the NBA?", "sql": "select AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster;"}
-{"question": "Can you tell me how many players are in the NBA?", "sql": "select count(*) from nba_roster;"}
-{"question": "Would you please let me know what the highest paid players are for each position?", "sql": "SELECT name, pos, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS;"}
-{"question": "Is Jalen Johnson 23 years old?", "sql" : "Select name, age from nba_roster where name='Jalen Johnson';"}
-{"question": "Who is the oldest player on the Brooklyn Nets?", "sql" : "SELECT NAME FROM nba_roster WHERE TEAM = 'Brooklyn Nets' AND AGE = (SELECT MAX(AGE) FROM nba_roster WHERE TEAM = 'Brooklyn Nets');"}
-{"question": "Who has the highest salary on the Memphis Grizzlies?", "sql" : "select salary, name from nba_roster where team='Memphis Grizzlies' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which player has the highest salary on the Cleveland Cavaliers?", "sql" : "select salary, name from nba_roster where team='Cleveland Cavaliers' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Who is the highest paid center on the Dallas Mavericks?", "sql" : "select salary, name from nba_roster where team='Dallas Mavericks' and POS='C' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "How much is Marcus Smart getting paid?", "sql" : "select salary from nba_roster where name='Marcus Smart';"}
-{"question": "What's the average age of the Trail Blazers?", "sql" : "select avg(age) from nba_roster where team='Portland Trail Blazers';"}
-{"question": "What's the median age of the NBA?", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)*50/100-1;"}
-{"question": "What's the median age of the Miami Heat?", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster where team='Miami Heat' order by percentile limit 1 offset (select count(*) from nba_roster where team='Miami Heat')*50/100-1;"}
-{"question": "What are the 5 teams with the oldest average age in the NBA", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC LIMIT 5;"}
-{"question": "What is the average salary of Power Forward players in the NBA", "sql": "select avg(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary from nba_roster where POS = 'PF';"}

+ 0 - 159
recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/generated_queries.jsonl

@@ -1,159 +0,0 @@
-{"question": "What is the average height of NBA players", "sql": "SELECT AVG(CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER) + CAST(SUBSTRING(HT, INSTR(HT,'')+1) AS INTEGER)/12) as average_height FROM nba_roster WHERE HT!= 'NA';"}
-{"question": "What is the average age of all players in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster"}
-{"question": "What are the oldest players on each team with a roster size of 6 or more", "sql": "SELECT NAME FROM nba_roster WHERE AGE IN (SELECT MAX(AGE) FROM nba_roster WHERE TEAM IN (SELECT TEAM FROM nba_roster GROUP BY TEAM HAVING COUNT(*) > 5))"}
-{"question": "What is the average height of the players on the Toronto Raptors", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height FROM nba_roster WHERE team='Toronto Raptors';"}
-{"question": "What is the highest-paid Toronto Raptors player who attended college", "sql": "SELECT name, salary FROM nba_roster WHERE team='Toronto Raptors' AND COLLEGE!='--' AND SALARY!='--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1"}
-{"question": "What is the most common height among NBA players", "sql": "SELECT HT, COUNT(*) as count FROM nba_roster WHERE HT IS NOT NULL GROUP BY HT ORDER BY count DESC LIMIT 1"}
-{"question": "What is the most represented college in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE IS NOT NULL GROUP BY COLLEGE ORDER BY count DESC LIMIT 1"}
-{"question": "What is the average age of all players in the NBA", "sql": "SELECT AVG(AGE) AS average_age FROM nba_roster"}
-{"question": "What is the average height of NBA players", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) AS average_height FROM nba_roster"}
-{"question": "What is the average age of the players in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE AGE IS NOT NULL"}
-{"question": "What is the position with the most players in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE SALARY!= '--' GROUP BY POS ORDER BY count DESC LIMIT 1"}
-{"question": "What is the average height of players on each NBA team, excluding players with unknown heights", "sql": "SELECT TEAM, AVG(CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER)) as avg_height FROM nba_roster WHERE HT!= 'NA' GROUP BY TEAM ORDER BY avg_height DESC"}
-{"question": "What are the 5 most common heights among NBA players", "sql": "SELECT HT, COUNT(*) AS count FROM nba_roster GROUP BY HT ORDER BY count DESC LIMIT 5"}
-{"question": "What are the top 5 colleges with the most players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) AS count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 5"}
-{"question": "What is the average age of the players in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE AGE IS NOT NULL"}
-{"question": "Which players in the NBA have attended the most colleges", "sql": "SELECT NAME, COLLEGE, COUNT(*) as num_colleges FROM nba_roster WHERE COLLEGE!= '--' GROUP BY NAME, COLLEGE ORDER BY num_colleges DESC;"}
-{"question": "What is the average age of the players in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster"}
-{"question": "Who are the top 5 highest-paid players in the NBA", "sql": "SELECT * FROM nba_roster WHERE SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 5"}
-{"question": "What is the average height of players on each NBA team", "sql": "SELECT team, AVG(CAST(SUBSTRING(HT, 1, INSTR(HT,'')-1) AS INTEGER) + CAST(SUBSTRING(HT, INSTR(HT,'')+1) AS INTEGER) / 12.0) as avg_height FROM nba_roster WHERE HT!= 'NA' GROUP BY team"}
-{"question": "Who are the top 3 highest-paid players in the NBA", "sql": "SELECT name, SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY name ORDER BY total_salary DESC LIMIT 3"}
-{"question": "Which team has the most players in the NBA", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster GROUP BY team ORDER BY num_players DESC LIMIT 1"}
-{"question": "What is the total salary of all players in the NBA who are 6'8", "sql": "SELECT SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) = 68;"}
-{"question": "What is the average age of players on each team in the NBA", "sql": "SELECT team, AVG(AGE) as avg_age FROM nba_roster WHERE SALARY!= '--' GROUP BY team"}
-{"question": "How many players in the NBA have a non-null salary and college information, and play one of the five main positions", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE POS IN ('PG', 'SG', 'SF', 'PF', 'C') AND SALARY!= '--' AND COLLEGE!= '--'"}
-{"question": "What is the most common position in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1"}
-{"question": "What is the average height of NBA players", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as average_height FROM nba_roster;"}
-{"question": "What is the average salary of NBA players who are at least 5 years old", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE AGE > 5"}
-{"question": "What is the average age of all players in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster"}
-{"question": "What is the most common age range among NBA players", "sql": "SELECT AGE, COUNT(*) AS count FROM nba_roster GROUP BY AGE ORDER BY count DESC LIMIT 1"}
-{"question": "Which team has the most players in the NBA", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster GROUP BY Team ORDER BY num_players DESC LIMIT 1"}
-{"question": "What is the average salary of NBA players", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "How many players in the NBA are 68 inches tall", "sql": "SELECT COUNT(*) FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) = 68;"}
-{"question": "What is the average salary of Power Forwards in the NBA who are at least 25 years old", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) AS average_salary FROM nba_roster WHERE AGE >= 25 AND POS = 'PF';"}
-{"question": "What is the average age of 6-foot Power Forwards in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) = 6 AND POS='PF';"}
-{"question": "What is the heaviest Power Forward in the NBA", "sql": "SELECT NAME, AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) AS INTEGER)) AS avg_weight FROM nba_roster WHERE POS='PF' GROUP BY NAME ORDER BY avg_weight DESC LIMIT 1"}
-{"question": "What is the number of players on each team in the NBA", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster GROUP BY Team"}
-{"question": "What is the average height of NBA players who are 25 years old or older", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height FROM nba_roster WHERE age >= 25"}
-{"question": "What are the top 3 teams with the highest average salaries in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY avg_salary DESC LIMIT 3"}
-{"question": "What is the most common position in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1"}
-{"question": "What are the names of the players in the NBA who are exactly 6 feet 8 inches tall", "sql": "SELECT NAME, HT FROM nba_roster WHERE CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER) = 68 ORDER BY HT ASC;"}
-{"question": "What is the college with the most players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 1"}
-{"question": "What is the average age of all players in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster"}
-{"question": "What is the most represented college in the NBA", "sql": "SELECT COLLEGE, COUNT(*) AS frequency FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY frequency DESC LIMIT 1"}
-{"question": "What is the average age of the players in the NBA", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE AGE IS NOT NULL"}
-{"question": "What is the average height of NBA players who have a recorded height", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as average_height FROM nba_roster WHERE HT IS NOT NULL"}
-{"question": "What is the average salary of NBA players who are 25 years or older", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$') - 1) as INTEGER)) FROM nba_roster WHERE CAST(AGE as INTEGER) >= 25"}
-{"question": "What is the most represented college in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 1"}
-{"question": "What is the number of players on each team in the NBA", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster GROUP BY Team"}
-{"question": "What is the average salary for each position in the NBA, excluding players with unknown salaries", "sql": "SELECT POS, AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$') - 1) as INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS"}
-{"question": "What is the most common position in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1"}
-{"question": "What is the average age of players on each team in the NBA", "sql": "SELECT team, AVG(AGE) as avg_age FROM nba_roster WHERE SALARY!= '--' GROUP BY team"}
-{"question": "What are the top 3 positions with the highest total salary expenditure in the NBA", "sql": "SELECT pos, name, SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY pos ORDER BY total_salary DESC LIMIT 3"}
-{"question": "Which colleges have the most players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) AS num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY num_players DESC;"}
-{"question": "What is the average salary for each team in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team"}
-{"question": "What is the age range of players on each team in the NBA", "sql": "SELECT team, MIN(AGE) as youngest_player, MAX(AGE) as oldest_player FROM nba_roster WHERE AGE IS NOT NULL GROUP BY team"}
-{"question": "Which team has the most players who are 6'8", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) = 68 GROUP BY team ORDER BY num_players DESC LIMIT 1"}
-{"question": "How many players in the NBA are over the age of 25", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE > 25"}
-{"question": "What is the average height of NBA players under the age of 25", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as average_height FROM nba_roster WHERE AGE <= 25"}
-{"question": "What is the total salary of all players in the NBA who are more than 5 years older than the average age of all players", "sql": "SELECT SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE (AGE - (SELECT AVG(AGE) FROM nba_roster)) > 5"}
-{"question": "What is the median weight in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 1"}
-{"question": "What are the top 5 teams with the oldest average age of players", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC LIMIT 5"}
-{"question": "What is the average height of NBA players", "sql": "SELECT AVG(CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER)) AS average_height FROM nba_roster WHERE HT!= 'NA';"}
-{"question": "What is the average salary of the Los Angeles Lakers players", "sql": "SELECT AVG(CAST(SALARY AS INTEGER) ) AS average_salary FROM nba_roster WHERE team='Los Angeles Lakers';"}
-{"question": "What is the college that has produced the most players currently playing for the Boston Celtics", "sql": "SELECT COLLEGE, COUNT(*) AS count FROM nba_roster WHERE team='Boston Celtics' GROUP BY COLLEGE ORDER BY count DESC LIMIT 1"}
-{"question": "What college has the most players in the NBA who are 30 years old or older", "sql": "SELECT COLLEGE, COUNT(*) AS count FROM nba_roster WHERE AGE >= 30 GROUP BY COLLEGE ORDER BY count DESC LIMIT 1"}
-{"question": "How many players in the NBA are at least 5 years older than the youngest player in the league", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE - (SELECT MIN(AGE) FROM nba_roster) > 5"}
-{"question": "What are the 5 colleges that have produced the most players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY num_players DESC LIMIT 5"}
-{"question": "What are the most common positions in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE POS!= '--' GROUP BY POS ORDER BY count DESC"}
-{"question": "What is the average age of all players in the NBA", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE AGE IS NOT NULL"}
-{"question": "What are the teams with the highest average salaries in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY avg_salary DESC"}
-{"question": "What is the average height of NBA players", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as average_height FROM nba_roster"}
-{"question": "What is the average salary of all NBA players", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster"}
-{"question": "What is the average age of the players on the Toronto Raptors", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE team='Toronto Raptors';"}
-{"question": "Which three teams have the most players from a single college", "sql": "SELECT team, COLLEGE, COUNT(*) AS num_players FROM nba_roster GROUP BY team, COLLEGE ORDER BY num_players DESC LIMIT 3"}
-{"question": "What is the average salary of NBA players 25 years or older", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) FROM nba_roster WHERE AGE >= 25"}
-{"question": "What is the total salary of all NBA players", "sql": "SELECT SUM(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)*1000000) FROM nba_roster"}
-{"question": "What are the most common positions in the NBA", "sql": "SELECT POS, COUNT(*) AS num_players FROM nba_roster GROUP BY POS;"}
-{"question": "What is the average salary for each age group in the NBA", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary, AGE as age_group FROM nba_roster WHERE SALARY!= '--' GROUP BY AGE ORDER BY age_group"}
-{"question": "What are the top 5 colleges that have produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 5"}
-{"question": "What is the most common position for players under the age of 25 in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE AGE <= 25 GROUP BY POS ORDER BY count DESC LIMIT 1"}
-{"question": "How many players in the NBA are 5 years or younger than the oldest player in the league", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE + 5 <= (SELECT MAX(AGE) FROM nba_roster);"}
-{"question": "What are the top 5 colleges that have produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 5"}
-{"question": "What are the most common positions in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS ORDER BY count DESC"}
-{"question": "What is the average age of all players in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster"}
-{"question": "What are the most common heights in the NBA", "sql": "SELECT HT, COUNT(*) AS frequency FROM nba_roster GROUP BY HT ORDER BY frequency DESC LIMIT 5"}
-{"question": "What are the most common positions in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS ORDER BY count DESC"}
-{"question": "What is the average salary for each team in the NBA, excluding teams with unknown salaries", "sql": "SELECT TEAM, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY TEAM ORDER BY average_salary DESC"}
-{"question": "What is the college that has produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 1"}
-{"question": "Who is the highest paid player in the NBA", "sql": "SELECT name, salary FROM nba_roster WHERE salary!= '--' ORDER BY CAST(REPLACE(REPLACE(salary, '$', ''), ',', '') AS INTEGER) DESC LIMIT 1"}
-{"question": "What is the average age of players who are 6'8", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) = 68"}
-{"question": "What is the average age of the players in the NBA who are more than 5 years older than the average age of all players", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE AGE + (SELECT AVG(AGE) FROM nba_roster) > 5*12"}
-{"question": "What is the average age of the players in the NBA who are older than 5 years old", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE AGE > 5*12"}
-{"question": "What are the top colleges that produce the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY num_players DESC;"}
-{"question": "How many players in the NBA are 6'8", "sql": "SELECT COUNT(*) FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) = 68;"}
-{"question": "What is the average salary for each team in the NBA", "sql": "SELECT Team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster GROUP BY Team"}
-{"question": "What are the top colleges represented in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY num_players DESC;"}
-{"question": "What is the most represented college in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 1"}
-{"question": "What are the 5 teams with the highest average salary in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY average_salary DESC"}
-{"question": "What is the average age of players in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster"}
-{"question": "What is the most common height in the NBA", "sql": "SELECT SUBSTR(HT, 1, INSTR(HT,'')-1) as height, COUNT(*) as count FROM nba_roster GROUP BY SUBSTR(HT, 1, INSTR(HT,'')-1) ORDER BY count DESC LIMIT 1"}
-{"question": "What is the position with the most players in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1"}
-{"question": "What is the 75th percentile salary in the NBA", "sql": "SELECT HT, AVG(WT) as avg_weight FROM nba_roster WHERE HT IS NOT NULL AND WT IS NOT NULL GROUP BY HT ORDER BY avg_weight DESC LIMIT 1"}
-{"question": "Which college has produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 1"}
-{"question": "What is the average salary of NBA players who are older than 25 years old", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE AGE > 25"}
-{"question": "What is the average age of the players on the Toronto Raptors", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE TEAM = 'Toronto Raptors';"}
-{"question": "What is the average height of the players on the Los Angeles Lakers", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,'')+1) AS FLOAT)/12) AS height FROM nba_roster WHERE TEAM = 'Los Angeles Lakers';"}
-{"question": "What is the position with the most players in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1"}
-{"question": "What is the average age of all players in the NBA who are older than 5 years old", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE AGE > 5"}
-{"question": "How many players on each team have a height of 6'8", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE CAST(SUBSTRING(HT, 1, INSTR(HT,'')-1) AS INTEGER) = 68 GROUP BY team"}
-{"question": "What is the 99th percentile salary in the NBA?", "answer": "46741590", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*99/100-1;"}
-{"question": "What is the 75th percentile salary in the NBA?", "answer": "13932008", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*75/100-1;"}
-{"question": "What is the 25th percentile salary in the NBA?", "answer": "2413304", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*25/100-1;"}
-{"question": "What is the median weight in the NBA?", "answer": "215", "sql": "select CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What is the average weight in the NBA?", "answer": "214.98", "sql": "SELECT AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER)) FROM nba_roster;"}
-{"question": "What is the median height in the NBA?", "answer": "6.58333333333333", "sql": "select CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What is the average height in the NBA?", "answer": "6.54986111111111", "sql": "select AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster;"}
-{"question": "Can you tell me how many players are in the NBA?", "answer": "600", "sql": "select count(*) from nba_roster;"}
-{"question": "Would you please let me know what the highest paid players are for each position?", "answer": "The highest paid players are Nikola Jokic (C), Paul George (F), Norman Powell (G), Kevin Durant (PF), Stephen Curry (PG), LeBron James (SF), Bradley Beal (SG).", "sql": "SELECT name, pos, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS;"}
-{"question": "Is Jalen Johnson 23 years old?", "answer": "No, Jalen Johnson is 21 years old", "sql": "Select name, age from nba_roster where name='Jalen Johnson';"}
-{"question": "Who is the oldest player on the Brooklyn Nets?", "answer": "Spencer Dinwiddie, Dorian Finney-Smith, Royce O'Neale", "sql": "SELECT NAME FROM nba_roster WHERE TEAM = 'Brooklyn Nets' AND AGE = (SELECT MAX(AGE) FROM nba_roster WHERE TEAM = 'Brooklyn Nets');"}
-{"question": "Who has the higest salary on the Memphis Grizzlies?", "answer": "Ja Morant", "sql": "select salary, name from nba_roster where team='Memphis Grizzlies' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which player has the higest salary on the Cleveland Cavaliers?", "answer": "Darius Garland", "sql": "select salary, name from nba_roster where team='Cleveland Cavaliers' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Who is the highest paid center on the Dallas Mavericks?", "answer": "Dereck Lively II", "sql": "select salary, name from nba_roster where team='Dallas Mavericks' and POS='C' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "How much is Marcus Smart getting paid?", "answer": "$18,833,712", "sql": "select salary from nba_roster where name='Marcus Smart';"}
-{"question": "What's the average age of the Trail Blazers?", "answer": "24", "sql": "select avg(age) from nba_roster where team='Portland Trail Blazers';"}
-{"question": "What's the median age of the NBA?", "answer": "25", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What's the median age of the Miami Heat?", "answer": "26", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster where team='Miami Heat' order by percentile limit 1 offset (select count(*) from nba_roster where team='Miami Heat')/2;"}
-{"question": "What are the 5 teams with the oldest average age in the NBA", "answer": "Golden State Warriors, Milwaukee Bucks, Miami Heat, LA Clippers, Phoenix Suns", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC LIMIT 5;"}
-{"question": "What is the average salary of Power Forward players in the NBA", "answer": "$10948045", "sql": "select avg(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary from nba_roster where POS = 'PF';"}
-{"question": "What is the most common position in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1"}
-{"question": "What is the average height of NBA players", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as average_height FROM nba_roster;"}
-{"question": "What is the average salary of NBA players who are at least 5 years old", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE AGE > 5"}
-{"question": "What is the average age of all players in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster"}
-{"question": "What is the most common age range among NBA players", "sql": "SELECT AGE, COUNT(*) AS count FROM nba_roster GROUP BY AGE ORDER BY count DESC LIMIT 1"}
-{"question": "What is the median weight in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 1"}
-{"question": "How many players in the NBA are at least 5 years older than the youngest player in the league", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE - (SELECT MIN(AGE) FROM nba_roster) > 5"}
-{"question": "What are the 5 colleges that have produced the most players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY num_players DESC LIMIT 5"}
-{"question": "What are the most common positions in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE POS!= '--' GROUP BY POS ORDER BY count DESC"}
-{"question": "What is the average age of all players in the NBA", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE AGE IS NOT NULL"}
-{"question": "What is the 99th percentile salary in the NBA?", "answer": "46741590", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*99/100-1;"}
-{"question": "What is the 75th percentile salary in the NBA?", "answer": "13932008", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*75/100-1;"}
-{"question": "What is the 25th percentile salary in the NBA?", "answer": "2413304", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*25/100-1;"}
-{"question": "What is the median weight in the NBA?", "answer": "215", "sql": "select CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What is the average weight in the NBA?", "answer": "214.98", "sql": "SELECT AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER)) FROM nba_roster;"}
-{"question": "What is the median height in the NBA?", "answer": "6.58333333333333", "sql": "select CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What is the average height in the NBA?", "answer": "6.54986111111111", "sql": "select AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height from nba_roster;"}
-{"question": "Can you tell me how many players are in the NBA?", "answer": "600", "sql": "select count(*) from nba_roster;"}
-{"question": "Would you please let me know what the highest paid players are for each position?", "answer": "The highest paid players are Nikola Jokic (C), Paul George (F), Norman Powell (G), Kevin Durant (PF), Stephen Curry (PG), LeBron James (SF), Bradley Beal (SG).", "sql": "SELECT name, pos, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS;"}
-{"question": "Is Jalen Johnson 23 years old?", "answer": "No, Jalen Johnson is 21 years old", "sql": "Select name, age from nba_roster where name='Jalen Johnson';"}
-{"question": "Who is the oldest player on the Brooklyn Nets?", "answer": "Spencer Dinwiddie, Dorian Finney-Smith, Royce O'Neale", "sql": "SELECT NAME FROM nba_roster WHERE TEAM = 'Brooklyn Nets' AND AGE = (SELECT MAX(AGE) FROM nba_roster WHERE TEAM = 'Brooklyn Nets');"}
-{"question": "Who has the higest salary on the Memphis Grizzlies?", "answer": "Ja Morant", "sql": "select salary, name from nba_roster where team='Memphis Grizzlies' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which player has the higest salary on the Cleveland Cavaliers?", "answer": "Darius Garland", "sql": "select salary, name from nba_roster where team='Cleveland Cavaliers' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Who is the highest paid center on the Dallas Mavericks?", "answer": "Dereck Lively II", "sql": "select salary, name from nba_roster where team='Dallas Mavericks' and POS='C' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "How much is Marcus Smart getting paid?", "answer": "$18,833,712", "sql": "select salary from nba_roster where name='Marcus Smart';"}
-{"question": "What's the average age of the Trail Blazers?", "answer": "24", "sql": "select avg(age) from nba_roster where team='Portland Trail Blazers';"}
-{"question": "What's the median age of the NBA?", "answer": "25", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What's the median age of the Miami Heat?", "answer": "26", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster where team='Miami Heat' order by percentile limit 1 offset (select count(*) from nba_roster where team='Miami Heat')/2;"}
-{"question": "What are the 5 teams with the oldest average age in the NBA", "answer": "Golden State Warriors, Milwaukee Bucks, Miami Heat, LA Clippers, Phoenix Suns", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC LIMIT 5;"}
-{"question": "What is the average salary of Power Forward players in the NBA", "answer": "$10948045", "sql": "select avg(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary from nba_roster where POS = 'PF';"}

Changes to this file are not shown because the diff is too large
+ 0 - 1149
recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/generated_queries_large.jsonl


+ 0 - 330
recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/generated_queries_large_filtered.jsonl

@@ -1,330 +0,0 @@
-{"question": "What college has the most players in the NBA who are 30 years old or older", "sql": "SELECT COLLEGE, COUNT(*) AS count FROM nba_roster WHERE AGE >= 30 GROUP BY COLLEGE ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the total salary of all NBA players", "sql": "SELECT SUM(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)*1000000) FROM nba_roster;"}
-{"question": "What are the most common positions in the NBA", "sql": "SELECT POS, COUNT(*) AS num_players FROM nba_roster GROUP BY POS;"}
-{"question": "What is the average salary for each age group in the NBA", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary, AGE as age_group FROM nba_roster WHERE SALARY!= '--' GROUP BY AGE ORDER BY age_group;"}
-{"question": "What are the top 5 colleges that have produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 5;"}
-{"question": "How many players in the NBA attended college", "sql": "SELECT COUNT(*) AS num_college_players FROM nba_roster WHERE COLLEGE!= '--';"}
-{"question": "What are the top 3 colleges with the most players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 3;"}
-{"question": "What is the average age of all players in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster;"}
-{"question": "What is the most represented college in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 1;"}
-{"question": "Which college has produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster GROUP BY COLLEGE ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the average height of NBA players", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) AS average_height FROM nba_roster;"}
-{"question": "What is the average age of players on each team in the NBA", "sql": "SELECT team, AVG(AGE) as avg_age FROM nba_roster WHERE SALARY!= '--' GROUP BY team;"}
-{"question": "What are the top 3 positions with the highest total salary expenditure in the NBA", "sql": "SELECT pos, name, SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY pos ORDER BY total_salary DESC LIMIT 3;"}
-{"question": "Which colleges have the most players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) AS num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY num_players DESC;"}
-{"question": "What is the average salary for each team in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team;"}
-{"question": "What are the teams with the highest average salaries in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY avg_salary DESC;"}
-{"question": "What are the 5 colleges that have produced the most players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY num_players DESC LIMIT 5;"}
-{"question": "What is the most common position in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the average salary of Power Forwards in the NBA who are at least 25 years old", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) AS average_salary FROM nba_roster WHERE AGE >= 25 AND POS = 'PF';"}
-{"question": "What is the average age of 6-foot Power Forwards in the NBA", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) = 6 AND POS='PF';"}
-{"question": "What is the name of the player with the highest average weight among Power Forwards in the NBA", "sql": "SELECT NAME, AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) AS INTEGER)) AS avg_weight FROM nba_roster WHERE POS='PF' GROUP BY NAME ORDER BY avg_weight DESC LIMIT 1;"}
-{"question": "Which team has the most players in the NBA", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster GROUP BY team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What is the average salary of all NBA players", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster;"}
-{"question": "What is the average age of the players on the Toronto Raptors", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE team='Toronto Raptors';"}
-{"question": "Which three teams have the most players from a single college", "sql": "SELECT team, COLLEGE, COUNT(*) AS num_players FROM nba_roster GROUP BY team, COLLEGE ORDER BY num_players DESC LIMIT 3;"}
-{"question": "How many players in the NBA are at least 5 years older than the youngest player in the league", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE - (SELECT MIN(AGE) FROM nba_roster) > 5;"}
-{"question": "What is the average salary of NBA players who are 25 years or older", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$') - 1) as INTEGER)) FROM nba_roster WHERE CAST(AGE as INTEGER) >= 25;"}
-{"question": "What is the number of players on each team in the NBA", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster GROUP BY Team;"}
-{"question": "What is the average salary for each position in the NBA, excluding players with unknown salaries", "sql": "SELECT POS, AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$') - 1) as INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS;"}
-{"question": "What are the oldest players on each team with a roster size of 6 or more", "sql": "SELECT NAME FROM nba_roster WHERE AGE IN (SELECT MAX(AGE) FROM nba_roster WHERE TEAM IN (SELECT TEAM FROM nba_roster GROUP BY TEAM HAVING COUNT(*) > 5));"}
-{"question": "What is the average height of the players on the Toronto Raptors", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height FROM nba_roster WHERE team='Toronto Raptors';"}
-{"question": "What is the highest-paid Toronto Raptors player who attended college", "sql": "SELECT name, salary FROM nba_roster WHERE team='Toronto Raptors' AND COLLEGE!='--' AND SALARY!='--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the median weight in the NBA", "sql": "SELECT NAME, COLLEGE, COUNT(*) as num_colleges FROM nba_roster WHERE COLLEGE!= '--' GROUP BY NAME, COLLEGE ORDER BY num_colleges DESC;"}
-{"question": "Who are the top 5 highest-paid players in the NBA", "sql": "SELECT * FROM nba_roster WHERE SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 5;"}
-{"question": "What is the average height of players on each NBA team", "sql": "SELECT team, AVG(CAST(SUBSTRING(HT, 1, INSTR(HT,'')-1) AS INTEGER) + CAST(SUBSTRING(HT, INSTR(HT,'')+1) AS INTEGER) / 12.0) as avg_height FROM nba_roster WHERE HT!= 'NA' GROUP BY team;"}
-{"question": "Who are the top 3 highest-paid players in the NBA", "sql": "SELECT name, SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY name ORDER BY total_salary DESC LIMIT 3;"}
-{"question": "What is the average salary of NBA players", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "How many players in the NBA are 68 inches tall", "sql": "SELECT COUNT(*) FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) = 68;"}
-{"question": "What are the top 5 teams with the oldest average age of players", "sql": "SELECT team, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC LIMIT 5;"}
-{"question": "What is the average salary of the Los Angeles Lakers players", "sql": "SELECT AVG(CAST(SALARY AS INTEGER) ) AS average_salary FROM nba_roster WHERE team='Los Angeles Lakers';"}
-{"question": "What is the college that has produced the most players currently playing for the Boston Celtics", "sql": "SELECT COLLEGE, COUNT(*) AS count FROM nba_roster WHERE team='Boston Celtics' GROUP BY COLLEGE ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the most common position for players under the age of 25 in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE AGE <= 25 GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the average height of players on each NBA team, excluding players with unknown heights", "sql": "SELECT TEAM, AVG(CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER)) as avg_height FROM nba_roster WHERE HT!= 'NA' GROUP BY TEAM ORDER BY avg_height DESC;"}
-{"question": "What are the 5 most common heights among NBA players", "sql": "SELECT HT, COUNT(*) AS count FROM nba_roster GROUP BY HT ORDER BY count DESC LIMIT 5;"}
-{"question": "What are the top 5 colleges with the most players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) AS count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 5;"}
-{"question": "What is the average height of NBA players who are 25 years or older", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height FROM nba_roster WHERE age >= 25;"}
-{"question": "What are the top 3 teams with the highest average salaries in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY avg_salary DESC LIMIT 3;"}
-{"question": "What is the position with the most players in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE SALARY!= '--' GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the average salary of NBA players who are at least 5 years old", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE AGE > 5;"}
-{"question": "What is the most common age range among NBA players", "sql": "SELECT AGE, COUNT(*) AS count FROM nba_roster GROUP BY AGE ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the average salary of NBA players who are 25 years old or older", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE AGE > 25;"}
-{"question": "What is the average age of the players in the NBA who are more than 5 years older than the average age of all players", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE AGE + (SELECT AVG(AGE) FROM nba_roster) > 5*12;"}
-{"question": "What is the average age of the players in the NBA who are older than 5 years old", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE AGE > 5*12;"}
-{"question": "What colleges have produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY num_players DESC;"}
-{"question": "Who is the highest paid player in the NBA", "sql": "SELECT name, salary FROM nba_roster WHERE salary!= '--' ORDER BY CAST(REPLACE(REPLACE(salary, '$', ''), ',', '') AS INTEGER) DESC LIMIT 1;"}
-{"question": "How many players in the NBA are 5 years or younger than the oldest player in the league", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE + 5 <= (SELECT MAX(AGE) FROM nba_roster);"}
-{"question": "What are the 5 teams with the highest average salary in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY average_salary DESC;"}
-{"question": "What is the average salary for each team in the NBA, excluding teams with unknown salaries", "sql": "SELECT TEAM, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY TEAM ORDER BY average_salary DESC;"}
-{"question": "How many players in the NBA are 10 years old or older", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE age + (JULIANDAY('now') - JULIANDAY(DATE('now', '-10 year'))) / 365.25 >= 10;"}
-{"question": "How many players on the Toronto Raptors are 6 feet 8 inches tall", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE team='Toronto Raptors' AND CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER) = '6' || '8';"}
-{"question": "How many players in the NBA are over the age of 25", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE > 25;"}
-{"question": "What is the average height of NBA players under the age of 25", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as average_height FROM nba_roster WHERE AGE <= 25;"}
-{"question": "What is the total salary of all players in the NBA who are more than 5 years older than the average age of all players", "sql": "SELECT SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE (AGE - (SELECT AVG(AGE) FROM nba_roster)) > 5;"}
-{"question": "What is the most common height in the NBA", "sql": "SELECT SUBSTR(HT, 1, INSTR(HT,'')-1) as height, COUNT(*) as count FROM nba_roster GROUP BY SUBSTR(HT, 1, INSTR(HT,'')-1) ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the average salary of NBA players 25 years or older", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) FROM nba_roster WHERE AGE >= 25;"}
-{"question": "What are the 5 most common heights in the NBA", "sql": "SELECT HT, COUNT(*) AS frequency FROM nba_roster GROUP BY HT ORDER BY frequency DESC LIMIT 5;"}
-{"question": "What is the average height of the players on the Los Angeles Lakers", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,'')+1) AS FLOAT)/12) AS height FROM nba_roster WHERE TEAM = 'Los Angeles Lakers';"}
-{"question": "What is the average age of all players in the NBA who are older than 5 years old", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE AGE > 5;"}
-{"question": "What is the most popular college attended by NBA players", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the average height for each position in the NBA", "sql": "SELECT POS, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) as INTEGER)) AS average_height FROM nba_roster GROUP BY POS ORDER BY average_height;"}
-{"question": "What are the jersey numbers of the first 5 players in the NBA roster", "sql": "SELECT NAME, JERSEY FROM nba_roster ORDER BY JERSEY LIMIT 5;"}
-{"question": "What is the age range of the players in the NBA", "sql": "SELECT MIN(AGE) as youngest_player, MAX(AGE) as oldest_player FROM nba_roster;"}
-{"question": "What is the total salary for each team in the NBA", "sql": "SELECT team, SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team;"}
-{"question": "What are the top 5 teams in the NBA with the highest average salary", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY avg_salary DESC LIMIT 5;"}
-{"question": "What are the top 5 highest-paid players in the NBA", "sql": "SELECT NAME, CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) as total_salary FROM nba_roster WHERE SALARY!= '--' ORDER BY total_salary DESC LIMIT 5;"}
-{"question": "What is the 99th percentile salary in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC LIMIT 1;"}
-{"question": "How many players are on the Toronto Raptors", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE TEAM = 'Toronto Raptors';"}
-{"question": "What are the 5 highest-paid players in the NBA", "sql": "SELECT * FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY SALARY DESC) AS row_num FROM nba_roster) AS temp_table WHERE row_num <= 5;"}
-{"question": "Which players have had the most varied careers in the NBA, having played for the most different teams", "sql": "SELECT name, COUNT(DISTINCT team) as num_teams FROM nba_roster WHERE team!= 'NA' GROUP BY name ORDER BY num_teams DESC LIMIT 10;"}
-{"question": "Which three teams have the most players under the age of 25", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster WHERE AGE < 25 GROUP BY Team ORDER BY num_players DESC LIMIT 3;"}
-{"question": "What are the colleges with the highest average salaries in the NBA", "sql": "SELECT college, COUNT(*) as count, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY college ORDER BY avg_salary DESC;"}
-{"question": "What is the name and jersey number of the player with the highest jersey number in the NBA", "sql": "SELECT NAME, JERSEY FROM nba_roster WHERE JERSEY!= 'NA' ORDER BY CAST(JERSEY AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the average age of NBA players", "sql": "SELECT AVG(AGE) AS average_age FROM nba_roster;"}
-{"question": "What are the top 3 teams with the oldest average age in the NBA", "sql": "SELECT TEAM, AVG(AGE) as average_age FROM nba_roster WHERE SALARY!= '--' GROUP BY TEAM ORDER BY average_age DESC LIMIT 3;"}
-{"question": "Which colleges have multiple players in the NBA", "sql": "SELECT COUNT(*) AS college_players, COLLEGE FROM nba_roster GROUP BY COLLEGE HAVING COUNT(*) > 1;"}
-{"question": "What is the average age of players on each NBA team", "sql": "SELECT team, AVG(CAST(AGE as INTEGER)) as avg_age FROM nba_roster GROUP BY team;"}
-{"question": "What is the average salary of Power Forward players in the NBA", "sql": "SELECT age, COUNT(*) as count FROM nba_roster GROUP BY age ORDER BY count DESC;"}
-{"question": "What is the team with the highest average salary for players over 25 years old", "sql": "SELECT team, AVG(CAST(SUBSTRING(SALARY, 2, LENGTH(SALARY)-2) AS INTEGER)) AS average_salary FROM nba_roster WHERE AGE > 25 AND SALARY!= '--' GROUP BY team ORDER BY average_salary DESC LIMIT 1;"}
-{"question": "What is the age range of players in the NBA", "sql": "SELECT MIN(AGE) as youngest, MAX(AGE) as oldest FROM nba_roster;"}
-{"question": "What is the most successful college in terms of producing NBA players", "sql": "SELECT COLLEGE, COUNT(*) as frequency FROM nba_roster GROUP BY COLLEGE ORDER BY frequency DESC LIMIT 1;"}
-{"question": "What is the average salary of the Boston Celtics players", "sql": "SELECT AVG(CAST(SALARY AS INTEGER) ) AS average_salary FROM nba_roster WHERE team='Boston Celtics';"}
-{"question": "Which colleges have produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) AS count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC;"}
-{"question": "Who is the highest-paid player in the NBA", "sql": "SELECT NAME FROM nba_roster WHERE SALARY = (SELECT MAX(SALARY) FROM nba_roster);"}
-{"question": "Which 5 players have the highest jersey numbers in the NBA", "sql": "SELECT name, jersey FROM nba_roster WHERE jersey!= 'NA' ORDER BY CAST(REPLACE(REPLACE(jersey, '0', ''), 'NA', '') AS INTEGER) DESC LIMIT 5;"}
-{"question": "What are the names of the players who are older than 30 years old in the NBA", "sql": "SELECT name, age FROM nba_roster WHERE age > 30 ORDER BY age;"}
-{"question": "How many players in the NBA are younger than the oldest player in the league by 25 years", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE + 25 > (SELECT MAX(AGE) FROM nba_roster);"}
-{"question": "Which 10 players have played for the most teams in their NBA careers", "sql": "SELECT name, COUNT(DISTINCT team) AS num_teams FROM nba_roster GROUP BY name ORDER BY num_teams DESC LIMIT 10;"}
-{"question": "What is the average height for each height range in the NBA", "sql": "SELECT HT, COUNT(*) as count, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER)) as avg_height FROM nba_roster WHERE HT!= 'NA' GROUP BY HT;"}
-{"question": "How many players in the NBA are 6 feet 8 inches tall", "sql": "SELECT COUNT(*) FROM nba_roster WHERE CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER) = 68;"}
-{"question": "What percentage of players in the NBA are 10 years or less away from the oldest player in the league", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE + 10 <= (SELECT MAX(AGE) FROM nba_roster);"}
-{"question": "What is the college that has produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) AS count FROM nba_roster GROUP BY COLLEGE ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the average salary of the youngest players on each NBA team", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster WHERE AGE <= 22 GROUP BY team ORDER BY average_salary DESC LIMIT 1;"}
-{"question": "What is the average age of players in the NBA who have a publicly disclosed salary", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What is the average salary for each position in the NBA", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary, POS FROM nba_roster WHERE SALARY!= '--' GROUP BY POS ORDER BY average_salary DESC;"}
-{"question": "What is the average age of players in the NBA who are at least 60 years old", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE AGE > 5*12;"}
-{"question": "Who are the 10 tallest players in the NBA", "sql": "SELECT HT, NAME FROM nba_roster ORDER BY CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER) DESC LIMIT 10;"}
-{"question": "Which NBA team has the most players under the age of 25", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE AGE <= 25 GROUP BY team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What is the average age of players from each college, excluding those who did not attend college, listed in order from oldest to youngest", "sql": "SELECT COLLEGE, AVG(AGE) as average_age FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY average_age DESC;"}
-{"question": "What is the average salary for each position in the NBA, with the highest-paid positions listed first", "sql": "SELECT POS, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS ORDER BY average_salary DESC;"}
-{"question": "What is the average height of NBA players 25 years old or older", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height FROM nba_roster WHERE AGE >= 25;"}
-{"question": "What are the top 10 colleges with the most players in the NBA", "sql": "SELECT college, COUNT(*) as num_players FROM nba_roster WHERE college!= '--' GROUP BY college ORDER BY num_players DESC LIMIT 10;"}
-{"question": "What is the average height of all players in the NBA", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)) as average_height FROM nba_roster;"}
-{"question": "What are the top 5 colleges that produce the highest-paid NBA players", "sql": "SELECT COLLEGE, AVG(CAST(SUBSTR(SALARY, 2, LENGTH(SALARY)-2) AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY COLLEGE ORDER BY average_salary DESC LIMIT 5;"}
-{"question": "Which teams have the most players under 6'8", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE HT!= 'NA' AND CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER) < 68 GROUP BY team;"}
-{"question": "What is the number of players in the NBA who are 25 years old or younger", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE <= 25;"}
-{"question": "What is the team with the highest average salary in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY average_salary DESC LIMIT 1;"}
-{"question": "What are the average heights for each position in the NBA, from tallest to shortest", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as average_height, POS FROM nba_roster GROUP BY POS ORDER BY average_height DESC;"}
-{"question": "How many players in the NBA are over the age of 30", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE > 30;"}
-{"question": "Who is the tallest player in the NBA", "sql": "SELECT NAME, HT FROM nba_roster ORDER BY LENGTH(HT) DESC LIMIT 1;"}
-{"question": "What are the top 3 teams in the NBA with the highest average salary", "sql": "SELECT team, AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$') - 1) AS INTEGER)) AS avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY avg_salary DESC LIMIT 3;"}
-{"question": "Which team has the highest average salary in the NBA", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY average_salary DESC LIMIT 1;"}
-{"question": "What is the total number of players in the NBA who have attended a college other than '--'?", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE COLLEGE!= '--';"}
-{"question": "Who is the player who has played for the most teams in their NBA career", "sql": "SELECT NAME, COUNT(DISTINCT TEAM) AS num_teams FROM nba_roster WHERE SALARY!= '--' GROUP BY NAME ORDER BY num_teams DESC LIMIT 1;"}
-{"question": "What are the top 10 highest-paid college-educated players in the NBA", "sql": "SELECT name, SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS total_salary FROM nba_roster WHERE COLLEGE!= '--' GROUP BY name ORDER BY total_salary DESC LIMIT 10;"}
-{"question": "Which NBA players have attended multiple colleges", "sql": "SELECT NAME, COLLEGE FROM nba_roster WHERE COLLEGE!= '--' GROUP BY NAME, COLLEGE HAVING COUNT(COLLEGE) > 1;"}
-{"question": "What are the 5 teams with the tallest average height in the NBA", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER)) AS average_height FROM nba_roster GROUP BY team ORDER BY average_height DESC;"}
-{"question": "What is the average height of players in the NBA who are older than 25 years old", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER)) AS average_height FROM nba_roster WHERE AGE > 25;"}
-{"question": "How many players are on the Toronto Raptors' roster", "sql": "SELECT COUNT(*) FROM nba_roster WHERE team='Toronto Raptors';"}
-{"question": "What is the weight of the heaviest 75% of NBA players", "sql": "SELECT WT FROM nba_roster ORDER BY CAST(REPLACE(REPLACE(WT,'lbs', ''),'', '') AS INTEGER) DESC LIMIT 1 OFFSET (SELECT COUNT(*) FROM nba_roster WHERE WT!= '--') * 75 / 100 - 1;"}
-{"question": "Who is the highest-paid player in the league, excluding those with unknown positions, salaries, or colleges", "sql": "SELECT name, salary FROM nba_roster WHERE POS!= 'NA' AND SALARY!= '--' AND COLLEGE!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "How many players in the NBA attended Duke, Kentucky, or North Carolina for college", "sql": "SELECT COUNT(*) AS count FROM nba_roster WHERE COLLEGE!= '--' AND COLLEGE IN ('Duke', 'Kentucky', 'North Carolina');"}
-{"question": "What is the most common college represented in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as frequency FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY frequency DESC LIMIT 1;"}
-{"question": "What is the number of players in the NBA who attended a college other than '--'?", "sql": "SELECT COUNT(*) FROM nba_roster WHERE COLLEGE!= '--';"}
-{"question": "How many players on the Toronto Raptors are 25 years old or older", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE team='Toronto Raptors' AND AGE >= 25;"}
-{"question": "How many players on the Toronto Raptors are 6'8", "sql": "SELECT COUNT(*) FROM nba_roster WHERE TEAM = 'Toronto Raptors' AND CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER) = '6' || '8';"}
-{"question": "What is the team with the most players over 30 years old in the NBA", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster WHERE AGE > 30 GROUP BY Team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What is the highest-paid Power Forward in the NBA", "sql": "SELECT POS, NAME, CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) as Salary FROM nba_roster WHERE SALARY!= '--' ORDER BY Salary DESC LIMIT 1 OFFSET (SELECT COUNT(*) FROM nba_roster WHERE SALARY!= '--' AND POS = 'PF')-1;"}
-{"question": "How many players in the NBA are older than the average age of all players", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE > (SELECT AVG(AGE) FROM nba_roster);"}
-{"question": "What positions in the NBA tend to have the oldest average age", "sql": "SELECT POS, COUNT(*) AS count, AVG(AGE) AS average_age FROM nba_roster GROUP BY POS ORDER BY average_age DESC;"}
-{"question": "Which players have more than 5 teammates with the same name", "sql": "SELECT NAME FROM nba_roster WHERE (SELECT COUNT(*) FROM nba_roster WHERE NAME = nba_roster.NAME AND TEAM = nba_roster.TEAM) > 5;"}
-{"question": "Which teams have the most players in the NBA", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster GROUP BY Team ORDER BY num_players DESC;"}
-{"question": "What is the total salary of the most expensive team in the NBA", "sql": "SELECT Team, SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as Total_Salary FROM nba_roster WHERE SALARY!= '--' GROUP BY Team ORDER BY Total_Salary DESC;"}
-{"question": "How many players on the Boston Celtics are 6 feet 8 inches tall or taller", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE team='Boston Celtics' AND CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER) = '6' || '8';"}
-{"question": "What are the most common colleges represented in the NBA", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster GROUP BY COLLEGE ORDER BY count DESC;"}
-{"question": "What are the 5 teams with the oldest average age in the NBA", "sql": "SELECT team, AVG(AGE) AS average_age, COUNT(*) AS num_players FROM nba_roster GROUP BY team HAVING COUNT(*) > 5 ORDER BY average_age DESC;"}
-{"question": "How many players in the NBA are 6 feet tall", "sql": "SELECT COUNT(*) FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) = 6;"}
-{"question": "Who are the tallest players in the NBA", "sql": "SELECT NAME FROM nba_roster WHERE HT > (SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) FROM nba_roster);"}
-{"question": "What are the ages of the youngest and oldest players in the NBA", "sql": "SELECT MIN(AGE) AS youngest_player, MAX(AGE) AS oldest_player FROM nba_roster;"}
-{"question": "What are the 5 teams with the lightest average weight for players with known heights", "sql": "SELECT HT, WT, AVG(CAST(SUBSTR(WT, 1, LENGTH(WT)-3) AS INTEGER)) AS avg_weight FROM nba_roster WHERE HT!= 'NA' GROUP BY HT ORDER BY avg_weight DESC LIMIT 5;"}
-{"question": "What are the top 5 positions with the tallest average height in the NBA", "sql": "SELECT POS, COUNT(*) AS count, AVG(CAST(SUBSTR(HT, 1, LENGTH(HT)-2) AS INTEGER)) AS avg_height FROM nba_roster WHERE HT!= 'NA' GROUP BY POS ORDER BY count DESC LIMIT 5;"}
-{"question": "Which 5 players have played for the most teams in their NBA careers", "sql": "SELECT NAME, COUNT(DISTINCT team) AS num_teams FROM nba_roster GROUP BY NAME ORDER BY num_teams DESC LIMIT 5;"}
-{"question": "What are the most common heights in the NBA", "sql": "SELECT HT, COUNT(*) as count FROM nba_roster GROUP BY HT ORDER BY count DESC LIMIT 10;"}
-{"question": "How many players on the Los Angeles Lakers are 6 feet 8 inches tall", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE team='Los Angeles Lakers' AND CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) = '6' || '8';"}
-{"question": "What are the most common positions for players under the age of 25 in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE AGE < 25 GROUP BY POS ORDER BY count DESC;"}
-{"question": "What are the top colleges that produce the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) as count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY count DESC;"}
-{"question": "What are the colleges that have produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) AS num_players FROM nba_roster GROUP BY COLLEGE ORDER BY num_players DESC;"}
-{"question": "How many players in the NBA are 25 years or younger", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE + 25 <= (SELECT MAX(AGE) FROM nba_roster);"}
-{"question": "What is the average age of players from the college that has produced the youngest players in the NBA", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY average_age LIMIT 1;"}
-{"question": "How many players in the NBA have attended Duke, Kentucky, North Carolina, or did not attend college", "sql": "SELECT COUNT(*) FROM nba_roster WHERE COLLEGE IN ('--', 'Duke', 'Kentucky', 'North Carolina');"}
-{"question": "What are the teams with the most players from a particular college", "sql": "SELECT team, COLLEGE, COUNT(*) AS num_players FROM nba_roster GROUP BY team, COLLEGE ORDER BY num_players DESC;"}
-{"question": "What is the number of players in the NBA who are older than 10 years old", "sql": "SELECT COUNT(*) FROM nba_roster WHERE (CAST(CAST(AGE AS INTEGER) AS REAL) > 10);"}
-{"question": "What are the top 3 highest paid players from each college", "sql": "SELECT name, college, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY college ORDER BY max_salary DESC LIMIT 3;"}
-{"question": "How many players in the NBA are at least 6 feet 8 inches tall", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) >= 68;"}
-{"question": "Which NBA teams have the most players from a particular college", "sql": "SELECT Team, COLLEGE, COUNT(*) as Count FROM nba_roster WHERE COLLEGE!= '--' GROUP BY Team, COLLEGE ORDER BY Count DESC;"}
-{"question": "What is the most common college attended by NBA players", "sql": "SELECT COLLEGE, COUNT(*) AS frequency FROM nba_roster GROUP BY COLLEGE ORDER BY frequency DESC LIMIT 1;"}
-{"question": "What is the total salary of all NBA players, excluding those with unknown salaries", "sql": "SELECT SUM(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)*1000000) AS total_salary FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What are the teams with the tallest average height in the NBA", "sql": "SELECT team, AVG(LENGTH(HT)) AS average_height FROM nba_roster GROUP BY team ORDER BY average_height DESC;"}
-{"question": "Which 10 players have played for the most teams in their NBA career", "sql": "SELECT name, COUNT(DISTINCT team) as num_teams FROM nba_roster WHERE SALARY!= '--' GROUP BY name ORDER BY num_teams DESC LIMIT 10;"}
-{"question": "What is the average height of NBA players 25 years old or younger", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')) AS INTEGER)) AS average_height FROM nba_roster WHERE AGE <= 25;"}
-{"question": "What is the average weight of NBA players", "sql": "SELECT AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')-1) AS INTEGER) + CAST(SUBSTR(WT, INSTR(WT,' ')+1) AS FLOAT)/16) as average_weight FROM nba_roster WHERE WT!= '--';"}
-{"question": "Which teams in the NBA have a significantly larger roster size compared to the number of point guards in the league", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster GROUP BY Team HAVING COUNT(*) > (SELECT COUNT(*) FROM nba_roster WHERE POS = 'PG')*0.3;"}
-{"question": "What are the top 5 colleges that produce the oldest average age of NBA players", "sql": "SELECT COLLEGE, AVG(AGE) as avg_age FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY avg_age DESC LIMIT 5;"}
-{"question": "What is the average salary of all players in the positions of PG, SG, SF, PF, and C in the NBA", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE POS = 'PG' OR POS = 'SG' OR POS = 'SF' OR POS = 'PF' OR POS = 'C';"}
-{"question": "Who is the player with the highest salary in the NBA", "sql": "SELECT NAME, SALARY FROM nba_roster WHERE SALARY = (SELECT MAX(SALARY) FROM nba_roster);"}
-{"question": "What are the top 10 teams with the most players in the NBA, considering only teams with at least 10 players with height information", "sql": "SELECT name, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)) as avg_height, COUNT(*) as count FROM nba_roster WHERE HT!= 'NA' GROUP BY name ORDER BY count DESC LIMIT 10;"}
-{"question": "Which players have played for the most teams in their NBA careers", "sql": "SELECT name, COUNT(DISTINCT team) as team_count FROM nba_roster WHERE team!= 'NA' GROUP BY name ORDER BY team_count DESC LIMIT 10;"}
-{"question": "What is the 75th percentile jersey number in the NBA", "sql": "SELECT CAST(Jersey AS INTEGER) as percentile FROM nba_roster ORDER BY CAST(Jersey AS INTEGER) LIMIT 1 OFFSET (SELECT COUNT(*) FROM nba_roster) * 0.75;"}
-{"question": "How many players in the NBA are younger than the oldest player in the league by 15 years", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE + 15 > (SELECT MAX(AGE) FROM nba_roster);"}
-{"question": "Which jersey numbers are the most popular among NBA players", "sql": "SELECT NAME, JERSEY FROM nba_roster GROUP BY JERSEY ORDER BY COUNT(*) DESC LIMIT 3;"}
-{"question": "Which team has the highest average salary", "sql": "SELECT team, AVG(CAST(SUBSTRING(SALARY, 2, LENGTH(SALARY)-2) AS INTEGER)) AS avg_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY avg_salary DESC LIMIT 1;"}
-{"question": "How many players in the NBA are older than 25 years old", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE > 25;"}
-{"question": "Which colleges have produced the most multiple NBA players", "sql": "SELECT COLLEGE, COUNT(*) FROM nba_roster GROUP BY COLLEGE HAVING COUNT(*) > 1;"}
-{"question": "Who has the highest salary on the Los Angeles Lakers", "sql": "SELECT name, salary FROM nba_roster WHERE team='Los Angeles Lakers' AND salary!= '--' ORDER BY CAST(REPLACE(REPLACE(salary, '$', ''), ',', '') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What are the minimum and maximum salaries for each team in the NBA", "sql": "SELECT MIN(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as min_salary, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary, team FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY min_salary DESC, max_salary DESC;"}
-{"question": "What is the average age of the team with the oldest roster in the NBA", "sql": "SELECT AVG(AGE) as avg_age FROM nba_roster GROUP BY team ORDER BY avg_age DESC LIMIT 1;"}
-{"question": "What are the teams with more than 5 players in the age range of 25 to 30 in the NBA", "sql": "SELECT team, COUNT(*) AS num_players FROM nba_roster WHERE AGE BETWEEN 25 AND 30 GROUP BY team HAVING COUNT(*) > 5;"}
-{"question": "Who is the highest-paid player who did not attend college", "sql": "SELECT name, salary FROM nba_roster WHERE SALARY!= '--' AND COLLEGE = '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the total number of players in the NBA", "sql": "SELECT COUNT(*) FROM nba_roster;"}
-{"question": "What is the most common position among players under the age of 25 in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE AGE <= 25 GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "Who is the oldest player in the NBA", "sql": "SELECT name, age FROM nba_roster ORDER BY age DESC LIMIT 1;"}
-{"question": "What are the minimum and maximum salaries in the NBA", "sql": "SELECT MIN(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as min_salary, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What is the average salary of Power Forward players in the NBA who are under the age of 25", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE POS = 'PF' AND AGE < 25;"}
-{"question": "What is the total number of players in the NBA who are 25 years or younger", "sql": "SELECT COUNT(*) as total_players FROM nba_roster WHERE AGE + 25 <= (SELECT MAX(AGE) FROM nba_roster);"}
-{"question": "Who is the highest-paid player on the Toronto Raptors", "sql": "SELECT NAME, SALARY FROM nba_roster WHERE TEAM = 'Toronto Raptors' AND SALARY = (SELECT MAX(SALARY) FROM nba_roster WHERE TEAM = 'Toronto Raptors');"}
-{"question": "Who is the highest-paid player on the Los Angeles Lakers", "sql": "SELECT name FROM nba_roster WHERE team='Los Angeles Lakers' AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What are the top 3 teams with the most players over the age of 5 in the NBA", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE age > 5 GROUP BY team ORDER BY num_players DESC LIMIT 3;"}
-{"question": "Which teams have the tallest players, excluding those with unknown salaries", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)) as avg_height FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY avg_height DESC;"}
-{"question": "What is the number of players in the NBA who are 25 years or younger", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE + 25 <= (SELECT MAX(AGE) FROM nba_roster);"}
-{"question": "What is the age group with the most players in the NBA", "sql": "SELECT AGE, COUNT(*) as count FROM nba_roster GROUP BY AGE ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the most common position for players aged 25 or older in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE AGE >= 25 GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the total salary of all players in the NBA", "sql": "SELECT SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster;"}
-{"question": "Which three teams have the most players from the same college", "sql": "SELECT team, COUNT(*) AS num_players, COLLEGE FROM nba_roster GROUP BY team, COLLEGE ORDER BY num_players DESC LIMIT 3;"}
-{"question": "What is the average age of players in the NBA who are more than 5 years older than the average age of all players", "sql": "SELECT AVG(AGE) as average_age FROM nba_roster WHERE AGE - (SELECT AVG(AGE) FROM nba_roster) > 5;"}
-{"question": "What is the heaviest player in the NBA", "sql": "SELECT NAME, WT FROM nba_roster WHERE WT!= 'NA' ORDER BY CAST(SUBSTRING(WT, 0, INSTR(WT,'') - 1) AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the average height of all players in the NBA roster", "sql": "SELECT AVG(LENGTH(HT)) AS average_height FROM nba_roster;"}
-{"question": "What are the average height and age of players on each team in the NBA", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER)) AS average_height, AVG(AGE) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC;"}
-{"question": "How many players in the NBA are 6' or 8' tall", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE CAST(SUBSTRING(HT, 1, INSTR(HT,'')-1) AS INTEGER) = 6 | 8;"}
-{"question": "What is the shortest weight listed in the 'nba_roster' table", "sql": "SELECT NAME, WT FROM nba_roster ORDER BY LENGTH(WT) LIMIT 1;"}
-{"question": "What is the highest-paid player in the NBA", "sql": "SELECT TEAM, NAME, SALARY FROM nba_roster WHERE SALARY = (SELECT MAX(SALARY) FROM nba_roster) ORDER BY TEAM;"}
-{"question": "What college has produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) AS frequency FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY frequency DESC LIMIT 1;"}
-{"question": "What is the total salary of all players in the NBA who are 25 years old or younger", "sql": "SELECT SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE AGE <= 25;"}
-{"question": "What is the oldest player for each position in the NBA", "sql": "SELECT pos, NAME, MAX(AGE) as max_age FROM nba_roster GROUP BY pos;"}
-{"question": "Who is the highest-paid player in the NBA who did not attend college", "sql": "SELECT name, salary FROM nba_roster WHERE COLLEGE = '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the salary of the 25th percentile of players in the NBA who are 25 years old or younger", "sql": "SELECT CAST(SALARY as INTEGER) as percentile FROM nba_roster WHERE AGE <= 25 ORDER BY percentile LIMIT 1 OFFSET (SELECT COUNT(*) FROM nba_roster WHERE AGE <= 25) / 4;"}
-{"question": "What are the most common positions in the NBA, and which position has the highest average weight", "sql": "SELECT POS, COUNT(*) AS count, AVG(CAST(SUBSTR(WT, 1, INSTR(WT,'')) AS INTEGER)) AS average_weight FROM nba_roster WHERE POS!= 'NA' GROUP BY POS ORDER BY count DESC;"}
-{"question": "What is the 75th percentile age of the NBA players", "sql": "SELECT CAST(AGE AS INTEGER) AS percentile FROM nba_roster ORDER BY percentile LIMIT 1 OFFSET (SELECT COUNT(*) FROM nba_roster) * 0.75;"}
-{"question": "What is the average salary of paid NBA players", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY,' ')-1) AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What age group has the most players in the NBA", "sql": "SELECT AGE, COUNT(*) as count FROM nba_roster GROUP BY AGE ORDER BY count DESC;"}
-{"question": "What is the height of the tallest player on the Los Angeles Lakers", "sql": "SELECT HT, NAME FROM nba_roster WHERE team='Los Angeles Lakers' AND HT!= 'NA' ORDER BY CAST(SUBSTRING(HT, 0, INSTR(HT,'')) AS INTEGER) DESC LIMIT 1;"}
-{"question": "What is the average salary of the Toronto Raptors players", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster WHERE team='Toronto Raptors';"}
-{"question": "What is the average salary of an NBA player", "sql": "SELECT AVG(CAST(SALARY AS INTEGER) / 1000000) AS average_salary FROM nba_roster;"}
-{"question": "What is the team with the highest average age in the NBA", "sql": "SELECT team, AVG(age) AS average_age FROM nba_roster GROUP BY team ORDER BY average_age DESC LIMIT 1;"}
-{"question": "Which team has the most players over the age of 25 in the NBA", "sql": "SELECT Team, COUNT(*) FROM nba_roster WHERE AGE > 25 GROUP BY Team ORDER BY COUNT(*) DESC LIMIT 1;"}
-{"question": "What is the total salary of the team with the highest total salary in the NBA", "sql": "SELECT SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS total_salary, team FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY total_salary DESC;"}
-{"question": "How many players in the NBA are exactly 6 feet tall", "sql": "SELECT COUNT(*) FROM nba_roster WHERE CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER) = 6 AND HT!= 'NA';"}
-{"question": "What is the age with the most unique players in the NBA", "sql": "SELECT COUNT(DISTINCT AGE) AS age_count, AGE FROM nba_roster GROUP BY AGE ORDER BY age_count DESC LIMIT 1;"}
-{"question": "What is the highest-paid player who did not attend college", "sql": "SELECT name, salary FROM nba_roster WHERE COLLEGE = '--' AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which age group has the most players in the NBA", "sql": "SELECT COUNT(*), AGE FROM nba_roster GROUP BY AGE ORDER BY COUNT(*) DESC;"}
-{"question": "What is the average height in the NBA?", "sql": "SELECT COUNT(*) as num_college_players FROM nba_roster WHERE COLLEGE!= '--';"}
-{"question": "Which position has the most players in the NBA", "sql": "SELECT POS, COUNT(*) AS count FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What are the top 10 colleges that have produced the most NBA players", "sql": "SELECT COLLEGE, COUNT(*) AS num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY num_players DESC LIMIT 10;"}
-{"question": "What is the average age of players from colleges that have multiple players in the NBA", "sql": "SELECT AVG(AGE) AS average_age, COLLEGE FROM nba_roster GROUP BY COLLEGE HAVING COUNT(*) > 1;"}
-{"question": "Which colleges have the most representation in the NBA", "sql": "SELECT COLLEGE, COUNT(*) AS num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE ORDER BY num_players DESC;"}
-{"question": "Who are the oldest players in the NBA, excluding those who are above the average age of all players", "sql": "SELECT NAME FROM nba_roster WHERE AGE > (SELECT AVG(AGE) FROM nba_roster) ORDER BY AGE DESC;"}
-{"question": "What are the top 3 highest-paid players on the Toronto Raptors", "sql": "SELECT name, SALARY FROM nba_roster WHERE team='Toronto Raptors' ORDER BY CAST(SUBSTRING(SALARY, 2) AS INTEGER) DESC LIMIT 3;"}
-{"question": "Which colleges have produced multiple players in the NBA", "sql": "SELECT COLLEGE, COUNT(*) AS num_players FROM nba_roster GROUP BY COLLEGE HAVING COUNT(*) > 1;"}
-{"question": "What is the average salary of NBA players 25 years old or younger", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) AS average_salary FROM nba_roster WHERE CAST(AGE AS INTEGER) <= 25;"}
-{"question": "What is the highest-paid player who has played for more than one team", "sql": "SELECT NAME, TEAM, SALARY FROM nba_roster WHERE SALARY = (SELECT MAX(SALARY) FROM nba_roster) AND (SELECT COUNT(DISTINCT TEAM) FROM nba_roster WHERE NAME = nba_roster.NAME) > 1;"}
-{"question": "Who is the tallest player in the NBA, based on average height", "sql": "SELECT NAME, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')) AS INTEGER)) AS AVG_HEIGHT, COUNT(DISTINCT TEAM) AS TEAM_COUNT FROM nba_roster GROUP BY NAME ORDER BY AVG_HEIGHT DESC LIMIT 1;"}
-{"question": "What is the total weight of all players in the NBA", "sql": "SELECT SUM(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER)) FROM nba_roster;"}
-{"question": "What are the top 10 highest-paid teams in the NBA, based on the average salary of their players", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) as avg_salary, AVG(AGE) as avg_age FROM nba_roster WHERE SALARY!= '--' GROUP BY SALARY ORDER BY avg_salary DESC LIMIT 10;"}
-{"question": "What is the highest salary for each team in the NBA", "sql": "SELECT team, MAX(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) as highest_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team;"}
-{"question": "What is the average age of all players in the NBA who are at least 60 years old", "sql": "SELECT AVG(AGE) AS average_age FROM nba_roster WHERE AGE > 5*12;"}
-{"question": "What is the average age of the youngest players in the NBA", "sql": "SELECT AVG(AGE) AS average_age FROM nba_roster WHERE AGE <= 25;"}
-{"question": "What are the top 3 teams with the highest average salary", "sql": "SELECT team, AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) AS average_salary FROM nba_roster GROUP BY team ORDER BY average_salary DESC LIMIT 3;"}
-{"question": "What is the most popular jersey number in the NBA", "sql": "SELECT Jersey, COUNT(*) as frequency FROM nba_roster WHERE Jersey!= 'NA' GROUP BY Jersey ORDER BY frequency DESC LIMIT 1;"}
-{"question": "What is the total salary of all players in the NBA, excluding those with unknown salaries", "sql": "SELECT SUM(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) as total_salary FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What is the number of players in the NBA roster who are 10 years or less away from the oldest player in the league", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE AGE + 10 <= (SELECT MAX(AGE) FROM nba_roster);"}
-{"question": "Which three teams have the tallest average height in the NBA", "sql": "SELECT team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) as height FROM nba_roster GROUP BY team ORDER BY height DESC LIMIT 3;"}
-{"question": "How many players in the NBA are older than 5 years old", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE AGE > 5;"}
-{"question": "What are the 5 teams with the most players from the University of Michigan", "sql": "SELECT team, COUNT(*) AS num_players FROM nba_roster WHERE COLLEGE = 'Michigan' GROUP BY team ORDER BY num_players DESC LIMIT 5;"}
-{"question": "What is the number of players in the NBA who are 15 years or younger than the oldest player in the league", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE + 15 <= (SELECT MAX(AGE) FROM nba_roster);"}
-{"question": "What are the minimum and maximum salaries of NBA players", "sql": "SELECT MIN(SALARY) AS min_salary, MAX(SALARY) AS max_salary FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What is the total salary of all players on the Toronto Raptors who are at least 6 feet 7 inches tall", "sql": "SELECT SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE team='Toronto Raptors' AND CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 >= 6.67;"}
-{"question": "What is the height with the most players in the NBA", "sql": "SELECT HT, COUNT(*) as count, AVG(WT) as avg_weight FROM nba_roster GROUP BY HT ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the most common height of NBA players", "sql": "SELECT CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) AS height, COUNT(*) AS count FROM nba_roster GROUP BY CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the total salary of all NBA players with known salaries", "sql": "SELECT SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What is the oldest player in the NBA", "sql": "SELECT AVG(AGE) as average_age, NAME from nba_roster GROUP BY NAME ORDER BY average_age DESC LIMIT 1;"}
-{"question": "What is the average height of NBA players aged 25 or older", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')) as INTEGER)) AS avg_height FROM nba_roster WHERE AGE >= 25;"}
-{"question": "How many players in the NBA are 6'6", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) = '6' || '6';"}
-{"question": "Who are the oldest players on each team in the NBA, excluding the average age of their team", "sql": "SELECT nba_roster.NAME FROM nba_roster WHERE AGE > (SELECT AVG(AGE) FROM nba_roster WHERE TEAM = nba_roster.TEAM) ORDER BY AGE DESC;"}
-{"question": "What is the most common position played by Jalen Johnson", "sql": "SELECT POS, COUNT(*) AS count, POS FROM nba_roster GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the number of players on each team who are 25 years old or older", "sql": "SELECT team, COUNT(*) AS num_players FROM nba_roster WHERE AGE >= 25 GROUP BY team;"}
-{"question": "What are the top 5 players in the NBA in terms of average height", "sql": "SELECT name, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)) as avg_height FROM nba_roster GROUP BY name ORDER BY avg_height DESC LIMIT 5;"}
-{"question": "What players in the NBA are taller than the average height of all players", "sql": "SELECT NAME FROM nba_roster WHERE HT > (SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12) FROM nba_roster);"}
-{"question": "Which team has the most players 25 years old or older", "sql": "SELECT team, COUNT(*) as num_players FROM nba_roster WHERE AGE >= 25 GROUP BY team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What are the 5 most common jersey numbers in the NBA", "sql": "SELECT COUNT(DISTINCT Jersey), Jersey FROM nba_roster GROUP BY Jersey ORDER BY COUNT(DISTINCT Jersey) DESC LIMIT 5;"}
-{"question": "What colleges are most represented in the NBA", "sql": "SELECT COLLEGE, COUNT(*) AS num_players FROM nba_roster WHERE COLLEGE!= '--' GROUP BY COLLEGE;"}
-{"question": "What is the average salary of NBA players under the age of 25", "sql": "SELECT AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) AS average_salary FROM nba_roster WHERE AGE <= 25;"}
-{"question": "What are the top 10 teams in the NBA by average salary", "sql": "SELECT Team, AVG(CAST(SUBSTRING(SALARY, 2, LENGTH(SALARY)-2) AS INTEGER)) AS average_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY Team ORDER BY average_salary DESC LIMIT 10;"}
-{"question": "What is the player with the highest salary in the NBA", "sql": "SELECT name, CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) AS salary FROM nba_roster WHERE SALARY!= '--' ORDER BY salary DESC LIMIT 1;"}
-{"question": "Who is the oldest player in the NBA who is not a rookie", "sql": "SELECT name, age FROM nba_roster WHERE SALARY!= '--' ORDER BY age DESC LIMIT 1;"}
-{"question": "How many players in the NBA are 6 feet 8 inches or taller", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE CAST(SUBSTRING(HT, 0, INSTR(HT,'')-1) AS INTEGER) >= 68;"}
-{"question": "How many players in the NBA are 25 years old", "sql": "SELECT COUNT(*) FROM nba_roster WHERE age = 25;"}
-{"question": "What is the team with the oldest average age in the NBA", "sql": "SELECT AVG(AGE) AS average_age FROM nba_roster GROUP BY TEAM ORDER BY average_age DESC LIMIT 1;"}
-{"question": "Who is the highest-paid player in the NBA, excluding those with unknown salaries", "sql": "SELECT MAX(SALARY) AS highest_salary, NAME FROM nba_roster WHERE SALARY!= '--' GROUP BY NAME ORDER BY highest_salary DESC LIMIT 1;"}
-{"question": "Which team has the most players under the age of 25", "sql": "SELECT Team, COUNT(*) as num_players FROM nba_roster WHERE AGE < 25 GROUP BY Team ORDER BY num_players DESC LIMIT 1;"}
-{"question": "What are the top 3 jersey numbers with the most players in the NBA", "sql": "SELECT jersey, COUNT(*) as count FROM nba_roster WHERE jersey!= 'NA' GROUP BY jersey ORDER BY count DESC LIMIT 3;"}
-{"question": "What percentage of NBA players are at least 6 feet 8 inches tall", "sql": "SELECT COUNT(*) FROM nba_roster WHERE CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER) >= 68 AND CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 >= 6.5;"}
-{"question": "What is the average age and height of NBA players, excluding those with unknown heights", "sql": "SELECT AVG(AGE) as average_age, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')) as INTEGER)) as average_height FROM nba_roster WHERE HT!= 'NA';"}
-{"question": "What is the player who has played for the most teams in the NBA", "sql": "SELECT name, COUNT(*) as num_teams FROM nba_roster GROUP BY name ORDER BY num_teams DESC LIMIT 1;"}
-{"question": "What is the average height of players in the NBA who are 25 years or older", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')) as INTEGER)) AS avg_height FROM nba_roster WHERE CAST(AGE as INTEGER) >= 25;"}
-{"question": "Which team has the most unique players in the NBA", "sql": "SELECT COUNT(DISTINCT TEAM), TEAM FROM nba_roster GROUP BY TEAM ORDER BY COUNT(DISTINCT TEAM) DESC LIMIT 1;"}
-{"question": "What are the 5 oldest players in the NBA", "sql": "SELECT NAME, AGE FROM nba_roster ORDER BY AGE DESC LIMIT 5;"}
-{"question": "What is the shortest height of a player in the NBA", "sql": "SELECT name, HT FROM nba_roster ORDER BY LENGTH(HT) LIMIT 1, 1;"}
-{"question": "What is the average height of Power Forwards and Centers in the NBA", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER) + CAST(SUBSTR(HT, INSTR(HT,'')+1) AS FLOAT)/12) AS average_height FROM nba_roster WHERE POS IN ('PF', 'C');"}
-{"question": "What is the total salary of the team with the highest payroll in the NBA", "sql": "SELECT team, SUM(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as total_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY total_salary DESC;"}
-{"question": "What are the top-paid players for each team in the NBA", "sql": "SELECT team, name, CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) as salary FROM nba_roster WHERE SALARY!= '--' GROUP BY team ORDER BY salary DESC;"}
-{"question": "What is the average salary of all NBA players, excluding those with unknown salaries", "sql": "SELECT AVG(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as average_salary FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What is the name and jersey number of the player with the highest jersey number in the NBA roster", "sql": "SELECT NAME, JERSEY FROM nba_roster ORDER BY CAST(JERSEY AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which five jersey numbers are the most commonly worn by players in the NBA", "sql": "SELECT name, jersey, COUNT(*) as count FROM nba_roster GROUP BY jersey ORDER BY count DESC LIMIT 5;"}
-{"question": "What is the most popular position in the NBA", "sql": "SELECT POS, COUNT(*) AS count FROM nba_roster WHERE POS IN ('PG', 'SG', 'SF', 'PF', 'C') GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "What is the number of players in the NBA who are 68 inches tall", "sql": "SELECT COUNT(*) FROM nba_roster WHERE CAST(SUBSTRING(HT, 0, INSTR(HT,'')) AS INTEGER) = 68;"}
-{"question": "Who is the highest paid player on the team with the most players", "sql": "SELECT NAME FROM nba_roster WHERE SALARY = (SELECT MAX(SALARY) FROM nba_roster) AND TEAM = (SELECT TEAM FROM nba_roster GROUP BY TEAM ORDER BY COUNT(*) DESC LIMIT 1);"}
-{"question": "What is the average height of players on the Toronto Raptors", "sql": "SELECT Team, AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER)) as Average_Height FROM nba_roster GROUP BY Team;"}
-{"question": "What are the top 5 teams in the NBA by average salary", "sql": "SELECT Team, AVG(CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER)) as Average_Salary FROM nba_roster WHERE SALARY!= '--' GROUP BY Team ORDER BY Average_Salary DESC;"}
-{"question": "What are the top 3 highest-paid players in the NBA", "sql": "SELECT NAME, SALARY FROM nba_roster WHERE SALARY!= '--' ORDER BY CAST(SUBSTR(SALARY, 1, INSTR(SALARY, '$')-1) AS INTEGER) DESC LIMIT 3;"}
-{"question": "How many players in the NBA are 25 years old or younger", "sql": "SELECT COUNT(*) FROM nba_roster WHERE AGE <= 25;"}
-{"question": "How many players in the NBA attended Michigan State University", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE COLLEGE = 'Michigan State';"}
-{"question": "What is the most represented position among University of Michigan alumni in the NBA", "sql": "SELECT POS, COUNT(*) as count FROM nba_roster WHERE COLLEGE='Michigan' GROUP BY POS ORDER BY count DESC LIMIT 1;"}
-{"question": "How many players are on each team in the NBA", "sql": "SELECT TEAM, COUNT(*) AS num_players FROM nba_roster GROUP BY TEAM;"}
-{"question": "What is the number of players in the NBA roster who are more than 5 years older than the average age of all players in the roster", "sql": "SELECT COUNT(*) as num_players FROM nba_roster WHERE AGE - (SELECT AVG(AGE) FROM nba_roster) > 5;"}
-{"question": "What teams have multiple players from the same college", "sql": "SELECT team, COUNT(*) AS num_players, COLLEGE FROM nba_roster GROUP BY team, COLLEGE HAVING COUNT(*) > 1;"}
-{"question": "Which college has the most players on the Brooklyn Nets", "sql": "SELECT team, COUNT(*) AS num_players, COLLEGE FROM nba_roster WHERE COLLEGE!= '--' GROUP BY team, COLLEGE ORDER BY num_players DESC;"}
-{"question": "What is the average age of NBA players who are older than 5 years old", "sql": "SELECT AVG(AGE) AS average_age FROM nba_roster WHERE AGE > 5;"}
-{"question": "What are the 10 players with the tallest and shortest heights in the NBA", "sql": "SELECT name, HT, MAX(CAST(SUBSTRING(HT, 1, INSTR(HT,'')-1) AS INTEGER)) AS max_height, MIN(CAST(SUBSTRING(HT, INSTR(HT,'')+1) AS INTEGER)) AS min_height FROM nba_roster WHERE HT!= 'NA' GROUP BY name ORDER BY max_height DESC, min_height ASC LIMIT 10;"}
-{"question": "What is the age of the oldest player on the Toronto Raptors", "sql": "SELECT name, age FROM nba_roster WHERE team='Toronto Raptors' ORDER BY age DESC LIMIT 1;"}
-{"question": "What are the top 5 highest-paid college-educated players in the NBA", "sql": "SELECT NAME, SALARY FROM nba_roster WHERE COLLEGE!= '--' AND SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 5;"}
-{"question": "What is the number of players in the NBA roster who do not have a college listed", "sql": "SELECT COUNT(*) AS num_players FROM nba_roster WHERE COLLEGE = '--';"}
-{"question": "What is the number of players on the Toronto Raptors who earn more than $10,000,000", "sql": "SELECT COUNT(*) FROM nba_roster WHERE team='Toronto Raptors' AND CAST(SUBSTRING(SALARY, 2) AS INTEGER) > 10000000;"}
-{"question": "What is the average height and age of NBA players, and how do these values vary by height", "sql": "SELECT AVG(CAST(SUBSTR(HT, 1, INSTR(HT,'')-1) AS INTEGER)) AS average_height, AVG(AGE) AS average_age FROM nba_roster GROUP BY CAST(SUBSTR(HT, INSTR(HT,'')+1) AS INTEGER);"}
-{"question": "What is the most frequently worn jersey number in the NBA", "sql": "SELECT COUNT(DISTINCT Jersey) AS total_jerseys, Jersey FROM nba_roster GROUP BY Jersey ORDER BY total_jerseys DESC LIMIT 1;"}
-{"question": "What is the average age of all players in the NBA who have a known salary", "sql": "SELECT AVG(AGE) FROM nba_roster WHERE SALARY!= '--';"}
-{"question": "What is the 99th percentile salary in the NBA?", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*99/100-1;"}
-{"question": "What is the 75th percentile salary in the NBA?", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*75/100-1;"}
-{"question": "What is the 25th percentile salary in the NBA?", "sql": "SELECT (CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as percentile FROM nba_roster WHERE SALARY!= '--' order by percentile limit 1 offset (select count(*) from nba_roster where SALARY != '--')*25/100-1;"}
-{"question": "What is the median weight in the NBA?", "sql": "select CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What is the average weight in the NBA?", "sql": "SELECT AVG(CAST(SUBSTR(WT, 1, INSTR(WT,' ')) as INTEGER)) FROM nba_roster;"}
-{"question": "What is the median height in the NBA?", "sql": "select CAST(SUBSTR(HT, 1, INSTR(HT,' ')-1) AS INTEGER)+ CAST(SUBSTR(HT, INSTR(HT,' ')+1) AS FLOAT)/12 as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "Can you tell me how many players are in the NBA?", "sql": "select count(*) from nba_roster;"}
-{"question": "Would you please let me know what the highest paid players are for each position?", "sql": "SELECT name, pos, MAX(CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER)) as max_salary FROM nba_roster WHERE SALARY!= '--' GROUP BY POS;"}
-{"question": "Is Jalen Johnson 23 years old?", "sql": "Select name, age from nba_roster where name='Jalen Johnson';"}
-{"question": "Who is the oldest player on the Brooklyn Nets?", "sql": "SELECT NAME FROM nba_roster WHERE TEAM = 'Brooklyn Nets' AND AGE = (SELECT MAX(AGE) FROM nba_roster WHERE TEAM = 'Brooklyn Nets');"}
-{"question": "Who has the higest salary on the Memphis Grizzlies?", "sql": "select salary, name from nba_roster where team='Memphis Grizzlies' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Which player has the higest salary on the Cleveland Cavaliers?", "sql": "select salary, name from nba_roster where team='Cleveland Cavaliers' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "Who is the highest paid center on the Dallas Mavericks?", "sql": "select salary, name from nba_roster where team='Dallas Mavericks' and POS='C' and SALARY!= '--' ORDER BY CAST(REPLACE(REPLACE(SALARY, '$', ''), ',','') AS INTEGER) DESC LIMIT 1;"}
-{"question": "How much is Marcus Smart getting paid?", "sql": "select salary from nba_roster where name='Marcus Smart';"}
-{"question": "What's the average age of the Trail Blazers?", "sql": "select avg(age) from nba_roster where team='Portland Trail Blazers';"}
-{"question": "What's the median age of the NBA?", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster order by percentile limit 1 offset (select count(*) from nba_roster)/2;"}
-{"question": "What's the median age of the Miami Heat?", "sql": "select CAST(AGE as INTEGER) as percentile from nba_roster where team='Miami Heat' order by percentile limit 1 offset (select count(*) from nba_roster where team='Miami Heat')/2;"}

+ 0 - 0
recipes/3p_integrations/lamini/text2sql_memory_tuning/data/training_data/generated_queries_v2.jsonl


Some files were not shown because the diff is too large