{
"cells": [
{
"cell_type": "markdown",
"id": "3df36883-3c1c-4008-98ca-0d641820d9ef",
"metadata": {},
"source": [
"## Dataset exploration\n",
"\n",
"This is probably going to be a minimal EDA notebook looking at tool calling datasets that exists in market right now"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "63d7e44e-d808-4628-ab7a-540aa6d0fd72",
"metadata": {},
"outputs": [],
"source": [
"#!pip install datasets"
]
},
{
"cell_type": "code",
"execution_count": 96,
"id": "52075b0f-61be-4898-be35-bd72bd022efc",
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"import pandas as pd\n",
"import numpy as np\n",
"from collections import Counter, defaultdict\n",
"import json\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import re\n",
"from tqdm import tqdm\n",
"import networkx as nx"
]
},
{
"cell_type": "markdown",
"id": "c3c59bbd-1060-4b71-9ceb-1e70d733b8a4",
"metadata": {},
"source": [
"### Hermes-Function-Calling v1\n",
"\n",
"- Apache 2.0\n",
"- Single-Turn: 2k\n",
"- Func_calling: 2k\n",
"- Glaive: 5k\n",
"- Json mode agent: 1.3k\n",
"- Json mode single: 1.24k\n",
"\n",
"https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "22a4366c-4a9a-45f8-b12d-603b0ab2df5f",
"metadata": {},
"outputs": [],
"source": [
"d = load_dataset(\"NousResearch/hermes-function-calling-v1\")"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "bb2798bd-0d22-47c6-9346-4fde13b64c54",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Dataset({\n",
" features: ['id', 'conversations', 'category', 'subcategory', 'task'],\n",
" num_rows: 1893\n",
"})"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d['train']"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "bc731c69-3ef5-4e34-98b6-e07646241d09",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['id', 'conversations', 'category', 'subcategory', 'task'])"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d['train'][123].keys()"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "0b54ea4f-3f2a-48dc-8efc-9cff09dd46db",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': '936b90da-dbeb-4864-a6e6-28899965265d',\n",
" 'conversations': [{'from': 'system',\n",
" 'value': \"You are a function calling AI model. You are provided with function signatures within

| | id | category | subcategory | task | split | num_available_functions | num_tool_calls | human_msg_length | available_functions | called_functions | conversation_turns | human_msg | gpt_msg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 85f6c398-69c7-4df2-aed1-29d614a93a26 | IoT and Home Automation | Security Camera Management | View and Manage Security Camera Feeds | train | 0 | 3 | 1670 | [] | [get_camera_live_feed, record_camera_feed, get... | 3 | I've recently installed a new security system ... | `<tool_call>\n{'arguments': {'camera_id': 'fron...` |
| 1 | 89ef3c87-66bd-46ee-9297-15398fd9a235 | IoT and Home Automation | Smart Home Setup | Set Up a Smart Home System | train | 0 | 2 | 916 | [] | [initialize_smart_home_system, create_device_g... | 3 | I've recently equipped my home with various sm... | `<tool_call>\n{'arguments': {'device_list': ['P...` |
| 2 | 14657d01-d6d1-46df-8eb1-7267ba820683 | IoT and Home Automation | Thermostat Control | Adjust Smart Thermostat Settings | train | 0 | 1 | 757 | [] | [set_thermostat_schedule] | 3 | I recently installed a smart thermostat model ... | `<tool_call>\n{'arguments': {'thermostat_id': '...` |
| 3 | c483f963-8a29-4ff0-a684-89be0d0f2843 | IoT and Home Automation | Voice Commands for Home Tasks | Perform Home Tasks Using Voice Commands | train | 0 | 3 | 1058 | [] | [activate_voice_command, set_thermostat, activ... | 3 | I just arrived home and I'm carrying several b... | `<tool_call>\n{'arguments': {'command': 'Activa...` |
| 4 | 81ad724a-bb74-420f-8221-91557b7e5930 | IoT and Home Automation | Lighting Control | Control Smart Lights in a Home | train | 0 | 2 | 556 | [] | [set_smart_light_color, sync_lights_with_autom... | 3 | I am preparing my living room for a cozy movie... | `<tool_call>\n{'arguments': {'room': 'living ro...` |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1888 | 90b96a9f-85c3-459f-95bd-7c58557a4b02 | Information Extraction | Json Schema | Structured json schema extaction with function... | train | 0 | 0 | 9349 | [] | [] | 3 | Can you help me extract queries from the follo... | `<tool_call>\n{"arguments": {"queries": ['hat i...` |
| 1889 | f698e236-d733-4a24-af41-64831a7139ac | Information Extraction | Json Schema | Structured json schema extaction with function... | train | 0 | 0 | 13462 | [] | [] | 3 | Can you help me extract queries from the follo... | `<tool_call>\n{"arguments": {"queries": ['How d...` |
| 1890 | 0eed5d10-8d6e-4aaf-9b49-ad299ee02c5d | Information Extraction | Json Schema | Structured json schema extaction with function... | train | 0 | 0 | 11526 | [] | [] | 3 | Can you help me extract queries from the follo... | `<tool_call>\n{"arguments": {"queries": ['How d...` |
| 1891 | cf305996-bae4-46f2-b725-d540b1d3ea5c | Information Extraction | Json Schema | Structured json schema extaction with function... | train | 0 | 0 | 10049 | [] | [] | 3 | Can you help me extract queries from the follo... | `<tool_call>\n{"arguments": {"queries": ['Can y...` |
| 1892 | 98c8fda0-ca02-4d3c-ac96-c5bd6bf6904a | Information Extraction | Json Schema | Structured json schema extaction with function... | train | 0 | 0 | 10025 | [] | [] | 3 | Can you help me extract queries from the follo... | `<tool_call>\n{"arguments": {"queries": ['How i...` |

1893 rows × 13 columns