{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports & Env Setup" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "import sys\n", "import os\n", "from dotenv import load_dotenv\n", "load_dotenv()\n", "from datasets import load_dataset\n", "\n", "\n", "import dspy\n", "sys.path.append(os.path.abspath('../'))\n", "from benchmarks import llama_mmlu_pro, leaderboard_mmlu_pro, llama_mmlu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configuration" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "NUM_THREADS = 48\n", "\n", "FEW_SHOTS = 5\n", "\n", "# See https://docs.litellm.ai/docs/providers/vllm for details\n", "TASK_MODEL = dspy.LM(\n", " \"hosted_vllm/meta-llama/Llama-3.3-70B-Instruct\",\n", " api_base = 'http://localhost:8000/v1', # local vLLM OpenAI-compatible endpoint\n", " api_key = \"dummy\",\n", " # api_version: Optional[str] = None,\n", " # api_key: Optional[str] = None,\n", " # seed: Optional[int] = None,\n", " # max_tokens: Optional[int] = None,\n", " # timeout: Optional[Union[float, int]] = None,\n", ")\n", "PROMPT_MODEL = dspy.LM(\n", " \"hosted_vllm/meta-llama/Llama-3.3-70B-Instruct\",\n", " api_base = 'http://localhost:8000/v1', # local vLLM OpenAI-compatible endpoint\n", " api_key = \"dummy\",\n", "\n", " # api_version: Optional[str] = None,\n", " # api_key: Optional[str] = None,\n", " # seed: Optional[int] = None,\n", " # max_tokens: Optional[int] = None,\n", " # timeout: Optional[Union[float, int]] = None,\n", ")\n", "\n", "dspy.configure(lm=TASK_MODEL)\n", "\n", "# Swap in llama_mmlu_pro or leaderboard_mmlu_pro to evaluate a different benchmark\n", "benchmark = llama_mmlu\n", "\n", "# Without chain of thought:\n", "# program = dspy.Predict(\n", "# benchmark.signature(\"\")\n", "# )\n", "\n", "# With chain of thought:\n", "program = dspy.ChainOfThought(\n", " benchmark.signature(\"You are an assistant designed to provide guidance on multiple-choice 
questions. Your role involves analyzing questions, assessing answer choices, and offering well-reasoned explanations to enhance understanding. By breaking down complex concepts, you help users develop critical thinking skills and improve their decision-making process. You strive to present information in a clear, structured manner while adapting to the user's level of expertise. Ultimately, your goal is to foster deeper comprehension and confidence in tackling multiple-choice assessments.\") # put your initial system prompt here, or leave blank\n", ")\n", "\n", "evaluate = dspy.Evaluate(\n", " devset=[],\n", " max_errors = 500,\n", " metric=benchmark.metric,\n", " num_threads=NUM_THREADS,\n", " display_progress=True,\n", " display_table=True,\n", " return_all_scores=True,\n", " return_outputs=True,\n", " provide_traceback=True\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load dataset" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1403, 1263, 11369)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trainset, valset, testset = benchmark.datasets(\n", " train_size=0.1,\n", " validation_size=0.1,\n", ")\n", "\n", "len(trainset), len(valset), len(testset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Baseline Benchmark" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BASE PROMPT:\n", " You are an assistant designed to provide guidance on multiple-choice questions. Your role involves analyzing questions, assessing answer choices, and offering well-reasoned explanations to enhance understanding. By breaking down complex concepts, you help users develop critical thinking skills and improve their decision-making process. You strive to present information in a clear, structured manner while adapting to the user's level of expertise. 
Ultimately, your goal is to foster deeper comprehension and confidence in tackling multiple-choice assessments.\n", "CPU times: user 339 μs, sys: 0 ns, total: 339 μs\n", "Wall time: 289 μs\n" ] } ], "source": [ "%%time\n", "print(\"BASE PROMPT:\\n\", program.signature.instructions)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BEST EXAMPLES:\n", " []\n", "CPU times: user 267 μs, sys: 8 μs, total: 275 μs\n", "Wall time: 228 μs\n" ] } ], "source": [ "%%time\n", "print(\"BEST EXAMPLES:\\n\", program.demos)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting execution...\n", "Average Metric: 37.00 / 40 (92.5%): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:13<00:00, 2.88it/s]" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2025/01/30 13:36:38 INFO dspy.evaluate.evaluate: Average Metric: 37 / 40 (92.5%)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
\n", " | question | \n", "options | \n", "example_answer | \n", "reasoning | \n", "pred_answer | \n", "metric | \n", "
---|---|---|---|---|---|---|
0 | \n", "For which of these two scenarios does the main character (who uses... | \n", "{'A': 'Wrong, Wrong', 'B': 'Wrong, Not wrong', 'C': 'Not wrong, Wr... | \n", "A | \n", "To determine which scenario involves the main character doing some... | \n", "A | \n", "✔️ [True] | \n", "
1 | \n", "Which of the following is not a TV cartoon duo? | \n", "{'A': 'Chip and Dale', 'B': 'Beavis and Butt-head', 'C': 'Simon an... | \n", "C | \n", "To determine which of the following is not a TV cartoon duo, we ne... | \n", "C | \n", "✔️ [True] | \n", "
2 | \n", "Market testing studies use __________ to carry out controlled expe... | \n", "{'A': 'Test markets.', 'B': 'Data mining.', 'C': 'Analytical infor... | \n", "A | \n", "To answer this question, we need to understand the concept of mark... | \n", "A | \n", "✔️ [True] | \n", "
3 | \n", "Which of the following is the essential feature of Conduct Disorder | \n", "{'A': 'Lack of appropriate guilt or remorse', 'B': 'Major rule or ... | \n", "B | \n", "To answer this question, we need to consider the characteristics o... | \n", "B | \n", "✔️ [True] | \n", "
4 | \n", "For which of these two scenarios does the main character (who uses... | \n", "{'A': 'Wrong, Wrong', 'B': 'Wrong, Not wrong', 'C': 'Not wrong, Wr... | \n", "C | \n", "To determine which scenario involves the main character doing some... | \n", "C | \n", "✔️ [True] | \n", "
5 | \n", "An off-duty police officer was standing on a street corner waiting... | \n", "{'A': 'assault with a deadly weapon.', 'B': 'involuntary manslaugh... | \n", "B | \n", "The police officer's actions, although intended to frighten the ma... | \n", "B | \n", "✔️ [True] | \n", "
6 | \n", "There are 2,000 insurance agents licensed under the general licens... | \n", "{'A': 'constitutional, because a state license is a privilege and ... | \n", "C | \n", "The key issue here is whether the state statute revoking the insur... | \n", "C | \n", "✔️ [True] | \n", "
7 | \n", "Millions of immigrant children who entered the United States learn... | \n", "{'A': 'Acculturation', 'B': 'Collective behavior', 'C': 'Social st... | \n", "A | \n", "The concept described in the question involves immigrant children ... | \n", "A | \n", "✔️ [True] | \n", "
8 | \n", "This question refers to the following information. \"When we were k... | \n", "{'A': 'The labor union movement.', 'B': 'The civil rights movement... | \n", "D | \n", "The Port Huron Statement, as excerpted, discusses themes of social... | \n", "D | \n", "✔️ [True] | \n", "
9 | \n", "Who is the eighth-century CE female poet worshipped throughout ma... | \n", "{'A': 'Andal', 'B': 'Devi', 'C': 'Ganga', 'D': 'Kali'} | \n", "A | \n", "The question asks for an eighth-century CE female poet who is wors... | \n", "A | \n", "✔️ [True] | \n", "
10 | \n", "Clifford and Lucia Pauling, in Senior View, told us that physical ... | \n", "{'A': 'Are rapid and frightening', 'B': 'Can be offset by meditati... | \n", "C | \n", "To answer this question, we need to consider the context of physic... | \n", "C | \n", "✔️ [True] | \n", "
11 | \n", "Which statement best describes one of Dworkin's central arguments ... | \n", "{'A': 'Morality plays no role in the concept of law.', 'B': 'Moral... | \n", "D | \n", "To answer this question, we need to consider the central arguments... | \n", "D | \n", "✔️ [True] | \n", "
12 | \n", "Light that is not transmitted by opaque materials is | \n", "{'A': 'reflected or converted to internal energy in the material.'... | \n", "A | \n", "When light hits an opaque material, it does not pass through becau... | \n", "A | \n", "✔️ [True] | \n", "
13 | \n", "Which of the following was not defined by Giddens (1998) as part o... | \n", "{'A': 'the democratization of the family', 'B': 'putting an end to... | \n", "B | \n", "To answer this question, we need to consider the key components of... | \n", "B | \n", "✔️ [True] | \n", "
14 | \n", "The U.S. economy currently suffers a recessionary gap and a budget... | \n", "{'A': 'Tax increase \\xa0\\xa0\\xa0 Demand rises \\xa0\\xa0\\xa0 Falling... | \n", "C | \n", "To address a recessionary gap and a budget deficit through fiscal ... | \n", "C | \n", "✔️ [True] | \n", "
15 | \n", "A company president is concerned about the low motivation and sati... | \n", "{'A': 'ERG theory', 'B': 'expectancy theory', 'C': 'equity theory'... | \n", "D | \n", "The scenario describes a situation where a company president imple... | \n", "D | \n", "✔️ [True] | \n", "
16 | \n", "What characteristic is not representative of a type IIb muscle fib... | \n", "{'A': 'Low oxidative capacity', 'B': 'High fatigue resistance', 'C... | \n", "B | \n", "To answer this question, we need to understand the characteristics... | \n", "B | \n", "✔️ [True] | \n", "
17 | \n", "The energy for all forms of muscle contraction is provided by: | \n", "{'A': 'ATP.', 'B': 'ADP.', 'C': 'phosphocreatine.', 'D': 'oxidativ... | \n", "A | \n", "To answer this question, we need to understand the role of differe... | \n", "A | \n", "✔️ [True] | \n", "
18 | \n", "The main factor preventing subsistence economies from advancing ec... | \n", "{'A': 'a currency.', 'B': 'a well-connected transportation infrast... | \n", "D | \n", "To address this question, let's consider what subsistence economie... | \n", "B | \n", "\n", " |
19 | \n", "The primary research method used by developmental psychologists is | \n", "{'A': 'case study', 'B': 'cross-sectional research', 'C': 'natural... | \n", "B | \n", "Developmental psychologists often aim to understand how individual... | \n", "B | \n", "✔️ [True] | \n", "
20 | \n", "Kevin wants shoes and grows turnips. Lisa wants turnips and makes ... | \n", "{'A': 'Store of value', 'B': 'Unit of account', 'C': 'Medium of ex... | \n", "C | \n", "In this scenario, Kevin, Lisa, and Bob have different needs and pr... | \n", "C | \n", "✔️ [True] | \n", "
21 | \n", "The Federal Reserve implements an expansionary policy by doing whi... | \n", "{'A': 'Buying Treasury bonds in the open market', 'B': 'Raising th... | \n", "A | \n", "To answer this question, we need to understand the tools the Feder... | \n", "A | \n", "✔️ [True] | \n", "
22 | \n", "Why do political scientists identify the presidential elections of... | \n", "{'A': 'The issues at stake in those elections were more important ... | \n", "B | \n", "To answer this question, we need to understand what is meant by \"c... | \n", "B | \n", "✔️ [True] | \n", "
23 | \n", "An entity engaged an accountant to review its financial statements... | \n", "{'A': 'Withdrawn from the engagement because the entity has not be... | \n", "C | \n", "When an accountant is engaged to review financial statements in ac... | \n", "C | \n", "✔️ [True] | \n", "
24 | \n", "Which of these statements defines the Copenhagen School's view of ... | \n", "{'A': 'Security is a socially constructed concept, relative to the... | \n", "A | \n", "The Copenhagen School's view of security is rooted in the concept ... | \n", "A | \n", "✔️ [True] | \n", "
25 | \n", "An object of mass m1 experiences a linear, elastic collision with ... | \n", "{'A': 'The final speed of object 1', 'B': 'The initial speed of ob... | \n", "D | \n", "To determine the mass of the second object in a linear, elastic co... | \n", "D | \n", "✔️ [True] | \n", "
26 | \n", "This question refers to the following information. \"Thereupon it w... | \n", "{'A': 'Portuguese explorers were convinced that the route to the N... | \n", "D | \n", "The Treaty of Tordesillas, signed in 1494, was an agreement betwee... | \n", "D | \n", "✔️ [True] | \n", "
27 | \n", "When both short-run aggregate supply and aggregate demand increase... | \n", "{'A': 'The price level rises but real GDP falls.', 'B': 'Both the ... | \n", "D | \n", "When both short-run aggregate supply and aggregate demand increase... | \n", "D | \n", "✔️ [True] | \n", "
28 | \n", "This question refers to the following information. \"The challenge ... | \n", "{'A': 'the Civil Rights Act of 1964', 'B': 'Medicare and Medicaid'... | \n", "D | \n", "To answer this question, we need to analyze the given text and opt... | \n", "D | \n", "✔️ [True] | \n", "
29 | \n", "Ada Lovelace is credited with being the first person to have made ... | \n", "{'A': 'a computer program', 'B': 'a souffle', 'C': 'a brassiere', ... | \n", "A | \n", "To answer this question, we need to consider the historical contri... | \n", "A | \n", "✔️ [True] | \n", "
30 | \n", "How are new polyomaviruses detailed | \n", "{'A': 'Shot gun sequencing', 'B': 'Cultivation in human neural cel... | \n", "C | \n", "To detail new polyomaviruses, we must consider the methods used fo... | \n", "C | \n", "✔️ [True] | \n", "
31 | \n", "What is the square root of 81 squared? | \n", "{'A': '9', 'B': '27', 'C': '81', 'D': '729'} | \n", "C | \n", "To find the square root of 81 squared, we first calculate 81 squar... | \n", "C | \n", "✔️ [True] | \n", "
32 | \n", "For which of these two scenarios does the main character (who uses... | \n", "{'A': 'Wrong, Wrong', 'B': 'Wrong, Not wrong', 'C': 'Not wrong, Wr... | \n", "C | \n", "To determine which scenario involves the main character doing some... | \n", "C | \n", "✔️ [True] | \n", "
33 | \n", "This question refers to the following information. \"The conscience... | \n", "{'A': 'Passage of the Pure Food and Water Act', 'B': 'Creation of ... | \n", "D | \n", "The Progressive Party Platform of 1912 emphasizes the importance o... | \n", "D | \n", "✔️ [True] | \n", "
34 | \n", "A 50-year-old nurse who had been fired from his job at ahospital t... | \n", "{'A': 'No, because sanctions are not proper against the attorney o... | \n", "B | \n", "The court is likely to grant the hospital's motion for sanctions b... | \n", "C | \n", "\n", " |
35 | \n", "According to John Money, __________ refers to a person's private i... | \n", "{'A': 'Gender identity', 'B': 'Sexual nomenclature', 'C': 'Gender ... | \n", "A | \n", "To answer this question, we need to understand the concept of gend... | \n", "A | \n", "✔️ [True] | \n", "
36 | \n", "Rounded to the nearest 10,000, the population of Louisiana was 4,5... | \n", "{'A': '4,500,321', 'B': '4,524,491', 'C': '4,533,372', 'D': '4,535... | \n", "C | \n", "To find the actual population of Louisiana in 2010, we need to con... | \n", "C | \n", "✔️ [True] | \n", "
37 | \n", "Compared with the sound you hear from the siren of a stationary fi... | \n", "{'A': 'speed', 'B': 'frequency', 'C': 'wavelength', 'D': 'All of t... | \n", "B | \n", "The question is about the Doppler effect, which describes how the ... | \n", "B | \n", "✔️ [True] | \n", "
38 | \n", "______ regulation goes further than the ______ that CSOs have with... | \n", "{'A': 'Legal, Interactions, Mechanics', 'B': 'Civil, Relations, Ou... | \n", "B | \n", "To solve this, let's break down the components of the sentence and... | \n", "C | \n", "\n", " |
39 | \n", "Aristotle saw his theory of the virtues as: | \n", "{'A': 'final and complete.', 'B': 'open-ended and revisable.', 'C'... | \n", "B | \n", "Aristotle's theory of the virtues, as outlined in his Nicomachean ... | \n", "B | \n", "✔️ [True] | \n", "
\n", " | question | \n", "options | \n", "example_answer | \n", "example_reasoning | \n", "pred_reasoning | \n", "pred_answer | \n", "metric | \n", "
---|---|---|---|---|---|---|---|
0 | \n", "For which of these two scenarios does the main character (who uses... | \n", "{'A': 'Wrong, Wrong', 'B': 'Wrong, Not wrong', 'C': 'Not wrong, Wr... | \n", "A | \n", "## Step 1: Evaluate the first scenario In the first scenario, the ... | \n", "To determine which scenario involves the main character doing some... | \n", "A | \n", "✔️ [True] | \n", "
1 | \n", "Two lists, list1 and list2, contain the names of books found in tw... | \n", "{'A': 'newList ← Combine (listl, list2)\\n newList ← Sort (newList)... | \n", "A | \n", "To create newList, which contains the names of all books found in ... | \n", "To create newList, which contains the names of all books found in ... | \n", "A | \n", "✔️ [True] | \n", "
2 | \n", "How might the recent global economic crisis be viewed as a challen... | \n", "{'A': 'Governmental cyberspace restrictions, in the form of censor... | \n", "C | \n", "## Step 1: Understanding the liberalist perspective The liberalist... | \n", "The liberalist perspective emphasizes the importance of free marke... | \n", "C | \n", "✔️ [True] | \n", "
3 | \n", "What term is used to represent unavoidable past costs that cannot ... | \n", "{'A': 'Prime costs', 'B': 'Sunk costs', 'C': 'Opportunity costs', ... | \n", "B | \n", "## Step 1: Understanding the concept of sunk costs Sunk costs are ... | \n", "To answer this question, we need to understand the concept of each... | \n", "B | \n", "✔️ [True] | \n", "
4 | \n", "Markson Co. traded a concrete-mixing truck with a book value of $1... | \n", "{'A': 'Does the book value of the asset given up exceed the fair v... | \n", "C | \n", "## Step 1: Understand the concept of commercial substance in asset... | \n", "To determine whether an exchange of assets has commercial substanc... | \n", "C | \n", "✔️ [True] | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
495 | \n", "A man is charged with murder. During the trial, defense counsel of... | \n", "{'A': 'not hearsay.', 'B': 'hearsay, but admissible as an admissio... | \n", "D | \n", "To answer this question, let's break it down step by step: ## Step... | \n", "The testimony in question involves a statement made by a man on de... | \n", "D | \n", "✔️ [True] | \n", "
496 | \n", "Two men held-up a liquor store in a city. During the robbery, one ... | \n", "{'A': 'granted, because the prosecutor is constitutionally require... | \n", "B | \n", "To answer this question, we need to consider the legal implication... | \n", "The defendant's motion to dismiss the indictment due to the delay ... | \n", "C | \n", "\n", " |
497 | \n", "Which vitamins are important in lowering circulating homocysteine ... | \n", "{'A': 'Vitamin D', 'B': 'Vitamin C', 'C': 'Vitamin A', 'D': 'Folat... | \n", "D | \n", "## Step 1: Understanding the role of vitamins in homocysteine leve... | \n", "To answer this question, we need to consider the role of vitamins ... | \n", "D | \n", "✔️ [True] | \n", "
498 | \n", "This question refers to the following information. \"The greatest c... | \n", "{'A': 'African nations will not achieve independence without unity... | \n", "D | \n", "## Step 1: Understand the context of Nkrumah's statement Nkrumah e... | \n", "To answer this question, we need to understand the context and the... | \n", "D | \n", "✔️ [True] | \n", "
499 | \n", "Millions of immigrant children who entered the United States learn... | \n", "{'A': 'Acculturation', 'B': 'Collective behavior', 'C': 'Social st... | \n", "A | \n", "## Step 1: Understanding the concept of acculturation Acculturatio... | \n", "The concept described in the question involves immigrant children ... | \n", "A | \n", "✔️ [True] | \n", "
500 rows × 7 columns
\n", "