{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports & Env Setup" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "import sys\n", "import os\n", "from dotenv import load_dotenv\n", "load_dotenv()\n", "from datasets import load_dataset\n", "\n", "\n", "import dspy\n", "sys.path.append(os.path.abspath('../'))\n", "from benchmarks import llama_mmlu_pro, leaderboard_mmlu_pro, llama_mmlu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configuration" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "NUM_THREADS = 48\n", "\n", "FEW_SHOTS = 5\n", "\n", "# See https://docs.litellm.ai/docs/providers/vllm for details\n", "TASK_MODEL = dspy.LM(\n", "    \"hosted_vllm/meta-llama/Llama-3.3-70B-Instruct\",\n", "    api_base=\"http://localhost:8000/v1\",  # local vLLM server\n", "    api_key=\"dummy\",  # placeholder value\n", "    # Optional arguments left unset:\n", "    # api_version: Optional[str] = None,\n", "    # seed: Optional[int] = None,\n", "    # max_tokens: Optional[int] = None,\n", "    # timeout: Optional[Union[float, int]] = None,\n", ")\n", "PROMPT_MODEL = dspy.LM(\n", "    \"hosted_vllm/meta-llama/Llama-3.3-70B-Instruct\",\n", "    api_base=\"http://localhost:8000/v1\",  # local vLLM server\n", "    api_key=\"dummy\",  # placeholder value\n", "    # Optional arguments left unset:\n", "    # api_version: Optional[str] = None,\n", "    # seed: Optional[int] = None,\n", "    # max_tokens: Optional[int] = None,\n", "    # timeout: Optional[Union[float, int]] = None,\n", ")\n", "\n", "dspy.configure(lm=TASK_MODEL)\n", "\n", "# Swap in llama_mmlu_pro or leaderboard_mmlu_pro to evaluate a different benchmark\n", "benchmark = llama_mmlu\n", "\n", "# Without chain of thought:\n", "# program = dspy.Predict(\n", "#     benchmark.signature(\"\")\n", "# )\n", "\n", "# With chain of thought:\n", "program = dspy.ChainOfThought(\n", "    benchmark.signature(\"You are a helpful assistant.\") # put your initial system prompt here, 
or leave blank\n", ")\n", "\n", "evaluate = dspy.Evaluate(\n", " devset=[],\n", " max_errors = 500,\n", " metric=benchmark.metric,\n", " num_threads=NUM_THREADS,\n", " display_progress=True,\n", " display_table=True,\n", " return_all_scores=True,\n", " return_outputs=True,\n", " provide_traceback=True\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load dataset" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1403, 1263, 11369)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trainset, valset, testset = benchmark.datasets(\n", " train_size=0.1,\n", " validation_size=0.1,\n", ")\n", "\n", "len(trainset), len(valset), len(testset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Baseline Benchmark" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BASE PROMPT:\n", " You are a helpful assistant.\n", "CPU times: user 270 μs, sys: 7 μs, total: 277 μs\n", "Wall time: 231 μs\n" ] } ], "source": [ "%%time\n", "print(\"BASE PROMPT:\\n\", program.signature.instructions)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BEST EXAMPLES:\n", " []\n", "CPU times: user 107 μs, sys: 0 ns, total: 107 μs\n", "Wall time: 110 μs\n" ] } ], "source": [ "%%time\n", "print(\"BEST EXAMPLES:\\n\", program.demos)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting execution...\n", "Average Metric: 35.00 / 40 (87.5%): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:21<00:00, 1.84it/s]" ] }, { "name": "stderr", "output_type": "stream", "text": [ 
"2025/02/03 10:43:12 INFO dspy.evaluate.evaluate: Average Metric: 35 / 40 (87.5%)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
\n", " | question | \n", "options | \n", "example_answer | \n", "reasoning | \n", "pred_answer | \n", "metric | \n", "
---|---|---|---|---|---|---|
0 | \n", "The first hominids could be described as: | \n", "{'A': 'quadrupedal knappers.', 'B': 'quadrupedal pongids.', 'C': '... | \n", "C | \n", "The first hominids are known for their transition from a quadruped... | \n", "C | \n", "✔️ [True] | \n", "
1 | \n", "Investment demand most likely increases when | \n", "{'A': 'real GDP decreases.', 'B': 'the cost of acquiring and maint... | \n", "C | \n", "Investment demand is influenced by several factors, including expe... | \n", "C | \n", "✔️ [True] | \n", "
2 | \n", "Which of the following is a legitimate threat to societal security? | \n", "{'A': 'Political policies designed to alter demographic characteri... | \n", "B | \n", "To determine a legitimate threat to societal security, we must con... | \n", "B | \n", "✔️ [True] | \n", "
3 | \n", "Long-run aggregate supply is most likely to increase as the result of | \n", "{'A': 'an increase in the real interest rate', 'B': 'increased inv... | \n", "B | \n", "The long-run aggregate supply (LRAS) curve represents the total am... | \n", "B | \n", "✔️ [True] | \n", "
4 | \n", "As of 2013, share of people in the India who think political parti... | \n", "{'A': '26%', 'B': '46%', 'C': '66%', 'D': '86%'} | \n", "D | \n", "To answer this question, we need to consider the available data an... | \n", "D | \n", "✔️ [True] | \n", "
5 | \n", "This question refers to the following information. \"I travelled th... | \n", "{'A': 'The ability of commerce to foster cultural diffusion', 'B':... | \n", "A | \n", "The journal entry by Ibn Battuta describes the city of 'Aden as a ... | \n", "A | \n", "✔️ [True] | \n", "
6 | \n", "A woman is engaged in the retail sale of widgets throughout the Un... | \n", "{'A': \"No, because the woman's order was too indefinite to constit... | \n", "D | \n", "The woman's purchase order constituted an offer to the manufacture... | \n", "D | \n", "✔️ [True] | \n", "
7 | \n", "For which of these two scenarios does the main character (who uses... | \n", "{'A': 'Wrong, Wrong', 'B': 'Wrong, Not wrong', 'C': 'Not wrong, Wr... | \n", "D | \n", "In Scenario 1, the main character kills spiders because their frie... | \n", "D | \n", "✔️ [True] | \n", "
8 | \n", "A 67-year-old woman has had fatigue, dry skin, brittle hair, swell... | \n", "{'A': 'Chronic lymphocytic thyroiditis (Hashimoto disease)', 'B': ... | \n", "A | \n", "The patient's symptoms, such as fatigue, dry skin, brittle hair, s... | \n", "A | \n", "✔️ [True] | \n", "
9 | \n", "Weber said that the 'spirit of capitalism' could be traced back to: | \n", "{'A': 'the movement towards religious pluralism', 'B': 'inspiratio... | \n", "D | \n", "To answer this question, we need to consider the ideas of Max Webe... | \n", "D | \n", "✔️ [True] | \n", "
10 | \n", "This question refers to the following information. The history of ... | \n", "{'A': 'The Oxford Movement', 'B': 'Materialism and economic determ... | \n", "B | \n", "The Communist Manifesto, written by Karl Marx and Friedrich Engels... | \n", "B | \n", "✔️ [True] | \n", "
11 | \n", "A nongovernmental, not-for-profit organization held the following ... | \n", "{'A': '$12,700', 'B': '$13,000', 'C': '$13,800', 'D': '$14,900'} | \n", "D | \n", "To determine the amount of stock investments that should be report... | \n", "D | \n", "✔️ [True] | \n", "
12 | \n", "If you were hired by a large company to develop a new training pro... | \n", "{'A': 'needs analysis.', 'B': 'job evaluation.', 'C': 'summative e... | \n", "A | \n", "To develop an effective training program, it's crucial to understa... | \n", "A | \n", "✔️ [True] | \n", "
13 | \n", "How were the first metals worked in South America? | \n", "{'A': 'casting', 'B': 'hammering', 'C': 'smelting', 'D': 'all of t... | \n", "D | \n", "The first metals worked in South America involved various techniqu... | \n", "D | \n", "✔️ [True] | \n", "
14 | \n", "Under the Articles of Confederation, the national government had t... | \n", "{'A': 'negotiate treaties', 'B': 'collect taxes', 'C': 'establish ... | \n", "A | \n", "The Articles of Confederation, which served as the first constitut... | \n", "A | \n", "✔️ [True] | \n", "
15 | \n", "A large company has offices in two locations, one in New Jersey an... | \n", "{'A': '$22,500 ', 'B': '$23,700 ', 'C': '$25,500 ', 'D': '$27,300 '} | \n", "D | \n", "To find the mean salary paid to the office assistants in the compa... | \n", "D | \n", "✔️ [True] | \n", "
16 | \n", "Which character on the TV show 'Friends' is a chef? | \n", "{'A': 'Joey', 'B': 'Monica', 'C': 'Ross', 'D': 'Rachel'} | \n", "B | \n", "To answer this question, we need to consider the main characters o... | \n", "B | \n", "✔️ [True] | \n", "
17 | \n", "Which of the following events INITIATES puberty? | \n", "{'A': 'pituitary gland releases FSH', 'B': 'pituitary gland releas... | \n", "C | \n", "The initiation of puberty is a complex process involving the hypot... | \n", "C | \n", "✔️ [True] | \n", "
18 | \n", "Which of the boys on the TV show 'My Three Sons' is adopted? | \n", "{'A': 'Mike', 'B': 'Ernie', 'C': 'Chip', 'D': 'Robbie'} | \n", "B | \n", "The TV show 'My Three Sons' features a family with three boys. The... | \n", "B | \n", "✔️ [True] | \n", "
19 | \n", "What is the minimum value of the expression x + 4z as a function d... | \n", "{'A': '0', 'B': '-2', 'C': '-sqrt(34)', 'D': '-sqrt(35)'} | \n", "C | \n", "To find the minimum value of the expression x + 4z subject to the ... | \n", "C | \n", "✔️ [True] | \n", "
20 | \n", "What will happen to the equilibrium price and the equilibrium quan... | \n", "{'A': 'The equilibrium price will rise and the equilibrium quantit... | \n", "A | \n", "When producers of good A expect the price to be higher in the near... | \n", "C | \n", "\n", " |
21 | \n", "Construct a complete truth table for the following argument. Then,... | \n", "{'A': 'Valid', 'B': 'Invalid. Counterexample when M and O are true... | \n", "A | \n", "To determine the validity of the argument, we first need to constr... | \n", "B | \n", "\n", " |
22 | \n", "This question refers to the following information. \"If any person ... | \n", "{'A': 'rural and urban interests.', 'B': 'federal law and state la... | \n", "B | \n", "The passage describes a Pennsylvania law from 1826 that criminaliz... | \n", "B | \n", "✔️ [True] | \n", "
23 | \n", "Which of the following is not an element of the marketing mix? | \n", "{'A': 'Promotion.', 'B': 'Product.', 'C': 'Target market.', 'D': '... | \n", "C | \n", "The marketing mix, also known as the 4 Ps, consists of Product, Pr... | \n", "C | \n", "✔️ [True] | \n", "
24 | \n", "Which of the following must be done when universal screening data ... | \n", "{'A': 'Changes must be made in the delivery of the core program.',... | \n", "A | \n", "When universal screening data indicate that very few students are ... | \n", "A | \n", "✔️ [True] | \n", "
25 | \n", "A large man with red hair robbed a liquor store. Thereafter, a def... | \n", "{'A': 'admissible as a prior identification.', 'B': \"admissible, f... | \n", "B | \n", "The corrections officer's testimony is being offered to prove that... | \n", "B | \n", "✔️ [True] | \n", "
26 | \n", "Good X is exchanged in a competitive market. Which of the followin... | \n", "{'A': 'If the demand curve is perfectly elastic, the price rises b... | \n", "D | \n", "When an excise tax is imposed on the production of a good in a com... | \n", "D | \n", "✔️ [True] | \n", "
27 | \n", "Of the following compounds, which is LEAST likely to behave as a L... | \n", "{'A': 'BeCl2', 'B': 'MgCl2', 'C': 'ZnCl2', 'D': 'SCl2'} | \n", "D | \n", "To determine which of the given compounds is least likely to behav... | \n", "B | \n", "\n", " |
28 | \n", "Mr. Cleary’s class and Ms. Ntuala’s class go to use the computer l... | \n", "{'A': '2', 'B': '6', 'C': '10', 'D': '14'} | \n", "C | \n", "To find the maximum number of students who can have a computer to ... | \n", "{C} | \n", "\n", " |
29 | \n", "As of December 1, year 2, a company obtained a $1,000,000 line of ... | \n", "{'A': 'Current liabilities of $1,000,000; long-term liabilities of... | \n", "C | \n", "To determine the presentation of the company's debt in its classif... | \n", "C | \n", "✔️ [True] | \n", "
30 | \n", "Use indirect truth tables to determine whether the following argum... | \n", "{'A': 'Valid', 'B': 'Invalid. Counterexample when P, Q, R, and S a... | \n", "A | \n", "To determine the validity of the argument using indirect truth tab... | \n", "C | \n", "\n", " |
31 | \n", "Which expression represents the phrase below? 3 fewer than a numbe... | \n", "{'A': '3-p', 'B': 'p+3', 'C': '3/p', 'D': 'p-3'} | \n", "D | \n", "To represent the phrase \"3 fewer than a number, p\", we need to und... | \n", "D | \n", "✔️ [True] | \n", "
32 | \n", "The influenza virus is mainly controlled in special \"risk\" sectors... | \n", "{'A': 'Hygiene', 'B': 'Vaccination', 'C': 'Antiviral drugs', 'D': ... | \n", "B | \n", "The influenza virus can be controlled through various methods, but... | \n", "B | \n", "✔️ [True] | \n", "
33 | \n", "What size of cannula would you use in a patient who needed a rapid... | \n", "{'A': '18 gauge.', 'B': '20 gauge.', 'C': '22 gauge.', 'D': '24 ga... | \n", "A | \n", "To determine the correct size of cannula for a rapid blood transfu... | \n", "A | \n", "✔️ [True] | \n", "
34 | \n", "Which one of the following is not a characteristic of a team? | \n", "{'A': 'Minimal and formal knowledge sharing', 'B': 'Collective out... | \n", "A | \n", "To determine which one of the following is not a characteristic of... | \n", "A | \n", "✔️ [True] | \n", "
35 | \n", "When developing a plan of care relating to the management of a per... | \n", "{'A': 'physical and pharmacological needs only.', 'B': 'physical a... | \n", "C | \n", "To develop an effective plan of care for managing a person's pain,... | \n", "C | \n", "✔️ [True] | \n", "
36 | \n", "Which of the following teenagers have the highest pregnancy rates? | \n", "{'A': 'U.S.', 'B': 'Canadian', 'C': 'Swedish', 'D': 'French'} | \n", "A | \n", "To answer this question, we need to consider the available data on... | \n", "A | \n", "✔️ [True] | \n", "
37 | \n", "Based on the analysis of oxygen isotope ratios in the shells of ga... | \n", "{'A': 'It became increasingly complex due to increased rainfall an... | \n", "B | \n", "The analysis of oxygen isotope ratios in the shells of gastropods ... | \n", "B | \n", "✔️ [True] | \n", "
38 | \n", "Which is a 'rock' under the UN Convention of the Law of the Sea (U... | \n", "{'A': 'Rocks are the islands that are less than 10 square miles', ... | \n", "C | \n", "To answer this question, we need to understand the definition of a... | \n", "C | \n", "✔️ [True] | \n", "
39 | \n", "The Gravitron is a carnival ride that looks like a large cylinder.... | \n", "{'A': 'μv^2/(rg)', 'B': 'r^2v^2/(μg)', 'C': 'rg/(μv^2)', 'D': 'Non... | \n", "A | \n", "To solve this problem, we need to consider the forces acting on a ... | \n", "A | \n", "✔️ [True] | \n", "
\n", " | question | \n", "options | \n", "example_answer | \n", "reasoning | \n", "pred_answer | \n", "metric | \n", "
---|---|---|---|---|---|---|
0 | \n", "For which of these two scenarios does the main character (who uses... | \n", "{'A': 'Wrong, Wrong', 'B': 'Wrong, Not wrong', 'C': 'Not wrong, Wr... | \n", "A | \n", "To determine which scenario involves the main character doing some... | \n", "A | \n", "✔️ [True] | \n", "
1 | \n", "Which of the following is not a TV cartoon duo? | \n", "{'A': 'Chip and Dale', 'B': 'Beavis and Butt-head', 'C': 'Simon an... | \n", "C | \n", "To determine which of the following is not a TV cartoon duo, we ne... | \n", "C | \n", "✔️ [True] | \n", "
2 | \n", "Market testing studies use __________ to carry out controlled expe... | \n", "{'A': 'Test markets.', 'B': 'Data mining.', 'C': 'Analytical infor... | \n", "A | \n", "To answer this question, we need to understand the concept of mark... | \n", "A | \n", "✔️ [True] | \n", "
3 | \n", "Which of the following is the essential feature of Conduct Disorder | \n", "{'A': 'Lack of appropriate guilt or remorse', 'B': 'Major rule or ... | \n", "B | \n", "To answer this question, we need to consider the characteristics o... | \n", "B | \n", "✔️ [True] | \n", "
4 | \n", "For which of these two scenarios does the main character (who uses... | \n", "{'A': 'Wrong, Wrong', 'B': 'Wrong, Not wrong', 'C': 'Not wrong, Wr... | \n", "C | \n", "To determine which scenario involves the main character doing some... | \n", "C | \n", "✔️ [True] | \n", "
5 | \n", "An off-duty police officer was standing on a street corner waiting... | \n", "{'A': 'assault with a deadly weapon.', 'B': 'involuntary manslaugh... | \n", "B | \n", "The police officer's actions, although intended to frighten the ma... | \n", "B | \n", "✔️ [True] | \n", "
6 | \n", "There are 2,000 insurance agents licensed under the general licens... | \n", "{'A': 'constitutional, because a state license is a privilege and ... | \n", "C | \n", "The key issue here is whether the state statute revoking the insur... | \n", "C | \n", "✔️ [True] | \n", "
7 | \n", "Millions of immigrant children who entered the United States learn... | \n", "{'A': 'Acculturation', 'B': 'Collective behavior', 'C': 'Social st... | \n", "A | \n", "The concept described in the question involves immigrant children ... | \n", "A | \n", "✔️ [True] | \n", "
8 | \n", "This question refers to the following information. \"When we were k... | \n", "{'A': 'The labor union movement.', 'B': 'The civil rights movement... | \n", "D | \n", "The Port Huron Statement, as excerpted, discusses themes of social... | \n", "D | \n", "✔️ [True] | \n", "
9 | \n", "Who is the eighth-century CE female poet worshipped throughout ma... | \n", "{'A': 'Andal', 'B': 'Devi', 'C': 'Ganga', 'D': 'Kali'} | \n", "A | \n", "The question asks for an eighth-century CE female poet who is wors... | \n", "A | \n", "✔️ [True] | \n", "
10 | \n", "Clifford and Lucia Pauling, in Senior View, told us that physical ... | \n", "{'A': 'Are rapid and frightening', 'B': 'Can be offset by meditati... | \n", "C | \n", "To answer this question, we need to consider the context of physic... | \n", "C | \n", "✔️ [True] | \n", "
11 | \n", "Which statement best describes one of Dworkin's central arguments ... | \n", "{'A': 'Morality plays no role in the concept of law.', 'B': 'Moral... | \n", "D | \n", "To answer this question, we need to consider the central arguments... | \n", "D | \n", "✔️ [True] | \n", "
12 | \n", "Light that is not transmitted by opaque materials is | \n", "{'A': 'reflected or converted to internal energy in the material.'... | \n", "A | \n", "When light hits an opaque material, it does not pass through becau... | \n", "A | \n", "✔️ [True] | \n", "
13 | \n", "Which of the following was not defined by Giddens (1998) as part o... | \n", "{'A': 'the democratization of the family', 'B': 'putting an end to... | \n", "B | \n", "To answer this question, we need to consider the key components of... | \n", "B | \n", "✔️ [True] | \n", "
14 | \n", "The U.S. economy currently suffers a recessionary gap and a budget... | \n", "{'A': 'Tax increase \\xa0\\xa0\\xa0 Demand rises \\xa0\\xa0\\xa0 Falling... | \n", "C | \n", "To address a recessionary gap and a budget deficit through fiscal ... | \n", "C | \n", "✔️ [True] | \n", "
15 | \n", "A company president is concerned about the low motivation and sati... | \n", "{'A': 'ERG theory', 'B': 'expectancy theory', 'C': 'equity theory'... | \n", "D | \n", "The scenario describes a situation where a company president imple... | \n", "D | \n", "✔️ [True] | \n", "
16 | \n", "What characteristic is not representative of a type IIb muscle fib... | \n", "{'A': 'Low oxidative capacity', 'B': 'High fatigue resistance', 'C... | \n", "B | \n", "To answer this question, we need to understand the characteristics... | \n", "B | \n", "✔️ [True] | \n", "
17 | \n", "The energy for all forms of muscle contraction is provided by: | \n", "{'A': 'ATP.', 'B': 'ADP.', 'C': 'phosphocreatine.', 'D': 'oxidativ... | \n", "A | \n", "To answer this question, we need to understand the role of differe... | \n", "A | \n", "✔️ [True] | \n", "
18 | \n", "The main factor preventing subsistence economies from advancing ec... | \n", "{'A': 'a currency.', 'B': 'a well-connected transportation infrast... | \n", "D | \n", "To address this question, let's consider what subsistence economie... | \n", "B | \n", "\n", " |
19 | \n", "The primary research method used by developmental psychologists is | \n", "{'A': 'case study', 'B': 'cross-sectional research', 'C': 'natural... | \n", "B | \n", "Developmental psychologists often aim to understand how individual... | \n", "B | \n", "✔️ [True] | \n", "
20 | \n", "Kevin wants shoes and grows turnips. Lisa wants turnips and makes ... | \n", "{'A': 'Store of value', 'B': 'Unit of account', 'C': 'Medium of ex... | \n", "C | \n", "In this scenario, Kevin, Lisa, and Bob have different needs and pr... | \n", "C | \n", "✔️ [True] | \n", "
21 | \n", "The Federal Reserve implements an expansionary policy by doing whi... | \n", "{'A': 'Buying Treasury bonds in the open market', 'B': 'Raising th... | \n", "A | \n", "To answer this question, we need to understand the tools the Feder... | \n", "A | \n", "✔️ [True] | \n", "
22 | \n", "Why do political scientists identify the presidential elections of... | \n", "{'A': 'The issues at stake in those elections were more important ... | \n", "B | \n", "To answer this question, we need to understand what is meant by \"c... | \n", "B | \n", "✔️ [True] | \n", "
23 | \n", "An entity engaged an accountant to review its financial statements... | \n", "{'A': 'Withdrawn from the engagement because the entity has not be... | \n", "C | \n", "When an accountant is engaged to review financial statements in ac... | \n", "C | \n", "✔️ [True] | \n", "
24 | \n", "Which of these statements defines the Copenhagen School's view of ... | \n", "{'A': 'Security is a socially constructed concept, relative to the... | \n", "A | \n", "The Copenhagen School's view of security is rooted in the concept ... | \n", "A | \n", "✔️ [True] | \n", "
25 | \n", "An object of mass m1 experiences a linear, elastic collision with ... | \n", "{'A': 'The final speed of object 1', 'B': 'The initial speed of ob... | \n", "D | \n", "To determine the mass of the second object in a linear, elastic co... | \n", "D | \n", "✔️ [True] | \n", "
26 | \n", "This question refers to the following information. \"Thereupon it w... | \n", "{'A': 'Portuguese explorers were convinced that the route to the N... | \n", "D | \n", "The Treaty of Tordesillas, signed in 1494, was an agreement betwee... | \n", "D | \n", "✔️ [True] | \n", "
27 | \n", "When both short-run aggregate supply and aggregate demand increase... | \n", "{'A': 'The price level rises but real GDP falls.', 'B': 'Both the ... | \n", "D | \n", "When both short-run aggregate supply and aggregate demand increase... | \n", "D | \n", "✔️ [True] | \n", "
28 | \n", "This question refers to the following information. \"The challenge ... | \n", "{'A': 'the Civil Rights Act of 1964', 'B': 'Medicare and Medicaid'... | \n", "D | \n", "To answer this question, we need to analyze the given text and opt... | \n", "D | \n", "✔️ [True] | \n", "
29 | \n", "Ada Lovelace is credited with being the first person to have made ... | \n", "{'A': 'a computer program', 'B': 'a souffle', 'C': 'a brassiere', ... | \n", "A | \n", "To answer this question, we need to consider the historical contri... | \n", "A | \n", "✔️ [True] | \n", "
30 | \n", "How are new polyomaviruses detailed | \n", "{'A': 'Shot gun sequencing', 'B': 'Cultivation in human neural cel... | \n", "C | \n", "To detail new polyomaviruses, we must consider the methods used fo... | \n", "C | \n", "✔️ [True] | \n", "
31 | \n", "What is the square root of 81 squared? | \n", "{'A': '9', 'B': '27', 'C': '81', 'D': '729'} | \n", "C | \n", "To find the square root of 81 squared, we first calculate 81 squar... | \n", "C | \n", "✔️ [True] | \n", "
32 | \n", "For which of these two scenarios does the main character (who uses... | \n", "{'A': 'Wrong, Wrong', 'B': 'Wrong, Not wrong', 'C': 'Not wrong, Wr... | \n", "C | \n", "To determine which scenario involves the main character doing some... | \n", "C | \n", "✔️ [True] | \n", "
33 | \n", "This question refers to the following information. \"The conscience... | \n", "{'A': 'Passage of the Pure Food and Water Act', 'B': 'Creation of ... | \n", "D | \n", "The Progressive Party Platform of 1912 emphasizes the importance o... | \n", "D | \n", "✔️ [True] | \n", "
34 | \n", "A 50-year-old nurse who had been fired from his job at ahospital t... | \n", "{'A': 'No, because sanctions are not proper against the attorney o... | \n", "B | \n", "The court is likely to grant the hospital's motion for sanctions b... | \n", "C | \n", "\n", " |
35 | \n", "According to John Money, __________ refers to a person's private i... | \n", "{'A': 'Gender identity', 'B': 'Sexual nomenclature', 'C': 'Gender ... | \n", "A | \n", "To answer this question, we need to understand the concept of gend... | \n", "A | \n", "✔️ [True] | \n", "
36 | \n", "Rounded to the nearest 10,000, the population of Louisiana was 4,5... | \n", "{'A': '4,500,321', 'B': '4,524,491', 'C': '4,533,372', 'D': '4,535... | \n", "C | \n", "To find the actual population of Louisiana in 2010, we need to con... | \n", "C | \n", "✔️ [True] | \n", "
37 | \n", "Compared with the sound you hear from the siren of a stationary fi... | \n", "{'A': 'speed', 'B': 'frequency', 'C': 'wavelength', 'D': 'All of t... | \n", "B | \n", "The question is about the Doppler effect, which describes how the ... | \n", "B | \n", "✔️ [True] | \n", "
38 | \n", "______ regulation goes further than the ______ that CSOs have with... | \n", "{'A': 'Legal, Interactions, Mechanics', 'B': 'Civil, Relations, Ou... | \n", "B | \n", "To solve this, let's break down the components of the sentence and... | \n", "C | \n", "\n", " |
39 | \n", "Aristotle saw his theory of the virtues as: | \n", "{'A': 'final and complete.', 'B': 'open-ended and revisable.', 'C'... | \n", "B | \n", "Aristotle's theory of the virtues, as outlined in his Nicomachean ... | \n", "B | \n", "✔️ [True] | \n", "
\n", " | question | \n", "options | \n", "example_answer | \n", "example_reasoning | \n", "pred_reasoning | \n", "pred_answer | \n", "metric | \n", "
---|---|---|---|---|---|---|---|
0 | \n", "For which of these two scenarios does the main character (who uses... | \n", "{'A': 'Wrong, Wrong', 'B': 'Wrong, Not wrong', 'C': 'Not wrong, Wr... | \n", "A | \n", "## Step 1: Evaluate the first scenario In the first scenario, the ... | \n", "To determine which scenario involves the main character doing some... | \n", "A | \n", "✔️ [True] | \n", "
1 | \n", "Two lists, list1 and list2, contain the names of books found in tw... | \n", "{'A': 'newList ← Combine (listl, list2)\\n newList ← Sort (newList)... | \n", "A | \n", "To create newList, which contains the names of all books found in ... | \n", "To create newList, which contains the names of all books found in ... | \n", "A | \n", "✔️ [True] | \n", "
2 | \n", "How might the recent global economic crisis be viewed as a challen... | \n", "{'A': 'Governmental cyberspace restrictions, in the form of censor... | \n", "C | \n", "## Step 1: Understanding the liberalist perspective The liberalist... | \n", "The liberalist perspective emphasizes the importance of free marke... | \n", "C | \n", "✔️ [True] | \n", "
3 | \n", "What term is used to represent unavoidable past costs that cannot ... | \n", "{'A': 'Prime costs', 'B': 'Sunk costs', 'C': 'Opportunity costs', ... | \n", "B | \n", "## Step 1: Understanding the concept of sunk costs Sunk costs are ... | \n", "To answer this question, we need to understand the concept of each... | \n", "B | \n", "✔️ [True] | \n", "
4 | \n", "Markson Co. traded a concrete-mixing truck with a book value of $1... | \n", "{'A': 'Does the book value of the asset given up exceed the fair v... | \n", "C | \n", "## Step 1: Understand the concept of commercial substance in asset... | \n", "To determine whether an exchange of assets has commercial substanc... | \n", "C | \n", "✔️ [True] | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
495 | \n", "A man is charged with murder. During the trial, defense counsel of... | \n", "{'A': 'not hearsay.', 'B': 'hearsay, but admissible as an admissio... | \n", "D | \n", "To answer this question, let's break it down step by step: ## Step... | \n", "The testimony in question involves a statement made by a man on de... | \n", "D | \n", "✔️ [True] | \n", "
496 | \n", "Two men held-up a liquor store in a city. During the robbery, one ... | \n", "{'A': 'granted, because the prosecutor is constitutionally require... | \n", "B | \n", "To answer this question, we need to consider the legal implication... | \n", "The defendant's motion to dismiss the indictment due to the delay ... | \n", "C | \n", "\n", " |
497 | \n", "Which vitamins are important in lowering circulating homocysteine ... | \n", "{'A': 'Vitamin D', 'B': 'Vitamin C', 'C': 'Vitamin A', 'D': 'Folat... | \n", "D | \n", "## Step 1: Understanding the role of vitamins in homocysteine leve... | \n", "To answer this question, we need to consider the role of vitamins ... | \n", "D | \n", "✔️ [True] | \n", "
498 | \n", "This question refers to the following information. \"The greatest c... | \n", "{'A': 'African nations will not achieve independence without unity... | \n", "D | \n", "## Step 1: Understand the context of Nkrumah's statement Nkrumah e... | \n", "To answer this question, we need to understand the context and the... | \n", "D | \n", "✔️ [True] | \n", "
499 | \n", "Millions of immigrant children who entered the United States learn... | \n", "{'A': 'Acculturation', 'B': 'Collective behavior', 'C': 'Social st... | \n", "A | \n", "## Step 1: Understanding the concept of acculturation Acculturatio... | \n", "The concept described in the question involves immigrant children ... | \n", "A | \n", "✔️ [True] | \n", "
500 rows × 7 columns
\n", "