This repository hosts a Django application called LLM Eval, designed for evaluating and benchmarking large language models against specific datasets. The app integrates clients from leading platforms, including OpenAI, Anthropic, Google, Ollama, and Anyscale.
Follow these steps to set up the application on your local machine.
Clone the repository to your local machine using the following command:
git clone <repository-url>
Create a virtual environment and activate it:
python -m venv /path/to/environment
source /path/to/environment/bin/activate # On Windows, use /path/to/environment/Scripts/activate
Install the required Python dependencies:
pip install -r requirements.txt
The default database is SQLite, but you can configure other database backends in llmeval/llmeval/settings.py. For more information on database connections, see the Django documentation.
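As an example, switching to PostgreSQL would mean replacing the DATABASES setting in settings.py with something like the sketch below. The LLMEVAL_DB_* environment variable names and their defaults are assumptions chosen for illustration. Django's PostgreSQL backend also needs a driver such as psycopg installed.

```python
import os

# Hypothetical PostgreSQL configuration for llmeval/llmeval/settings.py.
# The LLMEVAL_DB_* variable names and defaults are illustrative assumptions;
# reading credentials from the environment keeps them out of version control.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("LLMEVAL_DB_NAME", "llmeval"),
        "USER": os.environ.get("LLMEVAL_DB_USER", "llmeval"),
        "PASSWORD": os.environ.get("LLMEVAL_DB_PASSWORD", ""),
        "HOST": os.environ.get("LLMEVAL_DB_HOST", "localhost"),
        "PORT": os.environ.get("LLMEVAL_DB_PORT", "5432"),
    }
}
```

After changing the backend, run the migration step again so the schema is created in the new database.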
Run the following command to apply database migrations:
python manage.py migrate
Create a superuser account to manage the application:
python manage.py createsuperuser
Start the development server:
python manage.py runserver 8000
The application will be accessible at http://localhost:8000/admin. Log in using the superuser credentials. Once logged in, you will see the following menu items:
api_key in the parameters to access its web services.
top_p, top_k, and temperature.
The parameters attribute overrides the LLM parameters.
At the moment, loaders are implemented for the MedQA, PubMedQA and MMLU datasets.
python manage.py import_medqa --file=datasets/medqa/test.jsonl --target=test --dataset=medqa # loads the MedQA test questions from a local file into dataset medqa, target test
python manage.py import_mmlu --dataset=mmlu --target=test --subject=anatomy # loads the MMLU anatomy subject from Hugging Face into dataset mmlu, target test
python manage.py eval_qa --session-id=16 --continue
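A common way to score multiple-choice questions like these is to extract the option letter from each model reply and compare it with the gold answer. The sketch below is a hypothetical illustration, not llmeval's eval_qa implementation. extract_choice and accuracy are illustrative names, and the sketch assumes the model is prompted to begin its reply with the option letter.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumption: the reply starts with an option letter A-E, optionally in
# parentheses, e.g. "B", "(C) because ...", "D. Atrial fibrillation".
_LEADING_CHOICE = re.compile(r"^\s*\(?([A-E])\b")


def extract_choice(response: str) -> Optional[str]:
    """Return the leading option letter of a reply, or None if there is none."""
    match = _LEADING_CHOICE.match(response)
    return match.group(1) if match else None


def accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (response, gold_letter) pairs whose extracted choice matches."""
    pairs = list(pairs)
    if not pairs:
        return 0.0  # avoid division by zero on an empty session
    correct = sum(1 for response, gold in pairs if extract_choice(response) == gold)
    return correct / len(pairs)


print(round(accuracy([("B", "B"), ("(C) because ...", "C"), ("I am not sure", "A")]), 3))
# 0.667
```

Replies that do not start with a letter count as wrong here. A real evaluator might instead re-prompt the model or use a more tolerant parser.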
The data loaders are located in commons/management/commands. There are currently three loaders: import_medqa, import_mmlu and import_pubmedqa. MedQA and PubMedQA are imported from local files, while the MMLU importer pulls the dataset from Hugging Face.
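A loader for a local JSONL file, such as the one import_medqa reads, essentially parses one JSON record per line. The sketch below assumes MedQA-style records with "question", "options" (letter to text) and "answer_idx" keys. QAItem and read_medqa_jsonl are hypothetical names, and the actual importer may read other fields or store them differently.

```python
import io
import json
from dataclasses import dataclass
from typing import Dict, Iterator, TextIO


@dataclass
class QAItem:
    question: str
    options: Dict[str, str]  # option letter -> option text
    answer_idx: str  # letter of the correct option


def read_medqa_jsonl(stream: TextIO) -> Iterator[QAItem]:
    """Yield one QAItem per non-empty JSONL line (assumed MedQA-style keys)."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        answer = record["answer_idx"]
        if answer not in record["options"]:
            raise ValueError(f"line {line_no}: answer {answer!r} is not an option")
        yield QAItem(record["question"], record["options"], answer)


sample = io.StringIO('{"question": "2 + 2 = ?", "options": {"A": "3", "B": "4"}, "answer_idx": "B"}\n')
items = list(read_medqa_jsonl(sample))
print(items[0].answer_idx, items[0].options[items[0].answer_idx])
# B 4
```

With a real file, pass an open handle instead, e.g. open("datasets/medqa/test.jsonl", encoding="utf-8"). Validating that the answer letter exists among the options catches malformed records before they reach the database.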
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
©2024 Radu Boncea, ICI Bucharest