Original Reddit post

TL;DR: I was too lazy to compile Excel files by hand to compare LLM evaluations, and tools like MLflow were too bulky. So I built LightML: a zero-config, lightweight (4 dependencies) experiment tracker that works with just a few lines of code.

Hi! I'm an AI researcher at a private company with a solid background in ML and stats. A while ago I was optimizing a model on several different tasks, and the first problem I hit was that comparing different runs and models meant compiling an Excel file by hand, a tedious task I really did not want to do. Some time later I went looking for tools to help with this, but nothing fit. I tried model registries like W&B and MLflow, but they were bulky and are built more as model and dataset versioning tools than as tools for comparing models. So I decided to take matters into my own hands.

The philosophy behind the project is that I'm VERY lazy. I had three requirements:

1. A tool I could call from my evaluation scripts (which mostly use lm_eval): pass it the results, the model name, and the model path, and it displays them in a dashboard regardless of the metric.
2. Something lightweight that I don't need to deploy or configure in any complex way.
3. As few dependencies as possible (the project depends on only 4 libraries).

So I spoke with a friend who works as a software engineer, we came up with a simple yet effective structure, and LightML was born.
Using it is pretty simple and can be added to your evaluation pipeline with just a couple of lines of code:

```python
from lightml.handle import LightMLHandle

handle = LightMLHandle(db="./registry.db", run_name="my-eval")
handle.register_model(model_name="my_model", path="path/to/model")
handle.log_model_metric(model_name="my_model", family="task", metric_name="acc", value=0.85)
```

I'm using it myself, and I suggested it to some colleagues and friends who are now using it as well! I've released a major version on PyPI, so it's available to use. There are also a couple of dev versions you can try with some cool extra tools, like one that runs statistical tests on the metrics you've added to the db, to find out whether your model has really improved on the benchmark you were trying to improve! All other info is in the readme. Hope you enjoy it! Thank you! submitted by /u/Logical_Delivery8331
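The post doesn't show LightML's actual API for the statistical-test tool, but the underlying idea can be sketched independently. Below is a minimal, stdlib-only sign-flip permutation test for paired per-task scores of two models; the function name and all numbers are illustrative, not part of LightML:

```python
# Hypothetical sketch (not LightML's actual API): a paired sign-flip
# permutation test to check whether model B's per-task scores genuinely
# beat model A's, using only the Python standard library.
import random

def paired_permutation_test(scores_a, scores_b, n_iter=10_000, seed=0):
    """One-sided p-value for H0: B is no better than A on paired tasks."""
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / len(diffs)
    hits = 0
    for _ in range(n_iter):
        # Under H0, each paired difference is equally likely to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if sum(flipped) / len(flipped) >= observed:
            hits += 1
    return hits / n_iter

# Per-task accuracies for two checkpoints (made-up numbers).
acc_a = [0.81, 0.78, 0.84, 0.80, 0.79]
acc_b = [0.85, 0.80, 0.86, 0.83, 0.82]
print(paired_permutation_test(acc_a, acc_b))  # a small p-value suggests a real improvement
```

A permutation test like this avoids any distributional assumptions, which suits benchmark scores that are rarely normally distributed across tasks.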

Originally posted by u/Logical_Delivery8331 on r/ArtificialInteligence