In the artificial intelligence (AI) domain, benchmarking a model is the process of evaluating its performance by comparing it to a standard or baseline. This is done by running the model on established datasets and measuring metrics such as accuracy, precision, recall, F1 score, and speed. The results can be used to compare different models, to identify the strengths and weaknesses of each one, and to find areas where a model needs further optimization or improvement.
There are many dimensions that can be used to evaluate a deep learning (DL) model. Some common ones include:
- Accuracy: This is the most common metric used to evaluate a DL model. It measures the percentage of predictions made by the model that are correct.
- Precision: This measures the proportion of positive predictions that are actually correct.
- Recall: This measures the proportion of actual positive cases that are correctly predicted by the model.
- F1 score: This is a combination of precision and recall, and is calculated as the harmonic mean of the two.
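The four metrics above can be computed from the counts of true/false positives and negatives. The following is a minimal sketch in plain Python for a binary task; the function name `binary_metrics` and the sample labels are illustrative assumptions and are not part of Annotation Lab.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task.

    Hypothetical helper for illustration only (not an Annotation Lab API).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up gold labels and predictions (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample there are 3 true positives, 2 false positives, 1 false negative, and 2 true negatives, so precision (0.6) and recall (0.75) differ and F1 falls between them.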
Annotation Lab 4.4.0 supports model benchmarking in two ways:
- First of all, it is possible to quickly view benchmarking data obtained for the pre-trained models by the John Snow Labs team of data scientists and for the models trained within the Annotation Lab. This information is available on the Models page of the Hub, by navigating to the model of interest and clicking on the benchmarking icon. Read more here.
- Second, it is possible to test the pre-trained models on your custom data by creating a project, adding some tasks, annotating them, and tagging them as Test. Then navigate to the new project, and from the Train page, click on the Test Configuration button. Read more here.
Training logs with confusion matrices for NER models have been available in Annotation Lab since release 4.2.0, making it easier to understand the performance of a model and to judge whether it is underfitting or overfitting. Release 4.4.0 adds the same feature for classification models. The confusion matrices can be accessed in the training logs as well as by clicking on the benchmarking icon available on the model’s tile in the Models tab of the Hub page.
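To illustrate what a confusion matrix for a classification model conveys, here is a minimal sketch in plain Python that builds one from gold and predicted labels and derives per-class precision and recall. The label names, sample data, and helper functions are hypothetical; this does not reproduce the format of the Annotation Lab training logs.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_report(matrix, labels):
    """Derive precision and recall for each class from the matrix."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        report[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return report


# Made-up labels for a three-class classifier (illustration only).
labels = ["positive", "negative", "neutral"]
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

matrix = confusion_matrix(y_true, y_pred, labels)
report = per_class_report(matrix, labels)

for label, row in zip(labels, matrix):
    print(f"{label:>8}: {row}")
for label, scores in report.items():
    print(f"{label:>8}: precision={scores['precision']:.2f} "
          f"recall={scores['recall']:.2f}")
```

Off-diagonal cells show which classes are confused with each other. Comparing such a matrix on training data with one on held-out test data is one way to spot overfitting: scores that are high on training data but much lower on test data suggest the model does not generalize.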