Auto Fine-Tuning

Just tell us which domain you want your embeddings to excel in, and we automatically deliver a ready-to-use, fine-tuned embedding model for that domain.


What is Auto Fine-Tuning?

Fine-tuning adapts a pre-trained model to a specific task or domain by training it on a new dataset. In practice, many users struggle to find effective training data. Effective training takes more than feeding raw PDFs or HTML pages into the model, and it is hard to get right. Auto fine-tuning solves this problem by automatically generating effective training data with an advanced LLM agent pipeline and then fine-tuning the model within an ML workflow. You can think of it as a combination of synthetic data generation and AutoML: all you need to do is describe your target domain in natural language and let our system do the rest.

But does it work?

Auto fine-tuning holds an auto-magical promise to deliver fine-tuned embeddings for any domain you want. But does it really work? This is a fairly reasonable doubt. We've tested it on a variety of domains and base models to find out. Check out the cherry-picked and lemon-picked results below.
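The results below report NDCG, MAP, and MRR. The page does not say how they are computed, so the following is a minimal sketch using the textbook definitions for a single query, where the input is the relevance of each retrieved document in ranked order. The function names are illustrative, and average precision is normalized by the number of relevant documents in the list, which assumes every relevant document was retrieved.

```python
import math


def reciprocal_rank(relevance):
    """Return 1/rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(relevance):
    """Mean of precision@k over the ranks k that hold a relevant item."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def ndcg(relevance):
    """Discounted cumulative gain divided by the gain of the ideal ordering."""
    def dcg(scores):
        return sum(s / math.log2(rank + 1) for rank, s in enumerate(scores, start=1))

    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal else 0.0


# One query whose only relevant document was ranked third.
single_hit = [0, 0, 1, 0]
print(reciprocal_rank(single_hit), average_precision(single_hit), ndcg(single_hit))
```

MRR and MAP are the means of `reciprocal_rank` and `average_precision` over all queries. With exactly one relevant document per query the two coincide, which would be one possible explanation for the identical MAP and MRR values in the synthetic validation tables; the page itself does not confirm this.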

Base model for fine-tuning

jinaai/jina-embeddings-v2-base-en

Avg. improvement

Domain instruction

Performance on synthetic validation set before and after fine-tuning

Metric  Before  After  Change
NDCG    0.505   0.532  5%
MAP     0.352   0.389  10%
MRR     0.352   0.389  10%

Performance on held-out test set before and after fine-tuning

Tested on 50 random samples from tollefj/norwegian-nli-triplets

Metric  Before  After  Change
NDCG    0.852   0.867  2%
MAP     0.800   0.820  2%
MRR     0.800   0.820  2%

Synthetic data generated

Total: 4648
Training: 4480
Validation: 168

Download synthetic data Download fine-tuned model

Base model for fine-tuning

jinaai/jina-embeddings-v2-base-en

Avg. improvement

Domain instruction

Performance on synthetic validation set before and after fine-tuning

Metric  Before  After  Change
NDCG    0.672   0.755  12%
MAP     0.567   0.675  19%
MRR     0.567   0.675  19%

Performance on held-out test set before and after fine-tuning

Tested on 50 random samples from mteb/askubuntudupquestions-reranking

Metric  Before  After  Change
NDCG    0.698   0.722  3%
MAP     0.515   0.549  6%
MRR     0.666   0.712  7%

Synthetic data generated

Total: 616
Training: 448
Validation: 168


Base model for fine-tuning

jinaai/jina-embeddings-v2-base-en

Avg. improvement

Domain instruction

Performance on synthetic validation set before and after fine-tuning

Metric  Before  After  Change
NDCG    0.727   0.861  18%
MAP     0.640   0.814  27%
MRR     0.640   0.814  27%

Performance on held-out test set before and after fine-tuning

Tested on 50 random samples from mteb/scidocs-reranking

Metric  Before  After  Change
NDCG    0.773   0.822  6%
MAP     0.575   0.651  13%
MRR     0.823   0.884  7%

Synthetic data generated

Total: 616
Training: 448
Validation: 168


Base model for fine-tuning

jinaai/jina-embeddings-v2-base-zh

Avg. improvement

Domain instruction

Performance on synthetic validation set before and after fine-tuning

Metric  Before  After  Change
NDCG    0.718   0.785  9%
MAP     0.629   0.717  14%
MRR     0.629   0.717  14%

Performance on held-out test set before and after fine-tuning

Tested on 50 random samples from C-MTEB/CMedQAv2-reranking

Metric  Before  After  Change
NDCG    0.938   0.948  1%
MAP     0.912   0.926  2%
MRR     0.920   0.933  1%

Synthetic data generated

Total: 616
Training: 448
Validation: 168


Base model for fine-tuning

jinaai/jina-embeddings-v2-base-en

Avg. improvement

Domain instruction

Performance on synthetic validation set before and after fine-tuning

Metric  Before  After  Change
NDCG    0.543   0.579  7%
MAP     0.402   0.452  12%
MRR     0.402   0.452  12%

Performance on held-out test set before and after fine-tuning

Tested on 50 random samples from nc33/triplet_sbert_law2 (machine-translated to Dutch)

Metric  Before  After  Change
NDCG    0.904   0.948  5%
MAP     0.870   0.930  7%
MRR     0.870   0.930  7%

Synthetic data generated

Total: 9128
Training: 8960
Validation: 168


Base model for fine-tuning

jinaai/jina-embeddings-v2-base-code

Avg. improvement

-4%

Domain instruction

Performance on synthetic validation set before and after fine-tuning

Metric  Before  After  Change
NDCG    0.671   0.640  -5%
MAP     0.569   0.525  -8%
MRR     0.569   0.525  -8%

Performance on held-out test set before and after fine-tuning

Tested on 50 random samples from mteb/stackoverflowdupquestions-reranking

Metric  Before  After  Change
NDCG    0.640   0.621  -3%
MAP     0.530   0.505  -5%
MRR     0.555   0.532  -4%

Synthetic data generated

Total: 616
Training: 448
Validation: 168


Base model for fine-tuning

jinaai/jina-embeddings-v2-base-code

Avg. improvement

-4%

Domain instruction

Performance on synthetic validation set before and after fine-tuning

Metric  Before  After  Change
NDCG    0.632   0.711  13%
MAP     0.517   0.622  20%
MRR     0.517   0.622  20%

Performance on held-out test set before and after fine-tuning

Tested on 50 random samples from mteb/stackoverflowdupquestions-reranking

Metric  Before  After  Change
NDCG    0.640   0.619  -3%
MAP     0.530   0.504  -5%
MRR     0.555   0.525  -5%

Synthetic data generated

Total: 616
Training: 448
Validation: 168


Base model for fine-tuning

jinaai/jina-embeddings-v2-base-en

Avg. improvement

Domain instruction

Performance on synthetic validation set before and after fine-tuning

Metric  Before  After  Change
NDCG    0.646   0.729  13%
MAP     0.535   0.644  20%
MRR     0.535   0.644  20%

Performance on held-out test set before and after fine-tuning

Tested on 50 random samples from mteb/askubuntudupquestions-reranking

Metric  Before  After  Change
NDCG    0.645   0.650  1%
MAP     0.452   0.462  2%
MRR     0.606   0.605  -0%

Synthetic data generated

Total: 616
Training: 448
Validation: 168
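The percentage next to each metric and the "Avg. improvement" figure are not defined on the page. A plausible reading, assumed here, is that each percentage is the relative change from the score before fine-tuning to the score after it, and that the average is the mean of the three held-out changes. The sketch below applies that assumption to the held-out scores of the first jinaai/jina-embeddings-v2-base-code run above; the function name is illustrative.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Held-out test scores as (before, after), copied from the first
# jina-embeddings-v2-base-code run, which reports a -4% average improvement.
held_out = {
    "NDCG": (0.640, 0.621),
    "MAP": (0.530, 0.505),
    "MRR": (0.555, 0.532),
}

changes = {name: relative_change_pct(b, a) for name, (b, a) in held_out.items()}
# Assumption: the average is an unweighted mean over the three metrics.
average = sum(changes.values()) / len(changes)

for name, pct in changes.items():
    print(f"{name}: {pct:.1f}%")
print(f"Average: {average:.1f}%")
```

Under these assumptions the rounded values match the figures reported for that run. Other runs can differ by a point, since the page shows scores rounded to three decimals and does not document its rounding rule.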


Auto Fine-Tuning API

Get fine-tuned embeddings for any domain you want.

Describe the domain you wish to fine-tune for.

Provide a detailed description of how the fine-tuned embeddings will be used. This is essential for generating high-quality synthetic data that will improve the performance of your embeddings.

Fine-tuning domain

Use URL instead. When this is toggled on, we generate the synthetic data for fine-tuning from the page content at that URL.

Choose a base embedding model for fine-tuning.

Please enter the email where you want to receive the download link upon completion.

Agree to the terms and begin fine-tuning by clicking the button below.
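The inputs described above can be pictured as a small record with a few consistency checks. This is only a sketch of the form fields: `FineTuneRequest`, its field names, and the validation rules are hypothetical and are not the service's actual API or schema. The list of base models contains only those that appear in the results on this page; the service may offer others.

```python
from dataclasses import dataclass
from typing import Optional

# Base models seen in the results on this page (assumed, possibly incomplete).
BASE_MODELS = {
    "jinaai/jina-embeddings-v2-base-en",
    "jinaai/jina-embeddings-v2-base-zh",
    "jinaai/jina-embeddings-v2-base-code",
}


@dataclass
class FineTuneRequest:
    """Hypothetical container for the form inputs; not a real request schema."""

    base_model: str
    email: str
    domain: Optional[str] = None  # natural-language description of the domain
    url: Optional[str] = None     # set when "Use URL instead" is toggled on

    def validate(self) -> None:
        # The form offers a description or a URL, so require exactly one.
        if (self.domain is None) == (self.url is None):
            raise ValueError("provide exactly one of domain or url")
        if self.base_model not in BASE_MODELS:
            raise ValueError(f"unknown base model: {self.base_model}")
        # Deliberately loose check; real email validation is more involved.
        if "@" not in self.email:
            raise ValueError("email looks invalid")


request = FineTuneRequest(
    base_model="jinaai/jina-embeddings-v2-base-en",
    email="user@example.com",
    domain="Duplicate question detection for Ubuntu support forums",
)
request.validate()  # raises ValueError if the inputs are inconsistent
```

Keeping the description and the URL as separate optional fields mirrors the toggle in the form: the domain comes from either free text or a page, never both.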

API key

Available tokens

Each new key comes with some free tokens for you to try out. You can top up your key at any time. Make sure to store your API key in a safe place!

