Compare images based on scoring

#70
by schirrmacher - opened

I tried comparing the encoder's output tokens using cosine similarity, but the results looked random.

    import numpy as np

    # Encode every image and collect the embeddings
    embeddings = []
    for path, img in images:
        emb = encoder.encode_image(img, base_size=base_size, device=device)
        embeddings.append(emb)
        print(f"  {path}: shape {emb.shape}")
    print()

    # Compute pairwise similarities
    n = len(images)
    similarity_matrix = np.zeros((n, n))

    for i in range(n):
        for j in range(n):
            if i == j:
                similarity_matrix[i, j] = 1.0 if similarity_method == 'cosine' else 0.0
            elif i < j:
                sim = compute_similarity(
                    embeddings[i],
                    embeddings[j],
                    method=similarity_method
                )
                similarity_matrix[i, j] = sim
                similarity_matrix[j, i] = sim
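
The `compute_similarity` helper is not shown in the post. Below is a minimal sketch of what such a helper could look like, assuming each embedding is a `(num_tokens, dim)` array and that the tokens are mean-pooled into a single vector before comparison. The pooling step and the `'euclidean'` option are assumptions for illustration, not taken from the original code. A distance-style method would match the `0.0` placed on the diagonal for non-cosine methods.

```python
import numpy as np

def compute_similarity(emb_a, emb_b, method="cosine"):
    """Hypothetical helper: compare two (num_tokens, dim) embeddings.

    Torch tensors would first need converting, e.g. with
    emb.detach().cpu().float().numpy().
    """
    a = np.asarray(emb_a, dtype=np.float64)
    b = np.asarray(emb_b, dtype=np.float64)

    # Mean-pool over tokens so that images with different token counts
    # end up as vectors of the same length (assumption, see lead-in).
    a = a.reshape(-1, a.shape[-1]).mean(axis=0)
    b = b.reshape(-1, b.shape[-1]).mean(axis=0)

    if method == "cosine":
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        # Guard against zero vectors to avoid division by zero.
        return float(a @ b / denom) if denom > 0 else 0.0
    if method == "euclidean":
        return float(np.linalg.norm(a - b))
    raise ValueError(f"unknown method: {method}")
```

If the raw token sequences are flattened and compared directly instead, images with different token counts or layouts are not aligned token by token, which is one possible reason for scores that look random.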

Is it possible to use the encoder's output for categorization tasks based on similarity scores?

How do you create your encoder? Judging from the code and architecture, we should be using the components in deepencoder. It is made up of build_sam_vit_b, build_clip_l, and MlpProjector, where MlpProjector is the one that projects the features we want to use for similarity comparison.

in file: modeling_deepseekocr.py
from .deepencoder import build_sam_vit_b, build_clip_l, MlpProjector
...
global_local_features = torch.cat([local_features, global_features, self.view_seperator[None, :]], dim=0)

You should be using global_local_features for image similarity comparison.
NOTE: I haven't tested this myself, so let me know if I am wrong.
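
To make the suggestion concrete, here is a minimal sketch of similarity-based categorization that has not been run against the real model. It assumes `global_local_features` has already been extracted for each image as a `(num_tokens, dim)` NumPy array. The random arrays below only stand in for those features, and `pool`, `cosine_matrix`, `classify`, and `fake_features` are hypothetical names.

```python
import numpy as np

def pool(features):
    """Mean-pool (num_tokens, dim) token features and L2-normalize."""
    vec = np.asarray(features, dtype=np.float64).mean(axis=0)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def cosine_matrix(vectors):
    """Pairwise cosine similarities for already-normalized vectors."""
    stacked = np.stack(vectors)
    return stacked @ stacked.T

def classify(query, references):
    """Pick the label whose reference vectors are most similar on average.

    `references` maps a label to a list of pooled, normalized vectors.
    """
    scores = {
        label: float(np.mean([query @ ref for ref in refs]))
        for label, refs in references.items()
    }
    return max(scores, key=scores.get), scores

# Stand-in data: two synthetic "categories" with varying token counts.
rng = np.random.default_rng(0)
base_a = rng.normal(size=64)
base_b = rng.normal(size=64)

def fake_features(base, num_tokens):
    """Placeholder for real global_local_features of one image."""
    return base + 0.1 * rng.normal(size=(num_tokens, base.shape[0]))

references = {
    "a": [pool(fake_features(base_a, 100)), pool(fake_features(base_a, 256))],
    "b": [pool(fake_features(base_b, 100))],
}
query = pool(fake_features(base_a, 273))

label, scores = classify(query, references)
sim = cosine_matrix([query] + references["a"] + references["b"])
print(label, scores)
```

Pooling to a fixed-size vector is what makes images with different token counts comparable. Since these features were trained to feed a decoder rather than for retrieval, the scores may be less discriminative than those from a model trained specifically for similarity, so it is worth checking on a few labeled images first.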
