---
language:
- "en"
pretty_name: "ModernTrajectoryNet: Transaction Embedding Classifier"
tags:
- embedding
- pytorch
- finance
- transaction-classifier
- contrastive-learning
license: "apache-2.0"
datasets:
- "HighkeyPrxneeth/BusinessTransactions"
library_name: "pytorch"
---
# ModernTrajectoryNet: Transaction Embedding Classifier
A PyTorch embedding classifier for transaction categorization, built with modern deep learning techniques. The model learns to project transaction embeddings toward their target category embeddings through trajectory-based contrastive learning.
## Model Architecture
**ModernTrajectoryNet** combines several modern architectural innovations:
### Core Components
1. **RMSNorm (Root Mean Square Layer Normalization)**
- Simpler and computationally cheaper than LayerNorm (no mean subtraction or bias)
- Used in LLaMA, PaLM, and Gopher
- Provides consistent gradient flow through deep networks
2. **SwiGLU (Swish-Gated Linear Unit)**
- Gated activation widely used in modern feed-forward networks
- Tends to outperform GELU and ReLU in transformer ablations
- Gate mechanism: `SiLU(W1·x) * (W2·x)`, where `SiLU(z) = z * sigmoid(z)`
3. **SEBlock (Squeeze-and-Excitation)**
- Channel attention mechanism
- Allows dynamic weighting of embedding dimensions
- Context-aware feature recalibration
4. **ModernBlock (Pre-Norm Architecture)**
- RMSNorm → SwiGLU → SEBlock → Residual Connection
- Incorporates layer scaling and stochastic depth (DropPath)
- Enables training of very deep networks
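In PyTorch, these components might compose like the following minimal sketch. It is based only on the descriptions above; the class names match the list, but the layer-scale initialization and the simplified per-batch DropPath are assumptions, not the exact repository code.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization (no mean subtraction)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SiLU(W1 x) * (W2 x), projected back to the input width."""
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, dim * expansion)
        self.up = nn.Linear(dim, dim * expansion)
        self.down = nn.Linear(dim * expansion, dim)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class SEBlock(nn.Module):
    """Squeeze-and-excitation gate over embedding dimensions."""
    def __init__(self, dim: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)

class ModernBlock(nn.Module):
    """Pre-norm residual block: RMSNorm -> SwiGLU -> SEBlock -> residual."""
    def __init__(self, dim: int, drop_path: float = 0.0):
        super().__init__()
        self.norm = RMSNorm(dim)
        self.ffn = SwiGLU(dim)
        self.se = SEBlock(dim)
        self.scale = nn.Parameter(1e-4 * torch.ones(dim))  # layer scaling
        self.drop_path = drop_path

    def forward(self, x):
        # Stochastic depth: randomly skip the whole block during training.
        if self.training and torch.rand(()).item() < self.drop_path:
            return x
        return x + self.se(self.ffn(self.norm(x))) * self.scale
```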
### Configuration
- **Input dimension**: 768 (embedding size)
- **Hidden layers**: 12 stacked ModernBlocks (pre-norm residual)
- **Expansion ratio**: 4x hidden dimension in SwiGLU
- **Dropout**: 0.1
- **Stochastic depth**: Linear decay across layers (0.0 → 0.1)
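A `Config` holding these values might look like the sketch below; the field names are assumptions and the repository's actual `Config` may differ.
```python
from dataclasses import dataclass

@dataclass
class Config:
    dim: int = 768              # input/output embedding size
    n_blocks: int = 12          # number of stacked ModernBlocks
    expansion: int = 4          # SwiGLU hidden expansion ratio
    dropout: float = 0.1
    max_drop_path: float = 0.1  # stochastic depth upper bound

    def drop_path_rates(self):
        # Drop rate grows linearly with depth: 0.0 -> max_drop_path.
        return [self.max_drop_path * i / (self.n_blocks - 1)
                for i in range(self.n_blocks)]
```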
## Training Objective: Hybrid Trajectory Learning
The model is trained with **HybridTrajectoryLoss**, combining two objectives:
### 1. Adaptive InfoNCE (Contrastive Component)
- Learnable temperature parameter for dynamic scaling
- Contrastive loss with label smoothing (0.1)
- Ensures the model maps input embeddings close to their true target embedding
- Equation: `L_contrastive = CrossEntropy(logits / T, labels)`
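A minimal sketch of this component, assuming in-batch negatives and PyTorch's built-in label smoothing; the temperature is stored as a log so it stays positive during optimization:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveInfoNCE(nn.Module):
    def __init__(self, init_temperature: float = 0.07, label_smoothing: float = 0.1):
        super().__init__()
        # Learn log(T) so the temperature is always positive.
        self.log_t = nn.Parameter(torch.log(torch.tensor(init_temperature)))
        self.label_smoothing = label_smoothing

    def forward(self, outputs, targets):
        # outputs, targets: [B, D]; row i of `targets` is the true target of row i.
        outputs = F.normalize(outputs, dim=-1)
        targets = F.normalize(targets, dim=-1)
        logits = outputs @ targets.t() / self.log_t.exp()  # [B, B]
        labels = torch.arange(outputs.size(0), device=outputs.device)
        return F.cross_entropy(logits, labels, label_smoothing=self.label_smoothing)
```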
### 2. Monotonic Ranking (Trajectory Component)
- Enforces **monotonically increasing similarity** through the transaction sequence
- Each step in the trajectory should have higher similarity than the previous step
- Final embedding must achieve high similarity (ideally 1.0) with target
- Margin constraint: `sim[i+1] > sim[i] + 0.01`
- Ensures the model learns the **path** to the target, not just the endpoint
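One way to write the monotonicity term as a hinge penalty, assuming the model exposes its intermediate embeddings as a `[B, T, D]` trajectory (a sketch, not the exact repository code):
```python
import torch
import torch.nn.functional as F

def monotonicity_loss(trajectory, target, margin: float = 0.01):
    """trajectory: [B, T, D] intermediate embeddings; target: [B, D]."""
    traj = F.normalize(trajectory, dim=-1)
    tgt = F.normalize(target, dim=-1).unsqueeze(1)  # [B, 1, D]
    sims = (traj * tgt).sum(-1)                     # [B, T] cosine per step
    # Hinge penalty whenever sim[i+1] fails to beat sim[i] by the margin.
    violations = F.relu(sims[:, :-1] + margin - sims[:, 1:])
    # Also push the final step toward similarity 1.0 with the target.
    endpoint = (1.0 - sims[:, -1]).mean()
    return violations.mean() + endpoint
```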
### Loss Formulation
```
Total Loss = InfoNCE Loss + Monotonicity Loss
```
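Combining the two components sketched above with equal weights, per the formula:
```python
def hybrid_trajectory_loss(trajectory, targets, info_nce):
    """trajectory: [B, T, D] intermediate embeddings; targets: [B, D]."""
    # Contrastive term on the final embedding plus the monotonicity term.
    return info_nce(trajectory[:, -1], targets) + monotonicity_loss(trajectory, targets)
```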
**Why Trajectory Learning?**
- Transactions often evolve gradually toward their correct category
- Intermediate embeddings should show progression toward the target
- This inductive bias improves generalization and interpretability
## Training Details
- **Optimizer**: AdamW with weight decay (1e-4)
- **Learning rate**: Cosine annealing from 3e-4 to 1e-6
- **Batch size**: 128
- **Gradient clipping**: 1.0
- **Epochs**: 50 with early stopping (patience=5)
- **EMA (Exponential Moving Average)**: Decay=0.99 for evaluation stability
- **Augmentation**: Input masking (p=0.15) and Gaussian noise (std=0.01) during training
- **Mixed Precision**: AMP enabled for faster training on CUDA
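Put together, the optimization setup might look like this sketch; `train_loader`, `criterion`, and the model/config classes are assumed from the sketches above, and the EMA update and early stopping are omitted for brevity:
```python
import torch

model = ModernTrajectoryNet(Config()).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-6)
scaler = torch.cuda.amp.GradScaler()  # mixed-precision loss scaling

def augment(x, mask_p=0.15, noise_std=0.01):
    # Randomly zero out embedding dimensions, then add Gaussian noise.
    mask = (torch.rand_like(x) > mask_p).float()
    return x * mask + noise_std * torch.randn_like(x)

for epoch in range(50):
    for inputs, targets in train_loader:
        inputs, targets = augment(inputs.cuda()), targets.cuda()
        with torch.cuda.amp.autocast():
            loss = criterion(model(inputs), targets)  # HybridTrajectoryLoss
        optimizer.zero_grad()
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)  # unscale before clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()
    scheduler.step()
```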
## Performance Metrics
The model optimizes for:
1. **Last Similarity**: Similarity of final embedding with target (Target: ≈1.0)
2. **Monotonicity Accuracy**: % of transitions with strictly increasing similarity (Target: 100%)
3. **Contrastive Accuracy**: Ability to distinguish true target from other targets in batch
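A sketch of how these three metrics could be computed from a batch, reusing the `[B, T, D]` trajectory layout assumed above:
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def trajectory_metrics(trajectory, targets):
    """trajectory: [B, T, D]; targets: [B, D]."""
    traj = F.normalize(trajectory, dim=-1)
    tgt = F.normalize(targets, dim=-1)
    sims = torch.einsum("btd,bd->bt", traj, tgt)  # per-step cosine similarity
    last_sim = sims[:, -1].mean().item()
    mono_acc = (sims[:, 1:] > sims[:, :-1]).float().mean().item()
    # Contrastive accuracy: is the true target ranked first among in-batch targets?
    logits = traj[:, -1] @ tgt.t()  # [B, B]
    hits = logits.argmax(dim=1) == torch.arange(len(tgt), device=logits.device)
    return last_sim, mono_acc, hits.float().mean().item()
```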
## How to Load
```python
from safetensors.torch import load_file
import torch
from config import Config               # from the project GitHub repository
from model import ModernTrajectoryNet   # from the project GitHub repository
# Load weights
weights = load_file("model.safetensors")
# Instantiate model
config = Config()
model = ModernTrajectoryNet(config)
model.load_state_dict(weights)
model.eval()
# Use model
with torch.no_grad():
    input_embedding = torch.randn(1, 768)  # your transaction embedding
    output_embedding = model(input_embedding)
    print(output_embedding.shape)  # torch.Size([1, 768])
```
## Usage Example
```python
import torch
from torch.nn.functional import normalize
# Assuming you have transaction embeddings and category embeddings
transaction_emb = model(input_embedding) # [B, 768]
# Compute similarity with category embeddings
category_embs = normalize(category_embeddings, p=2, dim=1) # [N_cats, 768]
transaction_emb_norm = normalize(transaction_emb, p=2, dim=1) # [B, 768]
similarities = torch.matmul(transaction_emb_norm, category_embs.t()) # [B, N_cats]
predicted_category = torch.argmax(similarities, dim=1) # [B]
```
## Intended Uses
- **Transaction categorization**: Classify business transactions into merchant categories
- **Embedding refinement**: Project raw transaction embeddings to discriminative space
- **Contrastive learning**: Extract improved embeddings for downstream tasks
- **Research**: Study trajectory-based learning for sequential decision problems
## Limitations & Biases
- **Synthetic data**: Trained on synthetic transaction strings generated from Foursquare Open-Source (FSQ OS) business names and categories using the `qwen2.5-4b-instruct` LLM
- **FSQ OS biases**: Inherits biases from the FSQ OS dataset (e.g., geographic coverage, business type distribution)
- **Generation artifacts**: LLM-based synthetic data may not reflect real-world transaction diversity
- **Category coverage**: Limited to categories present in FSQ OS (typically 200-500 merchant types)
- **Language**: Trained on English transaction strings; may not generalize to other languages
**Recommendation**: Validate performance on your specific transaction domain before production deployment.
## Dataset
- **Source**: Foursquare Open-Source (FSQ OS) business names and categories
- **Processing**: LLM-based synthetic transaction generation
- **Size**: ~1M synthetic transaction embeddings
- **Train/Val split**: 90% / 10%
See the [dataset](https://huggingface.co/datasets/HighkeyPrxneeth/BusinessTransactions) for more details.
## Files in This Repository
- `model.safetensors`: Model weights in the Hugging Face safetensors format (160 MB)
- `README.md`: This file
- `LICENSE`: Apache 2.0 license
## License
Apache License 2.0. See LICENSE file for details.
## Citation
If you use this model, please cite:
```bibtex
@software{transactionclassifier2024,
  title={TransactionClassifier: Embedding-based Transaction Categorization},
  author={HighkeyPrxneeth},
  year={2024},
  url={https://huggingface.co/HighkeyPrxneeth/ModernTrajectoryNet}
}
```
## Contact & Support
- **Repository**: [GitHub - TransactionClassifier](https://github.com/HighkeyPrxneeth/TransactionClassifier)
- **Issues**: Open an issue in the main project repository
- **Author**: HighkeyPrxneeth
For questions about the model architecture, training, or usage, feel free to reach out!