---
language:
- "en"
pretty_name: "ModernTrajectoryNet: Transaction Embedding Classifier"
tags:
- embedding
- pytorch
- finance
- transaction-classifier
- contrastive-learning
license: "apache-2.0"
datasets:
- "HighkeyPrxneeth/BusinessTransactions"
library_name: "pytorch"
---
# ModernTrajectoryNet: Transaction Embedding Classifier
A state-of-the-art PyTorch embedding classifier trained with modern deep learning techniques for transaction categorization. The model learns to project transaction embeddings toward their target category embeddings through trajectory-based contrastive learning.
## Model Architecture
**ModernTrajectoryNet** combines several modern architectural innovations:
### Core Components
1. **RMSNorm (Root Mean Square Layer Normalization)**
- More stable and computationally efficient than LayerNorm
- Used in LLaMA, PaLM, and Gopher
- Provides consistent gradient flow through deep networks
2. **SwiGLU (Swish-Gated Linear Unit)**
- State-of-the-art gated activation for transformer feed-forward networks (Shazeer, 2020)
- Empirically outperforms GELU and ReLU in feed-forward layers
- Gate mechanism: `SiLU(W_gate x) * (W_value x)`, where `SiLU(z) = z * sigmoid(z)`
3. **SEBlock (Squeeze-and-Excitation)**
- Channel attention mechanism
- Allows dynamic weighting of embedding dimensions
- Context-aware feature recalibration
4. **ModernBlock (Pre-Norm Architecture)**
- RMSNorm → SwiGLU → SEBlock → Residual Connection
- Incorporates layer scaling and stochastic depth (DropPath)
- Enables training of very deep networks (a composition sketch follows below)
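
The released source is not mirrored in this repository, so as orientation, here is a minimal sketch of how these four components might compose into one block. Class names and default hyperparameters follow the description above, not the actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescale by 1/RMS(x); no mean subtraction or bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class SwiGLU(nn.Module):
    """SiLU-gated feed-forward: silu(W_gate x) * (W_value x), projected back."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden)
        self.value = nn.Linear(dim, hidden)
        self.out = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.out(F.silu(self.gate(x)) * self.value(x))

class SEBlock(nn.Module):
    """Squeeze-and-excitation: data-dependent per-dimension recalibration."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.SiLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)  # weights in (0, 1) scale each embedding dimension

class ModernBlock(nn.Module):
    """Pre-norm residual block: RMSNorm -> SwiGLU -> SEBlock, plus layer scale and DropPath."""
    def __init__(self, dim: int, expansion: int = 4, drop_path: float = 0.0):
        super().__init__()
        self.norm = RMSNorm(dim)
        self.ffn = SwiGLU(dim, expansion * dim)
        self.se = SEBlock(dim)
        self.gamma = nn.Parameter(1e-4 * torch.ones(dim))  # layer scale
        self.drop_path = drop_path

    def forward(self, x):
        y = self.se(self.ffn(self.norm(x))) * self.gamma
        if self.training and self.drop_path > 0:  # stochastic depth: drop whole samples
            keep = (torch.rand(x.shape[0], 1, device=x.device) >= self.drop_path).to(y.dtype)
            y = y * keep / (1.0 - self.drop_path)
        return x + y
```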
### Configuration
- **Input dimension**: 768 (embedding size)
- **Hidden layers**: 12 transformer-style blocks
- **Expansion ratio**: 4x hidden dimension in SwiGLU
- **Dropout**: 0.1
- **Stochastic depth**: Drop rate ramps linearly across layers (0.0 → 0.1); a possible `Config` mapping is sketched below
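
A plausible `Config` matching these numbers, shown as a dataclass for reference (field names are illustrative guesses, not the released `config.Config`):

```python
from dataclasses import dataclass

@dataclass
class Config:
    dim: int = 768             # input/output embedding size
    n_layers: int = 12         # number of ModernBlocks
    expansion: int = 4         # SwiGLU hidden size = expansion * dim
    dropout: float = 0.1
    max_drop_path: float = 0.1

    def drop_path_rates(self):
        # drop rate ramps linearly from 0.0 (first block) to max (last block)
        return [self.max_drop_path * i / max(self.n_layers - 1, 1)
                for i in range(self.n_layers)]
```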
## Training Objective: Hybrid Trajectory Learning
The model is trained with **HybridTrajectoryLoss**, combining two objectives:
### 1. Adaptive InfoNCE (Contrastive Component)
- Learnable temperature parameter for dynamic scaling
- Contrastive loss with label smoothing (0.1)
- Ensures the model maps input embeddings close to their true target embedding
- Equation: `L_contrastive = CrossEntropy(sim / T, labels)`, where `sim` is the matrix of pairwise similarities between predicted and target embeddings and `T` is the learnable temperature (a minimal sketch follows)
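
A minimal sketch of this component, assuming in-batch negatives with positives on the diagonal and a temperature stored in log space so it stays positive (the class name mirrors the description, not necessarily the source):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveInfoNCE(nn.Module):
    def __init__(self, init_temp: float = 0.07, label_smoothing: float = 0.1):
        super().__init__()
        # learnable temperature, parameterized in log space to keep T > 0
        self.log_temp = nn.Parameter(torch.tensor(init_temp).log())
        self.label_smoothing = label_smoothing

    def forward(self, pred, target):
        # pred, target: [B, D]; row i of `target` is the true target for row i of `pred`
        pred = F.normalize(pred, dim=-1)
        target = F.normalize(target, dim=-1)
        logits = pred @ target.t() / self.log_temp.exp()  # [B, B] similarity matrix
        labels = torch.arange(pred.size(0), device=pred.device)
        return F.cross_entropy(logits, labels, label_smoothing=self.label_smoothing)
```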
### 2. Monotonic Ranking (Trajectory Component)
- Enforces **monotonically increasing similarity** through the transaction sequence
- Each step in the trajectory should have higher similarity than the previous step
- Final embedding must achieve high similarity (ideally 1.0) with target
- Margin constraint: `sim[i+1] > sim[i] + 0.01`
- Ensures the model learns the **path** to the target, not just the endpoint (see the sketch below)
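
A sketch of this term under the constraints above, assuming the model exposes its per-step (e.g. per-block) embeddings as a trajectory tensor:

```python
import torch.nn.functional as F

def monotonicity_loss(trajectory, target, margin: float = 0.01):
    """trajectory: [B, S, D] intermediate embeddings; target: [B, D]."""
    sims = F.cosine_similarity(trajectory, target.unsqueeze(1), dim=-1)  # [B, S]
    # hinge penalty whenever sim[i+1] fails to beat sim[i] by the margin
    ranking = F.relu(sims[:, :-1] + margin - sims[:, 1:]).mean()
    # pull the final step's similarity toward 1.0
    endpoint = (1.0 - sims[:, -1]).mean()
    return ranking + endpoint
```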
### Loss Formulation
```
Total Loss = InfoNCE Loss + Monotonicity Loss
```
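
Combining the two sketches above into one module (the unweighted sum follows the formula as stated; any weighting coefficients in the original are not documented):

```python
import torch.nn as nn

class HybridTrajectoryLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.info_nce = AdaptiveInfoNCE()

    def forward(self, trajectory, target):
        # trajectory: [B, S, D]; the last step is the model's final output embedding
        contrastive = self.info_nce(trajectory[:, -1], target)
        monotonic = monotonicity_loss(trajectory, target)
        return contrastive + monotonic
```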
**Why Trajectory Learning?**
- Transactions often evolve gradually toward their correct category
- Intermediate embeddings should show progression toward the target
- This inductive bias improves generalization and interpretability
## Training Details
- **Optimizer**: AdamW with weight decay (1e-4)
- **Learning rate**: Cosine annealing from 3e-4 to 1e-6
- **Batch size**: 128
- **Gradient clipping**: 1.0
- **Epochs**: 50 with early stopping (patience=5)
- **EMA (Exponential Moving Average)**: Decay=0.99 for evaluation stability
- **Augmentation**: Input masking (p=0.15) and Gaussian noise (std=0.01) during training
- **Mixed Precision**: AMP enabled for faster training on CUDA; a training-loop sketch follows below
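
Roughly, these settings translate into a loop like the following. This is a sketch, not the released training script: `train_loader` is assumed, the model is assumed to return its per-block trajectory during training, and the EMA update is only indicated in a comment:

```python
import torch

model = ModernTrajectoryNet(Config()).cuda()
criterion = HybridTrajectoryLoss().cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-6)
scaler = torch.cuda.amp.GradScaler()  # mixed precision (AMP)

for epoch in range(50):  # plus early stopping with patience=5 on the validation metric
    for x, target in train_loader:
        x, target = x.cuda(), target.cuda()
        # augmentation: random input masking (p=0.15) and Gaussian noise (std=0.01)
        x = x * (torch.rand_like(x) >= 0.15).float()
        x = x + 0.01 * torch.randn_like(x)
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():
            trajectory = model(x)  # assumed to yield [B, S, D] in training mode
            loss = criterion(trajectory, target)
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
        scaler.step(optimizer)
        scaler.update()
        # ... update an EMA shadow copy of the weights here (decay=0.99)
    scheduler.step()
```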
## Performance Metrics
The model optimizes for:
1. **Last Similarity**: Similarity of final embedding with target (Target: ≈1.0)
2. **Monotonicity Accuracy**: % of transitions with strictly increasing similarity (Target: 100%)
3. **Contrastive Accuracy**: Ability to distinguish the true target from the other targets in the batch (metric sketches below)
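
These metrics can be computed directly from the trajectory similarities and the batch similarity matrix; for example (shapes follow the loss sketches above):

```python
import torch
import torch.nn.functional as F

def trajectory_metrics(trajectory, target):
    """trajectory: [B, S, D] per-step embeddings; target: [B, D] true targets."""
    sims = F.cosine_similarity(trajectory, target.unsqueeze(1), dim=-1)  # [B, S]
    last_similarity = sims[:, -1].mean().item()                            # target ~1.0
    monotonicity_acc = (sims[:, 1:] > sims[:, :-1]).float().mean().item()  # target 100%
    # contrastive accuracy: is each row's nearest target its own (the diagonal)?
    logits = F.normalize(trajectory[:, -1], dim=-1) @ F.normalize(target, dim=-1).t()
    labels = torch.arange(logits.size(0), device=logits.device)
    contrastive_acc = (logits.argmax(dim=1) == labels).float().mean().item()
    return last_similarity, monotonicity_acc, contrastive_acc
```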
## How to Load
```python
from safetensors.torch import load_file
import torch
from config import Config                # from the project source (not in this HF repo)
from model import ModernTrajectoryNet    # see the GitHub repository linked below
# Load weights
weights = load_file("model.safetensors")
# Instantiate model
config = Config()
model = ModernTrajectoryNet(config)
model.load_state_dict(weights)
model.eval()
# Use model
with torch.no_grad():
    input_embedding = torch.randn(1, 768)  # Your transaction embedding
    output_embedding = model(input_embedding)
    print(output_embedding.shape)  # [1, 768]
```
## Usage Example
```python
import torch
from torch.nn.functional import normalize
# Assuming you have transaction embeddings and category embeddings
transaction_emb = model(input_embedding) # [B, 768]
# Compute similarity with category embeddings
category_embs = normalize(category_embeddings, p=2, dim=1) # [N_cats, 768]
transaction_emb_norm = normalize(transaction_emb, p=2, dim=1) # [B, 768]
similarities = torch.matmul(transaction_emb_norm, category_embs.t()) # [B, N_cats]
predicted_category = torch.argmax(similarities, dim=1) # [B]
```
## Intended Uses
- **Transaction categorization**: Classify business transactions into merchant categories
- **Embedding refinement**: Project raw transaction embeddings to discriminative space
- **Contrastive learning**: Extract improved embeddings for downstream tasks
- **Research**: Study trajectory-based learning for sequential decision problems
## Limitations & Biases
- **Synthetic data**: Trained on synthetic transaction strings generated from Foursquare Open-Source (FSQ OS) business names and categories using the `qwen2.5-4b-instruct` LLM
- **FSQ OS biases**: Inherits biases from the FSQ OS dataset (e.g., geographic coverage, business type distribution)
- **Generation artifacts**: LLM-based synthetic data may not reflect real-world transaction diversity
- **Category coverage**: Limited to categories present in FSQ OS (typically 200-500 merchant types)
- **Language**: Trained on English transaction strings; may not generalize to other languages
**Recommendation**: Validate performance on your specific transaction domain before production deployment.
## Dataset
- **Source**: Foursquare Open-Source (FSQ OS) business names and categories
- **Processing**: LLM-based synthetic transaction generation
- **Size**: ~1M synthetic transaction embeddings
- **Train/Val split**: 90% / 10%
See the [dataset](https://huggingface.co/datasets/HighkeyPrxneeth/BusinessTransactions) for more details.
## Files in This Repository
- `model.safetensors`: Model weights in HuggingFace SafeTensors format (160MB)
- `README.md`: This file
- `LICENSE`: Apache 2.0 license
## License
Apache License 2.0. See LICENSE file for details.
## Citation
If you use this model, please cite:
```bibtex
@software{transactionclassifier2024,
  title={TransactionClassifier: Embedding-based Transaction Categorization},
  author={HighkeyPrxneeth},
  year={2024},
  url={https://huggingface.co/HighkeyPrxneeth/ModernTrajectoryNet}
}
```
## Contact & Support
- **Repository**: [GitHub - TransactionClassifier](https://github.com/HighkeyPrxneeth/TransactionClassifier)
- **Issues**: Open an issue in the main project repository
- **Author**: HighkeyPrxneeth
For questions about the model architecture, training, or usage, feel free to reach out!