IncomeNet-13.1k-Embeddings-log

This model is a Multi-Layer Perceptron (MLP) that uses categorical embeddings to classify income levels (>50K or <=50K) from census data. It represents a more expressive and robust approach than standard one-hot encoded models.

Model Description

  • Architecture: MLP with specialized Embedding layers for categorical features (13.1k parameters).
  • Hidden Layers: [128, 64] with ReLU activation.
  • Key Feature: Instead of simple one-hot encoding, this model learns dense representations (embeddings) for categorical variables.
  • Preprocessing: Log-transformation of numerical features for numerical stability, followed by standard scaling (see the sketch after this list).
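
For readers who want to see how these pieces fit together, below is a minimal PyTorch sketch of an embeddings-plus-MLP tabular classifier. It is a hypothetical reconstruction, not the shipped code: the real IncomeNetEmbeddings class lives in model_architecture.py and may differ, and the class name EmbeddingMLP, the cardinalities in the smoke test, and the embedding-width heuristic are illustrative assumptions.

import torch
import torch.nn as nn

class EmbeddingMLP(nn.Module):
    """Hypothetical sketch of an embeddings + MLP classifier for tabular data."""

    def __init__(self, cat_cardinalities, num_numeric, hidden_dims=(128, 64)):
        super().__init__()
        # One trainable lookup table per categorical column. A common
        # heuristic for the embedding width is min(50, (cardinality + 1) // 2).
        self.embeddings = nn.ModuleList(
            [nn.Embedding(card, min(50, (card + 1) // 2))
             for card in cat_cardinalities]
        )
        emb_dim = sum(e.embedding_dim for e in self.embeddings)

        # ReLU hidden layers of size [128, 64], then a single output logit.
        layers, in_dim = [], emb_dim + num_numeric
        for h in hidden_dims:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))  # logit for P(income > 50K)
        self.mlp = nn.Sequential(*layers)

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_categorical) integer category codes
        # x_num: (batch, n_numeric) log-transformed, standardized floats
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.mlp(torch.cat(embedded + [x_num], dim=1))

# Smoke test with random data: 3 categorical columns, 4 numeric columns.
net = EmbeddingMLP(cat_cardinalities=[9, 16, 7], num_numeric=4)
x_cat = torch.randint(0, 7, (2, 3))   # codes stay below every cardinality
x_num = torch.randn(2, 4)
print(net(x_cat, x_num).shape)        # torch.Size([2, 1])

Compared with one-hot encoding, each categorical column here owns a small trainable lookup table, so similar categories can land close together in embedding space instead of on orthogonal axes; that is the usual argument for the robustness claimed above.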

Performance

While slightly more conservative in its raw scores than the base models, the Embeddings-log variant offers excellent generalization:

  • Accuracy: 0.806
  • F1-Score: 0.768

Experimental Comparison

In the scatter plot below, this model is represented by the grey circle/square:

[Figure: experimental comparison scatter plot]

Superior Generalization & Training Stability

The main strength of the Embedding architecture is its stability. As seen in the evaluation loss curves, the grey line for the 13.1k-Embeddings model stays much flatter and more controlled than the curves of the base models, which tend to overfit more aggressively:

[Figure: evaluation loss curves]

How to Use

This model requires the model_architecture.py file (specifically the IncomeNetEmbeddings class) and the corresponding preprocessor.pkl.

import torch
from safetensors.torch import load_model
from model_architecture import IncomeNetEmbeddings

# Instantiate the architecture with the 13.1k-parameter configuration
model = IncomeNetEmbeddings(input_dim=12, hidden_dims=[128, 64])

# Load the trained weights and switch to inference mode
load_model(model, "model.safetensors")
model.eval()
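
Continuing from the snippet above, inference means applying the saved preprocessor and feeding the result to the model. The sketch below is assumption-laden: it treats preprocessor.pkl as a fitted scikit-learn-style transformer whose transform() returns one dense numeric matrix (log-transformed and standardized, per the preprocessing notes), and it assumes the model takes that matrix as a single float tensor and emits one logit for the >50K class. The column names in the example record are hypothetical.

import pickle

import pandas as pd
import torch

# Assumption: preprocessor.pkl is a fitted scikit-learn-style transformer
# (e.g. log1p + StandardScaler + categorical encoding) saved at training time.
with open("preprocessor.pkl", "rb") as f:
    preprocessor = pickle.load(f)

# One hypothetical census record; real column names depend on the training data.
record = pd.DataFrame([{
    "age": 39,
    "education": "Bachelors",
    "occupation": "Adm-clerical",
    "hours_per_week": 40,
}])

features = preprocessor.transform(record)
inputs = torch.as_tensor(features, dtype=torch.float32)

with torch.no_grad():
    logit = model(inputs)               # assumes a single >50K logit
    prob = torch.sigmoid(logit).item()  # probability of income >50K

print(">50K" if prob > 0.5 else "<=50K", f"(p={prob:.3f})")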