IncomeNet-13.1k-Embeddings-log

This model is a Multi-Layer Perceptron (MLP) that uses categorical embeddings to classify income levels (>50K or <=50K) from census data. It represents a more expressive and robust approach than standard one-hot encoded models.

Model Description

  • Architecture: MLP with specialized Embedding layers for categorical features (13.1k parameters).
  • Hidden Layers: [128, 64] with ReLU activation.
  • Key Feature: Instead of simple one-hot encoding, this model learns dense representations (embeddings) for categorical variables.
  • Preprocessing: Log-transformation of numerical features for numerical stability, followed by standard scaling (see the sketch after this list).
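
For readers who want to see how these pieces fit together, below is a minimal PyTorch sketch of an embeddings-plus-MLP tabular classifier. It is a hypothetical reconstruction, not the shipped code: the real IncomeNetEmbeddings class lives in model_architecture.py and may differ, and the class name EmbeddingMLP, the cardinalities in the smoke test, and the embedding-width heuristic are illustrative assumptions.

import torch
import torch.nn as nn

class EmbeddingMLP(nn.Module):
    """Hypothetical sketch of an embeddings + MLP classifier for tabular data."""

    def __init__(self, cat_cardinalities, num_numeric, hidden_dims=(128, 64)):
        super().__init__()
        # One trainable lookup table per categorical column. A common
        # heuristic for the embedding width is min(50, (cardinality + 1) // 2).
        self.embeddings = nn.ModuleList(
            [nn.Embedding(card, min(50, (card + 1) // 2))
             for card in cat_cardinalities]
        )
        emb_dim = sum(e.embedding_dim for e in self.embeddings)

        # ReLU hidden layers of size [128, 64], then a single output logit.
        layers, in_dim = [], emb_dim + num_numeric
        for h in hidden_dims:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))  # logit for P(income > 50K)
        self.mlp = nn.Sequential(*layers)

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_categorical) integer category codes
        # x_num: (batch, n_numeric) log-transformed, standardized floats
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.mlp(torch.cat(embedded + [x_num], dim=1))

# Smoke test with random data: 3 categorical columns, 4 numeric columns.
net = EmbeddingMLP(cat_cardinalities=[9, 16, 7], num_numeric=4)
x_cat = torch.randint(0, 7, (2, 3))   # codes stay below every cardinality
x_num = torch.randn(2, 4)
print(net(x_cat, x_num).shape)        # torch.Size([2, 1])

Compared with one-hot encoding, each categorical column here owns a small trainable lookup table, so similar categories can land close together in embedding space instead of on orthogonal axes; that is the usual argument for the robustness claimed above.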

Performance

While slightly more conservative in its raw scores than the base models, the Embeddings-log variant offers excellent generalization:

  • Accuracy: 0.806
  • F1-Score: 0.768

Experimental Comparison

In the scatter plot below, this model is represented by the grey circle/square:

[Figure: experimental comparison scatter plot]

Superior Generalization & Training Stability

The main strength of the Embedding architecture is its stability. As seen in the evaluation loss curves, the grey line for the 13.1k-Embeddings model stays much flatter and more controlled than the curves of the base models, which tend to overfit more aggressively:

[Figure: evaluation loss curves]

How to Use

This model requires the model_architecture.py file (specifically the IncomeNetEmbeddings class) and the corresponding preprocessor.pkl.

import torch
from safetensors.torch import load_model
from model_architecture import IncomeNetEmbeddings

# Instantiate the architecture with the 13.1k-parameter configuration
model = IncomeNetEmbeddings(input_dim=12, hidden_dims=[128, 64])

# Load the trained weights and switch to inference mode
load_model(model, "model.safetensors")
model.eval()
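
Continuing from the snippet above, inference means applying the saved preprocessor and feeding the result to the model. The sketch below is assumption-laden: it treats preprocessor.pkl as a fitted scikit-learn-style transformer whose transform() returns one dense numeric matrix (log-transformed and standardized, per the preprocessing notes), and it assumes the model takes that matrix as a single float tensor and emits one logit for the >50K class. The column names in the example record are hypothetical.

import pickle

import pandas as pd
import torch

# Assumption: preprocessor.pkl is a fitted scikit-learn-style transformer
# (e.g. log1p + StandardScaler + categorical encoding) saved at training time.
with open("preprocessor.pkl", "rb") as f:
    preprocessor = pickle.load(f)

# One hypothetical census record; real column names depend on the training data.
record = pd.DataFrame([{
    "age": 39,
    "education": "Bachelors",
    "occupation": "Adm-clerical",
    "hours_per_week": 40,
}])

features = preprocessor.transform(record)
inputs = torch.as_tensor(features, dtype=torch.float32)

with torch.no_grad():
    logit = model(inputs)               # assumes a single >50K logit
    prob = torch.sigmoid(logit).item()  # probability of income >50K

print(">50K" if prob > 0.5 else "<=50K", f"(p={prob:.3f})")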