|
|
--- |
|
|
license: mit |
|
|
base_model: |
|
|
- timm/maxvit_tiny_tf_224.in1k |
|
|
pipeline_tag: zero-shot-classification |
|
|
datasets: |
|
|
- AbstractPhil/geometric-vocab |
|
|
--- |
|
|
|
|
|
# The models uploaded are no longer based on MaxViT, so this repo is to be archived.
|
|
|
|
|
The major achievement here is a ~300 KB pentachora ViT that reaches 25% top-1 and 80% top-5 accuracy on CIFAR-100. This is a legitimate showcase and proof of concept: it shows not only that the geometry and structural integrity withstand large amounts of information, but that the features and CLS structure are not just semantic - they're deterministic and repeatable.
|
|
|
|
|
The internal structure no longer resembles MaxViT even slightly. It has diverged far from, and no longer houses, any of the original conceptualizations that max-vit-goliath entailed.
|
|
|
|
|
If you have been keeping up on the journey, know that I will not slow down. The next repo will contain the full manifest of the "penta-vit" and the vision of how the patches will function in an entirely new systemic capacity.
|
|
|
|
|
Thank you for your time. *bows head* |
|
|
|
|
|
# Spark V2 - Non-random pentas.
|
|
|
|
|
The early prototype below was built from purely random pentas; checking the saved vocabulary outputs confirmed it wasn't actually using the vocabulary.
|
|
|
|
|
The vocabulary should now match uniformly across all of the variants.
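One way to verify that the variants really share a vocabulary is to fingerprint the saved tensors and compare. A minimal sketch, assuming the vocabularies load as plain tensors (helper names here are hypothetical, not from this repo):

```python
import hashlib

import torch

def vocab_fingerprint(vocab: torch.Tensor) -> str:
    """Hash the raw bytes of a vocabulary tensor so variants can be compared."""
    data = vocab.detach().cpu().contiguous().numpy().tobytes()
    return hashlib.sha256(data).hexdigest()[:16]

# Two variants built from the same vocabulary must fingerprint identically,
# while any perturbation changes the hash.
vocab = torch.randn(100, 5, 100)  # [classes, vertices, dim]
print(vocab_fingerprint(vocab) == vocab_fingerprint(vocab.clone()))  # True
```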
|
|
|
|
|
|
|
|
# Updated again - Spark has variants. |
|
|
|
|
|
It works boys n grills. We have a micro-sized geometric ViT model that works. |
|
|
|
|
|
Now let's provide the lightning that makes the Nikola architecture truly unique - baked cleanly into our geometric structure with our geometric attention relay.
|
|
|
|
|
The current model.py contains the weights I'm training, which makes this direct proof of geometric structural integrity solidifying smaller structures into a much more potent shape.
|
|
|
|
|
Nikola's resonant formulas will assist with this one, as it took well to the geometric attention built specifically for the coil architecture. Let's see how she behaves in the coming days.
|
|
|
|
|
Currently I'm going to run about 50 of these to see how she behaves with CIFAR-100 and various settings.
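A batch of ~50 runs like that can be driven by a simple random sweep over a settings grid. A sketch with made-up knob names (the real hyperparameters live in the training script, not here):

```python
import itertools
import random

# Hypothetical settings grid; substitute the actual training knobs.
grid = {
    "dim": [64, 100],
    "depth": [5, 8],
    "lr": [3e-4, 1e-3],
}

def sample_runs(grid: dict, n_runs: int = 50, seed: int = 0):
    """Draw n_runs random configurations from the cartesian product of the grid."""
    rng = random.Random(seed)
    keys = list(grid)
    combos = list(itertools.product(*(grid[k] for k in keys)))
    return [dict(zip(keys, rng.choice(combos))) for _ in range(n_runs)]

runs = sample_runs(grid)
print(len(runs), runs[0])
```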
|
|
|
|
|
|
|
|
```TEXT |
|
|
Model Configuration: |
|
|
Internal dim: 100 |
|
|
Vocab dim: 100 |
|
|
Num classes: 100 |
|
|
Crystal shape: torch.Size([100, 5, 100]) |
|
|
Evaluating: 100%|██████████| 100/100 [00:02<00:00, 37.96it/s] |
|
|
|
|
|
================================================================================ |
|
|
EVALUATION RESULTS |
|
|
================================================================================ |
|
|
|
|
|
Overall Accuracy: 53.50% |
|
|
Auxiliary Head Accuracy: 52.97% |
|
|
|
|
|
Top 10 Classes: |
|
|
Class Acc% Conf GeoAlign CrystalNorm |
|
|
---------------------------------------------------------------------- |
|
|
wardrobe 87.0 0.703 0.829 0.308 |
|
|
orange 84.0 0.708 0.839 0.298 |
|
|
road 84.0 0.772 0.626 0.327 |
|
|
sunflower 84.0 0.749 0.756 0.260 |
|
|
plain 80.0 0.692 0.763 0.306 |
|
|
skyscraper 80.0 0.669 0.631 0.255 |
|
|
apple 78.0 0.681 0.821 0.275 |
|
|
cloud 77.0 0.725 0.758 0.267 |
|
|
aquarium_fish 75.0 0.606 0.473 0.266 |
|
|
chair 73.0 0.709 0.696 0.279 |
|
|
|
|
|
Bottom 10 Classes: |
|
|
Class Acc% Conf GeoAlign CrystalNorm |
|
|
---------------------------------------------------------------------- |
|
|
kangaroo 33.0 0.434 0.601 0.316 |
|
|
man 33.0 0.461 0.554 0.321 |
|
|
squirrel 33.0 0.479 0.538 0.274 |
|
|
woman 33.0 0.399 0.576 0.289 |
|
|
boy 31.0 0.465 0.573 0.299 |
|
|
bus 31.0 0.526 0.694 0.298 |
|
|
possum 31.0 0.486 0.619 0.284 |
|
|
lizard 28.0 0.432 0.452 0.274 |
|
|
crocodile 25.0 0.408 0.481 0.310 |
|
|
seal 25.0 0.441 0.475 0.325 |
|
|
|
|
|
Correlations with Accuracy: |
|
|
Geometric Alignment: 0.493 |
|
|
Crystal Norm: -0.210 |
|
|
Vertex Variance: -0.194 |
|
|
``` |
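The correlation figures at the bottom of that report are plain Pearson correlations between each per-class statistic and per-class accuracy. A minimal sketch of that computation with toy numbers (not the real per-class data):

```python
import numpy as np

def metric_correlations(acc, **metrics):
    """Pearson correlation of each per-class metric against per-class accuracy."""
    acc = np.asarray(acc, dtype=float)
    return {name: float(np.corrcoef(acc, np.asarray(vals, dtype=float))[0, 1])
            for name, vals in metrics.items()}

# Toy data: geo_align roughly tracks accuracy, crystal_norm moves against it.
acc          = [87, 84, 33, 25]
geo_align    = [0.83, 0.84, 0.60, 0.48]
crystal_norm = [0.31, 0.30, 0.32, 0.33]
print(metric_correlations(acc, geo_align=geo_align, crystal_norm=crystal_norm))
```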
|
|
|
|
|
|
|
|
|
|
|
|
|
# Updated - Spark works. |
|
|
|
|
|
max-vit-goliath-spark is essentially a 300k-parameter ViT that reaches nearly identical accuracy to the larger model, with shockingly robust utility of the features.
|
|
|
|
|
```PYTHON |
|
|
'pentachora_spark': PentachoraConfig( |
|
|
dim=64, depth=5, heads=4, mlp_ratio=4.0, |
|
|
preserve_structure_until_layer=2, |
|
|
dropout_rate=0.0, drop_path_rate=0.0 |
|
|
), |
|
|
``` |
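For a quick sanity check on the "~300k param" claim, counting a model's trainable parameters is one line over `model.parameters()`. A sketch with a toy stand-in module (not the actual spark model):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total trainable parameter count of any nn.Module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy stand-in: a tiny MLP at the spark's dim=64.
toy = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
print(count_params(toy))  # 64*256 + 256 + 256*64 + 64 = 33088
```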
|
|
|
|
|
A 64-dim vocabulary is effectively carrying the entire ViT.
|
|
It's using a particularly effective geometric attention. |
|
|
|
|
|
The output produces effective image feature representations in geometric format.
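One common way to anchor attention to a fixed set of simplex vertices is cosine similarity between tokens and the flattened crystal. This is an illustrative sketch of that general idea, not the repo's actual geometric attention:

```python
import torch
import torch.nn.functional as F

def vertex_attention(tokens: torch.Tensor, crystal: torch.Tensor) -> torch.Tensor:
    """
    Illustrative attention over fixed geometric vertices.
    tokens:  [B, N, D] image tokens
    crystal: [C, V, D] pentachora (classes x vertices x dim)
    Each token attends to every vertex by cosine similarity and
    returns a vertex-weighted feature per token.
    """
    verts = crystal.reshape(-1, crystal.shape[-1])                        # [C*V, D]
    sims = F.normalize(tokens, dim=-1) @ F.normalize(verts, dim=-1).T     # [B, N, C*V]
    weights = sims.softmax(dim=-1)
    return weights @ verts                                                # [B, N, D]

out = vertex_attention(torch.randn(2, 16, 64), torch.randn(100, 5, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```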
|
|
|
|
|
|
|
|
|
|
|
|
|
```TEXT
|
|
Final Results: |
|
|
Best Validation Accuracy: 54.15% |
|
|
Final Train Loss: 2.1262 |
|
|
Final Val Loss: 3.6396 |
|
|
``` |
|
|
|
|
|
# Original post |
|
|
Currently it's only a pickled early version at roughly 50% accuracy.
|
|
|
|
|
This one is a 12-layer, 8-head variation of max-vit-goliath trained on the geometric vocab with CIFAR-100 using a specialized 5D format. It's WORKING - somewhat - but it's definitely nothing to phone home about yet.
|
|
|
|
|
Dropout was used, and I really don't like what it did to the internals. The math doesn't line up correctly and the shapes are all over the board. The next version will be cleaner.
|
|
|
|
|
I've included the weights in a file for posterity, as this version may be abandoned, but I want to preserve the A100 80 GB time that Google sliced off for me yesterday. If that was intentional, thank you; if it was random, then the universe wanted this to exist. Either way, we're here now.