Add model
- README.md +251 -0
- config.json +36 -0
- pytorch_model.bin +3 -0
    	
        README.md
    ADDED
    
    | @@ -0,0 +1,251 @@ | |
| 1 | 
            +
            ---
         | 
| 2 | 
            +
            tags:
         | 
| 3 | 
            +
            - image-classification
         | 
| 4 | 
            +
            - timm
         | 
| 5 | 
            +
            library_name: timm
         | 
| 6 | 
            +
            license: apache-2.0
         | 
| 7 | 
            +
            datasets:
         | 
| 8 | 
            +
            - imagenet-1k
         | 
| 9 | 
            +
            - imagenet-12k
         | 
| 10 | 
            +
            ---
         | 
| 11 | 
            +
            # Model card for maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k
         | 
| 12 | 
            +
             | 
| 13 | 
            +
            A `timm`-specific MaxViT image classification model with an MLP Log-CPB (continuous log-coordinate relative position bias, motivated by Swin-V2). Pretrained in `timm` on ImageNet-12k (an 11821-class subset of the full ImageNet-22k) and fine-tuned on ImageNet-1k by Ross Wightman.
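In Swin Transformer V2, the bias MLP is fed log-spaced relative coordinates, `sign(d) * log(1 + |d|)`, rather than raw offsets. The snippet below is a minimal sketch of that coordinate transform only. The function names are illustrative, and any extra scaling or normalization applied by the `timm` implementation is not shown and may differ.

```python
import math


def log_coord(delta):
    """Map a signed relative offset to a log-spaced coordinate."""
    return math.copysign(math.log1p(abs(delta)), delta)


def log_coord_table(window_size):
    """All (dy, dx) offsets inside a square window, with their log-spaced versions."""
    offsets = range(-(window_size - 1), window_size)
    return [(dy, dx, log_coord(dy), log_coord(dx)) for dy in offsets for dx in offsets]


if __name__ == "__main__":
    # Large offsets are compressed: 1 -> ~0.69, 7 -> ~2.08
    for d in (-7, -1, 0, 1, 7):
        print(d, round(log_coord(d), 3))
```

Because the coordinates grow logarithmically, a bias MLP trained at one window size sees a much smaller input range shift when the window is enlarged, which is the motivation given in the Swin-V2 paper.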
         | 
| 14 | 
            +
             | 
| 15 | 
            +
            ImageNet-12k pretraining and ImageNet-1k fine-tuning were performed on 8x GPU [Lambda Labs](https://lambdalabs.com/) cloud instances.
         | 
| 16 | 
            +
             | 
| 17 | 
            +
            ### Model Variants in [maxxvit.py](https://github.com/rwightman/pytorch-image-models/blob/main/timm/models/maxxvit.py)
         | 
| 18 | 
            +
             | 
| 19 | 
            +
            MaxxViT covers a number of related model architectures that share a common structure including:
         | 
| 20 | 
            +
            - CoAtNet - Combining MBConv (depthwise-separable) convolutional blocks in early stages with self-attention transformer blocks in later stages.
         | 
| 21 | 
            +
            - MaxViT - Uniform blocks across all stages, each containing an MBConv (depthwise-separable) convolution block followed by two self-attention blocks with different partitioning schemes (window followed by grid).
         | 
| 22 | 
            +
            - CoAtNeXt - A timm specific arch that uses ConvNeXt blocks in place of MBConv blocks in CoAtNet. All normalization layers are LayerNorm (no BatchNorm).
         | 
| 23 | 
            +
            - MaxxViT - A timm specific arch that uses ConvNeXt blocks in place of MBConv blocks in MaxViT. All normalization layers are LayerNorm (no BatchNorm).
         | 
| 24 | 
            +
            - MaxxViT-V2 - A MaxxViT variation that removes the window block attention, leaving only ConvNeXt blocks and grid attention, with more width to compensate.
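The two partitioning schemes mentioned for MaxViT (window, then grid) can be illustrated on a tiny feature map. This is a sketch of the index grouping only, following the description in the MaxViT paper: window attention groups neighbouring positions, while grid attention groups positions spread at a fixed stride across the whole map. The function names are hypothetical and this is not the `timm` code.

```python
def window_partition(height, width, p):
    """Group positions into non-overlapping p x p local windows."""
    groups = {}
    for r in range(height):
        for c in range(width):
            groups.setdefault((r // p, c // p), []).append((r, c))
    return list(groups.values())


def grid_partition(height, width, g):
    """Group positions into sets of g x g tokens strided across the whole map."""
    stride_r, stride_c = height // g, width // g
    groups = {}
    for r in range(height):
        for c in range(width):
            groups.setdefault((r % stride_r, c % stride_c), []).append((r, c))
    return list(groups.values())


if __name__ == "__main__":
    print(window_partition(4, 4, 2)[0])  # local neighbours
    print(grid_partition(4, 4, 2)[0])    # positions two steps apart
```

Attention is then computed within each group, so the window pass mixes local information and the grid pass mixes information across the whole map at the same cost.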
         | 
| 25 | 
            +
             | 
| 26 | 
            +
            Aside from the major variants listed above, there are more subtle changes from model to model. Any model name containing the string `rw` is a `timm`-specific config with modelling adjustments made to favour PyTorch eager use. These were created while training initial reproductions of the models, so there are variations.
         | 
| 27 | 
            +
            All models with the string `tf` exactly match the TensorFlow-based models by the original paper authors, with weights ported to PyTorch. This covers a number of MaxViT models. The official CoAtNet models were never released.
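The naming convention can be checked mechanically. The helper below is a hypothetical illustration, not part of `timm`: it splits a name into its architecture and pretrained tag at the first `.`, then looks for an `rw` or `tf` token in the architecture part. Treating `rw2` as an `rw` variant is an assumption made here.

```python
def describe_model_name(name):
    """Split 'arch.pretrained_tag' and classify the config by its rw/tf token."""
    arch, _, tag = name.partition('.')
    tokens = arch.split('_')
    if any(t.startswith('rw') for t in tokens):
        origin = 'timm'          # timm-specific config
    elif 'tf' in tokens:
        origin = 'tensorflow'    # weights ported from the original TensorFlow models
    else:
        origin = 'unknown'
    return {'architecture': arch, 'pretrained_tag': tag, 'origin': origin}


if __name__ == "__main__":
    print(describe_model_name('maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k'))
    print(describe_model_name('maxvit_base_tf_384.in21k_ft_in1k'))
```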
         | 
| 28 | 
            +
             | 
| 29 | 
            +
            ## Model Details
         | 
| 30 | 
            +
            - **Model Type:** Image classification / feature backbone
         | 
| 31 | 
            +
            - **Model Stats:**
         | 
| 32 | 
            +
              - Params (M): 116.1
         | 
| 33 | 
            +
              - GMACs: 71.0
         | 
| 34 | 
            +
              - Activations (M): 318.9
         | 
| 35 | 
            +
              - Image size: 384 x 384
         | 
| 36 | 
            +
            - **Papers:**
         | 
| 37 | 
            +
              - MaxViT: Multi-Axis Vision Transformer: https://arxiv.org/abs/2204.01697
         | 
| 38 | 
            +
              - Swin Transformer V2: Scaling Up Capacity and Resolution: https://arxiv.org/abs/2111.09883
         | 
| 39 | 
            +
            - **Dataset:** ImageNet-1k
         | 
| 40 | 
            +
            - **Pretrain Dataset:** ImageNet-12k
         | 
| 41 | 
            +
             | 
| 42 | 
            +
            ## Model Usage
         | 
| 43 | 
            +
            ### Image Classification
         | 
| 44 | 
            +
            ```python
         | 
| 45 | 
            +
            from urllib.request import urlopen
         | 
| 46 | 
            +
            from PIL import Image
         | 
| 47 | 
            +
            import timm
            import torch
         | 
| 48 | 
            +
             | 
| 49 | 
            +
            img = Image.open(
         | 
| 50 | 
            +
                urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
         | 
| 51 | 
            +
             | 
| 52 | 
            +
            model = timm.create_model('maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k', pretrained=True)
         | 
| 53 | 
            +
            model = model.eval()
         | 
| 54 | 
            +
             | 
| 55 | 
            +
            # get model specific transforms (normalization, resize)
         | 
| 56 | 
            +
            data_config = timm.data.resolve_model_data_config(model)
         | 
| 57 | 
            +
            transforms = timm.data.create_transform(**data_config, is_training=False)
         | 
| 58 | 
            +
             | 
| 59 | 
            +
            output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
         | 
| 60 | 
            +
             | 
| 61 | 
            +
            top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
         | 
| 62 | 
            +
            ```
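The last line of the example above converts logits to percentages with a softmax and keeps the five largest. For readers who want to see that step in isolation, here is a plain-Python sketch of the same computation on a single row. The logits are made-up values, not model output, and the helper names are illustrative.

```python
import math


def softmax_percent(logits):
    """Numerically stable softmax, scaled to percentages."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def top_k(values, k):
    """Return the k largest values and their indices, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


if __name__ == "__main__":
    logits = [2.0, 0.5, 3.0, -1.0, 1.0, 0.0]  # placeholder logits for six classes
    probs, indices = top_k(softmax_percent(logits), k=5)
    for p, i in zip(probs, indices):
        print(f"class {i}: {p:.2f}%")
```

With a real model, the indices are positions in the 1000-class ImageNet-1k label list.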
         | 
| 63 | 
            +
             | 
| 64 | 
            +
            ### Feature Map Extraction
         | 
| 65 | 
            +
            ```python
         | 
| 66 | 
            +
            from urllib.request import urlopen
         | 
| 67 | 
            +
            from PIL import Image
         | 
| 68 | 
            +
            import timm
         | 
| 69 | 
            +
             | 
| 70 | 
            +
            img = Image.open(
         | 
| 71 | 
            +
                urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
         | 
| 72 | 
            +
             | 
| 73 | 
            +
            model = timm.create_model(
         | 
| 74 | 
            +
                'maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k',
         | 
| 75 | 
            +
                pretrained=True,
         | 
| 76 | 
            +
                features_only=True,
         | 
| 77 | 
            +
            )
         | 
| 78 | 
            +
            model = model.eval()
         | 
| 79 | 
            +
             | 
| 80 | 
            +
            # get model specific transforms (normalization, resize)
         | 
| 81 | 
            +
            data_config = timm.data.resolve_model_data_config(model)
         | 
| 82 | 
            +
            transforms = timm.data.create_transform(**data_config, is_training=False)
         | 
| 83 | 
            +
             | 
| 84 | 
            +
            output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
         | 
| 85 | 
            +
             | 
| 86 | 
            +
            for o in output:
         | 
| 87 | 
            +
                # print shape of each feature map in output
         | 
| 88 | 
            +
                # e.g.: 
         | 
| 89 | 
            +
                #  torch.Size([1, 128, 192, 192])
         | 
| 90 | 
            +
                #  torch.Size([1, 128, 96, 96])
         | 
| 91 | 
            +
                #  torch.Size([1, 256, 48, 48])
         | 
| 92 | 
            +
                #  torch.Size([1, 512, 24, 24])
         | 
| 93 | 
            +
                #  torch.Size([1, 1024, 12, 12])
         | 
| 94 | 
            +
                print(o.shape)
         | 
| 95 | 
            +
            ```
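The spatial sizes in the comment above follow from the 384 x 384 input being reduced by a factor of 2 at each level. A small sketch of that arithmetic, assuming reductions of 2, 4, 8, 16 and 32 (inferred from the listed shapes, not read from the model):

```python
def feature_map_sizes(image_size, reductions=(2, 4, 8, 16, 32)):
    """Spatial side length of each feature map for a square input."""
    return [image_size // r for r in reductions]


if __name__ == "__main__":
    print(feature_map_sizes(384))  # [192, 96, 48, 24, 12]
```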
         | 
| 96 | 
            +
             | 
| 97 | 
            +
            ### Image Embeddings
         | 
| 98 | 
            +
            ```python
         | 
| 99 | 
            +
            from urllib.request import urlopen
         | 
| 100 | 
            +
            from PIL import Image
         | 
| 101 | 
            +
            import timm
         | 
| 102 | 
            +
             | 
| 103 | 
            +
            img = Image.open(
         | 
| 104 | 
            +
                urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
         | 
| 105 | 
            +
             | 
| 106 | 
            +
            model = timm.create_model(
         | 
| 107 | 
            +
                'maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k',
         | 
| 108 | 
            +
                pretrained=True,
         | 
| 109 | 
            +
                num_classes=0,  # remove classifier nn.Linear
         | 
| 110 | 
            +
            )
         | 
| 111 | 
            +
            model = model.eval()
         | 
| 112 | 
            +
             | 
| 113 | 
            +
            # get model specific transforms (normalization, resize)
         | 
| 114 | 
            +
            data_config = timm.data.resolve_model_data_config(model)
         | 
| 115 | 
            +
            transforms = timm.data.create_transform(**data_config, is_training=False)
         | 
| 116 | 
            +
             | 
| 117 | 
            +
            output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
         | 
| 118 | 
            +
             | 
| 119 | 
            +
            # or equivalently (without needing to set num_classes=0)
         | 
| 120 | 
            +
             | 
| 121 | 
            +
            output = model.forward_features(transforms(img).unsqueeze(0))
         | 
| 122 | 
            +
            # output is unpooled, i.e. a (batch_size, num_features, H, W) tensor
         | 
| 123 | 
            +
             | 
| 124 | 
            +
            output = model.forward_head(output, pre_logits=True)
         | 
| 125 | 
            +
            # output is (batch_size, num_features) tensor
         | 
| 126 | 
            +
            ```
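A common use of pooled embeddings is comparing two images by cosine similarity. The sketch below works on plain lists, as an assumption for illustration: the short vectors stand in for rows of the `(batch_size, num_features)` output (for example after `output[0].tolist()`), and the function name is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    emb_a = [0.2, -1.0, 0.5]   # placeholder embedding, not model output
    emb_b = [0.1, -0.8, 0.7]   # placeholder embedding, not model output
    print(round(cosine_similarity(emb_a, emb_b), 4))
```

Values close to 1 indicate similar embeddings. The function does not guard against zero-length vectors.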
         | 
| 127 | 
            +
             | 
| 128 | 
            +
            ## Model Comparison
         | 
| 129 | 
            +
            ### By Top-1
         | 
| 130 | 
            +
            |model                                                                                                                   |top1 |top5 |samples / sec  |Params (M)     |GMAC  |Act (M)|
         | 
| 131 | 
            +
            |------------------------------------------------------------------------------------------------------------------------|----:|----:|--------------:|--------------:|-----:|------:|
         | 
| 132 | 
            +
            |[maxvit_xlarge_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_xlarge_tf_512.in21k_ft_in1k)                    |88.53|98.64|          21.76|         475.77|534.14|1413.22|
         | 
| 133 | 
            +
            |[maxvit_xlarge_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_xlarge_tf_384.in21k_ft_in1k)                    |88.32|98.54|          42.53|         475.32|292.78| 668.76|
         | 
| 134 | 
            +
            |[maxvit_base_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_base_tf_512.in21k_ft_in1k)                        |88.20|98.53|          50.87|         119.88|138.02| 703.99|
         | 
| 135 | 
            +
            |[maxvit_large_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_large_tf_512.in21k_ft_in1k)                      |88.04|98.40|          36.42|         212.33|244.75| 942.15|
         | 
| 136 | 
            +
            |[maxvit_large_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_large_tf_384.in21k_ft_in1k)                      |87.98|98.56|          71.75|         212.03|132.55| 445.84|
         | 
| 137 | 
            +
            |[maxvit_base_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_base_tf_384.in21k_ft_in1k)                        |87.92|98.54|         104.71|         119.65| 73.80| 332.90|
         | 
| 138 | 
            +
            |[maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k)        |87.81|98.37|         106.55|         116.14| 70.97| 318.95|
         | 
| 139 | 
            +
            |[maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k)  |87.47|98.37|         149.49|         116.09| 72.98| 213.74|
         | 
| 140 | 
            +
            |[coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k)            |87.39|98.31|         160.80|          73.88| 47.69| 209.43|
         | 
| 141 | 
            +
            |[maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k)        |86.89|98.02|         375.86|         116.14| 23.15|  92.64|
         | 
| 142 | 
            +
            |[maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k)  |86.64|98.02|         501.03|         116.09| 24.20|  62.77|
         | 
| 143 | 
            +
            |[maxvit_base_tf_512.in1k](https://huggingface.co/timm/maxvit_base_tf_512.in1k)                                          |86.60|97.92|          50.75|         119.88|138.02| 703.99|
         | 
| 144 | 
            +
            |[coatnet_2_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_2_rw_224.sw_in12k_ft_in1k)                      |86.57|97.89|         631.88|          73.87| 15.09|  49.22|
         | 
| 145 | 
            +
            |[maxvit_large_tf_512.in1k](https://huggingface.co/timm/maxvit_large_tf_512.in1k)                                        |86.52|97.88|          36.04|         212.33|244.75| 942.15|
         | 
| 146 | 
            +
            |[coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k)            |86.49|97.90|         620.58|          73.88| 15.18|  54.78|
         | 
| 147 | 
            +
            |[maxvit_base_tf_384.in1k](https://huggingface.co/timm/maxvit_base_tf_384.in1k)                                          |86.29|97.80|         101.09|         119.65| 73.80| 332.90|
         | 
| 148 | 
            +
            |[maxvit_large_tf_384.in1k](https://huggingface.co/timm/maxvit_large_tf_384.in1k)                                        |86.23|97.69|          70.56|         212.03|132.55| 445.84|
         | 
| 149 | 
            +
            |[maxvit_small_tf_512.in1k](https://huggingface.co/timm/maxvit_small_tf_512.in1k)                                        |86.10|97.76|          88.63|          69.13| 67.26| 383.77|
         | 
| 150 | 
            +
            |[maxvit_tiny_tf_512.in1k](https://huggingface.co/timm/maxvit_tiny_tf_512.in1k)                                          |85.67|97.58|         144.25|          31.05| 33.49| 257.59|
         | 
| 151 | 
            +
            |[maxvit_small_tf_384.in1k](https://huggingface.co/timm/maxvit_small_tf_384.in1k)                                        |85.54|97.46|         188.35|          69.02| 35.87| 183.65|
         | 
| 152 | 
            +
            |[maxvit_tiny_tf_384.in1k](https://huggingface.co/timm/maxvit_tiny_tf_384.in1k)                                          |85.11|97.38|         293.46|          30.98| 17.53| 123.42|
         | 
| 153 | 
            +
            |[maxvit_large_tf_224.in1k](https://huggingface.co/timm/maxvit_large_tf_224.in1k)                                        |84.93|96.97|         247.71|         211.79| 43.68| 127.35|
         | 
| 154 | 
            +
            |[coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k)          |84.90|96.96|        1025.45|          41.72|  8.11|  40.13|
         | 
| 155 | 
            +
            |[maxvit_base_tf_224.in1k](https://huggingface.co/timm/maxvit_base_tf_224.in1k)                                          |84.85|96.99|         358.25|         119.47| 24.04|  95.01|
         | 
| 156 | 
            +
            |[maxxvit_rmlp_small_rw_256.sw_in1k](https://huggingface.co/timm/maxxvit_rmlp_small_rw_256.sw_in1k)                      |84.63|97.06|         575.53|          66.01| 14.67|  58.38|
         | 
| 157 | 
            +
            |[coatnet_rmlp_2_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_224.sw_in1k)                              |84.61|96.74|         625.81|          73.88| 15.18|  54.78|
         | 
| 158 | 
            +
            |[maxvit_rmlp_small_rw_224.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_small_rw_224.sw_in1k)                        |84.49|96.76|         693.82|          64.90| 10.75|  49.30|
         | 
| 159 | 
            +
            |[maxvit_small_tf_224.in1k](https://huggingface.co/timm/maxvit_small_tf_224.in1k)                                        |84.43|96.83|         647.96|          68.93| 11.66|  53.17|
         | 
| 160 | 
            +
            |[maxvit_rmlp_tiny_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_tiny_rw_256.sw_in1k)                          |84.23|96.78|         807.21|          29.15|  6.77|  46.92|
         | 
| 161 | 
            +
            |[coatnet_1_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_1_rw_224.sw_in1k)                                        |83.62|96.38|         989.59|          41.72|  8.04|  34.60|
         | 
| 162 | 
            +
            |[maxvit_tiny_rw_224.sw_in1k](https://huggingface.co/timm/maxvit_tiny_rw_224.sw_in1k)                                    |83.50|96.50|        1100.53|          29.06|  5.11|  33.11|
         | 
| 163 | 
            +
            |[maxvit_tiny_tf_224.in1k](https://huggingface.co/timm/maxvit_tiny_tf_224.in1k)                                          |83.41|96.59|        1004.94|          30.92|  5.60|  35.78|
         | 
| 164 | 
            +
            |[coatnet_rmlp_1_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_1_rw_224.sw_in1k)                              |83.36|96.45|        1093.03|          41.69|  7.85|  35.47|
         | 
| 165 | 
            +
            |[maxxvitv2_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxxvitv2_nano_rw_256.sw_in1k)                              |83.11|96.33|        1276.88|          23.70|  6.26|  23.05|
         | 
| 166 | 
            +
            |[maxxvit_rmlp_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxxvit_rmlp_nano_rw_256.sw_in1k)                        |83.03|96.34|        1341.24|          16.78|  4.37|  26.05|
         | 
| 167 | 
            +
            |[maxvit_rmlp_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_nano_rw_256.sw_in1k)                          |82.96|96.26|        1283.24|          15.50|  4.47|  31.92|
         | 
| 168 | 
            +
            |[maxvit_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_nano_rw_256.sw_in1k)                                    |82.93|96.23|        1218.17|          15.45|  4.46|  30.28|
         | 
| 169 | 
            +
            |[coatnet_bn_0_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_bn_0_rw_224.sw_in1k)                                  |82.39|96.19|        1600.14|          27.44|  4.67|  22.04|
         | 
| 170 | 
            +
            |[coatnet_0_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_0_rw_224.sw_in1k)                                        |82.39|95.84|        1831.21|          27.44|  4.43|  18.73|
         | 
| 171 | 
            +
            |[coatnet_rmlp_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_nano_rw_224.sw_in1k)                        |82.05|95.87|        2109.09|          15.15|  2.62|  20.34|
         | 
| 172 | 
            +
            |[coatnext_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnext_nano_rw_224.sw_in1k)                                |81.95|95.92|        2525.52|          14.70|  2.47|  12.80|
         | 
| 173 | 
            +
            |[coatnet_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_nano_rw_224.sw_in1k)                                  |81.70|95.64|        2344.52|          15.14|  2.41|  15.41|
         | 
| 174 | 
            +
            |[maxvit_rmlp_pico_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_pico_rw_256.sw_in1k)                          |80.53|95.21|        1594.71|           7.52|  1.85|  24.86|
         | 
| 175 | 
            +
             | 
| 176 | 
            +
             | 
| 177 | 
            +
            ### By Throughput (samples / sec)
         | 
| 178 | 
            +
            |model                                                                                                                   |top1 |top5 |samples / sec  |Params (M)     |GMAC  |Act (M)|
         | 
| 179 | 
            +
            |------------------------------------------------------------------------------------------------------------------------|----:|----:|--------------:|--------------:|-----:|------:|
         | 
| 180 | 
            +
            |[coatnext_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnext_nano_rw_224.sw_in1k)                                |81.95|95.92|        2525.52|          14.70|  2.47|  12.80|
         | 
| 181 | 
            +
            |[coatnet_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_nano_rw_224.sw_in1k)                                  |81.70|95.64|        2344.52|          15.14|  2.41|  15.41|
         | 
| 182 | 
            +
            |[coatnet_rmlp_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_nano_rw_224.sw_in1k)                        |82.05|95.87|        2109.09|          15.15|  2.62|  20.34|
         | 
| 183 | 
            +
            |[coatnet_0_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_0_rw_224.sw_in1k)                                        |82.39|95.84|        1831.21|          27.44|  4.43|  18.73|
         | 
| 184 | 
            +
            |[coatnet_bn_0_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_bn_0_rw_224.sw_in1k)                                  |82.39|96.19|        1600.14|          27.44|  4.67|  22.04|
         | 
| 185 | 
            +
            |[maxvit_rmlp_pico_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_pico_rw_256.sw_in1k)                          |80.53|95.21|        1594.71|           7.52|  1.85|  24.86|
         | 
| 186 | 
            +
            |[maxxvit_rmlp_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxxvit_rmlp_nano_rw_256.sw_in1k)                        |83.03|96.34|        1341.24|          16.78|  4.37|  26.05|
         | 
| 187 | 
            +
            |[maxvit_rmlp_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_nano_rw_256.sw_in1k)                          |82.96|96.26|        1283.24|          15.50|  4.47|  31.92|
         | 
| 188 | 
            +
            |[maxxvitv2_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxxvitv2_nano_rw_256.sw_in1k)                              |83.11|96.33|        1276.88|          23.70|  6.26|  23.05|
         | 
| 189 | 
            +
            |[maxvit_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_nano_rw_256.sw_in1k)                                    |82.93|96.23|        1218.17|          15.45|  4.46|  30.28|
         | 
| 190 | 
            +
            |[maxvit_tiny_rw_224.sw_in1k](https://huggingface.co/timm/maxvit_tiny_rw_224.sw_in1k)                                    |83.50|96.50|        1100.53|          29.06|  5.11|  33.11|
         | 
| 191 | 
            +
            |[coatnet_rmlp_1_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_1_rw_224.sw_in1k)                              |83.36|96.45|        1093.03|          41.69|  7.85|  35.47|
         | 
| 192 | 
            +
            |[coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k)          |84.90|96.96|        1025.45|          41.72|  8.11|  40.13|
         | 
| 193 | 
            +
            |[maxvit_tiny_tf_224.in1k](https://huggingface.co/timm/maxvit_tiny_tf_224.in1k)                                          |83.41|96.59|        1004.94|          30.92|  5.60|  35.78|
         | 
| 194 | 
            +
            |[coatnet_1_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_1_rw_224.sw_in1k)                                        |83.62|96.38|         989.59|          41.72|  8.04|  34.60|
         | 
| 195 | 
            +
            |[maxvit_rmlp_tiny_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_tiny_rw_256.sw_in1k)                          |84.23|96.78|         807.21|          29.15|  6.77|  46.92|
         | 
| 196 | 
            +
            |[maxvit_rmlp_small_rw_224.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_small_rw_224.sw_in1k)                        |84.49|96.76|         693.82|          64.90| 10.75|  49.30|
         | 
| 197 | 
            +
            |[maxvit_small_tf_224.in1k](https://huggingface.co/timm/maxvit_small_tf_224.in1k)                                        |84.43|96.83|         647.96|          68.93| 11.66|  53.17|
         | 
| 198 | 
            +
            |[coatnet_2_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_2_rw_224.sw_in12k_ft_in1k)                      |86.57|97.89|         631.88|          73.87| 15.09|  49.22|
         | 
| 199 | 
            +
            |[coatnet_rmlp_2_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_224.sw_in1k)                              |84.61|96.74|         625.81|          73.88| 15.18|  54.78|
         | 
| 200 | 
            +
            |[coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k)            |86.49|97.90|         620.58|          73.88| 15.18|  54.78|
         | 
| 201 | 
            +
            |[maxxvit_rmlp_small_rw_256.sw_in1k](https://huggingface.co/timm/maxxvit_rmlp_small_rw_256.sw_in1k)                      |84.63|97.06|         575.53|          66.01| 14.67|  58.38|
         | 
| 202 | 
            +
            |[maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k)  |86.64|98.02|         501.03|         116.09| 24.20|  62.77|
         | 
| 203 | 
            +
            |[maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k)        |86.89|98.02|         375.86|         116.14| 23.15|  92.64|
         | 
| 204 | 
            +
            |[maxvit_base_tf_224.in1k](https://huggingface.co/timm/maxvit_base_tf_224.in1k)                                          |84.85|96.99|         358.25|         119.47| 24.04|  95.01|
         | 
| 205 | 
            +
            |[maxvit_tiny_tf_384.in1k](https://huggingface.co/timm/maxvit_tiny_tf_384.in1k)                                          |85.11|97.38|         293.46|          30.98| 17.53| 123.42|
         | 
| 206 | 
            +
            |[maxvit_large_tf_224.in1k](https://huggingface.co/timm/maxvit_large_tf_224.in1k)                                        |84.93|96.97|         247.71|         211.79| 43.68| 127.35|
         | 
| 207 | 
            +
|[maxvit_small_tf_384.in1k](https://huggingface.co/timm/maxvit_small_tf_384.in1k)                                        |85.54|97.46|         188.35|          69.02| 35.87| 183.65|
|[coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k)            |87.39|98.31|         160.80|          73.88| 47.69| 209.43|
|[maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k)  |87.47|98.37|         149.49|         116.09| 72.98| 213.74|
|[maxvit_tiny_tf_512.in1k](https://huggingface.co/timm/maxvit_tiny_tf_512.in1k)                                          |85.67|97.58|         144.25|          31.05| 33.49| 257.59|
|[maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k)        |87.81|98.37|         106.55|         116.14| 70.97| 318.95|
|[maxvit_base_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_base_tf_384.in21k_ft_in1k)                        |87.92|98.54|         104.71|         119.65| 73.80| 332.90|
|[maxvit_base_tf_384.in1k](https://huggingface.co/timm/maxvit_base_tf_384.in1k)                                          |86.29|97.80|         101.09|         119.65| 73.80| 332.90|
|[maxvit_small_tf_512.in1k](https://huggingface.co/timm/maxvit_small_tf_512.in1k)                                        |86.10|97.76|          88.63|          69.13| 67.26| 383.77|
|[maxvit_large_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_large_tf_384.in21k_ft_in1k)                      |87.98|98.56|          71.75|         212.03|132.55| 445.84|
|[maxvit_large_tf_384.in1k](https://huggingface.co/timm/maxvit_large_tf_384.in1k)                                        |86.23|97.69|          70.56|         212.03|132.55| 445.84|
|[maxvit_base_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_base_tf_512.in21k_ft_in1k)                        |88.20|98.53|          50.87|         119.88|138.02| 703.99|
|[maxvit_base_tf_512.in1k](https://huggingface.co/timm/maxvit_base_tf_512.in1k)                                          |86.60|97.92|          50.75|         119.88|138.02| 703.99|
|[maxvit_xlarge_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_xlarge_tf_384.in21k_ft_in1k)                    |88.32|98.54|          42.53|         475.32|292.78| 668.76|
|[maxvit_large_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_large_tf_512.in21k_ft_in1k)                      |88.04|98.40|          36.42|         212.33|244.75| 942.15|
|[maxvit_large_tf_512.in1k](https://huggingface.co/timm/maxvit_large_tf_512.in1k)                                        |86.52|97.88|          36.04|         212.33|244.75| 942.15|
|[maxvit_xlarge_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_xlarge_tf_512.in21k_ft_in1k)                    |88.53|98.64|          21.76|         475.77|534.14|1413.22|

## Citation
```bibtex
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
}
```
```bibtex
@article{tu2022maxvit,
  title={MaxViT: Multi-Axis Vision Transformer},
  author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
  journal={ECCV},
  year={2022},
}
```
```bibtex
@article{dai2021coatnet,
  title={CoAtNet: Marrying Convolution and Attention for All Data Sizes},
  author={Dai, Zihang and Liu, Hanxiao and Le, Quoc V and Tan, Mingxing},
  journal={arXiv preprint arXiv:2106.04803},
  year={2021}
}
```
    	
config.json ADDED
@@ -0,0 +1,36 @@
{
  "architecture": "maxvit_rmlp_base_rw_384",
  "num_classes": 1000,
  "num_features": 768,
  "global_pool": "avg",
  "pretrained_cfg": {
    "tag": "sw_in12k_ft_in1k",
    "custom_load": false,
    "input_size": [
      3,
      384,
      384
    ],
    "fixed_input_size": true,
    "interpolation": "bicubic",
    "crop_pct": 1.0,
    "crop_mode": "squash",
    "mean": [
      0.5,
      0.5,
      0.5
    ],
    "std": [
      0.5,
      0.5,
      0.5
    ],
    "num_classes": 1000,
    "pool_size": [
      12,
      12
    ],
    "first_conv": "stem.conv1",
    "classifier": "head.fc"
  }
}
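The `pretrained_cfg` block in `config.json` fixes the preprocessing these weights expect. As a minimal sketch, assuming the common convention that the resize target is `input_size / crop_pct` and that `squash` means both sides are resized directly without preserving aspect ratio, the standard-library Python below reads an excerpt of the config and derives the resize target, the per-channel normalization, and the overall downsampling factor. The helper names (`resize_target`, `normalize_pixel`, `downsample_factor`) are hypothetical and are not part of timm.

```python
import json
import math

# Excerpt of the config.json added in this commit (values copied verbatim).
CONFIG_EXCERPT = """
{
  "architecture": "maxvit_rmlp_base_rw_384",
  "pretrained_cfg": {
    "input_size": [3, 384, 384],
    "interpolation": "bicubic",
    "crop_pct": 1.0,
    "crop_mode": "squash",
    "mean": [0.5, 0.5, 0.5],
    "std": [0.5, 0.5, 0.5],
    "pool_size": [12, 12]
  }
}
"""


def resize_target(cfg):
    """Resize size before cropping, ASSUMING target = input_size / crop_pct."""
    _, height, width = cfg["input_size"]
    pct = cfg["crop_pct"]
    return (math.floor(height / pct), math.floor(width / pct))


def normalize_pixel(rgb, cfg):
    """Apply (value - mean) / std per channel to an RGB value scaled to [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, cfg["mean"], cfg["std"])]


def downsample_factor(cfg):
    """Ratio between the input resolution and the final feature-map size."""
    _, height, width = cfg["input_size"]
    pool_h, pool_w = cfg["pool_size"]
    return (height // pool_h, width // pool_w)


cfg = json.loads(CONFIG_EXCERPT)["pretrained_cfg"]
print(resize_target(cfg))                      # (384, 384)
print(normalize_pixel([0.0, 0.5, 1.0], cfg))   # [-1.0, 0.0, 1.0]
print(downsample_factor(cfg))                  # (32, 32)
```

With a mean and standard deviation of 0.5, pixel values in [0, 1] map to [-1, 1]. With `crop_pct` at 1.0 the resize target equals the 384x384 input size, so under the stated assumption no border is cropped away.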
    	
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9100ec62813ec8d7d2f7d89008c1a3262b181dcef990b7a6e9bff2fec013ffce
size 465612856
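The three lines added for `pytorch_model.bin` are a Git LFS pointer rather than the weights themselves: they record the object's SHA-256 digest and its size in bytes. A downloaded checkpoint can be checked against those two values. The sketch below uses only the Python standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the demonstration hashes a small temporary file because the real checkpoint is not available here.

```python
import hashlib
import os
import tempfile

# Pointer text copied from this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9100ec62813ec8d7d2f7d89008c1a3262b181dcef990b7a6e9bff2fec013ffce
size 465612856
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines into the hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; skips hashing a wrong-sized file
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])  # sha256 465612856

# Self-check on a tiny stand-in file, not on the real weights.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blob.bin")
    with open(path, "wb") as f:
        f.write(b"hello")
    stand_in = {
        "version": pointer["version"],
        "algo": "sha256",
        "digest": hashlib.sha256(b"hello").hexdigest(),
        "size": 5,
    }
    print(matches_pointer(path, stand_in))  # True
    print(matches_pointer(path, pointer))   # False (size differs)
```

Comparing the size before hashing avoids reading a roughly 465 MB file when the download is obviously truncated.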

