Spaces: MechVis (Sleeping)
SaiMupparaju committed · Commit 03653db · 0 parent(s)

Initial commit for MechVis Hugging Face Space
Browse files
- .gitignore +26 -0
- 1_4_1_Indirect_Object_Identification_exercises.ipynb +0 -0
- Dockerfile +13 -0
- Procfile +1 -0
- README.md +80 -0
- README_HF.md +27 -0
- app.py +116 -0
- requirements.txt +5 -0
- space.yaml +7 -0
- templates/index.html +346 -0
.gitignore ADDED
@@ -0,0 +1,26 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Distribution / packaging
+dist/
+build/
+*.egg-info/
+
+# Virtual environments
+venv/
+.env/
+
+# Environment variables
+.env
+
+# IDE files
+.vscode/
+.idea/
+
+# Jupyter Notebook
+.ipynb_checkpoints/
+
+# Miscellaneous
+.DS_Store
1_4_1_Indirect_Object_Identification_exercises.ipynb ADDED
The diff for this file is too large to render. See raw diff.
Dockerfile ADDED
@@ -0,0 +1,13 @@
+FROM python:3.9
+
+WORKDIR /code
+
+COPY ./requirements.txt /code/requirements.txt
+
+RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
+
+COPY . /code
+
+EXPOSE 7860
+
+CMD ["python", "app.py"]
Procfile ADDED
@@ -0,0 +1 @@
+web: python app.py
README.md ADDED
@@ -0,0 +1,80 @@
+# MechVis: GPT-2 Attention Head Contribution Visualization
+
+[Hugging Face Space](https://huggingface.co/spaces/saivamsim26/mechvis)
+
+MechVis is a tool for visualizing how attention heads in GPT-2 small contribute to next token predictions. It provides a simple web interface where you can enter text, see what token the model predicts next, and visualize which attention heads contribute most to that prediction.
+
+This project is inspired by mechanistic interpretability research on language models, particularly studies of "indirect object identification" in GPT-2 small.
+
+## Features
+
+- Input any text prompt and see GPT-2's next token prediction
+- View a heatmap visualization of each attention head's contribution to the predicted token
+- Interactive tooltips showing exact contribution values for each head
+- Simple, clean web interface
+
+## Deployment on Hugging Face Spaces
+
+1. Create a new Space on Hugging Face:
+   - Go to https://huggingface.co/spaces
+   - Click "Create new Space"
+   - Choose "Docker" as the SDK
+   - Set the environment variables if needed
+
+2. Upload the following files to your Space:
+   - `app.py`
+   - `requirements.txt`
+   - `Dockerfile`
+   - Contents of the `templates/` directory
+
+The application will deploy automatically and be available at your Space's URL.
+
+## Local Development
+
+To run the application locally:
+
+```bash
+pip install -r requirements.txt
+python app.py
+```
+
+The application will be available at http://localhost:7860
+
+## How to Use
+
+1. Enter a text prompt in the input field
+2. Click "Predict Next Word"
+3. View the predicted token, its logit value, and probability
+4. Explore the heatmap visualization showing each attention head's contribution:
+   - Red cells indicate positive contributions to the predicted token
+   - Blue cells indicate negative contributions
+   - Hover over cells to see exact contribution values
+
+## Understanding the Visualization
+
+The visualization shows a 12×12 grid representing all attention heads in GPT-2 small, with:
+- Rows representing layers (0-11)
+- Columns representing heads within each layer (0-11)
+- Color intensity showing the magnitude of contribution
+
+This kind of visualization can help identify which attention heads matter most for specific prediction tasks. For example, research has shown that certain heads specialize in particular roles, such as:
+- Name mover heads (e.g., 9.9, 10.0, 9.6)
+- Induction heads (e.g., 5.5, 6.9)
+- S-inhibition heads (e.g., 7.3, 7.9, 8.6, 8.10)
+
+## Example Use Cases
+
+1. **Indirect Object Identification**: Try entering "When John and Mary went to the store, John gave a drink to" and see which heads contribute to predicting "Mary"
+
+2. **Induction Pattern Detection**: Enter repetitive sequences like "The capital of France is Paris. The capital of Germany is" to see induction heads activate
+
+3. **Exploration**: Try various prompts to see how different heads specialize in different linguistic patterns
+
+## References
+
+- [Transformer Lens](https://github.com/neelnanda-io/TransformerLens) - Library for transformer interpretability
+- [Indirect Object Identification](https://arxiv.org/abs/2211.00593) - Research on circuits in GPT-2 small
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
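The README's "Understanding the Visualization" section summarizes what `app.py` (added later in this commit) computes: each head's attention output is projected onto the predicted token's unembedding direction, giving that head's direct contribution to the token's logit. Below is a minimal standalone sketch of that computation, assuming `transformer_lens` is installed and that `gpt2-small` downloads on first use; it is an illustration, not part of the committed files.

```python
# Sketch: per-head direct logit attribution for the predicted next token,
# mirroring calculate_head_contributions in app.py. As in app.py, attention
# output biases (b_O) and final-LayerNorm scaling are ignored.
import torch
from transformer_lens import HookedTransformer

torch.set_grad_enabled(False)  # inference only
model = HookedTransformer.from_pretrained("gpt2-small")

prompt = "When John and Mary went to the store, John gave a drink to"
tokens = model.to_tokens(prompt, prepend_bos=True)
logits, cache = model.run_with_cache(tokens)

top_idx = logits[0, -1].argmax().item()
print("Predicted token:", repr(model.to_string([top_idx])))

# The predicted token's unembedding column is its direction in the residual stream.
direction = model.W_U[:, top_idx]  # [d_model]

for layer in range(model.cfg.n_layers):
    z = cache["z", layer][0, -1]                                 # [head, d_head]
    head_out = torch.einsum("hd,hdm->hm", z, model.W_O[layer])   # [head, d_model]
    contribs = head_out @ direction                              # [head] logit contributions
    best = contribs.argmax().item()
    print(f"layer {layer}: strongest head {layer}.{best} -> {contribs[best].item():.3f}")
```

On the IOI prompt above, the name mover heads the README lists (e.g., 9.9 and 10.0) should rank among the strongest positive contributors.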
README_HF.md ADDED
@@ -0,0 +1,27 @@
+# MechVis: GPT-2 Attention Head Visualization
+
+This interactive web app allows you to visualize how different attention heads in GPT-2 small contribute to next token predictions.
+
+## How to Use
+
+1. Enter text in the input field (e.g., "When John and Mary went to the store, John gave a drink to")
+2. Click "Predict Next Word"
+3. See what token GPT-2 predicts next and explore how each attention head contributes to that prediction
+
+## Features
+
+- Next token prediction with GPT-2 small
+- Interactive heatmap showing attention head contributions
+- Layer contribution analysis
+- Hover over cells to see exact contribution values
+
+## Examples to Try
+
+- **Indirect Object Identification**: "When John and Mary went to the store, John gave a drink to" (likely predicts "Mary")
+- **Induction Pattern**: "The capital of France is Paris. The capital of Germany is" (likely predicts "Berlin")
+
+## About
+
+This project uses [TransformerLens](https://github.com/neelnanda-io/TransformerLens) to access internal model activations and calculate how each attention head contributes to the final logit score of the predicted token.
+
+[GitHub Repository](https://github.com/saivamsim26/mechvis)
app.py ADDED
@@ -0,0 +1,116 @@
+import torch
+import numpy as np
+from flask import Flask, render_template, request, jsonify
+from transformer_lens import HookedTransformer
+import json
+
+app = Flask(__name__)
+
+# Load GPT-2 small model
+model = HookedTransformer.from_pretrained(
+    "gpt2-small",
+    center_unembed=True,
+    center_writing_weights=True,
+    fold_ln=True,
+    refactor_factored_attn_matrices=True,
+)
+
+@app.route('/', methods=['GET', 'POST'])
+def index():
+    prediction = None
+    text = ""
+    head_contributions = None
+
+    if request.method == 'POST':
+        text = request.form.get('text', '')
+
+        if text:
+            # Tokenize the input text
+            tokens = model.to_tokens(text, prepend_bos=True)
+
+            # Run the model with cache to get intermediate activations
+            logits, cache = model.run_with_cache(tokens)
+
+            # Get logits for the last token
+            last_token_logits = logits[0, -1]
+
+            # Get the index of the token with the highest logit
+            top_token_idx = torch.argmax(last_token_logits).item()
+
+            # Get the logit value
+            top_token_logit = last_token_logits[top_token_idx].item()
+
+            # Get the probability
+            probs = torch.nn.functional.softmax(last_token_logits, dim=-1)
+            top_token_prob = probs[top_token_idx].item() * 100  # Convert to percentage
+
+            # Get the token as a string
+            top_token_str = model.to_string([top_token_idx])
+
+            # Get attention head contributions for the top token
+            head_contributions = calculate_head_contributions(cache, top_token_idx, model)
+
+            prediction = {
+                'token': top_token_str,
+                'logit': top_token_logit,
+                'prob': top_token_prob
+            }
+
+    return render_template('index.html', prediction=prediction, text=text, head_contributions=json.dumps(head_contributions) if head_contributions else None)
+
+def calculate_head_contributions(cache, token_idx, model):
+    """Calculate the contribution of each attention head to the top token's logit."""
+
+    # Get all head outputs for the last token
+    head_outputs_by_layer = []
+    contributions = []
+    layer_total_contributions = []
+
+    # Get the direction in the residual stream that corresponds to the token
+    token_direction = model.W_U[:, token_idx].detach()
+
+    # Calculate contributions for each head
+    for layer in range(model.cfg.n_layers):
+        # Get the output of each head at the last position
+        z = cache["z", layer][0, -1]  # [head, d_head]
+
+        # Apply the OV matrix for each head
+        head_outputs = torch.einsum("hd,hdm->hm", z, model.W_O[layer])  # [head, d_model]
+
+        # Project onto the token direction to get contribution to the logit
+        head_contribs = torch.einsum("hm,m->h", head_outputs, token_direction)
+
+        # Calculate total contribution for this layer
+        layer_total = head_contribs.sum().item()
+        layer_total_contributions.append(layer_total)
+
+        # Convert to list for JSON serialization
+        layer_contributions = head_contribs.detach().cpu().numpy().tolist()
+        contributions.append(layer_contributions)
+
+    # Calculate total contribution across all heads
+    total_contribution = sum([sum(layer_contrib) for layer_contrib in contributions])
+
+    # Convert contributions to percentage of total
+    percentage_contributions = []
+    for layer_contributions in contributions:
+        percentage_layer = [(contrib / total_contribution) * 100 for contrib in layer_contributions]
+        percentage_contributions.append(percentage_layer)
+
+    # Calculate per-layer contribution percentages
+    layer_percentages = [(layer_total / total_contribution) * 100 for layer_total in layer_total_contributions]
+
+    # Get the max and min values for normalization in visualization
+    all_contribs_pct = np.array(percentage_contributions).flatten()
+    max_contrib = float(np.max(all_contribs_pct))
+    min_contrib = float(np.min(all_contribs_pct))
+
+    return {
+        "contributions": percentage_contributions,
+        "max_value": max_contrib,
+        "min_value": min_contrib,
+        "layer_contributions": layer_percentages
+    }
+
+if __name__ == '__main__':
+    app.run(host="0.0.0.0", port=7860, debug=False)
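With the server running locally (`python app.py`), the route above can also be exercised from a script rather than a browser. A minimal sketch, assuming the `requests` package is available (it is not pinned in requirements.txt):

```python
# Sketch: POST the form field that app.py's index() reads, then check that the
# rendered HTML contains the "Prediction Results" block from templates/index.html.
import requests

resp = requests.post(
    "http://localhost:7860/",
    data={"text": "When John and Mary went to the store, John gave a drink to"},
)
resp.raise_for_status()
print("Prediction rendered:", "Prediction Results" in resp.text)
```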
requirements.txt ADDED
@@ -0,0 +1,5 @@
+flask==2.0.1
+torch==2.0.1
+numpy>=1.21.0
+transformer-lens==1.2.2
+gunicorn==20.1.0
space.yaml ADDED
@@ -0,0 +1,7 @@
+title: MechVis
+emoji: 📊
+colorFrom: indigo
+colorTo: purple
+sdk: docker
+app_port: 7860
+pinned: false
templates/index.html ADDED
@@ -0,0 +1,346 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>GPT-2 Next Word Prediction</title>
+    <link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet">
+    <script src="https://d3js.org/d3.v7.min.js"></script>
+    <style>
+        body {
+            padding: 40px;
+            font-family: system-ui, -apple-system, sans-serif;
+        }
+        .prediction {
+            margin-top: 30px;
+            padding: 20px;
+            background-color: #f8f9fa;
+            border-radius: 5px;
+        }
+        .token {
+            font-size: 1.2rem;
+            font-weight: bold;
+            background-color: #e9ecef;
+            padding: 5px 10px;
+            border-radius: 4px;
+            display: inline-block;
+            margin-bottom: 10px;
+        }
+        #visualization {
+            margin-top: 30px;
+            width: 100%;
+            overflow-x: auto;
+        }
+        .head-cell {
+            stroke: #ddd;
+            stroke-width: 1px;
+        }
+        .layer-label, .head-label {
+            font-size: 12px;
+            font-weight: bold;
+            text-anchor: middle;
+        }
+        .tooltip {
+            position: absolute;
+            background-color: rgba(255, 255, 255, 0.9);
+            border: 1px solid #ddd;
+            padding: 8px;
+            border-radius: 4px;
+            pointer-events: none;
+            font-size: 12px;
+        }
+        .visualization-container {
+            margin-top: 30px;
+            background-color: white;
+            border-radius: 5px;
+            padding: 20px;
+            box-shadow: 0 0 10px rgba(0,0,0,0.1);
+        }
+        .legend {
+            margin-top: 15px;
+            margin-bottom: 20px;
+        }
+        .legend-item {
+            display: inline-block;
+            margin-right: 20px;
+        }
+        .legend-color {
+            display: inline-block;
+            width: 20px;
+            height: 20px;
+            margin-right: 5px;
+            vertical-align: middle;
+        }
+    </style>
+</head>
+<body>
+    <div class="container">
+        <h1 class="mb-4">GPT-2 Next Word Prediction</h1>
+
+        <div class="row">
+            <div class="col-md-12">
+                <form method="POST">
+                    <div class="mb-3">
+                        <label for="text" class="form-label">Input Text:</label>
+                        <textarea class="form-control" id="text" name="text" rows="3" placeholder="Enter text (e.g. 'When John and Mary went to the store, John gave a drink to')" required>{{ text }}</textarea>
+                    </div>
+                    <button type="submit" class="btn btn-primary">Predict Next Word</button>
+                </form>
+            </div>
+        </div>
+
+        {% if prediction %}
+        <div class="row">
+            <div class="col-md-12">
+                <div class="prediction">
+                    <h3>Prediction Results</h3>
+                    <p>Input text: <strong>{{ text }}</strong></p>
+                    <p>Next word: <span class="token">{{ prediction.token }}</span></p>
+                    <p>Logit value: <strong>{{ "%.4f"|format(prediction.logit) }}</strong></p>
+                    <p>Probability: <strong>{{ "%.2f"|format(prediction.prob) }}%</strong></p>
+                </div>
+            </div>
+        </div>
+
+        {% if head_contributions %}
+        <div class="row">
+            <div class="col-md-12">
+                <div class="visualization-container">
+                    <h3>Layer Contributions to Log Probability</h3>
+                    <p>This chart shows how each layer in GPT-2 contributes to the log probability of the token "{{ prediction.token }}" (as % of total contribution).</p>
+
+                    <div id="layer-chart"></div>
+
+                    <h3>Attention Head Contributions</h3>
+                    <p>This visualization shows how each attention head in GPT-2 contributes to the prediction of the token "{{ prediction.token }}" (as % of total contribution).</p>
+
+                    <div class="legend">
+                        <div class="legend-item">
+                            <div class="legend-color" style="background-color: #4575b4;"></div>
+                            <span>Negative contribution %</span>
+                        </div>
+                        <div class="legend-item">
+                            <div class="legend-color" style="background-color: #ffffbf;"></div>
+                            <span>Neutral (0%)</span>
+                        </div>
+                        <div class="legend-item">
+                            <div class="legend-color" style="background-color: #d73027;"></div>
+                            <span>Positive contribution %</span>
+                        </div>
+                    </div>
+
+                    <div id="visualization"></div>
+                </div>
+            </div>
+        </div>
+        {% endif %}
+        {% endif %}
+    </div>
+
+    <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js"></script>
+
+    {% if head_contributions %}
+    <script>
+        document.addEventListener('DOMContentLoaded', function() {
+            const headContributions = {{ head_contributions|safe }};
+
+            // Create layer contributions bar chart
+            const createLayerChart = () => {
+                const layerContribs = headContributions.layer_contributions;
+                const margin = { top: 40, right: 30, bottom: 50, left: 60 };
+                const width = Math.min(800, window.innerWidth - 100);
+                const height = 300;
+
+                const svg = d3.select("#layer-chart")
+                    .append("svg")
+                    .attr("width", width)
+                    .attr("height", height);
+
+                const g = svg.append("g")
+                    .attr("transform", `translate(${margin.left},${margin.top})`);
+
+                // Create scales
+                const x = d3.scaleBand()
+                    .domain(d3.range(layerContribs.length))
+                    .range([0, width - margin.left - margin.right])
+                    .padding(0.1);
+
+                const y = d3.scaleLinear()
+                    .domain([
+                        Math.min(0, d3.min(layerContribs)),
+                        Math.max(0, d3.max(layerContribs))
+                    ])
+                    .nice()
+                    .range([height - margin.top - margin.bottom, 0]);
+
+                // Create color scale - positive is green, negative is purple
+                const colorScale = d3.scaleLinear()
+                    .domain([Math.min(0, d3.min(layerContribs)), 0, Math.max(0, d3.max(layerContribs))])
+                    .range(["#9467bd", "#f7f7f7", "#2ca02c"]);
+
+                // Create tooltip
+                const tooltip = d3.select("body")
+                    .append("div")
+                    .attr("class", "tooltip")
+                    .style("opacity", 0);
+
+                // Create bars
+                g.selectAll(".bar")
+                    .data(layerContribs)
+                    .join("rect")
+                    .attr("class", "bar")
+                    .attr("x", (d, i) => x(i))
+                    .attr("y", d => d >= 0 ? y(d) : y(0))
+                    .attr("width", x.bandwidth())
+                    .attr("height", d => Math.abs(y(0) - y(d)))
+                    .attr("fill", d => colorScale(d))
+                    .attr("stroke", "#555")
+                    .attr("stroke-width", 1)
+                    .on("mouseover", function(event, d) {
+                        d3.select(this).attr("stroke", "#000").attr("stroke-width", 2);
+                        tooltip.transition().duration(200).style("opacity", 1);
+                        tooltip.html(`Layer ${layerContribs.indexOf(d)}<br>Contribution: ${d.toFixed(2)}%`)
+                            .style("left", (event.pageX + 10) + "px")
+                            .style("top", (event.pageY - 28) + "px");
+                    })
+                    .on("mouseout", function() {
+                        d3.select(this).attr("stroke", "#555").attr("stroke-width", 1);
+                        tooltip.transition().duration(500).style("opacity", 0);
+                    });
+
+                // Add x-axis
+                g.append("g")
+                    .attr("transform", `translate(0,${y(0)})`)
+                    .call(d3.axisBottom(x).tickFormat(i => `L${i}`))
+                    .selectAll("text")
+                    .style("font-size", "12px");
+
+                // Add y-axis
+                g.append("g")
+                    .call(d3.axisLeft(y).tickFormat(d => `${d.toFixed(1)}%`))
+                    .selectAll("text")
+                    .style("font-size", "12px");
+
+                // Add title
+                svg.append("text")
+                    .attr("x", width / 2)
+                    .attr("y", 20)
+                    .attr("text-anchor", "middle")
+                    .style("font-size", "16px")
+                    .style("font-weight", "bold")
+                    .text("Layer Contributions to Log Probability (%)");
+
+                // Add x-axis label
+                svg.append("text")
+                    .attr("x", width / 2)
+                    .attr("y", height - 10)
+                    .attr("text-anchor", "middle")
+                    .style("font-size", "14px")
+                    .text("Layer");
+
+                // Add y-axis label
+                svg.append("text")
+                    .attr("transform", "rotate(-90)")
+                    .attr("x", -(height / 2))
+                    .attr("y", 15)
+                    .attr("text-anchor", "middle")
+                    .style("font-size", "14px")
+                    .text("Contribution %");
+            };
+
+            // Create head contributions heatmap
+            const createHeadHeatmap = () => {
+                // Define visualization parameters
+                const cellSize = 40;
+                const numLayers = headContributions.contributions.length;
+                const numHeads = headContributions.contributions[0].length;
+                const margin = { top: 60, right: 20, bottom: 20, left: 60 };
+                const width = cellSize * numHeads + margin.left + margin.right;
+                const height = cellSize * numLayers + margin.top + margin.bottom;
+
+                // Create SVG
+                const svg = d3.select("#visualization")
+                    .append("svg")
+                    .attr("width", width)
+                    .attr("height", height);
+
+                // Create a group for the heatmap
+                const g = svg.append("g")
+                    .attr("transform", `translate(${margin.left},${margin.top})`);
+
+                // Create color scale
+                const colorScale = d3.scaleSequential(d3.interpolateRdBu)
+                    .domain([headContributions.max_value, headContributions.min_value]);
+
+                // Create tooltip
+                const tooltip = d3.select("body")
+                    .append("div")
+                    .attr("class", "tooltip")
+                    .style("opacity", 0);
+
+                // Create cells
+                for (let layer = 0; layer < numLayers; layer++) {
+                    for (let head = 0; head < numHeads; head++) {
+                        const contribution = headContributions.contributions[layer][head];
+
+                        g.append("rect")
+                            .attr("class", "head-cell")
+                            .attr("x", head * cellSize)
+                            .attr("y", layer * cellSize)
+                            .attr("width", cellSize)
+                            .attr("height", cellSize)
+                            .attr("fill", colorScale(contribution))
+                            .on("mouseover", function(event) {
+                                d3.select(this).attr("stroke", "#000").attr("stroke-width", 2);
+                                tooltip.transition().duration(200).style("opacity", 1);
+                                tooltip.html(`Layer ${layer}, Head ${head}<br>Contribution: ${contribution.toFixed(2)}%`)
+                                    .style("left", (event.pageX + 10) + "px")
+                                    .style("top", (event.pageY - 28) + "px");
+                            })
+                            .on("mouseout", function() {
+                                d3.select(this).attr("stroke", "#ddd").attr("stroke-width", 1);
+                                tooltip.transition().duration(500).style("opacity", 0);
+                            });
+                    }
+                }
+
+                // Add layer labels
+                for (let layer = 0; layer < numLayers; layer++) {
+                    g.append("text")
+                        .attr("class", "layer-label")
+                        .attr("x", -10)
+                        .attr("y", layer * cellSize + cellSize / 2)
+                        .attr("text-anchor", "end")
+                        .attr("dominant-baseline", "middle")
+                        .text(`L${layer}`);
+                }
+
+                // Add head labels
+                for (let head = 0; head < numHeads; head++) {
+                    g.append("text")
+                        .attr("class", "head-label")
+                        .attr("x", head * cellSize + cellSize / 2)
+                        .attr("y", -10)
+                        .attr("text-anchor", "middle")
+                        .attr("dominant-baseline", "central")
+                        .text(`H${head}`);
+                }
+
+                // Add title
+                svg.append("text")
+                    .attr("x", width / 2)
+                    .attr("y", 20)
+                    .attr("text-anchor", "middle")
+                    .style("font-size", "16px")
+                    .style("font-weight", "bold")
+                    .text("Head Contributions to Log Probability (%)");
+            };
+
+            // Create both visualizations
+            createLayerChart();
+            createHeadHeatmap();
+        });
+    </script>
+    {% endif %}
+</body>
+</html>