How GTO Poker Solvers Work (And Why Ours Is Different)

12 Oct 2024
By Exploit Coach Team

If you've ever wondered how tools like PioSolver, GTO+, or our own analyzer can tell you the "optimal" play in any poker situation, you're about to get answers. This article breaks down the math and algorithms that power modern poker solvers—no PhD required.

What Does "Game Theory Optimal" Actually Mean?

Game Theory Optimal (GTO) poker is a strategy that cannot be exploited by opponents. If you play perfectly GTO, the best your opponent can do is break even against you (in a heads-up scenario), no matter what adjustments they make.

This doesn't mean GTO is always the most profitable strategy—exploitative play against weak opponents makes more money. But GTO provides a strong baseline strategy and protects you from being exploited by skilled players.

The Core Algorithm: Counterfactual Regret Minimization (CFR)

Most modern poker solvers use a technique called Counterfactual Regret Minimization (CFR), developed by researchers at the University of Alberta. Here's how it works conceptually:

1. The Regret Concept

Imagine you're at a decision point in poker—say, you have AK on the button after a raise. You have three options: fold, call, or 3-bet.

The solver plays out millions of possible scenarios for each action and tracks the regret of not taking each option. Regret is essentially: "How much better would I have done if I'd chosen action X instead?" For example, if 3-betting would have earned you $40 more on average than the action you actually took, your regret for not 3-betting is $40.

2. Iterative Self-Play

The algorithm plays against itself repeatedly:

  1. Generate a strategy (initially random)
  2. Play through hands using that strategy
  3. Calculate regret for each decision point
  4. Update the strategy to favor actions with positive regret
  5. Repeat millions of times

After enough iterations, the strategy converges toward Nash equilibrium—the GTO solution.

3. Simplified Example

Let's look at a toy game to understand regret:

# Simplified poker decision: Button vs BB postflop
# Button has top pair, BB has flush draw
# Pot: $100, Effective stack: $100

class SimplifiedSolver:
    def __init__(self):
        self.regret_sum = {'bet': 0, 'check': 0}
        self.strategy_sum = {'bet': 0, 'check': 0}

    def get_strategy(self):
        # Calculate strategy based on positive regrets
        normalizing_sum = sum(max(r, 0) for r in self.regret_sum.values())

        if normalizing_sum > 0:
            strategy = {
                action: max(self.regret_sum[action], 0) / normalizing_sum
                for action in self.regret_sum
            }
        else:
            # Default to uniform random
            strategy = {'bet': 0.5, 'check': 0.5}

        # Add to strategy sum for averaging
        for action in strategy:
            self.strategy_sum[action] += strategy[action]

        return strategy

    def get_average_strategy(self):
        normalizing_sum = sum(self.strategy_sum.values())
        return {
            action: self.strategy_sum[action] / normalizing_sum
            for action in self.strategy_sum
        }

    def train(self, iterations=10000):
        for i in range(iterations):
            strategy = self.get_strategy()

            # Simulate outcomes for each action
            # (simplified - real solver would traverse full game tree)
            bet_ev = self.calculate_bet_ev(strategy)
            check_ev = self.calculate_check_ev(strategy)

            # Calculate regret for each action
            # Regret = (action_value - strategy_value)
            strategy_value = (strategy['bet'] * bet_ev +
                            strategy['check'] * check_ev)

            self.regret_sum['bet'] += (bet_ev - strategy_value)
            self.regret_sum['check'] += (check_ev - strategy_value)

    def calculate_bet_ev(self, strategy):
        # Simplified EV calculation
        # Opponent folds some %, calls with better/worse hands
        fold_equity = 0.30
        called_ev = 0.45 * 200 - 0.55 * 100  # Win 45%, lose 55% when called
        return fold_equity * 100 + (1 - fold_equity) * called_ev

    def calculate_check_ev(self, strategy):
        # EV when checking
        # Sometimes opponent bets, sometimes we get to showdown
        return 0.60 * 100 + 0.40 * 50  # Simplified calculation
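
Running the sketch and averaging the strategy over all iterations shows it converging toward the higher-EV action:

# Example usage of the sketch above
solver = SimplifiedSolver()
solver.train(iterations=10000)
print(solver.get_average_strategy())
# Because the EVs here are fixed (check ~80 vs bet ~54.5) and don't react to
# the opponent, the average strategy converges toward always checking. In a
# real solver, the EVs shift as both players' strategies update, so the mix
# settles at equilibrium frequencies rather than a single pure action.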

This toy example shows the core loop: calculate regrets, update strategy, repeat. Real solvers like PioSolver use this same principle but with:

  • Complete game tree traversal
  • Multiple betting sizes
  • All possible card runouts
  • Billions of decision points

Why Traditional Solvers Are Slow

The problem with CFR is scale. Even a simple postflop scenario has:

  • 1,755 strategically distinct flop textures (22,100 raw combinations, reduced by suit isomorphism)
  • 49 possible turn cards
  • 48 possible river cards
  • Multiple bet sizes at each decision point
  • Entire hand ranges for each player (169 starting hands)

This creates billions of decision nodes (a rough back-of-envelope count follows the list below). Traditional solvers like PioSolver handle this by:

  1. Simplifying the game (limited bet sizes, card bucketing)
  2. Running for hours on powerful CPUs
  3. Requiring manual setup for each spot
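
To see why the tree gets so large, multiply out the per-street counts listed above. The numbers below are illustrative assumptions for a back-of-envelope estimate, not an exact enumeration:

# Back-of-envelope estimate of postflop game-tree size.
# All counts are illustrative assumptions, not an exact enumeration.
flops = 1_755              # strategically distinct flops (suit isomorphism)
turns = 49                 # remaining turn cards
rivers = 48                # remaining river cards
actions_per_street = 4     # e.g. check/call plus a few bet sizes
hands_per_range = 169      # strategically distinct starting hands per player

nodes = flops * turns * rivers * actions_per_street ** 3 * hands_per_range * 2
print(f"~{nodes:,} hand/decision combinations")  # on the order of tens of billions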

How Exploit Coach Is Different

We've built our solver with a fundamentally different approach focused on speed and accessibility:

1. Pre-Computed Solutions Database

Instead of solving every spot from scratch, we've pre-computed solutions for the most common scenarios:

  • Standard cash game bet sizes (33%, 50%, 75%, 100%, 150%)
  • All common stack depths (20bb - 200bb)
  • Frequently played board textures

This means many queries return results instantly from our database rather than requiring minutes of computation.
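
Conceptually, each pre-solved spot can be keyed by its structural parameters and looked up directly. The sketch below is illustrative only; the field names and layout are assumptions, not our actual schema:

# Conceptual sketch of a pre-computed solution lookup.
# Field names and structure are illustrative assumptions, not the real schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class SpotKey:
    positions: str          # e.g. "BTN_vs_BB"
    stack_depth_bb: int     # e.g. 100
    board: str              # canonicalized for suit isomorphism, e.g. "Ks7d2c"
    bet_sizes: tuple        # e.g. (0.33, 0.5, 0.75, 1.0, 1.5)

solutions: dict = {}        # SpotKey -> per-hand action frequencies

def lookup_spot(key: SpotKey):
    # Returns the stored strategy instantly, or None if the spot was never
    # pre-solved (in which case we'd fall back to approximation).
    return solutions.get(key)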

2. Neural Network Approximation

For novel spots not in our database, we use a neural network trained on millions of solved poker scenarios. This network can:

  • Approximate GTO strategy in milliseconds
  • Generalize to unseen situations based on board texture patterns
  • Provide "good enough" solutions when perfect accuracy isn't critical

# Simplified conceptual model of our neural network approach
class GTOApproximator:
    def __init__(self):
        self.model = self.load_trained_model()

    def encode_game_state(self, hand, board, position, action_history, stack_depth):
        """Convert poker situation into neural network input features"""
        features = []

        # Hand strength features
        features.extend(self.hand_strength_features(hand, board))

        # Board texture features
        features.extend(self.board_texture_features(board))

        # Strategic features
        features.extend([
            self.position_encoding(position),
            stack_depth / 100,  # Normalize stack depth
            len(action_history) / 10,  # Action count
        ])

        # Action history encoding
        features.extend(self.encode_action_history(action_history))

        return features

    def predict_strategy(self, game_state):
        """Predict GTO strategy for this game state"""
        features = self.encode_game_state(*game_state)

        # Neural network outputs probabilities for each action
        action_probs = self.model.predict(features)

        return {
            'fold': action_probs[0],
            'call': action_probs[1],
            'raise_small': action_probs[2],
            'raise_medium': action_probs[3],
            'raise_large': action_probs[4],
        }

Our model is trained on over 50 million solved poker situations, learning the patterns that make strategies GTO without needing to traverse the full game tree every time.

3. Exploitative Overlays

Pure GTO is just the starting point. We layer on exploitative adjustments based on:

  • Opponent tendencies (fold to 3-bet %, aggression frequency, etc.)
  • Population statistics (how typical players at this stake play)
  • Hand history analysis (patterns in your uploaded hands)

This hybrid approach gives you:

  • GTO baseline to avoid being exploited
  • Exploitative deviations to maximize profit against weak players
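
As a rough illustration of what an overlay like this can do, the sketch below nudges a baseline strategy when an opponent stat deviates from the population average. The adjustment rule, thresholds, and numbers are illustrative assumptions, not our production logic:

# Illustrative sketch of an exploitative overlay on top of a GTO baseline.
# The adjustment rule and thresholds are assumptions, not production logic.
def apply_overlay(gto_strategy, opp_fold_to_3bet, population_fold_to_3bet=0.55):
    """Shift calling frequency into 3-betting against players who over-fold."""
    adjusted = dict(gto_strategy)
    deviation = opp_fold_to_3bet - population_fold_to_3bet

    # Cap the shift so we never stray too far from the baseline.
    shift = max(min(deviation, 0.2), -0.2) * adjusted.get('call', 0.0)
    adjusted['call'] = adjusted.get('call', 0.0) - shift
    adjusted['raise_small'] = adjusted.get('raise_small', 0.0) + shift
    return adjusted

# Example: a baseline mix against an opponent who folds to 3-bets 70% of the time
baseline = {'fold': 0.15, 'call': 0.60, 'raise_small': 0.25}
print(apply_overlay(baseline, opp_fold_to_3bet=0.70))
# -> roughly {'fold': 0.15, 'call': 0.51, 'raise_small': 0.34}: more 3-bets, fewer flats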

The Training Process

Building our solver required:

Phase 1: Generate Training Data

  • Ran PioSolver and Monker Solver on 10 million+ scenarios
  • Covered cash games from 50NL to 1000NL
  • Focused on most common real-world situations

Phase 2: Neural Network Training

  • Trained a deep neural network to approximate solver solutions
  • Used gradient descent to minimize difference between network output and true GTO
  • Validated against held-out test scenarios
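
A minimal sketch of that objective: a small network maps encoded game states to action probabilities and is trained to match the solver's frequencies. The framework, layer sizes, and loss choice are assumptions for illustration, not our exact setup:

# Minimal sketch of training a network to match solver action frequencies.
# Framework, layer sizes, and loss are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 5),   # fold / call / raise small / medium / large
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.KLDivLoss(reduction="batchmean")

def train_step(features, target_probs):
    # features: (batch, 64) encoded game states
    # target_probs: (batch, 5) action frequencies from the solver
    log_probs = torch.log_softmax(model(features), dim=-1)
    loss = loss_fn(log_probs, target_probs)   # distance from the solver target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()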

Phase 3: Hybrid System

  • Combined database lookups (perfect accuracy) with neural network (speed)
  • Added exploitative adjustment layer
  • Built real-time API to serve results in <500ms
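
Conceptually, the serving path is a simple dispatch: use the exact database answer when it exists, fall back to the neural approximation otherwise, then apply any exploitative adjustments on top. The function below sketches that flow with the dependencies passed in; it is not our actual service code:

# Conceptual sketch of the hybrid serving path, not the actual service code.
def solve_spot(spot, db_lookup, nn_predict, overlay=None, opponent_stats=None):
    # 1. Exact pre-computed solution when the spot is in the database
    strategy = db_lookup(spot)

    # 2. Neural approximation for spots that were never pre-solved
    if strategy is None:
        strategy = nn_predict(spot)

    # 3. Optional exploitative adjustments layered on the GTO baseline
    if overlay is not None and opponent_stats is not None:
        strategy = overlay(strategy, opponent_stats)

    return strategy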

Accuracy vs Speed Tradeoffs

We're transparent about our tradeoffs:

| Solver | Accuracy | Speed | Ease of Use |
|--------|----------|-------|-------------|
| PioSolver | 99%+ | Hours | Expert |
| GTO+ | 98%+ | Minutes | Intermediate |
| Exploit Coach (DB) | 99%+ | Instant | Beginner |
| Exploit Coach (Neural) | 95%+ | <500ms | Beginner |

For studying standard spots deeply, traditional solvers win on accuracy. For getting quick feedback during review sessions or in-game decisions, our approach is significantly faster while maintaining high accuracy.

When To Use Each Mode

Use Database Mode (99%+ accuracy) when:

  • Studying standard spots (BTN vs BB, 100bb cash game)
  • Building foundational ranges
  • You need exact GTO strategy

Use Neural Network Mode (95%+ accuracy) when:

  • Analyzing unusual stack depths
  • Reviewing hands quickly
  • You need fast feedback on dozens of spots

Use Exploitative Mode when:

  • You have specific opponent reads
  • Playing against weak player pools
  • Maximizing EV is more important than remaining unexploitable

The Future of Poker Solvers

We believe the future of poker study tools is:

  1. Real-time analysis - Get answers in seconds, not hours
  2. Accessible to everyone - No PhD in mathematics required
  3. Exploitative by default - Pure GTO is a baseline, not the goal
  4. Integrated study systems - Analysis, hand history review, and training in one place

Traditional solvers will always have their place for deep research, but everyday players need something faster and more practical.

Try It Yourself

Want to see the difference? Sign up for our beta and analyze your first hand for free. Compare our results to your current solver and judge for yourself.

Questions about our approach or the math behind GTO? Join our Discord community where we discuss poker theory, share solver insights, and help each other improve.


This article represents our current technical implementation as of October 2024. We're constantly training new models and adding pre-computed solutions to improve both speed and accuracy.