Building an AI Trading Agent That Learns From Its Own Mistakes

2026-02-13 · 37 min read

I've been building a personal AI agent that manages my life — calendar, finances, journaling, the works. Last week I wrote about the architecture behind it. This week I pointed it at something considerably more dangerous: the stock market.

Not "here's a moving average crossover, buy when line goes up" algo trading. I'm building something different — an agent that reasons about whether to trade at all, adapts its parameters based on what's working, and learns from every mistake it makes.

The starting capital: ₹20,000. The goal: beat the 15% CAGR you'd get from a Nifty index fund. The real goal: build a system that gets smarter over time instead of slowly becoming obsolete.

Full transparency: This system doesn't exist yet. What follows is the architecture, the plan, and the early implementation. I'm sharing the design before I have performance numbers because the approach matters more than the results. This is experimental software applied to markets — it will probably fail in interesting ways.


Why Agentic Trading, Not Algo Trading

Traditional algorithmic trading is basically an if-else tree with a brokerage API attached. RSI drops below 30? Buy. MACD crosses above signal? Buy. Price hits stop loss? Sell. The rules are fixed at deploy time and they never change until a human intervenes.

This works. Plenty of quantitative funds run on static rule systems and do fine. But it has a fundamental problem: markets change, and static rules don't.

In 2023-2024, mean reversion strategies crushed it on Indian mid-caps. The market was mostly range-bound, stocks would overshoot and snap back, textbook mean reversion. Then Nifty started trending hard in one direction and mean reversion got destroyed — buying "oversold" stocks that just kept going down.

A traditional algo doesn't know the market regime changed. It keeps applying the same rules and hemorrhaging money until a human notices and pulls the plug.

An agentic system is different. The AI doesn't just execute rules — it reasons about context. It thinks about whether to execute at all. That's the difference between algo trading and agentic trading.

Here's the mental model I use:

Algo Trading:
  Signal → Execute → Log

Agentic Trading:
  Context → Reason → Signal → Evaluate → (Maybe) Execute → Log → Learn → Adapt

The "Maybe" in the middle is doing a lot of work. Some of my best trades are the ones the agent decided not to take.
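In code, the difference is roughly this. A minimal sketch, where `market`, `signal_fn`, and `executor` are hypothetical stand-ins for the layers described later:

```python
def agentic_cycle(market, signal_fn, executor, min_confidence=0.6):
    """One pass through the agentic loop: context -> reason -> maybe execute.
    All arguments are hypothetical stand-ins for the real layers."""
    context = market.get_context()       # regime, VIX, news, events
    signal = signal_fn(context)          # raw technical signal, or None

    if signal is None:
        return {'action': 'no_signal'}

    # The "maybe": reason about whether this signal is worth taking at all
    confidence = signal['confidence'] + context.get('confidence_boost', 0.0)
    if context.get('veto') or confidence < min_confidence:
        return {'action': 'skipped', 'confidence': confidence}

    return {'action': 'executed', 'confidence': confidence,
            'result': executor(signal)}
```

The interesting branch is the `skipped` one; a static algo has no equivalent of it.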


The Strategy Stack

I'm not running a single strategy. I'm running a layered stack where each layer can override or modify the layers below it. Think of it like middleware in a web framework — each layer transforms the signal before it reaches execution.

(The funnel: ~500 raw signals enter Layer 1, RSI signal generation; ~50 survive the filter into Layer 2, regime detection via VIX, ADX, and breadth; ~15 pass Layer 3, the sentiment overlay of news, events, and global cues; ~8 reach Layer 4, Kelly-based, confidence-adjusted position sizing; 3-5 actionable trades come out the bottom.)

Layer 1: Signal Generation — Mean Reversion RSI

The base strategy is straightforward mean reversion using RSI (Relative Strength Index). When a stock's RSI drops below a threshold, it's "oversold" and likely to bounce. When it rises above a threshold, it's "overbought" and likely to pull back.

I backtested this across 200 Nifty stocks over 3 years of daily data using vectorbt. The results were promising but need heavy skepticism:

Metric Value
Sharpe Ratio 2.29
Win Rate 78%
Max Drawdown -5.3%
Avg Holding Period 3.2 days
Profit Factor 2.8
Backtest caveat: These numbers are suspiciously good, and I know it. Backtests assume perfect fills at close prices with zero slippage and no market impact. I'm expecting live performance to degrade by 30-50%. A Sharpe of 2.29 in backtest might be 1.2-1.5 live. A 78% win rate might be 60-65%. That's still tradeable — but it's a very different risk profile than the table above suggests.

The core signal generation looks like this:

import pandas as pd
from ta.momentum import RSIIndicator

def generate_signals(df, rsi_period=14, oversold=30, overbought=70):
    """
    Generate mean reversion signals from OHLCV data.
    Returns a DataFrame with signal column: 1 (buy), -1 (sell), 0 (hold)
    """
    rsi = RSIIndicator(close=df['close'], window=rsi_period)
    df['rsi'] = rsi.rsi()

    df['signal'] = 0
    df.loc[df['rsi'] < oversold, 'signal'] = 1      # Oversold → Buy
    df.loc[df['rsi'] > overbought, 'signal'] = -1   # Overbought → Sell

    # Confirmation: require RSI to have been declining for 3+ days
    df['rsi_declining'] = df['rsi'].diff().rolling(3).apply(
        lambda x: float((x < 0).all()), raw=True
    )
    df.loc[(df['signal'] == 1) & (df['rsi_declining'] != 1), 'signal'] = 0

    return df

A 78% backtest win rate sounds great, and in a spreadsheet it is. But backtests are liars: no slippage, perfect fills at the close price, and no modeling of your own orders' impact on illiquid stocks. More on this later.

The key insight from the backtest wasn't the win rate — it was when the strategy failed. Losses clustered heavily in trending markets. During the October 2023 correction, almost every mean reversion buy signal failed because "oversold" stocks kept getting more oversold. The strategy's average loss during trending periods was 2.8x its average loss during ranging periods.

This is exactly what Layer 2 is designed to catch.

Layer 2: Market Regime Detection

Before the agent even looks at individual stock signals, it classifies the current market regime. Is the market trending, ranging, or volatile? Different regimes call for different strategies — or no strategy at all.

The regime classifier uses three indicators:

ADX (Average Directional Index): Measures trend strength. ADX > 25 = trending market. ADX < 20 = ranging market. Between 20-25 = ambiguous.

India VIX: The fear gauge. VIX > 20 = elevated volatility. VIX > 25 = high stress, probably not a great time for mean reversion. VIX < 15 = complacency, mean reversion thrives.

Market Breadth: What percentage of Nifty 500 stocks are above their 50-day moving average? Breadth > 60% with rising trend = strong bullish trend. Breadth < 40% and falling = bearish trend. Breadth between 40-60% = mixed, probably ranging.

def classify_regime(vix, adx, breadth_pct):
    """
    Classify market regime. Returns one of:
    'trending_up', 'trending_down', 'ranging', 'volatile', 'unknown'
    """
    if vix > 25:
        return 'volatile'

    if adx > 25:
        if breadth_pct > 60:
            return 'trending_up'
        elif breadth_pct < 40:
            return 'trending_down'

    if adx < 20 and vix < 22:   # low trend strength, calm-to-normal volatility
        return 'ranging'

    return 'unknown'


# Strategy mapping per regime
REGIME_STRATEGY = {
    'ranging':      {'strategy': 'mean_reversion', 'confidence_boost': 0.1},
    'trending_up':  {'strategy': 'momentum',       'confidence_boost': 0.0},
    'trending_down': {'strategy': 'none',          'confidence_boost': -0.3},
    'volatile':     {'strategy': 'none',           'confidence_boost': -0.5},
    'unknown':      {'strategy': 'mean_reversion', 'confidence_boost': -0.1},
}

In a ranging market, mean reversion gets a confidence boost. In a trending or volatile market, it gets penalized or disabled entirely. The agent doesn't stubbornly apply the same strategy regardless of context — it adapts.
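Wiring that table into the signal path might look like this. A sketch, assuming each signal is a dict with 'strategy' and 'confidence' keys (that format is my assumption):

```python
def adjust_signal_for_regime(signal, regime, regime_strategy):
    """Apply the REGIME_STRATEGY table to a raw signal: drop it when its
    strategy is disabled or inactive for this regime, otherwise shift its
    confidence by the regime's boost (clamped to [0, 1])."""
    config = regime_strategy.get(regime, regime_strategy['unknown'])

    if config['strategy'] == 'none' or config['strategy'] != signal['strategy']:
        return None  # strategy switched off (or swapped out) in this regime

    adjusted = dict(signal)
    adjusted['confidence'] = max(0.0, min(1.0,
        signal['confidence'] + config['confidence_boost']))
    return adjusted
```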

Layer 3: Sentiment Overlay

Numbers don't exist in a vacuum. A stock might look oversold on RSI, but if the company just reported terrible earnings, it's oversold for a reason.

The sentiment layer pulls from multiple sources:

import feedparser
from datetime import datetime, timedelta

# RSS feeds for market news
FEEDS = [
    'https://economictimes.indiatimes.com/markets/rssfeeds/1977021501.cms',
    'https://www.moneycontrol.com/rss/marketreports.xml',
    'https://www.livemint.com/rss/markets',
]

# Event calendar — hard no-trade days
NO_TRADE_EVENTS = [
    'union_budget',
    'rbi_policy',
    'monthly_expiry',  # Last Thursday of every month
    'quarterly_results',  # For stocks in portfolio
]

def check_event_risk(date, holdings):
    """
    Returns list of active risk events.
    Agent should reduce position sizes or skip trading.
    """
    events = []

    # Check for RBI policy (bi-monthly, pre-scheduled)
    if is_rbi_policy_day(date):
        events.append('rbi_policy')

    # Check for monthly F&O expiry
    if is_last_thursday(date):
        events.append('monthly_expiry')

    # Check earnings calendar for held stocks
    for stock in holdings:
        if has_earnings_upcoming(stock, within_days=2):
            events.append(f'earnings_{stock}')

    return events

When the sentiment layer detects a high-risk event, it doesn't just reduce confidence — it can veto trades entirely. Budget day? We're closed. RBI policy day? Sitting on hands. Stock reporting earnings tomorrow? Not opening a new position.

This is conservative, and it means missing some good trades. I'm okay with that. The goal isn't to catch every opportunity — it's to avoid catastrophic losses while the system is learning.
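A sketch of that veto gate, consuming the event list produced by `check_event_risk` above (the `earnings_<symbol>` name matching mirrors the format used there; the hard-veto set is illustrative):

```python
HARD_VETO_EVENTS = {'union_budget', 'rbi_policy', 'monthly_expiry'}

def apply_event_gate(signals, events):
    """Veto or filter signals based on active risk events.
    Market-wide events kill all new trades; an earnings event
    kills only the signal for that stock."""
    if HARD_VETO_EVENTS & set(events):
        return []  # market-wide event: sitting on hands today

    return [s for s in signals
            if f"earnings_{s['symbol']}" not in events]
```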

Layer 4: Position Sizing — Kelly Criterion

Once a signal passes through regime detection and sentiment filtering, the agent needs to decide how much to bet. This is where the Kelly criterion comes in.

The classic Kelly formula gives you the optimal bet size to maximize long-term growth:

f* = (bp - q) / b

where:
  f* = fraction of capital to bet
  b  = odds (avg win / avg loss)
  p  = probability of winning
  q  = probability of losing (1 - p)

With our backtest numbers (78% win rate, 2.8 profit factor as a stand-in for b), full Kelly suggests betting about 70% of capital per trade: (2.8 × 0.78 − 0.22) / 2.8 ≈ 0.70. (Strictly, b should be the average-win/average-loss ratio, not the profit factor; with a win rate above 50%, the profit factor overstates b, so this estimate errs even more aggressive.) Either way, full Kelly is way too aggressive for real-world trading — it assumes your edge estimate is perfectly accurate, which it never is.

I use quarter-Kelly as the baseline and then adjust based on the agent's confidence score:

def position_size(capital, win_rate, profit_factor, confidence):
    """
    Kelly-based position sizing, adjusted for confidence.
    Returns position size in rupees.
    """
    b = profit_factor  # proxy for avg_win / avg_loss (overstates b when win rate > 50%)
    p = win_rate
    q = 1 - p

    kelly_fraction = (b * p - q) / b
    quarter_kelly = kelly_fraction * 0.25

    # Scale by confidence (0.0 to 1.0)
    adjusted = quarter_kelly * confidence

    # Hard cap at 20% of capital
    position = min(capital * adjusted, capital * 0.20)

    # Round to nearest lot-friendly amount
    return max(round(position, -2), 0)  # Minimum ₹0, round to ₹100


# Example:
# capital = 20000
# win_rate = 0.64, profit_factor = 2.1, confidence = 0.8
# → kelly ≈ 0.47, quarter_kelly ≈ 0.12, adjusted ≈ 0.09
# → position = ₹1,900

High conviction setup in a favorable regime? Maybe ₹3,000-4,000 (15-20% of capital). Low conviction setup in an ambiguous regime? ₹1,000-1,500 (5-7%). The position sizing is doing risk management work before the trade is even placed.


The Daily Rhythm

The agent doesn't run continuously. Markets have structure, and the agent's schedule mirrors it. Everything runs on OpenClaw's cron system:

(The day's timeline: pre-market scan 8:30 AM; signals 9:00; execution at the 9:15 open; mid-morning check 11:00; pre-close scan 2:30 PM; post-market logging at the 3:30 close; evening analysis 8:00 PM.)
# OpenClaw cron configuration (server runs UTC, IST = UTC+5:30)
schedules:
  pre_market_scan:
    cron: "0 3 * * 1-5"      # 8:30 AM IST (3:00 UTC)
    task: "Run pre-market scan: overnight gaps, global cues, news"

  strategy_activation:
    cron: "30 3 * * 1-5"     # 9:00 AM IST (3:30 UTC)
    task: "Generate signals, rank by conviction, propose trades"

  market_open_monitor:
    cron: "45 3 * * 1-5"     # 9:15 AM IST (3:45 UTC)
    task: "Execute approved trades, monitor 15-min opening candle"

  mid_morning_check:
    cron: "30 5 * * 1-5"     # 11:00 AM IST (5:30 UTC)
    task: "Check stops, scan for new setups"

  pre_close_scan:
    cron: "0 9 * * 1-5"      # 2:30 PM IST (9:00 UTC)
    task: "Exit decisions, overnight hold analysis"

  post_market:
    cron: "0 10 * * 1-5"     # 3:30 PM IST (10:00 UTC)
    task: "Log trades, calculate P&L, update performance DB"

  evening_analysis:
    cron: "30 14 * * 1-5"    # 8:00 PM IST (14:30 UTC)
    task: "Daily review, parameter tuning proposals"

Let me walk through a typical day.

8:30 AM — Pre-Market Scan

The agent wakes up and checks the overnight picture. US markets closed at what level? GIFT Nifty futures are indicating what open? Any major news overnight?

It pulls data from yfinance for S&P 500 and Dow futures, checks RSS feeds for breaking news, and looks at GIFT Nifty (the SGX Nifty successor) for the expected opening level.

This isn't about generating trades yet. It's about context. If US markets crashed 3% overnight, the agent knows Indian markets will gap down and adjusts its expectations accordingly. Mean reversion signals at 9:15 AM on a gap-down day are different from mean reversion signals on a flat open — the "reversion" target is a moving target.

The output is a pre-market briefing that gets saved and used as context for the rest of the day:

{
  "date": "2026-02-13",
  "pre_market": {
    "gift_nifty": 23145,
    "gift_nifty_change_pct": -0.3,
    "sp500_close_pct": -0.1,
    "vix_india": 14.2,
    "regime": "ranging",
    "events": [],
    "news_sentiment": "neutral",
    "gap_direction": "slightly_negative",
    "recommendation": "Normal trading day. Mean reversion active."
  }
}

9:00 AM — Strategy Activation

Now the agent runs signal generation across its watchlist. For a ₹20K account, I'm keeping the universe small — Nifty 50 stocks only. Liquid, tight spreads, no impact cost issues. As the account grows, the universe expands.

The agent screens all 50 stocks, applies the RSI signal generator, filters through regime detection and sentiment overlay, sizes positions using Kelly, and ranks signals by confidence score.

Most days, this produces 0-3 actionable signals. Many days it produces zero, and that's fine. The agent isn't trying to trade every day — it's trying to trade when conditions are favorable.
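The final ranking-and-truncation step might look like this. A sketch, assuming each surviving signal dict carries a 'confidence' score (the thresholds are illustrative):

```python
def rank_signals(signals, max_trades=3, min_confidence=0.6):
    """Rank filtered signals by confidence and keep only the top few.
    Most days this returns an empty list, and that's by design."""
    actionable = [s for s in signals if s['confidence'] >= min_confidence]
    actionable.sort(key=lambda s: s['confidence'], reverse=True)
    return actionable[:max_trades]
```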

Signals that pass all filters get sent to Telegram for approval:

📊 TRADE SIGNAL — BUY

Stock:      HDFCBANK
Direction:  LONG
Entry:      ₹1,642.30
Stop Loss:  ₹1,608.50 (ATR-based, -2.1%)
Target:     ₹1,698.00 (+3.4%)
Risk:Reward: 1:1.6

Position:   ₹3,200 (16% of capital)
Confidence: 0.82

Regime:     Ranging (ADX: 17.3, VIX: 14.2)
Signal:     RSI at 27.4 after 4 days of decline
Breadth:    52% above 50-DMA (neutral)

Reasoning:  HDFCBANK RSI hit 27.4 — lowest in 3 months.
Stock has mean-reverted from similar levels 4 out of last
5 times in ranging markets. No earnings within 2 weeks.
Sector (banking) breadth is neutral. Quarter-Kelly sizing
at 82% confidence gives ₹3,200 position.

[✅ Approve]  [❌ Reject]

This is the critical part: the agent doesn't execute on its own. It proposes and explains. I read the reasoning, check if it makes sense, and tap approve or reject. If I don't respond within 5 minutes during market hours, the signal expires. No auto-execution.

9:15 AM — Market Open

If I approved a trade before market open, the agent waits for the opening 15-minute candle before executing. Why? Because the first 15 minutes are chaos. Prices gap around, spreads widen, volume is noisy. The opening candle tells you whether the market is confirming or rejecting the overnight direction.

If the approved trade still looks good after the first candle (stock didn't gap up through our entry, stop loss isn't already breached), the agent places the order via Kite Connect:

from kiteconnect import KiteConnect

def execute_trade(kite, signal):
    """
    Place a delivery order via Kite Connect, then attach a GTT stop loss.
    Uses CNC (delivery) — no intraday leverage.
    """
    # Calculate quantity from position size and price
    qty = int(signal['position_size'] / signal['entry_price'])

    if qty == 0:
        return {'status': 'skipped', 'reason': 'position_too_small'}

    # Place primary buy order
    order_id = kite.place_order(
        variety=kite.VARIETY_REGULAR,
        exchange=kite.EXCHANGE_NSE,
        tradingsymbol=signal['symbol'],
        transaction_type=kite.TRANSACTION_TYPE_BUY,
        quantity=qty,
        product=kite.PRODUCT_CNC,     # Delivery only, no leverage
        order_type=kite.ORDER_TYPE_LIMIT,
        price=signal['entry_price'],
        validity=kite.VALIDITY_DAY,
    )

    # Set GTT (Good Till Triggered) for stop loss
    kite.place_gtt(
        trigger_type=kite.GTT_TYPE_SINGLE,
        tradingsymbol=signal['symbol'],
        exchange=kite.EXCHANGE_NSE,
        trigger_values=[signal['stop_loss']],
        last_price=signal['entry_price'],
        orders=[{
            'transaction_type': kite.TRANSACTION_TYPE_SELL,
            'quantity': qty,
            'product': kite.PRODUCT_CNC,
            'order_type': kite.ORDER_TYPE_LIMIT,
            'price': signal['stop_loss'],
        }]
    )

    return {'status': 'placed', 'order_id': order_id, 'quantity': qty}

I use CNC (Cash and Carry / delivery) exclusively. No intraday leverage, no margin trading. If a trade goes against me, I lose the notional loss on the position, not 5x the notional loss because I was leveraged. With a ₹20K account, this is non-negotiable.

11:00 AM — Mid-Morning Check

The agent checks: did any stop losses trigger? Are there new setups forming that weren't visible at 9 AM? Has the regime shifted intraday (VIX spiking, etc.)?

This is usually a quiet check. Most days there's nothing to do. The agent confirms positions are healthy and goes back to sleep.

2:30 PM — Pre-Close Scan

An hour before market close, the agent makes exit decisions. Should it hold positions overnight or close them?

The decision framework is simple. CNC only means holding overnight doesn't cost anything extra; the only cost is overnight gap risk, which the agent evaluates from global market conditions and the next day's event calendar.
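One way to encode that hold-or-exit call. A sketch under my own assumptions about the inputs: the 'low'/'elevated'/'high' gap-risk label, the -1.5% weakness cutoff, and the position dict format are all illustrative:

```python
def overnight_hold_decision(position, events_tomorrow, gap_risk):
    """Hold or exit a CNC position at the 2:30 PM scan.
    Returns ('hold' | 'exit', reason)."""
    if f"earnings_{position['symbol']}" in events_tomorrow:
        return 'exit', 'earnings tomorrow'

    if gap_risk == 'high':
        return 'exit', 'elevated overnight gap risk'

    if position['pnl_pct'] <= -0.015:
        return 'exit', 'position already weak, not worth overnight risk'

    return 'hold', 'no overnight red flags'
```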

3:30 PM — Post-Market

After market close, the agent logs everything. Every position, every fill, every P&L. This is the raw material for the learning loop.

import json
from datetime import datetime

def log_trade(trade, filepath='trades.jsonl'):
    """Append trade record to JSONL log."""
    record = {
        'timestamp': datetime.now().isoformat(),
        'date': trade['date'],
        'symbol': trade['symbol'],
        'direction': trade['direction'],
        'entry_price': trade['entry_price'],
        'exit_price': trade['exit_price'],
        'quantity': trade['quantity'],
        'pnl': trade['pnl'],
        'pnl_pct': trade['pnl_pct'],
        'holding_days': trade['holding_days'],

        # Strategy metadata
        'strategy': trade['strategy'],
        'signal_confidence': trade['confidence'],
        'regime_at_entry': trade['regime'],
        'rsi_at_entry': trade['rsi'],
        'vix_at_entry': trade['vix'],
        'adx_at_entry': trade['adx'],

        # Outcome analysis
        'hit_target': trade['hit_target'],
        'hit_stop': trade['hit_stop'],
        'max_adverse_excursion': trade['mae'],  # Worst drawdown during trade
        'max_favorable_excursion': trade['mfe'],  # Best unrealized profit

        # Human metadata
        'approved_by': 'human',
        'approval_time_seconds': trade['approval_latency'],
        'entry_reasoning': trade['reasoning'],
        'exit_reasoning': trade['exit_reason'],
    }

    with open(filepath, 'a') as f:
        f.write(json.dumps(record) + '\n')

Every field matters. The regime_at_entry lets us analyze which regimes produce winners. The max_adverse_excursion tells us if our stop losses are too tight (getting stopped out before the trade works) or too loose (taking unnecessary pain). The approval_latency even tracks how long I took to approve — did I hesitate? Hesitation might correlate with worse outcomes.
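For reference, MAE and MFE for a long trade fall straight out of the price path while the trade was open. A minimal sketch:

```python
def excursions(entry_price, prices):
    """MAE and MFE for a long trade, as fractions of the entry price,
    computed from the price path observed while the trade was open."""
    moves = [(p - entry_price) / entry_price for p in prices]
    return {
        'mae': round(min(0.0, min(moves)), 4),  # worst unrealized drawdown (<= 0)
        'mfe': round(max(0.0, max(moves)), 4),  # best unrealized profit (>= 0)
    }
```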

8:00 PM — Evening Analysis

This is my favorite part. The agent reviews the day and writes a brief analysis of what happened and why.

The evening analysis gets saved alongside the daily memory file. Over time, this builds up a corpus of market observations that the agent can reference.


The Learning Loop

This is the section that matters most. Everything else — signal generation, regime detection, position sizing — that's table stakes. Plenty of algo trading systems do those things. What makes an agentic system different is that it learns.

(The loop: execute trade → log everything → weekly review → identify patterns → tune parameters → backtest changes → paper-trade A/B test, all feeding a 0-100 strategy fitness score.)

Trade Logging and Pattern Recognition

Every trade gets logged in JSONL with extensive metadata (as shown above). After accumulating enough trades, the agent periodically reviews its own performance:

import pandas as pd

def weekly_review(trades_path='trades.jsonl'):
    """
    Analyze recent trades and identify performance patterns.
    Returns actionable insights for parameter tuning.
    """
    df = pd.read_json(trades_path, lines=True)
    df['date'] = pd.to_datetime(df['date'])  # defensive: ensure datetime dtype
    recent = df[df['date'] >= (pd.Timestamp.now() - pd.Timedelta(weeks=4))]

    insights = []

    # Win rate by regime
    regime_stats = recent.groupby('regime_at_entry').agg(
        trades=('pnl', 'count'),
        win_rate=('pnl', lambda x: (x > 0).mean()),
        avg_pnl=('pnl_pct', 'mean'),
    ).round(3)

    for regime, row in regime_stats.iterrows():
        if row['win_rate'] < 0.55 and row['trades'] >= 5:
            insights.append({
                'type': 'regime_underperformance',
                'regime': regime,
                'win_rate': row['win_rate'],
                'suggestion': f"Mean reversion underperforming in {regime} "
                             f"regime ({row['win_rate']:.0%} win rate). "
                             f"Consider reducing confidence weight.",
            })

    # RSI threshold analysis
    winners = recent[recent['pnl'] > 0]
    losers = recent[recent['pnl'] <= 0]

    if len(winners) > 0 and len(losers) > 0:
        avg_rsi_winners = winners['rsi_at_entry'].mean()
        avg_rsi_losers = losers['rsi_at_entry'].mean()

        if avg_rsi_losers > avg_rsi_winners + 3:
            insights.append({
                'type': 'rsi_threshold',
                'detail': f"Winners entered at avg RSI {avg_rsi_winners:.1f}, "
                         f"losers at {avg_rsi_losers:.1f}. Consider "
                         f"lowering oversold threshold.",
            })

    # Stop loss analysis
    stopped_out = recent[recent['hit_stop'] == True]
    if len(stopped_out) > 3:
        recovered = stopped_out[stopped_out['mfe'] > abs(stopped_out['pnl_pct'])]
        if len(recovered) / len(stopped_out) > 0.4:
            insights.append({
                'type': 'stop_too_tight',
                'detail': f"{len(recovered)}/{len(stopped_out)} stopped-out "
                         f"trades would have been profitable. Stop losses "
                         f"may be too tight.",
            })

    return insights

The agent looks for specific patterns:

Regime-dependent performance: "Mean reversion has an 82% win rate in ranging markets but only 48% in trending markets." The response isn't to abandon mean reversion — it's to weight regime detection more heavily. If the market is trending, reduce the confidence multiplier for mean reversion signals.

RSI threshold drift: Maybe the optimal oversold level isn't 30 — maybe it's 25 for the current market environment. The agent compares the RSI at entry for winners vs losers and proposes threshold adjustments.

Stop loss calibration: If 40% of stopped-out trades would have eventually hit the target, the stops are too tight. If losing trades regularly blow through stops for much larger losses, the stops are too loose. The agent proposes ATR multiplier adjustments.

Parameter Evolution

The agent identifies problems but parameter tuning is constrained to prevent overfitting:

def propose_parameter_change(current_params, insights):
    """
    Proposes parameter changes within strict bounds to prevent overfitting.
    Changes are NEVER applied directly — they go to paper trading first.
    """
    proposals = []

    # Allowed parameter adjustments (prevents wild swings)
    ADJUSTMENT_LIMITS = {
        'rsi_oversold': {'min': 20, 'max': 35, 'step': 2},
        'rsi_overbought': {'min': 65, 'max': 80, 'step': 2},
        'atr_stop_multiplier': {'min': 1.5, 'max': 3.0, 'step': 0.2},
    }

    for insight in insights:
        if insight['type'] == 'rsi_threshold':
            current = current_params['rsi_oversold']
            limits = ADJUSTMENT_LIMITS['rsi_oversold']
            proposed = max(limits['min'], current - limits['step'])

            if proposed != current:
                proposals.append({
                    'parameter': 'rsi_oversold',
                    'current': current,
                    'proposed': proposed,
                    'reasoning': insight['detail'],
                    'test_method': 'backtest_recent_60d + paper_trade_2w',
                    'human_review_required': True,
                })

    return proposals

The process is:

  1. Identify — weekly review surfaces an issue
  2. Propose — agent suggests a specific parameter change
  3. Backtest — test the change against the last 60 days of data
  4. Paper trade — if backtest looks good, run the new params on paper alongside the live params for 2 weeks
  5. Compare — did the new params actually outperform?
  6. Promote — if yes, swap the live params (with human approval)

This is deliberately slow. A parameter change takes a minimum of 2-3 weeks to go from proposal to production. I don't want the agent chasing noise — a few bad trades shouldn't trigger parameter changes. Only persistent patterns should drive adaptation.

A/B Testing Strategies

The paper trading alongside live trading is key. It's easy to backtest a parameter change and see that it "would have" performed better. It's much harder to validate that in real-time.

def evaluate_ab_test(live_trades, paper_trades, min_trades=20):
    """
    Compare live params vs paper (proposed) params.
    Requires minimum trade count for statistical relevance.
    """
    if len(paper_trades) < min_trades:
        return {'status': 'insufficient_data', 'paper_count': len(paper_trades)}

    live_sharpe = calculate_sharpe(live_trades)
    paper_sharpe = calculate_sharpe(paper_trades)

    live_win_rate = (live_trades['pnl'] > 0).mean()
    paper_win_rate = (paper_trades['pnl'] > 0).mean()

    return {
        'status': 'complete',
        'live_sharpe': live_sharpe,
        'paper_sharpe': paper_sharpe,
        'sharpe_improvement': paper_sharpe - live_sharpe,
        'live_win_rate': live_win_rate,
        'paper_win_rate': paper_win_rate,
        'recommendation': 'promote' if paper_sharpe > live_sharpe * 1.1 else 'keep_current',
    }

I require the paper params to beat live params by at least 10% on Sharpe ratio before promoting. Small improvements might just be noise.

Strategy Fitness Score

Every strategy gets a rolling "fitness" score based on recent performance:

import pandas as pd

def strategy_fitness(trades, lookback_days=30):
    """
    Calculate strategy fitness score (0-100).
    Below 40 → triggers review.
    Below 20 → pauses strategy.
    """
    cutoff = pd.Timestamp.now() - pd.Timedelta(days=lookback_days)
    recent = trades[trades['date'] >= cutoff]

    if len(recent) < 5:
        return 50  # Not enough data, assume neutral

    win_rate = (recent['pnl'] > 0).mean()
    avg_return = recent['pnl_pct'].mean()
    max_dd = calculate_max_drawdown(recent)
    consistency = recent['pnl_pct'].std()

    # Weighted scoring
    score = (
        win_rate * 40 +                           # 40 points for win rate
        min(avg_return * 1000, 30) +               # 30 points for returns
        max(0, 20 - abs(max_dd) * 200) +           # 20 points for drawdown
        max(0, 10 - consistency * 50)              # 10 points for consistency
    )

    return max(0, min(100, score))

If the fitness score drops below 40, the agent flags it in the evening analysis: "Mean reversion fitness has degraded to 35. Win rate is 52% over the last 30 days, down from 78% backtest baseline. Recommend reviewing regime filter weights."

Below 20, the strategy pauses automatically. No more signals until a human reviews and either adjusts parameters or re-enables it.

This is the closest thing to "learning from mistakes." The agent doesn't just blindly keep trading a failing strategy. It notices when something isn't working, reduces exposure, and proposes changes. It's not AGI-level adaptation, but it's a hell of a lot better than a static algo that trades until someone pulls the plug.


Risk Management: Hard Rules, Not Soft

The #1 Rule

Risk management isn't a suggestion — it's a set of hard-coded rules that the agent cannot override regardless of how confident it is. These aren't parameters that get tuned. They're circuit breakers.

Position Size Limits

RISK_RULES = {
    'max_position_pct': 0.20,       # Max 20% of capital per trade
    'max_daily_loss_pct': 0.03,     # Max 3% daily loss → stop trading
    'max_drawdown_pct': 0.10,       # Max 10% drawdown → pause everything
    'max_open_positions': 3,         # Max 3 simultaneous positions
    'min_position_size': 500,        # Don't bother with positions under ₹500
    'max_sector_exposure': 0.30,     # Max 30% of capital in one sector
    'max_single_stock': 0.20,        # Max 20% in any single stock
    'correlation_limit': 0.7,        # No two positions with >0.7 correlation
}

Max 20% per trade: With ₹20K capital, that's ₹4,000 max per position. Even a total loss on one position won't wipe more than a fifth of the account.

Max 3% daily loss: If the account loses ₹600 in a day, trading stops until tomorrow. This catches correlated losses — if multiple positions are failing simultaneously, something systemic is wrong and the agent shouldn't keep trading.

Max 10% drawdown: If the account drops from ₹20,000 to ₹18,000, everything pauses. All positions are closed, all pending orders are cancelled, and I get an alert on Telegram. Trading only resumes after I manually review and restart. This is the nuclear option and it should never trigger under normal conditions.
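The first two breakers reduce to a few lines. A sketch that takes the RISK_RULES dict above as a parameter and returns the first breaker tripped:

```python
def circuit_breaker(capital_start, equity_now, daily_pnl, rules):
    """Hard-stop checks that run before every trade.
    Returns the name of the tripped breaker, or None."""
    if daily_pnl <= -capital_start * rules['max_daily_loss_pct']:
        return 'daily_loss_limit'   # stop trading until tomorrow

    drawdown = (capital_start - equity_now) / capital_start
    if drawdown >= rules['max_drawdown_pct']:
        return 'max_drawdown'       # close everything, alert the human

    return None
```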

No-Trade Days

NO_TRADE_DAYS = [
    'union_budget',           # February
    'rbi_monetary_policy',    # Bi-monthly
    'monthly_fo_expiry',      # Last Thursday
    'election_results',       # As applicable
]

def is_tradeable_day(date, holdings):
    """Check if today is safe to trade."""
    for event in get_events(date):
        if event in NO_TRADE_DAYS:
            return False, f"No-trade day: {event}"

    # Check earnings for held stocks
    for stock in holdings:
        if earnings_within_days(stock, date, days=1):
            return False, f"Earnings day for held stock: {stock}"

    return True, "Clear to trade"

These days have outsized moves that mean reversion can't handle. Budget day can move the Nifty 3-5% in either direction. RBI policy can gap banking stocks 2-3%. Monthly expiry has weird option-driven price action. The expected value of trading these days is negative for a mean reversion strategy, so we don't.

ATR-Based Stop Losses

Every position gets a stop loss based on ATR (Average True Range), not a fixed percentage:

def calculate_stop_loss(entry_price, atr, multiplier=2.0):
    """
    ATR-based stop loss adapts to the stock's actual volatility.
    A volatile stock gets a wider stop. A stable stock gets a tighter one.
    """
    stop_distance = atr * multiplier
    stop_loss = entry_price - stop_distance

    return round(stop_loss, 2)

# Example:
# HDFCBANK: entry ₹1,642, ATR ₹24, stop = 1,642 - 48 = ₹1,594 (-2.9%)
# TATAMOTORS: entry ₹680, ATR ₹18, stop = 680 - 36 = ₹644 (-5.3%)

A fixed 2% stop loss makes no sense when some stocks regularly move 1.5% in a day and others barely move 0.5%. ATR adapts to the stock's personality.
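For reference, since ATR is used above but never computed: the true range is the largest of (high minus low), (high minus previous close), and (low minus previous close), and ATR averages it over a window. The `ta` library in the stack uses Wilder's smoothing; a plain average is close enough to illustrate the idea:

```python
def true_range(high, low, prev_close):
    """Largest of the three classic true-range components.
    The prev_close terms capture overnight gaps."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))

def atr(candles, period=14):
    """Simple-average ATR over the last `period` true ranges.
    candles: list of (high, low, close) tuples, oldest first."""
    trs = [true_range(h, l, candles[i - 1][2])
           for i, (h, l, _close) in enumerate(candles) if i > 0]
    return sum(trs[-period:]) / min(period, len(trs))
```

A stock that gaps overnight gets a bigger true range than its intraday bar suggests, which is exactly why ATR-based stops are wider for jumpy stocks.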

Correlation and Sector Risk

The agent checks correlation between potential new positions and existing holdings. If I'm already long HDFCBANK, I don't want to also open ICICIBANK; both are banking stocks that move together during sector stress.

def check_correlation_risk(new_symbol, existing_positions, max_corr=0.7):
    """Reject trades that would create excessive correlation."""
    for position in existing_positions:
        correlation = calculate_correlation(new_symbol, position['symbol'], days=60)
        if correlation > max_corr:
            return False, f"Correlation {correlation:.2f} with {position['symbol']} exceeds {max_corr}"
    return True, "Correlation risk acceptable"

Sector limits prevent concentration. Max 30% of capital in banking stocks, even if three banking signals trigger simultaneously.
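The `calculate_correlation` helper above is nothing exotic: Pearson correlation of daily returns. A self-contained sketch (it takes price lists directly, where the real helper would fetch ~60 days of closes from the data layer):

```python
def daily_returns(prices):
    """Simple close-to-close percentage returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def calculate_correlation(prices_a, prices_b):
    """Pearson correlation of two stocks' daily return series."""
    ra, rb = daily_returns(prices_a), daily_returns(prices_b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5
```

Correlating returns rather than raw prices matters: two stocks that both drift upward have near-perfect price correlation even if their day-to-day moves are unrelated.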

No Overnight Leverage

This is absolute. CNC (delivery) only. No MIS (intraday margin), no BO (bracket orders with leverage), no CO (cover orders). If a position goes against me, I lose 1x. Never 5x. Never 10x.

With ₹20K, this means the worst possible day (every position hits stop loss) loses about ₹1,200. Painful but survivable. With leverage, that same day could wipe the account.


The Approval Flow

I don't trust the agent to trade autonomously. Not yet, maybe not ever. Here's how the human-in-the-loop works.

Agent generates signal
    → 📱 Telegram: NHPC BUY ₹76.50 | SL ₹72.10 | Target ₹83.20 | Confidence: 82%
       [✅ Approve] [❌ Reject]
    ├── Approve → execute order, log trade
    └── Reject / 5-min timeout → signal expired, log the skip

Graduation criteria: after 50+ trades at a 60% win rate → semi-autonomous

Signal to Telegram

When the agent generates a trade signal that passes all filters, it sends a structured message to my Telegram bot. The message includes everything I need to make a decision: the trade details, the reasoning, the risk parameters, and the confidence score.

I showed the format earlier, but the key design decision is: the agent must explain its reasoning. It's not enough to say "BUY HDFCBANK." I need to know why. What's the RSI? What regime are we in? Why this stock and not that stock? What's the risk/reward?

If the agent can't articulate good reasoning, the signal probably isn't good.

Approval Rules

Signal generated
    → Send to Telegram with full reasoning
    → Wait for response (max 5 minutes during market hours)
        ├── Approved → execute at next valid price
        ├── Rejected → log rejection reason, learn from it
        └── Expired  → log as expired, analyze if it would have worked

The 5-minute expiry is critical. Markets move fast. A signal that was valid at 9:00 AM might not be valid at 9:30 AM. If I'm busy and can't review in time, the signal dies. Better to miss a trade than enter one I haven't reviewed.
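The expiry logic is just a timestamp comparison at decision time. A sketch (names are illustrative, not the production code):

```python
from datetime import datetime, timedelta

SIGNAL_TTL = timedelta(minutes=5)

def resolve_signal(signal_time, now, response=None):
    """Map a human response (or lack of one) to an action.
    response: 'approve', 'reject', or None (no reply yet)."""
    if response == "reject":
        return "rejected"        # log the reason, learn from it
    if now - signal_time > SIGNAL_TTL:
        return "expired"         # too late, even if approved; log and analyze
    if response == "approve":
        return "execute"         # execute at next valid price
    return "waiting"             # still inside the 5-minute window
```

Note that the TTL check runs before the approve check: an approval that arrives after the window must die too, otherwise a stale tap on a Telegram button could fire a trade at a price that no longer matches the reasoning.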


The Road to Semi-Autonomy

The approval flow is training wheels. I don't plan to keep manually approving every trade forever. The graduation criteria:

  1. 50 approved trades completed — enough data for the agent to have a track record
  2. Win rate above 60% — not the backtest win rate, the live win rate
  3. No risk rule violations — zero incidents of the agent trying to exceed position limits, trade on no-trade days, etc.
  4. Consistent reasoning quality — the explanations make sense and accurately describe what happened

After meeting all four criteria, I might consider semi-autonomous mode for very small trades (₹500-1,000) while larger positions still require approval. But honestly, the approval step takes 30 seconds and catches edge cases the agent can't see.
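The four criteria are mechanical enough to encode directly; the fourth (reasoning quality) stays a human judgment passed in as a flag. A sketch with assumed field names:

```python
def graduation_check(trades, rule_violations, reasoning_ok):
    """Gate for semi-autonomous mode: all four criteria must hold.
    trades: list of dicts with a boolean 'win' field (live trades only,
    not backtest results). Returns (eligible, per-criterion breakdown)."""
    wins = sum(1 for t in trades if t["win"])
    criteria = {
        "50+ approved trades": len(trades) >= 50,
        "win rate above 60%": bool(trades) and wins / len(trades) > 0.60,
        "zero risk-rule violations": rule_violations == 0,
        "reasoning quality holds up": reasoning_ok,
    }
    return all(criteria.values()), criteria
```

Returning the per-criterion breakdown, not just the boolean, means the dashboard can show which gate is still closed.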

Full autonomy isn't the goal. Having a human in the loop catches things like "didn't that company's CEO just resign?" or "wait, that stock is locked in upper circuit, why is the agent trying to buy it?" The agent provides systematic analysis; I provide context and common sense.


Performance Tracking

If you can't measure it, you can't improve it. The agent tracks everything and surfaces it through a dashboard on pkarnal.com.

Real-Time P&L

During market hours, the dashboard shows live unrealized P&L for each open position alongside the day's realized P&L.

Trade Journal

Every trade gets a journal entry, written by the agent:

## 2026-02-13: HDFCBANK Long

**Entry:** ₹1,642.30 at 09:22 | **Exit:** ₹1,689.50 at 14:45 (+2.9%)
**Holding:** 1 day | **P&L:** +₹94

**Entry Reasoning:** RSI 27.4 in ranging regime. Bank Nifty showing
strength. No events. High confidence (0.82).

**What Happened:** Stock opened flat, dipped to ₹1,635 (MAE -0.4%),
then rallied through the afternoon as Bank Nifty broke out. Hit target
zone at 2:30 PM, closed before market close.

**Lessons:** Stock reached ₹1,701 after close the next day. Could have
held longer? But overnight risk was real. Good discipline to take the
planned exit.

Strategy Breakdown

Over time, the dashboard would show which signal types are profitable:

Strategy Performance (Last 30 Days) — HYPOTHETICAL
─────────────────────────────────────────────────
Signal Type     Trades  Win%   Avg P&L
─────────────────────────────────────────────────
RSI Oversold       12   67%   +1.2%
RSI Overbought      4   50%   +0.3%
RSI + Volume         6   71%   +1.8%
─────────────────────────────────────────────────

Regime Breakdown — HYPOTHETICAL
─────────────────────────────────────────────────
Regime          Trades  Win%   Avg P&L
─────────────────────────────────────────────────
Ranging            14   71%   +1.4%
Trending Up         5   60%   +0.8%
Trending Down       2   0%    -1.9%
Volatile            1   0%    -2.3%
─────────────────────────────────────────────────

If this pattern holds, mean reversion works well in ranging markets, acceptably in uptrends, and poorly in downtrends. The regime filter should aggressively filter out non-ranging signals.

Benchmark Comparison

The dashboard tracks performance against three benchmarks: a fixed deposit (the risk-free baseline), the Nifty 50, and the Nifty 500 TRI (via mftool).

If the agent can't beat a fixed deposit, it shouldn't be trading. If it can't beat the Nifty 50, I should just buy a Nifty index fund. The bar is clear.

The dashboard also shows rolling Sharpe ratio over 30/60/90 day windows. Sharpe above 1.5 is good. Above 2.0 is excellent. Below 1.0 means the returns aren't justifying the risk and something needs to change.
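Rolling Sharpe is simple to compute from a window of daily returns. A sketch, assuming an annual risk-free rate of about 7% (roughly Indian FD territory; my assumption, not a quoted figure):

```python
def sharpe(daily_returns, risk_free_annual=0.07, trading_days=252):
    """Annualized Sharpe ratio over a window of daily returns.
    The 7% risk-free default is an assumption for illustration."""
    rf_daily = risk_free_annual / trading_days
    excess = [r - rf_daily for r in daily_returns]
    n = len(excess)
    mean = sum(excess) / n
    # Sample standard deviation of excess returns
    sd = (sum((r - mean) ** 2 for r in excess) / (n - 1)) ** 0.5
    return (mean / sd) * trading_days ** 0.5 if sd else float("inf")
```

Run it over trailing 30/60/90-day slices of the return log to get the rolling view the dashboard shows.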


Why This Can Beat 15% CAGR

Let's talk about the elephant in the room. Can an AI agent actually beat the market?

The honest answer: probably not consistently in the long run, and definitely not by a huge margin. Markets are efficient enough that sustainable alpha is extremely hard to find. But there are structural reasons why this approach has an edge:

Speed of analysis. I can screen 500 stocks in under 10 seconds. A human analyst might cover 20-30 stocks deeply. The agent won't find the best trade among 500 stocks, but it'll find all the stocks that match its criteria, every single day, without getting bored or distracted.

No emotional bias. The agent doesn't panic sell during a drawdown. It doesn't FOMO into a momentum stock that's already up 20%. It doesn't hold a losing position because "it'll come back." It follows the rules. Always.

Consistent execution. The agent runs its cron jobs at the same time every day. It doesn't sleep in, doesn't take holidays (except the no-trade days), doesn't get sick. Consistency is an edge when most retail traders are inconsistent.

Compounding small edges. Even 2% monthly returns compound to 26.8% CAGR. The backtest suggests the strategy can do better than 2% monthly, but even a conservative estimate beats the Nifty's historical ~15%.

Adaptation. This is the real edge. A static algo trading system slowly decays as market conditions change. This agent detects when its strategy is underperforming and adapts. It won't always adapt correctly, but the baseline of "notice and adjust" is better than "keep doing the same thing until a human notices."
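The compounding arithmetic above is easy to verify:

```python
def annualize(monthly_return):
    """Compound a flat monthly return into an annual rate (CAGR)."""
    return (1 + monthly_return) ** 12 - 1

# 2%/month compounds to ~26.8%/year; a bit over 1.2%/month already
# clears Nifty's historical ~15%.
```

The flip side of compounding small edges is that small drags (costs, slippage) compound just as relentlessly, which is where the next section goes.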

But let me be honest about the limitations:

Backtests lie. A 2.29 Sharpe ratio in backtesting will almost certainly degrade in live trading. Slippage, impact cost, and the difference between limit order fills and backtest fills all eat into returns. I'd be happy with a live Sharpe above 1.5.

₹20K is toy money. At this scale, the transaction costs (₹20 per executed order on Zerodha) eat into returns disproportionately. A ₹2,000 position that makes 2% (₹40) gives up half of that to brokerage. The strategy needs to scale to ₹1L+ to overcome transaction costs efficiently.

Past performance means nothing. Three years of backtest data is not enough to capture all market regimes. The agent might encounter a market condition it's never seen — a regime change, a black swan, a structural shift in how Indian markets work. History is a guide, not a guarantee.

Mean reversion has a mathematical edge, but edges erode. As more participants adopt similar strategies, the edge shrinks. This is true of every quantitative strategy ever. The learning loop helps, but it can't create edge from nothing.

My realistic expectation: 15-25% CAGR with controlled drawdowns under 10%. If it achieves that consistently over a year, I'll scale up. If it can't beat a Nifty index fund over 6 months, I'll shut it down and buy the index fund instead.

The Hidden Costs

Let's talk about what eats into returns:

Brokerage: ₹20 per executed order on Zerodha. A ₹2,000 trade costs ₹40 roundtrip (₹20 buy + ₹20 sell) = 2% of the position. This hurts small accounts disproportionately.

STT (Securities Transaction Tax): 0.1% on sell side for delivery trades. On a ₹2,000 position, that's ₹2. Small but adds up.

GST: 18% on brokerage fees. The ₹40 brokerage becomes ₹47.20.

SEBI charges: Negligible but present.

Total impact: ~2.4% per roundtrip trade on small positions. This means every trade needs to make 2.4% just to break even. It's why the strategy targets 3-5% per trade minimum and why scaling above ₹1L is essential for profitability.

The ₹20K trap: At this account size, transaction costs dominate. A 1% market move on a ₹2,000 position is just ₹20 — less than the brokerage cost. The agent must be extremely selective and target larger percentage moves to overcome this handicap. This is why most retail algo trading fails: the costs are invisible until you calculate them.
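These costs are worth encoding so every candidate trade can be checked against its breakeven. A sketch using the figures above (flat ₹20/order brokerage, 0.1% sell-side STT, 18% GST on brokerage; SEBI and stamp charges ignored as negligible):

```python
def roundtrip_cost(position_value, brokerage_per_order=20.0,
                   stt_sell_rate=0.001, gst_rate=0.18):
    """All-in cost of a buy+sell delivery roundtrip, per the post's
    figures. Returns (rupees, fraction of position value)."""
    brokerage = 2 * brokerage_per_order * (1 + gst_rate)  # buy + sell, + GST
    stt = position_value * stt_sell_rate                  # sell side only
    total = brokerage + stt
    return total, total / position_value
```

On a ₹2,000 position this comes out to about ₹49, or ~2.5%, which is exactly why the strategy has to target 3-5% moves rather than scalping 1% ones.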

Tax Implications

All gains from delivery trades held for less than one year are Short Term Capital Gains (STCG), taxed at 20% (as of 2026).

Unlike mutual funds where short-term gains are added to income and taxed at your slab rate, equity STCG has a flat 20% rate. On a ₹10,000 annual gain, that's ₹2,000 to taxes.

This matters for position holding periods. The agent targets 3-7 day holds, all of which qualify as STCG. Stretching some profitable trades to 366+ days for Long Term Capital Gains (LTCG at 12.5%) could improve after-tax returns, but it conflicts with the mean reversion logic.
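The after-tax difference is simple to quantify, using the flat rates above (and ignoring the LTCG exemption threshold):

```python
def after_tax_gain(gain, holding_days):
    """Equity capital gains at the post's 2026 flat rates:
    20% STCG (held up to a year), 12.5% LTCG (held longer)."""
    rate = 0.125 if holding_days > 365 else 0.20
    return gain * (1 - rate)
```

On the same ₹10,000 gain, the long-term hold keeps ₹750 more, but a mean reversion system that holds for a year isn't a mean reversion system anymore.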

SEBI Algorithmic Trading Regulations

Technically, this system operates in a gray area. SEBI's algo trading regulations apply to institutional algorithms that directly interface with exchange systems. Since this agent sends trade proposals to a human who manually approves them via Telegram, it's arguably "systematic trading with human oversight" rather than pure algorithmic trading.

However, SEBI could tighten these definitions. The approval flow and manual execution aren't just for risk management — they're also regulatory protection. If SEBI requires algo traders to register and provide source code access, this human-in-the-loop design keeps me in the clear.


What Could Go Wrong

Overfitting to backtest data: Three years isn't enough history. The parameters might be tuned to that specific period. Mitigation: the learning loop will detect underperformance and adjust.

Unknown regime changes: COVID-style crashes create conditions the agent has never seen. Mitigation: it stops trading when volatility exceeds known patterns.

API failures: Kite Connect downtime could leave positions unprotected. Mitigation: defensive coding sends alerts for any failed stop loss placement.

Learning from noise: The agent might see patterns that don't exist ("Tuesdays are lucky"). Mitigation: 20-trade minimums for analysis and 2-week paper trading validation.

Slippage at scale: Larger positions in mid-caps could move prices. Mitigation: stick to Nifty 50 now, add liquidity filters later.

Black swan events: Pandemics and crashes break all models. Mitigation: the 10% drawdown circuit breaker and no leverage limits the damage.


The Tech Stack

Here's everything that makes this work:

OpenClaw — The AI orchestration platform that runs the agent. Handles cron scheduling, tool access, Telegram integration, and agent memory. The agent is essentially a Claude instance with access to financial tools, running on scheduled cron jobs.

Kite Connect API — Zerodha's trading API. Full NSE/BSE access, order placement, portfolio management, historical data. The agent authenticates once per day and uses the session for all operations.

yfinance — Free historical data for backtesting and analysis. Not reliable enough for real-time decisions (uses Yahoo Finance which can lag), but perfect for EOD analysis and backtesting.

mftool — Mutual fund data for benchmark comparisons. Tracks Nifty 50 and Nifty 500 TRI performance.

vectorbt — Backtesting engine. Fast, vectorized backtesting that can test thousands of parameter combinations across hundreds of stocks in minutes. Used for parameter validation before any changes go live.

ta (Technical Analysis) — Python library for technical indicators. RSI, ADX, ATR, Bollinger Bands, MACD — all the indicators the strategy uses, calculated consistently and correctly.

feedparser — RSS feed parser for news sentiment. Pulls from Economic Times, Moneycontrol, and Livemint for market news and event detection.

SQLite / JSONL — Trade logging and performance tracking. JSONL for append-only trade logs (easy to process, easy to backup). SQLite for aggregated performance metrics and dashboard queries.

Telegram Bot API — The approval interface. Inline buttons for approve/reject, structured messages for trade signals, alerts for risk events.

pkarnal.com — Personal website hosting the performance dashboard. Static site with JavaScript pulling from an API endpoint that reads the SQLite database.

The whole system runs on a single Hetzner CAX21 server (4 ARM vCPUs, 8GB RAM). Total infrastructure cost: about €7/month. The agent itself is the most expensive part — Claude API calls for reasoning about trades. But since it's running through OpenClaw on scheduled cron jobs rather than continuously, the token usage is manageable.
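The JSONL trade log mentioned in the stack is a few lines of code, which is the point: append-only, human-readable, trivial to back up. A sketch:

```python
import json

def log_trade(path, record):
    """Append one trade as a single JSON line. Append-only writes
    are crash-safe and diff/backup-friendly."""
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def load_trades(path):
    """Read the full log back for analysis or dashboard aggregation."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

The aggregated metrics then live in SQLite, rebuilt from this log, so the log stays the single source of truth.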


Where We Are Today

This is week 1 of a ₹20,000 experiment. Most of the system is built but untested in live conditions. The first trade hasn't been placed yet.

What works: Signal generation, regime classification, Telegram approval flow, basic order execution.

What's untested: Everything under live market stress. Learning loop. Parameter adaptation. Risk circuit breakers.

Weeks 1-2: Observation (signals sent but not executed)
Weeks 3-4: Small trades (₹1,000-2,000 positions)
Month 2: Full operation (₹20K deployed, accumulate performance data)
Month 3: First learning cycle (evaluate vs. index fund: scale up, modify, or shut down)

Success criteria: After 3 months, if live Sharpe ratio exceeds 1.5 and the system beats a Nifty index fund, I'll add capital. If not, I'll buy the index fund instead. No sunk cost fallacy.


What I've Learned So Far

Even before placing the first trade, building this system has taught me things:

Risk management is the product. The signal generation is maybe 20% of the code. Risk management, position sizing, circuit breakers, and approval flows are the other 80%. Getting the edge right doesn't matter if a single bad day wipes out six months of gains.

Explainability is not optional. If the agent can't explain why it wants to make a trade, the trade probably isn't good. The Telegram approval messages force the agent to articulate its reasoning, which acts as a quality filter. I've already rejected signals during testing because the "reasoning" boiled down to "RSI is low" with no regime context.

Patience is a strategy. The agent generates zero signals on most days. My instinct is to tweak parameters until it's more active. But "more trades" is not the goal — "more good trades" is. The agent sitting on its hands in an ambiguous market is exactly the right behavior.

The learning loop will be slow. I initially imagined the agent rapidly evolving its strategy week over week. In reality, with 0-3 trades per day and a 20-trade minimum sample size for analysis, it'll take weeks to accumulate enough data for meaningful insights. That's fine. Slow and correct beats fast and wrong in trading.

The reality check: This will probably fail in subtle ways I haven't anticipated. That's fine. The goal isn't to get rich quick — it's to build a system that can adapt when it fails, rather than just break.

I'll write a follow-up in 3 months with actual performance numbers, real learning loop adjustments, and honest lessons from live trading. Until then, it's all theoretical.


This blog post — and the platform serving it — was built and deployed by another AI agent system I run. That one handles personal life infrastructure instead of code: finances, memory, email processing, web publishing. I wrote about it here.

Tags: ai-agents · trading · finance · automation · kite-connect
