OPEN-SOURCE SCRIPT

Machine Learning: seMLP Q-Wavelet RL Engine [Jamallo]

Author Note: I am often asked: "How can I build a Machine Learning or Artificial Intelligence trading system?" I created the study "Machine Learning: seMLP Q-Wavelet RL Engine" to show how it can be done in a beginner-friendly way. We will first break down how this AI thinks in plain English, and then walk through how the Pine Script code executes it, step by step.

Introduction: The Institutional Approach to Algorithmic Trading

Most retail and algorithmic traders spend years searching for the "holy grail" by combining static indicators and hard-coded `IF/THEN` rule sets, often unaware that institutional quant desks moved beyond such basic, curve-fitted patterns long ago. Standard rule-based analysis struggles because financial markets are inherently chaotic: a hard-coded strategy that works perfectly in a backtest can break down systematically during a live regime shift.

To acquire a true institutional edge, algorithmic strategies cannot rely on rigid, backward-looking formulas; they require a system that adapts dynamically in real time. This script brings that quantitative firepower directly to your chart by constructing a live, self-teaching AI.

  • Dynamic Filtering: It uses frequency mathematics (Wavelets) to separate random market noise from true institutional momentum footprints with minimal lag.
  • Artificial Brain: It feeds that data into a neural network, a matrix of artificial "neurons" that analyze the data and decide on an action on every candle.
  • Self-Correction: Most importantly, it uses Reinforcement Learning. When a trade fails, the AI calculates the error and mathematically rewires its own brain, so it keeps adapting as market conditions change.


Ultimately, this serves as a foundational study showing you exactly how to break away from basic scripting and get started in true Quantitative Algorithmic Trading.

1. The Core Architecture Loop
Here is the high-level flow of how the AI thinks on every single candle:

[Figure: flowchart of the core architecture loop run on every candle]

The Invisible "Burn-In" Phase
Because the AI starts with randomly initialized weights (an untrained "brain"), it will make terrible decisions on the very first candles. To prevent it from acting prematurely on live data, the script runs a Burn-In Phase (e.g., the first 300 bars of the chart). During this period, the indicator shows no signals. In the background it executes hundreds of "mock trades", tracks virtual PnL, explores aggressively with a high random-action rate (Epsilon, see Section 8), and rapidly rewires its brain. Once the 300 bars are up, the burn-in phase ends: the AI stops exploring aggressively and enters "Live Trading" mode with a brain that has already been trained on the burn-in history.
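The burn-in gating can be sketched in Python as a loop that runs the agent on every bar but only exposes its actions after the burn-in. This is a minimal sketch, not the script's code. `run_bars` and `step` are hypothetical names, and treating bar indices 0 to burn_in - 1 as the burn-in is an assumption.

```python
from typing import Callable, List, Optional


def run_bars(n_bars: int, burn_in: int,
             step: Callable[[int], str]) -> List[Optional[str]]:
    """Call the agent on every bar, but hide its actions during burn-in.

    `step` is a hypothetical stand-in for one full pass of the loop
    (features, forward pass, action, reward, weight update). It runs on
    every bar, so learning never stops; bars with index < burn_in are
    recorded as None (nothing plotted). Treating bars 0..burn_in-1 as the
    burn-in is an assumption of this sketch.
    """
    plotted: List[Optional[str]] = []
    for bar_index in range(n_bars):
        action = step(bar_index)  # mock trade and learning happen here
        plotted.append(action if bar_index >= burn_in else None)
    return plotted


# Toy agent that always answers "HOLD", just to show the gating.
signals = run_bars(n_bars=500, burn_in=300, step=lambda i: "HOLD")
print(sum(s is None for s in signals))  # 300
```

The key design point is that training and display are decoupled. The agent learns on every bar, and the burn-in only controls what reaches the chart.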

SECTIONS 2 & 3: Setting Up the Brain


Conceptual Overview
Imagine the brain as a massive team of financial analysts.

  • We have 16 junior analysts looking at chart data.
  • They report their findings up to 12 senior analysts.
  • The seniors report to 6 directors.
  • The 6 directors send their final opinions to 3 executives representing the 3 possible actions: `BUY, SELL, HOLD`. This is called a 16 → 12 → 6 → 3 network structure.
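To make the 16 → 12 → 6 → 3 structure concrete, this short Python sketch counts how many connections (weights) and biases such a network holds. It assumes every layer is fully connected and each neuron has one bias. That matches the W1/W2/W3 matrices and the B1 bias used later, but the count is illustrative rather than taken from the script.

```python
from typing import List, Tuple

LAYERS = [16, 12, 6, 3]  # NI, NH1, NH2, NO from the script


def count_parameters(layers: List[int]) -> Tuple[int, int]:
    """Return (weights, biases) for a fully connected MLP with these layer widths."""
    weights = sum(a * b for a, b in zip(layers, layers[1:]))  # 16*12 + 12*6 + 6*3
    biases = sum(layers[1:])                                  # one bias per non-input neuron
    return weights, biases


w, b = count_parameters(LAYERS)
print(w, b, w + b)  # 282 21 303
```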


Before we hand the price data to the junior analysts, we Normalize it with a rolling Z-Score. We subtract the average of the last i_normWin bars and divide by their standard deviation. This just means "leveling the playing field" so a massive $500 candle wick doesn't break the analysts' math compared to a tiny $1 movement.

The Code Breakdown

Pine Script®
// Section 2: Brain Size Constants
int NI  = 16   // 16 Inputs (Junior analysts)
int NH1 = 12   // 12 Hidden layer 1 nodes
int NH2 = 6    // 6 Hidden layer 2 nodes
int NO  = 3    // 3 Outputs [Buy, Sell, Hold]

// Section 3: Normalization Helper
norm(series float x, simple int win) =>
    float mu  = ta.sma(x, win)
    float sg  = ta.stdev(x, win)
    float sf  = nz(sg) < 1e-10 ? 1.0 : sg
    float res = (x - nz(mu, x)) / sf   // Levels out the price data
    na(res) ? 0.0 : res
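For readers who want to port the helper outside TradingView, here is a minimal Python sketch of the same rolling z-score. It assumes a population standard deviation, which is Pine's `ta.stdev` default. It also assumes a 0.0 output during the warm-up bars: there `ta.sma`/`ta.stdev` return `na`, so the Pine `norm` ends up returning 0.0.

```python
import math
from typing import List


def norm(xs: List[float], win: int) -> List[float]:
    """Rolling z-score mirroring the Pine `norm` helper (a sketch).

    Assumptions: population standard deviation, and 0.0 while fewer than
    `win` bars exist (the Pine version falls back to 0.0 there).
    """
    out: List[float] = []
    for t in range(len(xs)):
        if t + 1 < win:
            out.append(0.0)  # warm-up: not enough history yet
            continue
        window = xs[t + 1 - win: t + 1]
        mu = sum(window) / win
        sd = math.sqrt(sum((v - mu) ** 2 for v in window) / win)
        sf = 1.0 if sd < 1e-10 else sd  # avoid dividing by ~0 in flat markets
        out.append((xs[t] - mu) / sf)
    return out


print(norm([1.0, 2.0, 3.0, 4.0, 5.0], 3))  # two warm-up zeros, then about 1.2247 each
```

A flat series maps to all zeros instead of dividing by zero, which is the purpose of the `1e-10` guard in both versions.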


SECTIONS 4 & 5: Giving the AI "Memory"


Conceptual Overview
By default, Pine Script indicators suffer from a kind of amnesia. TradingView runs the script once per candle, and every ordinary variable is re-initialized on each run. Past values can still be read with the history operator (e.g., `close[1]`), but they cannot be carried forward and modified from bar to bar. An AI for trading that needs to "learn" must be able to remember, and keep updating, what it learned from its past mathematical mistakes.

To make Pine Script remember, we declare variables with the `var` keyword. A `var` variable is initialized only once, on the first bar, and keeps its value from bar to bar. We use it to build "Persistent Memory Matrices" where the AI stores its brain's wiring throughout the entire chart history.

The Code Breakdown

Pine Script®
// 'var' initializes these only once (on the first bar); they keep their values on every later bar
var matrix<float> W1 = matrix.new<float>(NI, NH1, 0.0)   // The connections between neurons
var matrix<float> W2 = matrix.new<float>(NH1, NH2, 0.0)
...
var int pos = 0   // The AI remembers its current position: Long (1), Short (-1), or Flat (0)
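The difference between `var` and ordinary variables can be mimicked in Python with object state. The following is a hypothetical analogy, not how Pine is implemented. Attributes created once in `__init__` behave like `var` declarations, while locals inside the per-bar method are rebuilt on every call like ordinary Pine variables. The class and method names are illustrative only.

```python
class PineLikeScript:
    """Hypothetical Python analogy of Pine's `var` versus plain declarations.

    Attributes created in __init__ play the role of `var` variables: built
    once, then kept from bar to bar. Locals inside on_bar play the role of
    ordinary Pine variables: rebuilt from scratch on every bar.
    """

    def __init__(self, ni: int = 16, nh1: int = 12) -> None:
        # like: var matrix<float> W1 = matrix.new<float>(NI, NH1, 0.0)
        self.W1 = [[0.0] * nh1 for _ in range(ni)]
        self.pos = 0        # like: var int pos = 0
        self.bars_seen = 0  # a persistent bar counter

    def on_bar(self, signal: int) -> int:
        scratch = 0              # ordinary variable: starts at 0 on every bar
        scratch += 1
        self.bars_seen += 1      # persistent: keeps growing across bars
        self.pos = signal        # remembered until a later bar changes it
        self.W1[0][0] += 0.1     # a toy "learning" update that survives to the next bar
        return scratch


script = PineLikeScript()
for sig in [1, -1, 0, 1]:
    script.on_bar(sig)
print(script.bars_seen, script.pos, round(script.W1[0][0], 10))  # 4 1 0.4
```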


SECTION 6: Seeing the Market (Wavelets)


Conceptual Overview
If you use a Moving Average, it always "lags" behind the real price. By the time the Moving Average crosses to tell you to buy, the massive breakout has already happened.

To fix this, we teach the AI to see using Haar Wavelets. A one-level Haar wavelet transform needs only the current and the previous bar, so it adds very little lag. It splits each input series (open, close, volume, and so on) into two parts:

  • The Detail (D): half the change from the previous bar, which captures the immediate, rapid volatility chop.
  • The Smooth (V): the average of the current and previous bar, which captures the underlying, smoother momentum.

By looking at the detail and the smooth momentum separately, the AI can react to shifts with minimal lag.


The Code Breakdown

Pine Script®
// We take standard features like Open, Close, and Volume:
float f0 = open
float f1 = close
...
// We break them into Wavelets using simple math combinations:
float v1_0 = (f0 + nz(f0[1], f0)) / 2.0   // Smooth momentum
float d1_0 = (f0 - nz(f0[1], f0)) / 2.0   // Instant volatility detail
...
// We pack all 16 traits into the 'feat' array to feed the AI's brain
feat.set(0, norm(d1_0, i_normWin))
...
feat.set(14, float(pos))                  // Tells the brain its current trade position
feat.set(15, norm(portRet, i_normWin))    // Tells the brain its current open trade return
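Here is a minimal Python sketch of the one-level Haar split used in the Pine lines above, applied to a whole series. On the first bar there is no previous value, so the current value is reused, mirroring `nz(f0[1], f0)`. The function name `haar_level1` is hypothetical.

```python
from typing import List, Tuple


def haar_level1(xs: List[float]) -> Tuple[List[float], List[float]]:
    """One-level Haar split matching v1_0 / d1_0 above (a sketch).

    v_t = (x_t + x_{t-1}) / 2   smooth part (two-bar average)
    d_t = (x_t - x_{t-1}) / 2   detail part (half the one-bar change)
    On the first bar x_{t-1} is missing, so x_t is reused, like nz(f0[1], f0).
    """
    smooth: List[float] = []
    detail: List[float] = []
    for t, x in enumerate(xs):
        prev = xs[t - 1] if t > 0 else x
        smooth.append((x + prev) / 2.0)
        detail.append((x - prev) / 2.0)
    return smooth, detail


v, d = haar_level1([100.0, 102.0, 101.0, 105.0])
print(v)  # [100.0, 101.0, 101.5, 103.0]
print(d)  # [0.0, 1.0, -0.5, 2.0]
```

Note that `v[t] + d[t]` rebuilds `x[t]` exactly, so the split loses no information. The smooth part is a two-bar average, so it trails price by about half a bar rather than by the longer lag of a typical moving average.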


SECTION 7: How the Brain Thinks (seMLP)

Conceptual Overview

An "MLP" (Multi-Layer Perceptron) is a standard Neural Network: layers of variables that pass data to each other. The problem is that Pine Script limits how long a script may run, so an oversized web of equations will fail with a calculation timeout error.

So, we use a Self-evolving MLP (seMLP), a compact network that prunes its own weak links as it trains (see Section 11). The AI pushes the Wavelet data through this network on every bar. To prevent "dead" neurons that stop firing altogether in a flat market, it uses an activation function called LeakyReLU. It acts as a gatekeeper that tells the neuron: "If this signal is negative, shrink it down to 1%, but don't delete it entirely."

The Code Breakdown

Pine Script®
// The data enters Hidden Layer 1 (h1)
array<float> h1 = array.new<float>(NH1, 0.0)
for j = 0 to NH1 - 1
    float s = B1.get(j)
    // The inner brain loops through all 16 incoming inputs
    for i = 0 to NI - 1
        s += feat.get(i) * W1.get(i, j)
    // LeakyReLU Formula: f(x) = x if x > 0 else 0.01 * x
    // If the signal 's' is positive, keep it. If 's' is negative, shrink to 1%
    h1.set(j, s > 0 ? s : 0.01 * s)
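Here is the full forward pass through all three layers as a minimal, dependency-free Python sketch. It assumes LeakyReLU on both hidden layers and a linear output layer that produces the three Q-values. It also assumes a hypothetical random initialization. None of these details beyond the hidden-layer rule above are confirmed by the script excerpt.

```python
import random
from typing import Callable, List, Optional, Tuple

Matrix = List[List[float]]


def leaky_relu(s: float, slope: float = 0.01) -> float:
    """f(s) = s if s > 0 else 0.01 * s, the same rule as h1.set(...) above."""
    return s if s > 0 else slope * s


def dense(x: List[float], W: Matrix, b: List[float],
          act: Optional[Callable[[float], float]] = None) -> List[float]:
    """One fully connected layer: out[j] = act(b[j] + sum_i x[i] * W[i][j])."""
    out = []
    for j in range(len(b)):
        s = b[j]                      # start from neuron j's bias, like B1.get(j)
        for i in range(len(x)):
            s += x[i] * W[i][j]       # weighted sum of every incoming input
        out.append(act(s) if act else s)
    return out


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, List[float]]:
    """Hypothetical initialisation: small uniform random weights, zero biases."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return W, [0.0] * n_out


rng = random.Random(0)
W1, B1 = init_layer(16, 12, rng)
W2, B2 = init_layer(12, 6, rng)
W3, B3 = init_layer(6, 3, rng)

feat = [rng.gauss(0.0, 1.0) for _ in range(16)]  # stand-in for the 16 normalised features
h1 = dense(feat, W1, B1, leaky_relu)             # hidden layer 1 (12 neurons)
h2 = dense(h1, W2, B2, leaky_relu)               # hidden layer 2 (6 neurons)
q = dense(h2, W3, B3)                            # assumed linear output: Q-values [Buy, Sell, Hold]
print(len(h1), len(h2), len(q))  # 12 6 3
```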


SECTION 8: Taking Action (Exploration vs Exploitation)

Conceptual Overview
How does the AI actually press the BUY or SELL button? For each of the three options (Buy, Sell, and Hold) it calculates a score called a Q-Value, which is its estimate of how much future reward that action will earn. The highest score wins, and that action is executed.
However, during its invisible Burn-In Phase, the AI uses a variable called Epsilon. Think of Epsilon as a dice roll. With probability Epsilon, instead of making the highest-scoring choice, the AI picks a random action just to "experiment" and see whether it leads somewhere better. This is how a reinforcement-learning agent discovers strategies that its current estimates would never choose. As training goes on, Epsilon gets smaller, and the AI gradually stops experimenting.

The Code Breakdown

Pine Script®
// Calculate Epsilon: Start at a high 50% and slowly decay to 5% over time
float epsilon = bar_index <= i_burnIn ? math.max(0.05, i_epsStart_val * ...)
// Roll the dice. If the random number is less than epsilon, we experiment randomly!
bool explore = math.random(0.0, 1.0) < epsilon
// Find the action with the highest Q-value: Q(0) = Buy, Q(1) = Sell, Q(2) = Hold
float bestQ = Q.get(0)   // Start by assuming Buy is the best action
int bestAct = 0
if Q.get(1) > bestQ      // If Sell confidence is higher than current best (Buy)...
    bestQ := Q.get(1)
    bestAct := 1
if Q.get(2) > bestQ      // If Hold is even higher...
    bestQ := Q.get(2)
    bestAct := 2
// Execute the final action: a random one when exploring, otherwise the best one
int act = explore ? math.min(int(math.floor(math.random(0.0, 2.999))), 2) : bestAct
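The exploration step is an epsilon-greedy policy. The script's exact decay formula is elided above, so this Python sketch assumes a hypothetical linear schedule from 50% down to a 5% floor over the burn-in. It also assumes no exploration after burn-in. Ties go to the lower index, as with the Pine `>` comparisons. `epsilon_at` and `choose_action` are illustrative names.

```python
import random
from typing import List


def epsilon_at(bar_index: int, burn_in: int,
               eps_start: float = 0.5, eps_end: float = 0.05) -> float:
    """Hypothetical linear decay from eps_start to eps_end over the burn-in.

    The real schedule is elided in the Pine excerpt; this sketch assumes a
    linear decay and, after burn-in, no exploration at all.
    """
    if bar_index > burn_in:
        return 0.0
    frac = bar_index / burn_in
    return max(eps_end, eps_start + (eps_end - eps_start) * frac)


def choose_action(q: List[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy over [Buy, Sell, Hold]; ties keep the lower index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))  # explore: uniform random action
    best = 0
    for a in range(1, len(q)):
        if q[a] > q[best]:            # strict '>' like the Pine code
            best = a
    return best


rng = random.Random(42)
print(epsilon_at(0, 300), epsilon_at(300, 300), epsilon_at(301, 300))  # 0.5 0.05 0.0
print(choose_action([0.1, 0.5, 0.2], epsilon=0.0, rng=rng))            # 1 (Sell)
```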


SECTION 9: Training with Rewards (Reinforcement Learning)

Conceptual Overview


This is the heart of Reinforcement Learning. It works much like training a pet. If the AI makes a winning trade that generates cash, we give it a mathematical "treat" (a positive reward). If the AI loses money, it receives a harsh negative reward. Over time, the AI refines its neural weights to collect as many "treats" as possible over the long run.

The Code Breakdown

Pine Script®
// Calculate the candle's percentage return
float cRet = nz((close - close[1]) / close[1], 0.0)
// The Reward (R) is a combination of three factors:
// 1. PnL (rPn)   - Did we make raw cash profit?
// 2. Trail (rTn) - Did we efficiently track the trend?
// 3. Lee (rLee)  - A shaping bonus for correct directional positioning.
float R = i_alphaT * rTn + i_alphaP * rPn + 0.1 * rLee
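The reward line is a weighted sum, sketched below in Python. The excerpt does not show how rTn, rPn, and rLee are computed. In the example, the PnL term `pos * cRet`, the trail and Lee values, and the alpha weights are all hypothetical numbers chosen only to show the arithmetic.

```python
from typing import Optional


def candle_return(close: float, prev_close: Optional[float]) -> float:
    """(close - close[1]) / close[1], or 0.0 when there is no usable previous bar (like nz)."""
    if prev_close is None or prev_close == 0.0:
        return 0.0
    return (close - prev_close) / prev_close


def combined_reward(r_trail: float, r_pnl: float, r_lee: float,
                    alpha_t: float, alpha_p: float, lee_weight: float = 0.1) -> float:
    """R = alpha_t * rTn + alpha_p * rPn + 0.1 * rLee, as in the Pine line above."""
    return alpha_t * r_trail + alpha_p * r_pnl + lee_weight * r_lee


# Hypothetical numbers, for illustration only.
pos = 1                                    # currently long
c_ret = candle_return(101.0, 100.0)        # +1% candle
r_pnl = pos * c_ret                        # hypothetical PnL term: position times return
R = combined_reward(r_trail=0.2, r_pnl=r_pnl, r_lee=1.0, alpha_t=0.5, alpha_p=0.5)
print(round(R, 6))  # 0.205
```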


SECTION 10: Learning from Mistakes (Backpropagation)

Conceptual Overview


How does the AI adjust its internal logic when a prediction turns out wrong? It uses a process called Backpropagation. First, it compares the Reward it just received (plus its discounted estimate of the best Q-Value on the next candle) with the Q-Value it had predicted. The gap is the "Error Margin" (how far off was my prediction?), known as the Temporal Difference (TD) error. Backpropagation then passes this error backwards through the network, adjusting all of the internal connections `(W1, W2, W3)` so the next prediction is slightly better.
Updating the whole brain on every single bar makes training noisy and unstable. So we "Accumulate" the gradients over a batch of several candles (i_accumSteps) and then update the brain smoothly with the batch average.

The Code Breakdown

Pine Script®
// Compare the Target Reward vs what the Brain actually Predicted (Temporal Difference Error)
float tgt = R + i_gamma * max_qt
float td  = tgt - pOut.get(prevAct)
// Accumulate the backwards gradients over multiple bars to smooth out noisy single-bar updates
for j = 0 to NO - 1
    gB3_acc.set(j, gB3_acc.get(j) + g3.get(j))
accumCount += 1
// Once 'i_accumSteps' bars have passed, apply the batch update to 'Rewire' the Brain weights!
if accumCount >= i_accumSteps
    for i = 0 to NH2 - 1
        for j = 0 to NO - 1
            float dw = gW3_acc.get(i, j) * sc
            // Gradient step, minus L2 weight decay that pulls large weights toward zero
            W3.set(i, j, W3.get(i, j) + clr * dw - clr * i_l2 * W3.get(i, j))
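Here is a minimal Python sketch of the TD error and the batch-accumulated update for the output weights only. It assumes the chosen action's Q-value is linear in its W3 column, so that column's gradient is `td * h2[i]`. It also assumes that `sc` is the batch average `1 / accum_count`. Backpropagation into W1/W2 is omitted, and the class name is hypothetical.

```python
from typing import List


def td_error(reward: float, gamma: float, max_q_next: float, q_pred: float) -> float:
    """tgt = R + gamma * max_q_next; td = tgt - q_pred (mirrors tgt/td above)."""
    return reward + gamma * max_q_next - q_pred


class OutputLayerTrainer:
    """Hypothetical batch-accumulated TD update for the output weights W3 only.

    Assumptions of this sketch: the Q-value of the chosen action `a` is
    linear in W3[:, a], so its gradient w.r.t. W3[i][a] is h2[i] and the
    accumulated step is td * h2[i]; `sc` is taken to be 1 / accum_count,
    i.e. the batch average. Backpropagation into W1/W2 is omitted.
    """

    def __init__(self, W3: List[List[float]], lr: float, l2: float, accum_steps: int):
        self.W3 = W3
        self.lr = lr
        self.l2 = l2
        self.accum_steps = accum_steps
        self.gW3_acc = [[0.0] * len(row) for row in W3]
        self.accum_count = 0

    def accumulate(self, h2: List[float], action: int, td: float) -> None:
        """Add one bar's gradient; apply the batch once accum_steps bars are collected."""
        for i, h in enumerate(h2):
            self.gW3_acc[i][action] += td * h
        self.accum_count += 1
        if self.accum_count >= self.accum_steps:
            self._apply()

    def _apply(self) -> None:
        sc = 1.0 / self.accum_count  # batch average
        for i, row in enumerate(self.W3):
            for j in range(len(row)):
                dw = self.gW3_acc[i][j] * sc
                # gradient step plus L2 weight decay, like the W3.set(...) line
                row[j] += self.lr * dw - self.lr * self.l2 * row[j]
                self.gW3_acc[i][j] = 0.0  # reset the accumulator for the next batch
        self.accum_count = 0


trainer = OutputLayerTrainer([[0.0] * 3 for _ in range(2)], lr=0.1, l2=0.0, accum_steps=2)
td = td_error(reward=1.0, gamma=0.9, max_q_next=2.0, q_pred=2.0)  # about 0.8
trainer.accumulate([1.0, 2.0], action=0, td=td)  # stored, no update yet
trainer.accumulate([1.0, 2.0], action=0, td=td)  # second bar: batch applied
print(trainer.W3)  # approximately [[0.08, 0.0, 0.0], [0.16, 0.0, 0.0]]
```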


SECTION 11: Link Pruning (Making the Brain Faster)

Conceptual Overview

[Figure: link pruning in three stages. Stage 1: The Initial Brain (Complex & Slow). Stage 2: The Pruning Decision. Stage 3: The Optimized AI (Sleek & Fast).]

As the brain learns, some of its connections become nearly useless. Having a large TradingView indicator calculate hundreds of useless connections on every bar wastes run time and can push the script toward a calculation timeout. At a specific point in training (by default, the end of the 300-bar burn-in period), the script deletes (zeroes out) the weakest neural links, meaning those whose absolute weight falls in the bottom i_prunePct. A zeroed link no longer contributes anything to the output. Note that Pine Script does not skip multiplications by zero on its own: the speed-up comes only where the loops check for a zero weight and skip that term.

The Code Breakdown

Pine Script®
if bar_index == i_pruneBar and not pruned
    // Evaluate every single connection weight...
    // Find the bottom weakest percentage (i_prunePct)
    float thr = absW.get(pidx)
    // Explicitly set the weakest weights to Zero!
    for i = 0 to NI - 1
        for j = 0 to NH1 - 1
            if math.abs(W1.get(i, j)) <= thr
                W1.set(i, j, 0.0)   // Permanent pruning: weak link removed
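The threshold step is elided in the Pine excerpt, so this Python sketch fills it in with a hypothetical version of magnitude pruning. It sorts the absolute weights, takes the k-th smallest as `thr`, and zeroes every weight with `|w| <= thr`, so ties at the threshold are pruned too. A companion helper shows the explicit zero check a loop needs before pruning can actually save work. Both function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def prune_weakest(W: Matrix, prune_pct: float) -> int:
    """Zero out roughly the weakest `prune_pct` fraction of weights in place.

    Hypothetical reconstruction of the elided threshold step: thr is the
    k-th smallest |w| with k = int(prune_pct * n); every |w| <= thr is
    zeroed (ties included). Returns how many weights were zeroed.
    """
    abs_w = sorted(abs(w) for row in W for w in row)
    k = int(prune_pct * len(abs_w))   # how many links to cut
    if k == 0:
        return 0
    thr = abs_w[k - 1]                # |w| of the k-th weakest link
    pruned = 0
    for row in W:
        for j, w in enumerate(row):
            if abs(w) <= thr:
                row[j] = 0.0          # permanent pruning: weak link removed
                pruned += 1
    return pruned


def dot_skip_zeros(x: List[float], W: Matrix, j: int) -> float:
    """Column-j weighted sum that explicitly skips pruned (zero) links."""
    return sum(x[i] * W[i][j] for i in range(len(x)) if W[i][j] != 0.0)


W1 = [[0.5, -0.01, 0.2], [-0.3, 0.02, 0.9]]
print(prune_weakest(W1, 0.5))  # 3
print(W1)                      # [[0.5, 0.0, 0.0], [-0.3, 0.0, 0.9]]
```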



Important Disclaimer

This indicator is published strictly for educational and research purposes. It is a conceptual showcase proving that advanced Deep Reinforcement Learning architectures generally reserved for Python/TensorFlow can be natively executed within the TradingView Pine Script environment. Due to Pine Script's structural time-series limitations—specifically the lack of a random-access historical buffer required for true experience replay—this is NOT intended for practical live trading. For production-grade deployment, it is highly recommended to port this mathematical framework to Python.

References
This indicator's mathematical engine is modeled on the following quantitative research papers:

  • [LEE] Lee et al. (2021) — "Learning to trade in financial time series using high-frequency through wavelet transformation and deep reinforcement learning" (Used for the MODWT Wavelet integration & State architecture).
  • [TSA] Tsantekidis et al. (2021) — "Price Trailing for Financial Trading using Deep Reinforcement Learning" (Used for the dynamic margin-trailing reward system).
  • [SEM] Seow et al. (2021) — "seMLP: Self-evolving Multi-layer Perceptron" (Used for the 16 → 12 → 6 → 3 sparse Neural Network structure and the automatic Link Pruning logic).

Disclaimer

The information and publications are not meant to be, and do not constitute, financial, investment, trading, or other types of advice or recommendations supplied or endorsed by TradingView. Read more in the Terms of Use.