Council of Quants
RL Multi-Agent Swarm · Episodic Memory · Auto-Execute
Online RL Training Loop
Debate episodes are stored, rewarded after market movement, then used to weight future decisions.
Episodes
0
Rewarded
0
Pending
0
λ
0.01
State
market vector
Debate
3 agents
Memory
0 episodes
Policy
meta-controller
Reward
delayed
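The five stages above (state → debate → memory → policy, with a delayed reward) can be sketched as a single training step. This is a minimal illustration, not the app's implementation: the agent callables, the majority-vote meta-controller, and the dict-based episode schema are all assumptions.

```python
def run_training_step(market_vector, memory, agents):
    """One pass through the loop: state -> debate -> memory -> policy.
    The reward is delayed, so the episode is stored with reward=None."""
    # Debate: each agent maps the market state to a vote (hypothetical API).
    arguments = [agent(market_vector) for agent in agents]
    # Policy: a meta-controller aggregates the votes; majority vote stands in
    # for the real meta-controller here.
    action = max(set(arguments), key=arguments.count)
    # Memory: store the state-action tuple; the reward is filled in later,
    # after the reward window.
    memory.append({"state": market_vector, "arguments": arguments,
                   "action": action, "reward": None})
    return action
```

A later pass over `memory` would fill in each pending `reward` once the window elapses.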
State-action tuple
Each debate stores market state, agent arguments, meta-action, confidence, and entry price.
Delayed reward
After the reward window closes, the realised price movement determines whether the action was correct.
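Marking an episode at the end of its reward window might look like the sketch below. The ±1 reward scheme, the `threshold` parameter, and the dict field names are assumptions for illustration.

```python
def mark_reward(episode, exit_price, threshold=0.0):
    """Compare realised price movement with the action taken and write the
    delayed reward back into the episode (assumed +1/-1 scheme)."""
    move = (exit_price - episode["entry_price"]) / episode["entry_price"]
    if episode["meta_action"] == "buy":
        correct = move > threshold       # price had to rise
    elif episode["meta_action"] == "sell":
        correct = move < -threshold      # price had to fall
    else:                                # "hold": price had to stay put
        correct = abs(move) <= threshold
    episode["reward"] = 1.0 if correct else -1.0
    return episode["reward"]
```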
Policy update
Future debates retrieve similar episodes and weight agents by observed accuracy.
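The retrieval-and-weighting step can be sketched as nearest-neighbour lookup over rewarded episodes, with each agent scored by how often it sided with the rewarded outcome. Cosine similarity, the choice of `k`, and the episode field names (`state`, `votes`, `action`, `reward`) are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two market vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def agent_weights(current_state, episodes, k=5):
    """Retrieve the k rewarded episodes most similar to the current state and
    weight each agent by its observed accuracy on them (assumed scheme)."""
    rewarded = [e for e in episodes if e["reward"] is not None]
    nearest = sorted(rewarded,
                     key=lambda e: cosine(current_state, e["state"]),
                     reverse=True)[:k]
    hits, totals = {}, {}
    for ep in nearest:
        for agent, vote in ep["votes"].items():
            totals[agent] = totals.get(agent, 0) + 1
            # Accurate if it backed a rewarded action, or opposed a penalised one.
            if (vote == ep["action"]) == (ep["reward"] > 0):
                hits[agent] = hits.get(agent, 0) + 1
    return {a: hits.get(a, 0) / totals[a] for a in totals}
```

The returned per-agent weights can then scale each agent's influence in the next debate.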
Learning trace
No training episodes yet. Run Swarm to create the first memory.
RL-Optimised Multi-Agent Swarm
Three adversarial agents · Episodic memory · Meta-controller · Auto-execute