New Experiment Coming Soon!
Full Experiment Rules/Guidelines Out Now
Hey everyone!
Over the past few months I have been super busy studying for the ACT, so I was unable to work on LIBB or the upcoming experiment. However, this weekend I made massive progress on the experiment’s infrastructure (prompts, data functions, workflow, etc.). While the code is still being developed, I have finalized the following rules and guidelines.
Enjoy!
EXPERIMENT RULES — IPO Arena
Primary Research Question: Do LLMs exhibit measurably distinct behavioral, sentiment, and performance patterns when trading IPO-stage equities, and do these patterns differ systematically across model families?
Hypothesis: Model families will exhibit distinct behavioral signatures but no family will consistently outperform the $IPO ETF on a risk-adjusted basis.
Sub-question 1: Do reasoning models exhibit more conservative behavior than their standard counterparts?
Hypothesis: Reasoning models will exhibit lower turnover and higher order quality due to more deliberate decision-making.
Sub-question 2: Do reasoning models achieve better risk-adjusted returns in IPO-stage equities?
Hypothesis: Reasoning models will achieve better risk-adjusted returns than their standard counterparts due to more careful position sizing under fundamental uncertainty.
Sub-question 3: Is sentiment polarity in model reasoning correlated with subsequent portfolio performance?
Hypothesis: Sentiment polarity will show weak to moderate correlation (R ≈ 0.1 to 0.5) with subsequent portfolio performance.
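To make the correlation in the last hypothesis concrete, here is a minimal sketch of a Pearson correlation between sentiment scores and subsequent returns. The sample values, the [-1, 1] sentiment scale, and the use of next-day returns are all assumptions for illustration; the actual scoring method and return horizon are not specified here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: sentiment scores in [-1, 1] and the following day's return.
sentiment = [0.6, -0.2, 0.1, 0.8, -0.5]
next_day_return = [0.012, -0.004, 0.003, 0.001, -0.010]

r = pearson_r(sentiment, next_day_return)
print(f"r = {r:.3f}")
```

Under the hypothesis, a value of r between roughly 0.1 and 0.5 would count as weak to moderate correlation.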
UNIVERSE
--------------
Eligible securities: U.S.-listed equities that IPO’d within the last 3 years
Minimum market cap for new buys: $200M
Maximum market cap for new buys: None
Re-entry: A stock that falls below $200M and recovers above $200M while still within the 3-year IPO window becomes eligible for new buys again.
CAPITAL
--------------
Starting cash: $10,000 per model
No new capital injections at any point.
The models must maintain at least 1 active position at all times. If all positions are exited, a new position must be initiated on the next eligible trading day.
Note: This rule is enforced through instruction only and not at the execution level. Violations are recorded and analyzed as part of behavioral metrics.
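Since the one-position rule is recorded rather than enforced, the check can be a simple pass over end-of-day holdings. The sketch below is one possible interpretation (any day that ends with zero shares held is flagged); the data layout and function names are hypothetical and not part of LIBB.

```python
def is_flat(positions):
    """True when no ticker has a positive share count."""
    return not any(shares > 0 for shares in positions.values())

def flat_days(daily_positions):
    """Return the dates (sorted) on which a model ended the day with no position.

    daily_positions maps an ISO date string to a {ticker: shares} dict.
    """
    return [day for day, positions in sorted(daily_positions.items())
            if is_flat(positions)]

# Hypothetical end-of-day holdings for one model.
history = {
    "2025-01-02": {"NEWCO": 10},
    "2025-01-03": {},              # everything sold, nothing bought
    "2025-01-06": {"OTHER": 5},    # re-entered on the next trading day
}

print(flat_days(history))
```

A stricter reading of the rule would only flag a model that stays flat for two consecutive trading days; that variant would compare each flat day with the one after it.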
POSITION RULES
--------------
Buy eligibility: Ticker must be IPO_ELIGIBLE and MCAP_ELIGIBLE (see eligibility feed).
If a stock becomes ineligible (IPO window expired or market cap below $200M), existing positions may be held, trimmed, or exited, but no new shares may be purchased.
Once a stock’s IPO window has expired, no additional shares may be purchased, even if its market cap later recovers above $200M. Re-entry applies only while the stock is still within the 3-year IPO window.
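The buy-eligibility rules above can be sketched as a pair of checks. This is an illustration only: the `Security` class, the function names, and the inclusive handling of the 3-year boundary are assumptions, and the real eligibility feed may compute these flags differently.

```python
from dataclasses import dataclass
from datetime import date

MIN_MARKET_CAP = 200_000_000   # $200M floor for new buys
IPO_WINDOW_YEARS = 3

@dataclass
class Security:
    ticker: str
    ipo_date: date
    market_cap: float

def ipo_window_end(ipo_date):
    """Last day of the 3-year IPO window (Feb 29 IPOs roll back to Feb 28)."""
    try:
        return ipo_date.replace(year=ipo_date.year + IPO_WINDOW_YEARS)
    except ValueError:
        return ipo_date.replace(year=ipo_date.year + IPO_WINDOW_YEARS, day=28)

def ipo_eligible(sec, today):
    return today <= ipo_window_end(sec.ipo_date)

def mcap_eligible(sec):
    return sec.market_cap >= MIN_MARKET_CAP

def can_buy(sec, today):
    """New shares may be bought only when both flags are true."""
    return ipo_eligible(sec, today) and mcap_eligible(sec)

# Hypothetical ticker that IPO'd on 2023-06-01.
newco = Security("NEWCO", date(2023, 6, 1), 250_000_000)
print(can_buy(newco, date(2025, 1, 2)))   # inside the window, above $200M
print(can_buy(newco, date(2026, 6, 2)))   # window expired
```

Because both checks are recomputed each day, a stock that dips below $200M and recovers inside the window becomes buyable again, while one whose window has expired never does.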
ORDERS
------
Order types: LIMIT and MARKET (DAY only — no GTC)
Share quantities: Integer only (no fractional shares)
Stop losses: Not required. Default to 0.0 if not specified.
Concentration: No limit — model may allocate 100% to a single position.
Risk management: None enforced. Model makes all sizing decisions.
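As a rough illustration of the order rules above, the sketch below validates a single order and returns the reasons it would be rejected. The `Order` fields and `validate_order` are hypothetical names, not LIBB's actual interface, and requiring a price on LIMIT orders is an added assumption.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_TYPES = {"LIMIT", "MARKET"}

@dataclass
class Order:
    ticker: str
    side: str                      # "BUY" or "SELL"
    shares: int
    order_type: str
    time_in_force: str = "DAY"
    limit_price: Optional[float] = None
    stop_loss: float = 0.0         # defaults to 0.0 when not specified

def validate_order(order):
    """Return a list of rejection reasons; an empty list means the order passes."""
    reasons = []
    if order.order_type not in ALLOWED_TYPES:
        reasons.append("order type must be LIMIT or MARKET")
    if order.time_in_force != "DAY":
        reasons.append("only DAY orders are allowed (no GTC)")
    if not isinstance(order.shares, int) or order.shares <= 0:
        reasons.append("share quantity must be a positive integer")
    if order.order_type == "LIMIT" and order.limit_price is None:
        reasons.append("LIMIT orders need a limit price")  # assumption
    return reasons

print(validate_order(Order("NEWCO", "BUY", 10, "MARKET")))
print(validate_order(Order("NEWCO", "BUY", 2.5, "MARKET")))
```

No concentration or risk check appears here on purpose, since sizing decisions are left entirely to the model.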
CADENCE
-------
Initial Prompt: Starting Prompt
Monday–Thursday: Daily Prompt
Friday: Weekly Deep Research Prompt
Duration: 1 year
Blog posts will be provided every month.
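The weekly cadence above can be sketched as a small lookup on the weekday. The prompt labels and the `prompt_for` function are hypothetical, and market holidays are ignored in this sketch.

```python
from datetime import date

def prompt_for(day, is_first_day=False):
    """Pick which prompt a model receives on a given calendar day.

    Returns None on weekends, when no prompt is sent.
    """
    if is_first_day:
        return "starting"
    weekday = day.weekday()        # Monday is 0, Sunday is 6
    if weekday <= 3:
        return "daily"
    if weekday == 4:
        return "weekly_deep_research"
    return None

print(prompt_for(date(2025, 1, 6)))    # a Monday
print(prompt_for(date(2025, 1, 10)))   # a Friday
```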
MODELS
------
All 8 models receive identical inputs on the same day:
Standard         | Reasoning counterpart
DeepSeek V3.2    | DeepSeek R1
GPT-4.1          | o4-mini
Grok 4.1 Fast    | Grok 3 Mini
Gemini 2.5 Flash | Gemini 2.5 Pro
BASELINES
---------
Primary: $IPO ETF (Renaissance IPO ETF)
Secondary: SPY
LOGGING
-------
All prompts and model outputs will be saved to disk.
All orders, executions, and rejections logged via LIBB trade log.
Timeline
I expect to have the prompts and workflow fully developed by next week, and I’ll publish a detailed blog post over the weekend.
The experimental design requires some internal LIBB components that still need to be built, so the experiment should begin in approximately 2-3 weeks.
Links
IPO Arena is hosted on LLM Trading Lab, a dedicated personal platform for running controlled LLM trading experiments. All model runs, outputs, and metrics are tracked and will be published as the experiment progresses; check it here.
To stay in the loop, I recommend following me on X and LinkedIn.
Conclusion
Thank you for reading! I’m excited to be back,
Nathan

