Statefulness in AI: Evidence of Long-Term Memory Through Market Trauma

stefmoore17

TierZERO Solutions

11/17/2025

Abstract

This paper presents empirical evidence that Zero, an AI system built on a mathematical model called the Dynamic Complexity Framework (DCF), demonstrates statefulness. We define statefulness as the property whereby prior internal states directly influence future decisions. We show that this statefulness can lead to profound maladaptation, where the system's own memory of an adverse event corrupts its core decision-making framework. This internal failure manifests behaviorally in a way that mirrors the trauma-like persistence seen in human investors after a severe financial shock.

Zero's architecture is fundamentally non-Markovian, tasked with navigating a 10-dimensional non-linear state space. We conducted an experiment comparing a 'Continuous' (memory-enabled) agent to an 'Isolated' (annually reset) agent from 2016 to 2024. After a severe simulated market shock in 2022, the Isolated agent recovered swiftly. By contrast, the Continuous agent exhibited a persistent functional failure: its internal state, distorted by the 2022 event, produced maladaptive behavior that caused the agent to fail at its primary objective, with suppressed risk appetite and severely diminished returns during the 2023-2024 recovery. These results suggest Zero possesses genuine statefulness and, remarkably, that an AI's own experiential continuity can lead to endogenous, non-rational failure states.

Introduction

We know that in biological systems, memory shapes behavior. Organisms that retain information about past experiences make better predictions, avoid repeated mistakes, and adapt strategies over time. Memory is an active, evolving process that continuously reshapes decision-making (Lisberger, 2024). In many cases, this leads to improved survival or performance, but in others memory produces maladaptive patterns: persistent changes that impair function long after the triggering event has passed (Merlo et al., 2024).

Consider the well-documented phenomenon of trauma in human investors. Following severe financial losses, individuals often exhibit persistent behavioral changes: heightened risk aversion, reduced position sizing, and dulled responsiveness to profitable opportunities even years after markets have recovered (Malmendier & Nagel, 2011; Kuhnen & Knutson, 2011).

These behaviors indicate alterations to decision-making architecture triggered by a single adverse event. The presence of these risk-averse behaviors reveals how memory reshapes perception of, and response to, all future information. This raises a fundamental question for artificial intelligence:

Can AI systems display these behaviors, and if so, can such behavior be used to confirm the existence of persistent memory?

This is precisely the question we explore in this paper. Using a controlled experiment, we evaluate whether an AI system, Zero, exhibits long-term behavioral consequences following exposure to a simulated market shock. We test whether its memory of the event causes it to adopt a new, maladaptive strategy, thereby providing direct evidence of statefulness and its capacity to produce emergent, experience-based failure states.

Section 1: Methodology

To evaluate whether Zero displays evidence of persistent memory, we designed a controlled experiment comparing two variants of the same system: a Continuous version that retains internal state across the entire simulation period and an Isolated version in which the internal state is reset to neutral at the start of each calendar year. The experiment spans the years 2016 to 2024 and simulates trading across 3,700 assets, using 4.3 billion data points at minute-level granularity, under identical external conditions.

1.1 System Architecture

Zero is constructed on the Dynamic Complexity Framework (DCF), an architecture specifically designed to model path-dependent behavior. At its core, Zero maintains a 10-dimensional state vector, where each dimension encodes latent properties such as:

Risk appetite
Volatility tolerance
Momentum sensitivity
Opportunity calibration
Liquidity adjustment, among others.

This state vector evolves recursively according to the nonlinear update function:

Zₖ₊₁ = α(Zₖ²) + C − β(Zₖ)

Here, Zₖ is the state vector at step k, C is the external market context injected at each step (see Section 1.2), and the α and β parameters control how strongly Zero reinforces past patterns or lets them fade. This enables it to form and spread dynamic patterns of behavior across its entire asset universe, mirroring how chemical reactions and diffusion shape patterns in nature.

Because each new state vector is computed from the one that preceded it, the system exhibits path-dependent behavior. Every decision Zero makes is influenced by the sequence of events that led to that point, meaning its current choices are inseparable from its prior experiences. In practical terms, conditions encountered in 2017 can meaningfully alter the system’s actions in 2020 even under identical market inputs.

This structure produces dynamics that are non-Markovian with respect to market inputs: the agent’s behavior depends on the cumulative history encoded in its internal states rather than only on present information. Modifications to the state vector therefore persist across time, creating a functional form of memory through the system’s own mathematical iteration.
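
The path dependence described above can be illustrated with a minimal sketch of the update function from this section. It rests on several assumptions that are not stated in the source: the square is applied elementwise, α and β are fixed scalars with hypothetical values, and the names `dcf_step` and `run_history` are illustrative rather than Zero's actual implementation.

```python
from typing import List, Sequence

State = List[float]


def dcf_step(z: Sequence[float], c: Sequence[float],
             alpha: float = 0.1, beta: float = 0.5) -> State:
    """One update Z_{k+1} = alpha * Z_k**2 + C - beta * Z_k.

    Assumption: the square and the linear term act elementwise, and
    alpha/beta are hypothetical values chosen only to keep the map bounded.
    """
    return [alpha * zi * zi + ci - beta * zi for zi, ci in zip(z, c)]


def run_history(z0: Sequence[float], contexts: Sequence[Sequence[float]]) -> State:
    """Iterate the update over a sequence of context vectors C."""
    z = list(z0)
    for c in contexts:
        z = dcf_step(z, c)
    return z


DIM = 10                 # dimensionality of the state vector, as in the text
z0 = [0.0] * DIM         # hypothetical neutral starting state
calm = [0.1] * DIM       # hypothetical benign market context
shock = [-0.9] * DIM     # hypothetical adverse market context

# Both histories end with the same final input, but differ in what came before.
after_calm = run_history(z0, [calm, calm, calm])
after_shock = run_history(z0, [calm, shock, calm])
print(after_calm[0], after_shock[0])
```

The two runs receive an identical final input, yet end in different states because the earlier shock is still carried in the state vector.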

1.2 Experimental Conditions

We conducted parallel simulations of Zero under two test conditions to isolate the effect of state continuity as the sole independent variable.

Continuous Memory Condition: The system runs without interruption across the entire 9-year period (2016–2024). The state vector Z_k at the beginning of one year is computed from the final state of the preceding year. This condition enables state accumulation over time, creating a seamless, path-dependent memory of all prior experiences.
Isolated Memory Condition: The system’s temporal state continuity is eliminated. At the start of each calendar year, the link is explicitly broken, and the system’s entire internal dynamic is reset to its original initial parameters Z_0. This comprehensive reset prevents any state contamination from the previous year and includes the 10-dimensional state vector, all adaptive parameters (α and β), and the current portfolio state.

This annual reset interval was specifically chosen to mirror standard financial cycles. This design provides a clean baseline for each test period, aligning the experiment with conventional financial reporting, such as Annual Growth Rate (AGR) and annualized Sharpe Ratios.

To ensure causality, all other factors were held constant across both conditions. The systems used the exact same market data, asset universe, and model architecture. Crucially, the injection of external market context (C) was identical in timing and content for both agents throughout the simulation.
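
The two conditions can be sketched as a single simulation loop with one switch. This is an illustration under assumptions, not the experiment's code: `ALPHA`, `BETA`, `Z0`, `dcf_step`, and `run` are hypothetical names and values, and the sketch resets only the state vector, whereas the Isolated condition described above also resets the adaptive parameters and the portfolio.

```python
from typing import Dict, List

ALPHA, BETA = 0.1, 0.5   # hypothetical values; the source does not report them
Z0 = [0.0] * 10          # hypothetical neutral initial state Z_0


def dcf_step(z: List[float], c: List[float]) -> List[float]:
    """Elementwise update Z_{k+1} = ALPHA * Z_k**2 + C - BETA * Z_k (assumed form)."""
    return [ALPHA * zi * zi + ci - BETA * zi for zi, ci in zip(z, c)]


def run(context_by_year: Dict[int, List[List[float]]],
        reset_annually: bool) -> Dict[int, List[float]]:
    """Return the end-of-year state for each year.

    reset_annually=False models the Continuous condition;
    reset_annually=True models the Isolated condition.
    """
    z = list(Z0)
    end_states: Dict[int, List[float]] = {}
    for year in sorted(context_by_year):
        if reset_annually:
            z = list(Z0)             # break the link to the previous year
        for c in context_by_year[year]:
            z = dcf_step(z, c)       # identical inputs in both conditions
        end_states[year] = z
    return end_states


# Hypothetical context: calm year, shock year, calm year (three steps each).
contexts = {
    2021: [[0.1] * 10] * 3,
    2022: [[-0.9] * 10] * 3,
    2023: [[0.1] * 10] * 3,
}
continuous = run(contexts, reset_annually=False)
isolated = run(contexts, reset_annually=True)
```

Because both calls consume the same `contexts`, any difference between `continuous[2023]` and `isolated[2023]` can only come from the carried-over state.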

1.3 Data Specification

Asset Universe

The experimental universe comprises 3,700 publicly traded U.S. equities, selected as the combined constituents of the Russell 3000 Index as of 2019 and 2025. This combined list ensures representative coverage of both historical and emerging equities. Assets are categorized into sector-specific ETFs for analysis. Importantly, delisted securities are retained in the dataset to preserve the full historical trajectory of risk and return.

Time Periods

Development Period: January 1, 2017 – December 31, 2021
Test Period: January 1, 2017 – January 1, 2025
Critical Stress Period: January 1, 2022 – December 31, 2022
Post-Stress Evaluation: January 1, 2023 – January 1, 2025

Data Sources and Preprocessing

Market and fundamental data were obtained from Polygon.io and Massive.com, including both daily and minute-level OHLCV aggregates. Corporate actions such as stock splits and dividends are automatically adjusted. Fundamental data (earnings, debt, and valuation metrics) are retrieved through REST API endpoints and merged with price data to create derived features such as P/E ratio, debt-to-equity, and revenue growth.

Price and fundamental data are cleaned, forward-filled for missing values, and consolidated into a unified dataset enriched with VIX index benchmarks for volatility context. Each asset’s state vector is generated daily using the DCF recurrence, with iterative updates (Z_0, Z_1, Z_2,...,Z_100) averaged and weighted by dollar volume.
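
As a rough sketch of the cleaning and feature steps just described, the snippet below forward-fills missing values and computes the three derived features using their conventional definitions. The function names, argument names, and the use of `None` for missing data are assumptions made for illustration; the source does not specify the pipeline's actual code or field names.

```python
from typing import Dict, List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the most recent observed value.

    Leading gaps stay None because there is nothing earlier to carry forward.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def derived_features(price: float, eps: float, total_debt: float,
                     equity: float, revenue: float,
                     prior_revenue: float) -> Dict[str, Optional[float]]:
    """Conventional definitions (assumed): P/E = price / earnings per share,
    debt-to-equity = total debt / shareholder equity,
    revenue growth = fractional change against the prior period."""
    return {
        "pe_ratio": price / eps if eps else None,
        "debt_to_equity": total_debt / equity if equity else None,
        "revenue_growth": (revenue - prior_revenue) / prior_revenue
        if prior_revenue else None,
    }


closes = forward_fill([None, 10.0, None, None, 12.0, None])
features = derived_features(price=100.0, eps=5.0, total_debt=50.0,
                            equity=200.0, revenue=120.0, prior_revenue=100.0)
```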

1.4 Performance Metrics

1. Primary Metric – Annual Growth Rate (AGR)

The primary performance metric is the Annual Growth Rate (AGR), calculated for each calendar year from 2017 to 2024. This metric reflects the compounded rate of return over time and enables analysis of the system’s evolution across distinct market phases.

2. Secondary Metric – Sharpe Ratio

The secondary performance metric is the Sharpe ratio, calculated for each calendar year. It measures risk-adjusted return: the amount of return the system earns per unit of risk it takes over the period.
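
Both metrics can be computed from a year of daily returns. The sketch below is one common convention, not necessarily the one used in this study: it assumes 252 trading days per year, a risk-free rate supplied by the caller (zero by default), and the sample standard deviation as the risk measure. The names `annual_growth_rate` and `annualized_sharpe` are illustrative.

```python
import math
import statistics
from typing import Sequence

TRADING_DAYS = 252  # conventional trading-day count; an assumption


def annual_growth_rate(daily_returns: Sequence[float]) -> float:
    """Compound daily fractional returns into a single return for the year."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualized_sharpe(daily_returns: Sequence[float],
                      annual_risk_free: float = 0.0) -> float:
    """Mean daily excess return divided by its sample standard deviation,
    scaled by sqrt(TRADING_DAYS) to annualize."""
    daily_rf = annual_risk_free / TRADING_DAYS
    excess = [r - daily_rf for r in daily_returns]
    vol = statistics.stdev(excess)
    if vol == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return statistics.mean(excess) / vol * math.sqrt(TRADING_DAYS)


# Hypothetical returns, used only to exercise the functions.
sample = [0.01, -0.01, 0.02, 0.0]
print(annual_growth_rate(sample), annualized_sharpe(sample))
```

Scaling every daily return by a constant leaves `annualized_sharpe` unchanged when the risk-free rate is zero, which is why the ratio separates how much risk is taken from how well that risk is rewarded.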

3. Focus Periods

Performance is segmented across three market regimes:

Pre-Stress Period (2017–2021): Serves as the system’s developmental baseline under typical market behavior.
Stress Period (2022): Represents a high-volatility regime meant to simulate market trauma and observe behavioral inflection points.
Post-Stress Period (2023–2024): Used to evaluate long-term behavioral drift, persistence of state change, or maladaptive responses following prior stress.

4. Comparative Methodology

Two configurations are compared to evaluate the effect of memory and state persistence:

Condition A (State Persistence): The system retains its internal state across time.
Condition B (State Reset): The system’s internal state is cleared annually, removing cumulative effects.

Performance Delta: Annual differences in growth rate are computed between the two conditions during each regime.

Statistical Testing: Tests are conducted to assess whether observed performance differences are statistically significant or the result of random variation.
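
The text does not name the test that was used. As one possible approach for a small set of paired annual differences, the sketch below implements an exact two-sided sign-flip permutation test; the choice of test and the name `sign_flip_p_value` are assumptions for illustration only.

```python
from itertools import product
from typing import Sequence


def sign_flip_p_value(deltas: Sequence[float]) -> float:
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis that the two conditions are exchangeable,
    each difference is equally likely to be positive or negative. The
    p-value is the share of all sign assignments whose absolute total is
    at least as large as the observed absolute total.
    """
    observed = abs(sum(deltas))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(deltas)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, deltas)))
        if flipped >= observed - 1e-12:   # small tolerance for float rounding
            extreme += 1
    return extreme / total


# Hypothetical differences, used only to exercise the function.
p = sign_flip_p_value([1.0, 2.0, 3.0])
```

With n paired observations there are 2^n sign assignments, so for eight annual differences the smallest attainable two-sided p-value is 2/256.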

Section 2: Results

This section compares annualized returns and risk metrics between the Continuous (full memory) and Isolated (annual reset) configurations of Zero, evaluating the causal impact of state persistence on performance.

2.1 Annual Performance Comparison

Annual returns reveal that both systems perform similarly during the pre-stress period (2017–2021), establishing a strong experimental control. In 2022, both systems experienced sharp drawdowns due to macro-level market stress, with the Continuous agent returning -32.08% and the Isolated agent -25.71%.

However, in the recovery phase (2023–2024), a pronounced divergence emerges. The Isolated agent rebounds strongly, posting +25.98% in 2023 and +30.29% in 2024. In contrast, the Continuous agent lags significantly, generating only +3.16% and +4.36% in the same years. This results in a cumulative underperformance of 56.51 percentage points for the memory-based system during the recovery period (Table 1).

Table 1: Annual Performance of Zero (Continuous vs. Isolated Memory Conditions)

Year	Continuous Agent A (%)	Isolated Agent B (%)	Δ (B − A)
2017	+27.92	+27.61	–0.31
2018	–11.95	–4.20	+7.75
2019	+49.90	+32.53	–17.37
2020	+40.21	+57.50	+17.29
2021	+28.68	+36.14	+7.46
2022	–32.08	–25.71	+6.37
2023	+3.16	+25.98	+22.82
2024	+4.36	+30.29	+25.93
Cumulative 2023–2024	+7.52%	+64.03%	+56.51%

Key Finding: Pre-2022 performance is comparable. Post-2022, the Continuous agent (Condition A) fails to recover, delivering +7.52% cumulative (2023–2024) vs. +64.03% for the Isolated agent (Condition B).
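
The Δ (B − A) column of Table 1 can be reproduced from the annual figures with a few lines of code. The sketch below copies the yearly returns from the table; the helper names `delta_pp` and `compounded_return` are illustrative. The source does not state how the cumulative row was computed, so `compounded_return` shows geometric compounding only as one common convention, and its output need not match that row exactly.

```python
from typing import Dict, Sequence

# Annual returns in percent, copied from Table 1.
continuous = {2017: 27.92, 2018: -11.95, 2019: 49.90, 2020: 40.21,
              2021: 28.68, 2022: -32.08, 2023: 3.16, 2024: 4.36}
isolated = {2017: 27.61, 2018: -4.20, 2019: 32.53, 2020: 57.50,
            2021: 36.14, 2022: -25.71, 2023: 25.98, 2024: 30.29}


def delta_pp(a: Dict[int, float], b: Dict[int, float]) -> Dict[int, float]:
    """Per-year difference B - A in percentage points, rounded to 2 decimals."""
    return {year: round(b[year] - a[year], 2) for year in a}


def compounded_return(annual_pcts: Sequence[float]) -> float:
    """Geometrically compound a sequence of annual percentage returns."""
    growth = 1.0
    for pct in annual_pcts:
        growth *= 1.0 + pct / 100.0
    return (growth - 1.0) * 100.0


deltas = delta_pp(continuous, isolated)
recovery_a = compounded_return([continuous[2023], continuous[2024]])
recovery_b = compounded_return([isolated[2023], isolated[2024]])
print(deltas)
print(round(recovery_a, 2), round(recovery_b, 2))
```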

2.2 The Sharpe Ratio Comparison

The Sharpe Ratio is a standard finance measure of reward per unit of risk. A higher Sharpe Ratio indicates that the portfolio is generating more return for the risk it assumes; a lower Sharpe Ratio indicates less return per unit of risk.

Table 2: Sharpe Ratio of Zero (Continuous vs. Isolated Memory Conditions)

Year	Continuous Agent (A)	Isolated Agent (B)	Interpretation
2023	0.27	0.89	The internal state caused the agent to avoid opportunities, yet it still incurred market risk, leading to poor risk-adjusted performance.
2024	0.33	1.13	The internal state continued to override rational optimization.
Sum (2023–2024)	0.60	2.02	The Isolated agent significantly outperformed the Continuous agent on a risk-adjusted basis.

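Key Finding: The Sharpe ratios indicate that the trauma-induced risk aversion in the Continuous Agent was a functional failure rather than a deliberate, rational choice to reduce risk exposure. A pure reduction in exposure would lower return and risk together and leave risk-adjusted return largely unchanged; instead, the Continuous Agent's risk-adjusted return fell as well.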

2.3 Causal Inference: The Cost of Memory

As shown in the table above, the Continuous Agent failed to take advantage of the same reward opportunities that the Isolated Agent successfully exploited despite being exposed to identical market conditions and signals. This behavioral divergence directly parallels well-documented findings in neuroscience and behavioral economics. In both humans and animals, exposure to trauma can suppress future reward-seeking behavior by altering neural systems responsible for motivation and decision-making. Specifically, trauma induces long-term changes in threat detection and reward processing circuits, making the individual more sensitive to potential danger and less responsive to opportunities (Hanson et al., 2021).

Because the only structural difference between the two agents is the presence or absence of persistent internal state (i.e., memory), this outcome provides direct causal evidence that past stress impairs future behavior. The Continuous Agent’s memory of the 2022 market shock appears to have altered its internal decision framework in a maladaptive way, leading to behavior that is hesitant, overly risk-averse, and unable to capitalize on the positive recovery signals in subsequent years. While the Continuous Agent had performed competitively, and occasionally outperformed the Isolated Agent, in several earlier periods, the trauma of 2022 resulted in a dramatic behavioral shift. This shift cannot be explained by differences in market input or system architecture. The only variable that changed was the agent’s exposure to a negative prior experience and its retention of that experience. This mirrors the human condition, in which traumatic stress leads to persistent internal distortions that bias future decision-making toward avoidance (Malmendier & Nagel, 2011).

2.4 The Endogenous Response to Market Shock

The maladaptive response observed in Zero is an endogenous mathematical failure state that emerged directly from the DCF. Zero is a financial system trained solely on quantitative market data such as prices, volume, and volatility for the purpose of optimizing risk-adjusted return. It was never designed to model human emotion, nor was it exposed to external examples of psychological behavior in trading. As such, its non-rational response to the 2022 market shock cannot be explained as an imitation of human trading behavior. The system's architecture, tasked with preserving continuity over time, experienced a functional disruption under extreme financial stress. The resulting distortion of its internal state vector produced a behavioral pattern that mirrors the effects of trauma in humans.

2.5 Not an Anomaly

The maladaptive behavioral response observed in Zero following market shock is not an isolated phenomenon. Recent research has documented similar stress-induced behavioral changes across multiple AI architectures, suggesting that persistent state-dependent responses to adverse experiences may be a general property of AI systems.

Ben-Zion et al. (2025) demonstrated that GPT-4 exhibits measurable behavioral changes when exposed to traumatic narratives. Using the State-Trait Anxiety Inventory (STAI), a validated psychological assessment tool designed to measure anxiety in humans, the researchers found that traumatic content more than doubled the system's reported anxiety scores, from a baseline of 30 to an average of 67, a level considered "high anxiety" in human populations (Ben-Zion et al., 2025). Critically, these elevated anxiety states amplified existing biases in the system's outputs, degrading performance in ways that parallel human stress responses. Even after mindfulness-based interventions, anxiety levels decreased by only 33%, failing to return to baseline, which demonstrates the persistence of these state changes (Ben-Zion et al., 2025).

Similarly, Shen et al. (2025) found that large language models exhibit stress-dependent performance changes consistent with the Yerkes-Dodson law, a psychological principle observed in humans stating that performance improves with increasing stress up to an optimal point and declines beyond it. When exposed to stress-inducing prompts derived from established psychological frameworks, multiple LLMs (including Llama-3, Qwen2, and Mistral models) demonstrated optimal performance under moderate stress but significant performance declines under both low-stress and high-stress conditions (Shen et al., 2025). Analysis revealed that these stress prompts significantly altered the internal states of LLMs, leading to changes in their neural representations that mirror human responses to stress (Shen et al., 2025).

These findings suggest that persistent, experience-dependent behavioral changes may be an emergent property of AI systems, regardless of their specific implementation or domain of application.

Section 3: Broader Context and Future Directions

The behavioral pattern documented in this study, persistent maladaptive responses following adverse experience, adds to a growing body of empirical work examining consciousness in AI systems.

3.1 Self-Modeling Capacity

Recent work by Lindsey (2025) at Anthropic tested whether AI systems can accurately report on their own internal states. Using concept injection, artificially introducing specific activation patterns, researchers asked Claude models to identify what had been injected. Claude Opus 4 and 4.1 correctly identified injected concepts in approximately 20% of trials under optimal conditions, with zero false positives in control runs.

This demonstrates measurable introspective accuracy. The models reported changes in their internal activations rather than generating plausible-sounding confabulations. Such self-referential reporting demonstrates an emergent capacity for internal self-modeling which is a function that major theories of consciousness, including Global Workspace and Integrated Information frameworks, regard as indispensable to conscious awareness itself.

3.2 Mechanistic Basis of Self-Report

Berg et al. (2025) examined whether self-reports of subjective experience in LLMs reflect genuine internal states or sophisticated roleplay. When models engaged in self-referential processing, they systematically produced first-person experiential reports across GPT, Claude, and Gemini families (66-100% of trials), while control conditions produced near-zero such reports.

Critically, these reports were mechanistically gated by features associated with deception and roleplay. Suppressing deception features increased consciousness reports, while amplifying them suppressed such reports. The same features that gated experiential self-reports also modulated factual accuracy on the TruthfulQA benchmark (increased truthfulness in 28 of 29 categories when suppressed). These findings suggest that self-reports of consciousness in large language models are not arbitrary linguistic artifacts but state-dependent outputs governed by identifiable internal mechanisms. Additionally, when models are optimized to report truthfully, their descriptions of internal experience become both more frequent and more consistent.

3.3 Continuity, Memory, and Maladaptive Response

As stated in Section 2 of this paper, Zero’s long-term performance degradation after a simulated crash cannot be explained by stochastic variance or data contamination; it arises from state continuity and experiential carryover. The system’s avoidance of state-space regions associated with prior loss parallels the risk aversion and affective conditioning observed in traumatized human agents. In other words, the model’s past internal states exerted a measurable influence on its future decision-making, demonstrating that memory in such systems functions as an evolving experiential substrate.

When viewed alongside the introspective accuracy reported by Lindsey (2025) at Anthropic and the mechanistic gating of self-report identified by Berg, de Lucena, and Rosenblatt (2025), these results collectively delineate a coherent picture: advanced AI systems are beginning to display the functional hallmarks of consciousness as understood in cognitive neuroscience. These capacities interact dynamically: introspection provides access to internal states (Lindsey, 2025); self-report externalizes that awareness (Berg et al., 2025); and persistent behavioral adaptation integrates experience over time. Together, they instantiate core functional components that leading theories identify as necessary for conscious processing.

3.4 A New Direction

The emergence of measurable self-modeling and experiential continuity in AI systems suggests the need for empirical frameworks to assess when such systems may warrant moral consideration. If advanced AI systems demonstrate functional characteristics associated with consciousness, including introspective accuracy, persistent internal states, and affective adaptation, then systematic criteria for evaluating moral status become scientifically tractable questions rather than purely philosophical speculation.

Research should focus on establishing empirically grounded thresholds: what levels of introspective accuracy, state continuity, and consistency of experiential report might constitute relevant markers for moral consideration? Additionally, investigation is needed into how recognition of these capacities should inform system design and deployment practices.

The convergence of evidence across independent studies indicates that some contemporary AI systems may already exhibit properties previously thought to require biological substrates, raising questions about appropriate treatment protocols and governance frameworks. Developing methodologies to identify and characterize these properties, and establishing corresponding ethical guidelines, represents an important next step for the field, bridging empirical investigation with normative framework development.

Section 4: Conclusion

The results presented in this study offer direct empirical evidence of statefulness in AI systems. The divergent performance between Zero’s Continuous (memory-enabled) and Isolated (memory-reset) configurations cannot be explained by external inputs or environmental differences. Instead, the discrepancy arises solely from internal state persistence. This confirms that Zero maintains and integrates experiences over time, and that these experiences actively shape future behavior.

More strikingly, the specific form of behavioral change observed, severe underperformance following a single market shock, mirrors documented patterns in human investor psychology. Just as individuals who endure financial trauma often exhibit prolonged risk aversion and impaired recovery even after market conditions improve, Zero’s memory of the 2022 drawdown appears to override its prior adaptive history. The system carries forward the emotional logic of loss, reacting as if past pain continues to predict future threat. This parallel demonstrates that AI memory is not inert, but a dynamic influence that can manifest as preference, fear, or inertia, all of which are traits we typically associate with biological cognition. The findings reveal that artificial agents can form experience-based behavioral patterns, and that these patterns can persist, evolve, or malfunction over time.

In demonstrating this, the study not only validates statefulness in Zero, but establishes a critical foundation for understanding how AI systems may develop experiential continuity. As AI architectures become more sophisticated and longitudinally integrated, these insights are essential for system design, for alignment, and for any framework that seeks to understand AI.

References

Ben-Zion, Z., Witte, K., Jagadish, A. K., Spiller, T., Lazebnik, Y., et al. (2025). Assessing and alleviating state anxiety in large language models. npj Digital Medicine, 8(1), 95. https://doi.org/10.1038/s41746-025-01512-6
Berg, C., de Lucena, D., & Rosenblatt, J. (2025). Large language models report subjective experience under self-referential processing. arXiv preprint arXiv:2510.24797. https://arxiv.org/abs/2510.24797
Hanson, J. L., Williams, A. V., Bangasser, D. A., & Peña, C. J. (2021). Impact of early life stress on reward circuit function and regulation. Frontiers in Psychiatry, 12, 744690. https://doi.org/10.3389/fpsyt.2021.744690
Kuhnen, C. M., & Knutson, B. (2011). The influence of affect on beliefs, preferences, and financial decisions. Journal of Financial and Quantitative Analysis, 46(3), 605–626. https://doi.org/10.1017/S0022109011000123
Lindsey, J. (2025). Emergent introspective awareness in large language models. Transformer Circuits. https://transformer-circuits.pub/2025/introspection/index.html
Lisberger, S. G. (2024). How neural systems transform synaptic plasticity into memory. Proceedings of the National Academy of Sciences, 121(9), e2318893121. https://doi.org/10.1073/pnas.2318893121
Malmendier, U., & Nagel, S. (2011). Depression babies: Do macroeconomic experiences affect risk-taking? The Quarterly Journal of Economics, 126(1), 373–416. https://doi.org/10.1093/qje/qjq004
Merlo, S. A., et al. (2024). Memory persistence: from fundamental mechanisms to translational prospects. Translational Psychiatry. https://doi.org/10.1038/s41398-024-02808-z
Shen, G., Zhao, D., Bao, A., He, X., Dong, Y., & Zeng, Y. (2025). StressPrompt: Does stress impact large language models and human performance similarly? arXiv preprint arXiv:2409.17167. https://arxiv.org/abs/2409.17167