Alpha Analytics - M-SEAM Earnings Prediction Model

Project Overview

The goal of this project was to build a model that cuts through market noise and helps users trade earnings events with greater confidence. We focused on three major tech companies: Apple, Nvidia, and Google, to explore repeatable price patterns. Our analysis centres on three core questions:

How do stock prices typically behave around earnings windows?
What financial metrics (e.g. revenues, margins) most impact those movements?
How do macroeconomic conditions (e.g. inflation, interest rates, and unemployment) interact with these patterns?

Toolkit

Python (Jupyter Notebook): data analysis and visualization

Pandas & Scikit-learn: data cleaning, exploration, and machine learning models

Replit: web application for user-friendly and interactive model usage

Methodology

1. Data Preparation

The analysis uses data from January 2015 to August 2025, a period chosen to provide the quantity of data required for modelling. After research and quality evaluation, candidate data sources were ranked by availability, cleanliness, quality and price. FRED, yFinance, Alpha Vantage, Kaggle and Reddit were chosen.
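One step that combining these sources usually involves is aligning lower-frequency macro data with individual earnings dates. The sketch below shows one way this could be done with pandas; it is not the project's actual pipeline. The column names, the helper `attach_latest_macro`, and all numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical earnings dates for one stock (illustrative values only).
earnings = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "AAPL"],
    "earnings_date": pd.to_datetime(["2024-02-01", "2024-05-02", "2024-08-01"]),
})

# Hypothetical macro readings, stamped with the date each became available.
macro = pd.DataFrame({
    "release_date": pd.to_datetime(
        ["2024-01-11", "2024-04-10", "2024-07-11", "2024-08-14"]
    ),
    "cpi_yoy": [3.4, 3.5, 3.0, 2.9],
})

def attach_latest_macro(earnings, macro):
    """Attach the most recent macro value released on or before each earnings date."""
    left = earnings.sort_values("earnings_date")
    right = macro.sort_values("release_date")
    # direction="backward" never picks a release dated after the earnings
    # date, which keeps future information out of the features.
    return pd.merge_asof(
        left,
        right,
        left_on="earnings_date",
        right_on="release_date",
        direction="backward",
    )

merged = attach_latest_macro(earnings, macro)
print(merged[["earnings_date", "release_date", "cpi_yoy"]])
```

Keying the join on the release date rather than the reference month is a design choice in this sketch: it means each event only sees macro data that had been published by then.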

2. Exploratory Data Analysis (EDA)

Initial exploration used histograms and box plots to identify outliers, and skewness tests to assess how far each distribution departed from normality. Linear and non-linear relationships were explored through correlations, scatterplots, simple and multiple linear regression (SLR, MLR), decision trees, and random forests. A sentiment analysis was also run for all three stocks using Reddit discussion data, allowing the team to assess shifts in retail investor sentiment and examine how they corresponded with stock price behaviour around earnings announcements.
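As a rough illustration of the outlier and skewness checks described here, the sketch below applies the standard box-plot rule (points beyond 1.5 times the interquartile range) and computes sample skewness with pandas. The helper name `iqr_outliers` and the return values are illustrative assumptions, not the project's data.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return the points a box plot would flag: beyond k * IQR from the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]

# Hypothetical daily returns with one extreme move (illustrative values only).
returns = pd.Series([0.01, -0.02, 0.015, 0.005, -0.01, 0.02, 0.25])

outliers = iqr_outliers(returns)
skew = returns.skew()  # values well above 0 suggest a long right tail

print(outliers.tolist())
print(round(skew, 2))
```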

3. Key Trends & Preliminary Insights

Each stock was driven by different factors, so accurate forecasting required a feature-weighting approach tailored to each one. Apple's share price showed a strong correlation with sentiment, NVIDIA's was more seasonal, and Google's showed a high degree of autocorrelation (correlation with its own past values).
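Two of these diagnostics, correlation with sentiment and correlation of a series with its own past, can be measured directly in pandas. The sketch below is a minimal example under assumed inputs; `factor_profile` is a hypothetical helper and the series are made up. It does not cover the seasonality check.

```python
import pandas as pd

def factor_profile(returns, sentiment, lag=1):
    """Summarise how a return series relates to its own past and to sentiment."""
    return {
        # Correlation between the series and itself shifted by `lag` periods.
        "autocorr": returns.autocorr(lag=lag),
        # Pearson correlation with a sentiment series on the same index.
        "sentiment_corr": returns.corr(sentiment),
    }

# Hypothetical, perfectly trending inputs (illustrative values only).
returns = pd.Series([0.01, 0.02, 0.03, 0.04, 0.05])
sentiment = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5])

profile = factor_profile(returns, sentiment)
print(profile)
```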

4. Building the Model

The framework rests on the theory that earnings drive reproducible price patterns in the windows before and after announcements. We quantify those patterns with five subscores (EPS, macro, volatility, sentiment, revenue) and blend them into a composite score. To limit the influence of outliers and put different data types on a common scale, the pipeline applies validation, scaling, filtering, temporal controls, missing-data handling, cross-validation, synthetic data, and QA.
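The blending step can be sketched as a weighted sum of scaled subscores. The example below is one possible reading, not the project's implementation. The median/IQR scaling with clipping, the helper names, and the weights are all assumptions chosen for illustration.

```python
import pandas as pd

SUBSCORES = ["eps", "macro", "volatility", "sentiment", "revenue"]

def robust_scale(col, clip=3.0):
    """Centre on the median, scale by the IQR, and clip to limit outlier influence."""
    median = col.median()
    iqr = col.quantile(0.75) - col.quantile(0.25)
    if iqr == 0:
        return pd.Series(0.0, index=col.index)
    return ((col - median) / iqr).clip(-clip, clip)

def composite_score(features, weights):
    """Blend scaled subscores into one score per event using normalised weights."""
    # Note: scaling over the full sample is for brevity only. In a live
    # setting the median and IQR should come from past events alone.
    scaled = features[SUBSCORES].apply(robust_scale)
    w = pd.Series(weights).reindex(SUBSCORES)
    w = w / w.sum()
    return scaled.mul(w, axis=1).sum(axis=1)

# Hypothetical raw subscores for five events (illustrative values only).
features = pd.DataFrame({name: [1.0, 2.0, 3.0, 4.0, 100.0] for name in SUBSCORES})
# Hypothetical per-stock weights; the real weightings are not reproduced here.
weights = {"eps": 0.3, "macro": 0.2, "volatility": 0.1, "sentiment": 0.2, "revenue": 0.2}

score = composite_score(features, weights)
print(score.tolist())
```

The last event has an extreme raw value of 100, yet its composite score is capped by the clip, which is the outlier-limiting behaviour the paragraph describes.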

5. Technical Implementation

Adaptive thresholds are the default "engine" because they are transparent and remain robust when only a limited number of real earnings events is available. Time-Series Cross-Validation (TS-CV) and cross-stock validation are activated when stronger stock-specific optimization or a stress test of generalization is needed. Synthetic data is used to augment training but never validation, with strict checks to avoid look-ahead bias.
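A minimal sketch of time-series cross-validation with training-only augmentation follows. It uses scikit-learn's `TimeSeriesSplit`, which always places validation events after the training events. The augmentation method (jittered copies of training rows), the helper names, and the random data are assumptions, since the project's synthetic-data method is not described here.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def ts_cv_folds(n_real_events, n_splits=3):
    """Yield (train_idx, val_idx) pairs over chronologically ordered real events."""
    splitter = TimeSeriesSplit(n_splits=n_splits)
    yield from splitter.split(np.zeros((n_real_events, 1)))

def augment_training(X_train, rng, n_synthetic=20, noise=0.05):
    """Hypothetical augmentation: append jittered copies of training rows only."""
    picks = rng.integers(0, len(X_train), size=n_synthetic)
    jitter = rng.normal(0.0, noise, size=(n_synthetic, X_train.shape[1]))
    return np.vstack([X_train, X_train[picks] + jitter])

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))  # 12 hypothetical events x 5 subscores, oldest first

folds = []
for train_idx, val_idx in ts_cv_folds(len(X)):
    X_train = augment_training(X[train_idx], rng)  # real + synthetic rows
    X_val = X[val_idx]                             # real events only
    folds.append((train_idx, val_idx, len(X_train), len(X_val)))
    print(f"train up to event {train_idx.max()}, validate on {val_idx.tolist()}")
```

Because synthetic rows are generated from each fold's training rows after the split, no validation event can leak into training through the augmentation.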

[Figure: Composite subscore weightings by stock (EPS, Sentiment, Macro, Volatility, and Revenue) for Google, Apple, and Nvidia, showing each company's unique factor sensitivity]

Model Insights

The model identifies predictable return tendencies for each stock in narrow windows around earnings. Rather than applying a blanket "buy-before" rule, we generate long/short signals by passing the composite subscores through adaptive, event-level thresholds.
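One way an adaptive, event-level threshold could work is sketched below: each event's composite score is compared with quantiles of the scores from earlier events only. The quantile levels, the minimum history, and the function name `adaptive_signals` are assumptions for illustration; the project's actual threshold rule may differ.

```python
import pandas as pd

def adaptive_signals(scores, upper_q=0.7, lower_q=0.3, min_events=4):
    """Label each event long/short/flat against thresholds learned from prior events."""
    # shift(1) drops the current event from its own threshold (no look-ahead).
    past = scores.shift(1)
    upper = past.expanding(min_periods=min_events).quantile(upper_q)
    lower = past.expanding(min_periods=min_events).quantile(lower_q)

    signal = pd.Series("flat", index=scores.index)
    signal[scores > upper] = "long"
    signal[scores < lower] = "short"
    return signal  # events with too little history stay "flat"

# Hypothetical composite scores in chronological order (illustrative values only).
scores = pd.Series([0.0, 1.0, 2.0, 3.0, 5.0, -1.0, 1.5])
signals = adaptive_signals(scores)
print(signals.tolist())
# ['flat', 'flat', 'flat', 'flat', 'long', 'short', 'flat']
```

The thresholds move as more events accumulate, so a score that triggers a signal early on may not do so once the history widens.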

AAPL long hit rate: ~60%

GOOGL long hit rate: ~58%
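For reference, a hit rate of this kind is typically the share of signals on one side whose realized return went in the signalled direction. The sketch below assumes that definition; the function name and the sample values are illustrative and are not the project's results.

```python
import pandas as pd

def hit_rate(signals, realized_returns, side="long"):
    """Fraction of `side` signals whose realized return had the signalled sign."""
    direction = 1 if side == "long" else -1
    taken = signals == side
    if taken.sum() == 0:
        return float("nan")
    wins = (realized_returns[taken] * direction) > 0
    return float(wins.mean())

# Hypothetical signals and window returns (illustrative values only).
signals = pd.Series(["long", "long", "flat", "short", "long"])
realized = pd.Series([0.02, -0.01, 0.03, -0.04, 0.01])

long_rate = hit_rate(signals, realized, side="long")
short_rate = hit_rate(signals, realized, side="short")
print(round(long_rate, 3), short_rate)
```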

Apple (AAPL)

T-5→T+5 long bias. Long-term stability and revenue-driven results lend themselves to a longer trade window, especially on positive beats.

NVIDIA (NVDA)

T-1→T+1 short bias. Exhibits a "sell the news" effect, where positive pre-earnings subscores were often followed by negative post-earnings returns.

Google (GOOGL)

High degree of autocorrelation over time. Benefits from the composite scoring approach with macro adjustments.
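The window notation used for each stock (for example T-5→T+5 or T-1→T+1) can be turned into a return calculation roughly as follows. This sketch assumes windows are counted in trading days and measured close to close, and that an earnings date falling on a non-trading day maps to the next trading day. The function name and prices are hypothetical.

```python
import pandas as pd

def window_return(close, event_date, before, after):
    """Close-to-close return from T-before to T+after, counted in trading days."""
    # Position of the event day, or of the next trading day if the date is missing.
    pos = close.index.searchsorted(pd.Timestamp(event_date))
    start, end = pos - before, pos + after
    if start < 0 or end >= len(close):
        raise ValueError("window extends beyond available prices")
    return close.iloc[end] / close.iloc[start] - 1.0

# Hypothetical closing prices on 21 business days (illustrative values only).
dates = pd.bdate_range("2024-01-01", periods=21)
close = pd.Series([float(p) for p in range(100, 121)], index=dates)
event = dates[10]

wide = window_return(close, event, before=5, after=5)    # T-5 -> T+5
narrow = window_return(close, event, before=1, after=1)  # T-1 -> T+1
print(round(wide, 4), round(narrow, 4))
```

A long position earns the computed return, while a short position over the same window earns its negative, before costs.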

[Figure: System decision analysis. Strategy performance dashboard for Alphabet, NVIDIA, and Apple showing trading signals, hit rates, returns, and best strategy recommendations by stock]

Future Areas to Explore

Include additional stocks and automate threshold learning for cross-sector applications
Apply the engine optimization to companies in other sectors to increase data points and model breadth
Stress-test under extreme conditions and simulate threshold adjustment under macro shocks
Backtest using a dummy portfolio of at least 50 stocks to refine the model and increase accuracy
Expand the model to react to other events in real time, 24/7
Use an enhanced platform for pipeline integration and real-time data refresh

Conclusion

The analysis found that each stock's price is influenced by a different set of factors. Once identified, those factors can be used to build a model that predicts returns for specific windows around earnings events, with up to 60% accuracy for trade direction and profitability.

Each company has its own optimal window around earnings events. Apple's long-term stability and revenue-driven results lend themselves to a longer trade window, especially on positive beats. In contrast, Nvidia behaves in a volatile and sometimes contradictory way, with the stock falling on positive beats, so shorting over a short trading window can be profitable. When these windows are combined with share and macro information, successful trades can be reliably and repeatedly achieved.
