Stock Screener Architecture¶
Technical documentation for the AI-powered Stock Screener implementation.
Overview¶
The Stock Screener is a Streamlit-based application that combines traditional filtering with AI-powered natural language query interpretation. It integrates with the Trading System API for market data and uses Ollama (local LLM) for intelligent query processing.
Architecture¶
┌─────────────────────────────────────────────────────────────┐
│ Streamlit UI (pages/3_Screener.py) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Natural Lang │ │ Traditional │ │ Results + │ │
│ │ Query Tab │ │ Filters Tab │ │ Chat Panel │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└─────────┼──────────────────┼──────────────────┼─────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ LLM Service Layer │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ interpret_screening_query() │ │
│ │ analyze_screened_results() │ │
│ │ chat_about_results() │ │
│ └──────────────────┬─────────────────────────────────┘ │
└──────────────────────┼────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Ollama (Local LLM) │
│ http://localhost:11434 │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Technical Indicators Module │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ RSI │ │ MACD │ │ SMA │ │ Volatility│ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Trading System API │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Market Data │ │ Company Info │ │ Symbols │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
Components¶
1. Screener Page (streamlit_ui/pages/3_Screener.py)¶
Main Streamlit page component.
Key Functions:
def screener_page():
    """Main screener page entry point"""
    # Initializes API client and LLM service (model selector in sidebar)
    # Creates tabbed interface (Natural Language / Traditional Filters)
    # Handles user interactions
    # Displays results via display_results()

def get_indicators_for_symbol_from_db(symbol, ohlc_data):
    """Load pre-calculated technical indicators from database (database-first)"""
    # Reads latest values via get_latest_technical_indicators()
    # Falls back to current price + volume if no DB record
    # Returns dict with sma_20, sma_50, rsi, price_change_*, volatility, macd_*, bb_position

def _sector_matches(sector_criteria, stock_sector):
    """Case-insensitive sector match with alias expansion (SECTOR_VARIATIONS)"""

def _compute_signal(stock):
    """Derive Oversold / Overbought / Bullish / Bearish / Neutral badge from RSI + MACD histogram"""

def _format_criteria_readable(criteria):
    """Convert parsed criteria dict to human-readable info string shown below query box"""

def screen_stocks(api_client, llm_service, criteria, symbols, query):
    """Screen stocks based on criteria dict"""
    # Iterates through symbols with progress bar
    # Fetches market data and DB indicators
    # Applies all filters (sector, price, volume, RSI, MACD, BB, SMA crossover, etc.)
    # Sorts and optionally truncates to top 10 when sort_by from LLM is set
    # Returns matching stocks

def display_results(results, llm_service, query):
    """Render AI analysis expander, AgGrid results table, CSV download, and chat panel"""

def _build_results_df(results):
    """Build DataFrame with numeric columns for correct AgGrid sorting"""

def _configure_aggrid(df):
    """Build AgGrid options with RSI colour coding, signal badge styling, and value formatters"""
Session State:
- screener_results: List of screened stock results
- screener_query: Last natural language query
- screener_chat_history: Conversation history for follow-up chat panel (trimmed to last 6 turns)
2. LLM Service (streamlit_ui/services/llm_service.py)¶
Service for interacting with Ollama local LLM.
Class: LLMService
class LLMService:
    def __init__(self, model: str = "phi3", base_url: Optional[str] = None):
        """Initialize LLM service with model and base URL"""

    def interpret_screening_query(self, query: str) -> Dict[str, Any]:
        """Parse natural language query into structured criteria"""
        # Returns dict with: sector, industry, min_price, max_price,
        #   min_volume, min_market_cap, rsi_min, rsi_max,
        #   min_price_change_pct, max_price_change_pct, keywords,
        #   sort_by (rsi_desc|rsi_asc|price_change_desc|price_change_asc|market_cap_desc|price_desc)
        # Post-processing: _fix_comparison_direction(), keyword hallucination filter

    def _fix_comparison_direction(self, query: str, criteria: Dict) -> Dict:
        """Regex correction for swapped rsi_min/rsi_max and min_price/max_price"""
        # Detects 'rsi < 30', 'price above 50', etc. and enforces correct field

    def _fallback_parse(self, content: str) -> Dict:
        """Regex-based fallback for malformed JSON LLM responses"""

    def analyze_screened_results(self, results: List[Dict], query: Optional[str]) -> str:
        """Generate AI analysis of screening results (top 10, 3-4 sentences)"""
        # Includes RSI, MACD direction, 30d change, Bollinger Band position

    def chat_about_results(self, results: List[Dict], history: List[Dict], question: str) -> str:
        """Multi-turn conversational follow-up about screened results"""
        # Builds compact context string from results (up to 15 stocks)
        # Prepends system message with context, appends trimmed history (last 6 turns)

    def get_model_info(self) -> Dict[str, Any]:
        """Get information about available Ollama models"""
The service is instantiated via @st.cache_resource-decorated get_llm_service(model) for session-scoped caching.
Query Interpretation Prompt:
The LLM receives a structured prompt requesting JSON output with format="json" and temperature=0.1. A post-parse _fix_comparison_direction() pass corrects any swapped comparison operators. Keywords not appearing verbatim in the original query are stripped to prevent hallucination.
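The comparison-direction correction can be sketched like this. The regex patterns and the exact field handling are illustrative assumptions; the real `_fix_comparison_direction()` also covers price fields.

```python
import re

def fix_comparison_direction(query, criteria):
    """If the query says 'RSI < 30' but the LLM returned rsi_min, move the
    value to rsi_max (and vice versa). Illustrative sketch of the post-parse
    correction pass; patterns shown are assumptions."""
    fixed = dict(criteria)
    below = re.search(r"rsi\s*(?:<|below|under)\s*(\d+(?:\.\d+)?)", query, re.IGNORECASE)
    if below:
        fixed.pop("rsi_min", None)
        fixed["rsi_max"] = float(below.group(1))
    above = re.search(r"rsi\s*(?:>|above|over)\s*(\d+(?:\.\d+)?)", query, re.IGNORECASE)
    if above:
        fixed.pop("rsi_max", None)
        fixed["rsi_min"] = float(above.group(1))
    return fixed
```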
Analysis Prompt: The LLM receives a compact data block (top 10 stocks with RSI, MACD, 30d change, BB position) and returns a focused 3–4 sentence analysis referencing specific tickers.
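The context assembly used by `chat_about_results()` reduces to prepending a system message and trimming history. A sketch (helper name and message shapes are illustrative; the real method also builds the results context string itself):

```python
def build_chat_messages(context, history, question, max_turns=6):
    """Assemble the chat payload: a system message carrying the screened-results
    context, the trimmed conversation history, then the new user question."""
    messages = [{"role": "system", "content": context}]
    messages.extend(history[-max_turns:])  # keep only the last N turns
    messages.append({"role": "user", "content": question})
    return messages
```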
3. Technical Indicators (Database-Backed)¶
The screener uses database-backed indicators as the primary source (implemented). Pre-calculated values are read via get_latest_technical_indicators(symbol) from the data_ingestion.technical_indicators_latest table. A lightweight fallback (current price + volume only) is used if no DB record exists.
Indicator fields returned by get_indicators_for_symbol_from_db():
| Field | Source |
|---|---|
| `current_price` | Latest close price |
| `sma_20`, `sma_50` | DB: `sma_20`, `sma_50` |
| `rsi` | DB: `rsi_14` or `rsi` |
| `price_change_1d`, `_5d`, `_30d` | DB: `price_change_*` |
| `volatility` | DB: `volatility_20` |
| `macd`, `macd_signal`, `macd_histogram` | DB: `macd_line`, `macd_signal`, `macd_histogram` |
| `bb_position` | DB: `bb_position` |
| `avg_volume` | DB: `avg_volume_20` |
4. API Client Integration¶
Uses existing TradingSystemAPI client:
from streamlit_ui.api_client import get_api_client
api_client = get_api_client()
# Get symbols
symbols = api_client.get_all_symbols()
symbols = api_client.get_symbols_by_filter(sector="Technology", industry="Software")
# Get market data
ohlc_data = api_client.get_market_data(symbol="AAPL", data_source="yahoo")
# Get company info
company_info = api_client.get_company_info(symbol="AAPL")
Data Flow¶
Natural Language Query Flow¶
1. User enters query: "Find tech stocks with RSI < 30"
│
▼
2. LLM Service interprets query
│
▼
3. Returns structured criteria:
   {
       "sector": "Technology",
       "rsi_max": 30
   }
│
▼
4. screen_stocks() function:
- Loads symbols from API
- For each symbol:
a. Fetches market data
b. Calculates indicators
c. Applies filters
d. Adds to results if matches
│
▼
5. Results displayed in AgGrid table (Signal badge, RSI colour coding)
│
▼
6. LLM generates analysis of results (top 10, references specific tickers)
│
▼
7. Analysis displayed in expandable section
│
▼
8. User may ask follow-up questions via chat panel (chat_about_results())
Traditional Filter Flow¶
1. User sets filters in UI
│
▼
2. Criteria dictionary built from inputs
│
▼
3. screen_stocks() function (same as above)
│
▼
4. Results displayed (no LLM analysis)
Filtering Logic¶
Criteria Dictionary Structure¶
{
    # Core filters
    "sector": str | None,                    # Sector name (alias-expanded via SECTOR_VARIATIONS)
    "industry": str | None,                  # Industry name
    "min_price": float | None,               # Minimum price
    "max_price": float | None,               # Maximum price
    "min_volume": int | None,                # Minimum average volume
    "min_market_cap": float | None,          # Minimum market cap (billions)

    # RSI
    "rsi_min": float | None,                 # Minimum RSI (0-100)
    "rsi_max": float | None,                 # Maximum RSI (0-100)

    # Price change
    "min_price_change_pct": float | None,    # Min 30d price change %
    "max_price_change_pct": float | None,    # Max 30d price change %
    "min_price_change_1d": float | None,     # Min 1d price change %
    "min_price_change_5d": float | None,     # Min 5d price change %

    # Advanced technical filters
    "max_volatility": float | None,          # Max annualised volatility %
    "macd_signal": "bullish"|"bearish"|None, # MACD histogram direction
    "bb_min": float | None,                  # Min Bollinger Band position (0-1)
    "bb_max": float | None,                  # Max Bollinger Band position (0-1)
    "sma_crossover": str | None,             # above_sma20 | below_sma20 | golden_cross | death_cross

    # Sorting
    "sort_by": str | None,                   # LLM sort key (see LLM_SORT_MAP)
    "manual_sort": str | None,               # UI sort label (see SORT_KEY_MAP)
    "keywords": List[str] | None,            # Company name or ticker search
}
Filter Application¶
Filters are applied sequentially with AND logic:

- All specified criteria must match
- Missing criteria are ignored (no filter applied)
- Numeric comparisons use standard operators (`<`, `>`, `<=`, `>=`)
- String matching uses case-insensitive contains
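A minimal sketch of this AND combination, using a handful of the criteria fields above (the real `screen_stocks()` covers many more filters; the helper name is illustrative):

```python
def passes_filters(stock, criteria):
    """Return True only if every *specified* criterion matches.

    Missing criteria are skipped; a stock whose indicator value is
    missing (None) fails any numeric check on that indicator.
    """
    checks = {
        "min_price": lambda s, v: s.get("current_price") is not None and s["current_price"] >= v,
        "max_price": lambda s, v: s.get("current_price") is not None and s["current_price"] <= v,
        "rsi_min":   lambda s, v: s.get("rsi") is not None and s["rsi"] >= v,
        "rsi_max":   lambda s, v: s.get("rsi") is not None and s["rsi"] <= v,
        "min_volume": lambda s, v: s.get("avg_volume") is not None and s["avg_volume"] >= v,
    }
    for key, check in checks.items():
        value = criteria.get(key)
        if value is not None and not check(stock, value):
            return False
    return True
```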
Performance Considerations¶
Optimization Strategies¶
- Database-Backed Indicators (Recommended):
    - Pre-calculated indicators stored in database
    - 10-100x faster than on-the-fly calculation
    - Direct SQL filtering on RSI, MACD, etc.
    - Supports screening thousands of symbols
    - Implementation: see "Technical Indicators Storage" section below
- Caching:
    - Company info cached for 30 minutes
    - Sectors/industries cached for 1 hour
    - Indicator values cached in database
    - Reduces API calls and calculations
- Batch Processing:
    - Processes symbols sequentially
    - Shows progress indicator
    - Limits to 50 symbols for demo (can increase with database)
- Lazy Loading:
    - Only loads data when needed
    - Skips symbols that fail early filters
    - Database queries filter before fetching full data
- Error Handling:
    - Continues processing on individual symbol errors
    - Logs errors without stopping the entire process
    - Falls back to calculation if database unavailable
Bottlenecks¶
- API Calls:
    - Each symbol requires 2-3 API calls (market data + company info)
    - Network latency adds up
    - Solution: caching, parallel processing, database storage
- Indicator Calculations:
    - RSI, MACD require full price history
    - Some calculations are O(n) where n = number of data points
    - Solution: pre-calculate and store in database (10-100x faster)
- LLM Processing:
    - Query interpretation: ~1-2 seconds
    - Result analysis: ~2-5 seconds
    - Solution: use faster models (phi3), async processing (future)
Technical Indicators Storage¶
Recommended Architecture: Hybrid Approach¶
Store indicators in database for performance while maintaining calculation fallback.
Database Schema¶
Latest Values Table (for fast screening):
CREATE TABLE data_ingestion.technical_indicators_latest (
    symbol VARCHAR(20) PRIMARY KEY,
    calculated_date DATE NOT NULL,

    -- Moving Averages
    sma_20 NUMERIC(15,4),
    sma_50 NUMERIC(15,4),
    ema_12 NUMERIC(15,4),
    ema_26 NUMERIC(15,4),

    -- Momentum
    rsi NUMERIC(5,2),  -- 0-100
    rsi_14 NUMERIC(5,2),

    -- MACD
    macd_line NUMERIC(15,4),
    macd_signal NUMERIC(15,4),
    macd_histogram NUMERIC(15,4),

    -- Bollinger Bands
    bb_upper NUMERIC(15,4),
    bb_middle NUMERIC(15,4),
    bb_lower NUMERIC(15,4),
    bb_position NUMERIC(5,4),

    -- Volatility & Price Changes
    volatility_20 NUMERIC(5,2),
    price_change_1d NUMERIC(5,2),
    price_change_30d NUMERIC(5,2),

    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
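With the latest-values table in place, candidate filtering can happen directly in SQL before any market data is fetched. A minimal sketch, using SQLite as an in-memory stand-in for the PostgreSQL table (column subset and function name are illustrative):

```python
import sqlite3

# In-memory stand-in for data_ingestion.technical_indicators_latest
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE technical_indicators_latest (
        symbol TEXT PRIMARY KEY,
        rsi REAL,
        macd_histogram REAL
    )
""")
conn.executemany(
    "INSERT INTO technical_indicators_latest VALUES (?, ?, ?)",
    [("AAPL", 28.5, -0.4), ("MSFT", 61.2, 0.8), ("NVDA", 25.1, 0.2)],
)

def candidates_below_rsi(conn, rsi_max):
    """Push the RSI filter into the database; only matching symbols come back."""
    rows = conn.execute(
        "SELECT symbol FROM technical_indicators_latest "
        "WHERE rsi <= ? ORDER BY rsi",
        (rsi_max,),
    )
    return [symbol for (symbol,) in rows]
```

Only the symbols returned here need full market-data fetches, which is where the 10-100x speed-up over per-symbol calculation comes from.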
Time-Series Table (for historical analysis):
CREATE TABLE data_ingestion.technical_indicators (
    id BIGSERIAL PRIMARY KEY,
    symbol VARCHAR(20) NOT NULL,
    date DATE NOT NULL,
    -- Same indicator fields as the latest table
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    CONSTRAINT unique_symbol_date UNIQUE (symbol, date)
);
Calculation Service¶
Service Structure:
src/services/analytics/
├── indicator_calculator.py # Main calculation service
├── indicator_storage.py # Database storage operations
└── indicator_models.py # SQLAlchemy models
Key Functions:
- calculate_indicators_for_symbol(): Calculate all indicators for a symbol
- batch_calculate_indicators(): Process multiple symbols
- update_indicator_tables(): Store in both latest and time-series tables
- get_latest_indicators(): Retrieve current values for screening
Data Pipeline¶
- Daily Calculation Job:
    - Runs after market close
    - Calculates indicators for all active symbols
    - Updates both tables
    - Handles missing data gracefully
- On-Demand Calculation:
    - For new symbols
    - For missing dates
    - Triggered by data ingestion
- Incremental Updates:
    - Only calculate for new market data
    - Skip if already calculated
    - Efficient for daily updates
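The incremental-update check reduces to a date comparison; a sketch (function name illustrative):

```python
from datetime import date

def needs_recalculation(last_calculated, latest_market_date):
    """Skip symbols whose stored indicators already cover the newest market data."""
    if last_calculated is None:  # never calculated: do it now
        return True
    return latest_market_date > last_calculated
```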
Screener Integration¶
Updated Flow:
def get_indicators_for_symbol(symbol):
    # 1. Try database first (fast)
    indicators = get_latest_indicators_from_db(symbol)
    if indicators:
        return indicators

    # 2. Fall back to calculation
    market_data = get_market_data(symbol)
    indicators = calculate_indicators(market_data)

    # 3. Store for future use (optional)
    store_indicators_in_db(symbol, indicators)
    return indicators
Benefits:

- Fast screening (database queries)
- Always works (fallback calculation)
- Automatic optimization (stores calculated values)
- No breaking changes (backward compatible)
Performance Comparison¶
| Approach | Symbols/Second | Storage | Scalability |
|---|---|---|---|
| On-the-fly Calculation | 0.5-1 | None | Limited (50 symbols) |
| Database Storage | 10-50 | ~50-100 bytes/symbol/day | Excellent (1000+ symbols) |
Migration Strategy¶
- Phase 1: Create database tables (no breaking changes)
- Phase 2: Build calculation service
- Phase 3: Populate for existing symbols
- Phase 4: Update screener to use database
- Phase 5: Add scheduled calculation jobs
Error Handling¶
LLM Service Errors¶
try:
    criteria = llm_service.interpret_screening_query(query)
except Exception as e:
    # Fall back to empty criteria and show a warning to the user;
    # traditional filters remain available as a fallback
    criteria = {}
API Errors¶
# Inside the per-symbol loop
try:
    ohlc_data = get_real_market_data(api_client, symbol, "yahoo")
except Exception as e:
    # Log the error, skip this symbol,
    # and continue with the next one
    continue
Data Quality Issues¶
- Missing data: returns `None` for indicators
- Insufficient history: skips indicator calculation
- Invalid values: uses safe defaults
Testing¶
Unit Tests¶
Test technical indicators:
def test_calculate_rsi():
    prices = [100, 102, 101, 103, 105, 104, 106]
    rsi = calculate_rsi(prices, period=5)
    assert 0 <= rsi <= 100
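To make that unit test runnable on its own, here is a minimal simple-average RSI. This is an illustrative sketch, not necessarily the project's implementation (production code typically uses Wilder's smoothed variant):

```python
def calculate_rsi(prices, period=14):
    """Simple-average RSI over the last `period` price deltas (sketch)."""
    if len(prices) <= period:
        return None  # insufficient history
    deltas = [b - a for a, b in zip(prices, prices[1:])]
    recent = deltas[-period:]
    gains = sum(d for d in recent if d > 0)
    losses = sum(-d for d in recent if d < 0)
    if losses == 0:
        return 100.0
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)
```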
Integration Tests¶
Test screening flow:
def test_screen_stocks():
    criteria = {"sector": "Technology", "rsi_max": 30}
    results = screen_stocks(api_client, llm_service, criteria, ["AAPL", "MSFT"], query=None)
    assert isinstance(results, list)
Manual Testing¶
- Test natural language queries
- Test traditional filters
- Verify indicator calculations
- Check LLM integration
- Test error handling
Configuration¶
Environment Variables¶
# Ollama Configuration (optional)
OLLAMA_BASE_URL=http://localhost:11434 # Default
# Model Selection
OLLAMA_MODEL=phi3 # Default model
Code Configuration¶
In Screener.py:
# Symbol limit for screening
symbols[:50] # Limit to 50 symbols
# LLM model selection
llm_service = get_llm_service(model="phi3")
Dependencies¶
Required Packages¶
# Core
streamlit>=1.28.0
pandas>=2.0.0
numpy>=1.24.0
# LLM Integration
ollama>=0.1.0
# UI Components
st-aggrid>=1.2.0 # For interactive tables
plotly>=5.17.0 # For charts (if added)
# API Client
requests>=2.31.0
External Services¶
- Ollama: Must be running locally on port 11434
- Trading System API: Must be accessible
- Database: Market data must be populated
Future Enhancements¶
Planned Features¶
- Parallel Processing:
    - Process multiple symbols concurrently
    - Use asyncio for API calls
    - Reduce total screening time
- Saved Screens:
    - Save filter combinations
    - Schedule automatic screening
    - Alert on matches
- Performance Improvements:
    - Batch API calls
    - Cache results between tab switches
- Enhanced AI:
    - Strategy suggestions from screened results
    - Pattern recognition filters
Security Considerations¶
Local LLM¶
- ✅ No data sent to external services
- ✅ All processing happens locally
- ✅ No API keys required for LLM
API Access¶
- Uses existing API authentication
- Respects API rate limits
- Handles connection errors gracefully
Data Privacy¶
- No user data stored
- Results only in session state
- CSV exports are user-initiated
Troubleshooting Guide¶
Common Issues¶
- LLM Not Responding:
    - Check Ollama is running: `ollama list`
    - Verify the model is installed: `ollama pull phi3`
    - Test the connection: `python scripts/test_ollama.py`
- No Results:
    - Check filter criteria
    - Verify market data exists
    - Check API connection
- Slow Performance:
    - Reduce symbol count
    - Use traditional filters (faster)
    - Check API response times
- Missing Indicators:
    - Ensure sufficient historical data
    - Check data quality
    - Verify calculation functions
Code Examples¶
Adding a New Filter¶
# In screen_stocks() function
if criteria.get('new_filter'):
    if stock_value < criteria['new_filter']:
        matches = False
Adding a New Indicator¶
# In technical_indicators.py
def calculate_new_indicator(prices: List[float]) -> Optional[float]:
# Implementation
pass
# In calculate_indicators_for_symbol()
indicators['new_indicator'] = calculate_new_indicator(closing_prices)
Customizing LLM Prompts¶
# In llm_service.py
prompt = f"""Your custom prompt here...
Query: "{query}"
...
"""
Last Updated: March 2026
Version: 1.1.0+
Author: Trading System Development Team