financial-data-analysis

Methods for acquiring, cleaning, and analyzing financial datasets for research

191 stars

Best use case

financial-data-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Methods for acquiring, cleaning, and analyzing financial datasets for research

Teams using financial-data-analysis should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/financial-data-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/domains/finance/financial-data-analysis/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/financial-data-analysis/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How financial-data-analysis Compares

Feature / Agent	financial-data-analysis	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Methods for acquiring, cleaning, and analyzing financial datasets for research

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

Best AI Skills for ChatGPT

Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.

AI Agent for Product Research

Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.

AI Agent for SaaS Idea Validation

Use AI agent skills for SaaS idea validation, market research, customer discovery, competitor analysis, and documenting startup hypotheses.

SKILL.md Source

# Financial Data Analysis

A practical skill for sourcing, processing, and analyzing financial data in academic research contexts. Covers data acquisition from public APIs, cleaning workflows, and standard analytical techniques used in empirical finance research.

## Data Acquisition

### Public Financial Data Sources

| Source | Data Type | Access | Python Package |
|--------|-----------|--------|---------------|
| Yahoo Finance | Prices, fundamentals | Free | `yfinance` |
| FRED (St. Louis Fed) | Macroeconomic indicators | Free (API key) | `fredapi` |
| SEC EDGAR | Company filings (10-K, 10-Q) | Free | `sec-edgar-downloader` |
| WRDS (Wharton) | CRSP, Compustat, IBES | University subscription | `wrds` |
| Alpha Vantage | Real-time and historical prices | Free tier | `alpha_vantage` |

### Fetching Price Data

```python
import yfinance as yf
import pandas as pd

def fetch_stock_data(tickers: list[str], start: str, end: str) -> pd.DataFrame:
    """
    Fetch adjusted close prices for a list of tickers.

    Args:
        tickers: List of ticker symbols (e.g., ['AAPL', 'MSFT'])
        start: Start date (YYYY-MM-DD)
        end: End date (YYYY-MM-DD)
    Returns:
        DataFrame with adjusted close prices
    """
    data = yf.download(tickers, start=start, end=end, auto_adjust=True)
    prices = data['Close'] if len(tickers) > 1 else data[['Close']]
    prices.columns = tickers if len(tickers) > 1 else tickers
    return prices

# Fetch 5 years of data
prices = fetch_stock_data(['AAPL', 'MSFT', 'GOOGL'], '2020-01-01', '2025-01-01')
print(prices.head())
```

### Macroeconomic Data from FRED

```python
from fredapi import Fred

fred = Fred(api_key=os.environ["FRED_API_KEY"])

# Common series for finance research
series_ids = {
    'GDP': 'GDP',
    'CPI': 'CPIAUCSL',
    'Fed_Funds_Rate': 'FEDFUNDS',
    'Unemployment': 'UNRATE',
    '10Y_Treasury': 'DGS10',
    'VIX': 'VIXCLS'
}

macro_data = pd.DataFrame()
for name, sid in series_ids.items():
    macro_data[name] = fred.get_series(sid, observation_start='2000-01-01')
```

## Data Cleaning Pipeline

Financial data requires careful cleaning before analysis:

```python
def clean_financial_data(df: pd.DataFrame) -> pd.DataFrame:
    """Standard cleaning pipeline for financial time series."""
    cleaned = df.copy()

    # 1. Handle missing values
    missing_pct = cleaned.isnull().sum() / len(cleaned) * 100
    print(f"Missing data:\n{missing_pct}")

    # 2. Forward-fill for market holidays (max 5 days)
    cleaned = cleaned.ffill(limit=5)

    # 3. Remove remaining NaN rows
    cleaned = cleaned.dropna()

    # 4. Detect and flag outliers (>5 sigma daily returns)
    returns = cleaned.pct_change()
    z_scores = (returns - returns.mean()) / returns.std()
    outliers = (z_scores.abs() > 5).any(axis=1)
    print(f"Outlier days flagged: {outliers.sum()}")

    # 5. Verify data integrity
    assert cleaned.index.is_monotonic_increasing, "Index must be sorted"
    assert not cleaned.duplicated().any(), "No duplicate rows allowed"

    return cleaned
```

## Standard Financial Metrics

### Return Calculations

```python
def compute_returns(prices: pd.DataFrame) -> dict:
    """Compute standard return metrics."""
    simple_returns = prices.pct_change().dropna()
    log_returns = np.log(prices / prices.shift(1)).dropna()

    annualized_return = simple_returns.mean() * 252
    annualized_vol = simple_returns.std() * np.sqrt(252)
    sharpe_ratio = annualized_return / annualized_vol

    # Maximum drawdown
    cumulative = (1 + simple_returns).cumprod()
    rolling_max = cumulative.cummax()
    drawdown = (cumulative - rolling_max) / rolling_max
    max_drawdown = drawdown.min()

    return {
        'annualized_return': annualized_return,
        'annualized_volatility': annualized_vol,
        'sharpe_ratio': sharpe_ratio,
        'max_drawdown': max_drawdown
    }
```

## Event Studies

A common methodology in empirical finance research:

1. Define the event window (e.g., [-5, +5] trading days around earnings announcement)
2. Estimate normal returns using the market model over the estimation window (e.g., [-250, -30])
3. Compute abnormal returns: AR = R_actual - R_expected
4. Aggregate cumulative abnormal returns (CAR) across firms
5. Test statistical significance using parametric (Patell test) and non-parametric (sign test) methods

Always report both raw and risk-adjusted results, and perform robustness checks with different estimation windows and benchmark models.

## Reproducibility

Store all data processing steps in version-controlled scripts. Use `pandas.DataFrame.to_parquet()` for efficient storage of intermediate datasets, and document data provenance including download dates, API versions, and any filters applied.

Related Skills

json-data-visualizer

191

from wentorai/research-plugins

Guide to JSON Crack for visualizing complex JSON data structures

datagen-research-guide

191

from wentorai/research-plugins

AI-driven multi-agent research assistant for end-to-end studies

data-collection-automation

191

from wentorai/research-plugins

Automate survey deployment, data collection, and pipeline management

database-comparison-guide

191

from wentorai/research-plugins

Compare major academic databases and when to use each for research

wikidata-api-guide

191

from wentorai/research-plugins

Query Wikidata SPARQL for scholarly metadata, authors, and entities

datacite-api

191

from wentorai/research-plugins

Resolve dataset DOIs and query research data metadata via DataCite

crossref-event-data-api

191

from wentorai/research-plugins

Track scholarly mentions across the web via Crossref Event Data

metadata-skills

191

from wentorai/research-plugins

24 metadata & bibliometrics skills. Trigger: DOI resolution, citation metrics, author disambiguation, bibliometrics. Design: metadata APIs and bibliometric analysis tools for scholarly records.

dataverse-api

191

from wentorai/research-plugins

Deposit and discover research datasets via Harvard Dataverse API

network-analysis-guide

191

from wentorai/research-plugins

Social network analysis methods, metrics, and visualization tools

ipums-microdata-api

191

from wentorai/research-plugins

Access harmonized census and survey microdata via the IPUMS API

astrophysics-data-guide

191

from wentorai/research-plugins

Astronomical data processing with Astropy, FITS files, and sky surveys