financial-data-analysis

Methods for acquiring, cleaning, and analyzing financial datasets for research

191 stars

Best use case

financial-data-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Methods for acquiring, cleaning, and analyzing financial datasets for research

Teams using financial-data-analysis should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/financial-data-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/domains/finance/financial-data-analysis/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/financial-data-analysis/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How financial-data-analysis Compares

Feature / Agentfinancial-data-analysisStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Methods for acquiring, cleaning, and analyzing financial datasets for research

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Financial Data Analysis

A practical skill for sourcing, processing, and analyzing financial data in academic research contexts. Covers data acquisition from public APIs, cleaning workflows, and standard analytical techniques used in empirical finance research.

## Data Acquisition

### Public Financial Data Sources

| Source | Data Type | Access | Python Package |
|--------|-----------|--------|---------------|
| Yahoo Finance | Prices, fundamentals | Free | `yfinance` |
| FRED (St. Louis Fed) | Macroeconomic indicators | Free (API key) | `fredapi` |
| SEC EDGAR | Company filings (10-K, 10-Q) | Free | `sec-edgar-downloader` |
| WRDS (Wharton) | CRSP, Compustat, IBES | University subscription | `wrds` |
| Alpha Vantage | Real-time and historical prices | Free tier | `alpha_vantage` |

### Fetching Price Data

```python
import yfinance as yf
import pandas as pd

def fetch_stock_data(tickers: list[str], start: str, end: str) -> pd.DataFrame:
    """
    Fetch adjusted close prices for a list of tickers.

    Args:
        tickers: List of ticker symbols (e.g., ['AAPL', 'MSFT'])
        start: Start date (YYYY-MM-DD)
        end: End date (YYYY-MM-DD)
    Returns:
        DataFrame with adjusted close prices
    """
    data = yf.download(tickers, start=start, end=end, auto_adjust=True)
    prices = data['Close'] if len(tickers) > 1 else data[['Close']]
    prices.columns = tickers if len(tickers) > 1 else tickers
    return prices

# Fetch 5 years of data
prices = fetch_stock_data(['AAPL', 'MSFT', 'GOOGL'], '2020-01-01', '2025-01-01')
print(prices.head())
```

### Macroeconomic Data from FRED

```python
from fredapi import Fred

fred = Fred(api_key=os.environ["FRED_API_KEY"])

# Common series for finance research
series_ids = {
    'GDP': 'GDP',
    'CPI': 'CPIAUCSL',
    'Fed_Funds_Rate': 'FEDFUNDS',
    'Unemployment': 'UNRATE',
    '10Y_Treasury': 'DGS10',
    'VIX': 'VIXCLS'
}

macro_data = pd.DataFrame()
for name, sid in series_ids.items():
    macro_data[name] = fred.get_series(sid, observation_start='2000-01-01')
```

## Data Cleaning Pipeline

Financial data requires careful cleaning before analysis:

```python
def clean_financial_data(df: pd.DataFrame) -> pd.DataFrame:
    """Standard cleaning pipeline for financial time series."""
    cleaned = df.copy()

    # 1. Handle missing values
    missing_pct = cleaned.isnull().sum() / len(cleaned) * 100
    print(f"Missing data:\n{missing_pct}")

    # 2. Forward-fill for market holidays (max 5 days)
    cleaned = cleaned.ffill(limit=5)

    # 3. Remove remaining NaN rows
    cleaned = cleaned.dropna()

    # 4. Detect and flag outliers (>5 sigma daily returns)
    returns = cleaned.pct_change()
    z_scores = (returns - returns.mean()) / returns.std()
    outliers = (z_scores.abs() > 5).any(axis=1)
    print(f"Outlier days flagged: {outliers.sum()}")

    # 5. Verify data integrity
    assert cleaned.index.is_monotonic_increasing, "Index must be sorted"
    assert not cleaned.duplicated().any(), "No duplicate rows allowed"

    return cleaned
```

## Standard Financial Metrics

### Return Calculations

```python
def compute_returns(prices: pd.DataFrame) -> dict:
    """Compute standard return metrics."""
    simple_returns = prices.pct_change().dropna()
    log_returns = np.log(prices / prices.shift(1)).dropna()

    annualized_return = simple_returns.mean() * 252
    annualized_vol = simple_returns.std() * np.sqrt(252)
    sharpe_ratio = annualized_return / annualized_vol

    # Maximum drawdown
    cumulative = (1 + simple_returns).cumprod()
    rolling_max = cumulative.cummax()
    drawdown = (cumulative - rolling_max) / rolling_max
    max_drawdown = drawdown.min()

    return {
        'annualized_return': annualized_return,
        'annualized_volatility': annualized_vol,
        'sharpe_ratio': sharpe_ratio,
        'max_drawdown': max_drawdown
    }
```

## Event Studies

A common methodology in empirical finance research:

1. Define the event window (e.g., [-5, +5] trading days around earnings announcement)
2. Estimate normal returns using the market model over the estimation window (e.g., [-250, -30])
3. Compute abnormal returns: AR = R_actual - R_expected
4. Aggregate cumulative abnormal returns (CAR) across firms
5. Test statistical significance using parametric (Patell test) and non-parametric (sign test) methods

Always report both raw and risk-adjusted results, and perform robustness checks with different estimation windows and benchmark models.

## Reproducibility

Store all data processing steps in version-controlled scripts. Use `pandas.DataFrame.to_parquet()` for efficient storage of intermediate datasets, and document data provenance including download dates, API versions, and any filters applied.

Related Skills

json-data-visualizer

191
from wentorai/research-plugins

Guide to JSON Crack for visualizing complex JSON data structures

datagen-research-guide

191
from wentorai/research-plugins

AI-driven multi-agent research assistant for end-to-end studies

data-collection-automation

191
from wentorai/research-plugins

Automate survey deployment, data collection, and pipeline management

database-comparison-guide

191
from wentorai/research-plugins

Compare major academic databases and when to use each for research

wikidata-api-guide

191
from wentorai/research-plugins

Query Wikidata SPARQL for scholarly metadata, authors, and entities

datacite-api

191
from wentorai/research-plugins

Resolve dataset DOIs and query research data metadata via DataCite

crossref-event-data-api

191
from wentorai/research-plugins

Track scholarly mentions across the web via Crossref Event Data

metadata-skills

191
from wentorai/research-plugins

24 metadata & bibliometrics skills. Trigger: DOI resolution, citation metrics, author disambiguation, bibliometrics. Design: metadata APIs and bibliometric analysis tools for scholarly records.

dataverse-api

191
from wentorai/research-plugins

Deposit and discover research datasets via Harvard Dataverse API

network-analysis-guide

191
from wentorai/research-plugins

Social network analysis methods, metrics, and visualization tools

ipums-microdata-api

191
from wentorai/research-plugins

Access harmonized census and survey microdata via the IPUMS API

astrophysics-data-guide

191
from wentorai/research-plugins

Astronomical data processing with Astropy, FITS files, and sky surveys