citation-analysis

Analyze citation networks, compute bibliometric indicators, and identify research fronts. Use when: user asks about citation patterns, h-index, co-authorship networks, research trends, or bibliometric analysis. NOT for: literature searching (use literature-search) or writing papers (use paper-writing).

564 stars

by beita6969

View on GitHub

Best use case

citation-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using citation-analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/citation-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/beita6969/ScienceClaw/main/skills/citation-analysis/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/citation-analysis/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How citation-analysis Compares

Feature / Agent	citation-analysis	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

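What does this skill do?

It analyzes citation networks, computes bibliometric indicators such as the h-index, and identifies research fronts, using data from Semantic Scholar, OpenAlex, and CrossRef.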

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Citation Analysis

Analyze citation networks, compute bibliometric indicators, and identify research fronts using Semantic Scholar, OpenAlex, and CrossRef data.

## When to Use

- "What's the h-index of this author?"
- "Show me the citation network for this paper"
- "Identify the most influential papers in this field"
- "Map the co-authorship network in this area"
- "What are the emerging research fronts in NLP?"
- "Analyze citation trends for CRISPR papers over time"

## When NOT to Use

- Finding papers by topic (use literature-search)
- Reading or summarizing papers (use scienceclaw-summarization)
- Writing papers (use paper-writing)
- Statistical analysis unrelated to citations (use statsmodels-stats)

## Bibliometric Indicators

### Author-Level Metrics

```python
import numpy as np

def h_index(citations: list[int]) -> int:
    """Compute h-index from a list of citation counts."""
    sorted_c = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(sorted_c):
        if c >= i + 1:
            h = i + 1
        else:
            break
    return h

def g_index(citations: list[int]) -> int:
    """Compute g-index: largest g such that the top g papers together have >= g^2 citations (g capped at the number of papers)."""
    sorted_c = sorted(citations, reverse=True)
    cumsum = np.cumsum(sorted_c)
    g = 0
    for i in range(len(sorted_c)):
        if cumsum[i] >= (i + 1) ** 2:
            g = i + 1
    return g

def i10_index(citations: list[int]) -> int:
    """Number of papers with 10+ citations."""
    return sum(1 for c in citations if c >= 10)
```
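For reference, these are the definitions the three functions implement, writing the citation counts sorted in descending order as $c_1 \ge c_2 \ge \dots \ge c_n$ and taking the maximum over an empty set as 0. The worked example is plain arithmetic on a made-up list of five papers.

```latex
h = \max\{\, k \le n : c_k \ge k \,\}, \qquad
g = \max\Big\{\, k \le n : \sum_{i=1}^{k} c_i \ge k^2 \Big\}, \qquad
i_{10} = \big|\{\, i : c_i \ge 10 \,\}\big|

% Example: c = (10, 8, 5, 4, 3)
% h:   c_4 = 4 \ge 4 but c_5 = 3 < 5, so h = 4
% g:   cumulative sums 10, 18, 23, 27, 30 are all \ge 1, 4, 9, 16, 25, so g = 5
% i10: only c_1 = 10 qualifies, so i_{10} = 1
```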

### Paper-Level Metrics

- **Citation count**: Raw count from Semantic Scholar / OpenAlex
- **Field-weighted citation impact (FWCI)**: Citations divided by the expected citations for papers of the same field, publication year, and document type (1.0 = world average)
- **Percentile rank**: Position relative to same-year, same-field papers
- **Citation velocity**: Citations per year since publication
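The paper-level metrics reduce to simple ratios once the baseline data is available. The sketch below encodes one common convention for each. It assumes the publication year counts as year 1 for velocity, that the FWCI baseline (`expected_citations`) has already been computed from comparable papers, and that percentile rank means the share of peers with strictly fewer citations. Providers may define these differently.

```python
from datetime import date
from typing import Optional


def citation_velocity(citation_count: int, pub_year: int,
                      current_year: Optional[int] = None) -> float:
    """Citations per year since publication; the publication year counts as year 1."""
    if current_year is None:
        current_year = date.today().year
    years = max(current_year - pub_year + 1, 1)
    return citation_count / years


def fwci(citation_count: int, expected_citations: float) -> float:
    """Field-weighted citation impact: observed / expected citations.

    expected_citations is assumed to be a baseline (e.g. a mean) over papers
    with the same field, publication year, and document type.
    """
    if expected_citations <= 0:
        raise ValueError("expected_citations must be positive")
    return citation_count / expected_citations


def percentile_rank(citation_count: int, peer_counts: list[int]) -> float:
    """Percentage of same-year, same-field peers with strictly fewer citations."""
    if not peer_counts:
        raise ValueError("peer_counts must be non-empty")
    below = sum(1 for c in peer_counts if c < citation_count)
    return 100.0 * below / len(peer_counts)
```

For example, 50 citations to a 2020 paper counted in 2024 gives a velocity of 10 per year under this convention.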

### Journal-Level Metrics

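- **Impact Factor**: Citations received in year N to items published in years N-1 and N-2, divided by the number of citable items published in N-1 and N-2
- **CiteScore**: Citations received in a four-year window to documents published in that same window, divided by the number of those documents
- **h5-index**: h-index of articles published in the last five complete years (Google Scholar Metrics)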
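The journal-level formulas can be computed directly once the counts are in hand. This is a minimal sketch that assumes the caller has already gathered the numerators and denominators (and, for the h5-index, already restricted the articles to the five-year window); it does not query any indexing service.

```python
def impact_factor(citations_in_year_n: int, citable_items_n1_n2: int) -> float:
    """Two-year impact factor for year N: citations received in N to items
    published in N-1 and N-2, divided by citable items published in N-1 and N-2."""
    if citable_items_n1_n2 <= 0:
        raise ValueError("need at least one citable item")
    return citations_in_year_n / citable_items_n1_n2


def citescore(citations_4y: int, documents_4y: int) -> float:
    """Citations received in a four-year window to documents published in
    that window, divided by the number of those documents."""
    if documents_4y <= 0:
        raise ValueError("need at least one document")
    return citations_4y / documents_4y


def h5_index(recent_article_citations: list[int]) -> int:
    """h-index over articles from the last five complete years
    (filtering by publication year is assumed to be done by the caller)."""
    sorted_c = sorted(recent_article_citations, reverse=True)
    # c_i - i strictly decreases along a descending list, so counting the
    # ranks i (1-based) with c_i >= i gives the largest such i
    return sum(1 for i, c in enumerate(sorted_c, start=1) if c >= i)
```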

## Citation Network Analysis

### Build Citation Graph

```python
import networkx as nx

def build_citation_graph(papers: list[dict]) -> nx.DiGraph:
    """
    Build a directed citation graph.
    Each paper dict needs 'paperId' and 'title'; 'year', 'citationCount',
    'references', and 'citations' (lists of {'paperId': ...} dicts) are optional.
    Edge direction: citing -> cited.
    """
    G = nx.DiGraph()
    for p in papers:
        G.add_node(p['paperId'], title=p['title'],
                   year=p.get('year'), citations=p.get('citationCount', 0))
        # Outgoing edges: this paper cites each of its references
        # ('or []' also covers APIs that return null instead of an empty list)
        for ref in p.get('references') or []:
            if ref.get('paperId'):
                G.add_edge(p['paperId'], ref['paperId'])
        # Incoming edges: each citing paper cites this one
        for cit in p.get('citations') or []:
            if cit.get('paperId'):
                G.add_edge(cit['paperId'], p['paperId'])
    return G
```

### Key Network Metrics

```python
import networkx as nx

def analyze_citation_network(G: nx.DiGraph) -> dict:
    """Compute key citation network metrics."""
    results = {}
    results['num_papers'] = G.number_of_nodes()
    results['num_citations'] = G.number_of_edges()
    results['density'] = nx.density(G)

    # Most cited within this graph (highest in-degree), not global citation counts
    in_deg = dict(G.in_degree())
    results['most_cited'] = sorted(in_deg.items(), key=lambda x: -x[1])[:10]

    # PageRank (identifies influential papers beyond raw citations);
    # NetworkX 3.x computes it with SciPy, so SciPy must be installed
    pr = nx.pagerank(G)
    results['pagerank_top'] = sorted(pr.items(), key=lambda x: -x[1])[:10]

    # Betweenness centrality (bridge papers connecting subfields);
    # exact computation is slow on large graphs, pass k=... to sample nodes
    bc = nx.betweenness_centrality(G)
    results['bridge_papers'] = sorted(bc.items(), key=lambda x: -x[1])[:10]

    return results
```

### Co-Authorship Network

```python
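import networkx as nx

def build_coauthor_graph(papers: list[dict]) -> nx.Graph:
    """Build undirected co-authorship graph; edge weight = number of shared papers."""
    G = nx.Graph()
    for p in papers:
        # dict.fromkeys drops duplicate names while keeping order (avoids self-loops)
        authors = list(dict.fromkeys(
            a['name'] for a in p.get('authors') or [] if a.get('name')))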
        for i, a1 in enumerate(authors):
            G.add_node(a1)
            for a2 in authors[i+1:]:
                if G.has_edge(a1, a2):
                    G[a1][a2]['weight'] += 1
                else:
                    G.add_edge(a1, a2, weight=1)
    return G

def find_communities(G: nx.Graph, seed: int = 0) -> list:
    """Detect research communities via Louvain (fixed seed for reproducible output)."""
    from networkx.algorithms.community import louvain_communities
    return louvain_communities(G, weight='weight', resolution=1.0, seed=seed)
```

## Research Front Detection

### Method: Co-Citation Clustering
1. Identify highly co-cited paper pairs (cited together frequently)
2. Cluster co-cited papers into research fronts
3. Label fronts by common keywords in citing papers

### Method: Citation Burst Detection
1. Track citation counts per year for a set of papers
2. Identify papers with sudden citation increases (Kleinberg burst detection)
3. Papers with recent bursts indicate active research fronts
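A minimal sketch of step 2 using the two-state, batched form of Kleinberg's burst model, solved with Viterbi. It assumes `r[t]` is the number of citations the paper received in period `t` and `d[t]` the citations received by the whole paper set in that period; the defaults `s=2.0` and `gamma=1.0` are assumptions, not values taken from this skill.

```python
import math


def kleinberg_bursts(r: list[int], d: list[int], s: float = 2.0,
                     gamma: float = 1.0) -> list[int]:
    """Two-state batched Kleinberg burst detection.

    r[t]: citations to the paper in period t (requires d[t] >= r[t])
    d[t]: citations to the whole paper set in period t
    s: ratio of burst-state to baseline citation share
    gamma: scale of the cost for entering the burst state
    Returns one state per period: 1 = bursting, 0 = baseline.
    """
    n = len(r)
    if n == 0 or sum(r) == 0:
        return [0] * n
    p0 = sum(r) / sum(d)               # baseline share of the set's citations
    if p0 >= 1:
        return [0] * n                 # no room for an elevated share
    probs = (p0, min(s * p0, 0.9999))  # burst share, capped below 1

    def cost(state: int, t: int) -> float:
        # Negative log binomial likelihood of r[t] out of d[t]; the binomial
        # coefficient is identical for both states, so it is dropped.
        p = probs[state]
        return -(r[t] * math.log(p) + (d[t] - r[t]) * math.log(1 - p))

    up = gamma * math.log(n)           # cost of moving 0 -> 1; moving down is free
    total = [cost(0, 0), up + cost(1, 0)]  # the sequence starts in the baseline state
    back: list[tuple[int, int]] = []       # best predecessor of (state 0, state 1)
    for t in range(1, n):
        prev0 = 0 if total[0] <= total[1] else 1
        prev1 = 0 if total[0] + up < total[1] else 1
        total = [total[prev0] + cost(0, t),
                 total[prev1] + (up if prev1 == 0 else 0.0) + cost(1, t)]
        back.append((prev0, prev1))

    # Backtrack from the cheaper final state
    state = 0 if total[0] <= total[1] else 1
    states = [state]
    for prev0, prev1 in reversed(back):
        state = prev0 if state == 0 else prev1
        states.append(state)
    return states[::-1]
```

For instance, yearly counts `r = [2, 2, 2, 20, 25, 3, 2]` against a constant `d = [100] * 7` mark only the two high years as a burst.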

### Method: Bibliographic Coupling
1. Two papers are coupled if they share references
2. Stronger coupling = more shared references
3. Cluster coupled papers to find parallel research streams
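Bibliographic coupling is the mirror image of co-citation on the same citing -> cited graph: two citing papers gain one unit of coupling for every reference they share. This sketch uses raw shared-reference counts as the strength (step 2) and Louvain for clustering (step 3); the `min_shared` threshold is a hypothetical default, and NetworkX 2.8+ is assumed.

```python
from collections import Counter
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import louvain_communities


def coupling_strength(G: nx.DiGraph) -> Counter:
    """Number of shared references for each pair of citing papers.

    Assumes edges point citing -> cited, so a node's predecessors are its citers.
    """
    counts: Counter = Counter()
    for cited in G.nodes:
        citers = sorted(G.predecessors(cited))  # papers that share this reference
        for a, b in combinations(citers, 2):
            counts[(a, b)] += 1
    return counts


def coupling_streams(G: nx.DiGraph, min_shared: int = 2, seed: int = 0) -> list[set]:
    """Cluster papers that share at least min_shared references."""
    H = nx.Graph()
    for (a, b), n in coupling_strength(G).items():
        if n >= min_shared:
            H.add_edge(a, b, weight=n)
    if H.number_of_nodes() == 0:
        return []
    return louvain_communities(H, weight="weight", seed=seed)
```

Unlike co-citation, coupling is fixed once the citing papers are published, which makes it usable for recent papers that have few citations yet.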

## Visualization

### Publication Trend Plot
```python
import matplotlib.pyplot as plt

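def plot_citation_trend(papers: list[dict], output_path: str):
    """Plot the number of papers published per year (a publication trend)."""
    years = [p['year'] for p in papers if p.get('year')]
    if not years:
        raise ValueError("No papers with a 'year' field to plot")
    fig, ax = plt.subplots(figsize=(8, 4))
    # One bin per year: edges run from min to max+1 so the last year gets its own bin
    ax.hist(years, bins=range(min(years), max(years)+2), edgecolor='black')
    ax.set_xlabel('Publication Year')
    ax.set_ylabel('Number of Papers')
    ax.set_title('Publication Trend')
    fig.tight_layout()
    fig.savefig(output_path, dpi=300)
    plt.close(fig)  # free the figure when called in a loop
    print(f"Saved: {output_path}")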
```
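The function above counts publications per year. To plot citations received per year for a single paper, one option is OpenAlex's `counts_by_year` field on a work record, a list of `{"year", "cited_by_count"}` entries covering roughly the last ten years. This sketch assumes the work JSON has already been fetched; the function names are hypothetical.

```python
def citations_by_year(work: dict) -> dict[int, int]:
    """Map year -> citations received that year, from an OpenAlex work's
    'counts_by_year' list of {'year': ..., 'cited_by_count': ...} entries."""
    return {row["year"]: row["cited_by_count"]
            for row in work.get("counts_by_year") or []}


def plot_citations_per_year(work: dict, output_path: str) -> None:
    """Bar chart of citations received per calendar year for one work."""
    # Imported here so citations_by_year can be used without matplotlib installed
    import matplotlib.pyplot as plt

    counts = citations_by_year(work)
    if not counts:
        raise ValueError("work has no counts_by_year data")
    years = sorted(counts)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(years, [counts[y] for y in years], edgecolor="black")
    ax.set_xlabel("Year")
    ax.set_ylabel("Citations received")
    ax.set_title("Citations per Year")
    fig.tight_layout()
    fig.savefig(output_path, dpi=300)
    plt.close(fig)
```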

## Data Sources

| Source | Endpoint | Free? | Rate Limit |
|--------|----------|-------|------------|
| Semantic Scholar | `/paper/{id}/citations`, `/paper/{id}/references` | Yes | 100/5min (no key) |
| OpenAlex | `/works?filter=cites:{id}` (citing works), `/works?filter=cited_by:{id}` (references) | Yes | 100k/day |
| CrossRef | `/works/{doi}` (reference list, when deposited by the publisher) | Yes | Polite pool (send `mailto`) |
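As an illustration of the Semantic Scholar row, the sketch below pages through the Graph API `citations` or `references` endpoint with only the standard library. It assumes the base URL `https://api.semanticscholar.org/graph/v1/paper`, that each item wraps its paper under `citingPaper` or `citedPaper`, that an optional key goes in the `x-api-key` header, and that a `next` field is present only while more pages remain. The one-second pause is a hypothetical throttle, and the API may cap deep pagination.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional

S2_BASE = "https://api.semanticscholar.org/graph/v1/paper"
# Nested key that holds the paper for each endpoint
NESTED_KEY = {"citations": "citingPaper", "references": "citedPaper"}


def s2_url(paper_id: str, endpoint: str, fields: str = "paperId,title,year",
           offset: int = 0, limit: int = 100) -> str:
    """Build the URL for one page of citations or references."""
    if endpoint not in NESTED_KEY:
        raise ValueError(f"endpoint must be one of {sorted(NESTED_KEY)}")
    query = urllib.parse.urlencode({"fields": fields, "offset": offset, "limit": limit})
    # Keep ':' and '/' so IDs such as 'DOI:...' stay intact in the path
    path_id = urllib.parse.quote(paper_id, safe=":/")
    return f"{S2_BASE}/{path_id}/{endpoint}?{query}"


def parse_page(payload: dict, endpoint: str) -> list[dict]:
    """Unwrap {'data': [{'citingPaper': {...}}, ...]} into a flat list of papers."""
    key = NESTED_KEY[endpoint]
    return [item[key] for item in payload.get("data") or [] if item.get(key)]


def fetch_all(paper_id: str, endpoint: str = "citations",
              api_key: Optional[str] = None, pause_s: float = 1.0) -> list[dict]:
    """Follow 'next' offsets until the last page (makes network requests)."""
    papers: list[dict] = []
    offset = 0
    while True:
        req = urllib.request.Request(s2_url(paper_id, endpoint, offset=offset))
        if api_key:
            req.add_header("x-api-key", api_key)
        with urllib.request.urlopen(req, timeout=30) as resp:
            payload = json.load(resp)
        papers.extend(parse_page(payload, endpoint))
        if "next" not in payload:
            return papers
        offset = payload["next"]
        time.sleep(pause_s)  # crude throttle for the unauthenticated rate limit
```

The returned dicts can be stored under a paper's `'citations'` or `'references'` key before calling `build_citation_graph`.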

## Best Practices

1. Always use Semantic Scholar paperId or DOI as canonical identifiers
2. Normalize author names (handle variants: "J. Smith" vs "John Smith")
3. Filter self-citations when computing impact metrics
4. Use field-normalized metrics for cross-discipline comparisons
5. Report the date of data collection (citation counts change daily)
6. Visualize networks with node size proportional to citations
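Practices 2 and 3 can be combined: normalize names to a comparable key, then flag a citation as a self-citation when the citing and cited papers share a key. This is a crude heuristic sketch, assuming a "surname + first initial" key is acceptable. It will merge distinct people who share that key, so author IDs from Semantic Scholar or OpenAlex are preferable when available.

```python
import re
import unicodedata


def author_key(name: str) -> str:
    """Normalize 'J. Smith', 'John Smith', 'Smith, John' to 'smith j'."""
    # Strip accents (e.g. 'José' -> 'Jose') and lowercase
    s = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    s = s.lower().strip()
    if "," in s:  # 'smith, john' -> 'john smith'
        last, _, first = s.partition(",")
        s = f"{first} {last}"
    tokens = re.findall(r"[a-z][a-z'-]*", s)
    if not tokens:
        return ""
    if len(tokens) == 1:
        return tokens[0]
    return f"{tokens[-1]} {tokens[0][0]}"  # surname + first initial


def is_self_citation(citing_authors: list[str], cited_authors: list[str]) -> bool:
    """True if the citing and cited papers share at least one normalized author."""
    citing = {author_key(a) for a in citing_authors} - {""}
    cited = {author_key(a) for a in cited_authors} - {""}
    return bool(citing & cited)
```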

## Zero-Hallucination Rule

- NEVER fabricate citation counts, h-indices, or paper metadata
- All bibliometric data must come from tool results in the current session
- If an API returns no data for an author/paper, report the empty result explicitly

Related Skills

statistical-analysis

564

from beita6969/ScienceClaw

Guided statistical analysis with test selection and reporting. Use when you need help choosing appropriate tests for your data, assumption checking, power analysis, and APA-formatted results. Best for academic research reporting, test selection guidance. For implementing specific models programmatically use statsmodels.

social-science-analysis

564

from beita6969/ScienceClaw

Social science research methods including survey design, qualitative analysis, content analysis, network analysis, psychometrics, and mixed methods. Covers sociology, psychology, political science, education, and communication studies. Use when user designs surveys, analyzes qualitative data, does content analysis, builds scales, or uses mixed methods. Triggers on "survey design", "qualitative analysis", "content analysis", "Likert scale", "thematic analysis", "grounded theory", "factor analysis", "SEM", "structural equation", "psychometrics", "interview coding".

scipy-analysis

564

from beita6969/ScienceClaw

Scientific computing and statistical analysis with SciPy, NumPy, and pandas. Use when: (1) statistical hypothesis testing, (2) optimization problems, (3) signal processing, (4) numerical integration, (5) data manipulation and analysis. NOT for: symbolic math (use sympy-math), machine learning (use sklearn directly), or visualization (use matplotlib-viz).

patent-analysis

564

from beita6969/ScienceClaw

Conducts patent landscape analysis including prior art searches, patent claim interpretation, freedom-to-operate assessment, and intellectual property strategy for scientific inventions; trigger when users discuss patents, prior art, IP protection, or technology licensing.

paper-analysis

564

from beita6969/ScienceClaw

Read, summarize, and critically analyze scientific papers. Extract key findings, methodology, limitations, and contributions. Use when user shares a paper (PDF/URL/DOI), asks to summarize a paper, critique methodology, extract data from a paper, compare papers, or do a critical review. Triggers on "summarize this paper", "analyze this study", "what does this paper say", "critique this methodology", "extract findings from".

nlp-analysis

564

from beita6969/ScienceClaw

Natural language processing for research including text mining, sentiment analysis, topic modeling, named entity recognition, text classification, and corpus analysis. Use when user needs to analyze text data, extract information from documents, do sentiment analysis, topic modeling, or text classification for research purposes. Triggers on "text mining", "sentiment analysis", "topic modeling", "NER", "named entity", "text classification", "word embeddings", "LDA", "corpus analysis", "word frequency", "TF-IDF".

meta-analysis

564

from beita6969/ScienceClaw

Perform quantitative meta-analysis with effect size calculation, forest plots, funnel plots, and heterogeneity assessment. Use when: user asks to combine results from multiple studies, calculate pooled effect sizes, assess publication bias, or create forest/funnel plots. NOT for: systematic review protocol (use systematic-review) or single-study statistics (use statsmodels-stats).

linguistics-analysis

564

from beita6969/ScienceClaw

Analyze language structures, typological features, and semantic change across languages

legal-analysis

564

from beita6969/ScienceClaw

Analyze legal contracts, extract clauses, and perform legal research with structured frameworks

geospatial-analysis

564

from beita6969/ScienceClaw

Performs geospatial data analysis including GIS operations, spatial statistics, remote sensing image processing, geocoding, and cartographic visualization; trigger when users discuss maps, coordinates, satellite imagery, spatial patterns, or geographic data.

genomics-analysis

564

from beita6969/ScienceClaw

Orchestrates a genomics analysis workflow from gene query through expression analysis to pathway enrichment. Use when investigating gene function, analyzing expression data, or performing pathway-level interpretation. NOT for pure protein structure modeling or drug-target interaction analysis.

genome-analysis

564

from beita6969/ScienceClaw

Performs genomics analyses including gene expression profiling, BLAST sequence alignment, GWAS interpretation, variant calling, and genome assembly tasks; trigger when the user mentions DNA/RNA sequences, SNPs, gene panels, or comparative genomics.