a-share-site-crawl
Crawl and validate A-share information sources with browser-first and fallback fetch workflows. Use when working with A-share content collection from 韭研公社、雪球、东方财富、财联社、巨潮资讯, especially for: (1) checking whether a site is crawlable, (2) extracting usable public content, (3) choosing between browser and plain fetch, (4) handling anti-bot / login walls, or (5) building repeatable market-news collection, normalization, and cron workflows.
Best use case
a-share-site-crawl is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using a-share-site-crawl should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/a-share-site-crawl/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How a-share-site-crawl Compares
| Feature / Agent | a-share-site-crawl | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# A Share Site Crawl

Use this skill to collect public A-share information from the five target sites and to convert raw site access into repeatable summary-ready records.

## Read Order

Always read these first:

- `references/sites.md`
- `references/workflow.md`

Read these in addition when the task involves formal collection, normalization, or recurring jobs:

- `references/entrypoints.md`
- `references/fields.md`
- `references/risks.md`

Use `references/entrypoints.md` for fixed site entry pages, verification status, cron priorities, and default crawl mode.

Use `references/fields.md` for the normalized schema, source tiering, credibility, opinion-risk handling, content typing, cron retention, time normalization, ticker normalization, and dedup rules.

Use `references/risks.md` for P0/P1/P2 risks, recognition signals, and downgrade or mitigation decisions.

## Core Rule

Prefer `browser` for page truth and `web_fetch` for cheap probing.

- Use `web_fetch` first when the site is known to have stable public text pages
- Use `browser` first when the site is dynamic, disclosure-driven, or clearly stronger in rendered form
- If both fail, report the site as restricted or missing instead of pretending it was covered
- Do not treat anti-bot code, disclaimers, shells, or login walls as usable content

## Working Workflow

### 1. Start from the correct page type

- Prefer fixed entrypoints, list pages, search pages, disclosure pages, telegraph streams, and stock-detail pages
- Do not judge 巨潮资讯 from homepage-only text
- Do not rely on noisy portal homepages when a better inner page exists

### 2. Probe and classify access

Judge each probe into one of these buckets:

- `usable`: readable and materially sufficient
- `partial`: some content is real, but clearly incomplete
- `shell-only`: mainly navigation, scripts, disclaimers, or boilerplate
- `blocked`: anti-bot, login wall, or meaningless payload

### 3. Choose extraction mode

Use one of these verdicts per site or page:

- `fetch-first`
- `browser-first`
- `restricted`
- `not-usable`

### 4. Keep site roles distinct

- 巨潮资讯: official confirmation and disclosure verification
- 东方财富: public aggregation, data-center navigation, and quasi-structured market pages
- 财联社: fast market events and telegraph flow
- 韭研公社: topic logic, timeline, and community clue discovery
- 雪球: sentiment, heat, stock-detail snapshots, and community discussion

### 5. Normalize before summarizing

When the task is more than a one-off crawl check, convert findings into normalized records using `references/fields.md`.

Minimum normalization discipline:

- assign `source_tier`, `credibility`, `content_type`, and `opinion_risk`
- normalize time to Asia/Shanghai when possible
- normalize A-share tickers conservatively
- deduplicate repeated event coverage
- separate confirmed facts from market claims and sentiment

### 6. Apply downgrade rules early

Use `references/risks.md` when deciding whether to downgrade, defer, or replace a source.

Default downgrade behavior:

- login-gated or anti-bot content -> `restricted`
- shell-only or disclaimer-heavy result -> switch entrypoint or switch tool
- 财联社 telegraph: keep the list body text by default; only hit `detail` when the list is truncated, a canonical URL is needed, or an original-source jump matters
- 巨潮资讯 announcements: keep the list metadata by default; only chase the PDF when the title is high-value enough to justify body extraction, otherwise keep a title-derived summary and mark that the PDF body was not extracted
- community-only claim without confirmation -> keep as clue, not fact
- unavailable priority site -> disclose it and use approved fallback public sources

## Default Site Priority

Use this order for stable public collection when the task does not specify a scenario:

1. 东方财富
2. 财联社
3. 巨潮资讯
4. 韭研公社
5. 雪球

This order reflects public accessibility and extraction stability, not market importance.
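
The probe buckets (step 2) and extraction verdicts (step 3) of the Working Workflow can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's actual logic: the marker strings, the `min_chars` threshold, the function names `classify_probe` and `choose_mode`, and the mapping from buckets to verdicts are all assumptions made for the example; the real recognition signals are expected to live in `references/risks.md`.

```python
# Hypothetical marker strings; the real signals belong in references/risks.md.
BLOCK_MARKERS = ("验证码", "请登录", "captcha", "access denied")
SHELL_MARKERS = ("免责声明", "enable javascript")


def classify_probe(text: str, min_chars: int = 200) -> str:
    """Map visible probe text to usable / partial / shell-only / blocked.

    min_chars is an assumed threshold for "materially sufficient".
    """
    body = text.strip()
    lowered = body.lower()
    # Empty payloads, anti-bot pages, and login walls are never usable content.
    if not body or any(marker in lowered for marker in BLOCK_MARKERS):
        return "blocked"
    if len(body) < min_chars:
        # Short text dominated by disclaimers or script notices is a shell.
        if any(marker in lowered for marker in SHELL_MARKERS):
            return "shell-only"
        return "partial"
    return "usable"


def choose_mode(fetch_bucket: str, browser_bucket: str) -> str:
    """Pick a verdict from the two probe buckets (assumed mapping)."""
    if fetch_bucket == "usable":
        return "fetch-first"      # cheap probing already suffices
    if browser_bucket in ("usable", "partial"):
        return "browser-first"    # rendered page is the better source
    if fetch_bucket == "partial":
        return "fetch-first"
    if "blocked" in (fetch_bucket, browser_bucket):
        return "restricted"       # login-gated or anti-bot content
    return "not-usable"           # both probes returned shells only


if __name__ == "__main__":
    fetch = classify_probe("请登录后查看全部内容")
    browser = classify_probe("公司公告正文" * 60)
    print(fetch, browser, choose_mode(fetch, browser))
```

Keeping classification separate from mode selection means a site can be re-probed from a different entrypoint without touching the verdict logic, which matches the "switch entrypoint or switch tool" downgrade rule.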
## When to Ask for Stronger Access

Ask for stronger access only when the user explicitly wants better extraction from a restricted site, especially 雪球.

Examples:

- attached Chrome relay tab
- logged-in browser profile
- cookies or authenticated environment
- a dedicated crawler or site-specific script

## Scenario Call Contract

When a cron or caller specifies one of these scenario ids, treat it as a compact instruction bundle and do not ask for a longer prompt:

- `pre-open`: read `references/entrypoints.md`, `references/fields.md`, and `references/risks.md`; use the pre-open priority order; focus on overnight macro or overseas linkage, policy or industry catalysts, key announcements, expected hot sectors, and today's watchlist
- `midday`: read `references/entrypoints.md`, `references/fields.md`, and `references/risks.md`; use the intraday priority order; focus on morning index and turnover snapshot, leading or lagging themes, style or sentiment shifts, active stocks with catalysts, and deviation from the pre-open setup
- `late-session`: read `references/entrypoints.md`, `references/fields.md`, and `references/risks.md`; use the intraday priority order; focus on whether the afternoon main line strengthens or rotates, late-session anomalies, money-flow return direction, hot-stock persistence, and signals that may affect post-close review or next-day expectations
- `post-close`: read `references/entrypoints.md`, `references/fields.md`, and `references/risks.md`; use the post-close priority order; focus on index and turnover recap, main-line review, key stocks and drivers, important announcements plus exchange or regulator dynamics, and next-day clues with risks

For every scenario:

- keep the output in Chinese and lead with conclusions before detail
- keep `已确认事实`, `市场观点与情绪`, and `待核实线索` clearly separated
- keep `本轮缺失站点` and `来源层级说明` in the final output
- bind every round to the entrypoint, field-normalization, and risk-downgrade rules instead of freehand summarizing
- do not output buy or sell recommendations

## Standard Output

When producing a formal round output, always structure it with at least these sections:

- `已确认事实`
- `市场观点与情绪`
- `待核实线索`
- `本轮缺失站点`
- `来源层级说明`

Use the sections as follows:

- `已确认事实`: only T1 or well-supported T2 items, or items clearly marked as partially confirmed
- `市场观点与情绪`: T3 discussion, heat, consensus drift, and sentiment signals
- `待核实线索`: rumors, single-source community claims, partial clues, or conflicting statements
- `本轮缺失站点`: blocked, unstable, login-gated, or otherwise uncovered priority sites and what fallback was used
- `来源层级说明`: explain T1/T2/T3 usage and remind the reader that community sources are not equal to formal disclosure

## Per-Site Quick Output for Crawlability Tasks

When the task is specifically about site feasibility rather than a market summary, return:

- Site
- Status
- Recommended mode
- Best entry page
- What works
- Main limitation
- Next step

## Non-Negotiables

- Distinguish confirmed facts from community opinion
- Prefer official disclosure and high-confidence public reporting over discussion boards
- Do not output buy/sell recommendations
- Do not imply full coverage when a priority site failed or was inaccessible
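
The Standard Output structure described above can be sketched as a small routing step that sorts normalized records into the five required sections. This is only an illustration under stated assumptions: the `Record` class, its `corroborated` and `is_claim` flags, and the `route` and `build_round` helpers are hypothetical names, and the actual schema is whatever `references/fields.md` defines.

```python
from dataclasses import dataclass

# Section names are taken from the Standard Output list, in the same order.
SECTION_ORDER = ["已确认事实", "市场观点与情绪", "待核实线索", "本轮缺失站点", "来源层级说明"]


@dataclass
class Record:
    site: str
    text: str
    source_tier: str            # "T1", "T2", or "T3"
    corroborated: bool = False  # hypothetical flag: T2 item is well supported
    is_claim: bool = False      # hypothetical flag: rumor or unconfirmed factual claim


def route(record: Record) -> str:
    """Pick the output section for one record (assumed routing rules)."""
    if record.source_tier == "T1":
        return "已确认事实"
    if record.source_tier == "T2" and record.corroborated:
        return "已确认事实"
    if record.source_tier == "T3" and not record.is_claim:
        return "市场观点与情绪"
    # Uncorroborated T2 items and community claims stay clues, not facts.
    return "待核实线索"


def build_round(records: list[Record], missing_sites: dict[str, str]) -> dict[str, list[str]]:
    """Assemble all five sections; missing_sites maps site -> fallback used."""
    sections: dict[str, list[str]] = {name: [] for name in SECTION_ORDER}
    for record in records:
        sections[route(record)].append(f"{record.text} [{record.site}]")
    for site, fallback in missing_sites.items():
        sections["本轮缺失站点"].append(f"{site}: {fallback}")
    sections["来源层级说明"].append("T1 为正式披露,T2 为公开报道,T3 为社区讨论;社区来源不等同于正式披露。")
    return sections


if __name__ == "__main__":
    demo = build_round(
        [Record("巨潮资讯", "示例公告", "T1"), Record("雪球", "示例讨论", "T3")],
        {"韭研公社": "本轮受限,未覆盖"},
    )
    for name in SECTION_ORDER:
        print(name, demo[name])
```

Every section key is created up front, so `本轮缺失站点` and `来源层级说明` appear in the result even when a round has nothing to report under them, in line with the rule that those sections are always kept in the final output.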