feat: add optional US retail sentiment dataset support #2179
alexander-schneider wants to merge 2 commits into microsoft:main from
@microsoft-github-policy-service agree
Summary
This PR adds an optional alternative-data path for US equities based on daily retail sentiment factors.
It includes:
- an `adanos` data collector for daily US retail sentiment snapshots
- an `Alpha158AdanosUS` handler that augments `Alpha158` with lagged sentiment factors

Why
Qlib already supports custom collectors and custom datasets well. This patch keeps the integration optional and focuses on a reproducible daily factor workflow rather than a sentiment-only demo.
The goal is to make retail sentiment usable as structured alternative data for factor research on top of existing US OHLCV datasets.
What was added
Collector
New files under `scripts/data_collector/adanos/`:
- `collector.py`
- `README.md`
- `requirements.txt`

The collector pulls daily rows from the Adanos stock detail endpoints for:
It builds per-symbol daily CSVs with source-specific columns such as:
- `reddit_buzz`, `reddit_sentiment`, `reddit_mentions`
- `x_buzz`, `x_sentiment`, `x_mentions`, `x_avg_rank`
- `news_buzz`, `news_sentiment`, `news_mentions`
- `polymarket_buzz`, `polymarket_sentiment`, `polymarket_trade_count`

It also adds aggregate daily fields:
- `retail_buzz_avg`
- `retail_sentiment_avg`
- `retail_coverage`
- `retail_alignment_score`

A `merge_with_price_data` command is included so users can append sentiment fields to existing normalized US daily price CSVs before running `dump_bin.py`.

Dataset handler
New files under `qlib/contrib/data/`:
- `adanos_features.py`
- `handler_adanos.py`

`Alpha158AdanosUS` extends `Alpha158` with lagged sentiment features such as:

All sentiment features are lagged to avoid same-day leakage.
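To illustrate what "lagged to avoid same-day leakage" means in qlib's expression language, here is a minimal sketch that builds lagged feature expressions with qlib's `Ref` operator, where `Ref($x, n)` with `n > 0` reads `$x` from `n` periods earlier. It assumes the merged sentiment columns were dumped into the qlib provider under the collector's column names. The field selection, the lag choices, the `_LAG` naming scheme, and the function `lagged_sentiment_config` are all hypothetical and are not taken from `adanos_features.py`.

```python
from typing import Iterable, List, Tuple

# Hypothetical selection of sentiment columns, assumed to be dumped into the
# qlib provider alongside OHLCV (names follow the collector's column list).
SENTIMENT_FIELDS = (
    "reddit_sentiment",
    "x_sentiment",
    "news_sentiment",
    "retail_sentiment_avg",
    "retail_buzz_avg",
)


def lagged_sentiment_config(
    fields: Iterable[str] = SENTIMENT_FIELDS,
    lags: Iterable[int] = (1, 5),
) -> Tuple[List[str], List[str]]:
    """Return (expressions, names) that read sentiment only from earlier days.

    In qlib, Ref($x, n) with n > 0 is the value of $x from n periods before the
    current date; n == 0 has a different meaning, so every lag must be >= 1.
    """
    fields = tuple(fields)  # materialize so generators are not exhausted per lag
    lags = tuple(lags)
    bad = [lag for lag in lags if lag < 1]
    if bad:
        raise ValueError(f"lags must be >= 1 to avoid same-day leakage, got {bad}")

    exprs: List[str] = []
    names: List[str] = []
    for lag in lags:
        for field in fields:
            exprs.append(f"Ref(${field}, {lag})")           # e.g. Ref($x_sentiment, 1)
            names.append(f"{field.upper()}_LAG{lag}")        # hypothetical naming scheme
    return exprs, names
```

For example, `lagged_sentiment_config(["reddit_sentiment"], lags=(1, 2))` yields `["Ref($reddit_sentiment, 1)", "Ref($reddit_sentiment, 2)"]`. In recent qlib versions `Alpha158.get_feature_config()` returns a `(fields, names)` pair, so a subclass could append lists like these to it. Whether the PR's handler works this way is an assumption.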
Benchmark example
New example config: `examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158Adanos_US.yaml`

This shows how to run a US daily LightGBM workflow using merged price + sentiment qlib data.
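The YAML file itself is not reproduced here. The sketch below shows, as a Python dict, the kind of dataset section such a workflow typically contains, following qlib's standard `DatasetH` plus handler layout. Several parts are assumptions: `Alpha158AdanosUS` is assumed to live in `qlib.contrib.data.handler_adanos` (matching the new file name) and to accept the same kwargs as `Alpha158`, and the provider path, dates, and instrument universe are hypothetical placeholders. The helper `check_no_lookahead` is also hypothetical. It checks that the processor fit window and the train/valid/test segments do not leak future data.

```python
from typing import Dict, Tuple

# Hypothetical placeholders, not values from the PR's YAML.
PROVIDER_URI = "~/.qlib/qlib_data/us_data_with_sentiment"

data_handler_config = {
    "start_time": "2020-01-01",
    "end_time": "2024-12-31",
    "fit_start_time": "2020-01-01",  # window used to fit normalization processors
    "fit_end_time": "2022-12-31",
    "instruments": "all",
}

dataset_config = {
    "class": "DatasetH",
    "module_path": "qlib.data.dataset",
    "kwargs": {
        "handler": {
            "class": "Alpha158AdanosUS",                     # assumed class location
            "module_path": "qlib.contrib.data.handler_adanos",
            "kwargs": data_handler_config,
        },
        "segments": {
            "train": ("2020-01-01", "2022-12-31"),
            "valid": ("2023-01-01", "2023-12-31"),
            "test": ("2024-01-01", "2024-12-31"),
        },
    },
}


def check_no_lookahead(handler_kwargs: Dict[str, str],
                       segments: Dict[str, Tuple[str, str]]) -> bool:
    """True if processors are fit only on training data and segments are ordered.

    ISO-8601 date strings compare correctly as plain strings.
    """
    _, train_end = segments["train"]
    valid_start, valid_end = segments["valid"]
    test_start, _ = segments["test"]
    return (
        handler_kwargs["fit_end_time"] <= train_end
        and train_end < valid_start
        and valid_end < test_start
    )
```

With qlib installed and the merged data dumped, calling `qlib.init(provider_uri=PROVIDER_URI, region="us")` and then `qlib.utils.init_instance_by_config(dataset_config)` would build the dataset. The YAML's `task.dataset` section expresses the same structure.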
Notes
Validation
Targeted checks run locally:
- `python3 -m pytest tests/test_adanos_collector.py tests/data_mid_layer_tests/test_adanos_handler.py -q`
- `python3 -m compileall qlib/contrib/data scripts/data_collector/adanos tests/test_adanos_collector.py tests/data_mid_layer_tests/test_adanos_handler.py`
- `python3 -m pytest tests/ -q` (fails in this checkout because of existing environment/setup issues unrelated to this patch, including missing compiled qlib extensions and optional test dependencies such as `mlflow`, `gym`, `tianshou`, `dill`, and `fire`)
- `git diff --check`