# Paper Semantic Search

Find similar papers using embedding-based semantic search. Supports both free local models and the OpenAI API (higher quality).

## Features

- Crawl papers from OpenReview (e.g., ICLR 2026 submissions)
- Semantic search using example papers or text queries
- Embedding caching
- Embedding models: open-source (e.g., all-MiniLM-L6-v2) or OpenAI

## Quick Start

```bash
pip install -r requirements.txt
```

### 1. Prepare Papers

```python
from crawl import crawl_papers

# Fetch ICLR 2026 submissions from OpenReview and save them to a JSON file
crawl_papers(
    venue_id="ICLR.cc/2026/Conference/Submission",
    output_file="iclr2026_papers.json"
)
```

### 2. Search Papers

```python
from search import PaperSearcher

# Local model (free)
searcher = PaperSearcher('iclr2026_papers.json', model_type='local')

# OpenAI model (higher quality, requires an API key)
# export OPENAI_API_KEY='your-key'
# searcher = PaperSearcher('iclr2026_papers.json', model_type='openai')

# Compute embeddings for all papers (cached automatically)
searcher.compute_embeddings()

# Search using example papers you are interested in
examples = [
    {
        "title": "Your paper title",
        "abstract": "Your paper abstract..."
    }
]

results = searcher.search(examples=examples, top_k=100)

# Or search with a free-text query
results = searcher.search(query="interesting topics", top_k=100)

# Show the top 10 results and save all results to JSON
searcher.display(results, n=10)
searcher.save(results, 'results.json')
```

## How It Works

1. Paper titles and abstracts are converted to embeddings
2. Embeddings are cached automatically
3. Your query (text or example papers) is embedded with the same model
4. Cosine similarity between the query embedding and each paper embedding identifies the most similar papers
5. Results are ranked by similarity score
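
The similarity and ranking steps above can be sketched with NumPy. This is a minimal illustration, not the code in `search.py`: it assumes several example papers are combined by averaging their embeddings (the actual combination rule may differ), and the helper name `rank_by_cosine` is hypothetical.

```python
import numpy as np


def rank_by_cosine(paper_embs: np.ndarray, query_embs: np.ndarray, top_k: int = 100):
    """Return (indices, scores) of the top_k papers most similar to the query.

    paper_embs: (n_papers, dim) array of paper embeddings.
    query_embs: (n_queries, dim) array, one row per example paper or text query.
    Hypothetical helper: multiple query rows are averaged into one query vector.
    """
    # Combine example papers into a single query vector (assumed: simple mean)
    query = query_embs.mean(axis=0)

    # L2-normalise so that dot products equal cosine similarities
    papers_n = paper_embs / np.linalg.norm(paper_embs, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)

    scores = papers_n @ query_n              # (n_papers,) cosine similarities
    top_k = min(top_k, len(scores))
    order = np.argsort(-scores)[:top_k]      # highest similarity first
    return order, scores[order]


# Toy example with 2-D "embeddings"
papers = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
examples = np.array([[1.0, 0.1]])
idx, sims = rank_by_cosine(papers, examples, top_k=2)
print(idx, np.round(sims, 4))
```

Normalising once and taking a single matrix-vector product keeps scoring vectorised, which matters when ranking tens of thousands of papers.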

## Cache

Embeddings are cached as `cache_<filename>_<hash>_<model>.npy`. Delete the cache file to force recomputation.
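
A load-or-compute cache following that filename pattern could look like the sketch below. Only the `cache_<filename>_<hash>_<model>.npy` pattern comes from this README; what `<hash>` covers (here, assumed to be the first 8 hex characters of an MD5 of the papers file), whether `<filename>` keeps its extension (assumed not), and the helper names `cache_path` and `load_or_compute` are all hypothetical.

```python
import hashlib
import os
import tempfile

import numpy as np


def cache_path(papers_file: str, model_name: str, cache_dir: str = ".") -> str:
    """Build cache_<filename>_<hash>_<model>.npy (hash scheme is an assumption)."""
    with open(papers_file, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()[:8]
    stem = os.path.splitext(os.path.basename(papers_file))[0]
    safe_model = model_name.replace("/", "_")  # model ids may contain slashes
    return os.path.join(cache_dir, f"cache_{stem}_{digest}_{safe_model}.npy")


def load_or_compute(papers_file, model_name, embed_fn, cache_dir="."):
    """Return cached embeddings if the file exists; otherwise compute and save them."""
    path = cache_path(papers_file, model_name, cache_dir)
    if os.path.exists(path):
        return np.load(path)
    embs = np.asarray(embed_fn(papers_file))
    np.save(path, embs)
    return embs


# Demo: the second call is served from the cache, so embed_fn runs only once
calls = []


def fake_embed(_papers_file):
    calls.append(1)
    return np.ones((3, 4))  # stand-in for real embeddings


with tempfile.TemporaryDirectory() as d:
    papers = os.path.join(d, "papers.json")
    with open(papers, "w") as f:
        f.write("[]")
    first = load_or_compute(papers, "all-MiniLM-L6-v2", fake_embed, d)
    second = load_or_compute(papers, "all-MiniLM-L6-v2", fake_embed, d)
    cached_name = os.path.basename(cache_path(papers, "all-MiniLM-L6-v2", d))
```

Putting a content hash and the model name in the filename means that editing the papers file or switching models produces a new cache file rather than silently reusing stale embeddings.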

## Example Output

```
================================================================================
Top 100 Results (showing 10)
================================================================================

1. [0.8456] Paper a
   #12345 | foundation or frontier models, including LLMs
   https://openreview.net/forum?id=xxx

2. [0.8234] Paper b
   #12346 | applications to robotics, autonomy, planning
   https://openreview.net/forum?id=yyy
```
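
The layout above can be reproduced with a small formatter. This is a sketch, not the actual `display` method: the result field names (`score`, `title`, `number`, `primary_area`, `forum_id`) and the function name `format_results` are hypothetical, inferred only from what the example output shows.

```python
def format_results(results, n=10, width=80):
    """Render results in the layout of the example output (field names assumed)."""
    bar = "=" * width
    lines = [bar, f"Top {len(results)} Results (showing {min(n, len(results))})", bar]
    for rank, r in enumerate(results[:n], start=1):
        lines.append("")
        lines.append(f"{rank}. [{r['score']:.4f}] {r['title']}")
        lines.append(f"   #{r['number']} | {r['primary_area']}")
        lines.append(f"   https://openreview.net/forum?id={r['forum_id']}")
    return "\n".join(lines)


demo = [
    {"score": 0.8456, "title": "Paper a", "number": 12345,
     "primary_area": "foundation or frontier models, including LLMs", "forum_id": "xxx"},
    {"score": 0.8234, "title": "Paper b", "number": 12346,
     "primary_area": "applications to robotics, autonomy, planning", "forum_id": "yyy"},
]
print(format_results(demo, n=10))
```

The header reports the total number of results (`top_k`) alongside how many are shown (`n`), matching "Top 100 Results (showing 10)" for `top_k=100, n=10`.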

## Tips

- For best results, use 1-5 example papers or a paragraph describing the topic you are interested in
- The local model is good enough for most cases
- Use the OpenAI model for critical searches (~$1 for 18k queries)

If you find this useful, please consider giving it a star!