Paper: AgriRegion Revised — Spatial-Semantic Retrieval with Sensitivity Analysis

Research Question

How does geospatial re-ranking improve agricultural RAG systems, and what is the optimal balance between semantic relevance and geographic proximity for region-specific advisory?

Gap in Literature

Existing agricultural RAG systems (AgriGPT, standard RAG pipelines) rely solely on semantic similarity for retrieval. They don't account for the fact that agricultural advice is inherently local — planting dates, pest pressures, soil types, and regulations vary by region. A recommendation valid in California can be harmful in North Carolina.

No prior work combines spatial distance decay with semantic retrieval scoring for agricultural knowledge systems. The closest work (Spatial-RAG, 2025) addresses general geospatial QA but not the domain-specific challenges of agriculture.

What's Already Built

✅ AgriRegion framework — complete and deployed
✅ 70K+ document corpus indexed in OpenSearch (S3 + vector k-NN)
✅ Spatial-semantic scoring function implemented
✅ AgriRegion-Eval benchmark (160 questions, 12 subfields)
✅ Baseline comparisons vs. GPT-4-Turbo, Claude, Gemini, Mistral
✅ Paper draft exists (11 pages, IEEE Access format)

What's Needed to Complete

The desk rejection was likely due to missing baselines and an underexplored core contribution. Three additions address this:

1. Add "Standard RAG (no geo)" baseline

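Same corpus, same LLM, same top-k, but with α=0 (pure semantic, no distance decay). This isolates the contribution of geographic re-ranking. If AgriRegion beats this baseline by even 5% F1, the gain can be attributed to the spatial term rather than to the corpus or the LLM, which is the evidence the novelty claim needs.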
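To make the role of α concrete, here is a minimal sketch of a spatial-semantic re-ranker. It assumes a linear blend, score = (1 − α)·semantic + α·decay(distance), with an inverse decay of the form 1 / (1 + d / scale); the exact scoring function in AgriRegion may differ. The `Candidate` class, the `scale_km` parameter, the document IDs, and all numeric values are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    semantic: float      # semantic similarity, assumed normalized to [0, 1]
    distance_km: float   # distance between document region and query location


def inverse_decay(distance_km: float, scale_km: float = 100.0) -> float:
    """Assumed inverse decay: 1.0 at distance 0, falling toward 0 with distance."""
    return 1.0 / (1.0 + distance_km / scale_km)


def combined_score(c: Candidate, alpha: float, scale_km: float = 100.0) -> float:
    """Assumed linear blend; alpha=0 is pure semantic, alpha=1 is pure distance."""
    return (1.0 - alpha) * c.semantic + alpha * inverse_decay(c.distance_km, scale_km)


def rerank(cands: list[Candidate], alpha: float, top_k: int = 3) -> list[Candidate]:
    """Sort candidates by combined score, highest first, and keep top_k."""
    return sorted(cands, key=lambda c: combined_score(c, alpha), reverse=True)[:top_k]


# Toy candidates for a query issued from North Carolina (made-up values).
candidates = [
    Candidate("ca_planting_guide", semantic=0.90, distance_km=3500.0),
    Candidate("nc_planting_guide", semantic=0.80, distance_km=50.0),
    Candidate("midwest_overview", semantic=0.60, distance_km=1200.0),
]

baseline = [c.doc_id for c in rerank(candidates, alpha=0.0)]   # "Standard RAG (no geo)"
geo_aware = [c.doc_id for c in rerank(candidates, alpha=0.5)]
print("alpha=0.0:", baseline)
print("alpha=0.5:", geo_aware)
```

With α=0 the ranking follows semantic similarity alone, so the distant California document stays on top; with α=0.5 the nearby North Carolina document overtakes it. Because the baseline is the same function with one parameter changed, any difference in downstream metrics comes from the spatial term.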

2. α Sensitivity Sweep

Run α from 0.0 to 1.0 in 0.1 steps (11 settings). Plot F1, EM, and BERTScore against α. Show the sweet spot, and show that both extremes (pure semantic at α=0, pure distance at α=1) underperform. This is one experiment and one figure, with a high payoff with reviewers.
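The sweep can be sketched as a simple loop over the α grid. The version below is an illustration under stated assumptions: it reuses an assumed linear blend with inverse decay, and it uses top-1 retrieval accuracy on two made-up questions as a stand-in metric. In the real experiment the metric function would run the full pipeline and return F1, EM, and BERTScore on AgriRegion-Eval. All names and values here are hypothetical.

```python
def inverse_decay(distance_km: float, scale_km: float = 100.0) -> float:
    """Assumed inverse decay: 1 / (1 + d / scale)."""
    return 1.0 / (1.0 + distance_km / scale_km)


def score(semantic: float, distance_km: float, alpha: float) -> float:
    """Assumed linear blend of semantic similarity and spatial decay."""
    return (1.0 - alpha) * semantic + alpha * inverse_decay(distance_km)


def alpha_grid(steps: int = 10) -> list[float]:
    """0.0, 0.1, ..., 1.0 computed by division to avoid accumulated float drift."""
    return [i / steps for i in range(steps + 1)]


def top1_accuracy(questions: list[dict], alpha: float) -> float:
    """Stand-in metric: fraction of questions whose top-ranked doc is the gold doc."""
    hits = 0
    for q in questions:
        best = max(q["candidates"], key=lambda c: score(c[1], c[2], alpha))
        hits += best[0] == q["gold"]
    return hits / len(questions)


def sweep(questions: list[dict], metric=top1_accuracy) -> dict[float, float]:
    """Evaluate the metric at every alpha on the grid."""
    return {alpha: metric(questions, alpha) for alpha in alpha_grid()}


# Toy questions; each candidate is (doc_id, semantic, distance_km). Made-up values.
toy_questions = [
    {   # geography should win: the local document is the right one
        "gold": "nc_doc",
        "candidates": [("ca_doc", 0.90, 3500.0), ("nc_doc", 0.80, 50.0)],
    },
    {   # semantics should win: the nearby document is off-topic
        "gold": "relevant_far",
        "candidates": [("relevant_far", 0.95, 800.0), ("irrelevant_near", 0.30, 5.0)],
    },
]

results = sweep(toy_questions)
for alpha, acc in results.items():
    print(f"alpha={alpha:.1f}  top1_acc={acc:.2f}")
```

In this toy data the interior values of α do best, but only because the two questions were constructed to pull in opposite directions; it says nothing about where the optimum falls on the real benchmark. The resulting dictionary maps directly onto the planned figure, with α on the x-axis and one line per metric.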

3. Distance Decay Function Comparison

Compare inverse (current), Gaussian, step function, and power law. Show which works best. Even if inverse wins, the comparison demonstrates rigor.
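The four candidate decay functions can be written side by side so they are swappable in the scoring function. The forms and parameters below (`scale_km`, `sigma_km`, `radius_km`, `exponent`) are assumptions chosen for illustration; the inverse form used in the current system and the parameterizations chosen for the paper may differ.

```python
import math


def inverse_decay(d_km: float, scale_km: float = 100.0) -> float:
    """Assumed inverse form: 1 / (1 + d / scale)."""
    return 1.0 / (1.0 + d_km / scale_km)


def gaussian_decay(d_km: float, sigma_km: float = 100.0) -> float:
    """Gaussian kernel: flat near the origin, then drops quickly."""
    return math.exp(-(d_km * d_km) / (2.0 * sigma_km * sigma_km))


def step_decay(d_km: float, radius_km: float = 100.0) -> float:
    """Hard cutoff: full weight inside the radius, none outside."""
    return 1.0 if d_km <= radius_km else 0.0


def power_law_decay(d_km: float, scale_km: float = 100.0, exponent: float = 2.0) -> float:
    """Shifted power law: (1 + d / scale) ** -exponent."""
    return (1.0 + d_km / scale_km) ** (-exponent)


DECAYS = {
    "inverse": inverse_decay,
    "gaussian": gaussian_decay,
    "step": step_decay,
    "power_law": power_law_decay,
}

# Print the weight each function assigns at a few distances.
distances_km = [0.0, 50.0, 100.0, 300.0, 1000.0]
print("d_km".ljust(8) + "".join(name.ljust(12) for name in DECAYS))
for d in distances_km:
    row = "".join(f"{fn(d):.4f}".ljust(12) for fn in DECAYS.values())
    print(f"{d:<8.0f}{row}")
```

All four return 1.0 at distance zero and never increase with distance, so they can be compared under the same α. Note that with `exponent=1.0` this power law coincides with the inverse form, so the comparison is only informative when the exponent is set to something else.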

Expected Results

AgriRegion (with geo) outperforms standard RAG (without geo) by 5–10% F1 on geo-sensitive questions
Optimal α is between 0.3 and 0.6 (not 0 or 1)
Largest gains in Soil, Irrigation, Pathology (highly local domains)
Smallest gains in Economics, Food Safety (more universal domains)
Gaussian decay slightly outperforms inverse for mid-range distances

Timeline to Submission

2-3 weeks — the paper exists. Run the three experiments, add the figures, fix the template, resubmit to a better-fit venue.

Why This Venue

Computers and Electronics in Agriculture (IF: 8.3): exact domain match, and the journal favors applied AI systems with evaluation. It publishes RAG papers and NC agriculture papers, which makes it a much better fit than IEEE Access for this work.

AgriRegion Revised: Region-Aware Retrieval with Spatial-Semantic Sensitivity Analysis