IEEE ACCESS 2026 • SUBMITTED

AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice

Mesafint Fanuel, Mahmoud Nabil Mahmoud, Crystal Cook Marshall, Vishal Lakhotia, Biswanath Dari, Kaushik Roy, Shaohu Zhang
NC A&T State University • University of Alabama • NC State Extension • Supported by USDA Grant NC.X 374-5-24-170-1
AgriRegion is a Retrieval-Augmented Generation (RAG) framework designed for high-fidelity, region-aware agricultural advisory. Unlike standard RAG that relies solely on semantic similarity, AgriRegion incorporates a geospatial metadata injection layer and a region-prioritized re-ranking mechanism — ensuring advice about planting schedules, pest control, and fertilization is locally accurate for NC farmers.

Key Results

Evaluated on AgriRegion-Eval (160 questions across 12 agricultural subfields):

F1 Score: 0.82 (+12% vs GPT-4-Turbo)
BERTScore: 0.90 (+8% vs GPT-4-Turbo)
RAGA-Precision: 0.86 (retrieval grounding)
Exact Match: 0.76 (+12% vs GPT-4-Turbo)
Hallucination Reduction: 10-20% (vs state-of-the-art LLMs)
Documents Indexed: 70K+ (NC Extension + Scopus + Textbooks)

Core Contributions

System Architecture

AgriRegion is a reactive system — the farmer asks a question or chats, and the AI retrieves region-aware context to generate a grounded response.

[Figure: AgriRegion reactive RAG backend on AWS. Components: Farmer, API Gateway, Lambda, OpenSearch, Geo Re-Rank, Bedrock, Response.]
Reactive: Farmer asks → API Gateway → Lambda orchestrates → OpenSearch retrieves → Geo re-ranks → Bedrock generates → Response
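The reactive flow above can be sketched as a single orchestration function. This is a minimal illustration, not the deployed Lambda code: `Candidate`, `answer_query`, and the three stand-in callables are hypothetical names, and the stubs replace OpenSearch, the geo re-ranker, and Bedrock with in-memory fakes so the control flow can be run locally.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    doc_id: str
    semantic_score: float  # retriever similarity, assumed to lie in [0, 1]
    distance: float        # normalized distance to the user's region, assumed in [0, 1]


def answer_query(
    query: str,
    retrieve: Callable[[str], List[Candidate]],
    rerank: Callable[[List[Candidate]], List[Candidate]],
    generate: Callable[[str, List[Candidate]], str],
    top_k: int = 2,
) -> str:
    """Run one reactive turn: retrieve, geo re-rank, then generate."""
    candidates = retrieve(query)
    context = rerank(candidates)[:top_k]
    return generate(query, context)


# Hypothetical stand-ins for the retrieval, re-ranking, and generation services.
def fake_retrieve(query: str) -> List[Candidate]:
    return [
        Candidate("far-but-similar", 0.9, 1.0),
        Candidate("local-guide", 0.8, 0.0),
        Candidate("mid-range", 0.5, 0.5),
    ]


def geo_rerank(docs: List[Candidate]) -> List[Candidate]:
    # Equal-weight blend of semantic similarity and inverse-distance proximity.
    def score(d: Candidate) -> float:
        return 0.5 * d.semantic_score + 0.5 / (1.0 + d.distance)
    return sorted(docs, key=score, reverse=True)


def fake_generate(query: str, context: List[Candidate]) -> str:
    sources = ", ".join(d.doc_id for d in context)
    return f"[answer to {query!r} grounded in: {sources}]"


reply = answer_query(
    "When should I plant sweet potatoes?", fake_retrieve, geo_rerank, fake_generate
)
print(reply)
```

Passing the three stages in as callables keeps the orchestration testable without cloud credentials; in this toy run the local guide outranks the more similar but distant document.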

Research Extensions — From AgriRegion to Multiple Papers

AgriRegion is the foundation. The system we built and deployed opens up several distinct follow-up research directions, each publishable independently:

📄 Deep Dives — 3 Papers Closest to Submission:

1. AgriRegion Revised — Add α sensitivity + baselines, resubmit (2-3 weeks)

2. Mobile & Web Deployment — System is live, needs writeup + pilot metrics

3. Proactive vs. Reactive — Both modes running, needs formal comparison

Extension A — Retrieval Optimization

Cross-Encoder Re-Ranking and Hybrid Dense-Sparse Retrieval for Agricultural Knowledge Systems

Information Retrieval Journal • SIGIR • ECIR

Replace the current cosine + distance decay with learned cross-encoder re-ranking. Compare hybrid BM25 + dense retrieval vs. pure dense. Benchmark on AgriRegion-Eval with expanded question set.
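One way the hybrid BM25 + dense comparison could be set up is with rank-level fusion. The sketch below uses reciprocal rank fusion (RRF), which is an assumption on our part: the text does not say which fusion method is planned. The function name, the constant `k = 60`, and the document IDs are illustrative.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of doc IDs; each list contributes 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["a", "b", "c"]    # hypothetical sparse (BM25) result order
dense_ranking = ["b", "c", "d"]   # hypothetical dense-embedding result order
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

RRF needs only ranks, so it avoids calibrating BM25 scores against cosine similarities; a learned cross-encoder could then re-rank the fused top results.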

Ready to implement — infrastructure exists
Extension B — Multimodal AgriRegion

Multimodal Region-Aware RAG: Integrating Crop Disease Images and Geospatial Features for Agricultural Decision Support

Computers & Electronics in Agriculture • AI in Agriculture

Extend the RAG pipeline to accept images (crop disease photos, NDVI maps) alongside text queries. Use vision-language models for multimodal retrieval. NC crop disease image dataset as evaluation.

Builds on AgriGPT-VL direction mentioned in paper
Extension C — Proactive AgriRegion

From Reactive Queries to Proactive Alerts: Anticipatory Agricultural AI with Sensor-Triggered RAG

Frontiers in Plant Science • Smart Agricultural Technology

Instead of waiting for farmer queries, trigger RAG retrieval automatically when IoT sensor data crosses thresholds. The Farmerly AI platform already implements this — needs formal evaluation comparing proactive vs. reactive modes.
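The threshold-triggered mode can be illustrated with a small rule check that turns an out-of-range reading into a retrieval query. This is a sketch under stated assumptions, not the Farmerly AI implementation: the `Threshold` class, the 20-60% soil moisture band, and the query templates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    metric: str
    low: float
    high: float
    query_low: str    # query issued when the reading drops below `low`
    query_high: str   # query issued when the reading rises above `high`


def triggered_query(threshold: Threshold, reading: float) -> Optional[str]:
    """Return a RAG query if the reading is out of range, else None."""
    if reading < threshold.low:
        return threshold.query_low
    if reading > threshold.high:
        return threshold.query_high
    return None


# Hypothetical band; real thresholds would depend on crop and soil.
soil_moisture = Threshold(
    metric="soil_moisture_pct",
    low=20.0,
    high=60.0,
    query_low="How should I irrigate when soil moisture is low?",
    query_high="How do I manage waterlogged soil?",
)

for value in (15.0, 40.0, 70.0):
    print(value, "->", triggered_query(soil_moisture, value))
```

The returned query would feed the same retrieval and generation path as a farmer-typed question, which is what makes a controlled proactive vs. reactive comparison possible.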

Implemented in Farmerly AI — needs evaluation paper
Extension D — Mobile Field Deployment

On-Device Agricultural RAG: Deploying Region-Aware Retrieval for Low-Connectivity Field Use

IEEE Pervasive Computing • MobiSys • ACM COMPASS

The mobile app is deployed. Document the engineering of on-device retrieval, offline-first architecture, bandwidth optimization, and field usability with NC farmers.

Mobile app deployed — collecting field data
Extension E — Benchmark Expansion

AgriRegion-Eval v2: A Multi-State, Multi-Crop Benchmark for Region-Aware Agricultural QA

NeurIPS Datasets & Benchmarks • ACL

Expand from 160 → 500+ questions. Add multi-state coverage beyond NC. Include temporal questions (seasonal advice). Release as open benchmark for the community.

Current 160-question dataset exists — expansion straightforward
Extension F — Farmer Trust & Adoption

Do Farmers Trust AI Agronomists? A Mixed-Methods Study of Region-Aware RAG Adoption in North Carolina

Agriculture and Human Values • Computers in Human Behavior

Qualitative + quantitative study with pilot farmers. Measure trust, comprehension, adoption barriers, and comparison with human extension agents. IRB-approved farmer interviews.

Pilot farmers actively using system — data collection possible
Extension G — IoT + RAG Integration

Sensor-Grounded RAG: Real-Time IoT Data as Context for Agricultural Knowledge Retrieval

IEEE IoT Journal • Precision Agriculture

Inject live sensor readings (soil moisture, temperature, pH) into the RAG context window alongside retrieved documents. The system already does this — needs formal ablation showing sensor context improves answer quality.
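A minimal sketch of what sensor injection might look like at the prompt level follows. The prompt layout, the `build_prompt` name, and the `include_sensors` switch are assumptions for illustration; the switch is there so the same function can produce both arms of the ablation (with and without sensor context).

```python
from typing import Dict, List, Tuple


def build_prompt(
    question: str,
    readings: Dict[str, Tuple[float, str]],
    documents: List[str],
    include_sensors: bool = True,
) -> str:
    """Assemble the generation prompt; set include_sensors=False for the ablation arm."""
    parts: List[str] = []
    if include_sensors:
        parts.append("Sensor readings:")
        for name, (value, unit) in readings.items():
            parts.append(f"- {name}: {value} {unit}".rstrip())
        parts.append("")
    parts.append("Retrieved documents:")
    for i, text in enumerate(documents, start=1):
        parts.append(f"[{i}] {text}")
    parts.append("")
    parts.append(f"Question: {question}")
    return "\n".join(parts)


# Hypothetical readings and document snippet, for illustration only.
readings = {"soil_moisture": (18.5, "%"), "temperature": (27.0, "C"), "pH": (6.2, "")}
docs = ["Irrigation guidance excerpt (placeholder text)."]
with_sensors = build_prompt("Should I irrigate today?", readings, docs)
without_sensors = build_prompt("Should I irrigate today?", readings, docs, include_sensors=False)
print(with_sensors)
```

Holding the retrieved documents fixed and toggling only the sensor block isolates the effect of sensor context on answer quality.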

Implemented in Farmerly AI platform

The Spatial-Semantic Score — Deep Research Thread

The core novelty of AgriRegion is the spatial-semantic scoring function. The current paper presents it in half a page, but this single equation opens an entire research direction with 5 additional papers:

CURRENT FORMULATION:

S_final = (1 − α) · S_semantic + α · S_distance
S_distance = 1 / (1 + d(g_user, g_doc))

WHERE:
• S_semantic = cosine similarity (query embedding, doc embedding)
• d(g_user, g_doc) = normalized geodesic distance (user → document region)
• α = 0.5 (fixed hyperparameter; but should it be?)

OPEN RESEARCH QUESTIONS:
1. What is the optimal α? Is it query-dependent?
2. Is inverse-distance the best decay function?
3. Is geodesic distance the right distance metric for agriculture?
4. Should scoring be multi-scale (county vs. state vs. region)?
5. Should time be a third scoring dimension?
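The scoring function can be written out directly. In the sketch below, the haversine formula stands in for the geodesic distance and `max_km = 800` is a hypothetical normalization constant, since the text does not say how d is normalized; the coordinates in the example are approximate and illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def normalized_distance(km: float, max_km: float = 800.0) -> float:
    """Clip distance to [0, 1]; max_km is an assumed normalization constant."""
    return min(km / max_km, 1.0)


def distance_score(d_norm: float) -> float:
    """S_distance = 1 / (1 + d)."""
    return 1.0 / (1.0 + d_norm)


def final_score(s_semantic: float, d_norm: float, alpha: float = 0.5) -> float:
    """S_final = (1 - alpha) * S_semantic + alpha * S_distance."""
    return (1.0 - alpha) * s_semantic + alpha * distance_score(d_norm)


# Approximate coordinates for a user and a document region (illustrative).
km = haversine_km(35.0, -78.3, 35.6, -82.5)
print(round(final_score(0.8, normalized_distance(km)), 3))
```

One property worth noting for the sensitivity analysis: with d normalized to [0, 1], S_distance only ranges from 0.5 to 1, so at α = 0.5 the spatial term can shift S_final by at most 0.25.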
Sensitivity Analyses for Revised Paper

These analyses strengthen the AgriRegion resubmission and each can expand into its own paper:

Papers from the Spatial-Semantic Thread

Spatial Paper A — Core Method

Spatial-Semantic Retrieval: Distance Decay Functions and Adaptive Weighting for Geographically-Grounded Knowledge Systems

Information Retrieval Journal • SIGIR • ECIR

Formal framework for spatial-semantic scoring. 5+ distance decay functions compared. Adaptive α (learned per-query vs. fixed). Evaluation on AgriRegion-Eval + a second non-agriculture geo-QA dataset to show generality. This is a pure IR paper — not agriculture-specific, broader audience.
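A decay-function comparison could start from a small registry like the one below. Only the inverse form comes from the current formulation; the exponential, Gaussian, linear, and step variants are candidate alternatives we are assuming for illustration, each taking a normalized distance d ≥ 0 and returning a score in [0, 1].

```python
import math
from typing import Callable, Dict

DECAYS: Dict[str, Callable[[float], float]] = {
    "inverse": lambda d: 1.0 / (1.0 + d),         # current formulation
    "exponential": lambda d: math.exp(-d),        # assumed candidate
    "gaussian": lambda d: math.exp(-d * d),       # assumed candidate
    "linear": lambda d: max(0.0, 1.0 - d),        # assumed candidate
    "step": lambda d: 1.0 if d <= 0.5 else 0.0,   # assumed candidate (hard radius)
}


def final_score(s_semantic: float, d_norm: float, decay: str, alpha: float = 0.5) -> float:
    """Blend semantic similarity with the chosen distance decay."""
    return (1.0 - alpha) * s_semantic + alpha * DECAYS[decay](d_norm)


# Tabulate each decay at a few normalized distances.
for name, fn in DECAYS.items():
    print(name, [round(fn(d), 3) for d in (0.0, 0.25, 0.5, 1.0)])
```

Keeping every decay behind the same signature means an α sweep and a decay sweep can share one evaluation loop.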

Requires: α sweep experiments + decay function comparison
Spatial Paper B — Agroecological Distance

Beyond Geodesic Distance: Agroecological Zone Similarity for Region-Aware Agricultural Retrieval

Precision Agriculture • Computers & Electronics in Agriculture

Replace raw km distance with agroecological similarity (soil type, climate zone, growing season overlap). A farm in eastern NC (sandy loam, humid subtropical) should retrieve docs from coastal SC (similar ecology) over western NC (mountain, clay soil), even in cases where the western NC source is geographically closer. Key insight: geographic distance ≠ agricultural relevance.
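To make the idea concrete, here is a minimal sketch of an agroecological distance built from the three features named above. The equal weighting, the `AgroProfile` class, and the growing-season day ranges are all assumptions for illustration; real profiles would come from USDA soil and climate data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AgroProfile:
    soil: str
    climate: str
    season: Tuple[int, int]  # (start, end) of growing season as day-of-year


def season_overlap(a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Overlap of two growing seasons as a fraction of their combined span."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    span = max(a[1], b[1]) - min(a[0], b[0])
    return overlap / span if span > 0 else 1.0


def agro_distance(a: AgroProfile, b: AgroProfile) -> float:
    """Mean of three mismatch terms in [0, 1]; 0 means ecologically identical."""
    soil_mismatch = 0.0 if a.soil == b.soil else 1.0
    climate_mismatch = 0.0 if a.climate == b.climate else 1.0
    season_mismatch = 1.0 - season_overlap(a.season, b.season)
    return (soil_mismatch + climate_mismatch + season_mismatch) / 3.0


# Placeholder profiles; the season day ranges are made up for the example.
eastern_nc = AgroProfile("sandy loam", "humid subtropical", (90, 310))
coastal_sc = AgroProfile("sandy loam", "humid subtropical", (75, 325))
western_nc = AgroProfile("clay", "mountain", (120, 280))

print(round(agro_distance(eastern_nc, coastal_sc), 3))
print(round(agro_distance(eastern_nc, western_nc), 3))
```

Because the result is already in [0, 1], it can be dropped into the existing S_distance term in place of the normalized geodesic distance.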

Requires: USDA soil/climate zone data integration + new distance metric
Spatial Paper C — Adaptive Query Classification

When Does Location Matter? Adaptive Geo-Sensitivity Classification for Agricultural Question Answering

ACL Workshop • EMNLP Findings • Expert Systems with Applications

Train a classifier that predicts whether a query is geo-sensitive or geo-agnostic. Dynamically set α per query (α→0 for "what is photosynthesis", α→0.7 for "when to plant sweet potatoes"). A small model (e.g., BERT-tiny) is expected to run in <10 ms, adding negligible latency. Target result: adaptive α beats fixed α by 5-8% on mixed-domain questions.
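The mapping from a geo-sensitivity prediction to α can be sketched as follows. The keyword heuristic is only a stand-in for the trained classifier, and the cue list and the linear mapping α = α_max · p are assumptions; α_max = 0.7 follows the example value in the text.

```python
from typing import Sequence

# Hypothetical cue phrases; a trained classifier would replace this list.
GEO_CUES: Sequence[str] = ("when to plant", "planting date", "frost", "in my county", "near me")


def geo_sensitivity(query: str) -> float:
    """Stand-in for the classifier: probability that the query is geo-sensitive."""
    text = query.lower()
    return 1.0 if any(cue in text for cue in GEO_CUES) else 0.0


def adaptive_alpha(query: str, alpha_max: float = 0.7) -> float:
    """Scale the spatial weight by predicted geo-sensitivity."""
    return alpha_max * geo_sensitivity(query)


def final_score(s_semantic: float, s_distance: float, alpha: float) -> float:
    return (1.0 - alpha) * s_semantic + alpha * s_distance


for q in ("What is photosynthesis?", "When to plant sweet potatoes?"):
    print(q, "->", adaptive_alpha(q))
```

With a probabilistic classifier, `geo_sensitivity` would return a value between 0 and 1, giving intermediate α for queries that are only partly location-dependent.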

Requires: query annotation (geo-sensitive vs. agnostic) + classifier training
Spatial Paper D — Multi-Scale Retrieval

Multi-Scale Spatial-Semantic Retrieval: County, State, and Regional Knowledge Fusion for Agricultural AI

IEEE IoT Journal • GeoInformatica • JASIST

Documents have different spatial granularity: county-level ("Sampson County pest advisory"), state-level ("NC Extension irrigation guide"), regional ("Southeast US soil management"), national ("USDA organic certification"). Propose hierarchical retrieval that scores documents at multiple scales and fuses the results. Show multi-scale outperforms single-scale.
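One simple form the hierarchical scoring could take is shown below: a document earns a spatial bonus only if its coverage area contains the user, and finer scales earn a larger bonus. The scale weights, the `ScopedDoc` class, and the equal semantic scores are hypothetical; the document titles reuse the examples above.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed specificity weights: finer granularity earns a larger spatial bonus.
SCALE_WEIGHT: Dict[str, float] = {"county": 1.0, "state": 0.8, "regional": 0.6, "national": 0.4}


@dataclass(frozen=True)
class ScopedDoc:
    title: str
    scale: str   # one of SCALE_WEIGHT's keys
    area: str    # name of the covered area at that scale
    semantic_score: float


def scale_score(doc: ScopedDoc, user_location: Dict[str, str]) -> float:
    """Spatial score: the scale weight if the doc's area covers the user, else 0."""
    if doc.scale == "national":
        return SCALE_WEIGHT["national"]
    if user_location.get(doc.scale) == doc.area:
        return SCALE_WEIGHT[doc.scale]
    return 0.0


def fused_score(doc: ScopedDoc, user_location: Dict[str, str], alpha: float = 0.5) -> float:
    return (1.0 - alpha) * doc.semantic_score + alpha * scale_score(doc, user_location)


def rank(docs: List[ScopedDoc], user_location: Dict[str, str]) -> List[str]:
    ordered = sorted(docs, key=lambda d: fused_score(d, user_location), reverse=True)
    return [d.title for d in ordered]


docs = [
    ScopedDoc("USDA organic certification", "national", "US", 0.8),
    ScopedDoc("Southeast US soil management", "regional", "Southeast US", 0.8),
    ScopedDoc("NC Extension irrigation guide", "state", "NC", 0.8),
    ScopedDoc("Sampson County pest advisory", "county", "Sampson", 0.8),
]
sampson_user = {"county": "Sampson", "state": "NC", "regional": "Southeast US"}
wake_user = {"county": "Wake", "state": "NC", "regional": "Southeast US"}
print(rank(docs, sampson_user))
print(rank(docs, wake_user))
```

In this toy setup the Sampson County advisory ranks first for a Sampson user but last for a Wake County user, which is the behavior a single-scale distance score cannot express.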

Requires: document granularity annotation + hierarchical scoring implementation
Spatial Paper E — Temporal-Spatial-Semantic (3D Scoring)

Temporal-Spatial-Semantic Retrieval: When, Where, and What for Agricultural Knowledge Systems

Knowledge-Based Systems • Information Fusion (IF: 14.7)

Add a temporal dimension: S_final = (1−α−β)·S_semantic + α·S_distance + β·S_temporal, with α, β ≥ 0 and α + β ≤ 1 so the weights remain a convex combination. S_temporal decays for older documents (a 2024 pest advisory should outrank a 2015 one). But some knowledge is timeless (a soil chemistry textbook), so the system should learn which documents are "temporally sensitive" vs. "evergreen." The result is three-way scoring over semantic, spatial, and temporal relevance.
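The three-term score can be sketched with an exponential half-life for the temporal component. The half-life form, the 5-year default, and the boolean `evergreen` flag are assumptions for illustration; the text leaves the temporal decay function and the evergreen classifier open.

```python
import math


def temporal_score(age_years: float, evergreen: bool = False, half_life: float = 5.0) -> float:
    """Evergreen documents never decay; others halve in score every `half_life` years."""
    if evergreen:
        return 1.0
    return math.exp(-math.log(2.0) * max(age_years, 0.0) / half_life)


def final_score_3d(
    s_semantic: float,
    s_distance: float,
    s_temporal: float,
    alpha: float = 0.3,
    beta: float = 0.2,
) -> float:
    """S_final = (1 - alpha - beta) * S_semantic + alpha * S_distance + beta * S_temporal."""
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("alpha and beta must be non-negative and sum to at most 1")
    return (1.0 - alpha - beta) * s_semantic + alpha * s_distance + beta * s_temporal


current_year = 2026  # assumed reference year for the example
advisory_2024 = final_score_3d(0.8, 0.9, temporal_score(current_year - 2024))
advisory_2015 = final_score_3d(0.8, 0.9, temporal_score(current_year - 2015))
textbook_2005 = final_score_3d(0.8, 0.9, temporal_score(current_year - 2005, evergreen=True))
print(round(advisory_2024, 3), round(advisory_2015, 3), round(textbook_2005, 3))
```

With identical semantic and spatial scores, the 2024 advisory outranks the 2015 one, while the evergreen textbook is not penalized for its age.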

Requires: temporal metadata extraction + decay function + "evergreen" classifier

Total from spatial-semantic thread alone: 5 papers targeting IR, NLP, Agriculture, IoT, and Knowledge Systems venues. Each builds on the AgriRegion foundation but contributes a distinct methodological advance.

What's Deployed Today

AgriRegion isn't just a paper — it's a running system:

Publication Roadmap

Done: AgriRegion paper written, submitted to IEEE Access
Next: Revise with α sensitivity + "RAG without geo" baseline → resubmit to Computers & Electronics in Agriculture
Q3 2026: Extension C, Proactive AgriRegion (Farmerly AI evaluation)
Q4 2026: Spatial Paper A, Distance decay functions + adaptive α (SIGIR/ECIR)
Q4 2026: Extension D, Mobile field deployment paper
Q1 2027: Spatial Paper C, Adaptive geo-sensitivity classification (ACL/EMNLP)
Q1 2027: Extension G, IoT + RAG integration paper
Q2 2027: Spatial Paper B, Agroecological zone distance (Precision Agriculture)
Q2 2027: Extension E, AgriRegion-Eval v2 benchmark (NeurIPS D&B)
Q3 2027: Spatial Paper D, Multi-scale spatial retrieval (IEEE IoT)
Q3 2027: Extension F, Farmer trust & adoption study
Q4 2027: Spatial Paper E, Temporal-spatial-semantic 3D scoring (Information Fusion)
Q4 2027: Extension B, Multimodal AgriRegion

Summary: AgriRegion is one paper that seeds 12+ follow-up publications — 7 system extensions + 5 from the spatial-semantic scoring thread alone. The system is built, deployed, and being tested with real farmers. Each paper is independently publishable and targets a different venue/community.