Evaluated on AgriRegion-Eval (160 questions across 12 agricultural subfields):
Novel spatial-semantic retrieval that combines cosine similarity with geodesic distance decay. Ensures retrieved context is geographically aligned with the user's environment — not just semantically similar.
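A minimal sketch of how such a score could be computed, assuming the linear blend (1−α)·S_sem + α·S_distance and the inverse decay 1/(1+d) that this document names as the current choice. The haversine formula stands in for a true geodesic distance. The function names, the default `alpha`, and the `scale_km` normalization are illustrative assumptions, not the deployed implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def spatial_semantic_score(query_vec, doc_vec, user_loc, doc_loc,
                           alpha=0.3, scale_km=100.0):
    """Blend semantic similarity with an inverse distance decay.

    alpha=0 is pure semantic, alpha=1 is pure distance.
    alpha and scale_km are hypothetical defaults chosen for illustration.
    """
    s_sem = cosine_similarity(query_vec, doc_vec)
    d = haversine_km(*user_loc, *doc_loc) / scale_km  # distance in units of scale_km
    s_dist = 1.0 / (1.0 + d)                          # inverse decay, 1/(1+d)
    return (1 - alpha) * s_sem + alpha * s_dist
```

Re-ranking then amounts to sorting candidate documents by this score for the user's location. Dividing by `scale_km` is one way to keep the decay from collapsing to near zero at distances of a few kilometres.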
Dynamic index of verified agricultural extension documents enriched with geospatial metadata. Filters and re-ranks evidence based on user geolocation.
160 domain-specific questions across 12 agricultural subfields (Agronomy, Soil, Pathology, Weeds, Irrigation, Horticulture, Postharvest, Animal, Aquaculture, Food Safety, Economics, Extension).
Web application + mobile application deployed for pilot testing with NC farmers. Not just a research prototype — a working system in active use.
AgriRegion is a reactive system — the farmer asks a question or chats, and the AI retrieves region-aware context to generate a grounded response.
AgriRegion is the foundation. The system we built and deployed opens up several distinct follow-up research directions, each publishable independently:
📄 Deep Dives — 3 Papers Closest to Submission:
1. AgriRegion Revised — Add α sensitivity + baselines, resubmit (2-3 weeks)
2. Mobile & Web Deployment — System is live, needs writeup + pilot metrics
3. Proactive vs. Reactive — Both modes running, needs formal comparison
Replace the current cosine + distance decay with learned cross-encoder re-ranking. Compare hybrid BM25 + dense retrieval vs. pure dense. Benchmark on AgriRegion-Eval with expanded question set.
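The text does not say how BM25 and dense results would be combined. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the planned method. The function name and the constant `k=60` (a conventional RRF default) are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); documents are returned by descending fused score,
    with ties broken by document id for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A pure-dense baseline corresponds to passing only the dense ranking, which makes the hybrid vs. dense comparison a one-argument change.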
Extend the RAG pipeline to accept images (crop disease photos, NDVI maps) alongside text queries. Use vision-language models for multimodal retrieval. Evaluate on an NC crop disease image dataset.
Instead of waiting for farmer queries, trigger RAG retrieval automatically when IoT sensor data crosses thresholds. The Farmerly AI platform already implements this — needs formal evaluation comparing proactive vs. reactive modes.
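A rough sketch of the threshold-triggered mode, assuming each sensor has an acceptable range and that a breach is turned into a retrieval query. The sensor names, ranges, and query template are hypothetical placeholders and do not describe the Farmerly AI platform's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    low: float
    high: float


# Hypothetical ranges for illustration only; real values would come from agronomists.
DEFAULT_THRESHOLDS = {
    "soil_moisture_pct": Threshold(low=20.0, high=60.0),
    "soil_ph": Threshold(low=5.5, high=7.5),
}


def breached(readings, thresholds=DEFAULT_THRESHOLDS):
    """Return (sensor, value, 'low' | 'high') for every out-of-range reading."""
    events = []
    for name, value in readings.items():
        limit = thresholds.get(name)
        if limit is None:
            continue  # no threshold configured for this sensor
        if value < limit.low:
            events.append((name, value, "low"))
        elif value > limit.high:
            events.append((name, value, "high"))
    return events


def proactive_queries(readings, thresholds=DEFAULT_THRESHOLDS):
    """Turn each breach into a text query that can be handed to the RAG retriever."""
    return [
        f"{name} reading of {value} is {direction}; what action is recommended?"
        for name, value, direction in breached(readings, thresholds)
    ]
```

In a proactive vs. reactive comparison, the same retriever and generator would serve both arms. Only the source of the query changes: a threshold breach in one arm, the farmer's own question in the other.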
The mobile app is deployed. Document the engineering of on-device retrieval, offline-first architecture, bandwidth optimization, and field usability with NC farmers.
Expand from 160 → 500+ questions. Add multi-state coverage beyond NC. Include temporal questions (seasonal advice). Release as an open benchmark for the community.
Qualitative + quantitative study with pilot farmers. Measure trust, comprehension, and adoption barriers, and compare against human extension agents. IRB-approved farmer interviews.
Inject live sensor readings (soil moisture, temperature, pH) into the RAG context window alongside retrieved documents. The system already does this — needs formal ablation showing sensor context improves answer quality.
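The ablation could be as simple as building the same prompt with and without the sensor block. The sketch below assumes a plain-text prompt layout; the section labels and the `build_prompt` helper are hypothetical and do not reflect the system's real prompt template.

```python
def build_prompt(question, documents, sensor_readings=None):
    """Assemble a RAG prompt.

    Pass sensor_readings=None (or an empty dict) for the no-sensor ablation arm;
    everything else in the prompt stays identical between the two arms.
    """
    parts = []
    if sensor_readings:
        parts.append("Live sensor readings:")
        parts.extend(
            f"- {name}: {value}" for name, value in sorted(sensor_readings.items())
        )
        parts.append("")
    parts.append("Retrieved documents:")
    parts.extend(f"[{i}] {text}" for i, text in enumerate(documents, start=1))
    parts.append("")
    parts.append(f"Question: {question}")
    return "\n".join(parts)
```

Keeping the retrieved documents and the question fixed across arms isolates the effect of the sensor context on answer quality.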
The core novelty of AgriRegion is the spatial-semantic scoring function. The current paper presents it in half a page, but this single equation opens an entire research direction with 5 additional papers:
These analyses strengthen the AgriRegion resubmission and each can expand into its own paper:
Sweep α from 0.0 to 1.0 in 0.1 increments. Plot F1, EM, and BERTScore as a function of α. Show the "sweet spot" where geo-awareness helps without killing semantic relevance. Show that α=0 (pure semantic) and α=1 (pure distance) both underperform.
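The sweep itself is a small loop. In this sketch, `evaluate` is a hypothetical callable supplied by the caller that runs the benchmark at a given α and returns one metric (for example F1); nothing here computes the actual metrics.

```python
def alpha_grid(step=0.1):
    """Return [0.0, step, ..., 1.0], rounded to avoid floating-point drift."""
    n = round(1 / step)
    return [round(i * step, 10) for i in range(n + 1)]


def sweep_alpha(evaluate, alphas=None):
    """Evaluate every alpha and return (best_alpha, {alpha: score}).

    `evaluate` maps an alpha in [0, 1] to a single score where higher is better.
    """
    alphas = alpha_grid() if alphas is None else alphas
    results = {alpha: evaluate(alpha) for alpha in alphas}
    best = max(results, key=results.get)
    return best, results
```

The returned dictionary can be plotted directly, once per metric, to show how the score changes between the two endpoints α=0 and α=1.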
Inverse: 1/(1+d), the current choice. Gaussian: exp(−d²/(2σ²)), smooth with tunable σ. Step: 1 if d < threshold, else 0, a hard boundary. Power law: 1/(1+d)^β, with tunable sharpness. Learned: a small MLP predicting the weight from (d, query_type, doc_type). Show which works best per agricultural domain.
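The four closed-form decays can be written side by side for comparison. This is a sketch with hypothetical default parameters (`sigma`, `threshold`, `beta`); the learned MLP variant is omitted because its architecture is not specified here.

```python
import math


def inverse_decay(d):
    """1 / (1 + d): the decay named as the current choice."""
    return 1.0 / (1.0 + d)


def gaussian_decay(d, sigma=1.0):
    """exp(-d^2 / (2 sigma^2)): smooth, width controlled by sigma."""
    return math.exp(-(d ** 2) / (2 * sigma ** 2))


def step_decay(d, threshold=1.0):
    """1 inside the threshold, 0 outside: a hard boundary."""
    return 1.0 if d < threshold else 0.0


def power_law_decay(d, beta=2.0):
    """1 / (1 + d)^beta: beta controls sharpness; beta=1 recovers the inverse decay."""
    return 1.0 / (1.0 + d) ** beta


# Registry so an experiment can iterate over the candidates by name.
DECAYS = {
    "inverse": inverse_decay,
    "gaussian": gaussian_decay,
    "step": step_decay,
    "power_law": power_law_decay,
}
```

All four return 1 at d = 0, so they are directly interchangeable as the distance term of the scoring function. Here d is assumed to be a non-negative distance in whatever normalized unit the scorer uses.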
When does geo-reranking hurt? Federal regulations (organic cert), basic chemistry, universal biology — these are geo-agnostic. Categorize questions into "geo-sensitive" (planting dates, pest timing) vs. "geo-agnostic" (food safety, basic science). Show adaptive α outperforms fixed.
Formal framework for spatial-semantic scoring. 5+ distance decay functions compared. Adaptive α (learned per-query vs. fixed). Evaluation on AgriRegion-Eval + a second non-agriculture geo-QA dataset to show generality. This is a pure IR paper — not agriculture-specific, broader audience.
Replace raw km distance with agroecological similarity (soil type, climate zone, growing season overlap). A farm in eastern NC (sandy loam, humid subtropical) should retrieve docs from coastal SC (similar ecology) over western NC (mountain, clay soil) — even though western NC is geographically closer. Key insight: geographic distance ≠ agricultural relevance.
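One way to make this concrete is a weighted similarity over the three attributes named above. The sketch assumes exact-match scoring for soil type and climate zone and Jaccard overlap for growing-season months. The equal weights and the example profiles (in particular the month ranges) are illustrative placeholders, not measured data.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def agro_similarity(p, q, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Similarity in [0, 1] between two agroecological profiles.

    Each profile is a dict with keys 'soil', 'climate', and 'season_months'.
    The weights (soil, climate, season) are hypothetical and could be tuned or learned.
    """
    w_soil, w_climate, w_season = weights
    soil = 1.0 if p["soil"] == q["soil"] else 0.0
    climate = 1.0 if p["climate"] == q["climate"] else 0.0
    season = jaccard(p["season_months"], q["season_months"])
    return w_soil * soil + w_climate * climate + w_season * season


# Placeholder profiles: soil and climate labels follow the example in the text,
# the growing-season months are invented purely for illustration.
EAST_NC = {"soil": "sandy loam", "climate": "humid subtropical", "season_months": range(3, 11)}
COASTAL_SC = {"soil": "sandy loam", "climate": "humid subtropical", "season_months": range(3, 12)}
WEST_NC = {"soil": "clay", "climate": "mountain", "season_months": range(5, 10)}
```

With these placeholder profiles, `agro_similarity(EAST_NC, COASTAL_SC)` exceeds `agro_similarity(EAST_NC, WEST_NC)`, which is the ordering the example calls for. The similarity can replace the distance term of the scoring function directly, since both lie in [0, 1].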
Train a classifier that predicts whether a query is geo-sensitive or geo-agnostic. Dynamically set α per query (α→0 for "what is photosynthesis", α→0.7 for "when to plant sweet potatoes"). A small model (BERT-tiny) runs in <10ms, so the added latency is negligible. Hypothesis to test: adaptive α beats fixed α by 5-8% on mixed-domain questions.
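The per-query mapping from classifier output to α can be sketched independently of the classifier. Below, `geo_sensitivity` is a keyword stub standing in for the trained model (an assumption made purely so the example runs), and `alpha_max=0.7` mirrors the upper value quoted above.

```python
# Hypothetical cue phrases; a trained classifier would replace this list entirely.
GEO_CUES = ("when to plant", "planting date", "pest", "frost", "in my area")


def geo_sensitivity(query):
    """Stub classifier: returns a pseudo-probability that the query is geo-sensitive."""
    q = query.lower()
    return 1.0 if any(cue in q for cue in GEO_CUES) else 0.0


def adaptive_alpha(p_geo, alpha_max=0.7):
    """Scale alpha linearly with the geo-sensitivity probability, clamped to [0, 1]."""
    p = min(1.0, max(0.0, p_geo))
    return alpha_max * p
```

For example, `adaptive_alpha(geo_sensitivity("When to plant sweet potatoes?"))` yields 0.7, while a query with no geographic cue yields 0.0 and falls back to pure semantic ranking. The linear mapping is one simple choice; a threshold or a learned mapping would slot in the same way.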
Documents have different spatial granularity: county-level ("Sampson County pest advisory"), state-level ("NC Extension irrigation guide"), regional ("Southeast US soil management"), national ("USDA organic certification"). Propose hierarchical retrieval that scores documents at multiple scales and fuses the results. Show multi-scale outperforms single-scale.
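A possible starting point for multi-scale scoring is to replace point-to-point distance with a containment check at the document's own scale. In this sketch a document earns a spatial score only if its coverage area contains the user, and finer scales earn more. The scale weights, the dictionary layout, and the default `alpha` are assumptions for illustration.

```python
# Hypothetical weights: a matching county-level document counts most, national least.
SCALE_WEIGHTS = {"county": 1.0, "state": 0.75, "regional": 0.5, "national": 0.25}


def scale_match(user_region, doc):
    """1.0 if the document's coverage area contains the user at the document's scale."""
    return 1.0 if user_region.get(doc["scale"]) == doc["area"] else 0.0


def multiscale_score(s_sem, user_region, doc, alpha=0.3, scale_weights=SCALE_WEIGHTS):
    """Blend semantic similarity with a scale-aware spatial score."""
    s_spatial = scale_weights[doc["scale"]] * scale_match(user_region, doc)
    return (1 - alpha) * s_sem + alpha * s_spatial


def rank(candidates, user_region, alpha=0.3):
    """Fuse candidates from all scales into one list, best first.

    `candidates` is a list of (s_sem, doc) pairs.
    """
    scored = [(multiscale_score(s, user_region, d, alpha), d) for s, d in candidates]
    return [d for _, d in sorted(scored, key=lambda pair: -pair[0])]
```

A user in Sampson County would be described as `{"county": "Sampson", "state": "NC", "regional": "Southeast US", "national": "US"}`, so a matching county advisory outranks an equally relevant national document, while an advisory for a different county receives no spatial credit.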
Add a temporal dimension: S_final = (1−α−β)·S_sem + α·S_distance + β·S_temporal, with α, β ≥ 0 and α + β ≤ 1 so the weights remain a convex combination. S_temporal decays for older documents (a 2024 pest advisory should outrank a 2015 one), but some knowledge is timeless (a soil chemistry textbook). Learn which documents are "temporally sensitive" vs. "evergreen." Three-way scoring: semantic + spatial + temporal.
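The three-way formula translates directly into code. The text does not fix the shape of the temporal decay, so the exponential half-life used below is an assumption, as are the default `alpha`, `beta`, and `half_life_years`. The `evergreen` flag models documents whose content does not age.

```python
def temporal_score(age_years, half_life_years=5.0, evergreen=False):
    """Score in (0, 1] that halves every `half_life_years`; evergreen documents stay at 1."""
    if evergreen:
        return 1.0
    return 0.5 ** (age_years / half_life_years)


def final_score(s_sem, s_dist, s_temp, alpha=0.3, beta=0.2):
    """S_final = (1 - alpha - beta) * S_sem + alpha * S_distance + beta * S_temporal."""
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("alpha and beta must be non-negative and sum to at most 1")
    return (1 - alpha - beta) * s_sem + alpha * s_dist + beta * s_temp
```

Setting `beta=0` recovers the two-term spatial-semantic score, so the temporal term can be ablated without touching the rest of the pipeline.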
Total from spatial-semantic thread alone: 5 papers targeting IR, NLP, Agriculture, IoT, and Knowledge Systems venues. Each builds on the AgriRegion foundation but contributes a distinct methodological advance.
AgriRegion isn't just a paper — it's a running system:
Full dashboard with real-time sensor visualization, proactive RAG-powered insights as floating tooltips, interactive maps, and conversational AI chat grounded in NC extension data.
Field-ready mobile version deployed for in-field use by NC pilot farmers. Optimized for low-connectivity rural areas.
15 sensor types (moisture, temp, pH, EC, NPK, NDVI, weather, etc.) designed and simulated. Hardware sensors ordered — pending arrival for physical deployment.
Platform ready for pilot deployment with NC farmers. Farmer outreach in progress — pilot planned for upcoming growing season.
Summary: AgriRegion is one paper that seeds 12+ follow-up publications — 7 system extensions + 5 from the spatial-semantic scoring thread alone. The system is built, deployed, and being tested with real farmers. Each paper is independently publishable and targets a different venue/community.