Public-Data RAG Reframing Todo

This checklist tracks what still needs to change before the kickoff article and linked explorer fully reflect the new framing:

  • legal is an example domain, not the core point of the article
  • the use case is shifting away from a tenant-law-style example
  • the target workflow is legal professionals doing research
  • the first concrete public benchmark areas are Werkvertrag, Schuldrecht B2B, and GBR

Kickoff Article

File: public-data-rag-experiment-kickoff.md

  • Update the demo CTA title so it no longer promises a generic “Legal RAG chat explorer” and instead signals a professional legal research replay or retrieval explorer.
  • Update the CTA copy once the explorer question changes, so it matches the new professional-research example instead of the current placeholder.
  • Rework the evaluation section so it relies less on “legal Q&A” wording and instead describes research tasks, research questions, and source-grounded professional workflows.
  • Done: no disclaimer is needed near the results section because the benchmark, experiment grid, and explorer are now updated.
  • Decision made: keep GerLayQA as the public benchmark name.

Linked Explorer

File: _drafts/legal-rag-chat-explorer/index.html

  • The underlying data is already updated; only wording still needs retouching.
  • Update the page title, meta description, hero copy, and conversation intro so they describe a saved legal research workflow instead of one saved legal question.
  • Rewrite the answer texts in all four modes so they sound like research assistance for a legal professional, not end-user guidance.
  • Update the diagnostic copy in the “What Changed In This Mode” logic so it talks about research quality, authority, source reliability, and reranking opportunities rather than this single tenancy example.
  • Revisit the noisy-source heuristic after the wording pass, because the current explanations may still reflect the earlier framing more than the updated professional-research framing.
  • Rename session_id and any visible labels that still encode the old example.

Cross-Article Follow-Up

File: adding-german-court-judgments-to-my-public-data-rag-system.md

  • Update references to “harder legal questions” and “better legal QA system” so they align with the new professional legal research framing.
  • Make sure the follow-up article does not assume the old example domain once the kickoff article has been generalized.

Decisions Made

  • Keep the explorer visible. The data is updated already; the remaining work is wording polish.
  • Use the same three concrete legal areas in the article and explorer: Werkvertrag, Schuldrecht B2B, and GBR.
  • Keep GerLayQA as the benchmark name in public copy.

Public References To Research And Potentially Add To The Article

The goal of this section is to make it obvious that the techniques used in the project are not proprietary tricks. They are public, documented, and already used across research and production RAG systems.

Core RAG Framing

Hybrid Retrieval, BM25, And Fusion

  • Hybrid search | Elastic Docs
    • Good reference for the idea that lexical and semantic retrieval are commonly combined in production systems.
    • Useful when explaining why BM25 + vector is a practical compromise.
  • Reciprocal rank fusion | Elastic Docs
    • Good citation if you want to explain one standard public technique for combining result lists from different retrievers.
    • Even if your implementation is not exactly RRF, this is a strong public reference for fusion-style retrieval pipelines.
  • A Comprehensive Hybrid Search Guide | Elastic
    • More article-like and easier to cite in a portfolio piece than raw API docs.
    • Useful for explaining the market-standard intuition behind lexical search, vector search, and fusion.
  • Suggested extra source to research: the original Okapi BM25 / Robertson-Spärck Jones literature.
    • Add this if you want a more academic citation for BM25 itself rather than relying only on vendor docs.
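
A minimal sketch of reciprocal rank fusion, in case the article wants to show how two result lists are combined. The function name, the document IDs, and the two input lists are made up for illustration; k=60 is the commonly cited default constant, not a value taken from this project, and the project's own fusion step may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document gets 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a lexical and a vector retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists (here doc_a and doc_c) rise above documents that appear in only one, which is the intuition worth stating in the retrieval section.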

Reranking

  • Retrieve & Re-Rank | Sentence Transformers
    • Strong public reference for two-stage retrieval: retrieve a broad candidate set first, then rerank with a stronger model.
    • Very relevant to your todo item about exploring reranking.
  • SentenceTransformers Documentation
    • Good umbrella source for embedding models, sparse encoders, and rerankers all in one public ecosystem.
  • Semantic ranking in Azure AI Search
    • Good production reference for second-stage (L2) reranking on top of BM25 or hybrid candidate sets.
    • Useful if you want to show that reranking is a standard market technique in commercial retrieval stacks.
  • An Overview of Cohere’s Rerank Model
    • Useful as another production-facing example that reranking is a public, productized capability.
  • Suggested extra source to research: cross-encoder reranker benchmarks or MS MARCO reranking references.
    • Add one academic citation here if you want a stronger research anchor for reranking quality improvements.
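
The two-stage pattern described in these references can be sketched as follows. This is only an illustration of the control flow: both scoring functions are toy stand-ins invented here, and in a real pipeline the second stage would typically be a cross-encoder or a hosted rerank model rather than token overlap. The corpus strings are also made up.

```python
def tokens(text):
    """Lowercased whitespace tokens as a set (toy tokenizer)."""
    return set(text.lower().split())


def cheap_score(query, doc):
    """Toy first-stage score: number of shared tokens."""
    return len(tokens(query) & tokens(doc))


def strong_score(query, doc):
    """Toy stand-in for a stronger reranker: overlap normalized by document length."""
    doc_tokens = tokens(doc)
    return len(tokens(query) & doc_tokens) / len(doc_tokens) if doc_tokens else 0.0


def retrieve_then_rerank(query, corpus, n_candidates=50, top_k=5):
    """Stage 1: take a broad candidate set. Stage 2: reorder only those candidates."""
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:n_candidates]
    return sorted(candidates, key=lambda d: strong_score(query, d), reverse=True)[:top_k]


# Hypothetical mini corpus and query.
corpus = [
    "Werkvertrag acceptance and defects",
    "Defects liability under a Werkvertrag after acceptance",
    "Partnership liability in a GbR",
]
results = retrieve_then_rerank("werkvertrag defects liability", corpus, n_candidates=2, top_k=2)
print(results)
```

The point to carry into the article is the shape, not the scorers: the expensive model only ever sees the small candidate set from the first stage.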

Chunking And Document Structuring

Data Pipeline And Medallion Architecture

Evaluation And Benchmarking

  • LangSmith Evaluation
    • Best general reference for explaining why evaluation is part of the system, not a post-hoc presentation layer.
  • Evaluation concepts | LangSmith
    • Good support for the article’s emphasis on offline evaluation, datasets, evaluators, and iterative improvement loops.
  • Evaluation types | LangSmith
    • Useful if you want to distinguish benchmark-style offline evaluation from production / online monitoring.
  • Suggested extra source to research: public IR metrics references for MRR, NDCG, precision@k, recall@k, or answer-grounding evaluation.
    • Add this if the article becomes more technical on retrieval measurement.
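
If the article does become more technical on retrieval measurement, the metrics named above can be illustrated with a short sketch. It assumes binary relevance (a document is either relevant or not); the function names and the example IDs are hypothetical and are not taken from the project's evaluation code.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Share of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Share of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found.
    Averaging this over all queries gives MRR."""
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """NDCG with binary gains: DCG of the ranking divided by the best possible DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, d in enumerate(ranked[:k], start=1)
        if d in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical ranking for one query, with two known relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(precision_at_k(ranked, relevant, 3))
print(recall_at_k(ranked, relevant, 3))
print(reciprocal_rank(ranked, relevant))
print(ndcg_at_k(ranked, relevant, 3))
```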

Suggested Article Additions

  • Add a short “Techniques Used” or “Why These Techniques Are Not Novel” paragraph that explicitly states the stack is assembled from public, market-standard RAG building blocks.
  • Add 3-6 inline references directly in the relevant sections instead of one long bibliography only at the end.
  • Add one short sentence in the reranking section saying that reranking is already a standard second-stage retrieval technique in production search systems.
  • Add one short sentence in the Bronze / Silver section that the medallion-style layering comes from established data engineering practice.
  • Add one short sentence in the evaluation section that dataset-based offline evaluation and trace inspection are standard development practices for LLM systems.

Optional Extras To Research Later

  • Public references for metadata-aware retrieval and citation extraction.
  • Public references for legal-document structure parsing or judgment segmentation.
  • Public references for query rewriting, decomposition, or multi-stage retrieval.
  • Public references for grounded answer generation and citation display UX.

If the article should stay readable, a short list is probably enough. The idea is to cite a small set of strong public references that together show the stack is built from established techniques rather than hidden or proprietary methods.

1. Core RAG Pattern

2. Production RAG Framing

3. Hybrid Retrieval

  • Hybrid search | Elastic Docs
    • Why include it: strong public evidence that combining lexical and semantic retrieval is a standard production approach.
    • Where to use it: retrieval section when explaining BM25 + vector + hybrid.

4. Reranking

5. Chunking

  • RAG chunking phase | Microsoft Learn
    • Why include it: concise public explanation of why chunking strategy matters and how it affects retrieval quality.
    • Where to use it: Silver section or chunking discussion.
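
To make the chunking point concrete, here is a minimal sketch of fixed-size chunking with overlap. It is one simple strategy among many and is not claimed to be the one used in the project's Silver layer; the function name and the default sizes are assumptions chosen for illustration.

```python
def chunk_words(text, size=200, overlap=50):
    """Split text into word windows of `size` words that share `overlap` words.

    The defaults are illustrative assumptions, not project settings.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


# Tiny example with ten placeholder words.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, size=4, overlap=1)
print(chunks)
```

Changing `size` and `overlap` changes which passages can be retrieved together, which is why the chunking choice deserves its own citation in the article.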

6. Medallion / Layered Data Pipeline

7. Evaluation

  • LangSmith Evaluation
    • Why include it: supports the claim that repeatable evaluation and trace inspection are part of modern LLM development workflows.
    • Where to use it: evaluation section.

8. Optional Extra If You Want One More Production Citation

  • Semantic ranking in Azure AI Search
    • Why include it: strong public support for reranking as a common second-stage retrieval technique in production systems.
    • Where to use it: retrieval or future work section.

Suggested Minimal Citation Pack

If you want the most compact useful version, use these six:

Suggested Placement In The Article

  • Introduction: cite the RAG paper and one production RAG overview.
  • Bronze / Silver: cite medallion architecture and one chunking reference.
  • Retrieval: cite hybrid retrieval and optionally reranking.
  • Evaluation: cite LangSmith evaluation if you explicitly talk about trace-based inspection and repeatable experiments.