RUMAZA Studio
AI for business

RAG: let AI respond with your documents, citing the source

Without RAG, the model invents. With well-implemented RAG, it searches your knowledge, cites the paragraph, and says 'I don't know' when there is no evidence.

The problem

Your company has years of documentation: technical manuals, contracts, business proposals, internal procedures, minutes, and policies. But finding the right answer still means searching through folders, asking on Slack, or interrupting the colleague who 'knows it'.

Generic LLMs respond confidently even when they don't have your data. Someone asks ChatGPT about your product and gets a plausible but false answer. In external support, this generates complaints; in legal or compliance, it's a real risk.

Classic search engines return links, not answers. The employee opens ten PDFs to find a paragraph. The lost time multiplies with each repeated inquiry in sales, support, and engineering.

Uploading all PDFs to an 'enterprise' chat without architecture usually fails: outdated documents, misconfigured permissions, answers without citations, and no way to audit which fragment the model used.

RAG is not magic. A poorly implemented system (chunks that are too short, cheap embeddings, no access control) still hallucinates or leaks confidential information to the wrong user.

The cost of 'not finding' is also measurable: business proposals that take days because no one locates the similar case, engineers reimplementing already documented solutions, audits failing due to the incorrect version of the procedure.

Many companies have tried 'uploading PDFs to ChatGPT'. It works for a test; in production, it fails due to context limits, no granular permissions, and no way to know if the answer comes from the 2021 manual or the 2024 one.

Organizational change matters: support, IT, and business must agree on what gets automated and what requires human judgment. Without that agreement, the project generates internal friction even if the technology works.

Tribal knowledge in Slack or Teams is not indexed: important decisions never made it to the official PDF. RAG without a culture of documentation remains incomplete.

Multilingual: manuals in English and queries in Spanish require embeddings and reranking that do not assume a single language.

Poor quality scanned documents drastically reduce accuracy. Investing in native digitization or quality OCR before RAG saves frustration.

RUMAZA does not sell licenses: we build a system that you can measure, maintain, and expand. If the core of the problem is not automatable with available data, we tell you in the first meeting —saving months and budget.

Intellectual property and confidentiality: contracts with non-disclosure clauses require the index to reside in controlled infrastructure with encryption at rest.

Contradictory answers between two documents: the system must flag conflict or prioritize the current version based on date metadata.
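
A minimal sketch of that rule, assuming each retrieved fragment carries hypothetical `family`, `version`, and `effective` (date) metadata fields. These names are illustrative, not taken from any particular library. It keeps the most recent version per document family and reports what was superseded, so the interface can flag the conflict.

```python
from datetime import date

def prefer_current(hits):
    """Keep the newest version per document family and list superseded ones.

    hits: list of dicts with 'family', 'version', 'effective' (date), 'text'.
    All field names are assumptions made for this sketch.
    """
    by_family = {}
    for hit in hits:
        by_family.setdefault(hit["family"], []).append(hit)

    kept, conflicts = [], []
    for family, group in by_family.items():
        newest = max(group, key=lambda h: h["effective"])
        kept.append(newest)
        older = sorted({h["version"] for h in group if h["version"] != newest["version"]})
        if older:
            # Surface the conflict instead of silently hiding it.
            conflicts.append({"family": family, "used": newest["version"], "superseded": older})
    return kept, conflicts
```

Only the `kept` fragments would go into the prompt; `conflicts` can be shown to the user as a note that an older version says something different.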

Comparing three proposals without a common specification is useless: scope, integrations, and acceptance metrics must be identical to make an informed decision.

Without an owner of the document corpus, the index deteriorates in six months. Designate a responsible person by area to validate document additions and removals.

Iteration with real data from the first two weeks in production: adjusting thresholds, prompts, and rules with client metrics, not lab assumptions.

The project's success is defined in the kickoff meeting: base volume, current time per case, manual error rate, and hourly cost —with that, we calculate ROI before writing a line of code.
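
As an illustration of that calculation, here is a rough sketch using three of the kickoff inputs (volume, time per case, hourly cost). The `expected_time_reduction` and `project_cost` parameters are assumptions added for the example, and the cost of manual errors is deliberately left out; a real estimate would include it.

```python
def monthly_saving(cases_per_month, minutes_per_case, hourly_cost, expected_time_reduction):
    """Estimated monthly saving from faster handling of each case.

    expected_time_reduction is a fraction between 0 and 1 and is an
    assumption to be validated in the pilot, not a measured value.
    """
    hours_now = cases_per_month * minutes_per_case / 60
    return hours_now * hourly_cost * expected_time_reduction

def payback_months(project_cost, saving_per_month):
    """Months needed for the saving to cover the project cost."""
    if saving_per_month <= 0:
        return float("inf")
    return project_cost / saving_per_month
```

With hypothetical inputs of 600 cases a month, 10 minutes per case, an hourly cost of 40, and a 50% reduction, `monthly_saving` returns 2000.0; a project cost of 12000 then pays back in 6.0 months.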

Training at closure: we do not deliver software that only IT understands. The business user knows how to use the system, escalate, and report issues with screenshots and real examples from their day-to-day.

What is RAG (no fluff)

RAG (Retrieval-Augmented Generation) is an architecture: when a question arrives, the system searches your document base for the most relevant fragments, injects them as context into the language model, and the model drafts the answer based solely on that material.

The technical flow: document ingestion → chunking → embeddings → vector index → semantic search → reranking → context prompt → answer with citations.
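
The flow above can be sketched end to end in a few lines. This is a toy, not a production design: `embed` is a bag-of-words stand-in for a real embedding model, the "vector index" is a plain list, reranking is omitted, and every function name is an assumption made for the example.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(doc_id: str, text: str) -> list[dict]:
    """Split on blank lines; each paragraph becomes one citable chunk."""
    parts = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [{"doc": doc_id, "n": i + 1, "text": p} for i, p in enumerate(parts)]

def build_index(docs: dict[str, str]) -> list[dict]:
    """Ingestion + chunking + embeddings, stored in a plain list."""
    index = []
    for doc_id, text in docs.items():
        for c in chunk(doc_id, text):
            c["vec"] = embed(c["text"])
            index.append(c)
    return index

def retrieve(index: list[dict], question: str, k: int = 2) -> list[tuple]:
    """Return up to k (score, chunk) pairs, best first, dropping zero scores."""
    q = embed(question)
    scored = [(cosine(q, c["vec"]), c) for c in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, c) for s, c in scored[:k] if s > 0]

def build_prompt(question: str, hits: list[tuple]) -> str:
    """Context prompt whose bracketed ids the model is asked to cite."""
    context = "\n".join(f"[{c['doc']}#{c['n']}] {c['text']}" for _, c in hits)
    return (
        "Answer using only the context and cite the bracketed ids.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The string returned by `build_prompt` is what would be sent to the language model; the bracketed ids are what make the final answer citable.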

The key is not the most expensive model. It’s the quality of the index: updated documents, metadata (version, department, permissions), appropriate chunking, and continuous evaluation of accuracy.

RAG allows you to say 'according to manual v3.2, section 4.1...' with a link to the original PDF. If there is not enough evidence, the system responds that it cannot find information —a behavior that must be explicitly designed.

It combines with access control: a salesperson does not see HR contracts; an external user sees nothing internal. Permissions reside in the index, not just in the interface.
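
One way to read "permissions reside in the index" is that each fragment stores who may see it, and the filter runs before retrieval rather than after generation. A minimal sketch, where the `Chunk` class and the `allowed_roles` field are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    doc: str
    text: str
    allowed_roles: frozenset  # roles stored with the fragment at ingestion time

def visible_chunks(index, user_roles):
    """Return only the fragments this user may see.

    Retrieval and the model only ever receive this filtered list, so a
    fragment the user cannot read can never end up in the prompt.
    """
    roles = set(user_roles)
    return [c for c in index if c.allowed_roles & roles]
```

A user with no matching role, such as an external visitor, gets an empty list, so the system has nothing internal to answer from.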

Components that make a difference: reranking (reordering retrieved fragments), hybrid keyword + vector, metadata filtering by department, and continuous evaluation with questions from real users.

RAG does not replace the expert on critical topics; it reduces friction in the search. The senior engineer still validates, but finds the relevant section in seconds instead of opening fifteen PDFs.

Operational cost: embeddings + storage + queries. For corpora of thousands of documents, this is typically far cheaper than the paid hours lost to inefficient searching.

Gradual deployment: pilot with one channel or one type of query, measurement for two weeks, expansion based on data —not a big bang that overwhelms the team and the client.

Intelligent chunking respects sections, tables, and numbered lists. Blindly chunking breaks pricing tables and generates incorrect answers.
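
A simplified sketch of structure-aware chunking, under two assumptions that will not hold for every corpus: headings look like "4.1 Title" on a line of their own, and a table or list is a block with no blank lines inside it. Blocks are never split, and every chunk remembers its section heading.

```python
import re

# Assumed heading format: a section number followed by a title, e.g. "4.1 Pricing".
HEADING = re.compile(r"^\d+(\.\d+)*\s+\S")

def chunk_by_section(text: str, max_chars: int = 800) -> list[dict]:
    """Pack whole blocks into chunks, starting a new chunk at each heading."""
    chunks = []
    heading = ""
    buffer: list[str] = []

    def flush():
        if buffer:
            chunks.append({"section": heading, "text": "\n\n".join(buffer)})
            buffer.clear()

    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if not block:
            continue
        lines = block.splitlines()
        if len(lines) == 1 and HEADING.match(lines[0]):
            flush()              # close the previous section
            heading = block
            continue
        size = sum(len(b) for b in buffer) + len(block)
        if buffer and size > max_chars:
            flush()              # start a new chunk, but never cut a block
        buffer.append(block)
    flush()
    return chunks
```

A block larger than `max_chars` is kept whole on purpose: an oversized pricing table is less harmful than one cut in half.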

Caching frequent queries reduces cost and latency without sacrificing updates when the source document changes.
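
One possible way to get both properties, sketched with hypothetical names: each cached answer remembers the versions of the documents it cited, and a lookup only counts as a hit if those versions are still current.

```python
class AnswerCache:
    """In-memory cache keyed by normalized question text (illustrative only)."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _normalize(question):
        # Lowercase and collapse whitespace so trivial variants share a key.
        return " ".join(question.lower().split())

    def put(self, question, answer, cited_versions):
        """cited_versions: {doc_id: version} for the documents the answer cites."""
        self._entries[self._normalize(question)] = (answer, dict(cited_versions))

    def get(self, question, current_versions):
        """Return the cached answer, or None if missing or stale."""
        key = self._normalize(question)
        entry = self._entries.get(key)
        if entry is None:
            return None
        answer, cited = entry
        if any(current_versions.get(doc) != ver for doc, ver in cited.items()):
            del self._entries[key]   # a cited document changed or was removed
            return None
        return answer
```

This sketch does not notice a newly added document that would change the answer; a short expiry time or clearing the cache on each reindex would be needed to cover that case.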

Strict grounding: the prompt forces citation of a fragment or a response that there is no information. Default setting in RUMAZA, not optional.
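
A sketch of what such grounding could look like in code. It is not RUMAZA's actual prompt: the wording, the `min_score` threshold of 0.35, and the function names are all assumptions. It applies two guards: no model call when the evidence is too weak, and a check afterwards that the answer cites only fragments that were actually supplied.

```python
import re

NO_EVIDENCE = "I cannot find that in the indexed documents."

def grounded_prompt(question, hits, min_score=0.35):
    """hits: list of (score, chunk_id, text). Returns None if evidence is too weak."""
    evidence = [(cid, text) for score, cid, text in hits if score >= min_score]
    if not evidence:
        return None  # the caller replies NO_EVIDENCE without calling the model
    context = "\n".join(f"[{cid}] {text}" for cid, text in evidence)
    return (
        "Answer only from the fragments below. Cite the fragment id in square "
        "brackets after each claim. If the fragments do not contain the answer, "
        f"reply exactly: {NO_EVIDENCE}\n\nFragments:\n{context}\n\nQuestion: {question}"
    )

def enforce_citation(answer, allowed_ids):
    """Accept the answer only if it cites at least one id, all of them supplied."""
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    if cited and cited <= set(allowed_ids):
        return answer
    return NO_EVIDENCE
```

The second guard is deliberately blunt: an uncited answer is discarded even if it happens to be right, which is the trade-off that strict grounding implies.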

RUMAZA's criterion: concrete problem, accessible data, success metric, and closed scope. Without these four pillars, there is no project —there is an experiment that bills well to the consultant and poorly to the client.

Tables and figures in PDFs require special extraction, sometimes a hybrid approach that also indexes the page and the caption so that critical numbers are not lost.

API for third parties: other systems consume semantic search without going through chat —useful for internal portals.

Evolutionary maintenance —new intents, providers, languages— is budgeted separately from the MVP to avoid surprises or zombie projects.

Hybrid lexical + vector search improves recall on product codes, SKUs, and exact legal references.
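
A common way to combine the two result lists is reciprocal rank fusion (RRF), shown here as one option rather than the method any particular product uses. Each search returns its own ranking, and a document's fused score is the sum of 1 / (k + rank) across rankings; k = 60 is a commonly used default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    rankings: list of lists, each ordered best first.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results for a query containing a product code.
lexical = ["sku-123", "faq-7", "manual-2"]      # exact keyword match first
vector = ["manual-2", "sku-123", "policy-9"]    # semantic similarity first
fused = reciprocal_rank_fusion([lexical, vector])
```

Documents that appear in both lists rise to the top, so the exact match on the code is kept without discarding the semantically related manual.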

Post-launch support with a direct channel and agreed SLA: critical issues during business hours resolved the same day —not an eternal ticket.

We document assumptions, known limits, and expansion plans in the delivery —total transparency about what the system does today and what remains for a phase two if the numbers justify it.

Architecture ready for expansion: new channels, languages, or documents without starting from scratch —modular extension, not a fragile monolith.

Alignment with security and legal from the design: DPIA when applicable, record of processing activities, and clauses with cloud model subprocessors.

When it makes sense

Criteria (in every case, with enough volume and data to justify it)
  • More than 100 documents that the team consults daily
  • Incorrect answers due to outdated or non-existent information
  • Slow onboarding because 'it's somewhere'
  • Technical support with extensive manuals and multiple versions
  • You need to cite sources for compliance or auditing
  • You want an internal copilot before an agent with actions

What can be built

01

Technical documentation assistant

Engineers ask in natural language; the system searches manuals and sheets, responds with citations, and links to the PDF. Includes logs, confidence thresholds, and human review in the initial phase until metrics are calibrated in production.

02

Commercial copilot

Searches in won proposals, success cases, and internal pricing. Accelerates drafts without inventing conditions. Includes logs, confidence thresholds, and human review in the initial phase until metrics are calibrated in production.

03

Intelligent FAQ for support

Agents consult policies and procedures; unified response with the current version of the document. Includes logs, confidence thresholds, and human review in the initial phase until metrics are calibrated in production.

04

Corporate semantic search

Replaces or complements keyword search with intent understanding and filters by area and date. Includes logs, confidence thresholds, and human review in the initial phase until metrics are calibrated in production.

How RUMAZA would build it

01
Document inventory
What sources, formats, update frequency, and who can see what. Documented deliverable reviewed with you before the next step.
02
Ingestion pipeline
Text extraction, cleaning, chunking, metadata, and versioning. Documented deliverable reviewed with you before the next step.
03
Vector index
Embeddings, vector database (pgvector, Pinecone, etc.), and reranking for accuracy. Documented deliverable reviewed with you before the next step.
04
Permission layer
Filtering by user, role, and sensitivity before reaching the model. Documented deliverable reviewed with you before the next step.
05
Interface and citations
Chat or search with cited fragments and link to the source document. Documented deliverable reviewed with you before the next step.
06
Evaluation
Set of real questions, accuracy metrics, and review of failed responses. Documented deliverable reviewed with you before the next step.
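
For the evaluation step, one simple retrieval metric is the hit rate at k: the share of test questions for which the expected source appears among the top k retrieved fragments. A sketch, assuming a test set of dicts with hypothetical `question` and `expected_source` fields, and a `retrieve` callable that returns fragment ids best first:

```python
def hit_rate_at_k(test_set, retrieve, k=3):
    """Return (hit rate, list of failed questions) for a retrieval function."""
    hits = 0
    failures = []
    for case in test_set:
        retrieved = retrieve(case["question"])[:k]
        if case["expected_source"] in retrieved:
            hits += 1
        else:
            failures.append(case["question"])  # kept for manual review
    rate = hits / len(test_set) if test_set else 0.0
    return rate, failures
```

This only measures whether the right fragment was found; whether the drafted answer is correct still needs human review of the failed and borderline cases.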

Possible technologies

  • Python
  • LangChain / LlamaIndex
  • OpenAI / Anthropic LLMs; embedding models (e.g. OpenAI)
  • PostgreSQL + pgvector
  • Pinecone / Weaviate
  • Unstructured / PyMuPDF
  • FastAPI
  • Redis

Application scenarios

Scenario 1

Internal documentation hard to find

Manuals, policies, template contracts, and procedures in folders or PDFs. RAG helps search by meaning, not just by file name.

Scenario 2

Mixed versions of the same document

Multiple people save 'the good template' in different locations. It makes sense to index only official sources and mark validity before responding.

Scenario 3

Slow onboarding of new employees

Repeated questions about how to do X in the company. An assistant built on internal documentation reduces dependence on a single expert.

Common mistakes

Avoid
  • Indexing everything without cleaning obsolete versions
  • Chunks that are too small or too large, losing context
  • Not evaluating with real business questions
  • Ignoring permissions: same index for all roles
  • Trusting the answer without showing citations to the user
  • Not planning reindexing when critical documents change
  • Not reviewing the project after 90 days with real metrics and adjusting or closing what does not add value.
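
For the reindexing point in particular, a simple way to detect what changed is to keep a manifest of content hashes and compare it on every sync. A sketch with hypothetical names, using SHA-256 from the standard library:

```python
import hashlib

def plan_reindex(manifest, current_docs):
    """Compare stored hashes with current file contents.

    manifest: {doc_id: sha256 hex digest} saved at the last indexing run.
    current_docs: {doc_id: file content as bytes}.
    Returns (ids to index or reindex, ids to delete, new manifest).
    """
    current_hashes = {
        doc_id: hashlib.sha256(content).hexdigest()
        for doc_id, content in current_docs.items()
    }
    to_index = sorted(d for d, h in current_hashes.items() if manifest.get(d) != h)
    to_delete = sorted(d for d in manifest if d not in current_hashes)
    return to_index, to_delete, current_hashes
```

Removed documents matter as much as changed ones: a withdrawn procedure that stays in the index keeps being cited.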

Frequently asked questions

Does RAG replace training a custom model?

In most business cases, yes. It's cheaper, updatable, and auditable compared to fine-tuning for document knowledge. We define this in scope based on your systems, volume, and legal constraints —without promising generic figures.

Does it work with scanned PDFs?

Yes, with quality OCR. It increases ingestion cost and may reduce accuracy. We prioritize sources with native text.

What accuracy can I expect?

It depends on document quality. With a good pipeline, 70–85% on well-defined questions. We measure it in evaluation rather than promising 99%.

Does the data leave my servers?

We can use cloud models with a DPA or local models if policy requires it. The vector index can reside in your infrastructure.

How long does it take to be operational?

MVP with a defined corpus: 4–6 weeks. Includes ingestion, index, basic interface, and initial evaluation.

Does it integrate with SharePoint or Google Drive?

Yes. Connectors synchronize the sources and reindex when files change.

Updated: 2026-06-29 · Author: Rubén Maestre

Is your team losing hours searching through documents?

Tell me what sources you have and who is asking what. I propose a measurable RAG architecture.