(Workbook) Integrating LLMs with Your Domain-Specific Data for Enhanced AI Understanding: RAG Demystified

(Note: for those interested, the Jupyter notebook with all the code for this post is available for download on my Decision Sciences GitHub repository)

LLMs are powerful, but they have a major limitation—their knowledge is static. If we want an AI system that understands our specific domain, company, or area of expertise, we need a way to inject real-time, customized information into the model. This is where Retrieval-Augmented Generation (RAG) comes in—a game-changing approach that bridges this critical gap. By integrating LLMs with external knowledge sources—like our own documents, reports, or research—we can personalize responses, keep insights up to date, and ensure the AI generates highly relevant, domain-specific answers.

For business, this capability is critical. Effective decision-making depends on accurate, timely, and context-aware information—not just broad, generic knowledge. While a standard LLM can provide general insights, a RAG-powered decision-support system ensures that AI recommendations are grounded in the latest company data, market trends, or industry research. Whether it’s refining marketing strategies, optimizing sales forecasts, or evaluating risk, an AI model enriched with real-world, domain-specific data enables decision-makers to act with greater precision and confidence.

This challenge applies to any large language model, including OpenAI’s GPT-4, whose knowledge remains static after training (for example, GPT-4o’s training data extends only through October 2023) and can’t incorporate new developments or domain-specific nuances without assistance.

In discussions about GenAI, one common concern is that models inherit biases from the data they were trained on. While pre-trained LLMs can provide powerful insights, their responses are shaped by the vast datasets they were exposed to. RAG offers a way to help counteract this (but does not inherently eliminate all bias) by letting you integrate your own trusted knowledge base—ensuring that the AI’s responses reflect your specific expertise, methodologies, and decision-making framework. Instead of relying on an LLM “out-of-the-box”, RAG allows you to inject your own domain-specific insights, preferred analytical approaches, and strategic perspectives directly into the model, helping to shape AI-driven decision support in a way that aligns with your vision.

RAG (Retrieval-Augmented Generation)

RAG combines the raw generative power of LLMs with up-to-date, tailored information relevant to specific domains or subjects. By connecting an LLM to a continuously updated external database (such as your own content repository, reports, industry articles, or internal documentation), RAG ensures the generated responses are not only accurate and current but directly relevant to your unique context.

Consider my blog as a practical example—a sandbox I can use for demonstrating how RAG technology functions.

Here’s what I’ve done:

  1. Content Collection: first, I collected my published content from previous blog posts, ensuring a comprehensive knowledge base of relevant topics and ideas.
  2. Content Processing: next, I broke these documents into smaller, manageable chunks, allowing for more precise retrieval of relevant information.
  3. Embedding Generation: I then used advanced embedding models to convert this text into vector representations, enabling efficient semantic search and retrieval.
  4. Vector Database Creation: these embeddings were stored in a specialized database (such as Chroma), optimized for rapid retrieval based on semantic similarity.
  5. Integration with the LLM: finally, I connected this vector database to GPT-4, creating a retrieval-augmented generation system capable of producing accurate, contextually relevant answers.

Why is this important for decision-makers?

  • Freshness: continuously updated insights ensure you’re always operating with the latest information.
  • Precision: tailored, domain-specific knowledge helps reduce ambiguity and enhance decision confidence.
  • Relevance: customized retrieval systems mean your insights align closely with your organization’s unique needs, strategies, and challenges.

Ultimately, by leveraging RAG technology, we’re not just keeping up with change—we’re proactively harnessing it to make better, faster, and more informed decisions.

OpenAI & LangChain

OpenAI provides cutting-edge language models, like GPT (Generative Pre-trained Transformer), that can generate human-like text, answer questions, and assist with a wide range of tasks. These models are at the forefront of enabling smarter decision-making, automating complex processes, and enhancing business insights.

On the other hand, LangChain is an open-source framework designed to simplify the integration of language models like OpenAI’s GPT into applications. As you will see below, LangChain allows developers to combine language models with various data sources, tools, and workflows, making it easier to create powerful and interactive applications that can process and generate information in dynamic, intelligent ways.

Together, these technologies complement each other—while OpenAI’s models provide the language understanding and generation capabilities, LangChain enables seamless interaction between these models and external data sources, APIs, or systems, driving advanced, data-driven decision-making in fields like marketing, finance, and beyond.

1. Document Collection

The first step is collecting documents from specified URLs on your website, which serve as the foundation of your RAG system. This is where you load the content from external sources, ensuring that the model always has access to relevant, up-to-date information.

Explanation:

  • WebBaseLoader: a LangChain tool that scrapes the content from the specified URLs. This ensures that the documents are dynamically pulled from your website, keeping the information fresh and relevant.
  • Outcome: you get a list of documents (text data) that will be processed in subsequent steps.

2. Text Splitting

Once the documents are collected, they are often large and need to be split into smaller chunks. This allows you to handle them more effectively and makes the information more suitable for embedding into the vector database.

Explanation:

  • RecursiveCharacterTextSplitter: splits the documents into smaller, manageable pieces for embedding. This ensures that even large documents are processed efficiently.
  • Outcome: you now have a list of text chunks, each of which is small enough to be embedded in the vector database.

3. Creating Embeddings

The next step is creating embeddings for the text chunks. Embeddings convert the text into a numerical format that can be processed by machine learning models. These embeddings help the model “understand” the content and its meaning.

Explanation:

  • OpenAIEmbeddings: uses OpenAI’s embedding models (such as text-embedding-ada-002) to convert text into vector representations. This is the core part of turning raw text into a form that can be searched and used for decision support.
  • Outcome: each chunk of text is now represented by a numerical vector that captures the semantic meaning of the text.

4. Storing Embeddings in a Vector Database

Once the embeddings are created, they are stored in a vector database. The database enables fast, similarity-based search, allowing you to retrieve the most relevant information from your documents.

Explanation:

  • Chroma: a vector store that allows you to save embeddings in a way that makes them easily accessible for retrieval. The database allows you to search for documents based on similarity, enabling RAG-style retrieval.
  • Outcome: your embeddings are now stored in a persistent database, which you can query in future steps.

5. Querying the Vector Database

Now that your embeddings are stored, you can perform real-time queries. The query will be compared to the stored embeddings to find the most relevant documents or sections based on the context of the query.

Explanation:

  • Similarity Search: this step uses the vector database to search for the most relevant documents based on the similarity between the query and the embeddings. You can adjust the number of documents retrieved (k=3 in this case).
  • Outcome: the model retrieves and displays the most relevant documents, giving decision-makers quick access to the information they need.

6. Integrating with a Chat Model

Finally, you can integrate the retrieved documents with a chat-based model (such as OpenAI’s GPT-3.5 or GPT-4) to provide detailed, human-like answers based on the data you’ve indexed.

Explanation:

  • RetrievalQA: this is a LangChain component that integrates a retrieval-based model (the vector database) with a question-answering chain, powered by a chat model like GPT-4. The model uses the retrieved documents to answer the query.
  • Outcome: you get an insightful, context-aware response based on the most relevant documents in your database.

When you combine these elements, here’s how the information flow looks:

User Query → Retriever (RAG) → Vector DB (blog content) → Relevant Chunks → LLM (GPT-4) → Final Response.

Example Model Outputs:

To demonstrate the power and flexibility of Retrieval-Augmented Generation (RAG), we’ll explore three practical examples that highlight how the system works in different scenarios. These examples showcase how the model interacts with retrieved content, how it generates insightful answers, and how we can address limitations such as linking concepts not explicitly present in the data.

By walking through these mini use cases, we aim to illustrate the strengths of RAG in decision sciences and show how the system can be optimized to provide more relevant, context-aware responses. These examples will shed light on how RAG can be applied to real-world decision-making challenges.

Example 1: Running this code:

Returns this response from the model:

In the prompt above, the model successfully pulls relevant information from the blog post about impact paths and generates a response that aligns with the concepts discussed in the post. This works well because the content of the blog directly describes impact paths and how they bridge the gap between data analysis and decision-making. The embeddings contain the specific information about impact paths, and when the model is queried, it retrieves this relevant content, allowing it to generate a response that echoes the idea without directly quoting it. This retrieval-augmented generation process ensures that the model delivers accurate, contextually relevant answers based on the content stored in the database.

Example 2: Running this code:

Returns this response from the model:

However, in this second prompt, the model struggles to connect impact paths with Monte Carlo simulations, and here’s why: none of the embedded blog posts links impact paths to Monte Carlo simulations. Since the embeddings reflect only the content explicitly found in the documents, the model can’t generate a connection that doesn’t exist in the data it’s pulling from. The retrieval system is limited to the context available in the specific documents, so the model cannot fill in the gap between these ideas unless the link is made explicitly in the content.

Example 3: Running this code:

Returns this response from the model:

In some cases, as demonstrated in Example 2 above, the retrieval-augmented generation (RAG) system can struggle when asked to connect concepts that aren’t explicitly present in the retrieved documents. For example, when trying to link impact paths with Monte Carlo simulations, the system may have difficulty making that connection if the content doesn’t directly mention both topics together. This limitation arises because the system is primarily drawing from the retrieved content (i.e. my blog site), and it may not have all the necessary links between concepts.

To address this challenge, a custom prompt using LangChain’s PromptTemplate was implemented. The custom prompt directs the model to blend both the retrieved context (from the vector database) and the model’s broader, pre-existing knowledge. This allows the model to “fill in the gaps” between concepts that may not be explicitly connected in the retrieved content, such as connecting impact paths with Monte Carlo simulations. By doing so, the model can generate more nuanced and contextually relevant answers, even when the retrieved documents don’t contain all the necessary information.

This method enhances the flexibility of the model, ensuring decision-makers can rely on the RAG system for up-to-date, real-time insights. It leverages both context-specific content and general knowledge, providing comprehensive and actionable responses that are aligned with the latest data, even if the source material doesn’t make all the connections between concepts. This approach enriches the decision-making process, enabling more informed, efficient, and strategic decisions.

Conclusion

In this post, we’ve explored how Retrieval-Augmented Generation (RAG) can be used to enhance decision-making by connecting the powerful generative capabilities of Large Language Models (LLMs) with real-time, domain-specific knowledge. Through a series of steps, from collecting content to creating embeddings, we’ve built a system that enables real-time, contextually aware decision support. By using the LangChain framework and OpenAI’s GPT-4 model, we successfully demonstrated how to integrate up-to-date content from my blog, transforming it into a powerful decision support tool.

The journey began with the collection and processing of my blog posts, followed by generating embeddings to capture the semantic meaning of the content. These embeddings were stored in a vector database, allowing for fast, similarity-based retrieval. When queried, the model was able to use this embedded knowledge to produce contextually relevant responses, reflecting the key concepts and ideas presented in the original blog posts. This allowed the model to not only answer questions with precision, but also to provide answers that echoed the themes and insights from the posts, effectively “reflecting” the voice of the blog.

However, as we demonstrated with the challenge of connecting impact paths with Monte Carlo simulations, there are limitations when the content doesn’t explicitly link certain concepts. By integrating a custom prompt using LangChain’s PromptTemplate, we overcame this challenge, allowing the model to fill in the gaps by utilizing its broader knowledge base. This method enhanced the system’s flexibility, enabling it to generate more nuanced and relevant answers, even when the retrieved documents didn’t directly connect the dots.

One of the key takeaways from this experience is how the LLM, when augmented with retrieval-based context, starts to align with the style and voice of the content it draws from. Despite the limited dataset, the LLM’s responses begin to reflect the tone, terminology, and key themes of my blog, demonstrating how RAG preserves the unique ‘voice’ of its knowledge source. This alignment not only makes the outputs more consistent and reflective of the original content, but it also demonstrates how RAG can help maintain coherence and consistency in decision support applications, even with relatively small datasets.

Ultimately, the use of RAG technology in this context has provided a valuable tool for delivering data-driven insights in real-time. By connecting LLMs with a tailored, domain-specific knowledge base, we ensure that decision-makers are empowered to make smarter, more informed decisions based on the most relevant and up-to-date information available. This approach doesn’t just allow us to keep up with the constant flow of new data—it gives us the ability to proactively leverage it, driving better outcomes and more confident decision-making.

