# CogneeRetriever
This will help you get started with the Cognee retriever. For detailed documentation of all CogneeRetriever features and configurations, head to the API reference.
## Integration details

Bring-your-own data (i.e., index and search a custom corpus of documents):

| Retriever | Self-host | Cloud offering | Package |
| :--- | :--- | :--- | :--- |
| CogneeRetriever | ✅ | ❌ | langchain-cognee |
## Setup

For the default Cognee setup, the only thing you need is your OpenAI API key.

If you want automated tracing from individual queries, you can also set your LangSmith API key by uncommenting the lines below:

```python
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
```
## Installation

This retriever lives in the langchain-cognee package:

```python
%pip install -qU langchain-cognee
```

```python
import nest_asyncio

# Allow nested event loops (needed in notebooks, since Cognee runs async code)
nest_asyncio.apply()
```
## Instantiation

Now we can instantiate our retriever:

```python
from langchain_cognee import CogneeRetriever

retriever = CogneeRetriever(
    llm_api_key="sk-",  # OpenAI API Key
    dataset_name="my_dataset",
    k=3,
)
```
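The hardcoded `llm_api_key` above is a placeholder. In practice you may prefer to read the key from an environment variable; a minimal sketch, assuming you store it in `OPENAI_API_KEY` (the variable name here is our choice, not something the package requires):

```python
import os

from langchain_cognee import CogneeRetriever

# Avoid hardcoding secrets: read the key from the environment.
retriever = CogneeRetriever(
    llm_api_key=os.environ["OPENAI_API_KEY"],
    dataset_name="my_dataset",
    k=3,
)
```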
## Usage

Add some documents, process them, and then run queries. Cognee retrieves knowledge relevant to your queries and generates final answers.

```python
from langchain_core.documents import Document

# Example of adding and processing documents
docs = [
    Document(page_content="Elon Musk is the CEO of SpaceX."),
    Document(page_content="SpaceX focuses on rockets and space travel."),
]
retriever.add_documents(docs)
retriever.process_data()

# Now let's query the retriever
query = "Tell me about Elon Musk"
results = retriever.invoke(query)

for idx, doc in enumerate(results, start=1):
    print(f"Doc {idx}: {doc.page_content}")
```
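Retrievers are standard LangChain Runnables, so the retriever above also supports `batch` for running several queries in one call. A sketch, continuing from the snippet above:

```python
# Run multiple queries at once; returns one list of documents per query.
queries = ["Tell me about Elon Musk", "What does SpaceX focus on?"]
all_results = retriever.batch(queries)

for query, docs in zip(queries, all_results):
    print(f"Query: {query}")
    for doc in docs:
        print(" -", doc.page_content)
```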
## Use within a chain

Like other retrievers, CogneeRetriever can be incorporated into LLM applications via chains.

We will need an LLM or chat model:

```bash
pip install -qU "langchain[groq]"
```

```python
import getpass
import os

if not os.environ.get("GROQ_API_KEY"):
    os.environ["GROQ_API_KEY"] = getpass.getpass("Enter API key for Groq: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("llama3-8b-8192", model_provider="groq")
```

Alternatively, use an OpenAI chat model:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
```
```python
from langchain_cognee import CogneeRetriever
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Instantiate the retriever with your Cognee config
retriever = CogneeRetriever(llm_api_key="sk-", dataset_name="my_dataset", k=3)

# Optionally, prune/reset the dataset for a clean slate
retriever.prune()

# Add some documents
docs = [
    Document(page_content="Elon Musk is the CEO of SpaceX."),
    Document(page_content="SpaceX focuses on space travel."),
]
retriever.add_documents(docs)
retriever.process_data()

prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

Context: {context}

Question: {question}"""
)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

answer = chain.invoke("What companies does Elon Musk own?")
print("\nFinal chain answer:\n", answer)
```
## API reference

TODO: add link to API reference.
## Related

- Retriever conceptual guide
- Retriever how-to guides