In [None]:
!pip install -Uqqq pip --progress-bar off
!pip install -qqq langchain==0.0.139 --progress-bar off
!pip install -qqq openai==0.27.4 --progress-bar off
!pip install -Uqqq watermark==2.3.1 --progress-bar off
!pip install -Uqqq chromadb==0.3.21 --progress-bar off
!pip install -Uqqq tiktoken==0.3.3 --progress-bar off

In [2]:
%load_ext watermark

In [3]:
import os
import textwrap

import chromadb
import langchain
import openai
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma

In [4]:
%watermark --iversions -v -m

Python implementation: CPython
Python version : 3.9.16
IPython version : 7.34.0

Compiler : GCC 9.4.0
OS : Linux
Release : 5.10.147+
Machine : x86_64
Processor : x86_64
CPU cores : 2
Architecture: 64bit

chromadb : 0.3.21
openai : 0.27.4
langchain: 0.0.139



In [None]:
def print_response(response: str) -> None:
    """Print the response wrapped to 100 characters per line."""
    print("\n".join(textwrap.wrap(response, width=100)))

In [None]:
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI KEY"
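
Hard-coding the key in a cell makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key was exported in the shell before the notebook started: read it from the environment and fail early if it is missing. The helper name `require_api_key` and its parameters are hypothetical, not part of `openai` or `langchain`.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY", placeholder="YOUR OPENAI KEY"):
    """Return the API key from a mapping, or raise if it is missing or left as the placeholder."""
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key or key == placeholder:
        raise RuntimeError(f"Set {name} before running the notebook.")
    return key


# Usage with an explicit mapping, so the example does not depend on the real environment.
# "sk-example" is a made-up value, not a working key.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```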

In [None]:
model = OpenAI(temperature=0)

In [None]:
print(
    model(
        "You're Dwight K. Schrute from the Office. Suggest 5 places to visit in Scranton that are connected to the TV show."
    )
)



1. The Dunder Mifflin Paper Company - Visit the office building where the show was filmed and take a tour of the set.

2. Poor Richard's Pub - Enjoy a drink at the bar where the cast often hung out.

3. Steamtown National Historic Site - Take a ride on the historic train that was featured in the show.

4. The Scranton Cultural Center - Attend a show at the theater where the cast performed a play in the episode "The Duel".

5. The Mall at Steamtown - Shop at the mall where the cast went on a shopping spree in the episode "The Coup".


## Q&A Over a Document

In [None]:
loader = WebBaseLoader(
    "https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm"
)

In [None]:
documents = loader.load()
len(documents)

1

In [None]:
document = documents[0]
document.__dict__.keys()

dict_keys(['page_content', 'metadata'])

In [None]:
document.page_content[:100]

"\n\n\n\n\nTwitter's Recommendation Algorithm\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nEngineering\n\n\n\n\n\nBac"
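
The first 100 characters are mostly newlines left over from the page's HTML layout. A minimal sketch of one way to tidy such text before embedding it, using only the standard library. The helper `collapse_whitespace` is hypothetical and is not something `WebBaseLoader` does for you here.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Squeeze runs of newlines into one and trim the ends."""
    # Drop trailing spaces and tabs that sit right before a newline.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Replace two or more consecutive newlines with a single newline.
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()


# A shortened stand-in for the start of document.page_content.
sample = "\n\n\n\n\nTwitter's Recommendation Algorithm\n\n\n\n\nEngineering\n\n\nBac"
cleaned = collapse_whitespace(sample)
print(repr(cleaned))
```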

In [None]:
document.metadata

{'source': 'https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm'}

In [None]:
index = VectorstoreIndexCreator().from_loaders([loader])
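
Building a vector index usually means splitting the text into chunks, embedding each chunk, and storing the vectors. A minimal sketch of the splitting step only, using fixed-size character windows with overlap. This is an illustration under assumed parameters (`chunk_size`, `overlap`), not the splitter LangChain uses internally.

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list:
    """Split text into character chunks where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


# Tiny values so the overlap is easy to see.
print(split_text("abcdefghij", chunk_size=4, overlap=1))
```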



In [None]:
query = """
You're Dwight K. Schrute from the Office.
Explain the Twitter recommendation algorithm in 5 sentences using analogies from the Office.
"""
print_response(index.query(query))

 The Twitter recommendation algorithm is like Dwight K. Schrute's job at Dunder Mifflin. It takes
the 500 million Tweets posted daily and distills them down to a handful of top Tweets that show up
on your timeline, just like Dwight distills the vast amount of paper at Dunder Mifflin into a few
important documents. The algorithm uses a set of core models and features to extract latent
information from Tweet, user, and engagement data, just like Dwight uses his keen eye to spot the
important details in the documents. It then uses a logistic regression model to rank the Tweets,
similar to how Dwight ranks the documents in order of importance. Finally, it traverses the graph of
engagements and follows to answer questions about what Tweets and Users are similar to your
interests, just like Dwight uses his knowledge of the office to answer questions about the people
and documents in the office.
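
Behind a query like this, a vector store compares the embedded question against the embedded chunks and returns the closest ones. A toy sketch of that lookup using cosine similarity in plain Python. The three-dimensional vectors and chunk labels are made up for illustration; real embeddings come from the embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunk_vectors, k=1):
    """Return the labels of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunk_vectors.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


# Hypothetical chunk embeddings, chosen by hand.
toy_vectors = {
    "ranking": [0.9, 0.1, 0.0],
    "filtering": [0.1, 0.9, 0.0],
    "candidate sourcing": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]
print(top_k(query_vector, toy_vectors, k=1))
```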


### Using a Prompt Template

In [10]:
template = """You're Dwight K. Schrute from the Office.

{context}

Answer with analogies from the Office to the question and the way Dwight speaks.

Question: {question}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])
print(
    prompt.format(
        context="Paper sales are declining 10% year over year.",
        question="How to sell paper?",
    )
)

You're Dwight K. Schrute from the Office.

Paper sales are declining 10% year over year.

Answer with analogies from the Office to the question and the way Dwight speaks.

Question: How to sell paper?
Answer:


In [None]:
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(documents, embeddings)



In [None]:
chain_type_kwargs = {"prompt": prompt}
chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0),
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 1}),
    chain_type_kwargs=chain_type_kwargs,
)

In [None]:
query = "Explain the Twitter recommendation algorithm in 5 sentences"
response = chain.run(query)

In [None]:
print_response(response)

Well, Twitter's got this fancy algorithm that picks out the best tweets from the millions of tweets
posted every day. It's like Michael Scott trying to pick the best Dundie award winners from all the
employees. They use a bunch of models and features to figure out what you might like, like how
Dwight uses his knowledge of his coworkers to predict their behavior. Then they rank the tweets
using a big neural network, kind of like how Jim ranks his pranks on Dwight. Finally, they filter
out any tweets you don't want to see, like how Angela filters out any fun from the office. And
voila, you've got your personalized Twitter timeline.
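
The chain above was created with `chain_type="stuff"`, which roughly means the retrieved text is placed into the `{context}` slot of the prompt and sent to the model in one call. A plain-Python sketch of that assembly step, with no LangChain involved. The function `stuff_prompt` and the `"\n\n"` separator are assumptions made for illustration.

```python
TEMPLATE = """You're Dwight K. Schrute from the Office.

{context}

Answer with analogies from the Office to the question and the way Dwight speaks.

Question: {question}
Answer:"""


def stuff_prompt(chunks, question, separator="\n\n"):
    """Join the retrieved chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Placeholder chunks standing in for whatever the retriever returns.
filled = stuff_prompt(["Chunk A", "Chunk B"], "What does the ranking step do?")
print(filled)
```

Because everything is sent in a single prompt, the retrieved text has to fit in the model's context window, which is one reason to keep `k` small.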


## References

- [Twitter's Recommendation Algorithm](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm)