[00:00.000 --> 00:13.320] All right, everyone, can we get a big welcome to Tuana? [00:13.320 --> 00:15.760] Can you hear me? [00:15.760 --> 00:16.760] Great. [00:16.760 --> 00:27.240] So if anyone was here for the talk before, just a disclaimer, I'm not as good a public [00:27.240 --> 00:28.240] speaker. [00:28.240 --> 00:32.440] I think you enjoyed the malware talk so much, but it's all downhill from there, so just FYI. [00:32.440 --> 00:36.360] All right, so I'm going to be talking about building a semantic search application in [00:36.360 --> 00:41.240] Python, and specifically we're going to be using an open source framework called Haystack, [00:41.240 --> 00:42.240] and that's why I'm here. [00:42.240 --> 00:47.880] So a bit about me, I'm a developer advocate at deepset, and we maintain Haystack. [00:47.880 --> 00:51.720] And yeah, so this is some information about me, but let's just dive right into it. [00:51.720 --> 00:56.680] So the agenda I'm going to follow, I'm going to try to keep the NLP stuff quite high level [00:56.680 --> 01:01.680] and focus on the how-to-build bit, but I do have to give a bit of a high-level explanation, [01:01.680 --> 01:06.400] so I'm going to do a brief history on what we mean by semantic search. [01:06.400 --> 01:10.240] Please do not judge me for this example: Kardashian sisters. [01:10.240 --> 01:14.520] So let's assume we have a bunch of documents and let's see what would happen if we do some [01:14.520 --> 01:18.680] keyword search on it, and let's say we've got the query "Kardashian sisters". [01:18.680 --> 01:23.240] You might get something a bit like this, which is great, and you can see that there's some [01:23.240 --> 01:28.320] clever stuff going on here, sisters maybe being associated with siblings and family as well. [01:28.320 --> 01:32.360] Keyword search is still very widely used, but this is the type of result you might get [01:32.360 --> 01:35.560] from a corpus of documents you might have. [01:35.560 --> 01:37.680] But what if that's just not enough? [01:37.680 --> 01:42.920] What if I want to be able to ask something like, who is the richest Kardashian sister? [01:42.920 --> 01:47.760] How do I make this system understand what I'm trying to get at? [01:47.760 --> 01:49.400] So for that, let's have a look at this. [01:49.400 --> 01:53.120] There might be some names you've already seen here, especially the last one there. [01:53.120 --> 01:57.560] I think everyone and their grandparents have heard of this by now: ChatGPT. [01:57.560 --> 01:59.240] So these are language models. [01:59.240 --> 02:07.400] I'm going to briefly walk through where they get such impressive functionality from. [02:07.400 --> 02:11.760] So most of them are based on what we call transformers. [02:11.760 --> 02:15.840] What those are doing is what I try to depict at the top here. [02:15.840 --> 02:19.320] So imagine that thing in the middle as the language model. [02:19.320 --> 02:24.640] And very, very simply put, obviously every model does something a bit different or for [02:24.640 --> 02:27.800] slightly different use cases, let's say. [02:27.800 --> 02:33.480] Given a piece of text, they will produce some sort of vector representation of that text. [02:33.480 --> 02:37.080] They're trained on very vast amounts of text data, and then this is what we get at the [02:37.080 --> 02:38.560] end of the day. [02:38.560 --> 02:41.680] And this is cool because it's enabled us to do many different things.
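[Editor's note: to make the idea of vector representations a bit more concrete, here is a minimal sketch, not from the talk, that embeds a few words with the open source sentence-transformers library and compares the resulting vectors. The model name is just an example of a small embedding model.]

```python
# Minimal sketch: turn text into vectors and compare them.
# "all-MiniLM-L6-v2" is just an example of a small open source embedding model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

words = ["dog", "cat", "teapot"]
embeddings = model.encode(words, convert_to_tensor=True)

# Cosine similarity: "dog" vs "cat" should score higher than "dog" vs "teapot".
print(util.cos_sim(embeddings[0], embeddings[1]))  # dog vs cat
print(util.cos_sim(embeddings[0], embeddings[2]))  # dog vs teapot
```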
[02:41.680 --> 02:46.360] We can use those vectors to compare them to each other, like dog might be close to cat [02:46.360 --> 02:49.160] but far away from teapot, for example. [02:49.160 --> 02:54.120] And that's enabled us to do a lot of different things like question answering, summarization, [02:54.120 --> 02:57.400] what we call retrieval, so document retrieval. [02:57.400 --> 02:59.240] And it's all thanks to these transformers. [02:59.240 --> 03:04.720] And a lot of these use cases are often grouped under the term search because actually what's [03:04.720 --> 03:08.600] happening in the background is a very clever search algorithm. [03:08.600 --> 03:13.000] So question answering and retrieval specifically can be grouped under search. [03:13.000 --> 03:15.680] All right, how does this work? [03:15.800 --> 03:20.560] And I'm very briefly going to go through what these different types of models do and how [03:20.560 --> 03:25.840] they do what they do, and I'm going to talk about the evolution from extractive models [03:25.840 --> 03:29.640] to now generative models like ChatGPT, for example. [03:29.640 --> 03:33.960] The very simple one, and we're going to build our first semantic search application with [03:33.960 --> 03:39.520] this type of model, is often referred to as the reader model, simply a question answering [03:39.520 --> 03:43.560] model, very specifically an extractive question answering model. [03:43.600 --> 03:50.280] The way these work is: given a piece of context and a query, they're very good at looking through [03:50.280 --> 03:56.040] that context and finding, extracting the answer from that context, but it does need that context. [03:56.040 --> 04:02.080] Obviously, there are some limitations to these models because they're limited by input length. [04:02.080 --> 04:05.400] I can't give it just infinite amounts of data. [04:05.400 --> 04:10.960] But we have come up with ways to make that a bit more efficient, and we've introduced [04:10.960 --> 04:16.440] models that we often refer to as retriever models, or embedding models. [04:16.440 --> 04:19.680] These don't necessarily have to be language models, I'm going to be looking at language [04:19.680 --> 04:25.040] models, it could also be based on keyword search that we saw before. [04:25.040 --> 04:30.480] But what they do is they act as a sort of filter, so let's say you've got a bunch of [04:30.480 --> 04:34.320] documents, let's say you've got thousands and thousands of documents, and the retriever [04:34.320 --> 04:39.360] can basically say, hey, I've got this query, and this is the top five, ten most relevant [04:39.360 --> 04:42.600] documents that you should look at, and then that means that the reader doesn't have to [04:42.600 --> 04:44.080] look through everything. [04:44.080 --> 04:46.960] So we actually gain a lot of speed out of this. [04:46.960 --> 04:52.840] All right, finally, this is all the hype today, and you'll notice, well, one thing you should [04:52.840 --> 04:58.680] notice is you see that the document context, anything like that, I've chopped it off, it's [04:58.680 --> 05:00.080] just a query. [05:00.080 --> 05:03.840] So these new language models, they don't actually need context. [05:03.840 --> 05:07.480] You can give it context, but it doesn't require context. [05:07.480 --> 05:11.000] And this is very cool, because they produce human-like answers.
[05:11.000 --> 05:17.520] What they're trained to do, the task they're trained to do, is not extracting answers, it's generating [05:17.520 --> 05:19.360] answers. [05:19.360 --> 05:22.680] And I just want to point out there are two things here. [05:22.680 --> 05:25.280] It doesn't necessarily have to be answers. [05:25.280 --> 05:30.480] So I'm going to be looking at an answer generator, but it can just be, you know, prompt it to [05:30.480 --> 05:35.880] produce some content, it doesn't necessarily have to be an answer to a question. [05:35.880 --> 05:41.240] So we've been seeing this, maybe you've seen some of these scenes lately, so this is Chat [05:41.240 --> 05:46.560] GPT again on the theme, who is the tallest Kardashian sister, it hasn't just extracted [05:46.560 --> 05:51.400] Kendall for me, it said, the tallest Kardashian sister is Kendall Jenner, perfect. [05:51.400 --> 05:54.800] But let's see what happens if it's not like a question. [05:54.800 --> 05:58.480] This is not my creativity, by the way, but I think it's amazing. [05:58.480 --> 06:03.200] Write a poem about FOSDEM in the style of a Markdown changelog, and that's what you get. [06:03.200 --> 06:04.200] There you go. [06:04.200 --> 06:08.240] All right, so these language models are readily available. [06:08.240 --> 06:11.480] You might have already heard these names, OpenAI, Cohere. [06:11.480 --> 06:15.320] They provide these increasingly large language models. [06:15.320 --> 06:19.280] There is a difference when we say language model and large language model, but leave [06:19.280 --> 06:22.240] that aside for now, let's not talk about that. [06:22.240 --> 06:26.560] There are also many, many, many open source models on Hugging Face, and if you don't know [06:26.560 --> 06:31.480] what Hugging Face is, I think very simply put, I like to refer to it as sort of the GitHub [06:31.480 --> 06:32.480] of machine learning. [06:32.640 --> 06:36.960] So you can host your open source models and other developers can use them, use them in [06:36.960 --> 06:40.360] their projects or even contribute to them. [06:40.360 --> 06:45.400] And what's really cool about them, like I said, your search results stop being just [06:45.400 --> 06:49.320] simple search results, they are human-like answers. [06:49.320 --> 06:55.440] So now let's look at how we use these language models for various use cases. [06:55.440 --> 06:59.120] For that, I want to talk about Haystack, this is why I'm here. [06:59.120 --> 07:06.360] So Haystack is an open source NLP framework built in Python, and what it achieves is basically [07:06.360 --> 07:09.160] what this picture is trying to show you. [07:09.160 --> 07:14.800] You're free to build your own end-to-end NLP application, and each of those green boxes [07:14.800 --> 07:17.560] is a high-level component in Haystack. [07:17.560 --> 07:20.920] There are retrievers that we looked at, there are readers that we looked at, we'll look [07:20.920 --> 07:25.400] at some different ones as well, and each of these is basically a main class, and you [07:25.400 --> 07:29.320] might have different types of readers, different types of retrievers. [07:29.320 --> 07:34.000] For example, there could be a reader that is good at looking at paragraphs and extracting [07:34.000 --> 07:38.040] answers, but there might be a reader type called table reader that's good at looking [07:38.040 --> 07:41.240] at tables and retrieving answers from that.
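[Editor's note: as a rough illustration of what an extractive reader node does, here is a minimal sketch using Haystack's FARMReader directly on a query plus a piece of context. The model name and example text are assumptions, not slides from the talk.]

```python
# Sketch of an extractive reader: given a query and some context documents,
# it extracts the answer span from that context (it does not generate text).
from haystack import Document
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

context = Document(content="Kendall Jenner is the tallest of the Kardashian-Jenner sisters.")
prediction = reader.predict(
    query="Who is the tallest Kardashian sister?",
    documents=[context],
    top_k=1,
)
print(prediction["answers"][0].answer)  # an extracted span such as "Kendall Jenner"
```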
[07:41.240 --> 07:45.240] There are integrations with Hugging Face, so that means you can just download a model off [07:45.240 --> 07:50.800] of Hugging Face, but also OpenAI here, obviously you need to provide an API key, but you are [07:50.800 --> 07:54.120] free to use those as well. [07:54.840 --> 08:00.640] Building an NLP application isn't just about the search component, you presumably [08:00.640 --> 08:06.240] have lots of documents somewhere, maybe they're PDFs, maybe they're TXTs, so there are components [08:06.240 --> 08:12.680] for you to build what we call your indexing pipeline, so that you can write your data somewhere [08:12.680 --> 08:16.840] in a way that can be used by these language models. [08:16.840 --> 08:21.080] Some of those components, we already talked briefly about the reader and the retriever, [08:21.120 --> 08:22.480] we're going to be using those. [08:22.480 --> 08:25.840] There could be an answer generator, a question generator, we're not going to look at that [08:25.840 --> 08:29.480] today, but that's really cool because then you can use those questions to train another [08:29.480 --> 08:31.520] model, for example. [08:31.520 --> 08:35.680] Summarizer, prompt node, we're going to very briefly look into that, but you get the idea. [08:35.680 --> 08:41.000] There's a bunch of components and each of them might have types under them. [08:41.000 --> 08:46.120] You can use data connectors, file converters as mentioned, pre-processing your documents [08:46.120 --> 08:50.480] in a way that's going to be a bit more useful to the language model, for example, and of [08:50.520 --> 08:55.320] course, you need to keep your data somewhere, so you might decide you want to use Elastic [08:55.320 --> 08:59.800] search or OpenSearch, or you might want to use something a bit more vector-optimized, [08:59.800 --> 09:05.560] and these are all available in the Haystack framework. [09:05.560 --> 09:10.640] This is the idea: I talked about the nodes, but the idea behind building with these nodes [09:10.640 --> 09:12.600] is to build your own pipeline. [09:12.600 --> 09:13.760] This is just an example. [09:13.760 --> 09:17.080] You really don't have to pay attention to the actual names of these components, but to [09:17.080 --> 09:18.680] give you an idea. [09:18.720 --> 09:24.640] You are free to decide what path your application should take based on a decision. [09:24.640 --> 09:29.160] For example, here we have what we call the query classifier, so let's say a user enters [09:29.160 --> 09:33.760] a keyword, there's no point in doing fancy embedding search, maybe, so you might route [09:33.760 --> 09:35.880] it to keyword search. [09:35.880 --> 09:40.840] If the user enters something that's more like a human-formed question, you might say, okay, [09:40.840 --> 09:44.720] do what we call dense retrieval or embedding retrieval. [09:44.720 --> 09:46.480] That's just an example. [09:46.840 --> 09:51.000] Finally, I'm not going to get into this today at all, but let's say you have a running application, [09:51.000 --> 09:55.520] you can just serve it through a REST API, and then you're free to query it, upload more [09:55.520 --> 09:58.640] files, and index them, and so on. [09:58.640 --> 10:04.400] All right, so let's look at how that might look. The first thing you do is install farm-haystack. [10:04.400 --> 10:08.080] If you're curious as to why there is farm at the beginning there, we can talk about [10:08.080 --> 10:09.080] this later.
[10:09.080 --> 10:12.120] It's a bit about the history of the company. [10:12.120 --> 10:16.240] Then we just simply initialize two things, the retriever. [10:16.240 --> 10:20.920] Here we specifically have the embedding retriever, and notice that I'm giving it the document [10:20.920 --> 10:26.240] store, so the retriever already knows where to look for these documents, and then we define [10:26.240 --> 10:27.600] an embedding model. [10:27.600 --> 10:32.200] I mentioned that these retrievers could be keyword retrieval, or it could be retrieval [10:32.200 --> 10:34.920] based on some embedding representation. [10:34.920 --> 10:41.200] Here we're basically saying use this model name (it's just a model) to create the [10:41.200 --> 10:43.320] vector representations. [10:43.320 --> 10:49.640] Then I'm initializing a reader, and this is a very commonly used, let's say, extractive [10:49.640 --> 10:50.800] question answering model. [10:50.800 --> 10:55.560] Again, some other model, and these are both off of Hugging Face, let's imagine. [10:55.560 --> 11:01.080] We've got this retriever, and it's connected to a document store, and we've got a reader. [11:01.080 --> 11:03.880] How would we build our pipeline? [11:03.880 --> 11:08.560] We would first initialize a pipeline, and then the first thing we add is the first node, [11:08.560 --> 11:09.720] and we're saying retriever. [11:09.760 --> 11:14.640] I'm first adding the retriever, and that input you see, inputs Query, is actually a special [11:14.640 --> 11:19.800] input in Haystack, and it's usually indicating that this is the entry point. [11:19.800 --> 11:24.480] This is the first thing that gets the query, so okay, we've told it, you've got the query. [11:24.480 --> 11:29.880] I could leave it here, and this pipeline, if I run it, what it's doing is, given a query, [11:29.880 --> 11:31.800] it's just dumping out documents for me. [11:31.800 --> 11:35.960] That's what the retriever does, it's just going to return to me the most relevant documents. [11:35.960 --> 11:41.560] I want to build a question answering pipeline, so I would maybe add a second node, and I [11:41.560 --> 11:46.560] would say now this is the question answering model node, and anything that's the output [11:46.560 --> 11:50.000] from the retriever is an input to this node. [11:50.000 --> 11:51.000] That's simply it. [11:51.000 --> 11:57.840] You could do this, but you could also just use pre-made pipelines. [11:57.840 --> 12:00.960] This is a very common one, so we do have a pre-made pipeline for it, and it's just simply [12:00.960 --> 12:05.120] called an extractive QA pipeline, and you just tell it what retriever and what reader [12:05.120 --> 12:10.480] to use, but the pipeline I built before, that's just a lot more flexible. [12:10.480 --> 12:16.440] I'm free to add any more nodes to this, I'm free to take any nodes out of this, so it's [12:16.440 --> 12:20.760] just a better way to build your own pipeline. [12:20.760 --> 12:25.960] Then simply what I do is I run what now looks like a very random question, but we'll get [12:25.960 --> 12:26.960] to it. [12:26.960 --> 12:30.680] Then hopefully you have a working system, and you've got an answer. [12:30.680 --> 12:31.680] Great. [12:31.680 --> 12:36.600] I'm going to build an actual example, so I want to set the scene, and I was very lazy.
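[Editor's note: putting the snippets just described together, a minimal sketch of that extractive QA pipeline might look roughly like this. The document store choice and model names are assumptions, not necessarily the ones on the slides.]

```python
# Sketch of the extractive QA setup just described: an embedding retriever
# connected to a document store, a reader, and a pipeline wiring them up.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, FARMReader
from haystack.pipelines import Pipeline, ExtractiveQAPipeline

document_store = InMemoryDocumentStore()  # assume documents are already written here
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

# Build the pipeline node by node; "Query" marks the entry point.
pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=reader, name="Reader", inputs=["Retriever"])

# The ready-made equivalent would be:
# pipe = ExtractiveQAPipeline(reader=reader, retriever=retriever)

result = pipe.run(
    query="Who is the father of Arya Stark?",
    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}},
)
print(result["answers"][0].answer)
```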
[12:36.600 --> 12:41.120] This is actually the exact example we have in our first tutorial on our website, but [12:41.120 --> 12:47.120] let's assume we have a document store somewhere, and it has a bunch of documents, TXT files [12:47.120 --> 12:48.960] about Game of Thrones. [12:48.960 --> 12:52.240] I'm going to make this document store a FAISS document store. [12:52.240 --> 12:56.400] This is one of the options, so let's assume I've got a FAISS document store, and of course [12:56.440 --> 12:59.880] I want to do question answering, and I want this to be efficient, so we're going to build [12:59.880 --> 13:04.080] exactly that pipeline we just saw before, a retriever followed by a reader. [13:04.080 --> 13:08.240] Specifically, I'm going to use an embedding retriever, so these are the ones that can [13:08.240 --> 13:13.400] actually look at vector representations and extract the most similar ones, and then we [13:13.400 --> 13:18.560] are going to have a reader, simply a question answering node at the end. [13:18.560 --> 13:19.880] How would that look? [13:19.880 --> 13:22.760] I first initialize my document store. [13:22.800 --> 13:26.480] This is basically, I'm not going through the indexing one just now, we'll look at that [13:26.480 --> 13:31.120] in a bit, but let's assume the files are already indexed, and they're in that FAISS document [13:31.120 --> 13:35.640] store, and then I've got a retriever, I'm telling it where to look, and look at my document [13:35.640 --> 13:40.800] store, and I'm using this very specific embedding model off of Hugging Face. [13:40.800 --> 13:47.480] I then tell the retriever to update all of the embeddings in my document store, so it's [13:47.520 --> 13:53.520] basically using that model to create vector representations of all of my TXT files, and [13:53.520 --> 13:55.440] then I'm initializing a reader. [13:55.440 --> 14:01.040] Same thing that we did before, I'm just using a specific model off of Hugging Face, this is [14:01.040 --> 14:04.480] trained by the company I work for too. [14:04.480 --> 14:06.400] Then I do the exact same thing I did before. [14:06.400 --> 14:11.240] I'm just creating the pipeline, adding the nodes, and then I run maybe who is the father [14:11.240 --> 14:16.760] of Arya Stark, and this is what I might get back as an answer. [14:16.800 --> 14:21.000] The thing to notice here, the answers are just "Eddard", "Ned", and that's because it's [14:21.000 --> 14:25.600] not generating answers, it's extracting the answer that's already in the context. [14:25.600 --> 14:31.600] If you see the first answer below, you'll notice that there's Eddard in there, and this pipeline [14:31.600 --> 14:35.920] and this model has decided this is the most relevant answer to you, I could have printed [14:35.920 --> 14:42.160] out scores, you can get scores, I just haven't here, and then I said give me the top five. [14:42.200 --> 14:47.960] The first two, three, I think are correct, so we've got something working, but what if [14:47.960 --> 14:53.280] I want to generate human-sounding answers, Eddard is pretty okay, I've got the answer, [14:53.280 --> 14:58.680] but maybe I want a system, maybe I want to create a chatbot that talks to me. [14:58.680 --> 15:02.440] Let's look at how we might do that. [15:02.440 --> 15:07.040] This is going to be a bit of a special example, because I'm not going to build a pipeline.
[15:07.040 --> 15:11.400] The reason for that is, as mentioned before, these generative models don't need context, [15:11.440 --> 15:13.880] so I should be able to just use them. [15:13.880 --> 15:19.960] We've got this node called the prompt node, and this is actually a special node, [15:19.960 --> 15:24.640] because you can adapt it based on what you want it to do. [15:24.640 --> 15:29.680] You might have heard recently this whole terminology around prompt engineering, and that's basically [15:29.680 --> 15:35.000] used with models that are able to consume some instruction and act accordingly. [15:35.000 --> 15:39.920] By default, our prompt node is basically told, you know, just answer the question, that's [15:39.960 --> 15:45.240] all it does, but you could maybe define a template for it, what we call a prompt template, so [15:45.240 --> 15:50.680] I could have maybe said, you know, answer the question as a yes or no answer, and it [15:50.680 --> 15:54.120] would give me a yes or no answer, but obviously I need to ask it a yes or no question for [15:54.120 --> 15:55.120] it to make sense. [15:55.120 --> 16:00.360] Anyway, so I'm just using it like this, in its pure form, and I'm using a model from [16:00.360 --> 16:05.600] OpenAI, obviously I need to provide an API key, and I'm using this particular one, [16:05.600 --> 16:07.720] text-davinci-003. [16:07.720 --> 16:11.880] I actually ran these yesterday, so these are the replies I got, and this particular one [16:11.880 --> 16:16.440] I ran a few times, so the first time I ran, when is Milos flying to Frankfurt? [16:16.440 --> 16:20.360] By the way, spoiler alert, Milos is our CEO. [16:20.360 --> 16:25.160] So I know who Milos is, and I know when he's flying to Frankfurt, or when he flew to Frankfurt. [16:25.160 --> 16:31.080] And I get an answer, Milos's flight to Frankfurt is scheduled for August 7th, 2020. [16:31.080 --> 16:37.680] This is really convincing sounding, fine, okay, but this one was actually quite impressive, [16:37.680 --> 16:44.280] again, if I ran the same exact query with this model, I got, it's not possible to answer [16:44.280 --> 16:45.880] this question without more information. [16:45.880 --> 16:51.880] This is actually really cool, because clearly this model sometimes can infer that, hey, maybe [16:51.880 --> 16:57.800] I need more information to give you an answer, as opposed to what we now refer to as hallucination. [16:57.800 --> 17:01.360] Maybe you've heard of that term, these models can hallucinate, because they're tasked to [17:01.360 --> 17:02.360] generate answers. [17:02.920 --> 17:08.720] They're not tasked to generate, you know, actual answers for you that are truthful. [17:08.720 --> 17:11.840] Anyway, let's say, when is Milos travelling somewhere? [17:11.840 --> 17:20.640] I love this answer, when he has the time and money available to do so. [17:20.640 --> 17:25.560] And then, I guess, I don't know which one is my favourite, this one, or the next one, [17:25.560 --> 17:28.560] who is Milos? [17:28.560 --> 17:31.560] A Greek island. [17:31.560 --> 17:38.280] Lovely, okay, but the problem here is, this is very, you know, I could believe this, it's [17:38.280 --> 17:41.840] very like, realistic, these answers.
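[Editor's note: for reference, here is roughly what using that prompt node on its own might look like in code. This is a minimal sketch; the API key placeholder is an assumption and the model name simply matches the one mentioned above.]

```python
# Minimal sketch of a standalone PromptNode: no pipeline, no documents,
# just a query sent to an OpenAI model. Replace the key with your own.
from haystack.nodes import PromptNode

prompt_node = PromptNode(
    model_name_or_path="text-davinci-003",
    api_key="YOUR_OPENAI_API_KEY",  # placeholder
)

# With no retrieved context, the model answers purely from its training data,
# which is where the convincing-but-made-up answers come from.
print(prompt_node("When is Milos flying to Frankfurt?"))
print(prompt_node("Who is Milos?"))
```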
[17:41.840 --> 17:47.840] So we're going to look at how we can use these large language models for our use cases, and [17:47.840 --> 17:51.120] what we're going to do is basically, we're going to do exactly what we did for the extractive [17:51.160 --> 17:56.480] QA one, and we're going to use a component that is quite clever, because it's been prompted [17:56.480 --> 18:03.520] to say, generate answers based off of these retrieved documents and nothing else. [18:03.520 --> 18:08.920] It can sometimes not work well, but there are ways to make it work well, and we won't [18:08.920 --> 18:14.040] get into all the creativity behind it, so I'll show you the most basic solution you [18:14.040 --> 18:15.040] might get. [18:15.040 --> 18:19.840] But this is going to be what we do, it's the same exact pipeline as before, the reader [18:19.880 --> 18:22.520] has been replaced by the generator. [18:22.520 --> 18:26.640] So I actually have Milos's ticket to Frankfurt. [18:26.640 --> 18:33.040] It was 14th of November, and as a bonus, I thought I'd try, this is my ticket, my Euro [18:33.040 --> 18:38.040] star ticket, from Amsterdam to London and back. [18:38.040 --> 18:41.760] So I've got these, and they are PDFs. [18:41.760 --> 18:46.760] And so now I'm going to start defining my new components. [18:46.760 --> 18:51.840] So I've got the same FAISS document store, embedding dimensions is not something you [18:51.840 --> 18:56.200] should worry about for now, and I'm defining an embedding retriever here. [18:56.200 --> 19:02.040] What I'm doing is, again, I'm using a model by OpenAI, so I'm using an API key. [19:02.040 --> 19:07.240] So this is the model I'm going to use to create vector representations and then compare them [19:07.240 --> 19:09.120] to queries. [19:09.120 --> 19:13.680] And this time, I'm not using the prompt node, I'm using that clever node there, called the [19:13.680 --> 19:15.600] OpenAI answer generator. [19:15.600 --> 19:21.200] And you might notice it is the exact same model as the one before. [19:21.200 --> 19:26.760] We're going to briefly look at indexing, so we've got the PDF text converter and pre-processor. [19:26.760 --> 19:28.840] And let's go to the next slide. [19:28.840 --> 19:33.560] As mentioned before, there are pre-made pipelines, so I could have just defined a generative QA [19:33.560 --> 19:37.360] pipeline and told it what generator and retriever to use, but let's look at what it might look [19:37.360 --> 19:40.560] like if I were to build it from scratch. [19:40.560 --> 19:43.520] And first, you see the indexing pipeline. [19:43.520 --> 19:48.520] So if you follow it, you'll notice that it's getting the PDF file and then writing that [19:48.520 --> 19:51.280] to a document store, given some pre-processing steps. [19:51.280 --> 19:54.960] And I then write my and Milos's tickets in there. [19:54.960 --> 19:59.920] And the querying pipeline is the exact same as the extractive QA pipeline you saw before. [19:59.920 --> 20:05.040] The only difference is, the last bit is the answer generator, not the reader. [20:05.040 --> 20:10.240] This time, though, it does have some context and it does have some documents. [20:10.240 --> 20:14.520] What did I get when I ran the same two questions? [20:14.520 --> 20:16.920] I got, who is Milos? [20:16.920 --> 20:18.600] He's not a Greek island. [20:18.600 --> 20:23.440] He is the passenger whose travel data is on the passenger itinerary receipt.
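[Editor's note: to make that walkthrough concrete, here is a rough sketch of what the indexing and querying pipelines described above might look like in code. The file names, model names, and embedding dimension are assumptions on my part and may need adjusting for your Haystack version.]

```python
# Rough sketch of the two pipelines just described: indexing PDFs into a
# FAISS document store, then querying with an OpenAI answer generator that
# answers only from the retrieved documents. File names are placeholders.
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import (PDFToTextConverter, PreProcessor,
                            EmbeddingRetriever, OpenAIAnswerGenerator)
from haystack.pipelines import Pipeline

document_store = FAISSDocumentStore(embedding_dim=1536)  # assumed dim for ada-002 embeddings
retriever = EmbeddingRetriever(document_store=document_store,
                               embedding_model="text-embedding-ada-002",
                               api_key="YOUR_OPENAI_API_KEY")
generator = OpenAIAnswerGenerator(api_key="YOUR_OPENAI_API_KEY",
                                  model="text-davinci-003")

# Indexing pipeline: PDF -> text -> preprocessed chunks -> document store.
indexing = Pipeline()
indexing.add_node(component=PDFToTextConverter(), name="Converter", inputs=["File"])
indexing.add_node(component=PreProcessor(split_by="word", split_length=200),
                  name="PreProcessor", inputs=["Converter"])
indexing.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])
indexing.run(file_paths=["milos_ticket.pdf", "tuana_ticket.pdf"])  # placeholder files
document_store.update_embeddings(retriever)

# Querying pipeline: same shape as extractive QA, with a generator instead of a reader.
querying = Pipeline()
querying.add_node(component=retriever, name="Retriever", inputs=["Query"])
querying.add_node(component=generator, name="Generator", inputs=["Retriever"])
print(querying.run(query="Who is travelling to London?"))
```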
[20:23.440 --> 20:27.680] Now, this is the only information this model knows, so it can't tell me he's my CEO because [20:27.680 --> 20:31.360] I haven't uploaded any information about my company. [20:31.360 --> 20:35.760] So don't make something up, just tell me what you know. [20:35.840 --> 20:39.240] If I run, when is Milos flying to Frankfurt? [20:39.240 --> 20:44.040] I get Milos is flying to Frankfurt on the correct date and time. [20:44.040 --> 20:48.480] And then I had that bonus in there, who is traveling to London. [20:48.480 --> 20:51.840] I would get Tuana Celik is traveling to London. [20:51.840 --> 21:01.520] Now, if I were to run, let's say, when is Alfred traveling to Frankfurt? [21:01.520 --> 21:05.320] What I haven't shown you here, because I think it goes a bit too deep into building these [21:05.360 --> 21:14.040] types of pipelines, for the OpenAI answer generator, I could actually provide examples [21:14.040 --> 21:16.240] and example documents. [21:16.240 --> 21:20.920] Just in case I'm worried that it's going to make up some time that [21:20.920 --> 21:25.880] this Alfred, who doesn't exist, is traveling to Frankfurt, I can give it some examples saying, [21:25.880 --> 21:31.160] hey, if you encounter something like this, just say I don't have the context for it. [21:31.200 --> 21:36.320] So I could have just run query_pipeline.run, when is Alfred traveling to Frankfurt, and [21:36.320 --> 21:41.080] it would have told me I have no context for this, so I'm not going to give you the answer. [21:41.080 --> 21:44.120] This model that we saw does do that sometimes. [21:44.120 --> 21:49.400] The first example we saw, it did say I don't have enough context for this, but not all [21:49.400 --> 21:50.400] the time. [21:50.400 --> 21:54.360] So this is how you might use large language models [21:54.360 --> 21:59.640] for your own use cases, and how you might mitigate their hallucinating. [21:59.680 --> 22:04.640] So to conclude, extractive question answering models and pipelines are great at retrieving [22:04.640 --> 22:09.440] knowledge that already exists in context, however, generative models are really cool [22:09.440 --> 22:14.920] because they can generate human-like answers, and combining them with a retrieval augmentation [22:14.920 --> 22:20.480] step means that you can use them very specifically for your own use cases. [22:20.480 --> 22:26.280] Haystack, as I mentioned, is fully open source, it's built in Python, and contributions [22:27.000 --> 22:32.440] are literally welcome, and I would say every release we have a community contribution in there. [22:32.440 --> 22:37.840] Thank you very much, and this QR code is our first tutorial, bear in mind it is an extractive [22:37.840 --> 22:42.040] one, it's the non-cool one, but it is a good way to start. [22:42.040 --> 22:43.040] Thank you very much. [22:43.040 --> 22:50.040] Thank you, Tuana. [23:06.240 --> 23:10.640] We have a few minutes for questions, if you have questions for Tuana, we have three minutes [23:10.640 --> 23:13.800] for questions, and you can also find her afterwards.