Building a Domain Specific, GenAI Chatbot with Serverless

Eric Johnson &
41:15 min
May 23, 2024

Navigating the complexities of Generative AI (GenAI) is made simpler with serverless technologies and API-driven approaches to Large Language Models (LLMs). This session will showcase how leveraging Amazon Kendra with Retriever-Augmented Generation (RAG) and AWS Step Functions can streamline GenAI application development. Additionally, we’ll demonstrate how Amazon Bedrock simplifies interacting with LLMs. By the end, you’ll understand how these tools offer a straightforward path to creating effective and targeted GenAI applications.

Eric Johnson

Principal Developer Advocate at AWS

Shaping the Future of Tech with a Smile: AWS Principal Developer Advocate | Passionate Keynote and Public Speaker, Inspiring Minds from Universities to Global Conferences

Transcript

Eric Johnson 0:10
All right. Thank you very much. Thrilled to be here. I’m glad to actually hear you say the more service. I wasn’t sure how to say that. So super excited about that, and I hope that you all have had a great conference so far, and then we’ll wrap up with a lot of fun again. My name is Eric Johnson. I am a principal Developer Advocate. I’m not the I’m one of quite a few principal developer advocates at AWS, but I concentrate mainly on serverless. In fact, I’ll bring this up real quick here. Can I tell you who I am. I’ve been a solutions architect for 15 plus years. I’ve been a software developer for more than that. I’m pretty old. I’m a father of five. That’s not a typo. And real quick, I’m going to stop there and kind of give you some some I say rules, but really, I have no authority here, right? But some guidelines to when I’m speaking that will help you out. The first is, that’s any number I want it to be. I know it looks like a one, but I could throw that up and say any number, and you got to go with what I’m saying. Sometimes I can, sometimes it’s one. I can get to four if I take my shoes off, but that gets awkward for everybody. Second rule is, these are quotes, not apostrophes, and I know that that looks better than this, right? And finally, these are thumbs because really, this will get you beat up. So those are some guidelines to help you. When I’m talking, I use my hands a lot, so you’ll follow through. Today, we are going to be talking about serverless, like I said, I live, eat and breathe serverless, and we’re going to be talking about Gen AI and why Serverless is just, just a really good marriage with Gen AI, and we’re going to do this as an example. We’re going to talk through an application. In fact, I’ll go back to screen so you see this. We’re gonna be talking building a domain specific Gen AI chat bot with serverless. Now, what does that mean here when we talk, when we talk domain specific and we build chat bots. The domain specific means we want to have some private information that maybe not everybody has access to, or the LLM has access to. And we’ll get more into that. So don’t worry about it. So this is our agenda for over the next 30 minutes, we’re going to be looking at terminology, so I’ll go into that in a minute. We’re going to be looking at some architectural options. We’re going to be live coding. Live coding always makes me go but I do it all the time, and sometimes it works, sometimes it doesn’t, and then we’ll do a wrap. All right. So let’s get in to some terminology. Now, it’s been around for a year or more, and you probably know what it is, but I kind of keep us all on the same table when we talk about generative AI. These are some ways that we at AWS looked at this. It’s an AI that could produce original content close enough to human generated content for real world tasks, right? It’s powered by foundational models. Tasks can be customized. It’s applicable to so many use cases. It’s crazy to hear all the all the applications that we’re doing with Gen, Gen AI. And finally, it reduces time and cost to develop. I mean, how many you know to do ml models and innovate faster, but it’s also reducing time and just our our regular life. So let’s kind of just distinguish here real quick what a foundational model or LLM is. When we look at a conventional machine learning flow. We take data, we pre process it, and we do task specific model training. Hey, identify these images, you know, then we do hyper parameter training, we do model validation testing, and then we run the task right, and we have task specific deployment, and then we iterate on that right. So we’re training something to do one specific thing. And a great example of that is like recognition one of our one of our products here at AWS, where it actually does some phenomenal things with image recognition, but an LLM is different. So when you think about the flow with foundational models, you take a bunch of data, you still pre process it right? That all still looks the same, and then we do foundational model training and tuning, and then with this foundational model, you have a lot of task specific things that are going on. You know, you can have task specific deployment, a really clever named, I made that up, that name that just does a thing and it just works off what the LLM already knows. But then you have B that says, hey, I want to do some fine tuning on that. Maybe I’ll do some embedding, or I’ll do some retrieval, augmented generation, also known as rag or I’ll do some other training. And then, you know, in any co working inside same foundation model, and then c is also like a C. So you get this idea that one model kind of fits a lot of things because it’s trained on so many different things. Now, when we think about these two, I’m really just put this slide up because I’m really proud that I learned the word stochastic. Now, what I’m probably going to learn is I don’t say it right, but you get this idea. One is deterministic. Means you put the same thing in, you’re generally going to get the same thing out. But with these llms are foundational models. The stochastic way is or it’s non deterministic means you. Put the same thing in, you’re going to get something else out, right? So a lot of what we want to get out of an LLM really depends on what we put into it, as far as as far as the prompt. So let’s get into that. When we think about building prompts, there’s this thing called retrieval, augmented generation that I mentioned a moment ago a rag. And the idea here is, hey, we’re gonna take some data, and we’ve already, we have a bunch of data saved somewhere. In this example, later, I’ll talk about using Kendra, where we have indexes and we have this data, and it’s, it’s, it’s being pre processed, and then we’re going to take that and add it to the prompt itself, right? And so that prompt will then, then we’ll take that prompt and we take some history, things like that, and then we’ll push it into the LLM. So we’re telling the LLM, hey, given this context, I want you to respond. The same thing is, with history is we want to say, given that context that we just talked about and all the other questions and answer you gave me, I want you to answer this question. So when we build chat bots, it’s more than just saying, Hey, answer this question. Well, we can do that, of course, just based on the LLM, but when we want to build domain specific chat bots, we need to give it context. We need to set some bounds and say, Hey, this is the data I want you to process and kind of reason against. All right, so let me talk about some of the of the architectural choices that I had. And so architectural choices, I had a few requirements in my book. So when I was thinking about, how will I approach this, I knew that I wanted it to be serverless, because that’s what I do. You know, I live, eat and breathe serverless, right? I knew that I wanted a streaming response. Because how many of us have used like someone made a chat bot? We send a question and then we wait and we wait, because right now, llms aren’t terribly fast, so we need to get answers back as it’s coming. If you wait till all the way to the end, we think it’s broken. I needed to be flexible, meaning I needed to be able to add stuff to it, change things. I needed to be easily secured, right? So that I can, you know, I can, I can have some type of authorization for it. And I needed to be scalable, so there’s some things that I knew I was going to use, or these are some Givens inside this is, you know, in the AWS world, there’s bedrock. And if you’re not familiar with bedrock, bedrock, basically, it hosts, not, basically, actually, literally, it hosts all these models, right? Llama three, the Claude series, Mistral, our own Titan, and others that you can do a lot with, right? And what it does, it takes it so that you don’t have to host your own model. You just use an endpoint, and you call bedrock, right? So you could use bedrock from really anything that can call an endpoint. And I’m going to use Amazon Kendra. And this was one of many different services that I could use. It’s just the one I chose because it did some auto indexing on the website that I’m using for my rag, for that, for that retrieval, augmented generation. And so Kendra, basically what it does is it will index anything, a bucket, a website, document. It does all kinds of that. And then it’ll it’ll put it in and it’ll save that. And when you ask it a question, you say, Hey, here’s the question I want to ask. It will actually return all the paragraphs, uh, on that. It’ll say, Hey, here’s some paragraphs that have to do with that. Use that as your context. So that’s why I chose so these are my two Givens. So we’ll start with there. Now I could do it this way. I could say, Hey, I’m just going to have an AWS lambda. I’m going to and talk to bedrock, right? And what that would do the AWS lambda would have to call Kendra and get the documents. It would have to then call probably something else. In my choice, I’m using DynamoDB. It’s just a very, very fast NoSQL database. What to get history. Then it would create the prompt and then and it would talk to bedrock, and then get the answer and respond back. And I can stream from this. In fact, this checks some of my boxes, right? One, it’s serverless. Two, it’s streaming response, and three, it’s scalable, but it’s not quite as flexible. Yes, it’s code. So So in truth, it is flexible. But let me get real with you here for a moment. My title is Developer Advocate. It really shouldn’t say developer in it, right? I’m not the strongest developer. I tend to lean towards tools that can do this for me. We’ll climb into that a little bit. So, so yes, it’s flexible by code, but I wouldn’t be able to change it as easy, because I’m not the best coder in the world. But it is scalable. Obviously, lambda is incredibly scalable, and it’ll go up and down as needed. And as far as easily secured I can use, there’s two types of authentication I can use when I’m doing direct lambda and I’m using what we call lambda function URL. One is IAM, which is little I didn’t want to do that from a client that’s used the system to system. And the second is to roll. Own and do OAuth again. We’ve already discussed the fact that probably I’m not the best person to write the next new authentication system. I’m going to trust what’s already out there.

Eric Johnson 10:11
All right, so my second option is, okay, I can put Amazon API gateway in front of AWS lambda and have it called bedrock. Now this, get this actually checks a few more boxes. Amazon API gateway allows me to do it. Allows me to secure very easy, with the Cognito authorizer, or using OAuth, or something like that, or auth zero, I can actually secure the website, and then I can use lambda to do all the work, like I said, to go to talk to Kendra, to talk to DynamoDB, to talk to bedrock, but there’s still a lot of code in here. It’s not quite what I’m looking for yet. Now, some people ask me, Are you anti lambda? I’m absolutely not. I think lambda is an amazing tool, and we will use it in this but again, I want to use something where that’s a little more configuration over code, right? So with that in mind, we could look at using step functions. Now, raise your hand if you’ve ever used step functions. Of course, I can’t see you, but I’m assuming some of you have or you’ve heard of it, hopefully. And step functions is it’s an orchestration tool that we that we have at AWS, that allows you to do a lot of logic just by configuration. And you’ll actually see that in just a minute. I’m going to walk you through that All right, so with this, I’m almost there, right? So I’ve got it’s serverless, yes, it’s service. All these solutions are it’s flexible, it’s easily secured, and it’s scalable. And what I mean by flexible is I can actually add and take away things just by configuration. And again, I’ll show you that in a minute. However, I can’t stream a response from step functions. Step functions wants to finish synchronously the call with the Amazon bedrock and then return it. And again, this has to be streaming, so that’s probably not going to work, all right, so let’s look at another option here, API or Amazon API gateway, plus AWS step functions and AWS lambda. To me, this is kind of the golden ticket, right? So let me explain why. So I’m going to use Amazon API gateway. I’m going to use web sockets on Amazon API gateway, and you make a whole hold on Eric, web sockets is not the same as streaming. That’s true, but it comes very close. It gives the user the appearance of streaming. So hang in there with me. Okay, we’re going to have Amazon Amazon API gateway to talk directly to AWS step functions, and it’s going to take the request coming from the customer, and it’s going to pass it to step functions, and then step functions is going to process this, once it collects all the information, then we’re going to talk to AWS lambda, and the lambda will then build, or the lambda function will then, you know, build the prompt, call Amazon bedrock and then return through Amazon API gateway, through the web socket endpoint. So the nice thing about this is I can actually have the lambda function call Amazon bedrock through a streaming portal, so it’ll actually send back as it’s coming in. And then each time I get a chunk, I send that chunk to do web socket connection back to the user. So this meets all this checks, all my boxes, right? Serverless, string response, flexible, easily, easily secured and scalable. All right? So with that, we’re going to actually do a little code. And I say kind of here, because I’m not going to code in front of you, because here’s the truth with just one finger on each hand, I’m the worst fat finger on the planet. I will hit the wrong buttons, and whenever I code live, I will never get hired if it’s just based on my live coding. So but let me show you kind of what this looks like in the actual when we’re actually building it. So first of all, let’s take a look and say what the Incoming, incoming payload is. So we got a couple things. First of all, our data envelope, and in that is going to be a message, and it’s going to be what was the question asked? In this instance, we’re going to say, what is EDA, and if you don’t know what EDA is, we’ll get to that. The next thing is, we have a timestamp. Here’s what happened because, because when we’re saving this, we want to know the order of questions and answers, so that that history works when we’re actually sending it back. Excuse me one moment. And finally, a connection ID. And the connection ID allows me to track who we’re talking to. So so if you make the request, I want to be able to send the answer back to you, and you know, not another person using it. You want to keep this private, right? So music connection ID, and that that connection stays, and that’s the WebSocket connection. All right, so let’s jump in to some codes. Let me show you what this can look like here. Okay, hopefully that’s big enough, and the gentleman will tell me if it’s not, but you get getting thumbs up, so to speak. We don’t use thumbs here, but you get it. Okay? Yes, so, so this is the code. I’ll let you peruse this. See what you think. Now, this isn’t going to work. I can’t just show you code like this. This doesn’t make sense. Just so you know, what you’re looking at is AWS Sam, right? I’m using the serverless application model to build the entire application, but I didn’t do it by this. Now, unfortunately, I’ve been using SAM long enough that I can do it by memory, but I use the app composer instead. So I’m going to pull up the app composer so you can see it. So what you’re looking at is our Application Composer, and let’s just get rid of this, and I kind of show you what it is. So in my application, I have several things that we need. First of all, I have a step function or state machine, like we talked about. The next thing is, I have a lambda function that I’m going to use to create the prompt. The third thing is, I have a context table. This is where we’re actually in store the history, okay? And then we have an API over here. I’ll move that over real quick. Here. Should I just slide it over? There you go. There’s my ask API, and the Ask API has a bunch of different stages to it. App compose is fairly new, so we don’t support every resource. So what we do is the ones that sometimes we don’t, we kind of put them together so you have access to them all right? So let’s go into Remember, I talked about building this in a way that’s kind of low code, right? And to do that, I’m going to do step functions. So let me actually go in here. And if you’ve worked with step functions at all, hopefully you’ve seen this. You may not have known you can do this. You can open step functions right in the IDE. I am not on the I’m not on the AWS console. I’m actually in VS code at the moment. And I’m going to open workflow studio. I’m going to make it so it’s readable. Let’s get some of this stuff out of the way. Bear with me here. We’ll zoom in

Eric Johnson 16:49
all right, hopefully that’s readable, and then I want to kind of explain each part. Okay? So what happens is, we’re actually when an API request is made, and I can show you that in a little bit how that works. But when an API request is made, we do a little transformation inside of API gate gateway using what we call VTL, or stands for velocity templating language. Now I’m not gonna lie, velocity templating language is not necessarily for the faint of heart. It can be, it’s a tampering language, right? But the advantage of it is, I’m making transformations directly in API gateway. I’m not having to roll out to some type of compute like a lambda function or container or something like that, to just to transform my data. I just need to, I need to change the way it looks in order for step functions to work with it. So API gateways passes this payload that I showed you earlier into step functions. So step functions does a couple of things. So one of the things is, is we want to do as much at the same time as possible. Now, remember, I’m sorry, excuse me. Remember I talked about we need to respond to the customer very fast, right? So they want to see something. So we got to get our work done fast. So one of the things we do is, the first thing we’re going to have is what’s called a parallel state. And the parallel state says, hey, I want you to run the next few things at the same exact time. They’re not dependent on each other. The actually independent, so I can do them at the same time. Now I’m using step functions. Something to know about step functions is there’s actually two types, there’s a standard, and there’s an Express, and express runs in memory, and it’s very, very fast, right? There’s no cold start with an Express. Something to keep in mind when you’re building orchestration like this, right? So the first thing we’re going to do is we’re going to go ahead and grab DynamoDB, we’re going to grab the history, and we’re going to get that all together. Okay? The next thing we’re going to do is we’re going to query Kendra using a direct access Now, step functions is designed to interact with AWS services using the SDK that you already know and love, if you’re using it from code, so right there, from inside the SDK or from inside step functions. When we set this up, we can actually configure the SDK. So let me show you what that looks like. So here’s the integration type, aw, SDK. All I need to tell it is index ID, kinder ID, and that’s a placeholder, yep. And the query text, hey, here’s what I want you to ask it. And that grabs the data. It’s just, it’s just JSON path from data, dot message, if you remember that payload and what it looked like. And then how many I want to return, I’m just going to return 60 at the moment. Okay? And so then it knows that index ID, it knows where to search it, and Kendra will respond with all this data, right? DynamoDB, on the other hand, will actually call the table here. And if you don’t know the syntax of DynamoDB, don’t, don’t panic. There’s there will not be a test at the end, but we just say, hey, here’s the table name. That’s a placeholder. Again. And it’ll be, it’s on deployment that’s replaced. We pass the expression. We say, I want you to grab any history that’s already happening based on our connection ID, all right. So then what I can do, and I only did on the on the Kendra side here, just to kind of show it, is, if something fails, right? So we say, okay, if it works, that’s great. Then move on to the end of this part. If it fails, then I want you to send an error payload back to the user so they actually get an error payload that says the payload was too large for Amazon. Could just so that’s a standard error that we might have had. Something’s too large, so we can send it back, and we actually exit the state machine. And so the so we’re not having the user sitting there waiting, we’ve responded to him say, hey, there was a problem. Let’s move on. But if everything goes well, it takes all this data, and now I’m at the end of the parallel state. The parallel is this big box here. It takes all this data and it gives it to the lambda function. And the lambda function actually says, All right, I got all the history, and I got all the context. I’m going to put together a prompt. And the back end model we’re using is Claude sonnet, which is version three. It’s fast. It’s kind of the intermediary between Opus and haiku. And it’s, and it’s, it’s fast, it’s agile. It works really well. So what it does is it actually says, Okay, I’m going to call Claude. And I won’t show you all the code for this, because you can make fun of my code, right? So, but it calls Claude in the using the stream response. It said, Hey, here’s my question, here’s my context. Here’s my history. Go ahead and give me an answer and send it back to me. Streaming. So as you start creating those next words, and if you know anything about LLM, it’s all about the next word, as you start creating those, send those to me, and I will then send them back to the API gateway endpoint. And that’s all done inside the lambda function. So lambda functions churning during this whole thing. At the end of that, if everything went well, we’ll actually save the question and the response to DynamoDB, so that’s added to our history for the next thing, right? And then if there’s an error, we’ll send an error to them via the API gateway message. So a lot of this code, a lot of code for calling and and we would have to do it, you know, a linear the lambda function is handled right here in step functions. And when we say the domain specific part, this is the important part, this is where we’re saying, Get Stuff that may not be available to the world or won’t be and we can, if you think about, you think about the ideas of this. Hey, we’ve got an internal chat bot we want to give customers on health care things like that that you have to be signed in, and the data in there is only known to us. This is where this would work. The LLM may not know everything you know about cancer research. We have folks using just for that, actually, or, or I don’t know, scheme, making techniques, you know, things like that. Alright, so that’s the step function that does all the work behind this. Okay, so now let’s go ahead and exit out of this, and I want to show you real quick. I’ll return to the Application Composer. I’m going to go to the details of workflow, just so you can see it. I showed you those tokens, and we said, Okay, we have one called context table. I showed you that one and kinder ID, and this is where they actually reference that they reference other parts of the document, and they actually put all that in as needed. So it makes it very dynamic. I can, I can share this out, and you can deploy this. In fact, at the end of this, I will give you a QR code. You’re welcome to take a look at it and deploy yourself. Alright? So that’s, that’s kind of, that’s, that’s how we build a domain specific chat bot using serverless. So let’s kind of see it in action. So first thing we’re going to do is we’re going to go and and my I actually built this. I have several different ones. Ask Claude is has no it literally has no context. It’s just talking directly to the to the model. So let’s go and ask it a question. Our question we had, what is EDA? Okay, so I’m going to ask this question, and the response it gives back to us momentarily here. This will be the time it fails when everybody’s watching here. Oh, you know what? I know what happened. Let me reset my connection real quick. Here. What now? Here’s the thing. Eric does not write clients like some of the, you know, the the popular people. But let’s go this, see if this works. Now. All right, there we go. EDA stands for exploratory data analysis. It is crucial. Blah, blah, blah, you could see that. And the way I’m returning this, if you’re wondering, how did you get it formatted in there, I return it as markdown, and then I just interpret the markdown at the at the client. So it’s it was a really easy way. I was trying to do a bunch of other ways where. There you go. I figured this out, but markdown was easiest, all right, so, but the basic idea here is, EDA stands for exploratory data analysis. Well, okay, so let’s go back here and I’m gonna show you another one. And this one here, maybe I am, let me just pull it up here. Oh yeah, this is it, okay? And I’m actually, and I’m going to show you this website, serverless land.com, if you’re, if you’re not familiar with serverless land, we have, this is the homepage. We have all this information on serverless and patterns, code examples, all that kind of stuff. This is not a shameless plug, or it is, but it also showing you. You know where I get this information. So back to our chat client, so I can actually notice this says, Ask serverless land. Let’s go and refresh that, because my connection may have dropped since it was sitting there for so long. And I’m going to ask the same question, what is EDA? Okay and send it.

Eric Johnson 26:01
See what it says here. Event driven architecture. Event driven architecture is an architectural pattern, so we’ve changed claude’s answer because we’ve given it domain specific information, and it’s coming from this. In fact, we talk about EDA quite a bit here. This is how you build very large, distributed applications that can handle, you know, things like Netflix and and, you know, Capital One, a very large application. So that’s, that’s what a venture driven architecture is. And using that, you can really, you could point this another one. I have a little pet project. It’s not my project. There’s a, there’s a there’s a project called wing. I did the same thing, and you can this actually does their site, and you can ask about wing and how and wing is used to build monolithic apps that actually compile to serverless microservices. But I’m not here so much to talk about that, right? So let’s see this a little more in action now that we know this is domain specific, so I talked about Sam just a moment ago. So I’m going to say, give me an example of a SAM template that connects API gateway to a lambda function. All right, so let’s do that. So we ask it, and it actually will stream that. Now, this takes a little longer because it’s pulling the system, but here comes your Here comes your answer, right? So there’s the SAM template. Now, one thing I want to show once it’s done here, plot can be very verbose and explanatory, but one thing I want to show is I can say for context. I can say, now do it in CDK, which is another infrastructure as code tool that you can use so it knows contextually that we talked about Amazon API gateway to a lambda function, but we did it in SAM so now I’m going to move that over to CDK, and there you go. So this is kind of an example. And this is, again, this is very simple. You can you can get very, very big with this and build a lot of different things, but this is an example of what a domain specific one looks like. So I told you, I promise you, I’d give you the code to this. If you want to check this out, I encourage you to go ahead and download this, and you can deploy it with your own Sam to see it in action. And then, with that, I know we’re going to do a Q and A here in just a moment, but I do want to tell you again, thank you very much. I appreciate I love doing you know events like this. I couldn’t even think of the right word, and so I appreciate that you take the time to watch, and I hope that you’ve learned something in this. And again, my name is edj geek. Have a great day.

Sean C Davis 28:53
Thanks so much, Eric. It was great. And folks in the audience, we’ve got a few minutes for questions. So if you have any questions, drop them in the Q and A section on the right side of your screen. Eric, yeah, I would love to dig more into the details I have. Yeah, so so many questions now, first one is I, it took me a minute to to realize that you were, you were actually in VS code for most of that that time. So, yeah, is that? Is that an AWS specific extension that you’re using? It

Eric Johnson 29:26
is, yeah, so sorry, I know I just did a loop there. I apologize. Real, real presenter fail. It is in that there’s a what called, it’s called the AWS toolkit. And we bring a lot of things in the Amazon queue, but specifically the Application Composer locally. You can use Application Composer in the in the console as well, but it’s, it’s really killer to use it locally in the IDE,

Sean C Davis 29:53
yeah, it’s amazing. So can you, you, presumably, then you can go back from that composer to writing. Some code for a particular function and back and forth.

Eric Johnson 30:02
That’s right, yeah. So as you generate stuff in the composer, it’s generating frameworks. I can generate code. It’ll have a boiler plug. So let’s say, if you insert a lambda function, it’ll have a boiler plate, but then it’ll actually create the files you need for that. And the really cool thing that’s happening, and a lot of people don’t see that, is, is that it, it’s creating all the permissions take for, for example, a lambda function. A lambda function requires two types of roles. One is what can invoke it, right? So it’s a lambda permissions. And the other is, what can it do? And we really want to do least privilege. And so what we do when you’re producing things or creating these things, it’s writing. We’re writing all those permissions. And when you when you connect, you say, Okay, I’m going to have this lambda connect to this three bucket. It’s going to go ahead and give it access to read and write from that bucket, but not until you use the little connector and connected. And so yes, to answer your question, all that’s getting generated in the sand template. And then you can go into the lambda function itself and write the code,

Sean C Davis 31:03
right? Okay, that’s kind of what I was wondering. So you’re, you’re using that composer largely for configuration and orchestrating all the pieces, but to actually query the LLM and everything, you’re gonna have to write the code in the lambda function, presumably, yeah,

Eric Johnson 31:19
yeah, yeah. Or you can query directly from the step function. That’s really interesting in that I can just drag it in, configure it, and it uses the SDK, and it makes the call and the cool and some Well, why would you do that? Why would I do that? Eric, I’m a better developer than you, so I’m going to go ahead and do that. But the nice thing about step functions is built right into it, is, if bedrock fails for some reason, it will handle that failure for you. It will retry, it’ll do a back off retry, and then it’ll dump that data into a DLQ if needed. And that’s out of the box. So it saves me from writing a bunch of unnecessary code. I want to do business logic, not error handling, so that’s that’s one of the reasons I tend to go to step functions first, and I only use lambda functions when it’s time to write business logic.

Sean C Davis 32:04
Gotcha? Okay, that’s that’s very cool. All right, so then, when you’ve got your your front end application, which I’m presuming you, you’d use anything to write the actual application, because what you’ve got going on in the

Eric Johnson 32:17
what you got going on, because I’m a great front end designer,

Sean C Davis 32:21
it was beautiful. It was amazing. It was so you showed that connection ID in payload of the request. Is that? What was it? Was it the request like? Is it? Is that specific to that particular instance? I’m wondering where the ID itself came from. Yeah,

Eric Johnson 32:38
that’s specific to WebSockets. When a WebSocket client makes a connection, there’s a connection ID, and so that connection ID tells the client and the server, you’re who you are, I’m who I am, and this is the path we’ll talk on. So I can send back to when I when I want to send something, I send that connection ID and say, Send. Send Sean, because he’s connection 1234, send it to him, and that’ll come to you. If I do, 1235, it’ll go to somebody else.

Sean C Davis 33:07
Gotcha, okay? And then when you, after you were done with, say, streaming the the response. And then you, you could take the the that response and the prompt itself, and you were saving them into Dynamo. Are you then sourcing from that or from either the response or the prompt for future answers as well? Are you going just back to that domain specific source at that time,

Eric Johnson 33:33
I am always pulling in all questions asked because I wanted to know, because I wanted to have the user to have to do it again. So I wanted to know, hey, here’s, here’s the last 10 questions I asked in order, and as a sub point, is the answer you gave me in order so that it can, because you know, when you talk to this, you talk to Claude directly, or chatgpt directly, or Q directly, they know what you’re talking about. Oh, yeah, I remember that we talked about, and that’s why you’re passing all that context back into them. Yeah. Okay, so I do a read from DynamoDB at the beginning of every question and a right to DynamoDB at the end of every every question. So

Sean C Davis 34:12
what would you have to do if what I’ve noticed with with chat or I there’s there’s, I tend to use chat GPT through raycast. It’s like, gooey. I can hit, you know, Command N, and I’ve got a new, basically, a new stream, and I don’t have the context of the previous chat. Yeah. What would you have to do in your example to say I’m like, start, start over the context, or, yeah, whatever, however you refer to it,

Eric Johnson 34:40
yeah, in my cheesy little client that’s beautiful and highly sought after mine, it’s just you refresh, because what happens is when, and that’s why I was doing earlier, because I had started those way earlier to have them ready, not even thinking they probably timed out I wouldn’t do anything with them. So that’s why it wasn’t working at first. So and now I’m telling myself, hey, at that time. I probably let the user know, so I’ll add that to my client. But basically, what I have to do is break the connection, start with a new connection ID. It’s all that connection ID. So if you come with a new connection ID, I don’t know who you are, we’re starting over. If I wanted to to have that be persistent across sessions, I could then do storage and say, hey, here it is. And we could, you could certainly get that complex, you know, saving and storage, save as a cookie, and then I’m back. I want to, I want to continue. You see some of the chat clients that have all your chats down the side. That’s how they’re doing that.

Sean C Davis 35:33
Okay, okay, this makes sense. And so you chose Claude, yeah. And you, you’d started to hint it. Why you chose that? Just curious. If you could elaborate on that, or when you might reach for something else in comparison.

Eric Johnson 35:46
Yeah, Claude. Claude’s great one. It’s one we host at bedrock. The other one, I plan on adding one for llama as well. Llama three, it’s real interesting to watch these horses in the race. Kind of they hit, they hop over themselves. You get it all set up. Okay, I got it. Oh, now they’re better. Okay, I’ll repoint that. So the beautiful thing again, that’s where bedrock comes in. So cool is, I can go I just changed my endpoint. The different ones have slightly different prompt. And this is where tools like Lang chain come in. Really helpful is, is they have different prompt structure, so you want to look at it. How do they take problems with Claude? You can give it like, Hey, here’s your system command. Now I’m a user answer. So you may say something like your technology, technological teacher answer the following question, and then the user says, This is my question, whereas it looks different for maybe lum or something like that, but to answer your question, I chose Claude. Actually built this a little while ago for another session that was doing, just in a meetup, and at the time it was in all the ones I had looked at. And boy, I don’t work for Claude, and you can’t go. Eric said this, but it seemed to be the best for what I was doing. And sonnet was a great you could use haiku. It’s a little faster, but it’s not quite as much reasoning. Opus is even more so, but not quite as fast. Sun seems to be right in the middle, in that sweet spot, and do what I need to

Sean C Davis 37:07
do. Okay, okay, that makes sense, yeah. And, you know, I think this is, this is the classic example that we see, is the kind of the the chat interaction with AI, what are some other kind of interesting use cases you’ve seen with Amazon and AI.

Eric Johnson 37:26
So with Amazon AI, I mean, all kinds of things. A great one to look at that it’s not just the AI, but it’s also the serverless and the step functions. So kind of the same architecture is children’s network. They’re doing, they’re discovering cancer and helping cure cancer with, I mean, helping cure people. It’s not curing cancer, so to speak, but they’re helping diagnose students and and I can talk about this openly because they actually came and did a talk at one of our EDA days, and it’s fantastic. They’re using the same kind of architecture to take what was taking days, weeks, months, to diagnose, down to minutes and hours, you know? So that kind of thing. As far as the Gen AI, there’s a lot of stuff I obviously can’t talk about, but we see a lot of usage in the domain specific like, look, here’s you’re probably starting to see me as they pop up on websites. Have a question about our product, ask here, and it’s this same idea. They’ve indexed their own products, and they’re making that available. So it’s information that’s not in the LLM, but they’ve either one trained the model, or there might be a combination embeddings, rag training, the models, agents. There’s all kinds of different techniques for getting more data in this in fact, my my next step on this one is I’m actually going to have it say, I’m going to analyze the prompt, and if the prompt is asked, asking for, for an image, I’m going to use an agent to talk to stability, to create an image and respond that. So you have this ability to kind of flex, and we’re seeing a lot of customers do that.

Sean C Davis 39:04
That’s really cool. That’s cool. And I’ve even seen it popping up in the Amazon App. Like, we don’t have to dig through individual customer comments anymore, because it’s just like, here’s what everybody is saying that this is this works well, but a lot of people are returning this item, like, Oh, this is amazing. It’s amazing. That’s

Eric Johnson 39:21
right. Q inside the IDE is super helpful. You can I put my prompt somewhere and I say, Oh, help me with the permissions on this, and it’ll fix that. They’re not all perfect yet, but we’re, they’re getting better.

Sean C Davis 39:33
Yes, yes. So on that, one last question before we wrap up, what are you most excited about in AI, and, you know, what’s next

Eric Johnson 39:44
snack probably not gonna be. Which one I’m a way better Python developer than I ever was. For me, it’s that. It’s that quick answer to kind of figure things out I love, I mean, I am excited about Gen AI. I mean, I love it. I think it’s. Really cool. I love what we’re doing with Q But for me, I think I’m more excited about the building, about the this is my honest answer, even if I didn’t work for Amazon. So it’s this is going to sound cheesy, but I love step functions. I love lambda functions. I love that architecting. I love the architectural challenges I’m working with serverless containers right now. So what I’m really excited about is helping customers understand how to use Gen AI, or how to use serverless technologies to enable Gen AI. I did a talk here a while back at serverless days in Japan, and I talked about the idea that for most developers, Gen AI is going to be an end point. We’re not training models. We’re not, certainly not doing the math behind it. That’s a special breed of rare people doing that right? Developers like us, we’re consuming the results of it. And so you need to be able to wrangle the data, orchestrate the data, train the data, things like that. And that’s what I get really excited about. I don’t know if that’s the answer you wanted, but that’s the truth.

Sean C Davis 41:02
Oh, that’s amazing. Love it. Well, thank you, Eric, really appreciate the presentation and conversations.

Eric Johnson 41:07
Yeah, all right. Thank you very much. You.

Software Engineering

More Awesome Sessions

SESSION

Streamlining Serverless: Making Development Easier with Framework24

Sumit Verma will show how Framework24, a new open source project, aims to make it easy to deploy serverless infrastructure as code.

SESSION

Function Calling in Large Language Models

Xe Iaso will cut through the hype to tell you what you need to know about Large Language Models, what they are good for and how to best utilize them.

SESSION

So We Created a New AWS SDK

Moar Serverless will give you all the information you need to take advantage of serverless in your application development including new AI and edge capabilities.

SESSION

Adding Serverless Content Moderation to Your Application with Only 3 Simple AWS Tools

Moar Serverless will give you all the information you need to take advantage of serverless in your application development including new AI and edge capabilities.

SESSION

Catch Me If You Can: How LocalStack’s Policy Stream Identifies Least Privilege IAM Policies

Moar Serverless will give you all the information you need to take advantage of serverless in your application development including new AI and edge capabilities.

Check out all 368 sessions

Eric Johnson

Transcript

Tags

More Awesome Sessions

Streamlining Serverless: Making Development Easier with Framework24

Function Calling in Large Language Models

So We Created a New AWS SDK

Adding Serverless Content Moderation to Your Application with Only 3 Simple AWS Tools

Catch Me If You Can: How LocalStack’s Policy Stream Identifies Least Privilege IAM Policies

Don't miss Astro All Day Long! coming up on Jul 17