Private GenAI chat with Librechat.ai - Part 2
This is the second part of a series of articles about building your own private, ChatGPT-like web chat with Librechat.ai. The introduction post is here.
In this post I describe how to set up secure access to LLMs (Large Language Models) in Amazon Bedrock.
You will learn how to:
- Create a new user with only the necessary permissions. This user will be used by your Librechat.ai installation to access LLMs in Amazon Bedrock.
- Generate access keys for the user.
- Open access to LLMs of your choice in the Amazon Bedrock service.
Prerequisites
Before you start, you need an account with Amazon Web Services. To create one, you can follow one of the many tutorials available on YouTube: How to create AWS Account.
User and credentials
Important: You must never use the root user for any service consumption, because the root user has full administrative permissions. If the root credentials are compromised, your account can be misused and your private data can be stolen.
Following security best practices, you should create a new user with only the necessary permissions. This is called the principle of least privilege. I advise you to:
- Create a new user group, for LLM use only.
- Assign necessary permissions to this group.
- Create a new user and assign this user to the user group.
- Generate CLI (Command Line Interface) access keys for this user.
This video shows how to do all these steps:
If you followed the video, you should have a new user with access keys and minimal permissions for invoking (sending requests to) LLMs in Amazon Bedrock.
Explaining group permissions policy
In the video I showed how to create a new group and assign a policy to it using the AWS console.
To use LLMs in Amazon Bedrock, you need to allow only two actions: bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream. Some models can only be called through a specific inference profile (for example, for streaming responses), which is why this policy allows calling both foundation models and inference profiles.
The full list of actions and resources can be found in the Amazon Service Authorization Reference.
For those who are more familiar with AWS and want to create the user group and user programmatically (with the AWS CLI, AWS SDKs, …), here is the JSON specification of the same policy that I created in the video:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/*",
        "arn:aws:bedrock:*:{AccountID}:inference-profile/*"
      ]
    }
  ]
}
```
Replace {AccountID} with your AWS account ID.
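As a sketch of the programmatic route, here is how the same setup could look with the boto3 SDK (the group name llm-users, user name librechat, and policy name bedrock-invoke-only are my own illustrative choices; the caller needs IAM permissions):

```python
import json

def bedrock_invoke_policy(account_id: str) -> dict:
    """The same least-privilege policy as above, as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": [
                    # Foundation-model ARNs have an empty account ID field.
                    "arn:aws:bedrock:*::foundation-model/*",
                    f"arn:aws:bedrock:*:{account_id}:inference-profile/*",
                ],
            }
        ],
    }

def create_librechat_user(account_id: str) -> dict:
    """Create the group, inline policy, user, and CLI access keys."""
    import boto3  # imported here so the policy helper works without boto3

    iam = boto3.client("iam")
    iam.create_group(GroupName="llm-users")
    iam.put_group_policy(
        GroupName="llm-users",
        PolicyName="bedrock-invoke-only",
        PolicyDocument=json.dumps(bedrock_invoke_policy(account_id)),
    )
    iam.create_user(UserName="librechat")
    iam.add_user_to_group(GroupName="llm-users", UserName="librechat")
    # Returns AccessKeyId / SecretAccessKey for your Librechat configuration.
    return iam.create_access_key(UserName="librechat")["AccessKey"]
```

Store the returned secret key immediately; AWS will not show it again.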
The policy above allows access to all models in all regions of AWS.
If you want to limit access to specific regions only, you can change the resource ARNs (Amazon Resource Names) to include the region name, e.g. arn:aws:bedrock:eu-central-1:{AccountID}:inference-profile/*.
You can also restrict access to specific models by changing the ARN to include the model ID. For example, the ARN arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0 limits access to the Amazon Nova Pro model in the us-east-1 region (note that foundation-model ARNs have an empty account ID field, as in the policy above).
To consume models which support cross-region inference, you can use a wildcard for a specific group of regions, e.g. for the USA: arn:aws:bedrock:us-*:{AccountID}:inference-profile/* (together with the corresponding foundation models: arn:aws:bedrock:us-*::foundation-model/*).
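The ARN variations above follow one pattern, which can be captured in a small helper (a sketch; the function name is my own):

```python
def bedrock_resource_arn(
    resource: str,            # "foundation-model" or "inference-profile"
    model: str = "*",         # e.g. "amazon.nova-pro-v1:0"
    region: str = "*",        # e.g. "us-east-1" or "us-*"
    account_id: str = "",     # foundation-model ARNs leave this empty
) -> str:
    """Build a Bedrock resource ARN for an IAM policy statement."""
    return f"arn:aws:bedrock:{region}:{account_id}:{resource}/{model}"

# All foundation models in all regions (as in the policy above):
print(bedrock_resource_arn("foundation-model"))
# → arn:aws:bedrock:*::foundation-model/*

# Only Amazon Nova Pro in us-east-1:
print(bedrock_resource_arn("foundation-model", "amazon.nova-pro-v1:0", "us-east-1"))
# → arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0

# Inference profiles in all US regions, scoped to one account:
print(bedrock_resource_arn("inference-profile", "*", "us-*", "123456789012"))
# → arn:aws:bedrock:us-*:123456789012:inference-profile/*
```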
The best regions to use LLMs in Amazon Bedrock
I advise you to use LLMs from these AWS regions: us-east-1 (N. Virginia) and us-west-2 (Oregon), because:
- They have all the models supported by Bedrock available.
  - E.g. Claude 3.5 Haiku and Claude 3.7 Sonnet are available only there (at the time of writing this post).
- They usually get new models first.
- They have higher token limits for input and output, which means you can send a larger context and get longer answers.
  - E.g. for the Claude 3 Haiku model, us-east-1 allows an input of 20 000 tokens per minute, while in eu-central-1 (Frankfurt) the limit is only 3 000 tokens per minute.
  - You can read more about Bedrock limits here.
- They provide cross-region inference, which increases throughput (tokens and requests per minute).
  - E.g. for the Claude 3.5 Haiku model, cross-region inference in us-east-1 doubles the input tokens-per-minute limit (from 20 000 to 40 000).
  - You can read more about cross-region inference in this section of the Amazon Bedrock User Guide.
- These regions usually have the lowest prices.
Access to LLMs in Amazon Bedrock
Having a user with access keys and permissions is not enough to use LLMs in Amazon Bedrock. You also need to activate access to the models you want to use.
Getting access to LLMs in Amazon Bedrock is easy: in your AWS account console, open the Amazon Bedrock service, go to the Model access section at the bottom of the left-hand menu, and on that page request access to the models you want to use.
This video shows how to do it:
When you request access to the models, make sure you are in a region that matches the user's permissions (if you specified a region or a region wildcard in the Resource ARNs of the user group policy).
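Once model access is granted, and the access keys generated earlier are configured locally (e.g. via aws configure), you can verify that everything fits together with a minimal invocation. This is a sketch using the Bedrock Converse API; the model ID is just an example and must be one you actually enabled:

```python
# Example model ID -- replace with a model you requested access to.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def smoke_test(region: str = "us-east-1") -> str:
    """Send one small message to Bedrock and return the model's reply."""
    import boto3  # requires the IAM user's access keys to be configured

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": "Say hello."}]}],
    )
    return response["output"]["message"]["content"][0]["text"]
```

An AccessDeniedException here usually means that the IAM policy, the model-access grant, or the region does not match.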
To be continued
In the next part of this series, I will show you how to set up the Google Search API and the YouTube Data API and how to get API keys for them. Go to the next part.