Athena SQL Query Examples for ALB troubleshooting | Devops Junction

ALB(Application Load Balancer) Logs are a great asset when it comes to troubleshooting.  It provides a finer detail of the HTTP request processed by the ALB and sent to the target.

ALB logs provide various fields and data which are essential to understanding the request flow. You can use it to troubleshoot your application errors, timeouts etc. 4xx and 5xx errors etc.

This can be used for security auditing and security incident post-mortem analysis too.

So this is a must-have for the Production Application Load Balancer.

Once the ALB Logs are enabled, you can ingest the ALB logs into Amazon Athena and start working on the data using SQL queries.

Here is a quick introduction of Athena from the creators themselves.

amazon athena

What is AWS  Athena

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. Athena scales automatically—running queries in parallel—so results are fast, even with large datasets and complex queries.

Athena ALB

Check If ALB logs are enabled - Enable if not

To check whether ELB- Elastic Load Balancer has the Logs enabled.

Go to Load Balancer -> Select the Load Balancer -> Check the Attributes section.

Under the attributes section you can see the Access logs attribute.

If the access log is enabled you would find the S3 Location and the S3 bucket name. If not enabled you would see Disabled

In the following screenshot, you can see the Access logs is enabled and the S3 location is shown.

 

 

If you have not enabled it yet. Just click on the Edit attributes and enable it

athena queries

If you use the existing S3 bucket or select the option to let aws create the S3 bucket for you to store the logs.

now you can go to S3 and make sure that the bucket is created and the logs are coming.

You may have to create some traffic and test the ALB to see the log files created.

 

Ingesting the ALB logs from S3 to  AWS Athena

Now you have the ALB logs enabled and the logs are being written to the S3 bucket.

Time to put the logs into Analysis using Athena.

Let us start ingesting the S3 logs into Athena into some table.

 

Create a table and ingest the logs from S3 bucket

You can go to Athena Query editor and open a new query tab and paste the following SQL query

But you need to replace a few placeholders on the Query

  • Account ID
  • S3 bucket name
  • Region

Ideally, this is the directory structure of how ALB writes the logs in the S3 bucket. You can copy these values from the S3 bucket directory. names itself.

You can optionally change the table name by replacing the  alb_logs on the first line of the Query with some other name of your choice.

This query creates the Partitioned table and ingests only the data defined on the projection.day.range field

To know more about these fields and how to create this table without partition etc. refer this article 

This Query would work for the Application Load Balancer and Network Load Balancer

Classic Load Balancers would not have certain fields in this list. will see the difference shortly

CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
            type string,
            time string,
            elb string,
            client_ip string,
            client_port int,
            target_ip string,
            target_port int,
            request_processing_time double,
            target_processing_time double,
            response_processing_time double,
            elb_status_code int,
            target_status_code string,
            received_bytes bigint,
            sent_bytes bigint,
            request_verb string,
            request_url string,
            request_proto string,
            user_agent string,
            ssl_cipher string,
            ssl_protocol string,
            target_group_arn string,
            trace_id string,
            domain_name string,
            chosen_cert_arn string,
            matched_rule_priority string,
            request_creation_time string,
            actions_executed string,
            redirect_url string,
            lambda_error_reason string,
            target_port_list string,
            target_status_code_list string,
            classification string,
            classification_reason string
            )
            PARTITIONED BY
            (
             day STRING
            )
            ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
            WITH SERDEPROPERTIES (
            'serialization.format' = '1',
            'input.regex' = 
        '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"')
            LOCATION 's3://your-alb-logs-directory/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/'
            TBLPROPERTIES
            (
             "projection.enabled" = "true",
             "projection.day.type" = "date",
             "projection.day.range" = "2022/01/01,NOW",
             "projection.day.format" = "yyyy/MM/dd",
             "projection.day.interval" = "1",
             "projection.day.interval.unit" = "DAYS",
             "storage.location.template" = "s3://your-alb-logs-directory/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/${day}"
            )

 

Once the table is created, you would be able to see it on the left navigation pane of the query editor under Tables

you can start executing some simple SQL queries to test or preview the table

 

How to preview the table in Athena

The easy way to preview the table is to click on the hamburger button adjacent to your table and click on the Preview Table option

post-img

 

Query Editor and How to Edit, Write and Save Queries

I hope now you get a glimpse of it, as soon as you click on the preview table

A new Query tab, would be opened with the following SQL query

Note: The Query Tab would have auto incremented numbers like Query1, Query2 ... QueryN

You can start editing the Query and write your own SQL queries on the same tab.

Or create a new Tab and write your Queries. by clicking on the + sign present the Name bar

Athena ALB Queries

You can Save your query using Save button shown below the Query panel.  Once saved it would be available on the saved queries

By default, all the executed queries would be available in the recent queries tab. You can re-launch it from there too.

Athena ALB Queries

 

You can select a portion/part of the query on the query editor by highlighting it and clicking on the Run again

Otherwise, the entire query on the query editor would be executed by default.

Now you know how Athena works and how to save, edit, write and execute the queries.

Let us give you some Query examples to keep it handy for your troubleshooting

 

Application Load Balancer vs Classic Load Balancer - Fields Comparison

As we have mentioned earlier,  the fields available would vary for the Application Load balancer and Classic Load Balancer.

Here is the list of fields available for Classic Load Balancer

timestamp           	string              	                    
elb_name            	string              	                    
request_ip          	string              	                    
request_port        	int                 	                    
backend_ip          	string              	                    
backend_port        	int                 	                    
request_processing_time	double              	                    
backend_processing_time	double              	                    
client_response_time	double              	                    
elb_response_code   	string              	                    
backend_response_code	string              	                    
received_bytes      	bigint              	                    
sent_bytes          	bigint              	                    
request_verb        	string              	                    
url                 	string              	                    
protocol            	string              	                    
user_agent          	string              	                    
ssl_cipher          	string              	                    
ssl_protocol        	string

 

Here is the list of fields available for Application Load Balancer

type                	string              	                    
time                	string              	                    
elb                 	string              	                    
client_ip           	string              	                    
client_port         	int                 	                    
target_ip           	string              	                    
target_port         	int                 	                    
request_processing_time	double              	                    
target_processing_time	double              	                    
response_processing_time	double              	                    
elb_status_code     	string              	                    
target_status_code  	string              	                    
received_bytes      	bigint              	                    
sent_bytes          	bigint              	                    
request_verb        	string              	                    
request_url         	string              	                    
request_proto       	string              	                    
user_agent          	string              	                    
ssl_cipher          	string              	                    
ssl_protocol        	string              	                    
target_group_arn    	string              	                    
trace_id            	string              	                    
domain_name         	string              	                    
chosen_cert_arn     	string              	                    
matched_rule_priority	string              	                    
request_creation_time	string              	                    
actions_executed    	string              	                    
redirect_url        	string              	                    
lambda_error_reason 	string              	                    
target_port_list    	string              	                    
target_status_code_list	string              	                    
classification      	string              	                    
classification_reason	string

 

 

Athena SQL Queries for ELB troubleshooting

 

Requests grouped by HTTP method and client IP

The following query counts the number of HTTP requests received by the load balancer grouped by the client IP address:

SELECT COUNT(request_verb) AS
 count,
 request_verb,
 client_ip
FROM alb_logs
GROUP BY request_verb, client_ip
LIMIT 100;

Here is the output of this preceding SQL Query

ALB Athena

 

Filter requests by HTTP method GET, POST, OPTIONS etc

Here is the SQL query to filter only POST requests

SELECT Count(request_verb) AS count,
       request_verb,
       client_ip
FROM   alb_logs
WHERE request_verb = 'POST'
GROUP  BY request_verb,
          client_ip
LIMIT  100;

You can remove the limit 100  or increase it if you want more data

This query can be modified to filter other HTTP request methods like GET, OPTIONS, DELETE, HEAD etc

 – HTTP OPTIONS
SELECT Count(request_verb) AS count,
       request_verb,
       client_ip
FROM   alb_logs
WHERE request_verb = 'OPTIONS'
GROUP  BY request_verb,
          client_ip
LIMIT  100; 

 – HTTP GET
SELECT Count(request_verb) AS count,
       request_verb,
       client_ip
FROM   alb_logs
WHERE request_verb = 'GET'
GROUP  BY request_verb,
          client_ip
LIMIT  100; 

 – HTTP DELETE
SELECT Count(request_verb) AS count,
       request_verb,
       client_ip
FROM   alb_logs
WHERE request_verb = 'DELETE'
GROUP  BY request_verb,
          client_ip
LIMIT  100;

 

Filter Requests based on HTTP Status code 200, 400, 5xx

Before we see the query, you need to understand that there are two status codes available per each request

  • elb_status_code
  • target_status_code

elb_status_code is the status code returned by the ELB in case of no response from the target group or backend. All other times it would match the target_status_code

target_status_code is the status code returned by the backend Application.

Why both status code is necessary ?.

Let's suppose ELB time out is configured as 60 seconds and the backend is taking more time to respond. the target_status_code would be empty as the backend has not responded yet and the elb close the connect with status 504.

In scenarios like this. target_status_code and elb_status_code both are crucial

Here is the SQL query to find the requests with 5xx response.

This would be helpful to find your endpoints which are throwing 5xx errors the most.

 – SQL query to find requests with 5xx status codes
SELECT request_url,
       count(elb_status_code)
FROM   alb_logs
WHERE  elb_status_code BETWEEN 500 and 599
group by request_url

 

Here is the query that helps to find out the 2xx and 4xx response codes

 – SQL query to find requests with 4xx status codes
SELECT request_url,
       count(elb_status_code)
FROM   alb_logs
WHERE  elb_status_code BETWEEN 400 and 499
group by request_url

 – SQL query to find requests with 2xx status codes
SELECT request_url,
       count(elb_status_code)
FROM   alb_logs
WHERE  elb_status_code BETWEEN 200 and 299
group by request_url

 

Adding time range to the SQL Query - Find the top 10 users within time range

So far we have seen SQL Queries with no time range specification. Let's suppose you want to find out which users/IPs are using your service the most in a specific time range.

You need to use the time range in your SQL query.

Just like any other SQL query. It's the simple  Start time and end time combined with between clause.

Here is a query to find out which IP has used the services or hit the LB the most in the 24-hour time range.

 – Requests grouped by Client IP
SELECT client_ip, count(client_ip) as cnt
FROM myalb_logs
WHERE  parse_datetime(request_creation_time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') 
     BETWEEN parse_datetime('2022-08-29-00:00:00','yyyy-MM-dd-HH:mm:ss') 
     AND parse_datetime('2022-08-29-23:59:00','yyyy-MM-dd-HH:mm:ss')
GROUP BY client_ip 
ORDER BY cnt DESC

Here is the execution output. I have masked the IP address for security reasons.

athena alb queries

 

Deep Dive into the user pattern - Finding the top 10 URLs used by the client IP / user

In the previous example we have found out who are our top 10 customers/users by their total usage

Next on our list is to find out what endpoints or URLs they (clients) use most.

Here is the query to find out the top 10 most used endpoints/URLs of the customer with the client_ip 13.xxx.xxx.xxx

You can use this query and make modifications to find out the entire access log of some IP / user

 – Top 10 requests per IP
SELECT request_url, client_ip, count(client_ip) as cnt
FROM myalb_logs
WHERE client_ip = '13.xxx.xxx.xxx'
AND parse_datetime(request_creation_time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z')
BETWEEN parse_datetime('2022-08-29-00:00:00','yyyy-MM-dd-HH:mm:ss')
AND parse_datetime('2022-08-29-23:59:00','yyyy-MM-dd-HH:mm:ss')
GROUP BY client_ip, request_url
ORDER BY cnt DESC
LIMIT 10

 

Find Slow endpoints taking more processing time using AWS Athena

Here is one more Query where use the request_processing_time field of ALB logs to find out which request is taking more time.

In other words, What are the slow URLs/services in the ALB

Here is the SQL Query to find out

SELECT request_url, AVG(request_processing_time) as avg_response, COUNT(request_url) as cnt
FROM myalb_logs
WHERE parse_datetime(request_creation_time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') 
     BETWEEN parse_datetime('2022-08-24-00:00:00','yyyy-MM-dd-HH:mm:ss') 
     AND parse_datetime('2022-08-24-23:59:00','yyyy-MM-dd-HH:mm:ss')
GROUP BY request_url
ORDER BY avg_response DESC

In this query, we are taking the Average response of the request URL and also showing the count, of how many requests have been taken into consideration.

This would be helpful for finding the slowest endpoints behind the ALB and you can fix it for a good customer experience.

 

Athena SQL query group by 5 minutes time frame - Usage Pattern / Rate limit

This example is a real-time use case to understand your user's behaviour and to find out some baseline numbers for usage.

Let's say you want to implement a Rate limit on the Application Load Balancer with AWS WAF and you want to know what is the current usage pattern

How many users are reaching the application every 5 minutes?

Who are our Highly contributing and heavy usage customers? What is the usage pattern of our customers? Is there a DDOS attack or Burst requests from any customer?

All these questions can be answered with the data you are going to collect

 – Check 5 minute Requests sort by time 00 to 23:50 – 

SELECT from_unixtime(floor(to_unixtime(from_iso8601_timestamp(request_creation_time)) / 300 )*300) as timekey, client_ip, count(client_ip) as cnt
FROM "sampledb"."myalb_logs"
WHERE parse_datetime(request_creation_time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') 
     BETWEEN parse_datetime('2022-09-30-00:00:00','yyyy-MM-dd-HH:mm:ss') 
     AND parse_datetime('2022-09-30-23:59:00','yyyy-MM-dd-HH:mm:ss')
GROUP BY from_unixtime(floor(to_unixtime(from_iso8601_timestamp(request_creation_time)) / 300 )*300) , client_ip
ORDER BY timekey, client_ip DESC

 – Check 5 minute Requests sort by Request Count – 

SELECT from_unixtime(floor(to_unixtime(from_iso8601_timestamp(request_creation_time)) / 300 )*300) as timekey, client_ip, count(client_ip) as cnt
FROM "sampledb"."myalb_logs"
WHERE parse_datetime(request_creation_time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') 
     BETWEEN parse_datetime('2022-09-01-00:00:00','yyyy-MM-dd-HH:mm:ss') 
     AND parse_datetime('2022-09-30-23:59:00','yyyy-MM-dd-HH:mm:ss')
GROUP BY from_unixtime(floor(to_unixtime(from_iso8601_timestamp(request_creation_time)) / 300 )*300) , client_ip
ORDER BY cnt DESC

 – Check 5 minute Requests For Single User IP – 

SELECT from_unixtime(floor(to_unixtime(from_iso8601_timestamp(request_creation_time)) / 300 )*300) as timekey, client_ip, count(client_ip) as cnt
FROM "sampledb"."myalb_logs"
WHERE client_ip = '13.**.**.105' AND parse_datetime(request_creation_time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') 
     BETWEEN parse_datetime('2022-09-01-00:00:00','yyyy-MM-dd-HH:mm:ss') 
     AND parse_datetime('2022-09-30-23:59:00','yyyy-MM-dd-HH:mm:ss')
GROUP BY from_unixtime(floor(to_unixtime(from_iso8601_timestamp(request_creation_time)) / 300 )*300) , client_ip
ORDER BY cnt DESC

I have written three queries for different use cases

  • Requests count for all clients grouped by every 5 minutes
  • Find the request counts of all clients based on the 5-minute usage. sort Higher to lower
  • Find the usage pattern of a single client_ip (user)

All these data would help us to retrospect our application and to set some fair usage policy and rate limiting.

Hope this helps

 

Find the requests closed by the Client abruptly - HTTP status code 499 or 460

At times, when you are trying to troubleshoot a user's session, you might find a status code 460 on the ALB logs, It is a response of ALB to indicate that the client has closed the connection

This is like the nginx 499 status code to indicate that the client has closed the connection

In a system architecture where ALB and NGINX both are used together. when the client has closed the connection due to network issues or some browser issues. you might see 499 at the nginx level and 460 at the alb level.

Here is the diagram I have created, that represents this scenario

alb athena status code

you can see when the client closes the connection. ALB responds with 460.

You can use the following ALB Athena SQL query to find such client-closed connections with alb response code 460

SELECT request_url, client_ip, elb_status_code, target_status_code, count(request_url) as totalrequests FROM "sampledb"."alblogs"
WHERE elb_status_code = 460 AND
parse_datetime(request_creation_time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') 
     BETWEEN parse_datetime('2022-12-01-00:00:00','yyyy-MM-dd-HH:mm:ss') 
     AND parse_datetime('2022-12-08-23:59:00','yyyy-MM-dd-HH:mm:ss') AND 
request_url like '%/some/pattern/to/search%'
GROUP BY elb_status_code,request_url, client_ip, elb_status_code, target_status_code
ORDER by totalrequests desc

In this query, we are searching for a URL pattern as well as a status code between the time range.

 

There are more SQL Queries. Coming up

I know we have covered only a few SQL queries/examples to show the capability of ALB logs and Athena.

But there are various fields that we are yet to explore and the possibility is huge.

So we will try to add more examples to this article consistently.

You can comment on your favourite/Handy Athena SQL queries for SQL troubleshooting and get it featured here.

Hope this helps.

 

Cheers
Sarav

Follow me on Linkedin My Profile
Follow DevopsJunction onFacebook orTwitter
For more practical videos and tutorials. Subscribe to our channel

Buy Me a Coffee at ko-fi.com

Signup for Exclusive "Subscriber-only" Content

Loading