Amazon S3 Logs help you keep track of data access and maintain a detailed record of each request, including the resources specified in the request, the request type, and the date and time the request was processed. Once you enable logging, these records are written to an Amazon S3 bucket. For auditing and compliance purposes, you can maintain the logs using AWS Server Access Logging, AWS CloudTrail Logging, or a combination of both.
This article focuses on Amazon S3 Logs and AWS Server Access Logging in particular, and explores the format, delivery process, and Analysis Reports of Amazon S3 Logs.
What are Amazon S3 Logs?
Amazon S3 Logs (server access logs, in this case) keep detailed records of the requests made to an Amazon S3 bucket, much like the access logs of a web server. The key features of this type of Amazon S3 Logs are:
- It is granular to the object.
- Non-API access is included, for example, static website browsing.
- It provides comprehensive information about each request, such as the HTTP status, error code, and bucket owner, to name a few.
- It also provides information about Lifecycle expirations, restores, and transitions.
You can use these Amazon S3 Logs for security and access audits. They can also give you deeper insight into your customer base and help you understand the components of your Amazon S3 bill.
Amazon S3 Logs consist of a sequence of newline-delimited log records. Each log record represents one request and consists of space-delimited fields. Here is an example of an Amazon S3 Log that contains three records:
1. 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be awsexamplebucket1 [06/Feb/2019:00:00:38 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 3E57427F3EXAMPLE REST.GET.VERSIONING - "GET /awsexamplebucket1?versioning HTTP/1.1" 200 - 113 - 7 - "-" "S3Console/0.4" - s9lzHYrFp76ZVxRcpX9+5cjAnEH2ROuNkd2BHfIa6UkFVdtjf5mKR3/eTPFvsiP/XV/VLi31234= SigV2 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader awsexamplebucket1.s3.us-west-1.amazonaws.com TLSV1.1
2. 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be awsexamplebucket1 [06/Feb/2019:00:00:38 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 891CE47D2EXAMPLE REST.GET.LOGGING_STATUS - "GET /awsexamplebucket1?logging HTTP/1.1" 200 - 242 - 11 - "-" "S3Console/0.4" - 9vKBE6vMhrNiWHZmb2L0mXOcqPGzQOI5XLnCtZNPxev+Hf+7tpT6sxDwDty4LHBUOZJG96N1234= SigV2 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader awsexamplebucket1.s3.us-west-1.amazonaws.com TLSV1.1
3. 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be awsexamplebucket1 [06/Feb/2019:00:00:38 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be A1206F460EXAMPLE REST.GET.BUCKETPOLICY - "GET /awsexamplebucket1?policy HTTP/1.1" 404 NoSuchBucketPolicy 297 - 38 - "-" "S3Console/0.4" - BNaBsXZQQDbssi6xMBdBU2sLt+Yf5kZDmeBUP35sFoKa3sLLeMC78iwEIWxs99CRUrbS4n11234= SigV2 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader awsexamplebucket1.s3.us-west-1.amazonaws.com TLSV1.1
Here is a list of the Amazon S3 Logs fields, which will help you understand the log records shown above (a small parsing sketch follows this list):
- Bucket Owner: The Bucket Owner is the canonical user ID of the owner of the Amazon S3 source bucket. The canonical user ID is another form of the AWS account ID. This is how the Bucket Owner is represented in Amazon S3 Logs:
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be
- Bucket: This is the name of the bucket for which the request was made. If a request is so malformed that the bucket cannot be determined, the request will not appear in the server access logs. This is how the bucket name is represented in Amazon S3 Logs:
awsexamplebucket1
- Time: This refers to the time at which the request was received, in UTC (Coordinated Universal Time). In strftime() terms, the format is [%d/%b/%Y:%H:%M:%S %z], which looks as follows in Amazon S3 Logs:
[06/Feb/2019:00:00:38 +0000]
- Remote IP: This is the internet address of the requester as seen by the server. Firewalls and intermediate proxies might mask the actual IP address of the machine that made the request. Here is an example of Remote IP:
192.0.2.3
- Requester: This refers to the canonical user ID of the requester, or a – for unauthenticated requests. If the requester is an IAM user, this field returns the IAM user name along with the AWS account root user that the IAM user belongs to. This field is also used for access control. Here is an example:
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be
- Operation: This field is written as REST.HTTP_method.resource_type, WEBSITE.HTTP_method.resource_type, SOAP.operation, S3.action.resource_type, or BATCH.DELETE.OBJECT. Here is an example of how the Operation field is represented in Amazon S3 Logs:
REST.PUT.OBJECT
- Request Id: The Request Id is a string generated by Amazon S3 to uniquely identify each request. This is what a Request Id looks like in Amazon S3 Logs:
3E57427F33A59F07
- Request URI: It is the Request URI part of the HTTP request message. Here is an example of a Request URI:
"GET /awsexamplebucket1/photos/2019/08/puppy.jpg?x-foo=bar HTTP/1.1"
- Key: This refers to the key portion of the request, URL-encoded, or a – when no key parameter is specified. Here is what a Key looks like in Amazon S3 Logs:
/photos/2019/08/puppy.jpg
- Error Code: This refers to the Amazon S3 error code, or a – if no error occurred. Otherwise, this is how the error code is represented:
NoSuchBucket
- Bytes Sent: This refers to the number of bytes sent in response to the HTTP request. This does not include the HTTP protocol overhead. Here is an example:
2662992
- HTTP Status: This refers to the HTTP status code of the HTTP response. It is represented as follows:
200
- Total Time: This field measures the number of milliseconds the request was in flight from the server's perspective, from the time your request is received until the time the last byte of the response is sent. Measurements taken from the client's perspective would be longer because of network latency. Here is how it is represented:
70
- Object Size: This refers to the total size of the object. This is how it is represented:
3462992
- Turnaround Time: This is the time the Amazon S3 server spent processing your request, measured from the time the last byte of your request was received until the first byte of the response was sent. This is how it is represented in Amazon S3 Logs:
10
- User-Agent: This is the value of the HTTP User-Agent Header. This is how it is represented:
"curl/7.15.1"
- Referrer: This is the value of the HTTP Referer header, if present. It is generally the URL of the linking or embedding page from which the request was made. This is what it looks like:
"http://www.amazon.com/webservices"
- Host Id: This field refers to the x-amz-id-2 or Amazon S3 extended request Id. This is how it looks:
s9lzHYrFp76ZVxRcpX9+5cjAnEH2ROuNkd2BHfIa6UkFVdtjf5mKR3/eTPFvsiP/XV/VLi31234=
- Version Id: This is the Version Id of the request. This is how it looks:
3HL4kqtJvjVBH40Nrjfkd
- Cipher Suite: This is the SSL (Secure Sockets Layer) cipher used for an HTTPS request. This is how it looks:
ECDHE-RSA-AES128-GCM-SHA256
- Signature Version: This is the signature version, SigV2 or SigV4, that was used to authenticate the request. This is how it looks:
SigV2
- Authentication Type: This is the type of request authentication used: AuthHeader for authentication headers and QueryString for query strings (presigned URLs). This is how it looks:
AuthHeader
- TLS Version: The Transport Layer Security (TLS) version that is used by the client. It can be any one of the following: TLSv1, TLSv1.1, TLSv1.2, or – if the client is not using TLS. This is how it looks:
TLSv1.2
- Host Header: This is the endpoint used to connect to Amazon S3. This is how it looks:
s3.us-west-2.amazonaws.com
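To make these fields concrete, here is a minimal parsing sketch in Python. The regular expression and the lowercase field names are illustrative assumptions based on the space-delimited format shown above, not an official AWS parser; newer log records may append extra fields, which this sketch simply ignores.
import re

# Bracketed timestamps and quoted strings are kept as single fields;
# everything else is split on whitespace.
LOG_FIELD_PATTERN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

FIELD_NAMES = [
    'bucket_owner', 'bucket', 'time', 'remote_ip', 'requester', 'request_id',
    'operation', 'key', 'request_uri', 'http_status', 'error_code', 'bytes_sent',
    'object_size', 'total_time', 'turn_around_time', 'referrer', 'user_agent',
    'version_id', 'host_id', 'signature_version', 'cipher_suite',
    'authentication_type', 'host_header', 'tls_version',
]

def parse_log_record(record: str) -> dict:
    """Split one server access log record into a dict keyed by the field names above."""
    values = LOG_FIELD_PATTERN.findall(record)
    # zip() ignores any surplus fields appended by newer log formats
    return dict(zip(FIELD_NAMES, values))

sample = ('79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be '
          'awsexamplebucket1 [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
          '79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be '
          '3E57427F3EXAMPLE REST.GET.VERSIONING - '
          '"GET /awsexamplebucket1?versioning HTTP/1.1" 200 - 113 - 7 - "-" '
          '"S3Console/0.4" - '
          's9lzHYrFp76ZVxRcpX9+5cjAnEH2ROuNkd2BHfIa6UkFVdtjf5mKR3/eTPFvsiP/XV/VLi31234= '
          'SigV2 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader '
          'awsexamplebucket1.s3.us-west-1.amazonaws.com TLSV1.1')

parsed = parse_log_record(sample)
print(parsed['operation'], parsed['http_status'], parsed['host_header'])
# REST.GET.VERSIONING 200 awsexamplebucket1.s3.us-west-1.amazonaws.com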
A fully managed No-code Data Pipeline platform like Hevo helps you integrate and load data from Amazon S3 (among 100+ different sources) to a destination of your choice in real-time in an effortless manner. With its minimal learning curve, Hevo can be set up in just a few minutes, allowing users to load data without having to compromise performance. Its strong integration with a multitude of sources gives users the flexibility to bring in data of different kinds in a smooth fashion without having to write a single line of code.
Get Started with Hevo for Free
Check out some of the cool features of Hevo:
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 100+ sources like Amazon S3, that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!
Understanding the Amazon S3 Logs Analysis Reports
Once you enable Access Logging, the Amazon S3 Logs are written to an Amazon S3 bucket. After you grant Cloud Security Plus read access to this bucket, it can analyze the Amazon S3 Logs and present them as Traffic Analysis Reports. Traffic Analysis Reports can be used for:
- Understanding error conditions.
- Gaining a better understanding of data access patterns.
- Carrying out security and access audits.
Here are the different kinds of Traffic Analysis Reports at your disposal (a pandas sketch that reproduces similar views follows this list):
- Requests Based on HTTP Status: These display the requests corresponding to a given HTTP status code. For instance, if you enter HTTP status code 404, the report displays the unsuccessful requests that returned that code.
- Operation-Based Requests: Here the report results are based on the data request operation that you enter in a given field. For instance, if you enter "REST.GET.OBJECT", the corresponding requests are displayed.
- Remote IP-Based Requests: The requests made by any remote IP are displayed under Remote IP-Based requests.
- S3 Access Requests: The details of every data request made to an Amazon S3 bucket are presented in the Cloud Security Plus console.
- Error Access Requests: These Traffic Analysis Reports show the failed requests made by users, along with the HTTP error code and its details.
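If you prefer to build similar views yourself, here is a minimal pandas sketch, assuming the access logs have been downloaded to a local, space-delimited file. The file name access_logs.txt is a placeholder, and the column names match the ones used in the Lifecycle section later in this article.
import pandas as pd

COLUMNS = ['Bucket_Owner', 'Bucket', 'Time', 'Time_Offset', 'Remote_IP',
           'Requester_ARN/Canonical_ID', 'Request_ID', 'Operation', 'Key',
           'Request_URI', 'HTTP_status', 'Error_Code', 'Bytes_Sent', 'Object_Size',
           'Total_Time', 'Turn_Around_Time', 'Referrer', 'User_Agent', 'Version_Id',
           'Host_Id', 'Signature_Version', 'Cipher_Suite', 'Authentication_Type',
           'Host_Header', 'TLS_version']

# 'access_logs.txt' is a placeholder for a locally downloaded log file
df = pd.read_csv('access_logs.txt', sep=' ', names=COLUMNS, usecols=range(25))

# Requests based on HTTP status: count of 4xx responses per status code
print(df[df['HTTP_status'].astype(str).str.startswith('4')]['HTTP_status'].value_counts())

# Operation-based requests: all REST.GET.OBJECT requests
print(df[df['Operation'] == 'REST.GET.OBJECT'][['Remote_IP', 'Key', 'HTTP_status']])

# Remote IP-based requests: request counts per client address
print(df['Remote_IP'].value_counts())

# Error access requests: failed requests with their S3 error codes
print(df[df['Error_Code'] != '-'][['Key', 'HTTP_status', 'Error_Code']])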
How to Enable Amazon S3 Access Logs?
Amazon S3 bucket logging provides detailed information on object requests and requesters, even when they use your root account.
To enable S3 server access logging, carry out the following steps (a boto3 sketch follows the steps):
- Step 1: Navigate to Amazon S3 Console.
- Step 2: Choose the bucket where you want to enable logging.
- Step 3: Now, left-click on the bucket.
- Step 4: Go to the Properties section.
- Step 5: Select the "Server Access Logging" tile. The Server access logging dialog appears.
- Step 6: Check the “Enable logging” field.
- Step 7: Enter the name of the target bucket and choose a target prefix that will help distinguish your logs. The target bucket and the main bucket should be different, but in the same AWS Region, for Amazon S3 bucket logging to work properly.
- Step 8: Click on the “Save” button. Logging for the Amazon S3 bucket is now enabled, and logs will be available for download in 24 hours.
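If you would rather enable server access logging programmatically, here is a minimal boto3 sketch of the same configuration. The bucket names and prefix are placeholders, and the target bucket must already permit log delivery (for example, through a bucket policy for the S3 log delivery service or the legacy Log Delivery group ACL).
import boto3

s3 = boto3.client('s3')

source_bucket = 'my-source-bucket'        # placeholder: bucket to be logged
target_bucket = 'my-logging-bucket'       # placeholder: bucket that receives the logs (same Region)
target_prefix = 'logs/my-source-bucket/'  # prefix that distinguishes these logs

# Enable server access logging on the source bucket
s3.put_bucket_logging(
    Bucket=source_bucket,
    BucketLoggingStatus={
        'LoggingEnabled': {
            'TargetBucket': target_bucket,
            'TargetPrefix': target_prefix,
        }
    },
)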
How to Get Access to Amazon S3 Bucket Logs and Read Them?
You can use MSP360 Explorer for Amazon S3, which includes a log viewer to make reading easier. The steps are as follows:
- Step 1: Right-click the bucket for which you enabled logging.
- Step 2: Select “Logging” and then click on “View Server Access Log”.
- Step 3: A new window pane will appear, displaying a complete bucket log for a specific time period.
You can interpret the most important parameters to determine who accessed or edited the objects, and when:
- Remote IP: The IP address of the user who performed the operation is displayed. It should be kept in mind that proxies and firewalls can conceal the actual address.
- Requester: The unique identifier of the user who requested the file in your bucket. If the request was not authenticated, the entry will be "Anonymous"; if the requester is an IAM user, the field returns the IAM user name as well as the root AWS account to which the IAM user belongs.
- Operation: It contains a list of the operations performed on the file and the bucket.
- Object Size: It determines the total size of the requested object.
You’ve now enabled Amazon S3 server access logging for a specific bucket in order to improve account security and monitor user operations over time.
Monitor Amazon S3 Lifecycle Management
Customers frequently want to know how they can tell whether their S3 Lifecycle rules are functioning properly. S3 server access logging includes information on S3 Lifecycle processing activity, such as object expirations and object transitions.
In this example, a new DataFrame is created for logs stored in the same centralized logging bucket but under a different prefix. This time, the prefix corresponds to the name of an S3 bucket with Lifecycle rules enabled.
# Assumes boto3, pandas, and s3fs (for reading s3:// paths with pandas) are installed
import boto3
import pandas as pd

s3_client = boto3.client('s3')
bucket = 'your-centralized-logging-bucket'  # placeholder: bucket that stores the access logs

# Collect the keys of all log objects written under the lifecycle-enabled bucket's prefix
lifecycle_log_objects = []
paginator = s3_client.get_paginator('list_objects_v2')
result = paginator.paginate(Bucket=bucket, Prefix='demo-lifecycle')
for page in result:
    for key in page['Contents']:
        lifecycle_log_objects.append(key['Key'])

# Read each space-delimited log object into a DataFrame and concatenate them
lifecycle_log_data = []
for lifecycle_log in lifecycle_log_objects:
    lifecycle_log_data.append(pd.read_csv(
        's3://' + bucket + '/' + lifecycle_log, sep=' ',
        names=['Bucket_Owner', 'Bucket', 'Time', 'Time_Offset', 'Remote_IP',
               'Requester_ARN/Canonical_ID', 'Request_ID', 'Operation', 'Key',
               'Request_URI', 'HTTP_status', 'Error_Code', 'Bytes_Sent', 'Object_Size',
               'Total_Time', 'Turn_Around_Time', 'Referrer', 'User_Agent', 'Version_Id',
               'Host_Id', 'Signature_Version', 'Cipher_Suite', 'Authentication_Type',
               'Host_Header', 'TLS_version'],
        usecols=range(25)))

lifecycle_df = pd.concat(lifecycle_log_data)
lifecycle_df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4609 entries, 0 to 0
Data columns (total 25 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Bucket_Owner 4609 non-null object
1 Bucket 4609 non-null object
2 Time 4609 non-null object
3 Time_Offset 4609 non-null object
4 Remote_IP 4609 non-null object
5 Requester_ARN/Canonical_ID 4609 non-null object
6 Request_ID 4609 non-null object
7 Operation 4609 non-null object
8 Key 4609 non-null object
9 Request_URI 4609 non-null object
10 HTTP_status 4609 non-null object
11 Error_Code 4609 non-null object
12 Bytes_Sent 4609 non-null object
13 Object_Size 4609 non-null object
14 Total_Time 4609 non-null object
15 Turn_Around_Time 4609 non-null object
16 Referrer 4609 non-null object
17 User_Agent 4609 non-null object
18 Version_Id 4526 non-null object
19 Host_Id 4609 non-null object
20 Signature_Version 4609 non-null object
21 Cipher_Suite 4609 non-null object
22 Authentication_Type 4609 non-null object
23 Host_Header 4609 non-null object
24 TLS_version 4609 non-null object
dtypes: object(25)
memory usage: 936.2+ KB
Step 1: Get a count of lifecycle operations performed
For this test, 40 objects are uploaded to three different prefixes in the Amazon S3 bucket, and rules are applied based on prefix name to expire or transition to S3 Glacier Deep Archive. Following that, additional objects are added to the expiration prefix to provide more examples.
lifecycle_df[(lifecycle_df['Requester_ARN/Canonical_ID'] == 'AmazonS3')]['Operation'].value_counts()
Output:
S3.EXPIRE.OBJECT 180
S3.CREATE.DELETEMARKER 46
S3.TRANSITION_GDA.OBJECT 45
S3.TRANSITION.OBJECT 41
Name: Operation, dtype: int64
Step 2: Get a list of objects that have been expired and the date they were expired
By changing the operation name in the filter, you can also use this approach to generate reports on object transitions. The Time and Time_Offset columns are joined into a single Date column in this example.
lifecycle_df['Date'] = lifecycle_df[['Time', 'Time_Offset']].agg(' '.join, axis=1)
lifecycle_df[(lifecycle_df['Operation'] == 'S3.EXPIRE.OBJECT')][['Key', 'Date']]
Output:
| Key | Date |
0 | folder2/test001.txt | [14/Dec/2021:10:50:38 +0000] |
1 | folder2/test002.txt | [14/Dec/2021:10:50:38 +0000] |
0 | expire/ | [01/Jul/2021:18:59:38 +0000] |
1 | expire/test21.txt | [01/Jul/2021:18:59:39 +0000] |
2 | expire/test12.txt | [01/Jul/2021:18:59:39 +0000] |
… | … | … |
41 | expiration/test39.txt | [04/Nov/2021:18:47:52 +0000] |
42 | expiration/test43.txt | [04/Nov/2021:18:47:52 +0000] |
43 | expiration/test41.txt | [04/Nov/2021:18:47:52 +0000] |
44 | expiration/test44.txt | [04/Nov/2021:18:47:53 +0000] |
45 | expiration/test45.txt | [04/Nov/2021:18:47:53 +0000] |
Step 3: Get a list of objects that were expired on a specific day
lifecycle_df['Date'] = lifecycle_df[['Time', 'Time_Offset']].agg(' '.join, axis=1)
lifecycle_df[(lifecycle_df['Operation'] == 'S3.EXPIRE.OBJECT') & (lifecycle_df['Date'].str.contains('01/Jul/2021'))][['Key', 'Date']]
Output:
| Key | Date |
0 | expire/ | [01/Jul/2021:18:59:38 +0000] |
1 | expire/test21.txt | [01/Jul/2021:18:59:39 +0000] |
2 | expire/test12.txt | [01/Jul/2021:18:59:39 +0000] |
3 | expire/test39.txt | [01/Jul/2021:18:59:39 +0000] |
4 | expire/test17.txt | [01/Jul/2021:18:59:39 +0000] |
5 | expire/test32.txt | [01/Jul/2021:18:59:39 +0000] |
6 | expire/test26.txt | [01/Jul/2021:18:59:39 +0000] |
7 | expire/test10.txt | [01/Jul/2021:18:59:39 +0000] |
8 | expire/test34.txt | [01/Jul/2021:18:59:39 +0000] |
9 | expire/test27.txt | [01/Jul/2021:18:59:39 +0000] |
10 | expire/test19.txt | [01/Jul/2021:18:59:39 +0000] |
11 | expire/test29.txt | [01/Jul/2021:18:59:39 +0000] |
12 | expire/test36.txt | [01/Jul/2021:18:59:39 +0000] |
13 | expire/test15.txt | [01/Jul/2021:18:59:39 +0000] |
14 | expire/test20.txt | [01/Jul/2021:18:59:39 +0000] |
15 | expire/test14.txt | [01/Jul/2021:18:59:39 +0000] |
16 | expire/test33.txt | [01/Jul/2021:18:59:39 +0000] |
17 | expire/test07.txt | [01/Jul/2021:18:59:39 +0000] |
18 | expire/test02.txt | [01/Jul/2021:18:59:39 +0000] |
19 | expire/test22.txt | [01/Jul/2021:18:59:39 +0000] |
20 | expire/test38.txt | [01/Jul/2021:18:59:39 +0000] |
21 | expire/test06.txt | [01/Jul/2021:18:59:39 +0000] |
22 | expire/test03.txt | [01/Jul/2021:18:59:39 +0000] |
23 | expire/test37.txt | [01/Jul/2021:18:59:39 +0000] |
24 | expire/test04.txt | [01/Jul/2021:18:59:39 +0000] |
25 | expire/test23.txt | [01/Jul/2021:18:59:39 +0000] |
26 | expire/test25.txt | [01/Jul/2021:18:59:39 +0000] |
27 | expire/test13.txt | [01/Jul/2021:18:59:39 +0000] |
28 | expire/test01.txt | [01/Jul/2021:18:59:39 +0000] |
29 | expire/test30.txt | [01/Jul/2021:18:59:39 +0000] |
30 | expire/test28.txt | [01/Jul/2021:18:59:39 +0000] |
31 | expire/test16.txt | [01/Jul/2021:18:59:39 +0000] |
32 | expire/test18.txt | [01/Jul/2021:18:59:39 +0000] |
33 | expire/test24.txt | [01/Jul/2021:18:59:39 +0000] |
34 | expire/test11.txt | [01/Jul/2021:18:59:39 +0000] |
35 | expire/test40.txt | [01/Jul/2021:18:59:39 +0000] |
36 | expire/test05.txt | [01/Jul/2021:18:59:39 +0000] |
37 | expire/test08.txt | [01/Jul/2021:18:59:39 +0000] |
38 | expire/test35.txt | [01/Jul/2021:18:59:39 +0000] |
39 | expire/test31.txt | [01/Jul/2021:18:59:39 +0000] |
40 | expire/test09.txt | [01/Jul/2021:18:59:39 +0000] |
Step 4: Write a list of expired object keys to a file.
# Collect the keys of all objects expired by the Lifecycle rules
expired_object_keys = lifecycle_df[(lifecycle_df['Operation'] == 'S3.EXPIRE.OBJECT')]['Key']

# Write one expired key per line
with open('expired_objects_list.csv', 'w') as f:
    for key in expired_object_keys:
        f.write("%s\n" % key)
Step 5: Get the UTC timestamp when a specific key was expired
lifecycle_df['Date'] = lifecycle_df[['Time', 'Time_Offset']].agg(' '.join, axis=1)
expirations = lifecycle_df[(lifecycle_df['Operation'] == 'S3.EXPIRE.OBJECT')]
expirations[(expirations['Key'] == 'expiration/test25.txt')][['Key','Date']]
Output:
| Key | Date |
26 | expiration/test25.txt | [07/Aug/2021:00:34:36 +0000] |
30 | expiration/test25.txt | [03/Nov/2021:21:16:20 +0000] |
37 | expiration/test25.txt | [04/Nov/2021:18:47:51 +0000] |
Conclusion
This article discussed Amazon S3 Logs in detail, exploring the format of Amazon S3 Logs and the Analysis Reports that you can use to understand data access patterns, error conditions, and access audits.
Visit our Website to Explore Hevo
Extracting complex data from a diverse set of data sources can be a challenging task and this is where Hevo saves the day! Hevo offers a faster way to move data from Amazon S3 and other Databases or SaaS applications into your desired destination to be visualized in a BI tool.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
Share your experience of learning about Amazon S3 Logs in the comments section below!