Amazon S3 is AWS's cloud-based storage service. It is one of the most popular tools on the market, offering a wide range of functionality with ease of use, and it can also be managed from the command line through the AWS CLI.
In this article, we will take an in-depth look at a specific command, the AWS sync command, covering its use, syntax, and parameters, along with a few examples.
What is Amazon S3?
Amazon S3 stands for Amazon Simple Storage Service. It is a popular, scalable, web-based cloud storage service that provides high-speed solutions for online backup and archiving of data and applications on Amazon Web Services (AWS). Amazon S3 was designed to make web-scale computing easier, and hence it comes with a minimal feature set.
A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 150+ different sources (including 60+ free sources) to a Data Warehouse such as Amazon Redshift and S3 or Destination of your choice in real-time in an effortless manner. Ensure seamless data migration using features like:
- Seamless integration with your desired data warehouse, such as Redshift.
- Transform and map data easily with drag-and-drop features.
- Real-time data migration to leverage AI/ML features of Redshift.
Still not sure? See how Postman, the world’s leading API platform, used Hevo to save 30-40 hours of developer efforts monthly and found a one-stop solution for all its data integration needs.
Get Started with Hevo for Free
What is AWS CLI?
The AWS Command Line Interface (CLI) is a unified tool to manage your AWS services. With just one tool to download and configure, you can control multiple AWS services from the command line and automate them through scripts. AWS CLI v2 offers several new features, including improved installers, new configuration options like AWS Single Sign-On (SSO), and various interactive features.
What is AWS Sync Command?
The AWS sync command is a command used with AWS S3 storage. It syncs directories to S3 buckets or prefixes, and vice versa, by recursively copying new and updated files from the source (directory or bucket/prefix) to the destination (directory or bucket/prefix). The sync command only creates folders in the destination if they contain one or more files.
AWS Sync Command: Syntax
aws s3 sync <LocalPath> <S3Uri> [options]
or
aws s3 sync <S3Uri> <LocalPath> [options]
or
aws s3 sync <S3Uri> <S3Uri> [options]
Using AWS Sync Options
The AWS sync command has many optional parameters, known as options, that extend its functionality and let you control its results more precisely.
| Option | Datatype | Description |
| --- | --- | --- |
| --dryrun | (boolean) | Displays the operations that would be performed using the specified command without actually running them. |
| --quiet | (boolean) | Does not display the operations performed by the specified command. |
| --include | (string) | Don't exclude files or objects in the command that match the specified pattern. See Use of Exclude and Include Filters for details. |
| --exclude | (string) | Exclude all files or objects from the command that match the specified pattern. |
| --acl | (string) | Sets the ACL for the object when the command is performed. |
| --delete | (boolean) | Files that exist in the destination but not in the source are deleted during sync. |
You can also look at more AWS Sync Command Options from the official documentation.
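The --exclude/--include behavior can be approximated in a few lines of Python. This is only a hedged sketch, not the CLI's exact matching code: filters are applied in the order they appear on the command line, later filters take precedence, and everything is included by default (the real CLI additionally anchors patterns relative to the source path).

```python
import fnmatch

def is_included(key, filters):
    """Approximate how aws s3 sync evaluates --exclude/--include filters.

    `filters` is an ordered list of ("include" | "exclude", pattern)
    pairs, in command-line order. Later matching filters win, and
    everything is included by default.
    """
    included = True
    for kind, pattern in filters:
        if fnmatch.fnmatch(key, pattern):
            included = (kind == "include")
    return included

# Exclude all .jpg files, but re-include those under photos/keep/
filters = [("exclude", "*.jpg"), ("include", "photos/keep/*.jpg")]
print(is_included("notes.txt", filters))            # True
print(is_included("photos/cat.jpg", filters))       # False
print(is_included("photos/keep/cat.jpg", filters))  # True
```

This ordering rule is why `--exclude "*" --include "*.txt"` syncs only text files, while reversing the two flags syncs nothing.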
Syntax for --grants option
--grants Permission=Grantee_Type=Grantee_ID [Permission=Grantee_Type=Grantee_ID ...]
To specify the same permission type for multiple grantees, list the grantees separated by commas:
--grants Permission=Grantee_Type=Grantee_ID,Grantee_Type=Grantee_ID,...
Each value contains the following elements:
- Permission – Specifies the granted permissions and can be set to read, readacl, writeacl, or full.
- Grantee_Type – Specifies how the grantee is to be identified and can be set to uri or id.
- Grantee_ID – Specifies the grantee based on Grantee_Type. The Grantee_ID value can be one of:
- uri – The group’s URI
- id – The account’s canonical ID
AWS Sync Command: Examples
Example 1:
In this example, the user syncs the local current directory to the bucket. The local current directory contains the files test.txt and test2.txt. The bucket contains no objects:
aws s3 sync . s3://mybucket
Output:
upload: test.txt to s3://mybucket/test.txt
upload: test2.txt to s3://mybucket/test2.txt
The above AWS sync command syncs files in a local directory to objects in the bucket by uploading the local files to S3. A local file will require uploading if its size differs from the size of the S3 object, its last modified time is newer than that of the S3 object, or the corresponding S3 object does not exist under the specified bucket and prefix.
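The three-part upload decision described above can be sketched as a small pure function. This is an illustrative approximation of the sync logic, not the CLI's actual implementation; sizes are in bytes and modification times are POSIX timestamps.

```python
def needs_upload(local_size, local_mtime, s3_size, s3_mtime):
    """Sketch of aws s3 sync's upload decision for one local file.

    s3_size and s3_mtime are None when no matching object exists
    under the destination bucket and prefix.
    """
    if s3_size is None:
        return True   # no S3 object under the bucket/prefix: upload
    if local_size != s3_size:
        return True   # sizes differ: upload
    if local_mtime > s3_mtime:
        return True   # local file is newer: upload
    return False      # object is up to date: skip

print(needs_upload(100, 1000.0, None, None))   # True: object missing
print(needs_upload(100, 1000.0, 120, 2000.0))  # True: sizes differ
print(needs_upload(100, 3000.0, 100, 2000.0))  # True: local is newer
print(needs_upload(100, 1000.0, 100, 2000.0))  # False: up to date
```

Note that only size and timestamp are compared; sync does not checksum file contents, so a same-size, older-mtime local change would be skipped.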
Example 2:
In this example, the user syncs the bucket mybucket to the bucket mybucket2. mybucket contains the objects test.txt and test2.txt; mybucket2 contains no objects:
aws s3 sync s3://mybucket s3://mybucket2
Output:
copy: s3://mybucket/test.txt to s3://mybucket2/test.txt
copy: s3://mybucket/test2.txt to s3://mybucket2/test2.txt
The above AWS sync command syncs objects from one bucket to another bucket by copying s3 objects. An s3 object will require copying if the sizes of the two s3 objects differ, the last modified time of the source is newer than the last modified time of the destination, or the s3 object does not exist under the specified bucket and prefix destination.
Example 3:
In this example, the user syncs the bucket to the current local directory. The bucket contains the objects test.txt and test2.txt. The current local directory has no files:
aws s3 sync s3://mybucket .
Output
download: s3://mybucket/test.txt to test.txt
download: s3://mybucket/test2.txt to test2.txt
The above AWS sync command syncs files in a local directory to objects of the bucket by downloading s3 objects. An s3 object will require downloading if the size of the s3 object differs from the size of the local file, the last modified time of the s3 object is newer than the last modified time of the local file, or the s3 object does not exist in the local directory. Take note that when objects are downloaded from s3, the last modified time of the local file is changed to the last modified time of the s3 object.
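The timestamp detail at the end of the explanation above can be demonstrated directly: after a download, sync stamps the local file with the S3 object's last-modified time, which `os.utime` reproduces. This is a sketch using a throwaway file and a made-up timestamp, not an actual S3 download.

```python
import os
import tempfile

def set_mtime_like_sync(path, s3_mtime):
    """Mimic the final step of an aws s3 sync download: set the local
    file's mtime to the S3 object's LastModified time."""
    atime = os.path.getatime(path)  # keep the access time unchanged
    os.utime(path, (atime, s3_mtime))

s3_mtime = 1_600_000_000.0  # hypothetical S3 LastModified, POSIX seconds
path = os.path.join(tempfile.gettempdir(), "s3_sync_demo.txt")
with open(path, "w") as f:
    f.write("downloaded content")

set_mtime_like_sync(path, s3_mtime)
mtime_matches = os.path.getmtime(path) == s3_mtime
print(mtime_matches)  # True
os.remove(path)
```

This stamping matters for later runs: because the local mtime equals the object's mtime, an unchanged file will not be re-downloaded on the next sync.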
Example 4:
In this example, the user syncs the local current directory to the bucket with the --delete option. The local current directory contains the files test.txt and test2.txt. The bucket contains the object test3.txt:
aws s3 sync . s3://mybucket --delete
Output:
upload: test.txt to s3://mybucket/test.txt
upload: test2.txt to s3://mybucket/test2.txt
delete: s3://mybucket/test3.txt
The above AWS sync command syncs files in the local directory to objects in the bucket by uploading the local files to S3. Because the --delete flag is set, any objects existing under the specified bucket and prefix but not in the local directory are deleted.
Example 5:
In this example, the user syncs the local current directory to the bucket. The local current directory contains the files test.jpg and test2.txt. The bucket contains the object test.jpg, which has a different size than the local test.jpg:
aws s3 sync . s3://mybucket --exclude "*.jpg"
Output
upload: test2.txt to s3://mybucket/test2.txt
The above AWS sync command syncs files in the local directory to objects in the bucket by uploading the local files to S3. Because the --exclude flag is set, all files matching the pattern, whether they exist in S3, locally, or both, are excluded from the sync.
Example 6:
In this example, the user syncs the bucket to the local current directory. The local current directory contains the files test.txt and another/test2.txt. The bucket contains the objects another/test5.txt and test1.txt:
aws s3 sync s3://mybucket/ . --exclude "*another/*"
Output:
download: s3://mybucket/test1.txt to test1.txt
Sync between buckets in different regions
The following AWS sync command syncs files between two buckets in different regions:
aws s3 sync s3://my-us-west-2-bucket s3://my-us-east-1-bucket --source-region us-west-2 --region us-east-1
Sync to an S3 access point
The following AWS sync command syncs the current directory to the access point:
aws s3 sync . s3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/
Output
upload: test.txt to s3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/test.txt
upload: test2.txt to s3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/test2.txt
Conclusion
This article gave a comprehensive guide to the AWS sync command: its use cases, syntax, parameters, and options, along with several examples.
Amazon S3 is a trusted source that many companies use to store data because it is simple to use. But as the volume of data increases, moving to a larger data warehouse becomes necessary, and migrating data out of S3 is a hectic task. An automated data pipeline helps solve this problem, and this is where Hevo comes into the picture. Hevo Data is a No-code Data Pipeline with 150+ pre-built integrations that you can choose from.
Sign up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.
Frequently Asked Questions
Q1) What is AWS Sync?
AWS Sync is a command that lets you synchronize files or directories between two locations on AWS, like S3 buckets or local folders, keeping them up to date.
Q2) What is the difference between AWS Copy and AWS Sync?
AWS copy (cp) copies individual files or folders unconditionally, while aws sync compares the two locations and only transfers new or updated files, making it more efficient for ongoing backups.
Q3) What is AWS DataSync used for?
AWS DataSync is a service to automate and speed up the transfer of data between on-premises storage and AWS services like S3 or EFS, useful for migrations or backups.
Arsalan is a research analyst at Hevo and a data science enthusiast with over two years of experience in the field. He completed his B.Tech in computer science with a specialization in Artificial Intelligence and finds joy in sharing the knowledge he has acquired with data practitioners. His interest in data analysis and architecture has driven him to write nearly a hundred articles on various topics related to the data industry.