AWS Sync Command: A Comprehensive Guide with 6 Examples

on Amazon S3, AWS, AWS Command Line Interface, AWS Commands, Data Storage • January 7th, 2022


AWS provides a cloud-based storage service known as Amazon S3. It is one of the most popular tools on the market and provides a wide range of functionality with ease of use. It also supports functionality given through the command line known as AWS CLI.

In this article we will talk in-depth about a specific command known as the AWS Sync Command, understanding its use, syntax, parameters along a few examples.


What is Amazon S3?


Amazon S3 stands for Amazon Simple Storage Service. It is a popular, scalable, high-speed, web-based cloud storage service designed for online backup and archiving of data and applications on Amazon Web Services (AWS). Amazon S3 was designed to make web-scale computing easier, and as a result it comes with a deliberately simple feature set.

What is AWS CLI?


The AWS Command Line Interface (CLI) is a unified tool to manage your AWS services. With just one tool to download and configure, you can control multiple AWS services from the command line and automate them through scripts. AWS CLI v2 offers several new features, including improved installers, new configuration options such as AWS Single Sign-On (SSO), and various interactive features.

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps you load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports 100+ data sources (including 30+ free data sources) like Asana, and setup is a 3-step process: select the data source, provide valid credentials, and choose the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form, all without writing a single line of code.

GET STARTED WITH HEVO FOR FREE

Its completely automated pipeline delivers data in real-time from source to destination without any loss. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss, and it supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
SIGN UP HERE FOR A 14-DAY FREE TRIAL

What is AWS Sync Command?


The AWS sync command is a command used with AWS S3 storage. It is used to sync directories to S3 buckets or prefixes, and vice versa. The AWS sync command recursively copies new and updated files from the source (directory or bucket/prefix) to the destination (directory or bucket/prefix). It only creates folders in the destination if they contain one or more files.

AWS Sync Command: Syntax

aws s3 sync <S3Uri> <LocalPath> [options]

or

aws s3 sync <LocalPath> <S3Uri> [options]

or

aws s3 sync <S3Uri> <S3Uri> [options]

AWS Sync Command: Options

The AWS sync command has many optional parameters, known as options, that extend its functionality and let you control its behavior more precisely.

  • --dryrun (boolean): Displays the operations that would be performed using the specified command without actually running them.
  • --quiet (boolean): Does not display the operations performed from the specified command.
  • --include (string): Don't exclude files or objects in the command that match the specified pattern. See Use of Exclude and Include Filters for details.
  • --exclude (string): Exclude all files or objects from the command that match the specified pattern.
  • --acl (string): Sets the ACL for the object when the command is performed.
  • --follow-symlinks or --no-follow-symlinks (boolean): Symbolic links are followed only when uploading to S3 from the local filesystem. Note that S3 does not support symbolic links, so the contents of the link target are uploaded under the name of the link. When neither --follow-symlinks nor --no-follow-symlinks is specified, the default is to follow symlinks.
  • --no-guess-mime-type (boolean): Do not try to guess the mime type for uploaded files. By default, the mime type of a file is guessed when it is uploaded.
  • --sse (string): Specifies server-side encryption of the object in S3.
  • --sse-c (string): Specifies server-side encryption using customer-provided keys of the object in S3.
  • --sse-kms-key-id (string): The customer-managed AWS Key Management Service (KMS) key ID to use to server-side encrypt the object in S3.
  • --storage-class (string): The type of storage to use for the object. Valid choices are: STANDARD | REDUCED_REDUNDANCY | STANDARD_IA | ONEZONE_IA | INTELLIGENT_TIERING | GLACIER | DEEP_ARCHIVE | GLACIER_IR. Defaults to 'STANDARD'.
  • --website-redirect (string): If the bucket is configured as a website, redirect requests for this object to another object in the same bucket or to an external URL.
  • --content-type (string): Specify an explicit content type for this operation. This value overrides any guessed mime types.
  • --cache-control (string): Specifies caching behavior along the request/reply chain.
  • --content-disposition (string): Specifies presentational information for the object.
  • --content-encoding (string): Specifies what content encodings have been applied to the object, and thus what decoding mechanisms must be applied to obtain the media-type referenced by the Content-Type header field.
  • --content-language (string): The language the content is in.
  • --expires (string): The date and time at which the object is no longer cacheable.
  • --source-region (string): When transferring objects from an S3 bucket to an S3 bucket, this specifies the region of the source bucket.
  • --only-show-errors (boolean): Only errors and warnings are displayed. All other output is suppressed.
  • --no-progress (boolean): File transfer progress is not displayed. This flag is only applied when the quiet and only-show-errors flags are not provided.
  • --page-size (integer): The number of results to return in each response to a list operation. The default value is 1000 (the maximum allowed). Using a lower value may help if an operation times out.
  • --ignore-glacier-warnings (boolean): Turns off Glacier warnings. Warnings about an operation that cannot be performed because it involves copying, downloading, or moving a Glacier object will no longer be printed to standard error and will no longer cause the return code of the command to be 2.
  • --force-glacier-transfer (boolean): Forces a transfer request on all Glacier objects in a sync or recursive copy.
  • --request-payer (string): Confirms that the requester knows that they will be charged for the request. Bucket owners need not specify this parameter in their requests.
  • --metadata-directive (string): Specifies whether the metadata is copied from the source object or replaced with metadata provided when copying S3 objects.
  • --size-only (boolean): Makes the size of each key the only criterion used to decide whether to sync from source to destination.
  • --exact-timestamps (boolean): When syncing from S3 to local, same-sized items will be ignored only when the timestamps match exactly. The default behavior is to ignore same-sized items unless the local version is newer than the S3 version.
  • --delete (boolean): Files that exist in the destination but not in the source are deleted during sync.
  • --grants (string): Grant specific permissions to individual users or groups.
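Several of these options can be combined in a single invocation. As a quick sketch (the bucket name mybucket is a placeholder), the following uploads the current directory to the STANDARD_IA storage class while suppressing all output except errors and warnings:

```shell
# Sync the current directory to the bucket, storing new objects in the
# STANDARD_IA (Infrequent Access) storage class and printing only errors.
# "mybucket" is a placeholder bucket name; substitute your own.
aws s3 sync . s3://mybucket --storage-class STANDARD_IA --only-show-errors
```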

Syntax for –grants option

--grants Permission=Grantee_Type=Grantee_ID [Permission=Grantee_Type=Grantee_ID ...]

To specify the same permission type for multiple grantees, specify the permission once, followed by a comma-separated list of grantees:

--grants Permission=Grantee_Type=Grantee_ID,Grantee_Type=Grantee_ID,...

Each value contains the following elements:

  • Permission – Specifies the granted permissions, and may be set to read, readacl, writeacl, or full.
  • Grantee_Type – Specifies how the grantee is to be identified and may be set to uri or id.
  • Grantee_ID – Specifies the grantee based on the Grantee_Type. The Grantee_ID value can be one of:
    • uri – The group’s URI
    • id – The account’s canonical ID
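As an illustration of the uri form (the bucket name is a placeholder), the following syncs the current directory while granting read access to everyone via the S3 AllUsers group URI:

```shell
# Sync while granting anonymous read access through the predefined
# Amazon S3 AllUsers group. "mybucket" is a placeholder bucket name.
aws s3 sync . s3://mybucket --grants read=uri=http://acs.amazonaws.com/groups/global/AllUsers
```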


AWS Sync Command: Examples

Example 1:

In this example, the user syncs the bucket to the local current directory. The local current directory contains the files test.txt and test2.txt. The bucket contains no objects:

aws s3 sync . s3://mybucket

Output:

upload: test.txt to s3://mybucket/test.txt
upload: test2.txt to s3://mybucket/test2.txt

The above AWS sync command syncs objects of the bucket to files in a local directory by uploading the local files to s3. A local file will require uploading if the size of the local file is different from the size of the s3 object, the last modified time of the local file is newer than the last modified time of the s3 object, or the local file does not exist under the specified bucket and prefix.

Example 2:

In this example, the user syncs bucket1 to bucket2. Bucket1 contains the objects test.txt and test2.txt. The bucket2 contains no objects:

aws s3 sync s3://mybucket s3://mybucket2

Output:

copy: s3://mybucket/test.txt to s3://mybucket2/test.txt
copy: s3://mybucket/test2.txt to s3://mybucket2/test2.txt

The above AWS sync command syncs objects from one bucket to another bucket by copying s3 objects. An s3 object will require copying if the sizes of the two s3 objects differ, the last modified time of the source is newer than the last modified time of the destination, or the s3 object does not exist under the specified bucket and prefix destination.

Example 3:

In this example, the user syncs the current local directory to the bucket. The bucket contains the objects test.txt and test2.txt. The current local directory has no files:

aws s3 sync s3://mybucket .

Output

download: s3://mybucket/test.txt to test.txt
download: s3://mybucket/test2.txt to test2.txt

The above AWS sync command syncs files in a local directory to objects of the bucket by downloading s3 objects. An s3 object will require downloading if the size of the s3 object differs from the size of the local file, the last modified time of the s3 object is newer than the last modified time of the local file, or the s3 object does not exist in the local directory. Take note that when objects are downloaded from s3, the last modified time of the local file is changed to the last modified time of the s3 object.

Example 4:

In this example, the user syncs the bucket to the local current directory. The local current directory contains the files test.txt and test2.txt. The bucket contains the object test3.txt:

aws s3 sync . s3://mybucket --delete

Output:

upload: test.txt to s3://mybucket/test.txt
upload: test2.txt to s3://mybucket/test2.txt
delete: s3://mybucket/test3.txt

The above AWS sync command syncs objects of the bucket to files in a local directory by uploading the local files to s3. Because the --delete flag is provided, any files existing under the specified prefix and bucket but not existing in the local directory will be deleted.
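Because --delete is destructive, it is worth previewing the operations first. As a sketch using the same placeholder bucket, combining --delete with --dryrun lists every upload and deletion the sync would perform, each line prefixed with (dryrun), without executing any of them:

```shell
# Preview a destructive sync: lists pending uploads and deletions,
# each prefixed with "(dryrun)", without actually performing them.
# "mybucket" is a placeholder bucket name.
aws s3 sync . s3://mybucket --delete --dryrun
```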

Example 5:

In this example, the user syncs the bucket to the local current directory. The local current directory contains the files test.jpg and test2.txt. The bucket contains the object test.jpg of a different size than the local test.jpg:

aws s3 sync . s3://mybucket --exclude "*.jpg"

Output

upload: test2.txt to s3://mybucket/test2.txt

The above AWS sync command syncs objects of the bucket to files in a local directory by uploading the local files to s3. Because the --exclude flag is provided, all files matching the pattern, whether existing in s3 or locally, will be excluded from the sync.

Example 6:

In this example, the user syncs the local current directory to the bucket. The local current directory contains the files test.txt and another/test2.txt. The bucket contains the objects another/test5.txt and test1.txt:

aws s3 sync s3://mybucket/ . --exclude "*another/*"

Output:

download: s3://mybucket/test1.txt to test1.txt

Because the --exclude pattern matches everything under the another/ prefix, another/test5.txt is skipped, and only test1.txt, which does not exist locally, is downloaded.

Sync buckets in different regions

The AWS sync command can also sync objects between two buckets in different regions:

aws s3 sync s3://my-us-west-2-bucket s3://my-us-east-1-bucket --source-region us-west-2 --region us-east-1

Sync to an S3 access point

The AWS sync command syncs the current directory to the access point:

aws s3 sync . s3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/

Output

upload: test.txt to s3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/test.txt
upload: test2.txt to s3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/test2.txt

Conclusion

This article gave a comprehensive guide to the AWS sync command: its use, syntax, and the options it offers, along with several worked examples.

Amazon S3 is a trusted source that many companies use to store data because it is simple to use. But as the volume of data increases, moving to a larger solution becomes necessary, and moving data from S3 to a larger Data Warehouse can be a hectic task. An automated data pipeline helps solve this issue, and this is where Hevo comes into the picture. Hevo Data is a No-code Data Pipeline with 100+ pre-built Integrations that you can choose from.

Visit our website to explore Hevo.

Hevo can help you integrate your data from numerous sources and load it into a destination to analyze real-time data with a BI tool such as Tableau. It will make your life easier and data migration hassle-free. It is user-friendly, reliable, and secure.

SIGN UP for a 14-day free trial and see the difference!

Share your experience of learning about AWS sync command in the comments section below.
