AWS Data Ingestion Methods

After reading this Medium post, you should have a good idea of which AWS service to choose for data ingestion. Spend a few minutes on the flow diagrams; they help explain the data flow and each step.

Don’t forget to read the conclusion!

How AWS helps in data ingestion

AWS offers services for ingesting many types of data. These include real-time streaming data, bulk data assets from on-premises storage platforms, and data generated and processed by legacy on-premises platforms such as mainframes and data warehouses.

This post covers three AWS services for data ingestion:

  1. Amazon Kinesis Firehose
  2. AWS Snowball
  3. AWS Storage Gateway

Amazon Kinesis Firehose

  1. Amazon Kinesis Firehose is a fully managed service for delivering real-time streaming data directly to Amazon S3. It can also deliver to other destinations, such as Amazon Redshift.
  2. Kinesis Firehose automatically scales to match the volume and throughput of streaming data and requires no ongoing administration.
  3. Kinesis Firehose can also be configured to transform streaming data before it is stored in Amazon S3. Its capabilities include compression, encryption, data batching, and transformation with AWS Lambda functions.

Note: Kinesis Firehose can concatenate multiple incoming records and deliver them to Amazon S3 as a single S3 object. This matters because fewer, larger objects mean fewer S3 PUT requests. That lowers both S3 request costs and the requests-per-second load on S3.
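To see why concatenation reduces request count, here is a minimal simulation in Python. It is not Firehose's actual buffering logic. It makes three simplifying assumptions:

- The buffer flushes on size only. The real service also flushes on a time interval.
- Each record gets a newline. Firehose itself concatenates records as-is unless you add a delimiter.
- `batch_records` and `to_s3_objects` are hypothetical names.

The sketch groups records into size-limited batches, gzip-compresses each batch, and counts the resulting S3 objects.

```python
import gzip


def batch_records(records, buffer_size_bytes):
    """Group records into newline-delimited batches of at most buffer_size_bytes.

    Simplified, size-only model of buffering: each returned batch stands for one S3 object.
    """
    batches = []
    current = bytearray()
    for record in records:
        # Assumption: append a newline so records stay separable after concatenation
        line = record.rstrip(b"\n") + b"\n"
        if current and len(current) + len(line) > buffer_size_bytes:
            batches.append(bytes(current))
            current = bytearray()
        current.extend(line)
    if current:
        batches.append(bytes(current))
    return batches


def to_s3_objects(records, buffer_size_bytes):
    """Return the gzip-compressed payloads that would each be written as one S3 object."""
    return [gzip.compress(batch) for batch in batch_records(records, buffer_size_bytes)]


if __name__ == "__main__":
    records = [f'{{"event_id": {i}}}'.encode() for i in range(1000)]
    objects = to_s3_objects(records, buffer_size_bytes=5 * 1024)
    print(f"{len(records)} records -> {len(objects)} S3 objects (PUT requests)")
```

Without batching, each of the 1,000 records would need its own PUT request. With a 5 KB buffer, they collapse into a handful of objects.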

Kinesis Firehose can invoke Lambda functions to transform incoming source data before delivering it to Amazon S3. A common transformation converts Apache log and syslog formats to standardized JSON or CSV.
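Below is a sketch of such a transformation Lambda in Python. It converts Apache Common Log Format lines to JSON.

It follows the Firehose data-transformation contract:

- Each incoming record carries a `recordId` and base64-encoded `data`.
- The function must return every `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, plus base64-encoded output `data`.

A few details are illustrative choices, not a prescribed format:

- The regular expression.
- The JSON field names.
- The choice to newline-delimit the output.

```python
import base64
import json
import re

# Apache Common Log Format, for example:
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
APACHE_CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        match = APACHE_CLF.match(line)
        if match is None:
            # Report lines that don't parse as failed transformations; keep the original data
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        fields = match.groupdict()
        fields["status"] = int(fields["status"])
        fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
        # Newline-delimit each JSON document so concatenated records stay separable in S3
        payload = (json.dumps(fields) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload).decode("ascii"),
        })
    return {"records": output}
```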

[Diagram: Amazon Kinesis Firehose data flow]

AWS Snowball

Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of the AWS cloud. It addresses common challenges of large-scale data transfers: high network costs, long transfer times, and security concerns. Use it to migrate bulk data from on-premises storage platforms and Hadoop clusters to S3 buckets.
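To make "long transfer times" concrete, here is a back-of-the-envelope estimate of how long a network transfer would take. It rests on three assumptions:

- Decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second).
- A default link utilization of 80%, which is an arbitrary choice.
- `network_transfer_days` is a hypothetical helper name.

```python
def network_transfer_days(data_terabytes, link_mbps, utilization=0.8):
    """Estimate the days needed to send data_terabytes over a link_mbps network link.

    Assumes decimal units and that only `utilization` of the link carries this transfer.
    """
    bits = data_terabytes * 10**12 * 8
    effective_bits_per_second = link_mbps * 10**6 * utilization
    return bits / effective_bits_per_second / 86_400


if __name__ == "__main__":
    for tb in (10, 100, 1000):
        days = network_transfer_days(tb, link_mbps=1000)
        print(f"{tb:>5} TB over 1 Gbps at 80% utilization: {days:,.1f} days")
```

Under these assumptions, 100 TB over a dedicated 1 Gbps link takes about 11.6 days, and 1 PB takes roughly 116 days. That is the kind of gap Snowball is designed to close.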

Follow these steps:

  1. Create a Snowball data transfer job in the AWS Management Console.
  2. AWS ships the Snowball appliance to your address.
  3. When the Snowball arrives, connect it to your local network.
  4. Install the Snowball client on a computer that can access your on-premises data source.
  5. Use the Snowball client to select the file directories and transfer them to the Snowball device.
  6. Ship the device back to AWS.
  7. Once AWS receives the device, the data is transferred from the Snowball device to your S3 bucket and stored as S3 objects in their original (native) format.

Note: The Snowball client encrypts data with 256-bit AES (AES-256). Encryption keys are never shipped with the Snowball device, so a lost or intercepted device does not expose your data.

AWS Storage Gateway

Use AWS Storage Gateway to integrate legacy on-premises data processing platforms with Amazon S3 (data lakes). As a file gateway, it exposes a file share over NFS (or SMB). On-premises systems mount this share and write files to it.

  1. Files written to this mount point are converted to objects stored in Amazon S3 in their original format.
  2. Integrate applications and platforms that don't have native Amazon S3 support with Amazon S3. Examples include on-premises lab equipment, mainframe computers, databases, and data warehouses.
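The sketch below shows how a file written under the mount point corresponds to an S3 object. It assumes the file share is backed by an S3 bucket, optionally under a key prefix, and that the object key is that prefix plus the file's path relative to the share root.

The following names are hypothetical and used only for illustration:

- The helper `object_key_for`.
- The mount point `/mnt/datalake`.
- The prefix `landing/`.

```python
from pathlib import PurePosixPath


def object_key_for(file_path, mount_point, share_prefix=""):
    """Map a file written under the gateway's mount point to its S3 object key.

    Assumption: key = optional share prefix + path relative to the mount point.
    Raises ValueError if file_path is not under mount_point.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(mount_point))
    prefix = share_prefix.strip("/")
    return f"{prefix}/{relative}" if prefix else str(relative)


if __name__ == "__main__":
    # Hypothetical mount point and file written by a legacy on-premises job
    print(object_key_for("/mnt/datalake/raw/2024/01/orders.csv", "/mnt/datalake"))
    print(object_key_for("/mnt/datalake/raw/2024/01/orders.csv", "/mnt/datalake", "landing/"))
```

The legacy job just writes `orders.csv` to a directory. The gateway handles the upload, and the file's contents are stored unchanged as the object body.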

Note: This also allows data transfer from an on-premises Hadoop cluster to an S3 bucket.

Useful link

https://www.youtube.com/watch?v=QaCfOatTIDA

Conclusion

After reading this, you probably have one question:

Which service should I choose for my business requirements?

The simple answer is: it depends.

  1. If you have real-time streaming data that you want to transform, encrypt, or compress on the fly, choose Amazon Kinesis Firehose.
  2. If you have large amounts of data, up to petabytes, transferring it over the network consumes bandwidth, takes a long time, and can cost a lot. In that case, choose AWS Snowball.
  3. If you want to transfer data to Amazon S3 or Amazon FSx over SMB or NFS, choose AWS Storage Gateway. Create a gateway and join it to your Active Directory domain (needed when SMB shares authenticate through Active Directory). Then mount the gateway's file share on your existing on-premises virtual machines.
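The three rules above can be encoded as a small decision helper. This is only a sketch, and four choices in it are assumptions:

- The dataclass fields.
- The first-match rule order.
- The `bulk_threshold_tb` cut-off, which is illustrative and not an AWS recommendation.
- The name `choose_ingestion_service`, which is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IngestionNeeds:
    real_time_streaming: bool   # data arrives continuously and is transformed on the fly
    bulk_terabytes: float       # one-time volume to migrate (0 if none)
    file_protocol_access: bool  # existing on-premises systems must write files over NFS or SMB


def choose_ingestion_service(needs: IngestionNeeds, bulk_threshold_tb: float = 100.0) -> str:
    # First matching rule wins; bulk_threshold_tb is an illustrative cut-off only
    if needs.real_time_streaming:
        return "Amazon Kinesis Firehose"
    if needs.bulk_terabytes >= bulk_threshold_tb:
        return "AWS Snowball"
    if needs.file_protocol_access:
        return "AWS Storage Gateway"
    return "It depends"


if __name__ == "__main__":
    print(choose_ingestion_service(IngestionNeeds(False, 500, False)))
```

Real workloads often combine services. For example, a team might use Snowball for the initial bulk migration and Storage Gateway or Firehose for ongoing data. The first-match ordering here is just one way to express the rules.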

Senior Software Engineer by profession. Technology explorer and instructor by passion. I try to share solutions with wide audience to solve their problems.

Naman Jain