Scalability and Flexibility
With EMR, you can quickly run your module on a cluster composed of multiple instance groups. For example, you can use On-Demand Instances in one group for guaranteed processing power and Spot Instances in another group to complete your jobs faster and at lower cost. Moreover, EMR clusters can be resized at any time, so your algorithm always runs in an environment sized to its needs.
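To make the mix of On-Demand and Spot groups concrete, here is a minimal sketch that builds an instance group layout as plain Python data. The field names (`Name`, `InstanceRole`, `Market`, `InstanceType`, `InstanceCount`) follow the `InstanceGroups` structure accepted by boto3's EMR `run_job_flow` call; the helper name `build_instance_groups`, the group names, and the default instance type are assumptions chosen for illustration.

```python
from typing import Dict, List


def build_instance_groups(
    core_count: int,
    spot_task_count: int,
    instance_type: str = "m5.xlarge",  # assumed default, pick what suits your job
) -> List[Dict]:
    """Return an instance group layout mixing On-Demand and Spot capacity."""
    groups = [
        # One primary node coordinating the cluster.
        {
            "Name": "primary",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        # Core nodes on On-Demand Instances: the guaranteed processing power.
        {
            "Name": "core-on-demand",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if spot_task_count > 0:
        # Task nodes on Spot Instances: extra, cheaper capacity to finish sooner.
        groups.append(
            {
                "Name": "task-spot",
                "InstanceRole": "TASK",
                "Market": "SPOT",
                "InstanceType": instance_type,
                "InstanceCount": spot_task_count,
            }
        )
    return groups


if __name__ == "__main__":
    for group in build_instance_groups(core_count=2, spot_task_count=4):
        print(group["Name"], group["Market"], group["InstanceCount"])
```

The resulting list could be passed as `Instances={"InstanceGroups": ...}` when creating a cluster; the remaining cluster settings (release label, roles, applications) are left out of this sketch.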
Additionally, EMR lets you choose between two storage layers, HDFS and EMRFS. With HDFS, data is stored on the Core Nodes of your cluster and is not kept permanently: it is lost when the cluster terminates. With EMRFS, you use S3 as the data layer for the applications running on your cluster, so you can separate compute from storage and persist data beyond the lifecycle of the cluster.
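From an application's point of view, the choice between the two layers often comes down to the URI a job reads from or writes to. The sketch below illustrates this under the assumption that HDFS paths use the `hdfs:///` scheme and EMRFS paths use `s3://bucket/key`; the helper `data_uri` and the bucket name are hypothetical.

```python
def data_uri(layer: str, path: str, bucket: str = "") -> str:
    """Build the URI for a dataset on the chosen storage layer."""
    clean = path.lstrip("/")
    if layer == "hdfs":
        # Lives on the cluster's Core Nodes; gone once the cluster terminates.
        return f"hdfs:///{clean}"
    if layer == "emrfs":
        # Lives in S3; persists independently of the cluster.
        if not bucket:
            raise ValueError("EMRFS paths need an S3 bucket name")
        return f"s3://{bucket}/{clean}"
    raise ValueError(f"unknown storage layer: {layer!r}")


if __name__ == "__main__":
    print(data_uri("hdfs", "/tmp/intermediate"))
    print(data_uri("emrfs", "results/run-1", bucket="my-example-bucket"))
```

A common pattern is to keep intermediate data on HDFS and write final results through EMRFS, so they survive after the cluster is shut down.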