Cloud Computing: Unit-1 - Introduction To Cloud Technologies
Cloud Computing
Cloud computing is the delivery of on-demand computing services, from applications to storage and
processing power, typically over the internet and on a pay-as-you-go basis.
Cloud Computing is the delivery of computing services such as servers, storage, databases, networking,
software, analytics, intelligence, and more, over the Cloud (Internet).
Cloud Computing provides an alternative to the on-premises datacenter.
With an on-premises datacenter, we have to manage everything, such as purchasing and installing
hardware, virtualization, installing the operating system, and any other required applications, setting up
the network, configuring the firewall, and setting up storage for data.
After doing all the set-up, we become responsible for maintaining it through its entire lifecycle.
But if we choose Cloud Computing, a cloud vendor is responsible for the hardware purchase and
maintenance.
They also provide a wide variety of software and platform as a service.
We can take any required services on rent.
The cloud computing services will be charged based on usage.
The cloud environment provides an easily accessible online portal that makes it convenient for users to
manage compute, storage, network, and application resources.
1. On-demand self-service: A consumer can unilaterally provision computing capabilities, such as server
time and network storage, as needed automatically without requiring human interaction with each
service provider.
2. Broad network access: Capabilities are available over the network and accessed through standard
mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones,
tablets, laptops and workstations).
3. Resource pooling: The provider's computing resources are pooled to serve multiple consumers using a
multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned
according to consumer demand. There is a sense of location independence in that the customer
generally has no control or knowledge over the exact location of the provided resources but may be able
to specify location at a higher level of abstraction (e.g., country, state or datacenter). Examples of
resources include storage, processing, memory and network bandwidth.
4. Rapid elasticity: Capabilities can be elastically provisioned and released, in some cases automatically, to
scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available
for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.
5. Measured service: Cloud systems automatically control and optimize resource use by leveraging a
metering capability at some level of abstraction appropriate to the type of service (e.g., storage,
processing, bandwidth and active user accounts). Resource usage can be monitored, controlled and
reported, providing transparency for the provider and consumer.
Vendor Lock-in: Although the cloud service providers assure you that they will allow you to switch or
migrate to any other service provider whenever you want, it is a very difficult process. You will find it
complex to migrate all the cloud services from one service provider to another. During migration, you
might end up facing compatibility, interoperability and support issues. To avoid these issues, many
customers choose not to change the vendor.
Technical issues: Even if you are a tech whiz, technical issues can occur, and not everything can be
resolved in-house. To avoid interruptions, you will need to contact your service provider for support.
However, not every vendor provides 24/7 support to their clients.
Cloud Orchestration
Cloud Orchestration is a way to manage, co-ordinate, and provision all the components of a cloud
platform automatically from a common interface.
It orchestrates the physical as well as virtual resources of the cloud platform.
Cloud orchestration is a must because cloud services scale up arbitrarily and dynamically, include
fulfillment assurance and billing, and require workflows in various business and technical domains.
Orchestration tools combine automated tasks by interconnecting the processes running across the
heterogeneous platforms in multiple locations.
Orchestration tools create declarative templates to convert the interconnected processes into a single
workflow.
The processes are orchestrated so that creating a new environment becomes a workflow that can be
triggered with a single API call.
Creation of these declarative templates, though complex and time consuming, is simplified by the
orchestration tools.
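To make the "single API call" idea concrete, here is a minimal Python sketch using boto3 and AWS CloudFormation as one possible orchestration engine; the stack name, template body, and AMI ID are hypothetical placeholders, not a prescribed implementation.

# Minimal sketch: creating a whole environment from a declarative template
# with one API call, using boto3 and AWS CloudFormation (hypothetical
# template and names; other orchestration engines work similarly).
import boto3

cloudformation = boto3.client("cloudformation", region_name="us-east-1")

# A declarative template describing the desired resources (here, one EC2 instance).
template_body = """
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-12345678        # hypothetical AMI ID
      InstanceType: t2.micro
"""

# One API call turns the template into a running environment (a "stack").
response = cloudformation.create_stack(
    StackName="sample-environment",
    TemplateBody=template_body,
)
print("Stack creation started:", response["StackId"])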
Cloud orchestration includes two types of models:
o Single Cloud model
o Multi-cloud model
In the single-cloud model, all the applications designed for a system run on the same IaaS platform (the
same cloud service provider).
In the multi-cloud model, applications belonging to the same organization run on different cloud
platforms but are interconnected to create a single workflow.
Applications designed for the same system may have different IaaS requirements, which leads the
organization to use the services of multiple cloud service providers.
For example, an application holding patients' sensitive medical data might reside on one IaaS, whereas
the application for online OPD appointment booking might reside on another IaaS, yet they are
interconnected to form one system. This is called multi-cloud orchestration.
Multi-cloud models provide higher redundancy than single-IaaS deployments.
This reduces the risk of downtime.
Elasticity in Cloud
Elasticity covers not only the ability to scale up but also the ability to scale down.
The idea is that you can quickly provision new infrastructure to handle a high load of traffic.
But what happens after that rush? If you leave all of these new instances running, your bill will skyrocket
as you will be paying for unused resources.
In the worst case scenario, these resources can even cancel out revenue from the sudden rush.
An elastic system prevents this from happening. After a scaled up period, your infrastructure can scale
back down, meaning you will only be paying for your usual resource usage and some extra for the high
traffic period.
The key is that this all happens automatically.
When resource needs cross a certain threshold (usually measured by traffic), the system "knows" that it
needs to de-provision a certain amount of infrastructure, and does so.
With a couple hours of training, anyone can use the AWS web console to manually add or subtract
instances.
But it takes a true Solutions Architect to set up monitoring, account for provisioning time, and configure
a system for maximum elasticity.
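The following is a purely illustrative Python sketch of that threshold logic; the thresholds and instance limits are made-up values, and a real deployment would rely on a managed service such as AWS Auto Scaling rather than hand-rolled code.

# Conceptual sketch of elastic scaling: when a monitored metric crosses a
# threshold, instances are provisioned or de-provisioned automatically.
# (Illustrative only; real systems use services such as AWS Auto Scaling.)

def desired_instance_count(current_count, avg_cpu_percent,
                           scale_out_at=70, scale_in_at=30,
                           min_count=1, max_count=10):
    """Return how many instances we should be running."""
    if avg_cpu_percent > scale_out_at and current_count < max_count:
        return current_count + 1   # scale out for the traffic rush
    if avg_cpu_percent < scale_in_at and current_count > min_count:
        return current_count - 1   # scale back in to stop paying for idle capacity
    return current_count           # load is within the normal band

print(desired_instance_count(current_count=4, avg_cpu_percent=85))  # -> 5
print(desired_instance_count(current_count=4, avg_cpu_percent=12))  # -> 3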
IaaS – Network:
There are two major network services offered by public cloud service providers:
1. load balancing and
2. DNS (Domain Name System).
Load balancing provides a single point of access to multiple servers that run behind it. A load balancer is
a network device that distributes network traffic among servers using specific load balancing algorithms.
DNS is a hierarchical naming system for computers, or any other naming devices that use IP addressing
for network identification – a DNS system associates domain names with IP addresses.
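As a toy illustration of what a load balancer does, the sketch below distributes incoming requests across a few hypothetical backend servers using a round-robin algorithm.

# Toy sketch of the load-balancing idea: a single entry point distributes
# incoming requests across several backend servers using a round-robin
# algorithm (hypothetical server addresses).
import itertools

backends = ["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"]
rotation = itertools.cycle(backends)

def pick_backend():
    """Return the next server in the rotation for the incoming request."""
    return next(rotation)

for request_id in range(6):
    print(f"request {request_id} -> {pick_backend()}")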
Issues of SaaS
Data security
When it comes to migrating traditional local software applications to a cloud based platform, data
security may be a problem.
When a computer or application is compromised, a multi-tenant SaaS application supporting many
customers could be exposed to hackers.
Any provider will promise to do its best to keep the data secure under all circumstances.
But just to make sure, you should ask about their infrastructure and application security.
Data control
Many businesses have no idea how their SaaS provider will secure their data or what backup procedures
will be applied when needed.
To avoid undesirable effects, before choosing a SaaS vendor, managers should look for providers with
good reputations and verify that the vendor has backup solutions which are precisely described in the
Service Level Agreement contract.
Data location
This means being permanently aware of where exactly in the world your data is located.
Although the Federal Information Security Management Act in the USA requires customers to keep
sensitive data within the country, in virtualized systems, data can move dynamically from one country to
another.
Ask about the laws that apply to your customers' data with respect to where they are located.
1. Private Cloud
2. Community Cloud
3. Public Cloud
4. Hybrid Cloud
1. Private Cloud
2. Community Cloud
The cloud infrastructure is shared by several organizations and supports a specific community that has
shared concerns (e.g., mission, security requirements, policy, and compliance considerations).
Government departments, universities, central banks etc. often find this type of cloud useful.
Community cloud also has two possible scenarios:
3. Public Cloud
4. Hybrid Cloud
Eucalyptus
Eucalyptus is an open source software platform for implementing Infrastructure as a Service (IaaS) in a
private or hybrid cloud computing environment.
The Eucalyptus cloud platform pools together existing virtualized infrastructure to create cloud
resources for infrastructure as a service, network as a service and storage as a service.
The name Eucalyptus is an acronym for Elastic Utility Computing Architecture for Linking Your Programs
to Useful Systems.
Eucalyptus was founded out of a research project in the Computer Science Department at the University
of California, Santa Barbara, and became a for-profit business called Eucalyptus Systems in 2009.
Eucalyptus Systems announced a formal agreement with Amazon Web Services (AWS) in March 2012,
allowing administrators to move instances between a Eucalyptus private cloud and the Amazon Elastic
Compute Cloud (EC2) to create a hybrid cloud.
The partnership also allows Eucalyptus to work with Amazon’s product teams to develop unique AWS-
compatible features.
Eucalyptus features
Supports both Linux and Windows virtual machines (VMs).
Application program interface (API) compatible with the Amazon EC2 platform.
Compatible with Amazon Web Services (AWS) and Simple Storage Service (S3).
Works with multiple hypervisors including VMware, Xen and KVM.
Can be installed and deployed from source code or DEB and RPM packages.
Internal processes communications are secured through SOAP and WS-Security.
Multiple clusters can be virtualized as a single cloud.
Administrative features such as user and group management and reports.
Outages
Performance is a consistent challenge in cloud computing, particularly for businesses that rely on cloud
providers to help them run mission-critical applications. When a business moves to the cloud it becomes
dependent on the cloud provider, meaning that any outages suffered by the cloud provider also affect
the business.
The risk of outages in the cloud is not negligible—even the major players in cloud computing are
susceptible. In February 2017, an AWS Amazon S3 outage caused disruptions for many websites and
applications, and even sent them offline.
There is a need, therefore, for some kind of site recovery solution for data held in cloud-based services.
Disaster recovery as a service (DRaaS)—the replication and hosting of servers by a third party to provide
failover in the event of a man-made or natural catastrophe—is a way companies can maintain business
continuity even when disaster strikes.
Expertise
The success of any movement towards cloud adoption comes down to the expertise at your disposal.
The complexity of cloud technology and the sheer range of tools makes it difficult to keep up with the
options available for all your use cases.
Organizations need to strike a balance between having the right expertise and the cost of hiring
dedicated cloud specialists. The optimum solution to this challenge is to work with a trusted cloud
Managed Service Provider (MSP). Cloud MSPs have the manpower, tools and experience to manage
multiple and complex customer environments simultaneously. The MSP takes complete responsibility for
cloud processes and implementing them as the customer desires. This way, organizations can stay
focused on their business goals.
Cost Management
All the main cloud providers have quite detailed pricing plans for their services that explicitly define costs
of processing and storage data in the cloud. The problem is that cost management is often an issue
when using cloud services because of the sheer range of options available.
Businesses often waste money on unused workloads or unnecessarily expensive storage; 26 percent of
respondents in one cloud survey cited cost management as a major challenge in the cloud. The
solution is for organizations to monitor their cloud usage in detail and constantly optimize their choice of
services, instances, and storage. You can monitor and optimize cloud implementation by using a cloud
cost management tool such as CloudHealth or consulting a cloud cost expert.
There are also some practical cost calculators available which clarify cloud costs, including Amazon’s
AWS Simple Monthly Calculator, and NetApp’s calculators for both AWS and Azure cloud storage.
Governance
Cloud governance, meaning the set of policies and methods used to ensure data security and privacy in
the cloud, is a huge challenge. Confusion often arises about who takes responsibility for data stored in
the cloud, who should be allowed use cloud resources without first consulting IT personnel, and how
employees handle sensitive data.
The only solution is for the IT department at your organization to adapt its existing governance and
control processes to incorporate the cloud and ensure everyone is on the same page. This way, proper
governance, compliance, and risk management can be enforced.
Click on Select a platform next to Predefined configuration, then select “Your Platform”. Next, click on
the drop-down menu next to Environment type, then select Single instance.
Under Source, select the Upload your own option, then click Choose File to select the “Your-sample-app-
v1.zip” file we downloaded earlier.
Fill in the values for Environment name with “YourSampleApp-env”. For Environment URL, fill in a
globally unique value since this will be your public-facing URL; we will use “YourSampleApp-env” in this
tutorial, so please choose something different from this one. Lastly, fill Description with “Your Sample
App”. For the Environment URL, make sure to click Check availability to make sure that the URL is not
taken. Click Next to continue.
Check the box next to Create this environment inside a VPC. Click Next to continue.
On the Configuration Details step, you can set configuration options for the instances in your stack. Click
Next.
On the Environment Tags step, you can tag all the resources in your stack. Click Next.
On the VPC Configuration step, select the first AZ listed by checking the box under the EC2 column. Your
list of AZs may look different from the one shown, as Regions can have different numbers of AZs. Click
Next.
At the Permissions step, leave everything at its default values, then click Next to continue. Then
review your environment configuration on the next screen and then click Launch to deploy your
application.
At the top of the page, you should see a URL field, with a value that contains the Environment URL you
specified in step 3. Click on this URL field, and you should see a Congratulations page.
Congratulations! You have successfully launched a sample PHP application using AWS Elastic Beanstalk.
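The same deployment can also be scripted. Below is a rough boto3 sketch of the steps performed above in the console; the application name, S3 bucket/key, and solution stack string are placeholders (list_available_solution_stacks() returns the valid stack names), and the sample zip is assumed to already be uploaded to S3.

# Rough boto3 sketch of the Elastic Beanstalk deployment done above in the
# console. Bucket, key, names and the solution stack string are hypothetical.
import boto3

eb = boto3.client("elasticbeanstalk", region_name="us-east-1")

eb.create_application(ApplicationName="YourSampleApp")

# The sample app zip is assumed to have been uploaded to S3 already.
eb.create_application_version(
    ApplicationName="YourSampleApp",
    VersionLabel="v1",
    SourceBundle={"S3Bucket": "my-deploy-bucket", "S3Key": "Your-sample-app-v1.zip"},
)

eb.create_environment(
    ApplicationName="YourSampleApp",
    EnvironmentName="YourSampleApp-env",
    VersionLabel="v1",
    CNAMEPrefix="yoursampleapp-env-unique",   # must be globally unique
    SolutionStackName="64bit Amazon Linux 2 v3.5.0 running PHP 8.0",  # placeholder
)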
Virtualization
Virtualization means changing the mindset from physical to logical.
Fig. : Virtualization
What virtualization means is creating more logical IT resources, called virtual systems, within one
physical system. That’s called system virtualization.
System virtualization most commonly uses a hypervisor to manage the resources of each virtual system.
The hypervisor is software that virtualizes the hardware resources.
Benefits of Virtualization
More flexible and efficient allocation of resources.
Enhances development productivity.
It lowers the cost of IT infrastructure.
Remote access and rapid scalability.
High availability and disaster recovery.
Pay per use of the IT infrastructure on demand.
Enables running multiple operating systems.
Types of Virtualization
1. Application Virtualization:
Application virtualization helps a user gain remote access to an application from a server.
The server stores all personal information and other characteristics of the application, but the
application can still run on a local workstation through the internet.
An example of this would be a user who needs to run two different versions of the same software.
Technologies that use application virtualization are hosted applications and packaged applications.
2. Network Virtualization:
The ability to run multiple virtual networks, each with a separate control and data plane.
They co-exist together on top of one physical network.
They can be managed by individual parties that are potentially confidential to each other.
Network virtualization provides a facility to create and provision virtual networks (logical switches,
routers, firewalls, load balancers, Virtual Private Networks (VPNs), and workload security) within days or
even weeks.
3. Desktop Virtualization:
Desktop virtualization allows the users’ OS to be remotely stored on a server in the data center.
It allows the user to access their desktop virtually, from any location by different machine.
Users who want specific operating systems other than Windows Server will need to have a virtual
desktop.
Main benefits of desktop virtualization are user mobility, portability, and easy management of software
installation, updates and patches.
4. Storage Virtualization:
Storage virtualization is an array of servers that are managed by a virtual storage system.
The servers aren’t aware of exactly where their data is stored, and instead function more like worker
bees in a hive.
It allows storage from multiple sources to be managed and utilized as a single repository.
Storage virtualization software maintains smooth operations, consistent performance and a continuous
suite of advanced functions despite changes, breakdowns and differences in the underlying equipment.
Full Virtualization
A virtual machine simulates hardware to allow an unmodified guest OS to be run in isolation.
There are two types of full virtualization in the enterprise market.
1. Software assisted full virtualization
2. Hardware-assisted full virtualization
In both types of full virtualization, the guest operating system's source code is not modified.
Paravirtualization
Paravirtualization works differently from the full virtualization.
It doesn’t need to simulate the hardware for the virtual machines.
The hypervisor is installed on a physical server (host) and a guest OS is installed into the environment.
Virtual guests are aware that they have been virtualized, unlike in full virtualization (where the guest
doesn't know that it has been virtualized), and take advantage of this to use hypervisor functions.
In this virtualization method, the guest source code is modified so that sensitive instructions are
replaced with calls to the host (hypervisor).
Guest Operating systems require extensions to make API calls to the hypervisor.
In full virtualization, guests issue hardware calls, but in paravirtualization, guests communicate directly
with the host (hypervisor) using drivers.
Here is a list of products which support paravirtualization.
o Xen
o IBM LPAR
o Oracle VM for SPARC (LDOM)
o Oracle VM for X86 (OVM)
OS level Virtualization
Operating system-level virtualization is widely used.
It is also known as “containerization”.
The host operating system kernel allows multiple isolated user spaces, also known as instances.
In OS-level virtualization, unlike other virtualization technologies, there is very little or no overhead,
since it uses the host operating system kernel for execution.
Oracle Solaris Zones is one of the well-known container technologies in the enterprise market.
Here is the list of other containers.
o Linux LXC
o Docker
o AIX WPAR
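As a small illustration of containerization, the sketch below uses the Docker SDK for Python to start an isolated container that shares the host kernel; it assumes a running Docker daemon and the docker package installed, and the image/command are arbitrary examples.

# Minimal sketch of OS-level virtualization (containerization) using the
# Docker SDK for Python: an isolated user space shares the host kernel.
# Requires a running Docker daemon and `pip install docker`.
import docker

client = docker.from_env()

# Each container is an isolated "instance" on top of the same host kernel.
output = client.containers.run("alpine", "uname -r", remove=True)
print("Kernel seen inside the container:", output.decode().strip())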
Virtual computing
Virtual computing refers to the use of a remote computer from a local computer where the actual
computer user is located.
For example, a user at a home computer could log in to a remote office computer (via the Internet or a
network) to perform job tasks.
Once logged in via special software, the remote computer can be used as though it were at the user's
location, allowing the user to perform tasks via the keyboard, mouse, or other tools.
Virtual Machine
A virtual machine (VM) is an operating system (OS) or application environment that is installed on
software, which reproduces dedicated hardware. The end user has the same experience on a virtual
machine as they would have on dedicated hardware.
2. Compute Optimized
This family includes the C1 and CC2 instance types, and is geared towards applications that benefit from
high compute power.
Compute-optimized VM types have a higher ratio of virtual CPUs to memory than other families but
share the NCs (Node Controllers) with non-optimized ones.
It is recommended to use these types if you are running any CPU-bound scale-out applications.
CC2 instances provide high core count (32 virtual CPUs) and support for cluster networking.
C1 instances are available in smaller sizes and are ideal for scaled-out applications at massive scale.
3. Memory Optimized
This family includes the CR1 and M2 VM types and is designed for memory-intensive applications.
It is recommended to use these VM types for performance-sensitive database workloads, where your
application is memory-bound.
CR1 VM types provide more memory and faster CPU than do M2 types.
CR1 instances also support cluster networking for bandwidth intensive applications.
M2 types are available in smaller sizes, and are an excellent option for many memory-bound
applications.
4. Micro
This Micro family contains the T1 VM type.
The T1 micro provides a small amount of consistent CPU resources and allows you to increase CPU
capacity in short bursts when additional cycles are available.
It is recommended to use this type of VM for lower throughput applications like a proxy server or
administrative applications, or for low-traffic websites that occasionally require additional compute
cycles. It is not recommended for applications that require sustained CPU performance.
Load Balancing
In computing, load balancing improves the distribution of workloads across multiple computing
resources, such as computers, a computer cluster, network links, central processing units, or disk drives.
Hypervisors
It is the part of the private cloud that manages the virtual machines, i.e. it is the part (program) that
enables multiple operating systems to share the same hardware.
Each operating system could use all the hardware (processor, memory, etc.) if no other operating system
were running; that is the maximum hardware available to one operating system in the cloud.
Nevertheless, the hypervisor is what controls and allocates the portion of hardware resources each
operating system gets, so that every one of them gets what it needs and they do not disrupt each other.
Machine Imaging
Machine imaging is a process used to achieve system portability and to provision and deploy systems in
the cloud by capturing the state of a system in a system image.
A system image makes a copy or a clone of the entire computer system inside a single file.
The image is made using a program called a system imaging program and can be used later to restore
the system.
For example, an Amazon Machine Image (AMI) is a system image used in cloud computing.
Amazon Web Services uses AMIs to store copies of a virtual machine.
An AMI is a file system image that contains an operating system, all device drivers, and any applications
and state information that the working virtual machine would have.
The AMI files are encrypted and compressed for security purposes and stored in Amazon S3 (Simple
Storage Service) buckets as a set of 10 MB chunks.
Machine imaging mostly runs on virtualization platforms; for this reason, images are also called virtual
appliances, and running virtual machines are called instances.
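As a hedged sketch of machine imaging on AWS, the boto3 snippet below captures a running EC2 instance as an AMI; the instance ID and image name are hypothetical.

# Capture a running EC2 instance as an Amazon Machine Image (AMI) with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_image(
    InstanceId="i-0123456789abcdef0",   # instance whose state we want to capture
    Name="web-server-golden-image-v1",
    Description="Snapshot of the configured web server",
    NoReboot=True,                      # capture without stopping the instance
)
print("New AMI ID:", response["ImageId"])

# The AMI can later be used to launch identical instances (restore the image).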
Because many users share clouds, the cloud helps you track information about images, such as
ownership, history, and so on.
The IBM SmartCloud Enterprise knows what organization you belong to when you log in.
You can choose whether to keep images private, exclusively for your own use, or to share with other
users in your organization.
If you are an independent software vendor, you can also add your images to the public catalog.
AWS History
The AWS platform was launched in July 2002.
In its early stages, the platform consisted of only a few disparate tools and services.
Then in late 2003, the AWS concept was publicly reformulated when Chris Pinkham and Benjamin Black
presented a paper describing a vision for Amazon's retail computing infrastructure that was completely
standardized, completely automated, and would rely extensively on web services for services such as
storage and would draw on internal work already underway.
Near the end of their paper, they mentioned the possibility of selling access to virtual servers as a
service, proposing the company could generate revenue from the new infrastructure investment.
In November 2004, the first AWS service launched for public usage: Simple Queue Service (SQS).
Thereafter Pinkham and lead developer Christopher Brown developed the Amazon EC2 service, with a
team in Cape Town, South Africa.
Amazon Web Services was officially re-launched on March 14, 2006, combining the three initial service
offerings of Amazon S3 cloud storage, SQS, and EC2.
The AWS platform finally provided an integrated suite of core online services, as Chris Pinkham and
Benjamin Black had proposed back in 2003, as a service offered to other developers, web sites, client-
side applications, and companies.
Andy Jassy, AWS founder and vice president in 2006, said at the time that Amazon S3 (one of the first
and most scalable elements of AWS) helps free developers from worrying about where they are going to
store data, whether it will be safe and secure, if it will be available when they need it, the costs
associated with server maintenance, or whether they have enough storage available.
Amazon S3 enables developers to focus on innovating with data, rather than figuring out how to store it.
In 2016 Jassy was promoted to CEO of the division.
Reflecting the success of AWS, his annual compensation in 2017 hit nearly $36 million.
In 2014, AWS launched its partner network entitled APN (AWS Partner Network) which is focused on
helping AWS-based companies grow and scale the success of their business with close collaboration and
best practices.
To support industry-wide training and skills standardization, AWS began offering a certification program
for computer engineers, on April 30, 2013, to highlight expertise in cloud computing.
In January 2015, Amazon Web Services acquired Annapurna Labs, an Israel-based microelectronics
company reputedly for US$350–370M.
James Hamilton, an AWS engineer, wrote a retrospective article in 2016 to highlight the ten-year history
of the online service from 2006 to 2016. As an early fan and outspoken proponent of the technology, he
had joined the AWS engineering team in 2008.
In January 2018, Amazon launched an auto scaling service on AWS.
In November 2018, AWS announced customized ARM cores for use in its servers.
Also in November 2018, AWS announced it is developing ground stations to communicate with
customers' satellites.
AWS Infrastructure
Amazon Web Services (AWS) is a global public cloud provider, and as such, it has to have a global
network of infrastructure to run and manage its many growing cloud services that support customers
around the world.
Now we’ll take a look at the components that make up the AWS Global Infrastructure.
1) Availability Zones (AZs)
2) Regions
3) Edge Locations
4) Regional Edge Caches
If you are deploying services on AWS, you'll want to have a clear understanding of each of these
components, how they are linked, and how you can use them within your solution to your maximum
benefit. Let's take a closer look.
2) Regions
A Region is a collection of Availability Zones that are geographically located close to one another.
This is generally indicated by AZs within the same city. AWS has deployed them across the globe to allow
its worldwide customer base to take advantage of low latency connections.
Each Region will act independently of the others, and each will contain at least two Availability Zones.
Example: if an organization based in London was serving customers throughout Europe, there would be
no logical sense to deploy services in the Sydney Region simply due to the latency response times for its
customers. Instead, the company would select the region most appropriate for them and their customer
base, which may be the London, Frankfurt, or Ireland Region.
Having global regions also allows for compliance with regulations, laws, and governance relating to data
storage (at rest and in transit).
Example: you may be required to keep all data within a specific location, such as Europe. Having multiple
regions within this location allows an organization to meet this requirement.
Just as utilizing multiple AZs within a Region creates a level of high availability, the same can be achieved
by utilizing multiple Regions.
You may want to use multiple regions if you are a global organization serving customers in different
countries that have specific laws and governance about the use of data.
In this case, you could even connect different VPCs together in different regions.
The number of regions is increasing year after year as AWS works to keep up with the demand for cloud
computing services.
As of July 2017, there were 16 Regions and 43 Availability Zones, with 4 more Regions and 11 more AZs
planned.
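The exact counts change over time, so a quick way to check is to query AWS directly. The boto3 sketch below lists the Regions visible to an account and the AZs in one Region (eu-west-1 is just an example).

# List Regions and the Availability Zones of one Region with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]
print(f"{len(regions)} regions:", regions)

zones = ec2.describe_availability_zones()["AvailabilityZones"]
print("AZs in eu-west-1:", [z["ZoneName"] for z in zones])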
3) Edge Locations
Edge Locations are AWS sites deployed in major cities and highly populated areas across the globe. They
far outnumber the Availability Zones.
While Edge Locations are not used to deploy your main infrastructures such as EC2 instances, EBS
storage, VPCs, or RDS resources like AZs, they are used by AWS services such as AWS CloudFront and
AWS Lambda@Edge (currently in Preview) to cache data and reduce latency for end user access by using
the Edge Locations as a global Content Delivery Network (CDN).
As a result, Edge Locations are primarily used by end users who are accessing and using your services.
For example, you may have your website hosted on EC2 instances and S3 (your origin) within the Ohio
region with a configured CloudFront distribution associated. When a user accesses your website from
Europe, they would be redirected to their closest Edge Location (in Europe), where cached copies of your
website's data could be read, significantly reducing latency.
A Regional Edge Cache has a larger cache-width than each of the individual Edge Locations, and because
data expires from the cache at the Edge Locations, the data is retained at the Regional Edge Caches.
Therefore, when data is requested at the Edge Location that is no longer available, the Edge Location can
retrieve the cached data from the Regional Edge Cache instead of the Origin servers, which would have a
higher latency.
AWS Services
This AWS services list covers the huge catalog of services offered by Amazon Web Services (AWS). These
services range from the core compute products like EC2 to newer releases like AWS Deepracer for
machine learning.
There are currently 190 unique services provided by AWS, which are divided into the 24 categories
listed below:
o Analytics
o Application Integration
o AR & VR
o AWS Cost Management
o Blockchain
o Business Applications
o Compute
o Customer Engagement
o Database
o Developer Tools
o End User Computing
o Game Tech
o Internet of Things
o Machine Learning
o Management & Governance
o Media Services
o Migration & Transfer
o Mobile
o Networking & Content Delivery
o Robotics
o Satellite
o Security, Identity, & Compliance
o Storage
o Quantum Technologies
AWS Ecosystem
In general a cloud ecosystem is a complex system of interdependent components that all work together
to enable cloud services. In cloud computing, the ecosystem consists of hardware and software as well
as cloud customers, cloud engineers, consultants, integrators and partners.
Amazon Web Services (AWS) is the market leader in IaaS (Infrastructure-as-a-Service) and PaaS
(Platform-as-a-Service) for cloud ecosystems, which can be combined to create a scalable cloud
application without worrying about delays related to infrastructure provisioning (compute, storage, and
network) and management.
With AWS you can select the specific solutions you need, and only pay for exactly what you use,
resulting in lower capital expenditure and faster time to value without sacrificing application
performance or user experience.
New and existing companies can build their digital infrastructure partially or entirely in the cloud with
AWS, making the on premise data center a thing of the past.
The AWS cloud ensures infrastructure reliability, compliance with security standards, and the ability to
instantly grow or shrink your infrastructure to meet your needs and maximize your budget, all without
upfront investment in equipment.
API Deployment
A point-in-time snapshot of your API Gateway API. To be available for clients to use, the deployment must
be associated with one or more API stages.
API Developer
Your AWS account that owns an API Gateway deployment (for example, a service provider that also
supports programmatic access).
API Endpoint
A hostname for an API in API Gateway that is deployed to a specific Region. The hostname is of the form
{api-id}.execute-api.{region}.amazonaws.com. The following types of API endpoints are supported:
o Edge-optimized API endpoint
The default hostname of an API Gateway API that is deployed to the specified Region while using a
CloudFront distribution to facilitate client access typically from across AWS Regions. API requests
are routed to the nearest CloudFront Point of Presence (POP), which typically improves connection
time for geographically diverse clients.
o Private API endpoint
An API endpoint that is exposed through interface VPC endpoints and allows a client to securely
access private API resources inside a VPC. Private APIs are isolated from the public internet, and
they can only be accessed using VPC endpoints for API Gateway that have been granted access.
o Regional API endpoint
The host name of an API that is deployed to the specified Region and intended to serve clients,
such as EC2 instances, in the same AWS Region. API requests are targeted directly to the Region-
specific API Gateway API without going through any CloudFront distribution. For in-Region
requests, a Regional endpoint bypasses the unnecessary round trip to a CloudFront distribution.
API Key
An alphanumeric string that API Gateway uses to identify an app developer who uses your REST or
WebSocket API. API Gateway can generate API keys on your behalf, or you can import them from a CSV
file. You can use API keys together with Lambda authorizers or usage plans to control access to your APIs.
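As an illustrative client call, the sketch below sends a request to a deployed API endpoint of the form {api-id}.execute-api.{region}.amazonaws.com and passes an API key in the x-api-key header; the api-id, stage, resource path, and key value are all hypothetical placeholders.

# Hypothetical client call to a deployed API Gateway endpoint with an API key.
import requests

API_ENDPOINT = "https://abc123xyz.execute-api.us-east-1.amazonaws.com/prod/items"
API_KEY = "your-api-key-value"   # generated by or imported into API Gateway

response = requests.get(API_ENDPOINT, headers={"x-api-key": API_KEY})
print(response.status_code)
print(response.json())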
WebSocket Connection
API Gateway maintains a persistent connection between clients and API Gateway itself. There is no
persistent connection between API Gateway and backend integrations such as Lambda functions. Backend
services are invoked as needed, based on the content of messages received from clients.
Web Services
You can choose from a couple of different schools of thought for how web services should be delivered.
The older approach, SOAP (short for Simple Object Access Protocol), had widespread industry support,
complete with a comprehensive set of standards.
Those standards were too comprehensive, unfortunately. The people designing SOAP set it up to be
extremely flexible — it can communicate across the web, e-mail, and private networks.
To ensure security and manageability, a number of supporting standards that integrate with SOAP were
also defined.
SOAP is based on a document encoding standard known as Extensible Markup Language (XML, for short),
and the SOAP service is defined in such a way that users can then leverage XML no matter what the
underlying communication network is.
For this system to work, though, the data transferred by SOAP (commonly referred to as the payload) also
needs to be in XML format.
Notice a pattern here? The push to be comprehensive and flexible (or, to be all things to all people) plus
the XML payload requirement meant that SOAP ended up being quite complex, making it a lot of work to
use properly.
As you might guess, many IT people found SOAP daunting and, consequently, resisted using it.
About a decade ago, a doctoral student defined another web services approach as part of his thesis: REST,
or Representational State Transfer.
REST, which is far less comprehensive than SOAP, aspires to solve fewer problems.
It doesn’t address some aspects of SOAP that seemed important but that, in retrospect, made it more
complex to use — security, for example.
The most important aspect of REST is that it’s designed to integrate with standard web protocols so that
REST services can be called with standard web verbs and URLs.
For example, a valid REST call looks like this:
http://search.examplecompany.com/CompanyDirectory/EmployeeInfo?empname=BernardGolden
That’s all it takes to make a query to the REST service of examplecompany to see my personnel
information.
The HTTP verb that accompanies this request is GET, asking for information to be returned.
To delete information, you use the verb DELETE.
To insert my information, you use the verb POST.
To update my information, you use the verb PUT.
For the POST and PUT actions, additional information would accompany the empname and be separated
by an ampersand (&) to indicate another argument to be used by the service.
REST imposes no particular formatting requirements on the service payloads. In this respect, it differs from
SOAP, which requires XML.
For simple interactions, a string of bytes is all you need for the payload. For more complex interactions
(say, in addition to returning my employee information, I want to place a request for the employee
information of all employees whose names start with G), the encoding convention JSON is used.
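A short Python sketch of these REST interactions is shown below, using the requests library; the URL is the illustrative one from the text (it will not actually resolve), and the extra parameters and JSON payload are hypothetical.

# Sketch of the REST verbs described above, using Python's requests library.
import requests

BASE = "http://search.examplecompany.com/CompanyDirectory/EmployeeInfo"

# GET: ask the service for a record.
resp = requests.get(BASE, params={"empname": "BernardGolden"})
print(resp.status_code, resp.text)

# POST: insert a record; extra arguments ride along with empname, and a JSON
# body can carry a more complex payload since REST imposes no format.
resp = requests.post(
    BASE,
    params={"empname": "BernardGolden", "department": "IT"},
    json={"title": "Author", "location": "Remote"},
)

# DELETE: remove the record.
requests.delete(BASE, params={"empname": "BernardGolden"})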
As you might expect, REST’s simpler use model, its alignment with standard web protocols and verbs, and
its less restrictive payload formatting made it catch on with developers like a house on fire.
AWS originally launched with SOAP support for interactions with its API, but it has steadily deprecated its
SOAP interface in favor of REST.
For example, if your bucket is in the South America (São Paulo) Region, you must use the http://s3-sa-
east-1.amazonaws.com/bucket endpoint. If your bucket is in the US East (N. Virginia) Region, you must
use the http://s3.amazonaws.com/bucket endpoint.
Matching Operations
An inbound service interface operation matches an outbound service interface operation (and the other
way around) if the following conditions are met:
o Both operations must have the same mode (asynchronous or synchronous).
o Both operations must have the same Operation Pattern.
o The message type for the request, which must be referenced by each operation, must have the same
name and same XML Namespace. The names of the operations may differ. The same applies for the
response with synchronous communication.
o If the inbound service interface operation references a fault message type, the outbound service
interface operation must also reference a fault message type with the same name and XML
Namespace.
o The data types of the message types, which the outbound service interface for the request message
references (and, if necessary, for the response and fault message) must be compatible with the
corresponding inbound service interface data types.
The data types are compared using the same method as other objects: The structures are compatible if
they contain the same fields (elements and attributes) and if these fields have compatible types,
frequencies, details, and default values.
There are, however, a few constraints; for example, the target structure can contain attributes or
elements that do not appear in the outbound structure, but only if these are not required, that is, where
the frequency is optional or prohibited (attributes) or minOccurs=0 (elements).
o The data structures compared must both be correct, because not all facets are considered in the
compatibility check.
o Some XSD schema language elements that can appear in a reference to an external message in the data
structure are not supported. These include, for example, the elements redefine and any, as well as the
attributes blockDefault, finalDefault, and substitutionGroup.
o The comparison of structures is, for example, restricted in the following ways:
The details whiteSpace and pattern are not checked.
If the facet pattern is used for the outbound structure field, all the other details are not checked.
If the order of sub-elements is different between the outbound and target fields, a warning is
displayed.
o Create Buckets – Create and name a bucket that stores data. Buckets are the fundamental container
in Amazon S3 for data storage.
o Store data in Buckets – Store an infinite amount of data in a bucket. Upload as many objects as you
like into an Amazon S3 bucket. Each object can contain up to 5 TB of data. Each object is stored and
retrieved using a unique developer-assigned key.
o Download data – Download your data any time you like or allow others to do the same.
o Permissions – Grant or deny access to others who want to upload or download data into your Amazon
S3 bucket.
o Standard interfaces – Use standards-based REST and SOAP interfaces designed to work with any
Internet-development toolkit.
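A hedged boto3 sketch of these operations is shown below; the bucket and key names are hypothetical, and a real bucket name must be globally unique.

# Create a bucket, store an object, download it, and set an object ACL.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

s3.create_bucket(Bucket="my-example-bucket-12345")

# Store data: each object is addressed by a developer-assigned key.
s3.put_object(Bucket="my-example-bucket-12345", Key="notes/unit1.txt",
              Body=b"Cloud computing notes")

# Download data.
obj = s3.get_object(Bucket="my-example-bucket-12345", Key="notes/unit1.txt")
print(obj["Body"].read().decode())

# Permissions: object ACLs are one (coarse) way to grant or deny access.
s3.put_object_acl(Bucket="my-example-bucket-12345", Key="notes/unit1.txt",
                  ACL="public-read")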
Amazon Glacier enables customers to offload the administrative burdens of operating and scaling storage
to AWS, so that they don’t have to worry about capacity planning, hardware provisioning, data replication,
hardware failure detection and repair, or time-consuming hardware migrations.
Amazon Glacier enables any business or organization to easily and cost effectively retain data for months,
years, or decades.
With Amazon Glacier, customers can now cost effectively retain more of their data for future analysis or
reference, and they can focus on their business rather than operating and maintaining their storage
infrastructure.
Customers seeking compliance storage can deploy compliance controls using Vault Lock to meet
regulatory and compliance archiving requirements.
4. Low Cost
Amazon Glacier is designed to be the lowest cost AWS object storage class, allowing you to archive large
amounts of data at a very low cost. This makes it feasible to retain all the data you want for use cases like
data lakes, analytics, IoT, machine learning, compliance, and media asset archiving. You pay only for what
you need, with no minimum commitments or up-front fees.
6. Query in Place
Amazon Glacier is the only cloud archive storage service that allows you to query data in place and retrieve
only the subset of data you need from within an archive. Amazon Glacier Select helps you reduce the total
cost of ownership by extending your data lake into cost-effective archive storage.
This identity is called the AWS account root user and is accessed by signing in with the email address and
password that you used to create the account.
IAM Users
An IAM user is an entity that you create in AWS.
The IAM user represents the person or service who uses the IAM user to interact with AWS.
A primary use for IAM users is to give people the ability to sign in to the AWS Management Console for
interactive tasks and to make programmatic requests to AWS services using the API or CLI.
A user in AWS consists of a name, a password to sign into the AWS Management Console, and up to two
access keys that can be used with the API or CLI.
When you create an IAM user, you grant it permissions by making it a member of a group that has
appropriate permission policies attached (recommended), or by directly attaching policies to the user.
You can also clone the permissions of an existing IAM user, which automatically makes the new user a
member of the same groups and attaches all the same policies.
IAM Groups
An IAM group is a collection of IAM users.
You can use groups to specify permissions for a collection of users, which can make those permissions
easier to manage for those users.
For example, you could have a group called Admins and give that group the types of permissions that
administrators typically need.
Any user in that group automatically has the permissions that are assigned to the group. If a new user joins
your organization and should have administrator privileges, you can assign the appropriate permissions
by adding the user to that group.
Similarly, if a person changes jobs in your organization, instead of editing that user's permissions, you can
remove him or her from the old groups and add him or her to the appropriate new groups.
Note that a group is not truly an identity because it cannot be identified as a Principal in a resource-based
or trust policy. It is only a way to attach policies to multiple users at one time.
IAM Roles
An IAM role is very similar to a user, in that it is an identity with permission policies that determine what
the identity can and cannot do in AWS.
However, a role does not have any credentials (password or access keys) associated with it.
Instead of being uniquely associated with one person, a role is intended to be assumable by anyone who
needs it.
An IAM user can assume a role to temporarily take on different permissions for a specific task.
A role can be assigned to a federated user who signs in by using an external identity provider instead of
IAM.
AWS uses details passed by the identity provider to determine which role is mapped to the federated user.
Temporary Credentials
Temporary credentials are primarily used with IAM roles, but there are also other uses.
You can request temporary credentials that have a more restricted set of permissions than your standard
IAM user.
This prevents you from accidentally performing tasks that are not permitted by the more restricted
credentials.
A benefit of temporary credentials is that they expire automatically after a set period of time.
You have control over the duration that the credentials are valid.
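The boto3 sketch below shows one common way to obtain temporary credentials, by assuming an IAM role through AWS STS; the role ARN and session name are hypothetical.

# Request temporary credentials by assuming an IAM role with AWS STS.
import boto3

sts = boto3.client("sts")

creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/ReadOnlyAuditor",
    RoleSessionName="audit-session",
    DurationSeconds=3600,             # credentials are valid for one hour
)["Credentials"]

# Use the temporary keys for a more restricted client.
s3_readonly = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print([b["Name"] for b in s3_readonly.list_buckets()["Buckets"]])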
Security Policies
You manage access in AWS by creating policies and attaching them to IAM identities (users, groups of
users, or roles) or AWS resources.
A policy is an object in AWS that, when associated with an identity or resource, defines their permissions.
AWS evaluates these policies when an IAM principal (user or role) makes a request.
Permissions in the policies determine whether the request is allowed or denied.
Most policies are stored in AWS as JSON documents.
AWS supports six types of policies.
IAM policies define permissions for an action regardless of the method that you use to perform the
operation.
For example, if a policy allows the GetUser action, then a user with that policy can get user information
from the AWS Management Console, the AWS CLI, or the AWS API.
When you create an IAM user, you can choose to allow console or programmatic access. If console access
is allowed, the IAM user can sign in to the console using a user name and password. Or if programmatic
access is allowed, the user can use access keys to work with the CLI or API.
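As a hedged illustration, the sketch below defines a small JSON policy document that allows only the iam:GetUser action mentioned above, creates it with boto3, and attaches it to a hypothetical user.

# Create an identity-based policy from a JSON document and attach it to a user.
import json
import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "iam:GetUser",      # the GetUser action mentioned above
        "Resource": "*",
    }],
}

policy = iam.create_policy(
    PolicyName="AllowGetUser",
    PolicyDocument=json.dumps(policy_document),
)

iam.attach_user_policy(
    UserName="alice",                 # hypothetical IAM user
    PolicyArn=policy["Policy"]["Arn"],
)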
Policy Types
Identity-based policies – Attach managed and inline policies to IAM identities (users, groups to which
users belong, or roles). Identity-based policies grant permissions to an identity.
Resource-based policies – Attach inline policies to resources. The most common examples of resource-
based policies are Amazon S3 bucket policies and IAM role trust policies. Resource-based policies grant
permissions to the principal that is specified in the policy. Principals can be in the same account as the
resource or in other accounts.
Permissions boundaries – Use a managed policy as the permissions boundary for an IAM entity (user or
role). That policy defines the maximum permissions that the identity-based policies can grant to an entity,
but does not grant permissions. Permissions boundaries do not define the maximum permissions that a
resource-based policy can grant to an entity.
Organizations SCPs – Use an AWS Organizations service control policy (SCP) to define the maximum
permissions for account members of an organization or organizational unit (OU). SCPs limit permissions
that identity-based policies or resource-based policies grant to entities (users or roles) within the account,
but do not grant permissions.
Access control lists (ACLs) – Use ACLs to control which principals in other accounts can access the resource
to which the ACL is attached. ACLs are similar to resource-based policies, although they are the only policy
type that does not use the JSON policy document structure. ACLs are cross-account permissions policies
that grant permissions to the specified principal. ACLs cannot grant permissions to entities within the same
account.
Session policies – Pass advanced session policies when you use the AWS CLI or AWS API to assume a role
or a federated user. Session policies limit the permissions that the role or user's identity-based policies
grant to the session. Session policies limit permissions for a created session, but do not grant permissions.
For more information, see Session Policies.
IAM Abilities/Features
Shared access to the AWS account. The main feature of IAM is that it allows you to create separate
usernames and passwords for individual users or resources and delegate access.
Granular permissions. Restrictions can be applied to requests. For example, you can allow the user to
download information, but deny the user the ability to update information through the policies.
Multifactor authentication (MFA). IAM supports MFA, in which users provide their username and
password plus a one-time password from their phone—a randomly generated number used as an
additional authentication factor.
Identity Federation. If the user is already authenticated, such as through a Facebook or Google account,
IAM can be made to trust that authentication method and then allow access based on it. This can also be
used to allow users to maintain just one password for both on-premises and cloud environment work.
Free to use. There is no additional charge for IAM security. There is no additional charge for creating
additional users, groups or policies.
PCI DSS compliance. The Payment Card Industry Data Security Standard is an information security
standard for organizations that handle branded credit cards from the major card schemes. IAM complies
with this standard.
Password policy. The IAM password policy allows you to reset a password or rotate passwords remotely.
You can also set rules, such as how a user should pick a password or how many attempts a user may
make to provide a password before being denied access.
IAM Limitations
Names of all IAM identities and IAM resources can be alphanumeric. They can include common characters
such as plus (+), equal (=), comma (,), period (.), at (@), underscore (_), and hyphen (-).
Names of IAM identities (users, roles, and groups) must be unique within the AWS account. So you can't
have two groups named DEVELOPERS and developers in your AWS account.
AWS account ID aliases must be unique across AWS products in your account. An alias cannot be a
12-digit number.
You cannot create more than 100 groups in an AWS account.
You cannot create more than 5000 users in an AWS account. AWS recommends the use of temporary
security credentials for adding a large number of users in an AWS account.
You cannot create more than 500 roles in an AWS account.
An IAM user cannot be a member of more than 10 groups.
An IAM user cannot be assigned more than 2 access keys.
An AWS account cannot have more than 1000 customer managed policies.
You cannot attach more than 10 managed policies to each IAM entity (user, groups, or roles).
You cannot store more than 20 server certificates in an AWS account.
You cannot have more than 100 SAML providers in an AWS account.
A policy name should not exceed 128 characters.
An alias for an AWS account ID should be between 3 and 63 characters.
A username and role name should not exceed 64 characters.
A group name should not exceed 128 characters.
Amazon has many years of experience in designing, constructing, and operating large-scale data centers.
This experience has been applied to the AWS platform and infrastructure.
AWS data centers are housed in facilities that are not branded as AWS facilities.
Physical access is strictly controlled both at the perimeter and at building ingress points by professional
security staff utilizing video surveillance, intrusion detection systems, and other electronic means.
Authorized staff must pass two-factor authentication a minimum of two times to access data center floors.
All visitors are required to present identification and are signed in and continually escorted by authorized
staff.
AWS only provides data center access and information to employees and contractors who have a
legitimate business need for such privileges.
When an employee no longer has a business need for these privileges, his or her access is immediately
revoked, even if they continue to be an employee of Amazon or Amazon Web Services.
All physical access to data centers by AWS employees is logged and audited routinely.
Power
The data center electrical power systems are designed to be fully redundant and maintainable without
impact to operations, 24 hours a day, and seven days a week.
Uninterruptible Power Supply (UPS) units provide back-up power in the event of an electrical failure for
critical and essential loads in the facility.
Data centers use generators to provide back-up power for the entire facility.
Management
AWS monitors electrical, mechanical, and life support systems and equipment so that any issues are
immediately identified.
Preventative maintenance is performed to maintain the continued operability of equipment.
Since the private key is not stored by Amazon, it’s advisable to store it in a secure place as anyone who
has this private key can log in on your behalf.
Dark Web
The dark web is a general term for the seedier corners of the web, where people can interact online
without worrying about the watchful eye of the authorities.
Usually, these sites are guarded by encryption mechanisms such as Tor that allow users to visit them
anonymously.
But there are also sites that don't rely on Tor, such as password-protected forums where hackers trade
secrets and stolen credit card numbers, that can also be considered part of the dark web.
People use the dark web for a variety of purposes: buying and selling drugs, discussing hacking techniques
and selling hacking services and so forth.
It's important to remember that the technologies used to facilitate "dark web" activities aren't inherently
good or bad.
The same technologies used by drug dealers to hide their identity can also be used by authorized informers
to securely pass information to government agencies.
Advantages of EC2
In less than 10 minutes you can rent a slice of Amazon’s vast cloud network and put those computing
resources to work on anything from data science to bitcoin mining.
EC2 offers a number of benefits and advantages over alternatives. Most notably:
Affordability
EC2 allows you to take advantage of Amazon’s enormous scale.
You can pay a very low rate for the resources you use. The smallest EC2 instance can be rented for as little
as $.0058 per hour which works out to about $4.18 per month. Of course, instances with more resources
are more expensive but this gives you a sense of how affordable EC2 instances are.
With EC2 instances, you’re only paying for what you use in terms of compute hours and bandwidth so
there’s little wasted expense.
Ease of use
Amazon’s goal with EC2 was to make accessing compute resources low friction and, by and large, they’ve
succeeded.
Launching an instance is simply a matter of logging in to the AWS Console and selecting your operating
system, instance type, and storage options.
At most, it's a 10-minute process, and there aren't any major technical barriers preventing anyone from
spinning up an instance, though it may take some technical knowledge to leverage those resources after
launch.
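The same launch steps (choose an AMI, an instance type, and a key pair) can also be scripted. The sketch below uses boto3; the AMI ID, instance type, and key name are placeholders, not values from the course text.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch a single instance from a chosen AMI (the ID below is a placeholder).
    result = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # the AMI determines the OS and pre-installed software
        InstanceType="t3.micro",           # instance type controls CPU/RAM and price
        KeyName="demo-key",                # key pair used to log in to the instance
        MinCount=1,
        MaxCount=1,
    )

    instance_id = result["Instances"][0]["InstanceId"]
    print("Launched", instance_id)

    # Wait until the instance is running, then terminate it to stop incurring charges.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    ec2.terminate_instances(InstanceIds=[instance_id])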
Scalability
You can easily add EC2 instances as needed, creating your own private cloud of computer resources that
perfectly matches your needs.
At Pagely, for example, a common configuration is an EC2 instance to run a WordPress app, an Amazon RDS
instance (a managed database service), and EBS volumes so that data can easily be moved and shared
between instances as they're added.
AWS offers built-in, rules-based auto scaling so that you can automatically turn instances on or off based
on demand.
This helps ensure that you're never wasting resources while still having enough resources available to do
the job.
Integration
Perhaps the biggest advantage of EC2, and something no competing solution can claim, is its native
integration with the vast ecosystem of AWS services.
Currently there are over 170 services. No other cloud network can claim the breadth, depth, and flexibility
AWS can.
Auto Scaling
AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady,
predictable performance at the lowest possible cost.
Using AWS Auto Scaling, it's easy to set up application scaling for multiple resources across multiple
services in minutes.
The service provides a simple, powerful user interface that lets you build scaling plans for resources
including Amazon EC2 instances and Spot Fleets, Amazon ECS tasks, Amazon DynamoDB tables and
indexes, and Amazon Aurora Replicas.
AWS Auto Scaling makes scaling simple with recommendations that allow you to optimize performance,
costs, or balance between them.
If you’re already using Amazon EC2 Auto Scaling to dynamically scale your Amazon EC2 instances, you can
now combine it with AWS Auto Scaling to scale additional resources for other AWS services.
With AWS Auto Scaling, your applications always have the right resources at the right time.
It’s easy to get started with AWS Auto Scaling using the AWS Management Console, Command Line
Interface (CLI), or SDK.
AWS Auto Scaling is available at no additional charge. You pay only for the AWS resources needed to run
your applications and Amazon CloudWatch monitoring fees.
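As a minimal sketch of the rules-based scaling described above (the Auto Scaling group name, policy name, and target value are hypothetical), a target-tracking policy can be attached to an existing EC2 Auto Scaling group with boto3:

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    # Attach a target-tracking policy to an existing Auto Scaling group ("web-asg" is a placeholder).
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-asg",
        PolicyName="keep-cpu-at-50",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,   # add or remove instances to keep average CPU near 50%
        },
    )

With a policy like this in place, instances are turned on or off automatically as demand rises and falls, which is the behaviour described above.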
Elastic Load Balancing
Secure
Elastic Load Balancing works with Amazon Virtual Private Cloud (VPC) to provide robust security features,
including integrated certificate management and SSL decryption. Together, they give you the flexibility to
centrally manage SSL settings and offload CPU intensive workloads from your applications.
Elastic
Elastic Load Balancing is capable of handling rapid changes in network traffic patterns. Additionally, deep
integration with Auto Scaling ensures sufficient application capacity to meet varying levels of application
load without requiring manual intervention.
Flexible
Elastic Load Balancing also allows you to use IP addresses to route requests to application targets. This
offers you flexibility in how you virtualize your application targets, allowing you to host more applications
on the same instance. This also enables these applications to have individual security groups and use the
same network port to further simplify inter-application communication in microservices based
architecture.
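To illustrate the IP-address routing just described, the sketch below (with a placeholder VPC ID, IP addresses, and names) creates a target group whose targets are registered by IP address rather than by instance ID, using boto3's elbv2 client:

    import boto3

    elbv2 = boto3.client("elbv2", region_name="us-east-1")

    # Create a target group that routes by IP address (the VPC ID is a placeholder).
    tg = elbv2.create_target_group(
        Name="app-by-ip",
        Protocol="HTTP",
        Port=80,
        VpcId="vpc-0123456789abcdef0",
        TargetType="ip",           # targets are IP addresses, not instance IDs
    )
    tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

    # Two applications on the same instance can be separate targets,
    # each reachable on its own private IP and port.
    elbv2.register_targets(
        TargetGroupArn=tg_arn,
        Targets=[
            {"Id": "10.0.1.15", "Port": 8080},
            {"Id": "10.0.1.16", "Port": 8080},
        ],
    )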
Monitoring
Elastic Load Balancing allows you to monitor your applications and their performance in real time with
Amazon CloudWatch metrics, logging, and request tracing. This improves visibility into the behavior of
your applications, uncovering issues and identifying performance bottlenecks in your application stack at
the granularity of an individual request.
AMIs
An Amazon Machine Image (AMI) provides the information required to launch an instance.
You must specify an AMI when you launch an instance.
You can launch multiple instances from a single AMI when you need multiple instances with the same
configuration.
You can use different AMIs to launch instances when you need instances with different configurations.
An AMI includes the following:
o One or more EBS snapshots, or, for instance-store-backed AMIs, a template for the root volume of the
instance (for example, an operating system, an application server, and applications).
o Launch permissions that control which AWS accounts can use the AMI to launch instances.
o A block device mapping that specifies the volumes to attach to the instance when it's launched.
Using an AMI
Fig. : The AMI lifecycle (create, register, launch, copy, and deregister)
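As a minimal boto3 sketch of part of that lifecycle (the instance ID, image name, and instance type are placeholders), an AMI can be created from a configured instance, used to launch identical instances, and later deregistered:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Create (register) an AMI from a configured, running instance.
    image = ec2.create_image(
        InstanceId="i-0123456789abcdef0",   # placeholder instance ID
        Name="my-app-golden-image",
        Description="Web server with the application pre-installed",
    )
    ami_id = image["ImageId"]
    ec2.get_waiter("image_available").wait(ImageIds=[ami_id])

    # Launch two identical instances from the same AMI.
    ec2.run_instances(ImageId=ami_id, InstanceType="t3.micro", MinCount=2, MaxCount=2)

    # Deregister the AMI when it is no longer needed.
    ec2.deregister_image(ImageId=ami_id)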
Multi Tenancy
In cloud computing, multi tenancy means that multiple customers of a cloud vendor are using the same
computing resources.
Despite the fact that they share resources, cloud customers aren't aware of each other, and their data is
kept totally separate.
Multi tenancy is a crucial component of cloud computing; without it, cloud services would be far less
practical.
Multitenant architecture is a feature in many types of public cloud computing, including IaaS, PaaS, SaaS,
containers, and serverless computing.
To understand multi tenancy, think of how banking works.
Multiple people can store their money in one bank, and their assets are completely separate even though
they're stored in the same place.
Customers of the bank don't interact with each other, don't have access to other customers' money, and
aren't even aware of each other.
Similarly, in public cloud computing, customers of the cloud vendor use the same infrastructure – the same
servers, typically – while still keeping their data and their business logic separate and secure.
The classic definition of multi tenancy was a single software instance that served multiple users, or
tenants.
However, in modern cloud computing, the term has taken on a broader meaning, referring to shared cloud
infrastructure instead of just a shared software instance.
AWS Marketplace
When a consumer subscribes to a product offering, they agree to the pricing and terms and conditions set
for the offer.
The product can be free to use or it can have an associated charge.
The charge becomes part of your AWS bill, and after you pay, AWS Marketplace pays the seller.
Products can take many forms. For example, a product can be offered as an Amazon Machine Image (AMI)
that is instantiated using your AWS account.
The product can also be configured to use AWS CloudFormation templates for delivery to the consumer.
A product can also be a software as a service (SaaS) offering from an ISV, or a web ACL, set of rules, or
conditions for AWS WAF.
Software products can be purchased at the listed price under the ISV's standard end user license agreement
(EULA), or offered with custom pricing and a custom EULA.
Products can also be purchased under a contract with specified time or usage boundaries.
After the product subscriptions are in place, the consumer can copy the product to their AWS Service
Catalog to manage how the product is accessed and used in the consumer’s organization.
Route 53
Route 53 responds to queries for private DNS names only when the queries originate from within the VPC(s)
that you authorize.
Using custom internal DNS names (rather than IP addresses or AWS-provided names such as ec2-10-1-2-
3.us-west-2.compute.amazonaws.com) has a variety of benefits, for example, being able to flip from one
database to another just by changing the mapping of a domain name such as internal.example.com to
point to a new IP address.
Route 53 also supports split-view DNS, so you can configure public and private hosted zones to return
different external and internal IP addresses for the same domain names.
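The "flip from one database to another" pattern mentioned above amounts to a single UPSERT call against the hosted zone; in the sketch below the hosted zone ID, record name, and IP address are all placeholders.

    import boto3

    route53 = boto3.client("route53")

    # Point internal.example.com at a new database IP inside a private hosted zone.
    route53.change_resource_record_sets(
        HostedZoneId="Z0123456789ABCDEFGHIJ",      # placeholder private hosted zone ID
        ChangeBatch={
            "Comment": "Fail over to the new database host",
            "Changes": [
                {
                    "Action": "UPSERT",            # create the record or update it in place
                    "ResourceRecordSet": {
                        "Name": "internal.example.com",
                        "Type": "A",
                        "TTL": 60,
                        "ResourceRecords": [{"Value": "10.0.2.25"}],
                    },
                }
            ],
        },
    )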
(iv) Fast
Amazon RDS supports the most demanding database applications. You can choose between two SSD-
backed storage options: one optimized for high-performance OLTP applications, and the other for cost-
effective general-purpose use. In addition, Amazon Aurora provides performance on par with commercial
databases at 1/10th the cost.
(v) Secure
Amazon RDS makes it easy to control network access to your database. Amazon RDS also lets you run your
database instances in Amazon Virtual Private Cloud (Amazon VPC), which enables you to isolate your
database instances and to connect to your existing IT infrastructure through an industry-standard
encrypted IPsec VPN. Many Amazon RDS engine types offer encryption at rest and encryption in transit.
(vi) Inexpensive
You pay very low rates and only for the resources you actually consume. In addition, you benefit from the
option of On-Demand pricing with no up-front or long-term commitments, or even lower hourly rates via
Reserved Instance pricing.
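As a sketch of the points above (identifiers and the password are placeholders; in practice the password would come from a secrets store), an RDS instance can be created inside a VPC with encryption at rest enabled:

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    rds.create_db_instance(
        DBInstanceIdentifier="demo-postgres",        # placeholder name
        Engine="postgres",
        DBInstanceClass="db.t3.micro",
        AllocatedStorage=20,                         # storage in GiB
        MasterUsername="dbadmin",
        MasterUserPassword="change-me-immediately",  # placeholder credential
        StorageEncrypted=True,                       # encryption at rest
        PubliclyAccessible=False,                    # keep the instance private inside the VPC
    )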
DynamoDB
Amazon DynamoDB -- also known as Dynamo Database or DDB -- is a fully managed NoSQL database
service provided by Amazon Web Services. DynamoDB is known for low latencies and scalability.
According to AWS, DynamoDB makes it simple and cost-effective to store and retrieve any amount of data,
as well as serve any level of request traffic.
All data items are stored on solid-state drives, which provide high I/O performance and can more
efficiently handle high-scale requests.
An AWS user interacts with the service by using the AWS Management Console or a DynamoDB API.
DynamoDB uses a nonrelational, NoSQL database model and supports both key-value and document data
structures.
A user stores data in DynamoDB tables and then interacts with it via GET and PUT requests, which are read
and write operations, respectively.
DynamoDB supports basic CRUD operations and conditional operations. Each DynamoDB query uses a
primary key specified by the user, which uniquely identifies each item.
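A minimal read/write example using boto3's resource API is shown below; the table name, partition key, and attributes are hypothetical, and the table is assumed to already exist. The partition key plays the role of the primary key described above.

    import boto3

    dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
    table = dynamodb.Table("Users")     # assumes a table "Users" with partition key "user_id"

    # PUT (write): store an item under its primary key.
    table.put_item(Item={"user_id": "u-1001", "name": "Asha", "plan": "free"})

    # GET (read): fetch the same item back by primary key.
    response = table.get_item(Key={"user_id": "u-1001"})
    print(response.get("Item"))

    # Conditional update: only change the plan if the item already exists.
    table.update_item(
        Key={"user_id": "u-1001"},
        UpdateExpression="SET #p = :plan",
        ExpressionAttributeNames={"#p": "plan"},
        ExpressionAttributeValues={":plan": "pro"},
        ConditionExpression="attribute_exists(user_id)",
    )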
Security
Amazon DynamoDB offers Fine-Grained Access Control (FGAC) for an administrator to protect data in a
table.
The admin or table owner can specify who can access which items or attributes in a table and what actions
that person can perform.
FGAC is based on the AWS Identity and Access Management service, which manages credentials and
permissions.
As with other AWS products, the cloud provider recommends a policy of least privilege when granting
access to items and attributes.
An admin can view usage metrics for DynamoDB with Amazon CloudWatch.
Advantages of DynamoDB
Performance at scale
DynamoDB supports some of the world’s largest scale applications by providing consistent, single-digit
millisecond response times at any scale.
You can build applications with virtually unlimited throughput and storage.
DynamoDB global tables replicate your data across multiple AWS Regions to give you fast, local access to
data for your globally distributed applications.
For use cases that require even faster access with microsecond latency, DynamoDB Accelerator (DAX)
provides a fully managed in-memory cache.
No servers to manage
DynamoDB is serverless, with no servers to provision, patch, or manage and no software to install,
maintain, or operate.
DynamoDB automatically scales tables up and down to adjust for capacity and maintain performance.
Availability and fault tolerance are built in, eliminating the need to architect your applications for these
capabilities.
DynamoDB provides both provisioned and on-demand capacity modes so that you can optimize costs by
specifying capacity per workload, or paying for only the resources you consume.
Enterprise ready
DynamoDB supports ACID transactions to enable you to build business-critical applications at scale.
DynamoDB encrypts all data by default and provides fine-grained identity and access control on all your
tables.
You can create full backups of hundreds of terabytes of data instantly with no performance impact to your
tables, and recover to any point in time in the preceding 35 days with no downtime.
DynamoDB is also backed by a service level agreement for guaranteed availability.
ElastiCache
ElastiCache is a web service that makes it easy to set up, manage, and scale a distributed in-memory data
store or cache environment in the cloud.
It provides a high-performance, scalable, and cost-effective caching solution, while removing the
complexity associated with deploying and managing a distributed cache environment.
With ElastiCache, you can quickly deploy your cache environment, without having to provision hardware
or install software.
You can choose from Memcached or Redis protocol-compliant cache engine software, and let ElastiCache
perform software upgrades and patch management for you.
For enhanced security, ElastiCache can be run in the Amazon Virtual Private Cloud (Amazon VPC)
environment, giving you complete control over network access to your clusters.
With just a few clicks in the AWS Management Console, you can add or remove resources such as nodes,
clusters, or read replicas to your ElastiCache environment to meet your business needs and application
requirements.
Existing applications that use Memcached or Redis can use ElastiCache with almost no modification.
Your applications simply need to know the host names and port numbers of the ElastiCache nodes that
you have deployed.
The ElastiCache Auto Discovery feature for Memcached lets your applications identify all of the nodes in
a cache cluster and connect to them, rather than having to maintain a list of available host names and port
numbers.
In this way, your applications are effectively insulated from changes to node membership in a cluster.
ElastiCache has multiple features to enhance reliability for critical production deployments:
o Automatic detection and recovery from cache node failures.
o Multi-AZ with Automatic Failover of a failed primary cluster to a read replica, in Redis clusters that
support replication (called replication groups in the ElastiCache API and AWS CLI).
o Flexible Availability Zone placement of nodes and clusters.
o Integration with other AWS services such as Amazon EC2, Amazon CloudWatch, AWS CloudTrail, and
Amazon SNS to provide a secure, high-performance, managed in-memory caching solution.
ElastiCache Nodes
A node is the smallest building block of an ElastiCache deployment.
A node can exist in isolation from or in some relationship to other nodes.
A node is a fixed-size chunk of secure, network-attached RAM.
Each node runs an instance of the engine and version that was chosen when you created your cluster.
If necessary, you can scale the nodes in a cluster up or down to a different instance type.
Every node within a cluster is the same instance type and runs the same cache engine.
Each cache node has its own Domain Name Service (DNS) name and port.
Multiple types of cache nodes are supported, each with varying amounts of associated memory.
You can purchase nodes on a pay-as-you-go basis, where you only pay for your use of a node.
Or you can purchase reserved nodes at a much-reduced hourly rate.
If your usage rate is high, purchasing reserved nodes can save you money.
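For example, the DNS name and port of each node, which is what an application needs in order to connect, can be read programmatically; the cluster ID below is a placeholder for an existing cluster.

    import boto3

    elasticache = boto3.client("elasticache", region_name="us-east-1")

    # List the nodes of an existing cluster, including each node's endpoint.
    response = elasticache.describe_cache_clusters(
        CacheClusterId="demo-cache",        # placeholder cluster ID
        ShowCacheNodeDetails=True,
    )

    for cluster in response["CacheClusters"]:
        for node in cluster.get("CacheNodes", []):
            endpoint = node["Endpoint"]
            print(node["CacheNodeId"], endpoint["Address"], endpoint["Port"])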
Redshift
Perhaps one of the most exciting outcomes of the public cloud was addressing the shortcomings of
traditional enterprise data warehouse (EDW) storage and processing. The fast provisioning, commodity
costs, infinite scale, and pay-as-you-grow pricing of public cloud are a natural fit for EDW needs, providing
even the smallest of users the ability to now get valuable answers to business intelligence (BI) questions.
Amazon Redshift is one such system built to address EDW needs, and it boasts low costs, an easy SQL-
based access model, easy integration to other Amazon Web Services (AWS) solutions, and most
importantly, high query performance.
Amazon Redshift gets its name from the astronomical phenomenon noticed by Hubble, which explained
the expansion of the universe. By adopting the Amazon Redshift moniker, AWS wanted to relay to
customers that the service was built to handle the perpetual expansion of their data.
Customers can start with a "cluster" as small as a single node (acting as both leader and compute node), and
for the smallest supported instance type (a DW2), that could be as low cost as $0.25/hour, or about
$180/month. By using "Reservations" (paying an up-front fee in exchange for a lower hourly running cost)
for the underlying instances, Amazon Redshift can cost as little as $1,000/TB/year, roughly one-fifth to
one-tenth of the cost of a traditional EDW.
Because Amazon Redshift provides native Open Database Connectivity (ODBC) and Java Database
Connectivity (JDBC) support (in addition to PostgreSQL driver support), most third-party BI tools (like
Tableau, QlikView, and MicroStrategy) work right out of the box. Amazon Redshift also uses the ubiquitous
Structured Query Language (SQL) for queries, ensuring that your current resources can quickly and easily
become productive with the technology.
Amazon Redshift was custom designed from the ParAccel engine — an analytic database which used
columnar storage and parallel processing to achieve very fast I/O.
Columns of data in Amazon Redshift are stored physically adjacent on disk, meaning that queries and scans
on those columns (common in online analytical processing [OLAP] queries) run very fast.
Additionally, Amazon Redshift uses 10 Gigabit Ethernet interconnects and specialized EC2 instances (with
between three and 24 spindles per node) to achieve high throughput and low latency.
For even faster queries, Amazon Redshift allows customers to use column-level compression to both
greatly reduce the amount of data that needs to be stored and reduce the amount of disk I/O.
Amazon Redshift, like many of AWS’s most popular services, is also fully managed, meaning that low-level,
time-consuming administrative tasks like OS patching, backups, replacing failed hardware, and software
upgrades are handled automatically and transparently.
With Amazon Redshift, users simply provision a cluster, load it with their data, and begin executing
queries. All data is continuously, incrementally, automatically backed up in the highly durable S3, and
enabling disaster recovery across regions can be accomplished with just a few clicks.
Spinning a cluster up can be as simple as a few mouse clicks, and as fast as a few minutes.
A very exciting aspect of Amazon Redshift, and something that is not possible in traditional EDWs, is the
ability to easily scale a provisioned cluster up and down.
In Amazon Redshift, this scaling is transparent to the customer—when a resize is requested, data is copied
in parallel from the source cluster (which continues to function in read-only mode) to a new cluster, and
once all data is live migrated, DNS is flipped to the new cluster and the old cluster is de-provisioned.
This allows customers to easily scale up and down, and each scaling event nicely re-stripes the data across
the new cluster for a balanced workload.
Amazon Redshift offers mature, native, and tunable security. Clusters can be deployed into a Virtual
Private Cloud (VPC), and encryption of data is supported via hardware accelerated AES-256 (for data at
rest) and SSL (for data on the wire).
Compliance teams will be pleased to learn that users can manage their own encryption keys via AWS’s
Hardware Security Module (HSM) service, and that Amazon Redshift provides a full audit trail of all SQL
connection attempts, queries, and modifications of the cluster.
Redshift gives you the option to use Dense Compute nodes, which are SSD-based data warehouse nodes.
Using these, you can run the most complex queries in much less time.
High Performance
As discussed in the previous point, Redshift gains high performance using massive parallelism, efficient
data compression, query optimization, and distribution.
MPP enables Redshift to parallelize data loading, backup, and restore operations. Furthermore, the queries
that you execute get distributed across multiple nodes.
Redshift is a columnar storage database, which is optimized for large, repetitive data sets. Using columnar
storage drastically reduces disk I/O operations, improving performance as a result.
Redshift gives you the option to define column-based encoding for data compression. If not specified by
the user, Redshift automatically assigns a compression encoding.
Data compression helps in reducing memory footprint and significantly improves the I/O speed.
Horizontally Scalable
Scalability is a crucial point for any data warehousing solution, and Redshift does a good job here.
Redshift is horizontally scalable. Whenever you need to increase storage or need it to run faster, just add
more nodes using the AWS console or the cluster API, and it will scale out.
During this process, your existing cluster will remain available for read operations so your application stays
uninterrupted.
During the scaling operation, Redshift moves data in parallel between the compute nodes of the old and
new clusters, enabling the transition to complete smoothly and as quickly as possible.
SQL interface
The Redshift query engine is based on ParAccel, which has the same interface as PostgreSQL. If you are
already familiar with SQL, you don't need to learn much that is new to start using Redshift's query module.
Since Redshift uses SQL, it works with existing Postgres JDBC/ODBC drivers, readily connecting to most of
the Business Intelligence tools.
AWS ecosystem
Many businesses already run their infrastructure on AWS (EC2 for servers, S3 for long-term storage, RDS
for databases), and this number is constantly increasing.
Redshift works very well if the rest of your infrastructure is already on AWS: you get the benefit of data
locality, and the cost of data transport is comparatively low.
For a lot of businesses, S3 has become the de-facto destination for cloud storage.
Since Redshift is virtually co-located with S3, it can access formatted data on S3 with a single COPY
command.
When loading or dumping data on S3, Redshift uses massively parallel processing, which can move data at
very high speed.
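Because Redshift speaks the PostgreSQL wire protocol, a standard Postgres driver can issue that COPY command from code. In the sketch below the cluster endpoint, credentials, bucket, table, and IAM role are all placeholders.

    import psycopg2   # standard PostgreSQL driver; Redshift is wire-compatible

    conn = psycopg2.connect(
        host="demo-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
        port=5439,
        dbname="dev",
        user="awsuser",
        password="change-me",
    )

    copy_sql = """
        COPY sales
        FROM 's3://my-bucket/sales/2024/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS CSV
        IGNOREHEADER 1;
    """

    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)   # Redshift loads the S3 files in parallel across its slices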
Security
Amazon Redshift comes packed with various security features.
There are options like VPC for network isolation, various ways to handle access control, data encryption,
and more.
Data encryption option is available at multiple places in Redshift.
To encrypt data stored in your cluster you can enable cluster encryption at the time of launching the
cluster.
Also, to encrypt data in transit, you can enable SSL encryption.
When loading data from S3, Redshift allows you to use either server-side encryption or client-side
encryption.
At load time, S3 or the Redshift COPY command handles the decryption, respectively.
Amazon Redshift clusters can be launched inside your Virtual Private Cloud (VPC).
Hence you can define VPC security groups to restrict inbound or outbound access to your Redshift clusters.
Using the robust access control system of AWS, you can grant privileges to specific users or manage access
at the level of specific databases.
Additionally, you can even define users and groups to have access to specific data in tables.
Only S3, DynamoDB, and Amazon EMR support parallel upload
If your data is in Amazon S3, DynamoDB, or Amazon EMR, Redshift can load it using massively parallel
processing, which is very fast.
But for all other sources, parallel loading is not supported.
You will either have to use JDBC inserts or scripts to load data into Redshift.
Alternatively, you can use an ETL solution like Hevo, which can load your data into Redshift in parallel
from hundreds of sources.
There can be only one distribution key for a table, and it cannot be changed later, which means you have
to think carefully and anticipate future workloads before deciding on a distribution key.
Data on Cloud
With Redshift, your data resides in the cloud. Though this is acceptable for most users, in some use cases
it can be a point of concern.
So if you are concerned about the privacy of your data, or your data has extremely sensitive content, you
may not be comfortable putting it on the cloud.
Amazon EMR
Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to
process vast amounts of data across dynamically scalable Amazon EC2 instances.
You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink in
Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.
EMR Notebooks, based on the popular Jupyter Notebook, provide a development and collaboration
environment for ad hoc querying and exploratory analysis.
Amazon EMR securely and reliably handles a broad set of big data use cases, including log analysis, web
indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and
bioinformatics.
Amazon CloudSearch
Amazon CloudSearch is a managed service in the AWS Cloud that makes it simple and cost-effective to set
up, manage, and scale a search solution for your website or application.
Amazon CloudSearch supports 34 languages and popular search features such as highlighting,
autocomplete, and geospatial search.
The service offers integrations with open-source tools like Kibana and Logstash for data ingestion and
visualization.
It also integrates seamlessly with other AWS services such as Amazon Virtual Private Cloud (Amazon VPC),
AWS Key Management Service (AWS KMS), Amazon Kinesis Data Firehose, AWS Lambda, AWS Identity
and Access Management (IAM), Amazon Cognito, and Amazon CloudWatch, so that you can go from raw
data to actionable insights quickly.
Amazon Kinesis
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get
timely insights and react quickly to new information.
Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with
the flexibility to choose the tools that best suit the requirements of your application.
With Amazon Kinesis, you can ingest real-time data such as video, audio, application logs, website
clickstreams, and IoT telemetry data for machine learning, analytics, and other applications.
Amazon Kinesis enables you to process and analyze data as it arrives and respond instantly instead of
having to wait until all your data is collected before the processing can begin.
Amazon Kinesis currently offers four services: Kinesis Data Firehose, Kinesis Data Analytics, Kinesis Data
Streams, and Kinesis Video Streams.
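A minimal producer example for Kinesis Data Streams is shown below; the stream name and event payload are hypothetical, and the stream is assumed to already exist.

    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    # Send one clickstream event to an existing stream named "clickstream" (placeholder).
    event = {"user_id": "u-1001", "page": "/pricing", "ts": "2024-01-01T12:00:00Z"}

    kinesis.put_record(
        StreamName="clickstream",
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["user_id"],   # records with the same key go to the same shard
    )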
Amazon Redshift
Amazon Redshift is a fast, scalable data warehouse that makes it simple and cost-effective to analyze all
your data across your data warehouse and data lake.
Redshift delivers ten times faster performance than other data warehouses by using machine learning,
massively parallel query execution, and columnar storage on high-performance disk.
You can setup and deploy a new data warehouse in minutes, and run queries across petabytes of data in
your Redshift data warehouse, and exabytes of data in your data lake built on Amazon S3.
You can start small for just $0.25 per hour and scale to $250 per terabyte per year, less than one-tenth
the cost of other solutions.
Amazon QuickSight
Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy for you to
deliver insights to everyone in your organization.
QuickSight lets you create and publish interactive dashboards that can be accessed from browsers or
mobile devices.
You can embed dashboards into your applications, providing your customers with powerful self-service
analytics.
QuickSight easily scales to tens of thousands of users without any software to install, servers to deploy, or
infrastructure to manage.
AWS Data Pipeline
AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant,
repeatable, and highly available.
You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying
transient failures or timeouts in individual tasks, or creating a failure notification system.
AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premises
data silos.
AWS Glue
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to
prepare and load their data for analytics.
You can create and run an ETL job with a few clicks in the AWS Management Console.
You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the
associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog.
Once cataloged, your data is immediately searchable, queryable, and available for ETL.
Amazon Managed Streaming for Kafka (Amazon MSK)
When you run Apache Kafka on your own, you need to provision servers, configure Apache Kafka
manually, replace servers when they fail, orchestrate server patches and upgrades, architect the cluster
for high availability, ensure data is durably stored and secured, set up monitoring and alarms, and carefully
plan scaling events to support load changes.
Amazon Managed Streaming for Kafka makes it easy for you to build and run production applications on
Apache Kafka without needing Apache Kafka infrastructure management expertise.
That means you spend less time managing infrastructure and more time building applications.
With a few clicks in the Amazon MSK console you can create highly available Apache Kafka clusters with
settings and configuration based on Apache Kafka’s deployment best practices.
Amazon MSK automatically provisions and runs your Apache Kafka clusters.
Amazon MSK continuously monitors cluster health and automatically replaces unhealthy nodes with no
downtime to your application.
In addition, Amazon MSK secures your Apache Kafka cluster by encrypting data at rest.
Application Services
Tracking Software Licenses with AWS Service Catalog and AWS Step Functions
Enterprises have many business requirements for tracking how software product licenses are used in their
organization for financial, governance, and compliance reasons.
By tracking license usage, organizations can stay within budget, track expenditures, and avoid unplanned
bills from their vendors' true-up processes.
The goal is to track the usage licenses as resources are deployed.
Here, you learn how to use AWS Service Catalog to deploy services and applications while tracking the
licenses consumed by end users, and how to prevent license overruns on AWS.
This solution uses the following AWS services. Most of the resources are set up for you with an AWS
CloudFormation stack:
o AWS Service Catalog
o AWS Lambda
o AWS Step Functions
o AWS CloudFormation
o Amazon DynamoDB
o Amazon SES
How to secure infrequently used EC2 instances with AWS Systems Manager
Many organizations have predictable spikes in the usage of their applications and services.
For example, retailers see large spikes in usage during Black Friday or Cyber Monday.
The beauty of Amazon Elastic Compute Cloud (Amazon EC2) is that it allows customers to quickly scale up
their compute power to meet these demands.
However, some customers might require more time-consuming setup for their software running on EC2
instances.
Instead of creating and terminating instances to meet demand, these customers turn off instances and
then turn them on again when they are needed.
Eventually the patches on those instances become out of date, and they require updates.
How Cloudticity Automates Security Patches for Linux and Windows using Amazon
EC2 Systems Manager and AWS Step Functions
As a provider of HIPAA-compliant solutions using AWS, Cloudticity always has security as the base of
everything we do.
HIPAA breaches would be an end-of-life event for most of our customers.
Having been born in the cloud with automation in our DNA, Cloudticity embeds automation into all levels
of infrastructure management including security, monitoring, and continuous compliance.
As mandated by the HIPAA Security Rule (45 CFR Part 160 and Subparts A and C of Part 164), patches at
the operating system and application level are required to prevent security vulnerabilities.
As a result, patches are a major component of infrastructure management.
Cloudticity strives to provide consistent and reliable services to all of our customers.
As such, we needed to create a custom patching solution that supports both Linux and Windows.
The minimum requirements for such a solution were to read from a manifest file that contains instance
names and a list of knowledge base articles (KBs) or security packages to apply to each instance.
Cloud Security
A number of security threats are associated with cloud data services: not only traditional security threats,
such as network eavesdropping, illegal invasion, and denial of service attacks, but also specific cloud
computing threats, such as side channel attacks, virtualization vulnerabilities, and abuse of cloud services.
The following security requirements limit these threats; if we achieve them, we can say our data is safe
on the cloud.
Identity management
o Every enterprise will have its own identity management system to control access to information and
computing resources.
o Cloud providers either integrate the customer's identity management system into their own
infrastructure (using federation or SSO technology, or a biometric-based identification system), or
provide an identity management system of their own.
o CloudID, for instance, provides privacy-preserving cloud-based and cross-enterprise biometric
identification.
o It links the confidential information of the users to their biometrics and stores it in an encrypted
fashion.
o Making use of a searchable encryption technique, biometric identification is performed in encrypted
domain to make sure that the cloud provider or potential attackers do not gain access to any sensitive
data or even the contents of the individual queries.
Physical security
o Cloud service providers physically secure the IT hardware (servers, routers, cables etc.) against
unauthorized access, interference, theft, fires, floods etc. and ensure that essential supplies (such as
electricity) are sufficiently robust to minimize the possibility of disruption.
o This is normally achieved by serving cloud applications from 'world-class' (i.e. professionally specified,
designed, constructed, managed, monitored and maintained) data centers.
Personnel security
o Various information security concerns relating to the IT and other professionals associated with cloud
services are typically handled through pre-, para- and post-employment activities such as security
screening of potential recruits, security awareness and training programs, and proactive security
monitoring and supervision.
Privacy
o Providers ensure that all critical data (credit card numbers, for example) are masked or encrypted and
that only authorized users have access to data in its entirety. Moreover, digital identities and
credentials must be protected as should any data that the provider collects or produces about
customer activity in the cloud.
Confidentiality
o Data confidentiality is the property that data contents are not made available or disclosed to
unauthorized users.
o Outsourced data is stored in a cloud and out of the owners' direct control. Only authorized users can
access the sensitive data, while others, including CSPs, should not gain any information about the data.
o Meanwhile, data owners expect to fully utilize cloud data services, e.g., data search, data
computation, and data sharing, without the leakage of the data contents to CSPs or other adversaries.
Access controllability
o Access controllability means that a data owner can perform selective restriction of access to his or her
data outsourced to the cloud.
o Authorized users can be granted access to the data by the owner, while others cannot access it without
permission.
o Further, it is desirable to enforce fine-grained access control to the outsourced data, i.e., different
users should be granted different access privileges with regard to different data pieces.
o The access authorization must be controlled only by the owner in untrusted cloud environments.
Integrity
o Data integrity demands maintaining and assuring the accuracy and completeness of data.
o A data owner always expects that her or his data in a cloud can be stored correctly and trustworthily.
o It means that the data should not be illegally tampered with, improperly modified, deliberately deleted,
or maliciously fabricated.
o If any undesirable operations corrupt or delete the data, the owner should be able to detect the
corruption or loss.
o Further, when a portion of the outsourced data is corrupted or lost, it can still be retrieved by the data
users.
CloudWatch
Amazon CloudWatch is a monitoring service for AWS cloud resources and the applications you run on
AWS.
You can use Amazon CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and
automatically react to changes in your AWS resources.
Amazon CloudWatch can monitor AWS resources such as Amazon EC2 instances, Amazon DynamoDB
tables, and Amazon RDS DB instances, as well as custom metrics generated by your applications and
services, and any log files your applications generate.
You can use Amazon CloudWatch to gain system-wide visibility into resource utilization, application
performance, and operational health.
You can use these insights to react and keep your application running smoothly.
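For instance, a custom application metric can be published and an alarm set on it; the namespace, metric name, and thresholds below are illustrative placeholders.

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    # Publish a custom metric data point from the application.
    cloudwatch.put_metric_data(
        Namespace="MyApp",                       # hypothetical namespace
        MetricData=[{
            "MetricName": "QueueDepth",
            "Value": 42,
            "Unit": "Count",
        }],
    )

    # Alarm when the queue depth average stays above 100 for a five-minute period.
    cloudwatch.put_metric_alarm(
        AlarmName="queue-depth-high",
        Namespace="MyApp",
        MetricName="QueueDepth",
        Statistic="Average",
        Period=300,
        EvaluationPeriods=1,
        Threshold=100,
        ComparisonOperator="GreaterThanThreshold",
    )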
CloudFormation
AWS CloudFormation provides a common language for you to describe and provision all the infrastructure
resources in your cloud environment.
CloudFormation allows you to use a simple text file to model and provision, in an automated and secure
manner, all the resources needed for your applications across all regions and accounts.
This file serves as the single source of truth for your cloud environment.
AWS CloudFormation is available at no additional charge, and you pay only for the AWS resources needed
to run your applications.
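As a sketch of the "text file as single source of truth" idea (the stack name and bucket name are placeholders; in practice the template usually lives in its own YAML or JSON file), a tiny template describing one S3 bucket can be deployed from code:

    import json
    import boto3

    cloudformation = boto3.client("cloudformation", region_name="us-east-1")

    # The text file that models the infrastructure: here, a single S3 bucket.
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "DemoBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {"BucketName": "my-demo-bucket-123456"},  # placeholder, must be globally unique
            }
        },
    }

    cloudformation.create_stack(
        StackName="demo-stack",
        TemplateBody=json.dumps(template),
    )

    # Block until the stack (and therefore the bucket) exists.
    cloudformation.get_waiter("stack_create_complete").wait(StackName="demo-stack")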
CloudTrail
AWS CloudTrail is an AWS service that helps you enable governance, compliance, and operational and risk
auditing of your AWS account.
Actions taken by a user, role, or an AWS service are recorded as events in CloudTrail.
Events include actions taken in the AWS Management Console, AWS Command Line Interface, and AWS
SDKs and APIs.
CloudTrail is enabled on your AWS account when you create it.
When activity occurs in your AWS account, that activity is recorded in a CloudTrail event.
You can easily view recent events in the CloudTrail console by going to Event history.
For an ongoing record of activity and events in your AWS account, create a trail.
Visibility into your AWS account activity is a key aspect of security and operational best practices.
You can use CloudTrail to view, search, download, archive, analyze, and respond to account activity across
your AWS infrastructure.
You can identify who or what took which action, what resources were acted upon, when the event
occurred, and other details to help you analyze and respond to activity in your AWS account.
Optionally, you can enable AWS CloudTrail Insights on a trail to help you identify and respond to unusual
activity.
You can integrate CloudTrail into applications using the API, automate trail creation for your organization,
check the status of trails you create, and control how users view CloudTrail events.
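Recent account activity can also be queried from code; the lookup below asks the 90-day event history for recent console sign-in events (the event name and region are examples).

    import boto3

    cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

    # Look up recent events by event name (event history; no trail required).
    response = cloudtrail.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}
        ],
        MaxResults=10,
    )

    for event in response["Events"]:
        print(event["EventTime"], event.get("Username"), event["EventName"])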
Working of CloudTrail
You can create two types of trails for an AWS account:
o A trail that applies to all regions – CloudTrail records events in all regions and delivers the log files to
the S3 bucket that you specify.
o A trail that applies to one region – CloudTrail records events in only the region that you specify.
Benefits of CloudTrail
Simplified compliance
With AWS CloudTrail, simplify your compliance audits by automatically recording and storing event logs
for actions made within your AWS account.
Integration with Amazon CloudWatch Logs provides a convenient way to search through log data, identify
out-of-compliance events, accelerate incident investigations, and expedite responses to auditor requests.
You can identify which users and accounts called AWS, the source IP address from which the calls were
made, and when the calls occurred.
Security automation
AWS CloudTrail allows you to track, and automatically respond to, account activity that threatens the
security of your AWS resources.
With Amazon CloudWatch Events integration, you can define workflows that execute when events that
can result in security vulnerabilities are detected.
For example, you can create a workflow to add a specific policy to an Amazon S3 bucket when CloudTrail
logs an API call that makes that bucket public.
OpsWorks
AWS OpsWorks is a configuration management service that provides managed instances of Chef and
Puppet.
Chef and Puppet are automation platforms that allow you to use code to automate the configurations of
your servers.
OpsWorks lets you use Chef and Puppet to automate how servers are configured, deployed, and managed
across your Amazon EC2 instances or on-premises compute environments.
OpsWorks has three offerings: AWS OpsWorks for Chef Automate, AWS OpsWorks for Puppet Enterprise,
and AWS OpsWorks Stacks.
AWS provides forecasts based on your cost and usage history and allows you to set budget thresholds and
alerts, so you can stay informed whenever cost and usage are forecast to exceed, or actually exceed, the
threshold limit.
You can also set reservation utilization and/or coverage targets for your Reserved Instances and Savings
Plans and monitor how they are progressing towards your target.
Amazon CloudWatch
Amazon CloudWatch collects monitoring and operational data in the form of logs, metrics, and events,
providing you with a unified view of AWS resources, applications, and services that run on AWS and on-
premises servers.
Geographic Concerns
The AWS Global Cloud Infrastructure is the most secure, extensive, and reliable cloud platform, offering
over 175 fully featured services from data centers globally.
Whether you need to deploy your application workloads across the globe in a single click, or you want to
build and deploy specific applications closer to your end-users with single-digit millisecond latency, AWS
provides you the cloud infrastructure where and when you need it.
With millions of active customers and tens of thousands of partners globally, AWS has the largest and
most dynamic ecosystem.
Customers across virtually every industry and of every size, including start-ups, enterprises, and public
sector organizations, are running every imaginable use case on AWS.
This core area is the pilot from our gas heater analogy. The application and caching server replica
environments are created on the cloud and kept in standby mode, as very few changes take place over time.
These AMIs can be updated periodically. This is the entire furnace from our example. If the on-premises
system fails, then the application and caching servers are activated, and users are rerouted using Elastic
IP addresses to the ad hoc environment on the cloud. Recovery takes just a few minutes.
4. Multi-Site Approach
Well this is the optimum technique in backup and DR and is the next step after warm standby.
All activities in the preparatory stage are similar to warm standby, except that the AWS environment on
the cloud is also used to handle some portion of the user traffic, routed using Route 53.
When a disaster strikes, the rest of the traffic that was pointing to the on-premises servers is rerouted to
AWS, and, using auto scaling techniques, multiple EC2 instances are deployed to handle full production
capacity.
You can further increase the availability of your multi-site solution by designing Multi-AZ architectures.
Examining Logs
It is necessary to examine the log files in order to locate an error code or other indication of the issue that
your cluster experienced.
It may take some investigative work to determine what happened.
Hadoop runs the work of the jobs in task attempts on various nodes in the cluster.
Amazon EMR can initiate speculative task attempts, terminating the other task attempts that do not
complete first.
This generates significant activity that is logged to the controller, stderr and syslog log files as it happens.
In addition, multiple task attempts may run simultaneously, but a log file can only display results
linearly.
Start by checking the bootstrap action logs for errors or unexpected configuration changes during the
launch of the cluster.
From there, look in the step logs to identify Hadoop jobs launched as part of a step with errors.
Examine the Hadoop job logs to identify the failed task attempts.
The task attempt log will contain details about what caused a task attempt to fail.
Book
1. Cloud Computing Bible, Barrie Sosinsky, John Wiley & Sons, ISBN-13: 978-0470903568.
2. Mastering AWS Security, Albert Anthony, Packt Publishing Ltd., ISBN 978-1-78829-372-3.
3. Amazon Web Services for Dummies, Bernard Golden, For Dummies, ISBN-13: 978-1118571835.
Websites
1. www.aws.amazon.com
2. www.docs.aws.amazon.com
3. www.bluepiit.com
4. www.inforisktoday.com
5. www.techno-pulse.com
6. www.exelanz.com
7. www.ibm.com
8. www.iarjset.com/upload/2017/july-17/IARJSET%2018.pdf
9. www.searchservervirtualization.techtarget.com
10. www.docs.eucalyptus.com
11. www.cloudacademy.com
12. www.searchaws.techtarget.com
13. www.searchsecurity.techtarget.com
14. www.en.wikipedia.org/wiki/Cloud_computing_security
15. www.znetlive.com
16. www.en.wikipedia.org/wiki/Virtual_private_cloud
17. www.resource.onlinetech.com
18. www.globalknowledge.com
19. www.blog.blazeclan.com/4-approaches-backup-disaster-recovery-explained-amazon-cloud
20. www.zdnet.com/article/what-is-cloud-computing-everything-you-need-to-know-about-the-cloud
21. www.javatpoint.com/introduction-to-cloud-computing
22. www.javatpoint.com/history-of-cloud-computing
23. www.allcloud.io/blog/6-cloud-computing-concerns-facing-2018
24. www.searchitchannel.techtarget.com/definition/cloud-marketplace
25. www.en.wikipedia.org/wiki/Amazon_Web_Services
26. www.msystechnologies.com/blog/cloud-orchestration-everything-you-want-to-know
27. www.linuxacademy.com/blog/linux-academy/elasticity-cloud-computing
28. www.searchitchannel.techtarget.com/definition/Eucalyptus
29. www.geeksforgeeks.org/virtualization-cloud-computing-types
30. www.cloudsearch.blogspot.com
31. www.simplilearn.com/tutorials/aws-tutorial/aws-iam
32. www.d1.awsstatic.com/whitepapers/aws-security-whitepaper.pdf
33. www.resources.intenseschool.com/amazon-aws-understanding-ec2-key-pairs-and-how-they-are-used-
for-windows-and-linux-instances/
34. www.pagely.com/blog/amazon-ec2/
35. www.cloudflare.com/learning/cloud/what-is-multitenancy/
36. www.hevodata.com/blog/amazon-redshift-pros-and-cons/