Cloud Security Basics: Protecting Infrastructure and Data in Cloud Environments

In July 2019, Capital One discovered that a former AWS employee had exploited a misconfigured web application firewall to access the personal data of over 100 million customers in the United States and Canada. The breach did not exploit a vulnerability in AWS's infrastructure. AWS's own security worked as designed. The failure was in Capital One's configuration: an overly permissive IAM role attached to an EC2 instance allowed the attacker, once inside the application server, to query the AWS metadata service and obtain credentials that granted access to sensitive S3 buckets.

Capital One paid $80 million in regulatory fines, spent hundreds of millions on remediation, and devoted years to rebuilding customer trust---all stemming from a misconfiguration that proper IAM scoping and network segmentation would have prevented.

The incident crystallized a truth that many organizations learn the hard way: moving to the cloud does not outsource security. It changes who is responsible for what, and misunderstanding those boundaries is precisely where breaches happen. Cloud security is not the provider's problem alone, nor is it the customer's alone. It is a shared responsibility with clearly defined boundaries---and organizations that fail to understand those boundaries pay the price.


The Shared Responsibility Model

The shared responsibility model is the foundational concept in cloud security. AWS formalized it; Azure and Google Cloud follow the same structure. Understanding it is a prerequisite to everything else.

The cloud provider is responsible for security of the cloud---the physical security of data centers, the hardware infrastructure, the hypervisor that isolates tenant virtual machines, the global network fabric, and the host operating systems on managed services. AWS, Azure, and Google Cloud invest billions in securing this layer. Physical data center breaches are essentially unheard of among major providers. Hypervisor escapes have occurred in research contexts but are exceptionally rare in production. When AWS or Azure has a major outage, it is almost never due to a security failure at their infrastructure layer.

You are responsible for security in the cloud---your data, your access controls, your application code, your operating system patches (for IaaS), your encryption configuration, your network settings, and your compliance with relevant regulations. This is where the vast majority of cloud security incidents occur: customer misconfiguration and oversight, not provider failure.

The division of responsibility shifts depending on the service model:

Responsibility              IaaS (e.g., EC2)   PaaS (e.g., App Engine)   SaaS (e.g., Salesforce)
Physical infrastructure     Provider           Provider                  Provider
Hypervisor/virtualization   Provider           Provider                  Provider
Network infrastructure      Provider           Provider                  Provider
Operating system            Customer           Provider                  Provider
Runtime/middleware          Customer           Provider                  Provider
Application code            Customer           Customer                  Provider
Data encryption             Customer           Customer                  Shared
Access management           Customer           Customer                  Customer
Data classification         Customer           Customer                  Customer

The practical implication: as you move up the stack from IaaS to SaaS, the provider takes on more responsibility---but the customer never loses responsibility for access management and data governance. These two areas remain the customer's domain regardless of service model.


Identity and Access Management: The Highest-Leverage Control

IAM (Identity and Access Management) is the most critical security control in cloud environments, and misconfigured IAM is the most common cause of cloud security incidents. Getting IAM right prevents the majority of breaches; getting it wrong exposes everything.

Core IAM Concepts

Users represent individual people or service accounts. Each user has credentials (passwords, access keys) and policies defining what actions they can perform on which resources.

Groups collect users who share the same permissions. Managing permissions through groups rather than individual user assignments is more maintainable at scale---when a role changes, you update one group policy rather than dozens of individual user policies.

Roles provide temporary, short-lived credentials for specific tasks. Instead of giving an application permanent access keys (which can be stolen and reused indefinitely), the application assumes a role that grants time-limited credentials for specific permissions. Roles are the preferred mechanism for granting access to applications and services.

Policies are documents defining what actions are allowed or denied on what resources. Policies attach to users, groups, or roles. In AWS IAM, explicit deny always overrides explicit allow, and no permissions are granted unless explicitly allowed.
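That evaluation order can be sketched in a few lines. The following is a toy model of the logic, not the full IAM specification (it ignores conditions, principals, and the interplay of multiple policy types), with Python's fnmatch standing in for IAM's wildcard matching:

```python
from fnmatch import fnmatch

def _matches(patterns, value):
    # fnmatch stands in for IAM's '*' wildcard semantics
    return any(fnmatch(value, p) for p in patterns)

def evaluate(statements, action, resource):
    """Default deny; any matching explicit Deny overrides all Allows."""
    decision = "Deny"  # nothing is permitted unless explicitly allowed
    for stmt in statements:
        if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
            if stmt["Effect"] == "Deny":
                return "Deny"  # explicit deny always wins
            decision = "Allow"
    return decision

# Illustrative policy: read access to one bucket prefix, with one
# object carved out by an explicit Deny.
policy = [
    {"Effect": "Allow", "Action": ["s3:GetObject"],
     "Resource": ["arn:aws:s3:::app-assets/*"]},
    {"Effect": "Deny", "Action": ["s3:GetObject"],
     "Resource": ["arn:aws:s3:::app-assets/private.txt"]},
]

print(evaluate(policy, "s3:GetObject", "arn:aws:s3:::app-assets/logo.png"))     # Allow
print(evaluate(policy, "s3:GetObject", "arn:aws:s3:::app-assets/private.txt"))  # Deny
```

Note that the Deny on private.txt wins even though the wildcard Allow also matches it---the defining property of IAM evaluation.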

IAM Best Practices

Apply least privilege consistently. This principle---give each entity only the minimum permissions necessary for its function---is the most important IAM practice. A developer who needs to read from a database should not have permission to delete it. A monitoring service that reads metrics should not have permission to modify infrastructure. Wildcard permissions (Action: "*" or Resource: "*") should be viewed with deep suspicion.

Example: The Capital One breach exploited an IAM role attached to a web server that had s3:GetObject permission on all S3 buckets rather than only the specific buckets the web application needed. Proper least-privilege would have restricted the role to the three or four specific bucket ARNs the application legitimately accessed.
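Part of a least-privilege review can be automated. The sketch below is a heuristic, not a real analyzer like IAM Access Analyzer: it simply flags Allow statements that use wildcard actions or resources in a parsed policy document:

```python
def flag_wildcards(policy):
    """Flag Allow statements with wildcard actions or resources.
    `policy` is a parsed IAM-style policy dict; this is a heuristic
    audit aid, not a substitute for a full access analyzer."""
    findings = []
    for i, stmt in enumerate(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue  # broad Deny statements reduce risk, not add it
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        if isinstance(actions, str):
            actions = [actions]      # IAM allows string or list
        if isinstance(resources, str):
            resources = [resources]
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"Statement {i}: wildcard action")
        if "*" in resources:
            findings.append(f"Statement {i}: wildcard resource")
    return findings

# Capital One-style overbroad grant vs. a properly scoped one
overbroad = {"Statement": [
    {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"},
]}
scoped = {"Statement": [
    {"Effect": "Allow", "Action": "s3:GetObject",
     "Resource": ["arn:aws:s3:::app-reports/*"]},
]}

print(flag_wildcards(overbroad))  # ['Statement 0: wildcard resource']
print(flag_wildcards(scoped))     # []
```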

Never use root accounts for daily work. Cloud provider root accounts have unrestricted access to everything, cannot have permissions revoked, and should be treated like nuclear launch codes. Create them, secure them with hardware MFA, store the MFA device in a physical safe, and use them only for the rare tasks that genuinely require root privileges (like recovering from a locked-out IAM configuration).

Enable Multi-Factor Authentication everywhere. MFA requires something you know (password) and something you have (authenticator app, hardware token) to authenticate. For cloud console access and privileged accounts, MFA is the single most impactful control against credential compromise. Enable it for all human users, and require it for any user with meaningful permissions.

Rotate and audit access keys. Long-lived access keys are security liabilities. Keys that have existed for years may have been inadvertently committed to source code, shared in a Slack message, or stored in a configuration file on a compromised laptop. Implement automated key rotation (90-day maximum), audit for unused keys (revoke keys unused for 30+ days), and use roles rather than access keys for machine-to-machine authentication wherever possible.
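The rotation and idle thresholds above translate directly into an audit rule. A minimal sketch, assuming timezone-aware timestamps pulled from your provider's credential report:

```python
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)   # rotate keys older than this
MAX_IDLE = timedelta(days=30)      # revoke keys unused this long

def audit_access_key(created, last_used, now=None):
    """Classify one access key. `last_used` may be None (never used);
    an idle key is revoked even if it is also overdue for rotation."""
    now = now or datetime.now(timezone.utc)
    if last_used is None or now - last_used >= MAX_IDLE:
        return "revoke"
    if now - created > MAX_KEY_AGE:
        return "rotate"
    return "ok"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(audit_access_key(datetime(2024, 5, 1, tzinfo=timezone.utc),
                       datetime(2024, 5, 30, tzinfo=timezone.utc), now))  # ok
print(audit_access_key(datetime(2024, 1, 1, tzinfo=timezone.utc),
                       datetime(2024, 5, 30, tzinfo=timezone.utc), now))  # rotate
print(audit_access_key(datetime(2024, 5, 1, tzinfo=timezone.utc),
                       None, now))                                        # revoke
```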

Segregate duties. The person who can create IAM policies should not be the same person who can approve their own access requests. The person who can deploy to production should not be the same person who can modify audit logs. These separations prevent single-actor compromise.

Use IAM Access Analyzer. AWS IAM Access Analyzer, Azure's equivalent tools, and GCP Policy Analyzer automatically identify resources (S3 buckets, KMS keys, IAM roles) that are accessible from outside your account or organization. Run this continuously and address any identified access you did not intentionally grant.


Network Security Architecture

Cloud network security operates in layers. Each layer provides different protection, and the combination provides defense-in-depth.

Virtual Private Cloud (VPC)

A VPC is an isolated network segment within the cloud provider's infrastructure. By default, resources within a VPC can communicate with each other but are isolated from other customers and from the internet. The VPC provides the fundamental network boundary.

When creating infrastructure, always use a VPC (AWS creates a default VPC but using a custom VPC with explicit configuration is more secure). Never deploy sensitive resources outside a VPC.

Subnets: The Public/Private Divide

Within a VPC, subnets separate resources by exposure level.

Public subnets have routes to the internet through an Internet Gateway. Appropriate for resources that must be directly internet-accessible: load balancers, NAT gateways, and sometimes web servers.

Private subnets have no direct internet access. All outbound internet traffic routes through a NAT Gateway in a public subnet. Appropriate for: databases, application servers, caches, internal services---any resource that should not receive inbound connections from the internet.

The pattern for a three-tier application: load balancer in public subnets, application servers in private subnets, databases in private subnets. Internet traffic reaches only the load balancer; databases are never directly reachable from the internet.
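The three-tier layout is easy to plan programmatically. A sketch using Python's ipaddress module to carve a /16 VPC CIDR into /24 subnets across two availability zones; the names and allocation order are illustrative, not an AWS requirement:

```python
import ipaddress

# Split the VPC range into non-overlapping /24 subnets
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))

# Public subnets hold the load balancer and NAT gateways; the app
# and database tiers live in private subnets with no internet route.
plan = {
    "public-a": subnets[0], "public-b": subnets[1],
    "app-a":    subnets[2], "app-b":    subnets[3],
    "db-a":     subnets[4], "db-b":     subnets[5],
}

for name, cidr in plan.items():
    print(f"{name:9s} {cidr}")
```

Using `subnets()` rather than hand-written CIDRs guarantees the ranges never overlap, a common source of routing mistakes.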

Example: Equifax's 2017 data breach (147 million records compromised) exploited an unpatched Apache Struts vulnerability on a web server. If the database had been in a private subnet rather than accessible from the web tier through overly broad security group rules, the attacker's path from the compromised web server to the database would have been blocked at the network level, limiting the breach scope significantly.

Security Groups

Security groups act as stateful firewalls attached to individual resources (EC2 instances, RDS databases, load balancers). They define which inbound and outbound traffic is permitted by source IP, CIDR, other security groups, or protocol/port.

Key security group practices:

  • Default to deny all inbound traffic; explicitly open only what is necessary
  • Allow inbound from other security groups rather than IP ranges where possible (more maintainable)
  • Never allow 0.0.0.0/0 inbound on port 22 (SSH) or 3389 (RDP) to production instances
  • Administrative ports should be restricted to specific IP ranges or accessed via bastion hosts/Session Manager
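The 0.0.0.0/0 check in particular is worth automating. A minimal sketch over simplified rule dicts (not the exact shape any cloud API returns):

```python
import ipaddress

ADMIN_PORTS = {22, 3389}  # SSH, RDP

def risky_rules(inbound_rules):
    """Flag inbound rules opening an administrative port to the internet."""
    internet = ipaddress.ip_network("0.0.0.0/0")
    return [r for r in inbound_rules
            if r["port"] in ADMIN_PORTS
            and ipaddress.ip_network(r["cidr"]) == internet]

rules = [
    {"port": 22,  "cidr": "0.0.0.0/0"},       # SSH open to the world: flagged
    {"port": 443, "cidr": "0.0.0.0/0"},       # HTTPS to the world: expected
    {"port": 22,  "cidr": "203.0.113.0/24"},  # SSH from a known range: fine
]
print(risky_rules(rules))
```

A production check would run against the live security group inventory on a schedule and alert on any new finding.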

Secure Administrative Access

Never expose SSH or RDP directly to the internet for production systems. Options:

AWS Systems Manager Session Manager: Provides browser-based shell access to EC2 instances without opening any inbound ports. Access is authenticated through IAM, fully logged, and works without a bastion host. Session Manager should be the default administrative access method.

Bastion hosts (jump boxes): A single, hardened instance in a public subnet, accessible from known IP ranges, through which administrators SSH to private instances. Simpler than Session Manager to understand, but requires managing the bastion host as additional infrastructure.

VPN connections: AWS Site-to-Site VPN or Client VPN provides encrypted access to the VPC from corporate networks or individual machines. Appropriate for organizations that need broad VPC access from managed corporate machines.


Encryption: Data at Rest and in Transit

Encryption protects data confidentiality---ensuring that even if storage is physically compromised or network traffic is intercepted, the data remains unreadable without the correct keys.

Encryption at Rest

Data stored in cloud storage should be encrypted. Cloud providers offer this transparently for most storage services:

  • AWS: Default encryption available for S3, EBS, RDS, DynamoDB, and most other storage services. Provider-managed keys (SSE-S3) or customer-managed keys via KMS.
  • GCP: Default encryption for all data at rest, customer-managed keys available via Cloud KMS.
  • Azure: Storage Service Encryption is on by default; Azure Key Vault for customer-managed keys.

For most organizations, provider-managed encryption (the default) provides adequate protection against infrastructure-level compromise. For workloads handling particularly sensitive data (healthcare records, financial data), customer-managed keys (CMK) provide additional control: you can revoke the key, disabling access to the data, and you have a complete audit trail of key usage.

The critical principle: encrypt everything, particularly databases, file storage, and backup archives. The computational overhead of encryption is minimal; the protection it provides is substantial.

Encryption in Transit

All network communications should use TLS (Transport Layer Security). This includes:

  • All internet-facing connections (HTTPS for web applications and APIs)
  • Internal service-to-service communications within the VPC
  • Administrative access (SSH is encrypted; HTTP admin interfaces are not acceptable)
  • Database connections (use TLS options in RDS, Cloud SQL, Azure Database)

Certificate management: AWS Certificate Manager, Google-managed SSL certificates, and Azure App Service Managed Certificates provide free TLS certificates with automatic renewal. There is no acceptable reason to serve sensitive applications over HTTP in 2024.
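On the client side, getting TLS right is mostly a matter of not disabling the defaults. In Python, for example, ssl.create_default_context() enables certificate verification and hostname checking, and the minimum protocol version can be pinned to rule out legacy downgrades:

```python
import ssl

# Default client context: verifies the server certificate chain and
# checks the hostname. Pinning the floor to TLS 1.2 excludes the
# deprecated TLS 1.0/1.1 protocols.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

print(ctx.check_hostname)                     # True
print(ctx.verify_mode == ssl.CERT_REQUIRED)   # True
```

The common anti-pattern is the reverse: setting check_hostname to False or verify_mode to CERT_NONE to silence certificate errors, which quietly removes the protection TLS exists to provide.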

Key Management

Keys must be:

  • Never stored alongside encrypted data: A key stored in the same S3 bucket as the data it encrypts provides no protection
  • Rotated regularly: Annual rotation at minimum; quarterly for high-sensitivity data
  • Access-controlled: Only services that legitimately need decryption capability should have access to keys
  • Audited: Every key usage should be logged and reviewed

AWS KMS, Azure Key Vault, and Google Cloud KMS all provide managed key management services that handle rotation, access control, and audit logging. These services should be preferred over building custom key management.


Data Protection and Access Controls for Storage

S3 and Object Storage Security

Public S3 buckets have been among the most common sources of large-scale data exposures. Organizations including FedEx, Verizon, Accenture, and hundreds of others have inadvertently exposed data through publicly accessible buckets.

AWS S3 Block Public Access settings (available at account and bucket level) should be enabled everywhere by default. These settings override any bucket policies or ACLs that would grant public access. Enable them at the account level and treat any exception as requiring explicit justification and review.

Additional S3 security controls:

  • Bucket policies: Control access at the bucket level, in addition to IAM policies
  • VPC Endpoints: Allow EC2 instances to access S3 without traffic traversing the internet
  • S3 Access Logs: Log all requests to S3 buckets for audit purposes
  • Object Lock: Prevent deletion or modification of objects for a defined retention period (for compliance requirements)
  • Versioning: Enable versioning to protect against accidental deletion and ransomware
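Bucket policies that grant access to everyone are detectable by inspection. A simplified sketch that flags Allow statements with a wildcard principal; Block Public Access remains the real safety net, and this is only an audit aid (the account ID below is the AWS documentation example, not a real account):

```python
def public_statements(bucket_policy):
    """Return Allow statements granting access to everyone,
    i.e. Principal '*' or {'AWS': '*'}. Simplified parsing."""
    hits = []
    for stmt in bucket_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*" or (isinstance(principal, dict)
                                and principal.get("AWS") == "*"):
            hits.append(stmt)
    return hits

sample = {"Statement": [
    {"Effect": "Allow", "Principal": "*",              # public: flagged
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::site/*"},
    {"Effect": "Allow",                                # scoped: fine
     "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::site/*"},
]}
print(len(public_statements(sample)))  # 1
```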

Database Security

Databases contain the highest-value data and deserve layered protection:

  • No public accessibility: Database instances should never have public-facing endpoints. All access should come through application tiers or administrative hosts within the VPC.
  • Encryption at rest and in transit: Enable both for all production databases
  • Least-privilege database accounts: Application database users should have only the permissions they need (SELECT/INSERT/UPDATE for application tables; no DROP, TRUNCATE, or access to system tables)
  • Automated backups: Maintain encrypted backups in multiple availability zones
  • Audit logging: Enable database audit logging for compliance and forensics
  • Secrets management: Database credentials should be stored in AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault---never in environment variables, configuration files, or code

Secrets Management: Eliminating Hardcoded Credentials

Hardcoded credentials in source code, configuration files, or environment variables are among the most persistent and damaging cloud security vulnerabilities.

Credentials committed to Git repositories are:

  1. Visible to everyone with repository access
  2. Persisted in git history even after deletion
  3. Frequently exposed through public GitHub repositories, either accidentally or through misconfiguration

The solution is systematic secrets management:

AWS Secrets Manager: Store and automatically rotate database credentials, API keys, and other secrets. Applications retrieve secrets via API call at runtime. Supports automatic rotation for RDS passwords and custom rotation for other secret types.

AWS Parameter Store: Simpler (less expensive) alternative for configuration data and non-sensitive parameters. Supports encrypted parameters using KMS.

HashiCorp Vault: Open-source secrets management tool that works across cloud providers and on-premises. Provides dynamic secrets (generating short-lived credentials on demand rather than storing long-lived ones), policy-based access control, and comprehensive audit logging.

OIDC for CI/CD: Modern CI/CD platforms (GitHub Actions, GitLab CI) support OpenID Connect, allowing pipelines to authenticate to cloud providers without any stored credentials. The pipeline proves its identity cryptographically; the cloud provider issues temporary credentials. This eliminates the category of long-lived CI credentials entirely.

Prevent credential leakage:

  • Git pre-commit hooks: Tools like git-secrets, detect-secrets, and Gitleaks scan commits for credential patterns before they enter version control
  • GitHub secret scanning: Automatically detects known credential patterns (API keys, private keys) in public repositories and notifies providers
  • Periodic secret rotation: Even properly managed secrets should be rotated regularly
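A pre-commit credential scan in the style of git-secrets reduces to pattern matching. The patterns below are illustrative and far from exhaustive, and the generic rule will produce false positives that a real tool manages with allowlists:

```python
import re

# A few well-known credential formats. The AWS key below is the
# documented example key, not a live credential.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_secret": re.compile(
        r"(?i)\b(?:secret|password|token)\s*=\s*['\"][^'\"]{8,}['\"]"),
}

def scan(text):
    """Return (pattern_name, line_number) pairs for suspected secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((name, lineno))
    return findings

sample = ('aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
          'password = "correct-horse-battery"\n'
          'print("hello")\n')
print(scan(sample))
```

Wired into a pre-commit hook, a non-empty result from `scan` would abort the commit before the credential ever enters history.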

Logging, Monitoring, and Incident Detection

Detection is as important as prevention. No security posture is perfect, and the difference between a minor incident and a catastrophic breach is often detection speed. The average time to detect a breach without active monitoring is measured in months.

Enable Cloud-Native Audit Logging

Every major cloud provider offers comprehensive API audit logging:

  • AWS CloudTrail: Records all API calls made in your AWS account, including who made them, when, from where, and what the response was. Enable CloudTrail in all regions and protect the log archive from modification.
  • Azure Activity Log: Similar to CloudTrail for Azure resource operations
  • GCP Audit Logs: Admin Activity logs (always on), Data Access logs (must be enabled), and System Event logs

These logs are essential for forensic investigation after incidents. Without them, you cannot determine what an attacker did, what data they accessed, or what cleanup is required.

Threat Detection Services

Cloud-native threat detection services analyze activity patterns and identify suspicious behavior automatically:

AWS GuardDuty: Continuously analyzes CloudTrail logs, VPC Flow Logs, and DNS logs using machine learning to identify threat indicators. Detects patterns like: API calls from known malicious IPs, credential exfiltration, unusual data access patterns, cryptocurrency mining activity, and communication with known command-and-control infrastructure. GuardDuty should be enabled in every AWS account and every region.

Azure Defender: Threat detection integrated with Azure Security Center, analyzing logs across Azure services for attack patterns.

Google Cloud Security Command Center: Centralized security and risk management for GCP, with threat intelligence and anomaly detection.

Third-party SIEM: For organizations with complex environments, Security Information and Event Management tools (Splunk, Elastic Security, Sumo Logic) aggregate logs from cloud and on-premises sources, enabling correlation analysis that individual cloud tools cannot perform.

Critical Security Alerts

At minimum, configure alerts for:

  • Root account login (should almost never happen)
  • IAM policy changes (who changed what permissions)
  • Security group changes (new inbound rules)
  • S3 bucket policy changes
  • Failed authentication attempts (credential stuffing, brute force)
  • API calls from unusual geographic locations
  • Large data transfers out of the environment

Each alert should have a documented response procedure so responders know exactly what to do when it fires, rather than improvising under pressure.
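These alert conditions map naturally onto filters over CloudTrail-style events. A sketch using a simplified subset of real CloudTrail record fields (eventName, userIdentity.type); a production deployment would express the same logic as EventBridge rules or SIEM queries:

```python
# Each alert is a named predicate over an event dict.
ALERTS = {
    "root-console-login": lambda e: (
        e.get("eventName") == "ConsoleLogin"
        and e.get("userIdentity", {}).get("type") == "Root"),
    "iam-policy-change": lambda e: e.get("eventName", "").startswith(
        ("PutUserPolicy", "PutRolePolicy", "AttachRolePolicy")),
    "sg-ingress-change": lambda e: (
        e.get("eventName") == "AuthorizeSecurityGroupIngress"),
}

def triggered_alerts(event):
    """Return the names of every alert this event matches."""
    return [name for name, match in ALERTS.items() if match(event)]

root_login = {"eventName": "ConsoleLogin",
              "userIdentity": {"type": "Root"}}
print(triggered_alerts(root_login))  # ['root-console-login']
```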


Common Cloud Security Vulnerabilities

The same misconfiguration patterns appear repeatedly across cloud security assessments. Knowing them enables systematic prevention.

Publicly accessible storage remains the leading cause of large-scale data exposure. S3 buckets, Azure Blob containers, and GCS buckets set to public access have exposed customer databases, credentials, healthcare records, and financial data for hundreds of organizations. The mitigations are straightforward: enable S3 Block Public Access at the account level, run automated checks for public buckets, and treat any public bucket exception as requiring formal approval.

Overly permissive IAM policies expand the blast radius of any compromise. Administrator-level policies granted to services that need read access to a single bucket, wildcard resource permissions (*), and permissions granted for convenience without review---all create unnecessary risk. The antidote is regular IAM access reviews using tools like IAM Access Analyzer and CloudSplaining.

Unpatched vulnerabilities in operating systems and dependencies are just as exploitable on cloud infrastructure as on-premises. Cloud does not automatically keep your software updated. Implement automated patching for EC2 instances using AWS Systems Manager Patch Manager, enable automatic minor version upgrades for RDS, and maintain a software bill of materials (SBOM) for container images.

Missing encryption on databases and storage persists despite years of guidance; unencrypted data stores remain common findings in cloud assessments. Implement policy-as-code checks using AWS Config rules, Azure Policy, or GCP Organization Policies to enforce encryption at the account level.

Disabled logging means breaches go undetected. Enable CloudTrail, VPC Flow Logs, and application-level logging everywhere. Store logs in a separate, access-controlled location so that an attacker who compromises application infrastructure cannot cover their tracks by deleting logs.

Overly broad CORS policies on web applications and APIs---particularly allowing all origins (*)---can enable cross-site data theft. Configure CORS to allow only legitimate origins.


Compliance Frameworks for Cloud Environments

Regulatory compliance in cloud environments requires understanding both what the regulations require and how shared responsibility applies.

SOC 2 evaluates security controls against Trust Services Criteria. Cloud providers (AWS, Azure, GCP) have their own SOC 2 certifications for their infrastructure. These are valuable evidence but do not satisfy your SOC 2 requirement: your application's security controls, access management, and data handling are your responsibility and must be audited separately.

PCI-DSS governs systems that store, process, or transmit payment card data. Cloud environments can be PCI-compliant, but require specific configurations: network segmentation, encryption in transit and at rest, access logging, and regular vulnerability scanning. AWS has a PCI-DSS Compliance program with documentation on the shared responsibility model for PCI.

HIPAA governs healthcare data in the United States. AWS, Azure, and GCP all offer HIPAA Business Associate Agreements (BAAs) and document their HIPAA-eligible services. HIPAA compliance requires customer-side controls: encryption of protected health information (PHI) at rest and in transit, access controls, audit logging, and breach notification procedures.

GDPR applies to personal data of EU residents regardless of where the organization operates. Key requirements for cloud environments: data residency (ensuring EU personal data does not leave EU regions without adequate protections), right to erasure (ability to delete specific user data), data processing documentation, and breach notification timelines.

Understanding how DevOps practices integrate security into the development lifecycle---the "shift left" movement---is essential for building compliant systems rather than retrofitting compliance onto existing ones.


Security Automation: Security as Code

Manual security processes do not scale with cloud's speed and dynamism. Effective cloud security requires automation.

Infrastructure as Code security scanning: Tools like Checkov, Terrascan, and tfsec scan Terraform, CloudFormation, and other IaC templates for security misconfigurations before resources are deployed. A policy-as-code check that blocks deployment of unencrypted databases prevents the misconfiguration from ever reaching production.
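A policy-as-code check of this kind is small at its core. The sketch below scans simplified Terraform-style resource dicts, not real plan JSON, and the attribute names are approximations of Terraform's, but the shape is what Checkov and tfsec rules look like internally:

```python
# One rule per resource type: a predicate that must hold for the
# resource to pass. Attribute names approximate Terraform's schema.
RULES = {
    "aws_db_instance": lambda r: r.get("storage_encrypted") is True,
    "aws_s3_bucket": lambda r: bool(r.get("server_side_encryption")),
}

def check(resources):
    """Return human-readable failures; empty list means the plan passes."""
    failures = []
    for res in resources:
        rule = RULES.get(res["type"])
        if rule and not rule(res):
            failures.append(f'{res["type"]}.{res["name"]}: encryption not enabled')
    return failures

resources = [
    {"type": "aws_db_instance", "name": "app", "storage_encrypted": False},
    {"type": "aws_db_instance", "name": "audit", "storage_encrypted": True},
    {"type": "aws_s3_bucket", "name": "logs",
     "server_side_encryption": {"algorithm": "aws:kms"}},
]
print(check(resources))  # ['aws_db_instance.app: encryption not enabled']
```

In a CI pipeline, a non-empty result fails the build, so the unencrypted database never reaches production.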

Automated compliance checking: AWS Config Rules, Azure Policy, and GCP Organization Policies define required configurations for cloud resources and automatically flag or remediate deviations. An AWS Config rule that requires all S3 buckets to have Block Public Access enabled will alert immediately if any bucket is created without it.

Container image scanning: CI/CD pipelines should scan container images for vulnerabilities before pushing them to registries. Snyk, Trivy, and Clair scan images against vulnerability databases and block deployment of images with critical vulnerabilities.

Runtime security monitoring: Tools like Falco (open-source), Aqua Security, and Sysdig monitor container and Kubernetes workloads at runtime, detecting anomalous behavior (unexpected outbound connections, unusual file system writes, privilege escalation attempts) that static analysis cannot detect.


Building a Cloud Security Program

Security is not a feature you add to cloud infrastructure---it is a practice you embed throughout.

Start with inventory: You cannot secure what you do not know exists. Implement comprehensive tagging, enable resource inventory tools (AWS Config, Azure Resource Graph), and build dashboards that show all resources across all accounts and regions.

Prioritize identity: IAM misconfigurations cause the majority of breaches. Invest in IAM expertise, implement regular access reviews, enforce MFA universally, and audit privileges continuously.

Automate the basics: Encryption at rest, encryption in transit, and private subnets for databases should be enforced programmatically, not checked manually. Humans forget; policies enforced by infrastructure code do not.

Assume breach and invest in detection: Perfect prevention is impossible. Build logging infrastructure, implement threat detection, define incident response procedures, and practice them. A team that has run tabletop exercises and knows exactly what to do when GuardDuty fires a critical alert responds in hours. A team improvising from scratch responds in days.

Integrate security into the development process: The cheapest security fix is the one that prevents a vulnerability from ever reaching production. Code review processes, automated scanning in CI/CD pipelines, developer security training, and threat modeling during design catch issues early, when they are cheapest to fix.

Cloud security is a competitive concern as much as a technical one. The organizations that build security into their cloud architecture from the beginning maintain trust, avoid regulatory fines, and spend less on remediation than those that treat security as an afterthought.

