Understanding AI-Powered Analysis with CloudWatch Investigations and the Bigger Picture of Incident Management (Part 1)
https://exevolium.com/aws-series-part-1-understanding-ai-powered-analysis-with-cloudwatch-investigations/ (Wed, 29 Oct 2025)

Hello everyone,

In this article, I will talk about the AI-powered CloudWatch feature that simplifies root cause analysis for issues occurring in your AWS environment, as well as the possibilities for semi-automating an incident analysis process.

Let me first mention the CloudWatch Investigations feature, announced on June 24. This feature automatically analyzes CloudWatch metrics, logs, events, deployment data, AWS Health, and CloudTrail records when an alarm or performance issue occurs in your system, providing you with potential causes and solution suggestions.

For example, imagine you no longer need to manually review all metrics and logs when a performance issue arises. CloudWatch Investigations can examine all the values on your behalf and initiate an analysis with the support of Amazon Q. This can make problem resolution during a crisis much easier.

As another example of a use case, this feature can assist first-level support personnel in analyzing issues in a shift-based work environment and help newly hired engineers with limited knowledge of the environment to perform easier issue analysis. Many other convenience and usability scenarios can be considered in this way.

Since this feature supports cross-account functionality, you can add and analyze multiple accounts. Imagine you have an environment operating in a Hub & Spoke structure. Suppose all network traffic exits through the network account, and the solution you use there (such as a Fortigate, Palo Alto, or AWS Network Firewall within an EC2 instance) encounters an issue. Everything in Account A is functioning correctly, but the application cannot access the internet. In such a case, you can use cross-account analysis to examine dependencies across multiple accounts.

In general, an investigation is initiated through an alarm, metric, or log query. CloudWatch Investigations scans the relevant data from associated resources using your IAM permissions. The AI then generates observations, suggestions, and hypotheses from this data. All these actions are also recorded in CloudTrail.
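Investigations are typically started from the console (for example, by choosing to investigate from an alarm), but it helps to see a concrete trigger. Below is a minimal AWS CLI sketch of a CPU alarm that could serve as such an entry point; the alarm name, instance ID, and thresholds are placeholders I chose for illustration:

# A plausible investigation trigger: alert when average CPU stays above 80%
# for two consecutive 5-minute periods (all names and IDs are placeholders).
aws cloudwatch put-metric-alarm \
  --alarm-name demo-high-cpu \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold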

When you create an Investigation Group, you gain access to the following options:

  • You can accept or reject AI suggestions, allowing the system to learn from your feedback.
  • You can add cross-account access to analyze data from different AWS accounts.
  • You can include additional data sources such as CloudTrail, X-Ray, Application Signals, and EKS.
  • You can execute automated remediation actions through runbook recommendations (Systems Manager Automation).
  • You can add and share notes, enabling collaborative review of the report with team members.
  • You can perform actions such as stopping, archiving, or reopening the investigation.

At the same time, with the Incident Report creation feature announced on October 22, you can automatically generate the report that would otherwise need to be prepared manually after an incident, based on the investigation group you created.

The report is structured according to industry-standard incident report formats:

Incident Overview:
– A general summary of the incident, including its severity, duration, and operational hypothesis.
Impact Assessment:
– The impact of the incident on customers, services, and business operations.
Detection and Response:
– When and how the incident was detected and how the team responded.
Root Cause Analysis:
– A detailed analysis of the underlying causes of the incident.
Mitigation and Resolution:
– The mitigation steps, resolution measures, and resolution timeframes.
Learning and Next Steps:
– Future recommendations, preventive actions, and improvement plans.

When you choose Generate report, the AI combines all the extracted facts and produces a comprehensive analysis.

Lastly, let’s talk about the pricing for this feature. The CloudWatch Investigations capability is now generally available at no additional cost.

Now that I’ve explained what this feature does and how it can be activated, I’d like to discuss how incidents can be detected in your account, how notifications can be sent to the people responsible for analyzing captured incidents, and what time-saving options are available to those reviewers. As you can probably guess, one of these options will be CloudWatch Investigations.

Let’s imagine we have an application running in our AWS environment, and we want an incident mechanism to be triggered whenever there’s an issue with this application. To do this, we first need a service to monitor both our application and its infrastructure. Here, CloudWatch Metrics and Logs monitor the infrastructure, while CloudWatch Application Insights monitors the application itself.

Now, suppose we’re already monitoring everything, but we also want to automatically trigger response mechanisms (such as phone calls, Slack or Teams notifications) and manage who will handle the issue when it arises. In that case, we use Systems Manager Incident Manager.

Once we’ve identified the bottleneck, generated a notification, and delivered it to the recipient, if you’re wondering how we can further assist the engineer in resolving the incident faster, that’s where CloudWatch Investigations comes into play.

Now you can see why I mentioned CloudWatch Investigations: AWS offers many different services, and the best part is how seamlessly they can work together as a unified system.

In other words:

Application Insights detected a CPU spike in one of the services and created an alarm.
This alarm triggered Incident Manager, which alerted the on-call engineer and opened a runbook.
The team initiated the incident, and CloudWatch Investigations stepped in to perform an AI-powered root cause analysis, revealing that the issue was caused by increased database queries following a new deployment, leading to a solution.

I created this content to explain how this feature fits as part of a larger system, how it connects to other services, and what can be achieved when integrated together. In the next article of this series, I will discuss end-to-end incident management in an AWS environment, presenting it through a real customer case.

I hope this article helps save you time.


References:

1. AWS Docs
2. AWS Docs 2

Automating SAP HANA Configuration Checks with AWS Systems Manager
https://exevolium.com/automating-sap-hana-configuration-checks-with-aws-systems-manager/ (Wed, 29 Oct 2025)

Hello everyone,

In this article, I will discuss the new Systems Manager feature AWS announced in early September, which verifies how well SAP configurations running on AWS comply with best practices. This new capability helps you compare your SAP HANA configurations against the AWS Well-Architected Framework (SAP Lens) and official AWS for SAP documentation. It automatically checks EC2 instance types, EBS storage setup, and Pacemaker HA configurations to ensure compliance with best practices.

Based on the AWS Well-Architected Framework’s SAP Lens documentation and official AWS-SAP guidelines, this tool automatically evaluates how well your systems are configured in terms of performance, security, and high availability.

The feature examines and evaluates the system under three main categories:

SAP HANA Pacemaker Configuration
– Analyzes whether the Pacemaker cluster is correctly configured for HANA.

SAP HANA EBS Storage Configuration
– Verifies whether the EBS disks’ file system and RAID configuration comply with AWS recommendations.

SAP EC2 Instance Type Selection
– Checks whether the EC2 instances running SAP HANA are SAP-certified and whether the hardware settings are correctly configured.

Each of these categories includes several subtests, and the results are clearly listed as “OKAY,” “WARNING,” or “ERROR.”

To use this feature, you must complete the following prerequisite steps on your instances:

The AWS Systems Manager Agent (SSM Agent) must be installed on your servers, and the appropriate IAM role must be attached to the EC2 instance. To do this, simply attach the AWS Managed Policy “AmazonSSMManagedInstanceCore” to the role and create a Customer Managed Policy as shown below, then attach it to the same role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AwsSsmForSapPermissions",
            "Effect": "Allow",
            "Action": [
                "ssm-sap:*"
            ],
            "Resource": "arn:*:ssm-sap:*:*:*"
        },
        {
            "Sid": "AwsSsmForSapServiceRoleCreationPermission",
            "Effect": "Allow",
            "Action": [
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": [
                "arn:aws:iam::*:role/aws-service-role/ssm-sap.amazonaws.com/AWSServiceRoleForAWSSSMForSAP"
            ],
            "Condition": {
                "StringEquals": {
                    "iam:AWSServiceName": "ssm-sap.amazonaws.com"
                }
            }
        },
        {
            "Sid": "AllowGetSecretValue",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": [
                "arn:aws:secretsmanager:YOUR_REGION:YOUR_AWS_ACCOUNT_ID:secret:YOUR_SAP_SECRETS_ID"
            ]
        }
    ]
}
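
If you prefer to wire this up from the CLI, here is a short sketch; the role and policy names are hypothetical, and the JSON above is assumed to be saved as ssm-sap-policy.json:

# Attach the AWS managed policy to the instance role (names are hypothetical)
aws iam attach-role-policy \
  --role-name SapHanaInstanceRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

# Create the customer managed policy from the JSON above, then attach it
aws iam create-policy \
  --policy-name AwsSsmForSapChecks \
  --policy-document file://ssm-sap-policy.json
aws iam attach-role-policy \
  --role-name SapHanaInstanceRole \
  --policy-arn arn:aws:iam::YOUR_AWS_ACCOUNT_ID:policy/AwsSsmForSapChecks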

– Configure the required TCP ports (typically 30013-30015 for SAP HANA).
– You need to add your SAP system as an application in AWS Systems Manager. For this, it is recommended to create a new user in your SAP environment rather than using an existing system user. The permissions for this user should be as follows (there is no need to grant FULL ADMIN privileges).

MONITORING role
PUBLIC role
System privilege for “RESOURCE ADMIN”
System privilege for “CATALOG READ”

These permissions provide sufficient access for configuration checks while maintaining security. You won’t need full administrative rights.
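
If you want to script the user creation, the sketch below uses hdbsql, assuming instance number 00 (SYSTEMDB SQL port 30013), a hypothetical host and user name, and an existing admin connection; the PUBLIC role is granted automatically when the user is created:

# All names, the host, and passwords below are placeholders; adjust to your landscape
hdbsql -n hana-host:30013 -d SYSTEMDB -u SYSTEM -p '<admin-password>' \
  'CREATE USER AWSSSM_CHECK PASSWORD "Str0ngPass#1" NO FORCE_FIRST_PASSWORD_CHANGE'
hdbsql -n hana-host:30013 -d SYSTEMDB -u SYSTEM -p '<admin-password>' \
  'GRANT MONITORING TO AWSSSM_CHECK'
hdbsql -n hana-host:30013 -d SYSTEMDB -u SYSTEM -p '<admin-password>' \
  'GRANT CATALOG READ TO AWSSSM_CHECK'
hdbsql -n hana-host:30013 -d SYSTEMDB -u SYSTEM -p '<admin-password>' \
  'GRANT RESOURCE ADMIN TO AWSSSM_CHECK'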

To proceed, go to AWS Systems Manager, and in the left-hand menu, select Application Manager. In the opened section, choose Create Application, then select Enterprise Workload.

In the section that appears, enter your application and database information. Grant access to all the databases you are using, including SYSTEMDB. In the next step, create a new secret in AWS Secrets Manager containing the database username and password, and reference it here.

If you encounter an error, first check the database connectivity using the hdbsql client. If you experience other issues, please review the AWS troubleshooting page. Once your configuration is correct, Systems Manager will begin recognizing your environment.
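
A quick connectivity test with hdbsql could look like this (same hypothetical host and user as in the earlier sketch); a single row returned from DUMMY means the connection works:

hdbsql -n hana-host:30013 -d SYSTEMDB -u AWSSSM_CHECK -p '<password>' 'SELECT * FROM DUMMY'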

As you can see in the image above, within this console you can perform various monitoring operations and compliance actions related to your defined SAP servers and databases, and you can also run Runbooks.

To summarize briefly:

The Overview section allows you to view the overall system status in a single console. Here, you can see the status of your servers, as well as any custom alarms and metrics you’ve defined. With Cost Allocation Tags, you can also view customized cost tracking in this section. At the bottom of the page, the Compliance section displays rules that can be quickly defined using AWS Config. You can create automation workflows with Runbooks; for example, you can define RemoteRunShell commands such as restarting corosync.

With OpsItems, you can view centralized records and tracking items created for incidents or issues detected in AWS resources.

In short, you can manage all monitoring and automation operations related to your SAP environment from a single console.

Now, regarding the main topic -SAP configuration check status- as shown in the screenshot above, the analysis is presented under three main categories. You can hover over any number to view more details.

As mentioned earlier, the configuration is examined under three main categories. Below, I am sharing an example result from the analysis related to the Pacemaker configuration:

Now that the setup is complete, let’s move on to the pricing of this great feature, and you might be surprised by some of the details. Since this feature is part of Systems Manager, you only pay for the functionalities you use. There are no minimum fees or commitments.

Here are the features available at no additional cost for this product:

Free features:

  • SAP application registration and management
  • Application-aware start and stop operations
  • Basic application monitoring and insights

SAP Configuration Management pricing:

  • Configuration check results are retained for 30 days
  • $0.25 USD per configuration check run per application in all AWS regions
  • Checks can be run on-demand or on a schedule

Example Pricing Scenario for Configuration Management:
If you run three configuration checks per week on two SAP HANA applications, your monthly cost would be $6.00 USD.
This is calculated as:
3 checks × 2 applications × 4 weeks × $0.25 = $6.00

There are no minimum fees or upfront commitments, and no charge for registering SAP applications.

It’s an excellent pricing model. With these features, you can have a second pair of eyes evaluating how accurately your cluster is configured.

I hope this article helps save you time.


References:

  1. AWS Docs
  2. AWS Docs 2
Rightsizing, Visibility, and Cost Optimization: Inside EC2 Capacity Manager
https://exevolium.com/rightsizing-visibility-and-cost-optimization-inside-ec2-capacity-manager/ (Sun, 19 Oct 2025)

Hello everyone,

In this article, I’ll be talking about AWS’s newly announced EC2 Capacity Manager. This new feature is one of the most valuable additions to centralized resource management, following the previously released EC2 Global View.

Previously, monitoring, analyzing, and optimizing EC2 cost and usage required using multiple solutions such as CloudWatch, Cost and Usage Reports (CUR), Cost Explorer, EC2 APIs, Savings Plans & Reserved Instances dashboards, and CUDOS, or relying on various third-party tools.

Just as Global View provides a global perspective of which compute-related services are running in different regions within an AWS account, this new feature brings together nearly all of the elements mentioned above into a single interface focused on cost visibility.

It can also be enabled at the organization level instead of managing each account separately. This makes it possible to view Savings Plans (SP) or Reserved Instance (RI) utilization across accounts and to examine interruptions or fluctuations in Spot Instance usage. The generated reports can be exported and imported into PowerBI or QuickSight for further analysis.

I have also submitted a feature request to integrate Compute Optimizer into this portal. Once added, it will allow viewing how much each EC2 resource is utilized and whether further optimization is needed, directly from the same page. That said, I don’t think this suggestion is likely to be accepted, since this solution is completely free, while Compute Optimizer becomes a paid service at a certain point. Still, there’s always hope, we’ll see 🙂

I’ve outlined the core functions of EC2 Capacity Manager below, categorized under their respective sections. You can review them for a quick summary:

  • Collects and analyzes EC2 capacity usage across all AWS accounts and Regions.
  • Refreshes data hourly to ensure up-to-date insights.
  • Displays combined metrics for On-Demand Instances, Spot Instances, and Capacity Reservations.
  • Provides a single pane of glass for capacity planning, reservation efficiency, and usage optimization.
  • Displays comparative usage data for Spot, On-Demand, and Reservation instances across all Regions.
  • Allows data grouping with the Dimension Filter by Account ID, Region, Instance Family, Availability Zone (AZ), and Instance Type.
  • The Aggregations table provides analysis of total, reserved, and Spot usage hours by instance family.
  • The “View Breakdown” option enables drilling down into specific instance types to identify detailed optimization opportunities.
  • Tracks the utilization rates of On-Demand Capacity Reservations (ODCRs).
  • Automatically detects underutilized reservations and provides prioritized optimization recommendations.
  • Displays the ratio of reserved vs. unused capacity to measure reservation efficiency.
  • When the reservation exists in the same account, it allows direct modification of reservation parameters from the console.
  • The Statistics section summarizes key metrics such as total reservation count, average utilization rate, and Regions with the highest or lowest utilization.
  • Analyzes interruption durations of Spot Instances.
  • Provides Spot placement score recommendations to help improve workload flexibility.
  • Allows exporting data to Amazon S3, extending beyond the 90-day retention limit of the console.
  • Enables long-term trend analysis and integration with external business intelligence (BI) tools.
  • Supports creating automated export schedules for continuous data delivery.

When evaluating EC2 optimization opportunities, consider:

  • CPU utilization trends over time
  • Memory pressure and application behavior
  • Network throughput requirements
  • Reserved Instance and Savings Plan coverage

For example:

  • Low CPU + low network → downsize instance
  • High CPU spikes → consider burstable or autoscaling

A production workload running on m5.4xlarge showed:

  • Average CPU utilization: 15%
  • No significant memory pressure

After analysis:

  • Instance downsized to m5.2xlarge
  • Achieved ~40% cost reduction
  • No performance degradation observed

At this point, the following steps should be followed for best practices:

  • Always analyze at least 14–30 days of metrics (see the sketch after this list)
  • Combine rightsizing with Savings Plans
  • Avoid over-aggressive downsizing in production
  • Validate changes in staging environments
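
As a starting point for that metric review, here is a hedged AWS CLI sketch that pulls 14 days of daily average CPU for a single instance; the instance ID is a placeholder, and the date arithmetic assumes GNU date:

aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time "$(date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 86400 \
  --statistics Average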

I hope this article helps save you time.


References:

  1. AWS Docs
The Smart Way to Secure EC2 Access: AWS Just-in-Time Node Access
https://exevolium.com/the-smart-way-to-secure-ec2-access-aws-just-in-time-node-access/ (Mon, 06 Oct 2025)

Hello everyone,

In this article, I will talk about how you can manage access to your EC2 instances in your AWS cloud environment with the principle of least privilege and ensure that users can only access the resources they need, and only for as long as necessary. In other words, I will explain a partial PAM setup.

Before having you read the entire article, let me explain the capabilities this feature provides:

  • Eliminates the need for password/ssh-key access on Unix/Linux servers.
  • Enables users to create requests for access to servers and provides an approval mechanism.
  • Prevents long-term access (such as password or key) to servers. Without needing a VPN, you can grant access to your servers in private subnets for as long as you wish.
  • Allows you to create manual approval, automatic approval, or automatic rejection policies for desired servers.
  • Lets you receive server access requests via email, Slack, or Teams, and when user requests are approved, ensures that emails are sent to users.
  • Enables access to Unix/Linux servers with a different read-only user via Run As. (When starting a session directly, connections are made as the ssm-user, which has sudo privileges.)
  • Lets you record RDP connections to your Windows servers into S3.
  • Stores all server requests and access logs in both CloudWatch Logs and S3, encrypted with KMS.

If the above actions meet your needs, you can continue reading the rest of the document. 🙂

The architecture diagram:

On April 29, 2025, AWS announced the JITNA (Just-in-time node access) feature to meet this need. With this feature, you can use it not only on your servers in AWS but also on your on-premises servers or servers in other cloud providers. The only requirement is that the AWS SSM Agent must be installed on the relevant servers, and the corresponding IAM role or IAM user (if outside AWS) must be defined.
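
For servers outside AWS, registration goes through a hybrid activation rather than an instance profile. A minimal sketch, assuming a service role named SSMServiceRole already exists; the activation code and ID come from the first command’s output:

# Create an activation in your AWS account (role name and region are assumptions)
aws ssm create-activation \
  --default-instance-name on-prem-node \
  --iam-role service-role/SSMServiceRole \
  --registration-limit 5 \
  --region eu-west-1

# Then register the agent on the on-premises server with the returned values
sudo amazon-ssm-agent -register \
  -code "<activation-code>" -id "<activation-id>" -region "eu-west-1"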

JITNA (Just-in-time node access) is a feature under the Systems Manager service. After the first 30-day trial period, pricing is calculated per managed node. This feature is also regional, meaning you need to enable it separately for the servers in each region.

When you go to Just-in-time node access in Systems Manager and you haven’t used it before, you will be greeted with the following screen:

For those wondering, DHMC (Default Host Management Configuration) is a mechanism that ensures the necessary settings for Systems Manager are automatically applied to new EC2 instances or managed servers. Its main purpose is to make EC2 instances or on-premises servers ready to connect to Systems Manager without manual intervention. With this configuration:

  • The SSM Agent can be automatically installed (if not already present),
  • An IAM instance profile is assigned,
  • The permissions required for communication with SSM (e.g., AmazonSSMManagedInstanceCore) are added,
  • Servers become visible by default in the Systems Manager console.

When you enable the JITNA feature in AWS Systems Manager, the system performs a number of automated tasks on your behalf:

  • First, a special IAM role containing the necessary permissions is created in each target account. This role allows Systems Manager to prepare the infrastructure required for temporary access.
  • Then, State Manager is activated in each account, defining a service-linked role that enables Systems Manager to configure on your behalf.
  • Additionally, in order to generate temporary security credentials, another association is established through State Manager, and a new IAM role is created for this purpose.
  • Finally, an association is set up in the designated delegated administrator account and the home region within your organization. A role is then activated to allow access denial policies to be shared with member accounts.

At the end of this process, JITNA automatically prepares all the essential components needed for setup, leaving you only with the task of managing access requests.

After activating the feature, you can also change the periods from the Settings section located at the bottom left.

After the initial setup phase is completed, you can see the installation I performed for my environment on the screen below.

If I were to explain the features mentioned above:

As in the scenario in my setup, if you have SSO integration at the organization or account level, you can assign “Approver” users or groups. The group or list is pulled directly from SSO. Another option is to use IAM. In this case, you can define a role as the Approver. You can then grant your users the authority to assume this role and connect through it. For Slack or Teams notifications, it is recommended to proceed with this method.

As you can see below, two types of notifications can be delivered. Along with filling out this information correctly, it is important that you keep your Slack session open in the browser. When you click Configure channel, it will redirect you to Slack and request approval for Q Developer integration.

If you need more information about Slack or Teams integration, you can check the references section.

Session Options

As you can see in the two screenshots below, there are settings such as adjusting the timeout duration of an existing session, encrypting its connection (with KMS), or changing the Run As user on the Unix/Linux side. For this, the relevant user must exist on the server side and also be associated on the IAM side.

At the same time, you can configure the logging of all session information into S3 and CloudWatch Logs.

Another feature I enjoy using is shown below. If you have Windows servers in your environments, recordings of the session can be captured in .mp4 format, and these can be uploaded in encrypted form to the designated S3 bucket.

Linux and Windows shell profiles are particularly beneficial for regulation and compliance. For this reason, you can use a standard profile at the start of each session.

In this way, we have completed our default settings. After this point, we have two remaining steps:

  • Install the SSM Agent on all servers that do not have it installed, and if your environment is on AWS, assign the relevant IAM role. This role must include at least the AmazonSSMManagedInstanceCore policy.
  • Define the approval policies.

If you don’t know how to perform the first step, you can follow the AWS documentation on installing the SSM Agent for Ubuntu and other operating systems.

For the second step, the approval policies, you should come to this section on the same page:

Supported policy types:

JITNA supports three distinct policy types to manage access requests:

  • Auto-Deny (organization-wide): This policy type is designed to automatically deny access requests. It can only contain forbid instructions, e.g. for any resource tagged as “Prod”. Note that if you do not define any policy at all, automatic rejection is applied by default.
  • Auto-Approve (one per account and region, but it can contain multiple permit instructions): This policy automatically approves matching requests, for example those from any principal belonging to the “Security” group.

Note: Once an auto-approved request is granted, it cannot be canceled or revoked by an administrator or the requestor. Access is granted for fixed start and end times, with a one-hour timeout automatically applied from the time of approval.

  • Manual-Approval (account and region specific, with a limit of 50): This policy is used when neither auto-approval nor auto-deny rules apply. Requests are sent to designated decision-makers for manual review and approval. This policy can be applied to all or specific nodes.

Overlapping Policies:

When policies overlap, their interaction determines the outcome:

  • If two manual approval policies overlap, the request will be DENIED.
  • If an auto-approval policy is present and overlaps with a manual-approval workflow, the request will be APPROVED.

After discussing all the steps, let me point out that you need to remove the “ssm:StartSession” permission from the users in your environment. Otherwise, users can start a session both from the AWS CLI and the console without submitting a request.

The permission policy for the users who will make requests should look like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:StartAccessRequest",
                "ssm:GetAccessToken"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/SSM-JustInTimeAccessTokenRole",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "justintimeaccess.ssm.amazonaws.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:CreateOpsItem",
                "ssm:GetOpsItem",
                "ssm:DescribeOpsItems"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Deny",
            "Action": "ssm:StartSession",
            "Resource": "*"
        }
    ]
}

In this way, no user will be able to start a session without your knowledge, and when they want to initiate one, they will be required to request approval.

The person requesting approval receives an email as shown below:

The approver who will approve the request also receives an email as shown below:

If you would like to follow the access request process by sending a request through AWS CLI, the commands are as follows:

aws ssm start-access-request --targets Key=InstanceIds,Values=i-111111111111 --reason "Troubleshooting networking performance issue" --region eu-west-1

After approval is received, you can proceed by using the standard aws ssm start-session command.

aws ssm start-session --target i-111111111111 --region eu-west-1

Pricing:

Just-in-time node access is priced per node per hour for nodes managed by the SSM Agent. You are billed at the hourly rate based on the following volume tiers:

  • First 72,000 hours: $0.0137 per node per hour
  • Next 647,999 hours (72,001 to 720,000 hours): $0.0103
  • Next 6,479,999 hours (720,001 to 7,200,000 hours): $0.0034
  • Above 7,200,000 hours: $0.0014

Pricing example:

You enable just-in-time node access for an account. That account has 200 EC2 instances managed by AWS Systems Manager over the course of a billing period. Each of those 200 managed nodes was managed for 100 hours, resulting in a total usage of 20,000 Systems Manager managed node hours.

All 20,000 node hours are billed at $0.0137/hour, for a total expected billing period cost of $274.00.

I hope this article helps save you time.


References:

  1. AWS Docs (Slack&Teams integration)
  2. AWS Docs (Manual Setup)
  3. AWS Docs (Pricing Example)

What Happens If You Don’t Configure AWS S3 Correctly? Steps You Can Take for Both Cost and Security
https://exevolium.com/what-happens-if-you-dont-configure-aws-s3-correctly-steps-you-can-take-for-both-cost-and-security/ (Tue, 30 Sep 2025)

Hello everyone,

In this article, I’d like to walk you through a few key steps for keeping your data secure and continuously monitored in Amazon S3, AWS’s very first object storage service, launched back in 2006. Out of more than 240 services today, why start with S3?

When you look back, some of the biggest data breaches have almost always been tied to misconfigurations: IAM roles and users with excessive permissions, security groups left too open to the internet, and S3 buckets that were exposed due to incorrect settings. None of these stem from the provider itself; they are all user-side mistakes. That’s exactly why, in this post, I’ll highlight a few things many already know, but framed both as a reminder and with a slightly fresh perspective.

Let’s first consider what happens if you don’t secure your S3 buckets properly. Most of the examples I’ll share here come from legitimate sources created for security research purposes. They’re meant only to broaden your perspective; please don’t attempt to download or misuse the data.

If a bucket that’s supposed to be private is accidentally left open to the internet, and its contents get referenced in a snippet of code or through a static object, services like PublicWWW or Common Crawl will eventually pick it up. Some of the most well-known examples in this space are AWSEye and GrayhatWarfare, which catalog exposed S3 buckets and their contents.

AWSEye:

-> https://awseye.com/resources?type=AWS::S3::Bucket

GrayhatWarfare:

-> https://buckets.grayhatwarfare.com/buckets?type=aws

If you’re curious about how these tools crawl data in the simplest way, you can take a look at the two GitHub repositories linked below:

-> https://github.com/clarketm/s3recon (Legacy method)

-> https://github.com/sa7mon/S3Scanner

Of course, crawling and public scanning cover a much wider range of techniques. My goal here isn’t to show every method or teach how it’s done; it’s to show some real examples of what can be exposed when the right precautions aren’t taken. For example:

Now let’s get to the core of the discussion:

What steps can you take to improve S3 Bucket Security and S3 Cost Optimization?
AWS-Native Options:

1. S3 Object ACL (Legacy)
This is an authorization model that defines who has which permissions on a bucket or an object in S3. Every time you upload an object to S3, an ACL is automatically attached. By default, that ACL grants full control to the object owner (i.e., the account that uploaded it). With ACLs, you can assign READ, WRITE, or FULL_CONTROL on a file.

The types of principals you can grant permissions to include:

In this section, you can directly restrict access at the object or bucket level. However, since this approach is now considered legacy, AWS has been recommending the use of S3 Bucket Policies and IAM Policies for the past couple of years. If you do need to make a bucket publicly accessible (though you can also do this by simply marking the bucket or object as public), ACLs are still available. AWS continues to keep them active for backward compatibility.

2. S3 Public Access Settings

One of the most critical aspects of bucket security is whether an object or the entire bucket is left open to the public. If public access is enabled, it often results in directory listings being exposed and all objects becoming accessible.

When creating a bucket, if you’re not planning to use it for Static Website Hosting or don’t intend to make it publicly available, it’s best practice to leave the public access settings in their default state, effectively blocking exposure from the start.
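
The same default can be enforced explicitly from the CLI; a one-call sketch with a hypothetical bucket name:

aws s3api put-public-access-block \
  --bucket my-example-bucket \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true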

3. Bucket Versioning

If your application or a user accidentally deletes a file in S3, enabling versioning lets you roll back to a previous version of the object. Yes, versioning increases storage use (and cost), but you can control that with S3 Lifecycle rules; for example, delete old versions after a set time or transition them to cheaper storage classes to reduce expense.
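
Enabling versioning is a single call (bucket name is a placeholder):

aws s3api put-bucket-versioning \
  --bucket my-example-bucket \
  --versioning-configuration Status=Enabled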

4. Encryption:

Since 2023, AWS has encrypted objects uploaded to S3 by default, and for good reason. That said, if you make a bucket or object public, encryption loses much of its practical value, because in the standard server-side encryption modes (SSE-S3 or SSE-KMS) AWS transparently decrypts objects when they’re downloaded. Still, encryption matters in real threat scenarios: it protects against theft of physical storage media and against unauthorized access within the AWS account when the user or role does not have decrypt permissions. Encryption also helps you meet regulatory requirements (HIPAA, PCI-DSS, etc.) and gives you better guarantees about data integrity: AWS performs automatic checksum validation during encryption flows, so tampering is detectable.

5. Object Lock:

Object Lock is useful when you need immutable storage, for example to satisfy legal retention requirements or to harden archives against ransomware. Object Lock works only on buckets where versioning is enabled. It enforces a write-once, read-many model: objects cannot be overwritten or deleted for the retention period you set (or indefinitely, if needed). Typical use cases include keeping financial records for a fixed number of years, preserving clinical data unchanged, or simply making sure backups and archives can’t be tampered with.

Object Lock has two modes:

Compliance mode: Strict: even the root user or accounts with special permissions cannot delete or overwrite objects until the retention period expires. Use this when you need legally enforceable immutability.

Governance mode: Prevents most users from deleting or modifying locked objects, but users with special privileges (e.g., those with s3:BypassGovernanceRetention) can still make changes. This mode is flexible for internal workflows where emergency overrides may be necessary.

6. MFA Delete Protection:

The MFA Delete feature adds an extra layer of security. Its purpose is to prevent critical deletions from happening without a second factor of verification. When enabled, it requires both valid IAM permissions and an MFA code in two key situations: disabling bucket versioning, and permanently deleting previous versions of objects. This protects against accidental deletions as well as malicious actions such as ransomware attacks. It’s worth noting that MFA Delete cannot be turned on from the AWS Console; you’ll need to use the AWS CLI, SDK, or the S3 REST API to enable it.
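
Since it’s CLI-only, here is a hedged sketch of enabling it; the bucket name, MFA device ARN, and token are placeholders, and the call must be made with the bucket owner’s (root) credentials:

aws s3api put-bucket-versioning \
  --bucket my-example-bucket \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"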

7. S3 Server Access Logging:

Server Access Logging records every request made to objects within a bucket. By default, detailed logging is not enabled. But it becomes invaluable when you need audit trails for security reviews, want to understand abnormal or heavy traffic patterns, or need to meet compliance requirements.

When enabled, logs include details such as:

  • Timestamp of the request
  • The AWS account or IP address making the request
  • Access method used (REST, SOAP, CLI, SDK, etc.)
  • Which object was targeted
  • Type of operation (GET, PUT, DELETE, LIST, etc.)
  • Response code (200, 403, 404, and so on)
  • Number of bytes transferred
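
Turning it on via the CLI could look like the sketch below; both bucket names are placeholders, and the target bucket needs the appropriate log-delivery permissions:

aws s3api put-bucket-logging \
  --bucket my-example-bucket \
  --bucket-logging-status '{"LoggingEnabled": {"TargetBucket": "my-log-bucket", "TargetPrefix": "access-logs/"}}'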

8. AWS CloudTrail Data Events:

Unlike Server Access Logs, CloudTrail Data Events focus on API-level activity. They provide fine-grained visibility into who deleted, uploaded, or viewed an object. Data Events are disabled by default but can be turned on for stronger auditing, compliance, and security analysis. Without enabling them, AWS does not automatically log these object-level API calls in S3.

9. S3 Bucket Policies and IAM Access Analyzer for S3:

The IAM Access Analyzer for S3 helps you review and understand bucket access policies. It automatically checks whether a bucket is publicly accessible or if another AWS account has access. This eliminates the need to manually read through JSON policies line by line and provides a clear report of potential risks. Imagine dozens of buckets managed by different teams, and one of them temporarily sets a bucket to public but forgets to revert it. The analyzer will catch that misconfiguration and alert you, minimizing the chance of an accidental data exposure.

Meanwhile, an S3 Bucket Policy is a JSON-based document that directly defines access rules at the bucket level. With it, you can specify who can access what resource and how.

It’s often confused with IAM policies, so here’s the difference:

  • IAM Policy: permissions tied to a user or role (e.g., “this user can access this resource”).
  • Bucket Policy: permissions tied directly to a bucket (e.g., “this bucket can be accessed by these users”).

Why does it matter?

  • Security: A poorly written policy can unintentionally expose your data to the internet.
  • Centralized Management: Bucket-level rules apply across all objects in that bucket.
  • Compliance: Ensures your security posture aligns with corporate policies and industry regulations.
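
As a concrete illustration, the sketch below applies a common hardening pattern: denying any request that does not use TLS. The bucket name is a placeholder, and this is one pattern among many, not a complete policy:

aws s3api put-bucket-policy --bucket my-example-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyInsecureTransport",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::my-example-bucket",
      "arn:aws:s3:::my-example-bucket/*"
    ],
    "Condition": {"Bool": {"aws:SecureTransport": "false"}}
  }]
}'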

10. Cross-Region Replication:

Amazon S3 is a region-based service. If a single Availability Zone (AZ) goes down, your data remains accessible, and AWS guarantees 11 nines of durability (99.999999999%). However, if an entire region were ever to become unavailable (which, to date, has never happened), access to your data could be impacted. That’s why enabling cross-region replication can be an important part of a disaster recovery strategy, ensuring copies of your data exist in another geographic location.

11. Amazon Macie:

Macie is a security service built on AI and machine learning to help protect the data you store in S3. Its primary purpose is to automatically detect and classify sensitive information, such as personally identifiable information (PII), financial records, or confidential business data. With Macie, identifying which files in your S3 buckets contain sensitive data becomes far easier, especially at scale. It’s a powerful tool, but also a relatively costly service, so it’s worth reviewing the pricing model carefully before enabling it.

12. S3 Storage Lens:

S3 Storage Lens provides an organization-wide view of how your S3 buckets are being used across one or multiple AWS accounts. It gives insights into storage usage, activity trends, and potential areas for cost optimization. At the very least, I recommend turning on the free-tier dashboard, as it offers valuable visibility into your storage footprint without additional cost.

If your AWS environment is managed under an AWS Organization, you don’t need to enable S3 Storage Lens account by account. Instead, you can configure it centrally from the management account and generate dashboards that cover all linked accounts and buckets at once. This makes visibility and cost optimization much easier across a multi-account setup.

13. CloudWatch Alarms:

While CloudWatch alarms are often used to trigger other actions, in our scenario they’re especially useful for monitoring requests to your S3 buckets. Why does this matter? Because in AWS, even failed requests can generate costs.

For example, imagine you have a NetBackup server in Account A writing data to a bucket in Account B. If the bucket name or path is incorrect, the request will fail, but AWS will still charge you for it. Setting up alarms helps you spot these mistakes early and avoid unnecessary charges.

You can also use alarms to notify you whenever an object is uploaded or deleted from a bucket, giving you real-time awareness of critical changes in your environment.

14. VPC Endpoint for S3:

When you upload data from a resource within your AWS account to S3, using an S3 VPC Endpoint ensures the traffic flows directly through the AWS network rather than over the public internet. As long as the transfer happens within the same region, this setup saves you from paying extra data transfer costs for sending or retrieving data from S3.
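
Creating a gateway endpoint for S3 is a single call; the VPC ID, route table ID, and region below are placeholders:

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --service-name com.amazonaws.eu-central-1.s3 \
  --route-table-ids rtb-0123456789abcdef0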

15. Lifecycle Manager:

One of the biggest cost drivers in Amazon S3 is storage classes. Let’s quickly refresh our memory:

  • S3 Standard
    • Price: ~0.023 USD / GB / month
    • The go-to option for data you access frequently in daily operations. High durability and low latency make it the default choice.
  • S3 Standard-IA (Infrequent Access)
    • Price: ~0.0125 USD / GB / month (+ per-access fee)
    • Best for data you don’t use often but still need quick access to — like backups or archived reports.
  • S3 Intelligent-Tiering
    • Price: ~0.023 USD / GB / month (plus a small monitoring fee of ~0.0025 USD per 1,000 objects)
    • AWS automatically moves objects to cheaper tiers based on access patterns, saving you the trouble of manual planning.
  • S3 Glacier
    • Price: ~0.004 USD / GB / month
    • A low-cost option for long-term archiving. Retrieval can take anywhere from a few minutes to several hours.
  • S3 Glacier Deep Archive
    • Price: ~0.00099 USD / GB / month
    • The cheapest storage class available. Designed for data you rarely, if ever, need to access. Retrieval can take hours.

By carefully choosing the right storage class for each object and applying lifecycle policies, you can cut your S3 costs by half or more.
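
As an example of such a policy, the hedged sketch below transitions objects to Standard-IA after 30 days and Glacier after 90, and expires noncurrent versions after a year; the bucket name and timings are placeholders you should tune to your own access patterns:

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-example-bucket \
  --lifecycle-configuration '{
  "Rules": [{
    "ID": "tier-and-expire",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"}
    ],
    "NoncurrentVersionExpiration": {"NoncurrentDays": 365}
  }]
}'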

So, we’ve covered what you can do to improve security and cost optimization for your AWS S3 resources and also highlighted how, if overlooked, misconfigurations could make your data accessible to the wrong hands.

I hope this article helps save you time.


From CloudTrail Logs to Automated Alerts with AWS EventBridge
https://exevolium.com/from-cloudtrail-logs-to-automated-alerts-with-aws-eventbridge/ (Tue, 16 Sep 2025)

Hello everyone,

In this article, I will guide you through how to receive notifications for operations or events occurring across various AWS services, such as the creation, deletion, or modification of resources like RDS, EC2, S3, ECR, ECS, etc. I’ll also cover how to reduce associated costs and improve manageability throughout the process. For example, you can receive a notification when one of the security groups you created is modified and its rules are opened to 0.0.0.0/0.

AWS services referenced in this guide:

  1. Amazon EventBridge
  2. AWS Organizations (Optional)
  3. AWS Cloudtrail
  4. Amazon SNS
  5. AWS Lambda
  6. AWS CloudFormation (Optional)

At its simplest, we’ll capture the logs sent to CloudTrail by defining a specific Rule within an EventBridge Bus.

If you want to act on an event that occurs in the same account, you can route those logs directly to the AWS service that should handle them.

When the goal is to gather events in a central account, such as an organization’s management account or a dedicated security hub, and then trigger actions, you can forward them across accounts to the appropriate EventBridge Bus.

In this guide, we’ll look at two common scenarios:

  1. Collecting and processing actions that happen within a single AWS account.
  2. Collecting and processing actions from all accounts linked to an AWS Organization.

Before moving on to the explanation, I will also share some information about the services we will use, so you’ll understand why I’m covering these services and their details. But if you already know them, you can skip ahead to the Setup Part! 🙂


1. Amazon EventBridge:

Amazon EventBridge is a fully managed, serverless event bus that lets different AWS services and applications communicate through events. For example, when someone uploads a file to S3, updates a database, or signs in through the console, EventBridge can capture that activity and route it to the appropriate destination such as a Lambda function, an SQS queue, or a Step Functions workflow.

EventBridge is built around four key elements:

  • Event – Any change or operation in a system (e.g., “A file was added to S3”).
  • Event Bus – A logical channel that receives events and applies rules.
  • Rule – Defines which events to look for and which targets should respond.
  • Target – The AWS service or application that acts on the event (for example, Lambda, SQS, or SNS).

a. EventBridge Rules

Rules filter incoming events and forward only the ones that match specific criteria to their targets. This makes it possible to automate workflows and respond only to the events that matter.

b. EventBridge Buses

An Event Bus is the channel that carries events. There are three types:

  • Default bus – Collects events from AWS services in your account.
  • Custom bus – Created for events generated by your own applications.
  • Partner bus – Handles events coming from supported SaaS providers.

In the following sections, we’ll decide whether to use a custom or the default bus based on the requirements of our setup.


2. AWS Organizations:

AWS Organizations makes it easy to manage multiple AWS accounts from a single place.

With it, you can:

  • Create new AWS accounts and organize them into Organizational Units (OUs).
  • Simplify billing by combining charges across all accounts with Consolidated Billing.
  • Apply organization-wide rules through Service Control Policies (SCPs), which can restrict or allow actions beyond the permissions defined in individual accounts.

Service Control Policies (SCPs) act as guardrails for your organization. They specify which services and actions are available to certain accounts or OUs. SCPs are evaluated alongside, but separate from, IAM policies: even if an IAM policy grants access, the request won’t succeed unless the SCP also allows it. In effect, SCPs serve as a top-level filter, defining the absolute boundaries of what’s possible inside your AWS Organization.


3. AWS CloudTrail:

In AWS, most actions you perform are automatically recorded by CloudTrail, which keeps API activity logs for 90 days by default. Every action you take -whether through the SDK, CDK, CLI, or the Management Console- is captured there. In short, CloudTrail is the service that collects and displays timestamped API logs across AWS, supporting auditing and governance requirements. It’s also compliant with PCI/DSS standards.

CloudTrail groups events into four main types:

  • Network activity events
    Record network operations performed on resources via VPC endpoints, providing insight into resource activity at the network layer.
  • Management events
    Track administrative operations on AWS resources, such as creating users or roles, updating security groups, or attaching IAM policies.
  • Data events
    Log interactions within or on top of a resource, for example reading or writing an S3 object or invoking a Lambda function.
  • Insights events
    Detect and report unusual account activity, spikes in errors, or other anomalous user behavior.

If you want to receive notifications about actions across all accounts in an AWS Organization, you’ll need to create a trail from the management account. This trail will automatically apply to all member accounts, though additional costs may apply. To keep expenses and storage low, you can configure the trail to monitor only “Write” actions, which reduces the size of the log files stored in your S3 bucket. Keep in mind that CloudTrail trails are region-specific, so you must create them separately in each AWS region you use.


4. Amazon SNS:

  • Amazon SNS (Simple Notification Service) is AWS’s fully managed publish/subscribe messaging and notification platform.
  • Endpoints that subscribe to a topic -such as email, SMS, Lambda, SQS, or HTTP/S webhooks- receive messages right away.
  • It’s designed for high scalability, low latency, and reliable delivery.
  • SNS is especially useful for application-to-application communication, event notifications, and push-based mobile alerts.

5. AWS Lambda:

AWS Lambda is Amazon’s serverless compute service.

  • You upload your code -written in Node.js, Python, Java, .NET, Go, Ruby, or other supported languages- and Lambda runs it only when it’s triggered (for example, by an S3 file upload, an API Gateway request, or an EventBridge event).
  • It scales automatically, creating as many instances as needed to handle incoming requests.
  • Pricing is based on how long your code runs and the resources it consumes, such as memory and CPU.

6. AWS CloudFormation:

AWS CloudFormation is a service that lets you define and manage AWS resources using the Infrastructure as Code approach.

  • You describe your resources -such as EC2, S3, IAM, RDS, or Lambda- in a template file written in YAML or JSON.
  • Based on that template, CloudFormation automatically provisions, updates, and removes the resources as needed.
  • It’s an effective way to set up complex environments quickly, manage infrastructure changes safely, and put Infrastructure as Code into practice.

Setup Part:

a. Collecting and processing actions that happen within a single AWS account:

We’ll start with CloudTrail. As noted earlier, CloudTrail keeps the event history for 90 days by default. Because we’re going to act within the same account, there’s no need to create a dedicated trail. In this scenario, CloudTrail won’t introduce any additional cost.

Next, we move on to EventBridge Rules. Here we’ll create a rule to catch the actions we care about. In this case, we want notifications both when an EC2 instance changes state (e.g., a new instance is launched or an instance is started) and when there’s an issue in EKS with add-ons like CoreDNS, VPC CNI, or kube-proxy.

After confirming you’ve selected the active AWS Region you’re using, open the EventBridge service and go to the Rules section to create the rule.

Create a new rule with a clear name, just as shown above, and select the appropriate Event Pattern.
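
For reference, the same kind of rule can be created from the CLI. The sketch below matches EC2 instance state changes (the rule name is a placeholder; the EKS add-on health events would need a different pattern):

aws events put-rule \
  --name ec2-state-change-notify \
  --event-pattern '{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {"state": ["pending", "running", "stopped"]}
}'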

On the next page, decide what action should be taken when this event is captured. In our example, we’ll send an email notification through SNS.
If you prefer, you can forward the event to a Lambda function, parse its contents, and trigger other actions such as sending an email via SES, restarting or stopping an instance when you’re monitoring shutdown events, or handling spot instance interruptions by running a custom workflow.

Be sure to review the Event type section for each service so you understand exactly which events you’re targeting.

If you want to receive plain-text notifications through SNS, open the SNS service in a new browser tab (keep your current tab open) and create a topic and subscription similar to the example above.

After setting up the subscription, return to the EventBridge console, go to the Select Target(s) step, and click the refresh icon next to the Topic field. If the topic was created in the correct region, it should appear there. Click Next to finish creating the rule. Also note that once you create a subscription in SNS, an email confirmation will be sent to the address you specified. You’ll need to confirm that email before messages can be delivered.
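
If you prefer the CLI for this part as well, a minimal sketch (topic name, region, account ID, and address are placeholders):

aws sns create-topic --name eventbridge-notify --region eu-central-1

aws sns subscribe \
  --topic-arn arn:aws:sns:eu-central-1:123456789012:eventbridge-notify \
  --protocol email \
  --notification-endpoint you@example.com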

When everything is set up, you’ll receive an email that looks like the one shown below:

If you’re not satisfied with the plain email format, you can choose Lambda in the Select Target(s) step within EventBridge. From there, have the Lambda function read and parse the message, then generate an HTML email template. You can send it using SES or connect to another mail server via the SMTP protocol. For example:


b. Collecting and processing actions from all accounts linked to an AWS Organization:

Since you manage your AWS environment with AWS Organizations and have multiple accounts, I’ll walk through the steps assuming you’re already familiar with many services and operations, without dwelling too much on the basics.

The benefit of AWS Organizations is that all member accounts can be administered from a single management account, as noted earlier. In this setup, you should create CloudTrail trails from the Organizations management account. This aggregates activity from all member accounts into the management account, letting you view actions across the entire organization in one place. Note that this can incur additional charges. To keep costs down, if you only need to track changes, it’s sufficient to log only Write events. That approach also keeps the S3 log files smaller and reduces the number of captured API requests. You can also disable unused AWS Regions via SCPs in AWS Organizations, so you don’t need to create trails in those regions. And yes, CloudTrail trails are regional, so you must create them separately in each AWS Region you use.

To create trails, in your management account, search for CloudTrail, click Trails on the left, and choose Create to start the process. On the creation page, select “Enable for all accounts in my organization.”

After completing this step, go back to the EventBridge service in your management account and click on Event Bus. This is where you’ll gather events coming from your other accounts that are routed through EventBridge. To allow member accounts to send events to this bus, attach a resource-based policy like the following:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowSpecificAccountsToPutEvents",
    "Effect": "Allow",
    "Principal": {
      "AWS": ["arn:aws:iam::LINKEDACCOUNTID1:root", "arn:aws:iam::LINKEDACCOUNTID2:root", "arn:aws:iam::LINKEDACCOUNTID3:root", "arn:aws:iam::LINKEDACCOUNTID4:root"]
    },
    "Action": "events:PutEvents",
    "Resource": "arn:aws:events:YOURREGIOINID:ORGACCOUNTID:event-bus/YOUREVENTBUSNAME"
  }]
}

Now add a rule to the Event Bus we just created.
For example, this time let’s receive a notification whenever an Amazon Redshift cluster is created.

In the next step, you can either select SNS to receive plain-text notifications, or send the event to a Lambda function, parse the payload, and email yourself an HTML template via SES as outlined earlier.

There’s one important detail:
For the EventBridge execution role you create, you must allow EventBridge to assume the role (trust policy) and grant it permission to either publish to SNS or invoke your Lambda function. In other words, the role’s Trusted Relationship should allow EventBridge, for example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TrustEventBridgeService",
            "Effect": "Allow",
            "Principal": {
                "Service": "events.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "YOURORGANIZATIONACCOUNTID"
                },
                "StringLike": {
                    "aws:SourceArn": [
                        "arn:aws:events:YOURREGION:YOURORGANIZATIONACCOUNTID:rule/YOUREVENTBRIDGEBUSNAME/YOUREVENTNAME"
                    ]
                }
            }
        }
    ]
}

With this step, we wrap up the configuration in the management account.
Now move on to the member accounts and create the same EventBridge rule there, but with one key difference: instead of handling the events locally, configure the rule to forward any captured events to the EventBridge bus in the management account.

The role you’ll use here should have the following Trust Relationship, and its policy should be set as shown below:

  • Trust Relationship:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "events.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "YOURLINKEDACCOUNTID"
                },
                "StringLike": {
                    "aws:SourceArn": [
                        "arn:aws:events:eu-central-1:YOURLINKEDACCOUNTID:rule/YOUREVENTBRIDGENAME"
                    ]
                }
            }
        }
    ]
}
  • IAM Policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "events:PutEvents"
            ],
            "Resource": [
                "arn:aws:events:eu-central-1:YOURORGANIZATIONACCOUNTID:event-bus/YOUREVENTBRIDGEBUSNAME"
            ]
        }
    ]
}
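
Putting the member-account side together, a hedged CLI sketch might look like this (the rule goes on the account’s default bus, and the role name is a placeholder):

# Member account: match the same Redshift event on the default bus
aws events put-rule \
  --name redshift-cluster-created \
  --event-pattern '{"source": ["aws.redshift"], "detail-type": ["AWS API Call via CloudTrail"], "detail": {"eventSource": ["redshift.amazonaws.com"], "eventName": ["CreateCluster"]}}'

# Forward matched events to the management account's event bus using the role above
aws events put-targets \
  --rule redshift-cluster-created \
  --targets '[{"Id": "forward-to-management", "Arn": "arn:aws:events:eu-central-1:YOURORGANIZATIONACCOUNTID:event-bus/YOUREVENTBRIDGEBUSNAME", "RoleArn": "arn:aws:iam::YOURLINKEDACCOUNTID:role/YOURROLENAME"}]'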

Your AWS Organization may include more than 100 accounts. To help you understand the logic behind this setup, I’ve explained how to configure it through the console. However, you can also use CloudFormation StackSets from the management account to deploy the EventBridge rule and IAM role directly to your member accounts.
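
For example, assuming you’ve written the rule and role into a template named forwarding.yaml (a name I’m making up here), the StackSets deployment could look roughly like this:

# Create a service-managed stack set that auto-deploys to new member accounts
aws cloudformation create-stack-set \
  --stack-set-name eventbridge-forwarding \
  --template-body file://forwarding.yaml \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
  --capabilities CAPABILITY_NAMED_IAM

# Deploy it to an organizational unit in your chosen region
aws cloudformation create-stack-instances \
  --stack-set-name eventbridge-forwarding \
  --deployment-targets OrganizationalUnitIds=ou-xxxx-xxxxxxxx \
  --regions eu-central-1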

If you want to prevent even users with AdministratorAccess in those accounts from modifying this configuration, you can create a Service Control Policy (SCP) in AWS Organizations. To do this, go to your management account, open Organizations, and under Policies, choose Service Control Policies.

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Deny",
			"Action": [
				"events:DeleteRule",
				"events:DisableRule",
				"events:PutTargets",
				"events:PutRule",
				"events:DescribeRule",
				"events:RemoveTargets"
			],
			"Resource": [
				"arn:aws:events:*:*:rule/EVENTRULENAME"
			]
		},
		{
			"Effect": "Deny",
			"Action": [
				"cloudtrail:DeleteTrail",
				"cloudtrail:StopLogging",
				"cloudtrail:UpdateEventDataStore"
			],
			"Resource": [
				"arn:aws:cloudtrail:*:*:trail/CLOUDTRAILNAME"
			]
		},
		{
			"Effect": "Deny",
			"Action": [
				"iam:DeleteRole",
				"iam:UpdateRole",
				"iam:UpdateRoleDescription",
				"iam:AttachRolePolicy",
				"iam:DeletePolicy",
				"iam:DeletePolicyVersion",
				"iam:DeleteRolePolicy",
				"iam:PutRolePolicy",
				"iam:GetPolicy",
				"iam:GetRolePolicy",
				"iam:CreatePolicyVersion"
			],
			"Resource": [
				"arn:aws:iam::*:role/service-role/IAMROLENAME",
				"arn:aws:iam::*:policy/service-role/IAMPOLICYNAME"
			]
		}
	]
}
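
To create and attach this SCP from the CLI instead of the console, a sketch (with the policy saved as scp.json and placeholder IDs) could be:

aws organizations create-policy \
  --name protect-event-forwarding \
  --type SERVICE_CONTROL_POLICY \
  --description "Deny changes to the forwarding rule, trail, and role" \
  --content file://scp.json

# Attach the returned policy ID to the target OU (or account)
aws organizations attach-policy \
  --policy-id p-examplepolicyid \
  --target-id ou-xxxx-xxxxxxxx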

I hope this article helps save you time.


Prompt-Driven AWS Architecture Design Using Amazon Q Developer CLI + MCP https://exevolium.com/aws-diagrams-with-amazonq-developer-cli-and-mcp-server/ Wed, 03 Sep 2025 15:41:57 +0000 https://exevolium.com/?p=1626 Hello everyone,

In this article, I’ll highlight Amazon Q Developer CLI + MCP. As someone who works primarily in the AWS cloud, I constantly need to create architectural diagrams, and in today’s era of wildly popular generative AI, dealing with diagrams manually has become unnecessarily time-consuming. That’s why I started looking for a cloud-native solution, and among the options I tried, let me introduce you to the one that impressed me the most:

-> Amazon Q Developer CLI + MCP (Model Context Protocol)

Amazon Q Developer CLI is a command line interface that brings the generative AI capabilities of Amazon Q directly to your terminal. Developers can interact with Amazon Q through natural language prompts, making it an invaluable tool for various development tasks.

Of course, Amazon Q itself is not a brand-new product. It was introduced in 2023, while Amazon Q Developer CLI was launched in 2024. However, as announced in April 2025, MCP enables Amazon Q Developer to connect with specialized servers that extend its capabilities beyond what’s possible with the base model alone. MCP servers act as plugins for Amazon Q, providing domain-specific knowledge and functionality. The AWS Diagram MCP server specifically enables Amazon Q to generate architecture diagrams using the Python diagrams package, with access to the complete AWS icon set and architectural best practices. Without dragging this out any further, let’s dive into the technical content.

Prerequisites

  • To implement this solution, you must have an AWS account with appropriate permissions.
  • You must have Python 3.10.x or higher.

Set up your environment

Before you can start creating diagrams, you need to set up your environment with Amazon Q CLI, the AWS Diagram MCP server, and AWS Documentation MCP server. This section provides detailed instructions for installation and configuration.

Install Amazon Q Developer CLI

  1. Download and install Amazon Q Developer CLI. For instructions:
    -> https://aws.amazon.com/tr/developer/learning/q-developer-cli/
  2. Verify the installation by running the following command: q --version
    You should see output similar to the following: Amazon Q Developer CLI version 1.x.x
  3. Configure Amazon Q CLI with your AWS credentials: q login
  4. Choose the login method suitable for you.

Set up MCP servers

Complete the following steps to set up your MCP servers:

  1. Install uv using the following command: pip install uv
  2. Install GraphViz for your operating system.
  3. Add the servers to your ~/.aws/amazonq/mcp.json file:
{
  "mcpServers": {
    "awslabs.aws-diagram-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.aws-diagram-mcp-server"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "autoApprove": [],
      "disabled": false
    },
    "awslabs.aws-documentation-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.aws-documentation-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "autoApprove": [],
      "disabled": false
    }
  }
}

Now, Amazon Q CLI automatically discovers MCP servers in the ~/.aws/amazonq/mcp.json file.
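
To confirm the servers were picked up, you can start a chat session and list the available tools; as far as I’ve seen, /tools is the slash command for this:

q chat
# inside the chat session, list the tools loaded from your MCP servers
/tools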

Understanding MCP server tools

The AWS Diagram MCP server provides several powerful tools:

  • list_icons – Lists available icons from the diagrams package, organized by provider and service category
  • get_diagram_examples – Provides example code for different types of diagrams (AWS, sequence, flow, class, and others)
  • generate_diagram – Creates a diagram from Python code using the diagrams package

The AWS Documentation MCP server provides the following useful tools:

  • search_documentation – Searches AWS documentation using the official AWS Documentation Search API
  • read_documentation – Fetches and converts AWS documentation pages to markdown format
  • recommend – Gets content recommendations for AWS documentation pages

These tools work together to help you create accurate architecture diagrams that follow AWS best practices.

“This is the part where the gates of heaven officially open.
Open your terminal and just type q chat.”

For example, use the following prompt for the demo. Before doing so, make sure the MCP servers configured above have finished loading successfully.

Create a diagram for an e-commerce platform with microservices architecture. Include components for product catalog, shopping cart, checkout, payment processing, order management, and user authentication. Ensure the architecture follows AWS best practices for scalability and security. Check for AWS documentation to ensure it adheres to AWS best practices before you create the diagram.

Along the way, it’ll ask for your confirmation a few times; just hit ‘t’ to keep going. The cool part is that you actually get to watch what it’s doing and which pieces of content it’s checking. Based on the prompt you give, it spins up an architecture that leans on AWS best practices and governance. Honestly, this is one of my favorite parts.


And ta-daa, here are the sample diagrams:

The more comprehensive the prompt details you provide and the more specific the distinctions you make, the more detailed and professional the diagram becomes. Just like the output above, you can also find additional examples in AWS’s blog posts.

And of course, with MCP you can do far more than just AWS architecture diagrams. In this piece, I referenced AWS diagrams both for the fun of it and to showcase this feature.

I also came across the following content about exporting the diagram you’ve created as a .drawio file, but I haven’t yet managed to get proper output from that prompt. Once I do, I’ll update this section as well.

-> Modernize Legacy AWS Architecture Diagrams with Amazon Q CLI, MCP Server, and draw.io

I hope this article helps save you time.


Fixing Repository Issues on RHEL 9 After Cloud Migration (AWS) https://exevolium.com/fixing-repository-issues-on-rhel-9-after-cloud-migration-aws/ Tue, 29 Apr 2025 13:04:18 +0000 https://exevolium.com/?p=1574 Hello everyone,

In this article, I will cover a package-update issue you may encounter after moving a RHEL 9 server to AWS via lift-and-shift or a similar migration method, along with its solution. After starting the server, if you run “yum update” or “dnf update” and receive an error message similar to the one below, follow the steps outlined in this article to resolve the problem.

[root@ip-10-199-xxx-xxx ~]# sudo yum update
Updating Subscription Management repositories.
Unable to read consumer identity

This system is not registered with an entitlement server. You can use subscription-manager to register.

Red Hat Enterprise Linux 9 for x86_64 - AppStream from RHUI (RPMs)                                                                                                                        2.9 kB/s | 153  B     00:00
Errors during downloading metadata for repository 'rhel-9-appstream-rhui-rpms':
  - Status code: 403 for https://rhui.eu-central-1.aws.ce.redhat.com/pulp/content/content/dist/rhel9/rhui/9/x86_64/appstream/os/repodata/repomd.xml (IP: 18.192.84.171)
Error: Failed to download metadata for repo 'rhel-9-appstream-rhui-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

If you encounter this error, it’s important to recognize that the issue is likely related to RHUI (Red Hat Update Infrastructure). RHUI is a content delivery system provided by Red Hat, specifically designed for cloud environments such as AWS. It allows RHEL users to receive updates and packages without directly registering their systems with Red Hat.

When you launch a RHEL server on AWS, updates are delivered via RHUI, which acts as a proxy, similar to how Red Hat’s native repositories function. You can think of it as the mechanism that enables RHEL instances acquired via cloud marketplaces (e.g., AWS Marketplace) to operate without manual registration.

However, when a server has been migrated (e.g., via lift-and-shift), you’re proceeding with a BYOL (Bring Your Own License) approach, and if the system was previously registered with Red Hat, conflicts may arise.

To avoid these issues:

  1. Uninstall the RHUI client from the system.
  2. Deregister the server from Red Hat, making sure to also clear any cached data.
  3. If needed, update the hostname at this stage.
  4. Finally, re-register the system with Red Hat using the appropriate subscription method.

These steps help eliminate RHUI-related conflicts and ensure that your system is properly registered and able to receive updates in its new environment. Here are the commands:

yum remove -y rh-amazon-rhui-client   # for AWS
yum remove -y rhui-azure-rhel* rhui-client-config-azure rhui-microsoft-azure-rhel   # for Azure
yum remove -y google-rhui-client* rhui-client-config-google   # for GCP

subscription-manager remove --all
subscription-manager unregister
subscription-manager clean
yum clean all
rm -rf /var/cache/yum/*
subscription-manager register
subscription-manager attach --auto

After completing these steps, you’ll need to re-enable the repositories. For RHEL 9, you can use the following commands:

subscription-manager repos --enable=rhel-9-for-x86_64-baseos-rpms
subscription-manager repos --enable=rhel-9-for-x86_64-appstream-rpms

If you encounter the warning “Repositories disabled by configuration.” at this stage, open the “/etc/rhsm/rhsm.conf” file and check whether the value of “manage_repos” is set to 1. If it’s not, it may have been disabled by “subscription-manager”. Change the value to 1 and save the file.
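
A hedged one-liner for the same edit, if you prefer not to open the file manually:

# Flip manage_repos to 1 in rhsm.conf and verify the change
sed -i 's/^manage_repos = 0/manage_repos = 1/' /etc/rhsm/rhsm.conf
grep manage_repos /etc/rhsm/rhsm.conf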

Then, run the following commands again:

subscription-manager refresh
subscription-manager repos --enable=rhel-9-for-x86_64-baseos-rpms
subscription-manager repos --enable=rhel-9-for-x86_64-appstream-rpms
yum clean all && yum update
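
To double-check that everything is back in order, you can list the enabled repositories:

# Both commands should now show the BaseOS and AppStream repos
subscription-manager repos --list-enabled
dnf repolist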

I hope these steps help save you time in resolving the issue.


High Availability Nginx Configuration (Corosync, Pacemaker, and DRBD Cluster) https://exevolium.com/high-availability-nginx-corosync-pacemaker-ve-drbd-cluster-yapilandirmasi/ Mon, 04 Mar 2019 17:20:13 +0000 http://exevolium.com/?p=1505 Hello,

Earlier, under the title High Availability HAProxy-Keepalived Installation and Configuration, I described building an active-active or active-passive setup (as desired) across two data centers with HAProxy redundancy. In this guide, I’ll cover installing a highly available environment on two nodes; it consists of two parts, the Corosync-Pacemaker installation and the DRBD installation. Before starting, you should complete the steps in the article below. If you don’t need DRBD (disk replication), you can follow this guide directly.

Disk Replication Configuration and Installation with DRBD

After completing the steps in that article, we can now move on to installing Corosync-Pacemaker and Nginx.

This guide describes setting up a “Web Server HA” configuration. You can apply the same scenario to databases or other services.

As in the DRBD guide, don’t forget to edit the servers’ hosts files and hostnames, disable SELinux, and make sure the servers can talk to each other on ALL ports.

192.168.0.69 nginx1.vahap.net nginx1
192.168.0.72 nginx2.vahap.net nginx2
192.168.0.63 Floating IP

To install the latest version of Nginx, do the following.

For the Nginx installation, create a file named nginx.repo under /etc/yum.repos.d/:
nano /etc/yum.repos.d/nginx.repo

Copy the repo definition below and paste it into that file.

[nginx]
name=nginx repo
baseurl=http://nginx.org/packages/mainline/centos/7/$basearch/
gpgcheck=0
enabled=1

yum -y update
yum -y install nginx
systemctl start nginx
systemctl enable nginx

Once the installation is complete, run the first command on the first machine and the second command on the second machine.

echo 'web01 - voc-labs' > /usr/share/nginx/html/index.html
echo 'web02 - voc-labs' > /usr/share/nginx/html/index.html

After that, install corosync, pacemaker, and pcsd. Run the commands in order.

yum -y install corosync pacemaker pcs

systemctl enable corosync
systemctl enable pacemaker
systemctl enable pcsd
systemctl start pcsd
passwd hacluster

Do this on both servers and set a password for hacluster; apply the steps from this point on only on the primary machine.

pcs cluster auth nginx1 nginx2
pcs cluster setup --name vahap-cluster nginx1 nginx2
pcs cluster start --all
pcs cluster enable --all
pcs status cluster

At the end of this process, you should disable STONITH.

pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs property list

Next, you’ll need to add a floating IP, so designate an IP for it. You can think of this IP like a load balancer IP: heartbeat packets are sent in the background, and traffic is served through whichever server’s IP is active. Run the commands in order, adjusting the values where appropriate.

pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.0.63 cidr_netmask=32 op monitor interval=30s
pcs resource create web ocf:heartbeat:nginx configfile=/etc/nginx/nginx.conf op monitor timeout="5s" interval="5s"
pcs status resources
pcs constraint colocation add web virtual_ip INFINITY
pcs constraint order virtual_ip then web
pcs cluster stop --all
pcs cluster start --all
pcs status resources

With this step, the process is complete. Now let’s run the tests.

pcs status nodes
corosync-cmapctl | grep members
pcs status corosync

After confirming that Nginx responds when I enter the servers’ IPs in the browser, I enter the floating IP and see that Nginx 1 responds.

To test, take the web01 machine down in the cluster. If the floating IP starts responding from the second web machine, the failover works.
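
A simple way to simulate the failure (a sketch; the node and resource names follow this setup) is to stop the cluster on the first node and watch the resources move:

# On nginx1: take the node out of the cluster
pcs cluster stop nginx1

# On nginx2: confirm that virtual_ip and web are now running here
pcs status resources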

At the end of the test, to bring node1 back up, run the following command on the node1 machine.

pcs cluster start nginx1

I hope this saves you time.

CentOS/RHEL – Adding, Formatting, and Mounting a Disk https://exevolium.com/centos-rhel-yeni-disk-ekleme-ve-mount-islemi/ Mon, 04 Mar 2019 15:00:40 +0000 http://exevolium.com/?p=1507 Hello,

I created this guide to accompany my article Disk Replication Configuration and Installation with DRBD (I’ll add the link once that content is complete); it is part of that article. Before adding any disk to the server, I’m sharing the df -h and fdisk -l outputs. As you can see, there is a single disk, sda. This guide is about adding a new disk to the system, not extending a disk. If your environment is VMware, you can click here to see my disk-extend article.

I shut down the server, attach the virtual or physical disk, and start the operating system.
The outputs after attaching the disk are shown below.

CentOS has detected the attached disk. Now I need to mount this 10 GB of disk space as a second disk in the operating system.
I’ll use fdisk for this. I create a partition by running fdisk /dev/sdb.

The steps in the image above, after running fdisk /dev/sdb, are as follows:

Press the n key and continue with Enter. (This creates a new partition.)

Since we’re making the partition primary, press the p key and continue with Enter.

I left the following options at their defaults. Press Enter twice.

Finally, save the changes with the w key.

At the end of this process, you’ll see that the sdb1 partition has been created.

In the image above, I formatted the /dev/sdb1 partition we created in xfs format using mkfs. If you prefer, you can also format it as ext3 or ext4, but I recommend ext4 (32-bit). For that, running mkfs.ext4 /dev/sdb1 is enough.

If you don’t get any errors at the end of this step, you can mount the disk and start using it. I created a folder named /data and mounted the disk to that directory. To keep the disk from being unmounted when the operating system reboots, I also added it to fstab.
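
As a sketch, the mount and the fstab entry could look like this (the device name and mount point follow this example; adjust them to your system):

mkdir /data
mount /dev/sdb1 /data

# Persist the mount across reboots
echo '/dev/sdb1 /data xfs defaults 0 0' >> /etc/fstab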

If you prefer, you can mount the disk by its UUID, which you can get with blkid /dev/sdb1.

Important notes: If you’re going to mount the disk in xfs format, you can also mount it using the following approaches.

a. If your environment holds more than 2 TB of data and the disk is larger than 2 TB, mounting it as follows will work in your favor. (Example.)
mount -o inode64 /dev/sdb1 /data

b. By default, XFS uses write barriers to preserve file system integrity across power failures, interface resets, and system crashes. I’ve seen it noted that if your hardware has a write-cache feature, disabling this protection is recommended, as it otherwise hurts performance. You can disable it with the mount option below.
mount -o nobarrier /dev/sdb1 /data

I hope this saves you time.
