Objective:

Create an automated workflow using n8n to monitor and manage the size of indices in an Elasticsearch cluster. The workflow should automatically delete old data from indices that exceed a defined size threshold, helping to maintain disk space and ensure system efficiency.

Key Steps:

Scheduled Trigger: Run the workflow automatically on a fixed schedule (e.g., every 5 minutes for testing).

Get Index Statistics: Connect to Elasticsearch and fetch statistics for all indices, ignoring system indices (names starting with .)

Check Size Threshold: Compare each index’s primary store size against a defined threshold (e.g., 10MB for testing).

Delete Old Data: If an index exceeds the threshold, delete documents older than a defined period (e.g., 7 days).

Optional Enhancements:

Send an email report summarizing index names and current sizes.

Perform hard deletes to immediately reclaim disk space (resource-intensive, optional for large indices).

Outcome:

A fully automated workflow that monitors Elasticsearch indices, deletes old data when needed, and optionally reports the results, ensuring efficient disk usage and easier data management.

Setting Up the Work Environment

Ensure n8n is Installed:

The installation steps are already documented (LINK Here), so I won’t repeat them and will start n8n directly.

Run n8n Temporarily in the Terminal:

sudo service docker start

docker run -it --rm --name n8n --network host -v n8n_data:/home/node/.n8n n8nio/n8n

In Browser: http://localhost:5678/

Pre-requisites

Before building the workflow, we configured the following:

1️⃣ Elasticsearch API Credential

Created a new Credential in n8n:
Type: Elasticsearch API
Auth Method: Elasticsearch Account
Username: elastic
Password: YourPass
Host URL: https://192.168.1.16:9200

This allows n8n to authenticate to the Elasticsearch cluster behind our ELK SIEM and query its indices securely.

2️⃣ Email App Password

Since Gmail no longer allows direct login with username & password, we:
Enabled 2-Step Verification on the Gmail account.
Generated an App Password via: Google Account → Security → App Passwords → Select App: Mail → Select Device: Other (Custom name: n8n)
Copied the 16-character password generated by Google.
Added it in n8n’s Email Credential settings (type: “Gmail SMTP”).

This allows the workflow to send automated reports securely via Gmail.

Analyzing Indices Before Applying the Retention Policy

From Stack Management → Index Management → Indices:

The indices marked with ✅ in the image are the ones where I can generate large logs quickly, so I can test on them. They also belong to the machines that are currently running and still available.

Since most of the indices are under 5 MB:
We will run an n8n workflow every 5 minutes to fetch their size data.
If any index exceeds 5 MB → we add it to the “Review or Delete” list.

Only the *-nginx-logs-* and wef-win-logs indices exceeded 5 MB, so I will continue working with them.

In other words, only 3 indices actually exceed the test threshold (Y = 5 MB) required for this task.

Executing the Workflow on n8n

In Browser: http://localhost:5678/

1️⃣ Create a Schedule Trigger

This sets how frequently the workflow will run.

The first step is to have the workflow run periodically every 5 minutes.

2️⃣ Add an HTTP Request Node

Clicked the + under the Schedule Trigger node.
From the menu, selected: Nodes → HTTP Request
Method: GET
URL:

https://192.168.1.16:9200/_cat/indices?h=index,pri.store.size&format=json

Add the Elasticsearch credentials.
Options:
Enable Ignore SSL Issues (Insecure)
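
For readers who want to reproduce this request outside n8n, the node settings above correspond roughly to the following sketch. The helper name buildCatIndicesRequest and its parameters are illustrative assumptions, not part of n8n or Elasticsearch; only the _cat/indices endpoint and its h and format query parameters come from the URL above. The column list is passed in so that it can match whichever size field the next step reads, and Basic authentication is assumed for the username/password credential.

```javascript
// Hypothetical helper (not an n8n API): describes the GET request that the
// HTTP Request node sends to Elasticsearch.
function buildCatIndicesRequest(host, username, password, columns) {
  const url = new URL('/_cat/indices', host);
  url.searchParams.set('h', columns.join(',')); // columns to return
  url.searchParams.set('format', 'json');       // JSON instead of a text table
  // Basic auth header built from the same username/password as the credential
  const token = Buffer.from(`${username}:${password}`).toString('base64');
  return {
    method: 'GET',
    url: url.toString(),
    headers: { Authorization: `Basic ${token}` },
  };
}

// Example values taken from this walkthrough; replace them with your own.
const catRequest = buildCatIndicesRequest(
  'https://192.168.1.16:9200', 'elastic', 'YourPass', ['index', 'pri.store.size']
);
console.log(catRequest.method, catRequest.url);
```

The sketch only builds the request description; sending it is left to whatever HTTP client is at hand, which keeps the URL logic easy to check on its own.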

Test the node:

3️⃣ Using a Function Node

Purpose: Filter large indices (greater than 5 MB) and ignore system indices whose names start with a dot (.).
Input: JSON from the HTTP Request node (the Elasticsearch API call)
JavaScript:
Goal: keep only the Elasticsearch indices that:
- Have names that do not start with . (ignore system indices)
- Have a size greater than 5 MB

return items
  .map(item => {
    // Prefer the primary store size; fall back to store.size if only that column was requested
    const sizeStr = item.json['pri.store.size'] || item.json['store.size'];
    let sizeMB = 0;

    if (typeof sizeStr === 'string') {
      const lower = sizeStr.toLowerCase();
      const value = parseFloat(lower);

      // Check the two-letter units first, because every unit ends with 'b'
      if (lower.endsWith('tb')) {
        sizeMB = value * 1024 * 1024;
      } else if (lower.endsWith('gb')) {
        sizeMB = value * 1024;
      } else if (lower.endsWith('mb')) {
        sizeMB = value;
      } else if (lower.endsWith('kb')) {
        sizeMB = value / 1024;
      } else if (lower.endsWith('b')) {
        sizeMB = value / (1024 * 1024);
      }
    }

    // Exclude indices that start with a dot and keep those larger than 5 MB
    if (item.json.index && !item.json.index.startsWith('.') && sizeMB > 5) {
      return { json: item.json }; // Pass the index name and size to the next node
    }
    return null; // Ignore non-matching items
  })
  .filter(item => item !== null); // Remove non-matching items

Code Steps:

Calculate size in MB (sizeMB):
Reads pri.store.size for each index.
Converts it to MB regardless of unit (B, KB, MB).
Ignore system indices:
Any index starting with . is skipped.
Filter large indices:
Keep only indices larger than 5 MB.
Return the required items:
Returns each matching index as a JSON item.
Remove empty items:
.filter(item => item !== null) → removes non-matching items.

In short: This code filters and outputs only the indices that need to be handled.
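
The same filtering logic can be exercised outside n8n with sample data. The sketch below is a standalone variant, assuming the size strings look like 5.2mb or 120kb; the function names toMegabytes and selectLargeIndices and the sample rows are made up for illustration and are not part of the workflow.

```javascript
// Multipliers that convert each unit suffix to megabytes.
const UNIT_TO_MB = { b: 1 / (1024 * 1024), kb: 1 / 1024, mb: 1, gb: 1024, tb: 1024 * 1024 };

// Hypothetical helper: parse a size string such as "5.2mb" into megabytes.
// Returns 0 for anything it cannot parse.
function toMegabytes(sizeStr) {
  if (typeof sizeStr !== 'string') return 0;
  const match = sizeStr.trim().toLowerCase().match(/^([\d.]+)\s*(b|kb|mb|gb|tb)$/);
  if (!match) return 0;
  return parseFloat(match[1]) * UNIT_TO_MB[match[2]];
}

// Keep non-system indices whose primary store size exceeds thresholdMB.
function selectLargeIndices(rows, thresholdMB) {
  return rows.filter(row =>
    row.index &&
    !row.index.startsWith('.') &&
    toMegabytes(row['pri.store.size']) > thresholdMB
  );
}

// Sample rows invented for this example (not real cluster output).
const sampleRows = [
  { index: '.security-7', 'pri.store.size': '20mb' },
  { index: 'small-index', 'pri.store.size': '300kb' },
  { index: 'big-index', 'pri.store.size': '1.5gb' },
];
console.log(selectLargeIndices(sampleRows, 5).map(r => r.index)); // [ 'big-index' ]
```

Matching the whole string against an explicit unit list avoids the pitfall where a value in gb is mistaken for bytes just because it also ends with the letter b.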

4️⃣ Delete Old Data

Method: POST
URL: https://192.168.1.16:9200/{{ $json.index }}/_delete_by_query
The {{ $json.index }} ensures that the operation is applied to each index that exceeded the threshold defined in the Function Node.
Authentication: Elasticsearch API
Body:

{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-7d/d"
      }
    }
  }
}

Delete documents older than 7 days for each index that exceeded the threshold.

Options: Ignore SSL Issues
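
The retention window can be made a parameter rather than a hard-coded 7d. A minimal sketch, assuming a hypothetical helper named buildDeleteByQueryRequest; only the _delete_by_query endpoint and the range query on @timestamp come from the step above.

```javascript
// Hypothetical helper: builds the POST request for one index.
function buildDeleteByQueryRequest(host, index, retentionDays) {
  if (!Number.isInteger(retentionDays) || retentionDays < 1) {
    throw new RangeError('retentionDays must be a positive integer');
  }
  return {
    method: 'POST',
    url: `${host}/${encodeURIComponent(index)}/_delete_by_query`,
    headers: { 'Content-Type': 'application/json' },
    body: {
      query: {
        range: {
          // "/d" rounds down to the start of the day, so the cutoff is
          // midnight (UTC by default) of the day retentionDays days back.
          '@timestamp': { lt: `now-${retentionDays}d/d` },
        },
      },
    },
  };
}

// Example values taken from this walkthrough.
const deleteRequest = buildDeleteByQueryRequest(
  'https://192.168.1.16:9200', 'nginx-logs-2025.10.20', 7
);
console.log(deleteRequest.url);
console.log(JSON.stringify(deleteRequest.body));
```

Validating retentionDays up front is a deliberate choice here: a typo such as 0 or a negative number would otherwise widen the delete range without any warning.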
Results after execution:

Output:

[
  { "total": 0, "deleted": 0 },  // First index: no documents older than 7 days → nothing deleted
  { "total": 100171, "deleted": 100171 }, // Second index: 100,171 documents deleted ✅
  { "total": 0, "deleted": 0 } // Third index: no documents older than 7 days → nothing deleted
]

Documents older than 7 days in nginx-logs-2025.10.20 were successfully deleted.

From Stack Management → Index Management → Indices:

The documents older than 7 days in the nginx-logs-2025.10.20 index were deleted successfully ("deleted": 100171); the index itself remains.
The indices nginx-logs-2025.10.25 and wef-win-logs-2025.10.22 contain data but not older than 7 days, so nothing was deleted.

Everything is working perfectly so far. The main task is now complete, and I will start the Bonus part next.

Bonus Stage – Several Issues Were Solved, and Each Step Was Documented Initially

5️⃣ Email Reporting

Already completed:

An email report was sent containing the indices’ names, their sizes, and the number of deleted documents.

Hard Delete (Bonus)

Goal: Merge segments and immediately reclaim disk space.

Results:

num_committed_segments: 1 → each index now has a single segment
deleted_docs: 0 → expected after merge
compound: true → the merged segment is stored in Lucene’s compound file format (its files are combined into one)

✅ The operation was successfully completed.
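
The workflow node for this step is not shown in the text, so the following is only a sketch of how such a step could look. It assumes the merge is issued through Elasticsearch’s _forcemerge endpoint with max_num_segments=1 (which would match the single-segment result above) and then checked through the _segments endpoint. The helper names are hypothetical.

```javascript
// Hypothetical helper: POST request that merges an index down to a given
// number of segments, dropping documents that were marked as deleted.
function buildForceMergeRequest(host, index, maxNumSegments = 1) {
  const url = new URL(`/${encodeURIComponent(index)}/_forcemerge`, host);
  url.searchParams.set('max_num_segments', String(maxNumSegments));
  return { method: 'POST', url: url.toString() };
}

// Hypothetical helper: GET request used afterwards to inspect the segments
// (fields such as num_committed_segments, deleted_docs and compound).
function buildSegmentsRequest(host, index) {
  const url = new URL(`/${encodeURIComponent(index)}/_segments`, host);
  return { method: 'GET', url: url.toString() };
}

// Example values taken from this walkthrough.
const forceMergeRequest = buildForceMergeRequest('https://192.168.1.16:9200', 'nginx-logs-2025.10.20');
const segmentsRequest = buildSegmentsRequest('https://192.168.1.16:9200', 'nginx-logs-2025.10.20');
console.log(forceMergeRequest.url);
console.log(segmentsRequest.url);
```

Because a force merge is resource-intensive, it is usually best reserved for indices that are no longer being written to.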

Now I Will Test the Workflow on a Larger Scale

I will set it to run every five minutes, delete documents older than one day from any index larger than 1 MB, send an email reporting these events, and complete the workflow end-to-end.

Before:

⚙️ Core Requirements

1️⃣ Scheduled Trigger (Period X)

The workflow runs automatically at a fixed interval.
For testing: set it to every 5 minutes (or any small value).

2️⃣ Get Index Statistics

Connects to the Elasticsearch cluster.
Retrieves statistics for all indices (especially their size).

Do not attempt to delete or interact with indices that start with a dot (system indices like .security-*).

3️⃣ Check Size Threshold (Threshold Y)

Iterate through each index and check if its size exceeds the allowed threshold (e.g., 1 MB).

For testing: a small threshold like 1 MB was used.

4️⃣ Delete Old Data (Period Z)

If the index size exceeds threshold Y → delete documents older than period Z.

For testing: used 1 day (now-1d/d in query).
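
The three parameters X, Y and Z can be kept in one place so that switching from test values to production values is a single change. The sketch below is illustrative only: the retentionConfig object, the planCleanup function, and the sample indices are assumptions, and sizes are expected to be already converted to MB.

```javascript
// Test values used in this walkthrough; all three are meant to be tuned.
const retentionConfig = {
  scheduleMinutes: 5, // Period X: how often the workflow runs
  thresholdMB: 1,     // Threshold Y: size above which an index is cleaned
  retentionDays: 1,   // Period Z: documents older than this are deleted
};

// Hypothetical planner: returns one delete-by-query body per index that is
// not a system index and exceeds the size threshold.
function planCleanup(indices, config) {
  return indices
    .filter(({ index }) => !index.startsWith('.'))
    .filter(({ sizeMB }) => sizeMB > config.thresholdMB)
    .map(({ index, sizeMB }) => ({
      index,
      sizeMB,
      query: { range: { '@timestamp': { lt: `now-${config.retentionDays}d/d` } } },
    }));
}

// Sample input invented for this example (not real cluster output).
const plan = planCleanup(
  [
    { index: '.kibana_1', sizeMB: 12 },
    { index: 'nginx-logs-sample', sizeMB: 3.4 },
    { index: 'tiny-sample', sizeMB: 0.2 },
  ],
  retentionConfig
);
console.log(plan.map(p => p.index));
```

Keeping the plan separate from the HTTP calls also makes it possible to review which indices would be touched before anything is deleted.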

Bonus Tasks

🔸 1. Hard Deletes (Advanced)

Normally in Elasticsearch, deleting documents performs a “soft delete”: the documents are only marked as deleted and keep consuming disk space until their segments are merged.
The task adds a Force Merge step after deletion to immediately reclaim disk space.
(Not mandatory in production, but useful for practical understanding).

🔸 2. Email Reporting

First, created code to generate the report, formatting multiple indices neatly:

Then sent the email:

Final email appearance:

After checking or deleting, the workflow sends a simple email report containing a table with:

Index Name

Current Size

Number of Deleted Documents
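
The report code itself is not reproduced in the text, so here is one possible shape for it: a minimal sketch that turns per-index results into an HTML table for the email body. The function name buildReportHtml, the row field names, and the sample row are assumptions; the deleted count corresponds to the "deleted" field in the delete-by-query response.

```javascript
// Escape characters that would otherwise be interpreted as HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: one table row per index with name, size and deletions.
function buildReportHtml(rows) {
  const header = '<tr><th>Index Name</th><th>Current Size</th><th>Deleted Documents</th></tr>';
  const body = rows
    .map(r =>
      `<tr><td>${escapeHtml(r.index)}</td><td>${escapeHtml(r.size)}</td><td>${escapeHtml(r.deleted)}</td></tr>`
    )
    .join('\n');
  return `<table border="1">\n${header}\n${body}\n</table>`;
}

// Placeholder row invented for this example (not a real result).
const reportHtml = buildReportHtml([
  { index: 'example-index', size: '6mb', deleted: 42 },
]);
console.log(reportHtml);
```

In n8n, a string like this could be returned from a code node and referenced as the HTML body of the email node.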

Final Verification of Indices on Kibana:

Almost everything was successfully completed.