Syed Jahanzaib – سید جہانزیب – Personal Blog to Share Knowledge !

February 9, 2026

Handling Stale PPPoE Sessions in MikroTik + FreeRADIUS



Handling Stale PPPoE Sessions in MikroTik + FreeRADIUS

(Exact File Locations, unlag Placement, SQL Ownership, and Cron Responsibilities)

  • Author: Syed Jahanzaib ~A Humble Human being! nothing else 😊
  • Platform: aacable.wordpress.com
  • Category: Corporate Offices / DHCP-DNS Engineering
  • Audience: Systems Administrators, IT Support, NOC Teams, Network Architects

⚠️ Disclaimer & Note on Writing Style

Every network environment is unique. A solution that works effectively in one infrastructure may require modification in another. Readers are strongly encouraged to understand the underlying concepts and adapt the guidance according to their own architecture, operational policies, and risk tolerance.

Blind copy-paste implementation without proper validation, testing, and change management is never recommended , especially in production environments. Always ensure proper backups and risk assessment before applying any configuration.

The content shared here is based on hands-on experience from real-world deployments, ISP environments, lab testing, and continuous learning. While I strive for technical accuracy, no technical implementation is entirely free from the possibility of error. Constructive discussion and alternative approaches are always welcome.

Due to professional commitments, it is not always feasible to publish highly detailed or multi-part write-ups. The technical logic and implementation details are written based on my own practical experience. AI tools such as ChatGPT are used only to refine grammar, structure, and presentation — not to generate the core technical concepts.

This blog is not intended for client acquisition or follower growth. It exists solely to share practical knowledge and real-world experience with the community.

Thank you for your understanding and continued support.


Introduction

In PPPoE deployments built on MikroTik NAS and FreeRADIUS with an SQL backend, stale accounting sessions are an unavoidable operational reality. These sessions typically arise when accounting stop packets are never received due to NAS reboots, power failures, access link disruptions, or transport issues between the NAS and the RADIUS server.

When left unverified, such stale records directly impact Simultaneous-Use enforcement, resulting in legitimate users being blocked with “user already online” conditions despite having no active session. Addressing this problem requires more than periodic record deletion; it demands a controlled and verifiable method to determine whether a session is truly active.

This article presents a production-safe approach to PPPoE session verification by leveraging accounting activity, SQL-based validation, unlang logic, and scheduled cleanup processes. The objective is to ensure accurate session state, reliable Simultaneous-Use enforcement, and clean accounting data without compromising audit integrity or service availability.

Why this happens (root cause)

  • MikroTik sends Accounting-Start
  • Every 5 minutes it sends Interim-Update
  • FreeRADIUS inserts a row in radacct with acctstoptime = NULL
  • NAS reboots / loses connectivity
  • Accounting-Stop is never sent
  • FreeRADIUS still thinks the session is active
  • Simultaneous-Use = 1 blocks re-login

👉 RADIUS cannot guess that the user is gone unless you tell it how.

What ISPs do in real production

There are three layers usually combined:

1️⃣ Interim-Update timeout validation (MOST IMPORTANT)

This is the primary verification mechanism.

Logic

If a session has not sent an Interim-Update for X minutes, it is considered dead, even if acctstoptime is NULL.

Why it works

  • A live session must send interim updates
  • If NAS rebooted, updates stop
  • This is proof, not assumption

Typical ISP rule

If last interim update > 2 × Acct-Interim-Interval → session is dead

Example:

  • Interim interval = 5 minutes
  • Timeout = 10–15 minutes

SQL condition (verification)

SELECT username, nasipaddress, acctsessionid
FROM radacct
WHERE acctstoptime IS NULL
AND TIMESTAMPDIFF(MINUTE, acctupdatetime, NOW()) > 15;

If this condition is true → session is NOT alive.

Proper cleanup (controlled)

UPDATE radacct
SET acctstoptime = acctupdatetime,
    acctterminatecause = 'NAS-Reboot'
WHERE acctstoptime IS NULL
AND TIMESTAMPDIFF(MINUTE, acctupdatetime, NOW()) > 15;

✅ This is production-safe
❌ This is NOT blind deletion
✔ This is based on verifiable inactivity

2️⃣ NAS reachability check (secondary verification)

Many ISPs also verify NAS status before cleanup.

Typical checks

  • Ping NAS IP
  • SNMP availability
  • RouterOS API availability

Logic

  • IF NAS unreachable
  • AND session has no interim updates
  • THEN mark sessions as dead

This avoids killing sessions when:

  • SQL delayed
  • Temporary RADIUS lag

Example logic (pseudo)

if ping NAS == fail:

cleanup stale sessions for that NAS

This is often implemented in cron scripts.

3️⃣ MikroTik-side protections (VERY IMPORTANT)

  1. a) Proper Acct-Interim-Interval

On MikroTik:

/radius
set accounting=yes interim-update=5m

Never leave it default or disabled.

  1. b) Send Stop on reboot (best-effort)

RouterOS already tries, but you can improve reliability:

  • Use stable storage
  • Avoid power cuts without UPS

Still, RADIUS must assume stops can be lost.

4️⃣ What ISPs DO NOT do (bad practices)

❌ Delete all acctstoptime IS NULL blindly
❌ Disable Simultaneous-Use
❌ Allow multiple sessions to avoid complaints
❌ Manual cleanup by support staff

These cause:

  • Abuse
  • Multiple logins
  • Incorrect billing
  • Legal/audit issues

5️⃣ Recommended production-grade approach (summary)

Layer Purpose Mandatory
Interim timeout check Session verification ✅ YES
NAS reachability check Extra safety ⚠ Recommended
Controlled SQL cleanup Session release ✅ YES
Simultaneous-Use=1 Abuse prevention ✅ YES

6️⃣ Industry-standard wording (how ISPs define it)

“A session is considered active only if periodic accounting updates are received within the defined timeout window. Absence of updates implies session termination due to NAS failure or connectivity loss.”

This is exactly how large ISPs justify cleanup during audits.

Final Answer to core question!

How do ISPs verify if user is actually not live on NAS?

By absence of Interim-Update packets within a defined time window, optionally combined with NAS reachability checks.

There is no other reliable method in RADIUS.

This article documents a fully production-aligned design, explicitly mapping:

  • Which logic runs inside FreeRADIUS
  • Which logic runs via SQL
  • Which logic runs via cron
  • Exact file names and locations for every decision

No unnamed logic. No invisible automation.

Environment Assumptions

  • FreeRADIUS 3.x
  • MySQL / MariaDB backend
  • MikroTik PPPoE NAS
  • Interim-Update = 5 minutes
  • Simultaneous-Use = 1

Session State Model (Conceptual)

State Meaning Where enforced
ACTIVE Interim updates arriving FreeRADIUS SQL
STALE Interim missing, NAS alive Cron
CLOSED Verified termination FreeRADIUS unlang / Cron

 


Database Layer (Schema Ownership)

Location

  • Database: radiusdb
  • Table   : radacct

Schema Extension (run once)

Concept/Logic/Purpose: Extends the radacct table to explicitly track session state and last known MikroTik session timers. This enables deterministic handling of stale, resumed, and terminated sessions without relying on assumptions.

ALTER TABLE radacct
ADD COLUMN session_state ENUM('ACTIVE','STALE','CLOSED') DEFAULT 'ACTIVE',
ADD COLUMN last_acct_session_time INT DEFAULT 0,
ADD INDEX idx_state_update (session_state, acctupdatetime),
ADD INDEX idx_nas_user (nasipaddress, username);

Responsibility:
✔ Stores session truth
✔ No logic, only state

SQL Query Ownership (queries.conf)

  • Concept/Logic/Purpose: Centralizes all accounting-related SQL logic in a single, predictable location managed by FreeRADIUS. This separation ensures that runtime logic (unlang) and data manipulation (SQL) remain clean and auditable.

File Location

  • /etc/freeradius/mods-config/sql/main/mysql/queries.conf

All SQL below lives only in this file.


Accounting-Start Query

  • Used by: FreeRADIUS accounting {}
  • Concept/Logic/Purpose: Creates a new accounting record when a PPPoE session is first established.
    This marks the authoritative beginning of a user session and initializes all tracking fields.
accounting_start_query = "
INSERT INTO radacct (
  acctsessionid, acctuniqueid, username, nasipaddress,
  acctstarttime, acctupdatetime, acctstoptime,
  acctsessiontime, last_acct_session_time, session_state
) VALUES (
  '%{Acct-Session-Id}', '%{Acct-Unique-Session-Id}',
  '%{User-Name}', '%{NAS-IP-Address}',
  NOW(), NOW(), NULL,
  '%{Acct-Session-Time}', '%{Acct-Session-Time}', 'ACTIVE'
)"

Interim-Update Query

  • Used by: FreeRADIUS accounting {}
  • Concept/Logic/Purpose: Periodically refreshes session counters and timestamps to confirm that the user is still actively connected. It also safely resumes sessions after temporary NAS–RADIUS connectivity loss.
accounting_update_query = "
UPDATE radacct
SET acctupdatetime = NOW(),
    acctsessiontime = '%{Acct-Session-Time}',
    last_acct_session_time = '%{Acct-Session-Time}',
    session_state = 'ACTIVE'
WHERE acctsessionid = '%{Acct-Session-Id}'
AND nasipaddress = '%{NAS-IP-Address}'
AND acctstoptime IS NULL
"

Accounting-Stop Query

  • Used by: FreeRADIUS accounting {}
  • Concept/Logic/Purpose: Close session normally. Closes a session only when FreeRADIUS receives an explicit stop notification from the NAS. This represents a verified and intentional session termination.
accounting_stop_query = "
UPDATE radacct
SET acctstoptime = NOW(),
    acctterminatecause = '%{Acct-Terminate-Cause}',
    session_state = 'CLOSED'
WHERE acctsessionid = '%{Acct-Session-Id}'
AND nasipaddress = '%{NAS-IP-Address}'
AND acctstoptime IS NULL
"

NAS Reboot Cleanup Query

  • Used by: FreeRADIUS unlang only
  • Concept/Logic/Purpose: Close sessions after verified NAS reboot. Force-closes all open sessions for a NAS only after a reboot has been positively detected. This prevents ghost sessions while preserving accounting accuracy.
nas_reboot_cleanup_query = "
UPDATE radacct
SET acctstoptime = acctupdatetime,
    acctterminatecause = 'NAS-Reboot',
    session_state = 'CLOSED'
WHERE nasipaddress = '%{NAS-IP-Address}'
AND acctstoptime IS NULL
"

Last Session-Time Lookup Query

  • Used by: FreeRADIUS unlang only
  • Concept/Logic/Purpose: Detect MikroTik reboot Retrieves the previously stored Acct-Session-Time to detect timer regression.This is the primary mechanism used to infer MikroTik NAS reboots reliably.
nas_last_session_time_query = "
SELECT last_acct_session_time
FROM radacct
WHERE acctsessionid = '%{Acct-Session-Id}'
AND nasipaddress = '%{NAS-IP-Address}'
AND acctstoptime IS NULL
LIMIT 1
"

Simultaneous-Use Count Query

  • Used by: SQL authorize stage
  • Concept/Logic/Purpose: Enforce single login. Counts only ACTIVE sessions to enforce single-login policies correctly. STALE sessions are excluded to avoid false user lockouts during temporary failures.
simul_count_query = "
SELECT COUNT(*)
FROM radacct
WHERE username = '%{User-Name}'
AND acctstoptime IS NULL
AND session_state = 'ACTIVE'
AND nasipaddress != '%{NAS-IP-Address}'
"

FreeRADIUS unlang Logic (Runtime Decisions)

Implements real-time decision making based on live accounting packets. This layer handles session creation, resumption, and NAS reboot detection without relying on timers.

Once stale sessions are identified at the database level, FreeRADIUS must use this verified state during authorization. This is where unlang becomes critical — it allows dynamic decision-making based on real-time session validity rather than raw record existence.

File Location

  • /etc/freeradius/sites-enabled/default

Section

  • server default → accounting { }

Responsibility

  • ✔ Real-time decisions
  • ✔ Packet-driven logic
  • ❌ No time-based cleanup

Complete accounting{} block

# /etc/freeradius/sites-enabled/default
accounting {
  if (Acct-Status-Type == Start) {
    sql
    ok
  }
  if (Acct-Status-Type == Interim-Update) {
    update request {
      Tmp-Integer-0 := "%{sql:nas_last_session_time_query}"
    }
    if (&Tmp-Integer-0 && (&Acct-Session-Time < &Tmp-Integer-0)) {
      radiusd::log_warn("NAS reboot detected %{NAS-IP-Address}")
      sql:nas_reboot_cleanup_query
    }
    sql
    ok
  }
  if (Acct-Status-Type == Stop) {
    sql
    ok
  }
}

Cron Layer (Time-Based Maintenance Only)

Practical Example

Assume the NAS is configured to send Interim-Updates every 5 minutes.

• Session start time: 10:00
• Last Interim-Update received: 10:25
• Current time: 10:45

Since no updates were received for 20 minutes (4× the interim interval), the session can be confidently classified as stale and excluded from active session counts.

NOTE: CRON intervals

In production ISP environments, such cleanup jobs are typically executed every 5 to 15 minutes. Running them too frequently can increase database load, while long intervals delay user recovery. The exact timing should be aligned with the configured Interim-Update interval.


Cron Script #1 — Mark STALE Sessions

File:

  • /usr/local/sbin/radius/mark-stale-sessions.sh
  • Concept/Logic/Purpose: Periodically identifies sessions that stopped sending Interim-Updates but may still be valid. Sessions are marked STALE instead of being disconnected, allowing safe recovery.
  • Does NOT disconnect users
#!/bin/bash
mysql -u radius -p'PASSWORD' radiusdb <<EOF
UPDATE radacct
SET session_state = 'STALE'
WHERE acctstoptime IS NULL
AND session_state = 'ACTIVE'
AND TIMESTAMPDIFF(MINUTE, acctupdatetime, NOW()) > 15;
EOF

Note: This query intentionally avoids deleting records blindly. Instead, it relies on time-based verification to determine whether a session has genuinely stopped sending updates. This approach prevents accidental cleanup of slow or temporarily delayed sessions and ensures billing and audit accuracy.

Crontab Entry:

  • */5 * * * * /usr/local/sbin/radius/mark-stale-sessions.sh

Cron Script #2 — Cleanup Lost Sessions

Concept/Logic/Purpose: Final cleanup for dead accounting. Performs conservative, time-based cleanup of sessions that will never return.This protects database integrity without interfering with active or recoverable sessions.

File:

  • /usr/local/sbin/radius/cleanup-lost-sessions.sh
#!/bin/bash
mysql -u radius -p'PASSWORD' radiusdb <<EOF
UPDATE radacct
SET acctstoptime = acctupdatetime,
    acctterminatecause = 'Lost-Accounting',
    session_state = 'CLOSED'
WHERE acctstoptime IS NULL
AND TIMESTAMPDIFF(MINUTE, acctupdatetime, NOW()) > 120;
EOF

Crontab Entry:

  • 0 * * * * /usr/local/sbin/radius/cleanup-lost-sessions.sh

What Runs Where (Zero Ambiguity)

Logic Location
Start / Stop sites-enabled/default → accounting {}
Interim resume sites-enabled/default → accounting {}
NAS reboot detection sites-enabled/default → accounting {}
STALE marking mark-stale-sessions.sh
Final cleanup cleanup-lost-sessions.sh
Simultaneous-Use queries.conf → simul_count_query

What MUST NOT Exist

❌ Anonymous cron entries
❌ Logic without file ownership
❌ Session deletion without cause
❌ Time-based reboot assumptions


 

Session Resume After Temporary Accounting Outage

In our implementation:

  • Sessions are marked as STALE, not closed
  • acctstoptime remains NULL
  • When connectivity restores and the same Acct-Session-Id sends Interim-Updates again:
    • FreeRADIUS updates the existing record
    • session_state is set back to ACTIVE
    • The STALE condition is effectively cleared
    • Accounting continues normally

This is exactly the correct behavior.

Step-by-step (what really happens in your setup)

1️⃣ Interim updates stop (temporary NAS↔RADIUS issue)

  • Cron runs:
    • mark-stale-sessions.sh
  • Result in radacct:

acctstoptime = NULL

session_state = STALE

👉 Session is not closed, only flagged as stale.

2️⃣ NAS–RADIUS connectivity is restored

  • MikroTik resumes sending:

Acct-Status-Type = Interim-Update

Acct-Session-Id = SAME

3️⃣ FreeRADIUS processes Interim-Update

Inside:

  •  /etc/freeradius/sites-enabled/default

→ accounting { }

Flow:

  1. nas_last_session_time_query finds the existing row
    (acctstoptime IS NULL ✔)
  2. NAS reboot check:
    • Acct-Session-Time has not reset
    • No reboot detected ✔
  3. accounting_update_query executes:
UPDATE radacct
SET
acctupdatetime = NOW(),
acctsessiontime = ...,
last_acct_session_time = ...,
session_state = 'ACTIVE'
WHERE acctsessionid = ?
AND acctstoptime IS NULL;

4️⃣ Result in database

Field Value
acctstoptime NULL
session_state ACTIVE
counters updated
accounting continuous

👉 The STALE flag is removed automatically by the Interim-Update.

What does NOT happen (important)

❌ No new row is created
❌ No duplicate session
❌ No accounting reset
❌ No Simultaneous-Use false block

Why this works (core principle)

FreeRADIUS can safely resume a session only if acctstoptime was never set.

OUR  design respects this rule.

That is the entire reason STALE exists as a state.

One-line confirmation…

Yes, in this design sessions are marked as STALE (not closed), and when the same accounting session resumes, FreeRADIUS continues updating the existing record and automatically restores the session to ACTIVE.

One operational warning

  • If a session is ever closed (acctstoptime set), it can never be resumed — only restarted.

🛡️ Audit Justification Points

Audit Rationale: This session model is designed to ensure that PPPoE session closures are only recorded when there is explicit evidence of termination, either through an accounting stop, verified NAS reboot, or prolonged accounting silence beyond operational thresholds. Temporary outages do not constitute termination, preserving billing integrity and avoiding false positive disconnects.

To make this useful for Enterprise/ISP managers who need to justify these changes to auditors, add a section titled “Audit & Compliance: Why This Matters”.

Audit & Compliance Justification Implementing automated stale session handling is not just an operational fix; it is a data integrity requirement.
  • AAA Data Integrity (Authentication, Authorization, Accounting): Auditors require that Accounting logs accurately reflect user usage. Leaving stale sessions “open” (with AcctStopTime IS NULL) falsifies usage duration records, leading to incorrect billing disputes and “ghost” data consumption logs.

  • Revenue Assurance: For prepaid or quota-based ISPs, stale sessions prevent the system from calculating the final session volume. By forcing a closure based on the last known Interim-Update, we ensure that the billable data matches the actual network activity, preventing revenue leakage or customer overcharging.

  • Traceability & Non-Repudiation: Our customized SQL queries introduce a acctterminatecause = 'NAS-Reboot' flag. This provides a distinct audit trail, differentiating between a user logging off (User-Request) and a system correction (System-Cleanup), which is critical for forensic analysis during network outages.*

When this logic does NOT apply

  • NAS firmware that does not send Interim-Update reliably
  • Cases where NAS sends stale Interim-Updates after long outages
  • Networks with asymmetric paths and intermittent packet loss

In such cases you may need longer STALE thresholds or secondary reachability checks (SNMP/ICMP).

📌 NOC Operational Expectation

  • STALE threshold: 15 minutes
  • Final cleanup: 120 minutes
  • Alerts for NAS reboot events should be integrated into monitoring (Syslog/SNMP)
  • Any unexpected growth in STALE counts must be investigated

⃣General View for Non-Technical Reader

In simple terms, sessions are only ended when there is verified evidence of exit. Temporary network issues are handled without affecting service continuity, ensuring users don’t lose sessions or get billed incorrectly.


Important Caveats

• If Interim-Updates are disabled or misconfigured on the NAS, this method will not work correctly.
• Database latency or replication delay must be considered in large deployments.
• Multi-NAS environments should ensure session verification is NAS-aware to avoid false positives.

Final Operational Principle

Stale PPPoE sessions are not a database anomaly but a natural consequence of real-world network behavior. Treating them as such requires session verification based on accounting activity rather than the mere presence of an open record.

By relying on Interim-Update freshness, SQL-based validation, and unlang-driven authorization logic, ISPs can accurately distinguish between active and defunct sessions. This method ensures Simultaneous-Use enforcement remains fair, prevents unnecessary customer lockouts, and preserves accounting accuracy for billing and audit purposes.

When implemented correctly and aligned with NAS interim update intervals, this approach provides a scalable and production-ready solution for managing PPPoE session state in MikroTik and FreeRADIUS environments.


© Syed Jahanzaib — aacable.wordpress.com