Syed Jahanzaib – سید جہانزیب – Personal Blog to Share Knowledge !

February 9, 2026

Handling Stale PPPoE Sessions in MikroTik + FreeRADIUS



Handling Stale PPPoE Sessions in MikroTik + FreeRADIUS

(Exact File Locations, unlag Placement, SQL Ownership, and Cron Responsibilities)

  • Author: Syed Jahanzaib ~A Humble Human being! nothing else 😊
  • Platform: aacable.wordpress.com
  • Category: Corporate Offices / DHCP-DNS Engineering
  • Audience: Systems Administrators, IT Support, NOC Teams, Network Architects

⚠️ Disclaimer & Note on Writing Style

Every network environment is unique. A solution that works effectively in one infrastructure may require modification in another. Readers are strongly encouraged to understand the underlying concepts and adapt the guidance according to their own architecture, operational policies, and risk tolerance.

Blind copy-paste implementation without proper validation, testing, and change management is never recommended , especially in production environments. Always ensure proper backups and risk assessment before applying any configuration.

The content shared here is based on hands-on experience from real-world deployments, ISP environments, lab testing, and continuous learning. While I strive for technical accuracy, no technical implementation is entirely free from the possibility of error. Constructive discussion and alternative approaches are always welcome.

Due to professional commitments, it is not always feasible to publish highly detailed or multi-part write-ups. The technical logic and implementation details are written based on my own practical experience. AI tools such as ChatGPT are used only to refine grammar, structure, and presentation — not to generate the core technical concepts.

This blog is not intended for client acquisition or follower growth. It exists solely to share practical knowledge and real-world experience with the community.

Thank you for your understanding and continued support.


Introduction

In PPPoE deployments built on MikroTik NAS and FreeRADIUS with an SQL backend, stale accounting sessions are an unavoidable operational reality. These sessions typically arise when accounting stop packets are never received due to NAS reboots, power failures, access link disruptions, or transport issues between the NAS and the RADIUS server.

When left unverified, such stale records directly impact Simultaneous-Use enforcement, resulting in legitimate users being blocked with “user already online” conditions despite having no active session. Addressing this problem requires more than periodic record deletion; it demands a controlled and verifiable method to determine whether a session is truly active.

This article presents a production-safe approach to PPPoE session verification by leveraging accounting activity, SQL-based validation, unlang logic, and scheduled cleanup processes. The objective is to ensure accurate session state, reliable Simultaneous-Use enforcement, and clean accounting data without compromising audit integrity or service availability.

Why this happens (root cause)

  • MikroTik sends Accounting-Start
  • Every 5 minutes it sends Interim-Update
  • FreeRADIUS inserts a row in radacct with acctstoptime = NULL
  • NAS reboots / loses connectivity
  • Accounting-Stop is never sent
  • FreeRADIUS still thinks the session is active
  • Simultaneous-Use = 1 blocks re-login

👉 RADIUS cannot guess that the user is gone unless you tell it how.

What ISPs do in real production

There are three layers usually combined:

1️⃣ Interim-Update timeout validation (MOST IMPORTANT)

This is the primary verification mechanism.

Logic

If a session has not sent an Interim-Update for X minutes, it is considered dead, even if acctstoptime is NULL.

Why it works

  • A live session must send interim updates
  • If NAS rebooted, updates stop
  • This is proof, not assumption

Typical ISP rule

If last interim update > 2 × Acct-Interim-Interval → session is dead

Example:

  • Interim interval = 5 minutes
  • Timeout = 10–15 minutes

SQL condition (verification)

SELECT username, nasipaddress, acctsessionid
FROM radacct
WHERE acctstoptime IS NULL
AND TIMESTAMPDIFF(MINUTE, acctupdatetime, NOW()) > 15;

If this condition is true → session is NOT alive.

Proper cleanup (controlled)

UPDATE radacct
SET acctstoptime = acctupdatetime,
    acctterminatecause = 'NAS-Reboot'
WHERE acctstoptime IS NULL
AND TIMESTAMPDIFF(MINUTE, acctupdatetime, NOW()) > 15;

✅ This is production-safe
❌ This is NOT blind deletion
✔ This is based on verifiable inactivity

2️⃣ NAS reachability check (secondary verification)

Many ISPs also verify NAS status before cleanup.

Typical checks

  • Ping NAS IP
  • SNMP availability
  • RouterOS API availability

Logic

  • IF NAS unreachable
  • AND session has no interim updates
  • THEN mark sessions as dead

This avoids killing sessions when:

  • SQL delayed
  • Temporary RADIUS lag

Example logic (pseudo)

if ping NAS == fail:

cleanup stale sessions for that NAS

This is often implemented in cron scripts.

3️⃣ MikroTik-side protections (VERY IMPORTANT)

  1. a) Proper Acct-Interim-Interval

On MikroTik:

/radius
set accounting=yes interim-update=5m

Never leave it default or disabled.

  1. b) Send Stop on reboot (best-effort)

RouterOS already tries, but you can improve reliability:

  • Use stable storage
  • Avoid power cuts without UPS

Still, RADIUS must assume stops can be lost.

4️⃣ What ISPs DO NOT do (bad practices)

❌ Delete all acctstoptime IS NULL blindly
❌ Disable Simultaneous-Use
❌ Allow multiple sessions to avoid complaints
❌ Manual cleanup by support staff

These cause:

  • Abuse
  • Multiple logins
  • Incorrect billing
  • Legal/audit issues

5️⃣ Recommended production-grade approach (summary)

Layer Purpose Mandatory
Interim timeout check Session verification ✅ YES
NAS reachability check Extra safety ⚠ Recommended
Controlled SQL cleanup Session release ✅ YES
Simultaneous-Use=1 Abuse prevention ✅ YES

6️⃣ Industry-standard wording (how ISPs define it)

“A session is considered active only if periodic accounting updates are received within the defined timeout window. Absence of updates implies session termination due to NAS failure or connectivity loss.”

This is exactly how large ISPs justify cleanup during audits.

Final Answer to core question!

How do ISPs verify if user is actually not live on NAS?

By absence of Interim-Update packets within a defined time window, optionally combined with NAS reachability checks.

There is no other reliable method in RADIUS.

This article documents a fully production-aligned design, explicitly mapping:

  • Which logic runs inside FreeRADIUS
  • Which logic runs via SQL
  • Which logic runs via cron
  • Exact file names and locations for every decision

No unnamed logic. No invisible automation.

Environment Assumptions

  • FreeRADIUS 3.x
  • MySQL / MariaDB backend
  • MikroTik PPPoE NAS
  • Interim-Update = 5 minutes
  • Simultaneous-Use = 1

Session State Model (Conceptual)

State Meaning Where enforced
ACTIVE Interim updates arriving FreeRADIUS SQL
STALE Interim missing, NAS alive Cron
CLOSED Verified termination FreeRADIUS unlang / Cron

 


Database Layer (Schema Ownership)

Location

  • Database: radiusdb
  • Table   : radacct

Schema Extension (run once)

Concept/Logic/Purpose: Extends the radacct table to explicitly track session state and last known MikroTik session timers. This enables deterministic handling of stale, resumed, and terminated sessions without relying on assumptions.

ALTER TABLE radacct
ADD COLUMN session_state ENUM('ACTIVE','STALE','CLOSED') DEFAULT 'ACTIVE',
ADD COLUMN last_acct_session_time INT DEFAULT 0,
ADD INDEX idx_state_update (session_state, acctupdatetime),
ADD INDEX idx_nas_user (nasipaddress, username);

Responsibility:
✔ Stores session truth
✔ No logic, only state

SQL Query Ownership (queries.conf)

  • Concept/Logic/Purpose: Centralizes all accounting-related SQL logic in a single, predictable location managed by FreeRADIUS. This separation ensures that runtime logic (unlang) and data manipulation (SQL) remain clean and auditable.

File Location

  • /etc/freeradius/mods-config/sql/main/mysql/queries.conf

All SQL below lives only in this file.


Accounting-Start Query

  • Used by: FreeRADIUS accounting {}
  • Concept/Logic/Purpose: Creates a new accounting record when a PPPoE session is first established.
    This marks the authoritative beginning of a user session and initializes all tracking fields.
accounting_start_query = "
INSERT INTO radacct (
  acctsessionid, acctuniqueid, username, nasipaddress,
  acctstarttime, acctupdatetime, acctstoptime,
  acctsessiontime, last_acct_session_time, session_state
) VALUES (
  '%{Acct-Session-Id}', '%{Acct-Unique-Session-Id}',
  '%{User-Name}', '%{NAS-IP-Address}',
  NOW(), NOW(), NULL,
  '%{Acct-Session-Time}', '%{Acct-Session-Time}', 'ACTIVE'
)"

Interim-Update Query

  • Used by: FreeRADIUS accounting {}
  • Concept/Logic/Purpose: Periodically refreshes session counters and timestamps to confirm that the user is still actively connected. It also safely resumes sessions after temporary NAS–RADIUS connectivity loss.
accounting_update_query = "
UPDATE radacct
SET acctupdatetime = NOW(),
    acctsessiontime = '%{Acct-Session-Time}',
    last_acct_session_time = '%{Acct-Session-Time}',
    session_state = 'ACTIVE'
WHERE acctsessionid = '%{Acct-Session-Id}'
AND nasipaddress = '%{NAS-IP-Address}'
AND acctstoptime IS NULL
"

Accounting-Stop Query

  • Used by: FreeRADIUS accounting {}
  • Concept/Logic/Purpose: Close session normally. Closes a session only when FreeRADIUS receives an explicit stop notification from the NAS. This represents a verified and intentional session termination.
accounting_stop_query = "
UPDATE radacct
SET acctstoptime = NOW(),
    acctterminatecause = '%{Acct-Terminate-Cause}',
    session_state = 'CLOSED'
WHERE acctsessionid = '%{Acct-Session-Id}'
AND nasipaddress = '%{NAS-IP-Address}'
AND acctstoptime IS NULL
"

NAS Reboot Cleanup Query

  • Used by: FreeRADIUS unlang only
  • Concept/Logic/Purpose: Close sessions after verified NAS reboot. Force-closes all open sessions for a NAS only after a reboot has been positively detected. This prevents ghost sessions while preserving accounting accuracy.
nas_reboot_cleanup_query = "
UPDATE radacct
SET acctstoptime = acctupdatetime,
    acctterminatecause = 'NAS-Reboot',
    session_state = 'CLOSED'
WHERE nasipaddress = '%{NAS-IP-Address}'
AND acctstoptime IS NULL
"

Last Session-Time Lookup Query

  • Used by: FreeRADIUS unlang only
  • Concept/Logic/Purpose: Detect MikroTik reboot Retrieves the previously stored Acct-Session-Time to detect timer regression.This is the primary mechanism used to infer MikroTik NAS reboots reliably.
nas_last_session_time_query = "
SELECT last_acct_session_time
FROM radacct
WHERE acctsessionid = '%{Acct-Session-Id}'
AND nasipaddress = '%{NAS-IP-Address}'
AND acctstoptime IS NULL
LIMIT 1
"

Simultaneous-Use Count Query

  • Used by: SQL authorize stage
  • Concept/Logic/Purpose: Enforce single login. Counts only ACTIVE sessions to enforce single-login policies correctly. STALE sessions are excluded to avoid false user lockouts during temporary failures.
simul_count_query = "
SELECT COUNT(*)
FROM radacct
WHERE username = '%{User-Name}'
AND acctstoptime IS NULL
AND session_state = 'ACTIVE'
AND nasipaddress != '%{NAS-IP-Address}'
"

FreeRADIUS unlang Logic (Runtime Decisions)

Implements real-time decision making based on live accounting packets. This layer handles session creation, resumption, and NAS reboot detection without relying on timers.

Once stale sessions are identified at the database level, FreeRADIUS must use this verified state during authorization. This is where unlang becomes critical — it allows dynamic decision-making based on real-time session validity rather than raw record existence.

File Location

  • /etc/freeradius/sites-enabled/default

Section

  • server default → accounting { }

Responsibility

  • ✔ Real-time decisions
  • ✔ Packet-driven logic
  • ❌ No time-based cleanup

Complete accounting{} block

# /etc/freeradius/sites-enabled/default
accounting {
  if (Acct-Status-Type == Start) {
    sql
    ok
  }
  if (Acct-Status-Type == Interim-Update) {
    update request {
      Tmp-Integer-0 := "%{sql:nas_last_session_time_query}"
    }
    if (&Tmp-Integer-0 && (&Acct-Session-Time < &Tmp-Integer-0)) {
      radiusd::log_warn("NAS reboot detected %{NAS-IP-Address}")
      sql:nas_reboot_cleanup_query
    }
    sql
    ok
  }
  if (Acct-Status-Type == Stop) {
    sql
    ok
  }
}

Cron Layer (Time-Based Maintenance Only)

Practical Example

Assume the NAS is configured to send Interim-Updates every 5 minutes.

• Session start time: 10:00
• Last Interim-Update received: 10:25
• Current time: 10:45

Since no updates were received for 20 minutes (4× the interim interval), the session can be confidently classified as stale and excluded from active session counts.

NOTE: CRON intervals

In production ISP environments, such cleanup jobs are typically executed every 5 to 15 minutes. Running them too frequently can increase database load, while long intervals delay user recovery. The exact timing should be aligned with the configured Interim-Update interval.


Cron Script #1 — Mark STALE Sessions

File:

  • /usr/local/sbin/radius/mark-stale-sessions.sh
  • Concept/Logic/Purpose: Periodically identifies sessions that stopped sending Interim-Updates but may still be valid. Sessions are marked STALE instead of being disconnected, allowing safe recovery.
  • Does NOT disconnect users
#!/bin/bash
mysql -u radius -p'PASSWORD' radiusdb <<EOF
UPDATE radacct
SET session_state = 'STALE'
WHERE acctstoptime IS NULL
AND session_state = 'ACTIVE'
AND TIMESTAMPDIFF(MINUTE, acctupdatetime, NOW()) > 15;
EOF

Note: This query intentionally avoids deleting records blindly. Instead, it relies on time-based verification to determine whether a session has genuinely stopped sending updates. This approach prevents accidental cleanup of slow or temporarily delayed sessions and ensures billing and audit accuracy.

Crontab Entry:

  • */5 * * * * /usr/local/sbin/radius/mark-stale-sessions.sh

Cron Script #2 — Cleanup Lost Sessions

Concept/Logic/Purpose: Final cleanup for dead accounting. Performs conservative, time-based cleanup of sessions that will never return.This protects database integrity without interfering with active or recoverable sessions.

File:

  • /usr/local/sbin/radius/cleanup-lost-sessions.sh
#!/bin/bash
mysql -u radius -p'PASSWORD' radiusdb <<EOF
UPDATE radacct
SET acctstoptime = acctupdatetime,
    acctterminatecause = 'Lost-Accounting',
    session_state = 'CLOSED'
WHERE acctstoptime IS NULL
AND TIMESTAMPDIFF(MINUTE, acctupdatetime, NOW()) > 120;
EOF

Crontab Entry:

  • 0 * * * * /usr/local/sbin/radius/cleanup-lost-sessions.sh

What Runs Where (Zero Ambiguity)

Logic Location
Start / Stop sites-enabled/default → accounting {}
Interim resume sites-enabled/default → accounting {}
NAS reboot detection sites-enabled/default → accounting {}
STALE marking mark-stale-sessions.sh
Final cleanup cleanup-lost-sessions.sh
Simultaneous-Use queries.conf → simul_count_query

What MUST NOT Exist

❌ Anonymous cron entries
❌ Logic without file ownership
❌ Session deletion without cause
❌ Time-based reboot assumptions


 

Session Resume After Temporary Accounting Outage

In our implementation:

  • Sessions are marked as STALE, not closed
  • acctstoptime remains NULL
  • When connectivity restores and the same Acct-Session-Id sends Interim-Updates again:
    • FreeRADIUS updates the existing record
    • session_state is set back to ACTIVE
    • The STALE condition is effectively cleared
    • Accounting continues normally

This is exactly the correct behavior.

Step-by-step (what really happens in your setup)

1️⃣ Interim updates stop (temporary NAS↔RADIUS issue)

  • Cron runs:
    • mark-stale-sessions.sh
  • Result in radacct:

acctstoptime = NULL

session_state = STALE

👉 Session is not closed, only flagged as stale.

2️⃣ NAS–RADIUS connectivity is restored

  • MikroTik resumes sending:

Acct-Status-Type = Interim-Update

Acct-Session-Id = SAME

3️⃣ FreeRADIUS processes Interim-Update

Inside:

  •  /etc/freeradius/sites-enabled/default

→ accounting { }

Flow:

  1. nas_last_session_time_query finds the existing row
    (acctstoptime IS NULL ✔)
  2. NAS reboot check:
    • Acct-Session-Time has not reset
    • No reboot detected ✔
  3. accounting_update_query executes:
UPDATE radacct
SET
acctupdatetime = NOW(),
acctsessiontime = ...,
last_acct_session_time = ...,
session_state = 'ACTIVE'
WHERE acctsessionid = ?
AND acctstoptime IS NULL;

4️⃣ Result in database

Field Value
acctstoptime NULL
session_state ACTIVE
counters updated
accounting continuous

👉 The STALE flag is removed automatically by the Interim-Update.

What does NOT happen (important)

❌ No new row is created
❌ No duplicate session
❌ No accounting reset
❌ No Simultaneous-Use false block

Why this works (core principle)

FreeRADIUS can safely resume a session only if acctstoptime was never set.

OUR  design respects this rule.

That is the entire reason STALE exists as a state.

One-line confirmation…

Yes, in this design sessions are marked as STALE (not closed), and when the same accounting session resumes, FreeRADIUS continues updating the existing record and automatically restores the session to ACTIVE.

One operational warning

  • If a session is ever closed (acctstoptime set), it can never be resumed — only restarted.

🛡️ Audit Justification Points

Audit Rationale: This session model is designed to ensure that PPPoE session closures are only recorded when there is explicit evidence of termination, either through an accounting stop, verified NAS reboot, or prolonged accounting silence beyond operational thresholds. Temporary outages do not constitute termination, preserving billing integrity and avoiding false positive disconnects.

To make this useful for Enterprise/ISP managers who need to justify these changes to auditors, add a section titled “Audit & Compliance: Why This Matters”.

Audit & Compliance Justification Implementing automated stale session handling is not just an operational fix; it is a data integrity requirement.
  • AAA Data Integrity (Authentication, Authorization, Accounting): Auditors require that Accounting logs accurately reflect user usage. Leaving stale sessions “open” (with AcctStopTime IS NULL) falsifies usage duration records, leading to incorrect billing disputes and “ghost” data consumption logs.

  • Revenue Assurance: For prepaid or quota-based ISPs, stale sessions prevent the system from calculating the final session volume. By forcing a closure based on the last known Interim-Update, we ensure that the billable data matches the actual network activity, preventing revenue leakage or customer overcharging.

  • Traceability & Non-Repudiation: Our customized SQL queries introduce a acctterminatecause = 'NAS-Reboot' flag. This provides a distinct audit trail, differentiating between a user logging off (User-Request) and a system correction (System-Cleanup), which is critical for forensic analysis during network outages.*

When this logic does NOT apply

  • NAS firmware that does not send Interim-Update reliably
  • Cases where NAS sends stale Interim-Updates after long outages
  • Networks with asymmetric paths and intermittent packet loss

In such cases you may need longer STALE thresholds or secondary reachability checks (SNMP/ICMP).

📌 NOC Operational Expectation

  • STALE threshold: 15 minutes
  • Final cleanup: 120 minutes
  • Alerts for NAS reboot events should be integrated into monitoring (Syslog/SNMP)
  • Any unexpected growth in STALE counts must be investigated

⃣General View for Non-Technical Reader

In simple terms, sessions are only ended when there is verified evidence of exit. Temporary network issues are handled without affecting service continuity, ensuring users don’t lose sessions or get billed incorrectly.


Important Caveats

• If Interim-Updates are disabled or misconfigured on the NAS, this method will not work correctly.
• Database latency or replication delay must be considered in large deployments.
• Multi-NAS environments should ensure session verification is NAS-aware to avoid false positives.

Final Operational Principle

Stale PPPoE sessions are not a database anomaly but a natural consequence of real-world network behavior. Treating them as such requires session verification based on accounting activity rather than the mere presence of an open record.

By relying on Interim-Update freshness, SQL-based validation, and unlang-driven authorization logic, ISPs can accurately distinguish between active and defunct sessions. This method ensures Simultaneous-Use enforcement remains fair, prevents unnecessary customer lockouts, and preserves accounting accuracy for billing and audit purposes.

When implemented correctly and aligned with NAS interim update intervals, this approach provides a scalable and production-ready solution for managing PPPoE session state in MikroTik and FreeRADIUS environments.


© Syed Jahanzaib — aacable.wordpress.com

 

Implementing Daily Quota-Based Speed Throttling in FreeRADIUS 3.2.x (Production-Grade, Without sqlcounter)

Filed under: freeradius, Mikrotik Related — Tags: , , , , , , — Syed Jahanzaib / Pinochio~:) @ 10:21 AM


  • Author: Syed Jahanzaib ~A Humble Human being! nothing else 😊
  • Platform: aacable.wordpress.com
  • Category: Corporate Offices / DHCP-DNS Engineering
  • Audience: Systems Administrators, IT Support, NOC Teams, Network Architects

⚠️ Disclaimer & Note on Writing Style

Every network environment is unique. A solution that works effectively in one infrastructure may require modification in another. Readers are strongly encouraged to understand the underlying concepts and adapt the guidance according to their own architecture, operational policies, and risk tolerance.

Blind copy-paste implementation without proper validation, testing, and change management is never recommended — especially in production environments. Always ensure proper backups and risk assessment before applying any configuration.

The content shared here is based on hands-on experience from real-world deployments, ISP environments, lab testing, and continuous learning. While I strive for technical accuracy, no technical implementation is entirely free from the possibility of error. Constructive discussion and alternative approaches are always welcome.

Due to professional commitments, it is not always feasible to publish highly detailed or multi-part write-ups. The technical logic and implementation details are written based on my own practical experience. AI tools such as ChatGPT are used only to refine grammar, structure, and presentation — not to generate the core technical concepts.

This blog is not intended for client acquisition or follower growth. It exists solely to share practical knowledge and real-world experience with the community.

Thank you for your understanding and continued support.



Implementing Daily Quota‑Based Speed Throttling in FreeRADIUS 3.2.x

Production‑Grade Design (Without sqlcounter)

Overview

In many ISP and enterprise environments, not all subscribers should be treated equally. A common requirement is:

  • Allow full speed (e.g., 10 Mbps) up to a daily quota (e.g., 100 GB/day)
  • Automatically downgrade speed (e.g., to 5 Mbps) once the quota is exceeded
  • Restore full speed automatically the next day

This article documents a real, production‑tested implementation using:

  • FreeRADIUS (tested on 3.2.7)
  • MySQL backend
  • MikroTik NAS with CoA support
  • Cron‑based enforcement
  • Explicit SQL logic (no magic counters)
  • .my.cnf security

⚠️ Important: This design intentionally does NOT use sqlcounter. The reasoning is explained later.
Also we can adapt this logic for Huawei / Cisco & other NASES too… 😉

What This Design Supports

  • Multiple service profiles (e.g., 5M, 10M, 20M)
  • Per‑service daily quota limits
  • Automatic speed downgrade after quota exhaustion
  • Real‑time enforcement for already‑connected users via CoA (We send only explicit bandwidth-update CoA attributes, not disconnect requests.)
  • Automatic speed restoration on the next day
  • No infinite CoA loops (Avoid sending CoA too frequently because the NAS might be overloaded otherwise
  • Safe to run every few minutes in production

This approach scales cleanly and is widely used in real ISP deployments.

High‑Level Architecture

Responsibilities are cleanly separated:

Component Responsibility
FreeRADIUS Authentication & policy decision
MySQL Usage data, services, enforcement state
Cron script Quota detection & CoA triggering
CoA Force re‑authorization / speed change
State table Prevent repeated enforcement

Why this matters

  • Easier debugging
  • Predictable behavior
  • Safe upgrades
  • No hidden logic in authentication path

Design Principles (Read This First)

Before touching configuration, understand these realities:

  • Authorization happens once (at login)
  • Quota is crossed later (during accounting)
  • FreeRADIUS does not automatically re‑authorize sessions
  • CoA is mandatory for real‑time speed changes
  • Repeated CoA without state tracking causes loops

To solve this correctly, we introduce a quota state table.

Database Tables Used

Table Purpose
radcheck
Authentication only
radacct
Usage accounting
service_profiles
Speed + quota rules
user_services
User → service mapping
user_quota_state
Enforcement memory (critical)

This is exactly how large ISP RADIUS systems are structured.

Mental Model (Clean Separation)

Large systems scale by separating responsibility — not by collapsing everything into one table.

For this design we use three custom tables.


1️⃣ service_profiles — Rules Table

Purpose

Defines how a service behaves.

Controls

  • Normal speed (e.g., 10M/10M)
  • Throttled speed (e.g., 5M/5M)
  • Daily quota (GB)
  • Whether quota enforcement is enabled

One‑Line Summary

service_profiles defines HOW a service behaves.

Table Creation

CREATE TABLE service_profiles (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50),
normal_rate VARCHAR(50),
throttled_rate VARCHAR(50),
daily_quota_gb INT,
quota_enabled TINYINT(1)
);

Example Entry

INSERT INTO service_profiles
(name, normal_rate, throttled_rate, daily_quota_gb, quota_enabled)
VALUES
('10mb', '10M/10M', '5M/5M', 100, 1);

2️⃣ user_services — User → Service Mapping

Purpose

Maps which user is on which service plan.

One‑Line Summary

user_services defines WHICH rules apply to WHICH user.

Table Creation

CREATE TABLE user_services (
username VARCHAR(64) PRIMARY KEY,
service_name VARCHAR(50)
);

This keeps radcheck clean and avoids overloading authentication tables.


3️⃣ user_quota_state — Enforcement Memory (Critical)

Purpose

Remembers who has already been throttled today.

It prevents infinite CoA loops.

One‑Line Summary

user_quota_state answers one question: “Has quota enforcement already been applied for this user today?”

Table Creation

CREATE TABLE user_quota_state (
username VARCHAR(64) NOT NULL,
service_name VARCHAR(50) NOT NULL,
quota_date DATE NOT NULL,
is_throttled TINYINT(1) DEFAULT 0,
PRIMARY KEY (username, quota_date)
);

What This Table IS

  • A daily flag
  • A loop‑prevention mechanism
  • A CoA control switch

What This Table IS NOT

  • ❌ Billing record
  • ❌ Usage log
  • ❌ Authorization source
  • ❌ Permanent status

It does not control access. It only prevents repeated enforcement.

Lifecycle of a User

  1. Login → Normal speed
  2. Quota exceeded → Script detects breach
  3. CoA sent → Speed reduced
  4. State recorded → No further CoA today
  5. New day → State cleared → Full speed restored

Authorization Logic (Speed Assignment)

During authorization, FreeRADIUS decides which speed profile applies.

Logic

On login (or re-auth):

  • If user is not throttled today → Normal speed
  • If user is throttled today → Throttled speed

🧠 One Sentence Mental Model

Authorization decides speed, accounting measures usage, CoA enforces change, one flag prevents repetition.

1️⃣ Is putting this logic in sites-enabled/default efficient?

Short answer

Yes, if done correctly — and no, it will NOT overload authorization in your use-case.

Why this is safe

In FreeRADIUS, authorize {} is executed anyway for every Access-Request.
You are not adding a new phase, you are only adding conditional logic.

The real cost is:

  • SQL queries
  • String expansion
  • Unlang branching

These are cheap compared to:

  • TLS
  • PAP/CHAP/MSCHAP
  • SQL auth itself
  • Accounting writes

What would be inefficient (and what you should avoid)

❌ Multiple SQL queries per request
❌ Repeating the same SQL lookup again and again
❌ Logic applied to every NAS when not required

✅ Production-safe optimization pattern (Recommended)

  1. Move logic into a policy
  2. Cache service data in control
  3.  Proper way to detect NAS type and send correct attributes (MikroTik vs others)
  4. Best Practice: Use nas table (already supported by FreeRADIUS)
  5. Use NAS-Type in unlang

FreeRADIUS authorize {} Configuration

Edit:

nano /etc/freeradius/sites-enabled/default

authorize {
preprocess
filter_username
suffix
# 1) Load credentials
sql
# 2) Determine user's service
update control {
Service-Name := "%{sql:SELECT service_name FROM user_services WHERE username='%{User-Name}' LIMIT 1}"
}
# 3) Quota‑based speed decision
if ("%{sql:SELECT 1 FROM user_quota_state WHERE username='%{User-Name}' AND quota_date=CURDATE() LIMIT 1}" == "1") {
update reply {
Mikrotik-Rate-Limit := "%{sql:SELECT throttled_rate FROM service_profiles WHERE name='%{control:Service-Name}'}"
}
}
else {
update reply {
Mikrotik-Rate-Limit := "%{sql:SELECT normal_rate FROM service_profiles WHERE name='%{control:Service-Name}'}"
}
}
pap
chap
mschap
digest
eap {
ok = return
}
expiration
logintime
}

Result

  • Normal login → Normal speed
  • Over quota → Throttled speed
  • CoA re‑auth → Correct speed
  • Next day → Full speed restored

Real‑Time Enforcement Using CoA (qc.sh)

Mental Model

SQL detects quota breach → CoA enforces change → State table prevents repetition
Authorization alone is not enough because users may stay connected for days.

Selection Criteria (Critical)
The script must select only:

  • Active sessions
  • Services with quota_enabled = 1
  • Users who exceeded daily quota
  • Users not already throttled today

Production‑Ready qc.sh (quota check every 5 minutes)

#!/bin/bash
while read user nas ip secret; do
echo "Applying quota throttle for user=$user on NAS=$nas"
echo "User-Name=$user, Framed-IP-Address=$ip" | \
radclient -x "$nas" coa "$secret"
mysql <<EOF
INSERT IGNORE INTO user_quota_state (username, quota_date)
VALUES ('$user', CURDATE());
EOF
done < <(
mysql -N <<EOF
SELECT
r.username,
r.nasipaddress,
r.framedipaddress,
n.secret
FROM radacct r
JOIN user_services u ON r.username = u.username
JOIN service_profiles s ON u.service_name = s.name
JOIN nas n ON n.nasname = r.nasipaddress
WHERE r.acctstoptime IS NULL
AND s.quota_enabled = 1
AND NOT EXISTS (
SELECT 1
FROM user_quota_state q
WHERE q.username = r.username
AND q.quota_date = CURDATE()
)
GROUP BY r.username, r.nasipaddress, r.framedipaddress, n.secret
HAVING
SUM(r.acctinputoctets + r.acctoutputoctets)
> (MAX(s.daily_quota_gb) * 1024 * 1024 * 1024);
EOF
)

Safe to run every 5–10 minutes. Cron it to run every 5 minutes.

Why This Does NOT Loop

Once a user is throttled:

  • is_throttled = 1 is recorded
  • Script excludes that user
  • No repeated CoA packets
  • No NAS overload

This is idempotent enforcement, which is mandatory in ISP systems.


Daily Quota Reset & restore already connected users to there original speed – Script

Restoring Original Speed for Already Connected Users at Midnight

✅ The correct restore logic (mirror of qc.sh)

The restore script must:

  • Target only users who were throttled
  • Target only active sessions
  • Send CoA only
  • NOT touch accounting
  • NOT disconnect

That means:

  • Same CoA
  • Different condition

The Problem

At midnight, we reset daily quota (qc_reset.sh) state by running:

DELETE FROM user_quota_state
WHERE quota_date < CURDATE();

This correctly unmarks users as “quota exceeded”.

However:

❗ Users who are already connected and throttled will remain throttled
until their session is re-authorized.

Because:

  • FreeRADIUS only applies speed on authentication / re-auth
  • Clearing the table alone does not push a change to active sessions

So we need a way to force already connected users to re-apply policy.

Change bandwidth of ALREADY CONNECTED users using CoA, both when quota is exceeded and when quota is reset


The Correct Solution (Industry Practice)

✔ Use CoA (Change of Authorization) at midnight

to force policy re-evaluation for affected users.

This is exactly how ISPs do it.


Design Principle (Important)

We do NOT want to:

  • Disconnect users
  • Restart NAS
  • Restart FreeRADIUS
  • Touch radacct

Example of qc_restore.sh

#!/bin/bash
#set -x
while read user nas rate secret; do
echo "Restoring user=$user to $rate"
# I have disabled this for test, you may enable it , syedjahanzaib
# echo "User-Name=$user, Mikrotik-Rate-Limit=$rate" | \
# radclient -q -c 1 "$nas:3799" coa "$secret"
done < <(
mysql radius -N <<EOF
SELECT
r.username,
r.nasipaddress,
s.normal_rate,
n.secret
FROM user_quota_state q
JOIN radacct r ON r.username = q.username
JOIN user_services u ON u.username = r.username
JOIN service_profiles s ON s.name = u.service_name
JOIN nas n ON n.nasname = r.nasipaddress
WHERE q.quota_date < CURDATE()
AND r.acctstoptime IS NULL;
EOF
)

NOTE:
WHERE q.quota_date < CURDATE()
This is intentional: only past-quota users restored after the new day begins

Why You MUST NOT Rely Only on FreeRADIUS Logic??

You might ask:
“Why not let FreeRADIUS re-evaluate on every packet?”
Because:

  • Authorization happens once
  • Accounting happens later
  • Usage thresholds are crossed after auth

➡ CoA + state table is the only correct solution

qc_reset.sh

DELETE FROM user_quota_state WHERE quota_date < CURDATE();

Cron Job (Run this script every night at 00:05 (12:05 am)

At midnight:

  1. quota_reset.sh deletes state
  2. qc_restore.sh sends CoA
  3. NAS re-asks RADIUS
  4. RADIUS sees NO quota state
  5. RADIUS replies with normal_rate
  6. NAS updates bandwidth live

✔ Same session
✔ Same IP
✔ Same user
✔ Only bandwidth changes

🕛 Correct cron order

Script Purpose Frequency
qc.sh Detect quota exceeded & throttle Every 5–10 min
quota_reset.sh Reset daily enforcement state Once per day
qc_restore.sh Restore normal speed via CoA Once per day
*/5 * * * * /temp/qc.sh
5 0 * * * /temp//quota_reset.sh
10 0 * * * /temp/qc_restore.sh

🧠 Final Mental Model (This is the key)

qc.sh → INSERT state + CoA → throttle bandwidth
quota_reset.sh → DELETE state
qc_restore.sh → CoA only → restore bandwidth

CoA is used in BOTH directions
Throttle AND restore.


🟢 There is NO DISCONNECT Anywhere

If you wanted to disconnect, you would see:

Disconnect-Request

You are not doing that.

So your system is:

  • Correct
  • Clean
  • ISP-grade
  • Non-disruptive

Final sanity checklist

Item Status
Throttle works
Restore logic correct
No disconnects
Explicit bandwidth CoA
Daily lifecycle clean
Script behavior predictable

One-line takeaway (this explains everything)

Restore script does nothing during the same day — and that is exactly correct.

You’ve reached the correct end-state.


Data Consistency Lesson (Important)

Problem: Enforcement query returned no rows even though users were active.
Root Cause:

  • nasipaddress did not match nas.nasname

Fix:

  • Ensure nasname contains the NAS IP address, not only a hostname

This alignment is critical when dynamically pulling NAS secrets.


Why We Did NOT Use sqlcounter

This was a deliberate engineering decision.
Problems with sqlcounter

  • Runs inside authentication path
  • Executes on every login / re‑auth
  • Awkward speed downgrade logic
  • No native state memory
  • Fragile across upgrades

Correct Reality

  • Authorization happens once
  • Usage thresholds are crossed later
  • CoA + state tracking is mandatory

🔹 Simulate Quota Usage (Insert Fake Active Session)

When testing quota logic, you may not have a real online user yet.
In that case, insert a fake but valid active session into radacct.


✅ Example: Insert Active Session for User zaib

INSERT INTO radacct (
acctsessionid,
username,
nasipaddress,
framedipaddress,
acctstarttime,
acctstoptime,
acctinputoctets,
acctoutputoctets
) VALUES (
'SIM-SESSION-001',
'zaib',
'192.168.1.1',
'10.10.10.100',
NOW(),
NULL,
120 * 1024 * 1024 * 1024,
0
);

Why these fields matter

Column Reason
acctsessionid Must be unique
username Must match user_services
nasipaddress Must match nas.nasname
framedipaddress Used for session identification
acctstarttime Marks session start
acctstoptime = NULL Marks session as active
acctinputoctets Simulates data usage
acctoutputoctets Optional (can be 0)

🔹 Verify Inserted Session

SELECT username, nasipaddress, framedipaddress, acctinputoctets, acctstoptime
FROM radacct
WHERE username = 'zaib'
AND acctstoptime IS NULL;

Expected result:

zaib | 192.168.1.1 | 10.10.10.100 | 128849018880 | NULL

(≈120 GB)


🔹 Now Run QC (Quota and Throttle Mode Check)

qc.sh

Expected behavior

  • Script detects quota exceeded
  • Sends bandwidth-only CoA
  • Inserts entry into user_quota_state
  • User remains connected
  • Speed changes to throttled_rate

⚠️ Important Cleanup After Testing

After simulation, remove the fake session to avoid confusion:

DELETE FROM radacct
WHERE acctsessionid = 'SIM-SESSION-001';

🔹 One-Line Reminder

radacct must contain an active (acctstoptime = NULL) session for quota logic to trigger.

 

Final Architecture Summary

Component Role
radacct Usage accounting
service_profiles Speed & quota rules
user_services User mapping
user_quota_state Enforcement memory
qc.sh Detection & CoA
quota_reset.sh Daily reset
FreeRADIUS Policy decision

Securing MySQL Credentials (.my.cnf)

⚠️ Use a .my.cnf file in production (don’t hardcode passwords). this is a security + ops best practice, so let’s do it correctly and simply.

Location

/root/.my.cnf

Content

[client]
user=radius
password=YOUR_DB_PASSWORD
host=localhost
database=radius

🔒 VERY IMPORTANT: Set strict permissions

MySQL will ignore the file if permissions are weak.

chmod 600 /root/.my.cnf
chown root:root /root/.my.cnf

Verify:

ls -l /root/.my.cnf

Expected:

-rw------- 1 root root ... /root/.my.cnf

Result

❌ No passwords in scripts ❌ No exposure via ps or history

✅ How this changes your scripts (Clean & Safe)

❌ Old (bad practice)

mysql radius -u radius -pPASSWORD -e "DELETE FROM user_quota_state ..."

✅ New (correct)

mysql -e "DELETE FROM user_quota_state WHERE quota_date < CURDATE();"
  • No username
  • No password
  • No exposure in ps, history, or logs

🧠 How MySQL finds .my.cnf (important to know)

When you run mysql, it checks in this order:

  1. /etc/my.cnf
  2. /etc/mysql/my.cnf
  3. ~/.my.cnf ← this one
  4. Command-line options

So /root/.my.cnf is automatically used.


Conclusion

By avoiding sqlcounter and using explicit SQL + CoA + state tracking, we achieve:

  • Reliable daily quota enforcement
  • Correct speed throttling
  • Clean FreeRADIUS configuration
  • Predictable behavior
  • Production‑grade stability

This reflects how real ISP systems are built, not lab examples.


Performance Tuning & Scaling Tips (Production ISPs)

This solution is already efficient by design, but the following optimizations ensure stable performance at scale (thousands to tens of thousands of users).


1️⃣ Indexing: The Single Biggest Performance Win

Your scripts heavily query radacct and user_quota_state.
Without proper indexes, MySQL will scan entire tables.

✅ Required Indexes

-- radacct: active session lookup
CREATE INDEX idx_radacct_active
ON radacct (username, acctstoptime);
CREATE INDEX idx_radacct_nas
ON radacct (nasipaddress);
-- radacct: usage calculation
CREATE INDEX idx_radacct_usage
ON radacct (username, acctstarttime);
-- user_quota_state: daily state check
CREATE UNIQUE INDEX idx_quota_user_day
ON user_quota_state (username, quota_date);
-- user_services: user lookup
CREATE INDEX idx_user_services_user
ON user_services (username);
-- service_profiles: service lookup
CREATE INDEX idx_service_profiles_name
ON service_profiles (name);

📌 These indexes reduce query time from seconds to milliseconds.


2️⃣ Limit radacct Growth (Very Important)

radacct grows fast in ISP environments.

Best Practices

  • Enable Interim-Update (5–10 minutes)
  • Periodically archive old records:
DELETE FROM radacct
WHERE acctstarttime < NOW() - INTERVAL 90 DAY;

Or move to an archive table.


3️⃣ Cron Frequency: Don’t Overdo It

Recommended schedule

Script Frequency
qc.sh  Every 5–10 minutes
quota_reset.sh Once per day
quota_restore.sh  Once per day

Running throttle every minute does not improve accuracy, only DB load.


5️⃣ Reduce CoA Traffic (Selective Targeting)

Your design sends CoA only when needed:

✔ User exceeded quota
✔ User not already throttled
✔ User still online

This is far better than sending CoA to all users.


6️⃣ Avoid sqlcounter for High-Scale Quota Logic

Why your design scales better:

Aspect sqlcounter This Design
Runs on auth Yes No
DB queries Per login Scheduled
CoA support Limited Native
Loop protection No Yes
Debuggability Hard Easy

For large ISPs, offloading quota logic from auth path is critical.


7️⃣ MySQL Tuning (Basic but Important)

Minimum recommended settings for radius DB:

[mysqld]
innodb_buffer_pool_size = 2G
innodb_log_file_size = 512M
max_connections = 300</div>

Adjust according to RAM and load.


8️⃣ Use .my.cnf (Already Done)

Avoid command-line passwords.

Benefits:

  • No leaks in ps
  • Cleaner scripts
  • Safer cron jobs

9️⃣ Logging Without Killing Performance

Avoid excessive debug logging.

Recommended approach

echo "$(date) THROTTLE user=$user rate=$rate" >> /var/log/quota_coa.log
  • Log actions, not every query
  • Rotate logs weekly

🔟 Test at Scale (Before Production)

Before enabling in production:

  • Simulate 100–500 fake users
  • Run throttle mode
  • Observe:
    • MySQL CPU
    • Disk I/O
    • CoA response time

If MySQL is slow → indexes are missing.


Final Performance Rule (Remember This)

Quota logic must run OUTSIDE the authentication path.

Your design already follows this rule — which is why it scales.


One-Line Takeaway

Indexes + selective CoA + cron-based logic = ISP-grade performance.

Syed Jahanzaib