Syed Jahanzaib – سید جہانزیب – Personal Blog to Share Knowledge !

January 14, 2026

PPPoE & Freeradius – ISP Guidelines

Filed under: freeradius, Mikrotik Related, Radius Manager. Posted by Syed Jahanzaib / Pinochio~:) @ 10:29 AM

  • Author: Syed Jahanzaib ~A Humble Human being! nothing else 😊
  • Platform: aacable.wordpress.com
  • Category: Corporate Offices / DHCP-DNS Engineering
  • Audience: Systems Administrators, IT Support, NOC Teams, Network Architects

⚠️ Disclaimer & Note on Writing Style

Every network environment is unique. A solution that works effectively in one infrastructure may require modification in another. Readers are strongly encouraged to understand the underlying concepts and adapt the guidance according to their own architecture, operational policies, and risk tolerance.

Blind copy-paste implementation without proper validation, testing, and change management is never recommended — especially in production environments. Always ensure proper backups and risk assessment before applying any configuration.

The content shared here is based on hands-on experience from real-world deployments, ISP environments, lab testing, and continuous learning. While I strive for technical accuracy, no technical implementation is entirely free from the possibility of error. Constructive discussion and alternative approaches are always welcome.

Due to professional commitments, it is not always feasible to publish highly detailed or multi-part write-ups. The technical logic and implementation details are written based on my own practical experience. AI tools such as ChatGPT are used only to refine grammar, structure, and presentation — not to generate the core technical concepts.

This blog is not intended for client acquisition or follower growth. It exists solely to share practical knowledge and real-world experience with the community.

Thank you for your understanding and continued support.


MikroTik Users Disconnect and Are Unable to Reconnect

This is a very common PPPoE + RADIUS issue in MikroTik environments, especially with mixed hardware (x86 + CCR1036) and CPEs behind ONUs / Wi-Fi routers. These symptoms strongly indicate stale sessions, a RADIUS state mismatch, or PPPoE discovery issues, not a hardware fault.

Below is a structured analysis and corrective action plan.


1. MikroTik users disconnect and are unable to reconnect

Users disconnect and do not reconnect until the MikroTik is rebooted or the RADIUS profile is changed.

This almost always means one or more of the following:

  1. Stale PPP / RADIUS sessions
  2. Incorrect session cleanup (Acct-Stop not processed)
  3. PPPoE session still “active” on MikroTik but client is gone
  4. RADIUS still believes user is online
  5. CPE/ONU not properly sending PADT
  6. Duplicate sessions blocked by MikroTik
  7. MTU / MSS or FastTrack interference

Restarting MikroTik or changing the RADIUS profile forces a session re-evaluation, which is why it temporarily “fixes” the issue.


2. First Check (Very Important)

Check if user is stuck as active on MikroTik

On MikroTik:

/ppp active print where name=username

If user is shown as active but customer is offline, this is your core issue.

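Also check whether the account is defined locally:

/ppp secret print where name=username

With RADIUS authentication the account normally lives on the RADIUS server, so an empty result here is expected. If the credentials are valid (locally or in RADIUS) but the user still cannot reconnect → session cleanup failure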


3. Root Causes & Fixes (Most Likely)

A. RADIUS Session Timeout Not Configured (CRITICAL)

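If RADIUS does not send timeout attributes, MikroTik has no RADIUS-driven limit on how long a dead session can linger.

Fix:

In DMASoftlab Radius Manager, ensure these attributes are sent:

Attribute               Recommended Value
Session-Timeout         e.g. 86400 (seconds)
Idle-Timeout            300–600 (seconds)
Acct-Interim-Interval   60 or 300 (seconds)

Verify that MikroTik is talking to the RADIUS server (request, accept, reject and timeout counters):

/radius monitor 0

These counters do not show individual attributes. To confirm which attributes arrive in the Access-Accept, enable RADIUS debug logging as described in the debugging steps.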

B. MikroTik Is Not Actively Kicking Dead Sessions

Enable RADIUS for PPP together with accounting and interim updates (use-radius is a /ppp aaa setting, not a PPP profile property):

/ppp aaa set use-radius=yes accounting=yes interim-update=5m

This ensures:

  • RADIUS gets regular updates
  • Dead sessions are detected

C. Duplicate PPPoE Sessions Blocked (Very Common)

If customer reconnects but old session still exists → new login is rejected

Fix:

/ppp profile set default only-one=yes

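With only-one=yes, MikroTik drops the old session when the same user logs in again. If the RADIUS server itself rejects the login because it still counts the user as online, the stale record must also be cleared on the RADIUS side.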


D. ONU / Wi-Fi Routers Not Sending PADT

Many cheap ONUs & routers:

  • Do not send PADT on link drop
  • Leave MikroTik believing user is online

Mitigation:

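Keep the PPPoE keepalive enabled with a bounded timeout so the router itself detects a dead peer (the RouterOS default is 10 seconds; avoid disabling it or setting it very high):

/interface pppoe-server server
set 0 keepalive-timeout=30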

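E. MTU / MSS Issues (Perceived as Random Drops)

An incorrect MTU does not usually tear down the PPPoE session itself. It silently drops large packets, so pages and apps stall and customers report it as a disconnect.

Recommended values:

PPPoE MTU: 1480
PPPoE MRU: 1480

Enable MSS clamping. For IPv4 the MSS is the MTU minus 40 bytes, so an MTU of 1480 needs an MSS of 1440 (1452 is only correct for an MTU of 1492). The tcp-mss matcher makes sure the rule only lowers the value; new-mss=clamp-to-pmtu is an alternative that avoids hard-coding the number:

/ip firewall mangle
add chain=forward protocol=tcp tcp-flags=syn tcp-mss=1441-65535 action=change-mss new-mss=1440 passthrough=yes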
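
As a quick cross-check of the MTU and MSS numbers, the following Python sketch derives the TCP MSS from a given PPPoE MTU. It assumes plain IPv4 with a 20-byte IP header and a 20-byte TCP header and no options; the function name mss_for_mtu is illustrative and not part of any MikroTik tooling.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Return the largest TCP payload that fits in one IPv4 packet of size mtu."""
    if mtu <= IPV4_HEADER + TCP_HEADER:
        raise ValueError("MTU too small to carry a TCP segment")
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    # Plain Ethernet, standard PPPoE, and the 1480 value used in this post
    for mtu in (1500, 1492, 1480):
        print(f"MTU {mtu} -> MSS {mss_for_mtu(mtu)}")
```

An MTU of 1492 gives 1452 and an MTU of 1480 gives 1440, so the clamp value should always be chosen to match the MTU actually configured on the PPPoE server.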

F. FastTrack Breaking PPPoE Accounting (VERY COMMON)

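FastTracked packets bypass firewall rules, connection tracking, simple queues and IP accounting. For PPPoE subscribers (whose RADIUS rate limits are enforced through dynamic queues) this can break: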

  • Accounting
  • Session state
  • Idle detection

Fix (Recommended):

Exclude PPPoE users from FastTrack.

If you use FastTrack:

/ip firewall filter
disable [find action=fasttrack-connection]

OR restrict FastTrack only to LAN traffic.


4. Debugging Steps (Do This Once)

Enable PPP & RADIUS debugging:

/system logging add topics=ppp,debug
/system logging add topics=radius,debug

Then watch logs during a disconnect:

/log print follow

You will usually see:

  • No Acct-Stop
  • Session still active
  • RADIUS reject due to duplicate session

5. Why “Changing User Profile” (under radius) Works

Changing the profile:

  • Forces MikroTik to drop the session
  • Triggers new Access-Request
  • Clears stuck accounting state

That confirms this is session state corruption, not credentials or hardware.


6. Best Practice Configuration Summary

On MikroTik

/ppp aaa set use-radius=yes accounting=yes interim-update=5m
/ppp profile set default only-one=yes
/interface pppoe-server server set 0 keepalive-timeout=30

On DMASoftlab

  • Enable Interim Accounting
  • Enforce Session-Timeout
  • Enable auto disconnect for stale sessions

7. Hardware Is NOT the Problem

✔ x86 RouterOS – OK
✔ CCR1036 – OK
✖ Issue is state handling between PPPoE + RADIUS + CPE


Below is a production-safe MikroTik RouterOS script that automatically detects and clears stuck PPPoE sessions (sessions that appear active on MikroTik but are no longer passing traffic or have stale uptime).

This approach is widely used by ISPs running PPPoE + RADIUS (DMASoftlab) and works on x86 and CCR series.


Auto Remove Idle / Stuck PPP Sessions

1. What This Script Does (Logic)

The script will:

  1. Scan /ppp active sessions
  2. Identify sessions that:
    • Have no traffic for a defined time, OR
    • Have abnormally long uptime (stuck accounting cases)
  3. Gracefully remove the PPP session
  4. Allow the user to reconnect immediately

2. Script – Auto Remove Idle / Stuck PPP Sessions

Version A – Remove Sessions With No Traffic (Recommended)

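# /ppp active has no byte counters; read them from the dynamic
# <pppoe-USERNAME> interface that RouterOS creates for each session.
:foreach i in=[/ppp active find] do={
    :local name [/ppp active get $i name]
    :local uptime [/ppp active get $i uptime]
    :local ifc [/interface find name=("<pppoe-" . $name . ">")]
    :if ([:len $ifc] > 0) do={
        :local rx [/interface get $ifc rx-byte]
        :local tx [/interface get $ifc tx-byte]
        # Counters are cumulative: this matches sessions that never passed traffic
        :if (($rx = 0) && ($tx = 0) && ($uptime > 5m)) do={
            :log warning ("AUTO-PPP-CLEANUP: Removing idle session -> " . $name)
            /ppp active remove $i
        }
    }
}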
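
A cumulative byte counter of zero only identifies sessions that never carried any traffic. To catch a session that worked and then went silent, the counters have to be compared between two polls. The Python sketch below shows that comparison offline. The snapshot dictionaries (username mapped to an rx/tx pair) are hypothetical inputs that would have to be exported from the router by whatever means you already use, and the function name is illustrative.

```python
from typing import Dict, List, Tuple

Counters = Dict[str, Tuple[int, int]]  # username -> (rx_bytes, tx_bytes)


def idle_between_polls(previous: Counters, current: Counters) -> List[str]:
    """Return users present in both polls whose byte counters did not move."""
    idle = []
    for user, (rx_now, tx_now) in current.items():
        if user not in previous:
            continue  # new session since the last poll, no baseline yet
        rx_before, tx_before = previous[user]
        if rx_now < rx_before or tx_now < tx_before:
            continue  # counters went backwards: the session was re-established
        if rx_now == rx_before and tx_now == tx_before:
            idle.append(user)
    return sorted(idle)


if __name__ == "__main__":
    poll_1 = {"alice": (1000, 2000), "bob": (500, 700), "carol": (9000, 100)}
    poll_2 = {"alice": (1000, 2000), "bob": (900, 1500), "dave": (0, 0)}
    print(idle_between_polls(poll_1, poll_2))
```

Only users seen in both polls are considered, so a freshly connected subscriber is never flagged on its first appearance.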

Recommended use case:

  • ONU/Wi-Fi routers
  • Ghost sessions
  • Users unable to reconnect

3. Version B – Remove Very Long-Running Sessions (Failsafe)

Use this only if you see sessions running for days without disconnecting properly.

:foreach i in=[/ppp active find] do={
    :local name [/ppp active get $i name]
    :local uptime [/ppp active get $i uptime]
    :if ($uptime > 3d) do={
        :log warning ("AUTO-PPP-CLEANUP: Removing long session -> " . $name)
        /ppp active remove $i
    }
}
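
When uptime values are exported to external tooling, comparing them as text is unreliable, so it helps to convert them to seconds first. The Python sketch below parses the week/day/clock form that RouterOS prints for uptime (for example 3d00:00:00 or 1w2d03:04:05). It assumes that exact layout; other spellings such as 5m are deliberately not handled, and the function name is illustrative.

```python
import re

# Optional weeks, optional days, then HH:MM:SS
_UPTIME = re.compile(
    r"^(?:(?P<w>\d+)w)?(?:(?P<d>\d+)d)?(?P<h>\d{1,2}):(?P<m>\d{2}):(?P<s>\d{2})$"
)


def uptime_seconds(text: str) -> int:
    """Convert an uptime string such as '3d00:00:00' into seconds."""
    match = _UPTIME.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised uptime: {text!r}")
    parts = {key: int(val) if val else 0 for key, val in match.groupdict().items()}
    days = parts["w"] * 7 + parts["d"]
    hours = days * 24 + parts["h"]
    minutes = hours * 60 + parts["m"]
    return minutes * 60 + parts["s"]


if __name__ == "__main__":
    limit = uptime_seconds("3d00:00:00")
    for sample in ("00:05:00", "2d23:59:59", "1w2d03:04:05"):
        seconds = uptime_seconds(sample)
        print(sample, seconds, "over limit" if seconds > limit else "ok")
```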

4. Best Practice: Combine Both Checks (Recommended)

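:foreach i in=[/ppp active find] do={
    :local name [/ppp active get $i name]
    :local uptime [/ppp active get $i uptime]
    # Byte counters live on the dynamic <pppoe-USERNAME> interface, not on /ppp active.
    # Default to 1 so a missing interface is never treated as zero traffic.
    :local rx 1
    :local tx 1
    :local ifc [/interface find name=("<pppoe-" . $name . ">")]
    :if ([:len $ifc] > 0) do={
        :set rx [/interface get $ifc rx-byte]
        :set tx [/interface get $ifc tx-byte]
    }
    :if ((($rx = 0) && ($tx = 0) && ($uptime > 5m)) || ($uptime > 3d)) do={
        :log warning ("AUTO-PPP-CLEANUP: Removing stuck session -> " . $name)
        /ppp active remove $i
    }
}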

5. Scheduler – Run Automatically (CRITICAL)

Save the script first:

/system script
add name=ppp_cleanup_script source="<<PASTE SCRIPT HERE>>"

Then add a scheduler to run it every 5 minutes:

/system scheduler
add name=ppp_cleanup interval=5m on-event=ppp_cleanup_script policy=read,write,test

6. Logging & Monitoring

To monitor effectiveness:

/log print where message~"AUTO-PPP-CLEANUP"

You should see logs only when sessions are actually stuck.


7. Safety Notes (Important)

✔ Version A does not affect users who are passing traffic
✖ Version B and the combined check also drop sessions older than 3 days, even if they are passing traffic
✔ Safe for x86 + CCR1036
✔ Compatible with DMASoftlab Radius Manager

Avoid running interval less than 3 minutes to prevent false positives.


Production-grade PPPoE design guide

Below is a concise but production-grade PPPoE design guide based on real ISP deployments running MikroTik (x86 + CCR series) with external RADIUS/billing systems (DMASoftlab). This focuses on stability, scale, and fault isolation, not lab-style setups.


1. High-Level Architecture (Recommended)

Core Principles

  • Single responsibility per layer
  • Stateless access routers
  • State and policy owned by RADIUS
  • Fast fail + easy session cleanup

Reference Layout

[Internet]
    |
[Edge / BGP Router]
    |
[Core Router / Firewall]
    |
[VLAN Aggregation / Switch]
    |
[BNG / PPPoE Access Routers]  <-- MikroTik x86 / CCR
    |
[ONU / OLT / WiFi CPE]
    |
[Subscribers]

Key rule:
👉 PPPoE must terminate ONLY on BNG routers, not on edge/firewall devices.


2. PPPoE Access Router (BNG) Best Practices

Hardware & OS

  • CCR1009 / CCR1036 / x86 with Intel NICs
  • RouterOS stable or long-term only
  • Disable unused packages

Interface & VLAN Design

  • Use VLAN-per-OLT or VLAN-per-Area
  • Avoid flat L2 domains

Example:

VLAN 101 → Area A
VLAN 102 → Area B
VLAN 201 → Business users

Benefits:

  • Broadcast isolation
  • Easier fault localization
  • Controlled session density

PPPoE Server Design

One PPPoE server per VLAN (preferred)

/interface pppoe-server server
add interface=vlan101 service-name=PPPOE-Area-A one-session-per-host=yes disabled=no

Avoid:

  • Single PPPoE server on a bridge
  • PPPoE on WAN/firewall interfaces

3. RADIUS (DMASoftlab) Integration – MUST FOLLOW

Required RADIUS Attributes

Attribute                          Purpose
Framed-IP-Address / Framed-Pool    IP assignment
Mikrotik-Rate-Limit                Bandwidth control
Session-Timeout                    Hard session reset
Idle-Timeout                       Kill dead sessions
Acct-Interim-Interval              Session health
Mikrotik-Group                     Profile assignment

MikroTik AAA (Correct Way)

/ppp aaa
set use-radius=yes accounting=yes interim-update=5m
/radius
add service=ppp address=RADIUS-IP secret=xxxxx timeout=3s

Timeout > 3s = bad design
RADIUS must be responsive.


4. Session Stability & Cleanup (ISP Critical)

Mandatory Settings

/ppp profile
set default only-one=yes
/ppp aaa
set use-radius=yes
/interface pppoe-server server
set [find] keepalive-timeout=30

Why ISPs Do This

  • ONU power loss
  • Wi-Fi router crash
  • Cable unplug
  • No PADT sent

Without cleanup → ghost users → customer complaints.


5. MTU / MSS (Silent Killer)

Standard ISP Values

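PPPoE MTU = 1480
MSS clamp = 1440 (MTU minus 40 bytes; 1452 applies only when the MTU is 1492)

/ip firewall mangle
add chain=forward protocol=tcp tcp-flags=syn tcp-mss=1441-65535 action=change-mss new-mss=1440 passthrough=yes

This eliminates the stalls that customers report as random disconnects, especially with: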

  • HTTPS
  • Mobile apps
  • Speedtest issues

6. FastTrack – Handle With Care

Production Rule

🚫 DO NOT FastTrack PPPoE traffic

FastTrack breaks:

  • Accounting
  • Session tracking
  • Idle detection
  • Queue stats

If required:

  • FastTrack ONLY management/LAN traffic
  • NEVER subscriber VLANs

7. Queue & Bandwidth Design (Real ISP Model)

Use RADIUS Rate-Limit (Preferred)

DMASoftlab → sends:

Mikrotik-Rate-Limit = 10M/10M

Avoid:

  • Thousands of simple queues (USE DYNAMIC QUEUES INSTEAD PROVIDED BY RADIUS)
  • Per-IP queues without PPP awareness
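
To sanity-check plan definitions before they reach the router, the rate-limit string can be parsed offline. The Python sketch below reads only the first rx/tx field of a Mikrotik-Rate-Limit value and ignores any burst fields that follow. It assumes decimal multipliers (k = 1,000 and M = 1,000,000) and that a missing tx value means the same rate in both directions; the function names are illustrative.

```python
from typing import Tuple

_MULTIPLIERS = {"": 1, "k": 1_000, "M": 1_000_000}


def parse_rate(token: str) -> int:
    """Convert a single rate such as '512k' or '10M' into bits per second."""
    token = token.strip()
    suffix = token[-1] if token and token[-1] in ("k", "M") else ""
    digits = token[: len(token) - len(suffix)]
    if not digits.isdigit():
        raise ValueError(f"invalid rate: {token!r}")
    return int(digits) * _MULTIPLIERS[suffix]


def parse_rate_limit(value: str) -> Tuple[int, int]:
    """Return (rx_bps, tx_bps) from the first field; rx/tx are from the router's side."""
    fields = value.split()
    if not fields:
        raise ValueError("empty rate-limit value")
    rx_text, _, tx_text = fields[0].partition("/")
    rx = parse_rate(rx_text)
    tx = parse_rate(tx_text) if tx_text else rx
    return rx, tx


if __name__ == "__main__":
    for plan in ("10M/10M", "512k/2M 1M/4M", "5M"):
        print(plan, "->", parse_rate_limit(plan))
```

Because rx and tx are expressed from the router's point of view, the first number corresponds to the subscriber's upload and the second to the download.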

8. Redundancy & Scaling

Horizontal Scaling (Best)

  • Multiple PPPoE access routers
  • Same RADIUS backend
  • Different VLAN ranges

BNG-1 → VLAN 101–150
BNG-2 → VLAN 151–200

Users reconnect automatically on failure.


What NOT to Do

❌ VRRP with PPPoE
❌ Stateful failover
❌ L2 stretch across routers

PPPoE is session-state heavy — let it die and reconnect.


9. Monitoring (Non-Negotiable)

On MikroTik

/ppp active print count-only
/system resource print
/interface monitor-traffic ether1 once

On RADIUS

  • Online users vs PPP active mismatch
  • Missing Acct-Stop alerts
  • Session duration anomalies
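
One way to turn the RADIUS-side checks into an alert is to flag sessions whose last accounting update is older than a couple of interim intervals. The Python sketch below works on a hypothetical mapping of username to last-update time, which would have to come from your accounting table or an export; the 5-minute interval and the tolerance of two missed updates are assumptions to adjust.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_sessions(
    last_update: Dict[str, datetime],
    now: datetime,
    interim: timedelta = timedelta(minutes=5),
    missed_allowed: int = 2,
) -> List[str]:
    """Return users whose last accounting update is older than the tolerance."""
    cutoff = now - interim * missed_allowed
    return sorted(user for user, seen in last_update.items() if seen < cutoff)


if __name__ == "__main__":
    now = datetime(2026, 1, 14, 10, 30)
    updates = {
        "alice": datetime(2026, 1, 14, 10, 27),  # fresh
        "bob": datetime(2026, 1, 14, 10, 5),     # 25 minutes without an update
        "carol": datetime(2026, 1, 13, 22, 0),   # stale since last night
    }
    print(stale_sessions(updates, now))
```

A user flagged here while still listed in /ppp active points to lost interim updates; a user flagged here and absent from the router points to a missing Acct-Stop.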

10. Logging & Forensics

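Enable temporary debug only when needed. Use one rule per subsystem, because topics combined in a single rule must all match the same message:

/system logging add topics=ppp,debug
/system logging add topics=radius,debug

Disable or remove these rules after the investigation.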


11. Security Hardening (Often Missed)

  • Disable MAC authentication
  • Use strong shared secrets
  • Restrict RADIUS by IP
  • Block PPPoE from WAN

/interface pppoe-server server
set [find] authentication=pap,chap

12. Operational Golden Rules (ISP Reality)

✔ PPPoE routers must be stateless
✔ RADIUS must be authoritative
✔ Sessions must be disposable
✔ Rebooting should NEVER be normal ops
✔ Any user should reconnect within 10–30 seconds


13. Why This Issue Happens

The typical chain of events:

  • ONU/CPE not sending PADT
  • RADIUS session not cleared
  • MikroTik still sees user active
  • Reconnect blocked until forced cleanup

The auto-cleanup script and the design corrections above address each link in this chain.


ISP-Grade checklist

Below is a compressed, ISP-grade checklist distilled from everything discussed. These are the non-negotiable bullets that actually prevent PPPoE instability in production.

A. Core PPPoE Stability (Must-Have)

  • Terminate PPPoE only on access/BNG routers, never on edge/firewall
  • Use one PPPoE server per VLAN/area, not a flat bridge
  • Enable RADIUS for auth + accounting + session control
  • Set only-one=yes to avoid duplicate session blocks
  • Enforce keepalive-timeout=30 on PPPoE server
  • Do not rely on PADT (CPEs are unreliable)

B. RADIUS (DMASoftlab) – Mandatory Attributes

  • Always send:
    • Session-Timeout
    • Idle-Timeout
    • Acct-Interim-Interval (60–300s)
    • Mikrotik-Rate-Limit
  • RADIUS must be fast (<3s timeout) and authoritative
  • Ensure Acct-Stop is processed; alert on missing Acct-Stop
  • Do not allow unlimited session duration

C. MikroTik AAA & PPP Settings (Critical)

  • Enable:
    • use-radius=yes
    • accounting=yes
    • interim-update=5m
  • Enforce:
    • only-one=yes in PPP profile
  • Regularly check:
    • /ppp active vs RADIUS online users mismatch

D. Session Cleanup (ISP Reality)

  • Expect ghost sessions (ONU power loss, Wi-Fi crash)
  • Deploy auto-cleanup script for:
    • Zero-traffic sessions
    • Abnormally long uptime sessions
  • Schedule cleanup every 3–5 minutes
  • Rebooting routers must never be a routine fix

E. MTU / MSS (Silent Disconnect Fix)

  • Set PPPoE MTU/MRU to 1480
  • Clamp MSS to 1440 (MTU minus 40; use 1452 only with an MTU of 1492)
  • Always enable MSS mangle rule for TCP SYN
  • MTU issues cause random HTTPS / app failures

F. FastTrack (Handle Carefully)

  • Never FastTrack PPPoE subscriber traffic
  • FastTrack breaks:
    • Accounting
    • Idle detection
    • Session state
  • If needed, FastTrack LAN/management only

G. VLAN & Access Design

  • Use VLAN per area / per OLT
  • Avoid flat L2 domains
  • Smaller broadcast domains = fewer PPPoE issues
  • Keep PPPoE interfaces simple and isolated

H. Bandwidth & Queuing

  • Use RADIUS Rate-Limit, not thousands of static queues
  • Avoid per-IP simple queues outside PPP
  • Let PPP session own bandwidth policy

I. Scaling & Redundancy (Real ISP Model)

  • Scale horizontally, not vertically
  • Multiple BNG routers with shared RADIUS
  • Different VLAN blocks per router
  • Allow sessions to drop & reconnect (stateless design)
  • Do not use VRRP for PPPoE

J. Monitoring & Operations

  • Monitor:
    • PPP active count
    • CPU per core (CCR especially)
    • RADIUS online vs PPP active mismatch
  • Log PPP/RADIUS only during incidents
  • Track:
    • Long-running sessions
    • Zero-traffic sessions

K. Security & Hygiene

  • Restrict RADIUS by IP only
  • Use strong shared secrets
  • Disable unused services/packages
  • Block PPPoE discovery from WAN
  • Do not allow MAC-based auth

L. Operational Golden Rules

  • PPPoE routers must be stateless
  • RADIUS must be authoritative
  • Sessions must be disposable
  • Any customer should reconnect in <30 seconds
  • If changing user profile “fixes” an issue → you have session-state problems

Bottom Line

If you implement A–F properly, the large majority of random disconnect and “won’t reconnect” complaints should disappear.


Session Mismatch

Below are production-safe session mismatch detection scripts that ISPs actually use to detect PPPoE ↔ RADIUS inconsistencies early, before customers complain.

I am keeping this practical and minimal, not academic.

1. What “Session Mismatch” Means (Operational Definition)

A mismatch exists when any of the following is true:

  • PPP session exists on MikroTik but no traffic is flowing
  • User cannot reconnect because old session still exists
  • PPP uptime is very high but RADIUS accounting is stale
  • Multiple logins attempted but PPPoE active count ≠ RADIUS online count

These scripts help you detect and act, not guess.


2. Script #1 — Detect Zero-Traffic PPP Sessions (Most Useful)

Purpose:
Detect users who are “online” but passing no traffic (ghost sessions).

Script (Detection Only – No Removal)

:foreach i in=[/ppp active find] do={
    :local name [/ppp active get $i name]
    :local uptime [/ppp active get $i uptime]
    # Byte counters are on the dynamic <pppoe-USERNAME> interface, not on /ppp active
    :local ifc [/interface find name=("<pppoe-" . $name . ">")]
    :if ([:len $ifc] > 0) do={
        :local rx [/interface get $ifc rx-byte]
        :local tx [/interface get $ifc tx-byte]
        :if (($rx = 0) && ($tx = 0) && ($uptime > 3m)) do={
            :log warning ("PPP-MISMATCH: Zero traffic session detected -> " . $name)
        }
    }
}

Why ISPs use this

  • Detects ONU / Wi-Fi crashes
  • Detects sessions blocking reconnection
  • Safe (no disconnections)

3. Script #2 — Detect Long-Running Sessions (Accounting Drift)

Purpose:
Identify sessions likely missing Acct-Stop.

:foreach i in=[/ppp active find] do={
    :local name [/ppp active get $i name]
    :local uptime [/ppp active get $i uptime]
    :if ($uptime > 2d) do={
        :log warning ("PPP-MISMATCH: Very long session -> " . $name . " uptime=" . $uptime)
    }
}

Why this matters

  • RADIUS often shows user offline
  • MikroTik still holds session
  • New login attempts fail

4. Script #3 — Detect Duplicate Session Attempts (Soft Lock)

Purpose:
Identify users repeatedly trying to login while already “active”.

/system logging add topics=ppp,info

Then monitor:

/log print where message~"already active"

If you see repeated entries → session cleanup or only-one=yes missing.


5. Script #4 — PPP vs RADIUS Count Sanity Check (Operational)

This is a lightweight health indicator, not exact matching.

:local pppCount [/ppp active print count-only]
:log info ("PPP-STATUS: Active PPP sessions = " . $pppCount)

Compare with:

  • DMASoftlab “Online Users” count

If mismatch persists >10–15 minutes → accounting problem
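
A count comparison only shows that something differs, not who. When both username lists can be exported, a set difference names the affected users. The Python sketch below assumes two plain lists of usernames (one from /ppp active, one from the billing system's online list); how they are exported is left open, and the function name and dictionary keys are illustrative.

```python
from typing import Dict, Iterable, List


def compare_sessions(
    ppp_active: Iterable[str], radius_online: Iterable[str]
) -> Dict[str, List[str]]:
    """Split users into those known only to the router or only to RADIUS."""
    router = set(ppp_active)
    radius = set(radius_online)
    return {
        # Active on the router but unknown to RADIUS: accounting start was lost
        "only_on_router": sorted(router - radius),
        # Online in RADIUS but gone from the router: Acct-Stop was lost
        "only_in_radius": sorted(radius - router),
    }


if __name__ == "__main__":
    router_users = ["alice", "bob", "carol"]
    radius_users = ["bob", "carol", "dave"]
    print(compare_sessions(router_users, radius_users))
```

The "only_in_radius" group is the one that typically blocks reconnection when the billing system enforces a single simultaneous login.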


6. Script #5 — Auto-Tag Suspected Problem Sessions (Advanced)

Instead of removing sessions, mark them in logs for audit.

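:foreach i in=[/ppp active find] do={
    :local name [/ppp active get $i name]
    :local uptime [/ppp active get $i uptime]
    # Byte counters are on the dynamic <pppoe-USERNAME> interface, not on /ppp active
    :local ifc [/interface find name=("<pppoe-" . $name . ">")]
    :if ([:len $ifc] > 0) do={
        :local rx [/interface get $ifc rx-byte]
        :local tx [/interface get $ifc tx-byte]
        :if (($rx < 1024) && ($tx < 1024) && ($uptime > 10m)) do={
            :log error ("PPP-SUSPECT: " . $name . " uptime=" . $uptime)
        }
    }
}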

This is useful when:

  • You want proof before disconnecting
  • Auditors ask why sessions were dropped

7. Scheduler (Recommended Setup)

Run detection scripts every 5 minutes:

/system scheduler
add name=ppp_mismatch_detect interval=5m on-event=ppp_mismatch_script policy=read,test

Detection ≠ cleanup.
Cleanup should be separate and deliberate.


8. ISP Operational Workflow (Real-World)

Best practice flow:

  1. Detect mismatch (scripts above)
  2. Log and count occurrences
  3. If repeat offender → auto-cleanup
  4. If widespread → check:
    • RADIUS latency
    • FastTrack rules
    • Interim accounting
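
The "log and count, then act on repeat offenders" step of this workflow can be sketched offline against exported log lines. The Python example below counts PPP-MISMATCH entries per username and returns those at or above a threshold. It assumes the log message layout used by the detection scripts in this post (tag, free text, an arrow, then the username); the threshold of 3 and the function name are illustrative.

```python
import re
from collections import Counter
from typing import Iterable, List

# Matches e.g. "PPP-MISMATCH: Very long session -> bob uptime=2d01:00:00"
_TAG = re.compile(r"PPP-MISMATCH: .*-> (?P<user>\S+)")


def repeat_offenders(log_lines: Iterable[str], threshold: int = 3) -> List[str]:
    """Return usernames that were flagged at least `threshold` times."""
    counts: Counter = Counter()
    for line in log_lines:
        match = _TAG.search(line)
        if match:
            counts[match.group("user")] += 1
    return sorted(user for user, seen in counts.items() if seen >= threshold)


if __name__ == "__main__":
    sample = [
        "10:00:01 script,warning PPP-MISMATCH: Zero traffic session detected -> alice",
        "10:05:01 script,warning PPP-MISMATCH: Zero traffic session detected -> alice",
        "10:05:01 script,warning PPP-MISMATCH: Very long session -> bob uptime=2d01:00:00",
        "10:10:01 script,warning PPP-MISMATCH: Zero traffic session detected -> alice",
        "10:10:02 pppoe,info unrelated message",
    ]
    print(repeat_offenders(sample))
```

Keeping this count outside the router preserves the separation between detection and cleanup: only users returned here would be candidates for an automated disconnect.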

9. Why ISPs Separate Detection & Cleanup

  • Detection = visibility
  • Cleanup = enforcement

Merging both without evidence leads to:

  • False disconnects
  • Business customer complaints
  • SLA violations

1) Reference Configuration — MikroTik BNG (PPPoE Access Router)

1.1 System & Baseline

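/system package update set channel=long-term
# RPS is toggled per interface entry; the entries listed depend on hardware and RouterOS version
/system resource irq rps set [find] disabled=no
# Keep at least one secured management service (for example WinBox) reachable
/ip service disable telnet,ftp,www,api,ssh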

1.2 VLAN & PPPoE (Per-Area Design)

/interface vlan
add interface=ether1 name=vlan101 vlan-id=101 comment="Area-A"
/interface pppoe-server server
add interface=vlan101 service-name=PPPOE-A \
    one-session-per-host=yes \
    keepalive-timeout=30 \
    authentication=pap,chap \
    disabled=no

1.3 PPP & RADIUS (Mandatory)

/ppp aaa
set use-radius=yes accounting=yes interim-update=5m

/radius
add address=10.10.10.5 secret=RADIUS_SECRET service=ppp timeout=3s

1.4 PPP Profile (Stateless & Safe)

/ppp profile
set default only-one=yes \
    change-tcp-mss=yes \
    local-address=10.255.255.1 \
    remote-address=pppoe-pool

1.5 MTU / MSS (Critical)

/ip firewall mangle
add chain=forward protocol=tcp tcp-flags=syn tcp-mss=1441-65535 \
    action=change-mss new-mss=1440 passthrough=yes

1.6 FastTrack (Do NOT FastTrack PPPoE)

/ip firewall filter
disable [find action=fasttrack-connection]

2) DMASoftlab – RADIUS Attribute Templates (ISP-Safe)

Use these per user / per plan.

2.1 Core Attributes (Must Send)

Attribute               Value / Example   Purpose
Framed-Pool             pppoe-pool        IP allocation
Mikrotik-Rate-Limit     10M/10M           Bandwidth
Session-Timeout         86400             Hard reset
Idle-Timeout            300–600           Kill dead CPE
Acct-Interim-Interval   60 or 300         Session health
Mikrotik-Group          HOME / BIZ        Profile mapping

2.2 Optional (Recommended)

Attribute               Purpose
Mikrotik-Address-List   Policy routing
Class                   Session tracking
Framed-IP-Address       Static IP users

Rule:
If DMASoftlab is not sending Session-Timeout + Interim, you will get ghost sessions.


3) Session Mismatch Detection Scripts (Production)

3.1 Zero-Traffic Session Detection (No Disconnect)

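:foreach i in=[/ppp active find] do={
  :local u [/ppp active get $i name]
  :local up [/ppp active get $i uptime]
  # Byte counters are read from the dynamic <pppoe-USERNAME> interface
  :local ifc [/interface find name=("<pppoe-" . $u . ">")]
  :if ([:len $ifc] > 0) do={
    :local rx [/interface get $ifc rx-byte]
    :local tx [/interface get $ifc tx-byte]
    :if (($rx = 0) && ($tx = 0) && ($up > 3m)) do={
      :log warning ("PPP-MISMATCH: zero traffic -> " . $u)
    }
  }
}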

3.2 Long-Running Session Detection

:foreach i in=[/ppp active find] do={
  :local u [/ppp active get $i name]
  :local up [/ppp active get $i uptime]
  :if ($up > 2d) do={
    :log warning ("PPP-MISMATCH: long session -> " . $u)
  }
}

3.3 Scheduler

/system scheduler
add name=ppp_mismatch interval=5m \
on-event=ppp_mismatch_script policy=read,test

Detection and cleanup must remain separate.


4) High-Density x86 Tuning (10,000+ PPPoE Sessions)

4.1 Hardware (Non-Negotiable)

  • Intel CPU (Xeon / i7)
  • Intel NICs (i350 / X520 / X710)
  • Disable Realtek NICs
  • SSD (not USB)

4.2 RouterOS Tuning

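# max-entries is read-only (derived from installed RAM), so only the timeout is set
/ip firewall connection tracking
set tcp-established-timeout=1h
# cpu accepts auto or a core number
/system resource irq
set [find] cpu=auto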

4.3 PPPoE Scaling Rules

  • ≤ 2,000 sessions per VLAN
  • ≤ 3,000 sessions per PPPoE server
  • Multiple VLANs > single huge VLAN
  • Multiple BNGs > one massive router
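
The per-VLAN limit translates directly into a minimum VLAN count for a given subscriber base. The Python sketch below does that arithmetic, assuming one PPPoE server per VLAN (so the 2,000-per-VLAN figure from this post is the binding limit) and an even spread of subscribers; the function names are illustrative.

```python
import math
from typing import Tuple


def vlans_needed(subscribers: int, max_per_vlan: int = 2000) -> int:
    """Return the minimum number of VLANs that keeps each at or under the limit."""
    if subscribers < 0 or max_per_vlan <= 0:
        raise ValueError("subscribers must be >= 0 and max_per_vlan > 0")
    return math.ceil(subscribers / max_per_vlan)


def plan(subscribers: int, max_per_vlan: int = 2000) -> Tuple[int, int]:
    """Return (vlan_count, sessions_per_vlan) for an even split."""
    count = vlans_needed(subscribers, max_per_vlan)
    per_vlan = math.ceil(subscribers / count) if count else 0
    return count, per_vlan


if __name__ == "__main__":
    for total in (1500, 10_000, 12_500):
        count, per_vlan = plan(total)
        print(f"{total} subscribers -> {count} VLANs, about {per_vlan} sessions each")
```

For 10,000 subscribers this yields five VLANs of 2,000 sessions, which is the floor; leaving headroom for growth means planning for more.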

4.4 Avoid at Scale

  • Simple queues per user
  • FastTrack on subscribers
  • Bridges with thousands of MACs

5) Migration Checklist — Single Router → Multi-BNG

Phase 1: Preparation

  • Centralize RADIUS (DMASoftlab)
  • Standardize PPP profiles
  • Normalize MTU/MSS
  • Enable interim accounting

Phase 2: Deploy New BNG

  • New MikroTik with:
    • Same RADIUS
    • Different VLAN ranges
    • Identical PPP profiles
  • No user changes required

Phase 3: Gradual Migration

  • Move OLT/Access VLANs:
    • VLAN 101 → BNG-1
    • VLAN 102 → BNG-2
  • Let sessions reconnect naturally

Phase 4: Validation

  • Compare:
    • PPP active vs RADIUS online
    • Reconnect time (<30 sec)
    • CPU per core

Phase 5: Decommission Old Router

  • Zero sessions
  • Remove VLANs
  • No forced logout needed

Golden rule:
PPPoE must be stateless — never try to “fail over” sessions.


Final Operational Summary (Reality Check)

  • PPPoE issues are state & accounting problems, not hardware
  • RADIUS must control session lifecycle
  • MikroTik must aggressively clean dead sessions
  • Scale horizontally, not vertically
  • If profile change “fixes” users → session state is broken

2) 10k+ Session Stress-Tested x86 BNG Specifications

These specs are based on field-proven ISP deployments, not vendor marketing.

Recommended x86 Hardware (10,000–15,000 PPPoE sessions)

CPU (Most Important)

  • Intel Xeon E-2236 / E-2276G
    or
  • Intel Xeon Silver 4210 / 4214
  • Minimum 6–8 physical cores
  • High clock speed (>3.4 GHz) preferred over many slow cores

Avoid: AMD (RouterOS still favors Intel NIC + CPU combo)


RAM

  • 32 GB DDR4 (minimum)
  • PPPoE itself is light, but:
    • Conntrack
    • Queues
    • Firewall
    • Logging
      all consume RAM

Storage

  • SSD (SATA or NVMe)
  • 120–240 GB is sufficient
  • No USB / SD cards (causes random crashes)

NICs (Non-Negotiable)

  • Intel i350 / i210 (1G)
  • Intel X520 / X710 (10G)

Hard rule:
❌ Realtek NICs = unstable at scale


Throughput Expectation (Realistic)

  • 10k PPPoE users
  • 5–10 Gbps aggregate traffic
  • CPU load stays <60% if FastTrack is NOT misused

RouterOS Tuning for High Density

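# max-entries is read-only (derived from installed RAM), so only the timeout is set
/ip firewall connection tracking
set tcp-established-timeout=1h
# cpu accepts auto or a core number
/system resource irq
set [find] cpu=auto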

Scaling Rules

  • ≤ 2,000 sessions per VLAN
  • ≤ 3,000 sessions per PPPoE server
  • Multiple VLANs > single flat VLAN
  • Multiple BNGs > one huge box

3) CoA / Disconnect-Request Automation (Production-Safe)

This is how real ISPs fix stuck users instantly without rebooting routers.


3.1 Enable RADIUS CoA on MikroTik

/radius incoming
set accept=yes port=3799

Ensure the DMASoftlab server IP is allowed to reach UDP port 3799 on the router (firewall input chain).


3.2 Manual Disconnect (Test)

From DMASoftlab or radius client:

Attributes required:

  • User-Name
  • Acct-Session-Id (preferred)
  • NAS-IP-Address

Result:

  • MikroTik immediately removes PPP session
  • User reconnects within seconds
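
For readers who want to see what such a request looks like on the wire, the Python sketch below builds a Disconnect-Request packet with the three attributes listed above, following the packet layout and Request Authenticator calculation of RFC 5176 and RFC 2866 (MD5 over the header, sixteen zero octets, the attributes and the shared secret). It only constructs the bytes and does not send anything. The secret, session ID and NAS address are placeholder values, and the function names are illustrative.

```python
import hashlib
import socket
import struct

DISCONNECT_REQUEST = 40  # RADIUS packet code for Disconnect-Request
USER_NAME = 1
NAS_IP_ADDRESS = 4
ACCT_SESSION_ID = 44


def _attr(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute as type, length (including the 2-byte header), value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def build_disconnect_request(
    secret: bytes, identifier: int, user: str, session_id: str, nas_ip: str
) -> bytes:
    """Return the raw bytes of a Disconnect-Request."""
    attrs = (
        _attr(USER_NAME, user.encode())
        + _attr(ACCT_SESSION_ID, session_id.encode())
        + _attr(NAS_IP_ADDRESS, socket.inet_aton(nas_ip))
    )
    length = 20 + len(attrs)  # 4-byte header + 16-byte authenticator + attributes
    header = struct.pack("!BBH", DISCONNECT_REQUEST, identifier, length)
    authenticator = hashlib.md5(header + b"\x00" * 16 + attrs + secret).digest()
    return header + authenticator + attrs


if __name__ == "__main__":
    # Placeholder values for illustration only
    packet = build_disconnect_request(
        b"RADIUS_SECRET", 1, "testuser", "81a00012", "10.10.10.1"
    )
    print(len(packet), "bytes, code", packet[0])
```

In day-to-day operations the billing system or FreeRADIUS's radclient utility would normally send this request; the sketch is only meant to show which fields the router validates.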

3.3 MikroTik Script — Local Forced Disconnect by Username

Use when RADIUS session ID is unknown.

:local user "testuser"
:foreach i in=[/ppp active find where name=$user] do={
    :log warning ("COA-LOCAL: disconnecting " . $user)
    /ppp active remove $i
}

3.4 Auto-Trigger CoA on Detected Mismatch (Advanced)

Logic:
Detection script → call Disconnect-Request → user reconnects cleanly.

Pseudo-flow:

  1. Detect zero-traffic / long-uptime session
  2. Log username + Acct-Session-ID
  3. Trigger DMASoftlab CoA
  4. MikroTik clears session

This avoids:

  • Forced reboots
  • Profile toggling
  • Manual NOC intervention

3.5 When to Use CoA vs Local Remove

Scenario                 Best Method
Single stuck user        Local /ppp active remove
Accounting mismatch      CoA
Billing enforcement      CoA
Mass cleanup             Local script
Audit-safe disconnect    CoA

Below is a single, production-grade MikroTik RouterOS script that detects session mismatch → decides action → performs cleanup, with CoA-first logic and local fallback.

This is how ISPs automate PPPoE stability without reboots or manual profile changes.


Unified PPPoE Session Health Script

(Detection + CoA Logic + Local Cleanup)

What this ONE script does

For each active PPP session:

  1. Detects a bad session
    • Zero traffic for defined time, OR
    • Abnormally long uptime
  2. Attempts clean enforcement
    • Logs session as suspect
    • (Optional) prepares for RADIUS CoA
  3. Failsafe cleanup
    • Removes PPP session locally if needed
  4. User reconnects cleanly within seconds

CONFIGURABLE THRESHOLDS (Adjust Once)

# no-traffic threshold
:local idleTime 5m
# maximum allowed session age
:local maxUptime 2d

MASTER SCRIPT (SAFE FOR PRODUCTION)

:local idleTime 5m
:local maxUptime 2d
:foreach i in=[/ppp active find] do={
    :local user [/ppp active get $i name]
    :local up   [/ppp active get $i uptime]
    :local sid  [/ppp active get $i session-id]
    # Byte counters live on the dynamic <pppoe-USERNAME> interface, not on /ppp active.
    # Default to 1 so a missing interface is never treated as zero traffic.
    :local rx 1
    :local tx 1
    :local ifc [/interface find name=("<pppoe-" . $user . ">")]
    :if ([:len $ifc] > 0) do={
        :set rx [/interface get $ifc rx-byte]
        :set tx [/interface get $ifc tx-byte]
    }
    :local suspect false
    # Condition 1: Zero traffic ghost session (counters are cumulative)
    :if (($rx = 0) && ($tx = 0) && ($up > $idleTime)) do={
        :set suspect true
        :log warning ("PPP-MISMATCH: zero traffic -> " . $user)
    }
    # Condition 2: Very long-running session
    :if ($up > $maxUptime) do={
        :set suspect true
        :log warning ("PPP-MISMATCH: long session -> " . $user . " uptime=" . $up)
    }
    :if ($suspect = true) do={
        # === COA PLACEHOLDER ===
        # If DMASoftlab CoA is integrated, trigger it here
        # Example logic:
        #  - Send Disconnect-Request using User-Name + Acct-Session-Id
        #
        # RouterOS does not natively send CoA outbound,
        # so CoA is typically triggered from RADIUS (DMASoftlab).
        #
        # This script prepares clean local enforcement.
        :log warning ("PPP-ACTION: enforcing cleanup for " . $user . " session-id=" . $sid)
        # Failsafe local cleanup
        /ppp active remove $i
    }
}

HOW CoA FITS INTO THIS (IMPORTANT)

Reality of RouterOS

  • MikroTik receives CoA
  • MikroTik does not natively send CoA

Correct ISP Design

  1. Detection happens on MikroTik (this script)
  2. Decision happens on RADIUS (DMASoftlab)
    OR
  3. Local cleanup happens immediately (script)

Best Practice Hybrid

  • Use local cleanup for:
    • Ghost sessions
    • ONU crashes
  • Use CoA from DMASoftlab for:
    • Billing enforcement
    • Speed changes
    • Policy actions
    • Audit-safe disconnects

In practice, local cleanup alone resolves most ghost-session cases.


Scheduler (MANDATORY)

Run every 5 minutes:

/system scheduler
add name=ppp_health interval=5m \
on-event=ppp_health_script policy=read,write,test

Logging (NOC Friendly)

Filter:

/log print where message~"PPP-MISMATCH|PPP-ACTION"

This gives:

  • Proof for audits
  • Evidence for billing disputes
  • Trend analysis

SAFETY GUARANTEES

✔ Does NOT touch sessions that are passing traffic and are younger than maxUptime
✔ Does NOT flap healthy sessions below the uptime limit
✔ Safe for x86 & CCR
✔ Works with DMASoftlab
✔ Eliminates reboot dependency


FINAL ISP OPERATION RULE

If a PPP session is alive but not useful, it must die.

This script enforces that rule automatically.