Syed Jahanzaib – Personal Blog to Share Knowledge !

February 19, 2018

High CPU load when PPPoE sessions disconnect in Mikrotik

Filed under: Uncategorized — Syed Jahanzaib / Pinochio~:) @ 4:46 PM


Disclaimer:

Every network is different, so one solution cannot be applied to all. Try to understand the logic and create your own solution for your network scenario. Don’t just copy and paste.

If anybody here thinks I am an expert on this stuff: I am NOT certified in anything Mikrotik/Cisco/Linux or Windows. However, I have worked with some core networks, and I read, research, and try stuff all the time. So I am not posting about things I am formally trained in; I mostly go with experience and what I have learned on my own. And if I don’t know something, I read and learn all about it.

So please don’t expect my postings to always be 100 percent correct. I make mistakes just like everybody else. However, I do my best, learn from my mistakes, and always try to help others.


Scenario-1:

We are using a Mikrotik CCR as PPPoE NAS with a public IP routing setup, so each user is assigned a public IP via the PPPoE profile.

Scenario-2:

We are using a single Mikrotik CCR as PPPoE NAS. We provide a local DSL service, so NAT is also done on the same router.


Problem:

When we have a network outage, such as a power failure in a particular area, the log shows many PPPoE sessions disconnecting with ‘peer not responding‘ messages. At exactly these moments the NAS CPU usage reaches almost 100%, and the router stops passing any kind of traffic. This can continue for a minute or so.

As shown in the image below …

[Image: pppoe high cpu usage]

If you are using Masquerade/NAT on the router, that is the problem. When using Masquerade, RouterOS has to do a full connection tracking recalculation on EACH interface connect/disconnect.

So if you have lots of PPP sessions connecting/disconnecting, connection tracking will constantly be recalculated. It is this combination of flapping interfaces and NAT that causes the high CPU usage.


Solution or Possible Workarounds:

First read this

Separating NATTING from ROUTING in Mikrotik

https://aacable.wordpress.com/2018/03/27/separating-natting-from-routing-in-mikrotik/

  • If you have private IP users with NAT, stop using Masquerade on a router that has a lot of dynamic interfaces. Simply DO NOT use NAT on any router with a high number of connecting/disconnecting interfaces. Place an additional router next to your PPPoE NAS and do the NAT there.
    Example: Add another router and perform all NAT on it by sending marked traffic from the private IP range to that NAT router. Set up routing between the PPPoE NAS and the NAT router.
  • If all of your clients are on public IPs, you can simply turn off connection tracking completely. This is the simplest approach. But beware that turning off CT will disable all NAT and connection marking as well.
    Note: You can also exempt only your specific public pool from connection tracking.

  • Any core device or gateway on your network should be assigned one job only. Try not to mix multiple functions in one device. This will save you troubleshooting headaches later.
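
The first workaround, moving NAT off the PPPoE NAS, could be sketched roughly like this on RouterOS v6. Everything specific here is an assumption for illustration and not taken from the setup described above: the private range 10.0.0.0/8, the link addresses 172.16.0.1 (NAS) and 172.16.0.2 (NAT router), the interface name ether1-wan, the public address 203.0.113.10, and the names private_users and to_nat_router.

```routeros
# --- On the PPPoE NAS (no NAT here) ---
# Hypothetical private pool handed out to PPPoE users
/ip firewall address-list
add address=10.0.0.0/8 list=private_users
# Mark traffic from private users that is not going to other private users
/ip firewall mangle
add chain=prerouting src-address-list=private_users dst-address-list=!private_users action=mark-routing new-routing-mark=to_nat_router passthrough=no
# Send the marked traffic to the NAT router (assumed link address)
/ip route
add dst-address=0.0.0.0/0 gateway=172.16.0.2 routing-mark=to_nat_router

# --- On the separate NAT router ---
# Return route for the private pool back to the PPPoE NAS
/ip route
add dst-address=10.0.0.0/8 gateway=172.16.0.1
# NAT on the static WAN interface, which does not connect/disconnect
/ip firewall nat
add chain=srcnat src-address=10.0.0.0/8 out-interface=ether1-wan action=src-nat to-addresses=203.0.113.10
```

Depending on your network, you may also need to exclude other local or connected destinations from the mangle rule, so treat this only as a starting point.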

Please read this …

Features affected by connection tracking

  • NAT
  • firewall:
    • connection-bytes
    • connection-mark
    • connection-type
    • connection-state
    • connection-limit
    • connection-rate
    • layer7-protocol
    • p2p
    • new-connection-mark
    • tarpit
  • p2p matching in simple queues

So if you turn connection tracking OFF, the features above will stop working.
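
For reference, turning connection tracking off completely might look like the sketch below. It assumes a router that only routes public IPs and does not depend on any of the features listed above; check that before applying it.

```routeros
# Show the current connection tracking setting first
/ip firewall connection tracking print
# Turn connection tracking off completely (NAT and the matchers listed above stop working)
/ip firewall connection tracking set enabled=no
# To go back later, uncomment this; "auto" lets RouterOS enable tracking only when firewall rules exist
#/ip firewall connection tracking set enabled=auto
```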


Code Snippet:

A working example of excluding your public pool from connection tracking:

  • First make sure Connection Tracking is set to AUTO
/ip firewall connection tracking set enabled=auto
  • Then create an address list containing your users’ IP pool, so that the list can be reused as an object in multiple rules later.
/ip firewall address-list
add address=1.1.1.0/24 list=public_pool
#add address=2.1.1.0/24 list=public_pool
  • Now create rules in the RAW table to turn off connection tracking for our public IP users
/ip firewall raw
add action=notrack chain=prerouting src-address-list=public_pool
add action=notrack chain=prerouting dst-address-list=public_pool

That’s it!
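
One thing that may be worth checking after adding notrack rules: packets that bypass connection tracking no longer match connection-state=established or related, so a firewall filter that only accepts those states can end up dropping the public pool’s traffic. RouterOS versions that have the raw table also offer an untracked state for this. A minimal sketch, assuming your filter relies on the usual established/related accept rules (your actual rule set may differ):

```routeros
# Accept untracked packets alongside established/related ones.
# These must sit above any drop rules; move them or use place-before as needed.
/ip firewall filter
add chain=forward action=accept connection-state=established,related,untracked
add chain=input action=accept connection-state=established,related,untracked
# Check that the notrack rules are being hit (their counters should keep growing)
/ip firewall raw print stats
```

If you already have established/related accept rules, editing them to include untracked is probably cleaner than adding new ones.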



Some Tips for General Router Management

  • Turn off all non-essential services that are not actually being used. Services place additional CPU load on any system. For example, you can move the DHCP role to Cisco switches for better response; this is also highly recommended for inter-VLAN routing. If your ROS is also acting as DNS, move the DNS role to a dedicated DNS server such as BIND. This will free up some resources on the core system.
  • Use 10-gig network cards instead of 1-gig, or 1-gig network cards instead of 100-meg
  • Disable STP if it is not needed. This is a highly debatable point, I know 🙂
  • Use dynamic queues; they are spread over multiple cores
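
As a rough illustration of the first tip, unused services could be switched off like this. Which services are actually unused is an assumption here; keep whatever you use to manage the router (for example winbox or ssh).

```routeros
# List services and see which ones are enabled
/ip service print
# Disable services assumed to be unused on this router
/ip service disable telnet,ftp,www,api,api-ssl
# If DNS has moved to a dedicated server, stop answering client DNS queries on the router
/ip dns set allow-remote-requests=no
```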

Regards,
Syed Jahanzaib ~

14 Comments »

  1. AOA .
    Brother, when I create rules in raw with notrack, the service goes down. Any solution?

    Comment by udasschand — February 20, 2018 @ 4:44 AM

  2. Dear Jahanzaib bhai…

    I prefer to use the ROS Bugfix version, because I have seen this type of issue with the Current ROS version…

    Comment by kashifzai86 — February 20, 2018 @ 10:04 AM

  3. Routing & Natting with Failover ! Brothers in Arms

    use version – 6.36.2

    Comment by KAMAL SK — April 16, 2018 @ 7:27 PM

    • I wouldn’t recommend going backward. Also, the router will not downgrade itself below the factory-shipped version.
      What will you do if your router ships with 6.4x as the factory version? Forcing a downgrade via other methods is not recommended.

      Comment by Syed Jahanzaib / Pinochio~:) — April 25, 2018 @ 8:11 AM

  4. Dear Jahanzaib,

    We are facing a problem on a MikroTik (CCR1036-12G-4S) where CPU utilization stays at 100%. We have 68 simple queues and have configured link bonding to increase throughput. But whenever traffic reaches 1.5 Gbps, CPU utilization goes up to 100%. I am really surprised by this because we only have 1.5 Gbps of traffic, and it will grow to 2.5 Gbps soon. We need your suggestion on whether this is because of link bonding or the high number of queues. According to the datasheet, the CCR1036-12G-4S has high throughput.

    Regards,

    Comment by xpertarm — April 7, 2019 @ 5:40 PM

    • There are several points you need to look into:
      1) If you are using NAT, move NAT to another router and disable connection tracking on this one. Problem solved.
      2) Otherwise, try acquiring a CCR1072 (if budget allows) or an x86 box with a 10G card and get rid of bonding; that will solve the issue as well. CPU clock speed doesn’t matter much; it is the cache and newer-generation CPUs that matter.
      3) Otherwise, check your configuration; there must be something incorrectly configured.

      In short: 10G is a better choice.

      Comment by Syed Jahanzaib / Pinochio~:) — April 8, 2019 @ 9:41 AM

      • Thanks a lot for your suggestion. We are not doing NAT, and we will disable connection tracking. We have only configured queues and link bonding, with bonding on 6 interfaces.

        Comment by xpertarm — April 8, 2019 @ 1:32 PM

    • Why is link bonding needed? I have 6 PTCL GPON links of 250 Mbps each. Did I use bonding for these? I need the reason for using bonding. I know this is not an answer to your query.

      Comment by kashifzai86 — April 9, 2019 @ 12:26 PM

      • What if you only have 1G per port and still need to pass more than 1G of traffic? If 10G is not available, then ultimately you have to take the bonding route to get more bandwidth, although I have seen Mikrotik have some CPU issues with bonding.

        Comment by Syed Jahanzaib / Pinochio~:) — April 10, 2019 @ 9:47 AM

  5. Mr. Zaib

    I need to ask: on which CCR should I configure the PPPoE WAN links? The main CCR or the NAT CCR? I have 1 fiber media link and 4 PPPoE GPON links.

    Comment by kashifzai86 — April 10, 2019 @ 9:26 AM

    • You should configure PPPoE on the CCR with the higher specs (preferably) and NAT on the other CCR.

      Comment by Syed Jahanzaib / Pinochio~:) — April 10, 2019 @ 9:45 AM

      • Jahanzaib bhai,
        I have both with the same specs (i.e. CCR1036)… What I have understood is that the 4 WAN links with Per Connection Classifier (both addresses: 0 to 3) will be configured on whichever CCR has connection tracking ON, because if I turn connection tracking off, PCC load balancing will not work…

        In that case the 4 PPPoE WAN links will also be masqueraded on that same CCR? Then the issue stays right where it was…

        Is that how it will be?

        Comment by kashifzai86 — April 10, 2019 @ 9:59 AM

      • – You should turn on connection tracking on the NAT router where load balancing is configured.
        – On the main CCR where the PPPoE users are connected, you can turn connection tracking off. The CPU spike occurs when dynamic interfaces connect/disconnect while CT is enabled; this will not happen on the PPPoE server, as CT will be disabled there.

        Comment by Syed Jahanzaib / Pinochio~:) — April 12, 2019 @ 10:08 AM

  6. Zaib Asalam Aliqum !!!

    After turning connection tracking off, connection state no longer works, so should I remove connection state from the “Input” and “Forward” accept rules?
    Will it work then?

    Kashif Khan

    Comment by kashif khan — February 17, 2020 @ 8:21 PM

