Syed Jahanzaib Personal Blog to Share Knowledge !

January 21, 2021

Possibilities: Mikrotik PPP Disconnection/Yellow Sign Problems

Filed under: Mikrotik Related | Syed Jahanzaib / Pinochio~:) @ 9:58 AM


Disclaimer! This is important!

Every network is different, so no single solution fits all. Try to understand the logic and create/modify the solution to suit your own network scenario. Do not copy and paste blindly.

My humble request is that you kindly do not consider me an expert on this stuff. I am NOT certified in anything Mikrotik/Cisco/Linux or Windows. However, I have worked with some core networks, and I read, research and try stuff all the time. So I am not speaking/posting about stuff I was formally trained in; I pretty much go with experience and what I have learned on my own. And if I don't know something, I read and learn all about it.

So please don't hold me/my postings to always be 100 percent correct. I make mistakes just like everybody else. However, I do my best, learn from my mistakes and try to share tips that worked for me.

Tips posted here are based on personal experiences that I faced/sorted at various networks, locally and internationally. Kindly contribute your valuable experience and tips to help others.
Sharing is Caring …

Regards
Syed Jahanzaib~


PPP Common Problems

For some time, we have been receiving the following complaints from a few ISPs:

  • User PPPoE dial-up gets stuck and cannot reach the Mikrotik PPPoE server
  • User PPPoE connection is frequently/intermittently disconnected or terminated
  • User PPPoE dialer is connected, but the user device/workstation shows a yellow mark and there is no internet

Work through the following tips one at a time to diagnose the issue:

  1. Mikrotik RouterOS firmware plays a very important role in the stability of various components. Use a STABLE or LONG-TERM release. Sometimes upgrading or downgrading fixes issues without any configuration change. Check the Mikrotik forums to see whether other users report similar issues on a particular version.
  2. Cheap WiFi routers at the client end, for example TP-Link/Tenda, are a headache to manage. Many older models have security and stability bugs. Avoid router models known to be buggy, and always upgrade the firmware to the latest version; this fixes many issues. On WiFi routers, try setting DNS statically: primary to your local DNS server and secondary to Google.
  3. Pay attention to the Mikrotik CPU. If you have a high number of users on a single Tik, or if you have CONNTRACK/NAT enabled, then a mass disconnection of PPPoE users can cause CPU spikes, freezing the Tik for a minute or disconnecting other users because the CPU does not respond in time, which can turn into a disconnect loop. Use a separate router for NAT: do not use masquerade/NAT on any router that has a high number of connecting/disconnecting dynamic interfaces such as PPPoE/VPN. Instead, place an additional router next to your PPPoE NAS and route NAT traffic there, and disable CONNECTION TRACKING on the PPPoE NAS router. As a rule of thumb, to divide load (and as a failover), if you are using a CCR1036, add another CCR1036 for every 1200-1500 users.
  4. Adding your local DNS server and assigning it to the user profile as the primary DNS fixed the yellow sign problem on some users' WiFi routers.
  5. PPP is sensitive to high latency and network timeouts. Make sure you do not have layer 2 broadcast problems or delays.
  6. If you use Cisco switches with VLANs, set STP/RSTP to none (disabled) on the switch TRUNK [*** This fixed PPP disconnections at a few networks]
  7. If you have Cisco switches with VLANs, do not allow all VLANs on TRUNK ports; allow only the limited/designated VLANs on the TRUNK port [*** This fixed stuck dial-up / yellow sign issues at a few networks]
  8. Change the MTU [sometimes this fixes issues with websites and some apps, for example WhatsApp, Telegram, etc.]
  9. Try disabling encryption/compression in the PPPoE profile.
  10. Choose only PAP authentication on the PPPoE server [this fixes some issues with older FreeRADIUS setups]
  11. Disable RSTP on all ports/VLANs [test with caution, temporarily, just to confirm whether the issue is related]
  12. Disable loop protection in the Mikrotik port settings [test with caution, temporarily only]
  13. Do not block ICMP. Some user-end routers check ICMP reachability to detect internet access. It is worse when operators think ICMP is dangerous and has to be blocked. Make sure you are not blocking all ICMP traffic; fine-tune your rules to allow at least certain types of ICMP packets. However, if someone further upstream blocks it, you will still have problems.
  14. Do not block the NTP protocol [it is used by many devices, such as Android devices, Android TVs, gaming devices, etc.]
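For point 8, the arithmetic behind the usual PPPoE MTU is worth seeing. PPPoE adds a 6-byte PPPoE header plus a 2-byte PPP protocol field, so on a standard 1500-byte Ethernet link the largest IP packet that fits is 1492 bytes, and the matching IPv4 TCP MSS is 1452 bytes. The small Python sketch below only does that subtraction; it assumes a plain 1500-byte link and IPv4/TCP headers without options, and it is not a Mikrotik command.

```python
# PPPoE overhead on top of a standard Ethernet payload (assumed 1500 bytes).
ETHERNET_MTU = 1500
PPPOE_HEADER = 6      # PPPoE session header (RFC 2516)
PPP_PROTOCOL_ID = 2   # PPP protocol field carried inside PPPoE
IPV4_HEADER = 20      # IPv4 header without options
TCP_HEADER = 20       # TCP header without options


def pppoe_mtu(link_mtu: int = ETHERNET_MTU) -> int:
    """Largest IP packet that fits inside one PPPoE frame on this link."""
    return link_mtu - PPPOE_HEADER - PPP_PROTOCOL_ID


def tcp_mss(ip_mtu: int) -> int:
    """TCP MSS for IPv4 packets of size ip_mtu (no IP or TCP options)."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    mtu = pppoe_mtu()
    print(f"PPPoE MTU: {mtu}")          # PPPoE MTU: 1492
    print(f"TCP MSS:   {tcp_mss(mtu)}")  # TCP MSS:   1452
```

Hosts that still send 1500-byte packets depend on ICMP "fragmentation needed" messages (type 3, code 4) for Path MTU Discovery, which is one more reason not to block all ICMP (point 13). Clamping the TCP MSS to the value above is the common workaround.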

Annexure example for points 6 & 7: [test with caution, preferably in a lab first]

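! Point 6: disable spanning tree for VLANs 1-1014
no spanning-tree vlan 1-1014
!
interface GigabitEthernet2/0/1
 description Trunk-LAN-2-Mikrotik
 switchport trunk encapsulation dot1q
 ! Point 7: allow only the designated VLANs on this trunk
 switchport trunk allowed vlan 2-16,99
 switchport mode trunk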
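For point 7, it helps to check exactly which VLAN IDs a trunk's allowed list expands to. The Python sketch below expands a Cisco-style list such as the "2-16,99" used above and reports any VLANs beyond a designated set. The helper names and the designated VLAN plan are hypothetical, and the parser only handles plain comma/range syntax, not keywords such as "all", "add" or "remove".

```python
from __future__ import annotations

from typing import Iterable


def expand_vlan_list(spec: str) -> set[int]:
    """Expand a Cisco-style VLAN list such as "2-16,99" into a set of VLAN IDs."""
    vlans: set[int] = set()
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"descending range: {part}")
            vlans.update(range(start, end + 1))
        else:
            vlans.add(int(part))
    bad = [v for v in vlans if not 1 <= v <= 4094]
    if bad:
        raise ValueError(f"VLAN IDs out of range 1-4094: {sorted(bad)}")
    return vlans


def extra_vlans(allowed_spec: str, designated: Iterable[int]) -> set[int]:
    """VLANs allowed on the trunk that are NOT in the designated set (hypothetical helper)."""
    return expand_vlan_list(allowed_spec) - set(designated)


if __name__ == "__main__":
    designated = set(range(2, 17)) | {99}               # hypothetical VLAN plan
    print(sorted(extra_vlans("2-16,99", designated)))   # []
    print(len(extra_vlans("1-4094", designated)))       # 4078
```

An empty result means the trunk carries only the VLANs you intended; a trunk left at the default "allow all" shows thousands of extra VLANs.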

Personal Opinion!

Well, TBH, Mikrotik is a cheap/affordable solution, and overall Mikrotik is excellent for core routing too, BUT it is not made for large-scale PPP with NAT. Mikrotik is not an enterprise-grade solution as a PPPoE concentrator; its architecture has limitations. As a rule of thumb, we suggest that after crossing 1200-1400 PPP users (and a maximum of about 2 Gbps of traffic), you add another Mikrotik (CCR1036 or similar), and so on. I know a few local ISPs who used Mikrotik routers to start their journey in the SP business but later moved to more mature products like Cisco/Juniper/vBNG. One ISP in particular uses 10-12 Mikrotiks to handle a 15k user load (in routing mode only, no NAT). With NAT the situation gets worse: when PPP users disconnect in large numbers, the CPU spikes or freezes, creating nightmares for admins.

If you have thousands of users, then you are in serious business: go with *Huawei/Juniper/Cisco* (which are much more mature but comparatively costly products), or as an alternative, look at *vBNG*, which offers pay-as-you-go modules.
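The rule of thumb above can be turned into a rough sizing estimate. The Python sketch below takes the larger of two counts: routers needed by user count and routers needed by peak traffic. The defaults (1200 users and 2 Gbps per CCR1036) are simply the figures quoted in this post, not vendor specifications, and the example traffic value is hypothetical; treat the result as a starting point for your own testing.

```python
import math


def nas_routers_needed(
    users: int,
    peak_gbps: float,
    users_per_router: int = 1200,
    gbps_per_router: float = 2.0,
) -> int:
    """Rough count of PPPoE NAS routers using this post's rule-of-thumb limits."""
    if users_per_router <= 0 or gbps_per_router <= 0:
        raise ValueError("per-router limits must be positive")
    by_users = math.ceil(users / users_per_router)
    by_traffic = math.ceil(peak_gbps / gbps_per_router)
    # Whichever limit needs more routers decides; always at least one router.
    return max(1, by_users, by_traffic)


if __name__ == "__main__":
    print(nas_routers_needed(15000, 10))                          # 13
    print(nas_routers_needed(15000, 10, users_per_router=1500))   # 10
```

For example, 15,000 users at a hypothetical 10 Gbps peak gives 13 routers with the 1200-user figure and 10 with the 1500-user figure, roughly in line with the 10-12 Mikrotiks mentioned above.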

Syed Jahanzaib

January 19, 2021

January 11, 2021

Cisco 10G Switch & Lenovo SFP Module Compatibility issue

Filed under: Cisco Related | Syed Jahanzaib / Pinochio~:) @ 11:46 AM

Recently we acquired a Cisco 10G SFP+ switch to add to our existing stack. We tried to connect a Lenovo ThinkSystem SR650 server (P.No: 7X06CTO1WW), using the Lenovo-provided SFP+ modules (P.No 46C3447), to the 10G Cisco switch (WS-C3850-24XS-S) via MM fiber cable. Upon inserting the SFP+ modules at both ends (server and switch), the switch port was shut down in err-disabled state with the following errors in the switch logs:

010834: Jan 4 09:43:44: %GBIC_SECURITY_CRYPT-4-VN_DATA_CRC_ERROR: GBIC in port Te1/0/7 has bad crc
010836: Jan 4 09:43:44: %PM-4-ERR_DISABLE: gbic-invalid error detected on Te1/0/7, putting Te1/0/7 in err-disable state

On VMware ESXi, the NIC showed *DISCONNECTED*.
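When several ports are affected, it can help to pull the err-disabled interfaces and their causes out of saved switch logs. This Python sketch matches the %PM-4-ERR_DISABLE line format shown above; the regular expression is built only from that sample and may need adjusting for other IOS releases. On the switch itself, show interfaces status err-disabled gives a similar list.

```python
import re

# Pattern built from the sample %PM-4-ERR_DISABLE message in this post.
ERR_DISABLE_RE = re.compile(
    r"%PM-4-ERR_DISABLE: (?P<cause>\S+) error detected on (?P<iface>[^\s,]+)"
)

SAMPLE_LOG = [
    "010834: Jan 4 09:43:44: %GBIC_SECURITY_CRYPT-4-VN_DATA_CRC_ERROR: GBIC in port Te1/0/7 has bad crc",
    "010836: Jan 4 09:43:44: %PM-4-ERR_DISABLE: gbic-invalid error detected on Te1/0/7, putting Te1/0/7 in err-disable state",
]


def err_disabled_ports(log_lines):
    """Map each err-disabled interface to the cause reported in the log."""
    ports = {}
    for line in log_lines:
        match = ERR_DISABLE_RE.search(line)
        if match:
            ports[match.group("iface")] = match.group("cause")
    return ports


if __name__ == "__main__":
    for iface, cause in err_disabled_ports(SAMPLE_LOG).items():
        print(f"{iface}: {cause}")   # Te1/0/7: gbic-invalid
```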

Technical details:

SERVER END:

  • ThinkSystem SR650 (P.No: 7X06CTO1WW)
  • 10G NIC: Emulex VFA5.2 2×10 GbE SFP+ PCIe Adapter (P.No: AT7S)
  • 10G SFP+ Module: Lenovo SFP 10GBASE-SR Fiber Optic Transceiver Module (P.No: 46C3447)

SWITCH END:

  • SWITCH MODEL : Cisco 10g SFP+ switch (P.No: WS-C3850-24XS-S )
  • Cisco Switch 10GBASE Fiber Optic SFP 10G Transceiver Module: Cisco SFP-10G-SR (Part No: 10-2415-03)
  • Vivanco Optical Fiber Patch Cord: LC-LC MM DUPLEX OM3 10M

Solution:

After searching around, I found that we had to disable the SFP compatibility check on the switch using the commands below.

Add these two commands to the switch in global configuration mode:

Switch(config)# service unsupported-transceiver
(a warning message is displayed at this point)
Switch(config)# no errdisable detect cause gbic-invalid

Afterwards, shut/no shut the switch interface, then plug the Lenovo cable back in, and the connectivity came up OK. (Make sure to WRITE the config on the switch, e.g. with write memory, so that it stays permanent.)

Note: Any time non-Cisco optics are going to be plugged into a Cisco switch, it's worth adding these commands.


Regards
Syed Jahanzaib

VMware vCenter Inaccessible Datastore

Filed under: VMware Related | Syed Jahanzaib / Pinochio~:) @ 11:03 AM

Recently one of our vCenter 6.7 instances crashed and its services were not accessible. I spent hours but couldn't restore it. To save further troubleshooting time, we removed the VC from the ESXi servers. A few ESXi servers were managed by this vCenter, so I logged in to each ESXi server and, under Actions, selected “Disconnect it from Vcenter”. Afterwards, when the new vCenter (VCSA-7) was installed, all ESXi hosts were added successfully, but one ESXi server was showing some errors. I removed it from vCenter, and when I tried to add it to VC again, the following error appeared:

Datastore ‘M5-11.10–8TB-raid10’ conflicts with an existing datastore in the datacenter that has the same URL (ds:///vmfs/volumes/5d810e33-e56c55bf-71be-0894ef440178/), but is backed by different physical storage.

In vCenter, the datastore was shown as inaccessible.

When I right-clicked on this datastore, the DELETE/MOUNT/UNMOUNT options were greyed out as well. So how could I remove this inaccessible datastore?

From vCenter, I browsed the inaccessible datastore and clicked the VMs tab. It showed one VM that had been moved to another ESXi host in the past. I edited that old VM on its ESXi server and removed the mounted ISO (which was pointing to the affected ESXi server's datastore). Afterwards, the inaccessible datastore disappeared automagically, and the ESXi host was re-added to VC smoothly.
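The fix came down to finding a VM whose CD/DVD drive still had an ISO mounted from the stale datastore. In vSphere, file paths are written as "[datastore] path", so such a reference can be spotted from text alone. The Python sketch below scans a hypothetical inventory export (VM name mapped to mounted ISO paths, which you might gather with PowerCLI or pyVmomi) for ISOs on a given datastore. The inventory format, datastore names and helper names are all assumptions, and the sketch does not talk to vCenter.

```python
import re

# vSphere datastore paths look like "[datastore name] folder/file.iso".
DS_PATH_RE = re.compile(r"^\[(?P<datastore>[^\]]+)\]\s*(?P<path>.*)$")


def datastore_of(ds_path):
    """Return the datastore name from a "[datastore] path" string, or None."""
    match = DS_PATH_RE.match(ds_path)
    return match.group("datastore") if match else None


def vms_referencing(inventory, datastore):
    """Names of VMs whose mounted ISO paths point at the given datastore."""
    return sorted(
        vm for vm, iso_paths in inventory.items()
        if any(datastore_of(p) == datastore for p in iso_paths)
    )


# Hypothetical export: VM name -> ISO paths attached to its CD/DVD drives.
SAMPLE_INVENTORY = {
    "old-vm": ["[M5-raid10] ISO/ubuntu-18.04.iso"],
    "web01": ["[datastore2] ISO/windows-2019.iso"],
    "db01": [],
}

if __name__ == "__main__":
    print(vms_referencing(SAMPLE_INVENTORY, "M5-raid10"))   # ['old-vm']
```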


Sharing is caring !

Regards
Syed Jahanzaib
