Micrel KSZ9031RN PHY problem

Discussion:

Micrel KSZ9031RN PHY problem

Guillermo Rodriguez Garcia

2016-04-18 14:49:47 UTC

Hello all,

I am playing with barebox on an Atmel SAMA5D3 Xplained board. It is
now working fine for the most part, however after updating to the
latest sources from git I found a problem that I had not seen before.

This board has two Ethernet interfaces; eth0 uses a Micrel KSZ9031RN
PHY, and eth1 uses a Micrel KSZ8081RNB PHY.

It seems that after release 2016.03.0, eth0 does not work anymore with
some routers. Specifically I found this problem with a Comtrend
VG-8050. I have tested other routers and the problem was not present
there.

After some research it seems that the problem is caused by this
commit: http://git.pengutronix.de/?p=barebox.git;a=commit;h=da89ee8f2e04e116410632a185024f58b8262d87

Before this commit eth0 was working fine. After this commit, the link
cannot established anymore:

[...]
barebox:/# ping 192.168.0.128
ping failed: Network is down

Before diving further into this I thought it could be a good idea to
ask, in case someone can shed some light here.

Any feedback is welcome.

Best,

Guillermo Rodriguez Garcia
***@gmail.com

Sascha Hauer

2016-04-19 07:11:10 UTC

Hi Guillermo,

+Cc Philipp Zabel who ported the patch to barebox

Post by Guillermo Rodriguez Garcia
Hello all,
I am playing with barebox on an Atmel SAMA5D3 Xplained board. It is
now working fine for the most part, however after updating to the
latest sources from git I found a problem that I had not seen before.
This board has two Ethernet interfaces; eth0 uses a Micrel KSZ9031RN
PHY, and eth1 uses a Micrel KSZ8081RNB PHY.
It seems that after release 2016.03.0, eth0 does not work anymore with
some routers. Specifically I found this problem with a Comtrend
VG-8050. I have tested other routers and the problem was not present
there.
After some research it seems that the problem is caused by this
commit: http://git.pengutronix.de/?p=barebox.git;a=commit;h=da89ee8f2e04e116410632a185024f58b8262d87
Before this commit eth0 was working fine. After this commit, the link
[...]
barebox:/# ping 192.168.0.128
ping failed: Network is down
Before diving further into this I thought it could be a good idea to
ask, in case someone can shed some light here.

I have no idea what the issue is here. We might have to make this option
configurable via devicetree to give boards different options. Since the
patch comes from the Linux Kernel, do you have the same problems under
Linux?

Sascha

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

Guillermo Rodriguez Garcia

2016-04-20 15:58:40 UTC

Hello,

Post by Sascha Hauer
Hi Guillermo,
+Cc Philipp Zabel who ported the patch to barebox

Post by Guillermo Rodriguez Garcia
Hello all,
I am playing with barebox on an Atmel SAMA5D3 Xplained board. It is
now working fine for the most part, however after updating to the
latest sources from git I found a problem that I had not seen before.
This board has two Ethernet interfaces; eth0 uses a Micrel KSZ9031RN
PHY, and eth1 uses a Micrel KSZ8081RNB PHY.
It seems that after release 2016.03.0, eth0 does not work anymore with
some routers. Specifically I found this problem with a Comtrend
VG-8050. I have tested other routers and the problem was not present
there.
After some research it seems that the problem is caused by this
commit: http://git.pengutronix.de/?p=barebox.git;a=commit;h=da89ee8f2e04e116410632a185024f58b8262d87
Before this commit eth0 was working fine. After this commit, the link
[...]
barebox:/# ping 192.168.0.128
ping failed: Network is down
Before diving further into this I thought it could be a good idea to
ask, in case someone can shed some light here.

I have no idea what the issue is here. We might have to make this option
configurable via devicetree to give boards different options. Since the
patch comes from the Linux Kernel, do you have the same problems under
Linux?

I'm trying to get the latest kernel to boot. The latest version
supported by Atmel is 4.1, which doesn't have the patch yet. Will
report back asap.

--
Guillermo Rodriguez Garcia
***@gmail.com

Sascha Hauer

2016-04-21 07:32:06 UTC

Post by Guillermo Rodriguez Garcia
Hello,

Post by Sascha Hauer
Hi Guillermo,
+Cc Philipp Zabel who ported the patch to barebox

Post by Guillermo Rodriguez Garcia
Hello all,
I am playing with barebox on an Atmel SAMA5D3 Xplained board. It is
now working fine for the most part, however after updating to the
latest sources from git I found a problem that I had not seen before.
This board has two Ethernet interfaces; eth0 uses a Micrel KSZ9031RN
PHY, and eth1 uses a Micrel KSZ8081RNB PHY.
It seems that after release 2016.03.0, eth0 does not work anymore with
some routers. Specifically I found this problem with a Comtrend
VG-8050. I have tested other routers and the problem was not present
there.
After some research it seems that the problem is caused by this
commit: http://git.pengutronix.de/?p=barebox.git;a=commit;h=da89ee8f2e04e116410632a185024f58b8262d87
Before this commit eth0 was working fine. After this commit, the link
[...]
barebox:/# ping 192.168.0.128
ping failed: Network is down
Before diving further into this I thought it could be a good idea to
ask, in case someone can shed some light here.

I have no idea what the issue is here. We might have to make this option
configurable via devicetree to give boards different options. Since the
patch comes from the Linux Kernel, do you have the same problems under
Linux?

I'm trying to get the latest kernel to boot. The latest version
supported by Atmel is 4.1, which doesn't have the patch yet. Will
report back asap.

Thanks. We can revert this patch since Philipp only ported it to stay in
sync with the kernel. Anyway, if it makes problems we'll probably need a
solution for the kernel aswell.

Sascha

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

Guillermo Rodriguez Garcia

2016-04-21 11:04:58 UTC

Hello,

Post by Sascha Hauer

Post by Guillermo Rodriguez Garcia
Hello,

Post by Sascha Hauer
Hi Guillermo,
+Cc Philipp Zabel who ported the patch to barebox

Post by Guillermo Rodriguez Garcia
Hello all,
I am playing with barebox on an Atmel SAMA5D3 Xplained board. It is
now working fine for the most part, however after updating to the
latest sources from git I found a problem that I had not seen before.
This board has two Ethernet interfaces; eth0 uses a Micrel KSZ9031RN
PHY, and eth1 uses a Micrel KSZ8081RNB PHY.
It seems that after release 2016.03.0, eth0 does not work anymore with
some routers. Specifically I found this problem with a Comtrend
VG-8050. I have tested other routers and the problem was not present
there.
After some research it seems that the problem is caused by this
commit: http://git.pengutronix.de/?p=barebox.git;a=commit;h=da89ee8f2e04e116410632a185024f58b8262d87
Before this commit eth0 was working fine. After this commit, the link
[...]
barebox:/# ping 192.168.0.128
ping failed: Network is down
Before diving further into this I thought it could be a good idea to
ask, in case someone can shed some light here.

I have no idea what the issue is here. We might have to make this option
configurable via devicetree to give boards different options. Since the
patch comes from the Linux Kernel, do you have the same problems under
Linux?

I'm trying to get the latest kernel to boot. The latest version
supported by Atmel is 4.1, which doesn't have the patch yet. Will
report back asap.

Thanks. We can revert this patch since Philipp only ported it to stay in
sync with the kernel. Anyway, if it makes problems we'll probably need a
solution for the kernel aswell.

Linux kernel 4.3 which incorporates the patch works fine (!)

On the other hand, U-Boot 2016.03, which also incorporates a
center_flp_timing patch
(http://www.spinics.net/lists/u-boot-v2/msg25938.html) also shows the
same problem as Barebox.

This is puzzling. The timings patch in kernel version 4.3 works fine,
but seems to cause problems in both Barebox and U-Boot.

Still investigating.

Guillermo Rodriguez Garcia
***@gmail.com

Guillermo Rodriguez Garcia

2016-04-26 09:55:29 UTC

Hello,

Post by Sascha Hauer

Post by Guillermo Rodriguez Garcia
Hello,

Post by Sascha Hauer
Hi Guillermo,
+Cc Philipp Zabel who ported the patch to barebox

Post by Guillermo Rodriguez Garcia
Hello all,
I am playing with barebox on an Atmel SAMA5D3 Xplained board. It is
now working fine for the most part, however after updating to the
latest sources from git I found a problem that I had not seen before.
This board has two Ethernet interfaces; eth0 uses a Micrel KSZ9031RN
PHY, and eth1 uses a Micrel KSZ8081RNB PHY.
It seems that after release 2016.03.0, eth0 does not work anymore with
some routers. Specifically I found this problem with a Comtrend
VG-8050. I have tested other routers and the problem was not present
there.
After some research it seems that the problem is caused by this
commit: http://git.pengutronix.de/?p=barebox.git;a=commit;h=da89ee8f2e04e116410632a185024f58b8262d87
Before this commit eth0 was working fine. After this commit, the link
[...]
barebox:/# ping 192.168.0.128
ping failed: Network is down
Before diving further into this I thought it could be a good idea to
ask, in case someone can shed some light here.

I have no idea what the issue is here. We might have to make this option
configurable via devicetree to give boards different options. Since the
patch comes from the Linux Kernel, do you have the same problems under
Linux?

I'm trying to get the latest kernel to boot. The latest version
supported by Atmel is 4.1, which doesn't have the patch yet. Will
report back asap.

Thanks. We can revert this patch since Philipp only ported it to stay in
sync with the kernel. Anyway, if it makes problems we'll probably need a
solution for the kernel aswell.

More info on this after researching the issue.

Looks like problem is not directly caused by the modified FLP timings,
but is revealed by that change.

Immediately after setting the FLP timings, the autonegotiation is
restarted by calling genphy_restart_aneg(phydev).
Within genphy_restart_aneg, the following line:

oldadv = adv = phy_read(phydev, MII_ADVERTISE);

returns 0xffff, and negotiation fails.

It seems to be a timing problem. Any minimal delay _before_ that
phy_read (1ms) is enough to make it work -- phy_read returns 481
instead of 0xffff, and the negotiation completes. The same delay right
after phy_read does not solve the problem.

This happens even if I modify ksz9031_center_flp_timing() so that it
does NOT touch the FLP timings. So as I said it is not directly
related to the actual FLP timing values, but rather to some timing
issue related to the negotiation itself. It just happens that the
center_flp_timing function does restart the autonegotiation, and thus
reveals the problem.

Is it OK to add this 1ms delay in genphy_restart_aneg?

Best,

Guillermo Rodriguez Garcia
***@gmail.com

Guillermo Rodriguez Garcia

2016-04-26 11:10:31 UTC

2016-04-26 11:55 GMT+02:00 Guillermo Rodriguez Garcia

Post by Guillermo Rodriguez Garcia
Hello,

Post by Sascha Hauer

Post by Guillermo Rodriguez Garcia
Hello,

Post by Sascha Hauer
Hi Guillermo,
+Cc Philipp Zabel who ported the patch to barebox

Post by Guillermo Rodriguez Garcia
Hello all,
I am playing with barebox on an Atmel SAMA5D3 Xplained board. It is
now working fine for the most part, however after updating to the
latest sources from git I found a problem that I had not seen before.
This board has two Ethernet interfaces; eth0 uses a Micrel KSZ9031RN
PHY, and eth1 uses a Micrel KSZ8081RNB PHY.
It seems that after release 2016.03.0, eth0 does not work anymore with
some routers. Specifically I found this problem with a Comtrend
VG-8050. I have tested other routers and the problem was not present
there.
After some research it seems that the problem is caused by this
commit: http://git.pengutronix.de/?p=barebox.git;a=commit;h=da89ee8f2e04e116410632a185024f58b8262d87
Before this commit eth0 was working fine. After this commit, the link
[...]
barebox:/# ping 192.168.0.128
ping failed: Network is down
Before diving further into this I thought it could be a good idea to
ask, in case someone can shed some light here.

I have no idea what the issue is here. We might have to make this option
configurable via devicetree to give boards different options. Since the
patch comes from the Linux Kernel, do you have the same problems under
Linux?

I'm trying to get the latest kernel to boot. The latest version
supported by Atmel is 4.1, which doesn't have the patch yet. Will
report back asap.

Thanks. We can revert this patch since Philipp only ported it to stay in
sync with the kernel. Anyway, if it makes problems we'll probably need a
solution for the kernel aswell.

More info on this after researching the issue.
Looks like problem is not directly caused by the modified FLP timings,
but is revealed by that change.
Immediately after setting the FLP timings, the autonegotiation is
restarted by calling genphy_restart_aneg(phydev).

This line is actually in genphy_config_advert, sorry for the noise.

Guillermo

Post by Guillermo Rodriguez Garcia
oldadv = adv = phy_read(phydev, MII_ADVERTISE);
returns 0xffff, and negotiation fails.
It seems to be a timing problem. Any minimal delay _before_ that
phy_read (1ms) is enough to make it work -- phy_read returns 481
instead of 0xffff, and the negotiation completes. The same delay right
after phy_read does not solve the problem.
This happens even if I modify ksz9031_center_flp_timing() so that it
does NOT touch the FLP timings. So as I said it is not directly
related to the actual FLP timing values, but rather to some timing
issue related to the negotiation itself. It just happens that the
center_flp_timing function does restart the autonegotiation, and thus
reveals the problem.
Is it OK to add this 1ms delay in genphy_restart_aneg?
Best,
Guillermo Rodriguez Garcia

Sascha Hauer

2016-04-27 05:59:08 UTC

Post by Guillermo Rodriguez Garcia
Hello,

Post by Sascha Hauer

Post by Guillermo Rodriguez Garcia
Hello,

Post by Sascha Hauer
Hi Guillermo,
+Cc Philipp Zabel who ported the patch to barebox

Post by Guillermo Rodriguez Garcia
Hello all,
I am playing with barebox on an Atmel SAMA5D3 Xplained board. It is
now working fine for the most part, however after updating to the
latest sources from git I found a problem that I had not seen before.
This board has two Ethernet interfaces; eth0 uses a Micrel KSZ9031RN
PHY, and eth1 uses a Micrel KSZ8081RNB PHY.
It seems that after release 2016.03.0, eth0 does not work anymore with
some routers. Specifically I found this problem with a Comtrend
VG-8050. I have tested other routers and the problem was not present
there.
After some research it seems that the problem is caused by this
commit: http://git.pengutronix.de/?p=barebox.git;a=commit;h=da89ee8f2e04e116410632a185024f58b8262d87
Before this commit eth0 was working fine. After this commit, the link
[...]
barebox:/# ping 192.168.0.128
ping failed: Network is down
Before diving further into this I thought it could be a good idea to
ask, in case someone can shed some light here.

I have no idea what the issue is here. We might have to make this option
configurable via devicetree to give boards different options. Since the
patch comes from the Linux Kernel, do you have the same problems under
Linux?

I'm trying to get the latest kernel to boot. The latest version
supported by Atmel is 4.1, which doesn't have the patch yet. Will
report back asap.

Thanks. We can revert this patch since Philipp only ported it to stay in
sync with the kernel. Anyway, if it makes problems we'll probably need a
solution for the kernel aswell.

More info on this after researching the issue.
Looks like problem is not directly caused by the modified FLP timings,
but is revealed by that change.
Immediately after setting the FLP timings, the autonegotiation is
restarted by calling genphy_restart_aneg(phydev).
oldadv = adv = phy_read(phydev, MII_ADVERTISE);
returns 0xffff, and negotiation fails.

The code you are referring to seems to be the first phy access after the
network device opens. Maybe the hardware is not yet ready for some
reason. Could you add the same delay in macb_open() right before the
call to phy_device_connect()?

Sascha

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

Guillermo Rodriguez Garcia

2016-04-28 09:51:12 UTC

Hello,

Post by Sascha Hauer

Post by Guillermo Rodriguez Garcia
Hello,

Post by Sascha Hauer

Post by Guillermo Rodriguez Garcia
Hello,

Post by Sascha Hauer
Hi Guillermo,
+Cc Philipp Zabel who ported the patch to barebox

Post by Guillermo Rodriguez Garcia
Hello all,
I am playing with barebox on an Atmel SAMA5D3 Xplained board. It is
now working fine for the most part, however after updating to the
latest sources from git I found a problem that I had not seen before.
This board has two Ethernet interfaces; eth0 uses a Micrel KSZ9031RN
PHY, and eth1 uses a Micrel KSZ8081RNB PHY.
It seems that after release 2016.03.0, eth0 does not work anymore with
some routers. Specifically I found this problem with a Comtrend
VG-8050. I have tested other routers and the problem was not present
there.
After some research it seems that the problem is caused by this
commit: http://git.pengutronix.de/?p=barebox.git;a=commit;h=da89ee8f2e04e116410632a185024f58b8262d87
Before this commit eth0 was working fine. After this commit, the link
[...]
barebox:/# ping 192.168.0.128
ping failed: Network is down
Before diving further into this I thought it could be a good idea to
ask, in case someone can shed some light here.

I have no idea what the issue is here. We might have to make this option
configurable via devicetree to give boards different options. Since the
patch comes from the Linux Kernel, do you have the same problems under
Linux?

I'm trying to get the latest kernel to boot. The latest version
supported by Atmel is 4.1, which doesn't have the patch yet. Will
report back asap.

Thanks. We can revert this patch since Philipp only ported it to stay in
sync with the kernel. Anyway, if it makes problems we'll probably need a
solution for the kernel aswell.

More info on this after researching the issue.
Looks like problem is not directly caused by the modified FLP timings,
but is revealed by that change.
Immediately after setting the FLP timings, the autonegotiation is
restarted by calling genphy_restart_aneg(phydev).
oldadv = adv = phy_read(phydev, MII_ADVERTISE);
returns 0xffff, and negotiation fails.

The code you are referring to seems to be the first phy access after the
network device opens. Maybe the hardware is not yet ready for some
reason. Could you add the same delay in macb_open() right before the
call to phy_device_connect()?

Tried this, but doesn't help. Here's the call sequence, for example
when you run 'dhcp' from the command line prompt:

macb_open
phy_device_connect
phy_device_attach
phy_init_hw
ksz9031_config_init
ksz9031_center_flp_timing
genphy_restart_aneg
genphy_config_advert -> fails (adv = 0xffff)

I can verify that:
- Any delay happening before the call to genphy_restart_aneg (which
itself is called at the end of ksz9031_center_flp_timing) does not fix
the problem.
- Any delay happening after the phy_read(phydev, MII_ADVERTISE) insde
genphy_config_advert does not fix the problem.
- A delay of 1ms at any point after genphy_restart_aneg and before the
phy_read call in genphy_config_advert fixes the problem

Looks like the phy needs a bit of time after restarting autonegotiation..

Guillermo Rodriguez Garcia
***@gmail.com

Trent Piepho

2016-04-28 21:09:43 UTC

Post by Guillermo Rodriguez Garcia

Post by Sascha Hauer
The code you are referring to seems to be the first phy access after the
network device opens. Maybe the hardware is not yet ready for some
reason. Could you add the same delay in macb_open() right before the
call to phy_device_connect()?

Tried this, but doesn't help. Here's the call sequence, for example
macb_open
phy_device_connect
phy_device_attach
phy_init_hw
ksz9031_config_init
ksz9031_center_flp_timing
genphy_restart_aneg
genphy_config_advert -> fails (adv = 0xffff)
- Any delay happening before the call to genphy_restart_aneg (which
itself is called at the end of ksz9031_center_flp_timing) does not fix
the problem.
- Any delay happening after the phy_read(phydev, MII_ADVERTISE) insde
genphy_config_advert does not fix the problem.
- A delay of 1ms at any point after genphy_restart_aneg and before the
phy_read call in genphy_config_advert fixes the problem

So the same problem appears if you take out the FLP call out entirely,
if there is an aneg restart done immediately before trying to read from
MII_ADVERTISE?

Does only MII_ADVERTISE fail to read, or does reading any PHY register
fail? Specifically, does reading MII_BMSR fail?

Reading PHY register after restarting aneg isn't unusual. Especially
polling MII_BMSR to check for BMSR_ANEGCOMPLETE. It would be odd if one
couldn't do that.

The first anreg start call will also un-powerdown the PHY if BMCR_PDOWN
was set. I wonder if that is happening?

Guillermo Rodriguez Garcia

2016-04-29 11:00:52 UTC

Hello,

Post by Trent Piepho

Post by Guillermo Rodriguez Garcia

Post by Sascha Hauer
The code you are referring to seems to be the first phy access after the
network device opens. Maybe the hardware is not yet ready for some
reason. Could you add the same delay in macb_open() right before the
call to phy_device_connect()?

Tried this, but doesn't help. Here's the call sequence, for example
macb_open
phy_device_connect
phy_device_attach
phy_init_hw
ksz9031_config_init
ksz9031_center_flp_timing
genphy_restart_aneg
genphy_config_advert -> fails (adv = 0xffff)
- Any delay happening before the call to genphy_restart_aneg (which
itself is called at the end of ksz9031_center_flp_timing) does not fix
the problem.
- Any delay happening after the phy_read(phydev, MII_ADVERTISE) insde
genphy_config_advert does not fix the problem.
- A delay of 1ms at any point after genphy_restart_aneg and before the
phy_read call in genphy_config_advert fixes the problem

So the same problem appears if you take out the FLP call out entirely,
if there is an aneg restart done immediately before trying to read from
MII_ADVERTISE?

Yes, this is correct. Even if I comment out the FLP timing writes, the
fact that ksz9031_center_flp_timing calls genphy_restart_aneg is
enough to trigger the problem.

static int ksz9031_center_flp_timing(struct phy_device *phydev)
{
/* Center KSZ9031RNX FLP timing at 16ms. */
// phy_write_mmd_indirect(phydev, MII_KSZ9031RN_FLP_BURST_TX_HI, 0, 0x0006);
// phy_write_mmd_indirect(phydev, MII_KSZ9031RN_FLP_BURST_TX_LO, 0, 0x1a80);

return genphy_restart_aneg(phydev);
}

Post by Trent Piepho
Does only MII_ADVERTISE fail to read, or does reading any PHY register
fail? Specifically, does reading MII_BMSR fail?

Looks like it will happen with any register. If I try to read MII_BMSR
right before MII_ADVERTISE, then MII_BMSR reads as 0xffff.

(In this case, the negotiation will actually complete, since this
additional MII_BMSR read seems to be enough to cause the next
MII_ADVERTISE read to succeed).

Post by Trent Piepho
Reading PHY register after restarting aneg isn't unusual. Especially
polling MII_BMSR to check for BMSR_ANEGCOMPLETE. It would be odd if one
couldn't do that.
The first anreg start call will also un-powerdown the PHY if BMCR_PDOWN
was set. I wonder if that is happening?

That was a very good hint and it looks like this is exactly what is happening.

genphy_restart_aneg() clears the BMCR_PDOWN bit and would get the phy
out of powerdown mode. I have added a trace right at the beginning of
genphy_restart_aneg and verified that BMCR_PDOWN bit was set before
genphy_restart_aneg clears it.

Then, the datasheet for the ksz9031 [1], page 44, says:

After this bit is changed from '1' to '0', an internal global reset is
automatically generated. Wait a minimum of 1ms before read/write
access to the PHY registers.

[1]: http://ww1.microchip.com/downloads/en/DeviceDoc/KSZ9031RNX.pdf

So this seems to be what is causing the problem. At least on the
ksz9031 (don't know about others), a delay of 1ms is required when
coming out of powerdown mode.

What is the best way to fix this? We can add a 1ms delay in
genphy_restart_aneg (this is probably the easiest, and the delay is
small enough that it shouldn't make a difference for other phys that
might not need it). Or if this is not acceptable, perhaps add a custom
restart_aneg function for the ksz9031.

Best,

Guillermo Rodriguez Garcia
***@gmail.com

Trent Piepho

2016-04-29 18:18:44 UTC

Post by Guillermo Rodriguez Garcia

Post by Trent Piepho
The first anreg start call will also un-powerdown the PHY if BMCR_PDOWN
was set. I wonder if that is happening?

That was a very good hint and it looks like this is exactly what is happening.
genphy_restart_aneg() clears the BMCR_PDOWN bit and would get the phy
out of powerdown mode. I have added a trace right at the beginning of
genphy_restart_aneg and verified that BMCR_PDOWN bit was set before
genphy_restart_aneg clears it.
After this bit is changed from '1' to '0', an internal global reset is
automatically generated. Wait a minimum of 1ms before read/write
access to the PHY registers.

Mystery solved!

Post by Guillermo Rodriguez Garcia
So this seems to be what is causing the problem. At least on the
ksz9031 (don't know about others), a delay of 1ms is required when
coming out of powerdown mode.

The kernel will take the phy in/out of powerdown mode as part of the PM
suspend/resume calls, which is supported on all micrel phys since 2013.
I don't see a delay in the kernel code and wonder why this hasn't been a
problem? Might be worth asking on net-dev if this is a known issue with
some phys and how it is solved? Maybe it's an undiscovered cause of
network flakiness after a resume.

Post by Guillermo Rodriguez Garcia
What is the best way to fix this? We can add a 1ms delay in
genphy_restart_aneg (this is probably the easiest, and the delay is
small enough that it shouldn't make a difference for other phys that
might not need it). Or if this is not acceptable, perhaps add a custom
restart_aneg function for the ksz9031.

Could add a custom init function that un-powerdowns the phy and does the
wait.

Or have restart_aneg check if the powerdown bit was set before it clears
it, and only delay in that case.

Having the un-powerdown in the restart_aneg isn't really the right place
for it. If there is no reason the restart aneg, then the phy will not
be powered up.

Guillermo Rodriguez Garcia

2016-05-03 14:40:02 UTC

Hello,

Post by Trent Piepho

Post by Guillermo Rodriguez Garcia

Post by Trent Piepho
The first anreg start call will also un-powerdown the PHY if BMCR_PDOWN
was set. I wonder if that is happening?

That was a very good hint and it looks like this is exactly what is happening.
genphy_restart_aneg() clears the BMCR_PDOWN bit and would get the phy
out of powerdown mode. I have added a trace right at the beginning of
genphy_restart_aneg and verified that BMCR_PDOWN bit was set before
genphy_restart_aneg clears it.
After this bit is changed from '1' to '0', an internal global reset is
automatically generated. Wait a minimum of 1ms before read/write
access to the PHY registers.

Mystery solved!

Indeed. Although it's strange that the problem can only be reproduced
with certain routers. I can reproduce it everytime when the board is
connected with a ComTrend VG-8050, but not with other routers..

Post by Trent Piepho

Post by Guillermo Rodriguez Garcia
So this seems to be what is causing the problem. At least on the
ksz9031 (don't know about others), a delay of 1ms is required when
coming out of powerdown mode.

The kernel will take the phy in/out of powerdown mode as part of the PM
suspend/resume calls, which is supported on all micrel phys since 2013.
I don't see a delay in the kernel code and wonder why this hasn't been a
problem?

Perhaps this is due to the fact that it does not happen with every router.

Post by Trent Piepho
Might be worth asking on net-dev if this is a known issue with
some phys and how it is solved? Maybe it's an undiscovered cause of
network flakiness after a resume.

Would you be willing to help here? (i.e. report/ask about this on net-dev)

Post by Trent Piepho

Post by Guillermo Rodriguez Garcia
What is the best way to fix this? We can add a 1ms delay in
genphy_restart_aneg (this is probably the easiest, and the delay is
small enough that it shouldn't make a difference for other phys that
might not need it). Or if this is not acceptable, perhaps add a custom
restart_aneg function for the ksz9031.

Could add a custom init function that un-powerdowns the phy and does the
wait.
Or have restart_aneg check if the powerdown bit was set before it clears
it, and only delay in that case.

This would be easy, would solve the problem at hand, and would only
introduce a (perhaps unnecessary) 1ms delay for phys that don't need
this.

Post by Trent Piepho
Having the un-powerdown in the restart_aneg isn't really the right place
for it. If there is no reason the restart aneg, then the phy will not
be powered up.

Yes but I would say that that's a different issue. I must say I don't feel
confident enough to move this code to somewhere else myself. Perhaps
Sascha (as the original author of this change [1]) could comment.

I would suggest so separate these two issues: 1) Adding the missing
1ms delay as described in the Micrel datasheet, 2) Consider whether
the code should be refactored / reorganized.

Does this make sense?

[1] http://git.pengutronix.de/?p=barebox.git;a=commit;h=ac48b10467ffb1c45e67e5efe36413ede140a498

--
Guillermo Rodriguez Garcia
***@gmail.com

Sascha Hauer

2016-05-04 07:43:31 UTC

Post by Guillermo Rodriguez Garcia
Hello,

Post by Trent Piepho

Post by Guillermo Rodriguez Garcia

Post by Trent Piepho
The first anreg start call will also un-powerdown the PHY if BMCR_PDOWN
was set. I wonder if that is happening?

That was a very good hint and it looks like this is exactly what is happening.
genphy_restart_aneg() clears the BMCR_PDOWN bit and would get the phy
out of powerdown mode. I have added a trace right at the beginning of
genphy_restart_aneg and verified that BMCR_PDOWN bit was set before
genphy_restart_aneg clears it.
After this bit is changed from '1' to '0', an internal global reset is
automatically generated. Wait a minimum of 1ms before read/write
access to the PHY registers.

Mystery solved!

Indeed. Although it's strange that the problem can only be reproduced
with certain routers. I can reproduce it everytime when the board is
connected with a ComTrend VG-8050, but not with other routers..

Post by Trent Piepho

Post by Guillermo Rodriguez Garcia
So this seems to be what is causing the problem. At least on the
ksz9031 (don't know about others), a delay of 1ms is required when
coming out of powerdown mode.

The kernel will take the phy in/out of powerdown mode as part of the PM
suspend/resume calls, which is supported on all micrel phys since 2013.
I don't see a delay in the kernel code and wonder why this hasn't been a
problem?

Perhaps this is due to the fact that it does not happen with every router.

Post by Trent Piepho
Might be worth asking on net-dev if this is a known issue with
some phys and how it is solved? Maybe it's an undiscovered cause of
network flakiness after a resume.

Would you be willing to help here? (i.e. report/ask about this on net-dev)

Post by Trent Piepho

Post by Guillermo Rodriguez Garcia
What is the best way to fix this? We can add a 1ms delay in
genphy_restart_aneg (this is probably the easiest, and the delay is
small enough that it shouldn't make a difference for other phys that
might not need it). Or if this is not acceptable, perhaps add a custom
restart_aneg function for the ksz9031.

Could add a custom init function that un-powerdowns the phy and does the
wait.
Or have restart_aneg check if the powerdown bit was set before it clears
it, and only delay in that case.

This would be easy, would solve the problem at hand, and would only
introduce a (perhaps unnecessary) 1ms delay for phys that don't need
this.

Post by Trent Piepho
Having the un-powerdown in the restart_aneg isn't really the right place
for it. If there is no reason the restart aneg, then the phy will not
be powered up.

Yes but I would say that that's a different issue. I must say I don't feel
confident enough to move this code to somewhere else myself. Perhaps
Sascha (as the original author of this change [1]) could comment.
I would suggest so separate these two issues: 1) Adding the missing
1ms delay as described in the Micrel datasheet, 2) Consider whether
the code should be refactored / reorganized.
Does this make sense?

Yes, makes sense.

Currently I don't know a better place for clearing the BMCR_PDOWN bit.
genphy_config_init would be a candidate, but it's not called for phys
which have a custom .config_init hook.
If I'm lucky I can find the ethernet adapter which motivated me to
create ac48b10467ffb, it would be interesting to see which phy type the
adapter has.

Sascha

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

Guillermo Rodriguez Garcia

2016-05-04 10:39:49 UTC

Post by Sascha Hauer

Post by Guillermo Rodriguez Garcia
Hello,

Post by Trent Piepho

Post by Guillermo Rodriguez Garcia

Post by Trent Piepho
The first anreg start call will also un-powerdown the PHY if BMCR_PDOWN
was set. I wonder if that is happening?

That was a very good hint and it looks like this is exactly what is happening.
genphy_restart_aneg() clears the BMCR_PDOWN bit and would get the phy
out of powerdown mode. I have added a trace right at the beginning of
genphy_restart_aneg and verified that BMCR_PDOWN bit was set before
genphy_restart_aneg clears it.
After this bit is changed from '1' to '0', an internal global reset is
automatically generated. Wait a minimum of 1ms before read/write
access to the PHY registers.

Mystery solved!

Indeed. Although it's strange that the problem can only be reproduced
with certain routers. I can reproduce it everytime when the board is
connected with a ComTrend VG-8050, but not with other routers..

Post by Trent Piepho

Post by Guillermo Rodriguez Garcia
So this seems to be what is causing the problem. At least on the
ksz9031 (don't know about others), a delay of 1ms is required when
coming out of powerdown mode.

The kernel will take the phy in/out of powerdown mode as part of the PM
suspend/resume calls, which is supported on all micrel phys since 2013.
I don't see a delay in the kernel code and wonder why this hasn't been a
problem?

Perhaps this is due to the fact that it does not happen with every router.

Post by Trent Piepho
Might be worth asking on net-dev if this is a known issue with
some phys and how it is solved? Maybe it's an undiscovered cause of
network flakiness after a resume.

Would you be willing to help here? (i.e. report/ask about this on net-dev)

Post by Trent Piepho

Post by Guillermo Rodriguez Garcia
What is the best way to fix this? We can add a 1ms delay in
genphy_restart_aneg (this is probably the easiest, and the delay is
small enough that it shouldn't make a difference for other phys that
might not need it). Or if this is not acceptable, perhaps add a custom
restart_aneg function for the ksz9031.

Could add a custom init function that un-powerdowns the phy and does the
wait.
Or have restart_aneg check if the powerdown bit was set before it clears
it, and only delay in that case.

This would be easy, would solve the problem at hand, and would only
introduce a (perhaps unnecessary) 1ms delay for phys that don't need
this.

Post by Trent Piepho
Having the un-powerdown in the restart_aneg isn't really the right place
for it. If there is no reason the restart aneg, then the phy will not
be powered up.

Yes but I would say that that's a different issue. I must say I don't feel
confident enough to move this code to somewhere else myself. Perhaps
Sascha (as the original author of this change [1]) could comment.
I would suggest so separate these two issues: 1) Adding the missing
1ms delay as described in the Micrel datasheet, 2) Consider whether
the code should be refactored / reorganized.
Does this make sense?

Yes, makes sense.
Currently I don't know a better place for clearing the BMCR_PDOWN bit.
genphy_config_init would be a candidate, but it's not called for phys
which have a custom .config_init hook.
If I'm lucky I can find the ethernet adapter which motivated me to
create ac48b10467ffb, it would be interesting to see which phy type the
adapter has.

For what it's worth, the Micrel ksz9031 in my SAMA5D3 Xplained board
also seems to come up with the BMCR_PDOWN bit set (or at least, the
bit is set when genphy_restart_aneg() is called -- I am assuming that
Barebox is not setting this bit itself during init)

Guillermo Rodriguez Garcia
***@gmail.com

Guillermo Rodriguez Garcia

2016-06-13 17:29:15 UTC

From: grodriguez <***@gmail.com>

Commit da89ee8f2e04 ("Center FLP timing at 16ms") breaks
genphy_restart_aneg() for Micrel's ksz9031. According to the
datasheet, the ksz9031 requires a wait of 1ms after clearing
the PDOWN bit and before read/write access to any PHY registers.
---
drivers/net/phy/phy.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 73176fb..ed69d9b 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -561,7 +561,7 @@ int phy_wait_aneg_done(struct phy_device *phydev)
*/
int genphy_restart_aneg(struct phy_device *phydev)
{
- int ctl;
+ int ctl, pdown;

ctl = phy_read(phydev, MII_BMCR);

@@ -574,6 +574,7 @@ int genphy_restart_aneg(struct phy_device *phydev)
ctl &= ~(BMCR_ISOLATE);

/* Clear powerdown bit which eventually is set on some phys */
+ pdown = ctl & BMCR_PDOWN;
ctl &= ~BMCR_PDOWN;

ctl = phy_write(phydev, MII_BMCR, ctl);
@@ -581,6 +582,12 @@ int genphy_restart_aneg(struct phy_device *phydev)
if (ctl < 0)
return ctl;

+ /* Micrel's ksz9031 (and perhaps others?): Changing the PDOWN bit
+ * from '1' to '0' generates an internal reset. Must wait a minimum
+ * of 1ms before read/write access to the PHY registers. */
+ if (pdown)
+ mdelay(1);
+
return 0;
}

--
1.7.9.5

Sascha Hauer

2016-06-14 06:39:19 UTC

Hi Guillermo,

Post by Guillermo Rodriguez Garcia
Commit da89ee8f2e04 ("Center FLP timing at 16ms") breaks
genphy_restart_aneg() for Micrel's ksz9031. According to the
datasheet, the ksz9031 requires a wait of 1ms after clearing
the PDOWN bit and before read/write access to any PHY registers.
---
drivers/net/phy/phy.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

I must say that I am not overly happy with this patch as it leaks in phy
specific stuff into a somewhat generic function. Anyway, I see the need
for this patch and so I applied it.

Sascha

Post by Guillermo Rodriguez Garcia
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 73176fb..ed69d9b 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -561,7 +561,7 @@ int phy_wait_aneg_done(struct phy_device *phydev)
*/
int genphy_restart_aneg(struct phy_device *phydev)
{
- int ctl;
+ int ctl, pdown;
ctl = phy_read(phydev, MII_BMCR);
@@ -574,6 +574,7 @@ int genphy_restart_aneg(struct phy_device *phydev)
ctl &= ~(BMCR_ISOLATE);
/* Clear powerdown bit which eventually is set on some phys */
+ pdown = ctl & BMCR_PDOWN;
ctl &= ~BMCR_PDOWN;
ctl = phy_write(phydev, MII_BMCR, ctl);
@@ -581,6 +582,12 @@ int genphy_restart_aneg(struct phy_device *phydev)
if (ctl < 0)
return ctl;
+ /* Micrel's ksz9031 (and perhaps others?): Changing the PDOWN bit
+ * from '1' to '0' generates an internal reset. Must wait a minimum
+ * of 1ms before read/write access to the PHY registers. */
+ if (pdown)
+ mdelay(1);
+
return 0;
}
--
1.7.9.5

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

Guillermo Rodriguez Garcia

2016-06-14 07:39:23 UTC

Hi Sascha,

Post by Sascha Hauer
Hi Guillermo,

Post by Guillermo Rodriguez Garcia
Commit da89ee8f2e04 ("Center FLP timing at 16ms") breaks
genphy_restart_aneg() for Micrel's ksz9031. According to the
datasheet, the ksz9031 requires a wait of 1ms after clearing
the PDOWN bit and before read/write access to any PHY registers.
---
drivers/net/phy/phy.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

I must say that I am not overly happy with this patch as it leaks in phy
specific stuff into a somewhat generic function. Anyway, I see the need
for this patch and so I applied it.

Yes, I understand perfectly and I feel the same.
The alternative was to create a custom genphy_restart_aneg for micrel
PHYs only add the delay there, however I am not sure it is worth the
trouble just for a 1ms delay which will be virtually unnoticeable
anyway.

Also I am not completely sure that this only applies to Micrel. See
this for example in the SMSC911x driver from the Linux kernel:
http://lxr.free-electrons.com/source/drivers/net/ethernet/smsc/smsc911x.c#L1364

Anyway if you would prefer this to be moved to micrel.c I would be
happy to prepare a patch for that; just let me know.

Thank you,

Guillermo Rodriguez Garcia
***@gmail.com

Sascha Hauer

2016-06-15 05:49:10 UTC

Post by Guillermo Rodriguez Garcia
Hi Sascha,

Post by Sascha Hauer
Hi Guillermo,

Post by Guillermo Rodriguez Garcia
Commit da89ee8f2e04 ("Center FLP timing at 16ms") breaks
genphy_restart_aneg() for Micrel's ksz9031. According to the
datasheet, the ksz9031 requires a wait of 1ms after clearing
the PDOWN bit and before read/write access to any PHY registers.
---
drivers/net/phy/phy.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

I must say that I am not overly happy with this patch as it leaks in phy
specific stuff into a somewhat generic function. Anyway, I see the need
for this patch and so I applied it.

Yes, I understand perfectly and I feel the same.
The alternative was to create a custom genphy_restart_aneg for micrel
PHYs only add the delay there, however I am not sure it is worth the
trouble just for a 1ms delay which will be virtually unnoticeable
anyway.

There's also the case when Micrel phy support is disabled. In this case
a Micrel phy is then handled by the generic phy, so we still need the
delay in the generic code, or we would have to remove the BMCR_PDOWN
setting from the generic code also.

Post by Guillermo Rodriguez Garcia
Also I am not completely sure that this only applies to Micrel. See
http://lxr.free-electrons.com/source/drivers/net/ethernet/smsc/smsc911x.c#L1364
Anyway if you would prefer this to be moved to micrel.c I would be
happy to prepare a patch for that; just let me know.

Your patch is fine for now. Let's see what unforseen problems it
generates before changing it again ;)

Sascha

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

Guillermo Rodriguez Garcia

2016-06-15 07:44:13 UTC

Post by Sascha Hauer

Post by Guillermo Rodriguez Garcia
Hi Sascha,

Post by Sascha Hauer
Hi Guillermo,

Post by Guillermo Rodriguez Garcia
Commit da89ee8f2e04 ("Center FLP timing at 16ms") breaks
genphy_restart_aneg() for Micrel's ksz9031. According to the
datasheet, the ksz9031 requires a wait of 1ms after clearing
the PDOWN bit and before read/write access to any PHY registers.
---
drivers/net/phy/phy.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

I must say that I am not overly happy with this patch as it leaks in phy
specific stuff into a somewhat generic function. Anyway, I see the need
for this patch and so I applied it.

Yes, I understand perfectly and I feel the same.
The alternative was to create a custom genphy_restart_aneg for micrel
PHYs only add the delay there, however I am not sure it is worth the
trouble just for a 1ms delay which will be virtually unnoticeable
anyway.

There's also the case when Micrel phy support is disabled. In this case
a Micrel phy is then handled by the generic phy, so we still need the
delay in the generic code, or we would have to remove the BMCR_PDOWN
setting from the generic code also.

Ah -- I didn't know this was possible.

Post by Sascha Hauer

Post by Guillermo Rodriguez Garcia
Also I am not completely sure that this only applies to Micrel. See
http://lxr.free-electrons.com/source/drivers/net/ethernet/smsc/smsc911x.c#L1364
Anyway if you would prefer this to be moved to micrel.c I would be
happy to prepare a patch for that; just let me know.

Your patch is fine for now. Let's see what unforseen problems it
generates before changing it again ;)

Sounds like a plan :)

Best,

Guillermo Rodriguez Garcia
***@gmail.com

19 Replies
101 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

Guillermo Rodriguez Garcia 2016-04-18 14:49:47 UTC

Sascha Hauer 2016-04-19 07:11:10 UTC

Guillermo Rodriguez Garcia 2016-04-20 15:58:40 UTC

Sascha Hauer 2016-04-21 07:32:06 UTC

Guillermo Rodriguez Garcia 2016-04-21 11:04:58 UTC

Guillermo Rodriguez Garcia 2016-04-26 09:55:29 UTC

Guillermo Rodriguez Garcia 2016-04-26 11:10:31 UTC

Sascha Hauer 2016-04-27 05:59:08 UTC

Guillermo Rodriguez Garcia 2016-04-28 09:51:12 UTC

Trent Piepho 2016-04-28 21:09:43 UTC

Guillermo Rodriguez Garcia 2016-04-29 11:00:52 UTC

Trent Piepho 2016-04-29 18:18:44 UTC

Guillermo Rodriguez Garcia 2016-05-03 14:40:02 UTC

Sascha Hauer 2016-05-04 07:43:31 UTC

Guillermo Rodriguez Garcia 2016-05-04 10:39:49 UTC

Guillermo Rodriguez Garcia 2016-06-13 17:29:15 UTC

Sascha Hauer 2016-06-14 06:39:19 UTC

Guillermo Rodriguez Garcia 2016-06-14 07:39:23 UTC

Sascha Hauer 2016-06-15 05:49:10 UTC

Guillermo Rodriguez Garcia 2016-06-15 07:44:13 UTC

about - legalese

Loading...